Validation of the ITCM-2-SNOW prototype:
• Target date for prototype evaluation: 22-Oct, 14:00
• Assumption: LAS will still exist and will be the starting point. It will also remain the place that points the Operators to procedures (as today).
• The validation will follow three essential steps:
  - Analyse the use cases, verify the listed points in a demo session, collect new issues
  - Review criteria, define essential requirements and showstoppers discovered during the use-case analysis
  - Have all essential aspects solved, agree on a list of enhancements (and timescale)
• Comparison of the resulting tickets from both Remedy and ITCM-2-SNOW during the test period (so that no issue is forgotten). Agreed: Ivan to implement the duplicate feature, Zhechka to do the comparison.
• Migration of remaining ITCM tickets once ITCM-2-SNOW is in production: will depend on how many are still open (options: terminate them in Remedy; manual import if few; bulk import if many).
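The comparison of tickets between the two tools during the test period could be sketched as a simple set difference. This is only an illustration: the `Ticket` fields, the matching key (hostname plus alarm name) and the sample alarm names are assumptions, not something agreed in the meeting or taken from either tool.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Ticket(NamedTuple):
    """Minimal ticket record; the field names are illustrative assumptions."""
    ticket_id: str
    hostname: str
    alarm: str


def ticket_key(ticket: Ticket) -> Tuple[str, str]:
    # Assumption: a (hostname, alarm) pair identifies the same issue in both tools.
    return (ticket.hostname.lower(), ticket.alarm.lower())


def compare_tickets(
    remedy: Iterable[Ticket], snow: Iterable[Ticket]
) -> Tuple[List[Ticket], List[Ticket]]:
    """Return (only in Remedy, only in ITCM-2-SNOW), each sorted by key."""
    remedy_by_key = {ticket_key(t): t for t in remedy}
    snow_by_key = {ticket_key(t): t for t in snow}
    only_remedy = [remedy_by_key[k] for k in sorted(remedy_by_key.keys() - snow_by_key.keys())]
    only_snow = [snow_by_key[k] for k in sorted(snow_by_key.keys() - remedy_by_key.keys())]
    return only_remedy, only_snow


if __name__ == "__main__":
    remedy = [Ticket("R1", "lxbsp2907", "no_contact"), Ticket("R2", "lxfsrg21a03", "disk_full")]
    snow = [Ticket("S1", "LXBSP2907", "no_contact")]
    missing_in_snow, missing_in_remedy = compare_tickets(remedy, snow)
    print("Only in Remedy:", [t.ticket_id for t in missing_in_snow])
    print("Only in ITCM-2-SNOW:", [t.ticket_id for t in missing_in_remedy])
```

Any ticket listed on either side would be a candidate "forgotten issue" to review by hand.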
Criteria for validation:
1) Use cases: full chain check of use cases for Operators
- a) Incident created manually by Operator (work log) ⇒ decided to go with an RQF, maybe even using templates when convenient
  - For a non-host entity (goods reception, …) a list of categories must be provided by Anthony Grossir
  - Automatically closed (pure “log-only”) ⇒ manual (and immediate) completion as for any other RQF
- b) Alarm creating (from LAS) a log-only for Operator, followed up by him.
  - What’s not dealt with yet? ⇒ a predefined view already exists ⇒ (which one?)
  - Anything going on with this machine? (i.e. are there other open tickets?) ⇒ the history of a machine is available ⇒ (how?)
  - Apply procedure if any and:
    - Fixed ⇒ Close case ⇒ goes to Resolved, and in solution: as per procedure. Envisage a template.
    - Not fixed ⇒ Escalate ⇒ change FE: easiest is to start typing “Sys Admin” and the proposal comes immediately.
- c) Alarm creating a log-only for Operator, needs escalation, depends on “contract type”:
  - to Service Manager ⇒ send an email (letter icon); the proposed recipient will be taken from the CI
  - to SysAdmins (if machine managed by them):
    - as “standard” ⇒ change FE: easiest is to start typing “Sys Admin” and the proposal comes immediately.
    - as “piquet call” ⇒ the category will be used to flag a Piquet call (to be implemented)
      - only the Operator can set it ⇒ no, any member of the FE can alter it, but all changes are tracked. See Note 1.
      - SAO management can also set/unset it (to correct obvious mistakes) ⇒ yes, but for the same reason: in the same FE
      - Tracking of this type of ticket should be easy ⇒ provided it is not re-assigned to another FE
      - Depends (still) on importance of machines (visible to Operators/SM/SA? Xmas rules!) ⇒ no, but it will have to, even in the long term, as the critical/non-critical importance will remain
Note 1: from Administrative Circular No 23:
“The Head of Department ... shall designate the persons authorised to call out members of the personnel on stand-by duty to deal with an emergency.”
- In our case, Console Operators and SysAdmin direct management (usually through the Operators anyway). Normally, no one else, and it would be better if the tool restricted it.
- I'm assuming here that regular users cannot create master incidents and that the latter are different from regular incidents. Otherwise, anyone could create a piquet call ticket (confusing), and if the call is indeed placed (because one may then think one is entitled to do so), this is certainly in contradiction with the administrative circular.
“The persons referred to in (above) shall be responsible:
a) for noting in a log book:
i. the name of the member of the personnel on stand-by duty called out to deal with the emergency;
ii. the time of the call-out and the reasons for it;
...”
- Notation in the log book is what we achieve by having tickets clearly tagged as piquet calls. It should not be possible to change the FE afterwards, or the tracking will be lost.
- Name of the called SysAdmin can be entered manually by the Operator, picked from the FE members list.
- Time is automatic when saving the ticket; the reason in this case will be the alarm name (= short description).
“The number of hours of stand-by duty performed by a member of the personnel shall be recorded in a statement to which the latter has access.”
- Since no specific visibility restriction is foreseen, master incidents will be accessible like any other ticket.
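The three log-book elements discussed in Note 1 (who was called, when, and why) could be modelled as below. This is a hedged sketch only: `PiquetCallEntry` and `log_piquet_call` are hypothetical names, the FE member check is done in plain Python rather than by the ticketing tool, and the sample alarm name is invented.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Sequence


@dataclass(frozen=True)
class PiquetCallEntry:
    """One log-book line: who was called out, when, and why (hypothetical model)."""
    called_sysadmin: str
    called_at: datetime
    reason: str


def log_piquet_call(
    called_sysadmin: str,
    fe_members: Sequence[str],
    short_description: str,
    now: Optional[datetime] = None,
) -> PiquetCallEntry:
    # The name is picked from the FE members list, so reject anyone else.
    if called_sysadmin not in fe_members:
        raise ValueError(f"{called_sysadmin!r} is not a member of the FE")
    # The reason is the alarm name, i.e. the ticket's short description.
    reason = short_description.strip()
    if not reason:
        raise ValueError("a piquet call needs a reason (alarm name)")
    # The time is set automatically when the entry is saved.
    called_at = now if now is not None else datetime.now(timezone.utc)
    return PiquetCallEntry(called_sysadmin, called_at, reason)


if __name__ == "__main__":
    entry = log_piquet_call("sysadmin_a", ["sysadmin_a", "sysadmin_b"], "no_contact")
    print(entry)
```

The record is frozen so that an entry cannot be altered once written, which mirrors the concern that the tracking must not be lost after the call is logged.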
2) Use cases: full chain check of use cases for SysAdmins
- a) Single alarm results in a ticket for the SysAdmins (via Operator), solved by a SysAdmin. Test both Quattor and Puppet nodes. ⇒ as seen on screen amongst the created tickets
  - History access: is someone else in the team dealing with that machine?
  - Reassign the case to another SysAdmin (pre-select not possible e.g. by piquet, which becomes a regular assignment) ⇒ bulk assignment could be possible but is not implemented
  - Enter details in “work notes” (average 15kB, not unusual up to ~50kB)
  - Can I set a comment? (e.g. “node draining” or first conclusions) ⇒ have to use the short description, pre-filled with the alarm name
  - Can I suspend a ticket for myself? ⇒ can use the “Waiting for user” status, as it is not used otherwise
  - Can I set a reminder? (e.g. “remind me in 2 days”) ⇒ no, and it will not be possible
  - Eventually, select the root cause (classify; list possible? 64 in ITCM: to be reviewed, cut down to ~30) ⇒ proposal to abandon such categorisation, but to be checked with Vincent
  - How to correlate/relate tickets? (attach to another one) ⇒ one ticket can be related to a master
  - Possibility to fork an RQF with CI info attached to it? (e.g. machine retired, but Operations are needed to remove it physically) ⇒ no, a new RQF has to be created with the CI information copied to it
- b) Single alarm results in a ticket for the SysAdmins, escalated to Service Manager (fixed by SM)
  - Same as 2.a.1-4
  - Service Manager taken from CI (IT-contact) ⇒ manually, either using the email button or adding it to the watch list
  - Ticket passed to the selected FE ⇒ possible if the FE is known + (transition period: what if the Service Manager is not an FE?) use the email button + IT-contact in CI. Because of their specific classification, Piquet calls should not be passed on to another FE. Rule by procedure, not feasible by tool.
  - If the ticket comes back, it should be to the team (whether the original SysAdmin takes it will be decided internally) ⇒ possible, but out of our control, as this is left to the choice of whoever sends the ticket back
- c) Single alarm results in a ticket for the SysAdmins, waiting on Service Manager or expert (fixed by SA)
  - Same as 2.a.1-4
  - Easy way to contact the third party (email button?) ⇒ 2 possibilities: right-click “add IT-contact to watchlist”, and the email button
  - Service Manager proposed from CI, but editable (if another expert is needed) ⇒ must use the “wait for 3rd party” function
  - Can I set a reason? (built in?) ⇒ use the “Third Party” field
  - Can I set a reminder? ⇒ no, and it will not be implemented
  - Can it be “paired” with another ticket? (aka the Service Manager is in another FE, and updates in his ticket are propagated back into the original SysAdmin ticket) ⇒ no, but the usual notifications will be fired and the SysAdmin will anyway have to go back to his original ticket, where the reply has to be appended
- d) Single alarm results in a ticket for the SysAdmins, triggering Vendor Call. Note: each vendor FE will hold vendor and sub-contractor, and only individuals will be allowed, i.e. for the technicians
  - Creation of vendor call ticket, check on warranty validity ⇒ to be implemented
  - If ok, the CI associated with the vendor call must contain at least: serial number, host name, location/rack name ⇒ will be implemented
  - Visibility: further checks of such a ticket by the SysAdmin should be possible ⇒ because the SysAdmin will be in the watch list and a special feature will allow this
  - Edit the “work notes” (or “communications”?), insert details (average 8kB, 40kB usual, max seen 480kB)
  - Attach documents (logs; limited to size ~15MB in Remedy) ⇒ to be checked if search is possible in attachments, assuming they are text
  - Suspend SysAdmin ticket (“parent”?) when call placed ⇒ not present, but doable
  - Should the SysAdmin be on the watch list or will he be the caller? ⇒ auto-added to the watch list and auto-updated if the assignee is changed in the master incident
  - In case the Service Manager needs to agree on stoppage, his name/email should come from the associated CI and be added to the watch list. “Waiting on 3rd party” should then be possible. ⇒ also compatible with restrictions on FE (per vendor).
  - Who will be notified if the SLA/OLA is breached? ⇒ Vendor (even before the OLA is breached) + FE manager (i.e. Anthony Grossir) ⇒ more than one individual needed here
  - OLA will be defined by categories ⇒ (Anthony Grossir will follow this up), one category becomes "time to solve problem" and no longer what the problem was ⇒ but this is tracked elsewhere
  - no longer any notification to SysAdmins ⇒ bug or feature?
  - Anthony Grossir to specify email/communication channels with Vendors
  - a view can be defined (in the standard way) to list pending repairs, even with a "time to complete" column
  - How will the SysAdmin be notified that he can resume his original ticket? Auto-resume? ⇒ yes, upon VC completion, the master incident ticket comes back in progress, but no particular notification is implemented for that
  - Can a VC be re-opened by a SysAdmin? ⇒ yes, as long as he is in the watch list (special feature for SysAdmins)
- e) Alarms creating multiple-host tickets: review “a-c,f”, forbid “d” (or enhance it - this is what is implemented) for cases:
  - One host, several alarms (should be similar to single alarm use case)
  - One alarm, multiple hosts:
    - constraint = all hosts in same cluster/subcluster/hostgroup ⇒ no difference from today's situation as long as coming from LAS
    - Enter details in “work notes” (up to 140kB, could go to 900kB?)
  - Several hosts, several alarms:
    - constraint = all hosts in same cluster/subcluster/hostgroup ⇒ no difference from today's situation as long as coming from LAS
    - constraint = exactly same alarm(s) on all hosts
  - Automatic/assisted splitting (enhancement):
    - e.g. hardware failures ⇒ produce individual single-host tickets
    - should propose a list to choose from ⇒ achieved with a right-click on the top of the ticket, proposal of possible nodes for action
- f) Incident created manually for a host, by either Operator, SysAdmin or Service Manager ⇒ will be available directly via the portal
  - Test both Quattor (lxfsrg21a03) and Puppet (lxbsp2907) nodes. ⇒ to be tested with the portal
  - Possibility to attach CI (hostname, cluster, etc.). Mandatory? ⇒ to be tested with the portal, see if a template could be needed and possible
  - Follows one of the previous use cases ⇒ to be tested
- g) Bulk updates on several tickets at a time:
  - Filtering based on state (other criteria? Hwmodel? SysAdmin? Mine? Team?), free selection for “apply to” ⇒ will be implemented
  - For the comments/work notes
  - For the status/closure codes/etc.
  - Produce/export a list of selected machines (à la wassh syntax) ⇒ to be tested, exports possible but must be practical
  - Possible actions (⇒ to be defined with SysAdmins)
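The bulk-update use case above (filter, apply to the selection, export the machine list) might look like the following sketch. Everything here is an assumption for illustration: the `Ticket` fields, the state names, and in particular the export format, since the exact wassh host-list syntax is not given in these notes and a plain space-separated list is used instead.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Ticket:
    """Illustrative ticket record; field names are assumptions, not the SNOW schema."""
    number: str
    hostname: str
    state: str
    assigned_to: str
    work_notes: str = ""


def select_tickets(tickets: Iterable[Ticket], predicate: Callable[[Ticket], bool]) -> List[Ticket]:
    """Filter step: keep the tickets matching a free-form criterion."""
    return [t for t in tickets if predicate(t)]


def bulk_add_work_note(tickets: Iterable[Ticket], note: str) -> List[Ticket]:
    """Apply step: return copies of the tickets with the same note appended."""
    return [replace(t, work_notes=(t.work_notes + "\n" + note).strip()) for t in tickets]


def export_hostnames(tickets: Iterable[Ticket]) -> str:
    """Export step: sorted, de-duplicated, space-separated host list (assumed format)."""
    return " ".join(sorted({t.hostname for t in tickets}))


if __name__ == "__main__":
    tickets = [
        Ticket("INC1", "lxbsp2907", "In Progress", "sysadmin_a"),
        Ticket("INC2", "lxfsrg21a03", "Resolved", "sysadmin_a"),
        Ticket("INC3", "lxbsp2907", "In Progress", "sysadmin_b"),
    ]
    selected = select_tickets(tickets, lambda t: t.state == "In Progress")
    updated = bulk_add_work_note(selected, "node draining")
    print([t.number for t in updated], "->", export_hostnames(updated))
```

Passing the filter as a predicate keeps the selection criteria open (state, hardware model, assignee, team), which matches the fact that the list of criteria is still to be defined.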
3) Availability of predefined views:
- a) Operator: list of incidents created recently (by him, from LAS) and not yet dealt with
- b) Operator: list of live entries (hostnames, submitter: useful to correlate new alarms with on-going activity) ⇒ to be defined, but easy
- c) Operator: machine history (useful in case of spare parts shipping, retrieve ticket to update)
- d) Operator: pending vendor calls for equipment located at SafeHost (to validate access for vendors) ⇒ not discussed yet, related to Location field
- e) SysAdmin: pending tickets (not yet assigned to individuals, supposed to be standard in SNOW)
- f) SysAdmin: assigned to me (supposed to be standard in SNOW)
- g) SysAdmin: quick overview (how many tickets in hand by each SysAdmin, whole team, suspended, vendor calls, urgent, high weight (>49, >80), …). List of existing Remedy macros?
  - Useful columns: HostName / Date / Priority / Cluster / Status / Description / Weight / Number of machines / Submitter / Assigned to ⇒ to be defined, but easy
  - View 1 (by default): Own tickets in hand/suspended/completed
  - View 2: Tickets assigned to SysAdmin team (= not yet assigned to individuals = pending)
  - View 3: Tickets “Urgent” and “weight/importance >49” assigned to SysAdmin team ⇒ importance will become a boolean in the long term (critical machine or not), but meanwhile the current values have to be used. Also, “urgent” tickets are not in the current concept of the tool; will have to deal with the Priority field
  - View 4: All the tickets in hand/suspended/completed for each SysAdmin ⇒ to be defined, but easy
  - Others: Piquet calls last week, tickets currently suspended, vendor calls not yet completed, urgent tickets and high weight (>49, >80), … ⇒ to be defined, some will be easy, others a bit more complex
- h) Logger: a list by domain/IT-groups, clusters, sub-clusters (Anthony G. to review) ⇒ not deeply discussed, but there is a “group by” feature
- i) Logger: quick search of keyword in “hostnames” and “description” fields (embedded in portal?) ⇒ not discussed
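As an illustration of the SysAdmin "quick overview" view listed above, the counts could be derived along these lines. This is a sketch under assumptions: the `Ticket` fields are invented for the example, and only the weight thresholds (>49, >80) come from the notes.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Ticket:
    """Illustrative ticket record; field names are assumptions."""
    number: str
    assigned_to: Optional[str]  # None = pending, i.e. not yet assigned to an individual
    weight: int
    suspended: bool = False


def quick_overview(tickets: Iterable[Ticket]) -> Dict[str, object]:
    """Summarise tickets per SysAdmin, pending, suspended and high-weight counts."""
    tickets = list(tickets)
    per_sysadmin = Counter(t.assigned_to for t in tickets if t.assigned_to is not None)
    return {
        "per_sysadmin": dict(per_sysadmin),
        "pending": sum(1 for t in tickets if t.assigned_to is None),
        "suspended": sum(1 for t in tickets if t.suspended),
        "weight_over_49": sum(1 for t in tickets if t.weight > 49),
        "weight_over_80": sum(1 for t in tickets if t.weight > 80),
    }


if __name__ == "__main__":
    sample = [
        Ticket("INC1", "sysadmin_a", 50),
        Ticket("INC2", "sysadmin_a", 90, suspended=True),
        Ticket("INC3", None, 10),
        Ticket("INC4", "sysadmin_b", 49),
    ]
    for name, value in quick_overview(sample).items():
        print(f"{name}: {value}")
```

If importance later becomes a boolean (critical machine or not), the two weight counters would collapse into a single count of critical machines.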
4) Interfaces with essential tools
- a) Generation of the pre-report for the daily morning minutes ⇒ get Elisiano's (or Anthony G.'s) script and pass it to Zhechka/ServiceNow ⇒ the script may not be needed if SNOW reporting is flexible enough to produce the desired output
- b) A guide for custom queries (cgi-scripts: alarms since 18:00 previous day, open VC for SafeHost, …) ⇒ not really discussed, but the general agreement is that documentation should be available
5) Notifications ⇒ discussion postponed
- a) OLA breach only if due date defined, mail SysAdmin or the team if absent (built in?)
- b) urgency/priority has changed since last update by SysAdmin, mail SysAdmin or the team if absent (built in?)
- c) SMS to SysAdmins: when a ticket for an important machine has been waiting for too long (customizable)
- d) SMS to SysAdmins: when a ticket is turned into “urgent” (will this feature be possible/available?)
- e) Easy way to switch SMS notification on/off (assuming rota schemas are not yet possible)
- f) Mail to the team in case of update of a ticket of an absent SysAdmin (built in?)
- g) Auto-add recipients to the watch list? (as done in Remedy with the itcmccemail field; not a SysAdmin requirement)
6) Usability aspects ⇒ partly discussed
Assessment of the overall tool, determine/define essential aspects; should incorporate SysAdmin feedback
- a) Reactivity: search, updates, re-ordering of lists ⇒ we will get what we paid for, and will have to make do with that
- b) Ergonomics: these aspects must be evaluated when the prototype is available in the training instance
  - location of buttons, no awkward aspects
  - look and feel (phrasing aspects, transliteration from Remedy nomenclature)
  - changes highlighting:
    - ticket reassigned back to the team
    - breached OLA visible in the “list” view ⇒ may need to add an extra column
    - urgency/priority has changed since last update by SysAdmin
  - chronological view of updates (i.e. possibility to merge “Additional comments” and “Work notes”) ⇒ no, but the embedded Activity list gives this, even though entries have to be expanded one by one
  - possibility to display/handle lists bigger than 100 lines per page ⇒ no, but there is a grouping facility to aggregate similar entries, hence reducing the list length
- c) Access to predefined reports/views: not part of the prototype itself, but addressed separately (with Patricia)
  - With Fabio for SysAdmin work
  - With Anthony/Benoit for Operators/DCS work
- d) Advancement steps in workflow? (aka breadcrumb, all in fulfilment? Maybe for processes like installation or retire?) ⇒ nice idea, but will have to be discussed with a low priority
--
FabioTrevisani - 15-Oct-2012 ; updated 25-Oct-2012