Validation of the ITCM-2-SNOW prototype:

• Target date for prototype evaluation : 22-Oct. 14:00

• Assumption: LAS will still exist and will be the starting point. It also remains the place that points the Operators to procedures (as today).

• The validation will follow three essential steps:

  1. Analysis of the use cases, verification of listed points in a demo session, collect new issues
  2. Review criteria, define essential requirements and showstoppers discovered during the use cases analysis
  3. Have all essential aspects solved, agree on a list of enhancements (and timescale)

• Comparison of the resulting tickets from both Remedy and ITCM-2-SNOW during the test period (no issue forgotten). Agreed (Ivan to implement duplicate feature, Zhechka to do comparison).

• Migration of remaining ITCM tickets once ITCM-2-SNOW is in production: will depend on how many are still unclosed (options: terminate them in Remedy; manual import, if few; bulk import, if many).
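
The "no issue forgotten" comparison between the two tools could be sketched as below. This is only an illustration under stated assumptions: the `Ticket` fields, the function names and the choice to match on hostname plus alarm name (since ticket numbers differ between Remedy and ITCM-2-SNOW) are hypothetical and not taken from either tool.

```python
from typing import Iterable, NamedTuple


class Ticket(NamedTuple):
    """Minimal ticket record; the field names are illustrative assumptions."""
    ticket_id: str
    hostname: str
    alarm: str


def match_key(ticket: Ticket) -> tuple[str, str]:
    # Ticket IDs differ between the two tools, so match on host and alarm name.
    return (ticket.hostname.lower(), ticket.alarm.lower())


def compare_ticket_sets(
    remedy: Iterable[Ticket], snow: Iterable[Ticket]
) -> dict[str, list[tuple[str, str]]]:
    """Return the (hostname, alarm) pairs present in one tool but not the other."""
    remedy_keys = {match_key(t) for t in remedy}
    snow_keys = {match_key(t) for t in snow}
    return {
        "missing_in_snow": sorted(remedy_keys - snow_keys),
        "missing_in_remedy": sorted(snow_keys - remedy_keys),
    }
```

An empty `missing_in_snow` list at the end of the test period would indicate that every issue logged in Remedy also produced a ticket in the prototype.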

Criteria for validation:

1) Use cases : full chain check of use cases for Operators

  1. a) DONE Incident created manually by Operator (work log) decided to go with a RQF, maybe even using templates when convenient
    1. ALERT! For a non-host entity (goods reception,…) a list of categories must be provided by Anthony Grossir
    2. DONE Automatically closed (pure “log-only”) manual (and immediate) completion as for any other RQF
  2. b) Alarm creating (from LAS) a log-only for Operator, followed up by him.
    1. DONE What’s not dealt with yet? a predefined view already exists ⇒ (which one)
    2. DONE Anything going on with this machine? (i.e. other open tickets existing?) the history of a machine is available ⇒ (how?)
    3. Apply procedure if any and:
      1. DONE Fixed ⇒ Close case ⇒ go Resolved and in solution: as per procedure. envisage a template
      2. DONE Not fixed ⇒ Escalate ⇒ change FE: easiest is to start typing Sys Admin and the proposal comes immediately.
  3. c) Alarm creating a log-only for Operator, needs escalation, depends on “contract type”:
    1. DONE to Service Manager, ⇒ send an email (letter icon), ALERT! proposed recipient will be taken from CI
    2. DONE to SysAdmins (ALERT! if machine managed by them):
      1. DONE as “standard” ⇒ change FE: easiest is to start typing Sys Admin and the proposal comes immediately.
      2. DONE as “piquet call” ALERT! the category will be used to flag a Piquet call to be implemented
        1. ALERT! only the Operator can set it ⇒ no, any member of the FE can alter it, but all changes are tracked. See Note 1 .
        2. HELP SAO management can also set/unset it (to correct obvious mistakes) ⇒ yes, but for the same reason: in same FE
        3. DONE Tracking of this type of ticket should be easy ALERT! provided it is not re-assigned to another FE
        4. ALERT! Depends (still) on importance of machines (visible to Operators/SM/SA? Xmas rules!) ⇒ no, but it will have to, even in the long term, as the critical/non-critical importance will remain

Note 1 : from the Administrative Circular No 23:

The Head of Department ... shall designate the persons authorised to call out 
members of the personnel on stand-by duty to deal with an emergency.

  • In our case, Console Operators and SysAdmin direct management (usually through the Operators anyway). Normally, no one else, and it would be better if the tool restricted it.
  • ALERT! I'm assuming here that regular users cannot create master incidents and that the latter are different from the regular incidents. Otherwise, anyone could create a piquet call ticket (confusing) and if the call is indeed placed (because you may then think you're entitled to do so), then this is certainly in contradiction with the administrative circular.

The persons referred to in (above) shall be responsible:
a) for noting in a log book:
  i. the name of the member of the personnel on stand-by duty called out to deal with the emergency;
  ii. the time of the call-out and the reasons for it;
   ...

  • Notation in log book is what we achieve by having tickets clearly tagged as piquet calls. ALERT! It should not be possible to change the FE afterward, or the tracking will be lost.
  • Name of the called SysAdmin can be entered manually by the Operator, picked from the FE members list.
  • Time is automatic when saving the ticket, reason in this case will be the alarm name (= short description)
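
The three log-book fields above (who was called, when, and why) could be modelled roughly as follows. This is a minimal sketch, not the ServiceNow data model: `PiquetCallEntry`, `record_piquet_call` and all field names are hypothetical. The record is frozen to illustrate the wish that the FE should not change afterwards.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Iterable


@dataclass(frozen=True)  # frozen: the FE cannot be altered once the entry exists
class PiquetCallEntry:
    called_sysadmin: str      # picked by the Operator from the FE members list
    reason: str               # the alarm name (= short description)
    functional_element: str   # hypothetical name for the ticket's FE
    # The time is filled in automatically when the entry is created.
    called_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def record_piquet_call(
    alarm_name: str,
    called_sysadmin: str,
    functional_element: str,
    fe_members: Iterable[str],
) -> PiquetCallEntry:
    """Create a log-book entry, refusing names outside the FE members list."""
    if called_sysadmin not in set(fe_members):
        raise ValueError(f"{called_sysadmin} is not a member of {functional_element}")
    return PiquetCallEntry(
        called_sysadmin=called_sysadmin,
        reason=alarm_name,
        functional_element=functional_element,
    )
```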

The number of hours of stand-by duty performed by a member of the personnel 
shall be recorded in a statement to which the latter has access.

  • Since no specific visibility restriction is foreseen, master incidents will be accessible like any other ticket.

2) Use cases: full chain check of use cases for SysAdmins

  1. a) Single alarm results in a ticket for the SysAdmins (via Operator), solved by a SysAdmin. DONE Test both Quattor and Puppet nodes. as seen on screen amongst the created tickets
    1. DONE History access: is someone else in the team dealing with that machine?
    2. DONE Reassign the case to another SysAdmin (pre-select not possible e.g. by piquet, which becomes a regular assignment)
      ALERT! bulk assignment could be possible but is not implemented
    3. DONE Enter details in “work notes” (average 15kB, not unusual up to ~50kB)
    4. DONE Can I set a comment? (e.g. “node draining” or first conclusions) ⇒ have to use the short description, pre-filled with alarm name
    5. DONE Can I suspend a ticket for myself? ⇒ can use the Waiting for user status as it is not used otherwise
    6. No Can I set a reminder ? (e.g. “remind me in 2 days”) ⇒ no and will not be possible
    7. ALERT! Eventually, select the root cause (classify –list possible? 64 in ITCM: to be reviewed, cut down to ~30) ⇒ proposal to abandon such categorisation, but to be checked with Vincent
    8. DONE How to correlate/relate tickets? (attach to another one) one ticket can be related to a master
    9. No Possibility to fork a RQF with CI info attached to it? (e.g. machine retired, but needs to have Operations to remove it physically) ⇒ no, a new RQF has to be created with CI information copied to it

  1. b) Single alarm results in a ticket for the SysAdmins, escalated to Service Manager (fixed by SM)
    1. DONE Same as 2.a.1-4
    2. HELP Service Manager taken from CI (IT-contact) ⇒ manually, either using the email button or adding it to the watch list
    3. HELP Ticket passed to the selected FE ⇒ possible if the FE is known + (transition period: what if Service Manager is not a FE?) use the email button + IT-contact in CI
      ALERT! Because of the specific classification, Piquet calls should not be passed on to another FE. Rule by procedure, not feasible by tool.
    4. HELP If ticket comes back, it should be to the team (if original SysAdmin takes it will be decided internally) possible, but out of our control, as this is left to the choice of who is sending the ticket back

  1. c) Single alarm results in a ticket for the SysAdmins, waiting on Service Manager or expert (fixed by SA)
    1. DONE Same as 2.a.1-4
    2. DONE Easy way to contact the third party (email button?) ⇒ 2 possibilities: right-click add IT-contact to watchlist and email button
    3. DONE Service Manager proposed from CI, but editable (if other expert needed) ⇒ must use the wait for 3rd party function
    4. DONE Can I set a reason? (built in?) ⇒ use the Third Party field
    5. No Can I set a reminder? ⇒ no and will not be implemented
    6. No Can it be “paired” with another ticket? (aka Service Manager is another FE, updates in his ticket are propagated back into the original SysAdmin ticket) ⇒ no, but the usual notifications will be fired and the SysAdmin will anyway have to go back to his original ticket, where the reply has to be appended

  1. d) Single alarm results in a ticket for the SysAdmins, triggering Vendor Call. Note: each vendor FE will hold vendor and sub-contractor, and only individuals will be allowed i.e. for the technicians
    1. ALERT! Creation of vendor call ticket, check on warranty validity ⇒ to be implemented
    2. ALERT! If ok, the CI associated to the vendor call must contain at least: serial number, host name, location/rack name ⇒ will be implemented
    3. HELP Visibility: further checks of such ticket by SysAdmin should be possible. ⇒ because SysAdmin will be in watch list and special feature will allow this
    4. DONE Edit the “work notes” (or “communications”?), insert details (average 8kB, 40kB usual, max seen 480kB)
    5. HELP Attach documents (logs; limited to size ~15MB in Remedy) ⇒ to be checked if search is possible in attachments, assumed they are text
    6. ALERT! Suspend SysAdmin ticket (“parent”?) when call placed ⇒ not present, but do-able
    7. DONE Should the SysAdmin be on the watch list or will he be the caller? ⇒ auto-added to watch list and auto-updated if assignee is changed in the master incident
    8. DONE In case the Service Manager needs to agree on stoppage, his name/email should come from associated CI, and associated to watchlist. “Waiting on 3rd party” should then be possible. ⇒ also compatible with restrictions on FE (per vendor).
    9. ALERT! Who will be notified if SLA OLA breached? ⇒ Vendor (even before OLA is breached) + FE manager (i.e. Anthony Grossir) ⇒ more than one individual needed here
      • ALERT! OLA will be defined by categories ⇒ (Anthony Grossir will follow this up), one category becomes "time to solve problem" and no longer what the problem was ⇒ but this is tracked elsewhere
      • QUESTION? no longer notification to SysAdmins ⇒ bug or feature?
      • ALERT! Anthony Grossir to specify email/communication channels with Vendors
      • HELP a view can be defined (in standard way) to list pending repairs, even with "time to complete" column
    10. HELP How will the SysAdmin be notified that he can resume his original ticket? Auto-resume? ⇒ yes, upon VC completion, the master incident ticket comes back in progress but no particular notification is implemented for that
    11. DONE Can a VC be re-opened by a SysAdmin? ⇒ yes, as long as he is in the watch list (special feature for SysAdmins)

  1. e) Alarms creating multiple hosts tickets: review “a-c,f”, forbid “d” (or DONE enhance it - this is what is implemented) for cases:
    1. DONE One host, several alarms (should be similar to single alarm use case)
    2. DONE One alarm, multiple hosts:
      1. HELP constraint= all hosts in same cluster/subcluster/hostgroup ⇒ no difference from today's situation as long as it comes from LAS
      2. DONE Enter details in “work notes” (up to 140kB, could go 900kB ?)
    3. DONE Several hosts, several alarms:
      1. HELP constraint= all hosts in same cluster/subcluster/hostgroup ⇒ no difference from today's situation as long as it comes from LAS
      2. HELP constraint= exactly same alarm(s) on all hosts
    4. DONE Automatic/assisted splitting (enhancement):
      1. DONE e.g. hardware failures ⇒ produce individual single host tickets
      2. DONE should propose a list to choose from ⇒ achieved with a right-click on the top of ticket, proposal of possible nodes for action

  1. f) HELP Incident created manually for a host, by either Operator, SysAdmin or Service Manager will be available directly via the portal
    1. ALERT! Test both Quattor (lxfsrg21a03) and Puppet (lxbsp2907) nodes. ⇒ to be tested with the portal
    2. ALERT! Possibility to attach CI (hostname, cluster, etc…). Mandatory? ⇒ to be tested with the portal, see if a template could be needed and possible
    3. ALERT! Follows one of previous use cases ⇒ to be tested

  1. g) Bulk updates on several tickets at a time:
    1. ALERT! Filtering based on state (other criteria? Hwmodel? SysAdmin? Mine? Team?), free selection for “apply to” ⇒ will be implemented
    2. DONE For the comments/work notes
    3. DONE For the status/closure codes/etc…
    4. ALERT! Produce/export a list of selected machines (a la wassh syntax) ⇒ to be tested, exports possible but must be practical
    5. ALERT! Possible actions (⇒ to be defined with SysAdmins )
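
The filter-then-export step of the bulk update could look like the sketch below. It is only an illustration: the `Ticket` fields and both helper functions are hypothetical. The exact host-list syntax expected by wassh is not specified here, so the separator is left as a parameter rather than guessed.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Ticket:
    """Minimal ticket record; the field names are illustrative assumptions."""
    number: str
    hostname: str
    state: str
    assigned_to: Optional[str] = None


def select_tickets(
    tickets: Iterable[Ticket], state: str, assigned_to: Optional[str] = None
) -> List[Ticket]:
    """Filter on state and, optionally, on the assigned SysAdmin ("Mine")."""
    return [
        t for t in tickets
        if t.state == state and (assigned_to is None or t.assigned_to == assigned_to)
    ]


def export_hostnames(tickets: Iterable[Ticket], separator: str = " ") -> str:
    """Join the hostnames of the selected tickets, dropping duplicates in order."""
    unique_hosts = dict.fromkeys(t.hostname for t in tickets)
    return separator.join(unique_hosts)
```

The selection returned by `select_tickets` would also be the natural "apply to" set for the bulk comment or status changes listed above.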

3) Availability of predefined views:

  1. a) DONE Operator: list of incidents created recently (by him, from LAS) and not yet dealt with
  2. b) ALERT! Operator: list of live entries (hostnames, submitter: useful to correlate new alarms with on-going activity) ⇒ to be defined, but easy
  3. c) DONE Operator: machine history (useful in case of spare parts shipping, retrieve ticket to update)
  4. d) ALERT! Operator: pending vendor calls for equipment located at SafeHost (to validate access for vendors) ⇒ not discussed yet, related to Location field
  5. e) DONE SysAdmin: pending ticket (not yet assigned to individuals, supposed to be standard in SNOW)
  6. f) DONE SysAdmin: assigned to me (supposed to be standard in SNOW)
  7. g) DONE SysAdmin: quick overview (how many tickets in hand by each SysAdmin, whole team, suspended, vendor calls, urgent, high weight (>49, >80), …). List of existing Remedy macros?
    1. ALERT! Useful columns : HostName / Date / priority / Cluster / Status / Description / Weight / Number of machines / Submitter / Assigned to ⇒ to be defined, but easy
    2. DONE View 1 (by default) : Own tickets in hand/Suspended/completed
    3. DONE View 2: Tickets assigned to SysAdmin team (= not yet assigned to individuals = pending)
    4. ALERT! View 3: Tickets “Urgent” and “weight/importance >49” assigned to SysAdmin team ⇒ importance will become a boolean in the long term (critical machine or not) but meanwhile the current values have to be used. Also, urgent tickets are not in the current concept of the tool; the Priority field will have to be used instead
    5. ALERT! View 4: All the tickets in hand/suspended/completed for each SysAdmins ⇒ to be defined, but easy
    6. ALERT! Others: Piquet calls last week, tickets currently suspended, vendor calls not yet completed, urgent tickets and high weight (>49, >80), … ⇒ to be defined, some will be easy, others a bit more complex
  8. h) ALERT! Logger: a list by domain/IT-groups, clusters, sub-clusters (Anthony G. to review) ⇒ not deeply discussed, but there is a group by feature
  9. i) ALERT! Logger: quick search of keyword in “hostnames” and “description” fields (embedded in portal?) ⇒ not discussed
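
The SysAdmin "quick overview" view (item g) amounts to a handful of counts over the open tickets. A minimal sketch, assuming each ticket is a plain dictionary with hypothetical `assigned_to` and `weight` keys. The thresholds 49 and 80 come from the list above; everything else is illustrative.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List


def quick_overview(tickets: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Summarise tickets per SysAdmin, pending in the team, and by weight."""
    ticket_list: List[Dict[str, Any]] = list(tickets)
    per_sysadmin = Counter(
        t["assigned_to"] for t in ticket_list if t["assigned_to"]
    )
    return {
        "per_sysadmin": dict(per_sysadmin),
        # Pending = assigned to the team but not yet to an individual.
        "pending": sum(1 for t in ticket_list if not t["assigned_to"]),
        "weight_over_49": sum(1 for t in ticket_list if t["weight"] > 49),
        "weight_over_80": sum(1 for t in ticket_list if t["weight"] > 80),
    }
```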

4) Interfaces with essential tools

  1. a) ALERT! Generation of the pre-report for the daily morning minutes ⇒ get Elisiano (or Anthony G.) script and pass it to Zhechka/ServiceNow ⇒ script maybe not needed if SNOW reporting is flexible enough to produce desired output
  2. b) ALERT! A guide for custom queries (cgi-scripts: alarms since 18:00 previous day, open VC for SafeHost, …) ⇒ not really discussed but general agreement is that documentation should be available

5) Notifications ALERT! discussion postponed

  1. a) OLA breach only if due date defined, mail SysAdmin or the team if absent (built in?)
  2. b) urgency/priority has changed since last update by SysAdmin, mail SysAdmin or the team if absent (built in?)
  3. c) SMS to SysAdmins: when a ticket for an important machine has been waiting too long (customizable)
  4. d) SMS to SysAdmins: when a ticket is turned into “urgent” (will this feature be possible/available?)
  5. e) Easy way to switch on/off notification by SMS (assuming rota schemas are not yet possible)
  6. f) Mail to the team in case of update of a ticket of an absent SysAdmin (built in?)
  7. g) Auto-add recipients in watch list? (as done in Remedy with itcmccemail field –not a SysAdmin requirement)
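
Since this discussion was postponed, the rule in item a) is only a requirement so far. As a reading aid, it could be expressed as below. This is a hypothetical sketch: the function name, its parameters and the way absence is represented are assumptions, not features of the tool.

```python
from datetime import datetime
from typing import Optional, Set


def ola_breach_recipient(
    due_date: Optional[datetime],
    now: datetime,
    assignee: Optional[str],
    team_address: str,
    absent: Set[str],
) -> Optional[str]:
    """Return who should be mailed about an OLA breach, or None if nobody."""
    # No due date defined, or due date not yet passed: no breach notification.
    if due_date is None or now <= due_date:
        return None
    # Fall back to the team when the ticket is unassigned or the SysAdmin is absent.
    if assignee is None or assignee in absent:
        return team_address
    return assignee
```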

6) Usability aspects: ALERT! partly discussed

Assessment of the overall tool, determine/define essential aspects; should incorporate SysAdmin feedback
  1. a) HELP Reactivity: search, updates, re-ordering of lists ⇒ we will get what we paid for, and will have to make do with that
  2. b) ALERT! Ergonomics: these aspects must be evaluated once the prototype is available in the training instance
    1. location of buttons, no awkward aspects
    2. look and feel (phrasing aspects, transliteration from Remedy nomenclature)
    3. changes highlighting:
      1. DONE ticket reassigned back to the team
      2. ALERT! breached OLA visible in the “list” view ⇒ may need to add extra column
      3. ALERT! urgency/priority has changed since last update by SysAdmin
    4. HELP chronological view of updates (i.e. possibility to merge “Additional comments” and “Work notes”) ⇒ no but embedded Activity list is giving this, even though entries have to be expanded one by one
    5. HELP possibility to display/handle lists bigger than 100 lines per page. ⇒ no, but there is a grouping facility to aggregate similar entries, hence reduce the list length
  3. c) Watch Access to predefined reports/views: not part of the prototype itself, but addressed separately (with Patricia)
    1. With Fabio for SysAdmin work
    2. With Anthony/Benoit for Operators/DCS work
  4. d) TIP Advancement steps in workflow? (aka breadcrumb, all in fulfilment? –maybe for processes like installation or retire?) ⇒ nice idea but will have to be discussed with a low priority


-- FabioTrevisani - 15-Oct-2012 ; updated 25-Oct-2012

Topic revision: r4 - 2020-08-20 - TWikiAdminUser