Troubleshooting

  • Run control logs can be browsed in Kibana, where they can be filtered by partition, severity, etc.
  • If the system goes into an ERROR state and the STOP or SHUTDOWN command issued from the top node has no effect, issue the command to each node individually. If some nodes are still not in the UNKNOWN state afterwards, use Force SHUTDOWN from the ProcessManager panel.
  • If one or more nodes hang on a command (the node is grayed out), check the Kibana logs to find the cause of the problem. You can issue AbortCommand by clicking on the topmost grayed-out node; the action propagates down the RunControl tree and should make the nodes responsive again. If that does not work, use RestartFSM.
  • If the system is still in an inconsistent state, contact an expert.
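
The escalation order described in the bullets above can be summarised as a small lookup. This is only an illustrative sketch: the `Action` enum, the `escalation` function and the symptom labels "hanging" and "unresponsive" are hypothetical names invented here, not part of the run control software. The real actions are performed from the GUI.

```python
from enum import Enum


class Action(Enum):
    # Values mirror the GUI actions named above; the enum itself is hypothetical.
    ABORT_COMMAND = "AbortCommand"
    RESTART_FSM = "RestartFSM"
    PER_NODE_COMMAND = "STOP/SHUTDOWN on each node"
    FORCE_SHUTDOWN = "Force SHUTDOWN"
    CONTACT_EXPERT = "contact an expert"


def escalation(symptom: str) -> list[Action]:
    """Return the recovery actions to try, in order, for a given symptom.

    `symptom` is a hypothetical label:
      "hanging"      - grayed-out nodes stuck on a command
      "unresponsive" - ERROR state where STOP or SHUTDOWN from the top
                       node has no effect
    """
    ladders = {
        "hanging": [Action.ABORT_COMMAND, Action.RESTART_FSM],
        "unresponsive": [Action.PER_NODE_COMMAND, Action.FORCE_SHUTDOWN],
    }
    if symptom not in ladders:
        raise ValueError(f"unknown symptom: {symptom!r}")
    # Contacting an expert is always the last resort.
    return ladders[symptom] + [Action.CONTACT_EXPERT]


if __name__ == "__main__":
    for step, action in enumerate(escalation("hanging"), start=1):
        print(f"{step}. {action.value}")
```

In both cases, check the Kibana logs first to understand the cause before moving to the next step.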

Include fails for a node

  1. Show the additional actions:
    workaround1.png

  2. Click on Exclude:
    workaround2.png

  3. The Include action will now succeed.

-- EnricoGamberini - 2018-09-10
