SAM to Nagios migration of sensors/tests - ALICE
This page covers the migration of the ALICE sensors and tests that use the SAM submission/execution framework.
Goals
- migrate all tests in the SAM ALICE CE and VO-box sensors to the Nagios-based monitoring framework
- make the tests Nagios compliant by
  - using special wrappers from the org.sam framework
  - rewriting tests from scratch
- integrate the migrated tests into the Nagios-based monitoring framework
- Result:
  - all ALICE SAM tests that use the SAM submission framework are migrated
  - an RPM with the tests is released and put into the egee-SA1 repository
  - the migrated tests are integrated into the new Nagios-based monitoring framework
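As an illustration of what "Nagios compliant" means for a test, the sketch below wraps a test function so that it yields one of the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and a one-line status message. It is not the org.sam wrapper; `run_check` and `example_check` are hypothetical names used only here.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING",
                CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def run_check(check):
    """Run check() and return (exit_code, status_line) in Nagios plugin form.

    check is any callable returning (code, message). Exceptions and
    unexpected codes are reported as UNKNOWN instead of crashing the plugin.
    """
    try:
        code, message = check()
    except Exception as exc:
        code = UNKNOWN
        message = "check raised %s: %s" % (type(exc).__name__, exc)
    if code not in STATUS_NAMES:
        message = "unexpected status %r: %s" % (code, message)
        code = UNKNOWN
    return code, "%s: %s" % (STATUS_NAMES[code], message)


def example_check():
    # Hypothetical test body; a real test would inspect the CE or VO-box.
    return OK, "software directory is readable"
```

A plugin script would print the returned status line and pass the returned code to `sys.exit`.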
Planning
Two sensors to be migrated:
- CE
  - one proxy
  - CE-sft-job
    - Role=lcgadmin; 2 tests on WN; no dependency between tests on WN
- VO-box
During integration, account for the test-to-metric name changes, e.g.:
| SAM test | Nagios check |
| CE | |
| CE-sft-job | org.sam.CE-JobState (UI/Nagios) |
| CE-sft-vo-swdir | org.sam.WN-swdir (UI/Nagios/WN) |
| CE-sft-softver | org.sam.WN-SoftVer (UI/Nagios/WN) |
| VO-box | |
| VOBOX-PM | org.alice.VOBOX-PM |
| VOBOX-DPD | org.alice.VOBOX-DPD |
| VOBOX-PR | org.alice.VOBOX-PR |
| VOBOX-PSR | org.alice.VOBOX-PSR |
| VOBOX-SA | org.alice.VOBOX-SA |
| VOBOX-UPR | org.alice.VOBOX-UPR |
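The renaming above can be kept in one place as a lookup table. The following is a minimal sketch; the dictionary contents come from the table, while the names `SAM_TO_NAGIOS` and `nagios_metric` are hypothetical and not part of SAM or Nagios.

```python
# SAM test name -> Nagios metric name, copied from the table above.
SAM_TO_NAGIOS = {
    "CE-sft-job": "org.sam.CE-JobState",
    "CE-sft-vo-swdir": "org.sam.WN-swdir",
    "CE-sft-softver": "org.sam.WN-SoftVer",
    "VOBOX-PM": "org.alice.VOBOX-PM",
    "VOBOX-DPD": "org.alice.VOBOX-DPD",
    "VOBOX-PR": "org.alice.VOBOX-PR",
    "VOBOX-PSR": "org.alice.VOBOX-PSR",
    "VOBOX-SA": "org.alice.VOBOX-SA",
    "VOBOX-UPR": "org.alice.VOBOX-UPR",
}


def nagios_metric(sam_test):
    """Return the Nagios metric name for a SAM test name."""
    try:
        return SAM_TO_NAGIOS[sam_test]
    except KeyError:
        raise KeyError("no Nagios metric known for SAM test %r" % sam_test)
```

Raising on an unknown name, rather than returning the input unchanged, makes a forgotten rename visible during integration.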
Plan
| P.ID | Name | Notes | Result |
| 1 | migration of CE tests for Role=lcgadmin | try using the org.sam/samtest-run wrapper | tests are submitted with org.sam/CE-probe and produce Nagios-compliant output; results come from MB; part of the RPM in the egee-SA1 repo |
| 2 | integration of PI1 with Nagios | management of the proxy with Role=lcgadmin on the Nagios box | tests run with Role=lcgadmin under Nagios |
| 3 | migration of VO-box tests | this is a set of custom tests, which may require a rewrite to be able to run under Nagios | tests are submitted from the command line against VO-boxes and produce Nagios-compliant output; part of the RPM in the egee-SA1 repo |
| 4 | integration of PI3 with Nagios | nothing | tests run under Nagios |
Milestones
| Milestone | Date | Result |
| M1 | 15 Nov'09 | all tests migrated and the first release of the RPM is made |
| M2 | 15 Dec'09 | migrated tests integrated into the new Nagios-based monitoring framework |
Progress
Planned |
Ongoing |
Done |
|
CE-sft-job |
- |
- |
CE-sft-vo-swdir |
- |
- |
vobox-DPD |
- |
- |
vobox-PM |
- |
- |
vobox-PR |
- |
- |
vobox-PSR |
- |
- |
vobox-SA |
- |
- |
vobox-UPR |
CE-sft-softver |
integration of PI1 |
- |
- |
integration of PI3 |
integrated manually using Hash.pm on samnag014 * |
- |
PI1 |
PI2 |
PI3 |
PI4 |
* Metrics are properly configured by NCG for Nagios. Invoking them from the CLI with nagios-run-check produced meaningful results; however, they fail with "(Service check did not exit properly)" when run under Nagios. This needs debugging.
* Created and started to populate the ALICE_CRITICAL profile in MDDB: metric set org.alice.VOBOX
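For the failure described in the first note, one way to narrow things down is to run a check command outside Nagios and look at how the process ends. The sketch below assumes that the message points to an abnormal end of the check process (for example a signal) and that Nagios starts checks with a sparser environment than an interactive shell; both are assumptions, and `diagnose` is a hypothetical helper, not part of Nagios or of nagios-run-check.

```python
import signal
import subprocess


def diagnose(cmd, timeout=60):
    """Run cmd with a minimal environment and describe how it ended."""
    # Assumption: a sparse environment approximates how the check is started
    # by the monitoring daemon rather than from a login shell.
    env = {"PATH": "/usr/bin:/bin"}
    try:
        proc = subprocess.run(cmd, env=env, stdout=subprocess.PIPE,
                              stderr=subprocess.PIPE, timeout=timeout)
    except subprocess.TimeoutExpired:
        return "timed out after %d s" % timeout
    if proc.returncode < 0:
        # On POSIX, a negative return code means the process died on a signal.
        return "killed by signal %s" % signal.Signals(-proc.returncode).name
    if proc.returncode > 3:
        return "exited with non-Nagios code %d" % proc.returncode
    return "exited normally with code %d" % proc.returncode
```

`cmd` is the argument list of the check as it would be run by hand; comparing the result with and without the restricted environment may show whether the check depends on variables that are set only in the interactive session.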
-- KonstantinSkaburskas - 04-Nov-2009
Topic revision: r2 - 2009-12-03