MetricClass Migration

The tables below list the Lemon exceptions that were triggered more than 20 times in the reference period, grouped by MetricClass.

Note: An alarm may depend on more than one MetricClass, so the same alarm can appear under several MetricClasses.
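The grouping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used to build this page: the input format (alarm name, trigger count, list of MetricClasses), the function name `group_by_metric_class`, and the alarm `rarely_triggered_alarm` are hypothetical. The other sample values are taken from the HW table.

```python
from collections import defaultdict

# Exceptions are listed only if triggered more than this many times.
THRESHOLD = 20


def group_by_metric_class(alarms, threshold=THRESHOLD):
    """Group alarms by MetricClass, keeping those above the threshold.

    `alarms` is an iterable of (alarm_name, trigger_count, metric_classes).
    This input shape is an assumption made for the sketch.
    """
    table = defaultdict(list)
    for name, count, metric_classes in alarms:
        if count <= threshold:
            continue  # not triggered often enough to be listed
        # An alarm that depends on several MetricClasses is listed under each.
        for metric_class in metric_classes:
            table[metric_class].append((name, count))
    # Within each MetricClass, show the most frequent alarms first.
    for rows in table.values():
        rows.sort(key=lambda row: row[1], reverse=True)
    return dict(table)


sample = [
    ("megaraidsas_raid_controller_not_found", 338,
     ["megaraidsas.controller", "megaraidsas.physical_drives"]),
    ("megaraidsas_unconfigured_bad_drive", 755,
     ["megaraidsas.physical_drives"]),
    # Hypothetical alarm below the threshold; it is filtered out.
    ("rarely_triggered_alarm", 3, ["megaraidsas.controller"]),
]

grouped = group_by_metric_class(sample)
for metric_class, rows in grouped.items():
    for name, count in rows:
        print(metric_class, name, count)
```

Because one alarm can be attached to several MetricClasses, its trigger count is repeated under each of them, which is why counts in the tables should not be summed across MetricClasses.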

HW

| Used in #Alarms | MetricClass | Status (MetricClass) | Collectd Plugin | Documentation | Alarms using the MetricClass | #1Y | Responsible | Status (alarm) |
| 10 | adaptec.controller |  |  |  | adaptec_unsupported_os_version | 193 |  |  |
|  |  |  |  |  | adaptec_unsupported_raid_configuration_for_os_version | 190 |  |  |
|  |  |  |  |  | adaptec_raid_controller_not_found | 121 |  |  |
| 7 | megaraidsas.controller |  |  |  | megaraidsas_raid_controller_not_found | 338 |  |  |
| 6 | megaraidsas.physical_drives |  |  |  | megaraidsas_unconfigured_bad_drive | 755 |  |  |
|  |  |  |  |  | megaraidsas_raid_controller_not_found | 338 |  |  |
|  |  |  |  |  | megaraidsas_unconfigured_good_drive | 32 |  |  |
| 5 | blockdevice-drivers.messages |  |  |  | scsi_blockdevice_driver_error_reported | 16216 |  |  |
|  |  |  |  |  | device_mapper_error_reported | 57 |  |  |
| 5 | adaptec.physical_drives |  |  |  | adaptec_raid_controller_not_found | 121 |  |  |
|  |  |  |  |  | adaptec_missing_drive | 22 |  |  |
| 4 | sasarray.stats |  |  |  | Sasarray_No_Enclosure_Found | 2538 |  |  |
|  |  |  |  |  | Sasarray_Wrong_Number_Drives | 907 |  |  |
|  |  |  |  |  | Sasarray_Fan_Problem | 130 |  |  |
|  |  |  |  |  | Sasarray_Psu_Problem | 68 |  |  |
| 4 | log.Parse |  |  |  | machine_exception | 2231 |  |  |
|  |  |  |  |  | io_error | 32 |  |  |
|  |  |  |  |  | nmi_received | 28 |  |  |
| 3 | adaptec.bbu |  |  |  | adaptec_raid_controller_not_found | 121 |  |  |
| 3 | IPMI.sel |  |  |  | ipmi_power | 86741 |  |  |
|  |  |  |  |  | ipmi_mem | 6137 |  |  |
|  |  |  |  |  | ipmi_proc | 5667 |  |  |
| 3 | adaptec.raid_arrays |  |  |  | adaptec_raid_controller_not_found | 121 |  |  |
| 3 | megaraidsas.raid_arrays |  |  |  | megaraidsas_raid_controller_not_found | 338 |  |  |
|  |  |  |  |  | megaraidsas_raid_array_not_optimal | 29 |  |  |
| 3 | megaraidsas.bbu |  |  |  | megaraidsas_raid_controller_not_found | 338 |  |  |
| 2 | IPMI.avgrmscnt |  |  |  | ipmi_wrong | 19880 |  |  |
| 1 | smart.failing |  |  |  | smart_failing | 50 |  |  |
| 1 | system.partitionInfo |  |  |  | dma_disabled | 21 |  |  |
| 1 | bonding.status |  |  |  | bonding_wrong | 159 |  |  |
| 1 | system.loadAvg |  |  |  | ipmi_wrong | 19880 |  |  |
| 1 | smart.selftest |  |  |  | smart_selftest | 361 |  |  |
| 1 | system.CPUCount |  |  |  | ipmi_wrong | 19880 |  |  |
| 1 | IPMI.ping |  |  |  | ipmi_no_contact | 609 |  |  |

OS

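| Used in #Alarms | MetricClass | Status (MetricClass) | Collectd Plugin | Documentation | Alarms using the MetricClass | #1Y | Responsible | Status (alarm) |
| 21 | system.processCount | In Progress |  |  | afsd_wrong | 822 |  |  |
|  |  |  |  |  | nscd_wrong | 276 |  |  |
|  |  |  |  |  | sssd_wrong | 115 |  |  |
|  |  |  |  |  | http_wrong | 76 |  |  |
|  |  |  |  |  | sendmail_wrong | 45 |  |  |
|  |  |  |  |  | fts_server_wrong | 21 |  |  |
| 12 | system.partitionInfo | DONE | DF |  | root_full | 8780 | Monitoring | DONE |
|  |  |  |  |  | nonwriteable_filesystems | 1564 |  |  |
|  |  |  |  |  | var_full | 469 | Monitoring | DONE |
|  |  |  |  |  | nfs_full | 64 |  |  |
|  |  |  |  |  | boot_full | 60 | Monitoring | DONE |
|  |  |  |  |  | tmp_full | 56 | Monitoring | DONE |
|  |  |  |  |  | cvmfs_transaction_full | 30 |  |  |
| 8 | log.Parse | DONE | Tail |  | YUM_error | 179065 |  |  |
|  |  |  |  |  | VM_kill | 37590 | Monitoring | DONE |
|  |  |  |  |  | nvm_fail | 75 |  |  |
| 5 | system.uptime | DONE | Uptime |  | swap_io | 7287913 |  |  |
|  |  |  |  |  | puppetd_wrong | 6607 |  |  |
|  |  |  |  |  | zrep_age | 121 |  |  |
|  |  |  |  |  | fts_server_wrong | 21 |  |  |
| 4 | file.filecount |  |  |  | YUM_Transactions | 7792 |  |  |
|  |  |  |  |  | kernel_crashdumps | 109 |  |  |
| 2 | file.sslmtime |  |  |  | iss_nologin_age_too_old | 28 |  |  |
| 2 | system.swapIO |  |  |  | swap_io | 7287913 |  |  |
|  |  |  |  |  | dbod_swap_io | 7160 |  |  |
| 2 | system.unmountedFilesystems |  |  |  | unmounted_filesystems | 1380 |  |  |
| 1 | zfs.zrep_age |  |  |  | zrep_age | 121 |  |  |
| 1 | puppetd.status |  |  |  | puppetd_wrong | 6607 |  |  |
| 1 | rpmdb.verify_db |  |  |  | rpmdb_verify | 88 |  |  |
| 1 | system.loadAvg | DONE | Load |  | high_load | 22567 | Monitoring | DONE |
| 1 | system.CPUCount |  |  |  | high_load | 22567 |  |  |
| 1 | sssdfunc.id |  |  |  | sssd_id_test | 552 |  |  |
| 1 | zfs.zpool |  |  |  | zpool | 94 |  |  |
| 1 | system.Os |  |  |  | Operating_System | 60948 |  |  |
| 1 | system.swapUsed |  |  |  | swap_full | 6044 |  |  |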

App

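| Used in #Alarms | MetricClass | Status (MetricClass) | Collectd Plugin | Documentation | Alarms using the MetricClass | #1Y | Responsible | Status (alarm) |
| 132 | system.processCount | In Progress |  |  | lemonforwarder_wrong | 1873 |  |  |
|  |  |  |  |  | limd_wrong | 1725 |  |  |
|  |  |  |  |  | etcd_wrong | 1293 |  |  |
|  |  |  |  |  | openstack-nova-compute | 991 |  |  |
|  |  |  |  |  | origin_node_wrong | 793 |  |  |
|  |  |  |  |  | kafka_broker_wrong | 381 |  |  |
|  |  |  |  |  | eos_mgm_wrong | 314 |  |  |
|  |  |  |  |  | slurmd | 260 |  |  |
|  |  |  |  |  | openstack-nova-conductor | 247 |  |  |
|  |  |  |  |  | openstack-nova-scheduler | 179 |  |  |
|  |  |  |  |  | openstack-nova-api | 176 |  |  |
|  |  |  |  |  | openstack-nova-network | 123 |  |  |
|  |  |  |  |  | eos_fst_wrong | 120 |  |  |
|  |  |  |  |  | tapeserverd_wrong | 112 |  |  |
|  |  |  |  |  | c2_xrd_wrong | 103 |  |  |
|  |  |  |  |  | puppetdb_wrong | 89 |  |  |
|  |  |  |  |  | rabbitmq-server | 84 |  |  |
|  |  |  |  |  | master_slave_service_wrong | 82 |  |  |
|  |  |  |  |  | dashboard_consumer_wrong | 76 |  |  |
|  |  |  |  |  | rmcd_wrong | 70 |  |  |
|  |  |  |  |  | kibana_wrong | 57 |  |  |
|  |  |  |  |  | eos_gridftpd_wrong | 54 |  |  |
|  |  |  |  |  | sbatchd_wrong | 25 |  |  |
|  |  |  |  |  | eos_mq_wrong | 22 |  |  |
| 97 | log.Parse | DONE | Tail |  | TapeDriveDOWN | 4508 |  |  |
|  |  |  |  |  | hdfssink_priviledged_action_exception | 4037 |  |  |
|  |  |  |  |  | CVMFSProbe | 3336 | Luis / Steve | In Progress |
|  |  |  |  |  | teeproxy_error | 1896 |  |  |
|  |  |  |  |  | LoadBalancingUpdateFailed | 1566 |  |  |
|  |  |  |  |  | sensor_sample | 689 |  |  |
|  |  |  |  |  | CASTOR_OraErrors | 214 |  |  |
|  |  |  |  |  | cmsweb_reqmgr2_is_not_responding | 169 |  |  |
|  |  |  |  |  | cmsweb_das_web_is_not_responding | 159 |  |  |
|  |  |  |  |  | cmsweb_dbs_is_down | 157 |  |  |
|  |  |  |  |  | cmsweb_crabserver_is_down | 152 |  |  |
|  |  |  |  |  | cmsweb_couchdb_is_down | 141 |  |  |
|  |  |  |  |  | cmsweb_dbsmigration_is_down | 137 |  |  |
|  |  |  |  |  | cmsweb_dmwmmon_is_down | 137 |  |  |
|  |  |  |  |  | cmsweb_phedex_web_is_down | 136 |  |  |
|  |  |  |  |  | cmsweb_phedex_datasvc_is_down | 136 |  |  |
|  |  |  |  |  | cmsweb_reqmgr2_is_down | 135 |  |  |
|  |  |  |  |  | cmsweb_phedex_graphs_is_down | 133 |  |  |
|  |  |  |  |  | cmsweb_sitedb_is_down | 132 |  |  |
|  |  |  |  |  | cmsweb_t0wmadatasvc_is_down | 132 |  |  |
|  |  |  |  |  | cmsweb_reqmon_is_down | 132 |  |  |
|  |  |  |  |  | cmsweb_confdb_is_down | 125 |  |  |
|  |  |  |  |  | cmsweb_crabcache_is_down | 124 |  |  |
|  |  |  |  |  | cmsweb_das_web_is_down | 122 |  |  |
|  |  |  |  |  | cmsweb_mongodb_is_down | 119 |  |  |
|  |  |  |  |  | cmsweb_popdb_web_is_down | 105 |  |  |
|  |  |  |  |  | cmsweb_phedex_webapp_is_down | 104 |  |  |
|  |  |  |  |  | cmsweb_t0reqmon_is_down | 104 |  |  |
|  |  |  |  |  | cmsweb_victor_web_is_down | 104 |  |  |
|  |  |  |  |  | dsm_error | 91 |  |  |
|  |  |  |  |  | cmsweb_das_client_is_down | 85 |  |  |
|  |  |  |  |  | cmsweb_crabserver_is_not_responding | 75 |  |  |
|  |  |  |  |  | cmsweb_phedex_graphs_is_not_responding | 72 |  |  |
|  |  |  |  |  | database_on_demand_dbod_sensor_exceed_restartmax | 56 |  |  |
|  |  |  |  |  | cmsweb_dqm_dev_agents_is_down | 51 |  |  |
|  |  |  |  |  | cmsweb_phedex_web_is_not_responding | 49 |  |  |
|  |  |  |  |  | cmsweb_dqm_dev_web_is_down | 48 |  |  |
|  |  |  |  |  | cmsweb_phedex_datasvc_is_not_responding | 43 |  |  |
|  |  |  |  |  | cmsweb_phedex_webapp_is_not_responding | 42 |  |  |
|  |  |  |  |  | cmsweb_couchdb_is_not_responding | 36 |  |  |
|  |  |  |  |  | cmsweb_victor_web_is_not_responding | 31 |  |  |
|  |  |  |  |  | riversink_peerdisconnected | 25 |  |  |
|  |  |  |  |  | cmsweb_sitedb_is_not_responding | 22 |  |  |
| 28 | xsls.availability |  |  |  | castor_alice_xsls_not_available | 88 |  |  |
|  |  |  |  |  | XRDFED_CMS-EU | 81 |  |  |
|  |  |  |  |  | cfg_unavailable | 47 |  |  |
| 18 | flume_agent.flumefch |  |  |  | flume_zero_sink_rate | 146274 |  |  |
|  |  |  |  |  | flume_channel_full | 70247 |  |  |
|  |  |  |  |  | flume_zero_sink_rate_in_hdfssink | 203 |  |  |
|  |  |  |  |  | flume_zero_sink_rate_in_essink | 118 |  |  |
|  |  |  |  |  | flume_zero_sink_rate_in_gw | 118 |  |  |
|  |  |  |  |  | flume_cert_es_channel_full | 63 |  |  |
|  |  |  |  |  | flume_cert_es_zero_sink_rate | 39 |  |  |
|  |  |  |  |  | flume_cert_hdfs_channel_full | 22 |  |  |
| 15 | system.partitionInfo | DONE | DF |  | data_full | 81 |  |  |
|  |  |  |  |  | pool_full | 57 | Batch Service | In Progress |
|  |  |  |  |  | opt_partition_full_err | 32 |  |  |
| 14 | flume_agent.flumesinkrate |  |  |  | flume_zero_sink_rate | 146274 |  |  |
|  |  |  |  |  | flume_zero_sink_rate_in_hdfssink | 203 |  |  |
|  |  |  |  |  | flume_zero_sink_rate_in_essink | 118 |  |  |
|  |  |  |  |  | flume_zero_sink_rate_in_gw | 118 |  |  |
|  |  |  |  |  | flume_cert_es_zero_sink_rate | 39 |  |  |
| 8 | ProcessInfo |  |  |  | diskmanagerd_wrong | 67 |  |  |
| 8 | file.sslmtime |  |  |  | racmon_log_age | 45 |  |  |
| 7 | system.exitCode |  |  |  | eos-server_version_check_fail | 1841 |  |  |
|  |  |  |  |  | dashb_http_log | 1355 |  |  |
|  |  |  |  |  | mesos_slave_wrong | 185 |  |  |
| 7 | file.filecount |  |  |  | flume_dirq_full | 2629 |  |  |
|  |  |  |  |  | eos_mgm_no_recent_mdlog | 34 |  |  |
| 4 | flume_agent.agent |  |  |  | flume_agent_wrong | 4682 |  |  |
| 4 | kafka.broker |  |  |  | kafka_under_replicated | 3096 |  |  |
|  |  |  |  |  | kafka_no_messages | 606 |  |  |
| 3 | alarm.exception |  |  |  | puppetd_run_errors | 1453845 |  |  |
|  |  |  |  |  | CVMFSProbe | 3336 |  |  |
| 3 | system.uptime | DONE | Uptime |  | puppetd_run_errors | 1453845 |  |  |
| 3 | dbod.monitoringAgg |  |  |  | database_on_demand_ping_timestamp | 1580 |  |  |
| 2 | db.iptables |  |  |  | iptables_not_running | 321 |  |  |
| 2 | log.ParseExtract1 |  |  |  | EOS_critical-log-catchall_mail | 826 |  |  |
| 2 | system.threadCount |  |  |  | eos_fst_toomanythreads | 35 |  |  |
| 2 | url.httpcode |  |  |  | url_down_apache | 167 |  |  |
| 2 | file.size |  |  |  | es_huge_logfile | 37 |  |  |
| 2 | dbod.slavePingAgg |  |  |  | database_on_demand_replication_process | 210 |  |  |
| 1 | rabbitmq.message | In Progress |  |  | rabbitmq-server-messages | 209 | Cloud/Luis Pigueiras | In Progress |
| 1 | oracle.StandByFlashRecoveryAreaSpaceReclaimableAgg |  |  |  | oracle_standby_flash_recovery_area_space_reclaimable_agg | 315 |  |  |
| 1 | WhiteExpire |  |  |  | CVMFSWhiteExpire | 72 |  |  |
| 1 | oracle.TablespacesQuotasAgg |  |  |  | oracle_ts_quotas_agg | 58 |  |  |
| 1 | es.health |  |  |  | es_cluster_wrong | 46 |  |  |
| 1 | log.ParseExtract4 |  |  |  | afs_fileserver_rescheduled_debug | 132 |  |  |
| 1 | oracle.StandByMRPAgg |  |  |  | oracle_standby_mrp_agg | 72 |  |  |
| 1 | rabbitmq.partition | In Progress |  |  | rabbitmq-server-partition | 22 | Cloud/Luis Pigueiras | In Progress |
| 1 | es.heap_used |  |  |  | es_heap_size_large | 23 |  |  |
| 1 | openshift.etcd_members_healthy |  |  |  | etcd_not_enough_members | 909 |  |  |
| 1 | rpm_process_count.all |  |  |  | rpm_stuck | 1078 |  |  |
| 1 | log.ParseExtract2 |  |  |  | too_many_SELinux_AVC_denied | 4367 |  |  |
| 1 | infiniband.ports |  |  |  | infiniband_port | 291 |  |  |
| 1 | oracle.TNSServiceConnectivityAgg |  |  |  | oracle_tns_service_connectivity_agg | 73 |  |  |
| 1 | service-state.status |  |  |  | eosd_service_error | 1132 |  |  |
| 1 | openshift.node_ready |  |  |  | openshift_node_not_ready | 330 |  |  |
| 1 | oracle.SQLResponseTimeAgg |  |  |  | oracle_sql_response_time_agg | 670 |  |  |
| 1 | oracle.ClusterResourceStateAgg |  |  |  | oracle_cluster_resource_state_agg | 135 |  |  |
| 1 | openshift.etcd_cluster_healthy |  |  |  | etcd_not_healthy | 234 |  |  |
| 1 | oracle.AverageActiveSessionsAgg |  |  |  | oracle_average_active_sessions_agg | 511 |  |  |
| 1 | oracle.StandByApplyLagAgg |  |  |  | oracle_standby_apply_lag_agg | 108 |  |  |
| 1 | eosdisk.fsck |  |  |  | eos_offline_files | 130 |  |  |
| 1 | CVMFS.Probe | In Progress | collectd-cvmfs |  | CVMFSProbe | 3336 | Luis / Steve | In Progress |
| 1 | yumstatus.all |  |  |  | yum_broken | 1588 |  |  |
| 1 | db.ip6tables |  |  |  | ip6tables_not_running | 202 |  |  |
| 1 | oracle.InstanceTablespacesAgg |  |  |  | oracle_tablespaces_agg | 100 |  |  |
| 1 | drain.all | In Progress |  |  | condor_upgrade | 7697 | Batch Service | In Progress |
| 1 | oracle.DatafileStatusAgg |  |  |  | oracle_datafile_status_agg | 23 |  |  |
| 1 | system.networkInterfaceDropped |  |  |  | packetsDropped | 19331 |  |  |
| 1 | puppetd.status |  |  |  | puppetd_run_errors | 1453845 |  |  |
| 1 | oracle.PGAMemoryAbove3GBAgg |  |  |  | oracle_PGA_memory_above_3gb_agg | 365 |  |  |
| 1 | es.nodes_process |  |  |  | es_nodes_process_ok | 283 |  |  |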
Topic revision: r9 - 2018-05-25 - LuisFernandezAlvarez1