WLCG Messaging System for Grids (MSG)

Current MSGPublishSimple

Current version: msg-publish-simple-0.9.3-1

  • Thu Jul 17 2008 Daniel Rodrigues <Daniel.Rodrigues@cern.ch> 0.9.7-1
    • Fixing log logic for default behaviour
    • Adding information for '502' Errors.

  • Fri Jul 4 2008 Daniel Rodrigues <Daniel.Rodrigues@cern.ch> 0.9.6-1
    • Importing SysLogHandler on Python 2.2 would not succeed
    • minor additional checks

  • Thu Jun 26 2008 Daniel Rodrigues <Daniel.Rodrigues@cern.ch> 0.9.5-1
    • Fixing problem with rejection of multiple lines containing ':'

  • Wed May 14 2008 Daniel Rodrigues <Daniel.Rodrigues@cern.ch> 0.9.4-1
    • Adding configurable Syslog.

  • Wed May 14 2008 Daniel Rodrigues <Daniel.Rodrigues@cern.ch> 0.9.3-1
    • Logger configurable.

  • Thu Feb 28 2008 Daniel Rodrigues <Daniel.Rodrigues@cern.ch> 0.9.2-1
    • Improving message errors output.
    • Correcting failover on configured publisher endpoints.

  • Tue Feb 26 2008 Daniel Rodrigues <Daniel.Rodrigues@cern.ch> 0.9.1-2
    • Move python module name to include no dashes.

  • Tue Feb 19 2008 James Casey <James.Casey@cern.ch> 0.9.1-1
    • Move python module to msg.publish-simple to avoid rpm clash with consume2oracle

  • Tue Feb 19 2008 James Casey <James.Casey@cern.ch> 0.9-1
    • Initial build.

MSG Tag: tool for job Monitoring

Tool for publishing information about the job here.

Lemon Alarms

Use metric ID 4032 for the log check and exception ID 30161 for the alarm.

 "/system/monitoring/metric/_4032" = nlist(
         "name",         "MSG_GV_Pub_log_check",
         "descr",        "Check MSG Publisher for HTTP connection problems",
         "class",        "log.Parse",
         "param",        list("logfile", "/var/log/gridview/msgpublish.log",
                              "rolledlogs", "true",
                              "dformat", "%Y-%m-%d %T,",
                              "istring", "Error: (?!(502|500))",
                              "sincelast", "30m"
                              ),
         "period",       1800,
         "smooth",       nlist("typeString", false, "maxdiff", 0.0, "maxtime", 36000),
         "active",       false,
         "latestonly",   false,
 );

 "/system/monitoring/exception/_30161" = nlist(
         "name",         "MSG_pub_conn_error",
         "descr",        "MSG publisher shows HTTP connection error.",
         "active",       true,
         "latestonly",   false,
         "importance",   2,
         "alarmtext",    "MSG_PUB_CONN_ERROR",
         "correlation",  "4032:1 != 0"
 ); 
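
The metric above looks for "Error: " lines that are not followed by 502 or 500, within the last 30 minutes, and the exception fires when that count is non-zero. A minimal Python sketch of that check follows. The sample log lines and the helper name count_alarming_lines are hypothetical, and the assumption that each line starts with a timestamp followed by a comma is inferred from the "dformat" value, not from the actual log.Parse sensor.

```python
import re
from datetime import datetime, timedelta

# Pattern copied from the "istring" parameter of metric 4032.
ISTRING = re.compile(r"Error: (?!(502|500))")
# "%Y-%m-%d %T," in dformat: date, time, then a literal comma.
DFORMAT = "%Y-%m-%d %H:%M:%S"


def count_alarming_lines(lines, now, window=timedelta(minutes=30)):
    """Count lines newer than now - window that match ISTRING (hypothetical helper)."""
    count = 0
    for line in lines:
        stamp, sep, rest = line.partition(",")
        if not sep:
            continue  # no comma, so no timestamp prefix
        try:
            when = datetime.strptime(stamp, DFORMAT)
        except ValueError:
            continue  # prefix is not a timestamp
        if now - when <= window and ISTRING.search(rest):
            count += 1
    return count


# Hypothetical log lines; the real msgpublish.log wording may differ.
sample = [
    "2008-07-17 10:00:01,123 Error: 502 Bad Gateway",     # excluded by the lookahead
    "2008-07-17 10:05:00,456 Error: connection refused",  # counted
    "2008-07-17 09:00:00,789 Error: connection refused",  # older than 30 minutes
    "2008-07-17 10:06:00,001 INFO message published",     # not an error
]
now = datetime(2008, 7, 17, 10, 10, 0)
hits = count_alarming_lines(sample, now)
alarm = hits != 0  # mirrors the correlation "4032:1 != 0"
```

The negative lookahead (?!(502|500)) is what keeps 502 and 500 responses from raising the alarm.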

Deployment Log

15-May-2008 Deployment for GridView-Publisher nodes (GridFTP Parsing)

15-May-2008 uploaded version to swrep-soap-client, according to recipe here:

[lxadm02] /afs/cern.ch/user/d/dfrodrig > swrep-soap-client query x86_64_slc4 msg-publish-simple-0.9.3-1.noarch.rpm
Package tag: /lcg/lcg2
Package refrence count: 0
Package Uploader: dfrodrig
Extra package information:
Name        : msg-publish-simple           Relocations: (not relocatable)
Version     : 0.9.3                             Vendor: (none)
Release     : 1                             Build Date: Wed May 14 14:10:34 2008
Install Date: (not installed)               Build Host: lxadm03.cern.ch
Group       : Network/Monitoring            Source RPM: msg-publish-simple-0.9.3-1.src.rpm
Size        : 26880                            License: GPL
Signature   : (none)
Packager    : James Casey <James.Casey@cern.ch>
Summary     : Simple publisher for the WLCG MSG messaging system
Description :

16-May-2008 Started spma_wrapper on all nodes

wassh -u root -h 'lxfsrc[5807,5808] lxfsrd[4601-4608] lxfsre[1304,1305,1701-1708,1901-1908]' /usr/sbin/spma_wrapper.sh

17-May-2008 Changed configuration and restarted gridview-publisher service

wassh -u root -h 'lxfsrc[5807,5808] lxfsrd[4601-4608] lxfsre[1304,1305,1701-1708,1901-1908]' 'cd /opt/gridview/etc; wget http://cern.ch/dfrodrig/publisher.conf; mv -f publisher.conf.1 publisher.conf'
wassh -u root -h 'lxfsrc[5807,5808] lxfsrd[4601-4608] lxfsre[1304,1305,1701-1708,1901-1908]' 'scp -o StrictHostKeyChecking=no lxfsre1704:/opt/lcg/etc/msg/msg-publish.conf /opt/lcg/etc/msg/msg-publish.conf'
wassh -u root -h 'lxfsrc[5807,5808] lxfsrd[4601-4608] lxfsre[1304,1305,1701-1708,1901-1908]' 'service gridview-publisher restart'

The following machines could not be upgraded:

FAIL lxfsrd4603: connect timeout
lxfsre1701:   ssh(30221) Permission denied.
lxfsre1702:   ssh(30222) Permission denied.
lxfsre1706:   ssh(30224) Permission denied.
lxfsre1902:   ssh(30232) Permission denied.
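
To see which nodes still need a follow-up, the host pattern passed to wassh can be expanded and compared with the failures listed above. The Python sketch below assumes the bracket syntax means comma-separated numbers and inclusive ranges; this is an assumption about how wassh reads the pattern, and expand is a hypothetical helper, not part of wassh.

```python
import re


def expand(pattern):
    """Expand whitespace-separated 'name[1,3-5]' groups into host names."""
    hosts = []
    for group in pattern.split():
        m = re.fullmatch(r"(\w+)\[([\d,\-]+)\]", group)
        if not m:
            hosts.append(group)  # plain host name without brackets
            continue
        prefix, spec = m.groups()
        for part in spec.split(","):
            lo, _, hi = part.partition("-")
            # A single number such as "1304" is treated as a range of one.
            for n in range(int(lo), int(hi or lo) + 1):
                hosts.append(f"{prefix}{n}")
    return hosts


TARGETS = "lxfsrc[5807,5808] lxfsrd[4601-4608] lxfsre[1304,1305,1701-1708,1901-1908]"
# Hosts reported as failed in the log above.
FAILED = {"lxfsrd4603", "lxfsre1701", "lxfsre1702", "lxfsre1706", "lxfsre1902"}

all_hosts = expand(TARGETS)
# Hosts with no reported failure; this does not prove the upgrade succeeded there.
not_failed = [h for h in all_hosts if h not in FAILED]
```

Under these assumptions the pattern covers 28 hosts, of which 23 reported no failure.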

19-May-2008: Patch for gridview-publisher

[lxadm01] /afs/cern.ch/user/d/dfrodrig/webpage/gridview > swrep-soap-client put x86_64_slc4 /lcg/lcg2 gridview-publisher-1.0.2-1.noarch.rpm

[lxadm01] cdbop
get prod/pro_service_castor_gridftp_monitoring.tpl
vim  # increase version number: pkg_add("gridview-publisher","1.0.2-1","noarch");
update prod/pro_service_castor_gridftp_monitoring.tpl
commit
exit

[lxadm01] /afs/cern.ch/user/d/dfrodrig > wassh -u root -c c2cms/t1transfer '/usr/sbin/spma_wrapper.sh'
[lxadm01] /afs/cern.ch/user/d/dfrodrig > wassh -u root -c c2cms/t1transfer 'cat /opt/gridview/etc/publisher.conf | grep Publisher_Log_Level' # Checking all set to 6!
[lxadm01] /afs/cern.ch/user/d/dfrodrig > wassh -u root -c c2cms/t1transfer 'service gridview-publisher restart'

-- DanielRodrigues - 15 May 2008

Topic revision: r7 - 2008-07-28 - DanielRodrigues
 