Log analysis and tracing tools

Introduction

Finding out what a user is doing, or has done, with the Grid services deployed at site and project level is a crucial activity that supports security officers in tasks such as forensic analysis. To help site managers and regional operators, the OSCT group has developed two command-line tools that extract user-related activity from the gLite LB and lcg CE Grid services.

The first tool presented in this section is egee-osct-lbtrace, which queries the gLite LB database; the second is egee-osct-dig-lcgce, which parses lcg CE log files.

egee-osct-lbtrace

Basic concepts

  • What is the egee-osct-lbtrace tool?
    • A C binary that uses the API provided by the LB to extract job-related information
  • What API does egee-osct-lbtrace use?
    • It uses the C bindings for the regular LB API from the glite_lb_client library
  • What do I need to run the tool?
    • First, you need to configure the proper indices on the LB to be queried
    • Then, on the LB that you want to query, you just have to run the command with the proper options
  • As a manager of the gLite LB, which jobs can I see?
    • All the jobs processed by the LB

Installation

The egee-osct-lbtrace tool is currently available only for Scientific Linux 4; you can install it with YUM from the EGEE SA1 repository. First, set up the repository by installing the rpm that provides the .repo file.

# rpm -ivh http://www.sysadmin.hep.ac.uk/rpms/egee-SA1/centos4/i386/sa1-release-2-1.el4.noarch.rpm

This rpm installs the /etc/yum.repos.d/sa1-centos4-release.repo file (where centos4 in this context is equivalent to sl4), which enables the production-level repository by default. You can then install the tool with the yum install command:

# yum install egee-osct-lbtrace

Configuration

Setting up the tool involves the configuration files of both the tool itself and the LB service.

egee-osct-lbtrace

gLite LB
Since the LB does not allow arbitrary queries in general, a query has to match specific job indices, which are implemented as dedicated columns in a database table. To define them, stop the LB server, edit the /opt/glite/etc/glite-lb-index.conf file, rebuild the indices and start the server again:

# /opt/glite/etc/init.d/glite-lb-bkserverd stop
# vi /opt/glite/etc/glite-lb-index.conf
[
   JobIndices = {
      [ type = "system"; name = "owner" ],
      [ type = "system"; name = "location" ],
      [ type = "system"; name = "destination" ],
      [ type = "system"; name = "lastUpdateTime" ]
   }
]
# /opt/glite/bin/glite-lb-bkindex -r /opt/glite/etc/glite-lb-index.conf 
# /opt/glite/etc/init.d/glite-lb-bkserverd start

Usage and examples

Once you've installed and configured the egee-osct-lbtrace package, you can explore what the tool provides by running the lbtrace command with the --help option.

# lbtrace --help
Usage: lbtrace
  [--help] [--chatty verbose_level] [--host LB_host]
  [--keypair bundle-file] [--key key-file] [--cert cert-file]
  action [action-specific options]

Currently supported actions are:
    list:  list all jobs from the given LB server
    trace: trace the specified job from the LB server
'--help' passed after action name will give you the help for this action

As you can see, the egee-osct-lbtrace tool provides two main functions:

  • the list function returns a list of jobs with related information
  • the trace jobid function provides more detailed data for the job selected

Example 1: Digging into the LB to find the jobs executed in a given CE queue for a specific user DN
Scenario:
A site manager under attack needs to know from which UIs the jobs belonging to a given user DN have been sent

Tracing with the utility:
We will list all the jobs executed by the user DN on the destination queue provided by the site manager

# lbtrace -k host -H octopus.grid.kiae.ru list owner eq '/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=samoper/CN=582979/CN=Judit Novak' and status eq done and destination eq snowpatch-hep.westgrid.ca:2119/jobmanager-lcgpbs-ops
--- Job 1:
JobId: https://octopus.grid.kiae.ru:9000/B1uLHolwYwpTYh1uwudmjQ
Owner: /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=samoper/CN=582979/CN=Judit Novak
Source: sam111.cern.ch
JobState: Done
StatusReason: Job terminated successfully
Destination: snowpatch-hep.westgrid.ca:2119/jobmanager-lcgpbs-ops
CondorID: 664
GlobusID: [none]
PBSOwner: [none]
PBSNode: [none]

...

Type of data   Description
JobId          gLite WMS jobid assigned...
Owner          Submitter's DN...
Source         Hostname/IP of the submitter source, usually a gLite UI
JobState       Job state, based on the job state diagram
StatusReason   More detailed status written by the gLite WMS LogMonitor
Destination    Destination queue where the CE scheduled the job
CondorID       ID set by the Condor system
GlobusID       ID set by the Globus system
PBSOwner       Owner of the PBS batch job
PBSNode        Node where the PBS batch job has been submitted
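The scenario above asks from which UIs the jobs were sent, which amounts to collecting the Source field of every job in the list output. The following is a minimal sketch of that step, assuming the output keeps the "--- Job N:" separators and "Field: value" lines shown in the listing; parse_lbtrace_list is a hypothetical helper, not part of egee-osct-lbtrace.

```python
def parse_lbtrace_list(text):
    """Split 'lbtrace ... list' output into one dict per job.

    Assumes every job starts with a '--- Job N:' line and every
    field is printed as 'Name: value' on a single line.
    """
    jobs = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("--- Job"):
            current = {}
            jobs.append(current)
        elif current is not None and ": " in line:
            # Split on the first ': ' only, so URLs in the value stay intact.
            key, value = line.split(": ", 1)
            current[key] = value
    return jobs


# Sample copied from the listing above.
sample = """\
--- Job 1:
JobId: https://octopus.grid.kiae.ru:9000/B1uLHolwYwpTYh1uwudmjQ
Owner: /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=samoper/CN=582979/CN=Judit Novak
Source: sam111.cern.ch
JobState: Done
StatusReason: Job terminated successfully
Destination: snowpatch-hep.westgrid.ca:2119/jobmanager-lcgpbs-ops
CondorID: 664
GlobusID: [none]
PBSOwner: [none]
PBSNode: [none]
"""

jobs = parse_lbtrace_list(sample)
# Distinct submitting hosts, i.e. the UIs the site manager is looking for.
sources = sorted({job["Source"] for job in jobs if "Source" in job})
print(sources)
```

In practice the text would come from the captured output of the lbtrace command rather than from a string constant.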

Example 2: Tracing to know all the job hops for a given jobID
Scenario:
A security officer needs more forensic information on a specific job

Tracing with the utility:
Detailed events for each hop will be provided for the given job with the trace option

# lbtrace -k host -H octopus.grid.kiae.ru --chatty basic trace https://octopus.grid.kiae.ru:9000/B1uLHolwYwpTYh1uwudmjQ
=== Event 1:
Timestamp: 2009.08.14 19:35:07.061424
Arrived: 2009.08.14 19:35:07.000000
Host: octopus.grid.kiae.ru
Sequence: UI=000000:NS=0000000001:WM=000000:BH=0000000000:JSS=000000:LM=000000:LRMS=000000:APP=000000:LBS=000000
User: /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=samoper/CN=582979/CN=Judit Novak
Source: https://144.206.66.137:7443/glite_wms_wmproxy_server

=== Event 2:
Timestamp: 2009.08.14 19:35:07.168601
Arrived: 2009.08.14 19:35:07.000000
Host: octopus.grid.kiae.ru
Sequence: UI=000000:NS=0000000001:WM=000000:BH=0000000000:JSS=000000:LM=000000:LRMS=000000:APP=000000:LBS=000000
User: /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=samoper/CN=582979/CN=Judit Novak
Source: https://144.206.66.137:7443/glite_wms_wmproxy_server

=== Event 3:
Timestamp: 2009.08.14 19:35:07.250496
Arrived: 2009.08.14 19:35:07.000000
Host: octopus.grid.kiae.ru
Sequence: UI=000000:NS=0000000002:WM=000000:BH=0000000000:JSS=000000:LM=000000:LRMS=000000:APP=000000:LBS=000000
User: /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=samoper/CN=582979/CN=Judit Novak
Source: https://144.206.66.137:7443/glite_wms_wmproxy_server
...
-----

Type of data   Description
Timestamp      Timestamp of the log file describing the event
Arrived        Timestamp recorded by the LB as soon as the event arrives
Host           Hostname of the event generator
Sequence       Sequence code added by the component that generated the event
User           Submitter's DN
Source         Event source
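To follow the hops of a job it can help to turn the trace output into structured records. The sketch below assumes the "=== Event N:" separators and the "Field: value" layout shown above, with each field on a single line; parse_lbtrace_events and parse_sequence are hypothetical helpers, and the timestamp format string is inferred from the sample output only.

```python
from datetime import datetime


def parse_sequence(code):
    """Turn 'UI=000000:NS=0000000001:...' into {'UI': 0, 'NS': 1, ...}."""
    pairs = (part.split("=", 1) for part in code.split(":") if part)
    return {name: int(counter) for name, counter in pairs}


def parse_lbtrace_events(text):
    """Split 'lbtrace ... trace' output into one dict per event."""
    events = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("=== Event"):
            current = {}
            events.append(current)
        elif current is not None and ": " in line:
            key, value = line.split(": ", 1)
            current[key] = value
    for event in events:
        if "Timestamp" in event:
            # Format inferred from the sample, e.g. 2009.08.14 19:35:07.061424
            event["Timestamp"] = datetime.strptime(
                event["Timestamp"], "%Y.%m.%d %H:%M:%S.%f"
            )
        if "Sequence" in event:
            event["Sequence"] = parse_sequence(event["Sequence"])
    return events


# Events 1 and 3 from the listing above.
sample = """\
=== Event 1:
Timestamp: 2009.08.14 19:35:07.061424
Arrived: 2009.08.14 19:35:07.000000
Host: octopus.grid.kiae.ru
Sequence: UI=000000:NS=0000000001:WM=000000:BH=0000000000:JSS=000000:LM=000000:LRMS=000000:APP=000000:LBS=000000
Source: https://144.206.66.137:7443/glite_wms_wmproxy_server

=== Event 3:
Timestamp: 2009.08.14 19:35:07.250496
Arrived: 2009.08.14 19:35:07.000000
Host: octopus.grid.kiae.ru
Sequence: UI=000000:NS=0000000002:WM=000000:BH=0000000000:JSS=000000:LM=000000:LRMS=000000:APP=000000:LBS=000000
Source: https://144.206.66.137:7443/glite_wms_wmproxy_server
"""

events = parse_lbtrace_events(sample)
for event in sorted(events, key=lambda e: e["Timestamp"]):
    print(event["Timestamp"].isoformat(), event["Host"], event["Sequence"]["NS"])
```

Sorting on the parsed Timestamp gives the order in which the components logged the events, and the per-component counters in Sequence show which component advanced between two events.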

egee-osct-dig-lcgce

Basic concepts

  • What is the egee-osct-dig-lcgce tool?
    • A Python script that extracts data from the gatekeeper jobmap files
  • What type of data is contained in the gatekeeper jobmap files?
    • Grid-related information, such as user DN, FQAN and JobGridId, that the lcg CE generates for each job executed
  • What do I need to run the tool?
    • Just access to the lcg CE or the central logging system
    • Then all you need to do is set the location of the jobmap dir in a configuration file

Installation

The egee-osct-dig-lcgce tool is currently available only for Scientific Linux 4; you can install it with YUM from the EGEE SA1 repository. First, set up the repository by installing the rpm that provides the .repo file.

# rpm -ivh http://www.sysadmin.hep.ac.uk/rpms/egee-SA1/sl4/i386/sa1-release-2-1.el4.noarch.rpm

This rpm installs the /etc/yum.repos.d/sa1-centos4-release.repo file (where centos4 in this context is equivalent to sl4), which enables the production-level repository by default. You can then install the tool with the yum install command:

# yum install egee-osct-dig-lcgce

Configuration

Once the rpm is installed you can edit a plain-text INI file that is divided into two sections: the global section, which contains general settings such as the verbosity level, and the more specific lcgCE section, which sets the jobmap dir path and defines the LRMS-specific tools.

Currently the egee-osct-dig-lcgce reads two files:

  1. /opt/glite/etc/osct-tracers.conf
  2. $HOME/.osct-tracers.conf

If an option appears in both configuration files, the latter takes precedence, so that users can override system-wide preferences without copying the entire file into their home directory.

# cd egee-osct-diglcgce
# cp dot.osct-tracers.conf /opt/glite/etc/osct-tracers.conf
# vi /opt/glite/etc/osct-tracers.conf
[global]
log_level = 0

[lcgCE]
jobmapdir = /opt/edg/var/gatekeeper/
tracejob = /bin/echo @@JOBID@@
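The precedence rule between the two configuration files can be illustrated with the standard configparser module, which lets later files override earlier ones when several paths are read in order. This is only a sketch of the behaviour described above, not the tool's own code: load_tracer_config is a hypothetical helper, the files are written to a temporary directory instead of the real locations, and the log_level = 2 override is an invented value.

```python
import configparser
import os
import tempfile


def load_tracer_config(paths):
    """Read the given INI files in order; later files override earlier ones."""
    # interpolation=None keeps values such as '@@JOBID@@' or '%' untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(paths)  # files that do not exist are silently skipped
    return parser


SYSTEM_WIDE = """\
[global]
log_level = 0

[lcgCE]
jobmapdir = /opt/edg/var/gatekeeper/
tracejob = /bin/echo @@JOBID@@
"""

# Hypothetical per-user file that only changes the verbosity.
USER_OVERRIDE = """\
[global]
log_level = 2
"""

with tempfile.TemporaryDirectory() as tmp:
    system_path = os.path.join(tmp, "osct-tracers.conf")
    user_path = os.path.join(tmp, ".osct-tracers.conf")
    with open(system_path, "w") as handle:
        handle.write(SYSTEM_WIDE)
    with open(user_path, "w") as handle:
        handle.write(USER_OVERRIDE)
    config = load_tracer_config([system_path, user_path])

print(config.get("global", "log_level"))   # 2, taken from the user file
print(config.get("lcgCE", "jobmapdir"))    # /opt/edg/var/gatekeeper/
```

The user file only needs to contain the options that differ; everything else is still taken from the system-wide file.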

Usage and examples

Once you've installed and configured the egee-osct-dig-lcgce package, you can explore what the tool provides by running the /opt/glite/bin/dig-lcgce command with the --help option.

# dig-lcgce --help
usage: dig-lcgce [options]

options:
  -h, --help            show this help message and exit
  -lLIMIT, --limit=LIMIT
                        Limit the number of returned entries
  -sSTART, --start=START
                        Starting date, format -- YYYYMMDD
  -eEND, --end=END      Ending date, format -- YYYYMMDD
  -t, --trace           Traces job ID via LRMS-specific command

Note that each time an entry is read from the jobmap dir, it can be evaluated through simple conditions on the data recorded in the jobmap files (e.g. userDN, userFQAN, ceID, jobID). Furthermore, since a typical CE holds thousands of jobmap files, date-bounded searches greatly speed up the output.
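The way conditions and date bounds narrow down the entries can be sketched as follows. This is not the tool's implementation: the semantics of eq (exact match) and like (substring match) are assumptions based on the examples in this section, the date bounds are assumed to be inclusive, and the filtering is done on already-parsed entries rather than on the jobmap files themselves. The entries are shortened copies of the records shown in the examples.

```python
from datetime import datetime

# Assumed operator semantics: 'eq' is an exact match, 'like' a substring match.
OPERATORS = {
    "eq": lambda actual, wanted: actual == wanted,
    "like": lambda actual, wanted: wanted in actual,
}


def matches(entry, conditions):
    """True if the entry satisfies every (field, operator, value) condition."""
    return all(
        OPERATORS[op](str(entry.get(field, "")), value)
        for field, op, value in conditions
    )


def in_date_range(entry, start, end):
    """True if the entry timestamp falls between two YYYYMMDD dates (inclusive)."""
    stamp = datetime.strptime(entry["timestamp"], "%Y-%m-%d %H:%M:%S")
    return start <= stamp.strftime("%Y%m%d") <= end


entries = [
    {"timestamp": "2009-08-03 10:30:06",
     "userDN": "/C=IT/O=INFN/OU=Personal Certificate/L=CNAF/CN=giuseppe misurelli",
     "jobID": "https://lb009.cnaf.infn.it:9000/q9wgyxrrRjc4vdjfAIvqgQ"},
    {"timestamp": "2009-09-07 14:00:21",
     "userDN": "/C=IT/O=INFN/OU=Personal Certificate/L=CNAF/CN=giuseppe misurelli",
     "jobID": "https://lb009.cnaf.infn.it:9000/ZTHAJucuJpw4mgwKysV_2A"},
]

conditions = [("jobID", "like", "lb009"), ("userDN", "like", "misurelli")]
selected = [
    entry for entry in entries
    if in_date_range(entry, "20090901", "20090916") and matches(entry, conditions)
]
for entry in selected:
    print(entry["timestamp"], entry["jobID"])
```

Checking the date bounds first mirrors the point made above: the cheaper date test discards most entries before the conditions are evaluated.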

Example 1: Digging for information related to a suspected user DN
Scenario:
A site manager notified the EGEE CSIRTs mailing list about a malicious job submitted by the user with DN /C=IT/O=INFN/OU=Personal Certificate/L=CNAF/CN=giuseppe misurelli

Digging with the utility:
We will check whether the suspected user DN submitted any job in the last fifteen days

# dig-lcgce -s 20090901 -e 20090916 userDN eq '/C=IT/O=INFN/OU=Personal Certificate/L=CNAF/CN=giuseppe misurelli'
{'localUser': '18700', 'ceID': 'gridit-ce-001.cnaf.infn.it:2119/jobmanager-lcgpbs-cert', 'timestamp': '2009-09-07 14:00:21',
'userFQAN': ['/dteam/Role=NULL/Capability=NULL', '/dteam/italy/Role=NULL/Capability=NULL',
'/dteam/italy/INFN-CNAF/Role=NULL/Capability=NULL'], 'userDN': '/C=IT/O=INFN/OU=Personal Certificate/L=CNAF/CN=giuseppe misurelli',
'jobID': 'https://lb009.cnaf.infn.it:9000/ZTHAJucuJpw4mgwKysV_2A', 'lrmsID': '118258.gridit-ce-001.cnaf.infn.it'}

Thus we've found out that the suspected user DN ran a job at our site...
more forensic analysis is required, since a malicious job may have run at our site!

Example 2: Enquiring how many jobs a suspected user DN submitted through a given WMS/LB
Scenario:
For further forensic analysis, you want to ask the manager of the lb009.cnaf.infn.it WMS/LB for information about the jobs submitted by the suspected user, in order to understand their provenance

Digging with the utility:
We know the user DN; we want to get the jobIDs generated by the lb009.cnaf.infn.it WMS/LB host

 [root@gridit-ce-001 misva]# dig-lcgce --limit 5 -s 20090601 -e 20090916 jobID like 'lb009' and userDN like 'misurelli'
{'localUser': '18700', 'ceID': 'gridit-ce-001.cnaf.infn.it:2119/jobmanager-lcgpbs-cert', 'timestamp': '2009-08-03 10:30:06',
'userFQAN': ['/dteam/Role=NULL/Capability=NULL', '/dteam/italy/Role=NULL/Capability=NULL',
'/dteam/italy/INFN-CNAF/Role=NULL/Capability=NULL'], 'userDN': '/C=IT/O=INFN/OU=Personal Certificate/L=CNAF/CN=giuseppe misurelli',
'jobID': 'https://lb009.cnaf.infn.it:9000/q9wgyxrrRjc4vdjfAIvqgQ', 'lrmsID': '81145.gridit-ce-001.cnaf.infn.it'}

{'localUser': '18700', 'ceID': 'gridit-ce-001.cnaf.infn.it:2119/jobmanager-lcgpbs-cert', 'timestamp': '2009-09-07 14:00:21',
'userFQAN': ['/dteam/Role=NULL/Capability=NULL', '/dteam/italy/Role=NULL/Capability=NULL',
'/dteam/italy/INFN-CNAF/Role=NULL/Capability=NULL'], 'userDN': '/C=IT/O=INFN/OU=Personal Certificate/L=CNAF/CN=giuseppe misurelli',
'jobID': 'https://lb009.cnaf.infn.it:9000/ZTHAJucuJpw4mgwKysV_2A', 'lrmsID': '118258.gridit-ce-001.cnaf.infn.it'}

We've now found that the suspected user ran two jobs in the last three and a half months using the lb009.cnaf.infn.it host...
we can now provide the WMS/LB manager with the jobIDs to get (hopefully with the egee-osct-lbtrace tool) more information about the jobs
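Handing the jobIDs over to the WMS/LB manager can be sketched as below. The sketch assumes that each entry printed by dig-lcgce is a Python dict literal and that entries are separated by blank lines, as in the listings above; extract_job_ids and lbtrace_command are hypothetical helpers. The lbtrace options are copied from the trace example earlier on this page, and the LB host is taken from the jobID URL, which is an assumption about how the manager would query it. The sample entries are shortened.

```python
import ast
import re
from urllib.parse import urlparse


def extract_job_ids(output):
    """Return the jobID of every dict-literal entry in dig-lcgce output."""
    job_ids = []
    # Entries are assumed to be separated by blank lines.
    for chunk in re.split(r"\n\s*\n", output.strip()):
        # Re-join an entry that was wrapped over several lines.
        entry = ast.literal_eval("".join(chunk.splitlines()))
        job_ids.append(entry["jobID"])
    return job_ids


def lbtrace_command(job_id):
    """Build the argument list of an 'lbtrace ... trace' call for one jobID."""
    lb_host = urlparse(job_id).hostname  # LB host assumed to be the URL host
    return ["lbtrace", "-k", "host", "-H", lb_host,
            "--chatty", "basic", "trace", job_id]


# Shortened copies of the two entries found in Example 2.
output = """\
{'localUser': '18700', 'timestamp': '2009-08-03 10:30:06',
'jobID': 'https://lb009.cnaf.infn.it:9000/q9wgyxrrRjc4vdjfAIvqgQ', 'lrmsID': '81145.gridit-ce-001.cnaf.infn.it'}

{'localUser': '18700', 'timestamp': '2009-09-07 14:00:21',
'jobID': 'https://lb009.cnaf.infn.it:9000/ZTHAJucuJpw4mgwKysV_2A', 'lrmsID': '118258.gridit-ce-001.cnaf.infn.it'}
"""

job_ids = extract_job_ids(output)
for job_id in job_ids:
    print(" ".join(lbtrace_command(job_id)))
```

The commands are only printed here; they are meant to be run by the manager of the LB host, who has the indices and credentials needed by lbtrace.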

Support and feature requests

For any support or feature request related to the above tools, please use the Savannah bug tracker section for the SA1 tools. Once registered, you can use the Security category to route your requests.

https://savannah.cern.ch/projects/sa1tools/

-- GiuseppeMisurelli - 2009-09-03

Topic revision: r11 - 2010-04-09 - GiuseppeMisurelli
 