Log analysis and tracing tools
Introduction
Finding out what a user is doing, or has done, with the Grid services deployed at site and project level is a crucial activity that supports security officers in tasks such as forensic analysis. To help site managers and regional operators, the OSCT group has developed two command-line tools that extract user-related activity from the gLite LB and the lcg CE Grid services.
The first tool presented in this section is egee-osct-lbtrace, which queries the gLite LB database; the second is egee-osct-dig-lcgce, which parses lcg CE log files.
egee-osct-lbtrace
Basic concepts
- What is the egee-osct-lbtrace tool?
- A C binary that uses the API provided by the LB to extract job-related information
- What API does egee-osct-lbtrace use?
- It uses C bindings for the regular LB API from the glite_lb_client library
- What do I need to run the tool?
- First, you need to configure the proper indices on the LB to be queried
- Then, on the LB that you want to query, you just have to execute the command with the proper options
- As a manager of the gLite LB, which jobs can I see?
- All the jobs processed by the LB
Installation
The egee-osct-lbtrace tool is currently available only for Scientific Linux 4. You can install it with YUM from the EGEE SA1 repository. First, set up the repository by installing the rpm that provides the .repo file.
# rpm -ivh http://www.sysadmin.hep.ac.uk/rpms/egee-SA1/centos4/i386/sa1-release-2-1.el4.noarch.rpm
This rpm installs the /etc/yum.repos.d/sa1-centos4-release.repo file (where centos4 in this context is equivalent to sl4), which enables the production-level repository by default. You can then install the tool with the yum install command:
# yum install egee-osct-lbtrace
Configuration
The setup involves the configuration files of both the tool itself and the LB services.
egee-osct-lbtrace
gLite LB
Since the LB does not allow arbitrary queries in general, a query has to match specific job indices, which are implemented as dedicated columns in a database table. To define them, stop the LB server, edit the /opt/glite/etc/glite-lb-index.conf file, run glite-lb-bkindex on it and restart the server:
# /opt/glite/etc/init.d/glite-lb-bkserverd stop
# vi /opt/glite/etc/glite-lb-index.conf
[
JobIndices = {
[ type = "system"; name = "owner" ],
[ type = "system"; name = "location" ],
[ type = "system"; name = "destination" ],
[ type = "system"; name = "lastUpdateTime" ]
}
]
# /opt/glite/bin/glite-lb-bkindex -r /opt/glite/etc/glite-lb-index.conf
# /opt/glite/etc/init.d/glite-lb-bkserverd start
Usage and examples
Once you have installed and configured the egee-osct-lbtrace package, you can explore what the tool provides by running the lbtrace command with the --help option.
# lbtrace --help
Usage: lbtrace
[--help] [--chatty verbose_level] [--host LB_host]
[--keypair bundle-file] [--key key-file] [--cert cert-file]
action [action-specific options]
Currently supported actions are:
list: list all jobs from the given LB server
trace: trace the specified job from the LB server
'--help' passed after action name will give you the help for this action
As you can see, the egee-osct-lbtrace tool provides two main functions:
- the list function returns a list of jobs with related information
- the trace jobid function provides more detailed data for the selected job
Example 1: Digging into the LB to find out jobs executed in a given CE queue for a specific userDN
Scenario:
A site manager under attack needs to know from which UIs the jobs belonging to a given user DN have been sent
Tracing with the utility:
We will list all the jobs executed by the user DN on the destination queue provided by the site manager
# lbtrace -k host -H octopus.grid.kiae.ru list owner eq '/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=samoper/CN=582979/CN=Judit Novak'
and status eq done and destination eq snowpatch-hep.westgrid.ca:2119/jobmanager-lcgpbs-ops
--- Job 1:
JobId: https://octopus.grid.kiae.ru:9000/B1uLHolwYwpTYh1uwudmjQ
Owner: /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=samoper/CN=582979/CN=Judit Novak
Source: sam111.cern.ch
JobState: Done
StatusReason: Job terminated successfully
Destination: snowpatch-hep.westgrid.ca:2119/jobmanager-lcgpbs-ops
CondorID: 664
GlobusID: [none]
PBSOwner: [none]
PBSNode: [none]
...
| Type of data | Description |
| JobId | gLite WMS jobid assigned... |
| Owner | Submitter's DN... |
| Source | Hostname/IP of the submitter source, usually a gLite UI |
| JobState | Job state, according to the job state diagram |
| StatusReason | More detailed status written by the gLite WMS LogMonitor |
| Destination | Destination queue where the CE scheduled the job |
| CondorID | ID set by the Condor system |
| GlobusID | ID set by the Globus system |
| PBSOwner | Owner of the PBS batch job |
| PBSNode | Node where the PBS batch job has been submitted |
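The scenario above asks for the UIs from which the jobs were sent, which means collecting the Source field across all returned jobs. The following is a minimal sketch, assuming the list output has been saved as text and keeps the "--- Job N:" / "Field: value" layout shown in the example; parse_job_list is a hypothetical helper, not part of the egee-osct-lbtrace package.

```python
def parse_job_list(text):
    """Split saved 'lbtrace ... list' output into one dict per job.

    Assumption: each job starts with a '--- Job N:' line and is followed
    by 'Field: value' lines, as in the sample output above.
    """
    jobs = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("--- Job"):
            current = {}
            jobs.append(current)
        elif current is not None and ":" in line:
            # Split on the first colon only, so URLs and host:port values survive.
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return jobs


SAMPLE = """\
--- Job 1:
JobId: https://octopus.grid.kiae.ru:9000/B1uLHolwYwpTYh1uwudmjQ
Owner: /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=samoper/CN=582979/CN=Judit Novak
Source: sam111.cern.ch
JobState: Done
Destination: snowpatch-hep.westgrid.ca:2119/jobmanager-lcgpbs-ops
"""

jobs = parse_job_list(SAMPLE)
# Distinct submitting hosts (usually gLite UIs) across all listed jobs.
sources = sorted({job["Source"] for job in jobs if "Source" in job})
print(sources)
```

Splitting on the first colon only is a deliberate choice, since several values (job IDs, destinations) contain colons themselves.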
Example 2: Tracing to know all the job hops for a given jobID
Scenario:
A security officer needs more forensic detail on a specific job
Tracing with the utility:
The trace action provides detailed events for each hop of the given job
# lbtrace -k host -H octopus.grid.kiae.ru --chatty basic trace https://octopus.grid.kiae.ru:9000/B1uLHolwYwpTYh1uwudmjQ
=== Event 1:
Timestamp: 2009.08.14 19:35:07.061424
Arrived: 2009.08.14 19:35:07.000000
Host: octopus.grid.kiae.ru
Sequence: UI=000000:NS=0000000001:WM=000000:BH=0000000000:JSS=000000:LM=000000:LRMS=000000:APP=000000:LBS=000000
User: /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=samoper/CN=582979/CN=Judit Novak
Source: https://144.206.66.137:7443/glite_wms_wmproxy_server
=== Event 2:
Timestamp: 2009.08.14 19:35:07.168601
Arrived: 2009.08.14 19:35:07.000000
Host: octopus.grid.kiae.ru
Sequence: UI=000000:NS=0000000001:WM=000000:BH=0000000000:JSS=000000:LM=000000:LRMS=000000:APP=000000:LBS=000000
User: /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=samoper/CN=582979/CN=Judit Novak
Source: https://144.206.66.137:7443/glite_wms_wmproxy_server
=== Event 3:
Timestamp: 2009.08.14 19:35:07.250496
Arrived: 2009.08.14 19:35:07.000000
Host: octopus.grid.kiae.ru
Sequence: UI=000000:NS=0000000002:WM=000000:BH=0000000000:JSS=000000:LM=000000:LRMS=000000:APP=000000:LBS=000000
User: /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=samoper/CN=582979/CN=Judit Novak
Source: https://144.206.66.137:7443/glite_wms_wmproxy_server
...
-----
| Type of data | Description |
| Timestamp | Timestamp of the log file describing the event |
| Arrived | Timestamp recorded by the LB as soon as the event arrived |
| Host | Hostname of the event generator |
| Sequence | Sequence code added by the component that generated the event |
| User | Submitter's DN |
| Source | Event source |
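When reading a long trace it can help to see which part of the Sequence code changed between two consecutive events. The sketch below splits the code into its NAME=counter fields and compares two events. Reading the fields as per-component counters is an interpretation of the sample output, and both function names are hypothetical helpers rather than part of the tool.

```python
def parse_sequence_code(code):
    """Turn 'UI=000000:NS=0000000001:...' into {'UI': 0, 'NS': 1, ...}."""
    counters = {}
    for field in code.strip().strip(":").split(":"):
        name, _, value = field.partition("=")
        counters[name] = int(value)
    return counters


def changed_components(previous, current):
    """Return the names of the fields whose counter differs between two events."""
    before = parse_sequence_code(previous)
    after = parse_sequence_code(current)
    return [name for name in after if after[name] != before.get(name)]


# Sequence codes of Event 2 and Event 3 from the trace above.
EVENT_2 = ("UI=000000:NS=0000000001:WM=000000:BH=0000000000:JSS=000000:"
           "LM=000000:LRMS=000000:APP=000000:LBS=000000")
EVENT_3 = ("UI=000000:NS=0000000002:WM=000000:BH=0000000000:JSS=000000:"
           "LM=000000:LRMS=000000:APP=000000:LBS=000000")

changed = changed_components(EVENT_2, EVENT_3)
print(changed)
```

For the two events shown, only the NS field differs.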
egee-osct-dig-lcgce
Basic concepts
- What is the egee-osct-dig-lcgce tool?
- A Python program that extracts data from the gatekeeper jobmap files
- What type of data is contained in the gatekeeper jobmap files?
- Grid-related information, such as user DN, FQAN and JobGridId, that the lcg CE generates for each job executed
- What do I need to run the tool?
- Access to the lcg CE or to the central logging system
- A configuration file that tells the tool where the jobmap dir is located
Installation
The egee-osct-dig-lcgce tool is currently available only for Scientific Linux 4. You can install it with YUM from the EGEE SA1 repository. First, set up the repository by installing the rpm that provides the .repo file.
# rpm -ivh http://www.sysadmin.hep.ac.uk/rpms/egee-SA1/sl4/i386/sa1-release-2-1.el4.noarch.rpm
This rpm installs the /etc/yum.repos.d/sa1-centos4-release.repo file (where centos4 in this context is equivalent to sl4), which enables the production-level repository by default. You can then install the tool with the yum install command:
# yum install egee-osct-dig-lcgce
Configuration
Once the rpm is installed you can edit a plain-text INI file that is divided into two sections: the global section, containing general settings such as the verbosity level, and the more specific lcgCE section, containing the jobmap dir path and the definition of LRMS-specific tools.
Currently egee-osct-dig-lcgce reads two files:
- /opt/glite/etc/osct-tracers.conf
- $HOME/.osct-tracers.conf
If an option appears in both configuration files, the latter takes precedence, so that users can override system-wide preferences without copying the entire file into their home directory.
# cd egee-osct-diglcgce
# cp dot.osct-tracers.conf /opt/glite/etc/osct-tracers.conf
# vi /opt/glite/etc/osct-tracers.conf
[global]
log_level = 0
[lcgCE]
jobmapdir = /opt/edg/var/gatekeeper/
tracejob = /bin/echo @@JOBID@@
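The precedence rule between the system-wide and the per-user file can be illustrated with Python's standard configparser module, which overrides earlier values with later ones when several files are read in order. This is only a sketch of the behaviour described above, not the tool's actual loading code; load_settings is a hypothetical helper and the two files are temporary stand-ins for the real ones.

```python
import configparser
import os
import tempfile


def load_settings(paths):
    """Read INI files in order; options in later files override earlier ones."""
    parser = configparser.ConfigParser()
    parser.read(paths)  # files that do not exist are silently skipped
    return parser


# Two throwaway files standing in for /opt/glite/etc/osct-tracers.conf
# and $HOME/.osct-tracers.conf.
with tempfile.TemporaryDirectory() as tmp:
    system_conf = os.path.join(tmp, "osct-tracers.conf")
    user_conf = os.path.join(tmp, ".osct-tracers.conf")
    with open(system_conf, "w") as handle:
        handle.write("[global]\nlog_level = 0\n"
                     "[lcgCE]\njobmapdir = /opt/edg/var/gatekeeper/\n")
    with open(user_conf, "w") as handle:
        # The user file only carries the option it wants to change.
        handle.write("[global]\nlog_level = 2\n")

    settings = load_settings([system_conf, user_conf])
    merged_log_level = settings.get("global", "log_level")
    merged_jobmapdir = settings.get("lcgCE", "jobmapdir")

print(merged_log_level, merged_jobmapdir)
```

The user file overrides log_level only, while jobmapdir still comes from the system-wide file.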
Usage and examples
Once you have installed and configured the egee-osct-dig-lcgce package, you can explore what the tool provides by running the /opt/glite/bin/dig-lcgce command with the --help option.
dig-lcgce --help
usage: dig-lcgce [options]
options:
-h, --help show this help message and exit
-lLIMIT, --limit=LIMIT
Limit the number of returned entries
-sSTART, --start=START
Starting date, format -- YYYYMMDD
-eEND, --end=END Ending date, format -- YYYYMMDD
-t, --trace Traces job ID via LRMS-specific command
Note that each entry read from the jobmap dir can be evaluated through simple conditions on the data recorded in the jobmap files (e.g. userDN, userFQAN, ceID, jobID). Furthermore, since a typical CE holds thousands of jobmap files, date-bounded searches greatly speed up the output process.
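To make the idea of evaluating an entry against simple conditions concrete, here is a small sketch that applies eq and like conditions and a date window to one record. It is not the tool's implementation: the matches and within_dates helpers are hypothetical, like is assumed to mean a substring match, both date bounds are assumed to be inclusive, and only string fields are handled.

```python
from datetime import datetime


def matches(record, conditions):
    """Return True when the record satisfies every (field, operator, value) triple."""
    for field, operator, value in conditions:
        actual = str(record.get(field, ""))
        if operator == "eq":
            if actual != value:
                return False
        elif operator == "like":
            # Assumption: 'like' is a plain substring match.
            if value not in actual:
                return False
        else:
            raise ValueError("unsupported operator: %s" % operator)
    return True


def within_dates(record, start, end):
    """Check the record timestamp against YYYYMMDD bounds (assumed inclusive)."""
    stamp = datetime.strptime(record["timestamp"], "%Y-%m-%d %H:%M:%S")
    return start <= stamp.strftime("%Y%m%d") <= end


record = {
    "timestamp": "2009-09-07 14:00:21",
    "userDN": "/C=IT/O=INFN/OU=Personal Certificate/L=CNAF/CN=giuseppe misurelli",
    "jobID": "https://lb009.cnaf.infn.it:9000/ZTHAJucuJpw4mgwKysV_2A",
}
conditions = [("jobID", "like", "lb009"), ("userDN", "like", "misurelli")]

selected = within_dates(record, "20090601", "20090916") and matches(record, conditions)
print(selected)
```

Checking the date window first mirrors the point made above: a cheap date test discards most entries before any field comparison is needed.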
Example 1: Digging for information related to a suspected user DN
Scenario:
A site manager notified the EGEE CSIRTs mailing list about a malicious job submitted by the user with DN
/C=IT/O=INFN/OU=Personal Certificate/L=CNAF/CN=giuseppe misurelli
Digging with the utility:
We will check whether the suspected userDN submitted any job in the last fifteen days
# dig-lcgce -s 20090901 -e 20090916 userDN eq '/C=IT/O=INFN/OU=Personal Certificate/L=CNAF/CN=giuseppe misurelli'
{'localUser': '18700', 'ceID': 'gridit-ce-001.cnaf.infn.it:2119/jobmanager-lcgpbs-cert', 'timestamp': '2009-09-07 14:00:21',
'userFQAN': ['/dteam/Role=NULL/Capability=NULL', '/dteam/italy/Role=NULL/Capability=NULL', '/dteam/italy/INFN-CNAF/Role=NULL
/Capability=NULL'], 'userDN': '/C=IT/O=INFN/OU=Personal Certificate/L=CNAF/CN=giuseppe misurelli',
'jobID': 'https://lb009.cnaf.infn.it:9000/ZTHAJucuJpw4mgwKysV_2A', 'lrmsID': '118258.gridit-ce-001.cnaf.infn.it'}
Thus we have found out that the suspected user DN ran a job at our site...
more forensics are required: a malicious job may have run at our site!
Example 2: Enquiring how many jobs a suspected user DN submitted through a given WMS/LB
Scenario:
For further forensic analysis, you want to ask the manager of the lb009.cnaf.infn.it WMS/LB for information about the jobs submitted by the suspected user, in order to understand their provenance
Digging with the utility:
We know the userDN and want to get the jobIDs generated by the lb009.cnaf.infn.it WMS/LB host
[root@gridit-ce-001 misva]# dig-lcgce --limit 5 -s 20090601 -e 20090916 jobID like 'lb009' and userDN like 'misurelli'
{'localUser': '18700', 'ceID': 'gridit-ce-001.cnaf.infn.it:2119/jobmanager-lcgpbs-cert', 'timestamp': '2009-08-03 10:30:06',
'userFQAN': ['/dteam/Role=NULL/Capability=NULL', '/dteam/italy/Role=NULL/Capability=NULL', '/dteam/italy/INFN-CNAF/Role=NULL
/Capability=NULL'], 'userDN': '/C=IT/O=INFN/OU=Personal Certificate/L=CNAF/CN=giuseppe misurelli',
'jobID': 'https://lb009.cnaf.infn.it:9000/q9wgyxrrRjc4vdjfAIvqgQ', 'lrmsID': '81145.gridit-ce-001.cnaf.infn.it'}
{'localUser': '18700', 'ceID': 'gridit-ce-001.cnaf.infn.it:2119/jobmanager-lcgpbs-cert', 'timestamp': '2009-09-07 14:00:21',
'userFQAN': ['/dteam/Role=NULL/Capability=NULL', '/dteam/italy/Role=NULL/Capability=NULL', '/dteam/italy/INFN-CNAF/Role=NULL
/Capability=NULL'], 'userDN': '/C=IT/O=INFN/OU=Personal Certificate/L=CNAF/CN=giuseppe misurelli',
'jobID': 'https://lb009.cnaf.infn.it:9000/ZTHAJucuJpw4mgwKysV_2A', 'lrmsID': '118258.gridit-ce-001.cnaf.infn.it'}
We have now found that the suspected user ran two jobs in the last three and a half months using the lb009.cnaf.infn.it host...
we can now provide the WMS/LB manager with the jobIDs to get (hopefully with the egee-osct-lbtrace tool) more information about the jobs
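Collecting the jobIDs to hand over to the WMS/LB manager can be scripted. The sketch below assumes that each record is printed on a single line as a Python dictionary literal (the samples above look wrapped for display) and that the output has been saved as text. Both parse_records and job_ids_for_lb are hypothetical helpers, and the sample records are shortened to three fields.

```python
import ast
from urllib.parse import urlparse


def parse_records(output):
    """Parse saved dig-lcgce output, one dictionary literal per line (assumed)."""
    records = []
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("{"):
            # literal_eval only accepts literals, so no code is executed.
            records.append(ast.literal_eval(line))
    return records


def job_ids_for_lb(records, lb_host):
    """Keep the jobIDs whose URL points at the given WMS/LB host."""
    return [r["jobID"] for r in records
            if urlparse(r["jobID"]).hostname == lb_host]


# Shortened versions of the two records shown above.
OUTPUT = """\
{'localUser': '18700', 'jobID': 'https://lb009.cnaf.infn.it:9000/q9wgyxrrRjc4vdjfAIvqgQ', 'lrmsID': '81145.gridit-ce-001.cnaf.infn.it'}
{'localUser': '18700', 'jobID': 'https://lb009.cnaf.infn.it:9000/ZTHAJucuJpw4mgwKysV_2A', 'lrmsID': '118258.gridit-ce-001.cnaf.infn.it'}
"""

job_ids = job_ids_for_lb(parse_records(OUTPUT), "lb009.cnaf.infn.it")
for job_id in job_ids:
    print(job_id)
```

Each printed jobID can then be passed to the trace action of lbtrace on the LB host.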
Support and feature requests
For any support or feature request related to the above tools, please use the Savannah bug tracker section for the SA1 tools. Once registered, you can use the Security category to route all your requests.
https://savannah.cern.ch/projects/sa1tools/
--
GiuseppeMisurelli - 2009-09-03