Initial efforts

GRATIA is not well documented, and I need to get a handle on where to go.

There will be a ProbeConfig file that allows the probe to operate. It was prepared by Xin in the depths of time, and I have not changed it yet. There are many things in it I don't understand. It seems to have no direct link to the probe path.

I have had no success so far in my fumbling attempts to create and populate a UsageRecord. I think I may have missed a step by not having ProbeConfig in my working directory. :) Fixed, and hoping that was the sticking point.

Just discovered from the logs (in gratia/var/logs) that I was indeed creating UsageRecords, but I suppose that without a valid ProbeConfig that was useless.

I am using the SGE probe template as a starting point. Seems the command sequence is:

1. Get a line from the SGE job log that looks kind of like this:

2. Parse it via a regular column structure pre-programmed into the SGE object to create a dictionary (in the init call). Simple -- I can get the dictionaries directly from cx_Oracle. I'll just map them to an internal variable in an init call? Is that even necessary?
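A minimal sketch of one way to get such dictionaries from any Python DB-API cursor (cx_Oracle follows that API), assuming the column names come from cursor.description. The helper name rows_to_dicts and the stand-in data are made up for illustration; no database connection is needed to run it.

```python
def rows_to_dicts(description, rows):
    """Turn DB-API rows into dictionaries keyed by lowercase column name.

    `description` is the cursor.description sequence defined by the Python
    DB-API: one tuple per column, with the column name as its first item.
    """
    names = [column[0].lower() for column in description]
    return [dict(zip(names, row)) for row in rows]


# Stand-in data so the sketch runs without a database connection.
# Only the first item of each description tuple (the name) is used here.
fake_description = [("PANDAID",), ("JOBSTATUS",), ("BATCHID",)]
fake_rows = [(1236909887, "cancelled", None)]

out = rows_to_dicts(fake_description, fake_rows)
print("%s %s" % (out[0]["pandaid"], out[0]["jobstatus"]))
```

With a live cursor the call would be rows_to_dicts(cursor.description, cursor.fetchall()). If the rows already arrive as dictionaries, no extra mapping step in an init call should be needed.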

3. Create a UsageRecord

>>> r=Gratia.UsageRecord('Batch')
2011-05-20 15:42:48 EDT Gratia: Creating a Record 2011-05-20T19:28:50Z
Traceback (most recent call last):
  File "", line 1, in
  File "", line 2067, in __init__
    super(self.__class__,self).__init__()
  File "", line 1920, in __init__
    self.__ProbeName = Config.get_ProbeName()
AttributeError: 'NoneType' object has no attribute 'get_ProbeName'

OK -- this didn't come up when I was working without a ProbeConfig -- so I'm changing the ProbeName to PandaMeter (to match the filename).

Since that failed as well, I suppose I have to read the error message. :) Here's what I see: there is no defined Config object (it's uninitialized), so it is indeed NoneType. Off to the code.

Tried using just the straight probe template -- failed in the same way. Not a syntax problem.

Durrr --
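The failure mode in the traceback can be reproduced with a toy stand-in: a module-level Config handle that stays None until some initialization call sets it. Everything below (ProbeConfiguration, initialize, try_create) is a hypothetical imitation written for illustration, not Gratia's code. The assumption is that the real module wants its own initialization call, pointed at the ProbeConfig file, before any UsageRecord is created; check the Gratia source for the actual name and arguments.

```python
class ProbeConfiguration(object):
    """Hypothetical stand-in for the object that reads ProbeConfig."""

    def __init__(self, probe_name):
        self._probe_name = probe_name

    def get_ProbeName(self):
        return self._probe_name


# Module-level handle, unset until initialize() runs.
Config = None


def initialize(probe_name):
    """Stand-in for the library's setup step: build the Config object."""
    global Config
    Config = ProbeConfiguration(probe_name)


class UsageRecord(object):
    """Toy record whose constructor needs Config, as in the traceback."""

    def __init__(self, resource_type):
        self.resource_type = resource_type
        self.probe_name = Config.get_ProbeName()


def try_create():
    """Return a new record, or the AttributeError raised while building it."""
    try:
        return UsageRecord("Batch")
    except AttributeError as exc:
        return exc


first = try_create()        # Config is still None: an AttributeError comes back
initialize("PandaMeter")
second = try_create()       # now the constructor can read the probe name
```

In the toy version, the record can only be built once initialize() has run, which matches the observation that the error is not a syntax problem in the probe template.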


4. Populate the UsageRecord

Adding JobId and UserId and all that -- use strings only. There should be more protection in Gratia for these things, but oh well.
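Since the record fields want strings, a small conversion helper avoids surprises. This is a sketch only: the name to_text and the choice of an empty-string default are assumptions, not anything Gratia provides. The point it illustrates is that str(None) yields the literal text 'None', which then ends up inside the record.

```python
def to_text(value, default=""):
    """Convert a field value to a string, mapping None to `default`.

    A plain str() call would turn None into the four-letter text 'None'.
    """
    if value is None:
        return default
    return str(value)


# A few fields taken from a PanDA job dictionary like the ones shown on this page.
job = {"pandaid": 1236909887, "batchid": None, "jobstatus": "cancelled"}

fields = dict((key, to_text(value)) for key, value in job.items())
print(fields["pandaid"])    # the integer id, now as text
```

An empty string is at least easy to spot and test for before sending, whereas 'None' looks like real data.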

Code seems to be (at least initially) developed.

5. Gratia.Send(r)

No unsuppressed usage records in this packet: not sending

Dang! OK, checking -- this comes from unconfirmed XML.

>>> out[11]

{'maxdiskunit': None, 'assignedpriority': 1000, 'dispatchdblock': None, 'ninputdatafiles': 1, 'nevents': 0, 'creationtime': datetime.datetime(2011, 5, 16, 0, 49, 37), 'maxcpucount': 0, 'cpuconsumptionunit': None, 'destinationse': 'ANALY_NIKHEF-ELPROD', 'maxattempt': 0, 'minramunit': None, 'exeerrorcode': 0, 'pilotid': None, 'specialhandling': 'rebro', 'jobsetid': 2671, 'modificationhost': '', 'brokerageerrorcode': 0, 'relocationflag': 1, 'cloud': 'NL', 'sourcesite': '2599', 'workinggroup': None, 'ninputfiles': None, 'homepackage': 'AnalysisTransforms-AtlasProduction_16.0.2.4', 'prodsourcelabel': 'user', 'ddmerrorcode': 0, 'produsername': 'elisa piccaro', 'taskbuffererrordiag': 'killed by Panda server : upstream job failed', 'ipconnectivity': None, 'jobdispatchererrorcode': 0, 'attemptnr': 0, 'maxcpuunit': None, 'superrorcode': 0, 'metadata': None, 'cpuconversion': None, 'vo': 'atlas', 'computingelement': '', 'inputfiletype': 'AOD', 'transexitcode': None, 'proddbupdatetime': datetime.datetime(1, 1, 1, 0, 0), 'currentpriority': -3229, 'transformation': '', 'jobdefinitionid': 2672, 'jobdispatchererrordiag': None, 'pandaid': 1236909887, 'piloterrorcode': 0, 'maxdiskcount': 0, 'superrordiag': None, 'jobparameters': None, 'proddblock': 'data10_7TeV.periodE.physics_Muons.PhysCont.AOD.repro05_v02/', 'processingtype': 'pathena', 'commandtopilot': None, 'cpuconsumptiontime': 0, 'jobname': '13d1c1c9-9bec-4087-a0de-f35b5224b64b', 'batchid': None, 'brokerageerrordiag': None, 'grid': None, 'jobstatus': 'cancelled', 'parentid': 1236204503, 'atlasrelease': 'Atlas-16.0.2', 'endtime': datetime.datetime(2011, 5, 16, 1, 25, 39), 'prodserieslabel': None, 'computingsite': 'ANALY_NIKHEF-ELPROD', 'exeerrordiag': None, 'ddmerrordiag': None, 'destinationsite': None, 'destinationdblock': 'user.epiccaro.JPsiIt_PeriodE_common_v3/', 'corecount': None, 'inputfileproject': 'data10_7TeV', 'pilottiming': None, 'cmtconfig': 'i686-slc5-gcc43-opt', 'taskbuffererrorcode': 100, 'modificationtime': 
datetime.datetime(2011, 5, 16, 1, 25, 39), 'minramcount': 0, 'schedulerid': None, 'lockedby': 'panda-client-0.3.41', 'transfertype': None, 'starttime': None, 'produserid': '/C=UK/O=eScience/OU=QueenMaryLondon/L=Physics/CN=elisa piccaro/CN=proxy', 'countrygroup': None, 'jobexecutionid': 2613, 'piloterrordiag': None, 'statechangetime': datetime.datetime(2011, 5, 16, 1, 25, 39), 'creationhost': '', 'taskid': 427, 'inputfilebytes': 3816013252}

>>> r=GetRecord(out[11])

>>> r.XmlCreate()

>>> r.XmlData

['\n', 'file:///u:/OSG/urwg-schema.11.xsd">\n', '\n', '\n', '\t', 'None', '\n', '\t', '1236909887', '\n', '\n', '\n', '\t', '/C=UK/O=eScience/OU=QueenMaryLondon/L=Physics/CN=elisa piccaro/CN=proxy', '\n', '\t', 'elisa piccaro', '\n', '\t', '/C=UK/O=eScience/OU=QueenMaryLondon/L=Physics/CN=elisa piccaro/CN=proxy', '\n', '\n', '\t', '13d1c1c9-9bec-4087-a0de-f35b5224b64b', '\n', '\t', 'cancelled', '\n', '\t', 'PT23H23M58.0S', '\n', '\t', 'PT23H23M58.0S', '\n', '\t', 'PT0S', '\n', '\t', '0', '\n', '\t', '2011-05-16T04:49:37Z', '\n', '\t', '2011-05-16T05:25:39Z', '\n', '\t', '', '\n', '\t', '', '\n', '\t', 'ANALY_NIKHEF-ELPROD', '\n', '\t', 'ANALY_NIKHEF-ELPROD', '\n', '\t', '', '\n', '\t', u'PandaMeter', '\n', '\t', u'PanDA_ATLAS', '\n', '\t', u'OSG', '\n', '\t', '1', '\n', '\t', 'Batch', '\n', '\n']

>>> xmlDoc = Gratia.safeParseXML("".join(r.XmlData))

>>> CheckXmlDoc(xmlDoc,False)


OK -- that's what's happening. There's something wrong with the XML we are using.

<?xml version="1.0" encoding="UTF-8"?><JobUsageRecord xmlns=""        xmlns:urwg=""        xmlns:xsi=""         xsi:schemaLocation=" file:///u:/OSG/urwg-schema.11.xsd">
<RecordIdentity urwg:recordId="" urwg:createTime="2011-05-20T23:26:55Z" />
   <LocalJobId >None</LocalJobId>
   <GlobalJobId >1236909887</GlobalJobId>
   <LocalUserId >/C=UK/O=eScience/OU=QueenMaryLondon/L=Physics/CN=elisa piccaro/CN=proxy</LocalUserId>
   <GlobalUsername >elisa piccaro</GlobalUsername>
   <DN >/C=UK/O=eScience/OU=QueenMaryLondon/L=Physics/CN=elisa piccaro/CN=proxy</DN>
   <JobName >13d1c1c9-9bec-4087-a0de-f35b5224b64b</JobName>
   <Status >cancelled</Status>
   <TimeDuration urwg:type="submit" >PT23H23M58.0S</TimeDuration>
   <WallDuration urwg:description="Was entered in seconds" >PT23H23M58.0S</WallDuration>
   <CpuDuration urwg:usageType="user" urwg:description="Was entered in seconds" >PT0S</CpuDuration>
   <Processors consumptionRate="total" urwg:metric="total" >0</Processors>
   <StartTime urwg:description="Was entered in seconds" >2011-05-16T04:49:37Z</StartTime>
   <EndTime urwg:description="Was entered as text" >2011-05-16T05:25:39Z</EndTime>
   <MachineName ></MachineName>
   <SubmitHost ></SubmitHost>
   <Host primary="true" >ANALY_NIKHEF-ELPROD</Host>
   <Queue >ANALY_NIKHEF-ELPROD</Queue>
   <Resource urwg:description="user" ></Resource>
   <ProbeName >PandaMeter</ProbeName>
   <SiteName >PanDA_ATLAS</SiteName>
   <Grid >OSG</Grid>
   <Njobs >1</Njobs>
   <Resource urwg:description="ResourceType" >Batch</Resource>
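One thing worth checking in the record above: StartTime and EndTime are 36 min 2 s apart, but WallDuration reads PT23H23M58.0S, which is exactly 24 h minus 36 min 2 s. That is the value timedelta.seconds gives when the subtraction is done in the wrong order (start minus end); whether that is what happened here is a guess. A minimal sketch of a duration helper that avoids the trap follows. The function name and the output format are assumptions, not Gratia's own formatting.

```python
from datetime import datetime


def wall_duration(start, end):
    """Format end - start as an ISO 8601 duration such as PT0H36M2S.

    Uses total_seconds() rather than .seconds, and refuses negative spans.
    """
    seconds = int((end - start).total_seconds())
    if seconds < 0:
        raise ValueError("end time is before start time")
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return "PT%dH%dM%dS" % (hours, minutes, secs)


creation = datetime(2011, 5, 16, 0, 49, 37)
end = datetime(2011, 5, 16, 1, 25, 39)

print(wall_duration(creation, end))
print((creation - end).seconds)     # the wrapped value from a reversed subtraction
```

The second print gives 84238 seconds, which is 23 h 23 min 58 s, the same figure that appears in the record.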

[18:34:01 Satchel ~]$ xmllint --valid test.xml
test.xml:1: validity error : Validation failed: no DTD found !
Location=" file:///u:/OSG/urwg-schema.11.xsd"

Missing a schema file? (xmllint --valid checks against a DTD rather than an XSD, so this complaint may not mean much.)

That's not it -- the same structure works if I only add a JobId. Anything else creates the same problem, even if it's clearly no threat to the XML.

15:16:34 EDT Gratia: Warning: UserIdentity block does not have exactly one populated LocalUserId node in Unknown Unknown
15:16:34 EDT Gratia: Info: suppressing record with Unknown Unknown due to Grid == Local
15:16:34 EDT Gratia: No unsuppressed usage records in this packet: not sending
15:16:34 EDT Gratia: *********************************************************

Hm. OK, I changed the config file so that it no longer suppresses Grid == Local records (SuppressGridLocalRecords = '1' -> '0').

Didn't work. But when I turned debug to 5, I managed to get somewhere -- the rejection came from LocalJobId being blank. After adding it, we're in business. Yay!
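A quick pre-flight check on the generated XML would have caught this before sending. The sketch below uses only the standard library and is not Gratia's own validation; the function name, the list of required elements, and treating the literal text 'None' as blank are all assumptions. The XML fragments are cut down from the record on this page, with namespaces omitted.

```python
from xml.dom import minidom


def blank_elements(xml_text, required=("LocalJobId",)):
    """Return the required element names that are missing, repeated, or blank."""
    doc = minidom.parseString(xml_text)
    problems = []
    for tag in required:
        nodes = doc.getElementsByTagName(tag)
        if len(nodes) != 1:
            problems.append(tag)
            continue
        text = "".join(
            child.data for child in nodes[0].childNodes
            if child.nodeType == child.TEXT_NODE
        ).strip()
        # Assumption: 'None' is what str(None) leaves behind, so count it as blank.
        if text in ("", "None"):
            problems.append(tag)
    return problems


bad = ("<JobUsageRecord><LocalJobId >None</LocalJobId>"
       "<GlobalJobId >1236909887</GlobalJobId></JobUsageRecord>")
good = bad.replace(">None<", ">1236909887<")

print(blank_elements(bad))
print(blank_elements(good))
```

Extending `required` to other elements (for example LocalUserId) is a one-line change if more fields turn out to trigger suppression.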

Now I need to figure out what to do with the certs. I am asking Philipe and Steve.

-- AldenStradling - 20-May-2011

Topic revision: r4 - 2011-05-23 - AldenStradling