DAST shift report from the DAST shifter

Updates

Date Update
04/03/2016 Most issues were related to Ganga. Users reported a few problems with warning messages, with job persistency after a crash, and with possible lock-ups. These have (hopefully) all been dealt with, and the fixes should be available in 6.1.17. LHCbTasks still has issues, but these will be addressed this week. I suspect the new Ganga release will go to the Dev area first; an email will be sent out with details. Other than that, there was a Dirac service issue caused by a restart of at least one of the voboxes, and a problem with a corrupted userkey.pem file.
05/06/2016 There were a few Ganga issues discussed on the list last week:
- A bug in copying jobs resulted in "RuntimeError: dictionary changed size during iteration" when copying the dataset. The workaround is to copy the file names to a new dataset by hand. This is fixed in the next release.
- The jobs repository occasionally loses the names of jobs, similar to an issue previously reported with box. What worked in this case was to run the job creation script from the command line and then start Ganga once it had completed. I couldn't reproduce the error, though, so I doubt this will work for everyone. This is also fixed in the next release.
- Two users reported extremely slow job finalisation and output retrieval in Ganga 6.1.14. Using dirac-wms-job-get-output-data instead was much quicker, as was reverting to Ganga 6.0.44. It's not obvious whether this is an issue with Ganga or with the Dirac API and the version of LHCbDirac, so I've not opened an issue on it for the moment. Rob gave some helpful tips on various things that can affect Ganga performance.
- There were some file access issues at IN2P3. These were due to a faulty configuration, which has been fixed, though the user in question still seems to be having issues, which Mark is dealing with.
07/27/2015 Ganga 6.1.x was released to LHCb users as the default from SetupProject. There were some initial teething problems, but deployment went quite well. Ganga: a problem copying job objects was discovered.
8/11/2015 The past week had quite a low number of queries:
- Empty directories left in the gangadir prevented Ganga from starting. The solution is to search for them and delete them:
  find ~/gangadir/repository/${USER}/LocalXML/6.0/jobs/* -empty -type d -delete
  The location might need to be adjusted if the user has moved their gangadir to a different location by editing their .gangarc.
- A job with subjobs did not submit correctly. In 6.1.13 there is still an issue where some jobs fail to submit if the submission is only done after quitting and restarting Ganga. The solution is to create the job and submit it in the same session.
- A problem with a grid certificate. When the problem had not fixed itself 24 hours after the user updated their certificate, I contacted Joel Closier and he fixed something. Problems often arise when a user gets a new certificate (such as switching from a University one to a CERN one).
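The empty-directory cleanup in the 8/11/2015 entry can be previewed before anything is deleted. The following is a minimal sketch, assuming a find that supports -mindepth, -empty and -delete (GNU and BSD find both do). The function name clean_empty_job_dirs and its two arguments are hypothetical, introduced only for illustration; the default path is the one quoted in that entry and needs adjusting if the gangadir has been moved via .gangarc.

```shell
# Hypothetical helper (not part of Ganga): list or delete empty job directories.
#   $1: jobs directory (default: the path quoted in the shift report)
#   $2: "list" (default) prints the empty directories, "delete" removes them
clean_empty_job_dirs() {
    jobs_dir="${1:-$HOME/gangadir/repository/${USER}/LocalXML/6.0/jobs}"
    mode="${2:-list}"

    if [ ! -d "$jobs_dir" ]; then
        echo "no such directory: $jobs_dir" >&2
        return 1
    fi

    if [ "$mode" = "delete" ]; then
        # -mindepth 1 keeps the jobs directory itself even if it ends up empty.
        # -delete works depth-first, so a parent that becomes empty is removed too.
        find "$jobs_dir" -mindepth 1 -type d -empty -delete
    else
        # Dry run: only print the directories that are empty right now.
        find "$jobs_dir" -mindepth 1 -type d -empty -print
    fi
}
```

Calling clean_empty_job_dirs with no arguments lists the candidates under the default location; passing the directory followed by delete removes them. Running the listing first makes it easier to spot a wrong path before any directories are removed.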
08/21/2015 Users have had problems reading rdst files that depend on the raw ancestor files. The depth flag in the LHCbDataset object was not propagated to the Dirac jobs. The v601r6 release will fix this. The current v601r4 release has a few problems: copying a job occasionally fails, and the splitting algorithm only works once (quit Ganga and re-enter to fix).
09/04/2015 CNAF will remain down through the weekend. Ganga 6.1.8 is out, with no issues yet, but probably not many users yet either. There are a couple of open threads pending follow-up by the users who started them.
10/02/2015 Ganga 6.1.9 was released on Friday. This release identifies itself as 'Ganga-SVN' because of a mistake made in the release process.
10/02/2015 Several sites were in downtime over the past week, which caused problems that were raised on the mailing list. A fix has been mentioned several times on the list and in the email report to the lhcb-dast list.
11/12/2015 Last week was pretty quiet again. There was another case where a new user's home directory on the grid didn't exist. Again, this was fixed quickly by Joel. The same user then had some mysteriously failing Bender jobs, which I wasn't able to resolve; see the thread here: https://groups.cern.ch/group/lhcb-ganga/Lists/Archive/DispForm.aspx?ID=5677 . I asked Igor to report() the job in question, but the report has yet to appear. Otherwise there were two instances of apparently corrupted files, one at Manchester and one at IN2P3. A ticket has been opened with the local grid services to investigate.
-- RobCurrie - 2015-08-17


Topic revision: r10 - 2016-03-07 - MarkWSlater
 