-- HarryRenshall - 16 Jan 2008

Week of 080114

Open Actions from last week:

Monday:

See the weekly phone conference in Indico.

Tuesday:

Experiment reports: ALICE (PM) met with the Castor team this morning to discuss which Castor release will include the new xrootd plugin that they need. More info tomorrow. Since RAL is staying with Castor 2.1.4 (this and 2.1.6 are supported for bug fixes), ALICE will send raw data to tape at RAL in February but not read it back. ATLAS (SC): software to handle space tokens is ready and will be used as soon as sites configure them. CMS (AS): nothing to report. LHCb (RS): Nick Brook will coordinate solving the Castor/rfio problem of LHCb at RAL and CNAF. There will be an LFC bulk data modification to the master LFC.

SMOD report: (M.C-S) FIO will upgrade all CERN LFCs to 1.6.7 next week. The 1.6.8 version, which runs on SLC4, is not yet available, though ATLAS plan to use it for bulk operations in February.

DBMOD report: (JDS) We now know that the physics database client on worker nodes needs much of the applications area software stack. It also needs the Oracle libraries, which are in fact freely available but come without technical support. Migration to new hardware in the integration RAC will be done for CMS this week and for ATLAS next week.

Monitoring / dashboard report: We are looking at how best to handle the critical services.

Release update: Nothing new to report.

Questions from sites:

AOB: D.Bonacorsi of CMS asked whether Castor sites are free to choose between running 2.1.4 and 2.1.6. (M.C-S) Yes, both are supported for bug fixes. Can they also choose between gridftp1 and gridftp2? (M.C-S) Castor 2.1.6 will require gridftp2. (JS) The approved middleware versions should be ready to push out to sites next week. (NT) The SL4 VO-box is ready to go into production; are there any objections if we release it? (There were none.) (JS) Note that reports for this meeting can be sent to the mailing list wlcg-scod@cern.ch

Wednesday:

Experiment reports: ATLAS (SC) and LHCb (RS) do not need site installation of the Oracle Instant Client as it is included in the experiment software suites. ALICE (PM) will have another meeting with the Castor team to decide on using a new xrootd plugin. It might also be available for Castor 1.6.4.

Core services (CERN) report:

DB services (CERN) report (MG): CMS databases were moved to a machine with a 64-bit OS, which exposed a CMS Frontier application using a hard-coded server name in a connection string. The user has been advised.

Monitoring / dashboard report (JC): CCRC08 elog-gers have been deployed, accessible from the Twiki. They will be used to document interventions, problems and also general observations. In addition there is one intended to link in with MoU response times. Write access requires registration (from the elog entry page). There are already LHCb and ATLAS detector elog-gers but we could also host experiments under CCRC08.

Release update (NT): A DPM patch release is about to be made; FTS will be released next Monday, as will gfal and lcg-utils. SC reported that LHCb had found a bug in list-replica. Since this is the baseline version supporting SRM 2 it should be built into the repository but not (yet) installed. The SLC4 VO-box will be put in the middleware repository tomorrow. JDS hoped that the CCRC08 baseline middleware versions would be ready for sites to start installing by next Monday's operations meeting.

Questions from sites:

AOB:

Thursday:

Experiment report(s): LHCb want to move files from the pit to the Castor lhcbdata pool which they cannot currently see. MCS asked them to send a request to castor.support. CMS reported that the tomcat server in front of the SAM database was down - a known problem.

Core services (CERN) report (MCS): At 06.00 the CMS Castor instance could not send its heartbeat to the central service. This is being looked at with network experts. As soon as the new xrootd plugin has been tested it can be deployed, as there are no Castor dependencies. CNAF and RAL should deploy the current one, then all sites should redeploy the new one together.

DB services (CERN) report: (JDS on behalf of MG): The problem with the CMS INT9R integration RAC, observed yesterday by Frontier following the migration to 64-bit hardware, was due to the Frontier Tomcat server not refreshing its cache of IP addresses. The application therefore kept trying to connect to the old hardware, which no longer exists. CMS is aware of this problem and will fix it.

At the request of LHCb, we have today applied a rolling patch (already tested by LHCb on their integration RAC) which fixes an Oracle bug affecting updates of two CLOB columns in the same query, as occurs in the LHCb COOL use case.

Monitoring / dashboard report:

Release update: There is some confusion over which DPM patches are valid; some of those reported were in fact stale. MCS reported that the DPM in production is good enough for what we need. NT said there would in any case be a new DPM today, and that FTS, gfal and lcg-utils would go to PPS next Monday for rapid cycling. The gfal get_replica (calling list_replicas) bug is understood and a fix is available.

Questions from sites:

AOB:

Friday:

Experiment report(s):

  • SAM: issue with OSG sites understood and (hopefully) soon to be fixed
  • ALICE: Uni-Mexico, Russia and RAL to start with gLite 3.1 VOBox
  • ATLAS: need to know when at least one site will be ready with space tokens
  • LHCb: rfio issue & RAL - changing CASTOR config with 1 LSF per disk server seems to solve problem -> roll-out to other sites to be configured.

Core services (CERN) report:

DB services (CERN) report:

We have applied a procedure to compact and compress the archived data to the ATLAS online PVSS data. The measured outcome is a reduction of more than 50% in the allocated space (from 1100 GB to 500 GB) and, as a direct consequence, a two-fold speed-up of PVSS queries. This procedure was developed with ATLAS and CO in Q4 2007 following the discovery of a bug that caused Oracle blocks to be only partially filled.

Unfortunately the compression has triggered a Streams bug: the capture processes abort when mining the redo log or archive log files that contain the information for the compressed tables, and replication for the ATLAS setup has been blocked since Tuesday 12.01. A Service Request was opened at priority 1 and we are in close contact with Oracle Support. A patch already exists and is with Oracle development for final validation. Oracle tells us that this patch should arrive before Monday morning and we will apply it immediately.

We have also extended the retention of the log files so that we will not need to re-import.

Monitoring / dashboard report:

  • Metrics still being defined - will take ~1 more week
  • Gridmap for service providers - still need experiment input
  • Experiment tests in SAM - info collected and will be distributed

Release update:

  • Release to production: VO box, gfal, lcg-utils, DPM ready -> make available 09:30 UTC+1
  • LFC 1.6.8 in cert - expected Mon/Tue -> ATLAS T0 LFC
  • FTS: pre-production smoke-test Monday -> pilot Monday
  • Fix for the list-replica bug in gfal still to be provided

Questions from sites:

  • WN tar ball availability? Same as above

AOB:

Topic revision: r7 - 2008-01-18 - JamieShiers
 