-- HarryRenshall - 15 Apr 2008

Week of 080414

Open Actions from last week:

Daily CCRC'08 Call details

To join the call, at 15.00 CET Tuesday to Friday inclusive (usually in CERN building 28-R-006), do one of the following:

  1. Dial +41227676000 (Main) and enter access code 0119168, or
  2. To have the system call you, click here

Monday:

See the weekly joint operations meeting minutes

Additional Material:

Tuesday:

elog review: nothing new

Experiments round table:

ALICE (PM): are sending the last data required for their commissioning exercise to begin on 5 May. Currently CNAF is down for a Castor upgrade and the NDGF SE is down. They are testing the migration of their VO-box software to 64-bit Linux.

Sites round table:

NL-T1 (JT): ATLAS are having difficulties with their SAM tests to judge the availability of the joint NIKHEF-SARA Dutch Tier 1. A runaway LHCb program wrote 120 GB of logs, bringing down some worker nodes.

Core services (CERN) report:

DB services (CERN) report: During tests of the new RAC hardware some 10% of the storage controllers have failed, and as yet there is no explanation from the vendor. For this reason they propose to use Oracle's Data Guard software to maintain an asynchronous failover copy of the physics databases on the old hardware after migration to the new. They will put this plan before the MB today. The plan is to migrate LHCb and CMS today, ATLAS tomorrow, then WLCG on Thursday (which means a 2 hour FTS downtime and 4 hours down for the local ATLAS LFC).

Monitoring / dashboard report: CMS want to start using the Condor glide-in facility to submit about 100 jobs/day, and this will need modifications in the dashboard to track them. They also want to start looking at the CPU efficiency of their various applications.

Release update:

AOB: Registration for the WLCG collaboration workshop closes tomorrow.

Wednesday

elog review: New item from PIC for LHCb. Currently the space token is determined from the file name path, but for the May CCRC this was supposed to be different. What is the status?

Experiments round table:

LHCb (RS): asking sites if they have yet deployed the LHCBUSER space. For NL-T1, JT confirmed that this (3 TB there) must come out of the LHCb MoU space envelope. All 7 Tier 1s are now replicating the LHCb LFC using Oracle streams.

CMS (DB): Not a lot of export activity now as they are busy preparing for the May run.

Sites round table:

Core services (CERN) report: After this meeting the CERN production FTS is going to be switched to the 'RAL' experiment shares model, where each experiment gets a guaranteed total bandwidth share based on the number of files, within which it can set sub-shares. This does stop experiments profiting from lack of use by another experiment. DB of CMS asked to be informed when other FTS reconfigurations are made.
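As an illustration only, the shares model described above can be sketched in a few lines of Python. This is not the FTS configuration or its actual algorithm: the function names, the notion of "slots" (concurrent file transfers) and all percentages below are hypothetical and were not given in the meeting. The sketch assumes each experiment has a fixed percentage of the slots, optionally divided into sub-shares, and that an idle experiment's slots are not lent to the others.

```python
def allocate_slots(total_slots, shares, sub_shares=None):
    """Split total_slots among experiments by fixed percentage share,
    then split each experiment's slots by its own sub-shares.

    shares:     {experiment: percent}, must sum to 100 (hypothetical values)
    sub_shares: {experiment: {activity: percent}}, each must sum to 100
    """
    sub_shares = sub_shares or {}
    if sum(shares.values()) != 100:
        raise ValueError("experiment shares must sum to 100")

    allocation = {}
    for experiment, percent in shares.items():
        experiment_slots = total_slots * percent // 100
        activities = sub_shares.get(experiment, {"default": 100})
        if sum(activities.values()) != 100:
            raise ValueError(f"sub-shares for {experiment} must sum to 100")
        allocation[experiment] = {
            activity: experiment_slots * p // 100
            for activity, p in activities.items()
        }
    return allocation


def granted(allocation, experiment, requested):
    """Slots an experiment actually gets: capped at its own guaranteed
    total, regardless of how little the other experiments are using."""
    guaranteed = sum(allocation[experiment].values())
    return min(requested, guaranteed)


# Hypothetical numbers, for illustration only.
example = allocate_slots(
    total_slots=40,
    shares={"ATLAS": 40, "CMS": 30, "LHCb": 15, "ALICE": 15},
    sub_shares={"CMS": {"production": 75, "user": 25}},
)
```

With these made-up numbers, `granted(example, "CMS", 20)` is capped at the CMS total even if every other experiment is idle, which is the trade-off noted in the report.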

DB services (CERN) report: The migrations of the CMS and LHCb RACs to new hardware were carried out successfully yesterday afternoon and finished within the scheduled intervention window. A standby (using Oracle Data Guard) is kept on the old hardware as a fail-over in case of further problems with the new hardware (which has recently shown several controller and disk failures). We are also investigating the LHCb T0 to T1 capture process, which has failed several times on the new RAC; there are currently a few hours of latency. We will fix it as soon as possible.

The migration of the ATLAS RAC is in progress now (scheduled from 14:00 to 16:00). The list of affected services is as follows (it can also be found on the IT status board):

atlas_rac atlas_muon_ec_align atlas_authdb atlas_coolprod atlas_prodsys atlas_dd atlas_muoncert atlas_tags atlas_da atlas_t0 atlas_muonprod atlas_muon atlas_muon_rpc atlas_integration atlas_muoncsc atlas_largus atlas_larcalib atlas_trt atlas_largfr atlas_muonmic atlas_htmldb atlas_atlog atlas_config atlas_oksprod atlas_pvssprod atlas_dashboard atlas_dcs atlas_pvssconf_dcs atlas_coolwrite atlas_dq2_location atlas_dq2 atlas_mdt_dcs atlas_mda atlas_coca atlas_oks streams online replication atlr_backup atlas_tagsprod atlas_tags_writer

The migration of the LCG RAC to the new hardware is scheduled for tomorrow, 17th April, from 15:00 to 17:00. The list of affected services is as follows (also on the IT status board):

lcg_FCR lcg_fts lcg_fts_monitor lcg_fts_t2 lcg_fts_t2_w lcg_gridview lcg_lfc lcg_same lcg_sam lcg_sam_portal lcg_voms

A transparent hardware upgrade was performed at ASGC affecting the 3D, FTS, LFC, CASTOR and SRM databases.

Monitoring / dashboard report:

Release update:

AOB: DB of CMS said the Castor team have migrated ATLAS to level 2.1.7 but now want to wait 2 weeks to see how it settles in. What would this imply for a CMS upgrade, and would it be important for them? JS said we always wanted to exercise such an upgrade during data taking and that he would find out if this release brings any benefits for CMS.

Thursday

elog review:

Experiments round table:

Sites round table:

  • RAL: Turns out LHCb require the LFC to be published in BDII for their SAM tests. We weren't doing this for the latest server. Normal LHCb use was OK we believe. Now fixed!

Core services (CERN) report:

DB services (CERN) report:

Monitoring / dashboard report:

Release update:

AOB:

Friday

elog review:

Experiments round table:

Sites round table:

Core services (CERN) report:

DB services (CERN) report:

Monitoring / dashboard report:

Release update:

AOB:

Topic revision: r4 - 2008-04-17 - JamieShiers
 