Week of 050919

Open Actions from last week:
  • Check that the correct SEs are published in the production BDII (Patricia)
  • Need to get procedures for dCache (Maarten)
  • Publish info into Wiki about info system (James/Gavin) DONE
  • QF for BDII, plus a guide for a Tier-2 or experiment to set it up (Gavin)
  • Need to put caching into the BDII info provider for FTS if lookups are expensive (e.g. channel lookups) (Gavin) - see the sketch below
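
  (Sketch for the caching item above: a minimal time-to-live cache, with a hypothetical lookup_channels() standing in for the expensive BDII query. Names and TTL are illustrative, not the actual FTS info-provider code.)

    import time

    CACHE_TTL = 300  # seconds; illustrative, not a production setting
    _cache = {}      # query -> (timestamp, result)

    def cached_lookup(query, lookup_channels):
        """Return a still-fresh cached BDII result, else re-run the query.

        lookup_channels stands in for the expensive call (e.g. an LDAP
        search of the BDII for FTS channel information).
        """
        now = time.time()
        hit = _cache.get(query)
        if hit is not None and now - hit[0] < CACHE_TTL:
            return hit[1]
        result = lookup_channels(query)
        _cache[query] = (now, result)
        return result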

Actions on Hold:

  • Need to restart the Castor2 DB to apply a patch - schedule for the gap between ALICE and CMS (Vlado) SCHEDULED, WAITING 19th SEP
  • Get "dev-kit" for FTS API to write LFC-FPS (Paolo, David) WAIT for FTS1.4

On Call: James + Sophie

Monday:

Log: The cmsprod pool was filled up by 25000 tape recalls.

New Actions:

  • Test the GC triggering in the DB for the WAN pool (Olof) DONE
  • Check on how the old phedex pool nodes are now configured (Olof) DONE
  • Check with Nilo/Eric on the DB intervention - schedule for next week (Olof)
  • separate out one node for SLC4 deployment (JanVE/James) DONE
  • Discuss with Lassi how much data he will stage and work out where it goes (Olof) DONE

Discussion:

  • CMS will start this week; ALICE will also do some data movement.

Tuesday:

Log: nothing to report

New Actions:

  • Jan/Sophie: get LFC DLI ports open. DONE

Discussion:

  • ATLAS LFC issues - we'll open the port.
  • QF - BDII + channel state today/tomorrow
  • MyProxy problems - can't have both a renewable and a retrievable proxy for the same user on a single MyProxy server (see the sketch below).
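
  (Illustration of the MyProxy conflict above - a sketch assuming the default behaviour where a second myproxy-init upload under the same username replaces the first. The server name and renewer DN are placeholders.)

    import subprocess

    SERVER = "myproxy.example.org"         # placeholder server name
    FTS_DN = "/DC=example/CN=fts-service"  # placeholder renewer DN

    # Upload a proxy that a named service may renew (-R restricts renewers).
    subprocess.run(["myproxy-init", "-s", SERVER, "-d", "-R", FTS_DN],
                   check=True)

    # Uploading again as a retrievable proxy (-a allows any authenticated
    # retriever) replaces the stored credential, losing the renewable one -
    # hence the "can't have both" conflict noted above.
    subprocess.run(["myproxy-init", "-s", SERVER, "-d", "-a"],
                   check=True)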

Wednesday

Log: CMS have seen problems with castorgridsc - all transfers are hanging. Also, support mails to Shiva don't seem to be getting through.

Actions:

  • Arrange a meeting with ATLAS on LFC production plans (Simone) (probably Tue 4th)
  • Arrange CMS meeting (Jamie) (3.30)
  • Check why mails don't get into SHIVA (James/Zdenek) DONE
  • Check with Olof on Castor2 problems (James/Jan) DONE

Discussion:

  • gLite QF RPMs today
  • lxshare220d will be for SLC4

Thursday

Log: Problems with Oracle DB last night.
  • The Castor Oracle logging node reported a file corruption at the OS level and stopped the instance.
  • At 19:00 the stager DB stopped - load became very high and then the instance halted. It could not be restarted other than by a hard reset - this triggered a filesystem rebuild and a long recovery time.
  • At 03:00 the same problem happened again.

Actions:

  • Castor2 upgrade next week - check with Sebastien (Tue) (Olof)
  • Operators acted incorrectly on an LCG_MON_GRIDFTP alarm (lxshare025d) - need to check procedures (Vlado)

Discussion:

  • There were three problems on Wednesday:
    • The trigger for GC was blocking filesystems that had been selected for the movement of files that needed to be copied into the pool. FIXED
    • Error in the SRM - when a tape recall is involved in an SRM get, there is a time window before the system knows about the file, so the first getRequestStatus fails (clients can retry, as in the first sketch below). CODE TO TEST
    • Problem with pool imbalances - J-D added a new policy weighting the size and available space of each filesystem (second sketch below). FIXED
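
  (First sketch, for the SRM race: a client-side poll that tolerates the early getRequestStatus failure. get_request_status here is a hypothetical stand-in for the real SRM client call, not a confirmed API.)

    import time

    def wait_for_request(get_request_status, request_id, retries=10, delay=30):
        """Poll an SRM get request, tolerating early failures while a
        tape recall is still being registered by the system."""
        for attempt in range(retries):
            try:
                status = get_request_status(request_id)
                if status is not None:
                    return status
            except Exception:
                pass  # early failures are expected inside the race window
            time.sleep(delay)
        raise RuntimeError("request %r still unknown after retries" % request_id)

  (Second sketch, a toy version of the pool-balancing policy: selection weighted by filesystem size times free space. The real policy's formula isn't recorded in these minutes; this only shows the shape of the idea.)

    import random

    def pick_filesystem(filesystems):
        """filesystems: list of (name, total_bytes, free_bytes) tuples.

        Bias selection toward large filesystems with plenty of free space,
        so new files spread out instead of piling onto nearly-full ones."""
        names = [name for name, _, _ in filesystems]
        weights = [total * free for _, total, free in filesystems]
        return random.choices(names, weights=weights, k=1)[0]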

Friday

Log:

Actions:

Discussion:
