Difference: DetailedStatusSites (1 vs. 23)

Revision 23 (2006-10-12) - AngeloCarbone

META TOPICPARENT name="DC06Activity"

Resource Status (day by day)

    Added:
  • 11-10-06
  • CERN, RAL: Running fine.
  • GRIDKA: Not able to run reconstruction jobs due to a software installation problem (under investigation).
  • PIC: Not able to run reconstruction jobs: libshift.so.2.1 is missing (a check of this kind is sketched below).
  • CNAF: No reconstruction jobs sent to CNAF.
  • IN2P3: Not able to run reconstruction jobs: jobs failed but appear as running on the monitoring page.
  • NIKHEF: Transfer failed.
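For context, a missing-library failure like the PIC one above can be caught before a job starts its payload. A minimal sketch (Python stdlib only; the library name is the one reported above, but the wrapper itself is hypothetical, not the actual DC06 job machinery):

```python
# Hypothetical worker-node pre-flight check: ask the dynamic linker whether
# the CASTOR client library needed by reconstruction can be resolved.
import ctypes

def shared_lib_available(name: str) -> bool:
    try:
        ctypes.CDLL(name)  # raises OSError if the linker cannot find it
        return True
    except OSError:
        return False

if __name__ == "__main__":
    # The library reported missing at PIC above.
    print("libshift.so.2.1 available:", shared_lib_available("libshift.so.2.1"))
```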
  • 10-10-06
    • CERN,RAL,PIC,GRIDKA,NIKHEF,IN2P3,CNAF: Running fine.
  • NIKHEF: Maintaining a low rate of ~2 files per hour.
Revision 22 (2006-10-11) - AndrewCSmith

      Added:
    • 10-10-06
      • CERN,RAL,PIC,GRIDKA,NIKHEF,IN2P3,CNAF: Running fine.
  • NIKHEF: Maintaining a low rate of ~2 files per hour.
    • 09-10-06
      • CERN,RAL,PIC,GRIDKA,NIKHEF,IN2P3,CNAF: Running fine.
      • IN2P3: Restarted transfers.
Revision 21 (2006-10-09) - AndrewCSmith

        Added:
      • 09-10-06
        • CERN,RAL,PIC,GRIDKA,NIKHEF,IN2P3,CNAF: Running fine.
        • IN2P3: Restarted transfers.
  • NIKHEF: Test jobs successful. Maintaining a low rate of ~2 files per hour.
      • 29-09-06
        • CERN,RAL,IN2P3,PIC,GRIDKA: Running fine.
  • RAL: Temporary problem in the morning, with gridftp doors turning into black holes. Started ~2am, resolved by 11am.
Revision 20 (2006-09-29) - AndrewCSmith

          Added:
        • 29-09-06
          • CERN,RAL,IN2P3,PIC,GRIDKA: Running fine.
  • RAL: Temporary problem in the morning, with gridftp doors turning into black holes. Started ~2am, resolved by 11am.
  • PIC: Problem with 'No space left on device' resolved (transfers restarted).
  • NIKHEF/SARA: Have to upgrade the SRM dCache servers to the latest passive-mode dCache so that clients are not called back.
  • CNAF: Test job failed.
        • 28-09-06
          • CERN,RAL,IN2P3,GRIDKA: Running fine.
          • PIC: Problem with 'No space left on device'.
  • NIKHEF/SARA: Have to upgrade the SRM dCache servers to the latest passive-mode dCache so that clients are not called back.
• 27-09-06: Transfers were running fine to RAL and IN2P3. However, DC06 was stopped because the rDSTs produced so far cannot be accessed for subsequent stripping.
Revision 19 (2006-09-28) - AndrewCSmith

Transfers were running fine to RAL and IN2P3. However, DC06 was stopped because the rDSTs produced so far cannot be accessed for subsequent stripping.
            • CERN,RAL,IN2P3: Running fine.
            • PIC: Out of disk space at PIC-tape
Changed:
< • GRIDKA:
> • GRIDKA: CERN-GRIDKA network connection down
             
  • NIKHEF/SARA: Have to upgrade the SRM dCache servers to the latest passive-mode dCache so that clients are not called back.
          • 26-09-06
Revision 18 (2006-09-27) - AndrewCSmith

            • 27-09-06
Changed:
< Transfers were running fine to all T1s except CNAF and NIKHEF/SARA (which have to upgrade the SRM dCache servers to the latest passive-mode dCache so that clients are not called back). However, DC06 was stopped because the rDSTs produced so far cannot be accessed for subsequent stripping.
> Transfers were running fine to RAL and IN2P3. However, DC06 was stopped because the rDSTs produced so far cannot be accessed for subsequent stripping.
            • CERN,RAL,IN2P3: Running fine.
            • PIC: Out of disk space at PIC-tape
            • GRIDKA:
  • NIKHEF/SARA: Have to upgrade the SRM dCache servers to the latest passive-mode dCache so that clients are not called back.
             
          • 26-09-06
            • CERN,PIC,IN2P3: Running fine.
Revision 17 (2006-09-27) - unknown

Changed:
< • 26-09-06
> • 27-09-06

  Transfers were running fine to all T1s except CNAF and NIKHEF/SARA (which have to upgrade the SRM dCache servers to the latest passive-mode dCache so that clients are not called back). However, DC06 was stopped because the rDSTs produced so far cannot be accessed for subsequent stripping.

            • 26-09-06
             
              • CERN,PIC,IN2P3: Running fine.
              • RAL: Transfers failing to RAL (GGUS 13208)
  • CNAF: No FTS transfers; CNAF FTS Castor2 problems. New GGUS ticket submitted: #13301 (previous ticket: GGUS #13121).
• 25-09-06
              • CERN,PIC: Running fine.
              • RAL: Transfers failing to RAL (GGUS 13208)
              • CNAF, IN2P3: No FTS transfers. IN2P3 data transfer problems (GGUS #13050). CNAF FTS Castor2 problems (GGUS #13121)
  • GRIDKA: still a data access problem related to GGUS ticket #12320.
  • NIKHEF/SARA: tests of a backdoor method in ROOT using the new dCache client libraries were successful against IN2P3 and RAL (which upgraded their servers). Next week Ron will upgrade the dCache server at SARA so that SARA can enter the game; the active-vs-passive connection issue behind this is sketched below.
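The recurring active-vs-passive point in these reports: with active dcap, the data server connects back to a port the client opens, and a firewall in front of the worker nodes drops that inbound connection; with passive dcap, the client makes the outbound data connection itself. A minimal sketch of the two connection directions (plain Python sockets; host and port are placeholders, not real dCache endpoints):

```python
import socket

DOOR = ("dcache-door.example.org", 22125)  # placeholder dcap door

def active_mode_callback_port() -> int:
    # Active mode: the client listens and the server is asked to call back.
    # This inbound connection is exactly what a worker-node firewall blocks.
    cb = socket.socket()
    cb.bind(("", 0))             # any free port
    cb.listen(1)
    return cb.getsockname()[1]   # port the server would call back on

def passive_mode_connect() -> socket.socket:
    # Passive mode: the client opens the data connection outward, so no
    # callback through the firewall is needed.
    return socket.create_connection(DOOR, timeout=10)
```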
• 22-09-06
              • CERN,RAL,PIC: Running fine.
  • GRIDKA: still a data access problem related to GGUS ticket #12320. FTS transfers stopped.
              • CNAF, IN2P3: No FTS transfers. IN2P3 data transfer problems (GGUS #13050). CNAF FTS Castor2 problems (GGUS #13121)
• 21-09-06
              • CERN,RAL,PIC: Running fine.
  • GRIDKA: still a data access problem related to GGUS ticket #12320. FTS transfers ongoing.
              • CNAF, IN2P3: FTS transfers restarted for both sites but immediately failed. IN2P3 data transfer problems (GGUS #13050). CNAF FTS Castor2 problems (GGUS #13121)
• 20-09-06
              • CERN,RAL,PIC: Running fine.
  • GRIDKA: some data access problems. Opening a GGUS ticket...
  • CNAF: Intervention on the Castor 2 stager done. It can start receiving transfers (low rate).
  • IN2P3: restarted transfers at a low rate.
            Added:

19-20 of September: for more than 12 hours the FTS at CERN did not work; no transfers were possible.

            • 19-09-06
  • CERN,RAL,PIC,GRIDKA: Running fine.

Revision 16 (2006-09-26) - AndrewCSmith

            Added:
            • 26-09-06
              • CERN,PIC,IN2P3: Running fine.
              • RAL: Transfers failing to RAL (GGUS 13208)
  • CNAF: No FTS transfers; CNAF FTS Castor2 problems. New GGUS ticket submitted: #13301 (previous ticket: GGUS #13121).
             
            • 25-09-06
              • CERN,PIC: Running fine.
              • RAL: Transfers failing to RAL (GGUS 13208)

Revision 15 (2006-09-25) - unknown

            Added:
            • 25-09-06
              • CERN,PIC: Running fine.
              • RAL: Transfers failing to RAL (GGUS 13208)
              • CNAF, IN2P3: No FTS transfers. IN2P3 data transfer problems (GGUS #13050). CNAF FTS Castor2 problems (GGUS #13121)
  • GRIDKA: still a data access problem related to GGUS ticket #12320.
  • NIKHEF/SARA: tests of a backdoor method in ROOT using the new dCache client libraries were successful against IN2P3 and RAL (which upgraded their servers). Next week Ron will upgrade the dCache server at SARA so that SARA can enter the game.
             
            • 22-09-06
              • CERN,RAL,PIC: Running fine.
  • GRIDKA: still a data access problem related to GGUS ticket #12320. FTS transfers stopped.
             
              • CNAF, IN2P3: FTS transfers restarted for both sites but immediately failed. IN2P3 data transfer problems (GGUS #13050). CNAF FTS Castor2 problems (GGUS #13121)
            • 20-09-06
              • CERN,RAL,PIC: Running fine.
Changed:
< • GRIDKA: some data access problem. Open GGUS ticket
> • GRIDKA: some data access problems. Opening a GGUS ticket...
             
  • CNAF: Intervention on the Castor 2 stager done. It can start receiving transfers (low rate).
Changed:
< • IN2P3: restart transfer at low rate
> • IN2P3: restarted transfers at a low rate.
             

19-20 of September: for more than 12 hours the FTS at CERN did not work; no transfers were possible.

            • 19-09-06
  • CERN,RAL,PIC,GRIDKA: Running fine.

Revision 14 (2006-09-22) - AndrewCSmith

            Added:
            • 22-09-06
              • CERN,RAL,PIC: Running fine.
  • GRIDKA: still a data access problem related to GGUS ticket #12320. FTS transfers stopped.
              • CNAF, IN2P3: No FTS transfers. IN2P3 data transfer problems (GGUS #13050). CNAF FTS Castor2 problems (GGUS #13121)
            • 21-09-06
              • CERN,RAL,PIC: Running fine.
  • GRIDKA: still a data access problem related to GGUS ticket #12320. FTS transfers ongoing.
              • CNAF, IN2P3: FTS transfers restarted for both sites but immediately failed. IN2P3 data transfer problems (GGUS #13050). CNAF FTS Castor2 problems (GGUS #13121)
             
            • 20-09-06
              • CERN,RAL,PIC: Running fine.
  • GRIDKA: some data access problems. Opening a GGUS ticket.
  • CNAF: Intervention on the Castor 2 stager done. It can start receiving transfers (low rate).
  • IN2P3: restarted transfers at a low rate.
Changed:
< 19-20 of September: for more than 12 hours FTS at CERN did work; no transfers weren't possible.
> 19-20 of September: for more than 12 hours the FTS at CERN did not work; no transfers were possible.
             
            • 19-09-06
  • CERN,RAL,PIC,GRIDKA: Running fine.
  • CNAF,IN2P3: No transfers to these sites.

Revision 13 (2006-09-20) - unknown

            • 20-09-06
              • CERN,RAL,PIC: Running fine.
  • GRIDKA: some data access problems. Opening a GGUS ticket.
  • CNAF: Intervention on the Castor 2 stager done. It can start receiving transfers (low rate).
  • IN2P3: restarted transfers at a low rate.

19-20 of September: for more than 12 hours the FTS at CERN did not work; no transfers were possible.

             
            • 19-09-06
  • CERN,RAL,PIC,GRIDKA: Running fine.
  • CNAF,IN2P3: No transfers to these sites.
            Added:
  • NIKHEF: promising tests of accessing data via gsidcap were successful. These were carried out against the certification gsidcap SRM at CERN. The firewall problem that still prevents LHCb from using NIKHEF is not yet fixed.
             
            • 18-09-06
  • CERN,RAL,PIC,GRIDKA: Running fine.
  • CNAF: No transfers to the site.

Revision 12 (2006-09-19) - unknown

            Added:
• 19-09-06
  • CERN,RAL,PIC,GRIDKA: Running fine.
  • CNAF,IN2P3: No transfers to these sites.
• 18-09-06
  • CERN,RAL,PIC,GRIDKA: Running fine.
  • CNAF: No transfers to the site.
  • IN2P3: The software installation there seems to be corrupted; jobs fail immediately. Transfers stopped.
• 17-09-06
  • CERN,RAL,PIC,GRIDKA,IN2P3: Running fine.
  • CNAF: No transfers to the site.
• 16-09-06
  • CERN,RAL,PIC,GRIDKA,IN2P3: Running fine.
  • CNAF: No transfers to the site.
• 15-09-06
  • CERN,RAL,PIC,GRIDKA,IN2P3: Running fine.
  • CNAF: No transfers to the site.
• 14-09-06
  • CERN,RAL,PIC,GRIDKA,IN2P3: Running fine.
  • CNAF: No transfers to the site.
• 13-09-06
  • CERN,RAL,IN2P3,PIC: Running fine.
  • GRIDKA,PIC: Restarted transfers to GRIDKA (they again get a reasonable share) and to PIC.
  • CNAF: Attempting transfers to CNAF. Castor 2 problems again. Deliberately keeping transfers to CNAF going in order to debug the problem.

• 12-09-06
  • CERN,RAL,IN2P3: Running fine.
  • GRIDKA,CNAF,PIC: No transfers to these sites.

Revision 11 (2006-09-13) - unknown

            Added:
• 12-09-06
  • CERN,RAL,IN2P3: Running fine.
  • GRIDKA,CNAF,PIC: No transfers to these sites.
• 11-09-06
  • CERN,RAL,IN2P3: Running fine.
  • GRIDKA,CNAF,PIC: No transfers to these sites.
             
            • 08-09-06
  • PIC: Back running, so restarted transfers there.
  • GRIDKA: Still not transferring files there until the backlog of jobs (fairshare problem) is cleared.

Revision 10 (2006-09-10) - unknown



            • 08-09-06
Changed:
< • CNAF: Problems with FTS transfers there. Stopped them. PIC: Back running. So restarted transfers there. GridKa: Still not transferring files there until the backlog of jobs (fairshare problem) is cleared.
> • PIC: Back running, so restarted transfers there.
> • GRIDKA: Still not transferring files there until the backlog of jobs (fairshare problem) is cleared.
> • CNAF: Problems with FTS transfers there. Stopped them.
             
            • 07-09-06
  • CERN,PIC,RAL,IN2P3,CNAF: Understood the problem at RAL (see the Old Problems section for more explanation). Also, the cooling system problem at CNAF is solved and services are back to life; transfers are going to CNAF again.
  • GRIDKA: LHCb simply ran out of the share allocated to them at GRIDKA.

Revision 9 (2006-09-08) - RajaNandakumar

            Added:
            • 08-09-06
  • CNAF: Problems with FTS transfers there; stopped them. PIC: Back running, so restarted transfers there. GridKa: Still not transferring files there until the backlog of jobs (fairshare problem) is cleared.
             
            • 07-09-06
  • CERN,PIC,RAL,IN2P3,CNAF: Understood the problem at RAL (see the Old Problems section for more explanation). Also, the cooling system problem at CNAF is solved and services are back to life; transfers are going to CNAF again.
  • GRIDKA: LHCb simply ran out of the share allocated to them at GRIDKA.

Revision 8 (2006-09-08) - unknown

            • 07-09-06
  • CERN,PIC,RAL,IN2P3,CNAF: Understood the problem at RAL (see the Old Problems section for more explanation). Also, the cooling system problem at CNAF is solved and services are back to life; transfers are going to CNAF again.
  • GRIDKA: LHCb simply ran out of the share allocated to them at GRIDKA.
             
            • 06-09-06
  • CERN,PIC,GRIDKA: Transfers are going pretty smoothly. Has GRIDKA recovered?
  • IN2P3: electrical power shutdown.

Revision 7 (2006-09-08) - unknown

            • 07-09-06
Changed:
< • CERN,PIC,RAL,IN2P3: Understood the problem at RAL (see the Old Problems section for more explanation)
> • CERN,PIC,RAL,IN2P3,CNAF: Understood the problem at RAL (see the Old Problems section for more explanation). Also, the cooling system problem at CNAF is solved and services are back to life; transfers are going to CNAF again.

  • GRIDKA: LHCb simply ran out of the share allocated to them at GridKA.
Deleted:
< • CNAF: No transfers are going to CNAF. Cooling system problem fixed. CNAF could be resurrected.
            • 06-09-06
  • CERN,PIC,GRIDKA: Transfers are going pretty smoothly. Has GRIDKA recovered?
  • IN2P3: electrical power shutdown.

Revision 6 (2006-09-08) - unknown

            Added:
            • 07-09-06
  • CERN,PIC,RAL,IN2P3: Understood the problem at RAL (see the Old Problems section for more explanation).
  • GRIDKA: LHCb simply ran out of the share allocated to them at GridKA.
  • CNAF: No transfers are going to CNAF. Cooling system problem fixed. CNAF could be resurrected.
             
            • 06-09-06
  • CERN,PIC,GRIDKA: Transfers are going pretty smoothly. Has GRIDKA recovered?
  • IN2P3: electrical power shutdown.
Changed:
< • RAL: removed the channel, as performance has dropped slightly and there are many backlogged transfer jobs; no sense submitting more and overloading.
< • CNAF: No transfers are going to CNAF.
> • RAL: removed the channel, as performance has dropped slightly and there are many backlogged transfer jobs; no sense submitting more and overloading. A problem with the CERN-RAL channel in the FTS service at CERN?
> • CNAF: No transfers are going to CNAF. Cooling system problem.
             
            • 05-09-06
  • CERN,RAL: were running fine.
  • IN2P3: electrical power shutdown.
Changed:
< • CNAF: no news from CNAF
> • CNAF: no news from CNAF. A cooling system problem forced the shutdown of the most important services.
             
  • GRIDKA: The strange behavior of gridftpd seems to be correlated with overload of the server itself. Doris is still investigating that with further tests using the CERN BDII; a query of the kind involved is sketched below. No failures so far.
  • PIC: No news from PIC.
  • SARA/NIKHEF: Tests are ongoing.
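For reference, tests "using the CERN BDII" come down to LDAP queries against the information system. A minimal sketch (assumes the standard OpenLDAP ldapsearch client; the endpoint and base DN follow the usual BDII conventions, and the filter/attribute are illustrative GLUE-schema names, not the exact query Doris ran):

```python
# Hypothetical BDII probe: list the access protocols each SE publishes.
import subprocess

BDII = "ldap://lcg-bdii.cern.ch:2170"  # what LCG_GFAL_INFOSYS conventionally points at

result = subprocess.run(
    [
        "ldapsearch", "-x",           # anonymous simple bind
        "-H", BDII,                   # BDII endpoint
        "-b", "o=grid",               # conventional GLUE base DN
        "(objectClass=GlueSEAccessProtocol)",
        "GlueSEAccessProtocolType",   # e.g. gsiftp, dcap, gsidcap, rfio
    ],
    capture_output=True, text=True,
)
print(result.stdout or result.stderr)
```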

Revision 5 (2006-09-06) - unknown

Added:
            • 06-09-06
  • CERN,PIC,GRIDKA: Transfers are going pretty smoothly. Has GRIDKA recovered?
  • IN2P3: electrical power shutdown.
  • RAL: removed the channel, as performance has dropped slightly and there are many backlogged transfer jobs; no sense submitting more and overloading.
  • CNAF: No transfers are going to CNAF.
             
            • 05-09-06
  • CERN,RAL: were running fine.
  • IN2P3: electrical power shutdown.

Revision 4 (2006-09-05) - unknown

            Added:
            • 05-09-06
  • CERN,RAL: were running fine.
  • IN2P3: electrical power shutdown.
  • CNAF: no news from CNAF.
  • GRIDKA: The strange behavior of gridftpd seems to be correlated with overload of the server itself. Doris is still investigating that with further tests using the CERN BDII. No failures so far.
  • PIC: No news from PIC.
  • SARA/NIKHEF: Tests are ongoing.
             
            • 04-09-06
  • CERN,RAL,IN2P3: were running fine. For Lyon there is the issue of the too-short queues made available to LHCb (GGUS #12150).
  • CNAF: it seems that once the Castor DB bug was fixed, the situation returned to normal. Staging quite a lot of files without problems. However, they did not yet manage to run a reconstruction job because of the ownership of some files on disk (not accessible by root from within a grid job). Hopefully CNAF will become green again today.

Revision 3 (2006-09-04) - unknown

            Added:
• 04-09-06
  • CERN,RAL,IN2P3: were running fine. For Lyon there is the issue of the too-short queues made available to LHCb (GGUS #12150).
  • CNAF: it seems that once the Castor DB bug was fixed, the situation returned to normal. Staging quite a lot of files without problems. However, they did not yet manage to run a reconstruction job because of the ownership of some files on disk (not accessible by root from within a grid job). Hopefully CNAF will become green again today.
  • GRIDKA: Doris confirmed that the problem is in the gridftp daemon, which sometimes closes the sockets. She is looking into the problem.
  • PIC: no reconstruction jobs are picked up at PIC. A fairshare issue? Under investigation.
  • SARA/NIKHEF: Tests are ongoing. Opening a file with ROOT using gsidcap is not yet possible, although at least the application does not crash; a sketch of this kind of access follows below. The GFAL plugin needs the site to be published as supporting the gsidcap protocol. On the other hand, site administrators are keen to install new dCache clients on their WNs without waiting for another LCG release. Still to be understood: the backward compatibility of the new dCache libraries (needed by LHCb to run reconstruction at NIKHEF and Lyon-TAPE) with the previous version.
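What "opening a file with ROOT using gsidcap" amounts to, as a minimal PyROOT sketch (assumes a ROOT build with the dCache plugin; the door, port, and pnfs path are placeholders, not a real endpoint):

```python
import ROOT

# Placeholder gsidcap URL; 22128 is the customary gsidcap door port.
url = "gsidcap://dcache-door.example.org:22128/pnfs/example.org/data/lhcb/test.root"

f = ROOT.TFile.Open(url)  # returns a null pointer on failure
if f and not f.IsZombie():
    print("gsidcap access OK:", f.GetSize(), "bytes")
    f.Close()
else:
    print("gsidcap access failed (protocol not published, firewall, or old client libraries?)")
```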
             
            • 01-09-06
  • CERN,RAL,IN2P3,PIC: were running fine.

Revision 2 (2006-09-02) - unknown

Changed:
< • 24-08-06
<   • CERN,PIC: were running fine.
<   • IN2P3,RAL: were down yesterday, so no data has been sent to these centers.
<   • CNAF: Angelo managed to successfully run a reco job at CNAF after increasing the maximum number of simultaneously allowed jobs on LSF. Asked for another disk server too. Only simulation (which doesn't require SE access) is running fine there.
<   • GRIDKA: no news from GridKa about the problem that was originally identified as being related to an overloaded gridftp server.
< • 25-08-06
> • 01-09-06
>   • CERN,RAL,IN2P3,PIC: were running fine.
>   • GRIDKA: some problem using the environment with LCG_GFAL_INFOSYS pointing to the CERN top BDII. However, the main problem (per Doris) seems to be related to the gridftp doors, which close their sockets from time to time. Also under investigation. A few reco jobs were sent over there.
>   • CNAF: it might be an (unknown) bug in the DB that even the CERN people may not be aware of. The DB worked last night. CMS and ATLAS were keeping the LSF batch full (even with this problem). Another problem is in the pure disk pool (no garbage collector and no migration to tape): if the disk gets full, not only write access but also read access becomes impossible. This turns into pending jobs in the queue and a consequent pile-up of forthcoming requests in LSF (blocking the slots).

            • 31-08-06
  • CERN,RAL,IN2P3: were running fine.
  • CNAF: Castor2 problems that seem to be related to the DB (deadlocks).
  • PIC: No pilot agents were picking up any jobs (most likely due to local fairshare). In the afternoon, 43 jobs started to flow smoothly into the batch system. Waiting for results of these new reconstruction jobs.
  • GRIDKA: not able to reproduce the problem with the local environment setting. Trying to reproduce the problem by using the environment used by a Dirac job (BDII set to the CERN top BDII).

            • 30-08-06
             
              • CERN,RAL,IN2P3: were running fine.
Changed:
< • CNAF: Is OK. All DC06 jobs sent over there were running fine. A new disk server has been added. Monitoring LSF and the Castor2 stager. 600 simultaneous jobs allowed on the queue. Need to tune the number of transfers and of reconstruction jobs to optimise the usage of the LHCb resources (not only for CNAF but for all centers). Asked for an increase of the transfer rate to CNAF.
< • PIC: none of the jobs sent over there (both simulation and DC06 reco) succeeded.
< • GRIDKA: no news from GridKa nor from the GGUS team. Complained that cc'd people should also own the ticket. Action on Roberto to follow the GridKa/GGUS business and apply pressure.
< • 28-08-06
<   • CERN,RAL,IN2P3: were running fine all weekend.
<   • CNAF: runs the DC06 reco jobs fine. Angelo: transfers to CNAF too slow.
<   • PIC: seems to be a problem with a corrupted installation; Joel (back from vacation/illness) has to confirm.
<   • GRIDKA: Roberto will escalate to operations and will get in direct touch with the GridKa people.
> • CNAF: still the long-standing problem with the Castor2 stager. Even increasing the maximum number of jobs per disk (300) and adding another disk server didn't help much.
> • PIC: outage of the SE plus a problem with the installation: a file that is usually sourced by the LHCb jobs wasn't in its usual place (corrupted installation).
> • GRIDKA: still under investigation: the "overload of gridftp server" problem.
             
            • 29-08-06
  • CERN,RAL,IN2P3: were running fine all weekend.
  • CNAF: Castor2 is not working properly. The main DB (also serving other VOs) seems overloaded. Giuseppe Lo Presti is looking at the problem.
  • PIC: no news from Joel.
  • GRIDKA: Doris had performed some stress tests to reproduce the problem; however, her certificate expired during the weekend and, furthermore, the SRM crashed. She hopes to reproduce the problem with her script, which accesses files in a way close to LHCb's.
Changed:
< • 30-08-06
> • 28-08-06
>   • CERN,RAL,IN2P3: were running fine all weekend.
>   • CNAF: runs the DC06 reco jobs fine. Angelo: transfers to CNAF too slow.
>   • PIC: seems to be a problem with a corrupted installation; Joel (back from vacation/illness) has to confirm.
>   • GRIDKA: Roberto will escalate to operations and will get in direct touch with the GridKa people.

            • 25-08-06
             
              • CERN,RAL,IN2P3: were running fine.
Changed:
< • CNAF: still the long-standing problem with the Castor2 stager. Even increasing the maximum number of jobs per disk (300) and adding another disk server didn't help much.
< • PIC: outage of the SE plus a problem with the installation: a file that is usually sourced by the LHCb jobs wasn't in its usual place (corrupted installation).
< • GRIDKA: still under investigation: the "overload of gridftp server" problem.
< • 31-08-06
<   • CERN,RAL,IN2P3: were running fine.
<   • CNAF: Castor2 problems that seem to be related to the DB (deadlocks).
<   • PIC: No pilot agents were picking up any jobs (most likely due to local fairshare). In the afternoon, 43 jobs started to flow smoothly into the batch system. Waiting for results of these new reconstruction jobs.
<   • GRIDKA: not able to reproduce the problem with the local environment setting. Trying to reproduce the problem by using the environment used by a Dirac job (BDII set to the CERN top BDII).
< • 01-09-06
<   • CERN,RAL,IN2P3,PIC: were running fine.
<   • GRIDKA: some problem using the environment with LCG_GFAL_INFOSYS pointing to the CERN top BDII. However, the main problem (per Doris) seems to be related to the gridftp doors, which close their sockets from time to time. Also under investigation. A few reco jobs were sent over there.
<   • CNAF: it might be an (unknown) bug in the DB that even the CERN people may not be aware of. The DB worked last night. CMS and ATLAS were keeping the LSF batch full (even with this problem). Another problem is in the pure disk pool (no garbage collector and no migration to tape): if the disk gets full, not only write access but also read access becomes impossible. This turns into pending jobs in the queue and a consequent pile-up of forthcoming requests in LSF (blocking the slots).
> • CNAF: Is OK. All DC06 jobs sent over there were running fine. A new disk server has been added. Monitoring LSF and the Castor2 stager. 600 simultaneous jobs allowed on the queue. Need to tune the number of transfers and of reconstruction jobs to optimise the usage of the LHCb resources (not only for CNAF but for all centers). Asked for an increase of the transfer rate to CNAF.
> • PIC: none of the jobs sent over there (both simulation and DC06 reco) succeeded.
> • GRIDKA: no news from GridKa nor from the GGUS team. Complained that cc'd people should also own the ticket. Action on Roberto to follow the GridKa/GGUS business and apply pressure.
             
Changed:
< -- Main.santinel - 01 Sep 2006
> • 24-08-06
>   • CERN,PIC: were running fine.
>   • IN2P3,RAL: were down yesterday, so no data has been sent to these centers.
>   • CNAF: Angelo managed to successfully run a reco job at CNAF after increasing the maximum number of simultaneously allowed jobs on LSF. Asked for another disk server too. Only simulation (which doesn't require SE access) is running fine there.
>   • GRIDKA: no news from GridKa about the problem that was originally identified as being related to an overloaded gridftp server.

Revision 1 (2006-09-01) - unknown

Added:

            Resource Status (day by day)

• 24-08-06
  • CERN,PIC: were running fine.
  • IN2P3,RAL: were down yesterday, so no data has been sent to these centers.
  • CNAF: Angelo managed to successfully run a reco job at CNAF after increasing the maximum number of simultaneously allowed jobs on LSF. Asked for another disk server too. Only simulation (which doesn't require SE access) is running fine there.
  • GRIDKA: no news from GridKa about the problem that was originally identified as being related to an overloaded gridftp server.
• 25-08-06
  • CERN,RAL,IN2P3: were running fine.
  • CNAF: Is OK. All DC06 jobs sent over there were running fine. A new disk server has been added. Monitoring LSF and the Castor2 stager. 600 simultaneous jobs allowed on the queue. Need to tune the number of transfers and of reconstruction jobs to optimise the usage of the LHCb resources (not only for CNAF but for all centers). Asked for an increase of the transfer rate to CNAF.
  • PIC: none of the jobs sent over there (both simulation and DC06 reco) succeeded.
  • GRIDKA: no news from GridKa nor from the GGUS team. Complained that cc'd people should also own the ticket. Action on Roberto to follow the GridKa/GGUS business and apply pressure.
• 28-08-06
  • CERN,RAL,IN2P3: were running fine all weekend.
  • CNAF: runs the DC06 reco jobs fine. Angelo: transfers to CNAF too slow.
  • PIC: seems to be a problem with a corrupted installation; Joel (back from vacation/illness) has to confirm.
  • GRIDKA: Roberto will escalate to operations and will get in direct touch with the GridKa people.
• 29-08-06
  • CERN,RAL,IN2P3: were running fine all weekend.
  • CNAF: Castor2 is not working properly. The main DB (also serving other VOs) seems overloaded. Giuseppe Lo Presti is looking at the problem.
  • PIC: no news from Joel.
  • GRIDKA: Doris had performed some stress tests to reproduce the problem; however, her certificate expired during the weekend and, furthermore, the SRM crashed. She hopes to reproduce the problem with her script, which accesses files in a way close to LHCb's.
• 30-08-06
  • CERN,RAL,IN2P3: were running fine.
  • CNAF: still the long-standing problem with the Castor2 stager. Even increasing the maximum number of jobs per disk (300) and adding another disk server didn't help much.
  • PIC: outage of the SE plus a problem with the installation: a file that is usually sourced by the LHCb jobs wasn't in its usual place (corrupted installation).
  • GRIDKA: still under investigation: the "overload of gridftp server" problem.
• 31-08-06
  • CERN,RAL,IN2P3: were running fine.
  • CNAF: Castor2 problems that seem to be related to the DB (deadlocks).
  • PIC: No pilot agents were picking up any jobs (most likely due to local fairshare). In the afternoon, 43 jobs started to flow smoothly into the batch system. Waiting for results of these new reconstruction jobs.
  • GRIDKA: not able to reproduce the problem with the local environment setting. Trying to reproduce the problem by using the environment used by a Dirac job (BDII set to the CERN top BDII).
• 01-09-06
  • CERN,RAL,IN2P3,PIC: were running fine.
  • GRIDKA: some problem using the environment with LCG_GFAL_INFOSYS pointing to the CERN top BDII. However, the main problem (per Doris) seems to be related to the gridftp doors, which close their sockets from time to time. Also under investigation. A few reco jobs were sent over there.
  • CNAF: it might be an (unknown) bug in the DB that even the CERN people may not be aware of. The DB worked last night. CMS and ATLAS were keeping the LSF batch full (even with this problem). Another problem is in the pure disk pool (no garbage collector and no migration to tape): if the disk gets full, not only write access but also read access becomes impossible. This turns into pending jobs in the queue and a consequent pile-up of forthcoming requests in LSF (blocking the slots).

            -- Main.santinel - 01 Sep 2006

             