Follow up on sites which apparently do not correctly report to APEL

Using the accounting validation view in APEL, we need to follow up on sites that have problems publishing to APEL, or that do not correctly publish to BDII the attributes required to convert time to work. For this purpose APEL uses GlueScalingReferenceSI00 or GlueHostBenchmarkSI00, while Dashboard uses REBUS. From REBUS, Dashboard takes the overall HEPSPEC and divides it by the number of logical CPUs. REBUS in turn takes GlueSubClusterLogicalCPU, GlueProcessorOtherDescription (for HEPSPEC) and GlueSubClusterPhysicalCPU from BDII.
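The Dashboard-side conversion described above can be illustrated with a minimal sketch. The function names and the sample numbers are hypothetical and only show the arithmetic (overall HEPSPEC divided by the number of logical CPUs, then applied to wall clock time); this is not the actual Dashboard or REBUS code.

```python
def hs06_per_core(total_hs06: float, logical_cpus: int) -> float:
    """Per-core HEPSPEC: overall site HEPSPEC divided by logical CPUs."""
    if logical_cpus <= 0:
        raise ValueError("logical_cpus must be positive")
    return total_hs06 / logical_cpus


def wallclock_work(wallclock_hours: float, per_core_hs06: float) -> float:
    """Convert raw wall clock time (hours) to work (HS06-hours)."""
    return wallclock_hours * per_core_hs06


# Hypothetical site: 10000 HS06 overall, 1000 logical CPUs.
per_core = hs06_per_core(10000.0, 1000)
work = wallclock_work(500.0, per_core)
```

If a site publishes a wrong logical CPU count or HEPSPEC value, the per-core factor is wrong, so the work metric is off even when the raw wall clock time is correct.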

There is a slight difference in how the SSB application can be used for ATLAS, ALICE and CMS.

For ATLAS, one might first check the metric that compares EGI clock time work with Dashboard clock time work. If the two disagree, then look at the ratio of EGI raw wall clock time to Dashboard raw wall clock time.

For CMS, many sites (in the US and elsewhere) do not correctly publish GlueSubClusterLogicalCPU, GlueProcessorOtherDescription (for HEPSPEC) and GlueSubClusterPhysicalCPU to BDII. That is why the clock time work metric is not very reliable in Dashboard, and it is better to look at the ratio of EGI raw wall clock time to Dashboard raw wall clock time. For ALICE, only the raw wall clock time metric is available.
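The checks described above can be sketched as a small triage helper. Everything here is an assumption made for illustration: the function names, the 20% tolerance and the three labels are hypothetical and are not part of SSB, APEL or Dashboard. The work values are optional, to cover the ALICE case (only raw wall clock time available) and the CMS case (work metric not reliable).

```python
from typing import Optional


def ratio(egi: float, dashboard: float) -> Optional[float]:
    """EGI value divided by Dashboard value, or None if Dashboard is zero."""
    if dashboard == 0:
        return None
    return egi / dashboard


def agrees(r: Optional[float], tolerance: float) -> bool:
    """True if the ratio is within the (assumed) tolerance of 1.0."""
    return r is not None and abs(r - 1.0) <= tolerance


def triage(egi_wall: float, dash_wall: float,
           egi_work: Optional[float] = None,
           dash_work: Optional[float] = None,
           tolerance: float = 0.2) -> str:
    """Classify a site from its EGI and Dashboard numbers."""
    wall_ok = agrees(ratio(egi_wall, dash_wall), tolerance)
    if egi_work is None or dash_work is None:
        # Only raw wall clock time is available or trusted.
        return "consistent" if wall_ok else "time mismatch"
    if agrees(ratio(egi_work, dash_work), tolerance):
        return "consistent"
    # Work disagrees: fall back to the raw wall clock time ratio.
    return "work-only mismatch" if wall_ok else "time mismatch"


print(triage(100.0, 100.0, egi_work=150.0, dash_work=100.0))
```

A "work-only mismatch" would point at the time-to-work conversion (for example the benchmark information in BDII), while a "time mismatch" would point at the publishing of the accounting records themselves.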

Table with problematic sites

| Site name | Inconsistency | Impacted VOs | Time range | Problem description | Ticket | Status |
| UKI-SOUTHGRID-RALPP | All experiments see lower consumption in EGI than in their systems (~50%) | ATLAS, CMS, LHCb | September | - | https://its.cern.ch/jira/browse/ADCINFR-28 | To be done |
| UKI-NORTHGRID-MAN-HEP | APEL 45% bigger than ATLAS | ATLAS | June | Batch system agrees with APEL, something is missing in Dashboard. Could be event service jobs not correctly accounted there | https://its.cern.ch/jira/browse/ADCINFR-28 | Resolved for APEL accounting |
| UKI-LT2-RHUL | Good consistency for time, but ~50% discrepancy for work | ATLAS | May-July | Might be wrong info in BDII for time-work transformation | https://its.cern.ch/jira/browse/ADCINFR-28 | Resolved, turned to green in August, need to republish previous months |
| UNIBE-LHEP | Lower or completely missing consumption | ATLAS | Since the beginning of the year | Had the wrong DN in GOCDB and APEL undeclared services, wrong info in BDII | https://its.cern.ch/jira/browse/ADCINFR-28 | BDII info is still missing. GOCDB is fixed. Still would need to republish |
| CSCS-LCG2 | EGI is twice lower than Dashboard both for work and time | ATLAS, CMS, LHCb | Starting from April | Wrong DN in GOCDB | https://its.cern.ch/jira/browse/ADCINFR-28 | Info in GOCDB and BDII fixed. Need to republish |
| Cyfronet | EGI is twice lower than Dashboard both for work and time | ATLAS, CMS | From the beginning of the year | - | https://its.cern.ch/jira/browse/ADCINFR-28 | Under investigation. Alessandra follows up with site admins |
| MPPMU | EGI is considerably (8 times) lower than Dashboard both for work and time | ATLAS | From the beginning of the year | Possible missing ARC CE in GOCDB; also HPC resources may not be accounted for correctly | https://its.cern.ch/jira/browse/ADCINFR-28 | Under investigation |
| BEIJING-LCG2 | EGI is considerably lower than Dashboard both for work and time, ATLAS only while CMS is fine | ATLAS | Starting from May | - | https://its.cern.ch/jira/browse/ADCINFR-28 | To be done |
| TR-10-ULAKBIM | Completely wrong numbers both for time and work metrics. For raw wallclock time, EGI is twice higher than Dashboard; for work it is ~90 times higher | ATLAS | January was fine, then degraded | - | https://its.cern.ch/jira/browse/ADCINFR-28 | To be done |
| RC-KI | EGI shows only 7% of usage both for time and work metrics | Only ATLAS, ALICE is fine | August | - | - | To be done |
| Brunel | EGI shows twice lower consumption than Dashboard for ATLAS, and 5 times lower for CMS, both for work and time | ATLAS, CMS | August-September | Apparently the problem is in the ARC-Condor interface, which also has an impact on CMS job submission | https://ggus.eu/index.php?mode=ticket_info&ticket_id=123947 | In progress |
| FZK-LCG2 | 5 times higher work in EGI than in Dashboard, raw wallclock is fine | All VOs | September | Wrong info published in BDII | - | BDII info is fixed, September and partially October data to be republished |
| Begrid-ULB-VUB | EGI shows twice higher number both for time and work metrics | CMS | July-September | "we looked at or site_bdii and indeed there were information that were not reported anymore. This has been fixed" | https://ggus.eu/index.php?mode=ticket_info&ticket_id=125016 | Reported to CMS site support, solved? |
| INDIACMS-TIFR | EGI shows 3 times lower numbers than Dashboard for time, while for work, on the contrary, EGI provides 60% higher numbers than Dashboard | CMS | September, first 5 months of the year look fine | "acknowledge that the accounting data was not getting uploaded to EGI. Solved" | https://ggus.eu/index.php?mode=ticket_info&ticket_id=125017 | Reported to CMS site support, solved? |
| NCP-LCG2 | EGI shows 2.5 times higher values both for time and work than Dashboard | CMS, while ALICE is fine | First 4 months of the year are fine, then the situation degraded | TBD | https://ggus.eu/index.php?mode=ticket_info&ticket_id=125018 | Reported to CMS site support, ongoing |
| T2 Estonia | EGI shows 1.5-2 times higher consumption both for time and work than Dashboard | CMS | September | "one error has been fixed and sending should be normal now" | https://ggus.eu/index.php?mode=ticket_info&ticket_id=125019 | Reported to CMS site support, solved? |
| Ru-PNPI | EGI shows 30 times higher consumption than Dashboard (Are you sure this is the correct way round? The CMS cpu numbers in APEL look very small.) | CMS only; ALICE, ATLAS and LHCb are fine | Starting from the beginning of the year | Not correct reporting of jobs running at this site to Dashboard? | https://ggus.eu/index.php?mode=ticket_info&ticket_id=125020 | Reported to CMS site support, ongoing |
| Ru-SPbSU | Time metric is fine, while work is shown 10 times higher in EGI than in Dirac | LHCb | September (no data before September) | Apparently wrong info in BDII | - | To be done |

ATLAS follow up

* Tracked in the following ticket

CMS follow up

ALICE follow up

LHCb follow up

-- JuliaAndreeva - 2016-10-06

Topic revision: r8 - 2016-11-16 - JohnGordon
 