CMS Tests with the gLite Workload Management System

11 October, 2006

  • Application: CMSSW_0_6_1
  • WMS host: rb109.cern.ch
  • RAM: 4 GB
  • LB server: lxb7026.cern.ch
  • Number of submitted jobs: 25000
  • Number of jobs/collection: 100 (250 collections)
  • Number of CEs: 24
  • Submission time interval: 24 h (one collection submitted about every 6 minutes)
  • Maximum number of planners/DAG: 2
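
As a quick consistency check on the parameters above, the following sketch recomputes the number of collections and the mean spacing between submissions. The variable names are illustrative only and do not correspond to any gLite tool or configuration key.

```python
# Figures taken from the test parameters listed above.
total_jobs = 25000
jobs_per_collection = 100
submission_window_h = 24

collections = total_jobs // jobs_per_collection
# Mean spacing needed to fit all collections into the submission window.
interval_min = submission_window_h * 60 / collections

print(f"collections: {collections}")
print(f"mean interval: {interval_min:.2f} min")
```

The result (250 collections, 5.76 minutes apart on average) agrees with the quoted figure of roughly one collection every 6 minutes.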

The number of planners during submission reached 186, which corresponds to about 1 GB of memory (a planner takes about 5.6 MB). At the same time, the total amount of memory in use (physical + swap) was about 4.8 GB. What is using the rest of this memory remains to be investigated.

Swap usage of about 20% makes the RB very slow for users who submit jobs; this condition should be avoided at all costs. It is critical to understand why memory is so heavily used even though the number of planners is not high. Other processes that take a lot of memory are the WMProxy server (1.5 GB), the WM (0.5 GB) and Condor (0.4 GB).
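
The memory figures quoted so far can be added up to see how much of the 4.8 GB is actually explained. This is only a rough sketch: it assumes that all figures were sampled at the same moment, that they do not overlap (for example through shared pages), and that 1 GB = 1024 MB. The names used are illustrative.

```python
# Figures quoted in the text; the dictionary keys are illustrative labels.
planners = 186
planner_mb = 5.6
total_used_gb = 4.8  # physical + swap

known_gb = {
    "planners": planners * planner_mb / 1024,  # assumes 1 GB = 1024 MB
    "WMProxy server": 1.5,
    "WM": 0.5,
    "Condor": 0.4,
}

accounted_gb = sum(known_gb.values())
unaccounted_gb = total_used_gb - accounted_gb

for name, gb in known_gb.items():
    print(f"{name:15s} {gb:4.1f} GB")
print(f"{'accounted':15s} {accounted_gb:4.1f} GB")
print(f"{'unaccounted':15s} {unaccounted_gb:4.1f} GB")
```

Under these assumptions about 3.4 GB is accounted for, which leaves roughly 1.4 GB of the observed usage unexplained.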

It is also important to have, as soon as possible, the fix that limits how long the WM keeps trying to match jobs in the task queue. The current limit of 24 h is too long: a collection whose jobs cannot be matched is kept alive for a long time, even when it is clear that the jobs can never be matched.
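
The effect of a shorter matching limit can be illustrated with a minimal sketch. This is not the WM code: the function `match_expired` and the 2 h value are hypothetical, chosen only to show how a deadline check would let an unmatched job be dropped earlier than under the current 24 h limit.

```python
from datetime import datetime, timedelta


def match_expired(first_attempt: datetime, now: datetime, limit: timedelta) -> bool:
    """Return True once a job has stayed unmatched for at least `limit`."""
    return now - first_attempt >= limit


# Hypothetical job that has been waiting in the task queue for 3 hours.
first_attempt = datetime(2006, 10, 11, 12, 0)
now = first_attempt + timedelta(hours=3)

# Current 24 h limit: the job is still retried.
print(match_expired(first_attempt, now, timedelta(hours=24)))
# Hypothetical 2 h limit: the job would already have been given up.
print(match_expired(first_attempt, now, timedelta(hours=2)))
```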

-- Main.asciaba - 11 Oct 2006

Topic revision: r1 - 2006-10-11 - AndreaSciabaSecondary1
 