Ganga memory and disk footprint

Conclusions

  1. Ganga uses ~3 times more memory than the size of the job's payload.
    While adding jobs in the same Ganga session, memory grows by around:
    • 8KB per default job (Job())
    • 29KB per job with its name set to a 10KB string
    • 230KB per job with its name set to a 100KB string
  2. We do some extra caching when reading the repository from disk: after Ganga is restarted, memory consumption per job grows by ~30%.
  3. We keep some extra references to job objects, which prevent garbage collection:
    when jobs are removed, memory consumption does not decrease (as it should with Python 2.5, which frees empty memory arenas).
  4. We have a large memory footprint even with 0 jobs (comparing writable-private memory):
    • compared to plain Python, Ganga is around 48 times larger
    • compared to IPython, Ganga is around 15 times larger

Tests and data

Ganga version: 4.3.3

Tests run with Python 2.5 and IPython 0.7.2 on SUSE Linux 10.2.

Local, non-AFS repository. 2GB RAM.

Memory reporting tool: pmap PID | tail -1 (the last line of the pmap output is the memory summary)
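To use the recorded summary lines in calculations, they can be turned into numbers with a small parser. This is a sketch: `parse_pmap_summary` is a hypothetical helper, and the regular expression assumes the summary has exactly the form recorded on this page; other pmap versions print their summary differently.

```python
import re

# Assumed format, as recorded on this page, e.g.
# "72488K writable-private, 5272K readonly-private, and 240K shared"
PMAP_SUMMARY = re.compile(
    r'(\d+)K writable-private, (\d+)K readonly-private, and (\d+)K shared')


def parse_pmap_summary(line):
    """Return (writable_private_kb, readonly_private_kb, shared_kb) as ints."""
    m = PMAP_SUMMARY.search(line)
    if m is None:
        raise ValueError('unrecognised pmap summary: %r' % line)
    return tuple(int(x) for x in m.groups())


print(parse_pmap_summary('1480K writable-private, 3476K readonly-private, and 28K shared'))
# (1480, 3476, 28)
```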

Pure python memory footprint

1480K writable-private, 3476K readonly-private, and 28K shared

IPython memory footprint

4772K writable-private, 5220K readonly-private, and 28K shared

Ganga footprint

Starting from an empty Ganga repository, we performed a sequence of operations: + adding jobs, - removing jobs, R restarting the Ganga session, gc calling gc.collect(). The first column gives the number of jobs in the repository at which memory and disk usage are reported; the second gives the operation performed to reach that number.

default constructed jobs

| jobs in registry | op | RAM: writable-private / readonly-private / shared | disk: repository | disk: workspace |
| 0 | | 72488K / 5272K / 240K | 52K | 0K |
| 100 | +100 | 73752K / 5272K / 240K | | |
| 200 | +100 | 74268K / 5272K / 240K | | |
| 1000 | +800 | 78888K / 5272K / 240K | | |
| 2000 | +1000 | 84576K / 5272K / 240K | 5.1M | 37M (empty dirs) |
| 2000 | R | 91108K / 5272K / 240K | | |
| 1000 | -1000 | unchanged | 2.6M | 22M |
| 1000 | gc | unchanged | | |
| 1000 | R | 81408K / 5272K / 240K | | |
| 100 | -900 | unchanged | 364K | 2.4M |
| 100 | R | 73384K / 5272K / 240K | | |
| 0 | -100R | 72488K / 5272K / 240K | 108K | 12K |
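The ratios in conclusion 4 can be reproduced from the writable-private figures: the plain Python and IPython footprints above and the 0-job row of this table. A quick check, assuming writable-private memory is the quantity being compared and using integer division as in the calculations section:

```python
# Writable-private memory (KB) right after start-up, as reported by pmap.
python_kb = 1480    # plain python 2.5
ipython_kb = 4772   # ipython 0.7.2
ganga_kb = 72488    # ganga 4.3.3 with an empty repository

print(ganga_kb // python_kb)   # 48
print(ganga_kb // ipython_kb)  # 15
```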

jobs with 10KB payload

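s = 'xyz'*3333   # ~10KB string (9999 characters) used as the job name
# run inside a Ganga session (Job is provided by Ganga); N = jobs added per step (100, 800, 1000)
for i in range(N):
    Job(name=s)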

| jobs in registry | op | RAM: writable-private / readonly-private / shared | disk: repository | disk: workspace |
| 0 | s = 'xyz'*3333 | 72576K / 6164K / 240K | 104K | 0K |
| 100 | +100 | 75928K / 5272K / 240K | 3.2M | 1.2M |
| 200 | +100 | 78616K / 5272K / 240K | 6.3M | 2.4M |
| 1000 | +800 | 100036K / 5272K / 240K | 32M | 12M |
| 2000 | +1000 | 126052K / 5272K / 240K | 63M | 24M |
| 2000 | R | 149576K / 5272K / 240K | | |
| 100 | -1900 | unchanged | 3.2M | 1.3M |
| 100 | R | 76320K / 5272K / 240K | | |
| 0 | -100R | 72480K / 5272K / 240K | 56K | 8K |

jobs with 100KB payload

| jobs in registry | op | RAM: writable-private / readonly-private / shared | disk: repository | disk: workspace |
| 0 | s = 'xyz'*33333 | 72564K / 5272K / 240K | 104K | 0K |
| 100 | +100 | 96224K / 5272K / 240K | 29M | 1.2M |
| 200 | +100 | 118604K / 5272K / 240K | 58M | 2.4M |
| 1000 | +800 | 298560K / 5272K / 240K | 289M | 13M |
| 2000 | +1000 | 521608K / 5272K / 240K | 578M | 25M |
| 2000 | R | 677012K / 5272K / 240K | | |
| 100 | -1900 | unchanged | 166M | 7.1M |
| 100 | R | 102772K / 5272K / 240K | 29M | 1.6M |
| 0 | -100R | 72476K / 5272K / 240K | 108K | 12K |
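The restart effect described in conclusion 2 can be estimated from the 2000-job rows of the three tables by comparing the memory per job just before and just after the R step. A sketch; `restart_growth` is a hypothetical helper, and the baseline is the 0-job row of each table. With these rows the relative growth comes out larger for smaller payloads.

```python
# Writable-private memory (KB) from the tables above:
# (empty repository, 2000 jobs added in one session, 2000 jobs after restart)
runs = {
    'default job': (72488, 84576, 91108),
    '10KB name': (72576, 126052, 149576),
    '100KB name': (72564, 521608, 677012),
}
N_JOBS = 2000


def restart_growth(base, before, after, n=N_JOBS):
    """Return (KB/job before restart, KB/job after restart, growth in %)."""
    before_kb = (before - base) / float(n)
    after_kb = (after - base) / float(n)
    growth = round(100 * (after_kb / before_kb - 1))
    return round(before_kb, 1), round(after_kb, 1), growth


for name, (base, before, after) in sorted(runs.items()):
    print(name, restart_growth(base, before, after))
```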

calculations


# number of jobs in repository (without restarting ganga session)
N = [0, 100, 200, 1000, 2000]

# writable-private memory (KB) for default constructed jobs
a = [72488, 73752, 74268, 78888, 84576]

# writable-private memory (KB) for jobs with 10KB payload in j.name
b = [72576, 75928, 78616, 100036, 126052]

# writable-private memory (KB) for jobs with 100KB payload in j.name
c = [72564, 96224, 118604, 298560, 521608]

# normalized to 0 memory occupation for 0 jobs
A = [x - a[0] for x in a]  # [0, 1264, 1780, 6400, 12088]
B = [x - b[0] for x in b]  # [0, 3352, 6040, 27460, 53476]
C = [x - c[0] for x in c]  # [0, 23660, 46040, 225996, 449044]

# memory occupation per job in KB (// gives the same truncated results in Python 2 and 3)
KB_per_job_A = [A[i] // N[i] for i in range(1, len(N))]  # [12, 8, 6, 6]
KB_per_job_B = [B[i] // N[i] for i in range(1, len(N))]  # [33, 30, 27, 26]
KB_per_job_C = [C[i] // N[i] for i in range(1, len(N))]  # [236, 230, 225, 224]
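Conclusion 1 relates the memory per job to the payload size. Dividing the per-job growth by the size of the string stored in j.name gives that ratio directly. A sketch, assuming pmap's K means 1024 bytes and charging the whole per-job growth (including the empty-job overhead) to the payload; `ratio_to_payload` is a hypothetical helper.

```python
# Data from the calculations above (writable-private memory in KB).
N = [0, 100, 200, 1000, 2000]
b = [72576, 75928, 78616, 100036, 126052]   # 10KB payload
c = [72564, 96224, 118604, 298560, 521608]  # 100KB payload

payload_kb_b = len('xyz' * 3333) / 1024.0    # 9999 bytes
payload_kb_c = len('xyz' * 33333) / 1024.0   # 99999 bytes


def ratio_to_payload(mem, payload_kb):
    """Per-job memory growth divided by payload size, for each N > 0."""
    return [round((mem[i] - mem[0]) / float(N[i]) / payload_kb, 1)
            for i in range(1, len(N))]


print(ratio_to_payload(b, payload_kb_b))  # [3.4, 3.1, 2.8, 2.7]
print(ratio_to_payload(c, payload_kb_c))  # [2.4, 2.4, 2.3, 2.3]
```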

extract data about the running ganga process ($1 is its process id)

pmap $1 | tail -1; du -sh ~/gangadir-memory-test/repository; du -sh ~/gangadir-memory-test/workspace
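The disk columns come from du -sh, which prints human-readable sizes in 1024-based units. To compare them per job they have to be converted back to kilobytes; a sketch with a hypothetical `du_to_kb` helper. Since du -h rounds its output and the figures include the empty-repository baseline, the per-job values are approximate.

```python
# du -h suffixes, 1024-based (sizes are converted to KB)
UNITS = {'K': 1, 'M': 1024, 'G': 1024 ** 2}


def du_to_kb(size):
    """Convert a du -h size string such as '5.1M' to kilobytes."""
    return float(size[:-1]) * UNITS[size[-1]]


# Repository size with 2000 jobs, taken from the tables above.
repo_2000 = {'default job': '5.1M', '10KB name': '63M', '100KB name': '578M'}

for name, size in sorted(repo_2000.items()):
    print(name, round(du_to_kb(size) / 2000, 1), 'KB per job on disk')
```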

-- JakubMoscicki - 08 Jul 2007

Topic revision: r3 - 2008-09-21 - JakubMoscicki