Event Service on HPC

Defining a PanDA queue for the Event Service is similar to defining an MCORE queue, so only the differences are described here.

  • catchall: localEsMerge,jobseed=es,HPC_HPC,mode=normal,queue=debug,repo=m2015
    • localEsMerge defines the location of the merge job. If you don't want to merge at the same queue, remove it.
      • For preemptable queues, please remove it. The merge job should not be preempted.
      • For MCORE, please remove it. The merge job is a single-core job.
    • jobseed=es tells PanDA to schedule only ES jobs to this queue.
    • HPC_HPC is needed for the pilot to start the Yoda process.
    • mode=normal sets the Yoda mode. Yoda can run in 'normal' mode or 'backfill' mode.
    • queue=debug is the HPC job queue.
    • partition=edison is the HPC partition.
    • repo=m2015 is the HPC repo.
    • The other parameters have default values. To override one, set it in catchall.
    • Examples:
      catchall: "HPC_HPC,log_to_objectstore,mode=normal,queue=regular,backfill_queue=regular,max_events=200000,initialtime_m=3,time_per_event_m=13,
                 repo=m2015,nodes=50,min_nodes=50,max_nodes=1001,partition=edison,min_walltime_m=119,walltime_m=120,max_walltime_m=120,
                 cpu_per_node=24,mppnppn=1,ATHENA_PROC_NUMBER=24,stageout_threads=20,copy_input_files=false,parallel_jobs=10000"
      catchall: "HPC_HPC,mode=normal,queue=debug,backfill_queue=regular,max_events=2000,initialtime_m=8,time_per_event_m=13,repo=m2015,nodes=2,
                 min_nodes=2,max_nodes=3,partition=edison,min_walltime_m=28,walltime_m=30,
                 max_walltime_m=30,cpu_per_node=24,mppnppn=1,ATHENA_PROC_NUMBER=23"
      catchall: "HPC_HPC,mode=backfill,queue=debug,backfill_queue=regular,max_events=2000,initialtime_m=8,time_per_event_m=10,nodes=2,min_nodes=2,
                 max_nodes=3,partition=edison,min_walltime_m=28,walltime_m=30,max_walltime_m=30,
                 cpu_per_node=24,mppnppn=1,ATHENA_PROC_NUMBER=23"


  • corecount:
    • The pilot uses it to set ATHENA_PROC_NUMBER, which defines how many processes AthenaMP starts. It can be 1.

  • objectstore:
    • The pilot uses it to send finished event objects to the objectstore. The 'objectstore' item in schedconfig is no longer used.
    • In the AGIS "PandaQueueObject Info" page, click "Find and associate another ObjectStorage/Bucket. Modify attached objectstores". Then enter "bnl_os" in the "Search OS buckets:" textbox and click "Search objectstores". Select all three buckets and save.
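
The catchall values shown earlier mix bare flags (such as HPC_HPC) with key=value settings. When checking a long value before saving it, it can help to split it into these two groups. The following is a minimal sketch in Python; the function name parse_catchall is hypothetical and this is not the pilot's actual parsing code. It assumes that tokens are separated by commas and that whitespace around a token is not significant.

```python
def parse_catchall(catchall):
    """Split a catchall string into bare flags and key=value settings.

    Illustrative helper only (hypothetical name); it is not taken from
    the pilot or from PanDA.
    """
    flags = []
    settings = {}
    for token in catchall.split(","):
        token = token.strip()  # assumption: surrounding whitespace is ignored
        if not token:
            continue
        if "=" in token:
            key, value = token.split("=", 1)
            settings[key.strip()] = value.strip()
        else:
            flags.append(token)
    return flags, settings


# The second example value from this page, written as one string.
example = (
    "HPC_HPC,mode=normal,queue=debug,backfill_queue=regular,max_events=2000,"
    "initialtime_m=8,time_per_event_m=13,repo=m2015,nodes=2,min_nodes=2,"
    "max_nodes=3,partition=edison,min_walltime_m=28,walltime_m=30,"
    "max_walltime_m=30,cpu_per_node=24,mppnppn=1,ATHENA_PROC_NUMBER=23"
)

flags, settings = parse_catchall(example)
print(flags)
print(settings["mode"], settings["queue"], settings["partition"])
```

All values are kept as strings, so numeric settings such as nodes or walltime_m need an explicit int() conversion before they are compared.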

More information can be found in HpcYoda.

-- WenGuan - 2019-04-01

Topic revision: r1 - 2019-04-01 - WenGuan
 