Yoda is the Event Service implementation on HPC. The overall architecture is shown below.
Yoda is composed of several parts:
- pilotRunJobHpcEvent - the frontend part, which downloads jobs and gets event ranges from Panda, stages out outputs to the objectstore and updates event status in Panda.
- YodaDroid - the HPC MPI job that runs events.
- EventServerJobManager - the main part in Droid, which manages AthenaMP, TokenExtractor and Yampl messaging. It injects events into AthenaMP for processing and retrieves the outputs.
pilotRunJobHpcEvent is the part of the pilot that starts HPC ES. After the pilot gets a job from Panda, it sets up the environment and prepares the job files and commands. It then uses HPCManager to call getHPCResource (free cores in backfill mode, the default resource defined in schedconfig in normal mode), submit HPC jobs and poll them. HPCManager is the interface between the pilot and the HPC. It is currently implemented for PBS/Torque clusters. It can be extended.
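The resource selection, submission and polling sequence described above can be sketched roughly as follows. This is only an illustration: `FakeScheduler`, `SketchHPCManager` and all of their methods are hypothetical names invented here, and the fake scheduler stands in for a real PBS/Torque cluster. It is not the pilot's actual HPCManager code.

```python
import time


class FakeScheduler:
    """Hypothetical stand-in for a PBS/Torque-like batch system."""

    def __init__(self, free_cores, polls_until_done=2):
        self.free_cores = free_cores
        self._polls_until_done = polls_until_done
        self._remaining_polls = {}
        self._next_id = 0

    def query_free_cores(self):
        # Cores currently idle, i.e. usable for backfill.
        return self.free_cores

    def submit(self, cores, command):
        self._next_id += 1
        job_id = "job%d" % self._next_id
        self._remaining_polls[job_id] = self._polls_until_done
        return job_id

    def status(self, job_id):
        # Report "Running" until the job has been polled enough times.
        self._remaining_polls[job_id] -= 1
        return "Running" if self._remaining_polls[job_id] > 0 else "Complete"


class SketchHPCManager:
    """Hypothetical interface between the pilot and the batch system."""

    def __init__(self, scheduler, default_cores, backfill=False):
        self.scheduler = scheduler
        self.default_cores = default_cores  # assumed to come from schedconfig
        self.backfill = backfill

    def get_hpc_resource(self):
        # Backfill mode: take whatever is free. Normal mode: use the default.
        if self.backfill:
            return self.scheduler.query_free_cores()
        return self.default_cores

    def submit_and_poll(self, command, poll_interval=0.0):
        cores = self.get_hpc_resource()
        job_id = self.scheduler.submit(cores, command)
        states = []
        while True:
            state = self.scheduler.status(job_id)
            states.append(state)
            if state == "Complete":
                break
            time.sleep(poll_interval)
        return cores, job_id, states
```

Keeping the scheduler behind a small interface like this is one way the PBS/Torque-specific part could be swapped for another batch system.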
Yoda-Droid
Yoda-Droid is the HPC MPI job.
- Yoda is the part running on MPI rank 0. It manages the job and events table centrally. It uses the MPI interface to distribute jobs and events to Droid. Outputs received from Droid through the MPI interface are recorded in the events table and dumped to the pilot periodically.
- Droid is the part running on MPI ranks greater than 0. It gets a job from Yoda, then starts EventServerJobManager to start the job. When EventServerJobManager is ready (AthenaMP is set up), Droid gets event ranges from Yoda and injects them into ESJobManager. Droid then polls ESJobManager for outputs and sends them to Yoda.
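The division of work between Yoda and Droid can be illustrated with a small single-process sketch. It replaces the MPI interface with direct method calls so that it runs anywhere, and every name in it (`SketchYoda`, `droid_loop`, the status strings) is a hypothetical choice made for this example rather than something taken from the real code.

```python
class SketchYoda:
    """Rank-0 role: owns the events table and hands out event ranges."""

    def __init__(self, event_ranges):
        # Events table: event range id -> status.
        self.events = {r: "available" for r in event_ranges}
        self.outputs = {}

    def get_event_ranges(self, droid_rank, n):
        # Pick up to n unassigned ranges and mark them as assigned.
        picked = [r for r, s in self.events.items() if s == "available"][:n]
        for r in picked:
            self.events[r] = "assigned:%d" % droid_rank
        return picked

    def update_output(self, event_range, output):
        # Record an output reported back by a Droid.
        self.events[event_range] = "finished"
        self.outputs[event_range] = output

    def dump(self):
        # Snapshot of the table, as would be dumped to the pilot periodically.
        return dict(self.events)


def droid_loop(rank, yoda, process, batch=2):
    """Rank>0 role: fetch ranges, process them, send the outputs back."""
    while True:
        ranges = yoda.get_event_ranges(rank, batch)
        if not ranges:
            break
        for r in ranges:
            yoda.update_output(r, process(r))


yoda = SketchYoda(["range1", "range2", "range3"])
droid_loop(1, yoda, lambda r: r + ".out")
print(yoda.dump())
```

In a real MPI job the two method calls on `yoda` would become messages sent between rank 0 and the other ranks.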
EventServerJobManager (ESJobManager) is the main part in Droid that manages AthenaMP, TokenExtractor and the Yampl messaging thread. AthenaMP and TokenExtractor are components in Athena. Users can use the yampl messaging channel to contact AthenaMP, so ESJobManager is the part that handles messages in the Yampl messaging channel.
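As a rough illustration of the message handling role described above, the sketch below uses two `queue.Queue` objects as a stand-in for the Yampl channel. The class name, its methods and the message strings (`READY`, `OUTPUT ...`, `NO_MORE_EVENTS`) are all made up for this example. They are not the real Yampl API or the actual AthenaMP message format.

```python
import queue


class SketchESJobManager:
    """Hypothetical message handler sitting between Droid and a worker."""

    def __init__(self, to_worker, from_worker):
        self.to_worker = to_worker      # stand-in for sending on the channel
        self.from_worker = from_worker  # stand-in for receiving on the channel
        self.pending = []               # event ranges injected by Droid
        self.outputs = []               # (event_range, output_file) pairs

    def insert_event_ranges(self, ranges):
        self.pending.extend(ranges)

    def handle_one_message(self):
        """Process one incoming message; return False if none was waiting."""
        try:
            msg = self.from_worker.get_nowait()
        except queue.Empty:
            return False
        if msg == "READY":
            # The worker asks for work: send the next range or an end marker.
            if self.pending:
                self.to_worker.put(self.pending.pop(0))
            else:
                self.to_worker.put("NO_MORE_EVENTS")
        elif msg.startswith("OUTPUT "):
            # Assumed format for this sketch: "OUTPUT <range> <file>".
            _, event_range, output_file = msg.split(" ", 2)
            self.outputs.append((event_range, output_file))
        return True

    def get_outputs(self):
        # Droid polls this to collect finished outputs.
        outs, self.outputs = self.outputs, []
        return outs
```

Droid would call `insert_event_ranges` after fetching ranges from Yoda and call `get_outputs` while polling.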
How to Run Yoda ES pilot
If you are interested in running Yoda ES jobs, you can follow these steps:
--
WenGuan - 2015-02-24