Executing PowerDEVS in remote nodes (Distributed execution)

General

Executing simulations on several nodes usually involves a parameter sweep. To set up parameter sweeping, refer to ParameterSweeping in PowerDEVS.
Usually you will want to use several remote machines. One option is to use CERN's cloud services to create Virtual Machines (VMs) using OpenStack. See the CERN OpenStack Private Cloud Guide.

PowerDEVS requires the Scilab GUI to be running in order to read parameters and store results. So, for remote connections, X11 forwarding should be used to get the GUI.
Scilab opens a specific TCP port to communicate with the PowerDEVS process. This port is assigned based on the user ID. Thus, only a single Scilab instance can run per user.

The VNC option below allows running as different local users (so multiple simulations can run on the same host).

The second option (console + ssh + X11 forwarding + screen) uses only the NICE CERN account.

Using VNC

To set up the VNC server and client, follow the instructions here: https://wiki.centos.org/HowTos/VNC-Server

Following those instructions, a local user is created (e.g. remote_simulations). You can now connect to the server using that local user and the VNC password.

  1. It is recommended that the local user has sudo privileges.
  2. Make sure the user can access /afs to get the source and then copy the results.
  3. Set up devtoolset-2 for that user (add 'source /opt/rh/devtoolset-2/enable' to the ~/.bashrc file).
IMPORTANT: never close the user session. If you do, the VNC server has to be restarted for all users in order to bring the VNC session back.

Once the VNC server and clients are set up:

  1. (If you are outside CERN) Create an SSH tunnel: ssh -L 5901:127.0.0.1:5901 -N -f -l <user> <server_ip_address>. Example:
    ssh -L 5901:127.0.0.1:5901 -N -f -l mbonaven 188.184.87.85
  2. Connect to the server with the VNC client: vncviewer <serverIP>:<userScreen>. Example: vncviewer 188.184.87.85:1
    NOTE: if you performed step 1 (SSH tunnel), connect vncviewer to localhost:5901 instead of the server IP: vncviewer localhost:5901
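
The port 5901 in the tunnel above matches VNC display :1, since a VNC display :N conventionally listens on TCP port 5900+N. For a different display (for example a second local user on :2), the tunnel command changes accordingly. A minimal sketch, where the helper names vnc_port and tunnel_cmd are hypothetical and the command is only printed, not executed:

```shell
#!/bin/sh
# VNC display :N conventionally listens on TCP port 5900+N.
vnc_port() {
    echo $((5900 + $1))
}

# Print (do not run) the tunnel command for a given user, server and display.
tunnel_cmd() {
    user=$1
    server=$2
    display=$3
    port=$(vnc_port "$display")
    echo "ssh -L ${port}:127.0.0.1:${port} -N -f -l ${user} ${server}"
}

# Example: display :2 on the server used in this page.
tunnel_cmd mbonaven 188.184.87.85 2
```

With that tunnel in place, the client would connect to the same local port (here: vncviewer localhost:5902).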

A new VNC desktop window will appear (the remote host). Perform all the following steps within the remote host (in the new VNC window):
  1. Open a terminal
  2. Move to the directory where the PowerDEVS source is located: cd /afs/cern.ch/work/m/mbonaven/public/powerDEVS/remote_exec_powerdevs
  3. Start the Scilab GUI: bin/startScilab.sh &
  4. cd output
  5. To execute multiple simulations: ./RunLocalCopy.sh -n 10 -f 99999999
    (NOTE: change the number of simulations (-n) and the final time (passed as -f in the command above) as desired)

Running multiple simulations on the same node

NOTE: It is recommended not to run more simulations than available cores in the node.
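
One way to check this recommendation before starting another simulation is to compare the number of running simulation processes with the number of cores. This is only a sketch: the helper can_start_another is hypothetical, and it assumes the simulation processes are named "model" (as in ./model), which may not hold for every setup.

```shell
#!/bin/sh
# Succeeds when one more simulation process still fits on the node.
can_start_another() {
    running=$1
    cores=$2
    [ "$running" -lt "$cores" ]
}

# nproc is part of GNU coreutils; getconf is used as a fallback.
cores=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)

# Assumption: simulation processes are named "model", as in "./model -tf 300".
running=$(pgrep -x model | wc -l | tr -d ' ')

if can_start_another "$running" "$cores"; then
    echo "OK: $running simulation(s) running on $cores core(s)"
else
    echo "WARNING: $running simulation(s) already running on $cores core(s)"
fi
```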

To execute more than one PowerDEVS process on the same node (each with its own Scilab GUI), connect with VNC using a different local user for each process.
For example, you can configure the VNC server with 2 users: 1:remote_simulations and 2:remote_simulations2. Then you can connect to both separately: vncviewer <ip>:1 & vncviewer <ip>:2 &
Then follow the steps above (the ones performed within the remote host) in each session separately. You can check that the Scilab port used is different.

Figure: node status when running 1 simulation process (left, only one core used) and 2 simulation processes (right, two cores used).

Figure: each process uses almost all of one core (check that they are executed by different users).

Additional common tasks

  1. You might want to create a snapshot of a VM. Follow this tutorial: Snapshots for testing and cloning
    1. Use lxplus7 machines.
    2. IMPORTANT (otherwise you might lose connectivity to the VM): while the snapshot is being created, the machine is unresponsive, and it needs to resync its clock afterwards. With a wrong clock, SSO authentication might fail. After the snapshot is taken, run within the VM: sudo ntpdate -u ip-time-0

  2. If AFS fails to load when logging in, you might see errors such as 'timeout in locking authority file /afs/cern.ch/user/f/foo/.Xauthority', 'hepix: E: /usr/bin/fs returned error, no tokens?', or 'permission denied' errors on your files. In that case you can additionally try:
    1. sudo /etc/init.d/afs start
    2. Or just these: aklog; kinit; k5reauth;
  3. VMs usually come with a cron job that regularly cleans the /tmp folder (the one we use to run simulations).
    1. You can run simulations in another folder (make sure it has the right permissions). To do so, update the RunLocal scripts.
    2. Alternatively, update the /etc/cron.daily/tmpwatch file so that it ignores the /tmp/powerdevs folders, by adding the option: -X "/tmp/powerdevs*"
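
For the first alternative (running in another folder), the folder has to exist and be writable by the user running the simulations. A minimal sketch of preparing such a folder, where the helper prepare_run_dir and the example path are hypothetical; the RunLocal scripts still need to be updated separately to point at it:

```shell
#!/bin/sh
# Create a private run directory and confirm it is writable.
# Returns non-zero if any step fails.
prepare_run_dir() {
    dir=$1
    mkdir -p "$dir" && chmod 700 "$dir" && [ -w "$dir" ]
}

# Hypothetical usage (pick any path the simulation user owns):
#   prepare_run_dir "$HOME/powerdevs_runs" && echo "run directory ready"
```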

Console + ssh + X11 forwarding + screen

This is how to run multiple simulations on different nodes.

Requirements

- The PowerDEVS model needs to be in an AFS directory, so that it can be accessed by all machines at CERN.

- The 'model' executable file should already be built for the architecture of the machine you want to run it on. If not, rebuild it: git pull; cd build; make clean; make -j2; cd ..;
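
A quick check for the second requirement can be scripted. The sketch below only tests that the binary exists and is executable; it does not verify that it was built for the right architecture. The helper name needs_rebuild is hypothetical, and the path output/model assumes the script is run from the PowerDEVS checkout (the model is executed from the 'output' directory as ./model).

```shell
#!/bin/sh
# Succeeds (exit 0) when the model binary is missing or not executable.
needs_rebuild() {
    [ ! -f "$1" ] || [ ! -x "$1" ]
}

# Assumption: run from the PowerDEVS checkout, with the binary at output/model.
if needs_rebuild output/model; then
    echo "output/model is missing or not executable: rebuild before running"
else
    echo "output/model is present"
fi
```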

Procedure

NOTE: It is highly recommended to run everything inside screen to avoid losing progress when the ssh session is closed. Just type 'screen' before starting.

1. Log in to one of the nodes you want to run on (use pc-atd-cc-01/02.cern.ch or VM nodes https://openstack.cern.ch/project/instances/):

ssh -X 128.142.152.167

NOTE: you can even open another screen session on the server.

2. Move to the AFS directory where you have powerdevs

cd /afs/cern.ch/work/m/mbonaven/public/powerDEVS/remote_exec_powerdevs

3. Start Scilab

bin/startScilab.sh &

This should open a new Scilab window. If it fails, check the 'startScilab.sh' script and make sure it points to where you have installed Scilab.
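
Another common reason for the Scilab window not appearing is that X11 forwarding is not active. When ssh -X succeeds, it sets the DISPLAY variable on the remote side, so an empty DISPLAY is a useful first thing to check. A small sketch (the helper name has_display is hypothetical):

```shell
#!/bin/sh
# Succeeds when DISPLAY is set, as it is when X11 forwarding is active.
has_display() {
    [ -n "${DISPLAY:-}" ]
}

if has_display; then
    echo "DISPLAY is set to $DISPLAY"
else
    echo "DISPLAY is empty: reconnect with 'ssh -X' before starting Scilab"
fi
```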

4. Move to the 'output' directory (the model needs to be executed from here for the working directory to be set up correctly)

cd output

5. To execute the simulation (5.1 to execute a single simulation; 5.2 to execute multiple simulations)

  5.1 To execute a single simulation (note: change the final time (-tf) as desired):
    export LD_LIBRARY_PATH=/afs/cern.ch/sw/lcg/external/Boost/1.53.0_python2.7/x86_64-slc6-gcc48-opt/lib
    ./model -tf 300

  5.2 To execute multiple simulations (note: change the number of simulations (-n) and the final time (passed as -f in this command) as desired):
    ./RunLocalCopy.sh -n 10 -f 99999999

NOTE: If you are using screen, you can now detach from the session: type Ctrl+a, then d. The ssh session will be kept open.
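
Since detaching only helps if the work was started inside screen, it can be worth checking this before launching long runs. screen exports the STY variable inside its sessions, so a minimal check, run in whichever terminal you start from, could look like this (the helper name in_screen is hypothetical):

```shell
#!/bin/sh
# screen exports STY inside a session; it is empty or unset elsewhere.
in_screen() {
    [ -n "${STY:-}" ]
}

if in_screen; then
    echo "inside screen session $STY"
else
    echo "not inside screen: run 'screen' first (reattach later with 'screen -r')"
fi
```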

-- MatiasAlejandroBonaventura - 2017-04-10

Topic revision: r5 - 2017-06-12 - MatiasAlejandroBonaventura
 