Avoid stalled jobs

Our current setup is:

  • The VM runs for a set amount of time (currently 24 hours).
  • The JobAgent has no limit on its number of cycles.

BOINC kills the VM at the end of the set time, and any job still running at that point stalls.

Some possible solutions are:

1) Put a limit of one cycle on the JobAgent, and shut down the VM at the end of this cycle. This will end the BOINC task, and a new one will begin.

2) Make use of the planned $MACHINEFEATURES and $JOBFEATURES to signal that we have 24 hours of computing available and let the jobs adapt.
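
As an illustration of option 2, here is a minimal sketch of how a job (or the JobAgent) could check the remaining VM lifetime before starting work. It assumes that $MACHINEFEATURES points to a local directory and that this directory contains a file named shutdowntime holding a Unix timestamp; both the key name and the helper functions are assumptions made for this sketch, not the agreed interface.

```python
import os
import time


def read_feature(env_var, key, environ=None):
    """Return the stripped content of file `key` in the directory named by
    the environment variable `env_var`, or None if it cannot be read."""
    environ = os.environ if environ is None else environ
    directory = environ.get(env_var)
    if not directory:
        return None
    try:
        with open(os.path.join(directory, key)) as handle:
            return handle.read().strip()
    except OSError:
        return None


def seconds_left(now=None, environ=None):
    """Seconds remaining before the VM is shut down, or None if unknown.

    Assumption: $MACHINEFEATURES/shutdowntime holds a Unix timestamp.
    """
    now = time.time() if now is None else now
    shutdown = read_feature("MACHINEFEATURES", "shutdowntime", environ)
    if shutdown is None:
        return None
    try:
        return max(0, int(shutdown) - int(now))
    except ValueError:
        return None


def can_start_job(estimated_job_seconds, safety_margin=600, now=None, environ=None):
    """True if a job of the given estimated length fits in the remaining time.

    With no lifetime information available, keep today's behaviour and start.
    """
    left = seconds_left(now, environ)
    if left is None:
        return True
    return left >= estimated_job_seconds + safety_margin
```

With such a check, an agent would decline to start a new job that cannot finish before BOINC kills the VM, instead of letting it stall.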


BOINC credits

The BOINC accounting system is active and grants credits to volunteers based on the time the VM ran and their CPU power. In most cases this does not reflect the actual work done on DIRAC jobs. This may not be an issue, since it still credits volunteers for the time they give us (after all, it is our problem if we do not use it).
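
To make the difference between granted credit and useful work concrete, here is a rough sketch. It uses the BOINC definition of the credit unit (200 credits for one day of CPU time on a reference 1 GFLOPS machine); the exact credit policy configured for this project is not described here, so the function names and the formula as applied to a VM run are assumptions.

```python
SECONDS_PER_DAY = 86400
# BOINC credit unit: 200 credits per day on a 1 GFLOPS reference machine.
CREDITS_PER_GFLOPS_DAY = 200


def estimated_credit(run_seconds, host_gflops):
    """Credit for a VM run, based only on run time and benchmarked CPU power."""
    return CREDITS_PER_GFLOPS_DAY * host_gflops * run_seconds / SECONDS_PER_DAY


def utilisation(job_cpu_seconds, run_seconds):
    """Fraction of the credited run time actually spent on DIRAC jobs."""
    if run_seconds <= 0:
        return 0.0
    return min(1.0, job_cpu_seconds / run_seconds)
```

The credit depends only on the first function's inputs; the second quantity, which is what matters to us, never enters the BOINC calculation.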

Note that BOINC only gives credit at the end of a successful run. This imposes two constraints:

  • VMs must have a set run time.
  • A failed or cancelled VM run on the BOINC side will not grant credits, regardless of what happened inside the VM (e.g. successful jobs).

An alternative method of granting credits during the run exists, but the BOINC developers consider it deprecated.

Having this credit system enabled is pretty much mandatory to attract people within the BOINC community, especially the competitive-minded volunteers.

LHCb accounting

Volunteers accounting

The Test4Theory project uses a second accounting method, based on the number of events produced. That is not a good metric in our case, since not all events are the same, but a similar system could be implemented.

Currently, the hostname of each BOINC VM is set to boinc[host id]. This enables basic job accounting, since the hostname appears in the job parameters. Philippe Charpentier wrote a script to select jobs on this basis (WorkloadManagementSystem/scripts/). Because the hostname is not a primary key, this selection is slow.
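
The hostname convention can be turned into per-volunteer counts along the following lines. This is only a sketch: it assumes the hostname is the literal prefix boinc followed by a numeric host id, and that each job record is a dictionary with a "HostName" entry. Both the record layout and the function names are hypothetical and are not taken from the existing script.

```python
import re
from collections import Counter

# Assumed hostname layout: "boinc" followed by the numeric BOINC host id.
HOSTNAME_PATTERN = re.compile(r"^boinc(\d+)$")


def boinc_host_id(hostname):
    """Return the BOINC host id encoded in the hostname, or None."""
    match = HOSTNAME_PATTERN.match(hostname.strip().lower())
    return int(match.group(1)) if match else None


def jobs_per_host(job_records):
    """Count jobs per BOINC host id.

    `job_records` is an iterable of dicts with a (hypothetical) "HostName" key;
    records from non-BOINC hosts are ignored.
    """
    counts = Counter()
    for record in job_records:
        host_id = boinc_host_id(record.get("HostName", ""))
        if host_id is not None:
            counts[host_id] += 1
    return counts
```

Counting in a single pass over already-selected jobs avoids one hostname query per volunteer, which matters given that the hostname is not indexed.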

A web interface could be made available to volunteers to access this information.

Institutional accounting

The Manchester grid site has offered to deploy the project on its desktop computers and needs an accounting of its participation. This will be handled by assigning these machines to a dedicated site, based on the machine location (using a reverse DNS lookup). The script to do this exists and must be integrated into the contextualization.
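
The idea of choosing a site from the machine location can be sketched as follows. This is not the existing script: the domain suffix, the site names and the function names below are placeholders invented for the example, and only the use of a reverse DNS lookup comes from the description above.

```python
import socket

# Placeholder mapping from DNS suffix to site name (all values hypothetical).
SITE_BY_DOMAIN = {
    ".example-institute.ac.uk": "BOINC.Institute.uk",
}
DEFAULT_SITE = "BOINC.Volunteer.org"  # hypothetical site for ordinary volunteers


def reverse_lookup(ip_address):
    """Return the fully qualified name for an IP address, or None on failure."""
    try:
        return socket.gethostbyaddr(ip_address)[0]
    except OSError:
        return None


def site_for_address(ip_address, resolver=reverse_lookup, site_by_domain=None):
    """Pick a site name from the reverse DNS name of `ip_address`.

    `resolver` is injectable so the logic can be tested without network access.
    """
    mapping = SITE_BY_DOMAIN if site_by_domain is None else site_by_domain
    fqdn = resolver(ip_address)
    if fqdn:
        fqdn = fqdn.lower().rstrip(".")
        for suffix, site in mapping.items():
            if fqdn.endswith(suffix):
                return site
    return DEFAULT_SITE
```

Machines whose address does not resolve, or resolves outside the known domains, fall back to the default site, so a DNS failure never blocks the contextualization.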

New VM contextualization / merge BOINC-specific changes?

Quite a lot of BOINC-specific additions have been introduced in the contextualization and need to be properly merged into the new contextualization. They currently reside in a branch at:

Web page

The project web page is pretty much just the default BOINC page; it would probably need some additions.

Security aspects

The VM is now accessible as boinc:boinc:boinc. See the amiconfig file.

The certificate is in the image that is created. Use the SSH contextualization?

Topic revision: r8 - 2013-08-05 - FedericoStagni