Welcome to the CERN-Tier0 Analysis Facility (CAF) for (p)DUNE Users TWiki Home page
General
The (p)DUNE Analysis Facility is located at CERN Tier-0 to provide fast response to latency-critical activities:
- Diagnosis of detector problems
- Prompt alignment and calibration
- Export of new constants to Tier-0 and other computing centres worldwide (e.g. FNAL) for future data reprocessing
- Performance studies
- Hot physics analysis
The task of the (p)DUNE Tier-0 system is to perform the prompt reconstruction of the raw data coming from the online data acquisition system, and to register the raw and derived data with the File Transfer Service (FTS) system, which then distributes them to the FNAL centre and beyond.
Introduction
- You need to be CERN registered and have a CERN account.
- About the Account Management service
- Keep in mind: AFS is being phased out at CERN (end of 2017); you might want to look into CERNBox or EOS as a storage solution. To activate your CERNBox personal storage space (1 TB, up to 1 million files, maximum single-file size 8 GB, hosted in the CERN Computer Centre), log in to https://cernbox.cern.ch using your CERN account and password. More info here.
- For the NP project and its prototype experiments, follow the link.
- FYI: there is also a client for mobile phones and desktop computers that allows syncing. For installation and configuration, use the link.
- Be aware that you must choose the np-comp UNIX computing group.
- Register with the corresponding e-group (depending on which protoDUNE activity you are a member of):
- If you want to access the Neutrino Platform experiments' EOS space, follow this link.
- For any problems you may have, please send an email to the neutplatform computing support.
Tier0-core/nodes/EOS access
The Neutrino Platform and its prototype experiments at CERN, NP02/NP04, have dedicated cores and storage space at CERN Tier-0.
Currently available: 1 PB of EOS disk, 6 PB of tape and 1500 cores; from August onwards NP will provide 3 PB of EOS disk space.
The machines are the normal batch worker nodes, with 2 GB of memory per core, and the jobs will run together with other jobs on the batch farm.
The batch system is based on HTCondor.
The CPUs are a mix of new and not-so-new hardware, typically less than 3 years old. The typical machine size is 8 cores.
For more information on how to access the NP experiments' EOS space, follow this link.
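If you want a quick look at the worker nodes behind the shared pool, standard HTCondor client commands can be used from lxplus; this is only a sketch, and the slot layout you actually see may differ:
    condor_status -total                                      # summary of the slots visible in the pool
    condor_status -autoformat Cpus Memory | sort | uniq -c    # CPUs and memory (MB) per slot, e.g. to check the ~2 GB/core figure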
Software Installation
Submitting jobs to NP Tier0 cores
After logging in to the lxplus cluster, you can install larsoft/dunetpc in the same way as on the neutplatform cluster. See instructions 1, 2.
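As a rough illustration only (the linked instructions take precedence), the setup typically goes through CVMFS and ups; the version and qualifier below are placeholders, not a recommendation:
    source /cvmfs/dune.opensciencegrid.org/products/dune/setup_dune.sh   # assumes the DUNE CVMFS area is mounted on lxplus
    ups list -aK+ dunetpc                                                # list the available dunetpc releases
    setup dunetpc <version> -q <qualifier>                               # pick a release, e.g. an eNN:prof build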
A set of example scripts is available in the NP gitlab repository. If you have problems accessing it, let me know.
To submit the condor job:
condor_submit nptest_htcondorjob.sub
Have a look at the self-explanatory comments in the scripts.
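For orientation, a submit description file like nptest_htcondorjob.sub typically looks roughly like the sketch below; the wrapper script, directory names and JobFlavour value are illustrative and not the actual contents of the repository script:
    # illustrative HTCondor submit description, not the real nptest_htcondorjob.sub
    executable  = run_nptest.sh                       # hypothetical wrapper script that runs your larsoft job
    arguments   = $(ClusterId) $(ProcId)
    output      = out/nptest.$(ClusterId).$(ProcId).out
    error       = err/nptest.$(ClusterId).$(ProcId).err
    log         = log/nptest.$(ClusterId).log
    +JobFlavour = "workday"                           # CERN-specific attribute setting the maximum run time
    queue 1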
To examine running jobs, you have several options.
The closest analogue to "bpeek" (from the lxbatch system) is "condor_tail <jobID>", which can be used to inspect the standard output (or other files condor knows about) of running jobs.
Alternatively, "condor_ssh_to_job <jobID>" drops you into the same sandbox as the running job, allowing you to inspect it as you see fit.
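For example (the job ID 1234.0 below is a placeholder; use the one reported by condor_q):
    condor_q                    # list your jobs and note the <jobID> (ClusterId.ProcId)
    condor_tail 1234.0          # print the tail of the job's standard output
    condor_ssh_to_job 1234.0    # open a shell inside the running job's sandbox; type exit to leave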
You can also use EOS for the input/output/log files.
Keep in mind: "condor_submit -spool" will just take your files and submit them to the schedd, and won't write to them in the meantime.
You can then retrieve them when your job completes using "condor_transfer_data".
Alternatively, you can move output files at the end of your job by having the script you submit as the "executable" do it for you.
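Put together, a minimal spooling workflow looks like this (the submit file name and cluster ID are illustrative):
    condor_submit -spool nptest_htcondorjob.sub   # input files are copied to the schedd at submission time
    condor_q                                      # wait until the job has completed; note its ClusterId
    condor_transfer_data 1234                     # copy the output/error/log files back from the schedd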
To monitor NP02/NP04 batch jobs, follow link1 and link2 respectively.
A collection of useful HTCondor commands can be found here.
For more information, have a look at the Quick Start Guide from CERN HTCondor.
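For convenience, a few of the most frequently used HTCondor commands are sketched here (job IDs are placeholders):
    condor_q                          # show your queued and running jobs
    condor_q -better-analyze 1234.0   # explain why a job is still idle
    condor_rm 1234.0                  # remove a job from the queue
    condor_hold 1234.0                # put a job on hold (condor_release 1234.0 releases it again)
    condor_history 1234.0             # information about completed jobs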
CAF system accounts
Specific CAF subsystem accounts and job-priority schedulers/queues:
For the HTCondor schedds:
The standard condor_schedds provided by IT are also not available for interactive login because they hold users' credentials, but they are load-balanced, so in principle they should be fine.
There is an option later, if we all agree, to run our own schedd (I am in favour of this option). This can be handy if, for example, many production jobs are submitted from the same machine: a local schedd gives a much faster response.
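To see which schedds are available in the CERN pool, and to query a specific one, standard commands can be used (the schedd name below is only an example):
    condor_status -schedd               # list the schedds known to the pool
    condor_q -name bigbird11.cern.ch    # query your jobs on a specific schedd (name is illustrative)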
CAF batch groups with priority shares
The following batch groups with priority shares are available on lxbatch for the systems and combined-performance groups. The batch group managers are responsible for adding and removing members.
Data Model
Useful Commands
Monitoring
CERN's HTCondor monitoring is here. The following monitoring link can show if we have users using our resources.
Batch Monitoring for NP02
HTCondor monitoring for NP02
Batch Monitoring for NP04
HTCondor monitoring for NP04
Operations
EOS and TAPE
- For more information on how to access the NP experiments' EOS space, follow this link (a minimal usage sketch is shown below).
- For more information on how to access the NP experiments' CASTOR (CERN Advanced STORage manager) area, the tape-based data storage system used at CERN, follow this link.
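As a quick illustration of accessing EOS from lxplus (the <np-area> path and the EOS instance below are placeholders for the values given in the linked documentation):
    ls /eos/experiment/<np-area>                    # the /eos area is FUSE-mounted on lxplus
    eos cp myfile.root /eos/experiment/<np-area>/   # copy a file into the NP EOS space with the eos client
    xrdcp myfile.root root://<eos-instance>.cern.ch//eos/experiment/<np-area>/myfile.root   # or via xrootd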
Miscellanea
Useful links
Contacts
Tier-0 contacts:
In case of problems with the on-call phone, contact the experts directly:
Back to Neutrino Platform Computing Twiki Main Page
Major updates:
-- NectarB - 2017-02-03