CRAB3 abstract for EGI CF 2012


CRAB: Data processing and production in the Compact Muon Solenoid experiment


The CMS Remote Analysis Builder (CRAB) is a tool that addresses the needs of the Compact Muon Solenoid (CMS) community by giving users easy access to the resources offered by the Grid. End-user analyses are automatically split into several jobs that are executed in parallel on the Grid infrastructure. CRAB has progressed from a limited initial prototype nearly 5 years ago to a fully validated system heavily employed by the CMS collaboration to prepare over 100 analysis papers. CMS currently observes more than 400 unique users submitting CRAB jobs per week, and close to 1000 individuals per month. The CMS Computing Technical Design Report (CTDR) estimated roughly 100k Grid submissions per day. The CRAB team has an ambitious program planned for 2012: to release a new generation of CRAB that takes a step towards a Software as a Service (SaaS) architecture. This work presents the joint effort of the CMS experiment and CERN IT-ES to realize this project, highlighting the impact on service maintenance and the first experiences with beta users.


Building on the experience gained from previous CRAB versions, the developers plan to release a new version of the tool that aims to improve the sustainability of the service while solving known issues and bottlenecks. CRAB will be centrally deployed as an online service exposing a Representational State Transfer (REST) interface. The services offered by the server will be accessible to the end user through a lightweight client, which sends requests to the server's REST interface. The server has a multi-tiered architecture in which each tier performs specific functions in the chain. The WorkQueue tier provides a central queue for all user requests and manages priorities among users and requests. Interactions with the underlying Grid layer are handled by the so-called Agent tier: the Agent pulls user requests from the WorkQueue, splits them into several jobs, and submits the jobs to the Grid. Finally, the AsyncStageOut tier handles the output produced by the users' jobs.
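
The flow through the tiers can be illustrated with a minimal Python sketch. It is not the CRAB3 or WMCore implementation: the class and function names (Request, WorkQueue, Agent, split_into_jobs, stage_out), the event-based splitting, the convention that a lower number means a higher priority, and the output file naming are all assumptions made purely for illustration, and Grid submission is replaced by a placeholder.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass(order=True)
class Request:
    """Hypothetical user request; ordered by (priority, seq) only."""
    priority: int                      # assumption: lower value = served first
    seq: int                           # tie-breaker preserving arrival order
    user: str = field(compare=False)
    n_events: int = field(compare=False)


class WorkQueue:
    """Central queue for all user requests, ordered by priority."""

    def __init__(self):
        self._heap = []
        self._seq = count()

    def push(self, user, n_events, priority):
        heapq.heappush(self._heap, Request(priority, next(self._seq), user, n_events))

    def pull(self):
        """Return the highest-priority request, or None if the queue is empty."""
        return heapq.heappop(self._heap) if self._heap else None


def split_into_jobs(request, events_per_job):
    """Split one request into jobs covering consecutive event ranges."""
    jobs = []
    for first in range(0, request.n_events, events_per_job):
        last = min(first + events_per_job, request.n_events)
        jobs.append({"user": request.user, "first_event": first, "last_event": last})
    return jobs


class Agent:
    """Pulls requests from the WorkQueue and turns them into jobs."""

    def __init__(self, queue, events_per_job):
        self.queue = queue
        self.events_per_job = events_per_job

    def run_once(self):
        request = self.queue.pull()
        if request is None:
            return []
        jobs = split_into_jobs(request, self.events_per_job)
        # Placeholder: a real Agent would submit each job to the Grid here.
        return jobs


def stage_out(jobs):
    """Stand-in for AsyncStageOut: map finished jobs to output locations."""
    return [f"{job['user']}/output_{i}.root" for i, job in enumerate(jobs)]
```

In this sketch the client side is reduced to calls to WorkQueue.push; in the architecture described above, those requests would instead arrive through the server's REST interface.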

By using CRAB, users can abstract from the technical details of the Grid infrastructure and focus on their primary activity: the analysis of the data collected at the Large Hadron Collider. Features such as automatic resubmission of failed jobs and automatic handling of Grid computational and storage resources considerably simplify the user's work. From the maintenance point of view, the new implementation aims to reduce the cost of sustaining the service. To this end, the tool has been rewritten on top of a commonly developed library (named WMCore), which is also used for other use cases in CMS.
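
As an illustration of what automatic resubmission spares the user, the following Python sketch retries a failed job up to a fixed number of attempts. The function name run_with_resubmission, the submit callback and the max_attempts limit are hypothetical and are not taken from CRAB or WMCore; the actual retry policy of the tool may differ.

```python
from typing import Callable, Optional


def run_with_resubmission(
    job: dict,
    submit: Callable[[dict], bool],
    max_attempts: int = 3,
) -> Optional[int]:
    """Submit a job, resubmitting on failure.

    `submit` is a hypothetical callback returning True when the job succeeds.
    Returns the attempt number that succeeded, or None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        if submit(job):
            return attempt
    return None
```

Without such a mechanism, the user would have to detect each failure and resubmit the job by hand.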


The experiment aims to improve the reliability, usability and scalability of the analysis system, as well as to reduce the human effort needed for analysis operations. We believe the transition to the new version of CRAB is extremely valuable for the success of CMS Computing in reaching these objectives.


At the time of writing, the new version of CRAB is in the process of consolidating its basic functionality. It is close to entering the commissioning phase, after which CMS will start the transition from CRAB2 to CRAB3. We present the status of the project and the experience gained during the integration period.

Track classification

Software services for users and communities



-- SpigaDaniele - 14-Nov-2011

