CORAL Network glitch abstract for EGI CF 2012

Title

Handling of network and database instabilities in CORAL

Overview

The Large Hadron Collider (LHC), the world's largest and highest-energy particle accelerator, started operations in September 2008 at CERN, Switzerland. Huge amounts of data are generated by the four experiments installed at different collision points along the LHC ring. The largest data volumes come from the ‘event data’, which record the signals left in the detectors by the particles generated in the LHC beam collisions and are generally stored in files. Relational database systems, by contrast, are commonly used to store the ‘conditions data’, which record the geometry, configuration and other working parameters of the detectors at the time the event data were collected.

The Common Relational Abstraction Layer (CORAL) software is widely used by the LHC experiments for storing and accessing conditions data using relational database technologies.

Description

CORAL is a software package designed to simplify application development by shielding individual users from the database-specific C++ APIs and SQL flavours.

It provides a C++ abstraction layer that supports data persistency for several backends and deployment models, including local access to SQLite files, direct client access to Oracle and MySQL servers, and read-only access to Oracle through the FroNTier/Squid and CoralServer/CoralServerProxy server/cache systems.
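The idea of one interface in front of several backends can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names IBackend, DirectBackend, CachedBackend and makeBackend, and the technology strings, are hypothetical and are not the CORAL API. It only shows how client code can stay unaware of whether it talks to a local file, a database server or a read-only cache.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical backend-neutral interface (NOT the CORAL API).
class IBackend {
public:
    virtual ~IBackend() = default;
    virtual std::string technology() const = 0;
    virtual bool isReadOnly() const = 0;
};

// Stands for local SQLite files and direct Oracle/MySQL client access.
class DirectBackend : public IBackend {
public:
    explicit DirectBackend(std::string tech) : m_tech(std::move(tech)) {}
    std::string technology() const override { return m_tech; }
    bool isReadOnly() const override { return false; }
private:
    std::string m_tech;
};

// Stands for read-only access through a server/cache system.
class CachedBackend : public IBackend {
public:
    explicit CachedBackend(std::string tech) : m_tech(std::move(tech)) {}
    std::string technology() const override { return m_tech; }
    bool isReadOnly() const override { return true; }
private:
    std::string m_tech;
};

// Hypothetical factory: picks an implementation from a technology name.
std::unique_ptr<IBackend> makeBackend(const std::string& tech) {
    if (tech == "sqlite" || tech == "oracle" || tech == "mysql")
        return std::make_unique<DirectBackend>(tech);
    if (tech == "frontier" || tech == "coralserver")
        return std::make_unique<CachedBackend>(tech);
    throw std::invalid_argument("unknown backend: " + tech);
}
```

Client code in this sketch holds only an IBackend pointer, so switching from a direct connection to a cached one changes the argument passed to the factory and nothing else.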

During 2010, the LHC experiments using CORAL reported several problems involving application hangs or crashes after the network or the database servers became temporarily unavailable. These instabilities are due to external causes and cannot be avoided. CORAL already provided some handling for them, but in some cases this proved insufficient, and in others it was itself the cause of problems such as the hangs mentioned above. As a consequence, the CORAL plugins underwent a major redesign aimed at making the software more robust against these network glitches.

Impact

The new implementation ensures that CORAL reconnects to the database automatically and transparently whenever possible, and terminates the application gracefully when it is not. Internally, it takes care of resetting all relevant parameters of the underlying backend technology (such as OCI, the Oracle Call Interface).
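The reconnect-or-terminate behaviour can be sketched as a generic retry wrapper. This is an illustration under stated assumptions, not the CORAL implementation: the exception type ConnectionLost, the function runWithReconnect, and the attempt limit and delay parameters are all hypothetical names introduced here.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <stdexcept>
#include <thread>

// Hypothetical exception signalling a lost connection (NOT a CORAL type).
struct ConnectionLost : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Runs `operation`. On ConnectionLost, waits `delay`, calls `reconnect`
// (which is expected to reset the backend session state) and retries.
// After `maxAttempts` failed attempts the exception is rethrown, so the
// caller can shut the application down cleanly instead of hanging.
// An exception thrown by `reconnect` itself also propagates to the caller.
template <typename Result>
Result runWithReconnect(const std::function<Result()>& operation,
                        const std::function<void()>& reconnect,
                        int maxAttempts,
                        std::chrono::milliseconds delay)
{
    assert(maxAttempts > 0);
    for (int attempt = 1; ; ++attempt) {
        try {
            return operation();
        } catch (const ConnectionLost&) {
            if (attempt >= maxAttempts) throw;  // give up: no hang, no silent retry loop
            std::this_thread::sleep_for(delay);
            reconnect();
        }
    }
}
```

A caller would wrap each database operation in runWithReconnect and catch ConnectionLost at the top level to exit cleanly. The sketch deliberately ignores the question of when a retry is safe, for example in the middle of an update transaction, which a real implementation has to decide.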

Conclusions

The CORAL software is widely used by C++ and Python applications to access the data stored by the LHC experiments in a variety of relational database technologies (including Oracle, MySQL and SQLite). The new feature, implemented to cope with network glitches, allows automatic reconnection to the database, making CORAL more robust and reliable in a way that is transparent and safe for users.

Track classification

Operational services and infrastructure

Comments

None

-- RaffaelloTrentadue - 30-Nov-2011
