PanDA server migration from Oracle to MySQL DB backend


  • This page is evolving. Please keep checking.
  • This page contains information about BigPanDA related migration of PanDA server from Oracle 11g to MySQL 5.1 performed in 2013-2014.
  • Feel free to contact JaroslavaSchovancova and TadashiMaeno with questions.

PanDA server database backends

  • PanDA server code is written in python, using Oracle as the main DB backend (e.g. for the ATLAS experiment). Most of the SQL queries in the code are written with Oracle's PL/SQL syntax.
  • However, with a limited number of changes in the panda_server.cfg configuration file, one can decide whether Oracle or MySQL database backend will be used for PanDA data storage.
  • WrappedCursor() is a class defined in pandaserver.taskbuffer.WrappedCursor. It alters Oracle's PL/SQL syntax, so that other DB backends can be used.
    • As of September 2014, the MySQL DB backend can be used.

Class WrappedCursor

WrappedCursor implementation

  • available properties and methods of the WrappedCursor class
        # connection object
        conn = None
        # cursor object
        cur = None
        # use special error codes for reconnection in querySQL
        useOtherError = False
        # backend
        backend = 'oracle'
        # constructor
        def __init__(self, connection)
        # iterator
        def __iter__(self)
        # string representation
        def __str__(self)
        def execute(self, sql, varDict=None, cur=None)
        def executemany(self, sql, params)
        def fetchall(self)
        def fetchmany(self, arraysize=1000)
        def fetchone(self)
        def var(self, dataType, *args, **kwargs)
        def getvalue(self,dataItem)
        def next(self)
        def close(self)
        def prepare(self, statement)
        # wrappers to handle Oracle's "RETURNING INTO" statement in different backends
        def _returningIntoOracle(self, returningInputData, varDict, cur, dryRun=False)
        def _returningIntoMySQLpre(self, returningInputData, varDict, cur)
        def _returningIntoMySQLpost(self, returningInputData, varDict, cur)
        def description(self)
        def rowcount(self)
        def arraysize(self)
        def arraysize(self,val)

Addressed issues

  • Here follows a list of issues that have been addressed in order to run PanDA server backed with MySQL from the same code as Oracle-backed version.

Table names in MySQL have to be case-insensitive

  • Table names throughout the code are used with different capitalization. Therefore MySQL has to be configured so that table names are case-insensitive:
    # cat /etc/my.cnf

Schema names configured in panda_server.cfg

  • Oracle uses several schemas, while in MySQL a single schema is defined.
  • In WrappedCursor() the schema names in a SQL query are replaced with schema names from configuration:
                sql = re.sub('ATLAS_PANDA\.', panda_config.schemaPANDA + '.', sql)
                sql = re.sub('ATLAS_PANDAMETA\.', panda_config.schemaMETA + '.', sql)
                sql = re.sub('ATLAS_GRISLI\.', panda_config.schemaGRISLI + '.', sql)
                sql = re.sub('ATLAS_PANDAARCH\.', panda_config.schemaPANDAARCH + '.', sql)
                sql = re.sub('ATLAS_DEFT\.', panda_config.schemaJEDI + '.', sql)


  • Adapting Oracle's CURRENT_DATE for MySQL consists of 2 steps.
    • Digging the interval length.
                  sql = re.sub("CURRENT_DATE\s*-\s*(\d+|:[^\s\)]+)", "DATE_SUB(CURRENT_DATE,INTERVAL \g<1> DAY)", sql)
    • Using MySQL's CURRENT_TIMESTAMP instead of CURRENT_DATE, because of different date string formats in Oracle and MySQL.
                  sql = re.sub('CURRENT_DATE', 'CURRENT_TIMESTAMP', sql)


  • Adapting Oracle's SYSDATE for MySQL consists of 2 steps.
    • Digging the interval length.
                  sql = re.sub("SYSDATE\s*-\s*(\d+|:[^\s\)]+)", "DATE_SUB(SYSDATE,INTERVAL \g<1> DAY)", sql)
    • Using MySQL's SYSDATE() instead of SYSDATE. sql = re.sub('SYSDATE', 'SYSDATE()', sql)


  • There is no equivalent of EMPTY_CLOB() in MySQL, therefore replacing it with an empty string:
                sql = re.sub('EMPTY_CLOB\(\)', "''", sql)


  • Using MySQL's LIMIT instead of Oracle's ROWNUM.
                sql = re.sub("(?i)(AND)*\s*ROWNUM\s*<=\s*(\d+)", " LIMIT \g<2>", sql)
                sql = re.sub("(?i)(WHERE)\s*LIMIT\s*(\d+)", " LIMIT \g<2>" , sql)


  • In PanDA server the RETURNING INTO term is used to get an ID of the new job or file.
  • In MySQL this functionality is replaced by LAST_INSERT_ID() together with an autoincremented column.

var of data type number

  • When using cursor.var(), the Oracle-backed PanDA server returns variable of a data type cx_Oracle.NUMBER. However, MySQLdb provides no such thing, so we use python's long data type.
    if panda_config.backend == 'oracle':
        import cx_Oracle
        varNUMBER = cx_Oracle.NUMBER
        import MySQLdb
        varNUMBER = long


  • Oracle provides sequences, while MySQL does not.
  • However, MySQL provides autoincremented columns.
  • In PanDA server .nextval or .currval properties of a defined sequence are used.
  • In MySQL-backed server the sequence functionality is replaced by a definition of a special table as a counterpart for each sequence. The .nextval property is then replaced by INSERT of a NULL value into an autoincremented column of the "sequence table", combined together with SELECT LAST_INSERT_ID().
  • Adapting SQL quieries for sequences has to be taken into account also for the JobSpec, FileSpec, and DatasetSpec classes, e.g.
        # return expression of bind values for INSERT
        def bindValuesExpression(cls,useSeq=False):
            from config import panda_config
            ret = "VALUES("
            for attr in cls._attributes:
                if useSeq and cls._seqAttrMap.has_key(attr):
                    if panda_config.backend == 'mysql':
                        # mysql
                        ret += "%s," % "NULL"
                        # oracle
                        ret += "%s," % cls._seqAttrMap[attr]
                    ret += ":%s," % attr
            ret = ret[:-1]
            ret += ")"
            return ret
        bindValuesExpression = classmethod(bindValuesExpression)
  • In MySQL-backed server the DB schema contains tables of the same name as the Oracle's sequence, e.g. ATLAS_PANDA.JOBSDEFINED4_PANDAID_SEQ.
    • The "sequence table" JOBSDEFINED4_PANDAID_SEQ:
           col bit(1) NULL,
           PRIMARY KEY (id)
  • As of September 2014, the PanDA server uses the following sequences:
    ### ATLAS_PANDA schema ###
    ### ATLAS_PANDAMETA schema ###
    ### ATLAS_DEFT schema ###

Addressing sequence ATLAS_PANDA.JOBSDEFINED4_PANDAID_SEQ.nextval

  • Code example:
                    if panda_config.backend == 'mysql':
                        ### fake sequence
                        sql = " INSERT INTO ATLAS_PANDA.JOBSDEFINED4_PANDAID_SEQ (col) VALUES (NULL) "
                        self.cur.arraysize = 10
                        self.cur.execute(sql + comment, {})
                        sql2 = """ SELECT LAST_INSERT_ID() """
                        self.cur.execute(sql2 + comment, {})
                        job.jobsetID, = self.cur.fetchone()
                    else:  # panda_config.backend == 'oracle':
                        sqlESS = "SELECT ATLAS_PANDA.JOBSDEFINED4_PANDAID_SEQ.nextval FROM dual ";
                        self.cur.arraysize = 10
                        self.cur.execute(sqlESS + comment, {})
                        job.jobsetID, = self.cur.fetchone()

Addressing sequence ATLAS_PANDA.CLOUDTASKS_ID_SEQ.nextval within the original table, with RETURNING INTO term

  • Code example:
                if panda_config.backend == 'mysql':
                    ### fake sequence
                    sql = "INSERT INTO ATLAS_PANDA.cloudtasks (id,taskid,status,tmod,tenter) VALUES(NULL,:taskid,:status,CURRENT_DATE,CURRENT_DATE)"
                else:  # panda_config.backend == 'oracle':
                    sql = "INSERT INTO ATLAS_PANDA.cloudtasks (id,taskid,status,tmod,tenter) VALUES(ATLAS_PANDA.CLOUDTASKS_ID_SEQ.nextval,:taskid,:status,CURRENT_DATE,CURRENT_DATE)"
                sql += " RETURNING id INTO :newID"
                varMap = {}
                varMap[':taskid'] = cloudTask.taskid
                varMap[':status'] = cloudTask.status
                varMap[':newID']  = self.cur.var(varNUMBER)
                self.cur.execute(sql+comment, varMap)
                # get id
       = long(self.cur.getvalue(varMap[':newID']))

Database Schema

Major updates:

-- JaroslavaSchovancova - 09 Sep 2014

Responsible: JaroslavaSchovancova

Edit | Attach | Watch | Print version | History: r2 < r1 | Backlinks | Raw View | Raw edit | More topic actions...
Topic revision: r1 - 2014-09-10 - JaroslavaSchovancova
    • Cern Search Icon Cern Search
    • TWiki Search Icon TWiki Search
    • Google Search Icon Google Search

    PanDA All webs login

This site is powered by the TWiki collaboration platform Powered by PerlCopyright & 2008-2021 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
or Ideas, requests, problems regarding TWiki? use Discourse or Send feedback