SQLite

Instead of accessing the real ConditionDB information, MC simulation jobs at non-T1 sites access an SQLite file available in the shared area. Very often this access does not succeed, which is usually evident when the Gauss logs contain lines like:
 
GiGa                       INFO Stacking Action Object is not required to be loaded
COOLConfSvc                INFO Persistency Connection Retrial Period set to 60s
COOLConfSvc                INFO Persistency Connection Retrial Time-Out set to 900s
Persistency/RelationalPlugi...  ERROR SQLiteStatement::prepare 5 database is locked
Persistency/RelationalPlugi...  ERROR SQLiteStatement::fetchNext 21 database is locked
DDDB                      ERROR Problems opening database
DDDB                      ERROR cool::DatabaseDoesNotExist: The database does not exist
GiGa.TrackSeq.RichG4P...  FATAL Exception with tag=DetectorDataSvc is caught
GiGa.TrackSeq.RichG4P...  ERROR DetectorDataSvc GaudiException in loadObject() /dd StatusCode=FAILURE


Below we suggest a template that one can copy and paste into the GGUS ticket for the site, explaining how this problem can arise and the basic investigation and measures one can adopt to tackle it.

One of the most recurrent problems with SQLite on NFS has its most likely source in the OS locking mechanism. In general, once the lockd server cannot complete an RPC, it waits forever, preventing any other locking from working. All clients that then try to lock will stall, time out, or crash, depending on configuration and software versions.

The site admins should try to answer: Are the lockd services started correctly? Are there any firewall rules blocking the lockd requests? Can anybody try to use the "lockfile" command to see if it hangs?
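
For instance, one can try to create and remove a lock file on the share from a worker node; the path below is hypothetical, and lockfile is the utility shipped with procmail:

lockfile /nfs/shared/lock-test.lock
rm -f /nfs/shared/lock-test.lock

If the first command hangs instead of returning promptly, the shared area is suspect. Note, however, that lockfile exercises atomic file creation over NFS rather than the lockd path, so the Python test below is the more direct check of the locking daemon.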

The simplest way (that I know of) to test this is Python code:

import fcntl

# Try to take an exclusive, non-blocking POSIX lock on a file
# in the NFS-mounted directory.
fp = open("lock-test.txt", "a")
fcntl.lockf(fp.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)

If that code does not work in the NFS-mounted directory (which obviously must be mounted read-write for this particular test), the lock daemon is stuck (or locking is configured incorrectly).

Note that when there are several load-balanced NFS servers, one of them can have the problem while the others do not, so the test will fail only on some nodes.
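
A small sketch along these lines (the shared-area path is hypothetical) can be run on each worker node to spot which ones are affected:

import fcntl
import socket

# Hypothetical location of the test file on the NFS-mounted shared area.
path = "/nfs/shared/lock-test.txt"

try:
    fp = open(path, "a")
    # A stuck lock daemon typically shows up here as a hang
    # or an IOError such as "No locks available".
    fcntl.lockf(fp.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    fcntl.lockf(fp.fileno(), fcntl.LOCK_UN)
    print("%s: locking OK" % socket.gethostname())
except IOError as err:
    print("%s: locking FAILED: %s" % (socket.gethostname(), err))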

A work-around can be to mount the NFS share with the option "nolock". With that option the client handles lock requests locally instead of contacting the lock daemon, so the locking calls succeed (although the locks are then not visible to other clients).
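
For example, a minimal /etc/fstab entry along these lines (server name, export path, and mount point are all hypothetical):

nfsserver:/export  /nfs/shared  nfs  rw,nolock  0 0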

-- RobertoSantinel - 17 Dec 2008

 