Please refer to the updated version at:

Information and Monitoring work plan


There are only two requirements directly related to the JRA1-UK work. Under the security heading:

  • #101 - asks us to make use of VOMS groups and roles

and under the Information System heading:

  • #111 - we understand that the real requirement is to provide fast and reliable access to service end points. The requirement authors appear to assume that this requires some form of caching


We are currently very low on manpower, having just had 4 resignations from the team. This leaves, by early July:

Djaoui, Abdeslem: Development

Duncan, Alastair: Development and integration

Fisher, Steve: Cluster leader

Wilson, Antony: Development and support

We are recruiting at least 3.5 developers to provide the 168 PM over the 2-year period.

R-GMA Plans

R-GMA is now working fairly well with experienced users. However, the design has some limitations which are hard to work around. We are currently working on a new design which includes authz, multiple VDBs (name spaces), registry and schema replication, and support for Oracle and other RDBMSs.

The first release of this new design will contain no new functionality but will be a sound basis to work from. Significant changes in the design include:

- The registry no longer sends out notifications. This should increase reliability and makes registry replication much easier to implement.

- Only one socket will be open for streaming from one machine to another.

- Regular handling of remote invocations (time-outs etc.).

- Database independence.

- Managed tuple stores, which are essential to support authz.

This first release will probably also include multiple VDB support but not the ability to issue a query spanning more than one VDB.

In subsequent releases we will provide in this sequence:

1 Queries over multiple VDBs

2 Authz by VDB. This will make use of VOMS Groups and Roles (or any other certificate attributes)

3 Registry replication

4 Schema replication

5 Oracle support

Service Discovery (SD) Plans

We will work on more responsive SD by invoking the plug-ins in parallel.
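A minimal sketch of the idea, in Python rather than the real SD API (the `discover` function and the plug-in callables are hypothetical names, not part of any actual interface): each plug-in is queried on its own thread, and the first plausible (non-empty) answer wins, so one slow or broken back end no longer delays the whole lookup.

```python
from concurrent.futures import (ThreadPoolExecutor, as_completed,
                                TimeoutError as SDTimeout)

def discover(plugins, service_type, timeout=5.0):
    """Invoke every SD plug-in in parallel and return the first
    non-empty list of end points; failing plug-ins are ignored."""
    pool = ThreadPoolExecutor(max_workers=len(plugins))
    try:
        futures = [pool.submit(plugin, service_type) for plugin in plugins]
        try:
            for future in as_completed(futures, timeout=timeout):
                try:
                    endpoints = future.result()
                except Exception:
                    continue  # one broken back end must not break discovery
                if endpoints:  # plausible response: stop waiting for the rest
                    return endpoints
        except SDTimeout:
            pass  # no plug-in answered in time
        return []
    finally:
        # Do not block on the slower plug-ins; let them finish unheeded.
        pool.shutdown(wait=False)
```

The caller sees the latency of the fastest working plug-in instead of the sum of all of them.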

We are developing a "configuration-free" SD which is useful as a bootstrap mechanism as it can locate the R-GMA server on the local subnet.
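One way such a bootstrap could work is a simple probe-and-reply exchange on the local subnet. The sketch below assumes a UDP broadcast mechanism; the port number, probe message, and `find_server` function are all invented for illustration and do not describe the actual implementation.

```python
import socket

DISCOVERY_PORT = 8499  # hypothetical port, not the real R-GMA one

def find_server(probe_addr=("255.255.255.255", DISCOVERY_PORT), timeout=2.0):
    """Broadcast a probe on the local subnet and return the URL
    announced by the first server that answers, or None."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    sock.settimeout(timeout)
    try:
        sock.sendto(b"RGMA-DISCOVER", probe_addr)
        data, _addr = sock.recvfrom(1024)
        return data.decode()  # e.g. the server's service URL
    except socket.timeout:
        return None  # no server on this subnet
    finally:
        sock.close()
```

A client needs no configuration file at all: it either receives a URL from a local server or falls back to whatever defaults it has.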

Time scales and assignment of people

We are currently unable to make good predictions, as considerable effort can sometimes be "lost" in support. In addition, JRA1-UK has a temporary critical shortage of manpower. We hope to achieve the following by the end of this year:

- queries over multiple VDBs and authz (using VOMS attributes) by VDB

- R-GMA clients with the option of automatic configuration

- Parallel invocation of SD plugins

However, this depends on being able to get new people employed and working effectively in a rather short time.


SD Performance and requirement 111

We understand from Erwin that requirement 111 is to provide fast and reliable access to service end points. The authors of the requirement appear to assume that this requires some form of client side caching.

We had originally intended to provide client caching in the Service Discovery APIs. However we are no longer convinced of the benefit.

Clients run on UIs, within other services, and on WNs. Were client caching to be provided, it could be by process, by user, or by host. The cache could be read first, which is best for performance, or used as a last resort, which is best for getting correct results.

Per process caching would be fairly easy to provide - but it is much better provided by the application code. The application makes a single call to get the SD end points it needs, and only goes back to SD if all the end points prove to be inoperative.
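The application-level pattern described above could be sketched as follows (the class, the `lookup` callable standing in for an SD call, and the `is_alive` check are all assumed names, not any real API):

```python
class EndpointCache:
    """Hold SD results in the application process and go back to
    SD only when every cached end point has proved inoperative."""

    def __init__(self, lookup):
        self._lookup = lookup   # e.g. a call into the SD API
        self._endpoints = []

    def get(self, service_type, is_alive):
        # Serve from the cache while at least one end point still works.
        live = [e for e in self._endpoints if is_alive(e)]
        if live:
            return live
        # All cached end points are dead: query SD once and re-cache.
        self._endpoints = self._lookup(service_type)
        return self._endpoints
```

The cache lives exactly as long as the process, so no cross-job or cross-user storage is needed.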

If caching were by user it would be necessary to store the information between jobs. This could easily be done with a $HOME for each user - but this does not work on the WNs with any kind of dynamic account system.

The third option of making the cache host-wide seems impractical for security reasons, as different users may have the rights to see different services. It would require a privileged daemon, rather than the client API, to collect information from all services; this is effectively what an R-GMA secondary producer already does.

So what is the solution? I suggest that the best approach is to modify the SD API implementation so that it invokes plug-ins in parallel rather than sequentially. Once a plausible response is obtained, the other threads can be ignored or killed. This should reduce the time that SD takes to respond. Note that if R-GMA and BDII are each only available 90% of the time, then the pair should give 99% availability; in fact both are much better than 90%.
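The availability figure above assumes the two systems fail independently: the pair is down only when both are down at once, so the failure probabilities multiply. A quick check of the arithmetic (the function name is invented for illustration):

```python
def combined_availability(*availabilities):
    """Availability of a set of independent, redundant systems:
    1 minus the product of the individual failure probabilities."""
    failure_probability = 1.0
    for availability in availabilities:
        failure_probability *= (1.0 - availability)
    return 1.0 - failure_probability

# Two systems each up 90% of the time fail together only
# 0.1 * 0.1 = 1% of the time, so the pair is up 99% of the time.
```

With the real availabilities of R-GMA and BDII both well above 90%, the combined figure is correspondingly closer to 100%.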

Services and users should use SD sensibly by minimising calls to it. Neither R-GMA nor BDII should be used directly to obtain service end points.

Sufficient secondary producers (aka archivers) should be installed to obtain good R-GMA response. The right number should be determined in consultation with SA1.

Work is already going on to bring the BDII and R-GMA SD in line and to use the same configuration files. This ensures that both systems will give the same answer for service versions and will minimise the configuration effort.

Finally please note that the service described at could very easily be provided by R-GMA once the authz is in place. This would require a single primary producer for the "constants" and a few secondary producers.

-- Main.grandic - 27 Jun 2006

Topic revision: r4 - 2008-01-21 - LaurenceField