SD Performance and requirement 111
We understand from Erwin that requirement 111 is to provide fast and
reliable access to service end points. The authors of the requirement
appear to assume that this requires some form of client side caching.
We had originally intended to provide client caching in the Service
Discovery APIs. However, we are no longer convinced of the benefit.
Clients run on UIs, within other services and on WNs. Were client
caching to be provided it could be by process, by user or by host.
The cache could be read first, which is best for performance, or used
as a last resort, which is best for getting the correct results.
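The two cache policies can be contrasted with a minimal sketch. Everything named here is hypothetical: `lookup` stands in for a real SD query, the cache is a plain dictionary, and neither function is part of the SD API.

```python
from typing import Callable, Dict, List

# Hypothetical signature for an SD query: service type -> list of end points.
Lookup = Callable[[str], List[str]]


def cache_first(service_type: str, cache: Dict[str, List[str]],
                lookup: Lookup) -> List[str]:
    """Read the cache first; query SD only on a miss (fast, possibly stale)."""
    if service_type in cache:
        return cache[service_type]
    endpoints = lookup(service_type)
    cache[service_type] = endpoints
    return endpoints


def cache_last_resort(service_type: str, cache: Dict[str, List[str]],
                      lookup: Lookup) -> List[str]:
    """Query SD first; fall back to the cache only if the query fails."""
    try:
        endpoints = lookup(service_type)
    except Exception:
        if service_type in cache:
            return cache[service_type]
        raise  # nothing cached either, so report the original failure
    cache[service_type] = endpoints
    return endpoints
```

With `cache_first` a stale entry is returned without SD ever being asked; with `cache_last_resort` every call pays the cost of a query, but the answer is current whenever SD is reachable.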
Per-process caching would be fairly easy to provide, but it is much
better handled by the application code: the application makes a
single call to get the SD end points it needs, and only goes back to SD
if all the end points prove to be inoperative.
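The application-side pattern described above might look like the following sketch. The class name `EndpointHolder` and the callables `discover` and `attempt` are illustrative assumptions, not part of the SD API; `discover` stands in for whatever SD call the application makes.

```python
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


class EndpointHolder:
    """Holds the end points for one service type for the life of the process."""

    def __init__(self, service_type: str,
                 discover: Callable[[str], List[str]]) -> None:
        self.service_type = service_type
        self._discover = discover          # hypothetical wrapper around an SD call
        self._endpoints: Optional[List[str]] = None

    def call(self, attempt: Callable[[str], T]) -> T:
        """Try each held end point; go back to SD only if all of them fail."""
        fresh = False
        if self._endpoints is None:
            # First use: the single SD call the application needs.
            self._endpoints = list(self._discover(self.service_type))
            fresh = True
        while True:
            for endpoint in self._endpoints:
                try:
                    return attempt(endpoint)
                except Exception:
                    continue               # this end point is inoperative
            if fresh:
                raise RuntimeError("all end points inoperative")
            # Every held end point failed: refresh from SD once and retry.
            self._endpoints = list(self._discover(self.service_type))
            fresh = True
```

In this sketch SD is contacted once at first use and again only after every held end point has failed, so a long-running process makes very few SD calls.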
If caching were by user it would be necessary to store the information
between jobs. This could easily be done with a $HOME for each user -
but this does not work on the WNs with any kind of dynamic account
system.
The third option of making the cache host-wide seems impractical for
security reasons, as different users may have the rights to see
different services. It would require a privileged daemon, rather than
the client API, to collect information from all services, which is
effectively what happens with an R-GMA secondary producer.
So what is the solution? I suggest that the best approach is to modify
the SD API implementation so that it invokes plug-ins in parallel rather
than sequentially. Once a plausible response is obtained other threads
can be ignored or killed. This should reduce the time that SD takes
to respond. Note that if R-GMA and BDII are each only available for
90% of the time then, assuming they fail independently, the pair should
give 99% availability (1 - 0.1 x 0.1 = 0.99), and in fact both are much
better than 90%.
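The parallel invocation suggested above can be sketched as follows. This is an illustration under stated assumptions, not the SD implementation: `query_parallel`, the `plugins` mapping and the `plausible` check are hypothetical names, and each plug-in is modelled as a plain callable that returns a list of end points.

```python
import concurrent.futures as cf
from typing import Callable, Dict, List

# Hypothetical plug-in signature: service type -> list of end points.
Plugin = Callable[[str], List[str]]


def query_parallel(plugins: Dict[str, Plugin], service_type: str,
                   timeout: float = 10.0,
                   plausible: Callable[[List[str]], bool] = bool) -> List[str]:
    """Run every plug-in at once and return the first plausible answer."""
    pool = cf.ThreadPoolExecutor(max_workers=max(1, len(plugins)))
    futures = [pool.submit(plugin, service_type) for plugin in plugins.values()]
    try:
        for future in cf.as_completed(futures, timeout=timeout):
            try:
                result = future.result()
            except Exception:
                continue               # this plug-in failed; wait for the others
            if plausible(result):
                return result          # slower plug-ins are simply ignored
    except cf.TimeoutError:
        pass                           # no plug-in answered in time
    finally:
        # Do not block on plug-ins that are still running. Python threads
        # cannot be killed, so their eventual results are discarded.
        pool.shutdown(wait=False)
    return []
```

The default `plausible` check accepts any non-empty list; a stricter test could be passed in. The response time becomes that of the fastest working plug-in rather than the sum of all of them, and a failed plug-in costs nothing as long as another one answers.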
Services and users should use SD sensibly by minimising calls to
it. Neither R-GMA nor BDII should be used directly to obtain service
end points.
Sufficient secondary producers (aka archivers) should be installed to
obtain good R-GMA response. The right number should be determined in
consultation with SA1.
Work is already going on to bring the BDII and R-GMA SD in line and to
use the same configuration files. This ensures that both systems will
give the same answer for service versions and will minimise the
configuration effort.
Finally, please note that the service described at
https://savannah.cern.ch/task/?func=detailitem&item_id=3069
could very easily be provided by R-GMA once the authz is in place. This would
require a single primary producer for the "constants" and a few
secondary producers.
-- Main.jwhite - 05 Oct 2006