---+ WN Working Group

The worker node working group is a [[TCGHome][TCG]]-sponsored activity that aims to address the matching and utilization of worker node resources within the EGEE grid. The mandate, list of members and mailing list are available here: [[http://egee-intranet.web.cern.ch/egee-intranet/NA1/TCG/wgs/wn.htm][Mandate]]

---++ Steps to Reach Goals

A number of steps must be completed to reach a state where we support rich configuration of Glue !Clusters and !SubClusters. All deployment dependencies are within a site or a gLite release. We have no expectation that something must be deployed everywhere before we can proceed to the next step.

| *Key* | *Item* | *Development Status* | *YAIM Status* | *Certification Status* | *Dependencies* |
| *A* | Resolving WN to a !GlueSubCluster \
| Package =glite-wn-info= exists and can be configured to return this value. \
| PATCH:2114 is submitted but is in configuration status, i.e. YAIM needs to support a richer wn-list.conf file allowing per-WN properties. \
| Needs the updated YAIM before it can really begin. \
| None; could be deployed tomorrow. |
| *B* | Resolving a !GlueSubCluster to an RTEPublisher \
| Steve Burke has a provider for this resolution, =glite-info-provider-service-1.0.3-0.noarch.rpm=. I reviewed it recently and gave some comments that were accepted. BUG:45313. \
| Trivial configuration, trivial addition to YAIM. New YAIM function attached on WNWorkingGroupInstallLog. \
| Submitted after updates, easy to certify. \
| Can be deployed now, even on today's lcg-CE, and would be worthwhile anyway before being moved with the eventual creation of the =glite-CLUSTER= node. Although it can be deployed, it is effectively not used until *C* is done. |
| *C* | Creation of per !SubCluster RTE Tags areas \
| No development. \
| Not formally submitted as a request to YAIM, but it is a trivial function that can be added to the current lcg-CE as well. Will do it. Essentially, for each !SubCluster create =/opt/glite/var/info/<SubCluster>/<VO>= directories with =sgm=-like permissions. \
| Trivial to certify. \
| As mentioned, this can be done now on the lcg-CE, where we know the name of the single !SubCluster. However, *D* should be deployed first so that any tags that end up in there are subsequently published as software tags in the !SubCluster. |
| *D* | Infoprovider update to publish software tags per !SubCluster \
| Requires an update to =lcg-info-dynamic-software=, BUG:45310. This is the next thing I will do. \
| No changes to YAIM. \
| Easy to certify, if a little fictitious: nothing will actually be putting tags in here at the time of deployment. \
| No dependencies; can and should be deployed tomorrow. |
| *E* | =lcg-ManageTags= and =lcg-tags= need to support a =--cluster= option \
| Developers are expecting bugs from me with a request to support a =--subcluster= option. The publication details of the RTEPublisher have to be finalised in *B* before they can have details of what to query. \
| No YAIM work. \
| Certification has to check not only that =--subcluster= works but mainly that it is backwards compatible, assuming that a lack of RTEPublisher for a !SubCluster signifies that the !SubCluster is the hostname. \
| As mentioned, the RTEPublisher publishing must be finalised as per *B*. Once that is done and the option is developed, it can be deployed without delay. The deployment is irrelevant in the sense that =sgm= users will just carry on using =--cehost= and not =--subcluster= in the first instance of deployment. |
| *F* | YAIM supporting free style !Clusters and !SubClusters \
| There is now a good attempt at this within YAIM, though not in the main development branch. It is well documented at [[YAIMcluster_1][YAIM Cluster]], and I reviewed the installation recently in WNWorkingGroupInstallLog. It is a very good first attempt and all of the complexity needed is present. The items highlighted in the install log are all _small_ fixes. Essentially, another round of the development process is required. \
| This is all YAIM. \
| Another development round needed. \
| This could in principle be deployed tomorrow, but that is unlikely to happen; much of the above can and should be deployed first. *C* should be done first though. |

---++ Comments

A number of comments have already been addressed to the group that might be considered for inclusion within discussions and outputs.

   * CPU Numbers - The working group will most likely touch on describing heterogeneous batch farms with multiple !GlueSubClusters. Consequently, publishing reliable numbers for CPUs in a !GlueSubCluster for use by e.g. [[http://gstat.gridops.org/gstat][gstat]] becomes a sensible objective. See the [[http://www.listserv.rl.ac.uk/cgi-bin/webadmin?A2=ind0709&L=lcg-rollout&F=&S=&P=6338][long lcg-rollout thread]].
   * Passing wall clock time for jobs. It is very likely that the group will consider passing values for memory and/or disk requirements to the LRMS. Particularly for sites supporting MPI jobs, it is vital that all jobs are also submitted with a wall clock time to allow for backfill. While not an objective of the group, since different WNs do not generally support different wall clock times, it is related to argument passing and so can be considered.

---++ Strategy

   * VOs to produce a list of constraints related to WN capacity that they wish to describe their jobs by, e.g. memory, disk space, anything else?
   * Produce an outline of what can be achieved today. By today we mean the Glue 1.3 Schema, WMS 3.1 and the LCG CE.
   * From this we can consider whether a short term solution is worth implementing given the anticipated constraints of the lcg-CE. Any such solution would likely result in recommendations to the YAIM team for such a deployment.
   * Some sites, notably RAL, already run with a configuration such that matching different worker node resources within the same site is possible but far from optimal.
   * Run a CREAM CE within the PPS. It is expected to be available as a pre-pre-release at the end of October 2007 at the very earliest.
   * This will be configured as a CE, a torque batch system and two batch workers with different hardware configurations.
   * Information publishing of this CREAM CE can be tweaked by hand to establish the publishing of this heterogeneous cluster.

---++ Software Tags

The software tags need to be addressed with respect to running multiple !SubClusters on the same physical host.

   * WNWorkingGroupVoInstallMethods

---++ Test Rig

A test installation is being set up within the PPS. See: WNWorkingGroupInstallLog

---++ Presentations

   * GDB January 7th 2008 [[http://indico.cern.ch/conferenceDisplay.py?confId=20225][GDB Agenda]].

---++ Relevant Documents

   * [[http://del.icio.us/tag/egeewnwg][del.icio.us egeewnwg tag]]. This is the most up to date list of relevant links; feel free to add to it if you are a del.icio.us user.
   * [[http://grid.pd.infn.it/cream/][CREAM Homepage]]
   * [[http://glueschema.forge.cnaf.infn.it/Spec/V13][Glue Schema 1.3]]
   * [[http://goc.grid.sinica.edu.tw/gocwiki/How_to_publish_different_memory_limits_for_different_queues_on_the_same_CE][Guide to publishing different memory GlueSubClusters and GlueClusters]]
   * [[http://hepix.caspur.it/afs/hepix.org/project/batch/gridbody.html][HEPiX - Grid Information on Compute Elements]]
   * [[http://hepix.caspur.it/afs/hepix.org/project/batch/gridbatchhepixbody.html][HEPiX - Another related HEPiX Page]]
      * In particular [[http://hepix.caspur.it/spring2006/TALKS/4apr.prelz.dir/][Francesco's talk]] and [[http://hepix.caspur.it/spring2006/TALKS/4apr.schwickerath.blahp_status.pdf][Ulrich's talk]].
   * [[http://egee-jra1-wm.mi.infn.it/egee-jra1-wm/ce_blahp_conf.shtml][BLAH install guide]]
   * [[http://egee-jra1-wm.mi.infn.it/egee-jra1-wm/blah_porting_notes.txt][BLAH draft porting notes]]
   * [[http://forge.ogf.org/sf/go/projects.jsdl-wg/docman.root][JSDL Spec, influences what will be available for GLUE 2.0]]

---++ Meetings

   * First Meeting - September 27th - 14:00 - WNWorkingGroupMinutes20070920

-- Main.SteveTraylen - 25 Sep 2007
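
As a rough illustration of step *C* in the table above, the per-!SubCluster tag areas could be created along the following lines. This is only a sketch and not the YAIM function itself: the function name =create_tag_areas=, the example !SubCluster and VO names, and the choice of mode 0775 as a stand-in for "=sgm=-like permissions" are all assumptions. Only the =/opt/glite/var/info/<SubCluster>/<VO>= layout comes from the table.

```python
import os
import tempfile
from pathlib import Path


def create_tag_areas(base_dir, subclusters, vos, mode=0o775):
    """Create <base_dir>/<SubCluster>/<VO> for every pair and return the paths.

    The mode is an assumed approximation of "sgm-like" permissions
    (group-writable); a real deployment would also set the owning group.
    """
    created = []
    for subcluster in subclusters:
        for vo in vos:
            tag_dir = Path(base_dir) / subcluster / vo
            tag_dir.mkdir(parents=True, exist_ok=True)
            # mkdir's mode argument is filtered by the umask, so set it explicitly.
            os.chmod(tag_dir, mode)
            created.append(tag_dir)
    return created


if __name__ == "__main__":
    # Use a scratch directory here; on a CE the base would be /opt/glite/var/info.
    with tempfile.TemporaryDirectory() as scratch:
        # Hypothetical SubCluster and VO names, for illustration only.
        for path in create_tag_areas(scratch, ["subcluster-a", "subcluster-b"], ["atlas", "cms"]):
            print(path)
```

Calling the function a second time with the same arguments is harmless, which suits a configuration tool that may be re-run on an existing lcg-CE.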
Topic revision: r16 - 2008-12-16 - SteveTraylen