This twiki gathers preliminary material to be used as input for discussion at the September 2014 WLCG MB. It analyses the main operational costs, in terms of effort, for sites and experiments, and tries to identify where that effort could potentially be reduced and how.
| *Question* | *Answers* | *Notes* |
| *Site organisation* | | |
| What is the name of your site (it will remain confidential)? | site name | |
| What type of tier is your site? | 0, 1, 2 | |
| How many LHC VOs does your site support? | n | |
| How many non-LHC VOs does your site support? | n | |
| How much effort does your site spend on service operations and other activities? | FTE | |
| Batch system | | |
| Worker nodes | | |
| Storage system | | |
| Networking | | |
| Computing Elements | | |
| perfSONAR | | |
| Local monitoring | | |
| Squid servers | | |
| Argus | | |
| Information system | | |
| VO boxes | | |
| Other Grid services (please specify) | | |
| Providing support via tickets | | |
| Experiment contacts | | |
| WLCG meetings | | |
| Active participation in WLCG task forces, working groups, etc. | | |
| Testing new technologies | | |
| Other WLCG-related tasks (please specify) | | |
| *Service upgrades and changes* | | |
| Do you think that the frequency of middleware releases is manageable for your site? | Not at all / Barely / Usually / Quite / Perfectly | |
| Are you satisfied with the support (including documentation, step-by-step instructions, etc.) you get from WLCG during service upgrades/changes? | Not at all / Slightly / Moderately / Quite / Extremely | |
| Is it easy to find the right documentation and repositories when you search for them? | Not at all / Slightly / Moderately / Quite / Extremely | |
| In which repositories is it most important to find the RPMs to install or upgrade a service (select at most two)? | EPEL / EMI / UMD / WLCG | |
| How difficult is it to perform standard upgrades from the standard repositories? | Not at all / Slightly / Moderately / Quite / Extremely | |
| Do you have any comments or suggestions on how to improve service upgrade operations? | free text | |
| *Communication* | | |
| How important is it that requests originating from the experiments are communicated via WLCG operations rather than by the experiments themselves? | Not at all / Slightly / Moderately / Quite / Extremely | |
| Do you think that communication between the site and WLCG operations is effective? | Not at all / Slightly / Moderately / Quite / Extremely | |
| What could be done to improve the communication between the site and WLCG operations? | free text | |
| Do you think that the sharing of information across WLCG sites is effective? | Not at all / Slightly / Moderately / Quite / Extremely | |
| How would you improve the sharing of information across WLCG sites? | free text | |
| What are, or would be, your preferred channels to communicate with other sites (choose at most three)? | | |
| Meetings | | |
| Mailing lists | | |
| Wiki or other web pages | | |
| Web forums | | |
| Other | | |
| If possible, provide examples for the selected answers | free text | |
| How often does your site follow the fortnightly WLCG operations coordination meeting? | Never / Rarely / Usually / Often / Always | |
| How often does your site read the minutes of the WLCG operations coordination meeting? | Never / Rarely / Usually / Often / Always | |
| What changes do you think would make the meeting more effective and interesting for you as a site? | free text | |
| Do you think that, overall, WLCG operations Task Forces and Working Groups are useful for your site? | Not at all / Slightly / Moderately / Quite / Extremely | |
| If your site is not involved in a TF or WG, please indicate the main reason(s) | free text | |
| Are you satisfied with GGUS as the official user support tool? | Not at all / Slightly / Moderately / Quite / Extremely | |
| What improvements would you like to see in GGUS? | free text | |
| When WLCG expects a certain action from a site (service upgrades, reconfigurations, etc.), which channels should be used, in order of importance? | | |
| WLCG broadcasts | [1, 2, 3] | |
| GGUS tickets | [1, 2, 3] | |
| Operations meetings | [1, 2, 3] | |
| *Monitoring* | | |
| Do you think that the results of the SAM tests are complete enough to assess the level of functionality of your site? | Not at all / Slightly / Moderately / Quite / Extremely | |
| As a site, do you think that the SAM tests reliably tell whether something is working properly (e.g. with negligible fractions of false positives or false negatives)? | Not at all / Slightly / Moderately / Quite / Extremely | |
| Do you find SAM tests easy to understand and well documented? | Not at all / Slightly / Moderately / Quite / Extremely | |
| Is the output of a failed SAM test complete enough to understand the cause of a site problem? | Never / Rarely / Usually / Often / Always | |
| How do you usually find out that a SAM test is failing at your site? | | |
| By receiving a ticket | | |
| By periodically checking the SAM web page | | |
| From an alarm raised by your local monitoring system, interfaced to SAM | | |
| From WLCG availability/reliability reports | | |
| Other (please specify) | | |
| What improvements to SAM monitoring would your site like to see implemented? | free text | |
| Overall, how useful do you consider each of these types of site monitoring? | Not at all / Slightly / Moderately / Quite / Extremely | |
| SAM | | |
| HammerCloud | | |
| Real production and analysis jobs | | |
| Data transfer metrics | | |
| Network monitoring | | |
| Other (please specify) | | |
| Please describe below any ideas you may have to improve site monitoring | | |
| *Grid service administration* | | |
| Please rate how easy it is to perform each of the following operations in the administration of each service listed below | 1 = Very hard / 2 = Somewhat hard / 3 = Normal / 4 = Rather easy / 5 = Extremely easy | |

| *Service* | *Accessing adequate documentation* | *First deployment* | *Service upgrades (including security patches)* | *Reconfigurations* | *Troubleshooting and fixing problems* | *Getting support from the developers* |
| Batch system | | | | | | |
| Worker nodes | | | | | | |
| Storage system | | | | | | |
| Networking | | | | | | |
| Computing Elements | | | | | | |
| perfSONAR | | | | | | |
| Local monitoring | | | | | | |
| Squid servers | | | | | | |
| Argus | | | | | | |
| Information system | | | | | | |
| VO boxes | | | | | | |