EMI comments on the EGI Quality Criteria documents (Version 1)
The review was carried out by EMI technical management and QA.
Reviewed material
UMD Quality Criteria (EGI Document 240-v8)
https://documents.egi.eu/public/ShowDocument?docid=240
- Generic Quality Criteria
- Compute Capabilities Quality Criteria
- Data Quality Criteria
- Storage Capabilities Quality Criteria
- Security Capabilities Quality Criteria
- Information Capabilities Quality Criteria
- Operations Capabilities Quality Criteria
General remarks
- The EMI project has already established its own set of quality requirements, formulated in the EMI SA2 policies [1]. These policies form the basis of EMI component quality checks, certification and validation.
The EMI Quality Assurance Policy documents cover the entire software development life cycle.
The EGI QC documents have a very similar scope, and there is a large overlap between the EMI Policies and the EGI QC documents.
- EMI aims to adjust its QA Policy documents to satisfy its customers, and therefore plans to synchronize them with the EGI QC.
However, the synchronization should be based on mutual understanding.
Before EMI can commit itself to the EGI QC requirements, the EMI QA team and the technical management need to carry out a detailed, thorough analysis of each of the UMD criteria and communicate back acceptance or possible problems for each criterion, one by one. Meanwhile, it is recommended that the EGI QC team also take a look at the EMI QA Policies. The more thorough analysis of the EGI QC and the update of the EMI Policy documents can take place only after the EMI-1 release (May/June).
- Software produced by EMI is checked against the requirements laid down in the EMI Policy documents. In particular, the (already ongoing) certification and validation of the EMI-1 release components is based on the EMI policies.
This implies that the EGI QC requirements can be taken into account at the earliest during the EMI-2 (or EMI-1.1) preparation, right after the EMI QA Policies have been updated and synchronized with the EGI QC.
- Since there is already a large overlap between the EMI Policies and the EGI QC, the EMI-1 release components will already satisfy the majority of the EGI QC as well.
- As a general problem, EMI finds that many of the criteria are not precisely formulated, leaving considerable room for misunderstanding.
- Another big recurring issue is the responsibility for providing test suites for some of the criteria. There should be a clear agreement between EGI and the technology providers regarding the responsibility for test suite provisioning and maintenance.
- The criteria template hides the most important content: we suggest moving the "Description" field directly below the title/ID. Changing the numeric ID to something more meaningful could also be considered. The purpose of the "History" field is not clear. Another minor readability improvement would be the introduction of a "Mandatory: YES/NO" field instead of the current, inconsistently laid out practice.
- We have found many undefined jargon abbreviations and numerous copy/paste errors throughout the "final" documents.
- Finally, we note that the name "UMD Quality Criteria" is somewhat misleading. The criteria collected in the seven documents are of very different natures: some are indeed quality criteria, others are merely functionality tests, and there are also many lower-level technical requirements (e.g. what a startup script should do).
Specific Comments on Generic Quality Criteria
- Template: Change "Input from TP" to "Input from Technology Provider"
- Functional Description: The criterion description is rather vague. For example, what would a functional description document look like for a developer API component, or for a user tool?
- Release Notes: Already covered by the EMI Documentation Policy.
- User Documentation: A very generic criterion.
- Online help (man pages): Currently not mandatory in EMI.
- API documentation: Currently not mandatory in EMI.
- Administration documentation: EMI requires an Installation and Administration Guide.
- Service Reference Card: Also required in EMI.
- Software licence: Required and tracked in the EMI Component Release Tracker [2].
- Source code availability: Part of the description is too vague and refers to non-quantifiable values; otherwise the criterion is part of EMI policy.
- Build procedure documentation: Yes, part of EMI policy.
- Automatic builds: Yes, EMI components are continuously tested via automatic nightly builds.
- Binary distributions: EMI has the same requirement.
- Release changing testing: A similar requirement exists in EMI. The big question here is the granularity of testing: there is no way to test every minor code change or fix for every trivial bug.
- Service control and status: This is more a technical requirement than a quality criterion. It is currently not regulated to this extent in EMI and is not part of EMI policy.
- Log files: Likewise more a technical requirement than a quality criterion, and likewise not regulated to this extent in EMI.
- Service Reliability: The description is a little vague (what is "good performance"?). No such requirement is explicitly part of the EMI policies; nevertheless, Product Teams are required to perform scalability and performance tests as part of their product certification.
- World writable files: Not part of the EMI requirements. Who would provide the test suite? This QC could be part of the security requirements rather than a generic one. A minimal sketch of such a check is given below.
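
The following is a minimal sketch of such a test, assuming the criterion means "no file installed by the component is world-writable" and that the component installs under a known prefix (the path below is a placeholder):

    import os, stat, sys

    INSTALL_PREFIX = "/opt/emi"   # placeholder: the component's installation root

    def world_writable(root):
        """Yield files and directories under root that are world-writable."""
        for dirpath, dirnames, filenames in os.walk(root):
            for name in dirnames + filenames:
                path = os.path.join(dirpath, name)
                try:
                    mode = os.lstat(path).st_mode
                except OSError:
                    continue
                if stat.S_ISLNK(mode):
                    continue  # symlink permission bits are not meaningful
                if mode & stat.S_IWOTH:
                    yield path

    offenders = list(world_writable(INSTALL_PREFIX))
    for path in offenders:
        print("world-writable:", path)
    sys.exit(1 if offenders else 0)

Even for a check this simple, the open questions raised above remain: who ships it, who runs it, and against which installation layout.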
Specific Comments on Compute Capabilities Quality Criteria
- While general attention seems to have been dedicated to testing the existing proprietary interfaces (e.g. CREAM, ARC), this has not happened for UNICORE: the author seems to assume that UNICORE equals BES, which is not the case (we would expect some JOBEXEC_UNICORE_* tests as well).
- It can be accepted that sometimes the tests need to be very specific, especially if they are to support the NorduGrid, WLCG and UNICORE grids as we know them while keeping room for further grids (see below). As currently formulated, however, what is asked cannot be done; see, for example, both the JOBEXEC and JOBSCHED criteria.
- As concerns DRMAA, EMI really cannot see where it fits into the big picture.
- Some important test cases are missing. One such test could be: submit a one-hour job with a half-hour proxy; or: check that subsequent jobs by the same user with different roles and capabilities handle file permissions properly and are not mixed up. Many similar tests could be selected. EGI should choose the required tests more carefully, preferably in consultation with the technology providers (or by studying the existing test plans in EMI [3]). In general, most of the required tests are rather trivial. A sketch of the first missing test is given at the end of this section.
- Requiring BES tests from EMI is questionable: BES is not an official EMI compute interface. Moreover, why should BES API testing be done using the UNICORE UCC JSON language? As said, BES API testing does not mean UNICORE API testing, and vice versa.
- 1.6 Availability/Scalability: The current pass condition ("Pass if the throughput is enough to handle at least 5000 simultaneous jobs.") is to be discussed. 5000 simultaneous jobs seems a bit unrealistic; jobs per day would be a more familiar metric, say 50,000 submitted jobs/day (roughly 0.6 jobs per second sustained). The QC should also state a time lag within which job status changes must be visible to the client: a service delivering 100,000 jobs a minute is of no use if the user only learns days later that they are done.
- JOBSCH_WMS_API_2: Why is this test required? This was an EGEE-2 requirement and nobody has ever really considered it. As of now, JSDL has been integrated into BES, so JOBSCH_BES_1 should be more than enough.
- JOBSCH_EXEC_1: This is an important one; it will require some non-trivial work.
- Service availability, monitoring and error handling: Is this required for the WMS only?
-- #698: WMS stability and performance: This is quite upsetting, as the WMS is currently used by CMS (to take the most important example), comfortably meeting their production quality criteria of about 50,000 jobs/day; the service requires one restart per month, if that.
-- The figure of 1000 simultaneous jobs does not make much sense either. Consider the match-making operation alone: even though it is highly optimized, with the roughly 18,000 queues available on average it takes 1-2 seconds. The QC also does not specify the job type to be stress tested (single jobs, collections, DAGs, parametric, MPI, etc.). Again, jobs/day would be a better metric.
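
As an illustration of the missing proxy-lifetime test mentioned above, here is a minimal sketch, assuming the standard VOMS and CREAM command-line clients are installed; the CE endpoint, VO name and JDL file name are placeholders, and the interesting part of the test is what the middleware does once the proxy expires mid-job:

    import subprocess, sys

    CE = "cream.example.org:8443/cream-pbs-grid"   # placeholder CREAM CE endpoint
    VO = "testers.example.org"                     # placeholder VO name
    JDL = "one_hour_job.jdl"                       # JDL whose payload runs ~1 hour

    def run(cmd):
        print("+", " ".join(cmd))
        return subprocess.run(cmd, capture_output=True, text=True)

    # Create a VOMS proxy valid for only 30 minutes.
    proxy = run(["voms-proxy-init", "--voms", VO, "--valid", "00:30"])
    if proxy.returncode != 0:
        sys.exit("proxy creation failed: " + proxy.stderr)

    # Submit a one-hour job with the half-hour proxy; the pass/fail condition
    # (graceful failure, renewal, or abort) is what the QC would have to define.
    job = run(["glite-ce-job-submit", "-a", "-r", CE, JDL])
    print(job.stdout or job.stderr)
    sys.exit(job.returncode)

The point is not the submission itself but the expected behaviour afterwards, which the current QC documents leave undefined.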
Specific Comments on Data and Storage Quality Criteria
- The documents describe testing of components, interfaces and protocols within the EMI data area. The components are Hydra, AMGA and FTS; the interface is POSIX file system access; and the protocols are HTTP(S), WebDAV, GridFTP and SRM.
- In general the proposed tests all make sense. A large fraction of the tests are already carried out by the responsible product teams, although it was not possible to check the details. In our opinion, in some cases the proposed tests are not sufficient, while in other cases they are not useful.
- For GridFTP the document only requires read and write. GridFTP, however, has two different versions, and within those versions different modes, most of which are used in our infrastructure. Testing should be done for all combinations.
- For WebDAV, only testing of read and write is proposed. WebDAV has a much richer set of functions which might be worth testing; a sketch of one such check is given at the end of this section.
- An opposite example is SRM. The document requires testing all "SRM operations described in SRM v2.2". EMI thinks that only those tests are useful which cover functionality used by the current infrastructure; there is functionality in SRM v2.2 which is not, and will not be, implemented by any of the SEs.
- For POSIX file access, the document suggests testing all possible functions. We might consider limiting this as well.
- The real question: who is going to provide or maintain those tests?
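
As a sketch of WebDAV testing beyond plain read/write, the following checks that a server answers a PROPFIND listing with 207 Multi-Status; the endpoint URL and credential paths are placeholders, and Python's requests library is assumed to be available:

    import requests

    URL = "https://webdav.example.org/testvo/"   # placeholder WebDAV endpoint

    # PROPFIND with "Depth: 1" lists the properties of a collection and its
    # direct children; a WebDAV-compliant server answers 207 Multi-Status.
    BODY = ('<?xml version="1.0" encoding="utf-8"?>'
            '<D:propfind xmlns:D="DAV:"><D:allprop/></D:propfind>')

    resp = requests.request(
        "PROPFIND",
        URL,
        data=BODY,
        headers={"Depth": "1", "Content-Type": "application/xml"},
        cert="/tmp/x509up_u1000",                  # placeholder proxy credential
        verify="/etc/grid-security/certificates",  # placeholder CA directory
    )
    assert resp.status_code == 207, "expected 207, got %d" % resp.status_code
    print(resp.text[:500])

Similar probes could cover MKCOL, COPY, MOVE and locking; again, who provides and maintains such suites needs to be agreed first.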
Specific Comments on Security Capabilities
- AUTHN_IFACE_2: "may use"... not feasible for ARC/gLite in EMI-1.
- AUTHN_CA_*: Not really a "security area" problem; CA distributions are packaged by others. In fact, our middleware does not really need to know anything about a CA distribution!
- ATTAUTH_MGMT_5: What are the ACLs being changed here?
- ATTAUTH_WEB_1: Is this necessary? Is this a VOMS-Admin function?
- ATTAUTH_WEB_3: Does this exist?
- AUTHZ_PCYDEF_4: Is this SCAS? Why SCAS? EMI is phasing out SCAS!
- CREDMGMT_IFACE_1: Is this test for MyProxy? (A sketch of a possible MyProxy round-trip test is given at the end of this section.)
- CREDMGMT_IFACE_3: Proxy Renewal is not a service. A test suite for this is tricky, as it requires a WMS etc. installation.
- CREDMGMT_LINK_1: STS and future work of the AAI strategy group. Otherwise
"mostly harmless".
Specific Comments on Information Capabilities
- The first comment is that this document is a little too simplistic and 'high-level'. While in general it captures the main factors that are probably important, there is not enough information to really understand the quality criteria or their context.
- What are we testing: a system, a component, a service? Secondly, the test descriptions are a little abstract; for example, "test that the service information conforms to the GLUE Schema v2". What does this really mean? How do you test it? Who provides the test? Who runs the test? A sketch of one narrow interpretation is given at the end of this section.
- Basically, there is not enough information in the document to understand which criteria need to be met and how.
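
To illustrate how much interpretation such a criterion leaves open: one narrow reading is simply "the resource publishes GLUE 2.0 objects over LDAP". A minimal sketch of that check follows, assuming an LDAP-based information provider on the standard BDII port 2170 with base o=glue; the hostname is a placeholder, and the ldap3 Python library is assumed to be available. Full schema conformance (attribute types, mandatory fields, value ranges) would be a far larger undertaking, which is exactly the point.

    from ldap3 import Server, Connection, ALL

    HOST = "bdii.example.org"   # placeholder resource/site BDII host

    # GLUE 2.0 entries are conventionally published under "o=glue" on port 2170.
    server = Server(HOST, port=2170, get_info=ALL)
    conn = Connection(server, auto_bind=True)   # anonymous bind

    conn.search(
        search_base="o=glue",
        search_filter="(objectClass=GLUE2Service)",
        attributes=["GLUE2ServiceID", "GLUE2ServiceType"],
    )
    if not conn.entries:
        raise SystemExit("no GLUE2Service objects published")
    for entry in conn.entries:
        print(entry.entry_dn)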
References
[1] EMI Quality Assurance Policy documents
https://twiki.cern.ch/twiki/bin/view/EMI/SA2#EMI_Policy_Documents
[2] EMI Release tracker
https://savannah.cern.ch/task/?group=emi-releases
[3] Collection of EMI test plans
https://twiki.cern.ch/twiki/bin/view/EMI/QCTestPlan