L&B Software Verification and Validation Plan
Service Description
Logging and Bookkeeping (LB or L&B for short) is a Grid service that keeps a short-term trace of Grid jobs as they are processed by individual Grid components.
The LB was initially developed in the EU DataGrid project as a part of the Workload Management System (WMS). The development continued in the EGEE project (and subsequently EGEE-II and EGEE-III), where LB became an independent part of the gLite middleware. Within EMI, L&B plays the role of a universal tool that receives event information from various components and uses it to compute the states of various processes (jobs, file transfers, etc.) within the grid.
L&B-related documentation can be found in the EGEE project's CVS. For convenience, documentation is also available for download through the following links:
Test suite documentation:
Deployment scenarios
There are three distinct deployment scenarios for the L&B service:
- Standalone L&B server – a dedicated L&B node performing all essential functionality of L&B over TCP/IP connections.
- L&B proxy – a minimalistic, high-speed interface to L&B used exclusively by, and typically collocated with, the WMS service; accessible over a local socket. The L&B proxy cannot run alone; a full-fledged L&B server must also be running elsewhere.
- Both L&B server and proxy – referred to as mode both: the L&B server and proxy run on a single node (as separate daemons), typically also collocated with WMS. This makes the L&B accessible both over TCP/IP and over a local socket.
The L&B client is a set of libraries and examples, and therefore it is rarely installed as a stand-alone package. It is typically installed as a runtime dependency for clients relying on calls to the L&B Client API.
For functionality tests, L&B is typically operated in mode both. The functionality tests address the performance of L&B Server and Client at the same time, mostly through calls to examples distributed with the client libraries.
Functionality tests
Features/Scenarios to be tested
Service Ping Test (implemented)
This test checks that all services required to provide logging and querying capability are configured and running in the environment.
Normal workflow – correct input
Check if all services are accessible and listening where expected.
Pass/Fail Criteria
Pass: All services are listening as expected
Fail: One or more services are not available
Error workflow – erroneous input
N/A
Pass/Fail Criteria
N/A
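A check of this kind can be sketched as one TCP connection attempt per service. The Python sketch below is illustrative only and is not the actual test script. The host name and port numbers in the usage comment are assumptions to be replaced with values from the local L&B configuration, and the sketch covers TCP services only (the proxy's local socket would need a separate check).

```python
import socket

def is_listening(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def ping_services(host, ports, timeout=3.0):
    """Map each service name to True/False depending on whether it listens."""
    return {name: is_listening(host, port, timeout) for name, port in ports.items()}

# Usage with assumed, site-specific values:
# results = ping_services("lb.example.org", {"server": 9000, "ws": 9003})
# passed = all(results.values())
```

The test passes only if every entry in the returned mapping is True, which corresponds to the Pass criterion above.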
Job Registration (implemented)
Test if jobs can be correctly registered.
Normal workflow – correct input
Register a job
Pass/Fail Criteria
Pass: Job has been registered and assigned a Job ID
Fail: Registration failed
Error workflow – erroneous input
Try registering a job with the same Job ID and the EDG_WLL_LOGLFLAG_EXCL flag set.
Pass/Fail Criteria
Pass: Registration of a recycled Job ID refused
Fail: Registration of a recycled Job ID allowed
Regression Tests Included
Event Delivery (implemented)
Test if events are being correctly delivered.
Normal workflow – correct input
Register a job and generate events. Test if events are delivered correctly and job state changes accordingly.
Pass/Fail Criteria
Pass: Events get delivered and job state changes as expected.
Fail: Events do not get delivered, job state does not change as per the state diagram.
Error workflow – erroneous input
N/A
Pass/Fail Criteria
N/A
HTTPs Interface (implemented)
Getting lists and status information for jobs via the HTTPs interface.
_Note: While this test is not implemented yet, it is extremely easy to carry out manually._
Normal workflow – correct input
Register a job. Get a list of known jobs from the HTTPs interface (possibly check whether the listing contains the test job or, indeed, any job). Get the status of the job using the HTTPs interface (using the jobID as a URL).
Pass/Fail Criteria
Pass: Job IDs were present in the listing and it was possible to download status information for the test job registered previously.
Fail: HTTPs interface was not accessible, or the test failed to download the list of jobs or the status information for the test job.
Error workflow – erroneous input
N/A
Pass/Fail Criteria
N/A
Regression Tests Included
Local Logger Detached (Standalone) Operation (implemented)
Test if local logger running in a detached (standalone) mode accepts events and stores them correctly for future delivery
Normal workflow – correct input
Run a Logger detached, register events, and test if they get accepted and stored properly.
Pass/Fail Criteria
Pass: Event logging works without blocking program execution, local logger caches events as expected.
Fail: Unable to log events or events get accepted but are not stored for future delivery
Error workflow – erroneous input
N/A
Pass/Fail Criteria
N/A
Interlogger Recovery (implemented)
Test if the Interlogger processes cached events correctly once it recovers from a crash or a loss of connectivity
Normal workflow – correct input
At startup, point the Interlogger at events cached by the local logging service. Test if the events get processed and delivered correctly.
Pass/Fail Criteria
Pass: Cached events get delivered correctly, job state changes accordingly.
Fail: Cached events have not been delivered.
Error workflow – erroneous input
N/A
Pass/Fail Criteria
N/A
Regression Tests Included
Job State Computation (implemented)
Test if job states are being computed as expected.
Normal workflow – correct input
Register a job, then log a series of events. Watch the job state change accordingly as the events are being logged.
Pass/Fail Criteria
Pass: Job states change correctly, each event results in the appropriate state change.
Fail: Observed behavior contradicts the state diagram, events do not get delivered or fail to trigger state changes as expected.
Error workflow – erroneous input
N/A
Pass/Fail Criteria
N/A
Regression Tests Included
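The criterion that each event results in the appropriate state change can be sketched as a walk over a table of allowed transitions. The table below is a deliberately simplified assumption made for illustration; it is not the authoritative L&B state diagram, and the function name is hypothetical.

```python
# Simplified, assumed subset of job state transitions (not the full L&B diagram).
ALLOWED = {
    "Submitted": {"Waiting", "Aborted", "Cancelled"},
    "Waiting": {"Ready", "Aborted", "Cancelled"},
    "Ready": {"Scheduled", "Waiting", "Aborted", "Cancelled"},
    "Scheduled": {"Running", "Aborted", "Cancelled"},
    "Running": {"Done", "Aborted", "Cancelled"},
    "Done": {"Cleared", "Waiting"},
    "Cleared": set(),
    "Aborted": set(),
    "Cancelled": set(),
}

def first_invalid_transition(observed):
    """Return the first (previous, next) pair not allowed by ALLOWED, or None."""
    for prev, nxt in zip(observed, observed[1:]):
        if prev == nxt:
            continue  # an event that leaves the state unchanged is acceptable
        if nxt not in ALLOWED.get(prev, set()):
            return (prev, nxt)
    return None
```

A test harness would record the job state after each logged event and fail if this function returns anything other than None.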
Proxy-based Event Delivery (implemented)
Test if events are being correctly delivered via L&B proxy.
Normal workflow – correct input
Register a job and generate events. Test if events are delivered correctly through the L&B proxy, and that job state changes accordingly.
Pass/Fail Criteria
Pass: Events get delivered and job state changes as expected.
Fail: Events do not get delivered, job state does not change as per the state diagram.
Error workflow – erroneous input
N/A
Pass/Fail Criteria
N/A
Regression Tests Included
Web Service Interface (implemented)
Query the server through the WS interface.
Normal workflow – correct input
Register a job and query the job status and job log through the WS interface.
Pass/Fail Criteria
Pass: The server returned information as expected (job in a submitted state, registration event present in the job log).
Fail: The server failed to provide the information or the information was incorrect.
Error workflow – erroneous input
N/A
Pass/Fail Criteria
N/A
Regression Tests Included
Notification Delivery (implemented)
Test if L&B notifications are delivered correctly.
Normal workflow – correct input
Register a job, register a notification and start receiving notifications. Log events concerning that job and test if notifications are being delivered as expected.
Pass/Fail Criteria
Pass: Notifications have been delivered as expected.
Fail: Notifications have not been delivered.
Error workflow – erroneous input
Generate events not matching the notification's criteria.
Pass/Fail Criteria
Pass: No notifications have been generated for events not matching the criteria.
Fail: Notifications were generated even for events not matching the criteria
Regression Tests Included
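Both workflows of this test reduce to comparing the notifications that should have arrived with those that did arrive. The following Python sketch shows that comparison under stated assumptions: events and criteria are modelled as plain dictionaries, and the field names (id, jobid) are hypothetical and do not reflect the L&B notification API.

```python
def matches(criteria, event):
    """True if every field named in criteria has the same value in the event."""
    return all(event.get(key) == value for key, value in criteria.items())

def check_notifications(criteria, logged_events, received_ids):
    """Return (missing, unexpected) sets of event ids.

    missing    -- events matching the criteria for which nothing was delivered
    unexpected -- deliveries for events that do not match the criteria
    """
    expected = {e["id"] for e in logged_events if matches(criteria, e)}
    received = set(received_ids)
    return expected - received, received - expected
```

An empty first set corresponds to the Pass criterion of the normal workflow, and an empty second set to the Pass criterion of the error workflow.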
Changing Notification Criteria (implemented)
Test if L&B notifications are delivered correctly after notification criteria changed.
Normal workflow – correct input
Register a job, register a notification (and check that the notification works), change the notification criteria (e.g. job ID), and start receiving notifications. Log events matching the criteria and test if notifications are being delivered as expected.
Pass/Fail Criteria
Pass: Notifications have been delivered as expected.
Fail: Notifications have not been delivered.
Error workflow – erroneous input
Generate events not matching the updated criteria. Check that notifications have not been sent out for such events.
Pass/Fail Criteria
Pass: No notifications have been generated for events not matching the criteria.
Fail: Notifications were generated even for events not matching the criteria
Delayed Notification Delivery (implemented)
Test if L&B notifications are cached correctly and delivered appropriately once the client starts.
Normal workflow – correct input
Register a job, register a notification. Log events concerning that job and test if notifications are delivered as expected once the client starts listening.
Pass/Fail Criteria
Pass: Notifications have been delivered as expected.
Fail: Notifications have not been delivered.
Error workflow – erroneous input
N/A
Pass/Fail Criteria
N/A
L&B Server Purge (implemented)
Test if L&B Server purge works correctly. Also test if purged jobs are remembered (recognized as having once existed).
Normal workflow – correct input
Purge jobs from the server. Try a full purge as well as conditioned purges. Check the state of a purged job.
Pass/Fail Criteria
Pass: Server database has been purged properly on all occasions, querying a purged job returns a 'purged' state.
Fail: Failed to purge jobs that should have been purged, purged jobs that should not have been purged, or did not recognize the previous existence of a job.
Error workflow – erroneous input
N/A
Pass/Fail Criteria
N/A
Regression Tests Included
L&B vs. real-world WMS (implemented)
Normal workflow – correct input
- Submit a simple hello-world type job.
- Submit a simple job and cancel it.
- Submit a collection of simple jobs.
- Submit a collection and cancel it.
In all of the above cases: Watch the life cycle. Check the resulting state (Cleared or Cancelled). Check events received in the course of the job's execution; events from all relevant components must be present (NS, WM, JC, LM, and LRMS).
Pass/Fail Criteria
Pass: Jobs were submitted. Cancel operation worked where applicable. Resulting state was as expected (Cleared or Cancelled). Events were received from all components as expected.
Fail: Failure to achieve results outlined above (unable to submit a job, wrong state reached, events were not received from all components as expected).
Note: Events do not necessarily need to arrive in the correct order. Events received out of order do not constitute a failed test!
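Because event order is irrelevant, the check can be expressed as a set comparison rather than a sequence comparison. The sketch below assumes that each received event is available as a dictionary with a source field holding one of the component abbreviations used above; that representation is an assumption made for illustration.

```python
REQUIRED_SOURCES = {"NS", "WM", "JC", "LM", "LRMS"}

def missing_sources(events):
    """Return required components from which no event was seen (order ignored)."""
    return REQUIRED_SOURCES - {event["source"] for event in events}

def job_passed(events, final_state, expected_states=("Cleared", "Cancelled")):
    """Pass if all components reported and the final state is one of those expected."""
    return not missing_sources(events) and final_state in expected_states
```

The same helper can serve the error workflow by passing a different tuple of expected states.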
Error workflow – erroneous input
- Submit a simple job that is sure to fail.
- Submit a collection of jobs, one of which is sure to fail.
Pass/Fail Criteria
Pass: Jobs were submitted. Resulting state was as expected (Aborted).
Fail: Correct state was not reached.
Testing BDII Response (implemented)
Check if a query to the BDII returns correct answers.
Normal workflow – correct input
Query the LDAP service on an L&B machine. Parse results to make sure the expected information is returned. In particular, make sure that the service version is reported correctly.
Pass/Fail Criteria
Pass: Information returned, matching expectations.
Fail: No response to the query, or the response was incomplete or erroneous.
Error workflow – erroneous input
N/A
Pass/Fail Criteria
N/A
Regression Tests Included
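The parsing step can be sketched as follows, assuming the query result is available as LDIF text (for example, the output of an LDAP search tool). The helper is a hypothetical illustration: it handles only plain attribute lines and ignores folded and base64-encoded values, and the attribute name to look up is left to the caller because it depends on the information schema published by the site.

```python
def ldif_values(ldif_text, attribute):
    """Collect the values of one attribute from plain LDIF text.

    Attribute names are compared case-insensitively. Lines of the form
    'name:: value' (base64) and folded continuation lines are skipped.
    """
    prefix = attribute.lower() + ":"
    values = []
    for line in ldif_text.splitlines():
        if not line.lower().startswith(prefix):
            continue
        rest = line[len(prefix):]
        if rest.startswith(":"):
            continue  # base64-encoded value, not handled in this sketch
        values.append(rest.strip())
    return values
```

The test would then compare the returned list with the version of the installed packages and fail on an empty list or a mismatch.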
Testing ACL settings (implemented)
Test ACL modification.
Normal workflow – correct input
Register a job, register a ChangeACL event to modify the job's ACL. Check job status information to see that the event was interpreted correctly.
Note: This test does not attempt to actually test access control since using two identities within a single automated tool is rather challenging. Only the correct interpretation of the ChangeACL event is being tested.
Pass/Fail Criteria
Pass: The requested change has been applied to the ACL appropriately.
Fail: The event has not been delivered, or was not interpreted correctly.
Error workflow – erroneous input
N/A
Pass/Fail Criteria
N/A
Regression Tests Included
Testing the SandBox Transfer State Machine (implemented)
This test makes sure that logging sandbox transfers works properly.
Normal workflow – correct input
- Register a compute job.
- Register input sandbox transfer.
- Register output sandbox transfer.
- Generate events to trigger job state changes in one of the sandbox transfer jobs.
- Start the transfer and check that state has changed appropriately.
- Finish the transfer and check that state has changed appropriately.
- Check that the compute job and its sandbox transfer jobs link up correctly.
Pass/Fail Criteria
Pass: Sandbox Transfer jobs properly registered, states following the transfer procedure correctly, job IDs set correctly for all jobs.
Fail: Any failure (no registration, bad job type, failed state changes, job IDs of related jobs not known or reported)
Error workflow – erroneous input
Use another sandbox transfer job registered above to start, then fail the transfer and check that this is reflected by the resulting transfer job status.
Pass/Fail Criteria
Pass: Sandbox Transfer jobs properly registered, states following the transfer procedure correctly.
Fail: Any failure (no registration, bad job type, failed state changes)
Testing Statistic Functions (implemented)
Test if statistics provided by L&B to WMS work properly
Normal workflow – correct input
- Register a series of jobs.
- Generate events to push jobs to a given state.
- Run statistics function to calculate rate of jobs reaching that state.
- Query for average time needed by test jobs to go from one state to another.
- Check if the statistics returned reasonable results.
Pass/Fail Criteria
Pass: Reasonable values were returned.
Fail: The test returned errors, negative values or other non-realistic output.
Error workflow – erroneous input
N/A
Pass/Fail Criteria
N/A
Regression Tests Included
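What counts as a reasonable value is not defined precisely in this plan. One possible reading, shown here as a hedged Python sketch, is that a rate or duration must be a finite, non-negative number, optionally below a caller-supplied bound. The function name and the optional bound are assumptions.

```python
import math

def is_reasonable(value, upper_bound=None):
    """True if value is a finite, non-negative number not exceeding upper_bound."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    if math.isnan(value) or math.isinf(value) or value < 0:
        return False
    return upper_bound is None or value <= upper_bound
```

For an average transition time, a natural upper bound would be the wall-clock duration of the test run itself.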
Testing Multi-Threaded Operation (implemented)
Test if L&B client works in multi-threaded environment, at least in essence.
Normal workflow – correct input
- Register a series of jobs.
- Run a client using multiple threads to query the server simultaneously.
- Check if all threads finished OK.
Pass/Fail Criteria
Pass: Test did not hang, all threads finished correctly.
Fail: The test hung, or ended with a segmentation fault or a similar error. Correct L&B error messages, such as connection refused due to server overload, are acceptable if produced occasionally.
Error workflow – erroneous input
N/A
Pass/Fail Criteria
N/A
Regression Tests Included
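The harness logic (start several threads, wait with a deadline, report hangs and errors) can be sketched in Python as shown below. This illustrates only the structure of such a test and says nothing about the L&B client library itself; the query argument stands for any hypothetical callable that performs one query against the server.

```python
import threading
import time

def run_concurrently(query, n_threads=8, timeout=30.0):
    """Run query() in n_threads threads; return (number_hung, list_of_errors)."""
    errors = []
    lock = threading.Lock()

    def worker():
        try:
            query()
        except Exception as exc:  # collect the error instead of killing the thread silently
            with lock:
                errors.append(exc)

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    deadline = time.monotonic() + timeout  # one shared deadline for all threads
    for thread in threads:
        thread.join(max(0.0, deadline - time.monotonic()))
    hung = sum(thread.is_alive() for thread in threads)
    return hung, errors
```

A non-zero first value corresponds to a hung test. The collected errors would still need to be classified, since occasional overload errors are acceptable according to the criteria above.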
Test L&B Harvester (implemented)
Test if L&B harvester receives and stores notifications correctly.
Normal workflow – correct input
- Run the L&B harvester, configured to receive relevant information.
- Register jobs and generate events.
- Check that resulting notifications are properly received and information stored by the Harvester.
Pass/Fail Criteria
Pass: All expected information was received.
Fail: Any problem occurred.
Error workflow – erroneous input
N/A
Pass/Fail Criteria
N/A
Test delivery of notifications through MSG (implemented)
Test if notifications get properly delivered through messaging.
Normal workflow – correct input
Register a job, register a notification (with delivery to MSG) and start receiving messages. Log events concerning that job and test if messages are being delivered as expected.
Pass/Fail Criteria
Pass: Messages have been delivered as expected.
Fail: Messages have not been delivered.
Error workflow – erroneous input
N/A
Pass/Fail Criteria
N/A
File permissions (implemented)
Check sensible permission settings for configuration and operation files.
Normal workflow – correct input
On an installed, configured and running L&B server: check sensible ownership and permission settings for selected files.
Pass/Fail Criteria
Pass: Settings match predetermined mask.
Fail: Ownership or permissions for any file did not match predetermined mask.
Error workflow – erroneous input
N/A
Pass/Fail Criteria
N/A
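Matching a file against a predetermined mask can be sketched with the standard os and stat modules. In this sketch the mask is interpreted as the set of permission bits a file is allowed to have, which is an assumption about how the mask is defined; the function name and the example path in the usage comment are hypothetical.

```python
import os
import stat

def permission_problems(path, max_mode, owner_uid=None):
    """Return a list of problems; an empty list means the file matches the mask."""
    info = os.stat(path)
    mode = stat.S_IMODE(info.st_mode)  # permission bits only
    problems = []
    if mode & ~max_mode:
        problems.append(f"{path}: mode {mode:04o} exceeds mask {max_mode:04o}")
    if owner_uid is not None and info.st_uid != owner_uid:
        problems.append(f"{path}: owner uid {info.st_uid}, expected {owner_uid}")
    return problems

# Usage with an assumed path and mask:
# permission_problems("/path/to/private.key", 0o400, owner_uid=0)
```

The test fails if the function returns a non-empty list for any of the selected files.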
Test L&B Nagios probe (implemented)
Test if the L&B Nagios probe runs correctly
Normal workflow – correct input
- Run the probe from command line.
- Check if the probe returns correct text and exit value.
Pass/Fail Criteria
Pass: The probe performed as expected (returned OK for a running server or, if justified, other consistent information for a server in a different state).
Fail: Any problem occurred. Interpreting and reporting a malfunction correctly is not considered a problem on the Nagios probe side.
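Checking the text and the exit value together can be sketched as follows. The mapping of exit values 0 to 3 to OK, WARNING, CRITICAL and UNKNOWN follows the usual Nagios plugin convention. The expectation that the first output line contains the status name is an assumption about the probe's output, and the command passed to run_probe is whatever invokes the probe on the tested machine.

```python
import subprocess

NAGIOS_STATUS = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}

def run_probe(command, timeout=60):
    """Run a probe command; return (status_name, first_line_of_output)."""
    proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    status = NAGIOS_STATUS.get(proc.returncode, "INVALID")
    lines = proc.stdout.splitlines()
    return status, (lines[0] if lines else "")

def probe_is_consistent(status, first_line):
    """The exit value must be a known status and the text must mention it."""
    return status != "INVALID" and status in first_line
```

Note that a CRITICAL result for a stopped server is a consistent answer and therefore does not fail this check.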
Collection-specific tests (implemented)
This is a placeholder for collection-specific regression tests
Normal workflow – correct input
- Run assigned regression tests.
Pass/Fail Criteria
Pass: Tests finished as expected.
Fail: Any of the tests failed.
Regression Tests Included
Dumping and Loading Events (implemented)
This is a test for the backup (dump) and restore (load) procedure.
Normal workflow – correct input
- Register test jobs of all applicable types, including collections and DAGs
- Generate events to change the state of all types of test jobs. At least one subjob in any type of collection must remain free of events to test embryonic registration
- Check all those jobs are in their expected states.
- Dump events for all test jobs
- Purge all test jobs from the L&B server
- Test that all test jobs were actually purged (status query must return EIDRM)
- Load dumped events
- Check that all test jobs are in their expected states as before
Pass/Fail Criteria
Pass: Tests finished as expected: all test jobs were first purged and then restored to their previous states
Fail: Any of the tests failed.
Regression Tests Included
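The final comparison of the procedure above can be sketched as a check over three snapshots of job states: before the dump, after the purge, and after the load. The dictionaries mapping job IDs to state names (or to the string EIDRM for a purged job) are a hypothetical representation chosen for this sketch.

```python
def restore_mismatches(before, after_purge, after_load):
    """Return a list of (job_id, problem) pairs; empty means the cycle succeeded."""
    problems = []
    for job_id, state in before.items():
        if after_purge.get(job_id) != "EIDRM":
            problems.append((job_id, "not purged"))
        if after_load.get(job_id) != state:
            problems.append((job_id, "state not restored"))
    return problems
```

An empty result corresponds to the Pass criterion, and any entry identifies the job and the step that went wrong.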
Test site admin access updates from GOCDB (not implemented)
Test if job information ACLs are properly updated from GOCDB
Normal workflow – correct input
With site-specific information registered in the GOCDB, check if it has been properly downloaded and assigned.
Pass/Fail Criteria
Pass: Access control was set properly
Fail: Access control setting failed
Error workflow – erroneous input
N/A
Pass/Fail Criteria
N/A
Test virtual appliance support (not implemented)
Test event delivery and status computation for virtual appliances
Normal workflow – correct input
Initiate a virtual appliance and follow (simulate where applicable) events in its life time. Check that they have been correctly interpreted.
Pass/Fail Criteria
Pass: Event delivery worked and events were interpreted correctly. State changes were triggered as expected.
Fail: Any of the above conditions has not been met (events were not delivered or were misinterpreted)
Error workflow – erroneous input
N/A
Pass/Fail Criteria
N/A
Features not to be tested
Querying the L&B API
There is no test focusing specifically on querying the L&B server API; however, such tests are inherently included in scenarios defined above as all of them need to test their outcome by generating input first and querying for results thereafter.
Tests
There is no scenario to test if the above tests test the behavior of the service correctly. Ensuring that is left completely to the authors and users of the tests. This philosophy is also applied to testing the Nagios probe, which is also a test.
Performance Tests
L&B components support performance testing options. There is a separate document dealing with L&B Performance/Stress Testing.
Scalability Tests
In the context of L&B, scalability translates directly to job throughput, which is covered by performance tests.
--
ZdenekSustr - 02-Feb-2011