
PipeOnline High Level Requirements

Overview

The PipeOnline project will provide a distributed system for running DAISY Pipeline jobs.

The PipeOnline application shall be usable in both LAN and WAN contexts.

The application should be packaged so that service deployment is relatively simple from a system administration point of view.

Technical Requirements

Server contexts

The application shall be deployable on Windows, Unix/Linux and Mac OS X servers.

While it is not necessarily required by this specification, the application will be based on server-side Java technologies, and will require the installation of a JVM and a servlet container (such as Tomcat or Jetty).

The application will use a persistence mechanism such as a relational database.
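
As a purely illustrative sketch of the deployment model above, a job-submission entry point could look like the following. The class name, URL mapping and parameter names are hypothetical and not part of this specification; the only assumption is a standard servlet container such as Tomcat or Jetty.

import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical sketch: a minimal job-submission servlet, runnable in any
// standard servlet container.
public class JobSubmissionServlet extends HttpServlet {

    @Override
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        // The selected Pipeline script and its parameters would arrive as form fields.
        String script = request.getParameter("script");
        // ... validate parameters, persist the job, place it in a queue ...
        response.setContentType("text/plain");
        response.getWriter().println("Job for script '" + script + "' accepted.");
    }
}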

Functional Requirements

Job Execution types

The application will provide several ways to run a Pipeline Job:

queue based
Queues are used to provide a means for distributing system resources among multiple users. For example, in a setup where several universities are using the same deployed application, each university would have its own queue. Jobs in the same queue are executed sequentially, and queues are run in parallel. Each deployed application has 1-n queues activated. The number of queues is definable by the administrator. This is the default method, since it allows parallel execution (given by the number of active queues) and processor load distribution (see the sketch after this list).
instant
Jobs are executed instantly without queuing. This is typically used for non-resource intensive job types (such as XML validation). This provides for scalability in a multi-user environment.
scheduled
Added jobs are executed at a time given by the user. This provides for load distribution over time. This feature is not implemented for the initial deployment, but is a future extension.
automatically from sniffing
Jobs are automatically instantiated and added to a queue/executed when a certain precondition is met (such as an input file appearing in a predefined location). This feature is not implemented for the initial deployment, but is a future extension.
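
A minimal sketch of how the queue-based method could be realized, assuming one single-threaded executor per queue; the class and method names are hypothetical.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: jobs submitted to the same queue run sequentially,
// while the queues themselves run in parallel.
public class QueueManager {

    private final List<ExecutorService> queues = new ArrayList<ExecutorService>();

    // The number of queues (1-n) is defined by the administrator.
    public QueueManager(int queueCount) {
        for (int i = 0; i < queueCount; i++) {
            queues.add(Executors.newSingleThreadExecutor());
        }
    }

    // Adds a Pipeline job to the given queue.
    public void submit(int queueIndex, Runnable pipelineJob) {
        queues.get(queueIndex).submit(pipelineJob);
    }
}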

Job Execution Flow

  • Unless anonymous, the user logs in to the system
  • If the deployment is set up to expose several job types (= Pipeline scripts), the user chooses which Job type to create.
  • The user sets those Job parameters that are set to be visible, and executes the job. The user also has the option to provide a notification email address in addition to the one associated with the current account.
  • The set parameters are validated by the system before job execution. If an obvious error is detected, the system provides feedback for correcting it (see the sketch after this list).
  • A page is displayed that explains to the user that an email notification will occur when the job is done.
  • A Power User has access to a Queue view where progress and queue status can be reviewed in detail.
  • The notification email contains a link to the download. Accessing the download requires login, and the login has to be the same as the one used when creating the job. (A future feature is a dedicated desktop download client that will upload the result to the user's system and delete the server-side copy.)
  • If job execution fails for any reason, the system must know whether the failure was caused by user input or by a system error. The user is informed of the error in the notification email. In the latter case, the system administrator is notified as well.
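
The sketch below only illustrates the ordering described above (validate first, then execute, then notify); all types, method names and the administrator address are hypothetical placeholders.

import java.util.Map;

// Hypothetical sketch of the execution flow: validation before execution,
// and email notification on both success and failure.
public class JobFlow {

    public interface Validator { String validate(Map<String, String> parameters); }
    public interface Notifier  { void send(String emailAddress, String message); }

    private final Validator validator;
    private final Notifier notifier;

    public JobFlow(Validator validator, Notifier notifier) {
        this.validator = validator;
        this.notifier = notifier;
    }

    public void run(Map<String, String> parameters, String userEmail, Runnable pipelineJob) {
        // Validate the user-supplied parameters before execution.
        String error = validator.validate(parameters);
        if (error != null) {
            notifier.send(userEmail, "Job rejected: " + error); // failure caused by user input
            return;
        }
        try {
            pipelineJob.run();
            notifier.send(userEmail, "Job done; log in with the same account to download the result.");
        } catch (RuntimeException e) {
            // System failure: inform both the user and the system administrator.
            notifier.send(userEmail, "Job failed due to a system error.");
            notifier.send("admin@example.org", "Job failed: " + e.getMessage()); // placeholder address
        }
    }
}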

Account types

The application will allow several account types to be defined. An account type defines the set of application features that the user has access to.

The system should allow for rights configuration per account type and user. In the initial deployment, only account type rights configuration will be implemented.

For the initial deployment, these four types will be used and recognized (a sketch of a possible rights model follows the list).

Basic User
Rights to create a new job and run it.
Power User
Inherits the rights of the Basic User account type, but adds the right to reorder jobs in queues, and to select which queue to add a job to.
Anonymous
A deployed application may allow for anonymous access. Anonymous users do not log in. For this type of user, and when conversion services are used, the result page typically contains a link to download the result. The option to have anonymous users provide email addresses for result notification is also supported.
Administrator
Manages passwords and email addresses. Manages script visibility per user group.
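
A minimal sketch of the account types above as a rights model; the right names are illustrative and only mirror the descriptions in the list.

import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch: rights attached to each account type.
enum Right {
    CREATE_JOB, RUN_JOB, REORDER_QUEUE, CHOOSE_QUEUE,
    MANAGE_ACCOUNTS, MANAGE_SCRIPT_VISIBILITY
}

public enum AccountType {

    ANONYMOUS     (EnumSet.of(Right.CREATE_JOB, Right.RUN_JOB)),
    BASIC_USER    (EnumSet.of(Right.CREATE_JOB, Right.RUN_JOB)),
    POWER_USER    (EnumSet.of(Right.CREATE_JOB, Right.RUN_JOB,
                              Right.REORDER_QUEUE, Right.CHOOSE_QUEUE)),
    ADMINISTRATOR (EnumSet.of(Right.MANAGE_ACCOUNTS, Right.MANAGE_SCRIPT_VISIBILITY));

    private final Set<Right> rights;

    AccountType(Set<Right> rights) {
        this.rights = rights;
    }

    public boolean has(Right right) {
        return rights.contains(right);
    }
}

In the initial deployment only per-account-type rights configuration would be in place; per-user overrides could later be layered on top of such a rights set.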

System Persistence

The following objects are persisted:

  • queues (Executed jobs may remain in the queue so that they can be re-executed. An administrator can set the maximum number of executed jobs to keep per queue)
  • user accounts
  • statistics: for each job run:

Global statistics:

  • user that created the job
  • what script was run, with what parameters
  • execution time
  • play time
  • Pipeline message log
  • System error log

Script-specific statistics:

  • A certain script (defined by script name) can have an object associated with it that extracts content-type-specific data from the result. This data is persisted as key-value pairs. For example, this feature could be used to extract time duration and ISBN from a generated DTB (see the sketch below).
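
A sketch of what such a per-script extractor could look like; the interface and method names are hypothetical.

import java.io.File;
import java.util.Map;

// Hypothetical sketch: extracts content-type-specific data from a job result
// and returns it as key-value pairs to be persisted as script-specific statistics.
public interface ResultDataExtractor {

    // The script name this extractor is associated with.
    String getScriptName();

    // For example: "duration" -> "01:23:45", "isbn" -> "<isbn of the DTB>".
    Map<String, String> extract(File jobResult);
}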

Regarding play time: each job's play time must be accumulated per user so that statistics regarding payment of royalties (in Norwegian: vederlag) to LINO are present in the system.
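
A minimal sketch, assuming an in-memory ledger, of how play time could be accumulated per user; the class is hypothetical, and in practice the totals would live in the persistence layer.

import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: accumulates play time (in seconds) per user so that
// royalty (vederlag) statistics can be reported.
public class PlayTimeLedger {

    private final Map<String, Long> playTimeSecondsPerUser = new HashMap<String, Long>();

    // Called after each job run with the play time of the produced DTB.
    public void add(String userId, long playTimeSeconds) {
        Long current = playTimeSecondsPerUser.get(userId);
        playTimeSecondsPerUser.put(userId, (current == null ? 0L : current) + playTimeSeconds);
    }

    public long totalFor(String userId) {
        Long total = playTimeSecondsPerUser.get(userId);
        return total == null ? 0L : total;
    }
}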

Regarding logs: the administrator can set which global statistics to persist, and the granularity of Pipeline messages to persist. The administrator can set the maximum amount of time (in days) that a result will remain on disk.