PAZAR Documentation

Project outline

Algorithms and software for the analysis of transcription-regulating sequences have proliferated. Methods based on phylogenetic footprinting, combinatorial interactions between transcription factors, and genomics data appear regularly. Unfortunately, much of the research in the field continues to re-analyze the same data. To advance this active field, it is critical to gain unrestricted access to a broader range of reference data. Such collections would be useful both for measuring software performance and, most importantly, as fertile ground for investigation. Small collections appear periodically in independent databases, often as by-products of algorithm development projects. For instance, the JASPAR database of binding profiles emerged in this manner.

PAZAR is a software framework for the construction and maintenance of regulatory sequence data annotations. It allows multiple boutique databases to function independently within a larger system (an "information mall"). Our goal is to be the public repository for regulatory data.

Our principles

  1. To be open-access and open-source, with a completely transparent development and data compilation process. In this regard, the PAZAR project is now hosted by SourceForge, where anyone can browse our CVS repository. A mailing list, News and Views, is also available, to which every major development will be posted.
  2. To function as a boutique system where curators own their data and can release it according to their own will.
  3. To be simple to use, both for curating data and for querying the database. For this purpose we are currently developing an advanced API that insulates the user from the underlying data model and provides simple methods for depositing data into or querying the database.

System architecture

The PAZAR system is currently implemented as a MySQL database with a complex schema that allows a high level of flexibility in the type of information that can be captured. The database dictionary and an explanation of the IO system describe some of the database constraints and its internal structure.

To ease the insertion of data into the database, we are developing two curation interfaces, one of which allows the curator to capture a higher level of detail than the other. We have also designed an XML exchange format that can be used to format existing datasets.

Because PAZAR is an open system, each boutique operator is welcome to participate in further API development and to create and maintain their own annotation interfaces. Two forums are also available: one for asking for help (the PAZAR SourceForge help forum) and one for comments and suggestions (the PAZAR SourceForge open discussion forum).

License

The PAZAR code is available under the GNU Lesser General Public License (LGPL). PAZAR data in "public" or "open" data collections are available under the GNU LGPL, while data in "private" collections are the property of the curators of those collections and may be used only with their explicit permission. Only "public" and "open" collections can be accessed by anonymous users.
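
The access rule stated above can be pictured with a small sketch. This is only an illustration under stated assumptions, not PAZAR's actual code or schema: the Collection class, the permitted_users field and the can_access function are hypothetical names introduced here, and only the three status labels ("public", "open", "private") come from the text.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Status labels taken from the license text; everything else is hypothetical.
ANONYMOUS_STATUSES = {"public", "open"}


@dataclass
class Collection:
    """Illustrative stand-in for a data collection (not the real PAZAR schema)."""
    name: str
    status: str  # "public", "open", or "private"
    permitted_users: Set[str] = field(default_factory=set)


def can_access(collection: Collection, user: Optional[str] = None) -> bool:
    """Return True if `user` (None means an anonymous user) may read `collection`."""
    if collection.status in ANONYMOUS_STATUSES:
        # Public and open collections are readable by anyone, logged in or not.
        return True
    if collection.status == "private":
        # Private data requires permission explicitly granted by its curators.
        return user is not None and user in collection.permitted_users
    raise ValueError(f"unknown collection status: {collection.status!r}")


# Example with made-up collection and user names.
public_set = Collection("demo_public", "public")
private_set = Collection("demo_private", "private", {"alice"})
print(can_access(public_set))            # anonymous user, public collection
print(can_access(private_set))           # anonymous user, private collection
print(can_access(private_set, "alice"))  # user with explicit permission
```

Under this reading, an anonymous visitor sees only "public" and "open" collections, while a "private" collection stays hidden unless its curators have listed the user.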

Publications and resources

Please use the citation information below when referring to PAZAR in publications.

Portales-Casamar E, Arenillas D, Lim J, Swanson MI, Jiang S, McCallum A, Kirov S, Wasserman WW. The PAZAR database of gene regulatory information coupled to the ORCA toolkit for the study of regulatory sequences. Nucleic Acids Res. 37(Database issue):D54-60. (2009)
Portales-Casamar E, Kirov S, Lim J, Lithwick S, Swanson MI, Ticoll A, Snoddy J, Wasserman WW. PAZAR: a framework for collection and dissemination of cis-regulatory sequence annotation. Genome Biol. 8:R207. (2007)

Download our poster