PAZAR - Project Outline
Introduction:
Algorithms and software for the analysis of transcription-regulating
sequences have proliferated. Methods based on phylogenetic footprinting,
combinatorial interactions between transcription factors, and genomics
data appear regularly. Unfortunately, much of the research in the field
continues to re-analyze the same data. In order to advance this active
field, it is critical to gain unrestricted access to a broader range of
reference data. Such collections would be useful both for the
measurement of software performance and, most importantly, as fertile
ground for investigation. Small collections appear periodically in
independent databases, often as by-products of algorithm development
projects. For instance, the JASPAR database of binding profiles emerged
in this manner.
PAZAR is a software framework for the construction and maintenance of
regulatory sequence data annotations. The framework allows multiple
boutique databases to function independently within a larger system (or
information mall). Our goal is to be the public repository for
regulatory data.
PAZAR's principles:
(1) to be OPEN-ACCESS and OPEN-SOURCE, with completely transparent
development and data compilation. To this end, the PAZAR project is now
hosted on sourceforge.net, where anyone can browse the CVS repository. A
'News and Views' mailing list is also available, to which every major
development will be posted.
(2) to function as a boutique system in which curators own their data and can release it at their own discretion.
(3) to be simple to use, both for curation and for querying the
database. For this purpose we are currently developing an advanced API
that insulates the user from the underlying data model and provides
simple methods for depositing data into, or querying, the database.
Overview:
The PAZAR system is currently developed as a MySQL database featuring a
complex schema that allows a high level of flexibility in the type of
information that can be captured. The database dictionary and an
explanation of the input/output system can help you learn about some of
the database constraints and the internal structure.
To ease the insertion of data into the database, we are developing two
curation interfaces, one of which lets the curator capture a higher
level of detail than the other. We have also designed an XML exchange
format that can be used to format existing datasets.
Because PAZAR is an OPEN SYSTEM, each boutique operator is welcome to
participate in further API development and to create and maintain their
own annotation interfaces. Two forums are also available: a 'Help' forum
for asking questions and an 'Open Discussion' forum for comments and
suggestions.
License:
The PAZAR code is available under the GNU Lesser General Public License (LGPL). The PAZAR data in "public" or "open" data collections are available under the GNU LGPL, while data in "private" collections are the property of the curators of those collections and may be used only with their explicit permission. Only "public" and "open" collections can be accessed by anonymous users.
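The access rule stated above can be sketched in a few lines of Python. This is only an illustration of the rule as worded in the license text, not PAZAR's actual implementation: the Collection class, the can_access function, and the permitted_users field are hypothetical names introduced here, and the sketch assumes that "public" and "open" collections are treated identically for read access.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Status labels taken from the license text; treating "public" and "open"
# the same way for read access is an assumption of this sketch.
ANONYMOUS_READABLE = {"public", "open"}


@dataclass
class Collection:
    """Hypothetical stand-in for a boutique data collection."""
    name: str
    status: str  # "public", "open", or "private"
    # Users to whom the curators have given explicit permission (assumed model).
    permitted_users: Set[str] = field(default_factory=set)


def can_access(collection: Collection, user: Optional[str] = None) -> bool:
    """Return True if `user` may read the collection.

    `user` is None for an anonymous visitor.
    """
    if collection.status in ANONYMOUS_READABLE:
        return True
    # Private data: readable only by users with explicit permission.
    return user is not None and user in collection.permitted_users


# Hypothetical example collections.
public_demo = Collection("demo_public", "public")
private_demo = Collection("demo_private", "private", {"alice"})

print(can_access(public_demo))            # anonymous visitor, public collection
print(can_access(private_demo))           # anonymous visitor, private collection
print(can_access(private_demo, "alice"))  # user with explicit permission
```

Keeping the check in a single function means an interface built on top of it never needs to know how collection status is stored, which is in the spirit of the API goal described under principle (3).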
Publication:
Please use the citation information below when referring to PAZAR in a publication.
Portales-Casamar E, Kirov S, Lim J, Lithwick S, Swanson MI, Ticoll A, Snoddy J, Wasserman WW. PAZAR: a Framework for Collection and Dissemination of Cis-regulatory Sequence Annotation. Genome Biology 2007, 8, R207.