PAZAR Documentation

Help

What is PAZAR?

PAZAR is a software framework for constructing and maintaining annotations of regulatory sequence data. It allows multiple boutique databases to function independently within a larger system (an information mall). For more information, see the Overview section.

Tutorials

Definitions

  • Transcription factor complex: In PAZAR, all transcription factors (TFs) are defined as complexes, each complex being composed of one or more individual proteins. This allows users to define different binding specificities for the same TF protein, depending on whether it acts as a monomer (one protein), a homodimer (two identical proteins bound together), a heterodimer (two different proteins bound together), etc. Thus, when submitting a new TF to PAZAR, the annotator is first asked to name the complex (the name should reflect all proteins present in the complex). Each protein in the complex (also called a subunit) is then defined in turn by providing at least its gene identifier and clicking on "Add more TFs to this complex". If the complex consists of a single TF or subunit, no further TFs need to be added.
  • Position frequency matrix and sequence logo generation: Based on an alignment of all known sites, the total number of observations of each nucleotide is recorded for each position, producing a Position Frequency Matrix (PFM). The sequence logo scales each nucleotide by the total bits of information at the position multiplied by the relative occurrence of the nucleotide at that position. Sequence logos enable fast and intuitive visual assessment of pattern characteristics.
    In PAZAR, PFMs and Logos are produced by using the probabilistic motif discovery algorithm MEME (see reference below).
    Timothy L. Bailey and Charles Elkan. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, pp. 28-36, AAAI Press, Menlo Park, California, 1994.
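As an illustration of the PFM and sequence logo definitions above, the following Python sketch counts nucleotides in each column of a set of pre-aligned sites, then scales each nucleotide by its relative frequency multiplied by the information content of the column. This is not how PAZAR computes its matrices (PAZAR relies on MEME). The function names `build_pfm` and `logo_heights` are hypothetical, and the information content is assumed to be 2 bits minus the Shannon entropy of the column, with no small-sample correction.

```python
import math

BASES = "ACGT"

def build_pfm(sites):
    """Count each nucleotide at each position of equal-length, pre-aligned sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all aligned sites must have the same length")
    pfm = {base: [0] * length for base in BASES}
    for site in sites:
        for pos, base in enumerate(site.upper()):
            if base not in pfm:
                raise ValueError(f"unexpected character {base!r} in site {site!r}")
            pfm[base][pos] += 1
    return pfm

def logo_heights(pfm):
    """Return, per position, the logo height in bits of each nucleotide."""
    length = len(pfm["A"])
    heights = []
    for pos in range(length):
        total = sum(pfm[base][pos] for base in BASES)
        freqs = {base: pfm[base][pos] / total for base in BASES}
        entropy = -sum(f * math.log2(f) for f in freqs.values() if f > 0)
        information = 2.0 - entropy  # assumption: no small-sample correction
        heights.append({base: freqs[base] * information for base in BASES})
    return heights

# Toy aligned sites, invented for this example only.
sites = ["ACGT", "ACGA", "ACTT", "AAGT"]
pfm = build_pfm(sites)
heights = logo_heights(pfm)
```

In this toy alignment the first column contains only A, so A receives the full 2 bits there, while mixed columns carry less information and their letters are drawn shorter.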

Search interface

1. Overview

Four general query types are available in PAZAR. Users can search by TF, target gene, regulatory sequence, or pre-computed TF profile by clicking on the corresponding tab. Each of these queries considers all of the public data within PAZAR. To query a specific boutique dataset instead, click on the 'Browse projects' link and then on the name of the dataset.

A graphical user interface, the PAZAR Mall, is also available by clicking on 'View interactive mall map'. Boutique datasets within PAZAR are represented by stores within the mall. The mall has six separate floors that are accessible via the escalator. Boutique datasets are listed in the mall directory found at the bottom of the page.

View the mall overview and introduction tutorial (2 minutes)

2. Search by Transcription Factor

2.1. Introduction

To search PAZAR by TF, click on the 'Transcription factors (TFs)' tab. Users can search for a specific TF within all of PAZAR based upon several TF-specific identifiers.

View the TF search tutorial (3:02 minutes)

2.2. TF identifiers

  • User-defined TF name

    • e.g. NF1
    • TF name as defined by the user. A controlled vocabulary will soon replace this free text. The results display all entries whose name contains the characters provided.
  • Ensembl gene ID

    • e.g. ENSG00000162599
    • Ensembl stable gene ID. This ID will be converted to the corresponding Ensembl transcript IDs first.
  • Ensembl transcript ID

    • e.g. ENST00000294608
    • Ensembl stable transcript ID. This is the reference ID for TFs in PAZAR and is therefore the preferred one to use.
  • Entrez Gene ID

    • e.g. 4774
    • NCBI Entrez Gene ID. This ID will be converted to an Ensembl gene ID first.
  • Refseq ID

    • e.g. NM_005595
    • RefSeq DNA ID. Do not include the version suffix (enter NM_005595, not NM_005595.1). This ID will be converted to an Ensembl gene ID first.
  • Swissprot ID

    • e.g. Q12857
    • UniProtKB/Swiss-Prot ID. This ID will be converted to an Ensembl gene ID first.
  • PAZAR TF ID

    • e.g. TF0000231
    • PAZAR TF IDs are unique to a project. Therefore, the same TF (same Ensembl gene ID) will have different PAZAR TF IDs if annotated in different projects. Use Ensembl gene IDs to get all data about a TF across projects.
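To make the identifier types above concrete, here is a minimal Python sketch that guesses which kind of identifier a query string is and strips a RefSeq version suffix. The patterns are assumptions: the Ensembl and UniProt patterns follow those resources' published accession formats, while the PAZAR TF ID pattern (`TF` plus seven digits) is inferred only from the example above. The names `ID_PATTERNS`, `classify_identifier` and `strip_refseq_version` are hypothetical and are not part of PAZAR.

```python
import re

# Assumed patterns; PAZAR's own validation rules are not documented here.
ID_PATTERNS = [
    ("Ensembl gene ID", re.compile(r"ENS[A-Z]*G\d{11}")),
    ("Ensembl transcript ID", re.compile(r"ENS[A-Z]*T\d{11}")),
    ("PAZAR TF ID", re.compile(r"TF\d{7}")),  # inferred from TF0000231
    ("Refseq ID", re.compile(r"[A-Z]{2}_\d+")),  # unversioned accession
    ("Swissprot ID", re.compile(
        r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}")),
    ("Entrez Gene ID", re.compile(r"\d+")),
]

def strip_refseq_version(refseq_id):
    """Drop a trailing version such as '.1' from a RefSeq accession."""
    return refseq_id.split(".", 1)[0]

def classify_identifier(text):
    """Return the first identifier type whose pattern matches the whole string."""
    value = text.strip()
    for label, pattern in ID_PATTERNS:
        if pattern.fullmatch(value):
            return label
    return "User-defined TF name"  # free text falls through to a name search

for example in ["NF1", "ENSG00000162599", "ENST00000294608", "4774",
                strip_refseq_version("NM_005595.1"), "Q12857", "TF0000231"]:
    print(example, "->", classify_identifier(example))
```

A versioned accession such as NM_005595.1 matches none of the patterns, which is why the sketch strips the version before classifying.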

2.3. TF view

At the top of the TF View is a summary table of all of the TFs returned by the search. Clicking on the magnifying glass next to a PAZAR TF ID takes users directly to the data for that TF: the TF-specific information, followed by a list of all of the PAZAR regulatory sequences bound by that TF. Users can visualize the genomic context of each regulatory sequence by clicking on the links to the UCSC Genome Browser and Ensembl found at the far right of the page. Clicking on a regulatory sequence ID or a gene ID opens the PAZAR Sequence View or Gene View, respectively.

In addition, a position frequency matrix (PFM) and a transcription factor binding profile are generated dynamically for each transcription factor using the MEME software. Users can construct a custom PFM and binding profile from a subset of the sequences for that TF by ticking the check boxes of the sequences to include and clicking 'Generate PFM with selected sequences'. Alternatively, users can restrict the PFM and binding profile to genomic or artificial sequences by clicking on 'Select genomic sequences' or 'Select artificial sequences', respectively. Users can also generate a custom PFM and binding profile from selected sequences of any of the transcription factors displayed on the page by clicking 'Generate PFM' at the very bottom of the page.

3. Search by gene

3.1. Introduction

In order to search PAZAR by gene, click on the 'Target genes' tab. Users can search for a specific gene within all of PAZAR based upon several gene-specific identifiers.

View the gene search tutorial (2:27 minutes)

3.2. Gene identifiers

  • User-defined gene name

    • e.g. GFAP
    • Gene symbol as defined by the user. Official symbols are not applied automatically because they vary across species. The results display all entries whose name contains the characters provided.
  • Ensembl gene ID

    • e.g. ENSG00000131095
    • Ensembl stable gene ID. This is the reference ID for genes in PAZAR and is therefore the preferred one to use.
  • Ensembl transcript ID

    • e.g. ENST00000253408
    • Ensembl stable transcript ID. This ID will be converted to an Ensembl gene ID first.
  • Entrez Gene ID

    • e.g. 2670
    • NCBI Entrez Gene ID. This ID will be converted to an Ensembl gene ID first.
  • Refseq ID

    • e.g. NM_002055
    • RefSeq DNA ID. Do not include the version suffix (enter NM_002055, not NM_002055.2). This ID will be converted to an Ensembl gene ID first.
  • Swissprot ID

    • e.g. Q9UFD0
    • UniProtKB/Swiss-Prot ID. This ID will be converted to an Ensembl gene ID first.
  • PAZAR gene ID

    • e.g. GS0000217
    • PAZAR gene IDs are unique to a project. Therefore, the same gene (same Ensembl gene ID) will have different PAZAR gene IDs if annotated in different projects. Use Ensembl gene IDs to get all data about a gene across projects.

3.3. Gene View

The Gene View is color-coded blue. At the top of the Gene View page is a summary table of all of the genes returned by the search. Clicking on the magnifying glass next to a PAZAR gene ID takes users directly to the data for that gene: the gene-specific information, followed by a list of all of the PAZAR regulatory sequences that correspond to that gene. Users can visualize the genomic context of each regulatory sequence by clicking on the links to the UCSC Genome Browser and Ensembl found at the far right of the page. Clicking on a regulatory sequence ID, found in the far left column, opens the PAZAR Sequence View for that sequence.

3.4. Sequence View

The Sequence View is color-coded orange. The sequence and gene information are located at the top of the page followed by tables summarizing the supporting experimental data for this regulatory sequence. Clicking on the Analysis ID found in the leftmost column of this table takes users to the PAZAR Analysis View.

3.5. Analysis View

The Analysis View is color-coded green. Within this view is a more in-depth description of the supporting experimental data.

4. Search by sequence

4.1. Introduction

In order to search PAZAR by sequence, click on the 'Regulatory sequences' tab. Users can enter a sequence string (containing only A, C, G or T). Alternatively, they can provide a PAZAR sequence ID, which directly opens the Sequence View for that sequence.

4.2. Sequence identifier

  • PAZAR sequence ID

    • e.g. RS0000226
    • Providing a PAZAR Sequence ID will directly open the Sequence View for this particular sequence.
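The two kinds of input accepted by the sequence search can be told apart with a simple check, sketched below in Python. This is an illustration only: the function name `parse_sequence_query` is hypothetical, the ID pattern (`RS` plus seven digits) is inferred from the single example above, and ignoring whitespace and letter case is an assumption about convenient input handling, not documented PAZAR behaviour.

```python
import re

def parse_sequence_query(query):
    """Classify a query as a PAZAR sequence ID or a plain DNA string.

    Returns ("id", value) or ("sequence", value); raises ValueError otherwise.
    """
    value = "".join(query.split()).upper()  # assumption: ignore whitespace and case
    if re.fullmatch(r"RS\d{7}", value):  # pattern inferred from RS0000226
        return ("id", value)
    if value and set(value) <= set("ACGT"):
        return ("sequence", value)
    raise ValueError("query must be a PAZAR sequence ID or contain only A, C, G or T")

print(parse_sequence_query("RS0000226"))
print(parse_sequence_query("acgt tgca"))
```

Ambiguity codes such as N are rejected here because the search accepts only A, C, G or T.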

5. Search by pre-computed transcription factor binding profile

To search PAZAR by pre-computed transcription factor binding profile, click on the 'pre-computed TF profiles' tab. Users can retrieve TF binding profiles sorted by their associated project, name, or species by clicking on the corresponding buttons. The PAZAR TF Binding Profile view provides a summary table with specific data for each transcription factor. Clicking 'More', at the right-hand side of the screen, opens a secondary window with more detailed information about that transcription factor.

View the TF profile search tutorial (0:54 minutes)

6. Search within a specific boutique project

To limit queries to a single collection, click on 'Browse projects' and then on the corresponding boutique. The 'Project View' provides a brief description of the dataset as well as some statistics on the data it contains. Below these, the user can choose among various filters to search through the data and display it in the 'Gene View', where regulatory sequences are grouped by the genes they regulate, or in the 'TF View', where the sequences are grouped by the TFs that bind to them.

View the boutique search tutorial (1:02 minutes)

PAZAR submission interface

1. Introduction

To enter data into PAZAR, follow these steps:

  • Register at the register page.
  • Click on my projects to see all the projects you belong to and to create new ones.
  • Click on submit to enter new data. For more detailed questions on the submission interface, see the FAQ topics section below.
  • For a pre-existing dataset, an automated data import can be arranged by contacting the PAZAR development team.

2. Submission interface screenshots

3. Frequently asked questions

Sequence retrieval

  • Q: if two transcripts varying by only 1 or 2 bases could potentially be used for a given PAZAR record, does it matter which is chosen? A: Either can be used for the PAZAR record with the inclusion of a comment if necessary.
  • Q: how can restriction fragment data be used to isolate a DNA sequence referenced in a paper? A: download a portion of the genomic sequence which encompasses the restriction sites described in the paper. Then, conduct a restriction fragment analysis of the sequence, and see if the restriction map matches the description given in the paper.

Sequence entry

  • Q: if the same sequence is found in 2 or more species, should each be given a separate entry in PAZAR? A: yes, create a separate record for each species.
  • Q: if an identical genomic sequence is used in "identical" transient transfection expression assays in 2 or more papers, can mutants of that sequence from both papers be submitted to PAZAR as a part of a single experimental assay? A: definitely not. One cannot compare and combine experimental data from separate papers in a single assay. Even if the design of the experiment is the same between papers, it was performed using different cells, reporters, conditions, etc. As a result, there is no way that the expression level of mutants from separate papers can be compared to that of the wild-type sequence in a single experimental assay. Instead, create a separate experimental assay for each of the papers, associated with the shared wild-type sequence.
  • Q: how can main page data be saved for a genomic sequence with no experimental evidence? A: by clicking the "Done" button at the bottom of the main page, all data entered will be saved to PAZAR. This would otherwise occur automatically upon opening an "Experimental Evidence" window.
  • Q: what nomenclature should be used when entering TFs from different species? A: enter the TF name as follows: Species_TFName (e.g. Mouse_Phox2a, Human_Phox2a).
  • Q: should elements within the 3' UTR of a gene be entered into PAZAR? A: definitely. We are interested in any regulatory elements whether they are upstream or downstream of a gene.
  • Q: what could be the problem if PAZAR does not permit a certain sequence name to be used? A: there are certain characters that are not recognized by PAZAR, such as the single quote ('). By selecting a name without such characters, problems will be averted.
  • Q: how are complexes named within PAZAR? A: enter complex names as follows: Species_Protein1/Protein2/etc. (e.g. HUMAN_RXR/RAR). If a complex has a specific name other than the simple combination of its components, use Species_specificcomplexname.
  • Q: can insertion mutations be documented in PAZAR? A: not yet. This is a feature that will be incorporated into the PAZAR submission interface in the near future.
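The naming conventions described in the answers above can be sketched as a few small Python helpers. The helper names (`check_name`, `tf_name`, `complex_name`) are hypothetical and are not part of PAZAR. The set of rejected characters contains only the single quote, because that is the only unsupported character the FAQ names.

```python
# Only the single quote is named in the FAQ; other unsupported characters are unknown.
UNSUPPORTED_CHARS = {"'"}

def check_name(name):
    """Raise ValueError if the name contains a character reported as unsupported."""
    bad = sorted(set(name) & UNSUPPORTED_CHARS)
    if bad:
        raise ValueError(f"unsupported character(s) in name: {bad}")
    return name

def tf_name(species, tf):
    """Build a TF name in the form Species_TFName."""
    return check_name(species + "_" + tf)

def complex_name(species, subunits=None, specific_name=None):
    """Build Species_Protein1/Protein2/... or Species_specificcomplexname."""
    if specific_name:
        return check_name(species + "_" + specific_name)
    if not subunits:
        raise ValueError("provide at least one subunit or a specific complex name")
    return check_name(species + "_" + "/".join(subunits))

print(tf_name("Mouse", "Phox2a"))
print(complex_name("HUMAN", ["RXR", "RAR"]))
```

Checking a name locally before submission catches the single-quote problem early, although PAZAR itself remains the authority on which names it accepts.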

Experimental nomenclature

  • Q: what should be used as the point of reference when describing the expression level of sequence mutants? A: changes in expression associated with sequence mutants should be expressed relative to the expression of the wild-type sequence.
  • Q: can drug treatments be documented in PAZAR? A: for expression assays in which a wild-type sequence is tested for levels of expression in the presence or absence of a chemical compound, transcription factor, etc., the drug should be included in the record as a perturbation. In contrast, for all DNA-binding assays, drug treatments or transcription factor co-expression should be described in the comment field.
  • Q: what is signified by the presence of "NA" in the effects column of the gene summary? A: the presence of "NA" in the effects column of the gene summary suggests that the qualitative effect of the experimental evidence was not defined in the supporting publication. This option is often used when submitting transgenic mouse data to PAZAR. In such a case, the primary outcome examined is whether a given construct has been able to reconstitute wild-type patterns of expression. Nothing however can be said regarding the levels of expression present in the mice, making it necessary to use "NA".
  • Q: if there are multiple cell lines/cell types used for the same experiment, which should be submitted to PAZAR? A: if the results are the same for each cell line, only the most relevant cell line (e.g. neuronal cell lines) should be explicitly selected for the PAZAR submission. Any other cell lines that are deemed relevant can be included in the comments section. If results differ between cell lines, separate experimental assays should be submitted to PAZAR for each cell line associated with informative data.
  • Q: what are potential choices for the "Sample Type" field on the nuclear extract page? A: currently choices for sample type include nuclear extract, cellular extract, or even whole cell extract if applicable.
  • Q: if a supershift experiment is performed using a nuclear extract, should the factor to which the antibody binds be recorded as a "TF/complex" or as an "Interaction with Unknown Factor (ie. nuclear extract)"? A: this type of experiment does not prove that a TF is interacting directly with a cis-regulatory element (CRE). It could be interacting with the CRE via any other protein from the nuclear extract. However, in the interest of linking the CRE to this TF within PAZAR, consider it to be a "TF/complex binding to this CRE", and make sure to mention in the comments section of the record that the protein was from a nuclear extract.
  • Q: what should be entered for a transcription factor name if, in a supershift assay, a paper states that an antibody recognized a protein family and not just a single protein? A: enter the most common member of the protein family as the transcription factor, but include in the comments that the antibody was not specific to that protein and instead recognized the protein family in general.
  • Q: how should an experiment with a perturbation (TF, chemical, etc.) be submitted to PAZAR if there are no results provided for the experiment in the absence of the perturbation? A: on the main experimental assay page, select "NA" for the wild-type expression level in the absence of perturbation. Then, enter the perturbation with its associated level of expression. Add mutants in a similar fashion.
  • Q: what should be the point of reference used for describing the level of expression associated with a mutant subject to a perturbation? A: describe the expression level of the mutant with perturbation relative to the expression level of the wild-type with perturbation.
  • Q: how do we qualitatively interpret the interaction level for gel shift competition experiments? A: a probe successfully able to eliminate a band shift involving wild-type probe is considered to be a good interactor. If the probe (wild-type or mutant) is not able to compete away the initial interaction, it is considered to be a poor interactor.
  • Q: when entering a mutation that leads to a complete elimination of binding, what should be indicated for the "Effect of this mutation on the interaction"? A: in this situation "None" should be chosen for the level of interaction. Do note that in this context, "None" means no binding, not "no effect on binding".
  • Q: in the annotation of a transcription factor that regulates a Pleiades Promoter Project gene, should experiments demonstrating a role in transcriptional regulation (e.g. coexpression of the TF leads to transactivation) be submitted to PAZAR? A: this data should definitely be included as a perturbation in a PAZAR submission. Even though this information cannot currently be viewed from the TF summary page, it is important to have this supporting evidence. The summary view will be modified in the future to include this type of information. Even if coexpression of a TF leads to repression of gene expression, the data should be submitted to PAZAR.
  • Q: how should a "supershift" experiment in which incubation with antibody leads to the disappearance of a band be entered into PAZAR (i.e. interfering with binding instead of retarding mobility)? A: put the method in as a supershift, but in the comments also mention that the band did not shift to lower mobility but instead disappeared.
  • Q: how should one verify whether a transcription factor (TF) is already present in PAZAR prior to submission? A: prior to submitting a new TF to PAZAR, conduct a search within PAZAR using the Ensembl ID for that TF. If the TF is present, it will be linked to that Ensembl ID and will be retrieved. Also, by convention use the HUGO gene name for all human TFs, or for other species, the Entrez Gene ID.

Submission interface

  • Q: the same line of evidence appears twice for a given sequence submitted to PAZAR. What could have caused this problem? A: this is what results from clicking "submit" twice on the evidence submission page.
  • Q: where can mutation information be submitted within the "interaction evidence with unknown factor" page? A: once the "interaction evidence with unknown factor" page is filled out and the submit button pressed, an option to add mutants is provided.
  • Q: how should a TF complex including a TF already present in PAZAR be submitted? A: the information for this TF should be entered again, even though it is already present in PAZAR, because complex records exist independently of single TF records within PAZAR.