PAZAR Documentation

PAZAR GFF Format

Overview

The PAZAR GFF format is intended to capture simple annotations, not detailed ones. Please use the XML format if you need more options. Each record occupies a single line and holds one annotation for one sequence. The annotation comes from either an interaction experiment or an expression experiment, never both in the same record.

The record is stored as an interaction if db_tfinfo is provided. If you want to record an interaction but the factor is unknown, state:

db_tfinfo="unknown"

If db_tfinfo is not provided, the record is stored as an expression. By default, an interaction is stored as "good" and an expression as "induced". To record a specific expression level, use the expression attribute. If a mutant sequence is reported, the mutant is assumed to have impaired activity compared to the annotation of the original sequence (interaction = "none" or expression = "no change").
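
The storage rules above can be sketched in Python. This is only an illustration of the rules as stated on this page, not PAZAR's own code; the function names and the use of a plain dictionary for the attributes are assumptions.

```python
def classify_record(attributes):
    """Return (record_type, stored_value) for a dictionary of attributes."""
    if "db_tfinfo" in attributes:
        # db_tfinfo present (even as "unknown"): interaction, stored as "good".
        return ("interaction", "good")
    # No db_tfinfo: expression, using the explicit level when one is given.
    return ("expression", attributes.get("expression", "induced"))


def classify_mutant(attributes):
    """Return the value assumed for an impaired mutant, or None if none is reported."""
    if "impaired_mutant" not in attributes:
        return None
    # The mutant is assumed to have lost the activity of the original sequence.
    return "none" if "db_tfinfo" in attributes else "no change"
```

For instance, a record with db_tfinfo="unknown" is classified as an interaction, and a record with neither db_tfinfo nor expression is classified as an expression stored as "induced".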

Structure of format

Fields are: <seqname>  <source>  <feature>  <start>  <end>  <score>  <strand>  <frame>  [attributes]

These nine fields are tab-delimited. The attributes field holds tag-value pairs, written on one line and separated by semicolons, with the following syntax:

tag1="value1";tag2="value2"
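
As an illustration of this layout, the following Python sketch splits one record into its eight leading fields and a dictionary of attributes. The names parse_record and parse_attributes are hypothetical, and the sketch assumes that attribute values never contain a semicolon or a double quote, which this page does not state.

```python
import re

FIELD_NAMES = ["seqname", "source", "feature", "start", "end",
               "score", "strand", "frame"]

# One tag="value" pair, with optional surrounding whitespace.
ATTRIBUTE_RE = re.compile(r'\s*([A-Za-z_]\w*)\s*=\s*"([^"]*)"\s*')


def parse_attributes(text):
    """Turn 'tag1="value1";tag2="value2"' into a dictionary."""
    attributes = {}
    for chunk in text.split(";"):
        if not chunk.strip():
            continue  # tolerate a trailing semicolon
        match = ATTRIBUTE_RE.fullmatch(chunk)
        if match is None:
            raise ValueError(f"malformed attribute: {chunk!r}")
        attributes[match.group(1)] = match.group(2)
    return attributes


def parse_record(line):
    """Split one tab-delimited record into fields plus parsed attributes."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 9:
        raise ValueError(f"expected 9 tab-delimited fields, found {len(columns)}")
    record = dict(zip(FIELD_NAMES, columns[:8]))
    record["attributes"] = parse_attributes(columns[8])
    return record
```

All field values are kept as strings, so start and end still need converting to integers before any arithmetic.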

Mandatory fields (except projects containing artificial sequences)

  • seqname - the name of the sequence; must be a chromosome
  • source - the project to store this feature in
  • feature - the name of this feature or enter a single period (".")
  • start - the starting position of the feature in the sequence (the first base is numbered 1)
  • end - the ending position of the feature (inclusive)
  • score - not needed here; if there is no score value, enter a single period (".")
  • strand - valid entries are a plus sign ("+"), a minus sign ("-"), or a period (".") if the strand is unknown or irrelevant
  • frame - not needed here; the value should be a single period (".")

Mandatory attributes (except projects containing artificial sequences)

  • sequence

    • example —
      sequence="ATTTGTAGGAGTGAGTCAGCTGACCCGC";
  • db_seqinfo

    • format — database:assembly
    • example —
      db_seqinfo="EnsEMBL:NCBI 35";
  • species

    • example —
      species="Homo sapiens";
  • db_geneinfo

    • format — database:accession:name
    • example —
      db_geneinfo="EnsEMBL_gene:ENSG00000133256:PDE6B";
    • note — the database must be one of "EnsEMBL_gene", "Entrez_gene", "RefSeq", or "SwissProt". The last part (gene name) is optional.
* Note: for projects containing artificial sequences, some or all mandatory fields may not have values. In addition, species and db_geneinfo attributes may not be present.
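
The mandatory fields and attributes listed above lend themselves to a simple pre-submission check. The Python sketch below is one possible way to do it, for projects that do not contain artificial sequences. The function name find_problems and its messages are hypothetical, and the rule that end must not precede start is an assumption taken from general GFF usage rather than from this page.

```python
MANDATORY_ATTRIBUTES = ("sequence", "db_seqinfo", "species", "db_geneinfo")
GENE_DATABASES = {"EnsEMBL_gene", "Entrez_gene", "RefSeq", "SwissProt"}


def find_problems(fields, attributes):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    try:
        start, end = int(fields["start"]), int(fields["end"])
    except (KeyError, ValueError):
        problems.append("start and end must be integers")
    else:
        if start < 1:
            problems.append("start must be >= 1 (the first base is numbered 1)")
        if end < start:
            # Assumption: not stated on this page, but usual for GFF.
            problems.append("end must not precede start")
    if fields.get("strand") not in {"+", "-", "."}:
        problems.append('strand must be "+", "-" or "."')
    if fields.get("frame") != ".":
        problems.append('frame must be "."')
    for name in MANDATORY_ATTRIBUTES:
        if name not in attributes:
            problems.append(f"missing mandatory attribute: {name}")
    if "db_geneinfo" in attributes:
        # Format is database:accession:name, where the name is optional.
        parts = attributes["db_geneinfo"].split(":")
        if len(parts) not in (2, 3):
            problems.append("db_geneinfo must be database:accession[:name]")
        elif parts[0] not in GENE_DATABASES:
            problems.append(f"db_geneinfo database not recognised: {parts[0]}")
    return problems
```

Both arguments are plain dictionaries of strings, for example the eight leading fields of a record and its attribute tag-value pairs.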

Optional attributes

  • band

    • example —
      band="16.3";
  • db_transcriptinfo

    • format — database:accession:name
    • example —
      db_transcriptinfo="EnsEMBL_transcript:ENST00000255622";
    • note — the database must be one of "EnsEMBL_transcript", "RefSeq", or "SwissProt". The last part (isoform name) is optional.
  • transcript_start

    • example —
      transcript_start="609373";
  • analysis_name

    • example —
      analysis_name="gff_example1";
    • note — please ensure that you provide the same analysis name to all records belonging to the same experiment.
  • analysis_comment

    • example —
      analysis_comment="some comment on the experiment";
    • note — please ensure that you provide the same analysis comment to all records belonging to the same experiment.
  • db_tfinfo

    • format — database:accession:name
    • example —
      db_tfinfo="EnsEMBL_transcript:ENST00000250471:NRL";
    • note — the database must be one of "EnsEMBL_transcript", "RefSeq", or "SwissProt". The record is stored as an interaction if db_tfinfo is provided. If you want to record an interaction but the factor is unknown, state db_tfinfo="unknown". If db_tfinfo is not provided, the record is stored as an expression.
  • method

    • example —
      method="SELEX";
  • evidence

    • example —
      evidence="curated";
    • note — the evidence should be either "curated" or "prediction".
  • pmid

    • example —
      pmid="15264535";
  • cell_type

    • format — cell:species
    • example —
      cell_type="HepG2:Homo sapiens";
  • expression

    • format — level:scale
    • example —
      expression="56:percent";
    • note — use this field if you want to record a specific expression level. If not used, an expression experiment is stored as "induced".
  • impaired_mutant

    • format — sequence
    • example —
      impaired_mutant="gactactgatgGtaacNagtcga";
    • note — write the sequence in lowercase where it matches the original sequence and in uppercase for the mutated nucleotides. If the mutation is a deletion, use "N" where the original nucleotides were.
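
To illustrate the impaired_mutant convention, here is a small Python sketch that builds the value from an original sequence and a mutant sequence of the same length. The function name is hypothetical. Marking deleted positions with "-" in the input, and writing one "N" per deleted position, are assumptions made for this example; insertions are not handled.

```python
def encode_impaired_mutant(original, mutant):
    """Encode a mutant sequence relative to the original sequence.

    Unchanged positions are written in lowercase, substituted positions
    in uppercase, and positions marked "-" in the mutant as "N".
    """
    if len(original) != len(mutant):
        raise ValueError("original and mutant must have the same length")
    encoded = []
    for ref, alt in zip(original.lower(), mutant.lower()):
        if alt == "-":
            encoded.append("N")          # deleted position
        elif alt == ref:
            encoded.append(ref)          # unchanged position
        else:
            encoded.append(alt.upper())  # substituted position
    return "".join(encoded)
```

With the original sequence "gcatcaagaacatgtggttctaatgg" and the mutant "gcatcaagaacattaggttctaatgg", the function returns "gcatcaagaacatTAggttctaatgg".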

Examples

Interaction example

chr7 oreganno_example OREG0000056 99037559 99037584 . - . sequence="gcatcaagaacatgtggttctaatgg"; db_seqinfo="EnsEMBL:NCBI 35"; species="Homo sapiens"; db_geneinfo="Entrez_gene:1576:CYP3A4"; band="q22.1"; analysis_name="gff_example1"; db_tfinfo="EnsEMBL_transcript:ENST00000289790:USF1"; method="EMSA_analysis"; evidence="curated"; pmid="14742674"; impaired_mutant="gcatcaagaacatTAggttctaatgg"

Expression example

chr7 oreganno_example OREG0000056 99037559 99037584 . - . sequence="gcatcaagaacatgtggttctaatgg"; db_seqinfo="EnsEMBL:NCBI 35"; species="Homo sapiens"; db_geneinfo="Entrez_gene:1576:CYP3A4"; band="q22.1"; analysis_name="gff_example2"; cell_type="HepG2:Homo sapiens"; method="gene reporter assay"; evidence="curated"; pmid="14742674"; impaired_mutant="gcatcaagaacatTAggttctaatgg"
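
Records like the ones above can also be produced programmatically. The Python sketch below writes one record line from two dictionaries. The function name format_record is hypothetical, and the sketch joins attributes with a bare semicolon, as in the syntax given under "Structure of format"; the examples on this page also show a space after each semicolon.

```python
FIELD_ORDER = ("seqname", "source", "feature", "start", "end",
               "score", "strand", "frame")
REQUIRED_FIELDS = ("seqname", "source", "start", "end")


def format_record(fields, attributes):
    """Build one tab-delimited record line from fields and attributes."""
    missing = [name for name in REQUIRED_FIELDS if name not in fields]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    # feature, score, strand and frame fall back to a single period.
    columns = [str(fields.get(name, ".")) for name in FIELD_ORDER]
    attribute_text = ";".join(
        f'{tag}="{value}"' for tag, value in attributes.items()
    )
    return "\t".join(columns + [attribute_text])
```

Attributes are written in the order in which they were added to the dictionary, so related tags can be kept together.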