Step 2: Capturing the regulatory sequence and/or TF basic information
Once the project element has been defined (see Step 1), you are ready to
enter sequence and transcription factor information. This information is
entered within the 'data' element, which is a child of the 'pazar' element.
2.0- Initialization
The 'data' element stores all the annotations separately. They will be
linked together later in the 'analysis' element (see Step 3).
First, the 'data' element has to be initialized:
<data>
Then, different types of annotations can be inserted:
The 'reg_seq' element is embedded within 'tsr' and 'gene_source' elements.
The 'gene_source' element gives the gene accession number.
The 'tsr' element describes the transcription start region based on the
observation that transcription does not always start at exactly the same
nucleotide (however, a unique start site can be described by inserting
the same value in fuzzy_start and fuzzy_end).
Thus, if a gene has two alternative promoters, each one can be described
with its own 'tsr' element within the 'gene_source' element, and different
regulatory sequences can be associated with each 'tsr'.
Replace the red values with your own information.
The pazar IDs are internal IDs that will not be stored. They can be
anything as long as they are unique throughout the file.
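The nesting described above can be sketched in Python with the standard library. This is only an illustration of the structure, not a complete PAZAR record: the text names the elements and the fuzzy_start and fuzzy_end attributes, while the placement of pazar_id on every element, the ID values, and the coordinates are assumptions made for the example. Any other attributes the real format may require are left out.

```python
import xml.etree.ElementTree as ET

# Nesting taken from the text: 'data' > 'gene_source' > 'tsr' > 'reg_seq'.
# All attribute values are placeholders, and putting a pazar_id on each
# element is an assumption made for this sketch.
data = ET.Element("data")
gene = ET.SubElement(data, "gene_source", pazar_id="gs_0001")

# Two alternative promoters: two 'tsr' elements under the same gene.
tsr_a = ET.SubElement(gene, "tsr", pazar_id="tsr_0001",
                      fuzzy_start="1000", fuzzy_end="1040")
# A unique start site: the same value in fuzzy_start and fuzzy_end.
tsr_b = ET.SubElement(gene, "tsr", pazar_id="tsr_0002",
                      fuzzy_start="5200", fuzzy_end="5200")

# Each promoter gets its own regulatory sequence.
ET.SubElement(tsr_a, "reg_seq", pazar_id="rs_0001")
ET.SubElement(tsr_b, "reg_seq", pazar_id="rs_0002")

xml_text = ET.tostring(data, encoding="unicode")
print(xml_text)
```

Building the tree programmatically rather than by string concatenation keeps the output well-formed and makes it easy to attach several 'tsr' elements to one gene.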
The 'reg_seq' element can also be embedded in a 'marker' element if the
gene regulated by the sequence is not defined yet. The marker can be a
gene, but it is then used only to locate the sequence, not to infer any
role for the sequence on this gene.
Replace the red values with your own information.
The pazar IDs are internal IDs that will not be stored. They can be
anything as long as they are unique throughout the file.
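To see at a glance which regulatory sequences are tied to a transcription start region and which are only located by a marker, a file can be scanned for the parent of each 'reg_seq'. The sketch below assumes the nesting described above; the sample fragment, its IDs, and the presence of a pazar_id attribute on 'reg_seq' are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: one 'reg_seq' under 'gene_source'/'tsr' and one
# under 'marker'. IDs are placeholders; other attributes are omitted.
SAMPLE = """
<data>
  <gene_source pazar_id="gs_0001">
    <tsr pazar_id="tsr_0001" fuzzy_start="1000" fuzzy_end="1040">
      <reg_seq pazar_id="rs_0001"/>
    </tsr>
  </gene_source>
  <marker pazar_id="ma_0001">
    <reg_seq pazar_id="rs_0002"/>
  </marker>
</data>
"""

def reg_seq_anchors(root):
    """Map each reg_seq pazar_id to the tag of its parent element."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    return {rs.get("pazar_id"): parent_of[rs].tag for rs in root.iter("reg_seq")}

anchors = reg_seq_anchors(ET.fromstring(SAMPLE))
print(anchors)
```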
A transcription factor is described in multiple steps.
First, at the gene level: the 'tf' element is embedded in both 'transcript'
and 'gene_source' elements. Multiple 'transcript' elements can be used to
describe multiple isoforms of a gene.
Then, at the protein level: the 'funct_tf' element captures the functional
protein information with as many 'tf_unit' elements as there are proteins
in the complex (one for a monomer, two for a dimer, and so on). Each tf_id
refers to the pazar_id of a 'tf' element.
Replace the red values with your own information.
The pazar IDs are internal IDs that will not be stored. They can be
anything as long as they are unique throughout the file.
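Because each tf_id must point at the pazar_id of a 'tf' element, it can be useful to check those references before submitting a file. The following sketch does this with the standard library. The sample fragment is hypothetical: it follows the nesting described above (a 'tf' inside a 'transcript' inside a 'gene_source', and 'tf_unit' elements inside 'funct_tf'), but the ID values are placeholders and all other attributes are omitted.

```python
import xml.etree.ElementTree as ET

# Hypothetical dimer: two 'tf' elements at the gene level and one
# 'funct_tf' with two 'tf_unit' elements at the protein level.
SAMPLE = """
<data>
  <gene_source pazar_id="gs_0001">
    <transcript pazar_id="tr_0001">
      <tf pazar_id="tf_0001"/>
    </transcript>
  </gene_source>
  <gene_source pazar_id="gs_0002">
    <transcript pazar_id="tr_0002">
      <tf pazar_id="tf_0002"/>
    </transcript>
  </gene_source>
  <funct_tf pazar_id="fu_0001">
    <tf_unit tf_id="tf_0001"/>
    <tf_unit tf_id="tf_0002"/>
  </funct_tf>
</data>
"""

def dangling_tf_ids(root):
    """Return tf_id values on 'tf_unit' elements that match no 'tf' pazar_id."""
    known = {tf.get("pazar_id") for tf in root.iter("tf")}
    return [unit.get("tf_id") for unit in root.iter("tf_unit")
            if unit.get("tf_id") not in known]

root = ET.fromstring(SAMPLE)
print(dangling_tf_ids(root))  # an empty list means every reference resolves
```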
The 'construct' element can be used to describe any sequence without
specific genomic coordinates (e.g. a synthesized oligonucleotide
representing a consensus binding site).
<construct construct_name="FN-13A" description="random oligo"
sequence="gggtgagtcagcg" pazar_id="co_0001"/>
Replace the red values with your own information.
The pazar IDs are internal IDs that will not be stored. They can be
anything as long as they are unique throughout the file.
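Since the only constraint on pazar IDs is that they are unique throughout the file, a quick duplicate check can catch copy-and-paste mistakes. The sketch below is one possible way to do this in Python; the second 'construct' element (FN-13B) is a hypothetical entry added to trigger the check.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def duplicate_pazar_ids(xml_text):
    """Return the sorted pazar_id values that occur more than once."""
    root = ET.fromstring(xml_text)
    counts = Counter(el.get("pazar_id") for el in root.iter()
                     if el.get("pazar_id") is not None)
    return sorted(pid for pid, n in counts.items() if n > 1)

# The first construct is the example from the text; the second one is
# hypothetical and deliberately reuses the same pazar_id.
SAMPLE = """<data>
  <construct construct_name="FN-13A" description="random oligo"
             sequence="gggtgagtcagcg" pazar_id="co_0001"/>
  <construct construct_name="FN-13B" description="random oligo"
             sequence="gggtgagtcagcg" pazar_id="co_0001"/>
</data>"""

print(duplicate_pazar_ids(SAMPLE))  # prints ['co_0001']
```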