<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">

</head>

<body bgcolor="#ffffff" text="#000000">

<font face="Helvetica, Arial, sans-serif">I have just been re-reading

the 2007 Brugia genome paper in Science.&nbsp; I'm now a bit surprised that

the JIGSAW gene set generated by Darin and the nGASP group scored

better in Gary's anomalies than the TIGR set, given the way the paper

says the genes were called.&nbsp; From the supplemental Methods section:<br>

<br>

2. Gene Finding<br>

The gene calling programs Augustus (10), FGENESH (11), GlimmerHMM (12)

and SNAP (13) were<br>

used to predict protein coding sequences. The final gene models were

then picked by JIGSAW (14).<br>


FGENESH was trained with the assembled WGS genomic sequences and ESTs

of B. malayi. All other<br>

ab initio gene finding programs and JIGSAW were trained by a set of 497

B. malayi sample genes<br>

which were curated manually. We used cloned B. malayi genes available

in GenBank that had both<br>

genomic and cDNA sequences. We also used a set of manually curated gene

models. The trained<br>

Augustus, FGENESH, GlimmerHMM and SNAP gene finders yielded the ab

initio gene calls, and the<br>

pre-trained JIGSAW picked the final gene models by incorporating:<br>

(1) the output of a BLASTX search against the NCBI nonredundant protein

database,<br>

(2) the B. malayi EST data aligned by PASA (15),<br>

(3) B. malayi tgi (TIGR gene index) data aligned by BLASTN,<br>

(4) cDNA data of other nematode genomes (complete predicted

transcriptomes of C. elegans and C.<br>

briggsae) aligned by TBLASTX<br>

(5) EST data from closely related filarial nematodes (D. immitis, O.

volvulus and Wuchereria bancrofti)<br>

aligned by TBLASTX and<br>

(6) the gene models predicted by the gene finding programs.<br>

Some manual annotation was conducted to fix the gene splits, gene

fusion and other prediction errors<br>

based on the cDNA, EST or homolog evidence.<br>

The output of the gene-finding programs was assessed prior to

performing the final B. malayi genome<br>

gene prediction. The gene sensitivity and exon sensitivity of JIGSAW

were 54.1% and 88.7%,<br>

respectively. The gene sensitivities and exon sensitivities of the four

gene finding programs ranged<br>

from 20% to 33% and 73.6% to 79.5% (detailed data not shown).<br>
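<br>
For reference, here is a minimal sketch of how sensitivity figures like these are commonly computed. The Methods text does not give the exact definitions, so this assumes the usual convention: an exon counts as found only if both of its boundaries match a reference exon exactly, and a gene counts as found only if its whole exon chain matches. The function names and coordinates are invented for illustration.<br>
<pre>
```python
# Hypothetical illustration; not the evaluation code used in the paper.
# A gene is a tuple of (start, end) exon coordinates on one strand.

def exon_sensitivity(reference, predicted):
    """Fraction of reference exons reproduced exactly by some prediction."""
    ref_exons = {exon for gene in reference for exon in gene}
    pred_exons = {exon for gene in predicted for exon in gene}
    return len(ref_exons.intersection(pred_exons)) / len(ref_exons)

def gene_sensitivity(reference, predicted):
    """Fraction of reference genes whose whole exon chain is reproduced."""
    pred_genes = set(predicted)
    hits = sum(1 for gene in reference if gene in pred_genes)
    return hits / len(reference)

# Toy data: two curated genes and two predictions (assumed values).
reference = [((100, 200), (300, 400)), ((900, 1000), (1100, 1250))]
predicted = [((100, 200), (300, 400)), ((900, 1000), (1100, 1300))]

print(exon_sensitivity(reference, predicted))  # 3 of 4 exons match
print(gene_sensitivity(reference, predicted))  # 1 of 2 genes match
```
</pre>
Under this definition a single wrong exon boundary is enough to fail the whole gene, which is one plausible reason the quoted gene sensitivities sit so far below the exon sensitivities.<br>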

<br>

Does anyone have thoughts on this?<br>

<br>

thanks,<br>

<br>

John<br>

<br>

<br>

</font><br>

Gary Williams wrote:

<blockquote cite="mid:487B1792.6080406@sanger.ac.uk" type="cite">John,

  <br>

  <br>

The results of the curation anomalies in Brugia for the TIGR and Jigsaw

gene predictions are as follows:

  <br>

  <br>

  <br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; TIGR&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; JIGSAW

  <br>

  <br>

UNMATCHED_PROTEIN&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 2695&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 648

  <br>

  <br>

Jigsaw looks to be very significantly better at spotting homologous

regions where there are protein alignments and incorporating them into

gene structures.

  <br>

  <br>

OVERLAPPING_EXONS&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 22&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 0

  <br>

  <br>

I am surprised that there were any overlapping exons from different

genes on opposite strands in the TIGR prediction. This is poor.

  <br>

  <br>

WEAK_INTRON_SPLICE_SITE 6340&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 9623

  <br>

  <br>

Jigsaw uses a significantly greater number of poor-scoring splice

sites. This appears to be because it tries harder to predict gene

models across pseudogenic regions.

  <br>

  <br>

SPLIT_GENES_BY_PROTEIN&nbsp; 323&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 438

  <br>

MERGE_GENES_BY_PROTEIN&nbsp; 156&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 236

  <br>

  <br>

I would not put much value on this difference; both sets of predictions

appear to have trouble with merging and splitting predictions in

regions such as pseudogenes and duplicated pairs of genes.

  <br>

  <br>

REPEAT_OVERLAPS_EXON&nbsp;&nbsp;&nbsp; 864&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 815

  <br>

  <br>

No great difference.

  <br>

  <br>

UNCONFIRMED_INTRON&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 44228&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 40895

  <br>

  <br>

A slightly greater number of introns confirmed by ESTs or mRNAs were

missed by TIGR than by Jigsaw. These figures were for each individual

EST or mRNA predicting an intron that was missed by the prediction, so

strongly expressed regions will be counted more than regions with only

a few ESTs/mRNA.

  <br>

  <br>

If the number of unique missed introns is counted instead of the

number of transcripts across the introns, then we get:

  <br>

TIGR: 20,787 and Jigsaw: 18,993

  <br>

so Jigsaw comes out ahead again.
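  <br>
  <br>
The two ways of counting can be sketched as follows. This is only an illustration with made-up records; it is not the anomaly-detection code, and the record layout is an assumption.
  <br>
<pre>
```python
# Hypothetical illustration; not the actual anomaly-detection code.
# Each record is (transcript_id, intron_start, intron_end) for an intron
# supported by an EST/mRNA alignment but absent from the gene predictions.
missed = [
    ("EST_A", 1200, 1350),
    ("EST_B", 1200, 1350),
    ("EST_C", 1200, 1350),
    ("mRNA_D", 5000, 5120),
]

# Per-transcript count: every supporting EST/mRNA adds one anomaly.
per_transcript = len(missed)

# Unique count: each distinct intron is counted once, however many
# transcripts support it.
unique_introns = len({(start, end) for _, start, end in missed})

print(per_transcript)  # 4
print(unique_introns)  # 2
```
</pre>
A strongly expressed locus inflates the first figure but not the second, so the unique count is the fairer basis for comparing the two gene sets.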

  <br>

  <br>

EST_OVERLAPS_INTRON&nbsp;&nbsp;&nbsp;&nbsp; 2&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 0

  <br>

  <br>

This is the number of predicted introns with an EST transcript running

across them. This does not look significant.

  <br>

  <br>

SHORT_INTRON&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 59&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 137

  <br>

  <br>

This looks like Jigsaw tries harder to generate a gene model over

difficult (pseudogenic?) regions and will create a short intron over a

frameshift.

  <br>

  <br>

  <br>

Result: Jigsaw is significantly better at making correct gene models,

but also tries to make them even in inappropriate pseudogenic regions.
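  <br>
  <br>
To put the counts in this message on a common scale, a small sketch that expresses each JIGSAW count as a percentage change relative to the TIGR count. The numbers are copied from the figures quoted in this message; the helper name is made up for illustration.
  <br>
<pre>
```python
# Counts from this message, stored as category: (TIGR, JIGSAW).
counts = {
    "UNMATCHED_PROTEIN": (2695, 648),
    "OVERLAPPING_EXONS": (22, 0),
    "WEAK_INTRON_SPLICE_SITE": (6340, 9623),
    "SPLIT_GENES_BY_PROTEIN": (323, 438),
    "MERGE_GENES_BY_PROTEIN": (156, 236),
    "REPEAT_OVERLAPS_EXON": (864, 815),
    "UNCONFIRMED_INTRON": (44228, 40895),
    "EST_OVERLAPS_INTRON": (2, 0),
    "SHORT_INTRON": (59, 137),
}

def relative_change(tigr, jigsaw):
    """Change in the JIGSAW count as a percentage of the TIGR count."""
    return 100.0 * (jigsaw - tigr) / tigr

# Negative values mean JIGSAW has fewer anomalies of that type.
for name, (tigr, jigsaw) in counts.items():
    print(f"{name:24s} {relative_change(tigr, jigsaw):+7.1f}%")
```
</pre>
Percentages on very small counts, such as EST_OVERLAPS_INTRON, should not be over-interpreted.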

  <br>

  <br>

Gary

  <br>

  <br>

  <br>

John Spieth wrote:

  <br>

  <blockquote type="cite">

    <blockquote type="cite">

      <blockquote type="cite">

        <blockquote type="cite">

          <blockquote type="cite">Hi Gary,

            <br>

            <br>

Have you had time yet to generate Brugia anomalies using the TIGR and

JIGSAW gene sets?

            <br>

            <br>

thanks,

            <br>

            <br>

John

            <br>

          </blockquote>

        </blockquote>

      </blockquote>

    </blockquote>

  </blockquote>

  <br>

  <br>

</blockquote>

</body>

</html>