[Ngasp-help] Re: [Ngasp-dev] Gunnar Raetsch's comments

Sun Jul 27 15:40:53 EDT 2008

Hi All,

Some comments below.

>
>
> * I missed a more detailed discussion of the annotation dataset that
> has been used for evaluation. There are a few issues:
> 	* For unconfirmed genes, how were they predicted? Did any of the
> compared methods produce these predictions?
> 	* Also, for the EST-confirmed genes: these gene models were
> generated using EST alignments, some manual curation and probably also
> gene finder predictions. Except for the manual curation, the input is
> the same as for cat-3 gene finding. Who knows what the correct gene
> models are...
>     (or something along these lines; you probably have your own
> thoughts on this).

This is not a bad idea if it is easy to do.

>
> * The evaluation for the combiners was done on only the 3' end of the
> regions. We noticed there is a significant difference in the
> performance between the 5' and 3' ends (several percent at the
> transcript level). Hence, the performance on the 3' end and the
> performance on the whole region are not directly comparable. I don't
> necessarily want you to redo the evaluation, but at least this should
> be noted in the main text, along with the fact that the combiners had
> about 50% more data for training.
>

This is actually shocking if true.  Why would the 3' portions of the
regions be so much easier to predict?  Are they enriched for the
characteristics that Arvil found to be associated with
difficult-to-predict genes?

I don't think that it is true that the combiners necessarily had more
training data unless they also used all of the original training
regions.  But those regions did not have all of the gene predictions
that they used in the combination, right?

> * Since the result of the paper is that combiners are the method of
> choice, it would be very interesting to understand how important the
> accuracy and the choice of the base gene finders are. It would be
> really great if you could show some results indicating which
> (sub-)set of (the three) gene finders leads to which performance.

This point has been covered in some previous combiner papers.   
Basically, better inputs lead to better outputs and so the best  
individual gene predictions are most likely to be the ones that are  
most effective for the combiners.

See J. Allen's original combiner paper in Genome Research and the  
GLEAN paper from earlier this year in Bioinformatics for examples.

>
> * I really would like the evaluation separately for each category.
> This would allow us to compare the contributions within the
> categories more easily. It would be very nice to have, at least in the
> supplement.

I thought we had this already.

>
> * One feature that came to mind, and that you did not check as a
> possible cause of wrong predictions, is whether there is a gene on the
> opposite strand. Many gene finders only predict genes on one strand.
> mGene, for instance, does not.

I don't see any reason to do this.

>
> * It would be a great service to the gene finding community and also
> would facilitate reproducibility of this research, if you'd provide
> the evaluation scripts which lead to exactly the numbers given in the
> table in the supplement of this paper. (In any case I'd like to get
> them to evaluate new predictions that we have in the same way.)

This is possible, but difficult given the processing that took place  
on the submitted predictions before they could be read by the  
evaluation code.

That said, I am supportive of this.   But we would have to include  
much more detailed file munging methods in the supplement section.

>
> Finally, I'd like to ask you to reconsider the choice of journal.
> I think it would have a good chance in PLoS Comp Bio or Genome
> Biology, which I find considerably better suited for this work than
> BMC Bioinformatics. Why not try it?

I would support going to PLoS Comp Bio if people think that we have a
chance there.  Genome Biology is also a possibility, as they published
EGASP.

Remember, however, that EGASP went to Genome Biology as a supplement
through their marketing department after their editorial team turned
down the idea.  We were pitching the summary paper plus the methods
papers, so it was a significantly different proposition from this one.

Paul

>
> Thank you and the others for writing the manuscript and organising
> this competition!
>
> All the best,
>
> Gunnar
>
>
> On 16.07.2008, at 19:26, Dr. Tristan J. Fiedler wrote:
>
> Dear nGASP Participants,
>
> We thank you again for your participation in nGASP.
>
> The nGASP analysis team has now written up the results of nGASP
> as a paper, which we plan to submit to BMC Bioinformatics.
>
> As agreed, we are sending you a copy of the draft manuscript
> for your perusal before submission.
>
> We would be very grateful if you can let us know if you have any
> major comments on the draft manuscript by Thursday 24th July.
>
> Comments may be sent to ngasp-help at wormbase.org
>
> Yours sincerely,
>
> The nGASP analysis team.
>
> <ngasp_16jul08b_alc.doc>
>
> +-------------------------------------------------------------------+
> Gunnar Rätsch                         http://www.fml.mpg.de/raetsch
> Friedrich Miescher Laboratory       Gunnar.Raetsch at tuebingen.mpg.de
> Max Planck Society                          Tel: (+49) 7071 601 820
> Spemannstraße 39, 72076 Tübingen, Germany   Fax: (+49) 7071 601 801
>
> _______________________________________________
> Ngasp-dev mailing list
> Ngasp-dev at wormbase.org
> http://mail.wormbase.org/mailman/listinfo/ngasp-dev