[Gramene] compara-database

Sharon Wei weix at cshl.edu
Tue Jun 29 15:38:40 EDT 2010


Dear Ambrose,

I have some experience working with whole-genome alignment (WGA) 
pipelines and synteny analysis, and can give you comments on those.

1) For the blastz pairwise alignment pipeline, my experience is based on 
ensembl-55. I basically followed the instructions in README-pairaligner 
and README-chain-net under ensembl-compara/scripts/pipeline/. You may 
need to change some hard-coded paths, for example in AlignmentChains.pm 
and AlignmentNets.pm in EnsEMBL/Compara/Production/GenomicAlignBlock/. 
In addition, if you don't have the LSF job management system, you need 
to modify the job submission and job status code in ensembl-hive to 
match your job management software, for example by modifying the 
modules in Bio/EnsEMBL/Hive/Meadow/. Then you need to keep testing and 
debugging on your system until you can run through all the steps 
without errors and get GABs (genomic align blocks). One lesson to pass 
on is to always run with repeat-masked genomes and to make sure the 
repeat features are projected onto the toplevel coordinate system (if 
that is what you configure the pipeline to run against). Another is 
that it is good practice to use a compara master database (a central 
database storing IDs for genome_db, dnafrag, method_link, 
method_link_species_set, species_set). Each new WGA pipeline database 
is prepopulated by copying data from this master db, which keeps IDs 
consistent across databases and makes the subsequent data merging much 
less painful.
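
To illustrate the master database idea, here is a minimal sketch in 
Python. It uses an in-memory SQLite database as a stand-in for MySQL, 
and the two-column table layout (an "<table>_id" column plus "name") is 
a simplified assumption, not the real compara schema. The function name 
copy_master_tables is hypothetical and is not part of ensembl-compara. 
The point is only that the new pipeline database inherits the IDs of 
the master instead of generating its own.

```python
import sqlite3

# Tables named in the note above; the column layout below is simplified.
MASTER_TABLES = [
    "genome_db",
    "dnafrag",
    "method_link",
    "species_set",
    "method_link_species_set",
]


def copy_master_tables(conn, tables=MASTER_TABLES):
    """Prepopulate the pipeline db ('main') with rows from 'master'."""
    for table in tables:
        # CREATE TABLE ... AS SELECT copies rows, so IDs are preserved.
        conn.execute(
            f"CREATE TABLE main.{table} AS SELECT * FROM master.{table}"
        )
    conn.commit()


def build_demo():
    """Create a toy master db attached to an empty pipeline db."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS master")
    for table in MASTER_TABLES:
        conn.execute(
            f"CREATE TABLE master.{table} "
            f"({table}_id INTEGER PRIMARY KEY, name TEXT)"
        )
    # Hypothetical IDs, chosen only to show that they survive the copy.
    conn.execute("INSERT INTO master.genome_db VALUES (7, 'brassica_rapa')")
    conn.execute(
        "INSERT INTO master.genome_db VALUES (12, 'arabidopsis_thaliana')"
    )
    return conn


if __name__ == "__main__":
    conn = build_demo()
    copy_master_tables(conn)
    print(conn.execute("SELECT * FROM main.genome_db").fetchall())
```

Because every pipeline database starts from the same rows, a 
genome_db_id means the same genome everywhere, which is what makes the 
later merge straightforward.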

2) For the EPO pipeline, my experience is based on ensembl-56/57. I had 
been trying this pipeline on and off for about a year and recently had 
a breakthrough and got it working with 4 monocot genomes. In 
addition to the notes above, you may want to calculate the neutral 
substitution rate from your species tree. If it is less than 0.5, you 
may need to change the gerpelem parameter in 
ensembl-compara/modules/Bio/EnsEMBL/Compara/Production/GenomicAlignBlock/Gerp.pm. 
In my last try with 4 monocot species, the branch lengths of the species 
tree summed to about 0.42, so I changed "gerpelem" to "gerpelem -d 0.3" 
in Gerp.pm to make it work. You could read the GERP paper and the GERP 
documentation on gerpelem for a better understanding.
As we are still experimenting with this pipeline and will run more EPO 
pipelines on plant genomes in the next few months, I will keep you 
updated on any new findings.
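
The branch-length check described above can be sketched in a few lines 
of Python. This is only an illustration: the function names and the 
example tree are hypothetical (the tree is made up so that its branch 
lengths sum to 0.42), and the parser simply reads the numbers after 
each ":" in a Newick string, so it assumes there are no colons inside 
quoted node labels. The 0.5 threshold is the one mentioned above.

```python
import re

# A number following ':' in Newick is a branch length.
BRANCH_LENGTH = re.compile(r":\s*([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)")


def total_branch_length(newick):
    """Sum all branch lengths found in a Newick tree string."""
    return sum(float(x) for x in BRANCH_LENGTH.findall(newick))


def may_need_gerpelem_change(newick, threshold=0.5):
    """True if the summed branch length falls below the threshold."""
    return total_branch_length(newick) < threshold


if __name__ == "__main__":
    # Hypothetical 4-species tree, not real monocot branch lengths.
    tree = "((sp_a:0.05,sp_b:0.07):0.06,(sp_c:0.08,sp_d:0.12):0.04);"
    print(round(total_branch_length(tree), 2))
    print(may_need_gerpelem_change(tree))
```

If the check comes back true, the gerpelem call in Gerp.pm is the place 
to look, as described above. Which value to pass to -d is a judgment 
call for your own tree.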

3) For Ensembl synteny based on WGA, our experience is that even though 
it may work well for vertebrates, it doesn't produce satisfactory 
results on plant genomes. So Gramene has developed its own 
gene-tree-based synteny building method.


Hope this helps.


Sharon
Gramene Project

 
ambrose andongabo (RRes-Roth) wrote:
> Dear All,
>         I am currently working on setting up a comparative genomics database for our Brassica Ensembl (http://www.brassica.info/Brassica_rapa/Info/Index). I have been talking with some people from EBI and most of them have referred me to the people working at Gramene because you have a lot of experience with setting up compara databases for plants. Currently we have core databases for Brassica rapa (the Chinese cabbage), Arabidopsis thaliana and poplar. We intend to use these genomes to generate data for our compara database. In the future we will be adding other Brassica species when the data become available.
>
> I would be pleased if you could give me the steps I have to follow and some idea of the pitfalls I may encounter during the analysis. As of now I intend to use the following workflow
>
> The order of work
> 1) Pairwise Alignments
> 2) GeneTree
> 3) Synteny
> 4) MSAs & conservation
> 5)  Family
>
> Many thanks in advance
>
> NB: I am really very interested in knowing how you performed the EPO alignment. All the steps involved and any details would be appreciated.
>
> Cheers
>
> Ambrose
>
> _______________________________________________
> Gramene mailing list
> Gramene at brie4.cshl.edu
> http://mail.gramene.org/mailman/listinfo/gramene
>   



