[Gramene] compara-database
Will Spooner
wspooner at cshl.edu
Thu Jul 1 10:05:23 EDT 2010
For the genetree component, I refer you to this document:
http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl-compara/scripts/pipeline/README-genetree?revision=1.38&root=ensembl&view=markup
It is well maintained and fairly detailed.
Will
On 29 Jun 2010, at 20:38, Sharon Wei wrote:
> Dear Ambrose,
>
> I have some experience working with the WGA pipelines and synteny analysis, and can comment on those.
>
> 1) For the blastz pairwise alignment pipeline, my experience is based on ensembl-55. I basically followed the instructions in ensembl-compara/scripts/pipeline/, README-pairaligner and README-chain-net. You may need to change some hard-coded paths, for example in AlignmentChains.pm and AlignmentNets.pm in EnsEMBL/Compara/Production/GenomicAlignBlock/. In addition, if you don't have the LSF job management system, you need to modify the job submission and job status code in ensembl-hive to match your job management software, for example by modifying the modules in Bio/EnsEMBL/Hive/Meadow/. Then keep testing and debugging on your system until you can run through all the steps without errors and get GABs (genomic align blocks).
>
> One lesson to pass on is to always run with repeat-masked genomes and to make sure the repeat features are projected onto the toplevel coordinate system (if that is what you configure the pipeline to run against). Another is that it is good practice to use a compara master database (a central database storing ids for genome_db, dnafrag, method_link, method_link_species_set and species_set). Each new WGA pipeline database is prepopulated by copying data from this master db, which keeps ids consistent across databases and makes the subsequent data merging much less painful.
>
> 2) For the EPO pipeline, my experience is based on ensembl-56/57. I had been trying this pipeline on and off for about a year, and recently had a breakthrough and got it working with 4 monocot genomes. In addition to the notes above, you may want to calculate the neutral substitution rate from your species tree. If it is less than 0.5, you may need to change the gerpelem parameter in ensembl-compara/modules/Bio/EnsEMBL/Compara/Production/GenomicAlignBlock/Gerp.pm. In my last try with 4 monocot species, the branch lengths of the species tree summed to about 0.42, so I changed "gerpelem" to "gerpelem -d 0.3" in Gerp.pm to make it work. You could read the GERP paper and the GERP documentation on gerpelem for a better understanding.
> As we are still experimenting with this pipeline and will run more EPO pipelines on plant genomes in the next few months, I will keep you updated on any new findings.
>
> 3) For Ensembl synteny based on WGA, our experience is that even though it may work well for vertebrates, it doesn't produce satisfactory results on plant genomes. So Gramene has developed its own gene-tree-based synteny building method.
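
The advice in point 1 to always run with repeat-masked genomes can be checked before starting a pipeline. What follows is only a minimal sketch, not part of the Ensembl code base: it assumes the genome has been dumped to FASTA, that soft-masked bases are lowercase, and that hard-masked bases appear as N (which cannot be told apart from assembly gaps here). The function names and the toy sequence are invented for illustration.

```python
def masking_summary(lines):
    """Count total, soft-masked (lowercase) and N bases per FASTA record.

    `lines` is any iterable of FASTA lines, e.g. an open file handle.
    """
    summary = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            summary[name] = {"length": 0, "soft": 0, "n": 0}
        elif name is not None:
            counts = summary[name]
            counts["length"] += len(line)
            # Lowercase bases other than 'n' are treated as soft-masked.
            counts["soft"] += sum(1 for b in line if b.islower() and b != "n")
            # N/n may be hard-masked repeats or assembly gaps.
            counts["n"] += sum(1 for b in line if b in "Nn")
    return summary


def unmasked_sequences(summary):
    """Return names of sequences with no soft-masked and no N bases."""
    return [name for name, c in summary.items()
            if c["soft"] == 0 and c["n"] == 0]


# Toy input, invented for illustration only.
example = """>chr1 toy sequence
ACGTacgtNNNN
acgtACGT
>chr2 toy sequence
ACGTACGT
"""
report = masking_summary(example.splitlines())
suspect = unmasked_sequences(report)
```

A sequence that shows up in `suspect` has no visible masking at all, which would be worth investigating before aligning it. This check says nothing about whether repeat features are projected onto the toplevel coordinate system in the core database.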
>
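
The branch-length check described in point 2 can be sketched as follows. This is an illustrative helper, not part of the EPO pipeline: it assumes the species tree is available as a plain Newick string with no quoted labels containing colons, and the tree below is made up so that its branch lengths sum to roughly the 0.42 mentioned above. The 0.5 threshold is the one quoted in point 2; the function and variable names are hypothetical.

```python
import re

# A branch length in Newick follows a colon, e.g. "sp_a:0.05".
# Assumption: labels are unquoted and contain no colons.
BRANCH_LENGTH = re.compile(r":\s*([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)")


def total_branch_length(newick):
    """Sum every branch length found in a Newick tree string."""
    return sum(float(value) for value in BRANCH_LENGTH.findall(newick))


def may_need_gerpelem_change(newick, threshold=0.5):
    """True if the summed branch length is below the threshold from point 2."""
    return total_branch_length(newick) < threshold


# Hypothetical 4-species tree; names and lengths are invented.
species_tree = "((sp_a:0.05,sp_b:0.07):0.10,(sp_c:0.08,sp_d:0.12):0.00);"
total = total_branch_length(species_tree)
flag = may_need_gerpelem_change(species_tree)
```

If `flag` is true, the gerpelem call in Gerp.pm is the place to look, as described in point 2; the sketch does not suggest a value for the `-d` option.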
>
> Hope this helps.
>
>
> Sharon
> Gramene Project
>
> ambrose andongabo (RRes-Roth) wrote:
>> Dear All,
>> I am currently working on setting up a comparative genomics database for our Brassica Ensembl (http://www.brassica.info/Brassica_rapa/Info/Index). I have been talking with some people from EBI and most of them have referred me to the people working at Gramene because you have a lot of experience with setting up compara databases for plants. Currently we have core databases for Brassica rapa (the Chinese cabbage), Arabidopsis thaliana and poplar. We intend to use these genomes to generate data for our compara database. In the future we will add other Brassica species when the data become available.
>> I would be pleased if you could give me the steps I have to follow and point out any pitfalls I may encounter during the analysis. For now I intend to use the following workflow:
>>
>> The order of work
>> 1) Pairwise Alignments
>> 2) GeneTree
>> 3) Synteny
>> 4) MSAs & conservation
>> 5) Family
>>
>> Many thanks in advance
>>
>> NB: I am really very interested in knowing how you performed the EPO alignment. All the steps involved and the details would be appreciated.
>>
>> Cheers
>>
>> Ambrose
>>
>> _______________________________________________
>> Gramene mailing list
>> Gramene at brie4.cshl.edu
>> http://mail.gramene.org/mailman/listinfo/gramene
>>