[Gmod-gbrowse] [Gmod-help] general questions about data querying

Don Gilbert gilbertd at cricket.bio.indiana.edu
Wed Nov 11 17:24:44 EST 2009


Dear Ryan, Dave etc.,

This is the sort of thing I do lots of, but using GFF files and Perl code.
Often the Perl code is a simple one-liner.  If your GFF file is sorted
by location, finding overlapping genes on opposite strands is easy:

cat genes.gff | grep mRNA | perl -ne\
'($c,$t,$s,$b,$e,$p,$o)=split; if($c eq $lc and $b<$le and $o ne $lo){print "<$lgene>$_";}
($lc,$lb,$le,$lo,$lgene)=($c,$b,$e,$o,$_);' | more

 .. or maybe you want to grep for exon or UTR feature types here instead of mRNA
 .. if your .gff isn't sorted by location, sort it first:
cat genes.gff | sort -k1,1 -k4,4n -k5,5nr | ..

You can do this in a MySQL database, with SQL code, but there are so many
useful sorts of pattern matching and grepping with Perl and Unix that I find
that much easier for genome processing and analyses.

- Don


