BGI rice protein sequences

Huaichun Wang hcwang at science.uottawa.ca
Mon May 5 08:49:22 EDT 2003


Hello,
I downloaded the BGI rice protein sequence set (Os_indica_protein_BGI.txt)
from the Gramene site. I found that many sequences contain characters such as
^@^@^@. For example, the sequence

>Scaffold3908_2/putative protein/600/Unknown
MASSPRPAATAAAHRRGLIQRPPSAQAYLSAAAALLVLAAVAFSRAGHRFPHPPATRRCRPDAEGSWSAGVFLGDSPFSLEPIEHWGISKADGAAWPVANPVVTCAEVEDAGFPSSFVAKPFLFLQGDAIYMFFETKNPITSQGDIAAAVSEDAGVTWQQLGVVLDEEWHLSYPYVFTYKNKVYMMPESSKNGDIRLYRALDFPLKWELEKVLLEKPLVDSVIINFQGSYWLLGTDLSSYGAKRNREISIWYNNSPLSPWIPHKQNLIHNTGKMLSTRNGGRPFIYNGNLYRVGKGQGGGSGH

is followed by a long run of ^@^@^@ characters, and then by this sequence segment:

GIQVFKVEILKSNEYKEVEVPFVINKQLKGRNAWNGARSHHLDVQQLPSGKLWIGVMDGDRVPSGDSVHRLTIGYMIYGVVLILVLVTGGLIGTINCSLPLRWSLPHTEKRSGLFNVEQRFFLYHKLSSLISNLNKLGSLICGRINYRTCKGRVYVVVVMLILVVLTCVGTHYIYGGNGAEEPYPIKGKHSQFTLLTMTYDARLWNLKMFVEHYSNCASVRDIVVVWNKGQPPAQVLANCGTILNFSHVIATRSNAPLESKLEILLQEAGRGNILTIKQRIMQPAVPGKRAYQSCKW

How should I handle these unusual characters? Thanks for any
guidance.

Huaichun
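
For reference, ^@ is the caret notation that many editors and pagers use to display a NUL byte (0x00). The following is a minimal sketch in Python, assuming the ^@ runs in the file are literal NUL bytes and that the file is otherwise plain FASTA. The function names and the command-line usage are hypothetical, not part of any Gramene or BGI tool. Whether the segments on either side of a NUL run belong to the same protein is not established here, so the sketch also lists the affected records for manual checking before the cleaned output is trusted.

```python
import sys

NUL = b"\x00"  # the byte displayed as ^@ in caret notation


def strip_nul_bytes(data):
    """Return (data with NUL bytes removed, number of NUL bytes removed)."""
    return data.replace(NUL, b""), data.count(NUL)


def records_with_nul(data):
    """Return headers of FASTA records whose sequence lines contain a NUL byte."""
    flagged = []
    header = None
    seen = False  # True once the current record has been flagged
    for line in data.split(b"\n"):
        if line.startswith(b">"):
            # New record: remember its header (without the leading '>')
            header = line[1:].strip().decode("ascii", errors="replace")
            seen = False
        elif header is not None and not seen and NUL in line:
            flagged.append(header)
            seen = True
    return flagged


if __name__ == "__main__":
    # Hypothetical usage: python strip_nul.py input.txt cleaned.txt
    with open(sys.argv[1], "rb") as fh:
        raw = fh.read()
    for name in records_with_nul(raw):
        print("contains NUL bytes:", name)
    cleaned, removed = strip_nul_bytes(raw)
    with open(sys.argv[2], "wb") as fh:
        fh.write(cleaned)
    print("removed", removed, "NUL bytes")
```

Note that removing the NUL bytes joins the flanking segments into one sequence, which is only correct if the NUL run is padding rather than a placeholder for missing residues.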




