February 23, 2010

Using BioJava to Parse FASTA Files


Introduction

The BioJava Cookbook includes an example that parses FASTA files, but the API it uses, SeqIOTools, is marked as deprecated in the API docs.

The code below therefore keeps the old SeqIOTools approach for reference (commented out) and also shows how to read the same file with a SequenceIterator and with RichSequence.
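
Before the BioJava listing, it may help to see what iterating over a FASTA file involves. The sketch below is plain Java and does not use BioJava at all. The class FastaSketch, its Record type, and the sample data are made up for illustration, and the code assumes the simplest form of the format: a header line starting with '>' followed by one or more sequence lines.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of BioJava: reads FASTA records in file order. */
public class FastaSketch {

    /** One record: the header text after '>' and the concatenated residues. */
    public static final class Record {
        public final String header;
        public final String sequence;

        Record(String header, String sequence) {
            this.header = header;
            this.sequence = sequence;
        }
    }

    /** Reads every record from the reader, preserving the order of the input. */
    public static List<Record> readAll(BufferedReader reader) throws IOException {
        List<Record> records = new ArrayList<Record>();
        String header = null;
        StringBuilder seq = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            if (line.startsWith(">")) {
                // A new header closes the previous record, if there was one
                if (header != null) {
                    records.add(new Record(header, seq.toString()));
                }
                header = line.substring(1);
                seq.setLength(0);
            } else if (header != null) {
                // Sequence lines of one record are joined into a single string
                seq.append(line);
            }
        }
        if (header != null) {
            records.add(new Record(header, seq.toString()));
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Made-up two-record input, for illustration only
        String fasta = ">seq1 first example\nMKWV\nTFIS\n>seq2 second example\nLLLF\n";
        for (Record r : readAll(new BufferedReader(new StringReader(fasta)))) {
            System.out.println(r.header + " -> " + r.sequence);
        }
    }
}
```

Because the records go into a list as they are read, they come back in file order. A library reader handles far more than this (alphabets, header parsing, error reporting), so treat the sketch only as a picture of the record structure.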
 
 
 
Code

import java.io.*;
import java.util.*;
import org.biojava.bio.*;
import org.biojava.bio.seq.*;
import org.biojava.bio.seq.db.*;
import org.biojava.bio.seq.io.*;
import org.biojava.bio.symbol.*;
import org.biojavax.bio.seq.RichSequence;
import org.biojavax.bio.seq.RichSequenceIterator;
public class ReadFasta {
  /**
   * Sample code that parses a FASTA file with BioJava and prints the sequences.
   */
  public static void main(String[] args) throws IOException, BioException {
    try {
 
      /* Set up the required parameters. */
      String sFilename = "D:\\working\\cRAP_200905_prefixCON.fasta";
      String sSeqType="PROTEIN";
      String sFileType="FASTA";
 
      /* Input stream for the SequenceDB approach. */
      BufferedInputStream is = new BufferedInputStream(new FileInputStream(sFilename));

      /* Reader for the SequenceIterator and RichSequenceIterator approaches. */
      BufferedReader br = new BufferedReader(new FileReader(sFilename));

      /* Alphabet for the SequenceDB approach. */
      Alphabet alpha = AlphabetManager.alphabetForName(sSeqType);

      /* Parsing FASTA with SeqIOTools (deprecated).
       * This loads every sequence in the file into a SequenceDB; note that
       * the sequences may not come back in file order.
       */
      //SequenceDB db = SeqIOTools.readFasta(is, alpha);
      //SeqIOTools.writeFasta(System.out, db);

      /* Try out some SequenceDB methods. */
      //System.out.println("getClass: " + db.getClass());
      //System.out.println("ids: " + db.ids());
      //System.out.println("getName: " + db.getName());
      //System.out.println("getSequence: " + db.getSequence("CON_sp|ALBU_BOVIN|").seqString());
     
      /* Parsing FASTA with a SequenceIterator.
       * Note: this still goes through the deprecated SeqIOTools class, and it
       * reads from the same BufferedReader (br) as the RichSequence section
       * below, so enable only one of the two sections at a time.
       */
      //SequenceIterator iter = (SequenceIterator) SeqIOTools.fileToBiojava(sFileType, sSeqType, br);
      //int iCount = 0;
      //while (iter.hasNext()) {
      //  Sequence s = iter.nextSequence();
      //  System.out.println("\nName: " + s.getName());
      //  System.out.println("\tAnnotation: " + s.getAnnotation());
      //  System.out.println("\tSeq: " + s.seqString());
      //  iCount++;
      //}
      //System.out.println("\tCount for Seq: " + iCount);
      /* Parsing FASTA with RichSequence (the newer API). */
      RichSequenceIterator rsIter = RichSequence.IOTools.readFastaProtein(br, null);
      while (rsIter.hasNext()) {
        RichSequence rsSequence = rsIter.nextRichSequence();

        System.out.println("\tAccession: " + rsSequence.getAccession());
        System.out.println("\tDescription: " + rsSequence.getDescription());
        System.out.println("\tDivision: " + rsSequence.getDivision());
        System.out.println("\tName: " + rsSequence.getName());
        System.out.println("\tseqString: " + rsSequence.seqString());
      }

      /* Release the file handles once parsing is done. */
      br.close();
      is.close();
     
    }
    catch (BioException ex) {
      //not in fasta format or wrong alphabet
      ex.printStackTrace();
    }catch (NoSuchElementException ex) {
      //no fasta sequences in the file
      ex.printStackTrace();
    }catch (FileNotFoundException ex) {
      //problem reading file
      ex.printStackTrace();
    }
  }
}
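
Both iterator-based approaches in the listing are written against the same BufferedReader, br. A reader can only be consumed once, so whichever parser runs second would find nothing left to read. The plain-Java sketch below illustrates that behaviour; the class SharedReaderSketch, the helper countRemainingLines, and the sample data are hypothetical names introduced only for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

/** Hypothetical demo, not part of BioJava: a reader is exhausted after one pass. */
public class SharedReaderSketch {

    /** Counts the lines that are still unread on the given reader. */
    public static int countRemainingLines(BufferedReader reader) throws IOException {
        int count = 0;
        while (reader.readLine() != null) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        String fasta = ">seq1\nMKWV\n>seq2\nLLLF\n"; // made-up input

        // The first pass reads everything; the second pass on the same reader sees nothing
        BufferedReader shared = new BufferedReader(new StringReader(fasta));
        System.out.println("first pass:   " + countRemainingLines(shared));
        System.out.println("second pass:  " + countRemainingLines(shared));

        // A fresh reader over the same data starts from the beginning again
        BufferedReader fresh = new BufferedReader(new StringReader(fasta));
        System.out.println("fresh reader: " + countRemainingLines(fresh));
    }
}
```

If you want to compare two parsing approaches in a single run, opening a separate reader for each one avoids this problem.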
 


References


Parsing FASTA by SeqIOTools: the example that reads FASTA with the older SeqIOTools API


Parsing FASTA by RichSequence: the RichSequence.IOTools API documentation

