October 8, 2010

How to find or build a sequence database for the closest species

Take Vibrio parahaemolyticus as an example.

Common sequence databases, such as NCBInr, are not subdivided down to the level of this species.

If we cannot narrow the pool of candidate sequences before searching, we get too many results to analyze, and a lot of time goes into the search itself or into filtering its results.
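
One way to narrow the pool is to keep only the FASTA records whose header mentions the target organism. The following is a minimal sketch of that idea, not a method taken from any tool mentioned here. The class name FastaSpeciesFilter is hypothetical, and it assumes that the organism name appears as text in the header line, which depends on the database you download.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper (not part of any library): keeps FASTA records whose header mentions an organism. */
class FastaSpeciesFilter {

    /**
     * Returns the lines of every record whose header line (starting with '>')
     * contains the organism name, ignoring case. Record order is preserved.
     * Assumption: the organism name is written out in the header text.
     */
    static List<String> filter(List<String> fastaLines, String organism) {
        String needle = organism.toLowerCase(Locale.ROOT);
        List<String> kept = new ArrayList<>();
        boolean keep = false; // lines before the first header are dropped
        for (String line : fastaLines) {
            if (line.startsWith(">")) {
                // A new record starts: decide once whether to keep it.
                keep = line.toLowerCase(Locale.ROOT).contains(needle);
            }
            if (keep) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up records, for illustration only.
        List<String> db = Arrays.asList(
            ">seq1 hypothetical protein [Vibrio parahaemolyticus]",
            "MKTAYIAKQR",
            ">seq2 hypothetical protein [Escherichia coli]",
            "MSDNELKQLL",
            ">seq3 another protein [Vibrio parahaemolyticus]",
            "MALWTRLLPL");
        for (String line : filter(db, "Vibrio parahaemolyticus")) {
            System.out.println(line);
        }
    }
}
```

A plain substring match is crude: it also keeps records that only mention the organism in passing, so the filtered file is worth a quick look before it is used as a search database.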

February 26, 2010

TPP Demo2009: Chinese walkthrough with screenshots

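In mid-2009 the official TPP wiki published the TPP Demo2009.
It covers the initial download and installation of the TPP software,
basic database searching (X!Tandem) and validation of the results (PeptideProphet, ProteinProphet),
and the newly added feature (SpectraST searching), each with a demo walkthrough.
It is a good guide to operating the TPP.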

TPP Demo2009 - 10. Protein-level validation with ProteinProphet

ProteinProphet is a tool that provides statistical validation of protein identifications, and is based on PeptideProphet results. It uses a statistical model to assess whether each protein identification is a false match.

TPP Demo2009 - 9. Peptide quantitation with ASAPRatio

ASAPRatio is a tool for measuring relative expression levels of peptides and proteins from isotopically labeled samples (e.g. ICAT, SILAC, etc.).

February 25, 2010

TPP Demo2009 - 8. Further peptide-level validation with iProphet

iProphet (or InterProphet) is a tool that combines the result files produced by PeptideProphet and statistically refines them.

TPP Demo2009 - 7. Visualize LC-MS/MS data using Pep3D

Pep3D is a tool for visualizing LC-MS data, along with results from PeptideProphet.

TPP Demo2009 - 6. Validation of peptide-spectrum assignments with PeptideProphet

PeptideProphet provides statistical validation of search engine results by assigning a probability to each peptide-spectrum match, using a statistical model to assess whether the match is a false match.
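
To illustrate how such per-match probabilities can be used downstream, here is a minimal sketch. It does not show how PeptideProphet computes its probabilities; it only assumes you already have one probability per peptide-spectrum match. If each probability p is taken at face value, a match is wrong with probability 1 - p, so the expected share of wrong matches among those accepted at a cutoff is the mean of 1 - p over the accepted matches. The class name, method names, and values are hypothetical.

```java
/** Hypothetical helper: summarizes peptide-spectrum matches accepted at a probability cutoff. */
class PsmProbabilityFilter {

    /** Counts the matches whose probability is at least the cutoff. */
    static int countAccepted(double[] probabilities, double cutoff) {
        int accepted = 0;
        for (double p : probabilities) {
            if (p >= cutoff) {
                accepted++;
            }
        }
        return accepted;
    }

    /**
     * Expected fraction of incorrect matches among the accepted ones,
     * computed as the mean of (1 - p). Returns 0 when nothing is accepted.
     */
    static double expectedErrorRate(double[] probabilities, double cutoff) {
        double expectedWrong = 0.0;
        int accepted = 0;
        for (double p : probabilities) {
            if (p >= cutoff) {
                expectedWrong += 1.0 - p;
                accepted++;
            }
        }
        return accepted == 0 ? 0.0 : expectedWrong / accepted;
    }

    public static void main(String[] args) {
        // Made-up probabilities, for illustration only.
        double[] probabilities = {0.99, 0.95, 0.80, 0.40, 0.05};
        double cutoff = 0.9;
        System.out.println("accepted: " + countAccepted(probabilities, cutoff));
        System.out.println("expected error rate: " + expectedErrorRate(probabilities, cutoff));
    }
}
```

Raising the cutoff lowers the expected error rate but also accepts fewer matches, which is the trade-off to weigh when choosing a threshold.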

TPP Demo2009 - 5. Search data with SpectraST

SpectraST is a search engine that compares acquired spectra against a library of previously identified spectra to which peptide sequences have been assigned. To run the search, we must first download the appropriate spectral library.

TPP Demo2009 - 4. Search data with X!Tandem

A custom version of the popular open-source search engine X!Tandem is bundled and installed with the TPP. It has been modified from the original distribution by adding the K-Score scoring function, developed by a team at the Fred Hutchinson Cancer Research Center.

TPP Demo2009 - 3. Convert raw data to the mzML format

We have developed the TPP (and dozens of related tools) to read mass-spec data from a common, open data format. We must therefore first convert the proprietary raw data to this format, called mzML.

TPP Demo2009 - 2. Download and install the test data and database

For this demo, we will be using a SILAC-labeled Yeast dataset, comprising 2 runs on a high mass-accuracy Orbitrap instrument, along with a Yeast database appended with decoys. We also include a search parameters file.
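
The demo does not say how its decoys were generated. One common approach is to append a reversed copy of every target sequence under a marked name, and the sketch below shows only that idea. The class name DecoyAppender and the "DECOY_" prefix are hypothetical, not taken from the demo database.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: appends reversed-sequence decoys to a set of target proteins. */
class DecoyAppender {

    /** Assumed marker for decoy entries; the demo database may use a different one. */
    static final String DECOY_PREFIX = "DECOY_";

    /**
     * Returns the targets in their original order, followed by one decoy per target
     * whose sequence is the target sequence reversed.
     */
    static Map<String, String> withReversedDecoys(Map<String, String> targets) {
        Map<String, String> combined = new LinkedHashMap<>(targets);
        for (Map.Entry<String, String> entry : targets.entrySet()) {
            String reversed = new StringBuilder(entry.getValue()).reverse().toString();
            combined.put(DECOY_PREFIX + entry.getKey(), reversed);
        }
        return combined;
    }

    public static void main(String[] args) {
        // Made-up entries, for illustration only.
        Map<String, String> targets = new LinkedHashMap<>();
        targets.put("PROT1", "MKTAYIAK");
        targets.put("PROT2", "MSDNELK");
        for (Map.Entry<String, String> entry : withReversedDecoys(targets).entrySet()) {
            System.out.println(">" + entry.getKey());
            System.out.println(entry.getValue());
        }
    }
}
```

Because decoy sequences should not occur in the sample, hits to them give a rough indication of how many target hits are also false.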

TPP Demo2009 - 1. Download and install the TPP

To install on your Windows system, please follow our Windows Installation Guide, making sure that you select to download the file "TPP_Setup_v4_3_JETSTREAM_rev_1.exe" from our Sourceforge download site.

February 23, 2010

Using BioJava to parse FASTA


Introduction

The BioJava Cookbook has an example of parsing FASTA, but the API it uses, SeqIOTools, is marked as deprecated in the docs.

So, besides the old approach, the code below also shows how to do it with SequenceIterator and RichSequence.
 
 
 
Code

import java.io.*;
import java.util.*;
import org.biojava.bio.*;
import org.biojava.bio.seq.*;
import org.biojava.bio.seq.db.*;
import org.biojava.bio.seq.io.*;
import org.biojava.bio.symbol.*;
import org.biojavax.bio.seq.RichSequence;
import org.biojavax.bio.seq.RichSequenceIterator;
public class ReadFasta {
  /**
   * Sample code that parses a FASTA file with BioJava and prints the sequences.
   */
  public static void main(String[] args) throws IOException, BioException {
    try {
 
      /* Set up the required parameters. */
      String sFilename = "D:\\working\\cRAP_200905_prefixCON.fasta";
      String sSeqType="PROTEIN";
      String sFileType="FASTA";
 
      /* Input stream for the SequenceDB approach. */
      BufferedInputStream is = new BufferedInputStream(new FileInputStream(sFilename));      
 
      /* Reader for the SequenceIterator and RichSequenceIterator approaches. */
      BufferedReader br = new BufferedReader(new FileReader(sFilename));         
 
      /* Alphabet for the SequenceDB approach. */
      Alphabet alpha = AlphabetManager.alphabetForName(sSeqType);  
      /* Parsing FASTA with SeqIOTools (deprecated).
       * Reads all sequences in the file into a SequenceDB;
       * the sequences may not come back in file order.
       */
      //SequenceDB db =  SeqIOTools.readFasta(is, alpha);
      //SeqIOTools.writeFasta(System.out, db);
      /* Try out some SequenceDB methods. */
      //System.out.println("getClass: "+db.getClass());
      //System.out.println("ids: "+db.ids());
      //System.out.println("getName: "+db.getName());
      //System.out.println("getSequence: "+db.getSequence("CON_sp|ALBU_BOVIN|").seqString());
      //SeqIOTools.writeFasta(System.out, db);
     
      /* Parsing FASTA with SequenceIterator.
       * Note: iter and rsIter below share the reader br, so iterate only one of them per run.
       */
      SequenceIterator iter =(SequenceIterator) SeqIOTools.fileToBiojava(sFileType, sSeqType,br);
       /*
       int iCount=0;
       while (  iter.hasNext()  ){
          Sequence s = iter.nextSequence();

          //String name = s.getName();
          //s.length();
          //s.getAnnotation();
          System.out.println("\nName: "+s.getName());
          System.out.println("\tAnnotation: "+s.getAnnotation());
          System.out.println("\tSeq: "+s.seqString());
          iCount++;
         
        }
        System.out.println("\tCount for Seq: "+iCount);
       */
      /* Parsing FASTA with the newer RichSequence API. */
       RichSequenceIterator rsIter = RichSequence.IOTools.readFastaProtein(br, null);
       while (rsIter.hasNext()){
          RichSequence rsSequence = rsIter.nextRichSequence();

          //String name = s.getName();
          //s.length();
          //s.getAnnotation();
          System.out.println("\tAccession: "+rsSequence.getAccession());
          System.out.println("\tDescription: "+rsSequence.getDescription());
          System.out.println("\tDivision: "+rsSequence.getDivision());
          System.out.println("\tName: "+rsSequence.getName());
          System.out.println("\tseqString: "+rsSequence.seqString());
      
       }
     
    }
    catch (BioException ex) {
      //not in fasta format or wrong alphabet
      ex.printStackTrace();
    }catch (NoSuchElementException ex) {
      //no fasta sequences in the file
      ex.printStackTrace();
    }catch (FileNotFoundException ex) {
      //problem reading file
      ex.printStackTrace();
    }
  }
}
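
As a cross-check for the BioJava output, the following dependency-free sketch reads FASTA records in file order using only the Java standard library. It is not a replacement for BioJava's parsers and does not interpret the header fields. The class name PlainFastaReader is hypothetical, and it assumes each record is a '>' header line followed by one or more sequence lines.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: reads FASTA records in file order without BioJava. */
class PlainFastaReader {

    /**
     * Maps each header (without the leading '>') to its concatenated sequence.
     * A LinkedHashMap keeps the records in the order they appear in the input.
     * If a header occurs twice, the later record replaces the earlier one.
     */
    static Map<String, String> read(Reader source) {
        Map<String, String> records = new LinkedHashMap<>();
        String header = null;
        StringBuilder sequence = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                if (line.startsWith(">")) {
                    if (header != null) {
                        records.put(header, sequence.toString());
                    }
                    header = line.substring(1);
                    sequence.setLength(0);
                } else if (header != null) {
                    sequence.append(line); // sequences may span several lines
                }
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        if (header != null) {
            records.put(header, sequence.toString()); // store the last record
        }
        return records;
    }

    public static void main(String[] args) {
        // Made-up input, for illustration only.
        String fasta = ">second_entry example\nMKTAY\nIAKQR\n>first_entry example\nMSDNE\n";
        for (Map.Entry<String, String> record : read(new StringReader(fasta)).entrySet()) {
            System.out.println(record.getKey() + " -> " + record.getValue());
        }
    }
}
```

To read a real file, pass a java.io.FileReader for the FASTA path instead of the StringReader; the reader is closed when read returns.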
 


References


Parsing FASTA by SeqIOTools (example that reads FASTA with the older SeqIOTools API)


Parsing FASTA by RichSequence (API documentation for RichSequence.IOTools)