
SuffixArrayTrypticCompressed Class Reference

#include <OpenMS/DATASTRUCTURES/SuffixArrayTrypticCompressed.h>

Inherits SuffixArray.


Detailed Description

Class that implements a suffix array for a String. It can be used to find peptide candidates for an MS spectrum.

This class implements a suffix array. It can only be used for finding peptide candidates for a given MS spectrum within a certain mass tolerance. The suffix array can be saved to disk for reuse, so it has to be built only once. The suffix array consists of a vector of pairs of ints (one pair for every suffix), a vector of LCP values, and a so-called skip vector. Only the suffixes that match the function isDigestingEnd are created. In addition, a suffix does not extend to the end of the string but only to the next occurrence of the separator ($). Thus only the interesting suffixes are stored, which reduces the space used.
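The construction described above can be illustrated with a minimal, self-contained sketch. It is not the OpenMS implementation. It assumes that each suffix is stored as a (start index, length) pair, that a suffix may also start directly after a separator, and that the digestion rule is the usual trypsin rule (cut after K or R unless P follows). The names isCleavage and buildTrypticIndices are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for isDigestingEnd: trypsin-like rule (assumption).
inline bool isCleavage(char aa1, char aa2)
{
  return (aa1 == 'K' || aa1 == 'R') && aa2 != 'P';
}

// Collects (start, length) pairs for every suffix that begins directly after
// a separator or after a cleavage site. Each suffix ends at the next '$'.
inline std::vector<std::pair<int, int>> buildTrypticIndices(const std::string& s)
{
  assert(!s.empty() && s[0] == '$'); // the text is expected to start with '$'
  std::vector<std::pair<int, int>> indices;
  const int n = static_cast<int>(s.size());
  for (int i = 1; i < n; ++i)
  {
    if (s[i] == '$') continue; // a separator never starts a suffix
    const bool after_sep = (s[i - 1] == '$');
    const bool after_cut = isCleavage(s[i - 1], s[i]);
    if (!after_sep && !after_cut) continue;
    int end = i;
    while (end < n && s[end] != '$') ++end; // stop at the next separator
    indices.emplace_back(i, end - i);
  }
  return indices;
}
```

For the text "$AKCR$DE" this sketch keeps three suffixes (AKCR, CR and DE) instead of one suffix per character, which is the space saving the description refers to.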

Public Member Functions

 SuffixArrayTrypticCompressed (const String &st, const String &sa_file_name) throw (Exception::InvalidValue, Exception::FileNotFound)
 constructor taking the string and the filename for writing or reading
 SuffixArrayTrypticCompressed (const SuffixArrayTrypticCompressed &sa)
 copy constructor
virtual ~SuffixArrayTrypticCompressed ()
 destructor
String toString ()
 transforms suffix array to a printable String
void findSpec (std::vector< std::vector< std::pair< std::pair< int, int >, float > > > &candidates, const std::vector< double > &spec) throw (Exception::InvalidValue)
 the function that will find all peptide candidates for a given spectrum
bool save (const String &file_name) throw (Exception::UnableToCreateFile)
 saves the suffix array to disc
bool open (const String &file_name) throw (Exception::FileNotFound)
 opens the suffix array
void setTolerance (double t) throw (Exception::InvalidValue)
 setter for tolerance
double getTolerance () const
 getter for tolerance
bool isDigestingEnd (const char aa1, const char aa2) const
 returns if an enzyme will cut after first character
void setTags (const std::vector< String > &tags) throw (Exception::InvalidValue)
 setter for tags
const std::vector< String > & getTags ()
 getter for tags
void setUseTags (bool use_tags)
 setter for use_tags
bool getUseTags ()
 getter for use_tags
void setNumberOfModifications (unsigned int number_of_mods)
 setter for number of modifications
unsigned int getNumberOfModifications ()
 getter for number of modifications
void printStatistic ()
 output for statistic

Protected Member Functions

 SuffixArrayTrypticCompressed ()
 constructor
int getNextSep_ (const int p) const
 gets the index of the next separator for a given index
int getLCP_ (const std::pair< int, int > &last_point, const std::pair< int, int > &current_point)
 gets the lcp for two strings described as pairs of ints
int findFirst_ (const std::vector< double > &spec, double &m)
 binary search for finding the index of the first element of the spectrum that matches the desired mass within the tolerance.
int findFirst_ (const std::vector< double > &spec, double &m, int start, int end)
 binary search for finding the index of the first element of the spectrum that matches the desired mass within the tolerance. It searches recursively.
void parseTree_ (int start_index, int stop_index, int depth, int walked_in, int edge_len, std::vector< std::pair< int, int > > &out_number, std::vector< std::pair< int, int > > &edge_length, std::vector< int > &leafe_depth)
 treats the suffix array as a tree and parses the tree using postorder traversal. This is realized by a recursive algorithm.
bool hasMoreOutgoings_ (int start_index, int stop_index, int walked_in)
 indicates if a node during traversal has more outgoings

Protected Attributes

const String & s_
 the string from which the suffix array is built
double tol_
 mass tolerance for finding candidates
std::vector< std::pair< int, int > > indices_
 vector of pairs of ints describing all relevant suffixes
std::vector< int > lcp_
 vector of ints with lcp values
std::vector< int > skip_
 vector of ints with skip values
double masse_ [256]
 mass table
int number_of_modifications_
 number of allowed modifications
std::vector< String > tags_
 all given tags
bool use_tags_
 indicates whether tags are used or not
int progress_


Constructor & Destructor Documentation

SuffixArrayTrypticCompressed ( const String &  st,
const String &  sa_file_name 
) throw (Exception::InvalidValue, Exception::FileNotFound)

constructor taking the string and the filename for writing or reading

Parameters:
st the string, as a const reference, from which the suffix array will be built
sa_file_name the filename for writing or reading the suffix array
Exceptions:
Exception::InvalidValue if the string does not start with the separator character ($)
The constructor checks whether a suffix array with the given filename (without file extension) exists. If it does, it is simply loaded; otherwise it is built. Building the suffix array consists of several steps. First, all indices for a digesting enzyme (defined by the function isDigestingEnd) are created as a vector of int pairs. After all relevant indices have been created, they are sorted and the lcp and skip vectors are created.
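The sorting and LCP steps can be sketched as follows. This is an illustration under assumptions, not the library code: each pair is taken to be (start, length), lcp[i] is taken to be the common prefix length of suffix i - 1 and suffix i (with lcp[0] = 0), and the skip vector is left out because its definition is not given here. The names Suffix, sortSuffixes and buildLcp are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Suffix = std::pair<int, int>; // (start, length), an assumed layout

// Sorts the suffixes lexicographically by the substring each pair denotes.
inline void sortSuffixes(const std::string& s, std::vector<Suffix>& indices)
{
  std::sort(indices.begin(), indices.end(),
            [&s](const Suffix& a, const Suffix& b)
            {
              return s.compare(a.first, a.second, s, b.first, b.second) < 0;
            });
}

// lcp[i] = length of the longest common prefix of suffix i - 1 and suffix i.
inline std::vector<int> buildLcp(const std::string& s,
                                 const std::vector<Suffix>& indices)
{
  std::vector<int> lcp(indices.size(), 0);
  for (std::size_t i = 1; i < indices.size(); ++i)
  {
    const Suffix& a = indices[i - 1];
    const Suffix& b = indices[i];
    assert(a.first + a.second <= static_cast<int>(s.size()));
    assert(b.first + b.second <= static_cast<int>(s.size()));
    const int limit = std::min(a.second, b.second);
    int k = 0;
    while (k < limit && s[a.first + k] == s[b.first + k]) ++k;
    lcp[i] = k;
  }
  return lcp;
}
```

Comparing only neighbouring entries is sufficient because, after sorting, suffixes that share a prefix are adjacent.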

SuffixArrayTrypticCompressed ( const SuffixArrayTrypticCompressed &  sa  ) 

copy constructor

virtual ~SuffixArrayTrypticCompressed (  )  [virtual]

destructor

SuffixArrayTrypticCompressed (  )  [protected]

constructor


Member Function Documentation

String toString (  )  [virtual]

transforms suffix array to a printable String

Implements SuffixArray.

void findSpec ( std::vector< std::vector< std::pair< std::pair< int, int >, float > > > &  candidates,
const std::vector< double > &  spec 
) throw (Exception::InvalidValue) [virtual]

the function that will find all peptide candidates for a given spectrum

Parameters:
candidates reference to the output vector that receives the candidates for each mass of the spectrum
spec const reference to a vector of doubles describing the spectrum
Returns:
nothing directly; the candidates, described as pairs of ints, are written to candidates
Exceptions:
Exception::InvalidValue if the spectrum is not sorted in ascending order
For every mass in the spectrum, all candidates, described as pairs of ints, are returned. All masses are searched for at the same time in a single suffix array traversal. To accelerate the traversal, the skip and lcp tables are used. The mass is not recalculated for each entry; instead it is updated during the traversal using a stack data structure.
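The stack-based mass update can be illustrated with a simplified sketch. It is not the OpenMS algorithm: it ignores the skip table, tags and modifications, treats every prefix as a candidate, scans the spectrum linearly, and reports a shared prefix only for the first suffix that contains it. It assumes (start, length) pairs, lcp[i] relating suffix i - 1 to suffix i, and a caller-supplied mass table. The float of each hit is left at 0 because its meaning is not specified here. All names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Suffix = std::pair<int, int>;   // (start, length), an assumed layout
using Hit = std::pair<Suffix, float>; // the float is unused in this sketch

// suffixes must be sorted lexicographically; lcp[0] must be 0.
inline std::vector<std::vector<Hit>> findCandidates(
    const std::string& s, const std::vector<Suffix>& suffixes,
    const std::vector<int>& lcp, const std::array<double, 256>& mass,
    const std::vector<double>& spec, double tol)
{
  assert(suffixes.size() == lcp.size());
  std::vector<std::vector<Hit>> candidates(spec.size());
  std::vector<double> prefix_mass; // stack: entry k holds the mass of k + 1 residues
  for (std::size_t i = 0; i < suffixes.size(); ++i)
  {
    const Suffix& suf = suffixes[i];
    // Keep the masses of the prefix shared with the previous suffix.
    prefix_mass.resize(static_cast<std::size_t>(lcp[i]));
    for (int k = lcp[i]; k < suf.second; ++k)
    {
      const double before = prefix_mass.empty() ? 0.0 : prefix_mass.back();
      const unsigned char aa = static_cast<unsigned char>(s[suf.first + k]);
      prefix_mass.push_back(before + mass[aa]);
      // Report the new prefix for every spectrum mass within the tolerance.
      for (std::size_t j = 0; j < spec.size(); ++j)
      {
        if (std::fabs(prefix_mass.back() - spec[j]) <= tol)
        {
          candidates[j].push_back(Hit(Suffix(suf.first, k + 1), 0.0f));
        }
      }
    }
  }
  return candidates;
}
```

Because neighbouring suffixes share their first lcp[i] characters, only the residues after that shared prefix have to be added to the running mass.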

Implements SuffixArray.

bool save ( const String &  file_name  )  throw (Exception::UnableToCreateFile) [virtual]

saves the suffix array to disc

Parameters:
file_name const reference to a string describing the filename
Returns:
bool indicating whether the operation was successful
Exceptions:
Exception::UnableToCreateFile if the file could not be created (e.g. if you lack write permissions)

Implements SuffixArray.

bool open ( const String &  file_name  )  throw (Exception::FileNotFound) [virtual]

opens the suffix array

Parameters:
file_name const reference to a string describing the filename
Returns:
bool indicating whether the operation was successful
Exceptions:
Exception::FileNotFound 

Implements SuffixArray.

void setTolerance ( double  t  )  throw (Exception::InvalidValue) [virtual]

setter for tolerance

Parameters:
t double with tolerance
Exceptions:
Exception::InvalidValue if tolerance is negative

Implements SuffixArray.

double getTolerance (  )  const [virtual]

getter for tolerance

Returns:
double with tolerance

Implements SuffixArray.

bool isDigestingEnd ( const char  aa1,
const char  aa2 
) const [virtual]

returns if an enzyme will cut after first character

Parameters:
aa1 const char, the first amino acid
aa2 const char, the second amino acid
Returns:
bool describing whether it is a digesting site
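As an illustration, a cleavage predicate of this shape could look like the sketch below. The rule used (cut after K or R unless the next residue is P) is the commonly cited trypsin rule and is an assumption here, since the rule itself is not spelled out in this documentation. The names isDigestingEndSketch and cleavageSites are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Assumed trypsin rule: cleave after K or R unless the next residue is P.
inline bool isDigestingEndSketch(char aa1, char aa2)
{
  return (aa1 == 'K' || aa1 == 'R') && aa2 != 'P';
}

// Returns every position i at which the enzyme cuts between s[i] and s[i + 1].
inline std::vector<int> cleavageSites(const std::string& s)
{
  std::vector<int> sites;
  for (std::size_t i = 0; i + 1 < s.size(); ++i)
  {
    if (isDigestingEndSketch(s[i], s[i + 1]))
    {
      sites.push_back(static_cast<int>(i));
    }
  }
  return sites;
}
```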

Implements SuffixArray.

void setTags ( const std::vector< String > &  tags  )  throw (Exception::InvalidValue) [virtual]

setter for tags

Parameters:
tags const vector of strings containing the tags, each of length 3
Exceptions:
Exception::InvalidValue if at least one tag does not have a size of 3
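The length check can be sketched as follows. The class TagHolder is hypothetical, and std::invalid_argument stands in for Exception::InvalidValue so that the example compiles without OpenMS.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical holder that mimics the documented contract of setTags.
class TagHolder
{
public:
  void setTags(const std::vector<std::string>& tags)
  {
    for (const std::string& t : tags)
    {
      if (t.size() != 3)
      {
        throw std::invalid_argument("every tag must have length 3");
      }
    }
    tags_ = tags; // stored only after all tags passed the check
  }

  const std::vector<std::string>& getTags() const { return tags_; }

private:
  std::vector<std::string> tags_;
};
```

Validating all tags before assigning keeps the previously stored tags intact when an invalid tag is passed.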

Implements SuffixArray.

const std::vector<String>& getTags (  )  [virtual]

getter for tags

Returns:
const vector of strings with the tags

Implements SuffixArray.

void setUseTags ( bool  use_tags  )  [virtual]

setter for use_tags

Parameters:
use_tags indicating whether tags should be used or not

Implements SuffixArray.

bool getUseTags (  )  [virtual]

getter for use_tags

Returns:
bool indicating whether tags are used or not

Implements SuffixArray.

void setNumberOfModifications ( unsigned int  number_of_mods  )  [virtual]

setter for number of modifications

Parameters:
number_of_mods the number of allowed modifications

Implements SuffixArray.

unsigned int getNumberOfModifications (  )  [virtual]

getter for number of modifications

Returns:
unsigned int describing number of modifications

Implements SuffixArray.

void printStatistic (  )  [virtual]

output for statistic

Implements SuffixArray.

int getNextSep_ ( const int  p  )  const [protected]

gets the index of the next separator for a given index

Parameters:
p const int describing a position in the string
Returns:
int with the index of the next occurrence of the separator, or -1 if there is no further separator
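A minimal sketch of such a lookup on a std::string is shown here. Whether the search starts at p itself or at p + 1 is not stated in this documentation; the sketch assumes it starts at p. The name nextSeparator is hypothetical.

```cpp
#include <cassert>
#include <string>

// Returns the index of the first '$' at or after position p, or -1 if none.
inline int nextSeparator(const std::string& s, int p)
{
  assert(p >= 0);
  const std::string::size_type pos =
      s.find('$', static_cast<std::string::size_type>(p));
  return pos == std::string::npos ? -1 : static_cast<int>(pos);
}
```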

int getLCP_ ( const std::pair< int, int > &  last_point,
const std::pair< int, int > &  current_point 
) [protected]

gets the lcp for two strings described as pairs of ints

Parameters:
last_point const pair of ints describing a substring
current_point const pair of ints describing a substring
Returns:
int with the length of the longest common prefix
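For illustration, the longest common prefix of two substrings can be computed with std::mismatch. The sketch assumes each pair is (start, length); the name longestCommonPrefix is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>

// Length of the longest common prefix of the two substrings of s that are
// described by a and b, each given as (start, length).
inline int longestCommonPrefix(const std::string& s,
                               const std::pair<int, int>& a,
                               const std::pair<int, int>& b)
{
  assert(a.first >= 0 && a.first + a.second <= static_cast<int>(s.size()));
  assert(b.first >= 0 && b.first + b.second <= static_cast<int>(s.size()));
  const int limit = std::min(a.second, b.second);
  const std::string::const_iterator a_begin = s.begin() + a.first;
  const std::string::const_iterator b_begin = s.begin() + b.first;
  // std::mismatch stops at the first differing character or after limit characters.
  return static_cast<int>(
      std::mismatch(a_begin, a_begin + limit, b_begin).first - a_begin);
}
```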

int findFirst_ ( const std::vector< double > &  spec,
double &  m 
) [protected]

binary search for finding the index of the first element of the spectrum that matches the desired mass within the tolerance.

Parameters:
spec const reference to spectrum
m mass
Returns:
int with the index of the first occurrence
Note:
requires that there is at least one occurrence

int findFirst_ ( const std::vector< double > &  spec,
double &  m,
int  start,
int  end 
) [protected]

binary search for finding the index of the first element of the spectrum that matches the desired mass within the tolerance. It searches recursively.

Parameters:
spec const reference to spectrum
m mass
start start index
end end index
Returns:
int with the index of the first occurrence
Note:
requires that there is at least one occurrence
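Both findFirst_ overloads can be pictured with the following sketch of a recursive binary search over a sorted spectrum. It is an assumption-laden illustration: the tolerance is passed as a parameter instead of being read from tol_, the range [start, end] is taken to be inclusive, and the name findFirstSketch is hypothetical. As in the note above, the result is only meaningful if at least one element matches.

```cpp
#include <cassert>
#include <vector>

// Searches spec[start..end] (inclusive) for the first element >= m - tol.
// If some element matches m within tol, this is the first matching element.
inline int findFirstSketch(const std::vector<double>& spec, double m,
                           double tol, int start, int end)
{
  assert(start <= end);
  if (start == end) return start;
  const int mid = start + (end - start) / 2;
  if (spec[mid] < m - tol)
  {
    return findFirstSketch(spec, m, tol, mid + 1, end); // match lies to the right
  }
  return findFirstSketch(spec, m, tol, start, mid); // mid may be the first match
}

// Convenience overload that searches the whole spectrum.
inline int findFirstSketch(const std::vector<double>& spec, double m, double tol)
{
  assert(!spec.empty());
  return findFirstSketch(spec, m, tol, 0, static_cast<int>(spec.size()) - 1);
}
```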

void parseTree_ ( int  start_index,
int  stop_index,
int  depth,
int  walked_in,
int  edge_len,
std::vector< std::pair< int, int > > &  out_number,
std::vector< std::pair< int, int > > &  edge_length,
std::vector< int > &  leafe_depth 
) [protected]

treats the suffix array as a tree and parses the tree using postorder traversal. This is realized by a recursive algorithm.

Parameters:
start_index int describing the start index in the indices_ vector
stop_index int describing the end index in the indices_ vector
depth the depth of the traversal at the current position
walked_in how many characters have been seen from the root to the current position
edge_len how many characters have been seen from the last node to the current position
out_number reference to a vector of pairs of ints; for every node it is filled with the number of outgoing edges the node has, depending on its depth
edge_length will be filled with the edge length, depending on its depth
leafe_depth will be filled with the depth of every leaf
Note:
initialize with walked_in=0, depth=1, edge_len=1

bool hasMoreOutgoings_ ( int  start_index,
int  stop_index,
int  walked_in 
) [protected]

indicates if a node during traversal has more outgoings

Parameters:
start_index int describing the start index in the indices_ vector
stop_index int describing the end index in the indices_ vector
walked_in how many characters have been seen from the root to the current position


Member Data Documentation

const String& s_ [protected]

the string from which the suffix array is built

double tol_ [protected]

mass tolerance for finding candidates

std::vector<std::pair<int,int> > indices_ [protected]

vector of pairs of ints describing all relevant suffixes

std::vector<int> lcp_ [protected]

vector of ints with lcp values

std::vector<int> skip_ [protected]

vector of ints with skip values

double masse_[256] [protected]

mass table

int number_of_modifications_ [protected]

number of allowed modifications

std::vector<String> tags_ [protected]

all given tags

bool use_tags_ [protected]

indicates whether tags are used or not

int progress_ [protected]


The documentation for this class was generated from the following file:
Generated Tue Apr 1 15:36:44 2008 -- using doxygen 1.5.4 OpenMS / TOPP 1.1