#include <OpenMS/DATASTRUCTURES/SuffixArrayTrypticCompressed.h>
This class implements a suffix array. It is intended only for finding peptide candidates for a given MS spectrum within a certain mass tolerance. The suffix array can be saved to disc for reuse, so it has to be built only once. It consists of a vector of pairs of ints (one pair per suffix), a vector of LCP values, and a so-called skip vector. Only the suffixes that match the function isDigestingEnd are created. In addition, a suffix does not extend to the end of the string but only to the next occurrence of the separator ($). Thus only the relevant suffixes are stored, which reduces the space used.
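The selection of suffixes described above can be illustrated with a minimal, self-contained sketch. It is not the OpenMS implementation: the function name buildSuffixIndexSketch, the interpretation of each pair as (start, length), the lexicographic ordering with a tie-break on the start position, and the cleavage predicate passed in by the caller are all assumptions made for illustration. The sketch also uses an assert where the class throws Exception::InvalidValue.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper, not part of OpenMS. Collects one (start, length) pair
// for every suffix that begins right after a separator or right after a
// cleavage site, and that ends just before the next '$'.
template <typename Pred>
std::vector<std::pair<int, int> > buildSuffixIndexSketch(const std::string& s,
                                                         Pred is_digesting_end)
{
  // Assumed precondition: the text starts with the separator.
  assert(!s.empty() && s[0] == '$');

  std::vector<std::pair<int, int> > indices;
  const int n = static_cast<int>(s.size());
  for (int i = 1; i < n; ++i)
  {
    if (s[i] == '$') continue;
    const bool starts_here = (s[i - 1] == '$') || is_digesting_end(s[i - 1], s[i]);
    if (!starts_here) continue;

    int end = i;
    while (end < n && s[end] != '$') ++end;  // stop at the next separator
    indices.push_back(std::make_pair(i, end - i));
  }

  // Sort the truncated suffixes lexicographically; ties are broken by position.
  std::sort(indices.begin(), indices.end(),
            [&s](const std::pair<int, int>& a, const std::pair<int, int>& b)
            {
              const int cmp = s.compare(a.first, a.second, s, b.first, b.second);
              if (cmp != 0) return cmp < 0;
              return a.first < b.first;
            });
  return indices;
}
```

Because every stored suffix is cut at the next separator and only cleavage positions are indexed, the number of entries is much smaller than the length of the text, which is the space saving the description refers to.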
Public Member Functions | |
SuffixArrayTrypticCompressed (const String &st, const String &sa_file_name) throw (Exception::InvalidValue, Exception::FileNotFound) | |
constructor taking the string and the filename for writing or reading | |
SuffixArrayTrypticCompressed (const SuffixArrayTrypticCompressed &sa) | |
copy constructor | |
virtual | ~SuffixArrayTrypticCompressed () |
destructor | |
String | toString () |
transforms suffix array to a printable String | |
void | findSpec (std::vector< std::vector< std::pair< std::pair< int, int >, float > > > &candidates, const std::vector< double > &spec) throw (Exception::InvalidValue) |
the function that will find all peptide candidates for a given spectrum | |
bool | save (const String &file_name) throw (Exception::UnableToCreateFile) |
saves the suffix array to disc | |
bool | open (const String &file_name) throw (Exception::FileNotFound) |
opens the suffix array | |
void | setTolerance (double t) throw (Exception::InvalidValue) |
setter for tolerance | |
double | getTolerance () const |
getter for tolerance | |
bool | isDigestingEnd (const char aa1, const char aa2) const |
returns whether the enzyme will cut after the first character | |
void | setTags (const std::vector< String > &tags) throw (Exception::InvalidValue) |
setter for tags | |
const std::vector< String > & | getTags () |
getter for tags | |
void | setUseTags (bool use_tags) |
setter for use_tags | |
bool | getUseTags () |
getter for use_tags | |
void | setNumberOfModifications (unsigned int number_of_mods) |
setter for number of modifications | |
unsigned int | getNumberOfModifications () |
getter for number of modifications | |
void | printStatistic () |
output for statistic | |
Protected Member Functions | |
SuffixArrayTrypticCompressed () | |
constructor | |
int | getNextSep_ (const int p) const |
gets the index of the next separator for a given index | |
int | getLCP_ (const std::pair< int, int > &last_point, const std::pair< int, int > &current_point) |
gets the lcp for two strings described as pairs of ints | |
int | findFirst_ (const std::vector< double > &spec, double &m) |
binary search for finding the index of the first element of the spectrum that matches the desired mass within the tolerance. | |
int | findFirst_ (const std::vector< double > &spec, double &m, int start, int end) |
binary search for finding the index of the first element of the spectrum that matches the desired mass within the tolerance. It searches recursively. | |
void | parseTree_ (int start_index, int stop_index, int depth, int walked_in, int edge_len, std::vector< std::pair< int, int > > &out_number, std::vector< std::pair< int, int > > &edge_length, std::vector< int > &leafe_depth) |
treats the suffix array as a tree and traverses it in postorder. This is implemented as a recursive algorithm. | |
bool | hasMoreOutgoings_ (int start_index, int stop_index, int walked_in) |
indicates whether a node has more outgoing edges during traversal | |
Protected Attributes | |
const String & | s_ |
the string with which the suffix array is built | |
double | tol_ |
mass tolerance for finding candidates | |
std::vector< std::pair< int, int > > | indices_ |
vector of pairs of ints describing all relevant suffixes | |
std::vector< int > | lcp_ |
vector of ints with lcp values | |
std::vector< int > | skip_ |
vector of ints with skip values | |
double | masse_ [256] |
mass table | |
int | number_of_modifications_ |
number of allowed modifications | |
std::vector< String > | tags_ |
all given tags | |
bool | use_tags_ |
indicates whether tags are used or not | |
int | progress_ |
SuffixArrayTrypticCompressed | ( | const String & | st, | |
const String & | sa_file_name | |||
) | throw (Exception::InvalidValue, Exception::FileNotFound) |
constructor taking the string and the filename for writing or reading
st | the string, as const reference, with which the suffix array will be built | |
sa_file_name | the filename for writing or reading the suffix array |
Exception::InvalidValue | if the string does not start with the separator ($) |
SuffixArrayTrypticCompressed | ( | const SuffixArrayTrypticCompressed & | sa | ) |
copy constructor
virtual ~SuffixArrayTrypticCompressed | ( | ) | [virtual] |
destructor
SuffixArrayTrypticCompressed | ( | ) | [protected] |
constructor
String toString | ( | ) | [virtual] |
void findSpec | ( | std::vector< std::vector< std::pair< std::pair< int, int >, float > > > & | candidates, | |
const std::vector< double > & | spec | |||
) | throw (Exception::InvalidValue) [virtual] |
the function that will find all peptide candidates for a given spectrum
spec | const reference of double vector describing the spectrum |
Exception::InvalidValue | if the spectrum is not sorted ascendingly |
Implements SuffixArray.
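To make the shape of the candidates output more concrete, here is a naive brute-force reference that fills the same nested container without using a suffix array at all. It is only a sketch under stated assumptions: the name findSpecNaiveSketch, the reading of the inner pair as (start, length), and the caller-supplied mass function are hypothetical, every substring between separators is tested regardless of cleavage sites, and the float is left at 0 because this page does not say what it holds. The sketch asserts that the spectrum is sorted, where the class throws Exception::InvalidValue.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Same element type as in the findSpec signature.
typedef std::pair<std::pair<int, int>, float> CandidateSketch;

// Hypothetical brute-force reference, not part of OpenMS. For every spectrum
// mass it lists all substrings (not crossing '$') whose summed residue mass
// lies within tol of that mass.
template <typename MassOf>
std::vector<std::vector<CandidateSketch> > findSpecNaiveSketch(
    const std::string& s, const std::vector<double>& spec, double tol, MassOf mass_of)
{
  assert(tol >= 0.0);
  assert(std::is_sorted(spec.begin(), spec.end()));  // spectrum must be ascending

  std::vector<std::vector<CandidateSketch> > candidates(spec.size());
  const int n = static_cast<int>(s.size());
  for (int start = 0; start < n; ++start)
  {
    if (s[start] == '$') continue;
    double mass = 0.0;
    for (int end = start; end < n && s[end] != '$'; ++end)
    {
      mass += mass_of(s[end]);  // running mass of s[start..end]
      for (std::size_t k = 0; k < spec.size(); ++k)
      {
        if (std::fabs(spec[k] - mass) <= tol)
        {
          candidates[k].push_back(
              CandidateSketch(std::make_pair(start, end - start + 1), 0.0f));
        }
      }
    }
  }
  return candidates;
}
```

The outer vector has one entry per spectrum mass, and each entry lists the matching substrings for that mass. The quadratic enumeration here is exactly the work that the suffix array, LCP values and skip vector are meant to avoid.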
bool save | ( | const String & | file_name | ) | throw (Exception::UnableToCreateFile) [virtual] |
saves the suffix array to disc
file_name | const reference string describing the filename |
Exception::UnableToCreateFile | if the file could not be created (e.g. if you lack the necessary rights) |
Implements SuffixArray.
bool open | ( | const String & | file_name | ) | throw (Exception::FileNotFound) [virtual] |
opens the suffix array
file_name | const reference string describing the filename |
Exception::FileNotFound |
Implements SuffixArray.
void setTolerance | ( | double | t | ) | throw (Exception::InvalidValue) [virtual] |
setter for tolerance
t | double with tolerance |
Exception::InvalidValue | if tolerance is negative |
Implements SuffixArray.
double getTolerance | ( | ) | const [virtual] |
bool isDigestingEnd | ( | const char | aa1, | |
const char | aa2 | |||
) | const [virtual] |
returns whether the enzyme will cut after the first character
aa1 | const char as first amino acid | |
aa2 | const char as second amino acid |
Implements SuffixArray.
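As an illustration of what such a predicate might look like, the following sketch encodes the commonly cited trypsin rule: cleavage after K or R unless the next residue is P. That the class uses exactly this rule is an assumption based on its name; the page itself only states that the function reports whether the enzyme cuts after the first character. The function name isDigestingEndSketch is hypothetical.

```cpp
// Hypothetical stand-in for isDigestingEnd, assuming the usual trypsin rule:
// cut after lysine (K) or arginine (R), except when followed by proline (P).
constexpr bool isDigestingEndSketch(char aa1, char aa2)
{
  return (aa1 == 'K' || aa1 == 'R') && aa2 != 'P';
}

// Compile-time checks of the rule.
static_assert(isDigestingEndSketch('K', 'A'), "cut after K");
static_assert(isDigestingEndSketch('R', 'G'), "cut after R");
static_assert(!isDigestingEndSketch('K', 'P'), "no cut before P");
static_assert(!isDigestingEndSketch('A', 'K'), "no cut after A");
```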
void setTags | ( | const std::vector< String > & | tags | ) | throw (Exception::InvalidValue) [virtual] |
setter for tags
tags | const vector of strings with tags with length 3 each |
Exception::InvalidValue | if at least one tag does not have size of 3 |
Implements SuffixArray.
const std::vector<String>& getTags | ( | ) | [virtual] |
void setUseTags | ( | bool | use_tags | ) | [virtual] |
setter for use_tags
use_tags | indicating whether tags should be used or not |
Implements SuffixArray.
bool getUseTags | ( | ) | [virtual] |
void setNumberOfModifications | ( | unsigned int | number_of_mods | ) | [virtual] |
unsigned int getNumberOfModifications | ( | ) | [virtual] |
getter for number of modifications
Implements SuffixArray.
void printStatistic | ( | ) | [virtual] |
int getNextSep_ | ( | const int | p | ) | const [protected] |
gets the index of the next separator for a given index
p | const int describing a position in the string |
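A minimal sketch of such a lookup, assuming the separator is '$', that the search includes position p itself, and that -1 signals "no further separator". None of these details is confirmed by this page, and the free function nextSeparatorSketch is hypothetical (the member function works on s_ instead of taking the string as an argument).

```cpp
#include <cassert>
#include <string>

// Hypothetical equivalent of getNextSep_ on an explicit string.
// Returns the index of the first '$' at or after p, or -1 if there is none.
int nextSeparatorSketch(const std::string& s, int p)
{
  assert(p >= 0 && p <= static_cast<int>(s.size()));
  const std::string::size_type pos = s.find('$', static_cast<std::string::size_type>(p));
  return pos == std::string::npos ? -1 : static_cast<int>(pos);
}
```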
int getLCP_ | ( | const std::pair< int, int > & | last_point, | |
const std::pair< int, int > & | current_point | |||
) | [protected] |
gets the lcp for two strings described as pairs of ints
last_point | const pair of ints describing a substring | |
current_point | const pair of ints describing a substring |
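The longest common prefix of two substrings can be computed by direct character comparison. The sketch below assumes that each pair means (start, length) within the underlying string; the page only says that a pair describes a substring, so this reading and the function name lcpSketch are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>

// Hypothetical equivalent of getLCP_ on an explicit string.
// Each pair is assumed to be (start, length) of a substring of s.
int lcpSketch(const std::string& s,
              const std::pair<int, int>& last_point,
              const std::pair<int, int>& current_point)
{
  const int n = static_cast<int>(s.size());
  assert(last_point.first >= 0 && last_point.second >= 0 &&
         last_point.first + last_point.second <= n);
  assert(current_point.first >= 0 && current_point.second >= 0 &&
         current_point.first + current_point.second <= n);

  // The common prefix cannot be longer than the shorter substring.
  const int limit = std::min(last_point.second, current_point.second);
  int k = 0;
  while (k < limit && s[last_point.first + k] == s[current_point.first + k]) ++k;
  return k;
}
```

Applied to neighbouring entries of a sorted indices_ vector, a function like this would yield one LCP value per adjacent pair of suffixes.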
int findFirst_ | ( | const std::vector< double > & | spec, | |
double & | m | |||
) | [protected] |
binary search for finding the index of the first element of the spectrum that matches the desired mass within the tolerance.
spec | const reference to spectrum | |
m | mass |
int findFirst_ | ( | const std::vector< double > & | spec, | |
double & | m, | |||
int | start, | |||
int | end | |||
) | [protected] |
binary search for finding the index of the first element of the spectrum that matches the desired mass within the tolerance. It searches recursively.
spec | const reference to spectrum | |
m | mass | |
start | start index | |
end | end index |
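The search described above can be sketched with std::lower_bound in place of the hand-written recursive binary search. The tolerance is passed explicitly here, whereas the member function presumably reads tol_; the return value -1 for "no match" and the name findFirstSketch are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical equivalent of findFirst_: index of the first spectrum entry in
// [m - tol, m + tol], or -1 if no entry falls into that window.
int findFirstSketch(const std::vector<double>& spec, double m, double tol)
{
  assert(tol >= 0.0);
  assert(std::is_sorted(spec.begin(), spec.end()));  // binary search needs sorted input

  // First element that is not smaller than the lower end of the window.
  const std::vector<double>::const_iterator it =
      std::lower_bound(spec.begin(), spec.end(), m - tol);
  if (it == spec.end() || *it > m + tol) return -1;
  return static_cast<int>(it - spec.begin());
}
```

The requirement that the spectrum be sorted ascendingly, which findSpec enforces, is what makes this logarithmic lookup valid.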
void parseTree_ | ( | int | start_index, | |
int | stop_index, | |||
int | depth, | |||
int | walked_in, | |||
int | edge_len, | |||
std::vector< std::pair< int, int > > & | out_number, | |||
std::vector< std::pair< int, int > > & | edge_length, | |||
std::vector< int > & | leafe_depth | |||
) | [protected] |
treats the suffix array as a tree and traverses it in postorder. This is implemented as a recursive algorithm.
start_index | int describing the start index in indices_ vector | |
stop_index | int describing the end index in indices_ vector | |
depth | the depth of the traversal at the current position | |
walked_in | how many characters we have seen from the root to the current position | |
edge_len | how many characters we have seen from the last node to the current position | |
out_number | reference to a vector of pairs of ints. For every node it will be filled with the number of outgoing edges of the node as a function of its depth | |
edge_length | will be filled with the edge length as a function of depth | |
leafe_depth | will be filled with the depth of every leaf |
bool hasMoreOutgoings_ | ( | int | start_index, | |
int | stop_index, | |||
int | walked_in | |||
) | [protected] |
indicates whether a node has more outgoing edges during traversal
start_index | int describing the start index in indices_ vector | |
stop_index | int describing the end index in indices_ vector | |
walked_in | how many characters we have seen from the root to the current position |
double tol_ [protected] |
mass tolerance for finding candidates
std::vector<std::pair<int,int> > indices_ [protected] |
vector of pairs of ints describing all relevant suffixes
std::vector<int> lcp_ [protected] |
vector of ints with lcp values
std::vector<int> skip_ [protected] |
vector of ints with skip values
double masse_[256] [protected] |
mass table
int number_of_modifications_ [protected] |
number of allowed modifications
bool use_tags_ [protected] |
indicates whether tags are used or not
int progress_ [protected] |
Generated Tue Apr 1 15:36:44 2008 -- using doxygen 1.5.4 | OpenMS / TOPP 1.1 |