All OpenMS files use a tab width of two. Use the command set tabstop=2 in vi or set-variable tab-width 2 if you are using emacs. For those two editors, the indentation behavior should be set automatically through the standard file headers (see below). Due to these ugly issues with setting the tab width in the editor, it is perfectly OK not to use tabs at all. In emacs, you can replace all tabs with the right number of spaces by typing the following keys: C-x h (to mark the whole buffer), then M-x untabify RET.
All lines are terminated UNIX-style by an LF character. If you have accidentally inserted CRLF or CR line endings when editing on a Windows or Mac system, you can fix that later on with the dos2unix command. Using a consistent style for line endings is important to make commands like svn praise helpful.
Matching pairs of opening and closing curly braces should be set to the same column:
while (keep_going == true)
{
  for (int i = 0; i < 10; i++)
  {
    ...
  }
  if (x < 7)
  {
    ....
  }
}
Omitting the braces around a single-statement block is risky. Consider a statement like
if (isValid(a))
  return 0;
which might later be changed to something like
if (isValid(a))
  error = 0;
  return 0;
The resulting errors are hard to find. There are two ways to avoid these problems: (a) always use braces around a block, or (b) write everything in a single line. We recommend method (a). However, this is mainly a question of personal style, so no explicit checking is performed to enforce this rule.
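The pitfall can be made concrete with a small sketch. The functions isValid, withoutBraces and withBraces are hypothetical names chosen for this illustration; they are not part of OpenMS.

```cpp
// Hypothetical validity check, used only for this illustration.
bool isValid(int a)
{
  return a >= 0;
}

// Without braces: only the first statement belongs to the if.
// The indentation suggests otherwise, so the function returns 0
// for every input and the final return is never reached.
int withoutBraces(int a)
{
  int error = 1;
  if (isValid(a))
    error = 0;
    return 0;
  return error;
}

// With braces (method (a)): both statements are guarded by the if.
int withBraces(int a)
{
  int error = 1;
  if (isValid(a))
  {
    error = 0;
    return 0;
  }
  return error;
}
```

For an invalid argument, withoutBraces still returns 0, while withBraces returns the error code. Some compilers warn about the misleading indentation, but the code is accepted.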
Header files (.h) look like this:
// -*- Mode: C++; tab-width: 2; -*-
// vi: set ts=2:
// ... copyright header, not shown ...
//
// --------------------------------------------------------------------------
// $Maintainer: Heinz Erhardt $
// --------------------------------------------------------------------------

#ifndef OPENMS_KERNEL_DPEAK_H
#define OPENMS_KERNEL_DPEAK_H

#include <OpenMS/CONCEPT/Types.h>

#include <functional>
#include <sstream>

namespace OpenMS
{
  ... the actual code goes here ...
} // namespace OpenMS

#endif // OPENMS_KERNEL_DPEAK_H
Source files (.C) look like this:
// -*- Mode: C++; tab-width: 2; -*-
// vi: set ts=2:
// ... copyright header, not shown ...
//
// --------------------------------------------------------------------------
// $Maintainer: Heinz Erhardt $
// --------------------------------------------------------------------------

#include <OpenMS/KERNEL/DPeak.h>

namespace OpenMS
{
  ... the actual code goes here ...
} // namespace OpenMS
Each .h file must be accompanied by a .C file, even if it is just a "dummy". This way a global make will stumble across errors.
For class templates, instances of the template are defined in the .C file. The variable names of these instances start with default_. Here is an example for the DPeak class:
#include <OpenMS/KERNEL/DPeak.h>

namespace OpenMS
{
  DPeak<1> default_dpeak_1;
  DPeak<2> default_dpeak_2;
}
Simply speaking, _impl.h files are for templates what .C files are for ordinary classes. Remember that the definition of a class or function template has to be known at its point of instantiation. Therefore the implementation of a template is normally contained in the .h file. (No problem so far, things are even easier than for ordinary classes, because declaration and definition are given in the same file. You may like this or not.) Things get more complicated when certain design patterns (e.g., the factory pattern) are used which lead to "circular dependencies". Of course this is only a dependency of names, but it has to be resolved by separating declarations from definitions, at least for some of the member functions. In this case, a .h file can be written that contains most of the definitions as well as the declarations of the peculiar functions. Their definition is deferred to the _impl.h file ("impl" for "implementation"). The _impl.h file is included only if the peculiar member functions have to be instantiated. Otherwise the .h file should be sufficient. No .h file should include an _impl.h file.
Each OpenMS class should provide the following interface:
class Test
{
  public:
    // default constructor
    Test();

    // copy constructor
    Test(const Test& test);

    // destructor
    virtual ~Test();

    // assignment operator
    Test& operator = (const Test& test)
    {
      // ALWAYS CHECK FOR SELF ASSIGNMENT!
      if (this == &test) return *this;
      //...
      return *this;
    }
};
There are, however, circumstances in which these methods may be omitted:
operator delete invocation on a pointer to a base class will fail badly.
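The remark about operator delete concerns destroying a derived object through a pointer to its base class: this is only well defined if the base class destructor is virtual. A minimal sketch, assuming the hypothetical classes SketchParent and SketchChild (not OpenMS classes):

```cpp
// Hypothetical base class; the virtual destructor is the point of interest.
class SketchParent
{
  public:
    virtual ~SketchParent()
    {
    }
};

// Hypothetical derived class that records its own destruction.
class SketchChild : public SketchParent
{
  public:
    explicit SketchChild(int& destroyed) : destroyed_(destroyed)
    {
    }

    virtual ~SketchChild()
    {
      ++destroyed_;
    }

  protected:
    int& destroyed_;
};

// Deletes a SketchChild through a SketchParent pointer and returns
// how many SketchChild destructors ran.
int destroyThroughBasePointer()
{
  int destroyed = 0;
  SketchParent* object = new SketchChild(destroyed);
  delete object;
  return destroyed;
}
```

Because ~SketchParent() is virtual, the delete expression runs ~SketchChild() first. Without the virtual keyword the same delete would have undefined behavior.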
OpenMS uses its own type names for primitive types. Use only the types defined in OpenMS/include/OpenMS/CONCEPT/Types.h!
The main OpenMS classes are implemented in the namespace OpenMS. Auxiliary classes are implemented in OpenMSInternal. There are some other namespaces, e.g. for constants and exceptions.
Accessors to protected or private members of a class are implemented as a pair of get-method and set-method. This is necessary as accessors that return mutable references to a member cannot be wrapped with Python!
class Test
{
  public:
    // always implement a non-mutable get-method
    const UInt& getMember() const
    {
      return member_;
    }

    // always implement a set-method
    void setMember(const UInt& member)
    {
      member_ = member;
    }

  protected:
    UInt member_;
};
For members that are too large to be read with the get-method, modified and written back with the set-method, an additional non-const get-method can be implemented!
For primitive types a non-const get-method is strictly forbidden! For more complex types it should be present only when really necessary!
class Test
{
  public:
    const vector<String>& getMember() const
    {
      return member_;
    }

    void setMember(const vector<String>& member)
    {
      member_ = member;
    }

    // if absolutely necessary implement a mutable get-method
    vector<String>& getMember()
    {
      return member_;
    }

  protected:
    vector<String> member_;
};
Many OpenMS classes are based on STL classes. However, only the C++ Standard Library part of the STL must be used. This means that SGI extensions like hash_set, hash_multiset, hash_map and hash_multimap are not allowed!
No OpenMS program should dump a core if an error occurs. Instead, it should attempt to die as gracefully as possible. Furthermore, as OpenMS is a framework rather than an application, it should give the programmer ways to catch and correct errors. The recommended procedure to handle - even fatal - errors is to throw an exception. An uncaught exception will result in a call to abort, thereby terminating the program.
All exceptions used in OpenMS are derived from Exception::Base defined in CONCEPT/Exception.h. A default constructor should not be implemented for these exceptions. Instead, the constructor of all derived exceptions should have the following signature:
AnyException(const char* file, int line, const char* function[, ...]);
(See IndexOverflow for an example.)
The throw directive for each exception should be of the form
throw AnyException(__FILE__, __LINE__, __PRETTY_FUNCTION__);
to simplify debugging. __FILE__ and __LINE__ are standard-defined preprocessor macros. The symbol __PRETTY_FUNCTION__ behaves like a char* and contains the type signature of the function as well as its bare name, if the GNU compiler is being used. It is defined to <unknown> on other platforms. Exception::Base provides methods (getFile, getLine, getFunction) that allow the localization of the exception's cause.
As usual with C++, the standard way to catch an exception should be by reference (and not by value).
try
{
  // some code which might throw
}
catch (Exception& e)
{
  // Handle the exception, then possibly re-throw it:
  // throw; // the modified e
}
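The constructor signature, the throw form and the catch by reference can be sketched together as follows. This is a minimal illustration under stated assumptions: SketchExceptionBase and SketchIndexOverflow are hypothetical stand-ins, not the actual Exception::Base or IndexOverflow classes, and the standard __func__ is used in place of the GNU-specific __PRETTY_FUNCTION__.

```cpp
#include <string>

// Hypothetical stand-in for a base exception (not Exception::Base).
class SketchExceptionBase
{
  public:
    SketchExceptionBase(const char* file, int line, const char* function)
      : file_(file), line_(line), function_(function)
    {
    }

    virtual ~SketchExceptionBase()
    {
    }

    const char* getFile() const
    {
      return file_;
    }

    int getLine() const
    {
      return line_;
    }

    const char* getFunction() const
    {
      return function_;
    }

  protected:
    const char* file_;
    int line_;
    const char* function_;
};

// Hypothetical derived exception carrying one additional argument.
class SketchIndexOverflow : public SketchExceptionBase
{
  public:
    SketchIndexOverflow(const char* file, int line, const char* function, int index)
      : SketchExceptionBase(file, line, function), index_(index)
    {
    }

    int getIndex() const
    {
      return index_;
    }

  protected:
    int index_;
};

// Returns index if it lies in [0, size), throws otherwise.
int checkedIndex(int index, int size)
{
  if (index < 0 || index >= size)
  {
    // __func__ is standard C++11 and holds the bare function name.
    throw SketchIndexOverflow(__FILE__, __LINE__, __func__, index);
  }
  return index;
}

// Catches by reference and reports where the exception was thrown.
std::string describeFailure(int index, int size)
{
  try
  {
    checkedIndex(index, size);
  }
  catch (SketchExceptionBase& e)
  {
    return std::string(e.getFunction()) + " failed at line " + std::to_string(e.getLine());
  }
  return "ok";
}
```

Catching the base class by reference keeps the derived object intact, so a handler further up could still catch SketchIndexOverflow after a re-throw.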
Reserved words of the C++ language and symbols defined e.g. in the STL or in the standard C library must not be used as names for classes or class members. Even if the compiler accepts it, such words typically mess up the syntax highlighting and are confusing for other developers, to say the least. Bad examples include: set, map, exp, log. (All developers: Add your favorites to this list whenever you stumble upon them!)
Header files and source files should be named after the classes they contain. Source files end in ".C", while header files end in ".h". File names should be capitalized exactly as the class they contain (see below). Each header/source file should contain one class only, although exceptions are possible for light-weight classes.
Usage of underscores in names has two different meanings: A trailing "_" at the end indicates that something is protected or private to a class. Apart from that, different parts of a name are sometimes separated by an underscore, and sometimes separated by capital letters. (The details are explained below.)
Note that according to the C++ standard, names that start with an underscore are reserved for internal purposes of the language and its standard library (roughly speaking), so you should never use them.
Class names and type names always start with a capital letter. Different parts of the name are separated by capital letters at the beginning of the word. No underscores are allowed in type names and class names, except for the names of protected types and classes in classes, which are suffixed by an underscore. The same conventions apply for namespaces.
class Simple; //ordinary class
class SimpleThing; //ordinary class
class PDBFile; //using an abbreviation
class Buffer_; //protected or private nested class
class ForwardIteratorTraits_; //protected or private nested class
Variable names are all lower case letters. Distinguished parts of the name are separated using underscores "_". If parts of the name are derived from common acronyms (e.g. MS) they should be in upper case. Private or protected member variables of classes are suffixed by an underscore.
int simple; //ordinary variable
bool is_found; //ordinary variable
string MS_instrument; //using an abbreviation
int counter_; //protected or private member
int persistent_id_; //protected or private member
No prefixing or suffixing is allowed to identify the variable type - this leads to completely illegible documentation and overly long variable names.
Function names (including class method names) always start with a lower case letter. Parts of the name are separated using capital letters (as are types and class names). They should be comprehensible, but as short as possible. The same variable names must be used in the declaration and in the definition. Arguments that are actually not used in the implementation of a function have to be commented out - this avoids compiler warnings. The argument of void functions (empty argument list) must be omitted in both the declaration and the definition. If function arguments are pointers or references, the pointer or reference qualifier is appended to the variable type. It should not prefix the variable name.
void hello(); //ordinary function, no arguments
int countPeaks(PeakArray const& p); //ordinary function
bool ignore(string& /* name */); //ordinary function with an unused argument
bool isAdjacentTo(Peak const * const * const & p) const; //an ordinary function
bool doSomething_(int i, string& name); //protected or private member function
Enumerated values and preprocessor constants are all upper case letters. Parts of the name are separated by underscores.
#define MYCLASS_SUPPORTS_MIN_MAX 0 //preprocessor constant
enum DimensionId { DIM_MZ = 0, DIM_RT = 1 }; //enumerated values
enum DimensionId_ { MZ = 0, RT = 1 }; //enumerated values
(You should avoid using the preprocessor anyway. Normally, const and enum will suffice unless something very special is needed.)
Parameters in .ini files and elsewhere follow these conventions:
To generate UML diagrams, use yEd and export the diagrams in PNG format. Do not forget to also save the corresponding .yed file.
Each OpenMS class has to be documented using Doxygen. The documentation is inserted in Doxygen format in the header file where the class is defined. Documentation includes the description of the class, of each method, type declaration, enum declaration, each constant, and each member variable.
Longer pieces of documentation start with a brief description, followed by an empty line and a detailed description. The empty line is needed to separate the brief from the detailed description.
Descriptions of classes always have a brief section!
Please use the doxygen style of the following example for OpenMS:
/**
  @defgroup DummyClasses Dummy classes

  @brief This group contains dummy classes

  Add classes by using the '@ingroup' command.
*/

/**
  @brief Demonstration class.

  A demonstration class for teaching doxygen

  @note All classes need a brief description!

  @ingroup DummyClasses
*/
class Test
{
  public:
    /**
      @brief An enum type.

      The documentation block cannot be put after the enum!
    */
    enum EnumType
    {
      EVal1, ///< Enum value 1.
      EVal2  ///< Enum value 2.
    };

    /**
      @brief constructor.

      A more elaborate description of the constructor.
    */
    Test();

    /**
      @brief Dummy function.

      A normal member taking two arguments and returning an integer value.
      The parameter @p dummy_a is an integer.

      @param dummy_a an integer argument.
      @param dummy_s a constant character pointer.
      @see Test()
      @return The dummy results.
    */
    int dummy(int dummy_a, const char* dummy_s);

    /// Brief description in one line.
    int isDummy();

    /**
      @name Group of members.

      Description of the group.
    */
    //@{
    /// Dummy 2.
    void dummy2();
    /// Dummy 3.
    void dummy3();
    //@}

  protected:
    int value; ///< An integer value.
};
The defgroup command indicates that a comment block contains documentation for a group of classes, files or namespaces. This can be used to categorize classes, files or namespaces, and document those categories. You can also use groups as members of other groups, thus building a hierarchy of groups. Using the ingroup command a comment block of a class, file or namespace will be added to the group or groups.
The groups (or modules as doxygen calls them) defined by the ingroup command should contain only the classes of special interest to the OpenMS user. Helper classes and such must be omitted.
Documentation which does not belong to a specific .C or .h file can be written into a separate Doxygen file (with the ending .doxygen). This file will also be parsed by Doxygen.
Open tasks are noted in the documentation of a header or a group using the todo command. The ToDo list is then shown in the doxygen menu under 'Related pages'. Each ToDo should be followed by a name in parentheses to indicate who is going to handle it.
These commands should be used as well:
Doxygen is not hard to learn, have a look at the manual :-)
The code for each .C file has to be commented. Each piece of code in OpenMS has to contain at least 5% of comments. The use of
// Comment text
/* Comment text */
Instructive programming examples can be provided in the source/EXAMPLES directory.
OpenMS uses Subversion to manage different versions of the source files. For easier identification of the responsible person each OpenMS file contains the $Maintainer:$ string in the preamble.
Examples of .h and .C files have been given above. In non-C++ files (Makefiles, (La)TeX-Files, etc.) the C++ comments are replaced by the respective comment characters (e.g. "#" for Makefiles, "%" for (La)TeX). TeX will switch to math mode after a $, but you can work around this by writing something like
Latest SVN $ $Date:$ $
if you want to use it in texts; the one here expands to "Latest SVN Date: 2007-01-19 13:47:36 +0100 (Fri, 19 Jan 2007)". Subversion does not turn on keyword substitution by default. See svn -h propset and svn -h proplist for details.
Each OpenMS class has to provide a test program. This test program has to check each method of the class. The test programs reside in the directory source/TEST and are usually named <classname>_test.C. The test program has to be coded using the class test macros as described in the OpenMS online reference. Special care should be taken to cover all special cases (e.g. what happens if a method is called with empty strings, negative values, zero, null pointers etc.). Please activate the keyword substitution of '$Id$' for all tests with the following command: svn propset svn:keywords Id <file>.
If a test needs supplementary files, put these files in the source/TEST/data/ folder. The name of supplementary files has to begin with the name of the tested class.
START_TEST(class_name, version)
END_TEST
CHECK(name)
RESULT
STATUS(message)
ABORT_IF(condition)
TEST_EQUAL(a, b)
TEST_NOT_EQUAL(a, b)
TEST_REAL_EQUAL(a, b)
TEST_STRING_EQUAL(a, b)
  a and b are equal as strings
PRECISION(double)
TEST_EXCEPTION(exception, expression)
TEST_EXCEPTION_WITH_MESSAGE(exception, expression, message)
TEST_FILE(file, template_file)
Do not use expressions with side effects inside the comparison macros, e.g. *(it++). The expressions in the macro are evaluated several times, so the side effect is triggered several times as well.
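The following sketch shows why this matters. SKETCH_TEST_EQUAL is a hypothetical macro written for this illustration, not one of the OpenMS test macros; it is only assumed to resemble them in that it mentions its first argument more than once.

```cpp
#include <vector>

// Hypothetical test macro, not the OpenMS one. It mentions its first
// argument twice: once to compare and once to record the value.
#define SKETCH_TEST_EQUAL(a, b, outcome) \
  do \
  { \
    (outcome).matched = ((a) == (b)); \
    (outcome).reported = (a); \
  } while (false)

struct SketchOutcome
{
  bool matched;  // result of the comparison
  int reported;  // value recorded for the report
  int advances;  // how far the iterator moved
};

// Side effect inside the macro: the iterator is advanced twice.
SketchOutcome compareWithSideEffect()
{
  std::vector<int> values;
  values.push_back(1);
  values.push_back(2);
  values.push_back(3);
  std::vector<int>::iterator it = values.begin();

  SketchOutcome outcome = {false, 0, 0};
  SKETCH_TEST_EQUAL(*(it++), 1, outcome);
  outcome.advances = static_cast<int>(it - values.begin());
  return outcome;
}

// Side effect moved out of the macro: the iterator is advanced once.
SketchOutcome compareWithoutSideEffect()
{
  std::vector<int> values;
  values.push_back(1);
  values.push_back(2);
  values.push_back(3);
  std::vector<int>::iterator it = values.begin();

  SketchOutcome outcome = {false, 0, 0};
  int current = *it;
  ++it;
  SKETCH_TEST_EQUAL(current, 1, outcome);
  outcome.advances = static_cast<int>(it - values.begin());
  return outcome;
}
```

In the first function the comparison sees the first element but the recorded value is the second one, and the iterator ends up two positions further. Reading the value into a local variable first avoids both surprises.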
You might want to create temporary files during the tests. The following macro puts a temporary filename into the string argument. The file is automatically deleted after the test.
There are also some PHP tools for testing other tasks in the source/config/tools/ directory. See source/config/tools/README.txt for details!
The abbreviation TOPP stands for The OpenMS Proteomics Pipeline, a collection of tools based upon the C++ classes in OpenMS. The TOPP tools are located in source/APPLICATIONS/TOPP.
If a test needs supplementary files, put these files in the same folder. The name of supplementary files has to begin with the name of the tested tool. All extensions but .tmp, .output or .log are possible.
All output files that can be deleted after the test must end with .tmp. All files with that extension are deleted after the test, unless in debug mode.
The tests for TOPP programs are located in source/TEST/TOPP. The Makefile provides the following main targets:
default make target. Each tool has its own make target, e.g. to test TOPPTool you can run make TOPPTool_test. The test targets are listed in the variables TOPP_TOOL_TESTS (for each tool individually) and TOPP_PIPELINE_TESTS (for pipelines composed of several tools). Note that these sub-targets should be maintained by the same maintainers as the tools themselves!
Each tool test will create a couple of .tmp files. Normally, these are deleted automatically. You can set DEBUG=1 to avoid this. make debug is equivalent to make DEBUG=1 default. You can run any test in debug mode, e.g. make DEBUG=1 TOPPTool_test.
Normally, all output written to stdout and stderr by the tools is redirected to /dev/null. You can set VERBOSE=1 to avoid this. make verbose is equivalent to make VERBOSE=1 default. You can run any test in verbose mode, e.g. make VERBOSE=1 TOPPTool_test.
make debug or any other test. In order to make this target effective, it is important that all files generated by your tests will (finally) have a suffix .tmp, .log (e.g. TOPP.log), or .rounded_tmp (explained below). Of course you can rename them using $(MV) to achieve this.
The actual tests are written using (1.) macros which are defined by configure in config_defs.mak and (2.) some special functions using substitution capabilities of the (GNU) make program itself.
Do not use echo, diff, mv, cp, rm directly - always use their 'uppercase' counterparts $(ECHO), $(DIFF), $(MV), $(CP), $(RM) provided by configure. Doing so is necessary for portability.
$(call RUN_PROG_OPT,TOPPTool,options)
Be careful not to insert any whitespace around the arguments of call. Depending on the VERBOSE settings, this macro is expanded in different ways by the make program.
$(call TEST_FILE_EQUAL,correct.xtn)
or
$(call TEST_FILE_EQUAL,computed.tmp,correct.xtn)
This will compare two files using a small diff-like application called NumericDiff (explained in the next section). In the first form, the 'other' file defaults to correct.tmp, that is, the basename of the second argument (correct.xtn) extended by .tmp. Be careful not to insert any whitespace around the arguments of call. (Using the "call" syntax provides a point of customization. Depending on the DEBUG settings, this macro is expanded in different ways by the make program.)
The TOPP tests will be run on 32 bit and 64 bit platforms. Therefore a purely character-based comparison of computed and expected result files might fail although the results are in fact numerically correct - think of cases like 9.999e+3 vs. 1.0001e+4. Instead we provide a small program NumericDiff in source/TEST/TOPP. This program steps through both inputs simultaneously and classifies each position into 3 categories: numbers, characters, whitespace. Within each line of input, numbers are compared with respect to their ratio (i.e., relative error), characters must match exactly (e.g. case is significant) and all whitespace is considered equal. Empty lines or lines containing only whitespace are skipped, but extra linebreaks 'within' lines will result in error messages. For more details and verbosity options, see the built-in help message and the source code.
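The idea of comparing numbers by their ratio can be sketched as follows. This is not the NumericDiff source: the function name ratioWithinTolerance, the max_ratio parameter and the handling of zeros and signs are assumptions made for this illustration.

```cpp
#include <algorithm>
#include <cmath>

// Returns true if a and b agree up to the given ratio; for example,
// max_ratio = 1.001 allows a relative deviation of about 0.1 percent.
// Assumptions of this sketch: identical values always match, and a
// zero or a sign change on one side only never matches.
bool ratioWithinTolerance(double a, double b, double max_ratio)
{
  if (a == b)
  {
    return true;
  }
  if (a == 0.0 || b == 0.0 || (a < 0.0) != (b < 0.0))
  {
    return false;
  }
  double larger = std::max(std::fabs(a), std::fabs(b));
  double smaller = std::min(std::fabs(a), std::fabs(b));
  return larger / smaller <= max_ratio;
}
```

With this check, 9.999e+3 and 1.0001e+4 are accepted for max_ratio = 1.001 although their textual representations share almost no characters, since their ratio is roughly 1.0002.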
Each test relies on a number of files. These files should be named source/TEST/TOPP/<toolname>_<number>_<name>.<extension>, where
<toolname> has the form [A-Z][a-zA-Z]*; this is the name of the TOPP tool
<number> has the form [0-9]+; this is the running number of the test
<name> has the form [-_a-zA-Z0-9]+; this should be a descriptive name (characters _ and - are ok here, since <toolname> and <number> must not contain them)
<extension>; this is the extension expressing the type of the data.
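The naming scheme can be checked mechanically. The sketch below combines the three patterns given above into one regular expression; the function name isValidTestFileName is hypothetical, and the pattern used for the extension ([_a-zA-Z0-9]+) is an assumption, since the text does not specify one.

```cpp
#include <regex>
#include <string>

// Checks a bare file name (without the source/TEST/TOPP/ directory)
// against <toolname>_<number>_<name>.<extension>.
// The extension pattern is an assumption of this sketch.
bool isValidTestFileName(const std::string& file_name)
{
  static const std::regex pattern(
    "[A-Z][a-zA-Z]*"     // <toolname>
    "_[0-9]+"            // <number>
    "_[-_a-zA-Z0-9]+"    // <name>
    "\\.[_a-zA-Z0-9]+"); // <extension>
  return std::regex_match(file_name, pattern);
}
```

Because <toolname> and <number> cannot contain underscores, the first two underscores of a file name always separate the three parts unambiguously, even if <name> contains further underscores.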
Yes. Testing is crucial to verify the correctness of the library - especially when using C++. But why does it have to be so complicated, using all these macros and stuff? One of the biggest problems when building large class frameworks is portability. C++ compilers are strange beasts and there is not a single one that accepts the same code as any other compiler. Since one of the main concerns of OpenMS is portability, we have to ensure that every single line of code compiles on all platforms. Due to the long compilation times and the (hopefully in future) large number of different platforms, tests to verify the correct behaviour of all classes have to be carried out automatically. This implies a well-defined interface for all tests, which is the reason for all these strange macros. This fixed format also enforces the writing of complete class tests. Usually a programmer writes a few lines of code to test the parts of the code he wrote for correctness. Of the methods tested after the introduction of the test macros, about a tenth showed severe errors after thorough testing. Most of these errors didn't occur on all platforms or didn't show up on trivial input.
Writing tests for each method of a class also ensures that each line is compiled. When using class templates the compiler only compiles the methods called. Thus it is possible that a code segment contains syntactical errors but the compiler accepts the code happily - it simply ignores most of the code. This is quickly discovered in a complete test of all methods. The same is true for configuration-dependent preprocessor directives that stem from platform dependencies. Often untested code also hides inside the const version of a method, when there is a non-const method with the same name and arguments (for example, most of the getName methods in OpenMS). In most cases, the non-const version is preferred by the compiler and it is usually not clear to the user which version is taken. Again, explicit testing of each single method provides help for this problem. The ideal method to tackle the problem of untested code is the complete coverage analysis of a class. Unfortunately this is only supported for very few compilers, so it is not used for testing OpenMS.
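Both effects can be reproduced in a few lines. The classes SketchNamed and SketchHolder below are hypothetical and exist only for this illustration; the call counters are an addition of the sketch to make the selected overload visible.

```cpp
#include <string>

// Hypothetical class with a const and a non-const getName overload.
class SketchNamed
{
  public:
    SketchNamed() : name_("peak"), const_calls_(0)
    {
    }

    const std::string& getName() const
    {
      ++const_calls_;
      return name_;
    }

    std::string& getName()
    {
      return name_;
    }

    int getConstCalls() const
    {
      return const_calls_;
    }

  protected:
    std::string name_;
    mutable int const_calls_; // mutable so the const overload can count
};

// Hypothetical class template; brokenSize() is ill-formed for T = int
// but is only compiled if something actually calls it.
template <typename T>
class SketchHolder
{
  public:
    explicit SketchHolder(const T& value) : value_(value)
    {
    }

    const T& getValue() const
    {
      return value_;
    }

    int brokenSize() const
    {
      return value_.noSuchMember();
    }

  protected:
    T value_;
};

// Calls getName() on a plain object: the non-const overload is chosen.
int constCallsOnPlainObject()
{
  SketchNamed named;
  named.getName();
  return named.getConstCalls();
}

// Calls getName() through a const reference: the const overload is chosen.
int constCallsOnConstReference()
{
  SketchNamed named;
  const SketchNamed& view = named;
  view.getName();
  return named.getConstCalls();
}

// Compiles although SketchHolder<int>::brokenSize() never could.
int useHolderWithoutBrokenMethod()
{
  SketchHolder<int> holder(7);
  return holder.getValue();
}
```

A test that only calls getName() on a non-const object never executes the const overload, and a test that never calls brokenSize() never makes the compiler look at its body. Covering each method explicitly, including through a const reference, exposes both.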
One last point: writing the test program is a wonderful opportunity to verify and complete the documentation! Often enough implementation details are not clear at the time the documentation is written. A lot of side effects or special cases that were added later do not appear in the documentation. Going through the documentation and the implementation in parallel is the best way to verify the documentation for consistency and (strange coincidence?!) the best way to implement a test program, too!
Generated Tue Apr 1 15:36:40 2008 -- using doxygen 1.5.4 | OpenMS / TOPP 1.1 |