
Coding conventions

Revision
2734

Formatting

All OpenMS files use a tab width of two. Use the command set tabstop=2 in vi or set-variable tab-width 2 if you are using emacs. For those two editors, the indentation behavior should be set automatically through the standard file headers (see below). Because setting the tab width in the editor is error-prone, it is perfectly ok not to use tabs at all. In emacs, you can replace all tabs with the right number of spaces by typing the following keys: C-x h (to mark the whole buffer), then M-x untabify RET.

All lines are terminated UNIX-style by an LF character. If you have accidentally inserted CRLF or CR line endings when editing on a Windows or Mac system, you can fix that later on with the dos2unix command. Using a consistent style for line endings is important to make commands like svn praise helpful.

Matching pairs of opening and closing curly braces should be set to the same column:

while (keep_going == true)
{
  for (int i = 0; i < 10; i++)
  {
    ...
  }

  if (x < 7)
  {
    ....
  }
}
The main reason for this rule is to avoid constructions like:

if (isValid(a))
  return 0;

which might later be changed to something like

if (isValid(a))
  error = 0;
  return 0;

The resulting errors are hard to find. There are two ways to avoid these problems: (a) always use braces around a block, or (b) write everything in a single line. We recommend method (a). However, this is mainly a question of personal style, so no explicit checking is performed to enforce this rule.
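As an illustration of method (a), here is a minimal sketch of the braced form. The functions isValid and check are hypothetical and exist only for this example; with braces in place, adding the second statement later cannot silently change the control flow.

```cpp
// Hypothetical helper, used only for this illustration.
bool isValid(int a)
{
  return a >= 0;
}

// Braced form of the example above: both statements clearly
// belong to the if-block.
int check(int a, int& error)
{
  if (isValid(a))
  {
    error = 0;
    return 0;
  }
  return 1;
}
```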

Sample .h file

// -*- Mode: C++; tab-width: 2; -*-
// vi: set ts=2:
//
... copyright header, not shown ...
//
// --------------------------------------------------------------------------
// $Maintainer: Heinz Erhardt $
// --------------------------------------------------------------------------

#ifndef OPENMS_KERNEL_DPEAK_H
#define OPENMS_KERNEL_DPEAK_H

#include <OpenMS/CONCEPT/Types.h>

#include <functional>
#include <sstream>

namespace OpenMS
{
   ... the actual code goes here ...
} // namespace OpenMS

#endif // OPENMS_KERNEL_DPEAK_H

Sample .C file

// -*- Mode: C++; tab-width: 2; -*-
// vi: set ts=2:
//
... copyright header, not shown ...
//
// --------------------------------------------------------------------------
// $Maintainer: Heinz Erhardt $
// --------------------------------------------------------------------------

#include <OpenMS/KERNEL/DPeak.h>

namespace OpenMS
{
   ... the actual code goes here ...
} // namespace OpenMS
Every .h file must be accompanied by a .C file, even if it is just a ``dummy''. This way a global make will stumble across errors.
For template classes, default instances with common template arguments should be put into the .C file. The variable names of these instances start with default_. Here is an example for the DPeak class:
#include <OpenMS/KERNEL/DPeak.h>

namespace OpenMS
{
  DPeak<1> default_dpeak_1;
  DPeak<2> default_dpeak_2;
}
The compiler then instantiates the template and detects errors at compile time. This saves you time! Otherwise the error is detected much later, when the test is compiled.

A note on templates: when (and why) should I write an _impl.h file?

Simply speaking, _impl.h files are for templates what .C files are for ordinary classes. Remember that the definition of a class or function template has to be known at its point of instantiation. Therefore the implementation of a template is normally contained in the .h file. (No problem so far, things are even easier than for ordinary classes, because declaration and definition are given in the same file. You may like this or not.) Things get more complicated when certain design patterns (e.g., the factory pattern) are used which lead to "circular dependencies". Of course this is only a dependency of names, but it has to be resolved by separating declarations from definitions, at least for some of the member functions. In this case, a .h file can be written that contains most of the definitions as well as the declarations of the peculiar functions. Their definition is deferred to the _impl.h file ("impl" for "implementation"). The _impl.h file is included only if the peculiar member functions have to be instantiated. Otherwise the .h file should be sufficient. No .h file should include an _impl.h file.
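The split can be sketched as follows. This is only an illustration in a single listing: the classes Product and Factory and the three file names in the comments are hypothetical and not part of OpenMS. The member clone() is the "peculiar" function, because its definition needs Factory, while Factory in turn needs Product.

```cpp
// ---- Product.h (hypothetical): declarations and most definitions ----
template <typename T>
class Product
{
  public:
    explicit Product(const T& value) : value_(value) {}

    const T& getValue() const { return value_; }

    // Peculiar member: its definition needs Factory<T>,
    // so it is only declared here.
    Product clone() const;

  protected:
    T value_;
};

// ---- Factory.h (hypothetical): would include Product.h ----
template <typename T>
class Factory
{
  public:
    static Product<T> create(const T& value)
    {
      return Product<T>(value);
    }
};

// ---- Product_impl.h (hypothetical): would include Product.h and
// ---- Factory.h, and is included only where clone() is instantiated
template <typename T>
Product<T> Product<T>::clone() const
{
  return Factory<T>::create(value_);
}
```

Code that only needs getValue() would include Product.h alone; code that calls clone() would additionally include Product_impl.h.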

Class requirements

Each OpenMS class should provide the following interface:

class Test
{
  public:
    
    // default constructor
    Test();

    // copy constructor
    Test(const Test& test);

    // destructor
    virtual ~Test();

    // assignment operator
    Test& operator = (const Test& test)
    {
      //ALWAYS CHECK FOR SELF ASSIGNMENT!
      if (this == &test) return *this;
      //...
      return *this;
    }
};

There are, however, circumstances in which these methods can be omitted:

General rules

Primitive types

OpenMS uses its own type names for primitive types. Use only the types defined in OpenMS/include/OpenMS/CONCEPT/Types.h!

Namespaces

The main OpenMS classes are implemented in the namespace OpenMS. Auxiliary classes are implemented in OpenMSInternal. There are some other namespaces, e.g. for constants and exceptions.

Accessors to members

Accessors to protected or private members of a class are implemented as a pair of get-method and set-method. This is necessary as accessors that return mutable references to a member cannot be wrapped with Python!

class Test
{
  public:
    // always implement a non-mutable get-method
    const UInt& getMember() const
    {
      return member_;
    }
    
    // always implement a set-method
    void setMember(const UInt& member)
    {
      member_ = member;
    }
    
  protected:
    UInt member_;
};

For members that are too large to be read with the get-method, modified and written back with the set-method, an additional non-const get-method can be implemented!

For primitive types a non-const get-method is strictly forbidden! For more complex types it should be present only when really necessary!

class Test
{
  public:
    const vector<String>& getMember() const
    {
      return member_;
    }
    
    void setMember(const vector<String>& member)
    {
      member_ = member;
    }
    
    // if absolutely necessary implement a mutable get-method
    vector<String>& getMember()
    {
      return member_;
    }

  protected:
    vector<String> member_;
};

Use of the STL

Many OpenMS classes are based on STL classes. However, only the C++ Standard Library part of the STL must be used. This means that SGI extensions like hash_set, hash_multiset, hash_map and hash_multimap are not allowed!
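A minimal sketch of staying within the standard containers: std::map takes the place where one might have reached for hash_map. The function countNames is hypothetical, and plain std types are used instead of the OpenMS type names only to keep the example self-contained.

```cpp
#include <map>
#include <string>
#include <vector>

// Counts how often each name occurs, using only the standard std::map.
std::map<std::string, unsigned int> countNames(const std::vector<std::string>& names)
{
  std::map<std::string, unsigned int> counts;
  for (std::vector<std::string>::const_iterator it = names.begin(); it != names.end(); ++it)
  {
    // operator[] value-initializes a missing entry to 0 before the increment
    ++counts[*it];
  }
  return counts;
}
```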

Exception handling

No OpenMS program should dump a core if an error occurs. Instead, it should attempt to die as gracefully as possible. Furthermore, as OpenMS is a framework rather than an application, it should give the programmer ways to catch and correct errors. The recommended procedure to handle - even fatal - errors is to throw an exception. An uncaught exception results in a call to abort, thereby terminating the program.

Exception classes

All exceptions used in OpenMS are derived from Exception::Base defined in CONCEPT/Exception.h. A default constructor should not be implemented for these exceptions. Instead, the constructor of all derived exceptions should have the following signature:

  AnyException(const char* file, int line, const char* function[, ...]);
Additional arguments are possible but should provide default values (see IndexOverflow for an example).
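The constructor convention might look like the following sketch. The namespace Sketch, its Base class and the IndexTooLarge exception are stand-ins written for this example; they are not the real Exception::Base or IndexOverflow, and the return types of the accessors are assumptions.

```cpp
// Stand-in for Exception::Base; NOT the real OpenMS class.
namespace Sketch
{
  class Base
  {
    public:
      Base(const char* file, int line, const char* function)
        : file_(file), line_(line), function_(function)
      {
      }

      virtual ~Base() {}

      const char* getFile() const { return file_; }
      int getLine() const { return line_; }
      const char* getFunction() const { return function_; }

    protected:
      const char* file_;
      int line_;
      const char* function_;
  };

  // Derived exception: no default constructor, and the additional
  // arguments (index, size) provide default values.
  class IndexTooLarge : public Base
  {
    public:
      IndexTooLarge(const char* file, int line, const char* function,
                    int index = 0, int size = 0)
        : Base(file, line, function), index_(index), size_(size)
      {
      }

      int getIndex() const { return index_; }
      int getSize() const { return size_; }

    protected:
      int index_;
      int size_;
  };
}
```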

Throwing exceptions

The throw directive for each exception should be of the form

  throw AnyException(__FILE__, __LINE__, __PRETTY_FUNCTION__);

to simplify debugging. __FILE__ and __LINE__ are standard-defined preprocessor macros. The symbol __PRETTY_FUNCTION__ works similar to a char* and contains the type signature of the function as well as its bare name, if the GNU compiler is being used. It is defined to <unknown> on other platforms. Exception::Base provides methods (getFile, getLine, getFunction) that allow the localization of the exception's cause.

Catching exceptions

As usual with C++, the standard way to catch an exception should be by reference (and not by value).

  try
  {
    // some code which might throw
  }
  catch (Exception::Base& e)
  {
    // Handle the exception, then possibly re-throw it:
    // throw;  // the modified e
  }
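Why by reference? The following sketch shows the difference. The classes in the namespace CatchSketch are stand-ins for this example and not OpenMS classes. Catching by value copies only the base part of the exception ("slicing"), so the information of the derived class is lost.

```cpp
#include <string>

// Stand-in exception hierarchy; NOT the OpenMS classes.
namespace CatchSketch
{
  class Base
  {
    public:
      virtual ~Base() {}
      virtual std::string getName() const { return "Base"; }
  };

  class ParseError : public Base
  {
    public:
      virtual std::string getName() const { return "ParseError"; }
  };
}

// Catching by reference keeps the dynamic type of the exception.
std::string nameCaughtByReference()
{
  std::string name;
  try
  {
    throw CatchSketch::ParseError();
  }
  catch (CatchSketch::Base& e)
  {
    name = e.getName();
  }
  return name;
}

// Catching by value slices the exception down to its Base part.
// (Some compilers warn about this, which is the point.)
std::string nameCaughtByValue()
{
  std::string name;
  try
  {
    throw CatchSketch::ParseError();
  }
  catch (CatchSketch::Base e)
  {
    name = e.getName();
  }
  return name;
}
```

Here nameCaughtByReference() yields "ParseError", while nameCaughtByValue() yields only "Base".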

Naming conventions

Reserved words of the C++ language and symbols defined e. g. in the STL or in the standard C library must not be used as names for classes or class members. Even if the compiler accepts it, such words typically mess up the syntax highlighting and are confusing for other developers, to say the least. Bad examples include: set, map, exp, log. (All developers: Add your favorites to this list whenever you stumble upon them!)

File names

Header files and source files should be named as the classes they contain. Source files end in ".C", while header files end in ".h". File names should be capitalized exactly as the class they contain (see below). Each header/source file should contain one class only, although exceptions are possible for light-weight classes.

Underscores

Usage of underscores in names has two different meanings: A trailing ``_'' at the end indicates that something is protected or private to a class. Apart from that, different parts of a name are sometimes separated by an underscore, and sometimes separated by capital letters. (The details are explained below.)

Note that according to the C++ standard, names that start with an underscore are reserved for internal purposes of the language and its standard library (roughly speaking), so you should never use them.

Class / type / namespace names

Class names and type names always start with a capital letter. Different parts of the name are separated by capital letters at the beginning of the word. No underscores are allowed in type names and class names, except for the names of protected types and classes in classes, which are suffixed by an underscore. The same conventions apply for namespaces.

class Simple; //ordinary class
class SimpleThing; //ordinary class
class PDBFile; //using an abbreviation
class Buffer_; //protected or private nested class
class ForwardIteratorTraits_; //protected or private nested class

Variable names

Variable names are all lower case letters. Distinguished parts of the name are separated using underscores ``_''. If parts of the name are derived from common acronyms (e.g. MS) they should be in upper case. Private or protected member variables of classes are suffixed by an underscore.

int simple; //ordinary variable
bool is_found; //ordinary variable
string MS_instrument; //using an abbreviation
int counter_; //protected or private member
int persistent_id_; //protected or private member

No prefixing or suffixing is allowed to identify the variable type - this leads to completely illegible documentation and overly long variable names.

Function names/method names

Function names (including class method names) always start with a lower case letter. Parts of the name are separated using capital letters (as are types and class names). They should be comprehensible, but as short as possible. The same argument names must be used in the declaration and in the definition. Arguments that are actually not used in the implementation of a function have to be commented out - this avoids compiler warnings. The argument of void functions (empty argument list) must be omitted in both the declaration and the definition. If function arguments are pointers or references, the pointer or reference qualifier is appended to the variable type. It should not prefix the variable name.

void hello(); //ordinary function, no arguments
int countPeaks(PeakArray const& p); //ordinary function
bool ignore(string& /* name */); //ordinary function with an unused argument
bool isAdjacentTo(Peak const * const * const & p) const; //an ordinary function
bool doSomething_(int i, string& name); //protected or private member function

Enums and preprocessor constants

Enumerated values and preprocessor constants are all upper case letters. Parts of the name are separated by underscores.

#define MYCLASS_SUPPORTS_MIN_MAX 0 //preprocessor constant
enum DimensionId { DIM_MZ = 0, DIM_RT = 1 }; //enumerated values
enum DimensionId_ { MZ = 0, RT = 1 }; //enumerated values

(You should avoid using the preprocessor anyway. Normally, const and enum will suffice unless you need something very special.)
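As a small sketch of that advice, a typed constant and an enum can replace #define. The names dimension_count and dimensionName are hypothetical, and writing the constant in lower case like an ordinary variable is an assumption of this example, since the conventions do not state a rule for const variables.

```cpp
// Instead of: #define DIMENSION_COUNT 2
const unsigned int dimension_count = 2;

enum DimensionId
{
  DIM_MZ = 0,
  DIM_RT = 1
};

// Unlike a macro, the constant has a type and obeys scope rules,
// and it can still be used as an array bound.
double position[dimension_count];

// The enum gives the compiler a chance to check the switch.
const char* dimensionName(DimensionId id)
{
  switch (id)
  {
    case DIM_MZ:
      return "mz";
    case DIM_RT:
      return "rt";
  }
  return "unknown";
}
```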

Parameters

Parameters in .ini files and elsewhere follow these conventions:

This rule applies to all kinds of parameter strings, both keys and string-values.

Documentation

UML diagrams

To generate UML diagrams use yEd and export the diagrams in png format. Do not forget to save also the corresponding .yed file.

Doxygen

Each OpenMS class has to be documented using Doxygen. The documentation is inserted in Doxygen format in the header file where the class is defined. Documentation includes the description of the class, of each method, type declaration, enum declaration, each constant, and each member variable.

Longer pieces of documentation start with a brief description, followed by an empty line and a detailed description. The empty line is needed to separate the brief from the detailed description.

Descriptions of classes always have a brief section!

Please use the doxygen style of the following example for OpenMS:

/**
  @defgroup DummyClasses Dummy classes

  @brief This group contains dummy classes

  Add classes by using the '@ingroup' command.
*/

/**
  @brief Demonstration class.

  A demonstration class for teaching doxygen

  @note All classes need a brief description!

  @ingroup DummyClasses
*/

class Test
{
  public:
    /**
      @brief An enum type.

      The documentation block cannot be put after the enum!
    */
    enum EnumType
    {
      EVal1,     ///< Enum value 1.
      EVal2      ///< Enum value 2.
    };

    /**
      @brief constructor.

      A more elaborate description of the constructor.
    */
    Test();

    /**
      @brief Dummy function.

      A normal member taking two arguments and returning an integer value.
      The parameter @p dummy_a is an integer.
      @param dummy_a an integer argument.
      @param dummy_s a constant character pointer.
      @see Test()
      @return The dummy results.
    */
    int dummy(int dummy_a, const char *dummy_s);

    /// Brief description in one line.
    int isDummy();

    /**
      @name Group of members.

      Description of the group.
    */
    //@{
    /// Dummy 2.
    void dummy2();
    /// Dummy 3.
    void dummy3();
    //@}

  protected:
    int value_;      ///< An integer value.
};

The defgroup command indicates that a comment block contains documentation for a group of classes, files or namespaces. This can be used to categorize classes, files or namespaces, and document those categories. You can also use groups as members of other groups, thus building a hierarchy of groups. Using the ingroup command a comment block of a class, file or namespace will be added to the group or groups.

The groups (or modules as doxygen calls them) defined by the ingroup command should contain only the classes of special interest to the OpenMS user. Helper classes and such must be omitted.

Documentation which does not belong to a specific .C or .h file can be written into a separate Doxygen file (with the ending .doxygen). This file will also be parsed by Doxygen.

Open tasks are noted in the documentation of a header or a group using the todo command. The ToDo list is then shown in the doxygen menu under 'Related pages'. Each ToDo should be followed by a name in parentheses to indicate who is going to handle it.

These commands should be used as well:

Doxygen is not hard to learn, have a look at the manual :-)

Commenting code

The code for each .C file has to be commented. Each piece of code in OpenMS has to contain at least 5% of comments. The use of

// Comment text
instead of C style comments
/* Comment text */ 
is recommended to avoid problems arising from nested comments. Comments should be written in plain English and describe the functionality of the next few lines.

Examples

Instructive programming examples can be provided in the source/EXAMPLES directory.

Revision control

OpenMS uses Subversion to manage different versions of the source files. For easier identification of the responsible person each OpenMS file contains the $Maintainer:$ string in the preamble.

Examples of .h and .C files have been given above. In non-C++ files (Makefiles, (La)TeX-Files, etc.) the C++ comments are replaced by the respective comment characters (e.g. ``#'' for Makefiles, ``%'' for (La)TeX). TeX will switch to math mode after a $, but you can work around this by writing something like

Latest SVN $ $Date:$ $ if you want to use it in texts; the one here expands to ``Latest SVN Date: 2007-01-19 13:47:36 +0100 (Fri, 19 Jan 2007) ''. Subversion does not turn on keyword substitution by default. See svn -h propset and svn -h proplist for details.

Testing

General

Each OpenMS class has to provide a test program. This test program has to check each method of the class. The test programs reside in the directory source/TEST and are usually named <classname>_test.C. The test program has to be coded using the class test macros as described in the OpenMS online reference. Special care should be taken to cover all special cases (e.g. what happens if a method is called with empty strings, negative values, zero, null pointers etc.). Please activate the keyword substitution of '$Id$' for all tests with the following command: svn propset svn:keywords Id <file>.

Supplementary files

If a test needs supplementary files, put these files in the source/TEST/data/ folder. The name of supplementary files has to begin with the name of the tested class.

Structure of a test program

Macros to start, finish and evaluate tests

Comparison macros

Do not use expressions with side effects inside the comparison macros, e.g. *(it++). The expressions in the macro are evaluated several times, so the side effect is triggered several times as well.
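The effect can be reproduced with a small sketch. The macro SKETCH_TEST_EQUAL below is hypothetical and is not one of the OpenMS test macros; it merely uses its first argument twice, once for the comparison and once to record the value, as such macros typically do.

```cpp
#include <vector>

// State recorded by the hypothetical macro below.
bool sketch_ok = false;
int sketch_last_value = 0;

// Hypothetical comparison macro, NOT an OpenMS macro.
// Note that the argument a appears twice in the expansion.
#define SKETCH_TEST_EQUAL(a, b) \
  do \
  { \
    sketch_ok = ((a) == (b)); \
    sketch_last_value = (a); \
  } while (0)

// Passes *(it++) to the macro and returns how far the iterator moved.
int advanceCountWithSideEffect()
{
  std::vector<int> values;
  values.push_back(10);
  values.push_back(20);
  values.push_back(30);

  std::vector<int>::iterator it = values.begin();
  SKETCH_TEST_EQUAL(*(it++), 10); // it++ is executed twice
  return static_cast<int>(it - values.begin());
}

// Evaluates the expression once, outside the macro.
int advanceCountWithoutSideEffect()
{
  std::vector<int> values;
  values.push_back(10);
  values.push_back(20);
  values.push_back(30);

  std::vector<int>::iterator it = values.begin();
  int value = *it;
  ++it;
  SKETCH_TEST_EQUAL(value, 10);
  return static_cast<int>(it - values.begin());
}
```

In the first function the iterator ends up two positions ahead and the recorded value is 20 instead of 10; in the second it advances by one, as intended.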

Temporary files

You might want to create temporary files during the tests. The following macro puts a temporary filename into the string argument. The file is automatically deleted after the test.

Tools for testing and checking your code

There are also some PHP tools for testing other tasks in the source/config/tools/ directory. See source/config/tools/README.txt for details!

Testing the TOPP programs

The abbreviation TOPP stands for The OpenMS Proteomics Pipeline, a collection of tools based upon the C++ classes in OpenMS. The TOPP tools are located in source/APPLICATIONS/TOPP.

Supplementary files

If a test needs supplementary files, put these files in the same folder. The name of supplementary files has to begin with the name of the tested tool. All extensions but .tmp, .output or .log are possible.

Temporary files

All output files that can be deleted after the test must end with .tmp. All files with that extension are deleted after the test, unless in debug mode.

Running the tests

The tests for TOPP programs are located in source/TEST/TOPP. The Makefile provides the following main targets:

Macros for writing the tests

The actual tests are written using (1.) macros which are defined by configure in config_defs.mak and (2.) some special functions using substitution capabilities of the (GNU) make program itself.

Numerical inaccuracy

The TOPP tests will be run on 32 bit and 64 bit platforms. Therefore a purely character-based comparison of computed and expected result files might fail although the results are in fact numerically correct - think of cases like 9.999e+3 vs. 1.0001e+4. Instead we provide a small program NumericDiff in source/TEST/TOPP. This program steps through both inputs simultaneously and classifies each position into 3 categories: numbers, characters, whitespace. Within each line of input, numbers are compared with respect to their ratio (i.e., relative error), characters must match exactly (e.g. case is significant) and all whitespace is considered equal. Empty lines or lines containing only whitespace are skipped, but extra linebreaks 'within' lines will result in error messages. For more details and verbosity options, see the built-in help message and the source code.
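The idea of comparing numbers by their ratio can be sketched like this. The function isRatioAcceptable and its parameter max_ratio are assumptions made for this example; this is not the actual NumericDiff code, and the handling of zeros and signs is one possible choice.

```cpp
#include <cmath>

// Returns true if a and b differ by at most the factor max_ratio
// (e.g. 1.001 allows a relative deviation of about 0.1%).
bool isRatioAcceptable(double a, double b, double max_ratio)
{
  if (a == b)
  {
    return true; // also covers 0 vs. 0
  }
  if (a == 0.0 || b == 0.0)
  {
    return false; // the ratio is undefined if only one value is zero
  }
  if ((a < 0.0) != (b < 0.0))
  {
    return false; // different signs never match
  }
  double ratio = std::fabs(a) / std::fabs(b);
  if (ratio < 1.0)
  {
    ratio = 1.0 / ratio; // make the ratio symmetric in a and b
  }
  return ratio <= max_ratio;
}
```

For the case mentioned above, 9.999e+3 vs. 1.0001e+4, the ratio is about 1.0002, so the numbers match with max_ratio = 1.001 although the strings share hardly any characters.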

File name conventions for TOPP tests

Each test relies on a number of files. These files should be named source/TEST/TOPP/<toolname>_<number>_<name>.<extension>, where

The data files should be as small as possible, but not totally trivial.

Is testing really necessary?

Yes. Testing is crucial to verify the correctness of the library - especially when using C++. But why does it have to be so complicated, using all these macros and stuff? One of the biggest problems when building large class frameworks is portability. C++ compilers are strange beasts and there is not a single one that accepts the same code as any other compiler. Since one of the main concerns of OpenMS is portability, we have to ensure that every single line of code compiles on all platforms. Due to the long compilation times and the (hopefully, in the future) large number of different platforms, tests to verify the correct behaviour of all classes have to be carried out automatically. This implies a well-defined interface for all tests, which is the reason for all these strange macros. This fixed format also enforces the writing of complete class tests. Usually a programmer writes only a few lines of code to test the parts of the code he or she wrote. Of the methods tested after the introduction of the test macros, about a tenth showed severe errors after thorough testing. Most of these errors didn't occur on all platforms or didn't show up on trivial input.

Writing tests for each method of a class also ensures that each line is compiled. When using class templates, the compiler only compiles the methods that are called. Thus it is possible that a code segment contains syntactical errors but the compiler accepts the code happily - it simply ignores most of the code. This is quickly discovered in a complete test of all methods. The same is true for configuration-dependent preprocessor directives that stem from platform dependencies. Often untested code also hides inside the const version of a method when there is a non-const method with the same name and arguments (for example, most of the getName methods in OpenMS). In most cases, the non-const version is preferred by the compiler and it is usually not clear to the user which version is taken. Again, explicit testing of each single method helps with this problem. The ideal way to tackle the problem of untested code is a complete coverage analysis of a class. Unfortunately this is only supported by very few compilers, so it is not used for testing OpenMS.
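Both effects can be seen in a short sketch. The classes Wrapper and Named are hypothetical and written only for this illustration. Wrapper<int> compiles although getSize() makes no sense for int, because nobody calls it; Named shows that the const overload only runs when the object is accessed through a const reference.

```cpp
template <typename T>
class Wrapper
{
  public:
    explicit Wrapper(const T& value) : value_(value) {}

    const T& getValue() const { return value_; }

    // Only valid if T has a size() member. For T = int this body is
    // wrong, but the compiler does not complain until it is called.
    int getSize() const { return static_cast<int>(value_.size()); }

  protected:
    T value_;
};

class Named
{
  public:
    // Two overloads with the same name and arguments.
    const char* whichGetName() { return "non-const"; }
    const char* whichGetName() const { return "const"; }
};

// Compiles and runs: the broken getSize() is never instantiated.
int untestedMethodGoesUnnoticed()
{
  Wrapper<int> w(3);
  return w.getValue();
}

// On a non-const object the non-const overload is chosen.
const char* calledOnObject()
{
  Named n;
  return n.whichGetName();
}

// Only a const access reaches the const overload.
const char* calledOnConstReference()
{
  Named n;
  const Named& ref = n;
  return ref.whichGetName();
}
```

A test that calls w.getSize() would make the error in Wrapper<int> visible immediately, and a test that goes through a const reference is needed to exercise the const version at all.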

One last point: writing the test program is a wonderful opportunity to verify and complete the documentation! Often enough implementation details are not clear at the time the documentation is written. A lot of side effects or special cases that were added later do not appear in the documentation. Going through the documentation and the implementation in parallel is the best way to verify the documentation for consistency and (strange coincidence?!) the best way to implement a test program, too!


Generated Tue Apr 1 15:36:40 2008 -- using doxygen 1.5.4 OpenMS / TOPP 1.1