
Completed development discussions


This page lists finished development discussions, i.e. discussions on topics that concern the layout of the C++ or Python code. Open discussions are listed on a separate page.


Finished discussions

C++-Python interface: where to put the bindings?

We discussed where the Python and PMI bindings of the C++ classes should be stored. There are three possibilities, each with the pros and cons listed below. After some discussion, we decided to put the bindings into the class files (first alternative). The most important argument was the suggestion in the boost.Python docs that compilation time can be reduced this way.

In the class files

The bindings are defined in the C++ class files themselves.

Pro

  • Everything that belongs to a class is in one/two source file(s).

Con

  • ESPResSo++ would be directly bound to Python. However, it was decided earlier that this is actually wanted.
  • Unit tests will always need Python mock objects.
  • Bindings logically belong neither to the definition (.hpp) file nor to the implementation of a class.

Module-wide registry

All bindings of one module are put into one file (bindings.cpp).

Pro

  • The interface of a whole module is collected in one file.

Con

  • This file has to be modified by many developers.
  • Compilation of these files may take too long and consume too much memory. Using smaller files is actually recommended in the boost.Python docs [1].

Independent files

Each class has corresponding binding files.

Pro

  • The interface of the class is independent of its implementation. This would make it easier to exchange the scripting language.
  • It is immediately visible which classes have a Python interface.
  • Compilation time of Python bindings is very high. As bindings usually change less than the implementation of a class, they do not need to be recompiled all the time.

Con

  • This would clutter the source directories. Up to 3 files per class would be required (C++ implementation, Python bindings, and the header file).

Compiling ESPResSo++ without MPI

Naturally, ESPResSo++ also needs to be able to run on a single processor in a non-parallel environment. Since all MPI implementations can also run a program with only a single process, an MPI version of ESPResSo++ would be sufficient. However, it is not always easy to install a parallel environment, and running parallel programs on batch systems is often more complicated as well. Therefore, ESPResSo++ can also be compiled without any parallel environment. This can be done in two ways:

  • Guarding of all MPI-statements with #ifdef-statements:
#ifdef HAVE_MPI
...do some MPI...
#else
...do the same for a single process...
#endif
  • Providing a basic 'fake' MPI implementation that implements all MPI commands for a single processor.

The second approach was followed in the old ESPResSo, while ESPResSo++ follows the guarding approach. The reason is the use of boost.mpi: boost.mpi cannot be easily compiled against a fake MPI implementation, and replacing boost.mpi itself is very difficult.

One disadvantage of the guarding approach is that some code potentially has to be duplicated. In a simulation with only one processor, some optimizations are possible. In the MPI case, whether to use these optimizations is decided at run time (by checking the number of available processors), but the optimized code is then the same as for a single CPU without MPI.

Is it really necessary that we can compile ESPResSo++ without MPI?

Pro MPI is required:

  • Even when MPI is used, it is usually possible to run a job on a single processor using only a single MPI process. At least with OpenMPI, it is not even necessary to use mpiexec in that case.
  • On desktop machines, an easy-to-use MPI library is usually easily available and even provided by the OS distribution.
  • On HPC machines, it is not reasonable to run single-processor jobs, so MPI will be required anyway.
  • The C++ source code becomes much shorter and more readable (not so many annoying guards in the code).
  • It is not necessary to implement the same thing twice, once in parallel for MPI and once without MPI.
  • It removes one source of errors.

Pro MPI is optional:

  • MPI programs should be started using mpirun/mpiexec, and at least older versions of IBM's poe required poe programs to be started using poe.
    • OL: But no one is using IBM as a development platform, and certainly not with only one processor.
    • AA: for testing, e.g. just looking up whether some modules are found, that is still nice.
  • MPI programs cannot necessarily be run interactively, while Python offers a nice interactive interface.
    • OL: This concerns which MPI implementations? And are these used on development platforms?
    • AA: same as above, it is nice for testing.
  • On most desktop installations, there is no MPI by default, and it is not always easy to install additional packages. Compiling an MPI package might be easy, but people who do not want to run parallel jobs do not want to do that.
    • OL: This is true. Furthermore, some people might not have administrative permissions on a desktop machine, so it is more tedious for them to install MPI. On the other hand, I would assume that MPI becomes more and more common, as almost all new machines will possess multicore CPUs, and MPI is one way to exploit that.
    • AA: All recent processors are 64-bit. Are therefore most desktops running under 64-bit OSes? I guess that tells a lot how common it will be to install the development headers for MPI.
  • When MPI is required, boost.MPI is required, too. Even if MPI is installed on a machine, boost.MPI will often not be installed, and needs to be compiled. This is not necessarily trivial and is a major pain for users.
    • AA: Well, boost in general is a major pain to compile, see topic below.

It was decided that, in time, we will implement a fake MPI version that works on a single processor. On the one hand, this allows the developers to implement everything as though MPI is available; on the other hand, a user does not need to have MPI installed.
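The idea of a fake MPI layer can be illustrated with a minimal Python sketch. The class `FakeComm`, its method names, and the helper `total_energy` are hypothetical and chosen only for illustration; they are not the ESPResSo++ or boost.mpi API. The point is that calling code is written once against the communicator interface, and on a single processor every collective operation degenerates to a trivial local operation.

```python
class FakeComm:
    """Hypothetical single-process stand-in for an MPI communicator."""

    rank = 0  # the only process is always rank 0
    size = 1  # there is exactly one process

    def barrier(self):
        # Nothing to wait for when there is only one process.
        return None

    def bcast(self, obj, root=0):
        # Broadcasting to "all" processes means handing the object back.
        self._check_rank(root)
        return obj

    def allreduce(self, value, op=None):
        # Reducing over a single contribution returns that contribution.
        return value

    def gather(self, value, root=0):
        # The root receives a list with one entry per process.
        self._check_rank(root)
        return [value]

    def _check_rank(self, rank):
        if rank != 0:
            raise ValueError(f"rank {rank} does not exist in a 1-process run")


def total_energy(local_energy, comm):
    """Sum per-process energies; the caller never checks for MPI."""
    return comm.allreduce(local_energy)


comm = FakeComm()
energy = total_energy(2.5, comm)
```

With a design like this, no guards are needed in the calling code: a parallel build would pass a real communicator object offering the same methods, and a serial build would pass the fake one.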

Including boost

In many HPC environments, you will need to compile boost yourself: even if boost is installed, it was probably not compiled with the MPI compiler. Often, the system is Linux and was compiled with gcc, while the MPI compiler is, for example, Intel based. Such cross-compiler builds often fail with boost. Compiling boost is non-trivial, mainly because of bjam. Compiling ESPResSo++ might become much easier if we include a boost version that is integrated into our autoconf build system. That does not exclude the possibility of using an installed boost.

It was decided that those parts of the boost library that are required by ESPResSo++ are included in the ESPResSo++ distribution.

How are an integrator and thermostat connected with each other?

Solution 1 (Container view of integrator)

The integrator is aware of all registered objects (force computers, thermostats).

thermostat = Langevin(temp = 1.0, gamma = 0.5, ...)
integrator = VelocityVerlet(...)
integrator.addThermostat(thermostat)

Solution 2 (Connector view)

The integrator does not know anything about connected objects.

 thermostat.connect(integrator)

It is most confusing if the two views are mixed up:

 integrator.addForce(...)
 thermostat.connect(integrator)

Comments by OL

  • In both cases, at the end, the integrator will contain a reference to the thermostat.
  • The integrator does not need to know what type of object it is that modifies it (Thermostat, Barostat, ...). The only thing it needs to know is when to call it. That is why the signal mechanism (the "Connector view") is useful. Therefore, we should completely remove the addWhatever methods from the integrator object. The integrator simply provides a number of signals that arbitrary objects can connect to. Example:
integrator = VelocityVerlet(timestep=0.02)
# set up the thermostat
thermostat = Langevin(temperature=1)
thermostat.connect(integrator)
# set up the LJ interaction with verlet lists
ljint = LennardJones(epsilon=2.0, cutoff=2.0)
verletlists = VerletLists(cutoff=2.0, skin=0.5)
ljforce = Force(interaction=ljint, pairs=verletlists)
ljforce.connect(integrator)
  • The connect method of a Thermostat then needs to connect to the correct signals of the Integrator, like
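class Langevin {
public:
  // Register update_force as a slot of the integrator's force-update signal.
  void connect(shared_ptr<MDIntegrator> integrator) {
     // boost::bind takes the member function first, then the object.
     integrator->signalForceUpdate.connect(boost::bind(&Langevin::update_force, this));
  }
};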
  • There is one caveat to the above code: when you call signal.connect, a new reference from the integrator (i.e. the signal) to the thermostat is created. By default, however, this is not a shared_ptr. Therefore, it could happen that the thermostat dies after it was connected. To avoid this, it is necessary to create a better bind that uses a shared_ptr internally. It might be that Boost already provides that.

(TB: yes, this is supported by Boost).

Comments by TB

  • It is right that the thermostat will not be deleted as a C++ object if the integrator keeps a reference. But here you have already implicitly made a decision where other options are possible. Why must the integrator keep the reference?
  • As long as we have not agreed on any circular references, we must have a hierarchy of references, and that hierarchy should be fixed.

Comments by OL

  • I don't think that I implicitly made a decision. I simply observe that in all cases the integrator must have a reference (direct or indirect) to the thermostat, as the integrator has to call the thermostat to do its job. This is independent of whether we use signals (connector view) or direct method calls (container view). In both cases, the integrator holds a reference that keeps the thermostat alive: in the container view, the integrator itself keeps the thermostat alive; in the connector view, the signal holds the reference. Otherwise, how should the integrator call the thermostat?
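The connector view discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Signal` class, the attribute `signal_force_update`, and the placeholder force update are hypothetical and do not reflect the actual ESPResSo++ classes. The sketch shows that the integrator only publishes a signal, while the thermostat decides which signal to connect to, and that the stored bound method is what keeps the thermostat reachable.

```python
class Signal:
    """Minimal signal: stores callbacks and calls them in connection order."""

    def __init__(self):
        self._slots = []

    def connect(self, slot):
        # Storing a bound method keeps a strong reference to its object.
        self._slots.append(slot)

    def __call__(self, *args):
        for slot in self._slots:
            slot(*args)


class VelocityVerlet:
    """Integrator that knows *when* to call others, not *who* they are."""

    def __init__(self, timestep):
        self.timestep = timestep
        self.forces = 0.0
        self.signal_force_update = Signal()

    def step(self):
        self.forces = 0.0               # reset before forces are recomputed
        self.signal_force_update(self)  # let connected objects modify forces


class Langevin:
    def __init__(self, temperature, gamma):
        self.temperature = temperature
        self.gamma = gamma
        self.calls = 0

    def connect(self, integrator):
        # The thermostat picks the signal it needs.
        integrator.signal_force_update.connect(self.update_force)

    def update_force(self, integrator):
        # Placeholder, not real physics: stands in for friction and noise.
        self.calls += 1
        integrator.forces -= self.gamma


integrator = VelocityVerlet(timestep=0.02)
thermostat = Langevin(temperature=1.0, gamma=0.5)
thermostat.connect(integrator)
integrator.step()
```

Note that `VelocityVerlet` has no `addThermostat` method here; a barostat or force computer would connect in exactly the same way.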

Deletion of Python Objects

Related to the Container/Connector discussion: what happens if one of the objects is deleted?

del(thermostat)
  • The integrator still holds a reference to the thermostat, so the thermostat stays alive (container).
  • The thermostat disconnects automatically from the integrator (connector).

While explicit deletion of an object, e.g. del(thermostat), might be convenient for users, they might be confused by implicit deletes at the end of subroutines or by redefinition of the variable.

Example 1:

def setup(integrator):
   thermostat = Langevin(temp = 1.0, gamma = 0.5, ...)
   thermostat.connect(integrator)
   # Python calls del(thermostat) at the end of the subroutine

Example 2:

 thermostat = 5.4    # implies a delete on the C++ thermostat

Comments by OL

  • What is the open question that remains here? Python only deletes an object when there are no references to it any more. In that case, the user simply has no way to access the object, so it can be safely deleted.



