
C++ caveats

C++ is quite a complex programming language, and is therefore challenging for compilers. This page collects things that can go wrong unexpectedly. It covers both incorrect usage and compiler inefficiencies.

Member classes of friends

Consider

class Test {
public:
  class SubTest {
    friend class Test;
    int value;
  };
  class SubTest2 {
  public:
    int get(SubTest t) { return t.value; }
  };
};

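Surprisingly, the current gcc-4.3 and Intel icpc accept this. Under the C++98/03 rules, however, it is plainly wrong: SubTest2 is not a friend of SubTest just because Test is! In fact, gcc-4.0.1, as shipped with MacOS 10.5, complains. (C++11 and later relaxed this rule, so that nested classes of a friend class share its access; being explicit remains the portable choice.) The correct code is just a bit more complicated: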

class Test {
public:
  class SubTest2;
  class SubTest {
    friend class Test;
    friend class SubTest2;
    int value;
  };
  class SubTest2 {
  public:
    int get(SubTest t) { return t.value; }
  };
};

So, you first need to forward declare SubTest2 (provided you don't want / cannot change the order of classes).
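The pattern can be tried out in isolation with the following minimal sketch. The names Outer, Payload, Reader, make and friend_example are hypothetical and chosen only for illustration; a constructor and a factory function are added so that the private member can actually be set and read back.

```cpp
#include <cassert>

// Hypothetical names, for illustration only.
class Outer {
public:
  class Reader;            // forward declaration, so Payload can befriend it
  class Payload {
    friend class Outer;    // Outer itself may touch 'value'
    friend class Reader;   // explicit friendship for the sibling nested class
    int value;
  public:
    Payload() : value(0) {}
  };
  class Reader {
  public:
    // Reads the private member; allowed because of the explicit friendship.
    int get(const Payload &p) const { return p.value; }
  };
  // Outer uses its own friendship to set the private member.
  static Payload make(int v) { Payload p; p.value = v; return p; }
};

// Usage: a value stored via Outer is read back via Reader.
inline void friend_example() {
  Outer::Payload p = Outer::make(42);
  Outer::Reader r;
  assert(r.get(p) == 42);
}
```

Without the forward declaration, the friend declaration inside Payload would refer to a different class Reader at namespace scope rather than to Outer::Reader.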

Calling template member functions in template classes

Sounds like a rare thing to do, no? Well, actually, the ParticleStorage of Espresso provides an important template member function, namely 'getProperty<class T>'. Using this function in a template class requires some care. On most systems, this works:

template<class T>
class Test {
  void doSomething(ParticleStorage *p, size_t prop) { p->getProperty<real>(prop); }
};

However, the MacOS g++-4.0.1, for example, does not accept it. Why? Because inside a template, the type of p could depend on a template parameter (imagine ParticleStorage being one), and then it is suddenly not clear whether getProperty is a template function or whether the following < is a plain less-than operator. Well... The correct code looks weird, at best:

template<class T>
class Test {
  void doSomething(ParticleStorage *p, size_t prop) { p->template getProperty<real>(prop); }
};

There is a stray-looking "template" in front of the function name now, which explicitly tells the compiler that this is a call to a template member function.
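The case in which the keyword is strictly required can be sketched as follows. DemoStorage, Reader and template_example are hypothetical names; DemoStorage is only a stand-in with a getProperty template member and is not the real ParticleStorage interface. The storage type is a template parameter here, so the call goes through a dependent type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a storage class with a template member function.
class DemoStorage {
  std::vector<double> values;
public:
  explicit DemoStorage(std::size_t n) : values(n, 0.0) {}
  void set(std::size_t i, double v) { values[i] = v; }
  // Returns the stored value converted to the requested type T.
  template<class T>
  T getProperty(std::size_t i) const { return static_cast<T>(values[i]); }
};

// The storage type is a template parameter, so 'p->getProperty' is a
// dependent name and the 'template' keyword is mandatory.
template<class Storage>
class Reader {
public:
  int read(const Storage *p, std::size_t prop) const {
    return p->template getProperty<int>(prop);
  }
};

// Usage: store a value and read it back through the template class.
inline void template_example() {
  DemoStorage s(3);
  s.set(1, 42.0);
  Reader<DemoStorage> r;
  assert(r.read(&s, 1) == 42);
}
```

Dropping the keyword in Reader::read makes the compiler parse the < as a comparison, which leads to a syntax error on conforming compilers.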

Const reference vs. const_reference

If you take a look at basic classes such as the TupleVector, you will realize that a reference and a reference to a constant element are implemented as two separate classes (reference and const_reference). The same also holds for iterators. This section describes why this is necessary.

The simple C++-like answer is: because a const reference is a constant reference to an element, not a reference to a constant element. Fine, but what does that mean? After all, at first glance, a const reference pretty much does the job. Assume something like this:

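class reference {
public:
  reference &operator=(value_type);  // assignment: non-const objects only
  operator value_type() const;       // read access: const and non-const objects
};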

Well, with this, I can assign only to a reference, not to a const reference, but I can read the value from both. So, perfect, isn't it?

Well, the problems come from type conversions. When passing such references, actually the copy constructor is used, which is by default:

reference::reference(const reference &ref);

Oops, now you can convert a constant reference into a non-constant reference! So, this one needs to be private, and only

reference::reference(reference &ref);

should be public. But then, we cannot pass const references around anymore. It simply boils down to the fact that you cannot declare a constructor const (i.e., you cannot specify a constructor that enforces constness of the constructed object).
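The two-class solution can be sketched as follows. This is a minimal illustration for a single double element and not the actual TupleVector code; the member name target and the function proxy_example are assumptions made for the sketch.

```cpp
#include <cassert>

class const_reference;   // forward declaration for the friend declaration below

// Proxy that allows reading and writing the element it refers to.
class reference {
  double *target;
  friend class const_reference;
public:
  explicit reference(double *t) : target(t) {}
  reference &operator=(double v) { *target = v; return *this; }  // write access
  operator double() const { return *target; }                    // read access
};

// Proxy that allows only reading the element it refers to.
class const_reference {
  const double *target;
public:
  explicit const_reference(const double *t) : target(t) {}
  // One-way conversion: a reference may become a const_reference, never back.
  const_reference(const reference &r) : target(r.target) {}
  operator double() const { return *target; }
  // No operator=(double): the element cannot be modified through this type.
};

// Usage: write through reference, read through const_reference.
inline void proxy_example() {
  double x = 1.0;
  reference r(&x);
  r = 2.5;                   // writes through to x
  const_reference cr = r;    // allowed: adds constness
  assert(x == 2.5);
  assert(double(cr) == 2.5);
  // cr = 3.0;               // would not compile: no assignment from a value
  // reference back = cr;    // would not compile: no conversion back
}
```

Since the constness lives in the type rather than in a const qualifier on the object, copying a const_reference can never yield something writable.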

For iterators things are more obvious: an iterator over constant objects still needs to be non-constant itself, since you need to be able to modify it to point to another constant object. So, a constant iterator is an unchangeable object that points to something, but that something can be modified. A const_iterator, in turn, can itself be changed, but points to something that cannot be modified.
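The same distinction exists for the standard containers, which makes it easy to try out. The following sketch uses std::vector; the function names sum_readonly and overwrite_first are hypothetical.

```cpp
#include <cassert>
#include <vector>

// const_iterator: the iterator advances, but the elements are read-only.
inline int sum_readonly(const std::vector<int> &v) {
  int total = 0;
  for (std::vector<int>::const_iterator it = v.begin(); it != v.end(); ++it) {
    total += *it;
    // *it = 0;    // would not compile: the element is const
  }
  return total;
}

// const iterator: the iterator is pinned to the first element,
// but that element can still be changed through it.
inline void overwrite_first(std::vector<int> &v, int value) {
  assert(!v.empty());
  const std::vector<int>::iterator first = v.begin();
  *first = value;
  // ++first;      // would not compile: the iterator itself is const
}
```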

Register use in loops

This only refers to g++; however, most people will probably use gcc to compile ESPResSo++. g++ has some problems with optimizing loops over local class objects or arrays, i.e. ones that exist only in the scope of a function. Take the following simple example:

Vector3D nsum (vector<Vector3D> &v) {
  double sx = 0, sy = 0, sz = 0;
  for(vector<Vector3D>::iterator it = v.begin(); it != v.end(); it++) {
    sx += (*it)[0];
    sy += (*it)[1];
    sz += (*it)[2];
  }
  return Vector3D(sx, sy, sz);
}

The inner loop is here compiled into:

.L127:
  addsd	(%rax), %xmm2
  addsd	8(%rax), %xmm0
  addsd	16(%rax), %xmm1
  addq	$24, %rax
  cmpq	%rax, %rdx
  jne	.L127

which is pretty neat and fast, but the source is not really nice. Why not abstract over the dimension? So, with a plain array that would be:

Vector3D lsum (vector<Vector3D> &v)
{
  double sum[3] = {0, 0, 0};
  for(vector<Vector3D>::iterator it = v.begin(); it != v.end(); it++) {
    for (int i = 0; i < 3; ++i) sum[i] += (*it)[i];
  }
  return Vector3D(sum[0], sum[1], sum[2]);
}

So, we just made sx, sy and sz into an array. However, this compiles into:

.L133:
  addsd	(%rax), %xmm2
  movsd	%xmm2, -40(%rsp)
  addsd	8(%rax), %xmm1
  movsd	%xmm1, -32(%rsp)
  addsd	16(%rax), %xmm0
  addq	$24, %rax
  cmpq	%rdx, %rax
  movsd	%xmm0, -24(%rsp)
  jne	.L133

Oops, suddenly, g++ still keeps the sums in registers, but also stores them on the stack, although the inner loop was unrolled. Well, that still costs only 10% performance, so that is fine. But we are actually using C++, no? So, we should use a class to represent the vector (just as we do the whole time in the interface), which internally simply uses the loop over the 3 vector components, just as in the example above:

class Vector3D {
  double data[3];
public:
  Vector3D() { data[0] = data[1] = data[2] = 0; }  // sums must start from zero
  Vector3D& operator+=(const Vector3D &v) {
    for (int i = 0; i < 3; ++i) data[i] += v.data[i];
    return *this;
  } 
};
Vector3D dsum (vector<Vector3D> &v)
{
  Vector3D sum;
  for(vector<Vector3D>::iterator it = v.begin(); it != v.end(); it++) sum += (*it);
  return sum;
}

So, that looks nice, short and comprehensible. Plus, nothing about the interface has changed; we still have the same input/output parameters, and call the iterator of the vector exactly the same way. However, let's see what g++ does:

.L166:
  movapd	%xmm3, %xmm0
  addq	$1, %rcx
  addsd	(%rax), %xmm0
  movsd	%xmm0, (%rdi)
  movapd	%xmm0, %xmm3
  movapd	%xmm2, %xmm0
  addsd	8(%rax), %xmm0
  movsd	%xmm0, 8(%rdi)
  movapd	%xmm0, %xmm2
  movapd	%xmm1, %xmm0
  addsd	16(%rax), %xmm0
  addq	$24, %rax
  cmpq	%rdx, %rcx
  movapd	%xmm0, %xmm1
  movsd	%xmm0, 16(%rdi)
  jne	.L166

Again different! This version is about 50% slower than the two versions above, which is significant. What happens? As with the array, the compiler is not able to keep the class variable in registers after unrolling; in addition, it insists on putting the base pointer of this class into a register (here, the Vector3D sum is stored at the address %rdi points to). This has the disadvantage that the compiler now needs to do all operations via %xmm0, which simply blows up the code for nothing. Essentially, this means that the simple instruction

 addsd	(%rax), %xmm2

is replaced by this whole bunch of instructions

 movapd	%xmm2, %xmm0
 addsd	(%rax), %xmm0
 movsd	%xmm0, (%rdi)
 movapd	%xmm0, %xmm2

which do nothing more than the single instruction, since %rdi points onto the stack, and the values stored there will be lost when the function exits.

Note that the manual unrolling, which is the only difference between the first and second example, apparently helps. This also holds if we unroll inside Vector3D:

class Vector3D {
  double data[3];
public:
  Vector3D& operator+=(const Vector3D &v) {
    data[0] += v.data[0];
    data[1] += v.data[1];
    data[2] += v.data[2];
    return *this;
  } 
};

This version compiles to the same code as the first example. That is not surprising, as the compiler now does not insist on writing data back to memory, and therefore does not need the class base pointer at all.
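To reproduce the comparison with another compiler version, the three summation variants can be put into one file. The following is a minimal sketch; Vec3 is a hypothetical stand-in for Vector3D, and variants_agree is a helper added here only to check that all variants compute the same sums.

```cpp
#include <cassert>
#include <vector>

// Minimal stand-in for Vector3D; the real class has a richer interface.
class Vec3 {
  double data[3];
public:
  Vec3(double x = 0, double y = 0, double z = 0) {
    data[0] = x; data[1] = y; data[2] = z;
  }
  double operator[](int i) const {
    assert(i >= 0 && i < 3);   // bounds check, compiled out with -DNDEBUG
    return data[i];
  }
  Vec3 &operator+=(const Vec3 &v) {
    for (int i = 0; i < 3; ++i) data[i] += v.data[i];
    return *this;
  }
};

typedef std::vector<Vec3>::const_iterator Vec3Iter;

// Variant 1: three scalar accumulators.
inline Vec3 nsum(const std::vector<Vec3> &v) {
  double sx = 0, sy = 0, sz = 0;
  for (Vec3Iter it = v.begin(); it != v.end(); ++it) {
    sx += (*it)[0];
    sy += (*it)[1];
    sz += (*it)[2];
  }
  return Vec3(sx, sy, sz);
}

// Variant 2: a local array as accumulator.
inline Vec3 lsum(const std::vector<Vec3> &v) {
  double sum[3] = {0, 0, 0};
  for (Vec3Iter it = v.begin(); it != v.end(); ++it)
    for (int i = 0; i < 3; ++i) sum[i] += (*it)[i];
  return Vec3(sum[0], sum[1], sum[2]);
}

// Variant 3: a class object as accumulator.
inline Vec3 dsum(const std::vector<Vec3> &v) {
  Vec3 sum;
  for (Vec3Iter it = v.begin(); it != v.end(); ++it) sum += *it;
  return sum;
}

// True when all three variants agree component by component.
inline bool variants_agree(const std::vector<Vec3> &v) {
  const Vec3 a = nsum(v), b = lsum(v), c = dsum(v);
  for (int i = 0; i < 3; ++i)
    if (a[i] != b[i] || a[i] != c[i]) return false;
  return true;
}
```

Compiling such a file with g++ -O2 -DNDEBUG -S writes the generated assembly to a .s file, in which the inner loops can be compared. The output depends on the compiler version and flags, so it will not necessarily match the listings above.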

Inlining

Blind inlining in general is a bad idea. For example, let's try to benchmark the nsum function from above:

Vector3D test_nsum(vector<Vector3D> &v) {
  cout << "starting test" << endl;
  Vector3D sum;
  for(int i = 0; i < NTESTS; i++)
    sum += nsum(v);
  cout << "finished" << endl;
  return sum;
}

Well, can't we gain something by inlining nsum? Let's try:

Vector3D test_nsum(vector<Vector3D> &v) {
  double sx = 0, sy = 0, sz = 0;
  cout << "starting test" << endl;
  for(int i = 0; i < NTESTS; i++)
    for(vector<Vector3D>::iterator it = v.begin(); it != v.end(); it++) {
      sx += (*it)[0];
      sy += (*it)[1];
      sz += (*it)[2];
    }
  cout << "finished" << endl;
  return Vector3D(sx, sy, sz);
}

Oops, this one is again about 50% slower. What happens? Actually, the problem is the cout calls. These cannot be inlined, but rather refer to functions in the standard C++ library. Now, the compiler cannot make any assumptions about these functions, and as a consequence has to assume that they, for example, fork and join, so that sx, sy and sz cannot be assigned to registers. As a rule of thumb, inlining can only help if the inlined function is small enough not to use up all registers, and if the compiler knows the full program flow, i.e. no non-inlined functions are involved. Therefore, it is in general advantageous to subdivide your program into blocks that are as small as possible. The above inlining is ok, for example, if the loop is put into its own block:

Vector3D test_nsum(vector<Vector3D> &v) {
  Vector3D sum;
  cout << "starting test" << endl;
  {
    double sx = 0, sy = 0, sz = 0;
    for(int i = 0; i < NTESTS; i++)
      for(vector<Vector3D>::iterator it = v.begin(); it != v.end(); it++) {
        sx += (*it)[0];
        sy += (*it)[1];
        sz += (*it)[2];
      }
    sum = Vector3D(sx, sy, sz);
  }
  cout << "finished" << endl;
  return sum;
}
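The percentages quoted here depend on the compiler and machine, so it is worth measuring them again. The following is a minimal timing sketch under some assumptions: P3 is a hypothetical plain 3-vector, timed_nsum is a hypothetical helper, and std::clock is used for CPU time. The hot loop stays in its own function, and the timing code calls no I/O routines.

```cpp
#include <cassert>
#include <ctime>
#include <vector>

// Hypothetical plain 3-vector used only for this sketch.
struct P3 { double x, y, z; };

// Hot loop kept in its own function, free of any I/O calls.
inline P3 nsum(const std::vector<P3> &v) {
  double sx = 0, sy = 0, sz = 0;
  for (std::vector<P3>::const_iterator it = v.begin(); it != v.end(); ++it) {
    sx += it->x;
    sy += it->y;
    sz += it->z;
  }
  P3 r = { sx, sy, sz };
  return r;
}

// Calls nsum ntests times, accumulates the results and
// reports the CPU time spent through 'seconds'.
inline P3 timed_nsum(const std::vector<P3> &v, int ntests, double *seconds) {
  assert(seconds != 0);
  P3 total = { 0, 0, 0 };
  const std::clock_t start = std::clock();
  for (int i = 0; i < ntests; ++i) {
    const P3 s = nsum(v);
    total.x += s.x;
    total.y += s.y;
    total.z += s.z;
  }
  *seconds = double(std::clock() - start) / CLOCKS_PER_SEC;
  return total;
}
```

The accumulated total is returned so that the compiler cannot discard the loop as dead code. For meaningful numbers, the vector and the repeat count need to be large enough for the run to take a noticeable fraction of a second.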
