OMNeT++/OMNEST Bug Tracker - OMNeT++
View Issue Details
0000278OMNeT++simulation kernelpublic2011-04-20 19:342011-06-15 23:37
jesjones 
andras 
normalmajorsometimes
feedbackreopened 
4.1 
 
0000278: Comparing pointers is often wrong
It's fairly common for modules to have things like a std::map of pointers to some value. Unfortunately this can cause non-determinism between runs because objects can be allocated at different locations depending on things like the environment used. And if the behavior of the simulation changes depending on the order in which elements are processed the simulation will not behave the same.

Omnet itself doesn't seem to suffer much from this problem, but other packages do (e.g. inet). The enclosed patch attempts to address this by adding a std::less partial specialization for pointers which will cause compiler errors if it is instantiated and by adding a ComparePointers struct that can be used if you think it is OK to compare pointers.
No tags attached.
tar compare-pointers.tar (2,408) 2011-04-20 19:34
https://dev.omnetpp.org/bugs/file_download.php?file_id=58&type=bug
Issue History
2011-04-20 19:34jesjonesNew Issue
2011-04-20 19:34jesjonesFile Added: compare-pointers.tar
2011-04-27 10:13andrasNote Added: 0000422
2011-04-27 10:13andrasStatusnew => resolved
2011-04-27 10:13andrasResolutionopen => no change required
2011-04-27 10:13andrasAssigned To => andras
2011-04-27 17:00jesjonesNote Added: 0000425
2011-04-27 17:00jesjonesStatusresolved => feedback
2011-04-27 17:00jesjonesResolutionno change required => reopened
2011-04-27 18:55andrasNote Added: 0000426
2011-04-27 18:55andrasStatusfeedback => resolved
2011-04-27 18:55andrasResolutionreopened => no change required
2011-04-27 19:06jesjonesNote Added: 0000427
2011-04-27 19:06jesjonesStatusresolved => feedback
2011-04-27 19:06jesjonesResolutionno change required => reopened
2011-04-27 19:31andrasNote Added: 0000428
2011-06-10 17:47rhornigNote Added: 0000487
2011-06-10 23:19jesjonesNote Added: 0000488
2011-06-10 23:21jesjonesNote Added: 0000489
2011-06-10 23:22jesjonesNote Added: 0000490
2011-06-14 11:41rhornigNote Added: 0000492
2011-06-14 17:36jesjonesNote Added: 0000495
2011-06-15 13:59rhornigNote Added: 0000500
2011-06-15 13:59rhornigStatusfeedback => resolved
2011-06-15 13:59rhornigResolutionreopened => not fixable
2011-06-15 16:47jesjonesNote Added: 0000502
2011-06-15 16:47jesjonesStatusresolved => feedback
2011-06-15 16:47jesjonesResolutionnot fixable => reopened
2011-06-15 16:59jesjonesNote Added: 0000503
2011-06-15 17:02jesjonesNote Added: 0000504
2011-06-15 23:37jesjonesNote Added: 0000505

Notes
(0000422)
andras   
2011-04-27 10:13   
OMNeT++ itself does not have any occurences of this problem.
(0000425)
jesjones   
2011-04-27 17:00   
But inet does, our code did, mixim seems to (it has a std::map<AirFrame*, simtime_t> type for example), and I would wager the majority of non-trivial third-party packages do as well.

If reproducibility of simulation results between platforms, debug/release, or even between different command line options is important (as it certainly is to us) then it seems best to address this within omnet.
(0000426)
andras   
2011-04-27 18:55   
> But inet does, our code did, mixim seems to (it has a std::map<AirFrame*,
> simtime_t> type for example), and I would wager the majority of non-trivial
> third-party packages do as well.

Then submit into the respective bug trackers, e.g. http://sourceforge.net/apps/trac/mixim/report [^] for MiXiM. This is not the place for bugs in model frameworks.


> If reproducibility of simulation results between platforms, debug/release,
> or even between different command line options is important (as it certainly
> is to us) then it seems best to address this within omnet.

It is not possible to address this within OMNeT++. If you think it is, submit a patch.
(0000427)
jesjones   
2011-04-27 19:06   
It's not possible to fix the problem within omnet, but it is possible to extend omnet so that it exposes the problem within third-party addons (for some compilers anyway). And that is what the patch I included did.

If something like this is not added to omnet itself then either every single addon that cares about determinism is going to have to do it or all code has to be very very carefully audited to ensure that it is not comparing pointers in such a way as to introduce non-determinism.
(0000428)
andras   
2011-04-27 19:31   
Oh I see. I didn't realize the patch first, sorry. We'll consider it.
(0000487)
rhornig   
2011-06-10 17:47   
I have taken a look, but I was unable to make it work. What compiler this was tested on? It seems that it does not throw a compiler error even if there are iterations over maps that has pointers as a key.
(0000488)
jesjones   
2011-06-10 23:19   
Well it was disabled for everything but MSVC. I have, however, gotten it working with gcc 4.4.1 and Visual Studio 10:

diff --git a/include/simkerneldefs.h b/include/simkerneldefs.h
index 3b2588c..2f42066 100644
--- a/include/simkerneldefs.h
+++ b/include/simkerneldefs.h
@@ -25,6 +25,10 @@
 #include <math.h>
 #include "platdep/platdefs.h"
 #include "platdep/intxtypes.h"
+#ifdef __GNUC__
+#include <system_error>
+#endif
+
 
 // OMNeT++ version -- must match NEDC_VERSION and MSGC_VERSION in nedxml
 #define OMNETPP_VERSION 0x0401
@@ -100,16 +104,21 @@ namespace std
     {
         bool operator()(T* lhs, T* rhs) const
         {
-#ifdef _MSC_VER
- dont_compare_pointers();
-#else
- // FIXME: The compiler messages are not very informative, but I think gcc
- // is reporting errors even if this struct is not actually instantiated.
- //static_assert(false, "don't compare pointers");
-#endif
+ lhs.dont_compare_pointers();
             return lhs < rhs;
         }
     };
+
+#ifdef __GNUC__
+ template <>
+ struct less<error_category*> : public binary_function<error_category*, error_category*, bool>
+ {
+ bool operator()(const error_category* lhs, const error_category* rhs) const
+ {
+ return lhs < rhs;
+ }
+ };
+#endif
 }
 
 // Only use this if you are sure that the pointer ordering will not affect the simulation.
(0000489)
jesjones   
2011-06-10 23:21   
If I introduce an error in ipv6neighbourdiscovery.h MSVC tells me exactly where the error is:

45>c:\coco\devel\vendor\omnetpp\include\simkerneldefs.h(107): error C2228: left of '.dont_compare_pointers' must have class/struct/union
45>
45> type is 'IPv6NeighbourDiscovery::DADEntry *'
45> did you intend to use '->' instead?
45> c:\coco\devel\vendor\omnetpp\include\simkerneldefs.h(106) : while compiling class template member function 'bool std::less<_Ty>::operator ()(T *,T *) const'
45> with
45> [
45> _Ty=IPv6NeighbourDiscovery::DADEntry *,
45> T=IPv6NeighbourDiscovery::DADEntry
45> ]
45> c:\program files (x86)\microsoft visual studio 10.0\vc\include\set(49) : see reference to class template instantiation 'std::less<_Ty>' being compiled
45> with
45> [
45> _Ty=IPv6NeighbourDiscovery::DADEntry *
45> ]
45> c:\program files (x86)\microsoft visual studio 10.0\vc\include\xtree(451) : see reference to class template instantiation 'std::_Tset_traits<_Kty,_Pr,_Alloc,_Mfl>' being compiled
45> with
45> [
45> _Kty=IPv6NeighbourDiscovery::DADEntry *,
45> _Pr=std::less<IPv6NeighbourDiscovery::DADEntry *>,
45> _Alloc=std::allocator<IPv6NeighbourDiscovery::DADEntry *>,
45> _Mfl=false
45> ]
45> c:\program files (x86)\microsoft visual studio 10.0\vc\include\xtree(520) : see reference to class template instantiation 'std::_Tree_nod<_Traits>' being compiled
45> with
45> [
45> _Traits=std::_Tset_traits<IPv6NeighbourDiscovery::DADEntry *,std::less<IPv6NeighbourDiscovery::DADEntry *>,std::allocator<IPv6NeighbourDiscovery::DADEntry *>,false>
45> ]
45> c:\program files (x86)\microsoft visual studio 10.0\vc\include\xtree(659) : see reference to class template instantiation 'std::_Tree_val<_Traits>' being compiled
45> with
45> [
45> _Traits=std::_Tset_traits<IPv6NeighbourDiscovery::DADEntry *,std::less<IPv6NeighbourDiscovery::DADEntry *>,std::allocator<IPv6NeighbourDiscovery::DADEntry *>,false>
45> ]
45> c:\program files (x86)\microsoft visual studio 10.0\vc\include\set(58) : see reference to class template instantiation 'std::_Tree<_Traits>' being compiled
45> with
45> [
45> _Traits=std::_Tset_traits<IPv6NeighbourDiscovery::DADEntry *,std::less<IPv6NeighbourDiscovery::DADEntry *>,std::allocator<IPv6NeighbourDiscovery::DADEntry *>,false>
45> ]
45> c:\coco\devel\vendor\inet\src\networklayer\icmpv6\ipv6neighbourdiscovery.h(127) : see reference to class template instantiation 'std::set<_Kty>' being compiled
(0000490)
jesjones   
2011-06-10 23:22   
gcc isn't quite as nice:

45> In file included from ..\..\vendor\omnetpp\include\/cownedobject.h:23,
45> from ..\..\vendor\omnetpp\include\/carray.h:23,
45> from ..\..\vendor\omnetpp\include\omnetpp.h:32:
45> ..\..\vendor\omnetpp\include\/simkerneldefs.h: In member function 'bool std::less<T*>::operator()(T*, T*) const [with T = IPv6NeighbourDiscovery::DADEntry]':
45> c:\program files (x86)\codesourcery\sourcery g++ lite\bin\../lib/gcc/i686-pc-linux-gnu/4.4.1/../../../../i686-pc-linux-gnu/include/c++/4.4.1/bits/stl_tree.h:1170: instantiated from 'std::pair<typename std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::iterator, bool> std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::_M_insert_unique(const _Val&) [with _Key = IPv6NeighbourDiscovery::DADEntry*, _Val = IPv6NeighbourDiscovery::DADEntry*, _KeyOfValue = std::_Identity<IPv6NeighbourDiscovery::DADEntry*>, _Compare = std::less<IPv6NeighbourDiscovery::DADEntry*>, _Alloc = std::allocator<IPv6NeighbourDiscovery::DADEntry*>]'
45> c:\program files (x86)\codesourcery\sourcery g++ lite\bin\../lib/gcc/i686-pc-linux-gnu/4.4.1/../../../../i686-pc-linux-gnu/include/c++/4.4.1/bits/stl_set.h:411: instantiated from 'std::pair<typename std::_Rb_tree<_Key, _Key, std::_Identity<_Key>, _Compare, typename _Alloc::rebind<_Key>::other>::const_iterator, bool> std::set<_Key, _Compare, _Alloc>::insert(const _Key&) [with _Key = IPv6NeighbourDiscovery::DADEntry*, _Compare = std::less<IPv6NeighbourDiscovery::DADEntry*>, _Alloc = std::allocator<IPv6NeighbourDiscovery::DADEntry*>]'
45> C:\coco\devel\vendor\inet\src\networklayer\icmpv6\IPv6NeighbourDiscovery.cc:763: instantiated from here
45> ..\..\vendor\omnetpp\include\/simkerneldefs.h:107: error: request for member 'dont_compare_pointers' in 'lhs', which is of non-class type 'IPv6NeighbourDiscovery::DADEntry*'
(0000492)
rhornig   
2011-06-14 11:41   
Thanks for the update. Somehow I'm unable to trigger the error. I think these lines should triggert it:


    std::map<cComponentType*,cComponentType*> types;
    for (std::map<cComponentType*,cComponentType*>::iterator i=types.begin(); i!=types.end(); i++)
        EV << "dummy EV line\n";

But it just compiles fine without any error message... Could you post your code that triggers the error?
(0000495)
jesjones   
2011-06-14 17:36   
1) Make sure you are including simkerneldefs.h (or which ever header has the less specialization).
2) The compiler won't compile template code that isn't used and neither declaring a std::map nor iterating over the map require a comparison of the keys.
3) I was able to get a compiler error with both MSVC 10 and gcc 4.4.1 by adding the less specialization to simkerneldefs.h and adding the two lines below to the cComponent ctor:
    std::map<cComponentType*,cComponentType*> types;
    types.insert(std::make_pair(NULL, NULL));
(0000500)
rhornig   
2011-06-15 13:59   
Thanks for the report, I'was able to get it to work. Adding it to the OMNET framework poses several problems.

- It requires the -std=c++0x on the command line (which is not used currently).
- It just gave too many false positives. Having a map or set with a pointer is not invalid generally. Even iteration over it is ok. Only iteration over the set/map AND sending out messages in the loop are causing unreproducible results. Detecting such an iteration seems to be impossible. Once a user declares a map with pointer keys as valid, he can forget about it. Later he can implement loops and at an even later time insert some message sending into that. This still would cause problems.
(0000502)
jesjones   
2011-06-15 16:47   
I realize that the less specialization is not ideal and that there will be some false positives (though I think that the majority of errors will represent real problems). Having said that a simulator that returns different results when users do things like changing the command line (e.g. to enable logging) is *awful*. If one of our tests runs badly in our nightly build or on our continuous test machine we need to be able to debug that and we cannot do that if any little change is going to perturb the system so much that we can't get reproducible results.

And fixing these problems is not that big of a deal. Generally I add a custom compare type unless I am sure that comparing pointers is OK. And, I don't spend much time tracing through the code to see how it's used because there is no real harm in adding a custom less and the code may change later such that sort order does become important.
(0000503)
jesjones   
2011-06-15 16:59   
Here's an example of how this can be done. I added the following to IChannelControl::RadioEntry in ChannelControl.h in inet 1.99.1:
    struct CompareRadio
    {
        bool operator()(const RadioRef lhs, const RadioRef rhs) const
        {
            return lhs->radioModule->getId() < rhs->radioModule->getId();
        }
    };

and I changed the set to look like:
    std::set<RadioRef, CompareRadio> neighbors; // cached neighbor list

I think this fixes a real problem too, check out ChannelControl::sendToChannel.
(0000504)
jesjones   
2011-06-15 17:02   
if you're sure that sort order does not matter AND you don't think it will matter in the future then you can use ComparePointers. For example, ForceDirectedGraphLayouter seems to be used by the TK gui so sort order there should not affect simulation outcomes. So, I changed the moduleToBodyMap member to look like:
   std::map<cModule *, IBody *, ComparePointers<cModule> > moduleToBodyMap;
(0000505)
jesjones   
2011-06-15 23:37   
I don't see why -std=c++0x would be required (tho it does let you use the auto keyword which makes iteration a bit nicer).