Design

Goals

The libstdc++ debug mode replaces unsafe (but efficient) standard containers and iterators with semantically equivalent safe standard containers and iterators to aid in debugging user programs. The following goals directed the design of the libstdc++ debug mode:

  • Correctness: the libstdc++ debug mode must not change the semantics of the standard library for all cases specified in the ANSI/ISO C++ standard. The essence of this constraint is that any valid C++ program should behave in the same manner regardless of whether it is compiled with debug mode or release mode. In particular, entities that are defined in namespace std in release mode should remain defined in namespace std in debug mode, so that legal specializations of namespace std entities will remain valid. A program that is not valid C++ (e.g., invokes undefined behavior) is not required to behave similarly, although the debug mode will abort with a diagnostic when it detects undefined behavior.

  • Performance: the addition of the libstdc++ debug mode must not affect the performance of the library when it is compiled in release mode. Performance of the libstdc++ debug mode is secondary (and, in fact, will be worse than the release mode).

  • Usability: the libstdc++ debug mode should be easy to use. It should be easily incorporated into the user's development environment (e.g., by requiring only a single new compiler switch) and should produce reasonable diagnostics when it detects a problem with the user program. Usability also involves detecting incorrect uses of the debug mode itself, e.g., linking a release-compiled object against a debug-compiled object when the resulting program would not run correctly.

  • Minimize recompilation: While it is expected that users recompile at least part of their program to use debug mode, the amount of recompilation affects the detect-compile-debug turnaround time. This indirectly affects the usefulness of the debug mode, because debugging some applications may require rebuilding a large amount of code, which may not be feasible when the suspect code may be very localized. There are several levels of conformance to this requirement, each with its own usability and implementation characteristics. In general, the higher-numbered conformance levels are more usable (i.e., require less recompilation) but are more complicated to implement than the lower-numbered conformance levels.

    1. Full recompilation: The user must recompile his or her entire application and all C++ libraries it depends on, including the C++ standard library that ships with the compiler. This must be done even if only a small part of the program can use debugging features.

    2. Full user recompilation: The user must recompile his or her entire application and all C++ libraries it depends on, but not the C++ standard library itself. This must be done even if only a small part of the program can use debugging features. This can be achieved given a full recompilation system by compiling two versions of the standard library when the compiler is installed and linking against the appropriate one, e.g., a multilibs approach.

    3. Partial recompilation: The user must recompile the parts of his or her application and the C++ libraries it depends on that will use the debugging facilities directly. This means that any code that uses the debuggable standard containers would need to be recompiled, but code that does not use them (but may, for instance, use IOStreams) would not have to be recompiled.

    4. Per-use recompilation: The user must recompile the parts of his or her application and the C++ libraries it depends on where debugging should occur, and any other code that interacts with those containers. This means that a set of translation units that accesses a particular standard container instance may either be compiled in release mode (no checking) or debug mode (full checking), but must all be compiled in the same way; a translation unit that does not see that standard container instance need not be recompiled. This also means that a translation unit A that contains a particular instantiation (say, std::vector<int>) compiled in release mode can be linked against a translation unit B that contains the same instantiation compiled in debug mode (a feature not present with partial recompilation). While this behavior is technically a violation of the One Definition Rule, this ability tends to be very important in practice. The libstdc++ debug mode supports this level of recompilation.

    5. Per-unit recompilation: The user must only recompile the translation units where checking should occur, regardless of where debuggable standard containers are used. This has also been dubbed "-g mode", because the -g compiler switch works in this way, emitting debugging information at a per-translation-unit granularity. We believe that this level of recompilation is in fact not possible if we intend to supply safe iterators, leave the program semantics unchanged, and not regress in performance under release mode. The reason is that we cannot associate extra information with an iterator (to form a safe iterator) without either reserving that space in release mode (a performance regression) or allocating extra memory for each iterator with new (which changes the program semantics).
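To make the size argument behind the last point concrete, here is a minimal sketch with two hypothetical iterator layouts (release_iterator and safe_iterator_layout are illustrative names, not libstdc++ types). Any per-iterator checking state has to be stored somewhere, and code compiled in release mode never reserved room for it.

```cpp
#include <cstddef>

// Hypothetical layouts, for illustration only; real iterator types differ.
struct release_iterator {
  int* ptr;  // all that a release-mode iterator needs
};

struct safe_iterator_layout {
  int* ptr;           // the same position...
  const void* owner;  // ...plus the container the iterator is attached to
};

// Code compiled in release mode allocates sizeof(release_iterator) bytes for
// each iterator it creates, so it can never hand debug-mode code the extra state.
static_assert(sizeof(safe_iterator_layout) > sizeof(release_iterator),
              "checking state needs storage that release-mode code does not reserve");

constexpr std::size_t extra_bytes_per_iterator() {
  return sizeof(safe_iterator_layout) - sizeof(release_iterator);
}
```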

Methods

This section provides an overall view of the design of the libstdc++ debug mode and details the relationship between design decisions and the stated design goals.

The Wrapper Model

The libstdc++ debug mode uses a wrapper model where the debugging versions of library components (e.g., iterators and containers) form a layer on top of the release versions of the library components. The debugging components first verify that the operation is correct (aborting with a diagnostic if an error is found) and then forward to the underlying release-mode container, which performs the actual work. This design decision ensures that we cannot regress release-mode performance (because the release-mode containers are left untouched) and is one ingredient in mixing debug- and release-compiled code at link time, which is discussed later in this document.
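A minimal sketch of the wrapper idea, assuming a hypothetical checked_vector that layers a single bounds check over std::vector: the wrapper validates the precondition, aborts with a diagnostic on failure, and otherwise forwards to the untouched release-mode base. The real debug containers check far more than this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Hypothetical debugging wrapper over the release-mode std::vector.
template <typename T>
class checked_vector : public std::vector<T> {
  using base = std::vector<T>;

public:
  using base::base;  // reuse the release-mode constructors unchanged

  typename base::reference operator[](typename base::size_type n) {
    check_index(n);
    return base::operator[](n);  // forward: the base class does the real work
  }

  typename base::const_reference operator[](typename base::size_type n) const {
    check_index(n);
    return base::operator[](n);
  }

private:
  // Verify the precondition; abort with a diagnostic if it does not hold.
  void check_index(typename base::size_type n) const {
    if (n >= this->size()) {
      std::fprintf(stderr, "error: index %zu out of range (size %zu)\n",
                   static_cast<std::size_t>(n),
                   static_cast<std::size_t>(this->size()));
      std::abort();
    }
  }
};
```

Because std::vector itself is untouched, code that never names checked_vector pays nothing for it, which is the release-mode performance argument made above.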

Two types of wrappers are used in the implementation of the debug mode: container wrappers and iterator wrappers. The two types of wrappers interact to maintain relationships between iterators and their associated containers, which are necessary to detect certain types of standard library usage errors such as dereferencing past-the-end iterators or inserting into a container using an iterator from a different container.

Safe Iterators

Iterator wrappers provide a debugging layer over any iterator that is attached to a particular container. They manage the information detailing the iterator's state (singular, dereferenceable, etc.) and track the container to which the iterator is attached. Because iterators have a well-defined, common interface, the iterator wrapper is implemented with the iterator adaptor class template __gnu_debug::_Safe_iterator, which takes two template parameters:

  • Iterator: The underlying iterator type, which must be either the iterator or const_iterator typedef from the sequence type this iterator can reference.

  • Sequence: The type of sequence that this iterator references. This sequence must be a safe sequence (discussed below) whose iterator or const_iterator typedef is the type of the safe iterator.
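The following sketch shows the shape of such a two-parameter adaptor under simplifying assumptions. safe_iter, checked_vec and debug_check are hypothetical names, not the libstdc++ classes. The iterator records its owner, so that a container can reject an iterator that belongs to a different container, and a default-constructed iterator is singular.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <iterator>
#include <vector>

// Report a failed check and abort, as the debug mode does.
inline void debug_check(bool ok, const char* msg) {
  if (!ok) {
    std::fprintf(stderr, "error: %s\n", msg);
    std::abort();
  }
}

// Iterator: the wrapped release-mode iterator. Sequence: the container it refers to.
template <typename Iterator, typename Sequence>
class safe_iter {
public:
  safe_iter() = default;  // attached to nothing: singular
  safe_iter(Iterator it, const Sequence* seq) : it_(it), seq_(seq) {}

  bool singular() const { return seq_ == nullptr; }
  bool attached_to(const Sequence* seq) const { return seq_ == seq; }
  Iterator base() const { return it_; }  // the unsafe iterator, for the container

  typename std::iterator_traits<Iterator>::reference operator*() const {
    debug_check(!singular(), "attempt to dereference a singular iterator");
    return *it_;
  }

private:
  Iterator it_{};
  const Sequence* seq_ = nullptr;
};

// Hypothetical container whose interface uses only safe iterators.
template <typename T>
class checked_vec {
public:
  using base_type = std::vector<T>;
  using iterator = safe_iter<typename base_type::iterator, checked_vec>;

  iterator begin() { return iterator(v_.begin(), this); }
  iterator end() { return iterator(v_.end(), this); }
  typename base_type::size_type size() const { return v_.size(); }

  // Validate the argument, then pass the unwrapped iterator to the base container.
  iterator insert(iterator pos, const T& value) {
    debug_check(pos.attached_to(this), "insert: iterator belongs to another container");
    return iterator(v_.insert(pos.base(), value), this);
  }

private:
  base_type v_;
};
```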

Safe Sequences (Containers)

Container wrappers provide a debugging layer over a particular container type. Because containers vary greatly in the member functions they support and the semantics of those member functions (especially in the area of iterator invalidation), container wrappers are tailored to the container they reference, e.g., the debugging version of std::list duplicates the entire interface of std::list, adding additional semantic checks and then forwarding operations to the real std::list (a public base class of the debugging version) as appropriate. However, all safe containers inherit from the class template __gnu_debug::_Safe_sequence, instantiated with the type of the safe container itself (an instance of the curiously recurring template pattern).

The iterators of a container wrapper will be safe iterators that reference sequences of this type and wrap the iterators provided by the release-mode base class. The debugging container will use only the safe iterators within its own interface (therefore requiring the user to use safe iterators, although this does not change correct user code) and will communicate with the release-mode base class with only the underlying, unsafe, release-mode iterators that the base class exports.

The debugging version of std::list will have the following basic structure:

// debug_list and release_list stand for the debug- and release-mode std::list.
template<typename _Tp, typename _Allocator = allocator<_Tp> >
  class debug_list :
    public release_list<_Tp, _Allocator>,
    public __gnu_debug::_Safe_sequence<debug_list<_Tp, _Allocator> >
  {
    typedef release_list<_Tp, _Allocator> _Base;
    typedef debug_list<_Tp, _Allocator>   _Self;

  public:
    typedef __gnu_debug::_Safe_iterator<typename _Base::iterator, _Self>       iterator;
    typedef __gnu_debug::_Safe_iterator<typename _Base::const_iterator, _Self> const_iterator;

    // duplicate std::list interface with debugging semantics
  };
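The skeleton above can be fleshed out into a small self-contained sketch. All names here (safe_sequence, tracked_iterator, checked_list) are hypothetical, and the bookkeeping is deliberately simple: the CRTP base keeps the live iterators in a std::set so that an operation that invalidates iterators, here clear(), can mark them all singular. The container hands out only wrapped iterators from its own begin() and forwards the real work to the release-mode std::list.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <iterator>
#include <list>
#include <set>

template <typename Sequence>
class tracked_iterator;

// CRTP base: a container passes its own type as Sequence and gains a
// registry of the iterators currently attached to it.
template <typename Sequence>
class safe_sequence {
public:
  safe_sequence() = default;
  safe_sequence(const safe_sequence&) {}  // a copy starts with no iterators
  safe_sequence& operator=(const safe_sequence&) {
    invalidate_all();  // assignment replaces the elements
    return *this;
  }
  ~safe_sequence() { invalidate_all(); }

  void attach(tracked_iterator<Sequence>* it) { live_.insert(it); }
  void detach(tracked_iterator<Sequence>* it) { live_.erase(it); }
  std::size_t live_count() const { return live_.size(); }

  // Mark every attached iterator singular, e.g. after clear().
  void invalidate_all() {
    for (tracked_iterator<Sequence>* it : live_) it->make_singular();
    live_.clear();
  }

private:
  std::set<tracked_iterator<Sequence>*> live_;
};

template <typename Sequence>
class tracked_iterator {
  using base_iterator = typename Sequence::base_type::iterator;

public:
  tracked_iterator(base_iterator it, Sequence* owner) : it_(it), owner_(owner) { enroll(); }
  tracked_iterator(const tracked_iterator& o) : it_(o.it_), owner_(o.owner_) { enroll(); }
  tracked_iterator& operator=(const tracked_iterator& o) {
    if (this != &o) {
      withdraw();
      it_ = o.it_;
      owner_ = o.owner_;
      enroll();
    }
    return *this;
  }
  ~tracked_iterator() { withdraw(); }

  bool singular() const { return owner_ == nullptr; }
  void make_singular() { owner_ = nullptr; }

  typename std::iterator_traits<base_iterator>::reference operator*() const {
    if (singular()) {
      std::fprintf(stderr, "error: attempt to dereference a singular iterator\n");
      std::abort();
    }
    return *it_;
  }

private:
  void enroll() { if (owner_) owner_->attach(this); }
  void withdraw() { if (owner_) owner_->detach(this); }

  base_iterator it_;
  Sequence* owner_;
};

template <typename T>
class checked_list : public std::list<T>, public safe_sequence<checked_list<T>> {
public:
  using base_type = std::list<T>;
  using iterator = tracked_iterator<checked_list>;

  // Wrap the release-mode iterator and attach it to this container.
  iterator begin() { return iterator(base_type::begin(), this); }

  // Forward to the release-mode base, then invalidate: clear() invalidates
  // every iterator into the list.
  void clear() {
    base_type::clear();
    this->invalidate_all();
  }
};
```

A fuller design would avoid the std::set allocation and would wrap the remaining std::list members as well; the sketch only shows how the pieces fit together.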

Precondition Checking

The debug mode operates primarily by checking the preconditions of all standard library operations that it supports. Preconditions that are always checked (regardless of whether or not we are in debug mode) are checked via the __check_xxx macros defined and documented in the source file include/debug/debug.h. Preconditions that may or may not be checked, depending on the debug-mode macro _GLIBCXX_DEBUG, are checked via the __requires_xxx macros defined and documented in the same source file. Preconditions are validated using any additional information available at run-time, e.g., the containers that are associated with a particular iterator, the position of the iterator within those containers, the distance between two iterators that may form a valid range, etc. In the absence of suitable information, e.g., an input iterator that is not a safe iterator, these precondition checks will silently succeed.
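The split between always-on and debug-only checks, and the "silently succeed" rule, can be sketched as follows. MY_CHECK, MY_REQUIRES, MY_DEBUG, valid_range and checked_find are hypothetical names standing in for the much richer macros in include/debug/debug.h. Random-access iterators carry enough information to reject a reversed range. Other iterators do not, so the range check simply returns true.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <iterator>
#include <list>
#include <vector>

// MY_CHECK always runs; MY_REQUIRES runs only when MY_DEBUG is defined.
#define MY_CHECK(cond, msg)                                                \
  do {                                                                     \
    if (!(cond)) {                                                         \
      std::fprintf(stderr, "%s:%d: error: %s\n", __FILE__, __LINE__, msg); \
      std::abort();                                                        \
    }                                                                      \
  } while (0)

#ifdef MY_DEBUG
#define MY_REQUIRES(cond, msg) MY_CHECK(cond, msg)
#else
#define MY_REQUIRES(cond, msg) \
  do {                         \
  } while (0)
#endif

// Random-access iterators can detect a reversed range.
template <typename It>
bool valid_range(It first, It last, std::random_access_iterator_tag) {
  return !(last < first);
}

// No usable information for other iterators: the check silently succeeds.
template <typename It>
bool valid_range(It, It, std::input_iterator_tag) {
  return true;
}

template <typename It>
bool valid_range(It first, It last) {
  return valid_range(first, last,
                     typename std::iterator_traits<It>::iterator_category());
}

template <typename It, typename T>
It checked_find(It first, It last, const T& value) {
  MY_REQUIRES(valid_range(first, last), "checked_find: [first, last) is not a valid range");
  for (; first != last; ++first)
    if (*first == value) return first;
  return last;
}
```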

The majority of precondition checks use the aforementioned macros, which have the secondary benefit of having prewritten debug messages that use information about the current status of the objects involved (e.g., whether an iterator is singular or what sequence it is attached to) along with some static information (e.g., the names of the function parameters corresponding to the objects involved). When not using these macros, the debug mode uses either the debug-mode assertion macro _GLIBCXX_DEBUG_ASSERT, its pedantic cousin _GLIBCXX_DEBUG_PEDASSERT, or the assertion check macro that supports more advanced formulation of error messages, _GLIBCXX_DEBUG_VERIFY. These macros are documented more thoroughly in the debug mode source code.

Release- and debug-mode coexistence

The libstdc++ debug mode is the first debug mode we know of that is able to provide the "Per-use recompilation" (4) guarantee, which allows release-compiled and debug-compiled code to be linked and executed together without causing unpredictable behavior. This guarantee minimizes the recompilation that users are required to perform, shortening the detect-compile-debug bug hunting cycle and making the debug mode easier to incorporate into development environments by minimizing dependencies.

Achieving link- and run-time coexistence is not a trivial implementation task. To achieve this goal we required a small extension to the GNU C++ compiler (since incorporated into the C++11 language specification, described in the GCC Manual for the C++ language as namespace association), and a complex organization of debug- and release-modes. The end result is that we have achieved per-use recompilation but have had to give up some checking of the std::basic_string class template (namely, safe iterators).

Compile-time coexistence of release- and debug-mode components

Both the release-mode components and the debug-mode components need to exist within a single translation unit so that the debug versions can wrap the release versions. However, only one of these components should be user-visible at any particular time with the standard name, e.g., std::list.

In release mode, we define only the release-mode version of the component with its standard name and do not include the debugging component at all. The release mode version is defined within the namespace std. Minus the namespace associations, this method leaves the behavior of release mode completely unchanged from its behavior prior to the introduction of the libstdc++ debug mode. Here's an example of what this ends up looking like, in C++.

namespace std
{
  template<typename _Tp, typename _Alloc = allocator<_Tp> >
    class list
    {
      // ...
    };
} // namespace std

In debug mode we include the release-mode container (which is now defined in the namespace __cxx1998) and also the debug-mode container. The debug-mode container is defined within the namespace __debug, which is associated with namespace std via the C++11 namespace association language feature. This method allows the debug and release versions of the same component to coexist at compile-time and link-time without causing an unreasonable maintenance burden, while minimizing confusion. Again, this boils down to C++ code as follows:

namespace std
{
  namespace __cxx1998
  {
    template<typename _Tp, typename _Alloc = allocator<_Tp> >
      class list
      {
	// ...
      };
  } // namespace __cxx1998

  namespace __debug
  {
    template<typename _Tp, typename _Alloc = allocator<_Tp> >
      class list
      : public __cxx1998::list<_Tp, _Alloc>,
	public __gnu_debug::_Safe_sequence<list<_Tp, _Alloc> >
      {
	// ...
      };
  } // namespace __debug

  // namespace __debug __attribute__ ((strong));
  inline namespace __debug { }
}

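A self-contained sketch of the same layout, using a hypothetical namespace mylib in place of std (user code may not add declarations to std other than permitted specializations), shows the two properties the design relies on. The standard-looking name resolves to the debug wrapper, and a specialization written against the enclosing namespace is still accepted.

```cpp
#include <string>
#include <type_traits>

// Hypothetical library "mylib" laid out like the debug-mode std above.
namespace mylib {
namespace release {  // plays the role of std::__cxx1998
template <typename T>
struct box {
  static constexpr bool checked = false;
  T value;
};
}  // namespace release

inline namespace debug {  // plays the role of std::__debug
template <typename T>
struct box : release::box<T> {  // the wrapper derives from the release version
  static constexpr bool checked = true;
};
}  // namespace debug

// A specialization written against the name mylib::box. C++11 allows an
// inline-namespace member to be explicitly specialized through its enclosing
// namespace, so this specializes mylib::debug::box.
template <>
struct box<std::string> {
  static constexpr bool checked = false;
  std::string value;
};
}  // namespace mylib
```

Because mylib::box<int> is really mylib::debug::box<int>, it mangles differently from mylib::release::box<int>, so both can exist in one program.
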
Link- and run-time coexistence of release- and debug-mode components

Because each component has a distinct and separate release and debug implementation, there is no issue with link-time coexistence: the separate namespaces result in different mangled names, and thus unique linkage.
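One way to observe the separate names, assuming GCC with libstdc++ and the Itanium C++ ABI (where std::type_info::name() returns the mangled type name): compiled with -D_GLIBCXX_DEBUG, std::list<int> is really std::__debug::list<int>, and its name carries a __debug component. The helper functions are hypothetical, and because name() is implementation-defined this is a diagnostic aid rather than a portable test.

```cpp
#include <cassert>
#include <cstring>
#include <list>
#include <typeinfo>

// True when this translation unit was compiled with -D_GLIBCXX_DEBUG.
constexpr bool compiled_in_debug_mode() {
#ifdef _GLIBCXX_DEBUG
  return true;
#else
  return false;
#endif
}

// With libstdc++ in debug mode the name is "NSt7__debug4listIiSaIiEEE";
// the release-mode list has no "__debug" component in its name.
inline bool list_name_mentions_debug() {
  return std::strstr(typeid(std::list<int>).name(), "__debug") != nullptr;
}
```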

However, components that are defined and used within the C++ standard library itself face additional constraints. For instance, some of the member functions of std::moneypunct return std::basic_string. Normally, this is not a problem, but with a mixed-mode standard library that could be using either debug-mode or release-mode basic_string objects, things get more complicated. As the return type of a function is not encoded into its mangled name, there is no way to specify a release-mode or a debug-mode string. In practice, this results in runtime errors. A simplified example of this problem is as follows.

Take this translation unit, compiled in debug-mode:

// -D_GLIBCXX_DEBUG
#include <string>

std::string test02();

std::string test01()
{
  return test02();
}

int main()
{
  test01();
  return 0;
}

... and linked to this translation unit, compiled in release mode:

#include <string>

std::string
test02()
{
  return std::string("toast");
}

For this reason we cannot easily provide safe iterators for the std::basic_string class template, as it is present throughout the C++ standard library. For instance, locale facets define typedefs that include basic_string: in a mixed debug/release program, should that typedef be based on the debug-mode basic_string or the release-mode basic_string? While the answer could be "both", and the difference hidden via renaming à la the debug/release containers, we must note two things about locale facets:

  1. They exist as shared state: one can create a facet in one translation unit and access the facet via the same type name in a different translation unit. This means that we cannot have two different versions of locale facets, because the types would not be the same across debug/release-mode translation unit barriers.

  2. They have virtual functions returning strings: these functions mangle in the same way regardless of the mangling of their return types (see above), and their precise signatures can be relied upon by users because they may be overridden in derived classes.

With the design of libstdc++ debug mode, we cannot effectively hide the differences between debug and release-mode strings from the user. Failure to hide the differences may result in unpredictable behavior, and for this reason we have opted to only perform basic_string changes that do not require ABI changes. The effect on users is expected to be minimal, as there are simple alternatives (e.g., __gnu_debug::basic_string), and the usability benefit we gain from the ability to mix debug- and release-compiled translation units is enormous.
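A usage sketch of that alternative, assuming a libstdc++ toolchain: the extension header <debug/string> provides __gnu_debug::basic_string, which can be used explicitly without -D_GLIBCXX_DEBUG and without changing std::string. The helper checked_greeting is hypothetical.

```cpp
#include <cassert>
#include <debug/string>  // libstdc++-specific extension header
#include <string>

// Work on a checked string locally, then hand a plain std::string to
// code that may have been compiled in release mode.
inline std::string checked_greeting() {
  __gnu_debug::basic_string<char> s("toast");
  s += '!';
  return std::string(s.c_str());
}
```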

Alternatives for Coexistence

The coexistence scheme above was chosen over many alternatives, including language-only solutions and solutions that also required extensions to the C++ front end. The following is a partial list of solutions, with justifications for our rejection of each.

  • Completely separate debug/release libraries: This is by far the simplest implementation option, where we do not allow any coexistence of debug- and release-compiled translation units in a program. This solution has an extreme negative effect on usability, because it is quite likely that some libraries an application depends on cannot be recompiled easily. This would not meet our usability or minimize recompilation criteria well.

  • Add a Debug boolean template parameter: Partial specialization could be used to select the debug implementation when Debug == true, and the state of _GLIBCXX_DEBUG could decide whether the default Debug argument is true or false. This option would break conformance with the C++ standard in both debug and release modes. This would not meet our correctness criteria.

  • Packaging a debug flag in the allocators: We could reuse the Allocator template parameter of containers by adding a sentinel wrapper debug<> that signals the user's intention to use debugging, and pick up the debug<> allocator wrapper in a partial specialization. However, this has two drawbacks: first, there is a conformance issue because the default allocator would not be the standard-specified std::allocator<T>. Secondly (and more importantly), users that specify allocators instead of implicitly using the default allocator would not get debugging containers. Thus this solution fails the correctness criteria.

  • Define debug containers in another namespace, and employ a using declaration (or directive): This is an enticing option, because it would eliminate the need for the link_name extension by aliasing the templates. However, there is no true template aliasing mechanism in C++, because both using directives and using declarations disallow specialization. This method fails the correctness criteria.

  • Use implementation-specific properties of anonymous namespaces. See this post. This method fails the correctness criteria.

  • Extension: allow reopening on namespaces: This would allow the debug mode to effectively alias the namespace std to an internal namespace, such as __gnu_std_debug, so that it is completely separate from the release-mode std namespace. While this will solve some renaming problems and ensure that debug- and release-compiled code cannot be mixed unsafely, it ensures that debug- and release-compiled code cannot be mixed at all. For instance, the program would have two std::cout objects! This solution would fail the minimize recompilation requirement, because we would only be able to support option (1) or (2).

  • Extension: use link name: This option involves complicated renaming between debug-mode and release-mode components at compile time, and then a g++ extension called link name to recover the original names at link time. There are two drawbacks to this approach. One, it's very verbose, relying on macro renaming at compile time and several levels of include ordering. Two, ODR issues remained with container member functions taking no arguments in mixed-mode settings resulting in equivalent link names, vector::push_back() being one example. See link name.

Other options may exist for implementing the debug mode, many of which have probably been considered and others that may still be lurking. This list may be expanded over time to include other options that we could have implemented, but in all cases the full ramifications of the approach (as measured against the design goals for a libstdc++ debug mode) should be considered first. The DejaGNU testsuite includes some testcases that check for known problems with some solutions (e.g., the using declaration solution that breaks user specialization), and additional testcases will be added as we are able to identify other typical problem cases. These test cases will serve as a benchmark by which we can compare debug mode implementations.

Other Implementations

There are several existing implementations of debug modes for C++ standard library implementations, although none of them directly supports debugging for programs using libstdc++. The existing implementations include:

  • SafeSTL: SafeSTL was the original debugging version of the Standard Template Library (STL), implemented by Cay S. Horstmann on top of the Hewlett-Packard STL. Though it inspired much work in this area, it has not been kept up-to-date for use with modern compilers or C++ standard library implementations.

  • STLport: STLport is a free implementation of the C++ standard library derived from the SGI implementation, and ported to many other platforms. It includes a debug mode that uses a wrapper model (that in some ways inspired the libstdc++ debug mode design), although at the time of this writing the debug mode is somewhat incomplete and meets only the "Full user recompilation" (2) recompilation guarantee by requiring the user to link against a different library in debug mode vs. release mode.

  • Metrowerks CodeWarrior: The C++ standard library that ships with Metrowerks CodeWarrior includes a debug mode. It is a full debug-mode implementation (including debugging for CodeWarrior extensions) and is easy to use, although it meets only the "Full recompilation" (1) recompilation guarantee.