Debug Output and Logging

Being able to log messages and object states is one of the best tools for a programmer. The target environment may not have enough memory or network connectivity, or your application is real-time, therefore attaching a remote debugger may not be possible. Logging may be one of the few options available to you.

Logging messages has some cost so we have to make them count! I think log messages should have the following properties and information:

  • Time stamp, for better understanding of events happening in relation with one another; you can use this to measure performance some of the time.
  • Thread name or process ID; if you have threads or do IPC, this information is vital.
  • Identifier for message filtering. You want to be able to turn some or all of your messages off when you don't need them. Separately, using a script you can programatically collect relevant data that follow specific identifiers, then use sequences of identifiers to construct state transition and better understand what your code was doing.
  • Consistent format, not only will it make it easier for you to read and separate your log messages from other program output, it'll make it possible to write that script to parse logs automatically for you.
  • Human readable. You may have to analyze large amount of log data over long periods of time yourself. Log messages in sensible formats that a normal text editor can help you highlight and search.

You should also support both C and C++ styles of logging:

  • Allow formatting mechanism exactly the same as fprintf().
  • Allow use of streaming operator << so that objects with multiple properties can be logged easily without having to write a lot of formatting code. Normal conversion should work automatically as if you are working with std::cout.

C Style Logging

Your C logging API should have ... in its function prototype, so that variable number of parameters can be passed in and formatted using vsnprintf():

void log( const char* format, ... )
{
  char buf[1024];
  va_list args;
  va_start(args, format);
  vsnprintf(buf, sizeof(buf), format, args);
  va_end(args);

  log_internal(buf);
}

C++ Style Logging

This should be an object inheriting from std::ostream, so you get the basic operations right away.

std::ostream takes pointer to a std::streambuf object for its constructor. When data is being streamed out to std::ostream, eventually it goes to the underlying std::streambuf that represents the destination. If you want to buffer output (better, since you won't trigger writes byte-by-byte), you use setp() to describe storage capacity of your buffer:

char buf[1024]; // Internal storage
...
setp(buf, buf + 1024); // 1KB available for output

When buffer is full, or without calling setp(), overflow() triggers so that your stream buffer implementation can handle character that has no where to go:

int Log::overflow( int c )
{
  log_internal(buf);
  reset();
  if( c != EOF ) {
    // since we send buffered data to output
    // we can now store c.
    buf[0] = c;
    pbump(1);
  }
  return c;
}

Since we inherited from std::streambuf, we can use its properties to access data when outputing or storing messages. pptr() is the address available for writing next byte. epptr() is end of the writable buffer. pbump() allows us to move pptr() forward after we store next part of the message.

When working with << operators, you can also pass in std::endl to append a new line character to your message. Internally this triggers a sync() call, allowing us to flush message. We can implement this function:

int Log::sync()
{
  if( (epptr() - pptr()) != sizeof(buf) ) {
    *(pptr()) = '\n'; // Append new line character.
    pbump(1);
    *(pptr()) = 0; // In case if output requires null-termination.
  }
  log_internal(buf);
  reset();
  return 0;
}

In the examples above, every time we flush (write all data to output), we reset internal buffer pointers by calling setp() again.

Since we want to unify the format of our log messages, it is important that we are able to decide what goes into the buffer, and when. Something like:

class Log : public std::ostream, public std::streambuf
{
public:
  Log() : std::ostream(this) 
  {
    log_internal(...); // Time stamp, etc.
  }

  virtual ~Log() 
  {
    // If user forgets to << std::endl, help them out when
    // log object go out of scope.
    sync();
  }

protected:  
  int overflow( int );
  int sync();

private:
  char buf[1024];

  void reset()
  {
    setp(buf, buf + sizeof(buf));
  }
};

Overloading Operator

We want to easily dump any object using << operator. One way to do this is define an interface:

class LogObject
{
public:
  virtual const char* describeSelf() const = 0;

  friend std::ostream& operator( std::ostream&, const LogObject& );
  friend std::ostream& operator( std::ostream&, const LogObject* );
};

Any class that inherits from this base class will implement its own describeSelf() function to convert its properties and states into a string for output. We can then overload the operator globally so that they will all stream to our Log class:

std::ostream& operator << ( std::ostream& o, const LogObject& obj )
{
  o << obj.describeSelf();
  return o;
}

std::ostream& operator << ( std::ostream& o, const LogObject* obj )
{
  o << obj->describeSelf();
  return o;
}

Output Destinations

While you may simply want to send your output to a file, here are a few other ideas.

If you develope with Visual Studio on Windows, you can send message to the debug output panel by calling OutputDebugStringA().

If you use message queues for your processes or threads, use log identifier as the envelope and message as body, send both to message queue for subscribe/publish.

This is the reason why above examples both use log_internal() to allow output destination to change.

Thread ID

On some platforms there is no gettid() call. You may decide to use std::this_thread::get_id() to query for an identifier. In that case, you should know that thread ID in this case isn't necessarily an integer. You can work around this by using a std::stringstream object, or use the ID with a map to create an integer identifier for yourself so that your code can be portable.

However, on Linux it is advantageous to save the integer thread ID, because you can look up its /proc entry to help you debug.

Oddities

When using C++ style output with above implementation, behavior for the following code fragments are different:

{
  Log() << "Hello World" << std::endl;
  Log() << "Second..." << std::endl;
}

and:

{
  Log() << "Hello World";
  Log() << "First!" << std::endl;
}

In the second case, due to missing std::endl, "Hello World" will not be flushed until end of the block, after "First!" is printed.