Cross Platform Thread Local Storage (cont’d)

In the first post, I introduced the thread local storage ( TLS ) we use across all platforms. The main goal was to provide an interface and features similar to the __declspec( thread ) variables offered by some compilers. That post reproduced in pure C++ what gcc and cl.exe offer as extensions, but on every platform that supports at least one thread local storage slot. The goal of this post is to extend the existing system to all types, including complex classes.


These compiler extensions only allow TLS on POD ( Plain Old Data ) types; objects with constructors or destructors cannot be used. Our extension allows any type of object, as long as it has a default constructor.

Allowing complex types in TLS raises two challenges: allocating enough memory to hold the object, and calling constructors and destructors on thread startup and exit.

In the previous version, the memory used for TLS was a table of 4-byte items. We could have used this table to store a pointer to each complex object, but we wanted to avoid the extra dereference. The system was therefore changed to hold objects of any size. Each TLS instance now stores its offset inside a buffer. This buffer is created for each new thread and contains the memory needed to store all objects. The base class constructor takes the size of the object and tracks the running total in a static variable. When a thread starts, the buffer is allocated and all objects are constructed inside it, but more on this later. This technique has the advantage of allocating a single chunk of memory per thread, which avoids the memory wasted by many small allocations.

 
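class ThreadLocalStorageBase
{
    // Total size of the per-thread buffer. Static members cannot be
    // initialized in-class; define them ( set to 0 ) in one .cpp file.
    static int m_buffer_size;
    static TLSIndex m_tls_identifier;
    // Byte offset of this variable inside the per-thread buffer.
    int m_index;

protected:
    ThreadLocalStorageBase( int object_size )
    {
        // Round the size up to a multiple of Alignment, a power-of-two
        // constant assumed to be defined elsewhere.
        object_size = ( object_size + Alignment - 1 ) & ~( Alignment - 1 );
        m_index = m_buffer_size;
        m_buffer_size += object_size;
    }

    // Address of this variable in the calling thread's buffer.
    void * GetThreadLocalStorage() const
    {
        char * local_storage_table = static_cast<char*>( TLSGet( m_tls_identifier ) );
        return &local_storage_table[ m_index ];
    }
};

template<typename T>
class ThreadLocalStorage : public ThreadLocalStorageBase
{
public:
    ThreadLocalStorage() : ThreadLocalStorageBase( sizeof( T ) )
    {
    }
};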
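
To make the offset bookkeeping concrete, here is a minimal standalone sketch of what the base class constructor does with each size it receives. The names kAlignment, AlignUp and OffsetAllocator are hypothetical, and the alignment of 16 is an assumption; the post does not say which value Mojito uses.

```cpp
#include <cassert>
#include <cstddef>

// Assumed alignment for every object in the buffer (must be a power of two).
const std::size_t kAlignment = 16;

// Rounds size up to the next multiple of kAlignment.
inline std::size_t AlignUp( std::size_t size )
{
    assert( ( kAlignment & ( kAlignment - 1 ) ) == 0 );
    return ( size + kAlignment - 1 ) & ~( kAlignment - 1 );
}

// Hands out consecutive offsets, the way the base class constructor does.
struct OffsetAllocator
{
    std::size_t total_size = 0;

    // Returns the offset reserved for an object of object_size bytes.
    std::size_t Reserve( std::size_t object_size )
    {
        const std::size_t offset = total_size;
        total_size += AlignUp( object_size );
        return offset;
    }
};
```

With this alignment, reserving 4, 20 and 1 bytes in that order yields offsets 0, 16 and 48, and a per-thread buffer of 64 bytes.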

An OS TLS slot reads as zero the first time a thread accesses it. Since we support any class, we need to provide a way to construct the objects. We decided to initialize all variables before entering the thread function. If the variables are not used frequently, a lazy initialization mechanism can be implemented instead. In Mojito, a class encapsulates the OS thread object, which gives us a natural place to call this initialization. The best mechanism depends on how you will use your TLS variables.
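
The lazy alternative could look something like the following sketch. It is not the Mojito code: a C++11 thread_local pointer stands in for the OS slot behind TLSGet / TLSSet, and kBufferSize, GetBufferLazily and ReleaseBuffer are hypothetical names. The only idea it shows is that the accessor checks for the zero value and builds the buffer on first use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Assumption: stand-in for the OS slot behind TLSGet / TLSSet. Like an OS
// slot, it reads as null the first time a thread touches it.
static thread_local void * g_slot = nullptr;

// Hypothetical total size of all TLS variables.
const std::size_t kBufferSize = 64;

// Returns the calling thread's buffer, creating it on first access.
void * GetBufferLazily()
{
    if ( g_slot == nullptr )
    {
        // First access from this thread: allocate the zeroed buffer.
        // A full version would also construct every registered object here.
        g_slot = std::calloc( 1, kBufferSize );
        assert( g_slot != nullptr );
    }
    return g_slot;
}

// Frees the calling thread's buffer, if it was ever created.
void ReleaseBuffer()
{
    std::free( g_slot );
    g_slot = nullptr;
}
```

The price is a null check on every access, which fits the case described above where the variables are used infrequently.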

In both cases, the allocated memory must be freed when the thread is about to end, so the TLS finalizer must be called. ( pthread allows a destructor function to be registered with each key, so you can take advantage of it. ) Once again, in Mojito, the thread class takes care of it.
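
As an illustration of the pthread destructor, here is a small sketch using the POSIX calls pthread_key_create, pthread_setspecific, pthread_create and pthread_join. The helper names ( FreeThreadBuffer, ThreadMain, RunWorkerAndCountFrees ) and the 64-byte buffer are made up for the example. It is not how Mojito does it, since there the thread class performs the cleanup.

```cpp
#include <cassert>
#include <cstdlib>
#include <pthread.h>

static pthread_key_t g_key;
static int g_buffers_freed = 0;

// Invoked by pthreads when a thread exits with a non-null value in the slot.
static void FreeThreadBuffer( void * buffer )
{
    std::free( buffer );
    ++g_buffers_freed;
}

static void * ThreadMain( void * )
{
    // Hypothetical 64-byte buffer standing in for the per-thread TLS buffer.
    pthread_setspecific( g_key, std::malloc( 64 ) );
    return nullptr;
}

// Runs one worker thread and returns how many buffers its exit released.
int RunWorkerAndCountFrees()
{
    g_buffers_freed = 0;

    int status = pthread_key_create( &g_key, FreeThreadBuffer );
    assert( status == 0 );

    pthread_t thread;
    status = pthread_create( &thread, nullptr, ThreadMain, nullptr );
    assert( status == 0 );
    (void) status;

    // The destructor has run by the time the join returns.
    pthread_join( thread, nullptr );

    pthread_key_delete( g_key );
    return g_buffers_freed;
}
```

Note that the destructor is only called for threads that leave a non-null value in the slot, and that it frees memory only; destructors of the objects stored in the buffer would still have to be run from it.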

A question that might have arisen: how do we get access to all the variables spread across the code? The trick is quite simple: in the constructor, register the variable in a singly linked list. Since TLS variables are declared as global or static objects, their constructors run before main is entered, so the linked list is complete by then. I discovered this trick in the command line option package of LLVM.
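
Stripped of everything related to TLS, the registration trick can be sketched as follows. The class Registered and the three globals are hypothetical names used only for this illustration.

```cpp
#include <cassert>

// Hypothetical self-registering class: each instance links itself into a
// global singly linked list from its constructor.
class Registered
{
    static Registered * s_first;
    Registered * m_next;

public:
    Registered() : m_next( s_first )
    {
        s_first = this;
    }

    // Walks the list and counts the registered instances.
    static int Count()
    {
        int count = 0;
        for ( const Registered * current = s_first; current; current = current->m_next )
        {
            ++count;
        }
        return count;
    }
};

// Constant-initialized to null before any constructor runs.
Registered * Registered::s_first = nullptr;

// Globals that could live in different source files; each one registers
// itself during static initialization, before main runs.
static Registered g_first_variable;
static Registered g_second_variable;
static Registered g_third_variable;
```

The list needs no allocation and no central table. The guarantee only covers objects constructed before main: a function-local static, or a global in a library loaded later at run time, would join the list only when its constructor eventually runs.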

As we have a class for each type, a common base class has been created. It takes care of building the list and provides the functions to initialize and finalize the variables. It also handles access to the thread-local buffer.

class ThreadLocalStorageBase
{
    ...
    ThreadLocalStorageBase * m_next;
    // Head of the list of all TLS variables. As with m_buffer_size, the
    // definition ( set to 0 ) lives in one .cpp file.
    static ThreadLocalStorageBase * m_first;

public:
    ThreadLocalStorageBase( int object_size )
    {
        ...
        // Push this variable at the front of the global list.
        m_next = m_first;
        m_first = this;
    }

    // Call at the start of every thread, before any TLS variable is used.
    static void StartupOnThreadEnter()
    {
        void * local_storage_table = malloc( m_buffer_size );
        memset( local_storage_table, 0, m_buffer_size );

        TLSSet( m_tls_identifier, local_storage_table );

        // Construct every registered variable inside this thread's buffer.
        ThreadLocalStorageBase * current = m_first;

        while( current )
        {
            current->InitializeObject();
            current = current->m_next;
        }
    }

    // Call just before the thread ends.
    static void CleanupOnThreadExit()
    {
        ThreadLocalStorageBase * current = m_first;

        while( current )
        {
            current->FinalizeObject();
            current = current->m_next;
        }

        free( TLSGet( m_tls_identifier ) );
    }

protected:

    virtual void InitializeObject() = 0;
    virtual void FinalizeObject() = 0;
};

With the base class in place, here are the modifications made to the subclass:

template< typename T >
class ThreadLocalStorage : public ThreadLocalStorageBase
{
public:
    // Read access: a TLS variable can be used wherever a const T & is expected.
    operator const T&() const
    {
        return *static_cast<T*>( GetThreadLocalStorage() );
    }

    // Write access: assigns to the calling thread's copy.
    ThreadLocalStorage & operator=( const T & value )
    {
        *static_cast<T*>( GetThreadLocalStorage() ) = value;
        return *this;
    }

protected:

    // Placement new ( requires <new> ): constructs T in the thread's buffer.
    virtual void InitializeObject()
    {
        new( GetThreadLocalStorage() ) T();
    }

    // Explicit destructor call; the buffer itself is freed by the base class.
    virtual void FinalizeObject()
    {
        static_cast<T*>( GetThreadLocalStorage() )->~T();
    }
};
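
Putting the pieces together, the following is a compact miniature of the scheme that can be compiled on its own. It is a sketch and not the Mojito source: the names TlsBase, Tls, OnThreadEnter, OnThreadExit and Demo are hypothetical, a C++11 thread_local pointer stands in for the OS slot behind TLSGet / TLSSet, and the alignment is assumed to be alignof( std::max_align_t ).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>

// Assumption: stand-in for the OS slot behind TLSGet / TLSSet.
static thread_local void * g_thread_buffer = nullptr;

class TlsBase
{
    static std::size_t s_buffer_size;  // total bytes needed per thread
    static TlsBase * s_first;          // head of the registration list
    TlsBase * m_next;
    std::size_t m_offset;              // this variable's offset in the buffer

protected:
    explicit TlsBase( std::size_t size ) : m_next( s_first ), m_offset( s_buffer_size )
    {
        const std::size_t alignment = alignof( std::max_align_t );
        s_buffer_size += ( size + alignment - 1 ) & ~( alignment - 1 );
        s_first = this;
    }

    void * Storage() const { return static_cast< char * >( g_thread_buffer ) + m_offset; }
    virtual void Construct() = 0;
    virtual void Destroy() = 0;

public:
    // Allocates the calling thread's buffer and constructs every variable in it.
    static void OnThreadEnter()
    {
        g_thread_buffer = std::calloc( 1, s_buffer_size );
        assert( g_thread_buffer != nullptr );
        for ( TlsBase * v = s_first; v; v = v->m_next ) v->Construct();
    }

    // Destroys every variable, then releases the calling thread's buffer.
    static void OnThreadExit()
    {
        for ( TlsBase * v = s_first; v; v = v->m_next ) v->Destroy();
        std::free( g_thread_buffer );
        g_thread_buffer = nullptr;
    }
};

std::size_t TlsBase::s_buffer_size = 0;
TlsBase * TlsBase::s_first = nullptr;

template< typename T >
class Tls : public TlsBase
{
public:
    Tls() : TlsBase( sizeof( T ) ) {}
    T & Get() { return *static_cast< T * >( Storage() ); }

private:
    void Construct() override { new ( Storage() ) T(); }
    void Destroy() override { Get().~T(); }
};

// Two TLS variables, one of them a non-POD type.
static Tls< std::string > g_name;
static Tls< int > g_counter;

// Plays the role of a thread body wrapped by the thread class.
std::string Demo()
{
    TlsBase::OnThreadEnter();
    g_name.Get() = "worker";
    g_counter.Get() += 2;
    std::string result = g_name.Get() + std::to_string( g_counter.Get() );
    TlsBase::OnThreadExit();
    return result;
}
```

Here Demo brackets its work with the two calls that the thread class would otherwise make, and the std::string lives entirely inside the per-thread buffer.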

We now have a complete implementation of TLS that supports all types of objects. But what if you want to initialize your variable to a specific value? This last modification uses the container object to store a default value, which is then used when initializing the TLS instance in each thread.

The code is very simple. We just include a constant instance of the variable’s type, which we fill in the constructor. We then pass this instance to the copy constructor of T when the per-thread object is created.

template< typename T >
class ThreadLocalStorage : public ThreadLocalStorageBase
{
    // Value every thread starts with; T must be copy constructible.
    const T m_default_value;

public:
    ThreadLocalStorage( const T & default_value = T() ) :
        ThreadLocalStorageBase( sizeof( T ) ),
        m_default_value( default_value )
    {
    }

protected:

    // Copy-construct the thread's instance from the stored default.
    virtual void InitializeObject()
    {
        new( GetThreadLocalStorage() ) T( m_default_value );
    }

    virtual void FinalizeObject()
    {
        static_cast<T*>( GetThreadLocalStorage() )->~T();
    }
};
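
The effect of the stored default can be seen in isolation with the sketch below. DefaultedSlot is a hypothetical class: its embedded byte array stands in for the variable's region of the per-thread buffer, and Initialize / Finalize play the roles of InitializeObject / FinalizeObject.

```cpp
#include <cassert>
#include <new>
#include <string>

template< typename T >
class DefaultedSlot
{
    const T m_default_value;
    // Stand-in for this variable's region of the per-thread buffer.
    alignas( T ) unsigned char m_storage[ sizeof( T ) ];
    T * m_object = nullptr;

public:
    explicit DefaultedSlot( const T & default_value = T() ) :
        m_default_value( default_value )
    {
    }

    // Copy-constructs a fresh object from the stored default.
    T & Initialize()
    {
        assert( m_object == nullptr );
        m_object = new ( m_storage ) T( m_default_value );
        return *m_object;
    }

    // Destroys the object; the raw storage can then be reused.
    void Finalize()
    {
        m_object->~T();
        m_object = nullptr;
    }
};
```

Each call to Initialize starts again from the default, whatever was assigned before the previous Finalize, which is the behaviour every new thread gets from the class above.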

This concludes this pair of posts on Thread Local Storage as we use them in Mojito. Comments are warmly welcomed. Special thanks to Bjoern Knafla for his comments.

The source code can be found here