PXXXXR1
Interceptor Functions

New Proposal

This version:
http://virjacode.com/papers/interceptor001.htm
Latest version:
http://virjacode.com/papers/interceptor.htm
Author:
Thomas PK Healy <healytpk@vir7ja7code7.com> (Remove all sevens from email address)
Audience:
SG17
Project:
ISO/IEC 14882 Programming Languages — C++, ISO/IEC JTC1/SC22/WG21

Abstract

This paper proposes an approach to dynamic function interception, enabling the alteration of function calls at runtime without modifying the implementation of the original function. Interceptor functions provide a flexible and efficient way to extend or modify the behaviour of existing code, without needing to recompile the original code.

Note: A change is required to the core language.

1. Introduction

Dynamic function interception is a powerful technique used to alter the behaviour of function calls at runtime. This approach is essential in various domains, including debugging, testing, and security. However, current solutions often require significant modifications to the original code or rely on complex and error-prone techniques. This paper presents interceptor functions, an approach to dynamic function interception that provides a flexible and efficient way to extend or modify the behaviour of existing code.

Compiler vendors will find that interceptor functions resemble interrupt service routines: they may alter the stack and the CPU registers however they like, but they must fully restore both before jumping to the original function.

2. Motivation

In debugging, testing and security, an interceptor function stands as a 'man in the middle' between a caller function and a callee function in order to perform logging, synchronisation and error-checking, or to check global buffers for potentially malicious data (e.g. a buffer overrun exploit).

I need to put extra paragraphs in here on what Lorand said about the similarity of interceptor functions to "function jump" in C--, which enables optimisations and features in Haskell. Specifically:
* Continuation passing pool style
* Multi-return (return twice, return any number of types)
* An optimal alternative to wrapping a lambda in an std::function<...>
* Jump to an adapter function if the arguments need to be converted

3. Design considerations

Interceptor functions are designed to be modular, reusable, and efficient. The interceptor function intercepts the call to the original function, tweaking behaviour before and after the function call. The implementation of the original function remains unchanged.
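
For comparison, the before-and-after behaviour described above can be approximated in today's C++ with a forwarding wrapper. The following is only a minimal sketch: the names intercept, original_add, run_example, g_log and g_mutex are hypothetical and do not come from this proposal. Unlike a proposed interceptor function, the wrapper is instantiated with the callee's parameter types, and every call site has to be changed to go through it.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

std::mutex g_mutex;               // hypothetical: serialises intercepted calls
std::vector<std::string> g_log;   // hypothetical: records the order of events

// Hypothetical stand-in for the original function; its body is never edited.
int original_add(int a, int b)
{
    g_log.push_back("original");
    return a + b;
}

// Runs code before the call, forwards the arguments unchanged,
// then runs code after the call and hands the result back.
template<typename F, typename... Args>
decltype(auto) intercept(F&& target, Args&&... args)
{
    std::lock_guard<std::mutex> lock(g_mutex);
    g_log.push_back("before");

    // Destroyed when the wrapper returns, i.e. after the original has
    // returned (or thrown) and while the mutex is still held.
    struct After { ~After() { g_log.push_back("after"); } } after;

    return std::forward<F>(target)(std::forward<Args>(args)...);
}

// Usage example: the call site names the wrapper, not the original.
int run_example()
{
    g_log.clear();
    int const result = intercept(original_add, 2, 3);
    assert(g_log.size() == 3u);   // "before", "original", "after"
    return result;
}
```

The wrapper relies on a local object's destructor for the "after" step so that it works for both void and non-void callees. The need to know the signature and to edit the call sites is the limitation that interceptor functions are meant to remove.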

An interceptor function will typically perform logging, synchronisation or error-checking before forwarding the call to the original function.

3.1. Syntax

Interceptor functions have an inflexible, rigid syntax:

extern "C" ... "RAND_bytes"(...)
{
    auto h  = dlopen( "libcrypto.so.3", RTLD_LAZY );
    auto pf = dlsym ( h, "RAND_bytes" );
    goto -> pf;
    return;
}

The implementation of one interceptor function can be re-used for multiple functions. For example, if we are writing a library that exports interceptors for three functions, RAND_bytes, RAND_poll and RAND_seed, then we can write:

std::mutex m;

extern "C" ... "RAND_bytes","RAND_poll","RAND_seed"(...)
{
    m.lock();
    auto h  = dlopen( "libcrypto.so.3", RTLD_LAZY );
    assert( nullptr != h );
    auto pf = dlsym ( h, __func__ );
    assert( nullptr != pf );
    goto -> pf;
    m.unlock();
    return;
}

3.2. Re-entrancy and recursion

Programmers can specify the maximum recursion depth for an interceptor function using the recursive( ) specifier. The syntax for this is as follows:

std::recursive_mutex m;

extern "C" ... "RAND_bytes"(...) recursive(4u)
{
    std::lock_guard my_lock(m);
    auto h  = dlopen( "libcrypto.so.3", RTLD_LAZY );
    auto pf = dlsym ( h, "RAND_bytes" );
    goto -> pf;
    return;
}

In the above example, the maximum recursion depth is 4. If the programmer omits recursive(4u), then it’s as if they wrote:

std::recursive_mutex m;

extern "C" ... "RAND_bytes"(...) recursive(__cpp_interceptor_max_recursion)
{
    std::lock_guard my_lock(m);
    auto h  = dlopen( "libcrypto.so.3", RTLD_LAZY );
    auto pf = dlsym ( h, "RAND_bytes" );
    goto -> pf;
    return;
}

The preprocessor macro __cpp_interceptor_max_recursion is defined by the compiler to be zero; however, the programmer can redefine it in their own translation unit:

std::recursive_mutex m;

#define __cpp_interceptor_max_recursion 4u

extern "C" ... "RAND_bytes"(...)
{
    std::lock_guard my_lock(m);
    auto h  = dlopen( "libcrypto.so.3", RTLD_LAZY );
    auto pf = dlsym ( h, __func__ );
    goto -> pf;
    return;
}

#define __cpp_interceptor_max_recursion 12u

extern "C" ... "RAND_seed"(...)
{
    std::lock_guard my_lock(m);
    auto h  = dlopen( "libcrypto.so.3", RTLD_LAZY );
    auto pf = dlsym ( h, __func__ );
    goto -> pf;
    return;
}

#define __cpp_interceptor_max_recursion 0u

    /* remainder of translation unit goes here */

A program is ill-formed if it undefines __cpp_interceptor_max_recursion, or if it defines it as anything other than an integer constant expression.

A value of zero for maximal recursion means that the function is not re-entrant and therefore cannot be called recursively. This provides flexibility for programmers to control the re-entrancy behaviour of interceptor functions based on their specific requirements. If the recursion limit is reached, for example if a function marked recursive(1u) is re-entered for a second time, then std::abort is called.
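
The limit described above could be emulated in today's C++ with a per-thread counter. The following is a minimal sketch under stated assumptions, not the proposed implementation: the names ReentryGuard, CountdownTag, Guard and countdown are hypothetical, the counter is assumed to be per thread, and recursive(N) is read as permitting the initial call plus N nested re-entries, which is how the recursive(1u) example above reads.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical RAII guard emulating recursive(MaxRecursion) for one function.
// Tag gives each guarded function its own per-thread counter.
template<typename Tag, unsigned MaxRecursion>
class ReentryGuard
{
public:
    ReentryGuard()
    {
        // depth_ frames are already active on this thread; one initial call
        // plus MaxRecursion re-entries are allowed, anything more aborts.
        if (depth_ > MaxRecursion) std::abort();
        ++depth_;
    }

    ~ReentryGuard() { --depth_; }

    ReentryGuard(ReentryGuard const&) = delete;
    ReentryGuard& operator=(ReentryGuard const&) = delete;

    // Number of frames of the guarded function active on this thread.
    static unsigned depth() noexcept { return depth_; }

private:
    static inline thread_local unsigned depth_ = 0u;
};

struct CountdownTag {};
using Guard = ReentryGuard<CountdownTag, 4u>;   // emulates recursive(4u)

// Hypothetical function that re-enters itself n more times and reports
// how many of its frames were active at the deepest point.
unsigned countdown(unsigned n)
{
    Guard guard;
    assert(Guard::depth() <= 5u);   // initial call plus at most 4 re-entries
    unsigned const depth_here = Guard::depth();
    return (0u == n) ? depth_here : countdown(n - 1u);
}
```

Under this reading, countdown(4u) runs to completion with five active frames, whereas countdown(5u) would reach std::abort on the fifth re-entry. With a limit of zero the guard rejects any re-entry at all.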

4. Possible implementations

4.1. x86_64 Linux - GNU compiler

GodBolt: https://godbolt.org/z/9drMfdro9

#include <cassert>    // assert
#include <iomanip>    // hex, setfill, setw
#include <iostream>   // cout, endl
#include <mutex>      // mutex

#include <dlfcn.h>    // dlopen, dlsym

// ===============================================
//   To change the library name or function name,
//   only edit the following two lines:
#define MACRO_FILE     libcrypto.so.3
#define MACRO_FUNC     RAND_bytes
// -----------------------------------------------
//   Don't edit the next five lines:
#define Q(x) #x
#define QUOTE(x) Q(x)
extern "C" void MACRO_FUNC(void);
constexpr char str_func[] = QUOTE(MACRO_FUNC);
constexpr char str_file[] = QUOTE(MACRO_FILE);
// ===============================================

std::mutex m;

// A thread_local variable is used to store
// the original return address because we
// can neither use a caller-saved nor a
// callee-saved register for this purpose
thread_local void (*original_return_address)(void) = nullptr;

// The "Inwards" function does 4 things:
// (1) Lock the mutex
// (2) Store the original return address
//     inside a thread_local variable
// (3) Log the calling of the function
// (4) Return the address of the original
//     function
extern "C" auto Inwards( void (*const arg)(void) ) -> void(*)(void)
{
    m.lock();
    std::cout << "Mutex is locked\n";

    assert( nullptr == original_return_address );  // To ensure no recursion

    original_return_address = arg;

    std::cout << "Function called: '" << str_func << "'\n";

    auto const h = dlopen(str_file, RTLD_LAZY);
    assert( nullptr != h );
    auto const pf = dlsym(h, str_func);
    assert( nullptr != pf );
    return (void(*)(void))pf;
}

// The "Outwards" function does 3 things:
// (1) Log the return of the original function
// (2) Unlock the mutex
// (3) Return the original return address
extern "C" void (*Outwards(void))(void)
{
    std::cout << "Function '" << str_func << "' has returned\n";

    m.unlock();
    std::cout << "Mutex has been unlocked\n";

    auto const temp = original_return_address;
    original_return_address = nullptr;  // To ensure no recursion
    return temp;
}

__asm(
".macro push_all\n"    // Note that RAX is not pushed
"    pushfq\n"
"    push %rdi\n"
"    push %rsi\n"
"    push %rdx\n"
"    push %rcx\n"
"    push %r8\n"
"    push %r9\n"
"    push %r10\n"
"    push %rbp\n"
"    push %rbx\n"
"    push %r12\n"
"    push %r13\n"
"    push %r14\n"
"    push %r15\n"
".endm\n"

".macro pop_all\n"    // Note that RAX is not popped
"    pop %r15\n"
"    pop %r14\n"
"    pop %r13\n"
"    pop %r12\n"
"    pop %rbx\n"
"    pop %rbp\n"
"    pop %r10\n"
"    pop %r9\n"
"    pop %r8\n"
"    pop %rcx\n"
"    pop %rdx\n"
"    pop %rsi\n"
"    pop %rdi\n"
"    popfq\n"
".endm\n"

".macro align_stack\n"
    "pushq %rsp\n"
    "pushq (%rsp)\n"
    "andq $-16, %rsp\n"
".endm\n"

".macro unalign_stack\n"
    "movq 8(%rsp), %rsp\n"
".endm\n"

QUOTE(MACRO_FUNC) ":\n"
    "push_all\n"                        // save the entire CPU state
    "mov 14*8(%rsp), %rdi\n"            // copy original return address into RDI
    "align_stack\n"
    "call Inwards\n"
    "unalign_stack\n"
    // --------  Address of original function is now in RAX
    // --------  Original return address is now in thread_local variable
    "pop_all\n"                         // restore the entire CPU state
    "add $8, %rsp\n"                    // remove old return address from top of stack
    "lea come_back_here(%rip), %r11\n"  // copy new return address into temp register
    "push %r11\n"                       // set the new return address at top of stack
    "jmp *%rax\n"                       // jump to original function
 "come_back_here:\n"
    "push_all\n"
    "push %rax\n"                       // must save the return value
    "align_stack\n"
    "call Outwards\n"
    "unalign_stack\n"
    // --------  Original return address is now in RAX
    "mov %rax, %r11\n"
    "pop %rax\n"                        // must restore the return value
    "pop_all\n"
    "jmp *%r11\n"                       // jump back to caller
);

// ======================================================
// ======================================================
// ======================================================
// ======================================================
// Now let's test it out:

using std::cout, std::endl;

void PrintBytes( char unsigned (&arr)[8u] )
{
  cout << "Random bytes = ";

  for ( auto c : arr )
  {
    cout << std::hex << std::setfill('0') << std::setw(2u) << (unsigned)c;
  }

  cout << endl;
}

int main(void)
{
  cout << "First line in main.\n";

  auto const pRAND_bytes = (int (*)(char unsigned*,int))&RAND_bytes;

  char unsigned buf[8u] = {};

  PrintBytes(buf);
  int const retval = pRAND_bytes( buf, 8 );
  PrintBytes(buf);

  cout << "Return value from original function = " << retval << endl;

  cout << "Last line in main.\n";
}

4.2. x86_32 MS-Windows - MSVC compiler

GodBolt: https://godbolt.org/z/r6qxxhedh

#include <cassert>    // assert
#include <cstdint>    // uintptr_t
#include <iostream>   // cout, endl
#include <mutex>      // recursive_mutex

extern "C" void *__stdcall LoadLibraryA(char const*);
extern "C" void *__stdcall GetProcAddress(void*, char const*);

// ===============================================
//   To change the library name or function name,
//   only edit the following two lines:
#define MACRO_FILE     kernel32.dll
#define MACRO_FUNC     EnumResourceTypesA
// -----------------------------------------------
//   Don't edit the next five lines:
#define Q(x) #x
#define QUOTE(x) Q(x)
extern "C" void MACRO_FUNC(void);
constexpr char str_func[] = QUOTE(MACRO_FUNC);
constexpr char str_file[] = QUOTE(MACRO_FILE);
// ===============================================

std::recursive_mutex m;

// A thread_local variable is used to store
// the original return address because we
// can neither use a caller-saved nor a
// callee-saved register for this purpose
constexpr unsigned max_recursion = 16u;
thread_local unsigned count_recursion = -1;
thread_local void (*original_return_address[max_recursion])(void);

// The "Inwards" function does 4 things:
// (1) Lock the mutex
// (2) Store the original return address
//     inside a thread_local variable
// (3) Log the calling of the function
// (4) Return the address of the original
//     function
extern "C" auto __cdecl Inwards(void (* const arg)(void)) -> void(*)(void)
{
    m.lock();
    std::cout << "Mutex is locked\n";

    ++count_recursion;
    assert(count_recursion < max_recursion);
    original_return_address[count_recursion] = arg;

    std::cout << "Function called: '" << str_func << "' (recursion depth = " << count_recursion << ")\n";

    auto const h = LoadLibraryA(str_file);
    assert(nullptr != h);
    auto const pf = GetProcAddress(h, str_func);
    assert(nullptr != pf);

    return (void(*)(void))pf;
}

// The "Outwards" function does 3 things:
// (1) Log the return of the original function
// (2) Unlock the mutex
// (3) Return the original return address
extern "C" auto __cdecl Outwards(void)  -> void(*)(void)
{
    std::cout << "Function '" << str_func << "' has returned from recursive call " << count_recursion << "\n";

    m.unlock();
    std::cout << "Mutex has been unlocked\n";

    return original_return_address[ count_recursion-- ];
}

void __declspec(naked) MACRO_FUNC(void)
{
    // This function preserves every register except for EAX and ECX
    // (yes it preserves flags, special registers and floating point)

    __asm {
        pushfd
        mov eax, [esp+4]  // save original return address
        push ebp
        push ebx
        push ecx
        push edi
        push edx
        push esi
        push ds
        push es
        push fs
        push gs
        push ss
        sub esp, 108  // Allocate space for FPU state
        fsave [esp]   // Save FPU state to stack

        push eax      // set 1st arg = original return address
        call Inwards
        add esp, 4    // remove 1st arg from stack

        frstor [esp]  // Restore FPU state from stack
        add esp, 108  // Deallocate space for FPU registers
        pop ss
        pop gs
        pop fs
        pop es
        pop ds
        pop esi
        pop edx
        pop edi
        pop ecx
        pop ebx
        pop ebp
        popfd

        add esp, 4               // remove return address from top of stack 
        lea edx, come_back_here  // load new return address into temp register
        push edx                 // set new return address at top of stack
        jmp eax                  // jump to original function

    come_back_here:
        pushfd
        push eax                 // This is the original func's return value
        push ebp
        push ebx
        // ecx is not listed here because we need it for temp value
        push edi
        push edx                 // This could be also the original func's retval
        push esi
        push ds
        push es
        push fs
        push gs
        push ss
        sub esp, 108  // Allocate space for FPU state
        fsave [esp]   // Save FPU state to stack

        call Outwards
        mov ecx, eax

        frstor [esp]  // Restore FPU state from stack
        add esp, 108  // Deallocate space for FPU registers
        pop ss
        pop gs
        pop fs
        pop es
        pop ds
        pop esi
        pop edx
        pop edi
        // ecx is not listed here because we need it for temp value
        pop ebx
        pop ebp
        pop eax
        popfd

        jmp ecx       // jump back to the original return address
    }
}

// ======================================================
// ======================================================
// ======================================================
// ======================================================
// Now let's test it out:

int __stdcall MyCallback(void* const h, char const* const s, void (* const arg)(void))
{
    // This callback function is invoked by EnumResourceTypesA,
    // and so if we invoke EnumResourceTypesA from within here,
    // we'll get recursion.

    if ( 0u == ((uintptr_t)s & 0xFFFF0000) )
    {
        if ( LoadLibraryA("user32.dll") != h )
        {
            auto const pf = (int(__stdcall*)(void*, void*, void*))arg;
            pf( LoadLibraryA("user32.dll"), MyCallback, &EnumResourceTypesA );
        }

        return 0;
    }

    if ( s && *s ) std::cout << "Resource Type: " << s << std::endl;

    return 1;
}

int main(void)
{
    std::cout << "First line in main.\n";

    auto const pf = (int(__stdcall*)(void*, void*, void*))&EnumResourceTypesA;
    int const retval = pf( LoadLibraryA("shell32.dll"), MyCallback, &EnumResourceTypesA );

    std::cout << "Return value = " << retval << std::endl;

    std::cout << "Last line in main.\n";
}

5. Alternatives

The only alternative to writing an interceptor function in C++ is to write an interceptor function in assembly language. Of course, if the library is to be built for several different architectures, e.g. x86, ARM, MIPS, PowerPC, RISC-V, HPPA, Itanium, SuperH 4 and Motorola 68K, then the programmer will need to write an implementation for each architecture, ending up with a Makefile something like:

OUTPUT := libmonkey.so

# Detect the architecture, e.g. x86_64
# (comments sit on their own lines so that no trailing spaces end up in the values)
ARCH := $(shell uname -m)

# Choose the right implementation
SOURCES_ASM := interceptors_$(ARCH).S

SOURCES_CXX := main.cpp utils.cpp

OBJECTS := $(SOURCES_CXX:.cpp=.cpp.o) $(SOURCES_ASM:.S=.S.o)

all: $(OUTPUT)

%.cpp.o: %.cpp
    $(CXX) -o $@ -std=c++26 -c $<

%.S.o: %.S
    $(AS) -o $@ -c $<

$(OUTPUT): $(OBJECTS)
    $(CXX) -o $@ -fPIC -shared -std=c++26 $^

6. Proposed wording

The proposed wording is relative to [N4950].

In subclause X.Y.Z [temp.deduct.general], append a paragraph under the heading "Words words words"

11 -- word word words
      word word words
      word word words
      word word words

7. Impact on the standard

This proposal changes the core language; the new wording is to be added to X.Y.Z [temp.deduct.general]. The addition has no effect on any other part of the standard.

8. Impact on existing code

No existing code becomes ill-formed. The behaviour of all existing code is unaffected.

9. Revision history

000001

     • In example code snippets, lock and unlock mutex

10. Acknowledgements

Thiago Macieira, Lorand Szollosi

References

Normative References

[N4950]
Thomas Köppe. Working Draft, Standard for Programming Language C++. 10 May 2023. URL: https://wg21.link/n4950