Note: A change is required to the core language.
1. Introduction
Dynamic function interception is a powerful technique used to alter the behaviour of function calls at runtime. This approach is essential in various domains, including debugging, testing, and security. However, current solutions often require significant modifications to the original code or rely on complex and error-prone techniques. This paper presents interceptor functions, an approach to dynamic function interception that provides a flexible and efficient way to extend or modify the behaviour of existing code.
Compiler vendors will find that interceptor functions are similar to interrupt service routines in that they may alter the stack and the CPU registers however they like, but that they must fully restore the stack and registers before jumping.
2. Motivation
In debugging, testing and security, an interceptor function stands as a 'man in the middle' between a caller function and a callee function. It can perform logging, synchronisation and error-checking, and it can check global buffers for potentially malicious data (e.g. a buffer overrun exploit).
I need to put extra paragraphs in here on what Lorand said about
the similarity of interceptor functions to "function jump" in C--, which
enables optimisations and features in Haskell. Specifically:
* Continuation passing pool style
* Multi-return (return twice, return any number of types)
* An optimal alternative to wrapping a lambda in an
* Jump to an adapter function if the arguments need to be converted
3. Design considerations
Interceptor functions are designed to be modular, reusable, and efficient. The interceptor function intercepts the call to the original function, tweaking behaviour before and after the function call. The implementation of the original function remains unchanged.
An interceptor function will typically perform:
- Logging: Write details of the function call to a log file.
- Synchronisation: Lock a mutex so that a library that is not thread-safe can be called safely from several threads.
- Dynamic Library Loading: Load a library using the dlopen / LoadLibrary and dlsym / GetProcAddress functions.
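For comparison, the first two duties can already be approximated in standard C++ when the callee's signature is known at compile time, which is the restriction that interceptor functions aim to remove. The following is a minimal sketch under that assumption. The names `legacy_add`, `intercept`, `g_mutex` and `g_log` are hypothetical and are not part of this proposal, and an in-memory stream stands in for the log file.

```cpp
#include <cassert>
#include <mutex>
#include <sstream>
#include <utility>

// Hypothetical stand-in for a library function that is not thread-safe.
int legacy_add(int a, int b) { return a + b; }

std::mutex g_mutex;        // serialises every intercepted call
std::ostringstream g_log;  // stands in for a log file (assumption)

// Wraps a callable whose signature is known: lock, log entry, forward the
// arguments unchanged, log exit, unlock. The callee itself is not modified.
template <typename F, typename... Args>
decltype(auto) intercept(char const* name, F&& callee, Args&&... args)
{
    std::lock_guard<std::mutex> lock(g_mutex);
    g_log << "enter " << name << '\n';

    // Logs on scope exit, before the lock above is released.
    struct LeaveLogger {
        char const* name;
        ~LeaveLogger() { g_log << "leave " << name << '\n'; }
    } leave_logger{name};

    return std::forward<F>(callee)(std::forward<Args>(args)...);
}
```

A call such as `intercept("legacy_add", legacy_add, 2, 3)` returns the callee's result and leaves one "enter" and one "leave" line in `g_log`. Unlike an interceptor function, this wrapper must be instantiated for each concrete signature and cannot be exported under the callee's own symbol name.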
3.1. Syntax
Interceptor functions have a rigid syntax:

    extern "C" ... "RAND_bytes" (...)
    {
        auto h = dlopen("libcrypto.so.3", RTLD_LAZY);
        auto pf = dlsym(h, "RAND_bytes");
        goto -> pf;
        return;
    }
- An interceptor function must always be marked extern "C", as name mangling is not possible when the return type and parameter types are unknown.
- The name of the interceptor function is enclosed in quotes to allow names that contain symbols, such as names mangled by Microsoft, e.g. ?OpenSocket@@YAHHPEAD_N@Z.
- The return type must always be "...".
- The parameter list must always be "...".
- A jump to another function must use the syntax "goto ->" in order to distinguish it from the ordinary goto.
- A return statement must be without an expression, i.e. "return;".
The implementation of one interceptor function can be re-used for multiple functions. For example, if we are writing a library and want it to export interceptors for three functions, RAND_bytes, RAND_poll and RAND_seed, then we can write:

    std::mutex m;

    extern "C" ... "RAND_bytes", "RAND_poll", "RAND_seed" (...)
    {
        m.lock();
        auto h = dlopen("libcrypto.so.3", RTLD_LAZY);
        assert(nullptr != h);
        auto pf = dlsym(h, __func__);
        assert(nullptr != pf);
        goto -> pf;
        m.unlock();
        return;
    }
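The re-use shown above relies on `__func__` naming whichever exported function was entered. As a rough analogue in today's C++, a macro can stamp out one body under several names. The sketch below assumes that `__func__` inside an extern "C" function is the plain function name, which holds in practice for GCC, Clang and MSVC although the standard only calls the string implementation-defined. The names `DEFINE_NAME_REPORTER`, `demo_bytes`, `demo_poll` and `demo_seed` are hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical macro: defines an extern "C" function whose body is shared by
// every name it is instantiated with. Each function returns its own name,
// i.e. the string that a real interceptor would hand to dlsym.
#define DEFINE_NAME_REPORTER(name) \
    extern "C" char const* name(void) { return __func__; }

DEFINE_NAME_REPORTER(demo_bytes)
DEFINE_NAME_REPORTER(demo_poll)
DEFINE_NAME_REPORTER(demo_seed)
```

Because `__func__` has static storage duration, returning a pointer to it is safe. The macro still produces three separate function bodies, whereas the proposed syntax lets a single definition serve all three names.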
3.2. Re-entrancy and recursion
Programmers can specify the maximum recursion depth for an interceptor function using the recursive specifier. The syntax for this is as follows:

    std::recursive_mutex m;

    extern "C" ... "RAND_bytes" (...) recursive(4u)
    {
        std::lock_guard my_lock(m);
        auto h = dlopen("libcrypto.so.3", RTLD_LAZY);
        auto pf = dlsym(h, "RAND_bytes");
        goto -> pf;
        return;
    }
In the above example, the maximum recursion depth is 4. If the programmer omits the recursive specifier, then it's as if they wrote:

    std::recursive_mutex m;

    extern "C" ... "RAND_bytes" (...) recursive(__cpp_interceptor_max_recursion)
    {
        std::lock_guard my_lock(m);
        auto h = dlopen("libcrypto.so.3", RTLD_LAZY);
        auto pf = dlsym(h, "RAND_bytes");
        goto -> pf;
        return;
    }
The preprocessor macro, __cpp_interceptor_max_recursion, is defined by the compiler to be zero; however, the programmer can redefine it in their own translation unit:

    std::recursive_mutex m;

    #define __cpp_interceptor_max_recursion 4u
    extern "C" ... "RAND_bytes" (...)
    {
        std::lock_guard my_lock(m);
        auto h = dlopen("libcrypto.so.3", RTLD_LAZY);
        auto pf = dlsym(h, __func__);
        goto -> pf;
        return;
    }

    #define __cpp_interceptor_max_recursion 12u
    extern "C" ... "RAND_seed" (...)
    {
        std::lock_guard my_lock(m);
        auto h = dlopen("libcrypto.so.3", RTLD_LAZY);
        auto pf = dlsym(h, __func__);
        goto -> pf;
        return;
    }

    #define __cpp_interceptor_max_recursion 0u
    /* remainder of translation unit goes here */
A program is ill-formed if it undefines __cpp_interceptor_max_recursion, or if it defines it as anything other than an integer constant expression.
A value of zero for maximal recursion means that the function is not re-entrant and therefore
cannot be called recursively. This provides flexibility for programmers
to control the re-entrancy behaviour of interceptor functions based on
their specific requirements. If the recursion limit is reached, for
example if a function marked
is re-entered for a second
time, then
is called.
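The depth limit described in this section can be pictured as a per-thread counter that is checked on entry and decremented on exit. The following is a minimal sketch in today's C++, not the mechanism a compiler would be required to use. `DepthGuard`, `g_depth` and `nest` are hypothetical names, and throwing `std::runtime_error` when the limit is reached is an assumption made purely so that the sketch can be exercised; it is not the behaviour specified by this paper.

```cpp
#include <cassert>
#include <stdexcept>

// Number of calls currently active on this thread (hypothetical name).
thread_local unsigned g_depth = 0;

// RAII guard: a limit of zero allows one active call and no nested call,
// a limit of N allows N nested calls on top of the outermost one.
struct DepthGuard {
    explicit DepthGuard(unsigned max_recursion)
    {
        // Assumption: report the violation by throwing.
        if (g_depth > max_recursion)
            throw std::runtime_error("recursion limit reached");
        ++g_depth;
    }
    ~DepthGuard() { --g_depth; }
    DepthGuard(DepthGuard const&) = delete;
    DepthGuard& operator=(DepthGuard const&) = delete;
};

// Hypothetical guarded function: re-enters itself `n` more times and
// returns the total number of activations.
unsigned nest(unsigned n, unsigned max_recursion)
{
    DepthGuard guard(max_recursion);
    return n == 0 ? 1u : 1u + nest(n - 1, max_recursion);
}
```

With these definitions, `nest(4, 4)` succeeds with five activations, while `nest(1, 0)` throws on the second entry. Stack unwinding runs the guards' destructors, so `g_depth` is back at zero afterwards.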
4. Possible implementations
4.1. x86_64 Linux - GNU compiler
GodBolt: https://godbolt.org/z/9drMfdro9
    #include <cassert>   // assert
    #include <iomanip>   // hex, setfill, setw
    #include <iostream>  // cout, endl
    #include <mutex>     // mutex
    #include <dlfcn.h>   // dlopen, dlsym

    // ===============================================
    // To change the library name or function name,
    // only edit the following two lines:
    #define MACRO_FILE libcrypto.so.3
    #define MACRO_FUNC RAND_bytes
    // -----------------------------------------------
    // Don't edit the next five lines:
    #define Q(x) #x
    #define QUOTE(x) Q(x)
    extern "C" void MACRO_FUNC(void);
    constexpr char str_func[] = QUOTE(MACRO_FUNC);
    constexpr char str_file[] = QUOTE(MACRO_FILE);
    // ===============================================

    std::mutex m;

    // A thread_local variable is used to store
    // the original return address because we
    // can neither use a caller-saved nor a
    // callee-saved register for this purpose
    thread_local void (*original_return_address)(void) = nullptr;

    // The "Inwards" function does 4 things:
    // (1) Lock the mutex
    // (2) Store the original return address
    //     inside a thread_local variable
    // (3) Log the calling of the function
    // (4) Return the address of the original
    //     function
    extern "C" auto Inwards(void (*const arg)(void)) -> void (*)(void)
    {
        m.lock();
        std::cout << "Mutex is locked\n";
        assert(nullptr == original_return_address);  // To ensure no recursion
        original_return_address = arg;
        std::cout << "Function called: '" << str_func << "'\n";
        auto const h = dlopen(str_file, RTLD_LAZY);
        assert(nullptr != h);
        auto const pf = dlsym(h, str_func);
        assert(nullptr != pf);
        return (void (*)(void))pf;
    }

    // The "Outwards" function does 3 things:
    // (1) Log the return of the original function
    // (2) Unlock the mutex
    // (3) Return the original return address
    extern "C" void (*Outwards(void))(void)
    {
        std::cout << "Function '" << str_func << "' has returned\n";
        m.unlock();
        std::cout << "Mutex has been unlocked\n";
        auto const temp = original_return_address;
        original_return_address = nullptr;  // To ensure no recursion
        return temp;
    }

    __asm(
        ".macro push_all\n"  // Note that RAX is not pushed
        "    pushfq\n"
        "    push %rdi\n"
        "    push %rsi\n"
        "    push %rdx\n"
        "    push %rcx\n"
        "    push %r8\n"
        "    push %r9\n"
        "    push %r10\n"
        "    push %rbp\n"
        "    push %rbx\n"
        "    push %r12\n"
        "    push %r13\n"
        "    push %r14\n"
        "    push %r15\n"
        ".endm\n"

        ".macro pop_all\n"  // Note that RAX is not popped
        "    pop %r15\n"
        "    pop %r14\n"
        "    pop %r13\n"
        "    pop %r12\n"
        "    pop %rbx\n"
        "    pop %rbp\n"
        "    pop %r10\n"
        "    pop %r9\n"
        "    pop %r8\n"
        "    pop %rcx\n"
        "    pop %rdx\n"
        "    pop %rsi\n"
        "    pop %rdi\n"
        "    popfq\n"
        ".endm\n"

        ".macro align_stack\n"
        "    pushq %rsp\n"
        "    pushq (%rsp)\n"
        "    andq $-16, %rsp\n"
        ".endm\n"

        ".macro unalign_stack\n"
        "    movq 8(%rsp), %rsp\n"
        ".endm\n"

        QUOTE(MACRO_FUNC) ":\n"
        "    push_all\n"                        // save the entire CPU state
        "    mov 14*8(%rsp), %rdi\n"            // copy original return address into RDI
        "    align_stack\n"
        "    call Inwards\n"
        "    unalign_stack\n"
        // -------- Address of original function is now in RAX
        // -------- Original return address is now in thread_local variable
        "    pop_all\n"                         // restore the entire CPU state
        "    add $8, %rsp\n"                    // remove old return address from top of stack
        "    lea come_back_here(%rip), %r11\n"  // copy new return address into temp register
        "    push %r11\n"                       // set the new return address at top of stack
        "    jmp *%rax\n"                       // jump to original function
        "come_back_here:\n"
        "    push_all\n"
        "    push %rax\n"                       // must save the return value
        "    align_stack\n"
        "    call Outwards\n"
        "    unalign_stack\n"
        // -------- Original return address is now in RAX
        "    mov %rax, %r11\n"
        "    pop %rax\n"                        // must restore the return value
        "    pop_all\n"
        "    jmp *%r11\n"                       // jump back to caller
    );

    // ======================================================
    // ======================================================
    // ======================================================
    // ======================================================
    // Now let's test it out:

    using std::cout, std::endl;

    void PrintBytes(char unsigned (&arr)[8u])
    {
        cout << "Random bytes = ";
        for ( auto c : arr )
        {
            cout << std::hex << std::setfill('0') << std::setw(2u) << (unsigned)c;
        }
        cout << endl;
    }

    int main(void)
    {
        cout << "First line in main.\n";
        auto const pRAND_bytes = (int (*)(char unsigned *, int))&RAND_bytes;
        char unsigned buf[8u] = {};
        PrintBytes(buf);
        int const retval = pRAND_bytes(buf, 8);
        PrintBytes(buf);
        cout << "Return value from original function = " << retval << endl;
        cout << "Last line in main.\n";
    }
4.2. x86_32 MS-Windows - MSVC compiler
GodBolt: https://godbolt.org/z/r6qxxhedh
    #include <cassert>   // assert
    #include <cstdint>   // uintptr_t
    #include <iostream>  // cout, endl
    #include <mutex>     // recursive_mutex

    extern "C" void *__stdcall LoadLibraryA(char const *);
    extern "C" void *__stdcall GetProcAddress(void *, char const *);

    // ===============================================
    // To change the library name or function name,
    // only edit the following two lines:
    #define MACRO_FILE kernel32.dll
    #define MACRO_FUNC EnumResourceTypesA
    // -----------------------------------------------
    // Don't edit the next five lines:
    #define Q(x) #x
    #define QUOTE(x) Q(x)
    extern "C" void MACRO_FUNC(void);
    constexpr char str_func[] = QUOTE(MACRO_FUNC);
    constexpr char str_file[] = QUOTE(MACRO_FILE);
    // ===============================================

    std::recursive_mutex m;

    // A thread_local variable is used to store
    // the original return address because we
    // can neither use a caller-saved nor a
    // callee-saved register for this purpose
    constexpr unsigned max_recursion = 16u;
    thread_local unsigned count_recursion = -1;
    thread_local void (*original_return_address[max_recursion])(void);

    // The "Inwards" function does 4 things:
    // (1) Lock the mutex
    // (2) Store the original return address
    //     inside a thread_local variable
    // (3) Log the calling of the function
    // (4) Return the address of the original
    //     function
    extern "C" auto __cdecl Inwards(void (*const arg)(void)) -> void (*)(void)
    {
        m.lock();
        std::cout << "Mutex is locked\n";
        ++count_recursion;
        assert(count_recursion < max_recursion);
        original_return_address[count_recursion] = arg;
        std::cout << "Function called: '" << str_func
                  << "' (recursion depth = " << count_recursion << ")\n";
        auto const h = LoadLibraryA(str_file);
        assert(nullptr != h);
        auto const pf = GetProcAddress(h, str_func);
        assert(nullptr != pf);
        return (void (*)(void))pf;
    }

    // The "Outwards" function does 3 things:
    // (1) Log the return of the original function
    // (2) Unlock the mutex
    // (3) Return the original return address
    extern "C" auto __cdecl Outwards(void) -> void (*)(void)
    {
        std::cout << "Function '" << str_func
                  << "' has returned from recursive call " << count_recursion << "\n";
        m.unlock();
        std::cout << "Mutex has been unlocked\n";
        return original_return_address[count_recursion--];
    }

    void __declspec(naked) MACRO_FUNC(void)
    {
        // This function preserves every register except for EAX and ECX
        // (yes it preserves flags, special registers and floating point)
        __asm
        {
            pushfd
            mov eax, [esp + 4]      // save original return address
            push ebp
            push ebx
            push ecx
            push edi
            push edx
            push esi
            push ds
            push es
            push fs
            push gs
            push ss
            sub esp, 108            // Allocate space for FPU state
            fsave [esp]             // Save FPU state to stack
            push eax                // set 1st arg = original return address
            call Inwards
            add esp, 4              // remove 1st arg from stack
            frstor [esp]            // Restore FPU state from stack
            add esp, 108            // Deallocate space for FPU registers
            pop ss
            pop gs
            pop fs
            pop es
            pop ds
            pop esi
            pop edx
            pop edi
            pop ecx
            pop ebx
            pop ebp
            popfd
            add esp, 4              // remove return address from top of stack
            lea edx, come_back_here // load new return address into temp register
            push edx                // set new return address at top of stack
            jmp eax                 // jump to original function
        come_back_here:
            pushfd
            push eax                // This is the original func's return value
            push ebp
            push ebx
            // ecx is not listed here because we need it for temp value
            push edi
            push edx                // This could be also the original func's retval
            push esi
            push ds
            push es
            push fs
            push gs
            push ss
            sub esp, 108            // Allocate space for FPU state
            fsave [esp]             // Save FPU state to stack
            call Outwards
            mov ecx, eax
            frstor [esp]            // Restore FPU state from stack
            add esp, 108            // Deallocate space for FPU registers
            pop ss
            pop gs
            pop fs
            pop es
            pop ds
            pop esi
            pop edx
            pop edi
            // ecx is not listed here because we need it for temp value
            pop ebx
            pop ebp
            pop eax
            popfd
            jmp ecx                 // jump back to the original return address
        }
    }

    // ======================================================
    // ======================================================
    // ======================================================
    // ======================================================
    // Now let's test it out:

    int __stdcall MyCallback(void *const h, char const *const s, void (*const arg)(void))
    {
        // This callback function is invoked by EnumResourceTypesA,
        // and so if we invoke EnumResourceTypesA from within here,
        // we'll get recursion.
        if ( 0u == ((uintptr_t)s & 0xFFFF0000) )
        {
            if ( LoadLibraryA("user32.dll") != h )
            {
                auto const pf = (int (__stdcall *)(void *, void *, void *))arg;
                pf(LoadLibraryA("user32.dll"), MyCallback, &EnumResourceTypesA);
            }
            return 0;
        }
        if ( s && *s ) std::cout << "Resource Type: " << s << std::endl;
        return 1;
    }

    int main(void)
    {
        std::cout << "First line in main.\n";
        auto const pf = (int (__stdcall *)(void *, void *, void *))&EnumResourceTypesA;
        int const retval = pf(LoadLibraryA("shell32.dll"), MyCallback, &EnumResourceTypesA);
        std::cout << "Return value = " << retval << std::endl;
        std::cout << "Last line in main.\n";
    }
5. Alternatives
The only alternative to writing an interceptor function in C++ is to write an interceptor function in assembly language. Of course, if the library is to be built for several different architectures, e.g. x86, ARM, MIPS, PowerPC, RISC-V, HPPA, Itanium, SuperH 4 and Motorola 68K, then the programmer will need to write an implementation for each architecture, ending up with a Makefile something like:

    OUTPUT := libmonkey.so
    # Detect the architecture, e.g. x86_64
    ARCH := $(shell uname -m)
    # Choose the right implementation
    SOURCES_ASM := interceptors_$(ARCH).S
    SOURCES_CXX := main.cpp utils.cpp
    OBJECTS := $(SOURCES_CXX:.cpp=.cpp.o) $(SOURCES_ASM:.S=.S.o)

    all: $(OUTPUT)

    %.cpp.o: %.cpp
    	$(CXX) -o $@ -std=c++26 -c $<

    %.S.o: %.S
    	$(AS) -o $@ -c $<

    $(OUTPUT): $(OBJECTS)
    	$(CXX) -o $@ -fPIC -shared -std=c++26 $^
6. Proposed wording
The proposed wording is relative to [N4950].
In subclause X.Y.Z [temp.deduct.general], append a paragraph under the heading "Words words words"
11 -- word word words word word words word word words word word words
7. Impact on the standard
This proposal is a change to the core language. The change to the core language is to be added to X.Y.Z [temp.deduct.general]. The addition has no effect on any other part of the standard.
8. Impact on existing code
No existing code becomes ill-formed. The behaviour of all existing code is unaffected.
9. Revision history
000 → 001
• In example code snippets, lock and unlock mutex
10. Acknowledgements
Thiago Macieira, Lorand Szollosi