TOCTOU: Funny Name for a Serious Bug

What is TOCTOU?

Time-of-check to time-of-use (TOCTOU) is a class of software bug that can lead to serious security vulnerabilities. At the time of writing, searching for the keyword “TOCTOU” in the Common Vulnerabilities and Exposures (CVE) database returns 94 cases where a TOCTOU bug could be exploited maliciously. These cases include arbitrary code execution, privilege escalation, unintended file deletion, and many other exploits in widely used software.

TOCTOU is a specific type of race condition. For a full technical description, MITRE’s Common Weakness Enumeration (CWE-367) offers the following:

The software checks the state of a resource before using that resource, but the resource’s state can change between the check and the use in a way that invalidates the results of the check. This can cause the software to perform invalid actions when the resource is in an unexpected state.

In simpler terms, the program wants to do something but only if some condition is true.

if (condition) {
    doSomething();
}

However, a time-of-check to time-of-use bug may allow doSomething to execute even though condition is no longer true. This happens when another thread or process running concurrently changes the value of condition after the if check but before the call to doSomething. The end result is that doSomething is called while condition is false, which can have disastrous consequences.

Take, for example, a pseudocode snippet intended to prevent a checking account from being overdrawn.

if (accountBalance() > 0) {
   withdrawMoney();
}

Assume the account starts with $1. The account owner makes two transactions simultaneously, causing this code block to be executed twice in parallel. If both threads check the accountBalance before either thread has withdrawn any money, both threads will see $1 in the account. Then both threads will withdraw money, likely causing the account to become overdrawn.

This is a time-of-check to time-of-use bug because the state (accountBalance) is changed between the check (accountBalance() > 0) and the use (withdrawMoney()).
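One way to close this window is to make the check and the withdrawal a single atomic step. The following is a minimal C++ sketch under stated assumptions: the Account class, its tryWithdraw method, and the simultaneousWithdrawals helper are hypothetical names introduced only for illustration, and balances are stored in cents.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical account type, used only for illustration.
class Account {
public:
    explicit Account(long cents) : balance_(cents) {}

    // The check and the withdrawal happen under one lock, so no other
    // thread can change the balance between the two steps.
    bool tryWithdraw(long cents) {
        assert(cents > 0);
        std::lock_guard<std::mutex> guard(mutex_);
        if (balance_ < cents) {
            return false;  // insufficient funds, nothing is withdrawn
        }
        balance_ -= cents;
        return true;
    }

    long balance() {
        std::lock_guard<std::mutex> guard(mutex_);
        return balance_;
    }

private:
    std::mutex mutex_;
    long balance_;
};

// Issues two withdrawals in parallel and returns how many succeeded.
int simultaneousWithdrawals(Account& account, long cents) {
    bool first = false;
    bool second = false;
    std::thread t1([&] { first = account.tryWithdraw(cents); });
    std::thread t2([&] { second = account.tryWithdraw(cents); });
    t1.join();
    t2.join();
    return int(first) + int(second);
}
```

With an account holding $1 (100 cents), two parallel calls to tryWithdraw(100) can no longer both succeed: whichever thread takes the lock second sees the updated balance and is refused.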

Classic Example

Wikipedia gives an excellent classic example of a time-of-check to time-of-use bug. This example shows a race condition between multiple processes that allows an attacker to write to a protected file without permission.

Victim
if (access("file", W_OK) != 0) {
   exit(1);
}

fd = open("file", O_WRONLY);
// Actually writing over /etc/passwd
write(fd, buffer, sizeof(buffer));

Attacker
// After the access check, but before the open:
symlink("/etc/passwd", "file");
// "file" now points to the password database

In this case, the vulnerable program intends to use access to check whether the user can write to a file and then, only if the user has permission, open and write to the file.

However, the permission check and the actual opening of the file are not atomic, so some other process can run between the check of the permission and the use of that permission.

An attacker may be able to change the file to point to something that the user does not have permission to write to, in this case /etc/passwd. Because the victim program's call to access has already succeeded, it continues and opens the file anyway.

Assuming file was initially a symlink to a file that the attacker created and has write access to, the chain of events leading to the vulnerability is:

1. The victim program calls access to see if the attacker has write permission to file.
2. The access check succeeds because file currently points to the text file created by the attacker.
3. In another process, the attacker changes file to point to /etc/passwd, which they do not have permission to write to.
4. The victim program calls open("file", O_WRONLY) and writes the attacker's data to /etc/passwd.

Thus the attacker was able to write to the security-critical /etc/passwd file.

This bug’s root cause is a race condition involving the filesystem, where the victim program expected the check and the open to execute atomically. Inter-process race conditions like this have led to many security vulnerabilities. However, TOCTOU is possible anywhere concurrency can occur.
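For the filesystem case above, a commonly suggested mitigation is to stop checking the path and instead check the file that was actually opened. The sketch below is one possible shape for that idea, not a complete fix: openForUser is a hypothetical helper, and the "regular file owned by the real user" rule is an assumed, simplified policy standing in for the access check. It relies on the POSIX O_NOFOLLOW flag, which only rejects a symlink in the final path component.

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: opens `path` for writing only if it is a regular
// file owned by `realUser`. The checks are made on the opened descriptor,
// not on the path. Returns a file descriptor, or -1 on failure.
int openForUser(const char* path, uid_t realUser) {
    assert(path != nullptr);

    // O_NOFOLLOW makes open() fail if the final path component is a symlink.
    int fd = open(path, O_WRONLY | O_NOFOLLOW);
    if (fd < 0) {
        return -1;
    }

    // fstat() inspects the file that was actually opened, so replacing the
    // path afterwards cannot change the result of these checks.
    struct stat info;
    if (fstat(fd, &info) != 0 ||
        !S_ISREG(info.st_mode) ||
        info.st_uid != realUser) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A caller would pass the result of getuid() as realUser and write only to the returned descriptor. Because the check and the use refer to the same open file, there is no longer a gap in which the path can be swapped.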

Multi-threaded TOCTOU

Although the classic examples of TOCTOU usually show a race on the filesystem between different processes, TOCTOU can be just as dangerous in multi-threaded software.

Null Pointer Dereference

Consider the following example.

Object *global;

// Thread 1
if (global != nullptr) {
    // Thread 2 may run here, turning the next line into a null dereference
    auto value = *global;
}

// Thread 2
global = nullptr;

In this example, the programmer attempted to avoid a nullptr dereference in thread one by only dereferencing Object *global if it is not null. However, as the check and dereference are not done atomically, thread two can set global to be null after the check, but before the dereference. This results in thread one attempting to dereference nullptr.

One potential fix is to ensure that the check and use are made atomic and cannot be interleaved.

Object *global;
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

// Thread 1
pthread_mutex_lock(&lock);
if (global != nullptr) {        
    auto value = *global;
}
pthread_mutex_unlock(&lock);

// Thread 2
pthread_mutex_lock(&lock);
global = nullptr;
pthread_mutex_unlock(&lock);

Now the lock ensures that thread two cannot run between the check and the dereference on thread one.
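The same idea can be written with the C++ standard library, where std::lock_guard releases the mutex automatically on every return path. This is a minimal sketch, assuming a placeholder Object type with a single int field; the names globalMutex, readGlobal, and clearGlobal are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <mutex>

// Placeholder type standing in for the Object used in the example.
struct Object {
    int value;
};

Object* global = nullptr;  // shared pointer, as in the example
std::mutex globalMutex;    // hypothetical name; guards every access to `global`

// Thread 1: the null check and the dereference happen under one lock.
// Copies the object into `out` and returns true if `global` was non-null.
bool readGlobal(Object& out) {
    std::lock_guard<std::mutex> guard(globalMutex);
    if (global != nullptr) {
        out = *global;  // copied while the lock is still held
        return true;
    }
    return false;
}

// Thread 2: clears the pointer under the same lock.
void clearGlobal() {
    std::lock_guard<std::mutex> guard(globalMutex);
    global = nullptr;
}
```

The protection only holds if every read and write of global goes through globalMutex. A single unguarded access elsewhere reintroduces the race.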

Although this particular case may seem relatively straightforward, this pattern can lead to serious security vulnerabilities in real software.

TOCTOU in Windows

A time-of-check to time-of-use race triggered a critical use-after-free vulnerability in Windows XP. The vulnerability allowed attackers to crash the system and potentially even execute arbitrary code with elevated privileges.

Based on the description, a rough recreation of what the vulnerability might have looked like is shown below.

struct Procedure {
  bool isProcessing;
  void (*function)(); // function pointer
};

std::list<Procedure*> pendingProcedures;

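// Intentionally racy: nothing synchronizes access to the list or the flag.
void* workerThread(void* arg) {
    while (!pendingProcedures.empty()) {
        auto it = pendingProcedures.begin();
        while (it != pendingProcedures.end()) {
            Procedure* proc = *it;
            if (proc->isProcessing) {
                ++it;  // already claimed, move on to the next procedure
                continue;
            }
            proc->isProcessing = true;

            // Process Procedure
            proc->function();

            // Remove proc from pendingProcedures and
            // update "it" to the next procedure in the list
            it = pendingProcedures.erase(it);

            delete proc;
        }
    }
    return nullptr;
}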

First, notice there is a Procedure struct that contains a flag isProcessing. Next, there is a list of Procedure* called pendingProcedures. Lastly, there is a function called workerThread that loops over the list of pending Procedures and processes them.

The worker thread searches for a procedure that is not already being processed at the line if (proc->isProcessing). Once a procedure with isProcessing set to false is found, the thread “acquires” the procedure by setting isProcessing to true.

The trouble here is a race on the isProcessing flag. Multiple worker threads can acquire the same procedure through the following chain of events.

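// Both threads start with "it" pointing at the same list element.
// Statements are listed in the order in which they execute.

auto proc = *it;             // Thread 1: proc = 0xabc123
auto proc = *it;             // Thread 2: proc = 0xabc123
if (proc->isProcessing)      // Thread 1: flag is still false
if (proc->isProcessing)      // Thread 2: flag is still false
proc->isProcessing = true;   // Thread 1
proc->isProcessing = true;   // Thread 2

// Thread 1 processes proc
// ...
delete proc;                 // Thread 1

// Thread 2: proc has already been deleted
// Dangerous Use After Free!
proc->function();            // Thread 2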

When both worker threads acquire the same procedure, one thread may process the procedure after the other thread has already called delete proc. This causes a use-after-free error, which attackers can potentially exploit to execute arbitrary code.
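One way to remove this race is to make "find an unclaimed procedure" and "claim it" a single step, so that each procedure is handed to exactly one thread. The sketch below is an illustration under stated assumptions, not the actual Windows fix: pendingMutex, claimNext, and runWorkers are hypothetical names, and the isProcessing flag is dropped because a claimed procedure is removed from the list immediately.

```cpp
#include <cassert>
#include <list>
#include <mutex>
#include <thread>
#include <vector>

struct Procedure {
    void (*function)();  // function pointer, as in the recreation above
};

std::list<Procedure*> pendingProcedures;
std::mutex pendingMutex;  // hypothetical name; guards pendingProcedures

// Removes and returns the next procedure, or nullptr if none remain.
// Lookup and removal happen under one lock, so a given procedure is
// returned to exactly one caller.
Procedure* claimNext() {
    std::lock_guard<std::mutex> guard(pendingMutex);
    if (pendingProcedures.empty()) {
        return nullptr;
    }
    Procedure* proc = pendingProcedures.front();
    pendingProcedures.pop_front();
    return proc;
}

void workerThread() {
    while (Procedure* proc = claimNext()) {
        proc->function();  // runs outside the lock
        delete proc;       // safe: no other thread holds this pointer
    }
}

// Starts `count` workers and waits for all of them to finish.
void runWorkers(unsigned count) {
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < count; ++i) {
        workers.emplace_back(workerThread);
    }
    for (auto& worker : workers) {
        worker.join();
    }
}
```

Because the lock is released before proc->function() runs, procedures still execute in parallel. Only the short claim step is serialized.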

Preventing TOCTOU

Despite the dire consequences, there is no consensus on how to detect and prevent TOCTOU bugs reliably.

For inter-process and filesystem-level TOCTOU race conditions, file locks, transactional operating systems, and other approaches have been proposed, but none has yet emerged as the de facto solution.

Multi-threaded TOCTOU bugs seem to be even more difficult to detect and prevent. Despite a wealth of research, there are very few, if any, production-ready tools for detecting TOCTOU bugs in concurrent programs. Tools like Valgrind or Intel Inspector may be able to detect the side effects of a TOCTOU bug (e.g. a use after free, race condition, or double delete), but neither can detect the TOCTOU directly. However, newer tools like Coderrect’s code scanner offer some support for detecting TOCTOU directly, as well as other types of concurrency bugs.

Overall, it seems that for now the best approach to preventing TOCTOU is developer awareness. As more developers become familiar with TOCTOU-style race conditions, they will be less likely to inadvertently allow TOCTOU bugs into critical code. Nonetheless, mistakes are inevitable, and for those cases we can rely on tools like Coderrect, Valgrind, and Intel Inspector to help detect problems in the code.