The Most Common Usage

When a variable is declared volatile, it tells the compiler that even though the code being compiled never modifies the variable, the memory backing it may still change for other reasons. There are many possible reasons for this; a typical one is that the variable's memory location is mapped to a peripheral port through the memory-mapped I/O mechanism. In that case we are really accessing a hardware register, and the value of that register can change independently of the program.

So why does the compiler need to know this? Because it then guarantees that, when generating assembly, every access to the variable reads the memory location again to get the most recent value. Without volatile, the compiler might load the variable into a register for efficiency, and later accesses would read the register without touching memory again, leaving the code working on stale data. Commonly used helpers such as the ioread family wrap volatile accesses to make sure the latest data is read; see build_mmio_read for the underlying definition.
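To make this concrete, here is a minimal sketch of the idea (not the kernel's actual build_mmio_read implementation, and the register address is purely hypothetical): a memory-mapped register read is just a dereference of a volatile-qualified pointer, which forces the compiler to emit a real load on every access.

#include <stdint.h>

/* Hypothetical register address, for illustration only; real code would
 * obtain it from the device's memory map (e.g. after an ioremap-style call). */
#define STATUS_REG_ADDR 0x40001000UL

static inline uint32_t mmio_read32(uintptr_t addr)
{
    /* The volatile qualifier tells the compiler that this memory can change
     * behind its back, so each call performs an actual load instruction. */
    return *(volatile uint32_t *)addr;
}

void wait_until_ready()
{
    /* Without volatile, the compiler could hoist the load out of the loop
     * and spin forever on a stale value kept in a register. */
    while ((mmio_read32(STATUS_REG_ADDR) & 0x1) == 0)
        ;
}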

However, aside from this memory mapped I/O scenario and a few special cases, if a variable may be accessed concurrently by multiple processes, the volatile keyword should not be used to ensure that each process sees the latest value of the variable. The correct approach is to use locks to protect the variable. Once the lock is acquired, the protected variable only needs to be read from memory once and stored in a register, and subsequent accesses can use the register value, which is more efficient. Since the lock mechanism ensures no other process can modify the variable before we exit the critical section, the data in the register remains valid. Declaring the protected variable as volatile in this case would be unnecessary and could even hinder optimization within the critical section, forcing the compiler to read from memory every time, which is clearly inefficient.
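As a rough userspace sketch of this point (pthreads is used here purely for illustration; the kernel would use spinlocks or mutexes, but the reasoning is identical), the lock alone is sufficient and the shared variable does not need to be volatile:

#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int shared_counter;   /* deliberately not volatile */

void bump_counter()
{
    pthread_mutex_lock(&lock);
    /* Inside the critical section no other thread can modify shared_counter,
     * so the compiler may freely keep it in a register across these
     * statements; the lock and unlock calls already act as compiler barriers. */
    shared_counter++;
    if (shared_counter > 1000)
        shared_counter = 0;
    pthread_mutex_unlock(&lock);
}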

Preventing Instruction Reordering

Another use case for volatile involves the READ_ONCE/WRITE_ONCE macros. Kernel comments mention that these macros prevent compiler reordering:

The compiler is also forbidden from reordering successive instances of
READ_ONCE and WRITE_ONCE

We’ll analyze the READ_ONCE macro in detail below.

From the kernel’s definition of this macro, its essence is simply to cast the variable to a volatile-qualified type. At first glance, that hardly looks like something that could act as a compiler barrier and prevent compiler reordering.

#define READ_ONCE(x) \
({ \
    compiletime_assert_rwonce_type(x); \
    __READ_ONCE(x); \
})
#define __READ_ONCE(x)  (*(const volatile __unqual_scalar_typeof(x) *)&(x))

So, let’s test it out.

First, a piece of C code:

int a, b;
int i, j;

void foo()
{
    a = i;
    b = j / 16;
}

Using gcc -O2 example.c -S generates the following assembly code:

movl    j(%rip), %edx // read j
movl    i(%rip), %eax // read i
testl   %edx, %edx
movl    %eax, a(%rip)
leal    15(%rdx), %eax
cmovns  %edx, %eax
sarl    $4, %eax
movl    %eax, b(%rip)

Clearly, the read order of i and j is reversed compared to the C code. To prevent this optimization, let’s first try using a compiler barrier, barrier(), and see the effect.

#define barrier() __asm__ __volatile__("": : :"memory")

int a, b;
int i, j;

void foo()
{
    a = i;
    barrier();
    b = j / 16;
}

The assembly code now looks like this:

movl    i(%rip), %eax // read i
movl    %eax, a(%rip)
// Barrier here
movl    j(%rip), %edx // read j
testl   %edx, %edx
leal    15(%rdx), %eax
cmovns  %edx, %eax
sarl    $4, %eax
movl    %eax, b(%rip)

Clearly, the barrier() works effectively. It informs the compiler that the code before the barrier and the code after the barrier belong to two different “worlds.” The instructions before the barrier cannot be reordered with those after, and vice versa. This means no instruction reordering can cross the barrier.

After witnessing the effect of the compiler barrier, let’s see if volatile can also act as a compiler barrier.

#define __READ_ONCE(x)  (*(const volatile int *)&(x))

int a, b;
int i, j;

void foo()
{
    a = __READ_ONCE(i);
    b = __READ_ONCE(j) / 16;
}

The resulting assembly code is:

movl    i(%rip), %eax // read i
movl    j(%rip), %edx // read j
movl    %eax, a(%rip) // write a
testl   %edx, %edx
leal    15(%rdx), %eax
cmovns  %edx, %eax
sarl    $4, %eax
movl    %eax, b(%rip) // write b

Now, the read order of i and j is preserved. However, notice that while volatile can ensure the order of some instructions, it is still not a compiler barrier. Using __READ_ONCE, we can only guarantee the read order of i and j, but not the write order of a and b. Theoretically, the compiler could generate the following code:

movl    i(%rip), %eax // read i
movl    j(%rip), %edx // read j
movl    %eax, %ecx // temporarily store i in ecx
testl   %edx, %edx
leal    15(%rdx), %eax
cmovns  %edx, %eax
sarl    $4, %eax
movl    %eax, b(%rip) // write b
movl    %ecx, a(%rip) // write i to a

Although this situation is difficult to construct, it’s theoretically possible.

The reason the compiler preserves this ordering for volatile accesses comes from the following requirement in the C standard:

The least requirements on a conforming implementation are:

At sequence points, volatile objects are stable in the sense that previous accesses are complete and subsequent accesses have not yet occurred.

...

The following are the sequence points described in 5.1.2.3:

The end of a full expression: an initializer (6.7.8); the expression in an expression statement (6.8.3); the controlling expression of a selection statement (if or switch) (6.8.4); the controlling expression of a while or do statement (6.8.5); each of the expressions of a for statement (6.8.5.3); the expression in a return statement (6.8.6.4).

Here the standard introduces the concept of a sequence point. In short, a sequence point is a boundary: all accesses to volatile objects before it must be complete, and no access after it may have started yet. According to the list above, the end of a full expression (in practice, the semicolon that terminates each statement) is a sequence point, so a = __READ_ONCE(i) and b = __READ_ONCE(j) / 16 are separated by a sequence point, which guarantees that i is read before j.
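As a corollary (my own illustration, not taken from the kernel), if two volatile reads sit inside the same full expression there is no sequence point between them, so the standard does not fix their relative order:

#define __READ_ONCE(x)  (*(const volatile int *)&(x))

int b;
int i, j;

void bar()
{
    /* Both reads are volatile, but they belong to one full expression:
     * no sequence point separates them, so the compiler may read i and j
     * in either order. Splitting them into two statements restores the
     * ordering guarantee discussed above. */
    b = __READ_ONCE(i) + __READ_ONCE(j);
}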

However, the compiler only guarantees the ordering of volatile accesses relative to each other. Reads of non-volatile variables may still be reordered around volatile reads.

For example, if we remove the __READ_ONCE from j:

#define __READ_ONCE(x)  (*(const volatile int *)&(x))

int a, b;
int i, j;

void foo()
{
    a = __READ_ONCE(i);
    b = j / 16;
}

The generated assembly would be:

movl    j(%rip), %edx // read j
movl    i(%rip), %eax // read i
testl   %edx, %edx
movl    %eax, a(%rip)
leal    15(%rdx), %eax
cmovns  %edx, %eax
sarl    $4, %eax
movl    %eax, b(%rip)

As you can see, the read order of i and j has been reversed again.

At this point, we have a clear picture of what READ_ONCE (that is, volatile) does for reads: it ensures that volatile reads happen strictly in the order written in the code, relative to one another. Similarly, WRITE_ONCE preserves the order of volatile writes, as sketched below.
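For completeness, here is what the write-write case would look like, using the same simplified __WRITE_ONCE definition that appears later in this article; by the same sequence-point argument, the two volatile stores are expected to stay in program order (the assembly is not reproduced here):

#define __WRITE_ONCE(x, val) do {*(volatile typeof(x) *)&(x) = (val);} while(0)

int a, b;

void foo()
{
    __WRITE_ONCE(a, 1); // volatile store to a
    __WRITE_ONCE(b, 2); // volatile store to b; may not be moved before the store to a
}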

Now, what happens if READ_ONCE and WRITE_ONCE are used together? According to the C standard, the ordering guarantees are not limited to read-read or write-write operations, so read-write order will also be preserved. Let’s look at an example:

int a, b;
int i;

void foo()
{
    a = i / 16;
    b = 0;
}

The assembly code for this function is:

movl    $0, b(%rip) // write b
movl    i(%rip), %edx // read i
testl   %edx, %edx
leal    15(%rdx), %eax
cmovns  %edx, %eax
sarl    $4, %eax
movl    %eax, a(%rip) // write a

As you can see, the read of i, the write to a, and the write to b are completely out of order.

Now, let’s use volatile:

#define __READ_ONCE(x)  (*(const volatile int *)&(x))
#define __WRITE_ONCE(x, val) do {*(volatile typeof(x) *)&(x) = (val);} while(0)

int a, b;
int i;

void foo()
{
    a = __READ_ONCE(i) / 16;
    __WRITE_ONCE(b, 0);
}

The resulting assembly code is:

movl    i(%rip), %edx // read i
movl    $0, b(%rip) // write b
testl   %edx, %edx
leal    15(%rdx), %eax
cmovns  %edx, %eax
sarl    $4, %eax
movl    %eax, a(%rip) // write a

Now, the read of i and the write to b follow the exact order of the C code, indicating that volatile has taken effect. But the write to a still occurs after the write to b. This is because the write to a was not marked as volatile. If we rewrite the code like this:

__WRITE_ONCE(a, __READ_ONCE(i) / 16);
__WRITE_ONCE(b, 0);

The generated assembly would be:

movl    i(%rip), %edx // read i
testl   %edx, %edx
leal    15(%rdx), %eax
cmovns  %edx, %eax
sarl    $4, %eax
movl    %eax, a(%rip) // write a
movl    $0, b(%rip) // write b

Now, the order perfectly matches the C code. However, this can be simplified further because a depends on i, and the compiler will ensure that i is read before a is written. Therefore, the first __READ_ONCE can be removed, and the following code will have the same effect:

__WRITE_ONCE(a, i / 16);
__WRITE_ONCE(b, 0);

Postscript

The role of volatile in preventing reordering is much clearer in Java. Even after Java source code has been compiled to bytecode, the JIT compiler can still reorder instructions while optimizing, so the volatile keyword in Java explicitly forbids such reordering, and this has become common knowledge among Java developers. In C, however, the behavior of volatile is somewhat more obscure.