Bypassing Return Flow Guard (RFG)

At the end of 2016, while checking for updates in Microsoft’s bounty program, I saw a reference to a new defense mechanism called “Return Flow Guard” (RFG). Since I had just finished my work on Liberation Guard at the time, I took the time to check whether I could bypass this new protection method. This post describes my attack on Microsoft’s Return Flow Guard, an attack that achieves a full bypass of the protection method.

Return Flow Guard 101

The first step in my research was to gather as much information as I could about the new defense mechanism. I quickly came upon this excellent technical review by Tencent Xuanwu Lab. As I found out, RFG is a software implementation of Intel’s hardware stack protection. In short, it uses a duplicate stack (sometimes called a “Shadow Stack”): a special hook at the end of each function compares the return address on the user’s stack with its shadow copy, aiming to prevent attacks on the user’s stack. This figure illustrates Microsoft’s software implementation of the shadow stack:
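
A toy model in C may make the check concrete. This is only a sketch under simplifying assumptions: the names rfg_model, rfg_call and rfg_ret are made up for illustration, the two stacks are plain arrays, and real RFG does the equivalent work in instrumented function code rather than in C helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DEPTH 16

/* Hypothetical model: two parallel arrays stand in for the thread's
 * stack and its shadow stack. */
typedef struct {
    uintptr_t stack[MAX_DEPTH];   /* return addresses on the normal stack */
    uintptr_t shadow[MAX_DEPTH];  /* the shadow copies */
    size_t depth;
} rfg_model;

/* On function entry the return address is recorded on both stacks. */
void rfg_call(rfg_model *m, uintptr_t return_address) {
    assert(m != NULL && m->depth < MAX_DEPTH);
    m->stack[m->depth] = return_address;
    m->shadow[m->depth] = return_address;
    m->depth++;
}

/* On function exit the two copies are compared. Returns true if they
 * still match, false if the check would stop the process. */
bool rfg_ret(rfg_model *m) {
    assert(m != NULL && m->depth > 0);
    m->depth--;
    return m->stack[m->depth] == m->shadow[m->depth];
}
```

In this model, overwriting only m->stack[i] between rfg_call() and rfg_ret() makes rfg_ret() return false, which is the situation a classic stack overflow ends up in.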

shadow stacks

Attack Plan #1 – Hijack RSP

What first caught my eye in Microsoft’s design is the fact that RSP, the stack register, actually points to both stacks, as can be seen in the above figure. On 32-bit executables it is common for the function’s prologue to store EBP on the stack and assign EBP = ESP, while the function’s epilogue performs ESP = EBP and restores EBP from the stack. This means that a buffer overflow on the stack can change the stored EBP, which in turn will affect the ESP of the calling function.

shadow stacks - hijack RSP

 

Unfortunately, on 64-bit executables this pattern is rare. The compiler prefers to use SS:[RSP + X] directly to access stack variables, while the function’s epilogue simply adds the frame size back to RSP. Without a “base pointer” in use, it is much harder to “hijack” RSP. We will need a better plan.

Attack Plan #2 – Controlled Pair

A major flaw in Microsoft’s software implementation is the fact that the shadow stack resides in the user’s address space. The user has RW permissions on the shadow stack’s pages, and this leads us to our new attack plan:

shadow stacks - controlled pair

If we could find a “Write-What-Where-Double” primitive (or a “Controlled Pair” in short), we would be able to overwrite both return addresses simultaneously, thus bypassing the validation check in the function’s epilogue.

We will need to achieve 3 goals for this attack plan:

  1. Locate the address of our stack
  2. Locate the address of the shadow stack
  3. Gain a Controlled Pair primitive

Cartography – Locate the Shadow Stack

Microsoft took special measures so that there won’t be any pointer to the shadow stack from the user’s address space. In my next post I’ll show how to easily overcome this obstacle, and efficiently and reliably locate the hidden shadow stack.

Cartography – Locate our thread’s stack

While there are several techniques for locating our thread’s stack, I tried a different approach this time. During my searches through MSDN, I found an extremely useful function, GetCurrentThreadStackLimits:

Retrieves the boundaries of the stack that was allocated by the system for the current thread.

Instead of manually searching the memory of our target for a stack pointer, we could simply call this function and get it for free. An additional bonus is that this new technique will work for every target process, offering a generic solution instead of a dedicated recon step for every target.

Gain a Controlled Pair primitive

While testing the new API function, I found out that it returns two results:

  • The stack’s base address
  • The stack’s top address

And since it needs to return two results, it has the following signature:

VOID WINAPI GetCurrentThreadStackLimits(
    _Out_ PULONG_PTR LowLimit,
    _Out_ PULONG_PTR HighLimit
);

And by sheer luck we found a candidate function for our controlled pair primitive. We now have to check several key issues:

  1. How does the function calculate the returned values?
  2. Can we control these values?

A quick check in the debugger showed that the values are simply taken from the thread’s context, the Thread Environment Block (TEB):

GetCurrentThreadStackLimits

More information about the thread’s context can be found here.
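
The behavior seen in the debugger can be sketched as a small mock. This is an illustration, not the real implementation: mock_teb and its two fields are hypothetical stand-ins (the real TEB is much larger and laid out differently), and mock_get_stack_limits only mirrors the shape of the API, taking the TEB as an explicit argument.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two TEB fields that hold the limits. */
typedef struct {
    uintptr_t stack_low;
    uintptr_t stack_high;
} mock_teb;

/* Same shape as GetCurrentThreadStackLimits(): two values are copied
 * from the TEB to two addresses supplied by the caller. */
void mock_get_stack_limits(const mock_teb *teb,
                           uintptr_t *low_limit,
                           uintptr_t *high_limit) {
    assert(teb != NULL && low_limit != NULL && high_limit != NULL);
    *low_limit = teb->stack_low;
    *high_limit = teb->stack_high;
}
```

The interesting property is that both the source values and the two destination addresses are plain data. Whoever controls them controls two memory writes made by a single call.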

This answers our first question. We can now rephrase our second question:

  1. Is the thread’s context writable?
  2. Can we locate our thread’s context?

The answer to the first question is: Yes. Now we only need to find the context in the address space.

Locate the TEB

My first step was to find the address using a debugger, so that I only needed to find the same value somewhere in memory. After several random walks through memory, focusing mainly on the heap, I decided to take a look at the stack. Surprisingly, a pointer to the TEB can be found 3 times in a row on the stack, which also gives us a clear indication of a “hit”. Here is a screenshot from iexplore.exe:

teb_pointer

After a short investigation I found the cause of this useful memory pattern. Most user processes, at least the ones that attackers are usually interested in, use the Windows API to create and use a pool of worker threads. As part of the creation of the worker threads, the pointer to their TEB is stored 3 times on the stack, probably as part of some struct that lives on the stack.
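
Searching for that pattern is a simple linear scan. The sketch below works on a buffer of pointer-sized words instead of live stack memory, and find_triple is a name chosen here for illustration. Skipping runs of zeros is my own assumption, added to avoid trivial false positives.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the index of the first run of three equal, non-zero words in
 * words[0..count), or -1 if no such run exists. */
ptrdiff_t find_triple(const uintptr_t *words, size_t count) {
    assert(words != NULL || count == 0);
    for (size_t i = 0; i + 2 < count; i++) {
        if (words[i] != 0 &&
            words[i] == words[i + 1] &&
            words[i] == words[i + 2]) {
            return (ptrdiff_t)i;
        }
    }
    return -1;
}
```

A hit found this way is only a candidate TEB pointer and still needs to be validated before it is used.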

Wrapping it all together

Assuming we know the address of the shadow stack (which will be presented in my next post), we can bypass RFG by following these steps:

  1. Call GetCurrentThreadStackLimits() to find the stack’s base and the stack’s top
  2. Traverse the stack from its base until the same address is found 3 times in a row
  3. Compare the stack limits stored in the candidate TEB to those returned in step #1, to confirm that we found the right TEB
  4. Update both TEB values to the desired return address
  5. Call GetCurrentThreadStackLimits() again with two arguments:
    1. The address of the current return address on the thread’s stack
    2. The address of the matching return address on the shadow stack

The last step will give us the desired “Write-What-Where-Double” primitive, thus hijacking both addresses simultaneously.
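
Steps 4 and 5 can be simulated end to end in a few lines of C. Everything here is a mock under stated assumptions: mock_teb, mock_get_stack_limits and controlled_pair are hypothetical names, the two slots are ordinary variables instead of real stack and shadow stack addresses, and the TEB fields are written directly where a real exploit would use its write primitive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two TEB fields that hold the limits. */
typedef struct {
    uintptr_t stack_low;
    uintptr_t stack_high;
} mock_teb;

/* Mock of the API: copies both TEB values to the given addresses. */
void mock_get_stack_limits(const mock_teb *teb,
                           uintptr_t *low, uintptr_t *high) {
    *low = teb->stack_low;
    *high = teb->stack_high;
}

/* Step 4: plant the target in both TEB fields.
 * Step 5: let one call write it to both return-address slots.
 * Returns true if the epilogue comparison would now pass. */
bool controlled_pair(mock_teb *teb, uintptr_t *ret_slot,
                     uintptr_t *shadow_slot, uintptr_t target) {
    assert(teb != NULL && ret_slot != NULL && shadow_slot != NULL);
    teb->stack_low = target;
    teb->stack_high = target;
    mock_get_stack_limits(teb, ret_slot, shadow_slot);
    return *ret_slot == *shadow_slot;
}
```

Because both slots change inside the same call, the model never passes through a state in which only one of them holds the new value.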

RFG’s Status

After I finished my research, I waited for an official Windows 10 version (Creators Update), hoping it would include Return Flow Guard. Unfortunately, on January 31, 2017, shortly after it was announced, Microsoft updated the bug bounty page and excluded RFG from the program. They later added that their Red Team had found a flaw in the mechanism, and that Microsoft chose to wait for Intel’s hardware implementation of the Shadow Stack.

Conclusion

In this post I presented several attack scenarios against Microsoft’s experimental Return Flow Guard protection mechanism. While the first attack scenario might work in some cases, the second plan presents a full exploit using a “Controlled Pair” primitive, bypassing RFG’s validation check. Although RFG was discontinued, I believe that the GetCurrentThreadStackLimits() API function will be useful in other scenarios, both for the recon phase and the actual exploitation phase.

Author: Eyal Itkin

Former white hat security researcher.

4 thoughts on “Bypassing Return Flow Guard (RFG)”

  1. Great article! Thanks for sharing.
    Question – why do we need to use “GetCurrentThreadStackLimits()” in the last step? why can’t we just overwrite the values?

    1. We can overwrite the values directly, however since our write primitive will probably be a result of a function call, we won’t be able to overwrite BOTH values in the same function. Since RFG checks the values on each RET, we need the “controlled pair” primitive to successfully overwrite both values in a single function call.

      1. Thanks for the answer!
        I get the “controlled pair” issue, but I think I don’t get something more basic… we have the memory address of the “current return value on the thread’s stack”, so why can’t we just do something like DWORD* addr = current_return_value_on_threads_stack; *addr = new_return_addr; (and the same for the second address)? Why do we need to overwrite the address with the help of a function call?
        Thanks again!

      2. We try to bypass RFG in a scenario in which we run inside a “VM” such as a Flash / JavaScript / font interpreter. Using an initial vulnerability, we can for example call Array.write() to gain a write-what-where primitive. You can read more about such Flash exploits in previous posts, or in Rapid7’s posts.
