Spectre Attacks Lab
Due Date: Mar 21; Last Updated Date: Feb 14
Table of Contents
- Introduction
- Part 0: Lab Infrastructure
- Part 1: Leaking Kernel Memory via Flush+Reload (35%)
- Part 2: Basic Spectre (40%)
- Part 3: Advanced Spectre (25%)
- Behind the Scenes: How This Lab Infrastructure Was Developed
Collaboration Policy
Our full Academic Honesty policy can be found on the Course Information page of our website. As a reminder, all 6.5950/6.5951 labs should be completed individually. You may discuss the lab at a high level with a classmate, but you may not work on code together or share any of your code.
Getting Started
Log into your assigned machine. Your credentials and machine information have been emailed to you. It will be one of the `arch-sec-[1-4].csail.mit.edu` machines. To connect via ssh, run `ssh username@arch-sec-X.csail.mit.edu`.
We are using `git` for all the labs – instructions for setting up the git repository can be found on the labs page.
In addition to submitting code, you are required to submit a PDF lab report containing your answers to the Discussion Questions to Gradescope. We provide a markdown template in the starter code (`report.md`).
Introduction
In this lab, you will complete the following tasks:
- Understand how Spectre works across privilege boundaries.
- Solve three CTF (capture-the-flag) puzzles of increasing difficulty. You will start by implementing the basic Spectre attack. We will then test your understanding of how the hardware works by challenging you to implement an advanced Spectre attack in the last part of this lab.
Part 0: Lab Infrastructure
Interacting with Linux Kernel
The highlight of this lab is that you will implement your own version of the famous Spectre attack and use it to leak secrets from the Linux kernel, across privilege boundaries. It presents a good opportunity for you to understand the existing technique used to isolate kernelspace from userspace.
Our virtual address space is divided into the kernelspace and the userspace. Unprivileged application code resides in the userspace, as shown in the figure below. There are several restrictions on userspace code: it cannot directly access kernelspace data or directly branch into the kernelspace and execute kernel code. For example, a `load 0xABCD` operation (where `0xABCD` is a kernelspace virtual address) will trigger a page permission check failure, reported as a segmentation fault when running a C program. Similarly, executing the instruction `jump 0x1234` will also crash with a segmentation fault.
So how can the userspace code interact with the kernelspace while still preserving privilege isolation? The right way is to use an API exposed by the kernelspace. When calling such an API, the code jumps to the kernelspace entrypoint (the only place in the kernel that allows a transition from the userspace). The entrypoint code performs a significant amount of context-switching work and then jumps to the requested API function.
In our lab infrastructure, we provide a custom Linux kernel module (blue box) sitting in the kernelspace. This kernel module provides a limited interface for userspace code to call into. The module is embedded with vulnerable Spectre gadgets, operates on some secret data (red box), and uses the secret data as addresses to access the shared buffer (green box). Read the section at the end of this handout for more details about how the lab infrastructure is designed. Obviously, your code, residing in the userspace, will not be able to directly access the secret buffer in the kernelspace. Fortunately, we know that the kernelspace and userspace code, when they execute, share all the microarchitectural structures.

Lab Infrastructure Setup
The Secret
The secret in each part is a string of the form `MIT{some_secret_value}`. The string can be up to 64 bytes, including the NULL terminator. You can consider the secret complete once you leak the NULL terminator. The characters in the string may NOT be printable ASCII; your code should be able to leak arbitrary 8-bit secrets byte by byte.
Do not make any assumption about the secret other than that it is a `NULL`-terminated string of length up to 64 bytes (including the `NULL` terminator). The secrets will not change from run to run (they are constant for the lifetime of the kernel module). During grading, we may use different secret values to evaluate your implementation.
Code Skeleton
- `inc/labspectre.h` and `src-common/spectre_lab_helper.c` provide a set of utility functions for you to use.
- `src-common/main.c` is used in all three parts. The `main` function sets up a shared memory region (`shared_memory`, corresponding to the green box in the figure above) of size `SHD_SPECTRE_LAB_SHARED_MEMORY_SIZE` bytes, which is shared between the userspace and the kernel. It also sets up a file descriptor for communicating with the kernel module. The technique behind this communication is called procfs write handling, detailed in the section at the end of this handout.
- `inc/labspectreipc.h` contains bindings for the interface to the kernel module from userspace. You do not need to understand this, as our provided code handles the communication with the kernel.
- `part1-src/attacker-part1.c` is the file you will modify in Part 1. The method `call_kernel_part1` can be used for calling into the kernel module. The code for Parts 2 and 3 follows the exact same pattern.
Compile, Test, and Autograde
This lab will be autograded. After you hand in your code, we will embed different secret strings in the kernel and rerun your code to see whether it effectively leaks these strings. If your code works reliably with the autograder, you should expect no surprises with your grade. Instructions for compiling the code and running the autograder are below.
From the root directory, use `make` to compile the project. The binaries `part[1-3]` will be produced in the same directory (run them by calling `./part[1-3]`). The results of your code will be printed to the console – on success, you should see the secret leaked from kernel memory printed to the console.
An example of the expected output is below:
$ ./part1
MIT{part1_secret_value}
You can invoke the autograder with `./check.py X`, where `X` is the part to check.
An example of the expected output is below:
$ ./check.py 1
Checking part 1 ...
You passed 950 of 1000 runs (95.0%)
Success! Good job
You can check all parts at once with `make` and then `./check.py all`.
Part 1: Leaking Kernel Memory via Flush+Reload (35%)
In this part you will set up a cache-based side channel to leak information from the kernel using Flush+Reload.
Get to Know the Victim
The pseudocode for the kernel victim code of Part 1 is shown below.
def victim_part1(shared_mem, offset):
secret_data = part1_secret[offset]
load shared_mem[4096 * secret_data]
The victim function takes a pointer `shared_mem` and an integer `offset` as input. Both variables are passed from the userspace and determined by the attacker, i.e., you. The variable `shared_mem` points to the start of the shared memory region, the green box in the figure above.
First, the code loads a secret byte from a secret array named `part1_secret`, located inside the kernelspace. The byte to leak is chosen by the attacker-controlled `offset`. When `offset` is 0, the first secret byte will be loaded; when `offset` is 1, the second byte will be loaded, and so on. Next, the victim multiplies the secret byte by `4096` and uses the result as an index into the shared memory array. For example, if the secret data was the character ‘A’ (0x41), then the first cache line of the 0x41’th page in the shared memory region will be loaded into the cache.
Your Attack Plan
Recall that the secret is a string up to 64 characters long (including the `NULL` terminator). The attacker can leak the secret one byte at a time using Flush+Reload. Reuse your attack strategy from Part 2 of the cache lab here, with the only difference at step 1: the attacker needs to call the victim code to perform the secret-dependent memory access. Without loss of generality, we summarize the attack outline for you below.
1. Flush the memory region from the cache using `clflush`.
2. Call the victim method using the desired `offset` to leak the secret byte.
3. Reload the memory region, measure the latency of accessing each address, and use the latency to derive the value of the secret. When the value is `0x00` (i.e., `NULL`), the attack is complete.
1-1 Discussion Question
Given the attack plan above, how many addresses need to be flushed in the first step?
Allowed Code
You can define your own helper methods as you desire. You can use any method in `inc/labspectre.h`, as well as the provided methods in `part1-src/attacker-part1.c`.
You should only use the provided `call_kernel_part1` method to interact with the kernel module. This function takes three arguments: a file descriptor to the kernel module, a pointer to the shared memory region, and an offset. `kernel_fd` and `shared_memory` can be passed directly to this method without modification. The `offset` for a given invocation is up to you.
Build your attack step-by-step: start by leaking one character first, then try to leak the whole string.
1-2 Exercise
Implement the Flush+Reload attack in `part1-src/attacker-part1.c` to leak the secret string. Build the project with `make` and run `./part1` from the main directory to see if you get the secret. Run `./check.py 1` from the main directory to repeat the experiment multiple (5 by default) times.
Submission and Grading
Submit your code `part1-src/attacker-part1.c` to your assigned GitHub repo. Full credit will be awarded to solutions that report the correct secret at least 80% of the time, while partial credit will be awarded to solutions that perform worse than that. Each attempt (i.e., each run of `./part1`) should take no longer than 30 seconds.
Part 2: Basic Spectre (40%)
Now that Flush+Reload is working, let’s move on to actually implementing a Spectre attack!
Get to Know the Victim
Below is the pseudocode for Part 2’s victim code. This victim is quite similar to Part 1, except it will only perform the load if the offset is within a specific range (e.g., `offset < 4`).
part2_limit = 4
def victim_part2 (shared_mem, offset):
secret_data = part2_secret[offset]
mem_index = 4096 * secret_data
# to delay the subsequent branch
flush(part2_limit)
if offset < part2_limit:
load shared_mem[mem_index]
2-1 Discussion Question
Copy your code in `run_attacker` from `attacker-part1.c` to `attacker-part2.c`. Does your Flush+Reload attack from Part 1 still work? Why or why not?
Attack Outline
Below are the steps required to leak a single byte. You may need to alter your approach to account for system noise.
- Train the branch predictor to speculatively perform the load operation (i.e., take the branch).
- Flush the shared memory region from the cache using `clflush`.
- Call the victim function with an offset beyond the limit, leaking the secret byte during speculative execution.
- Reload the memory region, measure the latency of accessing each address, and use the latency to determine the value of the secret.
As you’ve observed in previous labs, side-channel attacks generally do not work on the first attempt. You should use the good practices you have learned from the cache lab when attempting any microarchitectural attack. For example,

- DO NOT measure while printing.
- To improve attack precision, you can repeat measurements multiple times and use statistical methods to decode the secret.
- Try to avoid using syscall-related functions during the attack. Both the `printf` and `sleep` functions introduce enough noise to seriously disrupt your cache state and your branch predictor state.

In addition, here is one more hint specific to the branch predictor. Modern processors employ branch predictors of significant complexity. Branch predictors can use global prediction histories, which allow different branches to interfere with each other. Besides, the branch predictor is shared between the userspace and the kernelspace. If speculation is not working as expected, you may need to reduce the number of branches in your attack code.
2-2 Exercise
Implement the Spectre attack in `attacker-part2.c` to leak the secret string. Build the project with `make` and run `./part2` to see if you get the secret. Run `./check.py 2` to repeat the experiment multiple (5 by default) times.
2-3 Discussion Question
In our example, the attacker tries to leak the values in the array `part2_secret`. In a real-world attack, attackers can use Spectre to leak data located at an arbitrary address in the victim’s address space. Explain how an attacker can achieve such leakage.
2-4 Discussion Question
Experiment with how often you train the branch predictor. What is the minimum number of times you need to train the branch (i.e., `if offset < part2_limit`) to make the attack work?
Submission and Grading
This part is graded in the same way as Part 1. Full credit will be awarded to solutions that report the correct secret at least 80% of the time, while partial credit will be awarded to solutions that perform worse than that. Each attempt (i.e., each run of `./part2`) should take no longer than 30 seconds.
Part 3: Advanced Spectre (25%)
Now that we’ve got our Spectre attack working, let’s try a harder version of the same problem.
Get to Know the Victim
Below is the pseudocode for Part 3:
part3_limit = 4
def victim_part3 (shared_mem, offset):
if offset < part3_limit:
false_dependency = lengthy computation # the computation result is 0
secret_data = part3_secret[offset]
mem_index = 4096 * secret_data
load shared_mem[mem_index + false_dependency]
There are two key differences in the victim code compared to Part 2. First, the victim no longer flushes the limit variable (`partX_limit`) before the branch. Second, we have added a false dependency before the memory access, making the memory access start later in the speculation window.
If you copy `run_attacker` from Part 2, you should see that your attack does not work against the new victim. This is because, in the modified victim code, the memory access instruction we try to monitor may not be issued speculatively, for three reasons:

1. The speculation window becomes shorter. The speculation window starts at the cycle when the branch (`if offset < part3_limit`) enters the processor and ends at the cycle when the branch condition is resolved. If the `part3_limit` variable is cached, it takes very little time to obtain its value, detect the branch misprediction, and squash the instructions after the branch. As a result, the speculation window becomes shorter.
2. The issue time of the secret-dependent memory access is delayed. Due to the data dependency between the `false_dependency` line and the `load shared_mem` line, the secret-dependent memory access can only be issued after the variable `false_dependency` is computed. It is possible that the branch condition is resolved before the speculative load even executes.
3. There is a hidden source of timing delay due to TLB misses. Feel free to refer to the section at the end of this handout for more information. You do not need to understand this factor to make your attack work.
To make your attack work, you will need to find a way to increase the speculation window such that the speculative load has a higher chance of occurring. Note that you cannot change the long-latency memory address dependency. As before, use the good practices for microarchitectural attacks: do not use syscall-related functions, such as `printf` and `sleep`, during the attack.
Note
If your implementation from Part 2 can pass the test for Part 3, congratulations, and please reach out to us! We have designed this part to render basic Spectre attack implementations ineffective, and we’d be curious to learn how you made it work in one shot.
3-1 Exercise
Optimize the attack in `attacker-part3.c` to leak the secret string. Build the project with `make` and run `./part3` to see if you get the secret. Run `./check.py 3` to repeat the experiment multiple (5 by default) times.
3-2 Discussion Question
Describe the strategy you employed to extend the speculation window of the target branch in the victim.
3-3 Discussion Question
Assume you are an attacker looking to exploit a new machine that has the same kernel module installed as the one we attacked in this part. What information would you need to know about this new machine to port your attack? Could it be possible to determine this information experimentally? Briefly describe in 5 sentences or less.
Submission and Grading
Full credit will be awarded to solutions that report the correct secret at least 20% of the time, while partial credit will be awarded to solutions that can recover some part of the secret string. Each attempt (i.e., each run of `./part3`) should take no longer than 10 minutes.
You can check all parts at once with `make` and then `./check.py all`.
As always, do not forget to include answers to the discussion questions in your lab report and submit the report to Gradescope.
Behind the Scenes: How This Lab Infrastructure Was Developed
For those who are curious, here is a brief description of how this lab infrastructure was developed. The victims you are interacting with are part of a custom kernel module. You can find the source code of this kernel module in `module-src/labspectrekm.c`. The communication between the userspace and the kernel module is handled using a technique called procfs write handling. Specifically, whenever the userspace code writes to a file (i.e., `/proc/labspectre-victim`), a procfs write handler (i.e., the `spectre_lab_victim_write` function in `module-src/labspectrekm.c`) in the kernel module starts executing, using the written data (i.e., the `local_cmd` variable in the `call_kernel_partX` functions) as its `userbuf` argument.
On the lab machine, SMAP (supervisor mode access prevention) and SMEP (supervisor mode execution prevention) are both on, which means that the kernel cannot directly read or execute userspace memory. You may wonder, in this case, how the kernel can read the `shared_mem` array, which is located in the userspace. This is done by temporarily remapping an alias of the shared memory region into the kernelspace. What we end up with is two different virtual addresses, one in the userspace and one in the kernelspace, both mapping to the same physical address. This is similar to what we have seen in Part 2 of the cache lab, where two virtual addresses in two processes are mapped to the same physical address.
The interaction between the kernel module and the userspace code involves context switches. When the userspace code calls the kernel module (via the `write` syscall), the processor transitions from the userspace to the kernelspace, which flushes some microarchitectural structures, such as TLBs. The custom kernel module then creates an alias mapping for the shared memory region and executes the requested function. Before returning to the userspace, it unmaps the shared region. Therefore, every time the kernel module is called, the first access to each page will incur a TLB miss. In Part 2, we deliberately prevent TLB misses, to make your attack easier, by forcing page walks before performing any secret-dependent memory accesses. In Part 3, these redundant accesses are removed, so you will need to craft an advanced Spectre attack that can succeed despite the added latency due to TLB misses. In other words, in Part 3, in addition to the false dependency, TLB misses also contribute extra latency.
Acknowledgements
Contributors: Joseph Ravichandran, Mengjia Yan, Peter Deutsch.