# Formal Verification for Hardware Security

Mengjia Yan Spring 2024

Slides credit: Sharad Malik (Princeton)







https://twitter.com/gf\_256/status/1321677851633029120/

## **Recall Hardware Bugs**





Implementation does not match specification (Errata)

Bugs in the specification



Vague specification

## **Program/Design Testing**



Program testing can be quite effective for showing the presence of bugs, but is hopelessly inadequate for showing their absence. - Edsger Dijkstra

- In principle: *Exhaustive* testing can prove correctness
- In practice: Test cases are generated to cover <u>some (not all)</u> inputs/statements/branches/paths etc.

## **Program/Design Verification**



The goal: under some conditions, a program verifier

- can provide a proof (if the program is right)
- or provide a counterexample (if the program is wrong)

## **Formal Verification**

"Verification": formally **prove** that the program/design is correct

- Rigor: uses well established mathematical foundations
- Exhaustiveness: considers all possible program behaviors
- Automation: uses computers to verify programs!



Overall, it is a search problem...

## How does formal verification work?



Program verification, program synthesis, test generation, etc.

Some SystemVerilog code + assertions that check for specification violations

Symbolic execution, model checking, invariant generation, etc.

`(! (= a (* 2 (+ 10 b))))`

SAT, SMT, BDDs, proof systems, etc.

## Symbolic Execution: A Simple Example #1

C code:

```
#include <assert.h>
#include <stdbool.h>

int hash(int z) {
    return (z + 10) * 2;
}

int obscure(int x, int y) {
    if (x == hash(y))
        assert(false);  // reachable only when x == (y + 10) * 2
    return 1;
}
```

Rosette code:

```
(define (hash z)
  (* (+ z 10) 2))

(define (obscure x y)
  (if (= x (hash y))
      (assert #f)
      (- x y)))
```

DEMO

How will fuzzing behave to find this error?
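To make the fuzzing question concrete, here is a minimal Python sketch. It is an illustration only, not the lecture's demo. It ports the C functions and compares random testing with solving the failing path condition `x == (y + 10) * 2` directly. The names `hash_fn`, `fuzz`, and `solve_failing_path` are hypothetical, and Python integers do not overflow the way C `int` does.

```python
import random

def hash_fn(z):
    # Python port of the C hash(); no 32-bit overflow is modeled here.
    return (z + 10) * 2

def obscure(x, y):
    if x == hash_fn(y):
        raise AssertionError("assert(false) reached")
    return 1

def fuzz(trials, seed=0, lo=-2**31, hi=2**31 - 1):
    """Random testing: count how many random (x, y) pairs hit the assertion."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x, y = rng.randint(lo, hi), rng.randint(lo, hi)
        try:
            obscure(x, y)
        except AssertionError:
            hits += 1
    return hits

def solve_failing_path(y):
    """Solve the failing path condition x == (y + 10) * 2 for x, given y."""
    return hash_fn(y), y

if __name__ == "__main__":
    print("fuzzing hits in 100000 trials:", fuzz(100_000))
    print("input derived from the path condition:", solve_failing_path(5))
```

Each random pair reaches the assertion with probability of roughly 2^-32, so random fuzzing is very unlikely to find the bug, whereas reasoning about the path condition yields a failing input immediately.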

## A Simple Example #2

```
int hash2(int z) {
    if (z > 10)
        z = z - 10;
    return z;
}

int obscure(int x, int y) {
    if (x == hash2(y))
        error();
    return x - y;
}
```

- Build an execution tree that contains all the execution paths
- Each execution path has a logical formula describing its path condition
- The common pitfall: the formulas become extremely large, causing memory overhead and scalability issues
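The execution tree for `obscure` with `hash2` has four paths. The sketch below writes them out by hand as lists of path conditions. A brute-force search over a small integer range stands in for a real constraint solver and looks for an input that drives execution down each path. `PATHS`, `find_witness`, and `reaches_error` are hypothetical names, and the search range is an arbitrary assumption.

```python
from itertools import product

def hash2(z):
    if z > 10:
        z = z - 10
    return z

def obscure(x, y):
    if x == hash2(y):
        raise RuntimeError("error() reached")
    return x - y

# (description, path conditions over (x, y), outcome) for each of the 4 paths.
PATHS = [
    ("y > 10, x == y - 10", [lambda x, y: y > 10, lambda x, y: x == y - 10], "error"),
    ("y > 10, x != y - 10", [lambda x, y: y > 10, lambda x, y: x != y - 10], "return"),
    ("y <= 10, x == y", [lambda x, y: y <= 10, lambda x, y: x == y], "error"),
    ("y <= 10, x != y", [lambda x, y: y <= 10, lambda x, y: x != y], "return"),
]

def find_witness(conditions, lo=-20, hi=20):
    """Toy 'solver': first (x, y) in a small range satisfying all conditions."""
    for x, y in product(range(lo, hi + 1), repeat=2):
        if all(cond(x, y) for cond in conditions):
            return x, y
    return None

def reaches_error(x, y):
    """Run the concrete program and report whether error() was reached."""
    try:
        obscure(x, y)
        return False
    except RuntimeError:
        return True

if __name__ == "__main__":
    for description, conditions, outcome in PATHS:
        print(description, "->", outcome, "witness:", find_witness(conditions))
```

A real symbolic executor derives the path conditions automatically and passes them to an SMT solver. The number of paths can double with every branch, which is where the scalability issue comes from.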

### How does formal verification work?



Program verification, program synthesis, test generation, etc.

`int hash2(int z) { if (z > 10) z = z - 10; return z; }` `int obscure(int x, int y) { if (x == hash2(y)) error(); return x - y; }`

=> Linux kernel, crypto libraries, processor Verilog code...

Symbolic execution, model checking, invariant generation, etc.

`(! (= a (* 2 (+ 10 b))))`

![](_page_10_Picture_7.jpeg)

Success with SAT is at the heart of formal reasoning about systems.

## **Big Advancements in the Past Decade**

(1) SAT: is a Boolean formula f satisfiable?

![](_page_11_Figure_2.jpeg)

(2) SMT (Satisfiability Modulo Theory): is a first-order logic formula theory-satisfiable?

![](_page_11_Figure_4.jpeg)

### SAT in a Nutshell

Given a propositional logic (Boolean) formula, find a variable assignment such that the formula evaluates to 1, or prove that no such assignment exists.

$$F = (a + b)(a' + b' + c)$$

- For *n* variables, there are $2^n$ possible truth assignments to be checked.

![](_page_12_Picture_4.jpeg)

- First established NP-Complete problem.

S. A. Cook, The complexity of theorem proving procedures, Proceedings, Third Annual ACM Symp. on the Theory of Computing, 1971, 151-158
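As a small illustration of the $2^n$ search space, the following sketch checks the formula $F = (a + b)(a' + b' + c)$ by enumerating every truth assignment. The function names are hypothetical, and this is the naive baseline, not how modern solvers work.

```python
from itertools import product

def F(a, b, c):
    # F = (a + b)(a' + b' + c), with + as OR and ' as NOT
    return (a or b) and ((not a) or (not b) or c)

def brute_force_sat(formula, n_vars):
    """Return the first satisfying assignment, or None if none exists."""
    for assignment in product([False, True], repeat=n_vars):
        if formula(*assignment):
            return assignment
    return None

def count_models(formula, n_vars):
    """Count satisfying assignments among all 2**n_vars candidates."""
    return sum(1 for a in product([False, True], repeat=n_vars) if formula(*a))

if __name__ == "__main__":
    print(brute_force_sat(F, 3))  # (False, True, False)
    print(count_models(F, 3))     # 5 of the 8 assignments satisfy F
```

Enumeration works for three variables but doubles in cost with every added variable, which is why the search techniques on the following slides matter.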

### Where are we today?

- Complexity of SAT: NP-complete
  - But often tractable in practice
- Intractability of the problem no longer daunting
  - Can regularly handle practical instances with millions of variables and constraints
- SAT has matured from theoretical interest to practical impact
  - Electronic Design Automation (EDA)
    - Widely used in many aspects of chip design
  - Increasing use in software verification
    - Commercial use at Microsoft, Amazon,...

### **Problem Representation**

#### Conjunctive Normal Form (CNF)

- Representation of choice for modern SAT solvers
- Every clause must evaluate to TRUE for the formula to be satisfied

![](_page_14_Figure_4.jpeg)
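A common way to store CNF in code is a list of clauses, each clause being a list of signed integers (positive for a variable, negative for its complement). The sketch below encodes $(a + b)(a' + b' + c)$ with a=1, b=2, c=3, evaluates it, and prints it in the DIMACS CNF text format that most SAT solvers accept. The helper names are hypothetical.

```python
F_CNF = [[1, 2], [-1, -2, 3]]  # (a + b)(a' + b' + c) with a=1, b=2, c=3

def eval_clause(clause, assign):
    """A clause is TRUE if at least one of its literals is TRUE."""
    return any(assign[abs(lit)] == (lit > 0) for lit in clause)

def eval_cnf(cnf, assign):
    """A CNF formula is TRUE only if every clause is TRUE."""
    return all(eval_clause(clause, assign) for clause in cnf)

def to_dimacs(cnf, n_vars):
    """Render the clause list in DIMACS CNF format (each clause ends with 0)."""
    lines = [f"p cnf {n_vars} {len(cnf)}"]
    lines += [" ".join(str(lit) for lit in clause) + " 0" for clause in cnf]
    return "\n".join(lines)

if __name__ == "__main__":
    print(eval_cnf(F_CNF, {1: False, 2: True, 3: False}))
    print(to_dimacs(F_CNF, 3))
```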

## SAT Solvers: A Condensed History

- Deductive
  - Davis-Putnam 1960 [DP]
  - Iterative existential quantification by "resolution"
- Backtrack Search
  - Davis, Logemann and Loveland 1962 [DLL]
  - Exhaustive search for satisfying assignment
- Conflict Driven Clause Learning [CDCL]
  - GRASP: Integrate a constraint learning procedure, 1996
- Locality Based Search
  - Emphasis on exhausting local sub-spaces, e.g. Chaff, Berkmin, miniSAT and others, 2001 onwards
  - Added focus on efficient implementation
- "Pre-processing"
  - Peephole optimization, e.g. miniSAT, 2005
We cover two of these algorithms, DLL and CDCL, to give you a taste of how the search works.

![](_page_16_Figure_1.jpeg)

![](_page_16_Picture_2.jpeg)

M. Davis, G. Logemann, and D. Loveland. A machine program for theorem-proving. Communications of the ACM, 5:394–397, 1962


![](_page_17_Picture_2.jpeg)

![](_page_18_Figure_1.jpeg)

![](_page_19_Figure_1.jpeg)

![](_page_20_Figure_1.jpeg)

Think about the search performance:

- What factor determines how fast we find a SAT assignment?

How quickly a conflict is detected. The order in which variables are assigned matters.
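The effect of decision order can be explored with a small backtracking search in the DLL style. The sketch below is a minimal illustration rather than a production solver: it does unit propagation, then decides variables in a caller-supplied order and counts decisions. The clause set is the one used in the GRASP example slides (signed integers stand for literals). `dpll`, `simplify`, and `satisfies` are hypothetical names, and `order` is assumed to list every variable that occurs in the clauses.

```python
CLAUSES = [
    [1, 4], [1, -3, -8], [1, 8, 12], [2, 11],
    [-7, -3, 9], [-7, 8, -9], [7, 8, -10], [7, 10, -12],
]

def simplify(clauses, lit):
    """Set lit to TRUE: drop satisfied clauses, delete the falsified literal."""
    return [[l for l in c if l != -lit] for c in clauses if lit not in c]

def dpll(clauses, order, assign=None, stats=None):
    """Return a (possibly partial) satisfying assignment, or None if Unsat."""
    assign = dict(assign or {})
    stats = stats if stats is not None else {"decisions": 0}
    while True:  # unit propagation
        if any(len(c) == 0 for c in clauses):
            return None  # conflict: some clause has no literal left
        units = [c[0] for c in clauses if len(c) == 1]
        if not units:
            break
        assign[abs(units[0])] = units[0] > 0
        clauses = simplify(clauses, units[0])
    if not clauses:
        return assign
    var = next(v for v in order if v not in assign)  # next decision variable
    for val in (True, False):
        stats["decisions"] += 1
        lit = var if val else -var
        result = dpll(simplify(clauses, lit), order, {**assign, var: val}, stats)
        if result is not None:
            return result
    return None

def satisfies(clauses, assign):
    """Check a model; variables left unassigned are treated as FALSE."""
    return all(any(assign.get(abs(l), False) == (l > 0) for l in c) for c in clauses)

if __name__ == "__main__":
    base = [1, 2, 3, 4, 7, 8, 9, 10, 11, 12]
    for order in (base, list(reversed(base))):
        stats = {"decisions": 0}
        model = dpll(CLAUSES, order, stats=stats)
        print(order, "decisions:", stats["decisions"], "ok:", satisfies(CLAUSES, model))
```

Running it with different orders shows that the same formula can need a different number of decisions, which is why modern solvers invest heavily in decision heuristics.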

![](_page_21_Figure_1.jpeg)

![](_page_22_Figure_1.jpeg)

![](_page_23_Figure_1.jpeg)

![](_page_24_Figure_1.jpeg)


J. P. Marques-Silva and Karem A. Sakallah, "GRASP: A Search Algorithm for Propositional Satisfiability", *IEEE Trans. Computers*, C-48, 5:506-521, 1999.

- x1 + x4
- x1 + x3' + x8'
- x1 + x8 + x12
- x2 + x11
- x7' + x3' + x9
- x7' + x8 + x9'
- x7 + x8 + x10'
- x7 + x10 + x12'

![](_page_26_Picture_9.jpeg)

![](_page_26_Picture_10.jpeg)

Red text means a literal evaluates to 0; green means it evaluates to 1.

For the graph on the left:

Blue circles are free variables, and brown circles are inferred variables. Each edge describes an inferred relationship.
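The inference step behind the graph can be sketched as Boolean constraint propagation that records, for every inferred variable, the clause that forced it (the incoming edges of that node). The decision sequence used below (x1=0, x3=1, x2=0, x7=1) is an assumption chosen for illustration and may differ from the one in the figures. `propagate` and `run` are hypothetical names.

```python
CLAUSES = [
    [1, 4], [1, -3, -8], [1, 8, 12], [2, 11],
    [-7, -3, 9], [-7, 8, -9], [7, 8, -10], [7, 10, -12],
]

def value(lit, assign):
    """Value of a literal under a partial assignment (None if unassigned)."""
    v = assign.get(abs(lit))
    if v is None:
        return None
    return v if lit > 0 else not v

def propagate(clauses, assign, reason):
    """Infer forced values until fixpoint; return a conflicting clause or None."""
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            vals = [value(lit, assign) for lit in clause]
            if True in vals:
                continue  # clause already satisfied
            unassigned = [lit for lit, v in zip(clause, vals) if v is None]
            if not unassigned:
                return clause  # every literal is 0: conflict
            if len(unassigned) == 1:  # unit clause: the last literal must be 1
                lit = unassigned[0]
                assign[abs(lit)] = lit > 0
                reason[abs(lit)] = clause  # edge source in the implication graph
                changed = True
    return None

def run(decisions):
    """Apply decisions in order, propagating after each one."""
    assign, reason = {}, {}
    for var, val in decisions:
        assign[var] = val
        conflict = propagate(CLAUSES, assign, reason)
        if conflict is not None:
            return assign, reason, conflict
    return assign, reason, None

if __name__ == "__main__":
    assign, reason, conflict = run([(1, False), (3, True), (2, False), (7, True)])
    print("inferred:", {v: (assign[v], reason[v]) for v in reason})
    print("conflicting clause:", conflict)
```

With these decisions, x9 is forced to 1 by x7' + x3' + x9, which leaves every literal of x7' + x8 + x9' at 0. This is the kind of conflict that CDCL analyzes.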

- x1 + x4
- x1 + x3' + x8'
- x1 + x8 + x12
- x2 + x11
- x7' + x3' + x9
- x7' + x8 + x9'
- x7 + x8 + x10'
- x7 + x10 + x12'

![](_page_27_Picture_9.jpeg)

![](_page_27_Picture_10.jpeg)

![](_page_28_Figure_2.jpeg)

![](_page_28_Picture_3.jpeg)

![](_page_29_Figure_2.jpeg)

![](_page_29_Figure_3.jpeg)

![](_page_30_Figure_2.jpeg)

![](_page_30_Figure_3.jpeg)

![](_page_31_Figure_2.jpeg)

![](_page_31_Figure_3.jpeg)

![](_page_32_Figure_2.jpeg)

![](_page_32_Figure_3.jpeg)

![](_page_33_Figure_1.jpeg)

![](_page_34_Figure_1.jpeg)

![](_page_35_Figure_1.jpeg)

![](_page_36_Figure_1.jpeg)

![](_page_37_Figure_1.jpeg)

![](_page_38_Figure_1.jpeg)

#### What's the big deal?

![](_page_39_Figure_1.jpeg)

Significantly prune the search space – learned clause is useful forever!

Useful in generating future conflict clauses.
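A learned clause can be derived by resolution. The sketch below assumes the conflict arises on x7' + x8 + x9' after x9 was forced by x7' + x3' + x9 (the situation that follows from the illustrative decisions x1=0, x3=1, x2=0, x7=1). It resolves the two clauses on x9 and checks by brute force that the result is implied by the original formula, so adding it never removes a solution. All function names are hypothetical.

```python
from itertools import product

CLAUSES = [
    [1, 4], [1, -3, -8], [1, 8, 12], [2, 11],
    [-7, -3, 9], [-7, 8, -9], [7, 8, -10], [7, 10, -12],
]
VARS = sorted({abs(lit) for clause in CLAUSES for lit in clause})

def resolve(c1, c2, var):
    """Resolution on var: union of both clauses without var and its complement."""
    merged = (set(c1) | set(c2)) - {var, -var}
    return sorted(merged, key=abs)

def satisfies(clauses, assign):
    return all(any(assign[abs(l)] == (l > 0) for l in c) for c in clauses)

def is_implied(clauses, learned):
    """Brute force: every model of `clauses` also satisfies `learned`."""
    for values in product([False, True], repeat=len(VARS)):
        assign = dict(zip(VARS, values))
        if satisfies(clauses, assign) and not satisfies([learned], assign):
            return False
    return True

def pruned_fraction(learned):
    """Fraction of all total assignments that the learned clause rules out."""
    return 1 / 2 ** len(learned)

conflict = [-7, 8, -9]        # x7' + x8 + x9'
antecedent_x9 = [-7, -3, 9]   # clause that forced x9 = 1
learned = resolve(conflict, antecedent_x9, 9)

if __name__ == "__main__":
    print("learned clause:", learned)  # [-3, -7, 8], i.e. x3' + x7' + x8
    print("implied by the formula:", is_implied(CLAUSES, learned))
    print("assignments pruned:", pruned_fraction(learned))
```

Once x3' + x7' + x8 is in the clause database, any future branch with x3=1 and x8=0 immediately forces x7=0 instead of rediscovering the same conflict.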

## **Big Advancements in the Past Decade**

(1) SAT: is a Boolean formula f satisfiable?

![](_page_40_Figure_2.jpeg)

(2) SMT (Satisfiability Modulo Theory): is a first-order logic formula theory-satisfiable?

![](_page_40_Figure_4.jpeg)

### **The Basic SMT Problem**

- Determining the satisfiability of a logical formula with respect to some combination of background theories

![](_page_41_Figure_2.jpeg)

### **Background Theories**

| Theory | Example |
|---|---|
| Uninterpreted functions | $x = y \Rightarrow f(x) = f(y)$ |
| Integer/Real arithmetic | $2x+y=0 \land 2x-y=4 \Rightarrow x=1$ |
| Floating-point arithmetic | $x+1 \neq NaN \land x < \infty \Rightarrow x+1 \geq x$ |
| Bit-vectors | $4 \cdot (x \gg 2) = x \mathbin{\&} {\sim}3$ |
| Strings and RegExs | $x = y \cdot z \land z \in ab* \Rightarrow \lvert x \rvert > \lvert y \rvert$ |
| Arrays | $i = j \Rightarrow \text{store}(a, i, x)[j] = x$ |
| Algebraic data types | $x \neq Leaf \Rightarrow \exists l, r : Tree(\alpha). \exists a : \alpha. x = Node(l, a, r)$ |
| Finite sets | $e_1 \in x \land e_2 \in x \setminus e_1 \Rightarrow \exists y, z : Set(\alpha). \lvert y \rvert = \lvert z \rvert \land x = y \cup z \land y \neq \emptyset$ |
| Finite relations | $(x, y) \in r \land (y, z) \in r \Rightarrow (x,z) \in r$ |
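A few of these theory facts can be sanity-checked by exhaustive testing on small domains. This is far weaker than what a theory solver proves, but it shows what each statement means. The sketch below checks the bit-vector identity $4 \cdot (x \gg 2) = x \mathbin{\&} {\sim}3$ (both sides clear the two low bits), the linear-arithmetic implication, and the array axiom. The function names, the 8-bit width, and the bounds are assumptions made for illustration.

```python
def bitvector_identity_holds(width=8):
    """Check 4 * (x >> 2) == x & ~3 for every bit-vector x of the given width."""
    mask = (1 << width) - 1
    return all(
        ((4 * (x >> 2)) & mask) == ((x & ~3) & mask)
        for x in range(1 << width)
    )

def linear_arithmetic_implication_holds(bound=20):
    """Check 2x + y = 0 and 2x - y = 4 imply x = 1 on a bounded integer grid."""
    for x in range(-bound, bound + 1):
        for y in range(-bound, bound + 1):
            if 2 * x + y == 0 and 2 * x - y == 4 and x != 1:
                return False
    return True

def array_axiom_holds(size=3, value=42):
    """Check i = j implies store(a, i, value)[j] = value on a small array."""
    a = [0] * size
    for i in range(size):
        stored = a[:i] + [value] + a[i + 1:]  # functional store(a, i, value)
        j = i
        if stored[j] != value:
            return False
    return True

if __name__ == "__main__":
    print(bitvector_identity_holds())
    print(linear_arithmetic_implication_holds())
    print(array_axiom_holds())
```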

## CDCL(T): Key Idea

- SAT solver handles Boolean structure of the formula
  - Treat each atomic formula as a propositional variable
  - Resulting formula is called a *Boolean abstraction (B)*
- Example

$$F := \underbrace{(x=z)}_{b_1} \land \Big( \big( \underbrace{(y=z)}_{b_2} \land \underbrace{(x=z+1)}_{b_3} \big) \lor \neg \underbrace{(x=z)}_{b_1} \Big)$$

B(F): b1 ∧ ((b2 ∧ b3) ∨ ¬b1)

Boolean abstraction (B) is defined inductively over formulas. B is a bijective function, so B<sup>-1</sup> also exists.

B<sup>-1</sup>(b1 ∧ b2 ∧ b3): (x=z) ∧ (y=z) ∧ (x=z+1)

B<sup>-1</sup>(b1 ∨ b2'): (x=z) ∨ ¬(y=z)

## CDCL(T): Key Idea

![](_page_44_Figure_1.jpeg)

![](_page_44_Picture_2.jpeg)

B(F) is an over-approximation of F

- Use SAT solver to decide satisfiability of B(F)
  - If B(F) is Unsat, then F is Unsat
  - If B(F) has a satisfying assignment A, F may still be Unsat
- Example: b1, b2, b3 are not independent propositions!
  - SAT solver finds a satisfying assignment A: b1 ∧ b2 ∧ b3
  - But B<sup>-1</sup>(A) is unsatisfiable modulo theory: (x=z) ∧ (y=z) ∧ (x=z+1) is not satisfiable

## **CDCL(T): Simple Version**

1. Generate a Boolean abstraction B(F)
2. Use SAT solver to decide satisfiability of B(F)
   - If B(F) is Unsat, then F is Unsat
   - Otherwise, find a satisfying assignment A
3. Use theory solver to check if B<sup>-1</sup>(A) is satisfiable modulo T
   - If B<sup>-1</sup>(A) is satisfiable modulo theory T, then F is satisfiable
   - Otherwise, B<sup>-1</sup>(A) is unsatisfiable modulo T: add ¬A to B(F), and backtrack in SAT

Repeat steps 2 and 3 until a theory-satisfiable assignment is found or there are no more satisfying assignments (in which case F is Unsat)
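The loop above can be sketched in a few lines. This is a toy model under loud assumptions: the "SAT solver" enumerates assignments of b1..b3, the "theory solver" searches a small integer range instead of reasoning about arithmetic, and ¬A is recorded as a blocked assignment. The names `sat_solve`, `theory_check`, and `lazy_smt` are hypothetical. The atoms are those of the running example: b1 is x=z, b2 is y=z, b3 is x=z+1.

```python
from itertools import product

# Theory atoms of the running example as predicates over integers (x, y, z).
ATOMS = {
    "b1": lambda x, y, z: x == z,
    "b2": lambda x, y, z: y == z,
    "b3": lambda x, y, z: x == z + 1,
}
NAMES = sorted(ATOMS)

def abstraction_F(b):
    """B(F) = b1 and ((b2 and b3) or not b1)."""
    return b["b1"] and ((b["b2"] and b["b3"]) or not b["b1"])

def sat_solve(formula, blocked):
    """Toy SAT step: first unblocked assignment that satisfies B(F)."""
    for values in product([False, True], repeat=len(NAMES)):
        assignment = dict(zip(NAMES, values))
        if values not in blocked and formula(assignment):
            return assignment
    return None

def theory_check(assignment, domain=range(-3, 4)):
    """Toy theory step: look for integers consistent with B^-1(A)."""
    for x, y, z in product(domain, repeat=3):
        if all(ATOMS[n](x, y, z) == v for n, v in assignment.items()):
            return (x, y, z)
    return None

def lazy_smt(formula):
    blocked = set()
    while True:
        assignment = sat_solve(formula, blocked)       # step 2
        if assignment is None:
            return "unsat", None
        model = theory_check(assignment)               # step 3
        if model is not None:
            return "sat", model
        blocked.add(tuple(assignment[n] for n in NAMES))  # add not-A, retry

if __name__ == "__main__":
    print(lazy_smt(abstraction_F))                      # ('unsat', None)
    print(lazy_smt(lambda b: b["b1"] or b["b3"])[0])    # sat
```

For F, the only Boolean model is b1 ∧ b2 ∧ b3, the theory check rejects it, and the loop ends with Unsat. Real CDCL(T) solvers block a small explanation of the theory conflict rather than the whole assignment, which prunes far more of the search.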

## Interacting with SAT/SMT Solvers

A counterexample is generated. You can use it to fix your program.

![](_page_46_Picture_2.jpeg)

Interact with a solver

A proof is generated. Your program is bug-free!

![](_page_46_Picture_5.jpeg)

(most of the time) ...

Clueless. Basically, the solver does not generate a result because the search cannot complete.

This calls for other approaches that require formal-methods expertise: inductive proofs, finding invariants, theorem proving, etc. If interested, check out 6.512: https://frap.csail.mit.edu/main

## **Verifying Hardware Designs**

- Hardware RTL code works like one big loop: every clock cycle, the next state is computed from the current state and inputs

![](_page_47_Figure_2.jpeg)

```
module divideby3FSM (input clk, input reset, output q);
    reg [1:0] state, nextstate;

    always @ (posedge clk) // state register
        if (reset) state <= 2'b00;
        else state <= nextstate;

    always @ (*) // next state logic
        case (state)
            2'b00: nextstate = 2'b01;
            2'b01: nextstate = 2'b10;
            2'b10: nextstate = 2'b00;
            default: nextstate = 2'b00;
        endcase

    assign q = (state == 2'b00); // output logic
endmodule
```
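The "big loop" view can be made explicit by modeling the FSM in software. The sketch below is a Python model of `divideby3FSM`, written for illustration and not generated by any tool. It mirrors the next-state and output logic, runs the clock loop, and explores the reachable states, which is the kind of question a hardware model checker answers exhaustively. The function names are hypothetical.

```python
def next_state(state):
    """Next-state logic of divideby3FSM (the default case maps to 0)."""
    return {0: 1, 1: 2, 2: 0}.get(state, 0)

def output_q(state):
    """Output logic: q is high only in state 2'b00."""
    return int(state == 0)

def simulate(cycles):
    """Run the clock loop from the reset state and record q each cycle."""
    state, trace = 0, []
    for _ in range(cycles):
        trace.append(output_q(state))
        state = next_state(state)
    return trace

def reachable(initial=0):
    """All states reachable from the reset state."""
    seen, frontier = set(), [initial]
    while frontier:
        state = frontier.pop()
        if state not in seen:
            seen.add(state)
            frontier.append(next_state(state))
    return seen

if __name__ == "__main__":
    print(simulate(6))   # [1, 0, 0, 1, 0, 0]
    print(reachable())   # {0, 1, 2}: state 2'b11 is never reached after reset
```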

## **Toolchains to Verify Hardware**

![](_page_48_Figure_1.jpeg)

## An Example: Verify ISA Correctness

![](_page_49_Figure_1.jpeg)

**RISC-V** Instruction Set Specification

- Question 1: What assertion should we put into our RTL code?
- Question 2: If I have a 5-stage pipelined processor, when do I place the assertion?
- Question 3: If I want to catch some bypass bugs, how should I initialize the state of the processor?

## **A Tentative Plan**

![](_page_50_Figure_1.jpeg)

The instruction encoding below follows the ARM ISA, unlike the RISC-V encoding on the previous slide.

```
assign ADD_retiring = (pre.opcode & 16'b1111_1110_0000_0000) == 16'b0001_1000_0000_0000;
assign ADD_result = pre.R[pre.opcode[8:6]] + pre.R[pre.opcode[5:3]];
assign ADD_Rd = pre.opcode[2:0];
// reset_n is assumed to be active-low, so the check is disabled while reset_n is 0
assert property (@(posedge clk) disable iff (~reset_n)
    ADD_retiring |-> (ADD_result == post.R[ADD_Rd]));
```

End-to-End Verification of ARM® Processors with ISA-Formal; Reid et al.; CAV'16
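The assertion compares the architectural state before (`pre`) and after (`post`) an instruction retires. The sketch below restates the same check as a Python function so the decode fields are easy to follow. The 32-bit register width, the list-based register file, and the function names are assumptions for illustration, not part of the ISA-Formal tooling.

```python
MASK = 0b1111_1110_0000_0000
MATCH = 0b0001_1000_0000_0000
WORD = 0xFFFFFFFF  # assumed 32-bit registers

def add_retiring(opcode):
    """True when the retiring 16-bit instruction matches the ADD encoding."""
    return (opcode & MASK) == MATCH

def check_add(pre_regs, opcode, post_regs):
    """ADD_retiring |-> (ADD_result == post.R[ADD_Rd]), as a Python predicate."""
    if not add_retiring(opcode):
        return True  # the implication holds vacuously for other instructions
    src_a = (opcode >> 6) & 0b111   # opcode[8:6]
    src_b = (opcode >> 3) & 0b111   # opcode[5:3]
    rd = opcode & 0b111             # opcode[2:0]
    expected = (pre_regs[src_a] + pre_regs[src_b]) & WORD
    return post_regs[rd] == expected

if __name__ == "__main__":
    opcode = MATCH | (1 << 6) | (2 << 3) | 0   # R0 = R2 + R1
    pre = [0, 5, 7, 0, 0, 0, 0, 0]
    good_post = [12] + pre[1:]
    bad_post = [11] + pre[1:]   # e.g. a stale value slipped through a bypass
    print(check_add(pre, opcode, good_post), check_add(pre, opcode, bad_post))
```

A formal tool checks this property for every reachable `pre` state and every opcode rather than for one hand-picked case.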

## **A Problem: Register Renaming**

- A performance optimization to resolve WAW (write-after-write) data dependencies

![](_page_51_Figure_2.jpeg)

- Modern out-of-order processors do register renaming on-the-fly
  - Many different implementations, check out 6.823/6.5900
- Problem: How do we verify such processors?

Shadow logic to implement correct renaming logic
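One way to picture such shadow logic, offered as a toy sketch and not as the mechanism used by any particular processor or by ISA-Formal, is extra verification-only code that reconstructs the architectural register view from the rename table. That view can then be compared against a simple ISA model. The class, its immediate-free policy, and all names below are hypothetical.

```python
WORD = 0xFFFFFFFF  # assumed 32-bit registers

def isa_model(regs, program):
    """Reference semantics: each ADD (rd, rs1, rs2) updates architectural regs."""
    regs = list(regs)
    for rd, rs1, rs2 in program:
        regs[rd] = (regs[rs1] + regs[rs2]) & WORD
    return regs

class RenamingCore:
    """Toy core that allocates a fresh physical register for every write."""

    def __init__(self, regs, n_phys=8):
        self.phys = list(regs) + [0] * (n_phys - len(regs))
        self.rename = list(range(len(regs)))        # arch reg -> phys reg
        self.free = list(range(len(regs), n_phys))  # free physical registers

    def add(self, rd, rs1, rs2):
        a = self.phys[self.rename[rs1]]
        b = self.phys[self.rename[rs2]]
        new_phys = self.free.pop(0)
        self.free.append(self.rename[rd])  # toy policy: free old mapping at once
        self.phys[new_phys] = (a + b) & WORD
        self.rename[rd] = new_phys

    def shadow_arch_regs(self):
        """Shadow logic: rebuild the architectural view via the rename table."""
        return [self.phys[p] for p in self.rename]

program = [(0, 1, 2), (0, 0, 3), (1, 0, 0)]  # two back-to-back writes to r0 (WAW)
core = RenamingCore([1, 2, 3, 4])
for instruction in program:
    core.add(*instruction)

if __name__ == "__main__":
    print("ISA model:      ", isa_model([1, 2, 3, 4], program))
    print("shadow view:    ", core.shadow_arch_regs())
    print("raw phys[0..3]: ", core.phys[:4])
```

Reading the physical register file by architectural index gives stale values here, so an assertion written against `post.R[Rd]` needs the shadow view to be meaningful.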

## Summary

- Formal Verification: rigor, exhaustiveness, automation

![](_page_52_Figure_2.jpeg)

- For hardware verification: often needs domain expertise to translate the specification into assertions
- We saw symbolic execution as an example. There exist many other approaches: model checking, theorem proving, etc.
- We saw some algorithms for SAT and SMT. Understand how complex and unpredictable the solver's performance can be

## **Next: Recitations**

- RISC-V System Programming
- Hardware Formal Verification Toolchains

![](_page_53_Picture_2.jpeg)

![](_page_53_Picture_3.jpeg)