Getting started with storage diffs

Why do storage diffs matter?

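Storage diffs are the unsung heroes of EVM-compatible blockchains. A storage diff records a single change to a contract's storage: which slot changed, and its value before and after. Together, storage diffs ensure all nodes agree on the current blockchain state, enable smart contracts to function correctly, save resources, and enhance security by making unauthorised changes detectable.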

Obtaining storage diffs is a complex task that demands considerable computational resources (typically replaying transactions with tracing enabled on an archive node), which puts it out of reach for the average user.

At the same time, storage diffs are very difficult to decode, so the information they hold is out of reach for most network users. If you want to know the value at a specific storage location, you can use web3's getStorageAt function. Tracking storage changes within a block, however, is a challenge: you typically only see the values before and after the entire block, not the individual changes made by each transaction inside it. To make matters more difficult, many storage locations are computed as keccak256 hashes of a key and a slot number, so you cannot iterate over a mapping (hash-map) without knowing its keys in advance. Here is some more theory on the subject.
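To make the hashing point concrete: for a mapping declared at slot p, Solidity stores the value for a value-type key k at keccak256(pad32(k) ++ pad32(p)), and a nested mapping such as allowance[a][b] applies the same rule again, using the location of allowance[a] as the slot. The sketch below computes these locations. Python's hashlib.sha3_256 implements the standardised SHA3-256, whose padding differs from the original Keccak-256 used by Ethereum, so the sketch carries a small, unoptimised pure-Python Keccak-256 to stay dependency-free; in practice you would use a vetted library. The names keccak256, mapping_location and nested_mapping_location are our own. Going the other way, from a location back to its key, would mean inverting the hash, which is why a mapping cannot be enumerated without knowing its keys.

```python
# Minimal, unoptimised Keccak-256 (Ethereum's variant, NOT hashlib.sha3_256)
# plus Solidity's rule for locating mapping entries in storage.

MASK64 = (1 << 64) - 1

# Keccak-f[1600] round constants, one per round (24 rounds).
RC = [
    0x0000000000000001, 0x0000000000008082, 0x800000000000808A, 0x8000000080008000,
    0x000000000000808B, 0x0000000080000001, 0x8000000080008081, 0x8000000000008009,
    0x000000000000008A, 0x0000000000000088, 0x0000000080008009, 0x000000008000000A,
    0x000000008000808B, 0x800000000000008B, 0x8000000000008089, 0x8000000000008003,
    0x8000000000008002, 0x8000000000000080, 0x000000000000800A, 0x800000008000000A,
    0x8000000080008081, 0x8000000000008080, 0x0000000080000001, 0x8000000080008008,
]

# Rotation offsets ROT[x][y] used by the rho step.
ROT = [
    [0, 36, 3, 41, 18],
    [1, 44, 10, 45, 2],
    [62, 6, 43, 15, 61],
    [28, 55, 25, 21, 56],
    [27, 20, 39, 8, 14],
]


def _rotl(v: int, n: int) -> int:
    """Rotate a 64-bit lane left by n bits."""
    return ((v << n) | (v >> (64 - n))) & MASK64


def _keccak_f(a: list[int]) -> list[int]:
    """Apply Keccak-f[1600]; lane (x, y) is stored at a[x + 5 * y]."""
    for rc in RC:
        # theta: mix in the parities of two neighbouring columns
        c = [a[x] ^ a[x + 5] ^ a[x + 10] ^ a[x + 15] ^ a[x + 20] for x in range(5)]
        d = [c[(x - 1) % 5] ^ _rotl(c[(x + 1) % 5], 1) for x in range(5)]
        a = [a[i] ^ d[i % 5] for i in range(25)]
        # rho + pi: rotate every lane and move it to (y, 2x + 3y)
        b = [0] * 25
        for x in range(5):
            for y in range(5):
                b[y + 5 * ((2 * x + 3 * y) % 5)] = _rotl(a[x + 5 * y], ROT[x][y])
        # chi: non-linear mixing along each row
        a = [
            b[x + 5 * y] ^ (~b[(x + 1) % 5 + 5 * y] & b[(x + 2) % 5 + 5 * y])
            for y in range(5)
            for x in range(5)
        ]
        # iota: add the round constant to lane (0, 0)
        a[0] ^= rc
    return a


def keccak256(data: bytes) -> bytes:
    """Keccak-256 with the original Keccak padding (0x01 ... 0x80)."""
    rate = 136  # bytes absorbed per block for a 256-bit digest
    padded = bytearray(data) + b"\x01"
    padded += b"\x00" * (-len(padded) % rate)
    padded[-1] |= 0x80
    state = [0] * 25
    for off in range(0, len(padded), rate):
        for i in range(rate // 8):
            state[i] ^= int.from_bytes(padded[off + 8 * i : off + 8 * i + 8], "little")
        state = _keccak_f(state)
    return b"".join(lane.to_bytes(8, "little") for lane in state[:4])


def mapping_location(key: bytes, slot: int) -> int:
    """Storage location of mapping[key] for a mapping declared at `slot`.

    Value-type keys such as addresses are left-padded to 32 bytes.
    """
    preimage = key.rjust(32, b"\x00") + slot.to_bytes(32, "big")
    return int.from_bytes(keccak256(preimage), "big")


def nested_mapping_location(key0: bytes, key1: bytes, slot: int) -> int:
    """Storage location of mapping[key0][key1]: the rule applied twice."""
    return mapping_location(key1, mapping_location(key0, slot))
```

For example, mapping_location(bytes.fromhex("980cd96ca20d257c2d38236ab404462a929e08f6"), 3) gives the storage location that would hold balanceOf for that address, given the slot-3 layout used for WETH9.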

Token Flow's Ethereum Data Warehouse provides a ready-made storage layout for verified contracts, structured for easy querying. Our datasets include decoded data enriched with semantics, so they should be easy for a regular analyst to use, not just for smart contract experts.

Here are some examples of what data you can retrieve from the blockchain using our datasets.

Finding a smart contract to decode

We'll start by finding a smart contract that is simple enough to walk you through how we process and store storage diff data.

-- Top 10 contracts by number of storage diffs after block 16'000'000
-- Latest block at the time of the query was 18'426'535

SELECT 
  contract_address, 
  count(*) as total 
FROM 
  storage_diffs 
WHERE
  block_number > 16000000 
GROUP BY contract_address 
ORDER BY total desc  
LIMIT 10;

Decoding 0xc02a...56cc2 or WETH

The Wrapped Ether (WETH) smart contract, at address 0xc02a...56cc2, stands out as the most widely used, with almost 230 million storage diffs in the queried range. Its simplicity makes it an excellent starting point for exploring storage diff decoding.

To start, we're going to do a simple analysis of the smart contract source code:

pragma solidity ^0.4.18;

contract WETH9 {
    string public name     = "Wrapped Ether";
    string public symbol   = "WETH";
    uint8  public decimals = 18;

    event  Approval(address indexed src, address indexed guy, uint wad);
    event  Transfer(address indexed src, address indexed dst, uint wad);
    event  Deposit(address indexed dst, uint wad);
    event  Withdrawal(address indexed src, uint wad);

    mapping (address => uint)                       public  balanceOf;
    mapping (address => mapping (address => uint))  public  allowance;
    
    ...
}

The contract does not inherit from any other contract and declares five state variables. It defines no structs.

Reading from the top (declaration order determines the layout in storage), the code excerpt declares the following variables:

  • name - type: string, value: Wrapped Ether;

  • symbol - type: string, value: WETH;

  • decimals - type: uint8, value: 18;

  • balanceOf - type: mapping. It is a hashmap from address to uint;

  • allowance - type: double mapping. It is a nested hashmap from a pair of addresses to uint.

We use JSON to describe the contract's storage layout for later decoding.

Note that storage slot numbering starts at 0, not 1, so the five variables above occupy slots 0 to 4, in order. The Solidity type of each variable tells us how many bytes it occupies in its slot.

{
  "contract_name": "WETH9",
  "slots": [
    {
      "slot_no": 0,
      "variables": [
        {
          "name": "name",
          "type": "string",
          "size": 32
        }
      ]
    },
    {
      "slot_no": 1,
      "variables": [
        {
          "name": "symbol",
          "type": "string",
          "size": 32
        }
      ]
    },
    {
      "slot_no": 2,
      "variables": [
        {
          "name": "decimals",
          "type": "uint8",
          "size": 1
        }
      ]
    },
    {
      "slot_no": 3,
      "variables": [
        {
          "name": "balanceOf",
          "type": "mapping",
          "size": 32,
          "from_type": "address",
          "to_type": "uint"
        }
      ]
    },
    {
      "slot_no": 4,
      "variables": [
        {
          "name": "allowance",
          "type": "mapping",
          "size": 32,
          "from_type": "address",
          "to_type": {
            "type": "mapping",
            "size": 32,
            "from_type": "address",
            "to_type": "uint"
          }
        }
      ]
    }
  ],
  "structs": []
}
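The slot numbers in this layout follow from Solidity's storage rules: state variables are laid out in declaration order starting at slot 0, a value smaller than 32 bytes shares a slot with the previous one when it still fits, and mappings and strings take a full slot (size 32 here). A simplified sketch of that assignment, assuming only the sizes listed in the JSON; it deliberately ignores structs and arrays, which always start a new slot, and the Var and assign_slots names are our own:

```python
from dataclasses import dataclass


@dataclass
class Var:
    name: str
    size: int  # bytes occupied in storage (32 for mappings and strings)


def assign_slots(variables: list[Var]) -> list[tuple[str, int, int]]:
    """Return (name, slot_no, byte_offset) for each variable in declaration order.

    Simplified rule: pack a variable into the current slot if it fits,
    otherwise move on to the next slot. Structs and arrays are not handled.
    """
    layout = []
    slot, used = 0, 0  # current slot number and bytes already used in it
    for var in variables:
        if used + var.size > 32:  # does not fit: start a fresh slot
            slot, used = slot + 1, 0
        layout.append((var.name, slot, used))
        used += var.size
    return layout


weth = [Var("name", 32), Var("symbol", 32), Var("decimals", 1),
        Var("balanceOf", 32), Var("allowance", 32)]
for name, slot, offset in assign_slots(weth):
    print(slot, offset, name)
```

Because decimals uses only 1 byte but the following mapping needs a full 32-byte slot, balanceOf cannot share slot 2 and moves to slot 3, matching the JSON above.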

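Looking at the smart contract source code, we notice that slots 0, 1 and 2 (name, symbol and decimals) are initialised once at deployment and no function in the contract writes to them, so they produce no further storage diffs. Let's look at slots 3 ("balanceOf") and 4 ("allowance") instead.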

-- Filter by variable name, using either the location or the variable column

SELECT 
  block_number, 
  variable, 
  slot, 
  prev_value, 
  curr_value 
FROM storage_diffs 
WHERE 
  contract_address = '0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2' 
  and location like '%balanceOf%'  
LIMIT 10; 

-- Or filter directly by slot number (slot 4 is allowance)

SELECT 
  block_number, 
  variable, 
  slot, 
  prev_value, 
  curr_value 
FROM storage_diffs 
WHERE 
  contract_address = '0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2' 
  and slot = 4 
LIMIT 10; 

Slot 3 - balanceOf

Location

raw_location = 3[0x980cd96ca20d257c2d38236ab404462a929e08f6].0

Slot 3 is a simple mapping from address to uint. In our raw dataset, a mapping location follows the schema slot[key0].field0.

The variable name "balanceOf" comes from the contract's storage layout for slot 3.

Since the key is already given as an address, it remains unchanged.

The ".0" denotes a potential structure field, but in this particular contract, there are no structures, so this value can be disregarded.

// This is what raw data looks like
location = 3[0x980cd96ca20d257c2d38236ab404462a929e08f6].0
slot = 3
variable = 3
type = MAPPING
key0 = 0x980cd96ca20d257c2d38236ab404462a929e08f6
field0 = 0

// This is what decoded data looks like
location = balanceOf[0x980cd96ca20d257c2d38236ab404462a929e08f6]
slot = 3
variable = balanceOf
type = MAPPING
key0 = 0x980cd96ca20d257c2d38236ab404462a929e08f6
field0 = 
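Splitting the raw location into the slot, key and field columns shown above is mechanical. A minimal sketch, assuming every location follows the slot[key].field pattern with hex keys (the regular expressions and the parse_location name are our own illustration, not the dataset's actual tooling); it handles any number of [key].field segments:

```python
import re

# "<slot>" followed by zero or more "[<hex key>].<field>" segments.
_LOCATION = re.compile(r"(\d+)((?:\[0x[0-9a-fA-F]+\]\.\d+)*)")
_SEGMENT = re.compile(r"\[(0x[0-9a-fA-F]+)\]\.(\d+)")


def parse_location(raw: str) -> dict[str, object]:
    """Split a raw location such as '3[0xabc].0' into slot, keyN and fieldN."""
    match = _LOCATION.fullmatch(raw.strip())
    if match is None:
        raise ValueError(f"unrecognised location: {raw!r}")
    record: dict[str, object] = {"slot": int(match.group(1))}
    for i, (key, field) in enumerate(_SEGMENT.findall(match.group(2))):
        record[f"key{i}"] = key.lower()
        record[f"field{i}"] = int(field)
    return record


print(parse_location("3[0x980cd96ca20d257c2d38236ab404462a929e08f6].0"))
# {'slot': 3, 'key0': '0x980cd96ca20d257c2d38236ab404462a929e08f6', 'field0': 0}
```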

Previous value & current value

raw_prev_value = 0x46c61bea48f514d0
raw_curr_value = 0x2c97984e462734d0

First, left-pad each value with zeros to the full 32 bytes (64 hex characters). A variable's bytes are then always read from the end (the right-hand, low-order side) of the padded value.

Variables in slot 3 are 32 bytes long, so each value uses the entire slot. For a smaller variable we would keep only the last size * 2 characters (e.g. size = 20 => the last 40 characters).
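The padding and trimming steps can be sketched as follows. The pad_slot and extract names are hypothetical, and the sketch assumes the variable sits at the low-order (right-hand) end of the slot, as a single unpacked variable does:

```python
def pad_slot(raw_value: str) -> str:
    """Left-pad a hex storage value to the full 32 bytes (64 hex characters)."""
    digits = raw_value.lower().removeprefix("0x")
    if len(digits) > 64:
        raise ValueError("a storage slot holds at most 32 bytes")
    return "0x" + digits.rjust(64, "0")


def extract(raw_value: str, size: int) -> str:
    """Keep the last `size` bytes (size * 2 hex characters) of the padded slot."""
    return "0x" + pad_slot(raw_value)[2:][-2 * size:]


print(pad_slot("0x46c61bea48f514d0"))     # full 64-character slot
print(extract("0x46c61bea48f514d0", 20))  # last 40 characters only
```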

In Solidity, uint is an alias for uint256. To decode it, we used a small helper function in Python:

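# uint_decoder.py

def decode_uint(value: str | int) -> int:
    """Decode a hex storage value such as '0x...46c61bea48f514d0' into an integer."""
    if isinstance(value, int):
        return value  # already decoded
    return int(value, 16)  # base 16 accepts an optional "0x" prefix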

raw_prev_value = 0x00000000000000000000000000000000000000000000000046c61bea48f514d0
prev_value = 5099794321103983824 (uint)

raw_curr_value = 0x0000000000000000000000000000000000000000000000002c97984e462734d0
curr_value = 3213204321103983824 (uint)

Interpretation

The value of balanceOf[0x980c...e08f6] is the number of tokens this address owns, expressed in the token's smallest unit. From slot 1 (symbol) and slot 2 (decimals) we know that the token's symbol is WETH and that it has 18 decimal places, so the raw value must be divided by 10^18.

The balance of 0x980c...e08f6 therefore changed from about 5.0998 WETH to about 3.2132 WETH, a decrease of exactly 1.88659 WETH.
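Converting the raw integer into a WETH amount is a division by 10^decimals. A small sketch using Python's decimal module rather than float, so that every digit is kept (to_token_amount is a hypothetical helper name):

```python
from decimal import Decimal, localcontext


def to_token_amount(raw: int, decimals: int) -> Decimal:
    """Scale a raw uint balance by the token's decimals (18 for WETH)."""
    with localcontext() as ctx:
        ctx.prec = 100  # enough for any uint256 (at most 78 digits)
        return Decimal(raw).scaleb(-decimals)


before = to_token_amount(5099794321103983824, 18)
after = to_token_amount(3213204321103983824, 18)
print(before, after, before - after)
# 5.099794321103983824 3.213204321103983824 1.886590000000000000
```

A float would silently round amounts like these, which matters when balances are summed or compared across many diffs.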

Slot 4 - allowance

Location

raw_location = 4[0x9f7...fff55].0[0xe5c...be4e1].0

Slot 4 is a double mapping: address => (address => uint). In our raw dataset, a double mapping location follows the schema slot[key0].field0[key1].field1.

The variable name "allowance" comes from the contract's storage layout for slot 4.

The ".0" parts denote potential struct fields, but since this contract has no structs, they can be disregarded.

// This is what raw data looks like
location = 4[0x9f7ff7431302ca6ae2a56e2bb50ccc0babffff55].0[0xe5c783ee536cf5e63e792988335c4255169be4e1].0
slot=4
type=DOUBLE MAPPING
key0=0x9f7ff7431302ca6ae2a56e2bb50ccc0babffff55
field0=0
key1=0xe5c783ee536cf5e63e792988335c4255169be4e1
field1=0

// This is what decoded data looks like
location=allowance[0x9f7ff7431302ca6ae2a56e2bb50ccc0babffff55][0xe5c783ee536cf5e63e792988335c4255169be4e1]
slot=4
variable=allowance
key0=0x9f7ff7431302ca6ae2a56e2bb50ccc0babffff55
field0=
key1=0xe5c783ee536cf5e63e792988335c4255169be4e1
field1=
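Producing the decoded record from the parsed columns only needs a slot-to-name lookup built from the layout JSON. A sketch, assuming the layout is available as a string (here a trimmed copy keeping only name, type and size) and that each slot holds a single variable, as in WETH9; variable_names and decode_record are our own illustrative names:

```python
import json

# Trimmed copy of the WETH9 layout JSON shown earlier.
LAYOUT_JSON = """
{"contract_name": "WETH9", "structs": [], "slots": [
  {"slot_no": 0, "variables": [{"name": "name", "type": "string", "size": 32}]},
  {"slot_no": 1, "variables": [{"name": "symbol", "type": "string", "size": 32}]},
  {"slot_no": 2, "variables": [{"name": "decimals", "type": "uint8", "size": 1}]},
  {"slot_no": 3, "variables": [{"name": "balanceOf", "type": "mapping", "size": 32}]},
  {"slot_no": 4, "variables": [{"name": "allowance", "type": "mapping", "size": 32}]}
]}
"""


def variable_names(layout_json: str) -> dict[int, str]:
    """Map slot_no -> variable name (first variable only; packed slots would need offsets)."""
    layout = json.loads(layout_json)
    return {s["slot_no"]: s["variables"][0]["name"] for s in layout["slots"]}


def decode_record(slot: int, keys: list[str], names: dict[int, str]) -> dict[str, object]:
    """Build a decoded record: readable location, variable name, blank struct fields."""
    variable = names[slot]
    record: dict[str, object] = {
        "location": variable + "".join(f"[{key}]" for key in keys),
        "slot": slot,
        "variable": variable,
    }
    for i, key in enumerate(keys):
        record[f"key{i}"] = key
        record[f"field{i}"] = ""  # no structs in WETH9, so fields stay blank
    return record
```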

Previous value & current value

Using the same process as for slot 3, we will get these values:

raw_prev_value = 0x0000000000000000000000000000000000000000000000000000000000000000
prev_value = 0 (uint)

raw_curr_value = 0xffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
curr_value = 115792089237316195423570985008687907853269984665640564039457584007913129639935 (uint)

Interpretation

The value of allowance[0x9f7...fff55][0xe5c...be4e1] is the maximum number of tokens that the spender (key1) may transfer out of the owner's (key0) account.

From slot 1 (symbol) and slot 2 (decimals) we know that the token's symbol is WETH and that it has 18 decimal places.

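The number of tokens that 0xe5c7...be4e1 can transfer from 0x9f7f...fff55 therefore changed from 0 to 115...457.584 WETH, which is 2^256 - 1 (the largest value a uint256 can hold) divided by 10^18. Setting an allowance to this maximum is a common way of granting an "unlimited" approval.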
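Because 2^256 - 1 acts as a sentinel rather than a real amount, an analysis will usually want to flag it instead of reporting a 60-digit WETH figure. A small sketch (MAX_UINT256 and describe_allowance are our own names; the decimals default of 18 assumes WETH):

```python
from decimal import Decimal, localcontext

MAX_UINT256 = 2**256 - 1  # largest value a uint256 slot can hold


def describe_allowance(raw_value: str, decimals: int = 18) -> str:
    """Return 'unlimited' for the max-uint sentinel, else the amount in whole tokens."""
    value = int(raw_value, 16)
    if value == MAX_UINT256:
        return "unlimited"
    with localcontext() as ctx:
        ctx.prec = 100  # keep every digit of the scaled amount
        return format(Decimal(value).scaleb(-decimals), "f")


print(describe_allowance("0x" + "f" * 64))  # unlimited
print(describe_allowance("0x" + "0" * 64))  # 0.000000000000000000
```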
