Getting started with storage diffs
Storage diffs are the unsung heroes of EVM-compatible blockchains. They ensure all nodes agree on the current blockchain state, enabling smart contracts to function correctly, saving resources, and enhancing security by detecting unauthorised changes.
Obtaining storage diffs is a complex task that demands considerable computational resources, making it less accessible to the average consumer.
At the same time, storage diffs are notoriously difficult to decode, which puts the information they hold out of reach of most network users. If you want to know the value at a specific storage slot, you can use web3's getStorageAt function. However, tracking storage changes within a block is a challenge: you typically only see the values before and after the entire block, which is less than ideal. To make matters more difficult, many storage locations are calculated as keccak256 hashes of slot numbers and keys, so you cannot iterate over a hash map (a Solidity mapping) without knowing its keys in advance. Here is some more theory on the subject. Token Flow's Ethereum Data Warehouse provides a ready-made storage layout for verified contracts, structured for easy querying. Our datasets include decoded data enriched with semantics, which means that a regular analyst, not just a smart contract expert, should find them easy to use.
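To make the hashing point concrete, here is a minimal Python sketch of the mapping rule from the Solidity storage-layout documentation: the entry for a value-type key k of a mapping declared at slot p is stored at keccak256(pad32(k) ++ pad32(p)), and a nested mapping applies the same rule again, using the outer entry's location as its slot. Because only the hash is used as the location, the key cannot be recovered from it, which is why a mapping cannot be iterated without knowing its keys. The function names are hypothetical. keccak256 is passed in as a parameter because Python's standard library does not include it (hashlib.sha3_256 is the NIST SHA-3 variant, which pads differently). String and bytes keys, which are hashed without padding, are not covered.

```python
# A minimal sketch of where Solidity stores a mapping entry (value-type keys only).
# For a mapping declared at slot p, mapping[k] lives at keccak256(pad32(k) ++ pad32(p)).
# A nested mapping repeats the rule, using the outer entry's location as the "slot".
# keccak256 is injected because Python's standard library does not provide it.
from typing import Callable

HashFn = Callable[[bytes], bytes]


def pad32(value: int) -> bytes:
    """Encode an unsigned integer (an address is a 160-bit integer) as a 32-byte big-endian word."""
    return value.to_bytes(32, "big")


def mapping_location(key: int, slot: int, keccak256: HashFn) -> int:
    """Storage location of mapping[key] for a mapping declared at `slot`."""
    return int.from_bytes(keccak256(pad32(key) + pad32(slot)), "big")


def nested_mapping_location(key0: int, key1: int, slot: int, keccak256: HashFn) -> int:
    """Storage location of mapping[key0][key1] for a double mapping declared at `slot`."""
    inner_slot = mapping_location(key0, slot, keccak256)
    return mapping_location(key1, inner_slot, keccak256)
```

For example, with the third-party pycryptodome package installed, one could pass `lambda data: keccak.new(digest_bits=256, data=data).digest()` (after `from Crypto.Hash import keccak`) as keccak256 and call `mapping_location(int("0x980cd96ca20d257c2d38236ab404462a929e08f6", 16), 3, keccak256)` to locate that address's balanceOf entry. This pairing is an assumption for illustration, not part of Token Flow's pipeline.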
Here are some examples of what data you can retrieve from the blockchain using our datasets.
We start by finding a smart contract that is simple enough to walk you through how we process and store storage diff data.
-- Top 10 most used smart contracts (by number of storage diffs) after block 16'000'000
-- Latest block at the time of the query was 18'426'535

SELECT
    contract_address,
    count(*) AS total
FROM
    storage_diffs
WHERE
    block_number > 16000000
GROUP BY contract_address
ORDER BY total DESC
LIMIT 10;

Token Flow Ethereum Data Warehouse (Snowflake / xs-warehouse)
The Wrapped Ether (WETH) smart contract, identified by the address 0xc02a...56cc2, stands out as the most widely used, with almost 230 million storage diffs after block 16'000'000. Its simplicity makes it an excellent choice to kickstart our exploration of storage diff decoding.
To start, we're going to do a simple analysis of the smart contract source code:
pragma solidity ^0.4.18;

contract WETH9 {
    string public name = "Wrapped Ether";
    string public symbol = "WETH";
    uint8 public decimals = 18;

    event Approval(address indexed src, address indexed guy, uint wad);
    event Transfer(address indexed src, address indexed dst, uint wad);
    event Deposit(address indexed dst, uint wad);
    event Withdrawal(address indexed src, uint wad);

    mapping (address => uint) public balanceOf;
    mapping (address => mapping (address => uint)) public allowance;

    ...
}
The contract does not use inheritance or other abstractions and declares five state variables: three simple values and two mappings. There are no structs in this contract.
Starting from the top (reflecting the variable organisation in storage), we observe the following in the provided code excerpt:
name - type: string, value: Wrapped Ether;
symbol - type: string, value: WETH;
decimals - type: uint8, value: 18;
balanceOf - type: mapping. It is a hash map from address to uint;
allowance - type: double mapping. It is a double hash map from a pair of addresses to uint.
We use JSON to describe the contract's storage layout for later decoding.
Note that storage slot numbers start at 0, not at 1 (so the five variables above occupy slots 0 to 4, in order). Solidity type definitions tell us how many bytes a variable occupies in a slot.
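The slot assignment described above follows Solidity's sequential layout rule. The sketch below is a simplified, hypothetical reimplementation that covers only what WETH9 needs: value types smaller than 32 bytes are packed into the current slot while they fit, whereas string and mapping variables always start a new slot and fill it. It shows why decimals, although only 1 byte long, sits alone in slot 2: the next variable, balanceOf, is a mapping and must start slot 3. Structs, arrays, constants and inheritance are deliberately ignored.

```python
# A simplified sketch of Solidity's sequential storage-slot assignment.
# Assumption: only WETH9-style cases are handled (value types, string, mapping).
from dataclasses import dataclass


@dataclass
class Var:
    name: str
    type: str
    size: int  # bytes the variable occupies in its slot


FULL_SLOT_TYPES = {"string", "mapping"}


def assign_slots(variables: list[Var]) -> list[tuple[str, int, int]]:
    """Return (name, slot_no, byte_offset) for each variable, numbering slots from 0."""
    layout = []
    slot, used = 0, 0  # current slot number and bytes already used in it
    for var in variables:
        whole_slot = var.type in FULL_SLOT_TYPES
        # Open a fresh slot if the current one is in use and the variable
        # either needs a whole slot or does not fit in the remaining bytes.
        if used > 0 and (whole_slot or used + var.size > 32):
            slot, used = slot + 1, 0
        layout.append((var.name, slot, used))
        used = 32 if whole_slot else used + var.size
    return layout


WETH9_VARS = [
    Var("name", "string", 32),
    Var("symbol", "string", 32),
    Var("decimals", "uint8", 1),
    Var("balanceOf", "mapping", 32),
    Var("allowance", "mapping", 32),
]

for name, slot, offset in assign_slots(WETH9_VARS):
    print(f"slot {slot}, offset {offset}: {name}")
```

The packing branch is what would let, for example, two uint8 variables and an address share a single slot; WETH9 never triggers it, which is why each of its variables ends up in its own slot.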
{
  "contract_name": "WETH9",
  "slots": [
    {
      "slot_no": 0,
      "variables": [
        {
          "name": "name",
          "type": "string",
          "size": 32
        }
      ]
    },
    {
      "slot_no": 1,
      "variables": [
        {
          "name": "symbol",
          "type": "string",
          "size": 32
        }
      ]
    },
    {
      "slot_no": 2,
      "variables": [
        {
          "name": "decimals",
          "type": "uint8",
          "size": 1
        }
      ]
    },
    {
      "slot_no": 3,
      "variables": [
        {
          "name": "balanceOf",
          "type": "mapping",
          "size": 32,
          "from_type": "address",
          "to_type": "uint"
        }
      ]
    },
    {
      "slot_no": 4,
      "variables": [
        {
          "name": "allowance",
          "type": "mapping",
          "size": 32,
          "from_type": "address",
          "to_type": {
            "type": "mapping",
            "size": 32,
            "from_type": "address",
            "to_type": "uint"
          }
        }
      ]
    }
  ],
  "structs": []
}
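Looking at the smart contract source code, we notice that slots 0, 1 and 2 (name, symbol and decimals) hold effectively constant values: they are set when the contract is deployed and no function ever modifies them. (They are not declared constant, which is why they still occupy storage slots.) Let's look at slots 3 ("balanceOf") and 4 ("allowance").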
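Even though slots 0 to 2 never change, decoding them is a useful warm-up. Below is a minimal Python sketch of the encoding that the Solidity storage-layout documentation specifies for short strings: a string of at most 31 bytes is stored left-aligned in its slot, with length × 2 in the lowest-order byte. Longer strings store length × 2 + 1 there and keep their data at keccak256(slot), which this sketch does not handle. The helper names are hypothetical, and the example words are built from the rule rather than read from the WETH contract. A uint8 such as decimals simply occupies the last byte of its slot.

```python
# Hypothetical helpers for Solidity strings stored inline in one 32-byte slot.

def decode_short_string(word_hex: str) -> str:
    """Decode an inline (at most 31 bytes) string from a storage word given as hex."""
    word = bytes.fromhex(word_hex.removeprefix("0x").rjust(64, "0"))
    marker = word[-1]  # lowest-order byte: length * 2 for short strings
    if marker & 1:
        # Odd marker: long string whose data lives at keccak256(slot); not handled here.
        raise ValueError("long string: data is stored at keccak256(slot)")
    length = marker // 2
    return word[:length].decode("utf-8")


def encode_short_string(text: str) -> str:
    """Inverse of decode_short_string, used here only to build example words."""
    data = text.encode("utf-8")
    if len(data) > 31:
        raise ValueError("only strings of at most 31 bytes fit inline")
    return "0x" + (data.ljust(31, b"\x00") + bytes([2 * len(data)])).hex()


# Example words constructed from the encoding rule (not read from the chain).
print(decode_short_string(encode_short_string("Wrapped Ether")))  # Wrapped Ether
print(decode_short_string(encode_short_string("WETH")))           # WETH

# decimals (uint8) occupies the last byte of slot 2.
decimals_word = "0x" + "00" * 31 + "12"
print(int(decimals_word[-2:], 16))  # 18
```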
// Query by the variable's name, via the location (or variable) column,
// or directly by its slot number (slot = 3 for balanceOf)
SELECT
block_number,
variable,
slot,
prev_value,
curr_value
FROM storage_diffs
WHERE
contract_address = '0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2'
and location like '%balanceOf%'
LIMIT 10;

// Query by the variable's name, via the location (or variable) column,
// or, as here, directly by its slot number (slot = 4 for allowance)
SELECT
block_number,
variable,
slot,
prev_value,
curr_value
FROM storage_diffs
WHERE
contract_address = '0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2'
and slot = 4
LIMIT 10;

raw_location = 3[0x980cd96ca20d257c2d38236ab404462a929e08f6].0
Slot 3 is a simple map from address to uint. In our raw datasets the mapping type follows the schema slot[key0].field0. We extracted the semantics of the variable "balanceOf" from the contract.
Since the key is already given as an address, it remains unchanged.
The ".0" denotes a potential structure field, but in this particular contract, there are no structures, so this value can be disregarded.
// This is what raw data looks like
location = 3[0x980cd96ca20d257c2d38236ab404462a929e08f6].0
slot = 3
variable = 3
type = MAPPING
key0 = 0x980cd96ca20d257c2d38236ab404462a929e08f6
field0 = 0
// This is what decoded data looks like
location = balanceOf[0x980cd96ca20d257c2d38236ab404462a929e08f6]
slot = 3
variable = balanceOf
type = MAPPING
key0 = 0x980cd96ca20d257c2d38236ab404462a929e08f6
field0 =
raw_prev_value = 0x46c61bea48f514d0
raw_curr_value = 0x2c97984e462734d0
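The raw-to-decoded location step above can be sketched in Python as follows, assuming raw locations follow the slot[key0].field0[key1].field1 schema and the layout JSON has one variable per mapping slot, as in WETH9. This is a hypothetical illustration, not Token Flow's implementation. The embedded JSON is a trimmed copy of the layout shown earlier, and any non-zero field index is rejected because the sketch does not handle structs.

```python
import json
import re

# Trimmed copy of the layout JSON shown earlier: only the mapping slots and the fields used here.
LAYOUT_JSON = """
{
  "contract_name": "WETH9",
  "slots": [
    {"slot_no": 3, "variables": [{"name": "balanceOf", "type": "mapping"}]},
    {"slot_no": 4, "variables": [{"name": "allowance", "type": "mapping"}]}
  ],
  "structs": []
}
"""

# Raw locations follow the schema slot[key0].field0[key1].field1 ...
RAW_LOCATION = re.compile(r"(\d+)((?:\[0x[0-9a-fA-F]+\]\.\d+)*)")
KEY_AND_FIELD = re.compile(r"\[(0x[0-9a-fA-F]+)\]\.(\d+)")


def slot_names(layout: dict) -> dict[int, str]:
    """Map slot number -> variable name (assumes one variable per slot, as in WETH9)."""
    return {s["slot_no"]: s["variables"][0]["name"] for s in layout["slots"]}


def decode_location(raw: str, names: dict[int, str]) -> str:
    """Rewrite e.g. '3[0xabc].0' as 'balanceOf[0xabc]'."""
    match = RAW_LOCATION.fullmatch(raw)
    if match is None:
        raise ValueError(f"unrecognised raw location: {raw}")
    slot = int(match.group(1))
    pairs = KEY_AND_FIELD.findall(match.group(2))
    # WETH9 has no structs, so every field index must be 0 and can be dropped.
    if any(field != "0" for _key, field in pairs):
        raise NotImplementedError("struct fields are not handled in this sketch")
    return names[slot] + "".join(f"[{key}]" for key, _field in pairs)


names = slot_names(json.loads(LAYOUT_JSON))
print(decode_location("3[0x980cd96ca20d257c2d38236ab404462a929e08f6].0", names))
# balanceOf[0x980cd96ca20d257c2d38236ab404462a929e08f6]
```

The same function handles the double mapping in slot 4, because each additional [key].field pair simply appends another [key] to the decoded name.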
Let's left-pad each raw value with zeros to the full 32 bytes (64 hex characters). Values are always read from the end of the padded word.
Variables in slot 3 are 32 bytes long, so they use the entire slot. For a smaller size, we would keep only the last size × 2 characters (e.g. size = 20 => the last 40 characters).
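A minimal sketch of the padding and trimming steps, assuming raw values arrive as 0x-prefixed hex strings with leading zeros stripped, like raw_prev_value above. pad_word and extract_value are hypothetical helper names. The sketch only covers a variable that starts at byte offset 0 of its slot, which holds for every variable in WETH9.

```python
def pad_word(raw_value: str) -> str:
    """Left-pad a hex storage value with zeros to a full 32-byte word (64 hex characters)."""
    return raw_value.removeprefix("0x").rjust(64, "0")


def extract_value(raw_value: str, size: int) -> str:
    """Keep the last `size` bytes (size * 2 hex characters) of the padded word."""
    if not 1 <= size <= 32:
        raise ValueError("size must be between 1 and 32 bytes")
    return "0x" + pad_word(raw_value)[-2 * size:]


raw_prev_value = "0x46c61bea48f514d0"
print("0x" + pad_word(raw_prev_value))  # 48 zeros followed by 46c61bea48f514d0
print(extract_value("0x12", 1))         # 0x12
```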
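According to Solidity types, uint is an alias for uint256. How can we decode it? We used a helper function in Python:
# uint_decoder.py
def decode_uint(value: str | int) -> int:
    # Parse hex strings such as "0x46c61bea48f514d0" as base 16; pass integers through.
    if isinstance(value, int):
        return value
    return int(value, 16)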
raw_prev_value =0x00000000000000000000000000000000000000000000000046c61bea48f514d0
prev_value = 5099794321103983824 (uint)
raw_curr_value = 0x0000000000000000000000000000000000000000000000002c97984e462734d0
curr_value = 3213204321103983824 (uint)
balanceOf[0x980c...e08f6] shows how many tokens (the value) a specific address owns. Using slot 1 (symbol) and slot 2 (decimals), we know that this token's symbol is WETH and that it has 18 decimal places. The number of tokens belonging to 0x980c...e08f6 changed from 5.099 WETH to 3.213 WETH, a decrease of 1.88659 WETH.
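Converting the raw integers to WETH is a shift by 18 decimal places. A minimal sketch, using a hypothetical format_units helper built on integer divmod so that no precision is lost (a float cannot represent a 19-digit integer such as 5099794321103983824 exactly):

```python
def format_units(raw_amount: int, decimals: int = 18) -> str:
    """Render an integer token amount with `decimals` decimal places, without floats."""
    sign = "-" if raw_amount < 0 else ""
    whole, frac = divmod(abs(raw_amount), 10**decimals)
    if decimals == 0:
        return f"{sign}{whole}"
    return f"{sign}{whole}.{frac:0{decimals}d}"


prev_value = 5099794321103983824  # balanceOf[0x980c...e08f6] before the change
curr_value = 3213204321103983824  # and after
print(format_units(prev_value))               # 5.099794321103983824
print(format_units(curr_value))               # 3.213204321103983824
print(format_units(curr_value - prev_value))  # -1.886590000000000000
```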
raw_location = 4[0x9f7...fff55].0[0xe5c...be4e1].0
Slot 4 is a double mapping: it maps an address to a mapping from address to uint. In our raw datasets the double mapping type follows the schema slot[key0].field0[key1].field1. We extracted the semantics of the variable "allowance" from the contract.
The ".0" denotes a potential structure field, but in this particular contract, there are no structures, so this value can be disregarded.
// This is what raw data looks like
location = 4[0x9f7ff7431302ca6ae2a56e2bb50ccc0babffff55].0[0xe5c783ee536cf5e63e792988335c4255169be4e1].0
slot=4
type=DOUBLE MAPPING
key0=0x9f7ff7431302ca6ae2a56e2bb50ccc0babffff55
field0=0
key1=0xe5c783ee536cf5e63e792988335c4255169be4e1
field1=0
// This is what decoded data looks like
location=allowance[0x9f7ff7431302ca6ae2a56e2bb50ccc0babffff55][0xe5c783ee536cf5e63e792988335c4255169be4e1]
variable=allowance
key0=0x9f7ff7431302ca6ae2a56e2bb50ccc0babffff55
field0=
key1=0xe5c783ee536cf5e63e792988335c4255169be4e1
field1=
Using the same process as for slot 3, we will get these values:
raw_prev_value = 0x0000000000000000000000000000000000000000000000000000000000000000
prev_value = 0 (uint)
raw_curr_value = 0xffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
curr_value = 115792089237316195423570985008687907853269984665640564039457584007913129639935 (uint)
allowance[0x9f7...fff55][0xe5c...be4e1] shows the maximum number of tokens that address1 (key1) can transfer from address0's (key0) account. Using slot 1 (symbol) and slot 2 (decimals), we know that this token's symbol is WETH and that it has 18 decimal places.
The maximum number of tokens that 0xe5c7...be4e1 can transfer from 0x9f7f...fff55 is 115...457.584 WETH, the largest value a uint256 can hold (2^256 - 1).
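In practice this value acts as an "unlimited" approval: WETH9's transferFrom skips the allowance decrement when the allowance equals uint(-1), i.e. 2^256 - 1. A minimal sketch that flags this sentinel when decoding allowance values; the helper names are hypothetical, and the amount is formatted with integer arithmetic so all 78 digits survive.

```python
# Hypothetical helpers for labelling WETH allowance values.
MAX_UINT256 = 2**256 - 1  # uint(-1) in Solidity 0.4, the largest uint256


def format_units(raw_amount: int, decimals: int = 18) -> str:
    """Render a non-negative integer amount with `decimals` decimal places (no floats)."""
    whole, frac = divmod(raw_amount, 10**decimals)
    if decimals == 0:
        return str(whole)
    return f"{whole}.{frac:0{decimals}d}"


def describe_allowance(raw_value: str, decimals: int = 18) -> str:
    """Decode a hex allowance value and flag the unlimited-approval sentinel."""
    value = int(raw_value, 16)
    if value == MAX_UINT256:
        return f"unlimited ({format_units(value, decimals)} WETH)"
    return f"{format_units(value, decimals)} WETH"


print(describe_allowance("0x" + "0" * 64))  # 0.000000000000000000 WETH
print(describe_allowance("0x" + "f" * 64))
```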