Storage diffs - some theory

What is storage? What are storage diffs?

Storage is the persistent, on-chain data area in which smart contracts keep their state between transactions.

Ethereum and other EVM-compatible chains use a structure known as the Merkle Patricia Trie to store and organize contract data efficiently. Each contract has its own storage, which behaves like a key-value store: a vast map with 2^256 slots, each holding 32 bytes.

Storage can be both read and written, and existing values can be overwritten. Because it lives on-chain, storing data is expensive: writes to storage consume a substantial amount of gas, and the price of gas is itself volatile.

A storage diff is the set of changes made to the storage of a smart contract between two consecutive transactions or states.
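As a minimal sketch of this definition, the Python snippet below compares two snapshots of a contract's storage. The dictionary representation (slot number mapped to an integer value, with absent slots read as zero) and the function name storage_diff are assumptions made for illustration, not part of any particular tool.

```python
# Hypothetical model: a snapshot is a dict {slot_number: value}.
# Slots missing from the dict are treated as zero, like untouched EVM slots.

def storage_diff(before, after):
    """Return {slot: (old_value, new_value)} for every slot whose value changed."""
    diff = {}
    for slot in sorted(set(before) | set(after)):
        old = before.get(slot, 0)
        new = after.get(slot, 0)
        if old != new:
            diff[slot] = (old, new)
    return diff


before = {0: 100, 1: 7}
after = {0: 100, 1: 8, 2: 1}
changes = storage_diff(before, after)
print(changes)  # {1: (7, 8), 2: (0, 1)}
```

Slot 0 is absent from the result because its value did not change between the two snapshots.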

Here's how storage diffs work in the EVM:

  1. Initial State: When a smart contract is deployed, it starts with an initial state, where the storage is essentially empty.

  2. Transactions: As transactions are executed on the contract (e.g., function calls or interactions), they can modify the values stored in the contract's storage. These modifications create a "diff" or a record of changes to the storage.

  3. Storage Changes: The EVM tracks and records the changes made to the storage during the execution of each transaction. It keeps a record of which storage slots have been modified and the new values associated with them.

  4. Final State: After all the transactions have been executed, applying their diffs in order to the initial state yields the final state of the contract's storage. This final state reflects the cumulative changes made by all the transactions.

  5. Gas Costs: Modifying storage incurs gas costs in Ethereum. The more storage slots you change and the more data you modify, the higher the gas costs associated with the transaction.
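The steps above can be sketched in a few lines of Python. This is only an illustrative model: the per-transaction diffs are made-up values, each diff maps a slot to its new value, and the name apply_diffs is hypothetical.

```python
def apply_diffs(diffs):
    """Apply per-transaction diffs in order, starting from empty storage.

    Returns the final storage and the set of slots that were written at least once.
    """
    storage = {}      # initial state: every slot reads as zero
    touched = set()   # slots modified by any transaction
    for diff in diffs:
        for slot, new_value in diff.items():
            touched.add(slot)
            if new_value == 0:
                storage.pop(slot, None)  # a zeroed slot is the same as an empty one
            else:
                storage[slot] = new_value
    return storage, touched


tx_diffs = [
    {0: 1000},         # transaction 1 writes slot 0
    {0: 900, 1: 100},  # transaction 2 updates slot 0 and writes slot 1
    {1: 0},            # transaction 3 clears slot 1
]
final_state, touched = apply_diffs(tx_diffs)
print(final_state)      # {0: 900}
print(sorted(touched))  # [0, 1]
```

The final state only shows the cumulative result, while the individual diffs preserve the history of which transaction changed which slot.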

Understanding storage diffs is crucial for developers, especially when optimising gas usage and analysing the state of smart contracts. It helps in tracking changes to a contract's data and can be useful for debugging and auditing smart contract behaviour.

Token Flow's Ethereum Data Warehouse features a readily available storage layout for verified contracts, structured for querying with ease. 😎

How does Ethereum store data in storage?

To grasp how Ethereum stores data, let's start by getting to know what a "slot" is.

A slot is the smallest unit of storage that the Ethereum Virtual Machine (EVM) reads or writes, and its size is always 32 bytes. The concept resembles blocks in traditional file systems such as ext4 or NTFS, which divide a large address space into fixed-size units that are easier to manage.

Consider this example:

contract TestContract {
    int a;   // int is an alias for int256: 32 bytes
    bool b;  // 1 byte
    int c;   // 32 bytes
}

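The storage layout for TestContract looks like this: a (32 bytes) fills slot 0; b (1 byte) starts slot 1; c (32 bytes) does not fit in the 31 bytes remaining in slot 1, so it takes slot 2.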
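The packing rule for simple value types can be modeled with a short Python sketch. It covers only fixed-size value types; the table of type sizes and the function name pack_value_types are assumptions chosen for this illustration.

```python
SLOT_SIZE = 32  # bytes per storage slot

# Sizes in bytes of a few Solidity value types (int is an alias for int256).
TYPE_SIZES = {"int": 32, "uint256": 32, "bool": 1, "address": 20, "bytes16": 16}


def pack_value_types(variables):
    """Assign (slot, byte offset) to each (name, type) pair in declaration order."""
    layout = {}
    slot, offset = 0, 0
    for name, type_name in variables:
        size = TYPE_SIZES[type_name]
        if offset + size > SLOT_SIZE:
            # Not enough room left in the current slot: move to the next one.
            slot, offset = slot + 1, 0
        layout[name] = (slot, offset)
        offset += size
    return layout


layout = pack_value_types([("a", "int"), ("b", "bool"), ("c", "int")])
print(layout)  # {'a': (0, 0), 'b': (1, 0), 'c': (2, 0)}
```

Each variable ends up in its own slot here, because the 1-byte bool sits between two variables that each need a full 32 bytes.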

Let’s examine another example with three contracts - A, B, and C - which are defined in files A.sol, B.sol, and C.sol, respectively.

File A.sol:

import "./B.sol";
import "./C.sol";

struct TestStruct {}

contract A is B, C {}

File B.sol:

import { TestStruct } from "./C.sol";

contract B {
    bool[2] public arrBools;
    bytes16 public varBytes2;
}

File C.sol:

struct TestStruct {
    uint256 value;
    address owner;
}

contract C {
    TestStruct[] public arrStructs;
    bytes16 public varBytes1;
}

Let's explore how these contracts are laid out in storage by following these steps:

1. Resolve imports:

  • File A.sol imports files B.sol and C.sol, granting access to all definitions stored there.

  • File B.sol utilizes a single named import from C.sol for the structure TestStruct.

  • File C.sol has no imports.

2. Resolve inheritance:

  • Contract A inherits from B and C, in that order.

  • Contracts B and C are root contracts with no parents.

3. Gather variables:

  • As contract B comes first in the inheritance order, its variables are: arrBools and varBytes2.

  • Next is contract C with variables arrStructs and varBytes1.

  • Finally, contract A has no variables.

4. Place variables in slots:

  • The first variable, arrBools, is a static array of two bools. Arrays always start a new slot and occupy whole slots, so it lands in slot 0 (slots are numbered from 0), and the variable that follows it cannot share that slot.

  • The second variable, varBytes2, has a size of 16 bytes. It starts in slot 1, takes 16 bytes, and leaves 16 bytes free.

  • Next is arrStructs, which is a dynamic array of TestStructs. A TestStruct holds a uint256 (32 bytes) and an address (20 bytes), 52 bytes of data in total. However, a dynamic array always takes up exactly one slot of its own in the layout. That slot stores only the array's length, while the elements live elsewhere in storage, starting at the keccak256 hash of the slot number. Because it cannot share the half-filled slot 1, arrStructs is placed in slot 2.

  • Finally, varBytes1 occupies 16 bytes in slot 3.

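This is the final storage layout:

  • Slot 0: arrBools (2 of 32 bytes used)

  • Slot 1: varBytes2 (16 bytes used, 16 bytes free)

  • Slot 2: arrStructs (holds the array length)

  • Slot 3: varBytes1 (16 bytes used, 16 bytes free)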

We can draw a few observations from this layout:

  1. File A.sol contains an empty definition of TestStruct that is not used anywhere. Because each file has its own import table, we cannot arbitrarily pick a definition just because the name matches. We must trace the name back, file by file, to the definition that is visible in the file where it is used (here, the TestStruct defined in C.sol). This situation is not uncommon in production contracts spanning tens or hundreds of files where external libraries are embedded in the code.

  2. It is apparent that we can save storage space. In contract C, we can swap variables arrStructs and varBytes1, resulting in varBytes1 and varBytes2 occupying one slot, saving us a slot.

  3. Assuming that the code does not require an array, arrBools can be split into two plain bool variables. Unlike an array, plain bools can share a slot with the variable that follows them, so both bools and varBytes2 (2 + 16 = 18 bytes) would fit into a single slot. This saves a slot in contract A, and also when contract B is inherited by other contracts or is used standalone.
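The slot assignment from step 4 and the saving described in observation 2 can be checked with a small Python model. It is a simplified sketch, not the compiler's algorithm: each variable is described by a hypothetical tuple (name, size in bytes, own_slots), where own_slots marks arrays, which start on a fresh slot and force the next variable onto a fresh slot as well. The declaration order follows the one described in step 3.

```python
SLOT_SIZE = 32  # bytes per storage slot


def assign_slots(variables):
    """Return ({name: first_slot}, total_slots) for (name, size, own_slots) tuples."""
    slots = {}
    slot, offset = 0, 0
    for name, size, own_slots in variables:
        if own_slots:
            if offset > 0:                 # leave a partially filled slot behind
                slot, offset = slot + 1, 0
            slots[name] = slot
            slot += (size + SLOT_SIZE - 1) // SLOT_SIZE  # occupy whole slots
        else:
            if offset + size > SLOT_SIZE:  # does not fit: start the next slot
                slot, offset = slot + 1, 0
            slots[name] = slot
            offset += size
    total = slot + (1 if offset > 0 else 0)
    return slots, total


arr_bools = ("arrBools", 2, True)        # bool[2]: two 1-byte items in one slot
var_bytes2 = ("varBytes2", 16, False)
arr_structs = ("arrStructs", 32, True)   # dynamic array: one slot for the length
var_bytes1 = ("varBytes1", 16, False)

original = [arr_bools, var_bytes2, arr_structs, var_bytes1]
swapped = [arr_bools, var_bytes2, var_bytes1, arr_structs]

print(assign_slots(original))
# ({'arrBools': 0, 'varBytes2': 1, 'arrStructs': 2, 'varBytes1': 3}, 4)
print(assign_slots(swapped))
# ({'arrBools': 0, 'varBytes2': 1, 'varBytes1': 1, 'arrStructs': 2}, 3)
```

With the swapped order, varBytes1 fills the 16 bytes left free next to varBytes2, so the contract needs three slots instead of four.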

The challenging aspect is determining the correct order of variables within the storage layout. Numerous rules and guidelines must be followed, such as structure nesting or handling dynamic/static arrays. However, the most crucial mechanism to comprehend is C3 Linearization and the Inheritance Mechanism.

Inheritance is a fundamental concept in object-oriented programming that allows for the creation of new classes based on existing ones. Solidity employs the C3 Linearization algorithm, the same algorithm Python uses to compute its Method Resolution Order (MRO), to manage multiple inheritance and resolve potential conflicts.

C3 Linearization ensures that the inheritance hierarchy is consistently linearized, simplifying reasoning about the order in which base contracts are initialized and functions are called. This mechanism guarantees that all base contracts are initialized in the correct order, particularly important when handling multiple inheritance scenarios.

Direct base contracts are listed after the "is" keyword from the most base-like to the most derived. The algorithm uses that order to enforce a single, strict order in which base contracts are initialized, ensuring a consistent execution flow is maintained.

contract A {}
contract B is A {}
contract C is A {}
contract D is B, C {}

In this case, the C3 Linearization algorithm determines the inheritance order, from most derived to most base, as follows: D, C, B, A. Constructors run in the reverse of this order: contract A's constructor is called first, followed by the constructors of contracts B and C, and finally contract D's constructor.
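The constructor order for this example can be reproduced with a compact Python sketch of C3 Linearization. It is an illustrative model rather than the compiler's code: the inheritance graph is given as a hypothetical dictionary mapping each contract to its direct bases in the order they appear after "is", and the function names are made up for this example.

```python
def c3_merge(sequences):
    """Standard C3 merge: repeatedly take the first head that appears in no tail."""
    result = []
    sequences = [list(s) for s in sequences if s]
    while sequences:
        for seq in sequences:
            head = seq[0]
            if not any(head in other[1:] for other in sequences):
                break
        else:
            raise ValueError("inheritance graph cannot be linearized")
        result.append(head)
        # Remove the chosen contract everywhere and drop exhausted sequences.
        sequences = [[c for c in s if c != head] for s in sequences]
        sequences = [s for s in sequences if s]
    return result


def linearize(contract, bases):
    """Most-derived-first linearization of a contract."""
    # Solidity lists direct bases from most base-like to most derived,
    # the reverse of Python's convention, so the list is reversed here.
    parents = list(reversed(bases.get(contract, [])))
    return [contract] + c3_merge([linearize(p, bases) for p in parents] + [parents])


def constructor_order(contract, bases):
    """Constructors run from the most base contract to the most derived one."""
    return list(reversed(linearize(contract, bases)))


bases = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
print(constructor_order("D", bases))  # ['A', 'B', 'C', 'D']
```

State variables are gathered in the same base-first order, which is why the variables of B came before those of C in the earlier example where contract A inherits from B and C.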

Applications of Storage Memory Layouts

Storage memory layouts are crucial in optimizing smart contract development and analysis on the Ethereum network. By understanding how data is stored and organized, developers and analysts can enhance contract efficiency, troubleshoot issues, and analyze contract behavior more effectively.

Understanding the storage layout is vital when upgrading smart contracts to ensure seamless transitions without disrupting existing data. Developers can use their knowledge of storage layouts to design upgradeable contracts, preserving stored data integrity while implementing new features or improvements.
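One common rule of thumb for upgrades is that existing variables must keep their position and type, and new variables may only be appended. The Python sketch below checks just that rule. It is a deliberate simplification under that assumption, and the variable names and the function is_append_only are hypothetical.

```python
def is_append_only(old_vars, new_vars):
    """True if new_vars keeps every (name, type) of old_vars in place and only appends."""
    if len(new_vars) < len(old_vars):
        return False  # a variable was removed
    return all(old == new for old, new in zip(old_vars, new_vars))


v1 = [("owner", "address"), ("total", "uint256")]
v2_ok = v1 + [("paused", "bool")]    # new variable appended at the end
v2_bad = [("paused", "bool")] + v1   # new variable shifts the existing ones

print(is_append_only(v1, v2_ok))   # True
print(is_append_only(v1, v2_bad))  # False
```

In the second case every existing variable moves to a different position, so the upgraded code would read old data from the wrong place.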

Storage layouts play a significant role in optimizing gas usage and storage costs within the Ethereum ecosystem. By understanding how data is organized and stored in smart contracts, users can implement efficient data structures and techniques that reduce the overall storage footprint. This, in turn, leads to decreased gas costs associated with contract execution and data manipulation. Moreover, a well-organized storage layout helps minimize the risk of unintended contract behavior, which could otherwise result in excessive gas consumption or security vulnerabilities.
