Guides
Hands on! Some examples
In this guide we will show some examples and go through some useful tips for data analyses on TF data. We will assume you have basic understanding of SQL and of the Ethereum ecosystem.
We will have examples covering
Token Balances
Let’s start with a simple question, given a token and a wallet address, we want to query for the history of the token balances for the wallet.
Normally, you would do this by aggregating events data emitted by the token contract. While you can still do that with Token Flow’s data, using storage diffs make it trivial.
Let’s go over the query. We’re getting diffs for the Ethereum USDT contract. If we look at the contract, we can see that there is a balances
storage variable. This variable stores the balances for every address.
Specifically, it’s a mapping address->int
. We subset for storage diffs modifying this variable, then we specify that we want only those modifying the value associated to the wallet address we’re interested in. We do this by subsetting on the hashmap_keys
variable, which is an array that contains all the mapping keys associated to the specified storage variable.
Since we know the mapping only takes one argument, we access the 0-th element of the array using the syntax hashmap_keys[0]
, and we can subset on a random address to access its balance.
Lastly we make sure to exclude reverted
actions as they’re not useful for this analysis.
We now have the relevant storage diffs. Each storage diff is specified by a storage location, a prev_value
, the value before the diff, and a curr_value
, the value after the diff.
Let’s say we want to get end-of-day balances, therefore we need to get the latest value for each day for the curr_value
variable. Since there can be multiple diffs done at the same timestamp, i.e. in the same block, we need to use another way to order them.
A storage diff is uniquely identified by its block_number
and its order_index
, the order in which it was applied in the block.
With the help of the last_value
window function we are able to retrieve this value.
Every value is stored agnostically as a string, so in this case we make sure to convert it to an integer type. Since balances can take very high values, we use the biggest integer type available in Snowflake, which is NUMBER(38,0)
to avoid losing precision.
After adjusting for token decimals, we will have the results readily available!
Hourly Uniswap v3 prices
Now that storage diffs are clearer to you, we can move to a slightly harder example.
Let’s say we are interested in hourly prices of WBTC on Uniswap v3 in USDC units.
The idea is we can look at the pool contract for the pair WBTC-USDC and compute its price.
Normally, without diffs, you would look at the events emitted after every trade, and compute the prices from the exchanged volumes.
But the Uniswap v3 Pool contract already contains a storage slot for the price. More precisely, the slot0
storage variable is a struct - a named array - and the element slot0.qrtPriceX96
contains the price in scaled form. You can refer to this article by Uniswap for the derivation of the actual price.
Let’s see how we can obtain this value with a simple query on the storage_diffs
table:
In the first CTE, we extract the events emitted by the Uniswap Factory contract when a pool is created, with the address of the pool and of the involved tokens. Pay attention to how we access the named fields of the parameters
array: the syntax to access fields of named arrays is column[name]
.
In the second CTE, we look for a pool for the pair WBTC-USDC, we take the address of the pool, and we keep track if the tokens are defined as token0
or token1
in the pool. In this case, the procedure will get the inverse of the price.
In the third CTE, we query the storage diffs table for the pool contract we obtained and we compute the median value of the prices for every hour.
Pay attention to the syntax to obtain elements of a struct
: after subsetting for the slot0
we use the structs
variable to access the needed element.
In the final statement, we simply perform the derivation of the price, and adjust for decimals.
Once again, storage diffs helped obtain the result in a very intuitive way, and without the need to perform pre-aggregations on event data.
Total fees earned for a Uniswap v3 Position
We can also look at something that is simply not possible using events data.
When we put tokens into a Uniswap position and act as liquidity providers, we receive fees in return, if we own the liquidity pool NFT. These fees can either be collected by the NFT owner or remain uncollected in the contract.
While collected fees are retrievable from emitted events, there is no easy way to obtain the unclaimed fees without calling a function of the contract.
But there is a storage variable that keeps track of exactly that!
The positions
storage variable is a mapping bytes->struct
, the struct in turn contains another struct: Info
, which contains the integers tokensOwed0
and tokensOwed1
representing the amounts owed each of the tokens in the pool. In this case we have a nested struct, but you will see how accessing nested items looks easy and familiar.
We selected a random market maker with an open position at the time of writing, we access the relevant item by subsetting on the hashmap_keys
variable using the position key, which is just a hash of the owner address, the lower and the upper tick, as in Solidity keccak256(abi.encodePacked(owner, tickLower, tickUpper))
. This can be easily computed beforehand and we leave it outside of the scope of this article. Then we access the nested item in the struct as "Info.tokensOwed0"
and "Info.tokensOwed1"
using the structs
variable.
Examining Starknet Fee Market
Since the launch of STARK token, users have been able to pay gas fees using ETH (denominated as WEI), or STARK (denominated as FRI). We can have a look at how the fee market on Starknet is behaving by analyzing the daily fees paid in ETH and STARK, by converting them to a common unit, for example USDC.
We use DEX prices to convert ETH and STARK to USDC. In this case we use Ekubo, one of the leading DEXs on Starknet.
First, we look at swap events emitted by the Ekubo Factory contract. We examine pairs for STARK (0x04718f5a0fc34cc1af16a1cdee98ffb20c31f5cd61d6ab07201858f4287c938d
) and WETH (0x049d36570d4e46f48e99674bd3fcc84644ddd6b96f7c741b1562b82f9e004dc7
) with USDC (0x053c91253bc9682c04929ca02ed00b3e423f6710d2ee7e0d5ebb06f3ecf368a8
).
After adjusting for decimals (USDC has 6 decimals, STARK and WETH have 18) we can use the traded amount of USDC for unit of STARK and ETH as our price. We take the daily (group by date(block_timestamp)
) median price for our analysis.
In addition, we recommend casting all big integers to NUMBER(38,0)
, the biggest integer type available in our backend, to avoid overflow issue. The casting operator is ::
.
We then look at the transactions
table, which stores all the information about the paid fees.
Specifically the actual_fee
and the actual_fee_unit
columns contain everything we need for each transaction.
The actual_fee_unit
specifies the currency in which the fees were paid (WEI or FRI).
The actual_fee
specifies the total amount of fees paid for the transaction, for both priority and execution.
The result is a daily overview of the Starknet fee market.
Throughout the query we also subset on the last 3 months of data using the condition
where block_timestamp >= dateadd(month, -3, current_date())
and we exclude the current day to avoid incomplete analysis.
and block_timestamp < current_date()
You can use the block_timestamp
and block_date
columns to apply any such filter to your analyses.
Conclusion
We have shown you some examples to help you unleash the power of blockchain data through the help of storage diffs, an innovative way to look at data. Now it’s your turn! Do you have any use case you’ve struggled to solve with existing tools? Do you want to know how storage diffs can help you tackle it? Hop on our Discord and let us know.
Useful External Documentation
Known Limitations
The maximum integer representable within Token Flow Studio's backend is 99999999999999999999999999999999999999
(i.e. 10³⁹ - 1 ). A lot of values in the Ethereum ecosystem are stored as int256
(max value: 2²⁵⁶) so they could potentially overflow this limit.
Our backend's limit should be well enough for most use cases, but if you encounter issues, you should consider switching to one of our paid plans to get direct DB access, which will allow you to use our data outside of the DB and in your own application, circumventing this limit.
Last updated