Guides

Hands on! Some examples

In this guide we will show some examples and go through some useful tips for data analyses on TF data. We will assume you have basic understanding of SQL and of the Ethereum ecosystem.

We will have examples covering

Token Balances

Let’s start with a simple question, given a token and a wallet address, we want to query for the history of the token balances for the wallet.

Normally, you would do this by aggregating events data emitted by the token contract. While you can still do that with Token Flow’s data, using storage diffs make it trivial.

select distinct date(block_timestamp) as _date,
last_value(curr_value) over (partition by date(block_timestamp) order by block_number, order_index) 
	:: NUMBER(38,0) / pow(10, 6) as balance
from ethereum.core.storage_diffs
where contract_address='0xdac17f958d2ee523a2206206994597c13d831ec7' -- USDT Optimism address
and variable='balances'
and hashmap_keys[0]='0xad3b67bca8935cb510c8d18bd45f0b94f54a968f' -- wallet address
and not reverted
order by _date desc

Let’s go over the query. We’re getting diffs for the Ethereum USDT contract. If we look at the contract, we can see that there is a balances storage variable. This variable stores the balances for every address.

Specifically, it’s a mapping address->int. We subset for storage diffs modifying this variable, then we specify that we want only those modifying the value associated to the wallet address we’re interested in. We do this by subsetting on the hashmap_keys variable, which is an array that contains all the mapping keys associated to the specified storage variable.

Since we know the mapping only takes one argument, we access the 0-th element of the array using the syntax hashmap_keys[0], and we can subset on a random address to access its balance.

Lastly we make sure to exclude reverted actions as they’re not useful for this analysis.

We now have the relevant storage diffs. Each storage diff is specified by a storage location, a prev_value, the value before the diff, and a curr_value, the value after the diff.

Let’s say we want to get end-of-day balances, therefore we need to get the latest value for each day for the curr_value variable. Since there can be multiple diffs done at the same timestamp, i.e. in the same block, we need to use another way to order them.

A storage diff is uniquely identified by its block_number and its order_index, the order in which it was applied in the block.

With the help of the last_value window function we are able to retrieve this value.

Every value is stored agnostically as a string, so in this case we make sure to convert it to an integer type. Since balances can take very high values, we use the biggest integer type available in Snowflake, which is NUMBER(38,0) to avoid losing precision.

After adjusting for token decimals, we will have the results readily available!

Hourly Uniswap v3 prices

Now that storage diffs are clearer to you, we can move to a slightly harder example.

Let’s say we are interested in hourly prices of WBTC on Uniswap v3 in USDC units.

The idea is we can look at the pool contract for the pair WBTC-USDC and compute its price.

Normally, without diffs, you would look at the events emitted after every trade, and compute the prices from the exchanged volumes.

But the Uniswap v3 Pool contract already contains a storage slot for the price. More precisely, the slot0 storage variable is a struct - a named array - and the element slot0.qrtPriceX96 contains the price in scaled form. You can refer to this article by Uniswap for the derivation of the actual price.

Let’s see how we can obtain this value with a simple query on the storage_diffs table:

with univ3_pool_events as (
    select parameters::$pool as pool,
    parameters['token0'] as token0,
    parameters['token1'] as token1
    from ethereum.core.events
    where contract_address = '0x1f98431c8ad98523631ae4a59f267346ea31f984' /* UniV3 Factory */
    and event_name='PoolCreated'
    and not reverted
),

usdc_wbtc_pool as (
    select *, 
    /* we use this variable to keep track if USDC is token0 (flipped=TRUE) or token1 (flipped=FALSE) */
    if(token0='0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48', TRUE, FALSE) as flipped 
    from univ3_pool_events
    where (token0='0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48' /* USDC */ and token1='0x2260fac5e5542a773aa44fbcfedf7c193bc2c599' /* WBTC */)
    or (token0='0x2260fac5e5542a773aa44fbcfedf7c193bc2c599' and token1='0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48')
    limit 1 /* we only really need one pair, in case there is more they're irrelevant */
),

sqrtPricex96_hourly as (
    select distinct 
    date_trunc('hour', block_timestamp) as hour,
    median(curr_value :: NUMBER(38,0)) over (partition by hour)  as median_sqrtPrice,
    (select flipped from usdc_wbtc_pool) as flipped
    from ethereum.core.storage_diffs
    and contract_address=(select pool from usdc_wbtc_pool)
    /* Price is stored in the slot0 variable, which is a struct. We're interested in the sqrtPriceX96 field */
    and variable='slot0' 
    and structs[0] = 'Slot0.sqrtPriceX96'
    and not reverted
)

/* refer to <https://blog.uniswap.org/uniswap-v3-math-primer> for price calculation from SqrtPriceX96 */
select hour,
/* we derive the price and adjust for decimals, WBTC has 8 decimals, USDC 6 */
pow(pow(median_sqrtPrice/pow(2,96), 2) / pow(10, if(flipped, 8-6, 6-8)), if(flipped, -1, 1)) as price
from sqrtPricex96_hourly
order by hour desc

In the first CTE, we extract the events emitted by the Uniswap Factory contract when a pool is created, with the address of the pool and of the involved tokens. Pay attention to how we access the named fields of the parameters array: the syntax to access fields of named arrays is column[name].

In the second CTE, we look for a pool for the pair WBTC-USDC, we take the address of the pool, and we keep track if the tokens are defined as token0 or token1 in the pool. In this case, the procedure will get the inverse of the price.

In the third CTE, we query the storage diffs table for the pool contract we obtained and we compute the median value of the prices for every hour.

Pay attention to the syntax to obtain elements of a struct: after subsetting for the slot0 we use the structs variable to access the needed element.

In the final statement, we simply perform the derivation of the price, and adjust for decimals.

Once again, storage diffs helped obtain the result in a very intuitive way, and without the need to perform pre-aggregations on event data.

Total fees earned for a Uniswap v3 Position

We can also look at something that is simply not possible using events data.

When we put tokens into a Uniswap position and act as liquidity providers, we receive fees in return, if we own the liquidity pool NFT. These fees can either be collected by the NFT owner or remain uncollected in the contract.

While collected fees are retrievable from emitted events, there is no easy way to obtain the unclaimed fees without calling a function of the contract.

But there is a storage variable that keeps track of exactly that!

The positionsstorage variable is a mapping bytes->struct, the struct in turn contains another struct: Info, which contains the integers tokensOwed0 and tokensOwed1 representing the amounts owed each of the tokens in the pool. In this case we have a nested struct, but you will see how accessing nested items looks easy and familiar.

with uncollected_fees_token0 as (
    select distinct '0xce14d6cf27212027e369ef2714391dfc0da11fbcf94ee8a44df3f9bf571867d2' as position_key, -- position key
    curr_value as token0_owed
    from ethereum.core.storage_diffs
    where contract_address='0xcbcdf9626bc03e24f779434178a73a0b4bad62ed' -- WBTC-ETH pool
    and variable='positions' -- positions is a mapping bytes->struct
    and hashmap_keys[0]='0x448acc3c7c979f4a7e84a22cd216bc602670e4a0568fb7f67f65a069526ef30'
    and structs[0]='Info.tokensOwed0' -- getting the relevant struct item
    and not reverted
    order by block_number desc, order_index desc
    limit 1
    ),

uncollected_fees_token1 as (
    select distinct '0xce14d6cf27212027e369ef2714391dfc0da11fbcf94ee8a44df3f9bf571867d2' as position_key, -- position key
    curr_value as token1_owed
    from ethereum.core.storage_diffs
    where contract_address='0xcbcdf9626bc03e24f779434178a73a0b4bad62ed'
    and variable='positions' -- positions is a mapping address->struct
    and hashmap_keys[0]='0x448acc3c7c979f4a7e84a22cd216bc602670e4a0568fb7f67f65a069526ef30'
    and structs[0]='Info.tokensOwed1' -- getting the relevant struct item
    and not reverted
    order by block_number desc, order_index desc 
    limit 1
)

select * from uncollected_fees_token0 join uncollected_fees_token1 using(position_key)

We selected a random market maker with an open position at the time of writing, we access the relevant item by subsetting on the hashmap_keys variable using the position key, which is just a hash of the owner address, the lower and the upper tick, as in Solidity keccak256(abi.encodePacked(owner, tickLower, tickUpper)). This can be easily computed beforehand and we leave it outside of the scope of this article. Then we access the nested item in the struct as "Info.tokensOwed0" and "Info.tokensOwed1" using the structs variable.

Examining Starknet Fee Market

Since the launch of STARK token, users have been able to pay gas fees using ETH (denominated as WEI), or STARK (denominated as FRI). We can have a look at how the fee market on Starknet is behaving by analyzing the daily fees paid in ETH and STARK, by converting them to a common unit, for example USDC.

with daily_ekubo_stark_price as (
    select
    date(block_timestamp) as block_date,
    median(
        (parameters['delta']['amount1']['mag'] :: NUMBER(38,0) / pow(10,6)) /
        (parameters['delta']['amount0']['mag'] :: NUMBER(38,0) / pow(10,18))
    ) as median_stark_price
    from starknet.starknet.receipt_events
    where contract_address = '0x00000005dd3d2f4429af886cd1a3b08289dbcea99a294197e9eb43b0e0325b4b' -- Ekubo Core
    and parameters['pool_key']['token0'] = '0x04718f5a0fc34cc1af16a1cdee98ffb20c31f5cd61d6ab07201858f4287c938d' -- STRK Token address
    and parameters['pool_key']['token1'] = '0x053c91253bc9682c04929ca02ed00b3e423f6710d2ee7e0d5ebb06f3ecf368a8' -- USDC Token address
    and event_name = 'Swapped' -- Listen to swap event
    and not reverted
    and block_timestamp >= dateadd(month, -3, current_date())
    and block_timestamp < current_date()
    group by block_date
),

daily_ekubo_eth_price as (
    select
    date(block_timestamp) as block_date,
    median(
        (parameters['delta']['amount1']['mag'] :: NUMBER(38,0) / pow(10,6)) /
        (parameters['delta']['amount0']['mag'] :: NUMBER(38,0) / pow(10,18))
    ) as median_eth_price
    from starknet.starknet.receipt_events
    where contract_address = '0x00000005dd3d2f4429af886cd1a3b08289dbcea99a294197e9eb43b0e0325b4b' -- Ekubo Core
    and parameters['pool_key']['token0'] = '0x049d36570d4e46f48e99674bd3fcc84644ddd6b96f7c741b1562b82f9e004dc7' -- ETH Token
    and parameters['pool_key']['token1'] = '0x053c91253bc9682c04929ca02ed00b3e423f6710d2ee7e0d5ebb06f3ecf368a8' -- USDC Token address
    and event_name = 'Swapped' -- Listen to swap event
    and not reverted
    and block_timestamp >= dateadd(month, -3, current_date())
    and block_timestamp < current_date()
    group by block_date
)

select date(t.block_timestamp) as Day, 
sum(iff(actual_fee_unit='WEI' or actual_fee is NULL, actual_fee, 0))/1e18 * median_eth_price as "Gas Paid in ETH ($)",
sum(iff(actual_fee_unit='FRI', actual_fee, 0))/1e18 * median_stark_price as "Gas Paid in STARK ($)"
from starknet.starknet.transactions t 
left join
daily_ekubo_stark_price s on date(t.block_timestamp)=s.block_date 
left JOIN
daily_ekubo_eth_price e on date(t.block_timestamp)=e.block_date 
where block_timestamp >= dateadd(month, -3, current_date())
and block_timestamp < current_date()
group by Day
order by Day

We use DEX prices to convert ETH and STARK to USDC. In this case we use Ekubo, one of the leading DEXs on Starknet.

First, we look at swap events emitted by the Ekubo Factory contract. We examine pairs for STARK (0x04718f5a0fc34cc1af16a1cdee98ffb20c31f5cd61d6ab07201858f4287c938d) and WETH (0x049d36570d4e46f48e99674bd3fcc84644ddd6b96f7c741b1562b82f9e004dc7) with USDC (0x053c91253bc9682c04929ca02ed00b3e423f6710d2ee7e0d5ebb06f3ecf368a8).

After adjusting for decimals (USDC has 6 decimals, STARK and WETH have 18) we can use the traded amount of USDC for unit of STARK and ETH as our price. We take the daily (group by date(block_timestamp)) median price for our analysis.

In addition, we recommend casting all big integers to NUMBER(38,0), the biggest integer type available in our backend, to avoid overflow issue. The casting operator is ::.

We then look at the transactions table, which stores all the information about the paid fees.

Specifically the actual_fee and the actual_fee_unit columns contain everything we need for each transaction.

The actual_fee_unit specifies the currency in which the fees were paid (WEI or FRI).

The actual_fee specifies the total amount of fees paid for the transaction, for both priority and execution.

The result is a daily overview of the Starknet fee market.

Throughout the query we also subset on the last 3 months of data using the condition

where block_timestamp >= dateadd(month, -3, current_date())

and we exclude the current day to avoid incomplete analysis.

and block_timestamp < current_date()

You can use the block_timestamp and block_date columns to apply any such filter to your analyses.

Conclusion

We have shown you some examples to help you unleash the power of blockchain data through the help of storage diffs, an innovative way to look at data. Now it’s your turn! Do you have any use case you’ve struggled to solve with existing tools? Do you want to know how storage diffs can help you tackle it? Hop on our Discord and let us know.

Useful External Documentation

Known Limitations

The maximum integer representable within Token Flow Studio's backend is 99999999999999999999999999999999999999 (i.e. 10³⁹ - 1 ). A lot of values in the Ethereum ecosystem are stored as int256 (max value: 2²⁵⁶) so they could potentially overflow this limit. Our backend's limit should be well enough for most use cases, but if you encounter issues, you should consider switching to one of our paid plans to get direct DB access, which will allow you to use our data outside of the DB and in your own application, circumventing this limit.

Last updated