Guides

Hands on! Some examples

In this guide we will show some examples and go through some useful tips for data analyses on TF data. We will assume you have basic understanding of SQL and of the Ethereum ecosystem.

We will focus our examples on the *_diffs datasets, as they differ from standard indexed data found elsewhere. The rest of the tables contain standard data and we hope they will look familiar to you.

We will have examples covering

Token Balances

Let’s start with a simple question, given a token and a wallet address, we want to query for the history of the token balances for the wallet.

Normally, you would do this by aggregating events data emitted by the token contract. While you can still do that with Token Flow’s data, using storage diffs make it trivial.

select distinct date(block_timestamp) as _date,
last_value(curr_value) over (partition by date(block_timestamp) order by block_number, order_index ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) 
	:> DECIMAL(65,0) / pow(10, 6) as balance
from MCDW.STORAGE_DIFFS
where chain_id=10 
and contract_address='0x94b008aa00579c1307b0ef2c499ad98a8ce58e58' -- USDT Optimism address
and variable='_balances'
and hashmap_keys::$`0`='0xd3d3a295be556cf8cef2a7ff4cda23d22c4627e8' -- wallet address
and not reverted
order by _date desc

Let’s go over the query. We’re getting diffs for the Optimism USDT contract (chain_id=10). If we look at the contract, we can see that there is a _balances storage variable. This variable stores the balances for every address.

Specifically, it’s a mapping address->int. We subset for storage diffs modifying this variable, then we specify that we want only those modifying the value associated to the wallet address we’re interested in. We do this by subsetting on the hashmap_keys variable, which is an array that contains all the mapping keys associated to the specified storage variable.

Since we know the mapping only takes one argument, we access the 0-th element of the array using the syntax hashmap_keys::$0`` (notice the backticks around the 0). JSON elements are stored as quoted strings, and the $ symbol before the backtick unquotes the string.

Lastly we make sure to exclude reverted actions as they’re not useful for this analysis.

We now have the relevant storage diffs. Each storage diff is specified by a storage location, a prev_value, the value before the diff, and a curr_value , the value after the diff.

Let’s say we want to get end-of-day balances, therefore we need to get the latest value for each day for the curr_value variable. Since there can be multiple diffs done at the same timestamp, i.e. in the same block, we need to use another way to order them.

A storage diff is uniquely identified by its block_number and its order_index, the order in which it was applied in the block.

With the help of the last_value window function we are able to retrieve this value.

When using window functions on Singlestore, you have to pay attention to specify the window fame (in this case ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING), otherwise results might not be what you expect due to how the data is sharded.

Every value is stored agnostically as a string, so in this case we make sure to convert it to an integer type. Since balances can take very high values, we use the biggest integer type available in Singlestore, which is DECIMAL(65,0).

After adjusting for token decimals, we will have the results readily available!

Hourly Uniswap v3 prices

Now that storage diffs are clearer to you, we can move to a slightly harder example.

Let’s say we are interested in hourly prices of WBTC on Uniswap v3 in USDC units.

The idea is we can look at the pool contract for the pair WBTC-USDC and compute its price.

Normally, without diffs, you would look at the events emitted after every trade, and compute the prices from the exchanged volumes.

But the Uniswap v3 Pool contract already contains a storage slot for the price. More precisely, the slot0 storage variable is a struct - a named array - and the element slot0.qrtPriceX96 contains the price in scaled form. You can refer to this article by Uniswap for the derivation of the actual price.

Let’s see how we can obtain this value with a simple query on the storage_diffs table:

with univ3_pool_events as (
    select parameters::$pool as pool,
    parameters::$token0 as token0,
    parameters::$token1 as token1
    from MCDW.events
		where chain_id=10
    and contract_address = lower('0x1F98431c8aD98523631AE4a59f267346ea31F984') /* UniV3 Factory */
    and event_name='PoolCreated'
    and not reverted
),

usdc_wbtc_pool as (
    select *, 
    /* we use this variable to keep track if USDC is token0 (flipped=TRUE) or token1 (flipped=FALSE) */
    if(token0='0x0b2c639c533813f4aa9d7837caf62653d097ff85', TRUE, FALSE) as flipped 
    from univ3_pool_events
    where (token0='0x0b2c639c533813f4aa9d7837caf62653d097ff85' /* USDC */ and token1='0x68f180fcce6836688e9084f035309e29bf0a2095' /* WBTC */)
    or (token0='0x68f180fcce6836688e9084f035309e29bf0a2095' and token1='0x0b2c639c533813f4aa9d7837caf62653d097ff85')
    limit 1 /* we only really need one pair, in case there is more they're irrelevant */
),

sqrtPricex96_hourly as (
    select distinct 
    date_trunc('hour', block_timestamp) as hour,
    median(curr_value :> DECIMAL(65,0)) over (partition by date_trunc('hour', block_timestamp))  as median_sqrtPrice,
    (select flipped from usdc_wbtc_pool) as flipped
    from MCDW.storage_diffs
		where chain_id=10
    and contract_address=(select pool from usdc_wbtc_pool)
    /* Price is stored in the slot0 variable, which is a struct. We're interested in the sqrtPriceX96 field */
    and variable='slot0' 
    and structs::$`0` = 'Slot0.sqrtPriceX96'
    and not reverted
)

/* refer to <https://blog.uniswap.org/uniswap-v3-math-primer> for price calculation from SqrtPriceX96 */
select hour,
/* we derive the price and adjust for decimals, WBTC has 8 decimals, USDC 6 */
pow(pow(median_sqrtPrice/pow(2,96), 2) / pow(10, if(flipped, 8-6, 6-8)), if(flipped, -1, 1)) as price
from sqrtPricex96_hourly
order by hour desc

In the first CTE, we extract the events emitted by the Uniswap Factory contract when a pool is created, with the address of the pool and of the involved tokens. Pay attention to how we access the named fields of the parameters array: the syntax is column:$name

In the second CTE, we look for a pool for the pair WBTC-USDC, we take the address of the pool, and we keep track if the tokens are defined as token0 or token1 in the pool. In this case, the procedure will get the inverse of the price.

In the third CTE, we query the storage diffs table for the pool contract we obtained and we compute the median value of the prices for every hour.

Pay attention to the syntax to obtain elements of a struct: after subsetting for the slot0 we use the structs variable (an array of quoted strings), to access the needed element.

In the final statement, we simply perform the derivation of the price, and adjust for decimals.

Once again, storage diffs helped obtain the result in a very intuitive way, and without the need to perform pre-aggregations on event data.

Total fees earned for a Uniswap v3 Position

We can also look at something that is simply not possible using events data.

When we put tokens into a Uniswap position and act as liquidity providers, we receive fees in return, if we own the liquidity pool NFT. These fees can either be collected by the NFT owner or remain uncollected in the contract.

While collected fees are retrievable from emitted events, there is no easy way to obtain the unclaimed fees without calling a function of the contract.

But there is a storage variable that keeps track of exactly that!

The positionsstorage variable is a mapping bytes->struct, the struct in turn contains another struct: Info, which contains the integers tokensOwed0 and tokensOwed1 representing the amounts owed each of the tokens in the pool. In this case we have a nested struct, but you will see how accessing nested items looks easy and familiar.

with uncollected_fees_token0 as (
    select distinct '0xce14d6cf27212027e369ef2714391dfc0da11fbcf94ee8a44df3f9bf571867d2' as position_key, -- position key
    curr_value as token0_owed
    from MCDW.storage_diffs
		where chain_id=10
    and contract_address='0x68f5c0a2de713a54991e01858fd27a3832401849' -- WBTC-ETH pool
    and variable='positions' -- positions is a mapping address->struct
    and hashmap_keys::$`0`='0x448acc3c7c979f4a7e84a22cd216bc602670e4a0568fb7f67f65a069526ef30'
    and structs::$`0`='Info.tokensOwed0' -- getting the relevant struct item
    and not reverted
    order by block_number desc, order_index desc
    limit 1
    ),

uncollected_fees_token1 as (
    select distinct '0xce14d6cf27212027e369ef2714391dfc0da11fbcf94ee8a44df3f9bf571867d2' as position_key, -- position key
    curr_value as token1_owed
    from MCDW.storage_diffs
		where chain_id=10
    and contract_address='0x68f5c0a2de713a54991e01858fd27a3832401849'
    and variable='positions' -- positions is a mapping address->struct
    and hashmap_keys::$`0`='0x448acc3c7c979f4a7e84a22cd216bc602670e4a0568fb7f67f65a069526ef30'
    and structs::$`0`='Info.tokensOwed1' -- getting the relevant struct item
    and not reverted
    order by block_number desc, order_index desc 
    limit 1
)

select * from uncollected_fees_token0 join uncollected_fees_token1 using(position_key)

We selected a random market maker with an open position at the time of writing, we access the relevant item by subsetting on the hashmap_keys variable using the position key, which is just a hash of the owner address, the lower and the upper tick, as in Solidity keccak256(abi.encodePacked(owner, tickLower, tickUpper)) .This can be easily computed beforehand and we leave it outside of the scope of this article. Then we access the nested item in the struct as "Info.tokensOwed0" and "Info.tokensOwed1" using the structs variable.

STARK Token Holding Rate

Lastly, we can look at an airdrop analysis. STARK Token has just been released on Starknet and we can look at how many users in percentage are holding their tokens fully, or have partially or totally transferred.

with claims as (
    select recipient, sum(token_amount) as claimed from MCDW.starknet_provisions_claims
    where recipient is not null
    group by recipient
),

stark_transfers as (
    select parameters::$`from` as recipient,
    sum(parameters::`value`::$`low` :> DECIMAL(65,0) / pow(10,18)) as transferred
    from MCDW.events
    where contract_address='0x04718f5a0fc34cc1af16a1cdee98ffb20c31f5cd61d6ab07201858f4287c938d'
    and event_name='Transfer'
    group by recipient
),

claims_and_transfers as (
    select recipient, 
    claimed, 
    coalesce(transferred, 0) as _transferred,
    (case 
        when _transferred = 0 then 'holding fully'
        when _transferred < claimed then 'transferred partially'
        when _transferred = claimed then 'fully transferred'
        when _transferred > claimed then 'transferred more than claims'
    end) as category
    from claims left join stark_transfers 
    using(recipient)
)

SELECT category,
       (COUNT(*) * 100.0 / (SELECT COUNT(*) FROM claims_and_transfers)) AS percentage
FROM claims_and_transfers
GROUP BY category;

In this case, we only make use of event data and of a view: starknet_provision_claims that we abstracted to make it easy to analyze claimed tokens for the STARK token provisions.

The view contains all the information about claimed entries, but also unclaimed ones.

Pay special attention on how we access nested elements (eg. parameters::value::$low`` ). In this case, we access the element value of the parameters named array, ie. the output of the Transfer event. value is a named array itself, and we’re interested in the low element, which is the number of tokens transferred. Since value ,low , and from are reserved keywords in the database, we have to surround them by backticks “”. We also make use of the $` operator before the last nested element to make sure it is unquoted.

Conclusion

We have shown you some examples to help you unleash the power of blockchain data through the help of storage diffs, an innovative way to look at data. Now it’s your turn! Do you have any use case you’ve struggled to solve with existing tools? Do you want to know how storage diffs can help you tackle it? Hop on our Discord and let us know.

Last updated