BF skip indexes for Ethereum
This repository includes all data needed to reproduce the experiments presented in [1]. The paper describes the BF skip index, a data structure based on Bloom filters [2] that can be used for answering inter-block queries on blockchains efficiently. The article also includes a historical analysis of logsBloom filters included in the Ethereum block headers, as well as an experimental analysis of the proposed data structure. The latter was conducted using the data set of events generated by the CryptoKitties Core contract, a popular decentralized application launched in 2017 (and also one of the first applications based on NFTs).
In this description, we use the following abbreviations (also adopted throughout the paper) to denote two different sets of Ethereum blocks.
D1: set of all Ethereum blocks with height between 0 and 14999999, inclusive (15 million blocks).
D2: set of all Ethereum blocks with height between 14000000 and 14999999, inclusive (1 million blocks).
Moreover, in accordance with the terminology adopted in the paper, we define the set of keys of a block as the set of all contract addresses and log topics of the transactions in the block. As defined in [3], log topics comprise event signature digests and the indexed parameters associated with the event occurrence.
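The definition above can be illustrated with a minimal Python sketch. The `Log` type and the two helper functions are hypothetical names introduced only for this example; they are not part of the repository code, and the way duplicates are counted here is an assumption about how the definition is meant to be read.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Log(NamedTuple):
    """Hypothetical, simplified view of an event log."""
    address: bytes              # 20-byte address of the emitting contract
    topics: Tuple[bytes, ...]   # 32-byte topics (signature digest, indexed parameters)


def block_keys(logs: Iterable[Log]) -> List[bytes]:
    """Collect every key occurrence (address or topic) of a block, duplicates included."""
    keys: List[bytes] = []
    for log in logs:
        keys.append(log.address)
        keys.extend(log.topics)
    return keys


def unique_block_keys(logs: Iterable[Log]) -> Set[bytes]:
    """Collect the set of distinct keys of a block."""
    return set(block_keys(logs))
```

Under this reading, a block in which the same contract emits the same event twice contributes that address and signature digest once to the set of keys.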
Data set description
File structure
We now describe the structure of the files included in this repository.
filters_ones_0-14999999.csv.xz is a compressed CSV file with 15 million rows (one for each block in D1) and 3 columns. Note that it is not necessary to decompress this file, as the provided code is capable of processing it directly in its compressed form. The columns have the following meaning.
blockId: the identifier of the block.
timestamp: timestamp of the block.
numOnes: number of bits set to 1 in the logsBloom filter of the block.
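As an illustration of processing the file in its compressed form, the sketch below streams it with Python's standard `lzma` and `csv` modules and computes the average fraction of bits set in the logsBloom filter, which is 2048 bits long. It assumes that the CSV starts with a header row using the column names listed above; the function name `mean_fill_ratio` is hypothetical and this is not the code shipped with the repository.

```python
import csv
import lzma

LOGS_BLOOM_BITS = 2048  # length of the Ethereum logsBloom filter, in bits


def mean_fill_ratio(path: str) -> float:
    """Average fraction of bits set to 1 across all rows of the file.

    ASSUMPTION: the file has a header row containing a 'numOnes' column.
    """
    total_ones = 0
    num_blocks = 0
    # Text mode decompresses on the fly, so nothing is written to disk.
    with lzma.open(path, mode="rt", newline="") as fh:
        for row in csv.DictReader(fh):
            total_ones += int(row["numOnes"])
            num_blocks += 1
    if num_blocks == 0:
        return 0.0
    return total_ones / (num_blocks * LOGS_BLOOM_BITS)
```

If the file turns out to have no header row, `csv.DictReader` accepts a `fieldnames` argument that can supply the column names explicitly.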
receipt_stats_0-14999999.csv.xz is a compressed CSV file with 15 million rows (one for each block in D1) and 5 columns. As with the previous file, it does not need to be decompressed. The columns have the following meaning.
blockId: the identifier of the block.
txCount: number of transactions included in the block.
numLogs: number of event logs included in the block.
numKeys: number of keys included in the block.
numUniqueKeys: number of distinct keys in the block (useful as the same key may appear multiple times).
All CSV files related to the CryptoKitties Core events (i.e., Approval.csv, Birth.csv, Pregnant.csv, Transfer.csv) have the same structure. They consist of 1 million rows (one for each block in D2) and 2 columns, namely:
blockId: identifier of the block.
numOcc: number of event occurrences in the block.
events.xz is a compressed binary file describing all unique event occurrences in the blocks of D2. The file contains 1 million data chunks (i.e., one for each Ethereum block). Note that this file only records unique event occurrences in each block: if an event from a contract is triggered more than once within the same block, the corresponding chunk contains only one sequence for it. Each chunk includes the following information.
blockId: identifier of the block (4 bytes).
numEvents: number of unique event occurrences in the block (4 bytes).
A list of numEvents sequences, each made up of 52 bytes. A sequence represents an event occurrence and is the concatenation of two fields, namely:
Address of the contract triggering the event (20 bytes).
Event signature digest (32 bytes).
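The chunk layout described above could be decoded along the following lines. This is a sketch under stated assumptions, not the reader provided with the repository: the description does not specify the byte order of the two 4-byte integers, so little-endian unsigned values are assumed here, and chunks are assumed to follow each other with no padding. The names `read_exact` and `iter_event_chunks` are hypothetical.

```python
import struct
from typing import BinaryIO, Iterator, List, Tuple

ADDRESS_LEN = 20   # contract address
DIGEST_LEN = 32    # event signature digest
SEQ_LEN = ADDRESS_LEN + DIGEST_LEN  # 52 bytes per event occurrence
# ASSUMPTION: blockId and numEvents are little-endian unsigned 32-bit integers.
HEADER = struct.Struct("<II")


def read_exact(fh: BinaryIO, n: int) -> bytes:
    """Read up to n bytes, looping because read() may return fewer than requested."""
    buf = bytearray()
    while len(buf) < n:
        part = fh.read(n - len(buf))
        if not part:
            break
        buf += part
    return bytes(buf)


def iter_event_chunks(fh: BinaryIO) -> Iterator[Tuple[int, List[Tuple[bytes, bytes]]]]:
    """Yield (blockId, [(address, digest), ...]) for each chunk in the stream."""
    while True:
        header = read_exact(fh, HEADER.size)
        if not header:
            return  # clean end of file
        if len(header) != HEADER.size:
            raise EOFError("truncated chunk header")
        block_id, num_events = HEADER.unpack(header)
        events = []
        for _ in range(num_events):
            seq = read_exact(fh, SEQ_LEN)
            if len(seq) != SEQ_LEN:
                raise EOFError("truncated event sequence")
            events.append((seq[:ADDRESS_LEN], seq[ADDRESS_LEN:]))
        yield block_id, events
```

Passing the object returned by `lzma.open("events.xz", "rb")` as `fh` streams the file without decompressing it to disk. A quick sanity check for the byte-order assumption is that the first decoded blockId should be 14000000.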
keys.xz is a compressed binary file describing all unique keys in the blocks of D2. As in the previous file, duplicate keys only appear once. The file contains 1 million data chunks, each representing an Ethereum block and including the following information.
blockId: identifier of the block (4 bytes).
numAddr: number of unique contract addresses (4 bytes).
numTopics: number of unique topics (4 bytes).
A sequence of numAddr addresses, each represented using 20 bytes.
A sequence of numTopics topics, each represented using 32 bytes.
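A reader for this layout might look like the sketch below. As before, the byte order of the 4-byte integers is not stated in this description, so little-endian unsigned values are an assumption, as is the absence of padding between chunks. The names `read_exact` and `iter_key_chunks` are hypothetical and do not refer to the repository code.

```python
import struct
from typing import BinaryIO, Iterator, List, Tuple

ADDRESS_LEN = 20  # bytes per contract address
TOPIC_LEN = 32    # bytes per log topic
# ASSUMPTION: blockId, numAddr and numTopics are little-endian unsigned 32-bit integers.
HEADER = struct.Struct("<III")


def read_exact(fh: BinaryIO, n: int) -> bytes:
    """Read up to n bytes, looping because read() may return fewer than requested."""
    buf = bytearray()
    while len(buf) < n:
        part = fh.read(n - len(buf))
        if not part:
            break
        buf += part
    return bytes(buf)


def split_fixed(data: bytes, width: int) -> List[bytes]:
    """Split a byte string into consecutive fields of the given width."""
    return [data[i:i + width] for i in range(0, len(data), width)]


def iter_key_chunks(fh: BinaryIO) -> Iterator[Tuple[int, List[bytes], List[bytes]]]:
    """Yield (blockId, addresses, topics) for each chunk in the stream."""
    while True:
        header = read_exact(fh, HEADER.size)
        if not header:
            return  # clean end of file
        if len(header) != HEADER.size:
            raise EOFError("truncated chunk header")
        block_id, num_addr, num_topics = HEADER.unpack(header)
        body_len = num_addr * ADDRESS_LEN + num_topics * TOPIC_LEN
        body = read_exact(fh, body_len)
        if len(body) != body_len:
            raise EOFError("truncated chunk body")
        addr_bytes = body[:num_addr * ADDRESS_LEN]
        topic_bytes = body[num_addr * ADDRESS_LEN:]
        yield block_id, split_fixed(addr_bytes, ADDRESS_LEN), split_fixed(topic_bytes, TOPIC_LEN)
```

Passing the object returned by `lzma.open("keys.xz", "rb")` as `fh` reads the chunks directly from the compressed file.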
Additional Information
Field | Value |
---|---|
Data last updated | December 4, 2024 |
Metadata last updated | December 4, 2024 |
Created | December 4, 2024 |
Format | CSV |
License | Creative Commons Attribution |
Has views | False |
Id | c4579ab2-45ad-4b32-b6df-4475fcf1f357 |
Package id | fe3d810e-868a-4ade-933a-d0d49bc31917 |
Position | 0 |
State | active |