General information
This repository includes all data needed to reproduce the experiments presented in [1]. The paper describes the BF skip index, a data structure based on Bloom filters [2] that can be used for answering inter-block queries on blockchains efficiently. The article also includes a historical analysis of logsBloom filters included in the Ethereum block headers, as well as an experimental analysis of the proposed data structure. The latter was conducted using the data set of events generated by the CryptoKitties Core contract, a popular decentralized application launched in 2017 (and also one of the first applications based on NFTs).
In this description, we use the following abbreviations (also adopted throughout the paper) to denote two different sets of Ethereum blocks.
D1: set of all Ethereum blocks between height 0 and 14999999.
D2: set of all Ethereum blocks between height 14000000 and 14999999.
Moreover, in accordance with the terminology adopted in the paper, we define the set of keys of a block as the set of all contract addresses and log topics of the transactions in the block. As defined in [3], log topics comprise event signature digests and the indexed parameters associated with the event occurrence.
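To make the definition concrete, the following minimal sketch computes the set of keys of a block from its event logs. The representation of a log as an (address, topics) pair, the function name block_keys, and the placeholder values are assumptions made for illustration only; they are not taken from the paper or its code.

```python
def block_keys(logs):
    """Return the set of keys of a block.

    `logs` is an iterable of (address, topics) pairs, one per event log.
    This in-memory representation is hypothetical and chosen for the example.
    """
    keys = set()
    for address, topics in logs:
        keys.add(address)    # contract address that emitted the log
        keys.update(topics)  # event signature digest and indexed parameters
    return keys


# Two logs emitted by the same contract for the same event (placeholder values).
example_logs = [
    ("0xContractA", ["0xSignature1", "0xParamX"]),
    ("0xContractA", ["0xSignature1", "0xParamY"]),
]
unique_keys = block_keys(example_logs)
print(len(unique_keys))  # 4 distinct keys out of 6 key occurrences
```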
Data set description
File: Description
filters_ones_0-14999999.csv.xz: Compressed CSV file containing the number of ones for each logsBloom filter in D1.
receipt_stats_0-14999999.csv.xz: Compressed CSV file containing statistics about all transaction receipts in D1.
Approval.csv: CSV file containing the Approval event occurrences for the CryptoKitties Core contract in D2.
Birth.csv: CSV file containing the Birth event occurrences for the CryptoKitties Core contract in D2.
Pregnant.csv: CSV file containing the Pregnant event occurrences for the CryptoKitties Core contract in D2.
Transfer.csv: CSV file containing the Transfer event occurrences for the CryptoKitties Core contract in D2.
events.xz: Compressed binary file containing information about all contract events in D2.
keys.xz: Compressed binary file containing information about all keys in D2.
File structure
We now describe the structure of the files included in this repository.
filters_ones_0-14999999.csv.xz is a compressed CSV file with 15 million rows (one for each block in D1) and 3 columns. Note that it is not necessary to decompress this file, as the provided code is capable of processing it directly in its compressed form. The columns have the following meaning.
blockId: the identifier of the block.
timestamp: timestamp of the block.
numOnes: number of bits set to 1 in the logsBloom filter of the block.
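As an illustration of reading this file without decompressing it to disk, here is a minimal Python sketch that streams the rows through the standard lzma and csv modules. It is not the code provided with the paper. The function names are hypothetical, and the presence of a header row is an assumption (pass has_header=False otherwise). The timestamp is kept as a string because its exact format is not specified here. The fill ratio relies on the fact that an Ethereum logsBloom filter is 2048 bits long.

```python
import csv
import lzma

LOGS_BLOOM_BITS = 2048  # size of the logsBloom filter in an Ethereum block header


def iter_filter_rows(source, has_header=True):
    """Yield (blockId, timestamp, numOnes) tuples from the XZ-compressed CSV.

    `source` is a path or a binary file object.
    ASSUMPTION: the first line is a header row unless has_header is False.
    """
    with lzma.open(source, mode="rt", newline="") as handle:
        reader = csv.reader(handle)
        if has_header:
            next(reader, None)  # skip the assumed header line
        for block_id, timestamp, num_ones in reader:
            # timestamp is left as a string: its format is not assumed
            yield int(block_id), timestamp, int(num_ones)


def fill_ratio(num_ones):
    """Fraction of bits set to 1 in a logsBloom filter."""
    return num_ones / LOGS_BLOOM_BITS
```

For example, `max(fill_ratio(n) for _, _, n in iter_filter_rows("filters_ones_0-14999999.csv.xz"))` would give the highest filter occupancy observed in D1 while keeping only one row in memory at a time.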
receipt_stats_0-14999999.csv.xz is a compressed CSV file with 15 million rows (one for each block in D1) and 5 columns. As with the previous file, it does not need to be decompressed. The columns have the following meaning.
blockId: the identifier of the block.
txCount: number of transactions included in the block.
numLogs: number of event logs included in the block.
numKeys: number of keys included in the block.
numUniqueKeys: number of distinct keys in the block (useful as the same key may appear multiple times).
All CSV files related to the CryptoKitties Core events (i.e., Approval.csv, Birth.csv, Pregnant.csv, Transfer.csv) have the same structure. They consist of 1 million rows (one for each block in D2) and 2 columns, namely:
blockId: identifier of the block.
numOcc: number of event occurrences in the block.
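Since the four event files share this two-column layout, one small routine can process all of them. The sketch below sums the numOcc column of each file; the function names, the directory argument, and the presence of a header row are assumptions made for the example.

```python
import csv
from pathlib import Path

EVENT_NAMES = ("Approval", "Birth", "Pregnant", "Transfer")


def total_occurrences(lines, has_header=True):
    """Sum the numOcc column over an iterable of CSV lines (e.g. an open file).

    ASSUMPTION: the first line is a header row unless has_header is False.
    """
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # skip the assumed header line
    return sum(int(num_occ) for _block_id, num_occ in reader)


def summarize(directory="."):
    """Return {event name: total occurrences in D2} for the four event files."""
    totals = {}
    for name in EVENT_NAMES:
        with open(Path(directory) / f"{name}.csv", newline="") as handle:
            totals[name] = total_occurrences(handle)
    return totals
```

Calling `summarize()` from the directory that contains the CSV files would return one total per event type.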
events.xz is a compressed binary file describing all unique event occurrences in the blocks of D2. The file contains 1 million data chunks, one for each block in D2. It records only unique event occurrences: if a contract triggers the same event more than once within a block, the corresponding chunk contains a single sequence for it. Each chunk includes the following information.
blockId: identifier of the block (4 bytes).
numEvents: number of unique event occurrences recorded for the block (4 bytes).
A list of numEvents sequences of 52 bytes each. Each sequence represents an event occurrence and is the concatenation of two fields, namely:
Address of the contract triggering the event (20 bytes).
Event signature digest (32 bytes).
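The chunk layout above can be parsed with a short routine such as the following sketch, which reads events.xz directly in its compressed form. The byte order and signedness of the two 4-byte integers are not stated in this description: the code assumes big-endian unsigned values, and the format string in HEADER must be changed (for instance to "<II") if that assumption does not hold. The function name iter_event_chunks is hypothetical.

```python
import lzma
import struct

# ASSUMPTION: blockId and numEvents are big-endian unsigned 32-bit integers.
HEADER = struct.Struct(">II")
ADDRESS_LEN = 20    # contract address
SIGNATURE_LEN = 32  # event signature digest
SEQUENCE_LEN = ADDRESS_LEN + SIGNATURE_LEN  # 52 bytes per event occurrence


def iter_event_chunks(source):
    """Yield (blockId, [(address, signature), ...]) for each chunk.

    `source` is a path or a binary file object holding XZ-compressed data.
    """
    with lzma.open(source, "rb") as stream:
        while True:
            header = stream.read(HEADER.size)
            if not header:
                return  # clean end of file
            if len(header) != HEADER.size:
                raise ValueError("truncated chunk header")
            block_id, num_events = HEADER.unpack(header)
            payload = stream.read(num_events * SEQUENCE_LEN)
            if len(payload) != num_events * SEQUENCE_LEN:
                raise ValueError(f"truncated chunk for block {block_id}")
            events = [
                (payload[i:i + ADDRESS_LEN], payload[i + ADDRESS_LEN:i + SEQUENCE_LEN])
                for i in range(0, len(payload), SEQUENCE_LEN)
            ]
            yield block_id, events
```

A quick sanity check for the byte-order assumption is to verify that the first blockId returned lies within the range of D2.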
keys.xz is a compressed binary file describing all unique keys in the blocks of D2. As in events.xz, duplicate keys appear only once per block. The file contains 1 million data chunks, each representing an Ethereum block and including the following information.
blockId: identifier of the block (4 bytes).
numAddr: number of unique contract addresses (4 bytes).
numTopics: number of unique topics (4 bytes).
A sequence of numAddr addresses, each represented using 20 bytes.
A sequence of numTopics topics, each represented using 32 bytes.
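As a sketch of how these chunks might be consumed, the code below parses keys.xz and answers a simple question, namely which blocks contain a given key, by scanning every chunk. This is a naive linear scan for illustration, not the BF skip index described in [1]. As before, the big-endian unsigned encoding of the three 4-byte integers is an assumption, and all function names are hypothetical.

```python
import lzma
import struct

# ASSUMPTION: blockId, numAddr and numTopics are big-endian unsigned 32-bit integers.
HEADER = struct.Struct(">III")
ADDRESS_LEN = 20
TOPIC_LEN = 32


def _read_exact(stream, size):
    """Read exactly `size` bytes or fail on a truncated file."""
    data = stream.read(size)
    if len(data) != size:
        raise ValueError("truncated chunk")
    return data


def _split(data, width):
    """Split a byte string into a set of fixed-width items."""
    return {data[i:i + width] for i in range(0, len(data), width)}


def iter_key_chunks(source):
    """Yield (blockId, addresses, topics) per chunk, with keys as sets of bytes."""
    with lzma.open(source, "rb") as stream:
        while True:
            header = stream.read(HEADER.size)
            if not header:
                return  # clean end of file
            if len(header) != HEADER.size:
                raise ValueError("truncated chunk header")
            block_id, num_addr, num_topics = HEADER.unpack(header)
            addresses = _split(_read_exact(stream, num_addr * ADDRESS_LEN), ADDRESS_LEN)
            topics = _split(_read_exact(stream, num_topics * TOPIC_LEN), TOPIC_LEN)
            yield block_id, addresses, topics


def blocks_containing(source, key):
    """Return the identifiers of all blocks whose key set contains `key`."""
    return [
        block_id
        for block_id, addresses, topics in iter_key_chunks(source)
        if key in addresses or key in topics
    ]
```

Here `key` is a raw 20-byte address or 32-byte topic, for example `bytes.fromhex(...)` applied to a hexadecimal string without the `0x` prefix.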
Notes
For space reasons, some of the files in this repository have been compressed using the XZ compression utility. Unless otherwise specified, these files need to be decompressed before they can be read. Please make sure you have an application installed on your system that is capable of decompressing such files.
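If no standalone XZ tool is available, the Python standard library can perform the decompression. The helper below is a minimal sketch; the function name decompress_xz and the choice of writing the output next to the input file are assumptions made for the example.

```python
import lzma
import shutil
from pathlib import Path


def decompress_xz(src, dst=None):
    """Decompress `src` (a .xz file) to `dst` and return the output path.

    By default the output is written next to the input, with the
    trailing ".xz" suffix removed (e.g. events.xz -> events).
    """
    src = Path(src)
    if dst is None:
        if src.suffix != ".xz":
            raise ValueError("cannot infer output name: input does not end in .xz")
        dst = src.with_suffix("")
    dst = Path(dst)
    with lzma.open(src, "rb") as compressed, open(dst, "wb") as out:
        shutil.copyfileobj(compressed, out)  # streams in chunks, low memory use
    return dst
```

For instance, `decompress_xz("keys.xz")` would create a file named keys in the same directory.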
References
[1] Loporchio, Matteo et al. "Skip index: supporting efficient inter-block queries and query authentication on the blockchain". (2023).
[2] Bloom, Burton H. "Space/time trade-offs in hash coding with allowable errors." Communications of the ACM 13.7 (1970): 422-426.
[3] Wood, Gavin. "Ethereum: A secure decentralised generalised transaction ledger." Ethereum project yellow paper 151.2014 (2014): 1-32.