The full dataset report is available at: https://arxiv.org/abs/2403.16861
The DISL dataset features a collection of 514,506 unique Solidity files that have been deployed to Ethereum mainnet. It caters to the need for a large and diverse dataset of real-world smart contracts. DISL serves as a resource for developing machine learning systems and for benchmarking software engineering tools designed for smart contracts.
from datasets import load_dataset
# Load the raw dataset
dataset = load_dataset("ASSERT-KTH/DISL", "raw")
# OR
# Load the decomposed dataset
dataset = load_dataset("ASSERT-KTH/DISL", "decomposed")
# number of rows and columns
num_rows = len(dataset["train"])
num_columns = len(dataset["train"].column_names)
# random row
import random
random_row = random.choice(dataset["train"])
# random source code
random_sc = random.choice(dataset["train"])['source_code']
print(random_sc)
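The random draws above return different rows on every run. When a reproducible sample is needed, a seeded generator can be used instead. The following is a minimal sketch, assuming only that the split supports `len()` and integer indexing, as `dataset["train"]` does. The helper name `sample_rows` and the stand-in `toy_split` are hypothetical and are not part of DISL or the `datasets` library; the stand-in is there so the snippet runs without downloading the dataset.

```python
import random


def sample_rows(split, k, seed=0):
    """Return up to k distinct rows from an indexable split, reproducibly.

    `split` is anything that supports len() and integer indexing,
    for example dataset["train"]. This helper is illustrative only.
    """
    rng = random.Random(seed)          # local generator; global state is untouched
    k = min(k, len(split))             # never request more rows than exist
    indices = rng.sample(range(len(split)), k)
    return [split[i] for i in indices]


# Hypothetical stand-in for dataset["train"], using the same 'source_code' key.
toy_split = [{"source_code": f"contract C{i} {{}}"} for i in range(10)]

picked = sample_rows(toy_split, k=3, seed=42)
for row in picked:
    print(row["source_code"])
```

With the real dataset loaded, the same call would be `sample_rows(dataset["train"], k=3, seed=42)`. Reusing the seed returns the same rows as long as the split itself is unchanged.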