A daily dataset for an emerging stock market (the Chinese CSI 300 dataset), comprising 300 stocks and 5,088 time steps drawn from the CSMAR database. We construct the dataset from a pool of CSI 300 index stocks covering 21 years, from 01/02/2000 to 12/31/2020. Rather than using every stock in the market, we select stocks that have belonged to the major market index CSI 300 and filter out those with missing price data over the period.

For each trading day, each stock is described by its fundamental price features: open price, close price, and volume. In addition, price features such as the open price and close price are normalized with a logarithmic transform.
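The logarithmic normalization can be illustrated with a minimal sketch. The description does not state the logarithm base, the array layout, or how volume is treated, so the natural logarithm, the (days, stocks, features) layout with columns ordered (open, close, volume), and the untouched volume column are all assumptions here. The function name `log_normalize_prices` and the toy values are hypothetical and not part of the dataset.

```python
import numpy as np

def log_normalize_prices(features, price_columns=(0, 1)):
    """Return a copy with the natural log applied to the price columns only.

    Assumes the last axis holds (open, close, volume); this layout and the
    choice of natural log are illustrative assumptions.
    """
    out = np.asarray(features, dtype=float).copy()
    cols = list(price_columns)
    out[..., cols] = np.log(out[..., cols])  # volume column is left as-is
    return out

# Hypothetical toy input: 2 trading days x 1 stock x (open, close, volume)
toy = np.array([[[10.0, 11.0, 1000.0]],
                [[11.0, 12.0, 1500.0]]])
normalized = log_normalize_prices(toy)
```

Working on a copy keeps the raw prices available, which is useful if returns or other derived quantities need to be computed from them later.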

The stocks are randomly split into five non-overlapping sub-datasets. Within each sub-dataset, the first 90% of trading days are used as training data, the following 5% as validation data, and the remaining 5% as test data.
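The split described above might be sketched as follows. The random seed, the equal subset sizes, and the rounding of the 90%/5%/5% boundaries are not specified in the description, so the choices below (seed 0, five equal groups, integer floor division) are assumptions, and the helper names are hypothetical.

```python
import numpy as np

def split_stocks(num_stocks, num_subsets=5, seed=0):
    """Randomly partition stock indices into non-overlapping subsets."""
    rng = np.random.default_rng(seed)  # seed is an illustrative assumption
    shuffled = rng.permutation(num_stocks)
    return np.array_split(shuffled, num_subsets)

def split_days(num_days):
    """Chronological 90% / 5% / 5% split of trading-day indices."""
    train_end = num_days * 90 // 100  # boundary rounding is an assumption
    val_end = num_days * 95 // 100
    days = np.arange(num_days)
    return days[:train_end], days[train_end:val_end], days[val_end:]

subsets = split_stocks(300)
train_days, val_days, test_days = split_days(5088)
```

The day split is chronological rather than shuffled, so validation and test periods always follow the training period, which avoids leaking future prices into training.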

License


  • Unknown
