Benchmark dataset for mid-price forecasting of limit order book data with machine learning methods

Adamantios Ntakaris*, Martin Magris, Juho Kanniainen, Moncef Gabbouj, Alexandros Iosifidis

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


Managing the prediction of metrics in high-frequency financial markets is a challenging task. One efficient approach is to monitor the dynamics of a limit order book in order to identify the information edge. This paper describes the first publicly available benchmark dataset of high-frequency limit order markets for mid-price prediction. We extracted normalized data representations of time series data for five stocks from the Nasdaq Nordic stock market over a period of 10 consecutive days, yielding a dataset of ∼4,000,000 time series samples in total. A day-based anchored cross-validation experimental protocol is also provided, which can be used as a benchmark for comparing the performance of state-of-the-art methodologies. The performance of baseline approaches is also reported to facilitate experimental comparisons. We expect that such a large-scale dataset can serve as a testbed for devising novel expert-system solutions for high-frequency limit order book data analysis.
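As a rough illustration of two notions in the abstract: the mid-price is conventionally the average of the best ask and best bid prices, and an anchored day-based protocol keeps the start of the training window fixed at the first day while the window grows one day per fold. The minimal sketch below assumes that fold k trains on days 1..k and tests on day k+1, so 10 days would give 9 folds; the function names and quote values are hypothetical, and this is not the authors' code.

```python
from typing import List, Sequence, Tuple


def mid_price(best_ask: float, best_bid: float) -> float:
    """Mid-price: the average of the best ask and best bid prices."""
    return (best_ask + best_bid) / 2.0


def anchored_day_folds(days: Sequence[int]) -> List[Tuple[List[int], int]]:
    """Day-based anchored splits (assumed scheme, for illustration).

    Fold k trains on days[0..k-1] (the window is anchored at the first
    day and grows by one day per fold) and tests on days[k].
    """
    folds = []
    for k in range(1, len(days)):
        train_days = list(days[:k])
        test_day = days[k]
        folds.append((train_days, test_day))
    return folds


if __name__ == "__main__":
    # Hypothetical best quotes, for illustration only.
    print(mid_price(101.0, 99.0))
    # Ten trading days labelled 1..10.
    for train_days, test_day in anchored_day_folds(list(range(1, 11))):
        print(f"train on days {train_days[0]}-{train_days[-1]}, test on day {test_day}")
```

Because the test day always follows every training day, no fold trains on data from the future relative to its test day.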

Original language: English
Pages (from-to): 852-866
Number of pages: 15
Journal: Journal of Forecasting
Issue number: 8
Early online date: 22 Aug 2018
Publication status: Published - Dec 2018


  • high-frequency trading
  • limit order book
  • machine learning
  • mid-price
  • ridge regression
  • single hidden-layer feedforward neural network
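The ridge regression baseline named in the keywords can be illustrated with the textbook closed-form solution w = (XᵀX + λI)⁻¹Xᵀy. This is a minimal pure-Python sketch, not the authors' implementation: the function names (`solve`, `ridge_fit`, `ridge_predict`) and the regularization parameter `lam` are illustrative assumptions, rows of `X` stand for feature vectors, and no intercept is fitted (appending a column of ones would add one, though it would then also be penalized).

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented copy [a | b] so the caller's inputs are not modified.
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ridge_fit(X: Matrix, y: Vector, lam: float) -> Vector:
    """Closed-form ridge regression: w = (X^T X + lam * I)^{-1} X^T y."""
    d = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]
    for i in range(d):
        xtx[i][i] += lam  # the L2 penalty adds lam to the diagonal
    xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(d)]
    return solve(xtx, xty)


def ridge_predict(X: Matrix, w: Vector) -> Vector:
    """Linear predictions X w."""
    return [sum(x_i * w_i for x_i, w_i in zip(row, w)) for row in X]
```

With `lam = 0` this reduces to ordinary least squares; larger values shrink the weights toward zero, which is the usual motivation for ridge regression as a simple, stable baseline.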

