# Interpretable ML-driven Strategy for Automated Trading Pattern Extraction - Reproducibility Package

Financial markets are a source of non-stationary multidimensional time series that have been drawing attention for decades. Each financial instrument has its own properties that change over time, which makes their analysis a complex task. Improving our understanding of financial time series and developing methods for their analysis is essential for successful operation on financial markets. In this study we propose a volume-based data pre-processing method that makes financial time series more suitable for machine learning pipelines. We use a statistical approach to assess the performance of the method. Namely, we formally state the hypotheses, set up associated classification tasks, compute effect sizes with confidence intervals, and run statistical tests to validate the hypotheses. We additionally assess the trading performance of the proposed method on historical data and compare it to a previously published approach. Our analysis shows that the proposed volume-based method allows successful classification of financial time series patterns and leads to better classification performance than a price action-based method, excelling specifically on more liquid financial instruments. Finally, we propose an approach for obtaining feature interactions directly from tree-based models, using the CatBoost estimator as an example, and formally assess the relatedness of the proposed approach and SHAP feature interactions, with a positive outcome.

In this code repository you will find sample models, datasets and notebooks to reproduce all experiments from our paper.

## Functionality

The code repository provides the means to reproduce the results from our paper and to validate the answers to the research questions it discusses.
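The abstract summarises the statistical workflow as computing effect sizes with confidence intervals and then running significance tests. The README does not say which effect size or test the notebook uses, so the following is only a minimal, standard-library sketch under assumed choices: Cohen's d with a pooled standard deviation, a percentile bootstrap confidence interval, and a one-sided permutation test, applied to per-split classification scores of two models (for example VCRB against a no-information baseline). The function names and score values are hypothetical; the pingouin package listed under Dependencies offers tested equivalents such as `pingouin.compute_effsize` and `pingouin.compute_bootci`.

```python
import random
import statistics


def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    if pooled_var == 0:
        raise ValueError("Cohen's d is undefined when both samples have zero variance")
    return (statistics.mean(a) - statistics.mean(b)) / pooled_var ** 0.5


def bootstrap_ci(a, b, stat=cohens_d, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for stat(a, b), resampling each sample independently."""
    rng = random.Random(seed)
    estimates = sorted(
        stat(rng.choices(a, k=len(a)), rng.choices(b, k=len(b))) for _ in range(n_boot)
    )
    lower = estimates[round(alpha / 2 * n_boot)]
    upper = estimates[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def permutation_pvalue(a, b, n_perm=5000, seed=0):
    """One-sided permutation test: share of random relabellings whose
    mean(a) - mean(b) is at least as large as the observed difference."""
    rng = random.Random(seed)
    observed = statistics.mean(a) - statistics.mean(b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if statistics.mean(pooled[:len(a)]) - statistics.mean(pooled[len(a):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps p strictly positive


# Made-up per-split scores for illustration only; these are NOT results from the paper.
vcrb_scores = [0.61, 0.58, 0.63, 0.60, 0.62, 0.59, 0.64, 0.60]
baseline_scores = [0.50, 0.52, 0.49, 0.51, 0.50, 0.48, 0.51, 0.50]

d = cohens_d(vcrb_scores, baseline_scores)
ci_low, ci_high = bootstrap_ci(vcrb_scores, baseline_scores)
p = permutation_pvalue(vcrb_scores, baseline_scores)
print(f"Cohen's d = {d:.2f}, 95% CI = [{ci_low:.2f}, {ci_high:.2f}], one-sided p = {p:.4f}")
```

The sketch treats the two score lists as independent samples. If both models are evaluated on the same splits, a paired effect size and a paired test would be the more appropriate choice.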
* RQ1 - Classification Performance of VCRB Bars
  * Performance analysis of Volume Centered Range Bars vs a no-information model at binary classification of reversals and crossings
  * Effect sizes
  * p-values
* RQ2 - Comparison of VCRB vs Price Level Trading
  * Performance analysis of Volume Centered Range Bars vs Price Levels at binary classification of reversals and crossings
  * Effect sizes
  * p-values
* RQ3 - Impact of Market Liquidity on VCRB
  * Performance analysis of Volume Centered Range Bars vs Price Levels on a liquid market at binary classification of reversals and crossings
  * Effect sizes
  * p-values
* RQ4 - Feature Interaction Associations
  * Relatedness analysis of feature interactions from SHAP values and Monoforest-based methods vs bootstrapped data
  * p-values
  * Feature interaction rank distances

## Dependencies

All code was tested using Python 3.7.5. Use `python -m pip install -r requirements.txt` to install all dependencies.

* Data Processing
  * pandas
  * numpy
  * json
* Plotting
  * matplotlib
  * seaborn
* Model training
  * CatBoost
* Feature Analysis
  * shap (with Python 3.7 there is a dependency issue; pass `--no-deps` when installing it to work around it)
* Statistical evaluation
  * sklearn
  * scipy.stats
  * pingouin
* Running code
  * jupyter (you may need to log out and back in for the executable path to update)

## Repository content

* Datasets for Price Levels (PL) in `datasets_pl`
* Datasets for Volume Centered Range Bars (VCRB) in `datasets_vol`
* Statistics for the model performance are available for your convenience in JSON format in the `fit_stats_pl` folder for PL and the `fit_stats_vol` folder for VCRB
* Analysis of the feature interaction performance for both instruments is available in the `interactions` folder

## Running the code

All code for visualising the models' performance is available in the `RQsSubmission.ipynb` notebook. To run it, type `jupyter notebook` in the command line and navigate to `localhost:8888` in your browser.
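RQ4 lists feature interaction rank distances. The README does not state which rank distance the notebook computes, so this is a hedged, standard-library sketch that assumes the normalized Kendall tau distance between two rankings of feature pairs (for example one derived from SHAP interaction values and one from the tree-based approach), together with a hypothetical random-permutation baseline. The function names and the `f1:f2`-style labels are illustrative and not part of this repository.

```python
import random
from itertools import combinations


def kendall_tau_distance(ranking_a, ranking_b):
    """Normalized Kendall tau distance between two rankings of the same items.

    Each ranking lists items from most to least important. The result is the
    fraction of item pairs that the two rankings order differently:
    0.0 means identical order, 1.0 means fully reversed.
    """
    if len(ranking_a) != len(ranking_b) or set(ranking_a) != set(ranking_b):
        raise ValueError("rankings must contain the same items")
    if len(set(ranking_a)) != len(ranking_a) or len(ranking_a) < 2:
        raise ValueError("rankings must contain at least two unique items")
    position_b = {item: i for i, item in enumerate(ranking_b)}
    n_pairs = len(ranking_a) * (len(ranking_a) - 1) // 2
    # combinations() keeps ranking_a's order, so x comes before y in ranking_a;
    # the pair is discordant if ranking_b places y before x.
    discordant = sum(
        1 for x, y in combinations(ranking_a, 2) if position_b[x] > position_b[y]
    )
    return discordant / n_pairs


def random_ranking_pvalue(ranking_a, ranking_b, n_perm=2000, seed=0):
    """Share of random orderings that are at least as close to ranking_a as ranking_b is."""
    rng = random.Random(seed)
    observed = kendall_tau_distance(ranking_a, ranking_b)
    shuffled = list(ranking_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if kendall_tau_distance(ranking_a, shuffled) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical rankings of feature pairs; labels are illustrative only.
shap_ranking = ["f1:f2", "f1:f3", "f2:f3", "f1:f4", "f3:f4", "f2:f4"]
tree_ranking = ["f1:f2", "f2:f3", "f1:f3", "f1:f4", "f2:f4", "f3:f4"]

distance, p_value = random_ranking_pvalue(shap_ranking, tree_ranking)  # 2 of 15 pairs disagree
print(f"Kendall tau distance = {distance:.3f}, p vs random rankings = {p_value:.4f}")
```

For rankings without ties, this distance relates to Kendall's tau correlation as tau = 1 - 2 * distance, so `scipy.stats.kendalltau` applied to the two rank vectors gives an equivalent measure.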
## Citation

If you use any of the resources provided on this page in any of your publications, we ask that you cite the paper:

Bibtex
```
```

# Final remarks

Full dataset samples cannot be disclosed, as doing so would infringe CME copyright, but any of the code provided can be applied to the data you may have available. If you encounter any issues, please contact the authors at artur.sokolovsky@gmail.com or arnaboldiluca314@gmail.com.

# License

Attribution 4.0 International (CC BY 4.0)