# Tutorial for Multiple Stock Trading

Presented at NeurIPS 2020: Deep RL Workshop.

The Jupyter notebooks are available on our GitHub and Google Colab.


## Overview

To begin with, I would like to explain the logic of multiple stock trading using deep reinforcement learning.

We use the Dow 30 constituents as an example throughout this article, because they are among the most widely followed stocks.

A lot of people are intimidated by the term “Deep Reinforcement Learning”. In practice, you can just treat the agent as a “smart AI”, a “smart stock trader”, or an “R2-D2 trader” if you like, and simply use it.

Suppose we have a well-trained DRL agent, the “DRL Trader”, and we want to use it to trade multiple stocks in our portfolio.

• Assume we are at time t. At the end of day t, we know the open-high-low-close prices of the Dow 30 constituent stocks. We can use this information to calculate technical indicators such as MACD, RSI, CCI, and ADX. In reinforcement learning we call these features the “state”.

• We know that our portfolio value V(t) = balance (t) + dollar amount of the stocks (t).

• We feed the state into our well-trained DRL Trader, and it outputs a list of actions. The action for each stock is a value within [-1, 1], which we can treat as a trading signal: 1 means a strong buy signal, -1 means a strong sell signal.

• We calculate k = actions * h_max, where h_max is a predefined parameter that sets the maximum number of shares to trade. This gives us a list of share amounts to trade.

• The dollar amount of shares = shares to trade * close price(t).

• Update the balance and shares. This dollar amount is the money we need to trade at time t. The updated balance = balance(t) − money paid to buy shares + money received from selling shares. The updated shares = shares held(t) − shares sold + shares bought.

• So we take actions to trade based on the advice of our DRL Trader at the end of day t (time t’s close price equals time t+1’s open price). We hope to benefit from these actions by the end of day t+1.

• Take a step to time t+1. At the end of the day, we know the close prices at t+1, so the dollar amount of the stocks(t+1) = sum(updated shares * close price(t+1)). The portfolio value V(t+1) = balance(t+1) + dollar amount of the stocks(t+1).

• So the step reward for taking the actions from the DRL Trader from time t to t+1 is r = V(t+1) − V(t). The reward can be positive or negative during the training stage, but of course we need a positive reward in live trading to say that our DRL Trader is effective.

• Repeat this process until termination.
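The steps above can be sketched in a few lines of NumPy. This is a toy walk-through with made-up prices and three stocks, not FinRL's internal code; the variable names and the clipping rule are illustrative:

```python
import numpy as np

h_max = 100                                   # max shares to trade per stock
balance = 1_000_000.0
shares_held = np.array([10.0, 0.0, 5.0])      # current holdings
close_t = np.array([50.0, 20.0, 100.0])       # close prices at time t

actions = np.array([0.5, -1.0, 0.2])          # DRL Trader output in [-1, 1]
k = np.round(actions * h_max)                 # shares to trade per stock
k = np.maximum(k, -shares_held)               # cannot sell more than we hold

v_t = balance + np.sum(shares_held * close_t) # portfolio value V(t)

balance -= np.sum(k * close_t)                # pay for buys, receive for sells
shares_held += k                              # update holdings

close_t1 = np.array([52.0, 19.0, 101.0])      # close prices at time t+1
v_t1 = balance + np.sum(shares_held * close_t1)
reward = v_t1 - v_t                           # step reward r = V(t+1) - V(t)
```

Note that the sell order on the second stock is clipped to zero because we hold no shares of it; a real environment would also check that the balance stays non-negative.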

Below are the logic chart of multiple stock trading and a made-up example for demonstration purposes.

Multiple stock trading is different from single stock trading because as the number of stocks increases, the dimensionality of the data increases, and the state and action spaces in reinforcement learning grow exponentially. So stability and reproducibility are essential here.

We introduce a DRL library, FinRL, that makes it easy for beginners to get exposure to quantitative finance and to develop their own stock trading strategies.

FinRL is characterized by its reproducibility, scalability, simplicity, applicability and extendibility.

## Problem Definition

This problem is to design an automated trading solution for multiple stock trading. We model the stock trading process as a Markov Decision Process (MDP). We then formulate our trading goal as a maximization problem.

The components of the reinforcement learning environment are:

• Action: {−k, …, −1, 0, 1, …, k}, where k denotes the number of shares. For 30 stocks the entire action space has size (2k+1)³⁰; in this article we use k ≤ h_max = 100, so the entire action space is around 10⁶⁹ (201³⁰ ≈ 10⁶⁹). That means there are on the order of 10⁶⁹ possible action combinations to sample from for a given state.

• State: {balance, close price, shares, MACD, RSI, CCI, ADX}, a 181-dimensional vector (1 balance entry + 30 stocks × 6 values per stock)

• Reward function: r(s, a, s′) = v′ − v, where v′ and v represent the portfolio values at state s′ and s, respectively

• Environment: multiple stock trading for Dow 30 constituents.
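As a sanity check on the action-space size, the count can be reproduced with a couple of lines of plain Python (no FinRL needed):

```python
import math

stock_dim = 30
k = 100                               # h_max from the article

per_stock_actions = 2 * k + 1         # {-k, ..., 0, ..., k} -> 201 choices
log10_total = stock_dim * math.log10(per_stock_actions)
# 201**30 is on the order of 10**69
```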

The data on the Dow 30 constituent stocks used in this case study is obtained from the Yahoo Finance API. The data contains open-high-low-close prices and volume.

Install the unstable development version of FinRL:

```
# Install the unstable development version in a Jupyter notebook:
!pip install git+https://github.com/AI4Finance-LLC/FinRL-Library.git
```
```
# import packages
import pandas as pd
import numpy as np
import matplotlib
matplotlib.use('Agg')   # select the backend before importing pyplot
import matplotlib.pyplot as plt
import datetime

from finrl.config import config
from finrl.marketdata.yahoodownloader import YahooDownloader
from finrl.preprocessing.preprocessors import FeatureEngineer
from finrl.preprocessing.data import data_split
from finrl.env.environment import EnvSetup
from finrl.env.EnvMultipleStock_train import StockEnvTrain
from finrl.env.EnvMultipleStock_trade import StockEnvTrade
from finrl.model.models import DRLAgent
from finrl.trade.backtest import BackTestStats, BaselineStats, BackTestPlot
```

```
class YahooDownloader:
    """
    Provides methods for retrieving daily stock data from the Yahoo Finance API

    Attributes
    ----------
    start_date : str
        start date of the data (modified from config.py)
    end_date : str
        end date of the data (modified from config.py)
    ticker_list : list
        a list of stock tickers (modified from config.py)

    Methods
    -------
    fetch_data()
        Fetches data from the Yahoo Finance API
    """
```
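For reference, fetch_data() returns the data in a long “tidy” layout: one row per date–ticker pair, with a `tic` column identifying the stock. A toy illustration with made-up numbers (only the `date` and `tic` column names are confirmed by the sort call below; the rest follow the usual OHLCV convention):

```python
import pandas as pd

# Toy stand-in for the DataFrame shape that fetch_data() produces.
df = pd.DataFrame({
    "date":   ["2020-09-29", "2020-09-29", "2020-09-30", "2020-09-30"],
    "open":   [114.0, 120.0, 113.5, 119.0],
    "high":   [115.0, 121.0, 114.8, 120.5],
    "low":    [112.5, 118.5, 112.9, 118.0],
    "close":  [113.2, 119.4, 114.1, 120.2],
    "volume": [1_000_000, 800_000, 950_000, 780_000],
    "tic":    ["AAPL", "MSFT", "AAPL", "MSFT"],
})
df = df.sort_values(["date", "tic"], ignore_index=True)
```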

```
# Download and save the data in a pandas DataFrame:
df = YahooDownloader(start_date = '2009-01-01',
                     end_date = '2020-09-30',
                     ticker_list = config.DOW_30_TICKER).fetch_data()

print(df.sort_values(['date','tic'], ignore_index=True).head(30))
```

## Preprocess Data

FinRL uses a FeatureEngineer class to preprocess data.

Perform Feature Engineering:

```
# Perform Feature Engineering:
df = FeatureEngineer(df.copy(),
                     use_technical_indicator=True,
                     tech_indicator_list=config.TECHNICAL_INDICATORS_LIST,
                     use_turbulence=True,
                     user_defined_feature=False).preprocess_data()
```

In real-life trading, the model needs to be updated periodically using rolling windows. In this article, we just split the data once into a train set and a trade set.
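FinRL ships its own data_split helper for this (imported above); the idea is a simple date-range filter on the tidy DataFrame. A minimal sketch with a toy DataFrame (the function name and exact boundary convention here are illustrative, not FinRL's actual implementation):

```python
import pandas as pd

def data_split_sketch(df, start, end):
    """Keep rows with start <= date < end, sorted by date then ticker."""
    out = df[(df["date"] >= start) & (df["date"] < end)]
    return out.sort_values(["date", "tic"], ignore_index=True)

df = pd.DataFrame({
    "date":  ["2018-12-31", "2019-01-02", "2019-01-02"],
    "tic":   ["AAPL", "AAPL", "MSFT"],
    "close": [157.7, 157.9, 101.1],
})
train = data_split_sketch(df, "2009-01-01", "2019-01-01")   # in-sample
trade = data_split_sketch(df, "2019-01-01", "2020-10-01")   # out-of-sample
```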

## Build Environment

FinRL uses an EnvSetup class to set up the environment.

```
class EnvSetup:
    """
    Provides methods for setting up the training and trading environments

    Attributes
    ----------
    stock_dim : int
        number of unique stocks
    hmax : int
        maximum number of shares to trade
    initial_amount : int
        starting capital
    transaction_cost_pct : float
        transaction cost percentage per trade
    reward_scaling : float
        scaling factor for reward, good for training
    tech_indicator_list : list
        a list of technical indicator names (modified from config.py)

    Methods
    -------
    create_env_training()
        creates the training environment
    create_env_trading()
        creates the trading environment
    """
```

State Space and Action Space Calculation:

The action space dimension is just the number of unique stocks, 30. The state space is 181-dimensional in this example.

```
stock_dimension = len(train.tic.unique())
state_space = 1 + 2*stock_dimension + len(config.TECHNICAL_INDICATORS_LIST)*stock_dimension
```
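Plugging in the numbers: 1 entry for the balance, 2 entries per stock (close price and shares held), and one entry per technical indicator per stock. The indicator names below are placeholders for config.TECHNICAL_INDICATORS_LIST, which in this setup has four entries:

```python
stock_dimension = 30
tech_indicator_list = ["macd", "rsi", "cci", "adx"]   # placeholder names; 4 indicators

# 1 (balance) + 2*30 (close price, shares held) + 4*30 (indicators) = 181
state_space = 1 + 2*stock_dimension + len(tech_indicator_list)*stock_dimension
```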

Initialize an environment class:

```
# Initialize env:
env_setup = EnvSetup(stock_dim = stock_dimension,
                     state_space = state_space,
                     hmax = 100,
                     initial_amount = 1000000,
                     transaction_cost_pct = 0.001,
                     tech_indicator_list = config.TECHNICAL_INDICATORS_LIST)

env_train = env_setup.create_env_training(data = train,
                                          env_class = StockEnvTrain)
```

User-defined Environment: a simulation environment class.

The environments for training and trading differ in the multiple stock trading case.

Training vs. trading: the turbulence index is used as a risk-aversion signal applied after the actions generated by the DRL algorithm. Since it is not part of model training, the turbulence index should not be included in the training environment; only the trading environment should include this risk-aversion signal.

FinRL provides blueprints for the training and trading environments in multiple stock trading.
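The risk-aversion idea can be sketched as follows. This is a simplified stand-in, not FinRL's exact code; the function name and the "liquidate everything" rule are illustrative:

```python
import numpy as np

def apply_turbulence_guard(actions, turbulence, threshold=250):
    """If the turbulence index exceeds the threshold, override the agent's
    actions and issue a maximal sell signal for every stock."""
    if turbulence >= threshold:
        return -np.ones_like(actions)   # strong sell signal for all stocks
    return actions

calm = apply_turbulence_guard(np.array([0.5, -0.2, 0.9]), turbulence=80)
panic = apply_turbulence_guard(np.array([0.5, -0.2, 0.9]), turbulence=400)
```

This is why the guard belongs only in the trading environment: during training the agent should learn from its own actions, not from an external override.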

## Implement DRL Algorithms

FinRL uses a DRLAgent class to implement the algorithms.

```
class DRLAgent:
    """
    Provides implementations for DRL algorithms

    Attributes
    ----------
    env : gym environment class
        user-defined class

    Methods
    -------
    train_PPO()
        the implementation for the PPO algorithm
    train_A2C()
        the implementation for the A2C algorithm
    train_DDPG()
        the implementation for the DDPG algorithm
    train_TD3()
        the implementation for the TD3 algorithm
    train_SAC()
        the implementation for the SAC algorithm
    DRL_prediction()
        make a prediction on a test dataset and get results
    """
```

Model Training:

```
print("==============Model Training===========")
now = datetime.datetime.now().strftime('%Y%m%d-%Hh%M')

agent = DRLAgent(env = env_train)

sac_params_tuning = {
    'batch_size': 128,
    'buffer_size': 100000,
    'ent_coef': 'auto_0.1',
    'learning_rate': 0.0001,
    'learning_starts': 200,
    'timesteps': 50000,
    'verbose': 0}

model_sac = agent.train_SAC(model_name = "SAC_{}".format(now),
                            model_params = sac_params_tuning)
```

We use Soft Actor-Critic (SAC) for multiple stock trading because it is one of the most recent state-of-the-art algorithms and is noted for its stability.

Assume that we have \$1,000,000 of initial capital on 2019/01/01. We use the SAC model to trade the Dow 30 stocks.

```
# create trading env
env_trade, obs_trade = env_setup.create_env_trading(data = trade,
                                                    env_class = StockEnvTrade,
                                                    turbulence_threshold=250)

## make a prediction and get the account value change
df_account_value = DRLAgent.DRL_prediction(model=model_sac,
                                           test_data = trade,
                                           test_env = env_trade,
                                           test_obs = obs_trade)
```

## Backtesting Performance

FinRL uses a set of functions to do the backtesting with Quantopian pyfolio.
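For intuition, two of the headline numbers can be hand-rolled from the account-value series with pandas and NumPy (BackTestStats and pyfolio report these and many more); the series below uses made-up values:

```python
import numpy as np
import pandas as pd

# Toy daily account-value series, as produced by DRL_prediction
account_value = pd.Series([1_000_000, 1_010_000, 1_005_000, 1_020_000])

daily_returns = account_value.pct_change().dropna()
cumulative_return = account_value.iloc[-1] / account_value.iloc[0] - 1
# annualized Sharpe ratio, assuming 252 trading days and zero risk-free rate
sharpe = np.sqrt(252) * daily_returns.mean() / daily_returns.std()
```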

```
# BackTestStats
# pass in df_account_value; this information is stored in the env class
print("==============Get Backtest Results===========")
perf_stats_all = BackTestStats(account_value = df_account_value)
perf_stats_all = pd.DataFrame(perf_stats_all)
perf_stats_all.to_csv("./"+config.RESULTS_DIR+"/perf_stats_all_"+now+'.csv')

print("==============Get Baseline Stats===========")
baseline_perf_stats = BaselineStats('^DJI',
                                    baseline_start = '2019-01-01',
                                    baseline_end = '2020-09-30')

# BackTestPlot
# pass the account value memory into the backtest functions
# and select a baseline ticker
print("==============Compare to DJIA===========")
%matplotlib inline
# S&P 500: ^GSPC
# Dow Jones Index: ^DJI
# NASDAQ 100: ^NDX
BackTestPlot(df_account_value,
             baseline_ticker = '^DJI',
             baseline_start = '2019-01-01',
             baseline_end = '2020-09-30')
```

The left table shows the stats for backtesting performance; the right table shows the stats for the baseline index (DJIA) performance. Plots: