# Tutorial for Portfolio Allocation¶

Presented at NeurIPS 2020: Deep RL Workshop.

The Jupyter notebooks are available on our GitHub and in Google Colab.

Tip

Check our previous tutorials, Tutorial for Single Stock Trading and Tutorial for Multiple Stock Trading, for a detailed explanation of the FinRL architecture and modules.

## Overview¶

To begin with, we would like to explain the logic of portfolio allocation using deep reinforcement learning. We use the Dow 30 constituents as an example throughout this article, because they are among the most popular stocks.

Let’s say that we have a million dollars at the beginning of 2019. We want to invest this \$1,000,000 in the stock market, in this case the Dow Jones 30 constituents. Assume no margin, no short selling, and no treasury bills (we use all the money to trade only these 30 stocks). Therefore, the weight of each individual stock is non-negative, and the weights of all the stocks add up to one.

We hire a smart portfolio manager: Mr. Deep Reinforcement Learning. Mr. DRL gives us daily advice that includes the portfolio weights, i.e., the proportions of money to invest in each of these 30 stocks, so every day we just need to rebalance the portfolio accordingly. The basic logic is as follows. Portfolio allocation is different from multiple stock trading because we are essentially rebalancing the weights at each time step, and we have to use all of the available money.
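The daily rebalancing arithmetic can be sketched in a few lines (a toy example with three hypothetical stocks and made-up prices, not FinRL code):

```python
import numpy as np

# Yesterday's and today's closing prices for three hypothetical stocks
prev_close = np.array([100.0, 50.0, 200.0])
close = np.array([101.0, 49.0, 204.0])

# Portfolio weights chosen by the agent (non-negative, sum to one)
weights = np.array([0.5, 0.3, 0.2])

# Daily portfolio return is the weighted sum of individual stock returns
portfolio_return = np.sum((close / prev_close - 1.0) * weights)

# The full portfolio value is reinvested every day
portfolio_value = 1_000_000 * (1.0 + portfolio_return)  # -> 1,003,000.0
```

The next day, the agent outputs new weights and the same update is applied to the new portfolio value.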

The traditional and most popular way of doing portfolio allocation is mean-variance analysis, or modern portfolio theory (MPT). However, MPT does not perform well on out-of-sample data. MPT is calculated from stock returns only; if we want to take other relevant factors into account, for example technical indicators like MACD or RSI, MPT may not be able to combine this information well.
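For reference, here is a minimal sketch of one classic MPT solution, the closed-form minimum-variance portfolio w = Σ⁻¹1 / (1ᵀΣ⁻¹1), computed on a made-up covariance matrix (illustrative numbers, not part of the tutorial code):

```python
import numpy as np

# Toy covariance matrix for three hypothetical assets
cov = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.16]])

# Closed-form minimum-variance portfolio: w = inv(Σ) 1 / (1ᵀ inv(Σ) 1)
ones = np.ones(len(cov))
inv = np.linalg.inv(cov)
w = inv @ ones / (ones @ inv @ ones)   # weights sum to 1 by construction
```

Note that this uses only the return covariance; there is no place to plug in features like MACD or RSI, which is exactly the limitation mentioned above.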

We introduce FinRL, a DRL library that makes it easy for beginners to expose themselves to quantitative finance. FinRL is designed specifically for automated stock trading, with an emphasis on education and demonstration.

## Problem Definition¶

This problem is to design an automated trading solution for portfolio allocation. We model the stock trading process as a Markov Decision Process (MDP). We then formulate our trading goal as a maximization problem.

The components of the reinforcement learning environment are:

• Action: the portfolio weight of each stock is within [0,1]. We use a softmax function to normalize the actions so that they sum to 1.

• State: {Covariance Matrix, MACD, RSI, CCI, ADX}; the state space shape is (34, 30): 34 rows (the 30×30 covariance matrix stacked on 4 technical indicator rows) and 30 columns (one per stock).

• Reward function: r(s, a, s′) = p_t, where p_t is the cumulative portfolio value at time t.

• Environment: portfolio allocation for Dow 30 constituents.
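The softmax normalization mentioned in the action space above can be sketched as follows (made-up raw actions; the real action vector has 30 entries, one per Dow stock):

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability before exponentiating
    z = np.exp(x - np.max(x))
    return z / z.sum()

# Raw, unnormalized actions from the policy network (illustrative values)
raw_actions = np.array([0.2, -1.0, 3.0, 0.5])
weights = softmax(raw_actions)  # non-negative, sums to 1
```

This guarantees the no-short-sale and fully-invested constraints: every weight is non-negative and the weights sum to one.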

The covariance matrix is a good feature because portfolio managers use it to quantify the risk (standard deviation) associated with a particular portfolio.

We also assume no transaction costs, because we are trying to keep this portfolio allocation case simple as a starting point.
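The (34, 30) state shape can be checked at small scale by stacking a covariance block on top of the technical indicator rows (a sketch with 4 stocks instead of 30; variable names are illustrative):

```python
import numpy as np

n_stocks = 4          # Dow 30 in the tutorial; 4 here to keep the demo small
indicators = ['macd', 'rsi', 'cci', 'adx']

# Covariance block: (n_stocks, n_stocks); identity stands in for a real matrix
covs = np.eye(n_stocks)
# One row per technical indicator: (len(indicators), n_stocks)
tech_rows = np.ones((len(indicators), n_stocks))

# State stacks the covariance matrix on top of the indicator rows,
# giving shape (n_stocks + 4, n_stocks) -- (34, 30) for the Dow 30
state = np.append(covs, tech_rows, axis=0)
```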

Install the unstable development version of FinRL:

```
# Install the unstable development version in Jupyter notebook:
!pip install git+https://github.com/AI4Finance-LLC/FinRL-Library.git
```

Import Packages:

```
# import packages
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
matplotlib.use('Agg')
import datetime

from finrl.config import config
from finrl.marketdata.yahoodownloader import YahooDownloader
from finrl.preprocessing.preprocessors import FeatureEngineer
from finrl.preprocessing.data import data_split
from finrl.env.environment import EnvSetup
from finrl.env.EnvMultipleStock_train import StockEnvTrain
from finrl.env.EnvMultipleStock_trade import StockEnvTrade
from finrl.model.models import DRLAgent
from finrl.trade.backtest import BackTestStats, BaselineStats, BackTestPlot
from finrl.trade.backtest import backtest_strat, baseline_strat

import os
if not os.path.exists("./" + config.DATA_SAVE_DIR):
    os.makedirs("./" + config.DATA_SAVE_DIR)
if not os.path.exists("./" + config.TRAINED_MODEL_DIR):
    os.makedirs("./" + config.TRAINED_MODEL_DIR)
if not os.path.exists("./" + config.TENSORBOARD_LOG_DIR):
    os.makedirs("./" + config.TENSORBOARD_LOG_DIR)
if not os.path.exists("./" + config.RESULTS_DIR):
    os.makedirs("./" + config.RESULTS_DIR)
```

```class YahooDownloader:
"""
Provides methods for retrieving daily stock data from Yahoo Finance API

Attributes
----------
start_date : str
start date of the data (modified from config.py)
end_date : str
end date of the data (modified from config.py)
ticker_list : list
a list of stock tickers (modified from config.py)

Methods
-------
fetch_data()
Fetches data from yahoo API
"""
```

```
# Download and save the data in a pandas DataFrame:
df = YahooDownloader(start_date = '2008-01-01',
                     end_date = '2020-12-01',
                     ticker_list = config.DOW_30_TICKER).fetch_data()
```

## Preprocess Data¶

FinRL uses a FeatureEngineer class to preprocess data.

```class FeatureEngineer:
"""
Provides methods for preprocessing the stock price data

Attributes
----------
df: DataFrame
feature_number : int
number of features we used
use_technical_indicator : boolean
use technical indicators or not
use_turbulence : boolean
use turbulence index or not

Methods
-------
preprocess_data()
main method to do the feature engineering
"""
```

Perform Feature Engineering: covariance matrix + technical indicators:

```
# Perform Feature Engineering:
df = FeatureEngineer(df.copy(),
                     use_technical_indicator=True,
                     use_turbulence=False).preprocess_data()

# add covariance matrix as states
df = df.sort_values(['date','tic'], ignore_index=True)
df.index = df.date.factorize()[0]

cov_list = []
# look back window is one year of trading days
lookback = 252
for i in range(lookback, len(df.index.unique())):
    data_lookback = df.loc[i-lookback:i, :]
    price_lookback = data_lookback.pivot_table(index='date', columns='tic', values='close')
    return_lookback = price_lookback.pct_change().dropna()
    covs = return_lookback.cov().values
    cov_list.append(covs)

df_cov = pd.DataFrame({'date': df.date.unique()[lookback:], 'cov_list': cov_list})
df = df.merge(df_cov, on='date')
df = df.sort_values(['date','tic']).reset_index(drop=True)
df.head()
```

## Build Environment¶

FinRL uses an EnvSetup class to set up the environment.

```class EnvSetup:
"""
Provides methods for setting up the trading environments

Attributes
----------
stock_dim: int
number of unique stocks
hmax : int
maximum number of shares to trade
initial_amount: int
start money
transaction_cost_pct : float
transaction cost percentage per trade
reward_scaling: float
scaling factor for reward, good for training
tech_indicator_list: list
a list of technical indicator names (modified from config.py)
Methods
-------
create_env_training()
create env class for training
create_env_validation()
create env class for validation
create_env_trading()
create env class for trading
"""
```

Initialize an environment class:

User-defined environment: a simulation environment class. The environment for portfolio allocation is defined below:

```
import numpy as np
import pandas as pd
from gym.utils import seeding
import gym
from gym import spaces
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

class StockPortfolioEnv(gym.Env):
    """A portfolio allocation environment for OpenAI gym

    Attributes
    ----------
    df: DataFrame
        input data
    stock_dim : int
        number of unique stocks
    hmax : int
        maximum number of shares to trade
    initial_amount : int
        start money
    transaction_cost_pct: float
        transaction cost percentage per trade
    reward_scaling: float
        scaling factor for reward, good for training
    state_space: int
        the dimension of input features
    action_space: int
        equals stock dimension
    tech_indicator_list: list
        a list of technical indicator names
    turbulence_threshold: int
        a threshold to control risk aversion
    day: int
        an increment number to control date

    Methods
    -------
    step()
        at each step the agent will return actions, then
        we will calculate the reward, and return the next observation.
    reset()
        reset the environment
    render()
        use render to return the state
    save_asset_memory()
        return account value at each time step
    save_action_memory()
        return actions/positions at each time step
    """
    metadata = {'render.modes': ['human']}

    def __init__(self,
                 df,
                 stock_dim,
                 hmax,
                 initial_amount,
                 transaction_cost_pct,
                 reward_scaling,
                 state_space,
                 action_space,
                 tech_indicator_list,
                 turbulence_threshold,
                 lookback=252,
                 day=0):
        self.day = day
        self.lookback = lookback
        self.df = df
        self.stock_dim = stock_dim
        self.hmax = hmax
        self.initial_amount = initial_amount
        self.transaction_cost_pct = transaction_cost_pct
        self.reward_scaling = reward_scaling
        self.state_space = state_space
        self.action_space = action_space
        self.tech_indicator_list = tech_indicator_list

        # action_space normalization and shape is self.stock_dim
        self.action_space = spaces.Box(low=0, high=1, shape=(self.action_space,))
        # Shape = (34, 30): covariance matrix + technical indicators
        self.observation_space = spaces.Box(low=0,
                                            high=np.inf,
                                            shape=(self.state_space + len(self.tech_indicator_list),
                                                   self.state_space))

        # load data from a pandas dataframe
        self.data = self.df.loc[self.day, :]
        self.covs = self.data['cov_list'].values[0]
        self.state = np.append(np.array(self.covs),
                               [self.data[tech].values.tolist()
                                for tech in self.tech_indicator_list], axis=0)
        self.terminal = False
        self.turbulence_threshold = turbulence_threshold
        # initialize state: initial portfolio value
        self.portfolio_value = self.initial_amount

        # memorize portfolio value each step
        self.asset_memory = [self.initial_amount]
        # memorize portfolio return each step
        self.portfolio_return_memory = [0]
        self.actions_memory = [[1/self.stock_dim]*self.stock_dim]
        self.date_memory = [self.data.date.unique()[0]]

    def step(self, actions):
        self.terminal = self.day >= len(self.df.index.unique())-1

        if self.terminal:
            df = pd.DataFrame(self.portfolio_return_memory)
            df.columns = ['daily_return']
            plt.plot(df.daily_return.cumsum(), 'r')
            plt.savefig('results/cumulative_reward.png')
            plt.close()

            plt.plot(self.portfolio_return_memory, 'r')
            plt.savefig('results/rewards.png')
            plt.close()

            print("=================================")
            print("begin_total_asset:{}".format(self.asset_memory[0]))
            print("end_total_asset:{}".format(self.portfolio_value))

            df_daily_return = pd.DataFrame(self.portfolio_return_memory)
            df_daily_return.columns = ['daily_return']
            if df_daily_return['daily_return'].std() != 0:
                sharpe = (252**0.5) * df_daily_return['daily_return'].mean() / \
                         df_daily_return['daily_return'].std()
                print("Sharpe: ", sharpe)
            print("=================================")
            return self.state, self.reward, self.terminal, {}

        else:
            # actions are the portfolio weights; normalize them to sum to 1
            norm_actions = (np.array(actions) - np.array(actions).min()) / \
                           (np.array(actions) - np.array(actions).min()).sum()
            weights = norm_actions
            self.actions_memory.append(weights)
            last_day_memory = self.data

            # load next state
            self.day += 1
            self.data = self.df.loc[self.day, :]
            self.covs = self.data['cov_list'].values[0]
            self.state = np.append(np.array(self.covs),
                                   [self.data[tech].values.tolist()
                                    for tech in self.tech_indicator_list], axis=0)

            # calculate portfolio return:
            # individual stocks' return * weight
            portfolio_return = sum(((self.data.close.values / last_day_memory.close.values)-1)*weights)
            # update portfolio value
            new_portfolio_value = self.portfolio_value*(1+portfolio_return)
            self.portfolio_value = new_portfolio_value

            # save into memory
            self.portfolio_return_memory.append(portfolio_return)
            self.date_memory.append(self.data.date.unique()[0])
            self.asset_memory.append(new_portfolio_value)

            # the reward is the new portfolio value
            self.reward = new_portfolio_value
            #self.reward = self.reward*self.reward_scaling

            return self.state, self.reward, self.terminal, {}

    def reset(self):
        self.asset_memory = [self.initial_amount]
        self.day = 0
        self.data = self.df.loc[self.day, :]
        # load states
        self.covs = self.data['cov_list'].values[0]
        self.state = np.append(np.array(self.covs),
                               [self.data[tech].values.tolist()
                                for tech in self.tech_indicator_list], axis=0)
        self.portfolio_value = self.initial_amount
        self.terminal = False
        self.portfolio_return_memory = [0]
        self.actions_memory = [[1/self.stock_dim]*self.stock_dim]
        self.date_memory = [self.data.date.unique()[0]]
        return self.state

    def render(self, mode='human'):
        return self.state

    def save_asset_memory(self):
        date_list = self.date_memory
        portfolio_return = self.portfolio_return_memory
        df_account_value = pd.DataFrame({'date': date_list, 'daily_return': portfolio_return})
        return df_account_value

    def save_action_memory(self):
        # date and close price length must match actions length
        date_list = self.date_memory
        df_date = pd.DataFrame(date_list)
        df_date.columns = ['date']

        action_list = self.actions_memory
        df_actions = pd.DataFrame(action_list)
        df_actions.columns = self.data.tic.values
        df_actions.index = df_date.date
        return df_actions

    def _seed(self, seed=None):
        self.np_random, seed = seeding.np_random(seed)
        return [seed]
```

## Implement DRL Algorithms¶

FinRL uses a DRLAgent class to implement the algorithms.

```class DRLAgent:
"""
Provides implementations for DRL algorithms

Attributes
----------
env: gym environment class
user-defined class
Methods
-------
train_PPO()
the implementation for PPO algorithm
train_A2C()
the implementation for A2C algorithm
train_DDPG()
the implementation for DDPG algorithm
train_TD3()
the implementation for TD3 algorithm
DRL_prediction()
make a prediction in a test dataset and get results
"""
```

Model Training:

We use A2C for portfolio allocation because it is stable, cost-effective, fast, and works well with large batch sizes.

Trading: Assume that we have \$1,000,000 of initial capital on 2019/01/01. We use the A2C model to perform portfolio allocation of the Dow 30 stocks.

```
trade = data_split(df, '2019-01-01', '2020-12-01')

env_trade, obs_trade = env_setup.create_env_trading(data=trade,
                                                    env_class=StockPortfolioEnv)

df_daily_return, df_actions = DRLAgent.DRL_prediction(model=model_a2c,
                                                      test_data=trade,
                                                      test_env=env_trade,
                                                      test_obs=obs_trade)
```

The output actions, i.e., the daily portfolio weights, are stored in df_actions.

## Backtesting Performance¶

FinRL uses a set of functions to do backtesting with Quantopian pyfolio.

```
from pyfolio import timeseries

DRL_strat = backtest_strat(df_daily_return)
perf_func = timeseries.perf_stats
perf_stats_all = perf_func(returns=DRL_strat,
                           factor_returns=DRL_strat,
                           positions=None,
                           transactions=None,
                           turnover_denom="AGB")

print("==============DRL Strategy Stats===========")
perf_stats_all

print("==============Get Index Stats===========")
baseline_perf_stats = BaselineStats('^DJI',
                                    baseline_start='2019-01-01',
                                    baseline_end='2020-12-01')

# plot
dji, dow_strat = baseline_strat('^DJI', '2019-01-01', '2020-12-01')
import pyfolio
%matplotlib inline
with pyfolio.plotting.plotting_context(font_scale=1.1):
    pyfolio.create_full_tear_sheet(returns=DRL_strat,
                                   benchmark_rets=dow_strat,
                                   set_context=False)
```

The left table shows the stats for the backtesting performance, and the right table shows the stats for the index (DJIA) performance, followed by the pyfolio plots.
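The Sharpe ratio reported in the stats can be reproduced directly from the daily returns, using the same annualization as the environment's terminal step (a sketch with synthetic returns standing in for df_daily_return):

```python
import numpy as np
import pandas as pd

# Synthetic daily portfolio returns; in the tutorial these come from df_daily_return
rng = np.random.default_rng(0)
daily_return = pd.Series(rng.normal(loc=0.0005, scale=0.01, size=252))

# Annualized Sharpe ratio: sqrt(252) * mean / std of daily returns
sharpe = (252 ** 0.5) * daily_return.mean() / daily_return.std()
```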