Backtesting Stock Trading Strategies In Python 🐍

tl;dr: If you're not ready to commit endless hours to building an algo trading bot, you might as well just buy and hold 😐


Preface

We've all wanted to create that one trading bot that can consistently make us big money and beat the market. I believe it's possible but one would have to invest an insane amount of time learning, researching, and building something like this. Nonetheless, it's always fun to play around with money so here we go.

What We'll Be Covering

  1. Getting free historical stock data from Alphavantage
  2. Working with a backtesting framework in Python, Backtesting.py
  3. Using technical indicators, pandas-ta
  4. Coding up a couple of Systematic Trading Strategies

Setting Up The Environment

You don't need to have this, but I used Conda to create a separate environment for this project.

You can go to the next section if you don't want to work with Conda.

Once you've installed it, you can create a new Conda environment. I'm naming the new environment algo and using Python 3.7:

conda create --name algo python=3.7

After the environment is set up, we can activate it like so:

conda activate algo

If you are using VS Code, you'll want to install the Python extension and switch your Python interpreter to the new Conda env:

Changing python environment in Visual Studio Code (vscode)

Getting Historical Stock Data

We'll be getting our historical stock data from Alphavantage. Their free tier allows you 5 API requests per minute and a max of 500 requests a day. Plenty for what we need.
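If you end up downloading several symbols in a row, one way to stay under the per-minute limit is to space the calls out. Here's a minimal sketch of a generic throttle. It isn't part of alpha_vantage; the class name MinIntervalThrottle and its clock/sleep parameters are made up for illustration, and the 12 second gap is simply 60 seconds divided by 5 requests.

```python
import time


class MinIntervalThrottle:
    """Hypothetical helper: enforce a minimum gap between calls."""

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Block until at least min_interval_s has passed since the last call."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval_s - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now


# 5 requests per minute -> at most one request every 12 seconds
throttle = MinIntervalThrottle(min_interval_s=60 / 5)
```

The idea would be to call throttle.wait() right before each API request. It does nothing about the daily cap, which is one more reason to cache the data locally.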

First, let's install a couple of libraries that we'll be needing for this.

pip install alpha_vantage pandas python-dotenv
  1. alpha_vantage, a wrapper around the Alphavantage REST API
  2. pandas, a popular library used for messing around with data
  3. python-dotenv, for loading in our secret API keys

For this mini-project, I'll be working in a directory called algo. Inside it we'll have two files: .env for our environment variables and algo.py for our code. We'll also want a data/ folder, where we'll store our historical stock data files.

Once you've got your API key you can put it in your .env file:

AV_API_KEY=REPLACE_THIS_WITH_YOUR_API_KEY

To get the stock data, all we really need to do is this:

from alpha_vantage.timeseries import TimeSeries

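# Assumes av_api_key and symbol are already defined.
# output_format='pandas' makes the call return a data frame.
ts = TimeSeries(key=av_api_key, output_format='pandas')
df, _ = ts.get_daily_adjusted(symbol, outputsize='full')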

However... we want to be nice developers and not hit the Alphavantage server all the time. So let's save the data to disk and read it from there when needed.

from os import path, getenv
from dotenv import load_dotenv
import pandas as pd
from alpha_vantage.timeseries import TimeSeries


load_dotenv()

av_api_key = getenv('AV_API_KEY')

def download_daily_data(symbol: str, filepath: str):
    ts = TimeSeries(key=av_api_key, output_format='pandas')
    rawDf, _ = ts.get_daily_adjusted(symbol, outputsize='full') # pylint: disable=unbalanced-tuple-unpacking

    # Doing this to let python know it's a df 🤷‍♂️
    df = pd.DataFrame(rawDf)

    # Rename the columns before we save
    df.columns = ['open', 'high', 'low', 'close', 'adjusted_close', 'volume', 'dividend_amount', 'split_coefficient']
    df.to_csv(filepath)


def get_daily_data(symbol) -> pd.DataFrame:
    data_file_path = f'./data/{symbol}.csv'

    # Only hit the API when we don't have a local copy yet
    if not path.exists(data_file_path):
        download_daily_data(symbol, data_file_path)

    df = pd.read_csv(
        data_file_path,
        index_col='date',
        usecols=['date', 'open', 'high', 'low', 'close', 'adjusted_close', 'volume'],
        parse_dates=True,
    )

    # Rename them so Backtesting.py is happy
    df.columns = ['Open', 'High', 'Low', 'Close', 'adjusted_close', 'Volume']

    # We reverse the data frame so it goes from oldest to newest
    return df[::-1]


if __name__ == '__main__':
    symbol = 'spy'
    get_daily_data(symbol)

Inside the function get_daily_data, I've started preparing the data in the way Backtesting.py likes. Backtesting.py expects the following columns: Open, High, Low, Close, and Volume.

Note: We can also add any additional columns we want to reference in our strategies. This will come in handy in the second strategy.

Now we're all good right? That's what I thought at first, but there is more prep work we're going to need to do.

Formatting Historical Data For Backtesting

You'd be fine if you were trading Forex or crypto, but in my examples I'll be trading stocks.

In the stock trading world, there's a concept of stock splits. The framework doesn't take these into account, so we'll need to adjust all of the OHLC (open, high, low, close) values ourselves.

Lucky for us, Alphavantage provides an adjusted_close value (adjusted for splits and dividends), which we can use to scale the OHLC data.

Let's create the new helper function and update our main block to call that instead:

def get_daily_data_adjusted(symbol: str) -> pd.DataFrame:
    df = get_daily_data(symbol)

    adjusted_df = pd.DataFrame(index=df.index)

    # Adjust the OHLC
    df['ratio'] = df.adjusted_close / df.Close
    adjusted_df['Open'] = df.Open * df.ratio
    adjusted_df['High'] = df.High * df.ratio
    adjusted_df['Low'] = df.Low * df.ratio
    adjusted_df['Close'] = df.adjusted_close
    adjusted_df['Volume'] = df.Volume
    
    return adjusted_df


if __name__ == '__main__':
    symbol = 'spy'
    get_daily_data_adjusted(symbol)
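To see what the ratio actually does, here's a toy example with made-up prices around a hypothetical 2-for-1 split. It uses plain dicts instead of a data frame so it runs on its own, and the function name adjust_bar is only for illustration.

```python
def adjust_bar(bar):
    """Scale one OHLC bar by adjusted_close / close (same idea as above)."""
    ratio = bar['adjusted_close'] / bar['close']
    return {
        'Open': bar['open'] * ratio,
        'High': bar['high'] * ratio,
        'Low': bar['low'] * ratio,
        'Close': bar['adjusted_close'],
    }


# Made-up numbers: the day before a 2-for-1 split the stock closed at 200,
# so its adjusted close is 100. After the split no adjustment is needed.
before_split = {'open': 196.0, 'high': 202.0, 'low': 194.0,
                'close': 200.0, 'adjusted_close': 100.0}
after_split = {'open': 101.0, 'high': 103.0, 'low': 99.0,
               'close': 102.0, 'adjusted_close': 102.0}

print(adjust_bar(before_split))
print(adjust_bar(after_split))
```

The pre-split bar comes out at half its raw prices (98 / 101 / 97 / 100), which lines up with the post-split bar. Without the adjustment the backtest would see a fake 50% overnight crash.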

Writing Our First Strategy

Backtesting.py's README.md file has a Simple Moving Average Crossover strategy. We'll be coding up another one for you to reference that's equally simple.

RSI Strategy

Our first strategy will be using the indicator, Relative Strength Index (RSI). This is a very common indicator. It's a momentum indicator that's used to measure if a particular asset is overbought or oversold. The link I've shared goes more in-depth on how to calculate it but we'll be using the pandas-ta library to help crunch these numbers.

The RSI is a value between 0 and 100. People typically treat a reading below 30 as a sign that the asset is oversold and a reading above 70 as a sign that it's overbought. So our strategy will simply trade based on this.
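If you're curious what's going on under the hood, here's a rough sketch of the textbook (Wilder) RSI calculation in plain Python. The function name wilder_rsi is my own, and this is not pandas-ta's code, so the numbers may not match its output exactly, especially for the first few bars.

```python
def wilder_rsi(closes, length=14):
    """Textbook Wilder RSI; returns a list with None until enough data exists."""
    if len(closes) <= length:
        return [None] * len(closes)

    changes = [closes[i] - closes[i - 1] for i in range(1, len(closes))]
    gains = [max(c, 0.0) for c in changes]
    losses = [max(-c, 0.0) for c in changes]

    # Seed with a simple average of the first `length` changes
    avg_gain = sum(gains[:length]) / length
    avg_loss = sum(losses[:length]) / length

    def to_rsi(gain, loss):
        if loss == 0:
            return 100.0  # no losses in the window
        return 100.0 - 100.0 / (1.0 + gain / loss)

    out = [None] * length + [to_rsi(avg_gain, avg_loss)]

    # Wilder smoothing for the remaining bars
    for gain, loss in zip(gains[length:], losses[length:]):
        avg_gain = (avg_gain * (length - 1) + gain) / length
        avg_loss = (avg_loss * (length - 1) + loss) / length
        out.append(to_rsi(avg_gain, avg_loss))

    return out
```

A price series that only goes up ends at 100, one that only goes down ends at 0, and anything choppy lands somewhere in between.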

Implementing The RSI Strategy

First let's install the backtesting framework along with pandas_ta:

pip install backtesting pandas_ta

Next, import these libraries at the top of our file:

from backtesting import Backtest, Strategy
from pandas_ta import rsi

To create our strategy, we'll have our strategy inherit from Backtesting's Strategy class:

class BasicRsiStrategy(Strategy):
    def init(self):
        self.rsi = self.I(rsi, self.data.df.Close, length=14)

    def next(self):
        today = self.rsi[-1]
        yesterday = self.rsi[-2]

        # Crosses below 30 (oversold, time to buy)
        if yesterday > 30 and today < 30 and not self.position.is_long:
            self.buy()

        # Crosses above 70 (overbought, time to sell)
        elif yesterday < 70 and today > 70 and self.position.size > 0:
            self.position.close()

The init function is where we precompute our technical indicator data. Inside it we've assigned the RSI indicator to self.rsi. Indicators need to be declared through the self.I function, which computes the whole series up front and then reveals it one bar at a time, so inside next, self.rsi[-1] is always the most recent value.

The next function is run at each tick (OHLC bar). Since we'll be using daily data, it simulates one trading day at a time.

We also have additional checks to make sure that we only ever hold one long position (we are not shorting). When it's time to sell the shares we've bought, we close our long position (selling all the shares we own).
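To make the crossing rules easier to eyeball, here's a small standalone sketch that applies the same conditions to a list of made-up RSI readings. The function name rsi_cross_signals is hypothetical, and it assumes orders fill immediately, which is a simplification of what the framework does.

```python
def rsi_cross_signals(rsi_values, oversold=30, overbought=70):
    """Return (index, action) pairs using the same crossing rules as the strategy."""
    signals = []
    in_position = False

    for i in range(1, len(rsi_values)):
        yesterday, today = rsi_values[i - 1], rsi_values[i]

        # Crossed below the oversold line while flat: buy
        if yesterday > oversold and today < oversold and not in_position:
            signals.append((i, 'buy'))
            in_position = True
        # Crossed above the overbought line while long: close
        elif yesterday < overbought and today > overbought and in_position:
            signals.append((i, 'close'))
            in_position = False

    return signals


# Made-up RSI readings
print(rsi_cross_signals([45, 32, 28, 25, 40, 65, 72, 75, 50, 29]))
# [(2, 'buy'), (6, 'close'), (9, 'buy')]
```

Notice that nothing happens while the RSI just sits below 30 or above 70. Only the bar where it crosses the line triggers a trade.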

Now let's call our strategy by updating our main block:

if __name__ == '__main__':
    symbol = 'spy'
    strategy = BasicRsiStrategy

    data = get_daily_data_adjusted(symbol)

    bt = Backtest(
        data=data,
        strategy=strategy,
        cash=1000,
        commission=.002,
        exclusive_orders=False,
        trade_on_close=False,
    )

    stats = bt.run()
    print(stats)

    # Creates a nice HTML file with graphs
    bt.plot(
        smooth_equity=True,
        superimpose=True,
    )

This framework uses a percentage of the trade value as commission. This might work for you depending on your broker or the type of asset you're trading. Unfortunately for me, my broker charges a flat rate of $10 per trade and this backtesting framework can't simulate that 🤷‍♂️
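One rough workaround is to convert the flat fee into a percentage of a typical trade size and pass that as the commission. It's only an approximation, since the real percentage changes as your account grows or shrinks. The helper name flat_fee_as_fraction is made up for this sketch.

```python
def flat_fee_as_fraction(flat_fee, trade_value):
    """Express a flat per-trade fee as a fraction of the trade's value."""
    return flat_fee / trade_value


for trade_value in (1_000, 5_000, 10_000):
    fraction = flat_fee_as_fraction(10, trade_value)
    print(f'${trade_value:>6}: {fraction:.2%} per trade')
```

With the $1,000 starting cash used here, a $10 fee works out to 1% per trade, five times the .002 in the backtest. The two only line up once each trade is around $5,000.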

We set trade_on_close to False so that the backtest fills each order at the next day's open. For some strategies you might want to trade at the current day's close instead; in that case just change the value to True.
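Here's a tiny sketch of the difference, using two made-up bars. The function fill_price and the list-of-dicts bar format are my own simplification for illustration; it ignores commission and everything else the framework handles.

```python
def fill_price(bars, signal_index, trade_on_close):
    """Price a market order would fill at, given the bar where the signal fired.

    bars is a list of dicts with 'Open' and 'Close' keys (made-up structure).
    Returns None when there is no next bar to trade on.
    """
    if trade_on_close:
        return bars[signal_index]['Close']
    if signal_index + 1 >= len(bars):
        return None
    return bars[signal_index + 1]['Open']


bars = [
    {'Open': 100.0, 'Close': 102.0},
    {'Open': 103.5, 'Close': 101.0},
]
print(fill_price(bars, 0, trade_on_close=False))  # next bar's open: 103.5
print(fill_price(bars, 0, trade_on_close=True))   # current bar's close: 102.0
```

Overnight gaps like the one between 102.0 and 103.5 are the reason the two settings can give noticeably different results.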

RSI Strategy Results

Let's take a look at the stats that were printed:

Start                     1999-11-01 00:00:00
End                       2021-04-16 00:00:00
Duration                   7837 days 00:00:00
Exposure Time [%]                   37.618077
Equity Final [$]                  1675.520564
Equity Peak [$]                   1675.520564
Return [%]                          67.552056 <-
Buy & Hold Return [%]              358.976317
Return (Ann.) [%]                    2.438276
Volatility (Ann.) [%]               16.252559
Sharpe Ratio                         0.150024
Sortino Ratio                        0.220649
Calmar Ratio                         0.044493
Max. Drawdown [%]                  -54.801191
Avg. Drawdown [%]                   -3.833282
Max. Drawdown Duration     2928 days 00:00:00
Avg. Drawdown Duration      129 days 00:00:00
# Trades                                   18
Win Rate [%]                        83.333333 <-
Best Trade [%]                      13.096123 <-
Worst Trade [%]                    -30.365652 <-
Avg. Trade [%]                       3.295865
Max. Trade Duration         725 days 00:00:00
Avg. Trade Duration         163 days 00:00:00
Profit Factor                        2.617049
Expectancy [%]                       3.859875
SQN                                  1.480562

From what we can tell, RSI isn't a strategy we should use by itself. The return over the course of roughly 21 years is 67.55%. The win rate is good (83%), but the winners are small and the losers are big: the best trade made 13.1% while the worst lost 30.4%. If we had just held SPY we would have gotten a much better return of about 359%.

And the auto-generated pretty graph:

RSI trading strategy backtesting results for SP500
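Total returns over two decades are hard to compare by feel, so it helps to turn them into a per-year number. Below is a quick sketch of the usual compound annual growth rate formula. The function name is my own, and because it assumes 365.25 calendar days per year the result may differ slightly from the Return (Ann.) figure the framework prints.

```python
def annualized_return(total_return_pct, days):
    """Compound annual growth rate (in %) from a total return over `days` calendar days."""
    years = days / 365.25
    growth = 1 + total_return_pct / 100
    return (growth ** (1 / years) - 1) * 100


days = 7837  # Duration from the RSI stats
print(f'RSI strategy: {annualized_return(67.552056, days):.2f}% per year')
print(f'Buy and hold: {annualized_return(358.976317, days):.2f}% per year')
```

That works out to roughly 2.4% a year for the RSI strategy against roughly 7.4% a year for simply holding SPY.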

The Second Strategy

MA Channel

I came across this strategy while watching David's video at Critical Trading on YouTube. He goes over the traditional Golden Cross and the one we are interested in, the Moving Average (MA) Channel.

His strategy is more complex, allowing for 10 positions, using a scanner for Annual Return of Capital (ROC) ranking and multiple market checks to make sure we are not trading when the market is in a heavy downtrend.

The backtesting framework we are using doesn't allow us to easily do all of this. We are going to simplify it with no scanner for Annual ROC. The framework also only allows for trading 1 asset so we'll only be holding 1 position at a time.

Implementing The Strategy

For our market downtrend indicator we're going to use the S&P 100 data. We will only buy when the market is in an uptrend (sp100 > sma(sp100)).

We'll need to add the S&P 100's adjusted close to our data frame in order to have it accessible in our strategy.

Let's create a new function to help us with that:

def add_symbol_adjusted_close(symbol: str, full_df: pd.DataFrame) -> pd.DataFrame:
    new_df = get_daily_data(symbol)

    close_df = pd.DataFrame(index=new_df.index)
    close_df[symbol] = new_df.adjusted_close

    return full_df.merge(close_df, left_index=True, right_index=True)

We can use this to pass in our existing data frame along with the new symbol we want to add as a column.

Let's update our main block:

if __name__ == '__main__':
    symbol = 'spy'
    strategy = MaHighLowChannel # Our new strategy

    data = get_daily_data_adjusted(symbol)
    data = add_symbol_adjusted_close('oef', data)

    # You can add more symbols to the data frame as needed.
    # The helper returns a new data frame, so reassign the result.
    # Example:
    # data = add_symbol_adjusted_close('aapl', data)
    # data = add_symbol_adjusted_close('tsla', data)

    bt = Backtest(
        data=data,
        strategy=strategy,
        cash=1000,
        commission=.002,
        exclusive_orders=False,
        trade_on_close=False,
    )

We are adding OEF because that is an asset that tracks the S&P 100 index. You can add additional stocks as your strategy calls for.

Now for the strategy:

# sma comes from pandas_ta, just like rsi did earlier
from pandas_ta import sma


class MaHighLowChannel(Strategy):

    def init(self):
        self.sp100Ma = self.I(sma, self.data.df.oef, length=200, plot=True)

        # Moving Average High (mah) and Moving Average Low (mal)
        self.mah = self.I(sma, self.data.df.High, length=10, plot=True)
        self.mal = self.I(sma, self.data.df.Low, length=10, plot=True)


    def should_buy(self):

        # If the SP100 close is at or below its SMA, treat it as a downtrend.
        # self.data.oef is the oef column revealed up to the current bar.
        if self.data.oef[-1] <= self.sp100Ma[-1]:
            return False

        for i in range(1, 6):
            if self.data.Low[-i] <= self.mah[-i]:
                return False
        return True

    
    def should_sell(self):
        for i in range(1, 6):
            if self.data.High[-i] >= self.mal[-i]:
                return False
        return True


    def next(self):
        if not self.position.is_long and self.should_buy():
            self.buy()
        elif self.position.is_long and self.should_sell():
            self.position.close()

This strategy is also pretty straightforward. I just wanted to show you an example of how to add more data into your strategy via the data frame.
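If the channel rule is hard to picture, here's the buy side of it in plain Python: every one of the last 5 lows has to sit above the 10-day moving average of the highs. The names sma_list and lows_above_channel are made up for this sketch, and it skips the S&P 100 trend filter.

```python
def sma_list(values, length):
    """Simple moving average; None until `length` values are available."""
    out = []
    for i in range(len(values)):
        if i + 1 < length:
            out.append(None)
        else:
            window = values[i + 1 - length:i + 1]
            out.append(sum(window) / length)
    return out


def lows_above_channel(lows, highs, ma_length=10, lookback=5):
    """True when each of the last `lookback` lows is above the MA of the highs."""
    mah = sma_list(highs, ma_length)
    recent = list(zip(lows, mah))[-lookback:]
    if len(recent) < lookback or any(m is None for _, m in recent):
        return False
    return all(low > m for low, m in recent)


# Made-up data: a steady climb versus a flat market
rising_highs = [100 + 2 * i for i in range(20)]
rising_lows = [h - 1 for h in rising_highs]
flat_highs = [100] * 20
flat_lows = [99] * 20

print(lows_above_channel(rising_lows, rising_highs))  # True
print(lows_above_channel(flat_lows, flat_highs))      # False
```

The sell side is the mirror image: every one of the last 5 highs has to sit below the moving average of the lows.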

MA Channel Strategy Results

Start                     2000-10-27 00:00:00
End                       2021-04-16 00:00:00
Duration                   7476 days 00:00:00
Exposure Time [%]                   67.210567
Equity Final [$]                  2995.257031
Equity Peak [$]                   2995.257031
Return [%]                         199.525703 <-
Buy & Hold Return [%]              342.323678
Return (Ann.) [%]                    5.516882
Volatility (Ann.) [%]               10.779165
Sharpe Ratio                          0.51181
Sortino Ratio                        0.741254
Calmar Ratio                         0.274589
Max. Drawdown [%]                   -20.09138
Avg. Drawdown [%]                   -1.580463
Max. Drawdown Duration     1377 days 00:00:00
Avg. Drawdown Duration       42 days 00:00:00
# Trades                                   24
Win Rate [%]                        70.833333 <-
Best Trade [%]                      32.135879 <-
Worst Trade [%]                      -10.8418 <-
Avg. Trade [%]                       4.941294
Max. Trade Duration         690 days 00:00:00
Avg. Trade Duration         208 days 00:00:00
Profit Factor                        3.513146
Expectancy [%]                         5.4784
SQN                                  2.499001

The MA Channel is better overall than the RSI strategy. The win rate is a bit lower (71% vs. 83%), but we hold onto our winning trades longer and the worst loss is much smaller. It still isn't better than buying and holding though (199.5% vs. 342.3%).

Next Steps

There are still a lot more things I want to try for Systematic Trading:

  • Use the Annual ROC ranking David suggested
  • Take multiple smaller positions
  • Trade multiple assets
  • Utilize different strategies depending on market conditions (ex: downtrend vs uptrend)
  • Test out more backtesting frameworks
  • Just code up a 💩 ton more strategies

Conclusion

If what they say is true, stonks only go up, then it'll be quite tough to beat the buy and hold strategy. Assuming you pick the right stock though 🌚