- Speaker #0
Hello, welcome back to the Papers with Backtest podcast. Today we dive into another algo trading research paper.
- Speaker #1
Hi there. Yes, this one's interesting. It looks at big data and machine learning in quant finance.
- Speaker #0
Exactly. Specifically focusing on how trading signals get generated and importantly, how they're evaluated using backtesting.
- Speaker #1
Right.
- Speaker #0
So for you listening, we're going to explore these, well, alternative data sources and the machine learning techniques used to build algorithmic trading rules. And we'll really dig into the backtest results, what they actually tell us.
- Speaker #1
Maybe what to watch out for when you're building your own strategies. Yeah.
- Speaker #0
Definitely. So where should we start? The paper talks about the shift in thinking.
- Speaker #1
Ah, yes. The move away from just looking at individual stocks in isolation.
- Speaker #0
Right. Towards finding these common threads or factors that link different investments together.
- Speaker #1
Exactly. And these factors... They're found using quantitative methods, right? They can be, well, transient here today, gone tomorrow, or they might stick around longer.
- Speaker #0
And they represent either risk or potential sources of return.
- Speaker #1
Precisely. The paper makes the point that a big goal now for, you know, many investment processes is actually identifying and then improving these factor-based insights.
- Speaker #0
Okay, so it's less about picking one winner and more about understanding these underlying currents.
- Speaker #1
That's a good way to put it.
- Speaker #0
And this is where machine learning comes into the picture, doesn't it? To potentially find patterns that traditional methods might just miss.
- Speaker #1
Yeah, because if you only look at standard fundamental data, income statements, balance sheets, you're maybe only getting part of the story.
- Speaker #0
The paper draws a neat parallel, like how neural networks spot patterns in image pixels.
- Speaker #1
Right. And the idea is, what if we apply that to finance, maybe using more granular data?
- Speaker #0
Like higher frequency data or these alternative data sources we hear so much about, things indicating sales or revenue from, say, web traffic or geolocation.
- Speaker #1
Exactly. The hope is that ML can uncover relationships that aren't just simple linear ones you'd get from quarterly reports.
- Speaker #0
But finance isn't static, is it? The markets are always changing.
- Speaker #1
That's a huge point the paper makes. Static models, they tend to become less effective over time.
- Speaker #0
And it mentions many quant models focus more on comparing across assets at one point in time, cross-sectional analysis.
- Speaker #1
Yeah, rather than pure time series forecasting, predicting how one specific thing will move next.
- Speaker #0
Because traditional time series forecasting often relies on past prices predicting future ones, or on reactions to big market shocks.
- Speaker #1
Right, which can be limiting. The real market drivers are probably way more complex, more subtle.
- Speaker #0
So machine learning could potentially understand these nuances better for forecasting.
- Speaker #1
It could. But, and this is important, the paper adds a note of caution. Financial data is notoriously noisy. And information isn't always evenly spread. It can be segmented. So making accurate forecasts is still really, really challenging.
- Speaker #0
Okay, so thinking about trading rules then, the paper contrasts different approaches based on time frame, right? High frequency versus fundamental.
- Speaker #1
It does. It explains that high frequency trading, or HFT, often looks at the really fine details the market microstructure.
- Speaker #0
Like volume, bid-ask spreads, tiny timing effects. Trying to predict very short-term supply and demand imbalances. They're generally less focused on whether the company itself is a good long-term bet.
- Speaker #1
Whereas a fundamental approach is the opposite, analyzing the business, the financials, the prospects to predict returns over a longer horizon.
- Speaker #0
Precisely. And, you know, a lot of the academic work on multi-factor risk premia tries to link those longer-term returns back to fundamental, undiversifiable business risks.
- Speaker #1
So the type of trading rule you build really depends on whether you're playing the short game or the long game.
- Speaker #0
Absolutely. It dictates the data you need, the models you might use, everything.
- Speaker #1
Which leads us nicely into alternative data. The paper says there's been a huge increase in these data sets becoming commercially available.
- Speaker #0
Yeah, a real explosion. But evaluating them isn't straightforward.
- Speaker #1
Why is that?
- Speaker #0
Well, the information you get from different providers can be inconsistent. Plus, for the big traditional asset managers, just testing a new dataset can be a slow internal process.
- Speaker #1
Makes sense.
- Speaker #0
What kind of alternative data are we talking about here? The paper gives examples.
- Speaker #1
Oh, yes. Some really concrete ones relevant for signal generation. Think about forecasts for US company earnings that are based on what people are searching for online.
- Speaker #0
Interesting.
- Speaker #1
Or geolocation data showing foot traffic to stores, even point of sale data showing actual transactions.
- Speaker #0
Wow.
- Speaker #1
Then there's stuff like company procurement data, maybe that signals debt levels or salary benchmarking data, even tracking things like unexpected executive departures from corporate governance data.
- Speaker #0
It sounds like a treasure trove for finding new trading edges.
- Speaker #1
It could be. But the paper is quite realistic here. It points out that finding actual, consistent alphas, you know, beating the market from this data alone is often pretty difficult.
- Speaker #0
So it's not a magic bullet.
- Speaker #1
Not usually. It seems most often this alternative data gets used as one piece of the puzzle within a broader multi-factor strategy.
- Speaker #0
Okay. And if you are using it, you need to backtest any trading rules you developed from it.
- Speaker #1
Critically important. And the paper highlights some challenges there too.
- Speaker #0
Such as?
- Speaker #1
Well, first, you need enough history in the data, and it needs to cover the assets you care about sufficiently.
- Speaker #0
Right. Otherwise, the backtest isn't meaningful.
- Speaker #1
Exactly. And often these new data sets weren't really designed with backtesting in mind. They might have, like, poorly tagged security identifiers, CUSIPs, SEDOLs.
- Speaker #0
Oh, that sounds like a nightmare for matching data.
- Speaker #1
It can be. Or inconsistent timestamps. All things that make it really hard to accurately simulate how a strategy would have performed historically.
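For listeners who want to see what that cleanup step might look like in practice, here is a minimal sketch. The identifiers, dates, and the one-day trading lag are all hypothetical choices, not anything specified in the paper:

```python
import pandas as pd

# Hypothetical alternative-data records with messily tagged CUSIP-like identifiers
alt = pd.DataFrame({
    "id_raw": [" 037833100", "37833100", "594918104"],
    "event_time": pd.to_datetime(
        ["2020-03-01 14:05", "2020-06-01 09:30", "2020-03-02 11:00"]),
    "signal": [0.8, -0.2, 0.1],
})

# Normalize identifiers before joining: strip whitespace, zero-pad to 9 characters
alt["cusip"] = alt["id_raw"].str.strip().str.zfill(9)

# Hypothetical security master mapping identifiers to tickers
master = pd.DataFrame({"cusip": ["037833100", "594918104"],
                       "ticker": ["AAPL", "MSFT"]})

merged = alt.merge(master, on="cusip", how="left")

# To avoid lookahead bias in the backtest, only treat the signal as tradable
# from the next calendar day after the recorded event time
merged["tradable_date"] = merged["event_time"].dt.normalize() + pd.Timedelta(days=1)
```

The point is simply that without this kind of normalization and lagging, the historical simulation quietly matches the wrong securities or trades on information before it existed.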
- Speaker #0
The paper mentions event studies, though, as potentially helpful for certain types.
- Speaker #1
Yes, especially for event-based alternative data, like maybe a series of earnings forecasts or specific news releases.
- Speaker #0
How does that work?
- Speaker #1
You basically look at how the asset's price behaves around the time of the event before and after. It can give you a sense of whether the data provides a predictable reaction and, crucially, over what kind of timeframe that reaction happens. Helps refine the trading rule.
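That event-study idea can be sketched in a few lines. The returns, event dates, and window length below are all simulated and hypothetical, and the market-adjusted abnormal-return model is just one common simple choice:

```python
import numpy as np

rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.01, 250)   # hypothetical daily asset returns
market = rng.normal(0.0, 0.008, 250)   # hypothetical market benchmark returns
events = [60, 180]                     # hypothetical event days (e.g. data releases)
window = 5                             # days examined before and after each event

# Market-adjusted abnormal return: asset return minus benchmark return
abnormal = returns - market

# Stack a window around each event and average across events
profiles = np.array([abnormal[t - window : t + window + 1] for t in events])
avg_car = profiles.mean(axis=0).cumsum()  # average cumulative abnormal return

# The shape of avg_car after index `window` (the event day) hints at whether
# the reaction is immediate or drawn out, which informs the holding period
```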
- Speaker #0
Gotcha. Does the paper give concrete examples of rules and backtests using this stuff?
- Speaker #1
It touches on a few interesting ones. There's the Alpha DNA Digital Revenue Signal, or DRS.
- Speaker #0
What's that based on?
- Speaker #1
It tries to predict company revenue using web traffic and social media data. The paper notes it takes quite a bit of work just to link that digital activity back to the right companies to make it usable for trading.
- Speaker #0
I can imagine. Any others?
- Speaker #1
Yeah, there's one using Tmall sales data. That's a big e-commerce platform, right? Looking at monthly sales value and units sold.
- Speaker #0
OK. And the trading rule?
- Speaker #1
A simple one. Basically trading based on month-over-month changes in those sales figures for stocks in China, Hong Kong, Japan, Korea, and the U.S.
- Speaker #0
How did that perform or what were the caveats?
- Speaker #1
Well, the paper points out that a simple rule like that can be very noisy. Think about promotions, currency effects, things that distort the underlying trend. So you need to be careful in the backtest interpretation.
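A sketch of that kind of month-over-month sales rule, with made-up numbers; the 3-month smoothing is one hypothetical way to damp the promotion-driven noise just mentioned, not something prescribed by the paper:

```python
import numpy as np
import pandas as pd

# Hypothetical monthly sales values for three stocks
sales = pd.DataFrame(
    {"A": [100, 120, 115, 140], "B": [200, 190, 195, 185], "C": [50, 55, 60, 58]},
    index=pd.period_range("2021-01", periods=4, freq="M"),
)

# Month-over-month percentage change is the raw signal
mom = sales.pct_change()

# Smooth with a 3-month average to damp promotion-driven spikes
signal = mom.rolling(3).mean()

# Simple rule: long (+1) when smoothed sales growth is positive, short (-1) when negative
positions = np.sign(signal)
```

Even in this toy version you can see the caveat: a single promotion month can flip the raw signal, which is why some smoothing or adjustment is usually needed before the backtest is trustworthy.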
- Speaker #0
Right. Noise is always the enemy. What about consumer complaints?
- Speaker #1
Ah, yeah. Data from the CFPB, the Consumer Financial Protection Bureau. They looked at whether that could predict returns or volatility. Some initial signs that maybe fewer complaints meant better performance, but the results weren't super consistent. Maybe due to the limited data history.
- Speaker #0
Potentially.
- Speaker #1
But interestingly, they did find a negative correlation in a regression analysis between how often complaints happened and how volatile the stock's return was.
- Speaker #0
So maybe a signal there, but perhaps not a straightforward alpha generator on its own, needs careful handling.
- Speaker #1
Exactly. And there was a brief mention of a RavenPack news analytics case study.
- Speaker #0
Using news sentiment for trading.
- Speaker #1
Yeah, for trading developed market FX pairs. The backtest apparently showed the news-based strategy doing better than a simple trend-following one, at least on a risk-adjusted basis.
- Speaker #0
So incorporating unstructured data like news can potentially add value to a trading rule.
- Speaker #1
It seems so, based on that example.
- Speaker #0
OK, let's shift gears slightly to the machine learning techniques themselves applied to these trading rules and their backtests.
- Speaker #1
Right. The paper mentions things like ensemble learning, gradient boosting being used in quant equity strategies.
- Speaker #0
And did they show performance improvements?
- Speaker #1
Well, it references a backtest where a sort of naive, equally weighted portfolio built using boosted trees, which incorporated lots of different features, apparently outperformed simpler, traditional multi-factor portfolios.
- Speaker #0
That's pretty significant. Suggests ML really can find complex patterns that lead to better signals.
- Speaker #1
It does suggest that potential, yeah. And a key benefit of tree-based models, like boosting, is that you can look at variable importance.
- Speaker #0
To see what's actually driving the prediction.
- Speaker #1
Exactly. Which factors or features the model found most useful. That's crucial for understanding why a trading rule might be working.
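To illustrate the variable-importance point, here is a small sketch using scikit-learn's gradient boosting on synthetic data. The feature names and data-generating process are invented for illustration, not taken from the paper:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(42)
n = 500

# Synthetic cross-section: a "value" factor, a "momentum" factor, and pure noise
X = rng.normal(size=(n, 3))
# Forward returns driven mostly by the first two features
y = 0.5 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(0.0, 0.1, size=n)

model = GradientBoostingRegressor(n_estimators=200, max_depth=2, random_state=0)
model.fit(X, y)

# Variable importance reveals which features the trees actually relied on
importance = dict(zip(["value", "momentum", "noise"], model.feature_importances_))
```

In a real strategy this kind of readout is what lets you check that the model is leaning on economically sensible factors rather than on noise.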
- Speaker #0
And the paper stresses the importance of the whole ML workflow, right? Tuning, training, testing.
- Speaker #1
Absolutely critical. Feature engineering, designing the inputs properly. Label transformation, defining what you're trying to predict clearly. Rigorous testing is key to avoid overfitting and ensure the trading rules are actually robust.
- Speaker #0
Makes sense. Finally, the paper also mentions deep learning, specifically LSTMs.
- Speaker #1
Yes, long short-term memory networks are a type of recurrent neural network.
- Speaker #0
Good for sequential data like prices.
- Speaker #1
Exactly. They're designed to capture longer-term dependencies in sequences, which is often a challenge for simpler models. They can help overcome issues like vanishing gradients you sometimes see in deep networks.
- Speaker #0
And how are they applied here?
- Speaker #1
The paper describes an experiment predicting S&P 500 stock returns using just historical returns as input to an LSTM. And then using those predictions to formulate an automated trading policy, basically, rules for when to buy, hold, or sell based on the LSTM's forecast.
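The trading-policy step, turning model forecasts into buy, hold, or sell decisions, can be sketched independently of the network itself. The threshold below is a hypothetical choice, and the forecast numbers simply stand in for LSTM outputs:

```python
import numpy as np

def policy_from_forecasts(forecasts, enter=0.001):
    """Map next-period return forecasts (e.g. from an LSTM) to positions:
    +1 = long, 0 = hold cash, -1 = short. `enter` is a hypothetical
    threshold below which the forecast is treated as noise."""
    positions = np.zeros_like(forecasts)
    positions[forecasts > enter] = 1
    positions[forecasts < -enter] = -1
    return positions

# Hypothetical forecasts of next-day returns for one stock
forecasts = np.array([0.004, 0.0005, -0.003, -0.0002, 0.002])
positions = policy_from_forecasts(forecasts)
```

A long-only variant would simply clip the short side to zero; the long-short version the paper's outline refers to would keep both.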
- Speaker #0
Did it mention specific backtest results, like hit rates or returns?
- Speaker #1
The outline mentions hit ratio and average returns for long-only and long-short portfolios, suggesting those details might be deeper in the paper. But the core idea is using these advanced networks to generate the trading signals themselves.
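For reference, the hit ratio mentioned here is just the fraction of periods where the forecast direction matched the realized direction. A minimal sketch with made-up numbers:

```python
import numpy as np

def hit_ratio(forecast_returns, realized_returns):
    """Fraction of periods where the forecast got the direction right."""
    f = np.sign(forecast_returns)
    r = np.sign(realized_returns)
    return np.mean(f == r)

# Hypothetical forecast vs. realized daily returns
forecast = np.array([0.01, -0.02, 0.005, 0.03, -0.01])
realized = np.array([0.02, -0.01, -0.004, 0.01, -0.02])
hr = hit_ratio(forecast, realized)  # 4 of 5 directions correct -> 0.8
```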
- Speaker #0
Okay, so wrapping up our deep dive here, what are the main takeaways regarding these algorithmic trading rules?
- Speaker #1
Well, it seems clear that alternative data and machine learning definitely open up exciting new avenues for creating trading strategies. There's real potential there.
- Speaker #0
But, there's always a but, isn't there?
- Speaker #1
Yeah, the paper really hammers home the need for extremely rigorous testing. Backtesting is non-negotiable.
- Speaker #0
And you have to understand the limits, right? The limits of the data you're using and the limits of the ML techniques themselves.
- Speaker #1
Absolutely. It's not magic. Finding data sets that are genuinely useful, that contain real exploitable alpha, and then turning that into a profitable, robust trading rule, that's still a massive challenge.
- Speaker #0
So potential is high, but the bar for success is also very high. Requires careful work.
- Speaker #1
Couldn't have said it better myself.
- Speaker #0
Thank you for tuning in to the Papers with Backtest podcast. We hope today's episode gave you useful insights. Join us next time as we break down more research. And for more papers and backtests, find us at https://paperswithbacktests.com. Happy trading.