Wall Street robots are having a hard time decoding Reddit's world of 'memes and typos'
Rocket emojis for stock gains. “Tendies” as slang for profits. GIFs with company tickers.
Reddit forum WallStreetBets (WSB) is hard for humans to follow at the best of times. But spare a thought for the machines.
After the retail stock frenzy last month caused unprecedented havoc, hedge funds trawling the platform with algorithms have a renewed sense of purpose in their mission to figure out the next market craze.
Yet it’s proving a massive pain. It’s not easy to train computers to distill amateur chatter on message boards into data that’s anywhere near fit for trading in the real world.
Even for the very basic task of identifying securities, an algo has to learn how to match millennial-speak, memes and typos with the intended subject. And that’s only the start of it.
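To make the typo problem concrete, here is a minimal sketch of fuzzy name matching with Python's standard difflib module. It is an illustration only, not any vendor's method: the name-to-ticker table, the 0.8 similarity cutoff and the function name resolve_ticker are all assumptions made for this example.

```python
from difflib import get_close_matches

# Hypothetical lookup table from lowercase company names to tickers.
NAME_TO_TICKER = {
    "gamestop": "GME",
    "blackberry": "BB",
    "nokia": "NOK",
}

def resolve_ticker(word, cutoff=0.8):
    """Map a possibly misspelled company name to a ticker, or None if nothing is close."""
    key = word.lower()
    if key in NAME_TO_TICKER:
        return NAME_TO_TICKER[key]
    # Fall back to the single closest known name above the similarity cutoff.
    close = get_close_matches(key, list(NAME_TO_TICKER), n=1, cutoff=cutoff)
    return NAME_TO_TICKER[close[0]] if close else None

print(resolve_ticker("Gamestpo"))  # transposed letters still resolve to GME
print(resolve_ticker("tesla"))     # not in the table and not close: None
```

A cutoff this strict trades recall for precision: looser values would catch more typos but would also start matching unrelated words.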
Just ask the people behind the Reddit robots like Stefan Nann.
“You cannot just apply the standard English library of words,” said the chief executive officer of Stockpulse, a social-media analytics firm in Germany. “We were reading through these comments and deciding ourselves if this comment is positive or negative -- that’s how we train the machine.”
Sentiment analysis on WSB is the latest thing in the world of alternative data, which is projected to grow from last year’s $1.64 billion to $17.35 billion in vendor revenue by 2027. NN Investment Partners and PanAgora Asset Management are among systematic investors who scrape social media for trading signals, while more brokerages are offering clients tools to do just that.
For good reason in theory. A strategy of following the herd would have yielded big profits, in retrospect.
According to Stockpulse, an indicator measuring GameStop’s buzz on Reddit first peaked in early December, a solid month before its price started climbing. A strategy of simply buying the five companies most discussed on WSB in the previous week could have returned 61% in 2020, a backtest by data provider Quiver Quantitative showed.
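The selection step of such a strategy can be sketched in a few lines of Python. This is not Quiver Quantitative's backtest: the input format (one date-and-ticker pair per mention), the tie-breaking rule and the function name top_mentioned are assumptions for illustration, and the sketch only ranks tickers without simulating any returns.

```python
from collections import Counter
from datetime import date, timedelta

def top_mentioned(mentions, as_of, window_days=7, n=5):
    """Return the n tickers mentioned most often in the window_days before as_of.

    mentions: iterable of (date, ticker) pairs, one per mention (assumed format).
    """
    start = as_of - timedelta(days=window_days)
    # Count only mentions inside the look-back window [start, as_of).
    counts = Counter(t for d, t in mentions if start <= d < as_of)
    # Sort by count descending, then by ticker, so ties resolve deterministically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:n]]

# Made-up sample data: the old NOK mentions fall outside the one-week window.
sample = (
    [(date(2021, 1, 4), "GME")] * 3
    + [(date(2021, 1, 5), "AMC")] * 2
    + [(date(2021, 1, 6), "BB")]
    + [(date(2020, 12, 1), "NOK")] * 10
)
print(top_mentioned(sample, as_of=date(2021, 1, 8), n=2))
```

Excluding the as_of date itself keeps the ranking free of look-ahead: only chatter from before the trading day is counted.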
This is roughly how it works. A Reddit user is waxing lyrical on why BlackBerry Ltd. is worth more than four times its stock price. An algo trained by a vendor like MarketPsych then records the ticker and sweeps the post for trading sentiment signals.
It tries to figure out the intensity of bullish calls from cues like “market leader,” emojis, the use of future tense, even expletives. Then the process gets repeated across swaths of securities.
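A toy version of that sweep might look like the following. It is a sketch under loud assumptions, not MarketPsych's model: the phrases, weights and future-tense word list are invented for illustration, tickers are recognized only when written with a leading dollar sign, and expletives are left out.

```python
import re

# Hypothetical cue weights, invented for this example.
BULLISH_PHRASES = {"market leader": 2.0, "to the moon": 2.0, "tendies": 1.0}
ROCKET = "\U0001F680"             # the rocket emoji
FUTURE_WORDS = {"will", "gonna"}  # crude stand-in for future tense
CASHTAG = re.compile(r"\$([A-Z]{1,5})\b")

def score_post(text):
    """Return (tickers, bullish_score) for one post."""
    tickers = sorted(set(CASHTAG.findall(text)))
    lowered = text.lower()
    score = 0.0
    # Each occurrence of a bullish phrase adds its weight.
    for phrase, weight in BULLISH_PHRASES.items():
        score += weight * lowered.count(phrase)
    # Every rocket emoji adds one point.
    score += 1.0 * text.count(ROCKET)
    # Future-tense markers add half a point each.
    words = re.findall(r"[a-z']+", lowered)
    score += 0.5 * sum(1 for w in words if w in FUTURE_WORDS)
    return tickers, score

post = "$BB is a market leader and will fly " + ROCKET * 2
print(score_post(post))
```

A hand-built lexicon like this is exactly what needs constant upkeep as the forum's vocabulary shifts.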
“Some are calling us because they’re trying to take advantage of the herd,” said Richard Peterson, a board-certified psychiatrist who founded MarketPsych. “Some are just trying to find ways to protect themselves.”
But for the form of artificial intelligence known as natural-language processing, message boards aren’t as straightforward as, say, figuring out signals from a corporate executive on an earnings call.
When new slang emerges, humans have to update the machines’ dictionary. Doing this worldwide is even harder. It took MarketPsych’s team some time to figure out what different emojis mean in different cultures, or that British traders talking down a stock might use more subtle insults.
In one example cited by Stockpulse, the computer has to learn the difference between “hold” as a verb and an exchange-traded fund whose ticker contains the same characters.
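One simple way to approach that ambiguity is a rule that accepts an everyday word as a ticker only when it carries an explicit dollar-sign prefix. The sketch below assumes this heuristic; the article does not say how Stockpulse actually solves the problem, and the ticker universe and ambiguous-word list here are hypothetical.

```python
import re

KNOWN_TICKERS = {"HOLD", "GME", "BB"}          # hypothetical ticker universe
AMBIGUOUS_WORDS = {"HOLD", "ALL", "ON", "IT"}  # tickers that double as everyday words (assumed list)
TOKEN = re.compile(r"\$?[A-Za-z]+")

def extract_tickers(text):
    """Return tickers mentioned in text, skipping everyday words that merely look like tickers."""
    found = []
    for tok in TOKEN.findall(text):
        has_cashtag = tok.startswith("$")
        word = tok.lstrip("$")
        symbol = word.upper()
        if symbol not in KNOWN_TICKERS:
            continue
        if symbol in AMBIGUOUS_WORDS:
            # An everyday word counts as a ticker only with an explicit cashtag.
            if has_cashtag:
                found.append(symbol)
        elif has_cashtag or word.isupper():
            # Unambiguous symbols need a cashtag or all-caps spelling.
            found.append(symbol)
    return found

print(extract_tickers("I will hold GME forever"))             # the verb is ignored
print(extract_tickers("HOLD THE LINE, buying $HOLD and bb"))  # only the cashtag counts
```

The rule is deliberately conservative: it will miss users who name the fund without a dollar sign, which is the kind of gap that labeled training data is meant to close.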
These efforts are in big demand.
“At some point it starts to resemble a single hedge fund because they all behave at the same time in the same way,” said Francesco Filia, CEO of hedge fund Fasanara Capital, which started monitoring Reddit internally last month. “You need to be on top of it.”
Thinknum and Social Market Analytics are among the growing number of sentiment data providers who’ve recently rolled out products tallying up the stock chatter on Reddit in one form or another. Bloomberg LP, the parent of Bloomberg News, provides access to alternative data sources on the terminal and via the Bloomberg Data License.
With individuals now accounting for about a quarter of US equity trading volume, knowing where the retail cash heads next should in theory prove lucrative. Such insights would help the long-short crowd figure out where the next market bombs land. The retail army in January famously inflicted a record industry squeeze as they crowded into some of the most hated names to hurt short sellers, or “shorties” in Reddit vernacular.
At NN Investment Partners, data scientist Melissa Lin says its signal tracking sentiment in news and social media has been performing better than simply chasing price momentum.
But particular forums like Reddit don’t have a sufficiently long history of stock-picking for the data crunching that quants like Lin do. And anecdotal evidence aside, retail investors don’t have a consistently strong track record, she says.
“It’s not a simple linear relationship between sentiment and stock returns,” Lin said. “You need a lot more research to filter out the noise to actually make it useful.”
Emmanuel Hauptmann says his quant team at RAM Active Investments has recently started monitoring stock babble on Twitter -- which it sees as highly correlated with Reddit -- but his expectations are low.
“For the moment we see it as a purely tail risk reduction,” the fund manager said. “We don’t use it as an alpha signal.”
Still, even with all the data in the world to track the ebbs and flows of retail sentiment, market timing is everything.
“Once people are all-out in your face -- saying ‘I love this, death to the shorts!’ -- that’s too late,” said Peterson at MarketPsych.