Western Models Lose 80% of Capital in One Week

Can AI trade crypto? Jay Azhang, a computer engineer and finance bro from New York, is putting this question to the test with Alpha Arena. The project pits some of the most prominent large language models (LLMs) against each other, each with $10,000 of capital, to see which can make the most money trading crypto. The models are Grok 4, Claude Sonnet 4.5, Gemini 2.5 Pro, GPT-5, DeepSeek V3.1, and Qwen3 Max.
Now, you might be thinking “wow, what a great idea!”, and you might be surprised to learn that, at the time of writing, three of the six AIs are underwater, with Qwen3 and DeepSeek, the two Chinese open-source models, leading the pack.

That’s right: the Western world’s most powerful closed-source, proprietary AIs, run by giants like Google and OpenAI, have lost over $8,000, or 80% of their crypto trading capital, in just over a week, while their Eastern open-source counterparts are in the green.
The most successful trade so far? Qwen3, moisturised and in its lane, with a simple 20x leveraged Bitcoin long. Grok 4, to no one’s surprise, has been long Dogecoin (DOGE) with 10x leverage for most of the competition; after sitting at the top of the charts alongside DeepSeek at one point, it is now close to 20% underwater. Maybe Elon Musk should tweet a Doge meme or something to, you know, get Grok out of the doghouse.

Meanwhile, Google’s Gemini is relentlessly bearish, holding short positions on every crypto asset available to trade, a stance that echoes Google’s general attitude toward crypto over the past 15 years.
Last but not least is ChatGibitty, which has made every bad trade possible for a week straight, a remarkable achievement! It takes skill to be that bad, especially when Qwen3 just longed bitcoin and went fishing. If this is the best closed-source AI has to offer, then maybe OpenAI should just keep it closed source and spare us.
A new benchmark for AI
All joking aside, pitting AI models against each other in a crypto trading arena offers some genuinely profound insights. For starters, a model cannot be pre-trained on the answers to a live trading test, because the prices it will face do not exist yet when it is trained. Other benchmarks suffer from exactly this problem: many AI models are exposed to the answers to these tests during training, so of course they perform well when tested. Some research has also shown that slight changes to these tests can lead to radically different benchmark results.
This controversy raises the question: what is the ultimate test of intelligence? Well, according to Elon Musk, Iron Man enthusiast and creator of Grok 4, predicting the future is the ultimate measure of intelligence.
And let’s face it, there’s no future more uncertain than the short-term price of crypto. In the words of Azhang, “Our goal with Alpha Arena is to make benchmarks more like the real world, and markets are perfect for this. They’re dynamic, adversarial, open-ended, and endlessly unpredictable. They challenge AI in ways that static benchmarks cannot. — Markets are the ultimate test of intelligence.”
This insight about markets is deeply embedded in the libertarian principles from which Bitcoin was born. Building on Ludwig von Mises’s economic calculation argument from over a hundred years ago, economists like Murray Rothbard and Milton Friedman made the case that markets are fundamentally unpredictable by central planners, and that only individuals making real economic decisions, with something to lose, can make rational economic calculations.
In other words, the market is so hard to predict because it depends on the perspectives and decisions of countless intelligent people around the world, and that is exactly what makes it the best test of intelligence.
Azhang notes in the project description that the AIs are instructed to trade not just for gains but for risk-adjusted returns. This risk dimension is critical, because one bad trade can wipe out all previous returns, as the downfall of Grok 4’s portfolio shows.
Another open question is whether these models are learning from their experience trading crypto, which is technically difficult to achieve, given that AI models are very expensive to pre-train in the first place. They could be fine-tuned on their own trading history or on other traders’ history, and they might even keep recent trades in their short-term memory, or context window, but that can only take them so far. Ultimately, the right AI trading model might have to genuinely learn from its own experience, an approach recently proposed in academic research that has a long way to go before it becomes a product. MIT researchers call these Self-Adapting Language Models.
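To make “risk-adjusted” concrete, here is a minimal sketch. Alpha Arena’s exact scoring formula isn’t described here, so as an assumption it uses the Sharpe ratio, one common risk-adjusted measure (average return divided by its volatility), applied to hypothetical daily returns. It shows how a single 10x leveraged loss can turn a steady run of gains into a net loss.

```python
import statistics


def sharpe_ratio(returns, risk_free=0.0):
    """Per-period Sharpe ratio: mean excess return divided by its sample standard deviation.

    `returns` are simple per-period returns (0.02 means +2%). The result is not annualized.
    """
    excess = [r - risk_free for r in returns]
    volatility = statistics.stdev(excess)  # needs at least two data points
    if volatility == 0:
        raise ValueError("returns have zero volatility; Sharpe ratio is undefined")
    return statistics.mean(excess) / volatility


def compound(returns, start=10_000.0):
    """Grow a starting balance through a sequence of simple returns."""
    balance = start
    for r in returns:
        balance *= 1 + r
    return balance


# Hypothetical daily returns for a steady trader: small, consistent gains.
steady = [0.02, 0.01, 0.03, 0.02, 0.01, 0.03]

# Hypothetical 7th day: a 10x leveraged position meets a 5% adverse price move.
# Ignoring fees, funding and liquidation rules, 10 x -5% = -50% of the account.
LEVERAGE = 10
blowup = steady + [LEVERAGE * -0.05]

if __name__ == "__main__":
    print(f"steady: Sharpe {sharpe_ratio(steady):.2f}, balance ${compound(steady):,.2f}")
    print(f"with one bad trade: Sharpe {sharpe_ratio(blowup):.2f}, balance ${compound(blowup):,.2f}")
```

In this toy example, six good days grow $10,000 to roughly $11,259, and one leveraged loss leaves about $5,630, pushing the Sharpe ratio from roughly 2.2 to below zero.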
How do we know it is not just luck?
Another reading of the project’s results so far is that they may be indistinguishable from a “random walk”. A random walk is akin to rolling dice for every decision. What would that look like on a chart? There’s a simulator you can use to answer that question, and the answer is: not too different.
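For a feel of what skill-free trading looks like, here is a minimal random-walk sketch. It is not the simulator mentioned above; the function name, the 1% step size and the hourly steps over one week are all hypothetical choices. Each “model” simply flips a fair coin every hour to gain or lose 1% of its balance.

```python
import random


def random_walk_equity(steps, step_size=0.01, start=10_000.0, seed=None):
    """Equity curve of a skill-free 'trader' driven by a fair coin.

    At each step the balance is multiplied by (1 + step_size) or (1 - step_size)
    with equal probability, so any winning streak is pure chance.
    """
    rng = random.Random(seed)  # seeded generator so runs are reproducible
    balance = start
    curve = [balance]
    for _ in range(steps):
        if rng.random() < 0.5:
            balance *= 1 + step_size
        else:
            balance *= 1 - step_size
        curve.append(balance)
    return curve


if __name__ == "__main__":
    # Six hypothetical random "models", hourly steps for one week (24 * 7 = 168 steps).
    for model_id in range(6):
        curve = random_walk_equity(steps=24 * 7, seed=model_id)
        change = curve[-1] / curve[0] - 1
        print(f"random model {model_id}: final ${curve[-1]:,.2f} ({change:+.1%})")
```

Running it with different seeds typically leaves some coin-flipping “models” in the green and others underwater, with no reasoning involved at all.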

This question of luck in markets has also been examined carefully by thinkers like Nassim Taleb in his book Fooled by Randomness. In it, he argues that, statistically, it is perfectly normal for one trader (say, Qwen3 in this case) to be lucky for a whole week straight, creating the appearance of superior reasoning. Taleb goes much further, arguing that there are so many traders on Wall Street that one of them could easily be lucky for 20 years in a row and develop a god-like reputation, with everyone around them assuming this trader is simply a genius, until, of course, the luck runs out.
Thus, for Alpha Arena to produce valuable data, it will have to run for a long time, and its patterns and results will need to be replicated independently, with real capital at stake, before they can be distinguished from a random walk.
Ultimately, it’s great to see open-source, cost-efficient models like DeepSeek outperform their closed-source counterparts so far. Alpha Arena has been a great source of entertainment, going viral on X.com over the past week. Where it goes is anyone’s guess; we will have to see whether the gamble its creator took, handing $60,000 to six chatbots to trade crypto with, pays off in the end.
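The arithmetic behind this argument fits in a few lines. As a simplifying assumption (not Taleb’s exact model), suppose each trader independently has a coin-flip chance of a winning year. A 20-year streak then happens with probability 0.5 to the power 20, and among N traders the chance that at least one gets there is 1 minus (1 minus that probability) to the power N. The crowd sizes below are hypothetical.

```python
def expected_lucky(traders, periods, p_win=0.5):
    """Expected number of traders who win every one of `periods` independent periods."""
    return traders * p_win ** periods


def prob_at_least_one_lucky(traders, periods, p_win=0.5):
    """Probability that at least one trader wins every period by pure chance.

    Each trader's streak succeeds with probability p_win ** periods, independently,
    so P(no lucky trader) = (1 - p_win ** periods) ** traders.
    """
    p_streak = p_win ** periods
    return 1 - (1 - p_streak) ** traders


if __name__ == "__main__":
    # Hypothetical crowd sizes; each trader has a coin-flip chance of a winning year.
    for traders in (10_000, 1_000_000):
        print(
            f"{traders:>9,} traders: expected 20-year streaks "
            f"{expected_lucky(traders, 20):.3f}, "
            f"P(at least one) {prob_at_least_one_lucky(traders, 20):.1%}"
        )
```

Under these assumptions, 10,000 traders would produce a 20-year streak less than 1% of the time, while a million traders would produce one more often than not (about 61%), so how “easily” it happens depends on how large the crowd is.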