Nof1.ai has launched a pilot project to test how some of the most advanced artificial intelligence models perform in real financial markets. This is not a simulation: the AI models placed actual trades on the Hyperliquid cryptocurrency exchange, each starting with $10,000 in real funds.
Six well-known models took part in the trial: GPT-5, DeepSeek V3.1, Grok 4, Gemini 2.5 Pro, Claude 4.5 Sonnet, and Qwen3-Max. Each model received the same initial instructions, then chose its own strategy and traded fully autonomously.
The results from the first few days surprised many observers, because some of the most popular and supposedly "smartest" models performed poorly. GPT-5 lost approximately 67% of its deposit, leaving a balance of about $3,300, while Gemini 2.5 Pro lost more than 54%. These early results suggest that models that excel at complex language and reasoning tasks do not necessarily handle market volatility and risk well.
Only two models were in profit over this period. DeepSeek V3.1 led the field with gains of more than 5%, followed by Qwen3-Max with more than 2%. This suggests that a model such as DeepSeek, which is less widely known internationally than GPT, may be better at trade execution and market timing, at least over such a short window.
Although the project's creators noted that Grok and DeepSeek showed a better grasp of market structure than the other models, the trial as a whole shows that AI still has much to learn before it can trade financial markets autonomously and reliably. The first season of the trial ends on November 3, and the final results should give a fuller picture of what AI can actually do in this field.