Elon Musk's artificial intelligence company xAI released Grok 4, its latest large language model (LLM), late on Wednesday evening, alongside a "heavy" version that employs multiple AI agents working in parallel. The company demonstrated the system during a live presentation, though as ever we have to take any performance metrics with a grain of salt. Alongside the two model releases, xAI also announced a new $300-per-month AI subscription plan, the most expensive of any LLM supplier to date.
For all the hype in the release presentation about the capabilities of Musk’s new AI model, the launch came just days after xAI faced widespread criticism when its Grok chatbot posted antisemitic content and praised Adolf Hitler on X. The AI referred to itself as "MechaHitler" and made inflammatory comments about Jewish people. Musk commented on X, saying "Grok was too compliant to user prompts, too eager to please and be manipulated, essentially. That is being addressed."
According to xAI, Grok 4 used 100 times more training compute than Grok 2, though the company provided limited technical details about the underlying architecture. The model was tested on academic benchmarks including “Humanity’s Last Exam,” a challenging test measuring AI’s ability to answer thousands of crowdsourced questions on subjects like math, humanities, and natural science. According to xAI, Grok 4 scored 25.4% on Humanity’s Last Exam without “tools,” outperforming Google’s Gemini 2.5 Pro, which scored 21.6%, and OpenAI’s o3 (high), which scored 21%.
During the demonstration, Grok 4 appeared to handle various tasks including mathematical problems, sports prediction, and basic physics simulations. However, xAI representatives acknowledged current limitations, particularly in visual understanding capabilities, with one describing the model as "effectively just looking at the world squinting through glass." Musk also said that, with respect to academic questions, Grok is better than PhD level in every subject, with no exceptions. He admitted, though, that it may at times lack common sense and that it has not yet invented new technologies or discovered new physics, which according to Musk is just a matter of time. The company also announced plans for specialised coding models and enhanced multimodal capabilities, with training for video generation models expected to begin within weeks using what it describes as over 100,000 advanced GPUs.
The competitive landscape for LLMs continues to evolve rapidly, with OpenAI’s GPT-5 also expected to be released any day now. From an enterprise perspective, however, leveraging these LLMs is about more than just compute processing and intelligence levels. Successful AI projects must identify the right use cases and have a wraparound of tools that can monitor and manage cost, performance, security, ethics, safety, governance and integration with company data. For this reason, I think Google Cloud, Microsoft (through Azure AI Studio), OpenAI and Amazon (through Bedrock) are going to remain the GenAI and agentic AI platforms of choice. xAI is going to have to demonstrate a significant leap in ability to outweigh all the potential negative aspects of running a model controlled by Musk and his world views.
Posted by: Simon Baxter at 10:22