HawkInsight

  • Contact Us
  • App
  • English

GPT-5, Genie 3, and Gork 4 take turns to battle AI giants and set off the big model of ecological reconstruction. The competition reaches a critical point

The three giants acted collectively at the same night and competed in the air. It was wonderful.

On August 5, the field of artificial intelligence ushered in a dramatic "Super Tuesday".OpenAI suddenly announced the open source of two large models, gpt-oss-120b and gpt-oss-20b, late at night, ending a six-year closed-source strategy; almost at the same time, Google DeepMind launched the third-generation world model Genie 3, claiming that it has "key capabilities to AGI";xAI's Gork 4 was launched in a high-profile manner, and Musk said its IQ had reached the doctoral level.

The three giants acted collectively at the same night and competed in the air. It was wonderful.

OpenAI: Late night open source model gpt-oss achieves o4-mini performance

The specific parameters and performance of the two models released by OpenAI this time are as follows:

GPT-OSS-120B (117 billion total parameters, 5.1 billion activation) runs on a single H100 GPU (80GB memory memory). The Codeforces programming competition scores 2622 points, surpassing the closed-source model o3-mini and equaling o4-mini; it even surpassed o4-mini in the health diagnosis benchmark HealthBench and mathematics competition AIME, breaking the performance ceiling of open source models.

GPT-OSS-20B (21 billion total parameters, 3.6 billion activation) only requires 16GB of memory, generates code on the M3 Pro chip MacBook at a speed of 23.72 tokens/second, and its performance matches the o3-mini.Its consumer-grade hardware adaptability completely subverts the traditional monopoly of computing power, making it possible to deploy high-level AI on mobile phones.

GPT-5、Genie 3、 Gork 4轮番上阵 AI巨头混战引爆生态重构 大模型竞赛进入临界点

Similar to the OpenAI o series of reasoning models in the API, both open weight models support low, medium, and high reasoning strength settings, allowing developers to trade-off between performance and response speed based on specific usage scenarios and latency requirements.

GPT-5、Genie 3、 Gork 4轮番上阵 AI巨头混战引爆生态重构 大模型竞赛进入临界点

After the model was released, OpenAI CEO Sam Altman was filled with excitement on social media: gpt-oss was released!We made an open model with o4-mini performance and running on high-end notebooks.Being super proud of the team is a major technical victory.

GPT-5、Genie 3、 Gork 4轮番上阵 AI巨头混战引爆生态重构 大模型竞赛进入临界点

Behind this shift in OpenAI's open source strategy is the pressure of fierce market competition and the pressure of customer needs.

Months after open source models such as DeepSeek caused industry shocks, Sam Altman publicly admitted that he was "on the wrong side of history" on open source issues.But the more direct pressure comes from business reality: corporate customers are already widely using open source models to complete various tasks, seriously affecting OpenAI's customer base.

In this case, rather than stick to the enclosed garden, it is better to actively embrace the ecology.By lowering the model deployment threshold to the consumer hardware level, we build a broader developer base and cultivate an ecosystem around its technology stack.

GPT-5 may be released at any time from now on

There are indications that GPT-5 may be officially unveiled in early August.

On July 19, Sam Altman posted on the X platform: "We are about to release GPT-5.A few days later, on July 24, he first mentioned the internal testing experience of GPT-5 in a podcast, calling it "shocking" and saying "we will release it soon."

GPT-5、Genie 3、 Gork 4轮番上阵 AI巨头混战引爆生态重构 大模型竞赛进入临界点

According to the news, GPT-5 can automatically adjust the depth of reasoning based on the complexity of the problem, without the need to manually switch between "basic version" or "deep thinking" modes.The o3 inference engine adopts a chain thinking mechanism to build a thinking chain internally through invisible "inference tokens".When dealing with complex problems, the system breaks down tasks, generates sub-inference chains, verifies logical consistency, and finally synthesizes answers, making the model reach a 35/42 gold medal level in the International Mathematical Olympiad, far exceeding the benchmark performance of GPT-4.

Compared to GPT-4 's 128K token limit, GPT-5 standard mode supports 256K, and extended mode reaches 1M token.This means it can digest the text volume of an entire large novel, or analyze the complete code base of a large software project.The output capabilities have simultaneously jumped, expanding from 4K tokens to 100K, allowing it to generate long-form professional content such as technical documents and legal contracts.

Microsoft's internal documents reveal that GPT-5 will launch a triple version of the architecture: the full flagship version of GPT-5 is aimed at enterprise-level complex tasks;GPT-5 mini optimizes real-time interaction; and GPT-5 nano adapts to edge devices.Ordinary users can access the basic version for free through ChatGPT, while Plus/Pro subscribers can unlock the premium version.

Google DeepMind: Genie 3 reshapes the virtual world

Last night, Google's third-generation universal world model Genie 3 was officially released.

With just simple text instructions, Genie 3 can generate an interactive 3D world at 720p resolution at 24 frames per second in real time, maintaining environmental consistency for minutes.What is even more eye-catching is its "can prompt world events" function: when users explore the dynamic world, they only need to enter new instructions (such as "add snowstorms" or "add dinosaurs"), and the virtual environment reconstructs physical rules and ecosystems in real time. system, users seem to be the ruler of the world.

GPT-5、Genie 3、 Gork 4轮番上阵 AI巨头混战引爆生态重构 大模型竞赛进入临界点

Genie 3 brings triple breakthroughs.

The first is the qualitative change of the real-time streaming architecture: unlike traditional generation models that require complete processing of input and output, Genie 3 uses autoregressive frame generation technology, which only takes 41.7 milliseconds of computing time per frame, truly realizing the instantaneous response of "hints are the world".Secondly, there is the self-evolution of the physics engine: the model independently learns complex laws such as gravity and fluid dynamics by analyzing 4 million hours of YouTube videos, and can accurately simulate physical phenomena such as splashing water and fluttering clothes without pre-setting programming rules.The third is a breakthrough memory mechanism: the system can trace a minute of visual history, and when the user returns to the scene, wall graffiti and moving objects remain unchanged-this emerging memory ability even surprised developers.

GPT-5、Genie 3、 Gork 4轮番上阵 AI巨头混战引爆生态重构 大模型竞赛进入临界点

DeepMind Research Director Shlomi Fruchter emphasized in a technical briefing: "This is the first real-time interactive universal world model that allows AI agents to learn causal reasoning in a safe environment, just as children learn to walk through falls."When the team put the general agent SIMA into the warehouse environment generated by Genie 3, AI successfully completed tasks such as cargo sorting, obstacle avoidance and navigation, and the training efficiency was 10 times higher than that in the real world.Genie 3 can independently understand that helicopters on the edge of cliffs need to maintain a safe distance, rocks in streams can change the direction of water flow, etc. The technological progress brought about by this "machine instinct" is particularly valuable.

GPT-5、Genie 3、 Gork 4轮番上阵 AI巨头混战引爆生态重构 大模型竞赛进入临界点

xAI Gork 4: The first "Doctoral AI" charges the most expensive in the world!

On August 4, xAI, an artificial intelligence company owned by Elon Musk, officially released the fourth-generation large-language model Grok 4 series, including the single-agent version of Grok 4 and the multi-agent collaborative version of Grok 4 Heavy.

At the live press conference, Musk positioned it as "the world's strongest AI model" and claimed that its academic capabilities have surpassed human doctoral levels in all subject areas.

From the perspective of architectural design, Grok4Heavy adopts a four-agent parallel collaboration mechanism. Each agent focuses on different sub-tasks (such as retrieval, reasoning, and generation), and then fuses the results through distributed computing. This architecture enables the resolution of complex tasks. Efficiency increased by nearly ten times.At the hardware level, Grok4 calls more than 100,000 NVIDIA H100 GPU cluster resources, and the training volume is 100 times that of the previous generation Grok2. The proportion of intensive learning is as high as 60%. The underlying pre-training directly integrates tool calling capabilities rather than relying on post-plug-ins.

GPT-5、Genie 3、 Gork 4轮番上阵 AI巨头混战引爆生态重构 大模型竞赛进入临界点

In terms of performance, Grok 4 has set new records in many authoritative tests.In the HLE benchmark test, known as the "last exam for mankind"(covering 2500 closed-book doctoral questions), the accuracy rate of the basic version reached 25.4% without using tools, and jumped to 38.6% after enabling tools; and with the help of multi-agent collaboration, Grok4 Heavy's score soared to 44.4%, far exceeding the 26.9% of Google Gemini 2.5 Pro and the 20.3% of the OpenAI o3 model, becoming the first AI model to "answer more questions than mistakes" in this test.

GPT-5、Genie 3、 Gork 4轮番上阵 AI巨头混战引爆生态重构 大模型竞赛进入临界点

In the fields of mathematics and engineering, Grok4 also showed dominant performance: full marks in the AIME25 mathematics competition, a 96.7% accuracy rate in the Harvard-Massachusetts Institute of Technology Mathematics Competition (HMMT). In the software engineering benchmark SWE-Bench, its dedicated programming variant Grok4 Code achieves an accuracy rate of 75%, significantly surpassing professional tools such as Copilot.What is even more eye-catching is the verification of the business scenario-in the vending machine operation simulation test, Grok 4 created twice the net assets of the second-place model, and generated a complete FPS game prototype within 4 hours to realize automated asset purchase.

GPT-5、Genie 3、 Gork 4轮番上阵 AI巨头混战引爆生态重构 大模型竞赛进入临界点

According to xAI's official website, the basic version of Grok 4 is priced at US$30/month, compared to OpenAI's US$20 Pro members; while the SuperGrok Heavy subscription fee that unlocks all capabilities is as high as US$300/month (annual fee of US$3000), making it the most expensive AI service in the world.API pricing also reflects high-end positioning: per million tokens input charges US$3 and US$15 for output, which is significantly higher than the industry average price.Behind the high price lies the high computing power cost of xAI-Grok 4 is based on self-developed Colossus supercomputing cluster training. Grok 3 training alone uses 200,000 GPUs, and the amount of Grok 4 training is astronomical.

GPT-5、Genie 3、 Gork 4轮番上阵 AI巨头混战引爆生态重构 大模型竞赛进入临界点

Since August, the large-scale model scuffle in the United States has begun in full swing.

OpenAI is intensively preparing for the release of GPT-5, trying to define the industry benchmark again;

Google's Gemini series is also constantly evolving, relying on its deep accumulation in search and cloud computing, trying to penetrate AI capabilities into every corner;

Anthropic's Claude series is known for its security and controllability, and has won the favor of many corporate users.

At the same time, Meta is also making great efforts to set up a top-level AI laboratory, and has recently recruited people from many companies such as OpenAI and Tesla in an attempt to catch up.

In this context, the release of Grok 4 is not only a one-man show for xAI, but also a charge for a new round of AI arms race.

What was the final result of this immortal fight?We will wait and see.

·Original

Disclaimer: The views in this article are from the original Creator and do not represent the views or position of Hawk Insight. The content of the article is for reference, communication and learning only, and does not constitute investment advice. If it involves copyright issues, please contact us for deletion.