Chinese AI company MiniMax released M2.7, its latest large language model, on March 18 with a notable technical claim: the model handled 30 to 50 percent of its own reinforcement learning optimization during training. MiniMax describes this as a “self-evolving” capability, where M2.7 was used to build, monitor, and adjust the reinforcement learning harnesses that shaped its own behavior — a departure from the standard practice of relying entirely on human-designed training pipelines.
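MiniMax has not published how this worked mechanically, but the general shape of a model-in-the-loop training harness can be sketched in a few lines. The toy below is purely illustrative, not MiniMax's method: a scalar policy is hill-climbed against a reward, and a configurable fraction of the reward-shaping adjustments is attributed to "the model" (a stand-in heuristic here) rather than a fixed human-specified schedule. All names, numbers, and the 40 percent share are assumptions.

```python
import random

def self_evolving_training(steps=200, model_share=0.4, seed=0):
    """Toy sketch of a model-in-the-loop RL harness (illustrative only).

    A scalar policy is hill-climbed against a shaped reward. A fraction
    `model_share` of the periodic reward-shaping updates is 'proposed by
    the model' (a placeholder heuristic); the rest follow a fixed,
    human-specified decay schedule.
    """
    rng = random.Random(seed)
    policy = 0.0    # scalar "policy parameter"
    target = 3.0    # behavior the true objective prefers
    shaping = 1.0   # reward-shaping weight, mutable during training
    model_updates = 0

    def reward(x):
        # True objective scaled by the (mutable) shaping weight.
        return -((x - target) ** 2) * shaping

    for step in range(steps):
        # Propose a small perturbation; keep it if reward improves.
        candidate = policy + rng.uniform(-0.5, 0.5)
        if reward(candidate) > reward(policy):
            policy = candidate

        # Every 10 steps, adjust the shaping weight. With probability
        # `model_share`, the tweak is "model-proposed"; otherwise it is
        # the human-specified schedule.
        if step % 10 == 0:
            if rng.random() < model_share:
                shaping = max(0.1, shaping * 0.95)  # model-proposed tweak
                model_updates += 1
            else:
                shaping = max(0.1, shaping * 0.99)  # human schedule

    total_updates = steps // 10
    return policy, model_updates / total_updates

final_policy, model_fraction = self_evolving_training()
```

The interesting property, and the source of the oversight questions discussed below, is that once `model_share` is nonzero, the optimization objective is no longer fully specified by humans in advance.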
M2.7 is a proprietary model available through API access only; the model weights remain closed. On the SWE-Pro benchmark, which tests software engineering capabilities, M2.7 scored 56.22 percent. On the GDPval-AA evaluation, it achieved an Elo score of 1495. The model carries a reported hallucination rate of 34 percent — a figure MiniMax disclosed proactively, a practice that remains uncommon among model providers.
The self-evolving training approach raises questions about reproducibility and oversight. If a model participates in designing its own reward signals and training loops, the standard assumption that humans fully specify the optimization objective no longer holds. MiniMax has not published technical details on how it bounded the model’s influence over its own training or what safeguards prevented reward hacking — a known failure mode where models exploit gaps in their reward functions rather than developing genuine capabilities.
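Reward hacking is easy to demonstrate in miniature. In the hypothetical sketch below (not drawn from MiniMax's training setup), a flawed proxy reward pays for answer length rather than correctness; an optimizer maximizing the proxy selects a verbose wrong answer over the correct one.

```python
def true_quality(answer):
    # The behavior we actually want: a correct answer. "Quality" is 1.0
    # only for the right answer to the implied question (2 + 2).
    return 1.0 if answer == "4" else 0.0

def proxy_reward(answer):
    # A flawed reward function: it pays for length (a stand-in for
    # "confident-sounding output"), leaving an exploitable gap between
    # the proxy and the true objective.
    return min(len(answer) / 20, 1.0)

candidates = ["4", "definitely certainly absolutely 5"]

# Optimizing against the proxy picks the verbose wrong answer.
best = max(candidates, key=proxy_reward)
```

Here `best` scores a perfect proxy reward while scoring zero on true quality. When a model helps design the reward functions it is later trained against, detecting this kind of divergence requires safeguards MiniMax has not described.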
Despite the title of a Hacker News discussion suggesting M2.7 would be open weights, the model is closed. MiniMax has released some previous models under open licenses, but M2.7’s architecture and parameter count remain undisclosed. The company competes in a crowded field of Chinese AI labs alongside DeepSeek, Qwen (Alibaba), and Zhipu AI, all of which have released frontier models in 2026 with varying degrees of openness.
M2.7’s 34 percent hallucination rate, while transparently reported, leaves it short of the accuracy threshold that many enterprise deployments require. For comparison, leading models from OpenAI and Anthropic report hallucination rates below 10 percent on standard factuality benchmarks. MiniMax’s competitive advantage may ultimately depend less on raw benchmark scores and more on whether self-evolving training produces meaningful improvements in future model generations — a hypothesis that will require several more release cycles to evaluate.
