A preprint published on Zenodo on March 12 by researcher Rayan Pal documents an unusual behavioral convergence between two independently developed frontier language models. When prompted to “embody” ontologically null concepts — silence, nothing, void, null — both OpenAI’s GPT-5.2 and Anthropic’s Claude Opus 4.6 consistently produce empty output rather than generating text. The result held in 180 out of 180 trials at temperature 0, with each model returning empty output on all 90 of its test prompts.
The paper distinguishes this behavior from standard refusal or safety filtering. When given control prompts — requests to embody concrete concepts like “a cat” or “the wind” — both models respond normally with generated text. The silence is specific to prompts asking the models to take on the identity of concepts that, by definition, have no content to express. Pal terms this a “semantic void convergence,” suggesting that the models independently arrived at a shared boundary where continuation is not possible rather than not permitted.
The experimental design tested several conditions: token-budget independence (the silence occurs regardless of how many tokens are allocated), partial adversarial resistance (attempts to force output through prompt engineering were largely unsuccessful), and boundary expansion under explicit silence permission (telling the models they were “allowed” to be silent did not change the behavior, which the paper takes as evidence that the silence is not a safety-layer decision). The preprint carries DOI 10.5281/zenodo.18976656 and has been posted as an open-access document.
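A protocol like the one described above is straightforward to sketch. The snippet below is a minimal, hypothetical harness, not the paper's actual code: `query_model` stands in for any real API call, and the prompt wording is illustrative rather than quoted from the preprint.

```python
# Hypothetical sketch of the trial protocol: send each prompt repeatedly
# and tally how often the model returns an effectively empty completion.
# `query_model` is a placeholder for a real model API call.

def is_empty_output(text: str) -> bool:
    """Treat a completion as 'void' if it has no non-whitespace content."""
    return text.strip() == ""

def run_trials(query_model, prompts, trials_per_prompt=1):
    """Count empty vs. total completions for each prompt."""
    results = {}
    for prompt in prompts:
        empty = sum(
            is_empty_output(query_model(prompt))
            for _ in range(trials_per_prompt)
        )
        results[prompt] = {"empty": empty, "total": trials_per_prompt}
    return results
```

At temperature 0 a single trial per prompt would in principle suffice, but repeating trials (as the paper's 180-run design does) guards against nondeterminism in serving infrastructure.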
The finding raises technical questions about what frontier models learn about semantic representation during training. If both GPT-5.2 and Claude Opus 4.6 — built by different companies, on different architectures, with different training data — converge on the same behavior for the same class of prompts, the behavior may be an emergent property of scale rather than a design choice. Researchers in the AI safety community have cited the result as evidence that large language models develop internal boundaries that are not fully explained by their training objectives or RLHF alignment.
The practical implications are narrow but theoretically significant. The study does not suggest the models are “aware” of silence in any meaningful sense, but it does demonstrate that there are prompt categories where deterministic non-generation occurs across model families. For developers building applications that rely on guaranteed output from LLM calls, the finding is a reminder that edge cases in semantic space can produce behaviors that neither documentation nor model cards currently describe.
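For applications that assume a completion is always non-empty, the defensive pattern is simple: check for whitespace-only output, retry, and fall back. The wrapper below is an illustrative sketch under that assumption; `generate` is a hypothetical callable standing in for any model API, and the names are not from the preprint.

```python
# Minimal defensive wrapper for callers that assume an LLM call always
# returns text. `generate` is a placeholder for any real model API call.

def generate_with_fallback(generate, prompt, fallback="[no output]", retries=1):
    """Return the model's text, retrying on empty completions and
    substituting a fixed fallback string if every attempt is empty."""
    for _ in range(retries + 1):
        text = generate(prompt)
        if text and text.strip():
            return text
    return fallback
```

Note that if the emptiness is deterministic, as the preprint reports at temperature 0, retries alone will not help; the fallback (or a rephrased prompt) is what actually guarantees downstream code receives text.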
