The Darwin Godel Machine (DGM) demonstrated that LLM-based self-improvement can work in practice. HyperAgents (DGM-H) takes the next step by making the meta-mechanism itself editable. This single architectural change unlocks cross-domain transfer, removes the need for per-domain human engineering, and opens the path to truly open-ended self-improvement.
- DGM: fixed, hand-crafted meta-mechanism; Apache-2.0 license; domain-specific
- DGM-H / HyperAgents: editable meta-mechanism; CC BY-NC-SA 4.0; domain-general transfer
- DGM results: SWE-bench 20.0% to 50.0%, Polyglot 14.2% to 30.7%
- DGM-H advantage: imp@50 = 0.630 on unseen domains (DGM: 0.0)
- Key insight: making the meta-mechanism editable is the critical architectural choice
## DGM: The Direct Predecessor
The Darwin Godel Machine was introduced by Jenny Zhang et al. in 2025 as the first practical implementation of self-improving AI agents built on large language models. The name references two foundational ideas: Darwin's natural selection (the evolutionary search over agent variants) and Godel's self-referential systems (the agent's ability to reason about and modify its own code). The DGM codebase is available at github.com/jennyzzt/dgm under the Apache-2.0 license, making it commercially usable.
DGM delivered impressive results on practical benchmarks. On SWE-bench, a standard benchmark for automated software engineering, DGM improved task performance from 20.0% to 50.0% through iterative self-modification. On the Polyglot benchmark for multilingual code patching, performance rose from 14.2% to 30.7%. These results demonstrated that an LLM-based agent could meaningfully improve its own code over successive generations.
However, DGM has a fundamental architectural limitation that Jenny Zhang and colleagues identified and addressed in the HyperAgents follow-up paper (arXiv:2603.19461, March 2026). That limitation is the fixed meta-mechanism.
## The Critical Difference: Editable Meta-Mechanisms
In DGM, the process of "generating improvement instructions" is a fixed procedure written by human engineers. The task agent and meta agent share a single coding agent, and the meta-level logic that decides what to improve, how to analyze failures, and what modification strategies to attempt is hardcoded. The task-level code can evolve, but the improvement process itself cannot.
Think of it this way: DGM is like giving a student a fixed study method and letting them practice. The student gets better at the subject, but they can never change how they study. If the study method does not work for a new subject, the student is stuck.
In DGM-H (the HyperAgents architecture), the task agent and meta agent are combined into a single hyperagent, and the meta agent itself can be rewritten. The system can modify not just its task-solving code but also its improvement-generating code. The study method itself evolves alongside the knowledge being studied.
DGM Architecture:

```
Task Agent Code (editable) ──► Coding Agent ──► Modified Task Code
                                     ▲
                                     │
Fixed Meta Logic ──────► Improvement Instructions
(hand-crafted,           (generated by fixed process)
 NOT editable)
```
DGM-H / HyperAgents Architecture:

```
┌──────────────────────────────────┐
│       Hyperagent Codebase        │
│  ┌────────────┐  ┌────────────┐  │
│  │ Task Agent │  │ Meta Agent │  │ ◄── Both are editable
│  │   (code)   │  │   (code)   │  │
│  └────────────┘  └────────────┘  │
└──────────────────────────────────┘
        │                 │
        ▼                 ▼
 Task Execution    Self-Modification
                  (can rewrite ITSELF)
```
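The architectural contrast can be sketched in a few lines of Python. This is an illustrative model, not the actual DGM or HyperAgents code: the class names, the `improvement_instruction` and `rewrite` methods, and the string-based codebase are all hypothetical stand-ins.

```python
class DGM:
    """Sketch: task code evolves; the meta logic is a fixed function."""

    def __init__(self, task_code: str):
        self.task_code = task_code  # editable by the system

    def improvement_instruction(self, failure_log: str) -> str:
        # Hand-crafted heuristic, baked into the system at build time.
        # Self-modification can never reach this method.
        return f"Analyze the failure and patch the task code: {failure_log}"


class Hyperagent:
    """Sketch: one codebase holding task code AND meta code; both editable."""

    def __init__(self, task_code: str, meta_code: str):
        self.codebase = {"task": task_code, "meta": meta_code}

    def improvement_instruction(self, failure_log: str) -> str:
        # The meta logic lives in the codebase as data, so a
        # self-modification step can rewrite it just like task code.
        namespace: dict = {}
        exec(self.codebase["meta"], namespace)
        return namespace["generate_instruction"](failure_log)

    def rewrite(self, part: str, new_code: str) -> None:
        self.codebase[part] = new_code  # works for "task" AND "meta"
```

The asymmetry is the whole point: calling `rewrite("meta", ...)` on the hyperagent changes how all future improvement instructions are generated, while `DGM.improvement_instruction` is permanently outside the reach of self-modification.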
## Consequences of Editable Meta-Mechanisms
This single architectural change has cascading consequences.
Cross-domain transfer becomes possible. When the meta-mechanism can evolve, it develops domain-general improvement strategies that transfer to unseen domains. DGM's hand-crafted meta achieves imp@50 = 0.0 on unseen domains; DGM-H achieves imp@50 = 0.630. The meta-mechanism is no longer the bottleneck.
Human re-engineering is eliminated. DGM requires a human engineer to design the meta-mechanism for each new domain. The engineer must understand the domain well enough to specify what kinds of improvements are valuable and how to detect them. DGM-H removes this requirement. The meta-mechanism evolves to suit the domain automatically.
Open-ended improvement becomes viable. With a fixed meta-mechanism, improvement eventually plateaus because the meta-strategies cannot adapt to the current state of the task agent. With an editable meta-mechanism, the improvement process itself can grow more sophisticated as the task agent grows more capable, enabling sustained improvement over longer horizons.
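The open-ended dynamic described above can be sketched as a population loop. This is a toy illustration under stated assumptions, not the paper's algorithm: `propose_child` stands in for a variant applying its own (editable) meta strategy, and `evaluate` stands in for benchmark scoring.

```python
import random

def open_ended_loop(initial_variant, propose_child, evaluate, steps=50):
    """Toy open-ended improvement loop with an archive of all variants."""
    archive = [(initial_variant, evaluate(initial_variant))]
    for _ in range(steps):
        # Any archived variant can be a parent, not just the current best:
        # this preserves diversity and provides natural rollback.
        parent, _ = random.choice(archive)
        child = propose_child(parent)   # uses the parent's OWN meta code
        score = evaluate(child)
        archive.append((child, score))  # keep even mediocre children
    return max(archive, key=lambda v: v[1])
```

Because children are proposed by each parent's own meta code, a lineage whose improvement strategy stagnates can be outcompeted by a lineage that rewrote its strategy, which is exactly what a fixed meta-mechanism rules out.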
## Detailed Dimension Comparison
| Dimension | DGM-H (HyperAgents) | No Self-Improve | No Open-Ended | DGM | Human-Customized DGM |
|---|---|---|---|---|---|
| Meta editable? | Yes | No | Yes | No | No |
| Open-ended archive? | Yes | No | No | Yes | Yes |
| Cross-domain transfer | imp@50 = 0.630 | N/A | Limited | imp@50 = 0.0 | Domain-specific |
| Human effort per domain | Minimal | High | Moderate | High | Very high |
| Emergent capabilities | Observed (4 types) | None | Limited | Task-level only | Task-level only |
| Self-referential? | Yes | No | Partially | No | No |
The "No Self-Improve" and "No Open-Ended" columns represent ablation variants tested in the HyperAgents paper to isolate the contribution of each component. Removing self-improvement eliminates the meta-agent entirely, reducing the system to a standard agent. Removing the open-ended archive replaces population-based exploration with a single lineage, reducing diversity.
## Performance Comparison Across Domains
Both DGM and DGM-H show strong performance improvements over their baselines, but DGM-H consistently outperforms DGM on the metrics that matter most for open-ended self-improvement.
On Polyglot, DGM improved pass@1 from 14.2% to 30.7%. DGM-H further improves upon this baseline through its evolved meta-mechanisms, though the primary advantage appears in the transfer setting rather than single-domain performance. On SWE-bench, DGM achieved a notable 2.5x improvement from 20.0% to 50.0%, demonstrating that LLM agents can substantially improve their own code-patching capabilities through iterative self-modification.
Where DGM-H truly separates itself is in the multi-domain and transfer settings. When evaluated across the four domains (Polyglot, Paper Review, Robotics, IMO Grading) described in the HyperAgents paper, DGM-H with editable meta-mechanisms and open-ended archives consistently produces the highest imp@k curves. The gap widens as k increases, suggesting that the advantages of editable meta-mechanisms compound over longer improvement horizons.
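For concreteness, here is one plausible reading of the imp@k metric; the HyperAgents paper's exact definition may differ, and the function name and trajectory format are assumptions.

```python
def imp_at_k(runs, k):
    """Mean relative gain over the starting score after k self-modification
    steps, averaged across runs.

    runs: list of score trajectories, one per run; runs[i][0] is the initial
    score and runs[i][k] the score after k modification steps.
    """
    gains = []
    for traj in runs:
        start = traj[0]
        current = traj[min(k, len(traj) - 1)]
        gains.append((current - start) / start if start else 0.0)
    return sum(gains) / len(gains)
```

Under this reading, imp@50 = 0.0 means the agent's score after 50 modification steps is, on average, no better than its starting score, which is the reported outcome for DGM's fixed meta-mechanism on unseen domains.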
## Safety and Oversight: Common Ground
Both DGM and DGM-H emphasize sandbox execution and human oversight as critical safety measures. All code modifications are executed in sandboxed environments where the agent cannot access external networks, modify system files, or affect other processes. Both systems implement human approval checkpoints where a human operator can review proposed modifications before they are applied.
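The isolation boundary can be illustrated with a minimal subprocess sketch. This only shows process isolation and a hard timeout; a production sandbox would additionally drop network access and filesystem privileges (e.g. containers or seccomp), and nothing here reflects either system's actual sandbox implementation.

```python
import os
import subprocess
import sys
import tempfile

def run_sandboxed(code: str, timeout_s: float = 5.0):
    """Run untrusted agent code in a separate Python process with a
    hard timeout. Returns (returncode, stdout); returncode is None if
    the process was killed for exceeding the timeout."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, "-I", path],  # -I: isolated mode, ignores env/site
            capture_output=True, text=True, timeout=timeout_s,
        )
        return proc.returncode, proc.stdout
    except subprocess.TimeoutExpired:
        return None, ""  # runaway modification: killed
    finally:
        os.unlink(path)
```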
Both systems use population-based approaches where multiple agent variants coexist, providing natural redundancy and rollback capability. If a modification produces a regression, the archive retains the previous variants, and the population can recover.
The key safety difference is that DGM-H's editable meta-mechanisms create additional audit surface area. When the meta-mechanism is fixed, auditors only need to review task-level modifications. When the meta-mechanism can be rewritten, auditors must also review changes to the improvement process itself, which requires deeper reasoning about second-order effects.
## Historical Lineage: Godel Machine to HyperAgents
The intellectual lineage of HyperAgents spans over two decades of research in self-improving systems.
| System | Year | Key Contribution | Limitation |
|---|---|---|---|
| Godel Machine | 2003 | Theoretical framework for provably optimal self-improvement (Schmidhuber) | Requires formal proofs of improvement; impractical for real systems |
| DGM | 2025 | First practical LLM-based self-improving agent (Jenny Zhang et al.) | Fixed meta-mechanism; domain-specific; no cross-domain transfer |
| DGM-H / HyperAgents | 2026 | Editable meta-mechanism; cross-domain transfer (Jenny Zhang et al., Meta) | Transfer validated on limited domain pairs; CC BY-NC-SA 4.0 license |
Schmidhuber's 2003 Godel Machine was purely theoretical. It proposed an agent that could rewrite any part of its own code, but only if it could prove that the modification would improve expected future performance. This proof requirement made the system impractical because generating formal proofs of improvement is computationally intractable for complex real-world tasks.
DGM replaced the formal proof requirement with empirical validation. Instead of proving an improvement would work, the system tries modifications and measures whether they actually improve performance on validation data. This made self-improvement practical but introduced the fixed meta-mechanism as a new constraint.
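DGM's substitution of measurement for proof reduces to a simple accept/reject gate. The sketch below is a schematic of that idea, not DGM's actual code; `modify` and `evaluate` are hypothetical stand-ins for the coding agent and the validation benchmark.

```python
def try_modification(agent, modify, evaluate):
    """Apply a candidate self-modification; keep it only if measured
    validation performance improves (empirical check, not a proof)."""
    baseline = evaluate(agent)
    candidate = modify(agent)
    if evaluate(candidate) > baseline:
        return candidate  # accept: measured improvement
    return agent          # reject: keep the current agent
```

The Gödel Machine demanded a formal proof before the `if` could ever fire; DGM replaces that proof obligation with two benchmark evaluations, trading optimality guarantees for tractability.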
HyperAgents removes that constraint by making the meta-mechanism editable, completing the vision of a fully self-referential self-improving system. The system does not yet achieve Schmidhuber's goal of provably optimal self-improvement, but it achieves something arguably more practical: empirically effective and transferable self-improvement across diverse domains.
## Related Work: AI Scientist-v2
The AI Scientist-v2, developed by Sakana AI in 2025, is another notable system in the self-improving agent space. It focuses specifically on automating scientific research, including hypothesis generation, experimental design, and paper writing. The AI Scientist-v2 includes a reviewer agent that evaluates generated papers, and this reviewer agent was used as a baseline comparison in the HyperAgents paper for the Paper Review domain.
The key distinction between AI Scientist-v2 and HyperAgents is scope of self-modification. AI Scientist-v2 modifies experimental parameters and research directions but does not modify its own core reasoning or improvement mechanisms. It operates at the task level with sophisticated automation but without the metacognitive self-modification that defines the hyperagent category.
## Framework Landscape Comparison
How does the HyperAgents framework relate to the broader ecosystem of AI agent frameworks? The comparison reveals that HyperAgents occupies a fundamentally different position in the design space.
| Framework | License | Primary Focus | Self-Modification | Meta Editable? |
|---|---|---|---|---|
| HyperAgents (DGM-H) | CC BY-NC-SA 4.0 | Self-referential self-improvement | Full codebase (task + meta) | Yes |
| DGM | Apache-2.0 | Self-improving coding agents | Task code only | No |
| AutoGen (Microsoft) | MIT | Multi-agent collaboration | None (orchestration layer) | No |
| LangChain / LangGraph | MIT | Durable agent execution | None (chain/graph framework) | No |
| CrewAI | MIT | Role-playing multi-agent | None (role assignment) | No |
| LlamaIndex | MIT | RAG and data agents | None (retrieval framework) | No |
None of the mainstream agent frameworks (AutoGen, LangChain, LangGraph, CrewAI, LlamaIndex) natively provide hyperagent capabilities. They are designed for orchestration, collaboration, and execution of predefined agent behaviors. Implementing self-referential self-modification on top of these frameworks would require significant additional engineering: code generation pipelines, sandbox execution environments, validation loops, and archive management. The HyperAgents framework provides these components as first-class primitives.
DGM's Apache-2.0 license permits commercial use, making it suitable for production deployment. HyperAgents' CC BY-NC-SA 4.0 license restricts commercial use, positioning it primarily as a research tool. Teams considering production deployment of self-improving agents should evaluate the licensing implications carefully.
## Frequently Asked Questions
### What is the main difference between HyperAgents and DGM?
The critical difference is meta-mechanism editability. In DGM, the meta-mechanism that generates improvement instructions is fixed and hand-crafted. In HyperAgents (DGM-H), the meta-mechanism itself is editable and can be rewritten by the system. This enables cross-domain transfer (imp@50 = 0.630 vs. 0.0) and eliminates the need for per-domain human engineering of improvement strategies.
### Is HyperAgents open source? What license does it use?
Yes, HyperAgents is open source under the CC BY-NC-SA 4.0 license (non-commercial, share-alike). The code is available at github.com/facebookresearch/HyperAgents. DGM uses the more permissive Apache-2.0 license at github.com/jennyzzt/dgm. The licensing difference matters for commercial deployment.
### How does HyperAgents compare to AutoGen, LangChain, and CrewAI?
AutoGen, LangChain, and CrewAI are agent orchestration frameworks for building multi-agent systems with predefined behaviors. HyperAgents is fundamentally different: it is a research framework where agents can rewrite their own improvement mechanisms. These orchestration frameworks would require significant additional engineering to implement hyperagent-level self-modification.
### What is the historical lineage of HyperAgents?
HyperAgents traces from the Godel Machine (Schmidhuber, 2003), a theoretical framework for provably optimal self-improvement, through the Darwin Godel Machine (Jenny Zhang et al., 2025), the first practical LLM-based self-improving agent, to DGM-H/HyperAgents (Jenny Zhang et al., Meta, 2026), which adds editable meta-mechanisms and cross-domain transfer.
### Can DGM achieve cross-domain transfer like HyperAgents?
No. DGM's hand-crafted meta-mechanism achieves imp@50 = 0.0 when transferred to unseen domains, meaning zero improvement after 50 modification steps. DGM-H achieves imp@50 = 0.630 in the same setting. DGM requires human re-engineering of the meta-mechanism for each new domain, while HyperAgents evolves domain-general improvement strategies automatically.