HyperAgents Setup and Quickstart Guide

Updated · Based on HyperAgents repository and arXiv:2603.19461 by Zhang, Zhao, Yang, Foerster, Clune, Jiang, Devlin & Shavrina (Meta, 2026)

This guide walks through the complete installation and first-run process for HyperAgents, Meta's self-referential self-improving agent framework. From cloning the repository through building the Docker image to running your first self-improvement loop, each step is documented with the specific commands and configuration needed.

Safety Warning

HyperAgents involves executing untrusted, model-generated code. Always run in an isolated environment. Monitor resource usage. Review generated patches before deployment. See the safety and governance page for a full risk analysis.

Prerequisites

Before starting, make sure you have:

  1. An NVIDIA GPU with CUDA 13.0 support and a compatible driver
  2. Docker with the NVIDIA Container Toolkit installed
  3. Python 3.12
  4. Approximately 15–20 GB of free disk space for the Docker image
  5. API keys for OpenAI, Anthropic, and Google (Gemini)

Step 1: Clone and Configure

Start by cloning the HyperAgents repository and setting up your environment variables. The framework requires API keys for three LLM providers because the meta-agent and task agent use different models for generation, evaluation, and ensemble reasoning.

# Clone the repository
git clone https://github.com/facebookresearch/HyperAgents
cd HyperAgents

# Create your environment file
cat > .env << 'EOF'
OPENAI_API_KEY=sk-your-openai-key-here
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key-here
GEMINI_API_KEY=your-google-gemini-key-here
EOF

The .env file is loaded automatically by the framework at runtime. Keep this file out of version control — it is already included in the repository's .gitignore. Each API key enables access to the corresponding provider's models, which the system uses for different roles: generation (producing code modifications), evaluation (scoring outputs), and meta-reasoning (deciding which modifications to keep).
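As an illustration of the startup validation described above, here is a minimal sketch of how the three provider keys from Step 1 might be parsed and checked. The parsing logic and function names are hypothetical, not the framework's actual implementation.

```python
# The three provider keys named in the .env file from Step 1.
REQUIRED_KEYS = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GEMINI_API_KEY"]

def load_env(text: str) -> dict:
    """Parse KEY=VALUE lines from a .env-style string, skipping comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()
    return env

def missing_keys(env: dict) -> list:
    """Return the required provider keys that are absent or empty."""
    return [k for k in REQUIRED_KEYS if not env.get(k)]
```

A key that is present but malformed would still pass this sketch; the framework's own validation at startup is stricter, per the troubleshooting section below.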

Step 2: System Dependencies

Before setting up the Python environment, install the system-level dependencies required by the framework and its evaluation harnesses. These packages provide compilation tools, graphics libraries for headless rendering (used by the robotics domain), and development headers needed by Python packages with native extensions.

# On Ubuntu/Debian
sudo apt-get update && sudo apt-get install -y \
    python3.12-dev \
    graphviz \
    libgraphviz-dev \
    cmake \
    ninja-build \
    libbz2-dev \
    zlib1g-dev \
    libncurses-dev \
    libffi-dev

# On Fedora/RHEL
sudo dnf install -y \
    python3.12-devel \
    graphviz \
    graphviz-devel \
    cmake \
    ninja-build \
    bzip2-devel \
    zlib-devel \
    ncurses-devel \
    libffi-devel

The graphviz packages are needed for the analysis and visualization tools that generate improvement trajectory graphs. The cmake and ninja-build packages are required for building native extensions used by some evaluation domains. The development libraries (bzip2-devel, zlib-devel, etc.) are needed for building Python from source if your system does not have Python 3.12 in its package repositories.

Step 3: Python Environment

HyperAgents uses a virtual environment to isolate its Python dependencies. The framework is pinned to Python 3.12 for compatibility with its dependency chain, particularly the scientific computing and LLM client libraries.

# Create the virtual environment
python3.12 -m venv venv_nat

# Activate it
source venv_nat/bin/activate

# Install core dependencies
pip install -r requirements.txt

# Install development/evaluation dependencies
pip install -r requirements_dev.txt

The requirements.txt file includes the core framework dependencies: LLM client libraries (openai, anthropic, google-generativeai), scientific computing (numpy, scipy), and the agent orchestration code. The requirements_dev.txt file adds evaluation harnesses, testing tools, and analysis utilities. Expect the installation to take 5–10 minutes depending on your network speed and whether binary wheels are available for your platform.

Step 4: Docker Image Build

The Docker image provides the sandboxed execution environment where generated code runs. Building from the nvidia/cuda:13.0.0-devel-ubuntu22.04 base image ensures GPU acceleration is available inside the container.

# Build the Docker image
docker build --network=host -t hyperagents .

# Verify the build succeeded
docker images | grep hyperagents

What the Dockerfile Includes

The Dockerfile builds a comprehensive execution environment on top of the CUDA base image.

The build uses --network=host to allow package downloads during the build process. The resulting image is approximately 15–20 GB, depending on CUDA toolkit version and cached layers. Build time is typically 15–30 minutes on a modern system with a fast network connection.

Note on Network Mode

The Docker container runs with --network=host at runtime, which gives the container full access to the host network. This is required for making API calls to LLM providers but reduces network isolation. See the safety page for a discussion of this trade-off.

Step 5: Initialize Agents

Before running the self-improvement loop, you need to initialize the evaluation harnesses and establish baseline performance for each domain. The setup_initial.sh script automates this process.

# Run initialization (takes 30-60 minutes)
bash ./setup_initial.sh

What setup_initial.sh Does

The initialization script performs three key operations:

  1. Task Curation: Selects and organizes task subsets for each domain, creating train/validation/test splits. Each domain gets structured task sets that enable the staged evaluation strategy (small → medium → full).
  2. Initial Harness Run: Executes the evaluation harness with 10-sample initial runs to establish baseline performance. This gives the self-improvement loop a starting point for comparison.
  3. Report Generation: Produces initial performance reports that serve as the generation-0 baseline for measuring improvement@k.
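The task curation step can be sketched as a deterministic split of each domain's task set. The split fractions below are illustrative placeholders; the ratios the real script uses are not documented here.

```python
import random

def make_splits(task_ids, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle tasks deterministically and cut train/validation/test splits.

    train_frac and val_frac are illustrative, not the script's actual ratios.
    """
    rng = random.Random(seed)  # fixed seed keeps splits reproducible
    ids = list(task_ids)
    rng.shuffle(ids)
    n = len(ids)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],
    }

splits = make_splits(range(50))  # e.g. 30 train / 10 val / 10 test
```

Deterministic splits matter here because every later generation is compared against the generation-0 baseline on the same held-out tasks.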

Supported Domains

Domain                Task Type                         Metric            Initial Sample Size
paper_review          Accept/reject prediction          Accuracy          10
balrog                Game-playing evaluation           Task score        10
genesis_go2walking    Robot locomotion reward design    Task score        10
imo_grading           Math solution scoring             Accuracy + MAE    10
imo_proof             Proof verification                Accuracy          10
polyglot              Multilingual code generation      pass@1            10

Each domain creates its own directory structure under domains/ with task definitions, evaluation scripts, and baseline results. The initialization must complete successfully for all target domains before proceeding to the self-improvement loop.

Step 6: Run HyperAgents

With the environment configured and domains initialized, you can now launch the self-improvement loop. The entry point is generate_loop.py, which orchestrates the full cycle of modification, evaluation, and selection.

# Run HyperAgents on a specific domain
python generate_loop.py --domains polyglot

# Run on multiple domains
python generate_loop.py --domains polyglot,paper_review

# Run on all domains
python generate_loop.py --domains paper_review,balrog,genesis_go2walking,imo_grading,imo_proof,polyglot

Outputs are saved in the outputs/ directory, organized by domain and generation. Each generation produces a model_patch.diff file recording the code changes, along with evaluation scores and the selection decision (accepted or rejected). You can monitor progress by watching the output directory or checking the generated reports.
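Monitoring can be as simple as scanning the output tree for per-generation patches. The `gen_*` directory naming below is an assumption for illustration; adjust the glob to match the layout your run actually produces.

```python
from pathlib import Path

def summarize_generations(domain_dir: Path) -> list:
    """List each generation directory under a domain's output folder and
    whether its model_patch.diff has been written yet.

    Assumes generation subdirectories named gen_*; adapt to your layout.
    """
    rows = []
    for gen_dir in sorted(domain_dir.glob("gen_*")):
        patch = gen_dir / "model_patch.diff"
        rows.append((gen_dir.name, patch.exists()))
    return rows
```

Running this periodically (or under `watch`) gives a quick view of how far the loop has progressed without parsing the full reports.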

Genesis Domain: Additional Setup

The Genesis robotics domain requires additional dependencies beyond the base installation because it runs a physics simulation for evaluating reward functions:

# Install PyTorch with CUDA support (if not already in Docker image)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu130

# Install the Genesis simulation framework
pip install genesis-world

The Genesis domain is the most computationally demanding because each evaluation involves running a reinforcement learning training loop inside the physics simulator. Expect each generation to take 15–30 minutes on a modern GPU versus 2–5 minutes for the Polyglot domain.

File Structure Overview

Understanding the HyperAgents repository structure helps with debugging, extending the framework, and interpreting results.

HyperAgents/
├── agent/                  # Core agent implementation
│   ├── meta_agent.py       # Meta-level self-improvement logic
│   ├── task_agent.py       # Task-level execution logic
│   ├── ensemble.py         # Multi-candidate ensemble and selection
│   └── select_next_parent.py  # Parent selection for next generation
├── domains/                # Evaluation domain definitions
│   ├── paper_review/       # Academic paper review tasks
│   ├── polyglot/           # Multilingual code generation tasks
│   ├── genesis_go2walking/ # Robotics reward design tasks
│   └── imo_grading/        # Math olympiad grading tasks
├── analysis/               # Trajectory analysis and visualization
├── baselines/              # Baseline agent implementations
├── utils/                  # Shared utilities
├── outputs/                # Generated outputs (per generation)
├── generate_loop.py        # Main entry point
├── setup_initial.sh        # Domain initialization script
├── requirements.txt        # Core dependencies
├── requirements_dev.txt    # Development/evaluation dependencies
├── Dockerfile              # Container build definition
└── .env                    # API keys (not committed)

Key Files

generate_loop.py

The main entry point. Orchestrates the self-improvement loop: generate candidate modifications, evaluate them through the staged pipeline, select the best, and iterate.
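The generate→evaluate→select cycle can be sketched in a few lines. The helper names, acceptance rule, and toy scoring below are stand-ins for illustration, not the actual generate_loop.py logic.

```python
def self_improvement_loop(parent, generate_candidates, evaluate, generations=3):
    """Generate candidates from the current best, score them, and keep a
    candidate only if it beats the current best score (a greedy rule chosen
    for illustration; the real selection logic lives in ensemble.py)."""
    best, best_score = parent, evaluate(parent)
    history = [(best, best_score)]
    for _ in range(generations):
        for cand in generate_candidates(best):
            score = evaluate(cand)
            if score > best_score:
                best, best_score = cand, score
        history.append((best, best_score))
    return best, history

# Toy demo: "agents" are integers, candidates are +/-1 steps, score is the value.
best, hist = self_improvement_loop(
    0,
    generate_candidates=lambda p: [p - 1, p + 1],
    evaluate=lambda a: a,
)
# best climbs by 1 per generation: 0 -> 1 -> 2 -> 3
```

The history list is the analogue of the per-generation records saved under outputs/: one entry per generation, enabling later analysis of the improvement trajectory.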

meta_agent.py

Implements the meta-level reasoning that decides how to improve. This file is itself part of the editable codebase — the meta-agent can modify its own logic.

task_agent.py

Implements domain-specific task execution. Each domain provides task definitions that the task agent processes according to its current strategy.

ensemble.py

Manages multi-candidate generation and selection. When multiple modifications are generated per generation, the ensemble logic determines which candidates advance.
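A minimal sketch of the advancement decision: rank the generation's candidates by score and keep the top k. The function name and the fixed-k rule are illustrative assumptions, not the module's actual interface.

```python
def advance_candidates(candidates, scores, k=2):
    """Rank candidates by their evaluation score (descending) and keep the
    top k for the next stage. k is an illustrative cutoff."""
    ranked = sorted(zip(candidates, scores), key=lambda cs: cs[1], reverse=True)
    return [c for c, _ in ranked[:k]]
```

Pairing this with the staged evaluation strategy means weak candidates are cut on cheap early stages before the full evaluation is ever paid for.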

select_next_parent.py

Implements the selection heuristic for choosing which generation to use as the parent for the next modification. Balances exploitation (best-performing) with exploration (diverse lineages).
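One common way to balance exploitation and exploration is an epsilon-greedy rule, sketched below. This is an assumed stand-in for the module's heuristic, which is not specified here.

```python
import random

def select_next_parent(lineage_scores, epsilon=0.1, rng=None):
    """With probability epsilon, explore a random lineage; otherwise exploit
    the best-scoring one. lineage_scores maps generation id -> score.
    Epsilon-greedy is an illustrative choice, not the framework's heuristic."""
    rng = rng or random.Random()
    if rng.random() < epsilon:
        return rng.choice(list(lineage_scores))
    return max(lineage_scores, key=lineage_scores.get)
```

With epsilon=0 this reduces to pure exploitation (always the best generation); raising epsilon trades some score for lineage diversity.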

model_patch.diff

The per-generation diff file recording every code change. These diffs form the audit trail and enable rollback to any previous generation.

License

HyperAgents is released under the CC BY-NC-SA 4.0 (Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International) license. This means you are free to share and adapt the framework for non-commercial purposes only, provided you give appropriate credit and distribute any derivative works under the same license. Commercial use requires separate licensing arrangements with Meta Platforms, Inc. The license covers the framework code, evaluation harnesses, and documentation. API costs for OpenAI, Anthropic, and Google services are borne by the user and are not covered by the license.

Troubleshooting Common Issues

Docker Build Fails with CUDA Errors

Ensure your host system has the NVIDIA Container Toolkit installed and that nvidia-smi reports CUDA 13.0 or later. The Docker build requires the NVIDIA runtime to be configured as the default runtime or explicitly specified with --runtime=nvidia. Check that your GPU driver version is compatible with CUDA 13.0.

API Key Errors at Runtime

Verify that your .env file is in the repository root and contains valid keys for all three providers. The framework validates API keys at startup and will exit with a clear error message if any key is missing or malformed. Note that some API keys have rate limits that may be exceeded during intensive evaluation runs.

Out of Memory During Genesis Domain

The Genesis physics simulation is memory-intensive. If you encounter GPU OOM (out-of-memory) errors, try reducing the simulation resolution or batch size in the domain configuration. A GPU with at least 8 GB VRAM is recommended for the Genesis domain. Other domains like Polyglot and Paper Review have minimal GPU memory requirements.

setup_initial.sh Hangs or Times Out

The initialization script makes API calls to establish baseline performance. If it hangs, check your network connectivity and API key validity. A slow or rate-limited API endpoint can cause the script to appear to hang. The script provides progress output — if output stops for more than 10 minutes, check the log files in the domain directories for error messages.

Import Errors in venv_nat

If you see import errors after activating the virtual environment, ensure you used Python 3.12 specifically (not 3.11 or 3.13) to create the venv. Some dependencies have strict Python version requirements. Recreate the venv if necessary: rm -rf venv_nat && python3.12 -m venv venv_nat.

Safety Reminders

Before running HyperAgents in any environment, review the safety considerations from the warning above: execute generated code only in an isolated environment, monitor resource usage, and review generated patches before deploying them.

For a comprehensive analysis of the risk categories and governance frameworks applicable to self-modifying agent systems, see the HyperAgents safety and governance page.

Frequently Asked Questions

What are the hardware requirements for running HyperAgents?

HyperAgents requires an NVIDIA GPU with CUDA 13.0 support, Docker with the NVIDIA Container Toolkit installed, Python 3.12, and sufficient disk space for the Docker image (approximately 15–20 GB). For the Genesis robotics domain, a GPU with at least 8 GB VRAM is recommended. You also need API keys for OpenAI, Anthropic, and Google (Gemini) services.

Can I run HyperAgents without a GPU?

Some domains like Polyglot and Paper Review can run without a GPU since they primarily rely on LLM API calls. However, the Docker image is built on nvidia/cuda:13.0.0-devel-ubuntu22.04 and the Robotics domain (Genesis simulation) requires GPU acceleration. For a complete experience across all domains, an NVIDIA GPU with CUDA 13.0 is required.

How much do the API calls cost to run HyperAgents?

Costs vary significantly by domain and number of generations. Each generation involves multiple LLM calls to OpenAI, Anthropic, and Google endpoints for code generation, evaluation, and meta-level reasoning. A single domain run of 10 generations might cost $5–50 in API fees depending on the domain complexity and model selection. The staged evaluation strategy helps reduce costs by filtering unsuccessful modifications early.
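The cost-saving effect of staged evaluation can be made concrete with a small sketch. The stage names, call counts, and threshold below are illustrative assumptions, not the framework's actual configuration.

```python
def staged_evaluation(candidate, stages, threshold=0.5):
    """Run small -> medium -> full stages in order; abandon the candidate
    (and skip the cost of later stages) as soon as a stage scores below
    the threshold. Returns (passed_all, total_calls_spent)."""
    spent = 0
    for stage_name, n_calls, score_fn in stages:
        spent += n_calls
        if score_fn(candidate) < threshold:
            return False, spent
    return True, spent

# Illustrative stage sizes; the toy score function just returns the candidate.
stages = [
    ("small", 10, lambda c: c),
    ("medium", 50, lambda c: c),
    ("full", 200, lambda c: c),
]
# A weak candidate (score 0.2) is rejected after the 10-call small stage,
# never incurring the 250 calls of the medium and full stages.
```

This is why per-generation cost depends heavily on how many candidates survive the early stages rather than on the full evaluation size alone.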

What does setup_initial.sh do?

The setup_initial.sh script curates task subsets for each domain (train/validation/test splits), runs the initial evaluation harness to establish baseline performance with 10-sample initial runs, and generates reports. It processes all supported domains including paper_review, balrog, genesis_go2walking, imo_grading, imo_proof, and polyglot.

Is HyperAgents free to use?

The HyperAgents source code is licensed under CC BY-NC-SA 4.0 (Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International). It is free for non-commercial research and educational use, but commercial use requires separate licensing from Meta Platforms, Inc. Running HyperAgents also incurs API costs for LLM providers and compute costs for GPU resources.