Introduction
Overview of the Terminal-Bench agent interface.
Creating a custom agent
Terminal-Bench tasks are complex environments with pre-configured lifecycles, including custom startup and teardown instructions. The Terminal-Bench harness handles this complexity for you, so you can focus on writing your agent.
If you haven't already, make sure to install Terminal-Bench by following our installation guide.
Integration Approaches
There are three approaches to integrating your own agent into Terminal-Bench:
- Install and run in the task container
- Implement the BaseAgent interface
- Implement the MCPAgent interface
Install and run in the task container via AbstractInstalledAgent
If your agent is installable from the command line, then the most straightforward approach is to implement the AbstractInstalledAgent interface. Implementing the interface involves defining an installation script, the execution command, and environment variables.
To get started, you'll need to implement the _install_agent_script_path, _run_agent_commands, _env, and name methods.
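The shape of those four members can be sketched as follows. This is a self-contained illustration, not the real interface: `InstalledAgentSketch` and `Command` are hypothetical stand-ins defined here so the example runs on its own, `my-agent` and `MY_AGENT_API_KEY` are made-up names, and which members are properties rather than plain methods is an assumption. Check the installed `terminal_bench` package for the actual base class, import paths, and signatures.

```python
import os
import shlex
from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Command:
    """Hypothetical stand-in for the command type the harness expects."""

    command: str


class InstalledAgentSketch(ABC):
    """Stand-in base class for illustration; NOT the real AbstractInstalledAgent."""

    @staticmethod
    @abstractmethod
    def name() -> str:
        """Identifier the harness uses for this agent."""

    @property
    @abstractmethod
    def _env(self) -> dict[str, str]:
        """Environment variables to set inside the task container."""

    @property
    @abstractmethod
    def _install_agent_script_path(self) -> Path:
        """Path to the install script copied into the task container."""

    @abstractmethod
    def _run_agent_commands(self, task_description: str) -> list[Command]:
        """Commands that launch the agent on the given task."""


class MyCliAgent(InstalledAgentSketch):
    """Example agent that is installed and invoked from the command line."""

    @staticmethod
    def name() -> str:
        return "my-cli-agent"

    @property
    def _env(self) -> dict[str, str]:
        # Forward only the variables the agent needs (hypothetical key name).
        return {"MY_AGENT_API_KEY": os.environ.get("MY_AGENT_API_KEY", "")}

    @property
    def _install_agent_script_path(self) -> Path:
        # Assumes setup.sh sits in the current working directory.
        return Path("setup.sh")

    def _run_agent_commands(self, task_description: str) -> list[Command]:
        # Quote the task description so the shell treats it as one argument.
        quoted = shlex.quote(task_description)
        return [Command(command=f"my-agent run {quoted}")]
```

Quoting the task description with `shlex.quote` matters because task text may contain spaces, quotes, or shell metacharacters that would otherwise be interpreted by the shell in the container.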
Below is a minimal example of installing Claude Code inside the harness. Note that this example is purely illustrative: we already support Claude Code (see our supported agents), so you don't need to implement this yourself.
We will also need to supply a setup.sh script that will be copied into the task container and run to install the agent:
To maximize compatibility, your script should not assume that any dependencies are pre-installed (not even curl). The only safe assumption is that it is running on Debian Linux.
You can then run your agent by passing its import path to the harness:
Implement the BaseAgent interface
Implementing the BaseAgent interface is more robust than installing your agent in the task container but, depending on how your agent is implemented, can require some restructuring of its code.
The BaseAgent interface only requires you to implement one method:
To integrate your agent by implementing the BaseAgent interface, you'll need to
Want help?
We are also willing to pair-program with you to integrate your agent. Email mikeam@cs.stanford.edu or alex@laude.org (or both!) if you'd like help.
Implement the MCPAgent interface
Coming soon!
Running a custom agent
Once you have implemented your custom agent, you can run it by using the --agent-import-path flag when running the harness.
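To make the idea of an import path concrete, here is a minimal sketch of how a string of the form `module.path:ClassName` can be resolved to a class using the standard library. The `module:Class` format and the `load_agent_class` helper are assumptions for illustration; the harness's own resolution logic may differ.

```python
import importlib


def load_agent_class(import_path: str) -> type:
    """Resolve a 'module.path:ClassName' string to the class it names.

    Hypothetical helper; the import-path format is an assumption.
    """
    module_name, sep, class_name = import_path.partition(":")
    if not sep or not module_name or not class_name:
        raise ValueError(
            f"Expected 'module.path:ClassName', got {import_path!r}"
        )
    # Import the module by its dotted name, then look up the class on it.
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```

For a path to resolve this way, the module has to be importable from the environment in which the harness runs, for example by being installed or by being on `PYTHONPATH`.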
Submitting to the leaderboard
See our submitting to the leaderboard guide for instructions on how to submit your agent results to the leaderboard.