E2E tests with local LLM

For contributors. This page is for running optional E2E tests with a local LLM. You don’t need it for normal use or to run the app.

Run end-to-end tests against a real local LLM (Ollama). One command starts the server if needed and runs the suite: web search, custom code, browser fetch, self-improvement, chat-driven design, and more. These tests are not run in CI.


Run the tests

From the repo root, run:

npm run test:e2e-llm

The script starts Ollama if it is not already running, then runs the E2E suite. No environment variable is required.
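The start-if-needed behaviour can be pictured roughly as follows. This is a minimal sketch, not the actual Agentron script: `ensureOllama`, `OllamaDeps`, and the probe limit are hypothetical names and values chosen for illustration. The probe and start steps are passed in as functions, so the sketch makes no claim about how the real script talks to Ollama.

```typescript
// Hypothetical sketch of "start Ollama only if it is not already running".
// Names and the probe limit are assumptions, not taken from the Agentron script.
interface OllamaDeps {
  /** Returns true when the Ollama server answers, e.g. a request to the base URL succeeds. */
  isServerUp: () => boolean;
  /** Launches the server, e.g. by spawning `ollama serve` in the background. */
  startServer: () => void;
}

type EnsureResult = "already-running" | "started" | "failed";

function ensureOllama(deps: OllamaDeps, maxProbes = 5): EnsureResult {
  // Leave an already-running server untouched.
  if (deps.isServerUp()) return "already-running";

  deps.startServer();

  // Probe a bounded number of times so a broken install fails instead of hanging.
  // A real script would normally wait between probes.
  for (let i = 0; i < maxProbes; i++) {
    if (deps.isServerUp()) return "started";
  }
  return "failed";
}
```

Injecting the two functions keeps the decision logic testable without a live Ollama process.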


Quick start

1. Install Ollama

Install the Ollama runtime for your platform:

Windows: Download the installer from ollama.com and run it. Ollama runs in the system tray and serves at http://localhost:11434.

2. Pull a model

The default model is Qwen 2.5 3B. Pull it (or one of the other suggested models):

ollama pull qwen2.5:3b
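To confirm a model is actually pulled before running the suite, one option is to inspect the model list that Ollama's `GET /api/tags` endpoint returns. The sketch below is illustrative only: `hasModel` and `withDefaultTag` are hypothetical helpers, not part of the Agentron tests, and the `TagsResponse` type covers only the one field used here. It assumes that a name pulled without a tag (such as `llama3.2`) is listed under `:latest`.

```typescript
// Minimal shape of the JSON returned by Ollama's GET /api/tags (only the field used here).
interface TagsResponse {
  models: { name: string }[];
}

// Untagged model names are listed under the ":latest" tag, so normalize before comparing.
function withDefaultTag(model: string): string {
  return model.includes(":") ? model : `${model}:latest`;
}

// Hypothetical helper: true when the requested model appears in the tags listing.
function hasModel(tags: TagsResponse, model: string): boolean {
  const wanted = withDefaultTag(model);
  return tags.models.some((m) => withDefaultTag(m.name) === wanted);
}
```

With this normalization, `qwen2.5:3b` must match exactly, while `llama3.2` also matches a listed `llama3.2:latest`.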

3. Run E2E

From the Agentron repo root:

npm run test:e2e-llm

Optional: Podman (or Docker) is needed for run-code and container scenarios; those tests are skipped if unavailable.
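The skip-if-unavailable rule can be sketched as a small runtime picker. This is an assumption-laden illustration, not the suite's real detection code: `pickRuntime` is a hypothetical name, the Podman-first preference order is a guess based on the wording above, and the availability probe is injected. In Node, such a probe could, for example, run `podman --version` or `docker --version` and check the exit status.

```typescript
// Hypothetical sketch: choose the first available container runtime, or null to skip.
type Runtime = "podman" | "docker";

function pickRuntime(
  isAvailable: (cmd: Runtime) => boolean,
  preference: Runtime[] = ["podman", "docker"], // assumed order, not confirmed by the docs
): Runtime | null {
  for (const cmd of preference) {
    if (isAvailable(cmd)) return cmd;
  }
  // No runtime found: run-code and container scenarios should be skipped.
  return null;
}
```

A test file could call this once and skip its container scenarios whenever the result is `null`.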


Default and suggested models

| Model | Env | Notes |
| --- | --- | --- |
| Qwen 2.5 3B (default) | (none) or E2E_LLM_MODEL=qwen2.5:3b | Good balance of quality and speed |
| Llama 3.2 | E2E_LLM_MODEL=llama3.2 | ollama pull llama3.2 |
| Phi-3 | E2E_LLM_MODEL=phi3 | ollama pull phi3 |

Override when running: E2E_LLM_MODEL=llama3.2 npm run test:e2e-llm


E2E test files

Tests live under packages/ui/__tests__/e2e/:

| File | Scenario |
| --- | --- |
| web-search.e2e.ts | Web search (std-web-search) |
| custom-code.e2e.ts | Custom code (create_code_tool, std-run-code) |
| browser-fetch.e2e.ts | Browser fetch (std-browser / std-fetch-url) |
| self-improvement.e2e.ts | Self-improvement (get_run_for_improvement) |
| chat-driven-design.e2e.ts | Chat-driven design (one turn → create workflow, agent, execute) |

Environment variables

| Variable | Default | Description |
| --- | --- | --- |
| OLLAMA_BASE_URL | http://localhost:11434 | Ollama API base URL |
| E2E_LLM_MODEL | qwen2.5:3b | Model name (must be pulled in Ollama) |
| E2E_SAVE_ARTIFACTS | (unset) | Set to 1 to write run output/trail for debugging |
| E2E_LOG_DIR | packages/ui/__tests__/e2e/artifacts | Artifact directory when E2E_SAVE_ARTIFACTS=1 |
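As a rough illustration of how these variables and their defaults combine, the sketch below resolves them into one config object. `resolveE2eConfig` and `E2eConfig` are hypothetical names, not the suite's actual code. Treating an empty string the same as unset is also an assumption.

```typescript
interface E2eConfig {
  ollamaBaseUrl: string;
  model: string;
  saveArtifacts: boolean;
  logDir: string;
}

type Env = Record<string, string | undefined>;

// Hypothetical helper; the defaults mirror the table above.
function resolveE2eConfig(env: Env): E2eConfig {
  return {
    ollamaBaseUrl: env["OLLAMA_BASE_URL"] || "http://localhost:11434",
    model: env["E2E_LLM_MODEL"] || "qwen2.5:3b",
    // Only the literal value "1" enables artifact output.
    saveArtifacts: env["E2E_SAVE_ARTIFACTS"] === "1",
    logDir: env["E2E_LOG_DIR"] || "packages/ui/__tests__/e2e/artifacts",
  };
}
```

In a Node script this would typically be called as `resolveE2eConfig(process.env)`.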

Covered areas

  • Web search: std-web-search
  • Custom code: create_code_tool, std-run-code
  • Browser fetch: std-browser / std-fetch-url
  • Self-improvement: get_run_for_improvement
  • Chat-driven design: One turn to create workflow, agent, execute
  • Multi-agent workflow: Fetch to summarize (planned)
  • Containers: std-container-run (skipped if Podman is unavailable)
  • Request user help: std-request-user-help, respond_to_run

Logging and artifacts

  • Stdout: Tests log with an [e2e] prefix (scenario, runId, tool calls, outcome, duration).
  • Artifacts: Set E2E_SAVE_ARTIFACTS=1 to write run output and execution trail to the artifacts directory for failed runs or for improving Agentron.

Use E2E_SAVE_ARTIFACTS=1 npm run test:e2e-llm to capture run output and trails under packages/ui/__tests__/e2e/artifacts/ (or E2E_LOG_DIR).
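When the test runner's output is noisy, the [e2e] prefix makes the suite's own log lines easy to isolate. The helper below is a hypothetical convenience, not part of Agentron. It relies only on the prefix described above and assumes nothing about the rest of each line.

```typescript
// Hypothetical helper: keep only lines that start with the [e2e] prefix
// (leading whitespace is tolerated).
function e2eLines(output: string): string[] {
  return output.split(/\r?\n/).filter((line) => /^\s*\[e2e\]/.test(line));
}
```

Passing captured stdout through `e2eLines` leaves just the scenario-level log entries for review.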
