Automate Documentation Testing with AI Agents: A Step-by-Step Guide

Learn to build an AI agent using GitHub Copilot CLI and Dev Containers to automatically test documentation, catching silent drift and missing steps like a synthetic new user.

Ipassact · 2026-05-02 00:28:43 · Open Source

Introduction

For early-stage open-source projects, the Getting Started guide is often a developer’s first real interaction. If a command fails, an output doesn’t match, or a step is unclear, most users won’t file a bug report—they’ll just move on. Drasi, a CNCF sandbox project for real-time data change detection, faced this challenge: a small team of four engineers shipping code faster than they could manually test tutorials. A 2025 GitHub Dev Container update bumped minimum Docker versions, breaking every tutorial silently. This incident forced a realization: with advanced AI coding assistants, documentation testing can be converted from a manual chore into an automated monitoring problem. In this guide, you’ll learn how to build an AI agent that acts as a “synthetic new user” to test your documentation end-to-end, using GitHub Copilot CLI and Dev Containers.


What You Need

  • GitHub Copilot CLI – installed and authenticated (requires a GitHub Copilot subscription)
  • Dev Containers – Visual Studio Code with the Dev Containers extension (formerly Remote – Containers) or a compatible environment
  • A Getting Started tutorial – your project’s step-by-step guide (e.g., Markdown or HTML file)
  • A Docker runtime – Docker Desktop or Docker Engine for running containers
  • A sandbox environment – the exact dependencies your tutorial expects (e.g., specific database, CLI tool versions)
  • Basic scripting knowledge – ability to write shell scripts or Python to orchestrate agent actions

Step-by-Step Instructions

  1. Step 1: Set Up a Reproducible Dev Container Environment

    Create a .devcontainer/devcontainer.json file that mirrors the exact environment your tutorial assumes. Include all dependencies: Docker, k3d, sample databases, and any CLI tools. Use a Dockerfile to pin specific versions so that upstream changes don’t break your tests unexpectedly. Commit this configuration to your repository so the agent can spin up the container consistently.
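
    A minimal sketch of such a configuration, written as a shell snippet that creates the file. The base image, the features, and the install-tools.sh helper (where you would pin k3d, sample databases, and CLI versions) are illustrative placeholders, not any project's actual setup:

    ```bash
    # Hypothetical example: create a Dev Container config that pins the tutorial's toolchain.
    # Base image, features, and install-tools.sh are placeholders; substitute your own.
    mkdir -p .devcontainer
    cat > .devcontainer/devcontainer.json <<'EOF'
    {
      "name": "docs-test-sandbox",
      "image": "mcr.microsoft.com/devcontainers/base:ubuntu-22.04",
      "features": {
        "ghcr.io/devcontainers/features/docker-in-docker:2": {},
        "ghcr.io/devcontainers/features/node:1": { "version": "20" }
      },
      "postCreateCommand": "bash .devcontainer/install-tools.sh"
    }
    EOF
    ```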

  2. Step 2: Install and Configure GitHub Copilot CLI

    Inside the Dev Container, install the GitHub Copilot CLI by following the official documentation, then authenticate it with your GitHub account and confirm it responds to a test prompt. The agent will use this CLI to interact with the terminal, executing commands exactly as written in the tutorial and verifying outputs.
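
    If you use the Copilot extension for the GitHub CLI, setup looks roughly like the following; if you adopt a different Copilot CLI distribution, follow its own install and login flow instead:

    ```bash
    # Authenticate the GitHub CLI (browser or device-code flow).
    gh auth login

    # Install the Copilot extension for the GitHub CLI.
    gh extension install github/gh-copilot

    # Sanity check: the extension should respond to --help.
    gh copilot --help
    ```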

  3. Step 3: Build the AI Agent Script

    Write a script (e.g., in Python or Bash) that acts as a synthetic new user; a minimal Bash sketch follows this list. The script should:

    • Read each step of your Getting Started guide sequentially.
    • For each command, invoke the GitHub Copilot CLI with a naïve prompt that asks it to execute the command verbatim, e.g., "execute: docker run -d --name nginx nginx:latest".
    • Capture the output and compare it against expected results stated in the tutorial.
    • If the output matches, proceed; if not, log a failure and continue (or stop).
    • Report a summary of all passed/failed steps at the end.

    Key: treat the agent as completely literal and unforgiving. It should not infer missing steps or correct typos.
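
    A minimal Bash sketch of such a driver, under two assumptions: a hypothetical steps.txt holds one tutorial command per line with an optional expected-output substring after a tab, and commands are executed directly with bash -c (in practice you could route execution through the Copilot CLI instead):

    ```bash
    #!/usr/bin/env bash
    # doc_agent.sh - hypothetical "synthetic new user" driver (sketch only).
    # Assumed input format in steps.txt: <command><TAB><expected substring, may be empty>
    set -u

    pass=0; fail=0
    while IFS=$'\t' read -r cmd expected; do
      [ -z "$cmd" ] && continue
      echo ">>> $cmd"
      # Run the tutorial command exactly as written; capture output and exit code.
      output=$(bash -c "$cmd" 2>&1); status=$?
      if [ "$status" -ne 0 ]; then
        echo "FAIL (exit $status): $cmd"; fail=$((fail+1)); continue
      fi
      if [ -n "$expected" ] && ! grep -qF "$expected" <<<"$output"; then
        echo "FAIL (expected '$expected' not in output): $cmd"; fail=$((fail+1)); continue
      fi
      echo "PASS: $cmd"; pass=$((pass+1))
    done < steps.txt

    echo "Summary: $pass passed, $fail failed"
    [ "$fail" -eq 0 ]
    ```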

  4. Step 4: Enforce Naivety with Prompts

    Ensure the agent has no prior knowledge of your project. For each step, feed only the exact text from the tutorial. Do not include context like “wait for bootstrap”—the agent must not bring any implicit understanding. To enforce this, clear the conversation history between steps. Use a fresh Copilot session for each command to avoid the AI learning from previous successes.
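
    One way to sketch that isolation: run every step in its own fresh process with a throwaway scratch directory, feeding it only the tutorial's literal wording. The helper below is hypothetical; replace the echo with your Copilot CLI invocation, and if that CLI persists conversation state on disk, point it at the scratch directory (check its documentation) so nothing carries over between steps:

    ```bash
    # Per-step amnesia: fresh process, fresh scratch directory, literal prompt only.
    run_naive_step() {
      local step_text="$1"
      local scratch
      scratch=$(mktemp -d)
      # Replace the echo with your Copilot CLI invocation for this single step.
      ( cd "$scratch" && echo "execute exactly: ${step_text}" )
      rm -rf "$scratch"
    }

    run_naive_step "docker run -d --name nginx nginx:latest"
    ```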

  5. Step 5: Add Output Validation

    For each step that specifies an expected output (e.g., “You should see ‘Success’”), parse the terminal output for that exact string. If the output is missing or different, flag it as a bug. For steps without a defined expected output, define a baseline: the command should exit with code 0 and produce no errors. The agent should treat any stderr as a failure.
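
    The baseline check described above might look like this; validate_step is a hypothetical helper, and the stderr-is-a-failure rule is implemented by capturing stderr separately from stdout:

    ```bash
    # Hypothetical validation helper: exit code must be 0, stderr must stay empty,
    # and when an expected string is given it must appear in stdout.
    validate_step() {
      local cmd="$1" expected="${2:-}"
      local out err status verdict
      out=$(mktemp); err=$(mktemp)
      bash -c "$cmd" >"$out" 2>"$err"; status=$?
      verdict="PASS"
      if [ "$status" -ne 0 ]; then
        verdict="FAIL: exit code $status"
      elif [ -s "$err" ]; then
        verdict="FAIL: wrote to stderr: $(head -c 200 "$err")"
      elif [ -n "$expected" ] && ! grep -qF "$expected" "$out"; then
        verdict="FAIL: expected '$expected' not found in output"
      fi
      rm -f "$out" "$err"
      echo "$verdict"
      [ "$verdict" = "PASS" ]
    }

    validate_step "echo Success" "Success"
    ```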

  6. Step 6: Run the Agent Against the Tutorial

    Execute the script inside the Dev Container. Watch it walk through the tutorial step by step. The agent will produce a report of all commands that succeeded and those that failed. Note that failures can be due to documentation errors, environment mismatches, or actual bugs in the getting-started flow.
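
    Assuming the driver from Step 3 is saved as scripts/doc_agent.sh (a name chosen here for illustration), a run might look like this, with the full transcript saved for later debugging:

    ```bash
    # Launch the agent inside the Dev Container and keep the full transcript.
    # Script path and log directory are illustrative; adjust to your repo layout.
    mkdir -p logs
    bash scripts/doc_agent.sh 2>&1 | tee "logs/doc-agent-$(date +%Y%m%d-%H%M%S).log"
    echo "Agent finished with exit code ${PIPESTATUS[0]}"
    ```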

  7. Step 7: Analyze Failures and Identify Silent Drift

    Review the report. Common failure types include: outdated commands, missing prerequisite steps, deprecated flags, or changed default behaviors. For example, if a tutorial says “run drasi list query” but the current version requires drasi query list, the agent will catch it. This reveals the silent drift that human reviewers often miss.

  8. Step 8: Integrate into CI/CD Pipeline

    Automate the agent to run on every pull request that touches the documentation directory. Use GitHub Actions or similar to spin up the Dev Container, install the agent, and execute the test. Fail the pipeline if any tutorial step breaks. This turns documentation testing into a continuous monitoring problem, ensuring tutorials remain valid as dependencies evolve.
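
    A sketch of the commands such a CI job could run, using the Dev Containers CLI (@devcontainers/cli); in GitHub Actions, the devcontainers/ci action is an alternative. The script path is the illustrative name used in the earlier steps:

    ```bash
    # Sketch of a CI job body: requires Node.js and Docker on the runner.
    npm install -g @devcontainers/cli

    # Build and start the container described by .devcontainer/devcontainer.json.
    devcontainer up --workspace-folder .

    # Run the documentation agent inside it; a non-zero exit fails the pipeline.
    devcontainer exec --workspace-folder . bash scripts/doc_agent.sh
    ```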

Tips for Success

  • Embrace naivety – The agent’s greatest strength is its total ignorance of your project. Resist the urge to add hints or shortcuts. If your tutorial is missing an explicit step, the agent will catch it—that’s the point.
  • Keep environments isolated – Use a fresh Dev Container for each test run to avoid state pollution. Reset databases and containers between tests.
  • Monitor upstream changes – Subscribe to release notes of all dependencies listed in your tutorial. When a new version drops, run the agent preemptively to see if anything breaks. The 2025 Docker version bump is a classic example of silent drift.
  • Log everything – Save the agent’s full output (commands run, outputs seen, decisions made) for debugging. This becomes invaluable when diagnosing why a step fails.
  • Iterate on the script – Start with a simple linear script, then add features like retries for transient failures, but only if the tutorial explicitly mentions them; a retry wrapper sketch follows this list. The goal is to mimic a real user, not to tolerate flakiness.
  • Share the agent with your team – Treat the agent like a new developer. Ask team members to review its failures and update documentation accordingly. Over time, the agent trains your team to write clearer, more executable docs.
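
  If a tutorial step does say something like "retry until the pod is ready", a wrapper along these lines keeps the retry behavior explicit and bounded; the attempt count, delay, and example command are arbitrary placeholders:

  ```bash
  # Hypothetical retry wrapper, used only for steps whose documentation explicitly
  # tells the reader to retry; every other step should fail on its first attempt.
  retry_step() {
    local attempts="$1" delay="$2" cmd="$3"
    local i
    for ((i = 1; i <= attempts; i++)); do
      bash -c "$cmd" && return 0
      echo "Attempt $i/$attempts failed; retrying in ${delay}s..."
      sleep "$delay"
    done
    return 1
  }

  retry_step 5 10 "kubectl get pods | grep -q Running"
  ```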
