Stochastic Macro
Engineering notes · Review-pipeline architecture

A multi-domain parallel AI review pipeline with dismissal-based merge gating.

Most teams that plug AI into their code review pipeline start with one big general-purpose reviewer bot. That bot has to be competent at architecture, security, accessibility, testing discipline, database design, and everything else — simultaneously. The output is predictably mediocre: a wide, shallow pass that misses domain-specific issues and dilutes the signal humans were supposed to receive.

A better shape is several specialized AI reviewers, each narrowly scoped to one domain, each producing structured output, all running in parallel, with a single aggregator producing one consolidated review per PR. This post describes that pattern end-to-end, including the merge-gate trick that lets you clear branch protection after a clean cycle without letting a bot-authored approval count toward your required-approvals quorum.

The pipeline shape

Pick the review domains your project actually cares about. Common ones:

  • Architecture — layer boundaries, dependency direction, module cohesion.
  • Security — authn/authz, injection, secret handling, unsafe primitives.
  • Test coverage — whether new behavior has corresponding tests at the right layer.
  • Domain modeling — whether the code conforms to whatever domain spec your project uses.
  • Accessibility — WCAG/ARIA/keyboard behavior for UI changes.
  • Inclusive design — tone, language, data-collection wording.
  • Patent / IP — novelty candidates and trade-secret exposure.

Each reviewer is its own agent with its own system prompt, its own rubric, and its own output schema. On every PR open/sync event, run all of them in parallel — GitHub Actions, GitLab parallel jobs, or a Buildkite matrix all work. Each reviewer emits a structured JSON artifact.
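
As a concrete sketch, here is what the fan-out and fan-in can look like as a GitHub Actions workflow. The reviewer list, script paths, and artifact names are illustrative placeholders, not a fixed convention:

# Hypothetical workflow excerpt: one matrix job per reviewer,
# one aggregator job that fans in.
jobs:
  review:
    strategy:
      fail-fast: false   # one failing reviewer must not cancel the others
      matrix:
        reviewer: [architecture, security, test-coverage, accessibility]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run reviewer agent
        run: |
          ./scripts/run-reviewer.sh "${{ matrix.reviewer }}" \
            > "reviewer-${{ matrix.reviewer }}.json"
      - uses: actions/upload-artifact@v4
        with:
          name: reviewer-${{ matrix.reviewer }}
          path: reviewer-${{ matrix.reviewer }}.json

  aggregate:
    needs: review
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v4
        with:
          pattern: reviewer-*
          merge-multiple: true   # collect every artifact into one directory
      - name: Consolidate and post one review
        run: ./scripts/aggregate.sh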

Structured reviewer output

The key design decision is that every reviewer emits the same JSON shape. This is what makes aggregation trivial and prevents one chatty reviewer from dominating the consolidated output. A minimal schema:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "required": ["reviewer", "has_blocking_findings", "findings"],
  "properties": {
    "reviewer": { "type": "string" },
    "has_blocking_findings": { "type": "boolean" },
    "findings": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["severity", "title", "body"],
        "properties": {
          "severity": { "enum": ["blocking", "major", "minor", "informational"] },
          "title":    { "type": "string" },
          "body":     { "type": "string" },
          "file":     { "type": "string" },
          "line":     { "type": "integer" }
        }
      }
    }
  }
}

Validate each reviewer's output against this schema before you feed it to the aggregator. If the output is malformed, fail the reviewer job — don't silently drop the findings.
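
One way to do that, assuming the check-jsonschema CLI is available in the reviewer job (any JSON Schema 2020-12 validator works the same way):

# Fail the reviewer job outright if the artifact doesn't match the schema.
# review-schema.json is the schema above; REVIEWER is this job's domain name.
check-jsonschema --schemafile review-schema.json "reviewer-${REVIEWER}.json"

check-jsonschema exits nonzero on any mismatch, which fails the CI step and keeps malformed findings out of the aggregation.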

The aggregator

The aggregator is one CI job that runs after all reviewers finish. It downloads every reviewer's JSON artifact, concatenates the findings (tagging each with its reviewer so humans know who raised what), computes a single top-level has_blocking_findings, and posts one GitHub review.

blocking=$(jq -s '
  any(.[]; .has_blocking_findings == true)
' reviewer-*.json)

if [ "$blocking" = "true" ]; then
  event="REQUEST_CHANGES"
else
  event="COMMENT"
fi

gh api repos/$OWNER/$REPO/pulls/$PR/reviews \
  --method POST \
  --field event="$event" \
  --field body="$(jq -s '…' reviewer-*.json)"

Aggregating into one review rather than N matters: it means the PR page shows one up-to-date AI review per cycle instead of a drift-prone stack of stale per-domain comments.
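
The filter elided in the body field above is where formatting choices live. As one possible shape, not the only one, rendering the consolidated body as markdown grouped by reviewer:

jq -rs '
  map(
    .reviewer as $r
    | .findings
    | map("- [" + .severity + "] " + .title + ": " + .body)
    | ["### " + $r] + (if length > 0 then . else ["No findings."] end)
  )
  | flatten
  | join("\n")
' reviewer-*.json

The optional file and line fields could instead drive line-anchored review comments, but a flat body is the simpler starting point.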

The merge-gate trick

Here's the subtle problem. On a clean cycle you want the bot to clear the blocking state so the PR can merge. The obvious move — have the bot post an APPROVE event — backfires on repos that use branch protection with required approvals. A bot-authored APPROVE counts toward the approval quorum, meaning the bot can effectively self-approve PRs if no human has reviewed yet.

You want two things at once:

  1. A dirty cycle's REQUEST_CHANGES review must be cleared when findings are resolved, otherwise branch protection remains blocked.
  2. The bot must never contribute to the required-approvals count.

The answer is to dismiss the prior REQUEST_CHANGES review via the Dismissals API, then post a fresh COMMENT review (never APPROVE). COMMENT does not contribute to approvals, and dismissing the old REQUEST_CHANGES clears the blocking state.

# Find any open REQUEST_CHANGES review this bot has left on the PR,
# and dismiss it explicitly before posting a fresh COMMENT.
gh api repos/$OWNER/$REPO/pulls/$PR/reviews --paginate \
  | jq -r --arg bot "$BOT_LOGIN" '
      .[]
      | select(.user.login == $bot and .state == "CHANGES_REQUESTED")
      | .id' \
  | while read -r review_id; do
      gh api \
        -X PUT \
        repos/$OWNER/$REPO/pulls/$PR/reviews/$review_id/dismissals \
        --field message="Superseded by a cleaner cycle."
    done

gh api repos/$OWNER/$REPO/pulls/$PR/reviews \
  --method POST \
  --field event="COMMENT" \
  --field body="$consolidated_body"

On a dirty cycle you skip the dismissal and post REQUEST_CHANGES instead. The invariant is simple: the bot posts only REQUEST_CHANGES or COMMENT, never APPROVE, and every new review on a clean cycle is preceded by a dismissal of any prior blocking review from the same bot.
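
Putting the two cycles together, the tail of the aggregator can branch like this; dismiss_prior_blocking_reviews is a hypothetical name for the dismissal loop above:

# Dirty cycle: block. Clean cycle: dismiss old blocks, then COMMENT.
# The bot never emits APPROVE on any path.
if [ "$blocking" = "true" ]; then
  event="REQUEST_CHANGES"
else
  dismiss_prior_blocking_reviews   # hypothetical wrapper for the loop above
  event="COMMENT"
fi

gh api repos/$OWNER/$REPO/pulls/$PR/reviews \
  --method POST \
  --field event="$event" \
  --field body="$consolidated_body"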

Variations

  • Per-path reviewer filtering. If a PR touches no UI files, don't run the accessibility or inclusive-design reviewers. Match on changed-path globs at the matrix level.
  • Severity bounds per domain. Some domains shouldn't be able to mark themselves blocking (e.g., a style reviewer). Enforce that at the aggregator level, not in the reviewer prompt — reviewer prompts are adversarial input too. A clamp sketch follows this list.
  • Fan-in timing. The aggregator should wait for all reviewers, but with a cap: if a reviewer times out at N minutes, run the aggregator with a missing artifact and mark the missing reviewer as degraded in the consolidated review body.
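
For the severity-bounds variation, one way to enforce the clamp is a jq pass the aggregator runs over each artifact before merging findings. A sketch; the never-blocking reviewer list here is illustrative:

# Demote any "blocking" finding from reviewers that must never block,
# and force their top-level flag off.
jq --argjson nonblocking '["style", "inclusive-design"]' '
  .reviewer as $r
  | if ($nonblocking | index($r)) then
      .has_blocking_findings = false
      | .findings |= map(if .severity == "blocking"
                         then .severity = "major"
                         else . end)
    else .
    end
' reviewer-style.json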

Trade-offs

  • Cost. N parallel LLM calls per cycle is more expensive than one. In practice the quality delta is large enough to justify the cost for repos where merged defects are expensive.
  • Prompt duplication. You will be tempted to share prompt infrastructure between reviewers. Resist until the duplication actually hurts — per-domain prompts are how you get domain-specific depth.
  • Dismissal churn on the PR timeline. Each clean cycle produces a dismissal entry; noisy but harmless. Humans learn to scroll past it.

Why we're publishing this

We think this shape is the right one for teams adopting AI review at scale, and the dismissal trick in particular is the kind of detail you only find out about after a few painful cycles with branch protection and a bot that approves PRs nobody read. Free to take, adapt, and ship.