AI-Assisted Workflows · DS-2026-006

Debugging an AI-assisted upload pipeline: ldflags, Jinja2 gotchas, and session management

Abstract

A single AI-assisted engineering session — fixing a Go → FastAPI upload pipeline, improving a dashboard, rebranding a product — surfaced six concrete failure modes worth documenting. This covers each one: what went wrong, what the fix was, and what it means for AI-assisted development at the boundary of build-time config, template engines, and multi-repo coordination.

The NetHero scanner uploads results to a FastAPI dashboard after every run. The pipeline had been broken in three separate ways across several weeks: endpoint mismatches, serialization inconsistencies, and a multipart form bug that returned HTTP 200 while silently dropping data. By the time those were fixed, the Go binary had a new problem: it was trying to post to an empty string.

This post documents a day of AI-assisted debugging on that pipeline and several connected tasks. The goal here isn't to recap the product changes — it's to be specific about the failure modes, because several of them are directly relevant to anyone building at the same intersection: Go binaries with build-time config, Python template engines, and AI agents working across long multi-step sessions.

The upload pipeline had three separate failure modes across several weeks. The hardest to catch was the one that produced no error at all.

Build-time config and the silent empty string

The Go binary embeds its backend URL at compile time using ldflags:

go build -ldflags "-X main.backendURL=${CALLSENTRY_BACKEND_URL}" -o nethero.exe

If the GitHub Actions secret CALLSENTRY_BACKEND_URL is unset or empty at build time, the resulting binary has an empty string for backendURL: no runtime fallback, no error, no warning in the build output. The scanner runs its probes, generates a result, calls http.Post with an empty URL, and fails either silently or with a generic-looking network error that doesn't point back to the build as the root cause.

This is not a bug in the code. It's a property of build-time configuration that's easy to miss: the binary looks correct, the code is correct, the secret just wasn't there when the build ran. The fix was adding the secret to GitHub Actions and triggering a rebuild. But finding the cause took longer than it should have because the failure signature — upload fails with a network error — looks identical to several other problems the pipeline had already exhibited.
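One way to make this failure loud is to check the secret before the build runs. The sketch below is an assumption about how such a guard could look, not the project's actual CI step: build_command is a hypothetical helper, and only the environment variable name, the ldflags string, and the output name come from the build line above.

```python
def build_command(env):
    """Return the go build argv, or raise if the backend URL is missing.

    `env` is any mapping of environment variables (for example os.environ).
    """
    backend_url = (env.get("CALLSENTRY_BACKEND_URL") or "").strip()
    if not backend_url:
        # Fail the build loudly instead of shipping a binary whose
        # backendURL is the empty string.
        raise RuntimeError(
            "CALLSENTRY_BACKEND_URL is unset or empty; refusing to build"
        )
    return [
        "go", "build",
        "-ldflags", f"-X main.backendURL={backend_url}",
        "-o", "nethero.exe",
    ]
```

A CI step could pass os.environ to this helper and hand the returned list to subprocess.run(..., check=True), so a missing secret stops the job instead of producing a binary that looks fine.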

The lesson for AI-assisted debugging: when an AI agent is helping trace an upload failure, make sure the question "is the binary config correct?" is explicitly on the checklist. An AI will look at code, check the endpoint implementation, and review serialization — but build-time config issues don't appear in the code the agent reads. You have to provide that context directly, or the agent will keep looking in the wrong layer.

Human-readable application logs

The Go binary had been using structured slog output: machine-readable key=value pairs that are useful if you're aggregating logs into a pipeline, and useless if you're a field technician opening a log file to figure out why an upload failed.

The replacement format targets the person reading the file directly:

[2026-05-20 15:04:05] INFO   NetHero v2.1.0 starting
[2026-05-20 15:04:05] INFO   Data directory: C:\Users\...\nethero-data
[2026-05-20 15:04:05] WARN   No backend URL configured — uploads will be skipped
                             Set CALLSENTRY_BACKEND_URL in the environment or rebuild with the correct secret
[2026-05-20 15:04:12] INFO   Scan nhr_19e430346f34 complete — uploading to https://callsentry.danielscience.com
[2026-05-20 15:04:13] ERROR  Upload failed: connection refused
                             Check that the backend is reachable and CALLSENTRY_SCANNER_TOKEN matches the server config

A few things make this format worth the extra effort. First, the startup log records whether the backend URL and scanner token are present — so a missing config is caught before the first scan attempt rather than at upload time. Second, error messages include actionable fix hints. The person reading the log shouldn't need to know the internal architecture to understand what to do next. Third, the file is written in append mode to the same directory as the binary, which means it's where the user expects to find it and accumulates across runs without being truncated.

Structured logging has its place — if you're shipping to a log aggregator, parse-friendly output matters. But for a desktop diagnostic tool used by field engineers, plain English wins. Pick the format for the audience, not for the architecture.
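The format itself is simple enough to sketch. The real logger lives in the Go binary; this Python version only illustrates the layout shown in the sample log, and every function name in it (format_entry, startup_check, append_entry) is hypothetical, as is the "Backend URL" message.

```python
from datetime import datetime

# Hints are indented so they line up under the message column
# of "[YYYY-MM-DD HH:MM:SS] LEVEL  message".
HINT_INDENT = " " * 29


def format_entry(when, level, message, hint=None):
    """Format one log entry, with an optional fix hint on a second line."""
    line = f"[{when:%Y-%m-%d %H:%M:%S}] {level:<6} {message}"
    if hint:
        line += "\n" + HINT_INDENT + hint
    return line


def startup_check(when, backend_url):
    """Record at startup whether a backend URL is configured."""
    if not backend_url:
        return format_entry(
            when, "WARN", "No backend URL configured; uploads will be skipped",
            hint="Set CALLSENTRY_BACKEND_URL in the environment "
                 "or rebuild with the correct secret",
        )
    return format_entry(when, "INFO", f"Backend URL: {backend_url}")


def append_entry(path, entry):
    """Append an entry so earlier runs stay in the file."""
    with open(path, "a", encoding="utf-8") as log_file:
        log_file.write(entry + "\n")
```

The point of the hint argument is that the fix suggestion is written next to the code that detects the problem, which is where the context for a useful hint exists.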

Dashboard: computing filters in Python, not Jinja2

The NetHero dashboard was showing aggregate statistics but nothing about the most recent scan. The fix involved passing the latest scan's raw JSON to the template and letting Jinja2 display it. The first draft of the template filtered findings by status directly in the template:

{# Fragile: relies on the 'in' test, which only exists in Jinja2 2.10 and later #}
{% set issues = findings | selectattr('status', 'in', ['fail', 'warn', 'error']) | list %}

Jinja2's selectattr filter takes a test name as its second argument: 'eq', 'defined', 'none', and so on. The 'in' test was only added in Jinja2 2.10, so on an older version the template fails at render time with an error along the lines of "no test named 'in'", while on a newer one it renders without complaint. Even where it works, it compares exact strings: it does not normalize case, it misses variants such as 'warning' or 'critical', and it cannot also check a separate severity field. Either way, you don't reliably get the list you expected.

The correct approach is to compute the filtered list in Python and pass it to the template:

_NON_PASS = {"fail", "warn", "warning", "error", "critical"}
latest_scan_issues = [
    f for f in (latest_scan.get("findings") or latest_scan.get("results") or [])
    if (f.get("status") or "").lower() in _NON_PASS
    or (f.get("severity") or "").lower() in {"error", "warning", "critical"}
]

The template receives latest_scan_issues as a ready list and iterates it directly. This is the right pattern regardless — template logic should handle display, not data transformation. But it's easy to reach for a filter when you're working quickly, and the failure mode here is subtle enough that it can pass an initial test.
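To see what the Python version buys you, here is the same filter wrapped in a function and run against a small sample. The function name non_pass_findings and the sample findings are made up for illustration; the filtering logic is the snippet above.

```python
_NON_PASS = {"fail", "warn", "warning", "error", "critical"}
_BAD_SEVERITY = {"error", "warning", "critical"}


def non_pass_findings(latest_scan):
    """Return findings whose status or severity marks them as an issue."""
    findings = latest_scan.get("findings") or latest_scan.get("results") or []
    return [
        f for f in findings
        if (f.get("status") or "").lower() in _NON_PASS
        or (f.get("severity") or "").lower() in _BAD_SEVERITY
    ]


# Made-up sample data: mixed case, a missing status, and a finding that
# passes on status but carries a warning severity.
sample_scan = {
    "findings": [
        {"name": "dns", "status": "PASS"},
        {"name": "sip", "status": "Fail"},
        {"name": "rtp", "status": "pass", "severity": "Warning"},
        {"name": "tls", "status": None},
    ]
}

issues = non_pass_findings(sample_scan)
print([f["name"] for f in issues])  # ['sip', 'rtp']
```

Mixed case, a None status, and the severity fallback are all handled in one place, and the function can be unit-tested without rendering a template.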

The same session also flagged a secondary issue: sm.pass in Jinja2. pass is a Python keyword, and attribute-style access on a reserved word is the kind of thing that may not behave the same way across Jinja2 versions and tooling. When the key is a Python keyword and sm is a dict, the explicit form sm.get('pass') (or sm['pass']) sidesteps the question.

Empty logs vs. broken logs

The application logs page on the dashboard was empty. The initial assumption — reasonable, given the upload pipeline had been broken — was that the logging code was wrong. It wasn't. The scanner_logs table is written server-side on every upload attempt, successful or failed. But all the prior uploads had happened before the server-side logging code was deployed. The table was empty because nothing had been uploaded since the logging was added, not because logging was broken.

This is a failure mode worth naming: empty state that looks like a bug but is actually correct behavior with no data yet. The fix was twofold. First, verify the code path was correct (it was). Second, update the empty-state message so users understand why the table is empty:

No log entries yet. Application logs are recorded when scans are uploaded from NetHero or run via the web portal. Upload a scan to see activity here.

An AI agent diagnosing this kind of issue will tend to look at the code — which is fine, but the code being correct doesn't mean the state is correct. When an AI reports "the logging code looks right," the next question should be "has anything actually triggered it since the code was deployed?" Those are two different questions.
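Those two questions can be turned into an explicit check. This is a minimal sketch under assumptions: the function name, its inputs, and the three result labels are all hypothetical, and it presumes you can look up when the last upload happened and when the logging code was deployed.

```python
from datetime import datetime


def explain_empty_logs(log_count, last_upload_at, logging_deployed_at):
    """Classify an empty (or non-empty) scanner_logs table.

    last_upload_at is None when no upload has ever been recorded.
    """
    if log_count > 0:
        return "has-data"
    if last_upload_at is None or last_upload_at < logging_deployed_at:
        # Nothing has exercised the logging path since it shipped,
        # so an empty table is the expected state.
        return "empty-but-expected"
    # An upload happened after deployment and still left no rows.
    return "empty-and-suspicious"


deployed = datetime(2026, 5, 20, 12, 0, 0)
print(explain_empty_logs(0, datetime(2026, 5, 19, 9, 0, 0), deployed))
# empty-but-expected
```

Only the last case points at the logging code. The first two are answered by runtime state, which is exactly what an agent reading the source cannot see.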

Rebranding without breaking internals

The product was renamed from CallSentry to NetHero. The constraint: change only what users see, leave everything else alone. The Python package is callsentry. Environment variables are CALLSENTRY_*. Database paths, imports, and internal module names all stay as-is. Only display text changes — page titles, headings, body copy, footer, browser tab titles.

The risk with a bulk rename like this is hitting technical strings by accident. A naive find-and-replace on "CallSentry" would catch things like pip install callsentry, CALLSENTRY_SCANNER_TOKEN, and file paths in comments — all of which should stay unchanged. The approach was a targeted replace with explicit exclusions and a post-change review of every modified file.

A second risk: the GitHub repo was renamed as part of the same session. GitHub automatically redirects the old URL, which handles most cases. But hardcoded clone URLs in setup scripts, badge URLs in the README, and links on the project website all need manual updates. The redirect is a safety net, not a substitute for updating the references.

For AI-assisted rebranding tasks: be explicit about what the agent should not change. "Rename CallSentry to NetHero" is ambiguous — it could mean everything. "Change only user-visible display text — not package names, environment variables, file paths, or code comments" is precise enough to work from.
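The "targeted replace with explicit exclusions" can be sketched as follows. This is not the script used in the session; the exclusion rules are assumptions chosen for illustration and would need tuning per file type (a Markdown heading also starts with #, for example).

```python
import re

# Exact-case, whole-word match: leaves `callsentry` (package name) and
# `CALLSENTRY_*` (environment variables) untouched.
DISPLAY_NAME = re.compile(r"\bCallSentry\b")

# Hypothetical exclusions: comment lines and import statements are skipped.
SKIP_LINE = re.compile(r"^\s*(#|//|import\s|from\s)")


def rebrand_text(text):
    """Return (new_text, changed_line_numbers) for one file's contents."""
    new_lines, changed = [], []
    for number, line in enumerate(text.splitlines(), start=1):
        if SKIP_LINE.match(line):
            new_lines.append(line)
            continue
        updated = DISPLAY_NAME.sub("NetHero", line)
        if updated != line:
            changed.append(number)
        new_lines.append(updated)
    return "\n".join(new_lines), changed
```

Returning the changed line numbers is what makes the post-change review practical: every modified line can be checked by a person before anything is committed.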

Managing AI sessions across long multi-step tasks

This session covered five distinct tasks across two repositories: fixing the dashboard, fixing the logs page, rebranding, updating links after a repo rename, and writing documentation. That's a lot of context to hold across a long conversation, and context management is where AI-assisted sessions most commonly fall apart.

Long AI-assisted sessions spanning multiple repositories and tasks require deliberate structure — not because the AI loses track, but because the engineer needs to stay in control of what's been verified and what hasn't.

A few patterns that helped:

State the constraint once, clearly, at the start of each task. "Use existing code when possible, don't rewrite anything if you don't have to" is a useful constraint that shapes every decision the agent makes. Restating it at the start of a new subtask keeps it active rather than letting it drift as the session grows longer.

Verify before trusting. When an agent reports "the code looks right" or "this should work," that's worth treating as a hypothesis, not a conclusion. Ask what it would look like if this were correct but had no data yet. Ask what the failure signature would be if it were broken in a different way. An agent that has read the code will tell you what the code says; it won't tell you what the runtime state is.

Separate architectural decisions from implementation. The decision to compute filtered lists in Python rather than Jinja2 is an architectural judgment. The agent can implement it either way — but the judgment about which approach is more maintainable, less surprising to a future reader, and less likely to fail silently belongs to the engineer. AI tools are fast at implementation. They're less reliable at the judgment calls that determine whether the implementation will hold up.

End each major subtask with a summary of what changed and what's still open. In a long session spanning multiple repositories and several distinct problems, it's easy to lose track of which changes are committed, which are staged, and which were discussed but not yet made. Treating each subtask as a discrete unit — with a clear end state — keeps the session recoverable if something goes wrong.

None of this is specific to any particular AI tool. It's about treating AI assistance as a force multiplier for engineering judgment, not a replacement for it. The session covered more ground in a day than would have been practical alone. But the decisions about what to change, whether the change was correct, and how to verify it were all human calls.

