Network Engineering | DS-2026-004

Three attempts at a VoIP scanner, and what the first two got wrong

Abstract

Three iterations of the same portable VoIP scanner reveal a consistent failure mode: the architecturally cleaner version shipped worse. This post traces the build-order mistake behind that and what it cost.

Around the third time I SSHed into a client machine to manually run ping, traceroute, and a SIP OPTIONS check — confirming what I already knew, a routing problem with 80ms of unnecessary latency across three hops — I decided to build a tool that handled all of it automatically and stored the results somewhere useful.

I've now built that tool three times. Here's how it went.

Attempt one: voip-health

github.com/RyanDanielWillis/voip-health

The stack was an asyncio probe scheduler, scapy for raw packet access, and a thin Flask backend. The core loop worked: run probes, collect results, POST to the backend, view history in a browser.

CLI runner
  └─ asyncio probe scheduler
       ├─ ICMP ping (RTT / loss)
       ├─ UDP traceroute (path + per-hop latency)
       ├─ SIP OPTIONS (response code, latency)
       └─ RTP probe (optional jitter estimate)

Results → JSON → HTTP POST → Flask backend → SQLite
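A rough sketch of that loop, for anyone who wants the shape without reading the repo. This is not the voip-health source: the probe functions are stand-ins that sleep instead of touching the network, and the names ProbeResult, run_one, and run_probes are made up for illustration.

```python
import asyncio
import json
import time
from dataclasses import asdict, dataclass
from typing import Awaitable, Callable, List, Optional


@dataclass
class ProbeResult:
    name: str
    ok: bool
    latency_ms: Optional[float]
    detail: str = ""


Probe = Callable[[str], Awaitable[ProbeResult]]


async def fake_ping(target: str) -> ProbeResult:
    # Stand-in for an ICMP probe: sleeps instead of sending packets.
    await asyncio.sleep(0.01)
    return ProbeResult("icmp_ping", True, 12.5, f"target={target}")


async def fake_sip_options(target: str) -> ProbeResult:
    # Stand-in for a SIP OPTIONS probe.
    await asyncio.sleep(0.02)
    return ProbeResult("sip_options", True, 31.0, "200 OK")


async def run_one(probe: Probe, target: str, timeout_s: float) -> ProbeResult:
    # A slow or crashing probe should not take the whole scan down with it.
    try:
        return await asyncio.wait_for(probe(target), timeout=timeout_s)
    except asyncio.TimeoutError:
        return ProbeResult(probe.__name__, False, None, "timeout")
    except Exception as exc:
        return ProbeResult(probe.__name__, False, None, f"error: {exc}")


async def run_probes(target: str, probes: List[Probe], timeout_s: float = 2.0) -> str:
    # Run every probe concurrently; gather() keeps results in probe order.
    results = await asyncio.gather(*(run_one(p, target, timeout_s) for p in probes))
    payload = {
        "target": target,
        "ts": time.time(),
        "results": [asdict(r) for r in results],
    }
    return json.dumps(payload)  # this string is what would be POSTed to the backend
```

Calling asyncio.run(run_probes("192.0.2.10", [fake_ping, fake_sip_options])) returns the JSON body for one scan. The per-probe timeout is the design choice that matters: one hung probe turns into a failed result instead of a hung scan.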

What broke: the profile system. I wanted different tolerance thresholds for hosted UCaaS, self-managed Asterisk, and Teams Direct Routing environments — which is reasonable. But instead of three YAML files with threshold values, I built inheritance, per-check weighting, condition-based severity overrides, and a scoring engine. By the end, changing a single threshold meant tracing through three layers of config to figure out what was actually active.
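For contrast, here is roughly what the flat version could have looked like, written as Python data rather than YAML to keep the example dependency-free. The threshold numbers are placeholders for illustration, not recommended tolerances, and Profile, PROFILES, and evaluate are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Profile:
    max_rtt_ms: float
    max_jitter_ms: float
    max_loss_pct: float


# One flat entry per environment. No inheritance, no weighting, no overrides.
# Placeholder values only.
PROFILES = {
    "hosted_ucaas": Profile(max_rtt_ms=150.0, max_jitter_ms=30.0, max_loss_pct=1.0),
    "asterisk": Profile(max_rtt_ms=120.0, max_jitter_ms=20.0, max_loss_pct=0.5),
    "teams_direct_routing": Profile(max_rtt_ms=100.0, max_jitter_ms=30.0, max_loss_pct=1.0),
}


def evaluate(profile_name: str, rtt_ms: float, jitter_ms: float, loss_pct: float) -> List[str]:
    """Return one human-readable failure per exceeded threshold."""
    p = PROFILES[profile_name]
    failures = []
    if rtt_ms > p.max_rtt_ms:
        failures.append(f"rtt {rtt_ms}ms > {p.max_rtt_ms}ms")
    if jitter_ms > p.max_jitter_ms:
        failures.append(f"jitter {jitter_ms}ms > {p.max_jitter_ms}ms")
    if loss_pct > p.max_loss_pct:
        failures.append(f"loss {loss_pct}% > {p.max_loss_pct}%")
    return failures
```

With this shape, the active threshold for any check is whatever is written on that profile's line, so changing one means editing one number.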

The diagnostic engine and upload pipeline worked fine. The profiles were unmaintainable.

Attempt two: CallSentry

github.com/RyanDanielWillis/NetHero_web

The lesson I took from voip-health was "simplify the profiles." The lesson I applied was "rebuild the whole thing as a proper platform."

CallSentry has a dashboard, dark mode, a web UI, network topology mapping, a knowledge base, log aggregation, a first-run wizard, Pydantic v2 validation, and a proper package structure. It's genuinely a more complete piece of software.

It also spent most of its development life unable to reliably upload a scan result.

One thing that did work well: passive probing. A network topology module — adapted from an existing open-source project on GitHub — built a map of the local network from observed traffic: ARP, passive DNS, SIP REGISTER activity. No active scanning, no packets sent. It discovered devices and inferred topology from what was already on the wire. That part was genuinely useful in the field, and the map it produced was accurate. The critical path — probe a site, get a result to the backend, see it in the dashboard — did not work reliably until much later.

The upload pipeline broke in three separate ways: endpoint path mismatch between scanner and backend, inconsistent serialization between what the scanner produced and what the backend expected, and a multipart form handling bug that returned 200 OK while silently dropping the data. Each was fixable. The problem was that the surface area was large enough that fixing one thing exposed another, and I kept adding features instead of closing the loop that had worked fine in version one.
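The 200-OK-but-dropped case is the one a scanner can defend against on its own side. A minimal sketch, assuming the backend echoes back what it stored: the scan_id and stored_results fields are hypothetical, not CallSentry's actual response format, and confirm_upload is a made-up name.

```python
import json


class UploadNotConfirmed(Exception):
    """Raised when the backend's reply does not prove the scan was stored."""


def confirm_upload(status: int, body: bytes, scan_id: str, expected_results: int) -> None:
    # A 200 alone proves nothing: require the backend to echo what it stored.
    if status != 200:
        raise UploadNotConfirmed(f"unexpected status {status}")
    try:
        ack = json.loads(body)
    except ValueError as exc:  # covers invalid JSON and undecodable bytes
        raise UploadNotConfirmed("response body is not valid JSON") from exc
    if not isinstance(ack, dict):
        raise UploadNotConfirmed("response body is not a JSON object")
    # Hypothetical ack fields: "scan_id" and "stored_results".
    if ack.get("scan_id") != scan_id:
        raise UploadNotConfirmed("backend did not echo the scan id")
    if ack.get("stored_results") != expected_results:
        raise UploadNotConfirmed(
            f"backend stored {ack.get('stored_results')!r} results, "
            f"expected {expected_results}"
        )
```

The scanner calls this on every upload response and treats UploadNotConfirmed the same as a network failure. An empty or unrelated 200 response then fails loudly at upload time instead of surfacing later as a missing dashboard entry.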

The real mistake wasn't over-engineering. It was building features before the critical path was proven end-to-end. You don't need topology maps until scan results are reliably landing somewhere. Probe → result → backend → visible in dashboard. Close that loop first, then build everything else.

A note on VoIP diagnostics if you're building something similar

SIP OPTIONS is the fastest early-warning check. A response tells you the endpoint is reachable, the SIP stack is answering, and the round-trip is within tolerance; if the server challenges the request with a 401 or 407, you also know authentication is responding. It doesn't tell you anything about call quality, but it eliminates a lot of common failure modes fast. RFC 3261 §11 covers the relevant behavior.
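A minimal sketch of such a check over UDP using only the standard library. It assumes plain UDP on port 5060, no retransmissions, and no TLS; the function names and the "probe" user part are made up for illustration, and this is not the code from either scanner.

```python
import socket
import time
import uuid
from typing import Optional, Tuple


def build_options(target_host: str, local_ip: str, local_port: int) -> bytes:
    # Branch must start with the magic cookie "z9hG4bK" (RFC 3261).
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]
    lines = [
        f"OPTIONS sip:{target_host} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip}:{local_port};branch={branch};rport",
        "Max-Forwards: 70",
        f"From: <sip:probe@{local_ip}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <sip:{target_host}>",
        f"Call-ID: {uuid.uuid4().hex}@{local_ip}",
        "CSeq: 1 OPTIONS",
        f"Contact: <sip:probe@{local_ip}:{local_port}>",
        "Accept: application/sdp",
        "Content-Length: 0",
        "",
        "",  # two empty entries produce the blank line that ends the headers
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status(response: bytes) -> Tuple[int, str]:
    # Status line looks like: "SIP/2.0 200 OK"
    first = response.split(b"\r\n", 1)[0].decode("ascii", errors="replace")
    version, code, *reason = first.split(" ", 2)
    if not version.startswith("SIP/"):
        raise ValueError(f"not a SIP response: {first!r}")
    return int(code), (reason[0] if reason else "")


def probe(target_host: str, port: int = 5060, timeout_s: float = 2.0) -> Optional[Tuple[int, float]]:
    """Return (status_code, rtt_ms), or None if nothing came back in time."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_s)
        sock.connect((target_host, port))  # fixes the peer so getsockname() is meaningful
        local_ip, local_port = sock.getsockname()
        start = time.monotonic()
        sock.send(build_options(target_host, local_ip, local_port))
        try:
            data = sock.recv(4096)
        except OSError:  # includes timeouts and ICMP port-unreachable
            return None
        return parse_status(data)[0], (time.monotonic() - start) * 1000.0
```

Any final response counts as "the SIP stack is answering", including a 401, 404, or 405. Only silence or a timeout means the endpoint is unreachable at the SIP layer.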

Jitter matters more than latency for voice. 100ms RTT with no jitter is fine. 40ms RTT with 30ms of jitter is often worse in practice because the playout buffer has to absorb the variation, which adds delay or causes drops. The interarrival jitter calculation is defined in RFC 3550 §6.4.1.
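The RFC 3550 estimator is small enough to show in full. For consecutive packets it takes the change in transit time, D = (Rj - Ri) - (Sj - Si), and folds its absolute value into a running average with gain 1/16. The sketch below works in milliseconds for readability, whereas the RFC works in RTP timestamp units; the function name and input shape are assumptions for illustration.

```python
from typing import Iterable, Tuple


def interarrival_jitter(packets: Iterable[Tuple[float, float]]) -> float:
    """packets: (send_time_ms, receive_time_ms) pairs in arrival order.

    Implements J = J + (|D| - J) / 16 from RFC 3550 section 6.4.1, where D is
    the difference in transit time between consecutive packets.
    """
    jitter = 0.0
    prev_transit = None
    for sent, received in packets:
        transit = received - sent
        if prev_transit is not None:
            d = abs(transit - prev_transit)
            jitter += (d - jitter) / 16.0
        prev_transit = transit
    return jitter
```

The sender and receiver clocks do not need to be synchronized. A constant offset between them cancels out because only differences in transit time are used.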

Path analysis is where the real problems hide. A packet routed Denver → Chicago → Los Angeles → Dallas on its way to the PBX just looks like latency from a ping to the endpoint, and is immediately obvious in a hop-by-hop breakdown. UDP traceroute with increasing TTL is the standard approach; the gotcha is that some routers rate-limit ICMP Time Exceeded responses, which shows up as * * * hops even though those routers are still forwarding packets.
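Once the per-hop RTTs are collected, both points reduce to a few lines of analysis. A sketch, assuming the traceroute output has already been reduced to one RTT per hop with None for hops that never answered; analyze_path and its return keys are hypothetical names.

```python
from typing import Dict, List, Optional


def analyze_path(hop_rtts_ms: List[Optional[float]]) -> Dict[str, object]:
    """hop_rtts_ms[i] is the RTT to hop i+1, or None if that hop never replied."""
    responding = [i for i, rtt in enumerate(hop_rtts_ms) if rtt is not None]
    last_responding = responding[-1] if responding else -1

    # A silent hop followed by a responding hop was still forwarding packets,
    # so it is most likely rate-limiting or filtering ICMP, not dropping traffic.
    silent_but_forwarding = [
        i + 1
        for i, rtt in enumerate(hop_rtts_ms)
        if rtt is None and i < last_responding
    ]

    # Largest RTT increase between consecutive responding hops.
    worst_hop, worst_delta = None, 0.0
    prev_rtt = 0.0
    for i in responding:
        delta = hop_rtts_ms[i] - prev_rtt
        if delta > worst_delta:
            worst_hop, worst_delta = i + 1, delta
        prev_rtt = hop_rtts_ms[i]

    return {
        "silent_but_forwarding": silent_but_forwarding,
        "worst_hop": worst_hop,
        "worst_delta_ms": worst_delta,
    }
```

For [1.2, 9.8, None, 48.5, 52.0] this reports hop 3 as silent but forwarding and hop 4 as the biggest jump. Treat a single jump with caution: routers answer traceroute probes at low priority, so a spike that does not persist at later hops usually reflects that router's control plane rather than the path.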

Wireshark is still the ground truth for validating what's happening at the packet level. The VoIP analysis docs are worth reading if you want to trace SIP call flows and RTP streams directly from a capture.

Attempt three

The upload pipeline in CallSentry is fixed now. The probe → backend → dashboard loop is solid. The features that were already built are usable because the foundation finally works.

But Python has a real ceiling for this kind of tool. The GIL keeps CPU-bound work like packet parsing from running in parallel across cores, which limits the probe scheduler. Interpreter startup time matters on a client's machine. Scapy's raw packet performance doesn't hold up under heavy scanning loads.

The next version uses Go for the scanner and probe engine: a single self-contained binary, real concurrency without threading overhead, and solid network primitives in the standard library. TypeScript for the dashboard. C where I need direct access to packet buffers or hardware interfaces.

Same goal as version one: runs anywhere, finishes fast, puts results somewhere useful. Different implementation that won't start falling over in bigger environments.

