NetHero: a Go network scanner built for VoIP readiness checks

My last post on this topic ended with a note about rewriting the VoIP scanner in Go. This is what that looks like.

The Python version worked but never solved distribution. Getting it onto a client machine meant confirming Python was installed, running pip, and working through at least one Windows compatibility issue. For a diagnostic tool, that overhead is a problem. Go produces a single self-contained binary — nethero.exe, no runtime, no install — and that alone made the rewrite worth doing.

Template-driven checks

Check parameters in the Python scanner were scattered across config files, profile YAML, and hardcoded probe functions. In NetHero, everything a scan needs is in one YAML template: what to test, what thresholds apply, which checks run, and the remediation text for each failure.

Template (hosted-ucaas.yaml — hosted UCaaS platform)
  ├─ voip_cloud: SBC fleet across multiple regions
  ├─ Control channel: TCP control-channel (same SBC fleet as SIP)
  ├─ Provisioning: platform provisioning endpoint:443
  ├─ Thresholds: 150ms warn / 300ms fail, 30ms jitter, 1% loss
  └─ Remediation text per failed check

Adding a new platform means writing a YAML file, not touching the scanner code. That matters when you're covering five different PBX environments in the same week.
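
As a rough illustration of what "one template per platform" could look like on the Go side, here is a minimal sketch of the types a template might unmarshal into. The field names, yaml tags, placeholder host, and the inclusive threshold comparison are all assumptions for illustration, not NetHero's actual schema; only the threshold values come from the template summary above.

```go
package main

import "fmt"

// Thresholds holds the limits from the template summary above.
// Field names and yaml tags are illustrative assumptions.
type Thresholds struct {
	LatencyWarnMs int     `yaml:"latency_warn_ms"`
	LatencyFailMs int     `yaml:"latency_fail_ms"`
	JitterMaxMs   int     `yaml:"jitter_max_ms"`
	LossMaxPct    float64 `yaml:"loss_max_pct"`
}

// Check is one probe definition plus the text shown when it fails.
type Check struct {
	Name        string `yaml:"name"`
	Type        string `yaml:"type"` // e.g. tcp_port, sip_options, latency
	Target      string `yaml:"target"`
	Remediation string `yaml:"remediation"`
}

// Template is everything a scan needs for one platform.
type Template struct {
	Name       string     `yaml:"name"`
	Version    int        `yaml:"version"`
	Thresholds Thresholds `yaml:"thresholds"`
	Checks     []Check    `yaml:"checks"`
}

// ClassifyLatency maps a measured latency onto pass/warn/fail.
// Treating the limits as inclusive is an assumption.
func (t Thresholds) ClassifyLatency(ms int) string {
	switch {
	case ms >= t.LatencyFailMs:
		return "fail"
	case ms >= t.LatencyWarnMs:
		return "warn"
	default:
		return "pass"
	}
}

func main() {
	tpl := Template{
		Name:       "hosted-ucaas",
		Version:    1,
		Thresholds: Thresholds{LatencyWarnMs: 150, LatencyFailMs: 300, JitterMaxMs: 30, LossMaxPct: 1.0},
		Checks: []Check{{
			Name:        "voip-platform-sbc",
			Type:        "latency",
			Target:      "sbc.example.invalid",                // placeholder host
			Remediation: "Check WAN path quality to the SBC.", // placeholder text
		}},
	}
	for _, ms := range []int{90, 180, 320} {
		fmt.Printf("%s %dms -> %s\n", tpl.Checks[0].Name, ms, tpl.Thresholds.ClassifyLatency(ms))
	}
}
```

With types along these lines, supporting a new platform only changes the data that gets loaded; the scanner code that walks `Checks` stays the same.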

Validated in production

The template has been validated against a live hosted UCaaS deployment. Sample output from a healthy site:

NetHero Scan  nhr_[…]
Template:     hosted-ucaas v1   |   Mode: standard   |   Duration: 4772ms
Results:      20 pass  0 fail  0 warn  1 skip  0 error

DNS:
  [PASS]  [provisioning-host]         resolved to 3 addresses in 1ms        [VERIFIED]

Connectivity:
  [PASS]  voip-platform-sbc           [sbc]:5060 reachable in 139ms         [VERIFIED]
  [PASS]  SBC cloud-proxy-1           reachable in 134ms                    [VERIFIED]
  [PASS]  SBC cloud-proxy-2           reachable in 114ms                    [VERIFIED]
  [PASS]  SBC cloud-proxy-3           reachable in 101ms                    [VERIFIED]
  [PASS]  SBC cloud-proxy-4           reachable in 102ms                    [VERIFIED]
  [PASS]  SBC cloud-proxy-5           reachable in 101ms                    [VERIFIED]
  [PASS]  SBC cloud-proxy-6           reachable in 205ms                    [VERIFIED]
  [PASS]  Control-channel proxy-1…    TCP [ctrl] reachable in 102ms         [VERIFIED]
  [PASS]  Control-channel proxy-2…    TCP [ctrl] reachable in 103ms         [VERIFIED]
  [PASS]  Control-channel proxy-3…    TCP [ctrl] reachable in 102ms         [VERIFIED]
  [PASS]  Control-channel proxy-4…    TCP [ctrl] reachable in 101ms         [VERIFIED]
  [PASS]  Control-channel proxy-5…    TCP [ctrl] reachable in 79ms          [VERIFIED]
  [PASS]  Control-channel proxy-6…    TCP [ctrl] reachable in 125ms         [VERIFIED]
  [PASS]  Platform Provisioning…      HTTP 200 in 378ms                     [VERIFIED]

VoIP / SIP:
  [PASS]  voip-platform-sbc           SIP 200 in 203ms                      [VERIFIED]

Performance:
  [PASS]  voip-platform-sbc           Latency 90ms, jitter 17ms, loss 0.0% [VERIFIED]

Discovery:
  [PASS]  local interfaces            1 active interface found              [VERIFIED]
  [PASS]  ARP discovered devices      1 device on local network             [VERIFIED]
  [SKIP]  vps_probe                   no external probe endpoint configured

The SIP OPTIONS probe returned a 200 from the SBC, which confirms the SIP stack is answering requests, not just that a TCP port is open. OPTIONS is a capability query, so it does not exercise registration itself. At 90ms average RTT with 17ms jitter and zero packet loss, this site is inside the template's thresholds (150ms warn, 30ms jitter, 1% loss).
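
For readers who have not built one by hand, a SIP OPTIONS probe is just a small text request followed by a check of the status line in the reply. The sketch below builds a minimal request with the header fields RFC 3261 requires and parses a response status code. The function names, the UDP transport in the Via header, and the placeholder addresses are assumptions for illustration; how NetHero constructs and sends its probe is not shown here.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// buildOptions returns a minimal SIP OPTIONS request (RFC 3261 §11).
// branch and callID should be unique per probe; the caller supplies them.
func buildOptions(target, localAddr, branch, callID string) string {
	lines := []string{
		"OPTIONS sip:" + target + " SIP/2.0",
		// UDP transport is an assumption; the branch must start with the
		// RFC 3261 magic cookie "z9hG4bK".
		"Via: SIP/2.0/UDP " + localAddr + ";branch=z9hG4bK" + branch,
		"Max-Forwards: 70",
		"From: <sip:probe@" + localAddr + ">;tag=" + branch,
		"To: <sip:" + target + ">",
		"Call-ID: " + callID,
		"CSeq: 1 OPTIONS",
		"Accept: application/sdp",
		"Content-Length: 0",
	}
	// Header lines end with CRLF; an empty line terminates the headers.
	return strings.Join(lines, "\r\n") + "\r\n\r\n"
}

// parseStatus extracts the numeric status code from a SIP response,
// e.g. 200 from "SIP/2.0 200 OK".
func parseStatus(resp string) (int, error) {
	line, _, _ := strings.Cut(resp, "\r\n")
	parts := strings.SplitN(line, " ", 3)
	if len(parts) < 2 || parts[0] != "SIP/2.0" {
		return 0, fmt.Errorf("not a SIP response: %q", line)
	}
	return strconv.Atoi(parts[1])
}

func main() {
	// Placeholder addresses from documentation ranges.
	req := buildOptions("sbc.example.invalid:5060", "192.0.2.10:5060", "a1b2c3", "probe-1@192.0.2.10")
	fmt.Print(req)

	code, err := parseStatus("SIP/2.0 200 OK\r\nContent-Length: 0\r\n\r\n")
	fmt.Println(code, err)
}
```

A real probe would write the request to a socket with a deadline and treat a missing reply as the timeout case; any parsed status code at all shows that something on the far end speaks SIP.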

Every control-channel path was reachable. That channel matters because it's the persistent outbound connection that keeps the phone registered after the initial SIP registration. Blocking it is a common misconfiguration after firewall upgrades: phones appear registered, but calls fail intermittently or drop mid-session.
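
A TCP reachability check of this kind can be sketched with the standard library's `net.DialTimeout`, separating "refused" from "timed out" because the two usually point at different causes. The function name and result strings are hypothetical, and the `ECONNREFUSED` comparison is an assumption that holds on Unix-like systems; Windows reports a different error code and would need its own case.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"syscall"
	"time"
)

// probeTCP dials addr once and classifies the outcome.
// The result strings are illustrative, not NetHero's own.
func probeTCP(addr string, timeout time.Duration) (string, time.Duration) {
	start := time.Now()
	conn, err := net.DialTimeout("tcp", addr, timeout)
	rtt := time.Since(start)
	if err == nil {
		conn.Close()
		return "reachable", rtt
	}
	var nerr net.Error
	switch {
	case errors.Is(err, syscall.ECONNREFUSED):
		// Something answered with a reset: closed port or a rejecting firewall.
		return "refused", rtt
	case errors.As(err, &nerr) && nerr.Timeout():
		// No answer at all: typically a silently dropping firewall.
		return "timeout", rtt
	default:
		return "error", rtt
	}
}

// selfCheck probes a local listener, then the same port after closing it.
func selfCheck() (open, closed string) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "error", "error"
	}
	addr := ln.Addr().String()
	open, _ = probeTCP(addr, time.Second)
	ln.Close()
	closed, _ = probeTCP(addr, time.Second)
	return open, closed
}

func main() {
	open, closed := selfCheck()
	fmt.Println("listener up:", open)
	fmt.Println("listener closed:", closed)
}
```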

What a blocked site looks like

For comparison, here's the output pattern on a site where the firewall has SIP ALG enabled and the control-channel port is blocked:

VoIP / SIP:
  [FAIL]  voip-platform-sbc           SIP OPTIONS timed out after 5000ms
          → Disable SIP ALG on the router/firewall. Verify outbound
            SIP is permitted to the platform's SBC addresses.

Connectivity:
  [FAIL]  Control-channel proxy-1…    TCP [ctrl] connection refused
  [FAIL]  Control-channel proxy-2…    TCP [ctrl] connection refused
          → Outbound control-channel port to the platform must be permitted.
            Phone may connect but will fail to maintain registration.

Remediation text is injected per-check directly from the template — whoever is reading the output doesn't need to know the platform internals to understand what to fix.
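
A minimal sketch of that injection step, assuming a result type that carries the remediation string copied from its template check. The `Result` type, the `render` function, and the column width are hypothetical; only the layout (status tag, name, detail, then an arrow-prefixed remediation block on failures) follows the output shown above.

```go
package main

import (
	"fmt"
	"strings"
)

// Result is one finished check. Remediation is copied from the template
// entry that produced it, so the renderer needs no platform knowledge.
type Result struct {
	Status      string // PASS, FAIL, WARN, SKIP
	Name        string
	Detail      string
	Remediation string
}

// render formats one result line and, for failures only, appends the
// remediation text indented beneath it.
func render(r Result) string {
	var b strings.Builder
	fmt.Fprintf(&b, "  [%s]  %-26s  %s\n", r.Status, r.Name, r.Detail)
	if r.Status == "FAIL" && r.Remediation != "" {
		for i, line := range strings.Split(r.Remediation, "\n") {
			prefix := "            "
			if i == 0 {
				prefix = "          → "
			}
			b.WriteString(prefix + line + "\n")
		}
	}
	return b.String()
}

func main() {
	fmt.Print(render(Result{
		Status:      "FAIL",
		Name:        "voip-platform-sbc",
		Detail:      "SIP OPTIONS timed out after 5000ms",
		Remediation: "Disable SIP ALG on the router/firewall. Verify outbound\nSIP is permitted to the platform's SBC addresses.",
	}))
	fmt.Print(render(Result{Status: "PASS", Name: "voip-platform-sbc", Detail: "SIP 200 in 203ms"}))
}
```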

Network discovery

The discovery check enumerates the ARP cache and classifies each device by combining four signals in parallel:

Per host (up to 30 concurrent, 1.5s timeout each):
  ├─ OUI lookup     → MAC vendor
  ├─ Reverse DNS    → hostname pattern matching
  ├─ HTTP/S banner  → firmware version strings
  └─ SSH banner     → device header identification
       └─ Device type + confidence score

Confidence scoring matters in practice. An OUI match alone on a phone is only a hint. An OUI match plus a Poly hostname pattern plus firmware strings in the HTTP response is a high-confidence match. Low-confidence results stay out of normal output and surface in verbose mode.
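
The scoring and the concurrency limit can be sketched together. In the example below the weights (30/30/30/10), the type and function names, and the fake probe are all assumptions made up for illustration; the 30-host concurrency cap and the 1.5s per-host timeout are the figures quoted above. The sketch only parallelises across hosts and leaves the four per-host lookups inside a stubbed probe function.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// Evidence records which signals agreed on a device type for one host.
type Evidence struct {
	OUI, Hostname, HTTPBanner, SSHBanner bool
}

// confidence adds a weight per matching signal. Weights are hypothetical.
func confidence(e Evidence) int {
	score := 0
	if e.OUI {
		score += 30
	}
	if e.Hostname {
		score += 30
	}
	if e.HTTPBanner {
		score += 30
	}
	if e.SSHBanner {
		score += 10
	}
	return score
}

// ProbeFunc gathers evidence for one host and must honour ctx's deadline.
type ProbeFunc func(ctx context.Context, host string) Evidence

// classifyAll scores every host, running at most `workers` probes at once
// and giving each probe its own timeout.
func classifyAll(hosts []string, probe ProbeFunc, workers int, perHost time.Duration) map[string]int {
	if workers < 1 {
		workers = 1
	}
	sem := make(chan struct{}, workers) // counting semaphore
	var wg sync.WaitGroup
	var mu sync.Mutex
	out := make(map[string]int, len(hosts))
	for _, h := range hosts {
		wg.Add(1)
		sem <- struct{}{} // blocks while `workers` probes are in flight
		go func(h string) {
			defer wg.Done()
			defer func() { <-sem }()
			ctx, cancel := context.WithTimeout(context.Background(), perHost)
			defer cancel()
			score := confidence(probe(ctx, h))
			mu.Lock()
			out[h] = score
			mu.Unlock()
		}(h)
	}
	wg.Wait()
	return out
}

// fakeProbe stands in for the real OUI/DNS/HTTP/SSH lookups.
func fakeProbe(ctx context.Context, host string) Evidence {
	if host == "10.0.0.1" {
		return Evidence{OUI: true, Hostname: true, HTTPBanner: true}
	}
	return Evidence{OUI: true}
}

// demo runs the classifier over two placeholder hosts.
func demo() map[string]int {
	return classifyAll([]string{"10.0.0.1", "10.0.0.2"}, fakeProbe, 30, 1500*time.Millisecond)
}

func main() {
	scores := demo()
	for _, h := range []string{"10.0.0.1", "10.0.0.2"} {
		fmt.Printf("%s confidence=%d\n", h, scores[h])
	}
}
```

A cutoff on the returned score is then enough to decide which devices appear in normal output and which are held back for verbose mode.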

This is useful before you run connectivity checks because SIP ALG and double-NAT are router problems, not phone problems. Knowing what sits between the voice VLAN and the SBC before troubleshooting starts shortens diagnosis: the scanner has flagged router models that ship with SIP ALG enabled by default before a single connectivity probe was run.

Check types

Check	What it validates
`tcp_port`	TCP connectivity to SBC and provisioning endpoints
`cloud_reachability`	All cloud endpoints defined in the template
`latency`	RTT, jitter, and packet loss to the SBC
`sip_options`	SIP stack responding with SIP 200 (RFC 3261 §11)
`dns`	Forward resolution for provisioning hostnames
`discovery`	LAN device inventory and classification
`vps_probe`	External path validation via deployed VPS
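
To make the `latency` row concrete, here is a minimal sketch of how RTT, jitter, and loss could be derived from a series of probe samples. The jitter line uses the running estimator from RFC 3550 §6.4.1, applied here to differences between consecutive RTTs, which is an approximation (the RFC defines it on RTP transit times). The function name and the convention of marking a lost probe with a negative sample are assumptions; NetHero's actual calculation may differ.

```go
package main

import (
	"fmt"
	"math"
)

// summarize takes RTT samples in milliseconds, where a negative value
// marks a probe that got no reply, and returns average RTT, smoothed
// jitter, and loss percentage.
func summarize(samples []float64) (avg, jitter, lossPct float64) {
	var sum, prev float64
	received := 0
	for _, s := range samples {
		if s < 0 {
			continue // lost probe: counts toward loss only
		}
		if received > 0 {
			// RFC 3550 estimator: J += (|D| - J) / 16
			d := math.Abs(s - prev)
			jitter += (d - jitter) / 16
		}
		prev = s
		sum += s
		received++
	}
	if len(samples) > 0 {
		lossPct = 100 * float64(len(samples)-received) / float64(len(samples))
	}
	if received > 0 {
		avg = sum / float64(received)
	}
	return avg, jitter, lossPct
}

func main() {
	// Made-up samples; -1 marks a lost probe.
	avg, jitter, loss := summarize([]float64{88, 92, 90, -1, 95, 87})
	fmt.Printf("Latency %.0fms, jitter %.1fms, loss %.1f%%\n", avg, jitter, loss)
}
```

The 1/16 gain smooths the estimate, so a single outlier moves the reported jitter only slightly while sustained variation raises it steadily.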

What's next

Core coverage is solid. Remaining gaps: SNMP walk for switch config verification, VLAN tagging validation on the wire, and an RTP stream analysis module. Templates for additional platforms (other hosted UCaaS providers, carrier SIP trunks, on-prem Asterisk/FreePBX) are the next priority.

Passive probing is also on the roadmap. An earlier version of this tool had a working network topology map built entirely from passive observation — ARP traffic, SIP REGISTER activity, passive DNS — with no active scanning. It was one of the better features of that build. The plan is to bring that capability forward properly: passive discovery running alongside the active check suite, feeding a topology view that builds up as traffic is observed rather than as probes are sent.

References

RFC 3261 — SIP: Session Initiation Protocol — SIP OPTIONS (§11)
RFC 3550 — RTP: A Transport Protocol for Real-Time Applications — Jitter measurement (§6.4.1)
ITU-T G.114 — One-way transmission time — 150ms one-way latency threshold
IEEE 802.1Q — Virtual Bridged Local Area Networks — VLAN tagging for voice/data separation
Go standard library: net package — Network primitives used throughout the check suite
spf13/cobra — CLI framework for NetHero's command structure