So we've been building this voice AI agent that needs to work with regular phone calls, right? I thought the hard part would be making the AI sound natural. Wrong.
The actual headache? Getting SIP trunks to play nice with WebRTC.
Let me vent a bit and maybe save someone else the pain.
The basic problem
Your AI speaks WebRTC (modern browser stuff). Phone networks speak SIP (old telecom protocol from like the 90s). You need both because:
- WebRTC = how you handle audio in web apps
- SIP trunks = how you get actual phone numbers and connect to regular phones
Someone has to translate between them. That someone is you.
What's been killing us
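Codec mismatches: The SIP trunk sends G.711. WebRTC defaults to Opus. If each side gets its favorite, you're transcoding in real time, which adds 20-50ms of latency AND eats CPU like crazy. WebRTC endpoints are also required to support G.711 (RFC 7874), so you can often negotiate it end to end and skip the transcode. We were hitting 80% CPU per call before we figured out better codec negotiation.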
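Here's a minimal sketch of one way to bias negotiation toward G.711: reorder the payload types on the `m=audio` line of the SDP so PCMU/PCMA come first. The function name `prefer_codecs` is made up for illustration, and it assumes a single audio section. In a browser you'd probably reach for `RTCRtpTransceiver.setCodecPreferences()` instead of editing SDP by hand.

```python
def prefer_codecs(sdp: str, preferred=("PCMU", "PCMA")) -> str:
    """Move preferred codecs to the front of the m=audio payload list."""
    lines = sdp.splitlines()

    # Map payload type -> codec name, e.g. "a=rtpmap:111 opus/48000/2".
    names = {}
    for line in lines:
        if line.startswith("a=rtpmap:"):
            pt, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            names[pt] = encoding.split("/")[0].upper()
    # Static payload types 0 and 8 are allowed to appear without an rtpmap.
    names.setdefault("0", "PCMU")
    names.setdefault("8", "PCMA")

    def rank(pt: str) -> int:
        name = names.get(pt, "")
        return preferred.index(name) if name in preferred else len(preferred)

    out = []
    for line in lines:
        if line.startswith("m=audio "):
            parts = line.split(" ")
            header, pts = parts[:3], parts[3:]
            # sorted() is stable, so non-preferred codecs keep their order.
            line = " ".join(header + sorted(pts, key=rank))
        out.append(line)
    return "\r\n".join(out) + "\r\n"


offer = (
    "v=0\r\n"
    "m=audio 9 UDP/TLS/RTP/SAVPF 111 0 8 126\r\n"
    "a=rtpmap:111 opus/48000/2\r\n"
    "a=rtpmap:0 PCMU/8000\r\n"
    "a=rtpmap:8 PCMA/8000\r\n"
    "a=rtpmap:126 telephone-event/8000\r\n"
)
print(prefer_codecs(offer).splitlines()[1])
# m=audio 9 UDP/TLS/RTP/SAVPF 0 8 111 126
```

Order on the `m=` line is only a preference, so the other side can still pick something else. Worth logging what actually got negotiated.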
NAT traversal issues: One-way audio. The absolute worst. You hear them, they don't hear you. Or neither of you hears anything. Works perfectly in dev, completely broken in production. Spent a whole week on this before realizing our firewall was blocking the RTP port range.
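A cheap sanity check that would have saved us some of that week: look at the SDP on the SIP leg and flag the two classic one-way-audio smells, an RTP port outside what the firewall allows and a private (RFC 1918) address being advertised. This is a rough sketch. `audit_sdp_media` is a hypothetical helper, and the 10000-20000 port range is just an assumed firewall rule, so swap in your own. It doesn't apply to the WebRTC leg, where the real addresses come from ICE candidates.

```python
import ipaddress

RFC1918 = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def audit_sdp_media(sdp: str, rtp_port_range=(10000, 20000)) -> list:
    """Return warnings about the first audio stream advertised in an SDP."""
    session_addr = media_addr = port = None
    seen_media = in_audio = False

    for line in sdp.splitlines():
        if line.startswith("m="):
            seen_media = True
            # Only inspect the first audio section.
            in_audio = line.startswith("m=audio ") and port is None
            if in_audio:
                port = int(line.split()[1].split("/")[0])
        elif line.startswith("c=IN IP4 "):
            addr = line.split()[2].split("/")[0]
            if not seen_media:
                session_addr = addr   # session-level default
            elif in_audio:
                media_addr = addr     # media-level override

    if port is None:
        return ["no m=audio section found"]

    warnings = []
    lo, hi = rtp_port_range
    if not lo <= port <= hi:
        warnings.append(f"RTP port {port} is outside allowed range {lo}-{hi}")

    addr = media_addr or session_addr
    if addr and any(ipaddress.ip_address(addr) in net for net in RFC1918):
        warnings.append(f"SDP advertises private address {addr}")
    return warnings


sdp = "v=0\r\nc=IN IP4 10.0.0.5\r\nm=audio 40000 RTP/AVP 0\r\n"
for warning in audit_sdp_media(sdp):
    print(warning)
```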
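DTMF is a mess: When someone presses a phone key, SIP and WebRTC handle it differently. SIP trunks usually send it as RFC 2833 telephone-event packets (RFC 4733 these days). WebRTC can send those through RTCDTMFSender, but the browser gives you no API for DTMF coming in. Had to build detection on the media server side and relay it through a data channel.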
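For reference, the telephone-event payload itself is tiny: 4 bytes holding the event code, an end flag, a volume, and a duration. A minimal sketch of decoding it on the media server side, assuming you already have the raw RTP payload bytes (the names `TelephoneEvent` and `parse_telephone_event` are mine, not from any library):

```python
import struct
from typing import NamedTuple

# Event codes 0-15 map to these keys in RFC 4733 / RFC 2833.
DTMF_KEYS = "0123456789*#ABCD"


class TelephoneEvent(NamedTuple):
    digit: str
    end: bool       # True on the packet(s) that mark the end of the key press
    volume: int     # power level in -dBm0 (0-63)
    duration: int   # in RTP timestamp units (8000 per second for 8 kHz audio)


def parse_telephone_event(payload: bytes) -> TelephoneEvent:
    if len(payload) < 4:
        raise ValueError("telephone-event payload must be at least 4 bytes")
    # Layout: event (8 bits) | E (1) R (1) volume (6) | duration (16 bits)
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    if event >= len(DTMF_KEYS):
        raise ValueError(f"event {event} is not a DTMF digit")
    return TelephoneEvent(
        digit=DTMF_KEYS[event],
        end=bool(flags & 0x80),
        volume=flags & 0x3F,
        duration=duration,
    )


print(parse_telephone_event(bytes([5, 0x8A, 0x03, 0x20])))
# TelephoneEvent(digit='5', end=True, volume=10, duration=800)
```

One gotcha: a single key press arrives as several packets, and the end packet is normally repeated, so dedupe on the RTP timestamp before relaying the digit anywhere.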
Call quality monitoring: You can't fix quality issues if you don't know they're happening. Started monitoring packet loss, jitter, and RTT every second. Found out 15% of our calls had one-way audio because of NAT issues we didn't even know about.
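The one-way audio check can be surprisingly dumb. A rough sketch, assuming you sample cumulative sent/received RTP packet counters once a second (`StatsSample` and `detect_one_way_audio` are illustrative names, and the 5-second window is an arbitrary starting point):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StatsSample:
    packets_sent: int      # cumulative RTP packets sent on this call leg
    packets_received: int  # cumulative RTP packets received on this call leg


def detect_one_way_audio(samples: List[StatsSample], window: int = 5) -> Optional[str]:
    """Flag a call whose media flows in only one direction (or not at all)
    across the last `window` one-second intervals."""
    if len(samples) < window + 1:
        return None  # not enough history yet
    first, last = samples[-(window + 1)], samples[-1]
    sent = last.packets_sent - first.packets_sent
    received = last.packets_received - first.packets_received

    if sent > 0 and received == 0:
        return "no_inbound"    # we talk, nothing comes back
    if sent == 0 and received > 0:
        return "no_outbound"   # they talk, we send nothing
    if sent == 0 and received == 0:
        return "no_audio"
    return None


# 50 packets/s going out, inbound counter stuck at 100.
history = [StatsSample(packets_sent=50 * i, packets_received=100) for i in range(6)]
print(detect_one_way_audio(history))  # no_inbound
```

If either side uses silence suppression (DTX), a quiet caller can look like a dead stream, so treat this as a signal to investigate rather than proof.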
What actually worked
Got it down to about 1% failure rate now. Here's what helped:
- TURN servers - forced relay mode. Costs more bandwidth but eliminates NAT problems
- Smart codec selection - prioritize PCMU/PCMA on the WebRTC side so it matches the trunk (less transcoding = less latency)
- Multiple trunk providers - when one has issues, auto-failover to backup
- Real-time monitoring - catching issues before users complain
- Pre-warmed connection pools - avoid setup delays during high traffic
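The trunk failover part doesn't need to be fancy. A minimal sketch of the idea, assuming trunks are tried in priority order and benched for a cooldown after a few consecutive failures (the `TrunkPool` class, the thresholds, and the trunk names are all made up for illustration):

```python
import time


class TrunkPool:
    """Pick the highest-priority trunk that is not currently benched."""

    def __init__(self, trunks, max_failures=3, cooldown=30.0, clock=time.monotonic):
        self.trunks = list(trunks)          # ordered by priority
        self.max_failures = max_failures    # consecutive failures before benching
        self.cooldown = cooldown            # seconds a benched trunk sits out
        self.clock = clock                  # injectable for testing
        self.failures = {t: 0 for t in self.trunks}
        self.down_until = {t: float("-inf") for t in self.trunks}

    def pick(self) -> str:
        now = self.clock()
        for trunk in self.trunks:
            if now >= self.down_until[trunk]:
                return trunk
        # Everything is benched: use whichever comes back soonest.
        return min(self.trunks, key=lambda t: self.down_until[t])

    def report(self, trunk: str, ok: bool) -> None:
        """Record the outcome of a call attempt on a trunk."""
        if ok:
            self.failures[trunk] = 0
            return
        self.failures[trunk] += 1
        if self.failures[trunk] >= self.max_failures:
            self.down_until[trunk] = self.clock() + self.cooldown
            self.failures[trunk] = 0


pool = TrunkPool(["primary", "backup"], max_failures=2)
pool.report("primary", ok=False)
pool.report("primary", ok=False)
print(pool.pick())  # backup
```

What counts as a "failure" is the real design question. SIP 5xx responses and INVITE timeouts are the obvious candidates, while a busy signal shouldn't count against the trunk.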
The stupid stuff that bit us
- Spent 2 weeks optimizing the wrong thing (LLM latency) when network routing was the real bottleneck
- Didn't realize enterprise SBCs strip WebRTC headers until we tried deploying at a client
- One carrier's NAT was rewriting IP addresses randomly. Took forever to diagnose.
- DTMF detection was at 85% accuracy. Turns out we needed dual-method detection (in-band + RFC2833)
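On the in-band half of DTMF detection: the usual approach is the Goertzel algorithm, which measures the energy at each of the 8 DTMF frequencies without doing a full FFT. A bare-bones sketch, assuming 8 kHz mono samples as floats and a 40ms frame. The `min_ratio` threshold is a guess that needs tuning on real audio, and a serious detector would also check twist and minimum tone duration.

```python
import math

LOW_FREQS = (697, 770, 852, 941)        # row tones in Hz
HIGH_FREQS = (1209, 1336, 1477, 1633)   # column tones in Hz
KEYPAD = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples, freq, rate):
    """Squared magnitude of the signal at `freq`, via the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / rate)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def detect_dtmf(samples, rate=8000, min_ratio=0.5):
    """Return the DTMF key present in one audio frame, or None."""
    total = sum(x * x for x in samples)
    if total == 0:
        return None
    low = [goertzel_power(samples, f, rate) for f in LOW_FREQS]
    high = [goertzel_power(samples, f, rate) for f in HIGH_FREQS]
    row = max(range(4), key=low.__getitem__)
    col = max(range(4), key=high.__getitem__)
    # For a clean two-tone signal the two peaks together carry roughly
    # total * N / 2, so this ratio sits near 1. Speech and noise score far lower.
    ratio = (low[row] + high[col]) / (total * len(samples) / 2)
    return KEYPAD[row][col] if ratio >= min_ratio else None


# Synthesize 40ms of the "5" key (770 Hz + 1336 Hz) at 8 kHz.
frame = [
    math.sin(2 * math.pi * 770 * n / 8000) + math.sin(2 * math.pi * 1336 * n / 8000)
    for n in range(320)
]
print(detect_dtmf(frame))  # 5
```

Since some carriers send the tones in-band, some send telephone-event packets, and some send both, the results from the two detectors need deduping by time so one key press doesn't count twice.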
Still struggling with
- Scaling past 1000 concurrent calls gets expensive fast
- Some carriers just hate certain codecs for no clear reason
- Debugging production issues is still painful - logs everywhere, no single view
Anyone else dealt with this stuff? What worked for you?
Or am I just doing this completely wrong lol