VoIP Business Phone System

Traditional phone lines are dying. VoIP — voice over IP — is the standard for business telephony in 2026, ranging from cloud-hosted services that need only a desk phone and internet to fully on-premises PBX systems that businesses run themselves. The basics are simple: phone calls are encoded as digital audio packets and routed over the same network as everything else. The complexity is in the details: SIP protocols, codecs, QoS, jitter buffers, NAT traversal, emergency calling regulations. This guide walks through the architecture, the choices between hosted and on-premises systems, and the network requirements that determine whether your voice quality is great or terrible.

How VoIP actually works

A VoIP call consists of two protocols running in parallel:

SIP (Session Initiation Protocol)

The signaling protocol, used to set up, modify, and tear down calls. SIP handles "ring this number, accept the call, hang up." SIP is text-based, runs over UDP or TCP on port 5060 (or over TLS on port 5061), and looks somewhat like HTTP if you squint.
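Because SIP is plain text, a request can be assembled with ordinary string formatting. The sketch below builds a minimal INVITE containing the headers RFC 3261 requires in every request (Via, Max-Forwards, From, To, Call-ID, CSeq), plus Contact. The function name, addresses, tag, branch, and Call-ID are placeholder values. A real phone would normally also attach an SDP body describing its audio.

```python
def build_invite(caller: str, callee: str, local_ip: str, local_port: int = 5060,
                 call_id: str = "a84b4c76e66710", from_tag: str = "1928301774",
                 branch: str = "z9hG4bK776asdhds") -> str:
    """Return a minimal SIP INVITE as a string (IDs default to placeholders)."""
    user = caller.split("@")[0]
    lines = [
        f"INVITE sip:{callee} SIP/2.0",                               # request line
        f"Via: SIP/2.0/UDP {local_ip}:{local_port};branch={branch}",  # where responses go
        "Max-Forwards: 70",
        f"From: <sip:{caller}>;tag={from_tag}",
        f"To: <sip:{callee}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{user}@{local_ip}:{local_port}>",             # caller's direct address
        "Content-Length: 0",                                          # no SDP body in this sketch
    ]
    # Like HTTP/1.1, SIP uses CRLF line endings and a blank line before the body.
    return "\r\n".join(lines) + "\r\n\r\n"


if __name__ == "__main__":
    print(build_invite("alice@example.com", "bob@example.com", "192.168.1.50"))
```

Note the Contact header: it carries the phone's own IP address, which becomes important once NAT is involved.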

A typical call setup:

Caller phone           PBX/server            Callee phone
     │                       │                       │
     │ INVITE sip:bob@example.com                    │
     │──────────────────────►│                       │
     │                       │ INVITE                │
     │                       │──────────────────────►│
     │                       │           180 Ringing │
     │                       │◄──────────────────────│
     │           180 Ringing │                       │
     │◄──────────────────────│                       │
     │                       │            200 OK     │
     │                       │◄──────────────────────│
     │            200 OK     │                       │
     │◄──────────────────────│                       │
     │ ACK                   │                       │
     │──────────────────────►│ ACK                   │
     │                       │──────────────────────►│
     │           ◄═══ RTP audio stream ═══►          │
     │                                               │
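Each numbered arrow in the diagram is a SIP response, and the first digit of the code tells the caller what kind it is. 1xx responses such as 180 Ringing are provisional. 2xx through 6xx responses are final. The following is a minimal sketch of how the caller side could classify responses and decide when to send the ACK. The function names are illustrative rather than taken from any SIP library, and only the happy path shown in the diagram is modeled.

```python
RESPONSE_CLASSES = {
    1: "provisional",
    2: "success",
    3: "redirection",
    4: "client failure",
    5: "server failure",
    6: "global failure",
}

def parse_status_line(line: str) -> tuple[int, str, str]:
    """Split 'SIP/2.0 180 Ringing' into (180, 'Ringing', 'provisional')."""
    version, code_text, reason = line.split(" ", 2)
    if version != "SIP/2.0":
        raise ValueError(f"not a SIP/2.0 status line: {line!r}")
    code = int(code_text)
    return code, reason, RESPONSE_CLASSES[code // 100]

def caller_actions(status_lines: list[str]) -> list[str]:
    """Walk the responses a caller receives after sending INVITE."""
    actions = []
    for line in status_lines:
        code, reason, kind = parse_status_line(line)
        if kind == "provisional":
            actions.append(f"keep waiting ({code} {reason})")
        elif kind == "success":
            actions.append("send ACK, start RTP")
            break
        else:
            actions.append(f"no call: {kind} ({code} {reason})")
            break
    return actions

if __name__ == "__main__":
    print(caller_actions(["SIP/2.0 180 Ringing", "SIP/2.0 200 OK"]))
```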

RTP (Real-time Transport Protocol)

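The audio payload protocol. RTP carries the encoded audio packets between endpoints once SIP signaling has set up the call. Depending on configuration, those packets flow either directly between phones or through the PBX or an SBC. RTP runs over UDP because a retransmitted voice packet would arrive too late to be played. Skipping a lost packet is better than stalling the audio while waiting for it.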

Ports: RTP has no single standard port. Each call uses ports from a dynamic range, and 10000-20000 is a common default. Firewalls must allow this range, or an SBC must manage the port openings for you.
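Every RTP packet starts with a fixed 12-byte header, defined in RFC 3550. The header carries a version, payload type, sequence number, timestamp, and stream identifier (SSRC). The receiver uses the sequence number to detect loss and the timestamp to schedule playout. The sketch below uses Python's `struct` module and assumes no CSRC list and no header extension. Payload type 0 is G.711 μ-law (PCMU), and the SSRC value is arbitrary.

```python
import struct

RTP_VERSION = 2
HEADER_FORMAT = "!BBHII"  # network byte order: 1 + 1 + 2 + 4 + 4 = 12 bytes

def pack_rtp_header(seq: int, timestamp: int, ssrc: int,
                    payload_type: int = 0, marker: bool = False) -> bytes:
    """Build a fixed RTP header with no padding, no extension, and no CSRCs."""
    first = RTP_VERSION << 6                          # V=2, P=0, X=0, CC=0
    second = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack(HEADER_FORMAT, first, second,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)

def unpack_rtp_header(packet: bytes) -> dict:
    """Decode the 12-byte fixed header at the start of an RTP packet."""
    first, second, seq, timestamp, ssrc = struct.unpack(HEADER_FORMAT, packet[:12])
    return {
        "version": first >> 6,
        "padding": bool(first & 0x20),
        "extension": bool(first & 0x10),
        "csrc_count": first & 0x0F,
        "marker": bool(second & 0x80),
        "payload_type": second & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }

if __name__ == "__main__":
    # G.711 samples at 8 kHz, so each 20 ms packet advances the timestamp by 160.
    for n in range(3):
        header = pack_rtp_header(seq=n, timestamp=n * 160, ssrc=0x1234ABCD)
        print(unpack_rtp_header(header))
```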

Hosted PBX vs on-premises PBX

Hosted (cloud) PBX

The PBX runs in the cloud, operated by a provider. Your phones register over the internet to the provider's servers. Common providers: RingCentral, 8x8, Vonage, Dialpad, Nextiva, OpenPhone, GoTo Connect, Microsoft Teams Phone, Zoom Phone.

Pros:

  • No on-site hardware beyond phones.
  • Automatic updates and features.
  • Easy to add lines.
  • Mobile app integration for "soft phones".
  • Includes voicemail-to-email, auto-attendants, call recording, call analytics.
  • Disaster recovery built-in — calls can route to mobile devices if office is offline.

Cons:

  • $20-40/user/month — adds up at scale.
  • Voice quality depends on your internet connection, which matters most during congestion or outages.
  • Less customization than on-prem.
  • Data residency and recording policies depend on provider.

Best for: most SMBs. Default choice unless you have a specific reason to go on-prem.

On-premises PBX

You run the PBX software yourself, typically on a server or appliance in your office. Phones register to your PBX; your PBX connects to a SIP trunk provider for outside calls. Common platforms: FreePBX (open source), 3CX, Asterisk, Avaya IP Office, Mitel.

Pros:

  • Lower per-user cost at scale (after fixed upfront cost).
  • Full control over features and integrations.
  • Internal calls don't traverse the internet (lower latency, no internet outage risk).
  • Better for custom CTI (computer-telephony integration), call center features.
  • Open source options (FreePBX, Asterisk) avoid per-user licensing.

Cons:

  • Server hardware, software licensing, ongoing maintenance.
  • Requires PBX expertise to configure and troubleshoot.
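  • An internet outage cuts off outside calls over the SIP trunk. Internal extension-to-extension calls keep working, and failover rules at the trunk provider can reroute inbound calls to mobile numbers.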
  • Less mobile/cloud-native than hosted.

Best for: 50+ user offices with technical capacity, call centers, regulated industries that require local call recording, businesses with custom telephony integrations.
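The hosted vs. on-prem cost argument comes down to two models. Hosted charges a per-user monthly fee. On-prem requires an upfront investment plus fixed running costs. A back-of-the-envelope comparison is easy to script. Every dollar figure below is a made-up placeholder, so substitute real quotes before drawing conclusions.

```python
def hosted_cost(users: int, months: int, per_user_monthly: float) -> float:
    """Hosted PBX: one all-in price per user per month."""
    return users * per_user_monthly * months

def onprem_cost(users: int, months: int, upfront: float,
                fixed_monthly: float, per_user_monthly: float) -> float:
    """On-prem PBX: hardware and licences up front, then trunk, DIDs, and support monthly."""
    return upfront + months * (fixed_monthly + users * per_user_monthly)

def breakeven_users(months: int, hosted_rate: float, upfront: float,
                    fixed_monthly: float, onprem_rate: float, max_users: int = 1000):
    """Smallest head count at which on-prem costs no more than hosted, or None."""
    for users in range(1, max_users + 1):
        on_prem = onprem_cost(users, months, upfront, fixed_monthly, onprem_rate)
        if on_prem <= hosted_cost(users, months, hosted_rate):
            return users
    return None

if __name__ == "__main__":
    # Hypothetical 3-year comparison: $30/user/month hosted versus
    # $15,000 upfront + $500/month fixed + $2/user/month on-prem.
    print(breakeven_users(months=36, hosted_rate=30, upfront=15_000,
                          fixed_monthly=500, onprem_rate=2))  # → 33
```

With these placeholder inputs, the crossover lands in the low 30s. Higher upfront, staffing, or support costs push it upward, so treat the 50-user guideline as a starting point rather than a rule.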

SIP trunks vs hosted minutes

If you go on-premises, you need a SIP trunk: a connection from your PBX to the public telephone network. SIP trunks are sold by:

  • Channel count: How many simultaneous calls can run. 5 channels for a small office; 20-50 for mid-sized.
  • Per-minute or unlimited pricing: Per-minute is cheaper at low volume; unlimited better at high volume.
  • DID (Direct Inward Dial) numbers: Each phone number you publish costs separately, typically $1-5/month.

Common SIP trunk providers: Twilio, Bandwidth, Telnyx, Voxbeam, VoIP.ms, Flowroute. Pricing typically $25-50/month for a 5-channel trunk plus $1-2/month per DID.
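Whether per-minute or unlimited trunk pricing is cheaper depends on your monthly call minutes. The sketch below totals a month's trunk bill and finds the crossover point. The rates are hypothetical placeholders, not any provider's actual pricing.

```python
def monthly_trunk_cost(plan_base: float, dids: int, did_price: float,
                       minutes: int = 0, per_minute: float = 0.0) -> float:
    """Base channel price + DIDs + metered minutes (per_minute=0 for unlimited plans)."""
    return plan_base + dids * did_price + minutes * per_minute

def breakeven_minutes(metered_base: float, per_minute: float, unlimited_base: float) -> float:
    """Monthly minutes above which the unlimited plan becomes cheaper than the metered one."""
    return (unlimited_base - metered_base) / per_minute

if __name__ == "__main__":
    # Hypothetical: metered trunk $25/month + $0.01/min, unlimited trunk $50/month,
    # both with 10 DIDs at $1.50 each.
    print(monthly_trunk_cost(25, 10, 1.50, minutes=3000, per_minute=0.01))
    print(monthly_trunk_cost(50, 10, 1.50))
    print(breakeven_minutes(25, 0.01, 50))
```

The DID charges cancel out of the break-even calculation because they are the same on both plans. Only the base price and the per-minute rate matter.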

Hosted PBX bundles SIP trunks, DIDs, and PBX features into one per-user price. You don't separately buy a SIP trunk; it's included.

Voice quality: what determines it

Voice quality depends on the network, not just the bandwidth. Four metrics matter:

Metric | Good | Acceptable | Bad
--- | --- | --- | ---
One-way latency | < 100 ms | 100-150 ms | > 150 ms (noticeable lag, talking over each other)
Jitter | < 10 ms | 10-30 ms | > 30 ms (choppy audio, dropouts)
Packet loss | < 0.5% | 0.5-1% | > 1% (gaps, dropped syllables)
Available bandwidth | > 200 Kbps per call | 100-200 Kbps | < 100 Kbps (compressed codec required)

Bandwidth is rarely the bottleneck, since a G.711 call uses only about 100 Kbps in each direction. Latency and jitter are the real concerns.
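The thresholds in the table can be turned into a quick health check. This is useful for grading numbers from a monitoring tool or an online VoIP quality test. The sketch below encodes the Good/Acceptable cut-offs for the three lower-is-better metrics. The names `THRESHOLDS`, `grade`, and `grade_link` are illustrative.

```python
# (good_below, acceptable_up_to) per metric, taken from the table above.
# Lower is better for all three, so anything above the second value is "bad".
THRESHOLDS = {
    "latency_ms": (100, 150),
    "jitter_ms": (10, 30),
    "packet_loss_pct": (0.5, 1.0),
}

def grade(metric: str, value: float) -> str:
    good_below, acceptable_up_to = THRESHOLDS[metric]
    if value < good_below:
        return "good"
    if value <= acceptable_up_to:
        return "acceptable"
    return "bad"

def grade_link(measurements: dict) -> dict:
    """Grade each measured metric; the overall result is the worst individual grade."""
    order = ["good", "acceptable", "bad"]
    grades = {metric: grade(metric, value) for metric, value in measurements.items()}
    grades["overall"] = max(grades.values(), key=order.index)
    return grades

if __name__ == "__main__":
    print(grade_link({"latency_ms": 80, "jitter_ms": 22, "packet_loss_pct": 0.2}))
```

Taking the worst grade as the overall result reflects how voice quality behaves: good latency does not compensate for heavy packet loss.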

Codecs: trading bandwidth for quality

The codec determines how audio is encoded:

Codec | Bandwidth (per direction) | Quality | Use case
--- | --- | --- | ---
G.711 (PCM) | 87 Kbps | Toll quality | LAN, fast WAN; default for many systems
G.729 | 32 Kbps | Good | Bandwidth-constrained links
G.722 (HD voice) | 87 Kbps | Better than toll (HD audio) | Modern phones with HD voice support
Opus | 10-50 Kbps (adaptive) | Excellent at any bitrate | Modern softphones; best overall
iLBC | 15 Kbps | Acceptable | Highly compressed, robust to packet loss

Default to G.711 or Opus where supported. G.722 if you want HD voice and both endpoints support it. Use the compressed codecs only when bandwidth is genuinely constrained — they trade quality for bytes.
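The G.711 and G.729 figures in the table are higher than the raw codec bit rates because every packet carries headers. G.711 produces 64 Kbps of audio. Sent as the common 20 ms packets (50 per second), each 160-byte payload gains 40 bytes of IP/UDP/RTP headers and, on Ethernet, 18 bytes of framing. That works out to the 87 Kbps shown. The sketch below performs this arithmetic, assuming IPv4, no VLAN tag, and no header compression.

```python
IP_UDP_RTP_BYTES = 20 + 8 + 12   # IPv4 + UDP + RTP headers
ETHERNET_BYTES = 14 + 4          # Ethernet header + frame check sequence (no 802.1Q tag)

def wire_kbps(codec_kbps: float, ptime_ms: int = 20,
              l2_bytes: int = ETHERNET_BYTES) -> float:
    """Per-direction bandwidth on the wire for one call."""
    packets_per_sec = 1000 / ptime_ms
    payload_bytes = codec_kbps * 1000 / 8 / packets_per_sec
    frame_bytes = payload_bytes + IP_UDP_RTP_BYTES + l2_bytes
    return frame_bytes * 8 * packets_per_sec / 1000

if __name__ == "__main__":
    for name, rate in [("G.711", 64), ("G.722", 64), ("G.729", 8)]:
        print(f"{name}: {wire_kbps(rate):.1f} Kbps per direction")
```

The header overhead is fixed per packet, so it weighs most heavily on low-bit-rate codecs. For G.729, headers make up well over half the traffic. Longer packet intervals reduce this overhead but add delay.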

QoS: prioritizing voice on the network

Quality of Service marks voice packets as high priority so they jump the queue when the network is congested. Without QoS, a big file upload on the same network can starve voice packets, producing dropouts.

Standard VoIP QoS:

  • DSCP EF (46) for voice payload (RTP).
  • DSCP CS3 (24) for signaling (SIP).

QoS configuration touches multiple devices:

  • Voice VLAN: Put phones on a separate VLAN reserved for voice, and configure the switches to trust or apply QoS markings for that VLAN.
  • Switch QoS: Many managed switches offer up to 8 priority queues. Voice traffic goes in the highest queue not reserved for network control.
  • WAN QoS: On the firewall/router, mark and prioritize voice traffic leaving for the ISP. Shape outbound traffic to slightly below the ISP's upload rate. Queuing then happens on your router, where voice can be prioritized, and voice keeps guaranteed capacity even when other traffic is heavy.
  • WMM (Wi-Fi Multimedia): The WiFi equivalent of QoS. Enable it on access points so voice frames are prioritized over the air.
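DSCP values occupy the upper six bits of the IP header's former Type of Service byte. EF (46) therefore becomes byte value 184 (0xB8), and CS3 (24) becomes 96 (0x60). Phones and PBXs set this marking themselves. The sketch below applies the same marking to an ordinary UDP socket in Python, which is one way to test whether your switches and routers preserve it. The destination address and port are placeholders. Some operating systems, notably Windows, may ignore or override `IP_TOS` set from user code.

```python
import socket

DSCP_EF = 46    # expedited forwarding: voice payload (RTP)
DSCP_CS3 = 24   # class selector 3: call signaling (SIP)

def dscp_to_tos(dscp: int) -> int:
    """DSCP occupies the top 6 bits of the TOS byte; the low 2 bits are ECN."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be 0-63")
    return dscp << 2

def marked_udp_socket(dscp: int) -> socket.socket:
    """Create a UDP socket whose outgoing IPv4 packets carry the given DSCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock

if __name__ == "__main__":
    sock = marked_udp_socket(DSCP_EF)
    # Placeholder destination: capture the packet to confirm the DSCP field reads 46.
    sock.sendto(b"test", ("192.0.2.10", 10000))
    sock.close()
```

If a capture on the far side of a switch or router shows DSCP 0, something in the path is re-marking the traffic. Check the trust settings on that device.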

The NAT problem

SIP and RTP have an awkward relationship with NAT. SIP messages embed IP addresses ("send audio to 192.168.1.50"); if the receiver is behind NAT, the embedded address is unreachable from the public internet. Three solutions:

SIP-ALG (Application Layer Gateway) — usually wrong

Many consumer/SMB routers include a "SIP-ALG" feature that tries to rewrite SIP messages to make NAT work. This frequently breaks more than it fixes — disable it on every business router.

STUN (Session Traversal Utilities for NAT)

Phones discover their public IP via a STUN server, then advertise that IP in SIP. Works for most NAT types except symmetric NAT.
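A STUN exchange is small enough to show in full. The client sends a 20-byte Binding Request, defined in RFC 5389. It contains a message type, a length, the fixed magic cookie 0x2112A442, and a random 96-bit transaction ID. The server replies with an XOR-MAPPED-ADDRESS attribute holding the public IP and port it observed, XORed with the magic cookie. The sketch below builds the request and encodes and decodes that attribute, for IPv4 only. It does not send anything over the network or parse a full response message, and the example address is a documentation placeholder.

```python
import os
import socket
import struct
from typing import Optional

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001

def build_binding_request(transaction_id: Optional[bytes] = None) -> bytes:
    """20-byte STUN header with no attributes."""
    tid = transaction_id if transaction_id is not None else os.urandom(12)
    if len(tid) != 12:
        raise ValueError("transaction ID must be 12 bytes")
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + tid

def encode_xor_mapped_address(ip: str, port: int) -> bytes:
    """Attribute value as a server would send it (family 0x01 = IPv4)."""
    (addr,) = struct.unpack("!I", socket.inet_aton(ip))
    return struct.pack("!BBHI", 0, 0x01, port ^ (MAGIC_COOKIE >> 16), addr ^ MAGIC_COOKIE)

def decode_xor_mapped_address(value: bytes) -> tuple[str, int]:
    """Recover the public (ip, port) that the STUN server observed."""
    _reserved, family, xport, xaddr = struct.unpack("!BBHI", value[:8])
    if family != 0x01:
        raise ValueError("only IPv4 is handled in this sketch")
    port = xport ^ (MAGIC_COOKIE >> 16)
    ip = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
    return ip, port

if __name__ == "__main__":
    print(build_binding_request().hex())
    print(decode_xor_mapped_address(encode_xor_mapped_address("198.51.100.7", 40000)))
```

A phone would then advertise the decoded address in its SIP Contact header and SDP connection line instead of its private LAN address.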

SBC (Session Border Controller)

A device that sits between your phones and the SIP trunk, presenting a stable public IP and managing NAT traversal. Hosted PBX providers handle this for you; on-prem PBX may need an SBC for SIP trunk connectivity.
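The symptom that all three approaches address is easy to detect. Look at the connection address in a call's SDP body, the part of a SIP message that says where to send audio, and check whether it is a private LAN address. The sketch below uses Python's `ipaddress` module, and the sample SDP is invented. `is_private` is a coarse test: it also treats loopback, link-local, and documentation ranges as private.

```python
import ipaddress

SAMPLE_SDP = "\r\n".join([
    "v=0",
    "o=alice 2890844526 2890844526 IN IP4 192.168.1.50",
    "s=-",
    "c=IN IP4 192.168.1.50",
    "t=0 0",
    "m=audio 10000 RTP/AVP 0",
    "a=rtpmap:0 PCMU/8000",
]) + "\r\n"

def media_addresses(sdp: str) -> list[str]:
    """Collect IPv4 addresses from SDP connection ("c=") lines."""
    addrs = []
    for line in sdp.splitlines():
        if line.startswith("c=IN IP4 "):
            addrs.append(line.split()[2].split("/")[0])  # drop any "/ttl" suffix
    return addrs

def unreachable_media(sdp: str) -> list[str]:
    """Addresses a peer on the public internet could not send RTP to."""
    return [a for a in media_addresses(sdp) if ipaddress.ip_address(a).is_private]

if __name__ == "__main__":
    print(unreachable_media(SAMPLE_SDP))
```

If this check reports an address on a call crossing the internet, one-way or missing audio is the likely result. That points to STUN or an SBC as the fix.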

Emergency calling (E911) compliance

For VoIP in the US, RAY BAUM'S Act and Kari's Law require:

  • Direct 911 dialing without an access code: users must be able to dial "911" on its own, without first dialing a prefix such as "9" for an outside line.
  • Dispatchable location sent automatically — building, floor, room.
  • Notifications to designated office contacts when 911 is dialed.

Hosted PBX providers handle this with location services. An on-prem PBX requires you to configure location data for each phone, plus the notification rules. This is not optional: fines for non-compliance can run to $10,000 or more.
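On an on-prem PBX, these three requirements become dial-plan logic. The PBX must recognize 911 without any prefix, attach the caller's configured location, and alert on-site contacts. The sketch below is purely illustrative: `route_call`, the location table, and the `notify` callback are hypothetical and do not represent any PBX's API. Real systems express this in their own dial-plan configuration and pass location data to their 911 service provider. Accepting "9911" in addition to the required direct "911" is an optional safeguard for staff who habitually dial 9 first.

```python
from typing import Callable, Optional

EMERGENCY_PATTERNS = {"911", "9911"}  # "911" must work; "9911" catches the outside-line habit

def route_call(dialed: str, extension: str, locations: dict[str, str],
               notify: Callable[[str], None]) -> tuple[str, str, Optional[str]]:
    """Return (route, number, dispatchable_location) for a dialed string."""
    digits = "".join(ch for ch in dialed if ch.isdigit())
    if digits in EMERGENCY_PATTERNS:
        location = locations.get(extension)
        # Alert designated on-site contacts at the same time the call is placed.
        notify(f"911 dialed from ext {extension}, location: {location or 'NOT CONFIGURED'}")
        return "emergency", "911", location
    return "outbound", digits, None

if __name__ == "__main__":
    locations = {"204": "HQ, 2nd floor, room 210"}  # hypothetical location table
    print(route_call("911", "204", locations, print))
```

A missing location entry still routes the call but flags it in the alert. A 911 call should never be blocked because configuration is incomplete.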

Capacity planning

Rough sizing for typical office VoIP:

  • Bandwidth: Roughly 100 Kbps per direction for each simultaneous G.711 call. A 10-person office with every phone in use needs about 1 Mbps each way, well within the capacity of any business internet connection.
  • Concurrent calls: Estimate 25-40% of users on calls simultaneously during business hours. 100 users = 25-40 concurrent calls.
  • SIP trunk channels: Size the trunk for your peak concurrent calls, then add at least 30% headroom.
  • Power: Each desk phone draws 5-15 W over PoE. Confirm that the switch's total PoE budget covers every phone.
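These rules of thumb combine into a quick sizing calculator. The defaults come from the list above: 100 Kbps per call per direction, 25-40% concurrency, 30% trunk headroom, and a worst-case 15 W per phone. The sketch assumes one desk phone per user. `size_office` is a hypothetical helper, and the switch PoE budgets in the example are placeholder figures.

```python
import math

def size_office(users: int, switch_poe_budget_w: float,
                concurrency_pct: tuple[int, int] = (25, 40),
                kbps_per_call: int = 100, headroom_pct: int = 30,
                watts_per_phone: float = 15) -> dict:
    """Rough VoIP sizing, assuming one desk phone per user."""
    low = math.ceil(users * concurrency_pct[0] / 100)
    peak = math.ceil(users * concurrency_pct[1] / 100)
    poe_needed = users * watts_per_phone
    return {
        "concurrent_calls": (low, peak),
        "bandwidth_kbps_each_way": peak * kbps_per_call,
        "trunk_channels": math.ceil(peak * (100 + headroom_pct) / 100),
        "poe_needed_w": poe_needed,
        "poe_ok": poe_needed <= switch_poe_budget_w,
    }

if __name__ == "__main__":
    print(size_office(100, switch_poe_budget_w=740))
```

For 100 users, the calculator yields 40 peak calls, about 4 Mbps each way, and 52 trunk channels. With the placeholder 740 W budget, it also shows that a single switch cannot power every phone at the worst-case draw.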

Choosing a provider

Key questions:

  • What's the all-in monthly cost per user, including SIP trunks, DIDs, and calling minutes?
  • What's the contract length? Month-to-month is more expensive; 2-3 year contracts get better rates.
  • What hardware is supported? Some providers lock you into branded phones; others support any SIP phone.
  • What's included in basic vs premium tiers? Call recording, IVR/auto-attendant, analytics, mobile app, CRM integration often tiered.
  • What's the uptime SLA? Better providers offer 99.99%.
  • How is mobile handled? Can smartphones run the provider's app and behave as an extension of your office phone?
  • What about contact center features? If you have a call center, you need different features (queues, ACD, real-time monitoring).
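Uptime percentages are easier to compare once they are converted into allowed downtime. The sketch below does the arithmetic, assuming a 365-day year. Real SLAs define their own measurement windows and exclusions.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def allowed_downtime_minutes(sla_pct: float, period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Minutes of outage an SLA permits over the period."""
    return (1 - sla_pct / 100) * period_minutes

if __name__ == "__main__":
    for sla in (99.9, 99.95, 99.99, 99.999):
        per_year = allowed_downtime_minutes(sla)
        print(f"{sla}%: {per_year:.1f} min/year, {per_year / 12:.1f} min/month")
```

At 99.99%, roughly 53 minutes of outage per year are allowed. At 99.9%, the allowance grows to nearly 9 hours.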

Frequently Asked Questions

How much bandwidth does a VoIP call use?

A typical VoIP call uses 80-100 Kbps per direction (160-200 Kbps for the full conversation) with the G.711 codec, or 30-40 Kbps per direction with Opus or G.729. For 10 simultaneous calls, that comes to about 1 Mbps per direction with G.711 and about 400 Kbps with compressed codecs. Bandwidth is rarely the binding constraint on a modern internet connection. Latency and jitter matter much more for voice quality.

What is acceptable jitter for VoIP?

Under 10 ms is excellent, and up to 30 ms is acceptable. Above 30 ms, voice quality degrades audibly, with choppy audio and dropped syllables. VoIP phones use jitter buffers to smooth out variations in arrival time, but the buffer itself adds delay: holding packets long enough to even them out adds directly to one-way latency. Heavy jitter (over 100 ms) overwhelms the buffer and produces unintelligible audio.
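Jitter figures reported by phones and monitoring tools are usually the RTP interarrival jitter defined in RFC 3550. For each packet, the gap between arrivals is compared with the gap between send times. The absolute difference is then folded into a running average with a gain of 1/16. The sketch below works in milliseconds and takes send times from a separate list. RFC 3550 itself works in RTP timestamp units, and real receivers read send times from the RTP timestamp.

```python
def interarrival_jitter(send_ms: list[float], arrival_ms: list[float]) -> float:
    """Running RFC 3550-style jitter estimate over a packet sequence."""
    if len(send_ms) != len(arrival_ms):
        raise ValueError("need one arrival time per sent packet")
    jitter = 0.0
    for i in range(1, len(send_ms)):
        # D: how much the transit time changed between consecutive packets.
        d = (arrival_ms[i] - arrival_ms[i - 1]) - (send_ms[i] - send_ms[i - 1])
        jitter += (abs(d) - jitter) / 16
    return jitter

if __name__ == "__main__":
    sent = [0, 20, 40, 60, 80]       # one packet every 20 ms
    steady = [5, 25, 45, 65, 85]     # constant 5 ms delay
    bumpy = [5, 25, 55, 65, 85]      # third packet held up by 10 ms
    print(interarrival_jitter(sent, steady), interarrival_jitter(sent, bumpy))
```

A constant delay produces zero jitter. The jitter buffer has to absorb the variation in delay, not the delay itself.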

What is QoS and why does VoIP need it?

Quality of Service (QoS) is the network's prioritization of voice traffic over other traffic. On a congested link, a file upload or large download can starve voice packets, causing dropped audio. QoS tags voice packets with high priority (typically DSCP EF, value 46) and the network preferentially forwards them ahead of best-effort traffic. Most business routers, switches, and firewalls support QoS; configure it on the voice VLAN to mark and prioritize SIP/RTP traffic.

Should I choose hosted or on-premises PBX?

For most SMBs in 2026: hosted (cloud) PBX. Hosted PBX (RingCentral, 8x8, Vonage, Dialpad, Microsoft Teams Phone, Zoom Phone) requires no on-site hardware, costs $20-40/user/month, includes mobile apps, and updates automatically. On-premises PBX (FreePBX, 3CX, Avaya, Mitel) gives more control and lower per-user cost at scale but requires server hardware, expertise, and ongoing operations. Hosted wins below ~50 users; on-prem may win above that point for cost reasons.

Can VoIP work over WiFi?

Yes, but with care. WiFi-only desk phones exist and work well in homes and small offices. The key requirements are a stable signal (no roaming through marginal coverage), low jitter (other heavy WiFi users on the same AP can cause problems), and QoS tagging (WMM on the AP prioritizes voice frames). For mobile users on company smartphones, VoIP apps over WiFi are standard. For desk phones, a wired connection is more reliable, but WiFi is acceptable if the WiFi infrastructure is well-tuned.
