Site-to-Site VPN to Cloud
For most organizations, a site-to-site IPsec VPN over the public internet is the right way to connect on-premises networks to a cloud VPC. It is cheap, encrypted by default, supported on essentially any firewall or router, and provisioned in minutes rather than weeks. Dedicated connectivity (Direct Connect, ExpressRoute, Interconnect) wins only when sustained bandwidth or latency requirements exceed what the public internet can deliver. This guide walks through the architecture, the configuration choices that actually matter, and the operational pitfalls.
The architecture at a glance
A site-to-site VPN has two endpoints: a customer gateway (your physical or virtual router/firewall) and a cloud VPN gateway (a managed service in your VPC). An IPsec tunnel is established between them over the public internet. Routes are exchanged either statically or dynamically via BGP. Traffic destined for the other side of the tunnel is encrypted, encapsulated in ESP packets, and sent over the internet to the remote gateway, which decrypts and forwards it.
On each cloud:
- AWS: Virtual Private Gateway (attached to VPC) or Transit Gateway (multi-VPC). Each VPN connection consists of two tunnels for redundancy. Customer Gateway is the AWS-side representation of your on-premises router.
- Azure: Virtual Network Gateway (VPN Gateway). Active-standby by default; active-active is available on every SKU except Basic. Local Network Gateway represents your on-premises endpoint.
- GCP: HA VPN (recommended) with two interfaces and two tunnels for SLA, or Classic VPN (single interface, legacy). External VPN Gateway represents your on-premises endpoint.
When to choose VPN over dedicated connectivity
| Criterion | VPN wins when | Dedicated wins when |
|---|---|---|
| Bandwidth | Under ~5 Gbps sustained | 5+ Gbps sustained |
| Egress volume | Under 5 TB/month from cloud to on-premises | 5+ TB/month |
| Latency | 5-15 ms variance is acceptable | Need consistent <5 ms variance |
| Setup time | Need it working today | 4-12 weeks is fine |
| Fixed monthly cost | Budget must stay under ~$200/month | A $500-3000/month budget exists |
| Encryption | Need encryption out of the box | Need raw line-rate performance more than encryption |
| Compliance posture | FIPS 140-2 IPsec is acceptable | Traffic must not traverse the public internet |
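The table can be read as a rough decision rule. The sketch below encodes its thresholds in Python. The field names, the rule that any single "dedicated wins" signal tips the decision, and the feasibility check on lead time and budget are all assumptions for illustration, not a sizing methodology.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Requirements:
    sustained_gbps: float        # sustained bandwidth between the sites
    egress_tb_per_month: float   # cloud -> on-premises transfer volume
    jitter_budget_ms: float      # latency variance the workload tolerates
    lead_time_weeks: float       # how long you can wait for connectivity
    monthly_budget_usd: float    # fixed monthly spend available
    no_internet_transit: bool    # compliance forbids crossing the public internet


def recommend(req: Requirements) -> tuple[str, list[str]]:
    """Return ("vpn" or "dedicated", reasons) using the table's rough thresholds."""
    reasons: list[str] = []
    if req.sustained_gbps >= 5:
        reasons.append("5+ Gbps sustained")
    if req.egress_tb_per_month >= 5:
        reasons.append("5+ TB/month egress")
    if req.jitter_budget_ms < 5:
        reasons.append("needs <5 ms latency variance")
    if req.no_internet_transit:
        reasons.append("traffic must stay off the public internet")

    # Assumption: a dedicated link only helps if you can wait for it and pay for it.
    dedicated_feasible = req.lead_time_weeks >= 4 and req.monthly_budget_usd >= 500
    if reasons and dedicated_feasible:
        return "dedicated", reasons
    if reasons:
        return "vpn", reasons + ["dedicated not feasible yet; start with VPN"]
    return "vpn", ["every criterion is within VPN range"]


print(recommend(Requirements(1, 1, 10, 0, 150, False)))
# ('vpn', ['every criterion is within VPN range'])
```

Returning the reasons alongside the verdict makes it easy to see which criterion pushed the decision, which matters more than the verdict itself for borderline cases.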
The IPsec parameters you actually pick
IPsec is configured in two phases. Both ends must be configured with matching proposals for each phase, or negotiation fails.
Phase 1 (IKE)
Establishes the secure channel for negotiating Phase 2. Modern recommendations:
- IKE version: IKEv2. Faster, more reliable, supports MOBIKE for mobile clients. IKEv1 only for legacy compatibility.
- Encryption: AES-256-GCM (preferred) or AES-256-CBC.
- Integrity: Not needed with GCM, which is authenticated encryption (IKEv2 still negotiates a separate PRF such as HMAC-SHA-256); use SHA-256 or SHA-384 with CBC.
- DH Group: 19, 20, or 21 (ECDH). DH Group 14 (MODP-2048) is acceptable but slower. Avoid Group 2 (MODP-1024).
- Lifetime: 28800 seconds (8 hours).
- Authentication: Pre-shared key (PSK) for simplicity, X.509 certificates for higher-security environments.
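A quick way to audit an existing Phase 1 configuration against these recommendations is a small checker like the one below. It is a sketch: the string labels such as `"aes256-gcm"` are hypothetical, since every vendor and cloud spells algorithm names differently, and the checker only covers the parameters listed above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative labels; map your vendor's names onto these.
RECOMMENDED_DH = {19, 20, 21}   # ECDH groups
ACCEPTABLE_DH = {14}            # MODP-2048: acceptable but slower
AEAD_CIPHERS = {"aes256-gcm"}
CBC_CIPHERS = {"aes256-cbc"}
STRONG_INTEGRITY = {"sha256", "sha384"}


@dataclass
class Phase1:
    ike_version: int
    encryption: str
    integrity: Optional[str]    # None when the cipher is AEAD (GCM)
    dh_group: int
    lifetime_s: int


def check_phase1(p: Phase1) -> List[str]:
    """List deviations from the Phase 1 recommendations."""
    problems = []
    if p.ike_version != 2:
        problems.append("use IKEv2 unless a legacy peer forces IKEv1")
    if p.encryption in CBC_CIPHERS:
        if p.integrity not in STRONG_INTEGRITY:
            problems.append("CBC needs SHA-256 or SHA-384 integrity")
    elif p.encryption not in AEAD_CIPHERS:
        problems.append(f"unrecommended cipher: {p.encryption}")
    if p.dh_group not in RECOMMENDED_DH | ACCEPTABLE_DH:
        problems.append(f"DH group {p.dh_group} is too weak; use 19, 20, or 21")
    if p.lifetime_s != 28800:
        problems.append(f"lifetime {p.lifetime_s}s differs from the usual 28800s")
    return problems


# A typical legacy default: IKEv1, CBC with SHA-1, DH group 2.
print(check_phase1(Phase1(1, "aes256-cbc", "sha1", 2, 28800)))
```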
Phase 2 (IPsec SA)
The actual data tunnel.
- Protocol: ESP (Encapsulating Security Payload). Never AH alone (no encryption).
- Encryption: AES-256-GCM.
- PFS (Perfect Forward Secrecy): Enabled. Use DH Group matching Phase 1.
- Lifetime: 3600 seconds (1 hour).
- Mode: Tunnel (not transport).
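The Phase 2 rules can be checked the same way, this time comparing the two gateways. The sketch below treats protocol, cipher, PFS group, and mode as settings both sides must agree on. Lifetimes are deliberately left out of the match list because IKEv2 does not negotiate them (each peer rekeys on its own timer); the sketch only warns about short ones. Field names and labels are illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Phase2:
    protocol: str     # "esp" or "ah"
    encryption: str   # illustrative label, e.g. "aes256-gcm"
    pfs_group: int    # DH group used for PFS; 0 means PFS disabled
    lifetime_s: int
    mode: str         # "tunnel" or "transport"


# Settings both gateways must agree on for the SA to come up.
MUST_MATCH = ("protocol", "encryption", "pfs_group", "mode")


def phase2_issues(local: Phase2, remote: Phase2, phase1_dh: int) -> List[str]:
    issues = []
    for name in MUST_MATCH:
        a, b = getattr(local, name), getattr(remote, name)
        if a != b:
            issues.append(f"{name} mismatch: {a!r} vs {b!r}")
    if local.protocol != "esp":
        issues.append("use ESP; AH alone provides no encryption")
    if local.mode != "tunnel":
        issues.append("site-to-site VPNs use tunnel mode")
    if local.pfs_group == 0:
        issues.append("enable PFS")
    elif local.pfs_group != phase1_dh:
        issues.append("PFS group should match the Phase 1 DH group")
    if local.lifetime_s < 3600:
        issues.append("lifetimes under 1 hour cause frequent rekeys")
    return issues


onprem = Phase2("esp", "aes256-gcm", 20, 3600, "tunnel")
cloud = Phase2("esp", "aes256-gcm", 14, 3600, "tunnel")
print(phase2_issues(onprem, cloud, phase1_dh=20))  # ['pfs_group mismatch: 20 vs 14']
```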
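All three clouds support IKEv2, AES-256-GCM, and ECDH groups, but the exact lists of supported algorithms and DH groups differ by provider, so confirm each choice against the provider's documentation. Default policies sometimes include older parameters (DH Group 2, SHA-1) for compatibility; explicitly configure modern parameters on both sides.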
Static routing vs BGP
You can choose between two routing models. The choice has outsized operational consequences.
Static routing
You manually specify on each side which CIDRs to send through the tunnel. Static routing is simple to set up but rigid: adding a new subnet on either side requires manually updating routes on both sides. There is also no automatic failover between redundant tunnels; with two static tunnels you need floating static routes (a backup route with a higher administrative distance) plus some form of tunnel health tracking to switch between active and passive.
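Floating static routes boil down to a simple selection rule, sketched below: among the routes whose tunnel is currently considered up, the one with the lowest administrative distance wins. The `tunnel_up` map is a hypothetical stand-in for whatever health tracking your router offers, and the CIDR and distance values are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class StaticRoute:
    prefix: str
    tunnel: str
    distance: int  # administrative distance: lower is preferred


def active_route(routes: List[StaticRoute], tunnel_up: Dict[str, bool]) -> Optional[StaticRoute]:
    """Floating static route: use the lowest-distance route whose tunnel is up."""
    usable = [r for r in routes if tunnel_up.get(r.tunnel, False)]
    return min(usable, key=lambda r: r.distance, default=None)


routes = [
    StaticRoute("10.20.0.0/16", "tunnel1", distance=1),    # primary
    StaticRoute("10.20.0.0/16", "tunnel2", distance=250),  # floating backup
]
print(active_route(routes, {"tunnel1": True, "tunnel2": True}).tunnel)   # tunnel1
print(active_route(routes, {"tunnel1": False, "tunnel2": True}).tunnel)  # tunnel2
```

The weak point is the tracking input: if the router cannot detect that a tunnel is down, the primary route never goes away and failover never happens.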
BGP routing
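Routes are advertised dynamically: each side announces its prefixes and learns the other side's. New subnets propagate automatically. When a tunnel fails, the BGP session over it drops (once the hold timer expires, or sooner if IKE dead peer detection tears the tunnel down), the routes learned through it are withdrawn, and traffic shifts to the other tunnel. Of the two models, it is the only one that delivers automatic failover without extra tracking machinery.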
BGP requires choosing ASNs:
- Your side: a 16-bit private ASN (64512-65534) or a 32-bit private ASN (4200000000-4294967294).
- Cloud side: set or assigned by the cloud. AWS lets you choose a private ASN for a Virtual Private Gateway or Transit Gateway and defaults to 64512 (older Virtual Private Gateways use 7224). Azure assigns 65515 by default. GCP lets you choose the Cloud Router ASN.
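Before configuring the sessions, a sanity check on the ASN pair catches the two most common mistakes: using a public ASN you do not own, and reusing the cloud side's ASN. The sketch below uses the private ranges 64512-65534 and 4200000000-4294967294; the function names are hypothetical.

```python
from typing import List

PRIVATE_ASN_16 = range(64512, 65535)            # 64512-65534
PRIVATE_ASN_32 = range(4200000000, 4294967295)  # 4200000000-4294967294


def is_private_asn(asn: int) -> bool:
    return asn in PRIVATE_ASN_16 or asn in PRIVATE_ASN_32


def asn_problems(onprem_asn: int, cloud_asn: int) -> List[str]:
    problems = []
    if not is_private_asn(onprem_asn):
        problems.append(f"{onprem_asn} is not a private ASN")
    if onprem_asn == cloud_asn:
        # The sessions are eBGP, so the two sides need different ASNs.
        problems.append("on-premises ASN must differ from the cloud-side ASN")
    return problems


print(asn_problems(65000, 65515))  # []
print(asn_problems(64512, 64512))  # ['on-premises ASN must differ from the cloud-side ASN']
```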
Always use BGP for production. The configuration effort is a one-time cost; the operational savings accrue for as long as the connection runs.
Redundancy: two tunnels minimum
AWS builds two tunnels into every VPN connection, and Azure and GCP strongly recommend a redundant pair (GCP's HA VPN SLA depends on it). The two tunnels typically terminate on different cloud-side gateway IPs in different availability zones. Your on-premises configuration should:
- Establish both tunnels simultaneously.
- Run BGP sessions on both.
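- Use ECMP to load-balance traffic across active tunnels, or run active/standby: prepend your AS path on routes advertised over the standby tunnel (so the cloud prefers the primary for traffic toward you) and give routes learned over the standby a lower local preference (so your own outbound traffic prefers the primary).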
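The interplay of ECMP, prepending, and failover can be sketched from the cloud gateway's point of view. This is a deliberate simplification: it compares only AS-path length among live sessions, whereas real BGP best-path selection checks local preference and several other attributes first. The tunnel names and ASN are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BgpPath:
    tunnel: str
    as_path: list[int]
    up: bool = True  # False once the BGP session over this tunnel drops


def forwarding_tunnels(paths: list[BgpPath], ecmp: bool = True) -> list[str]:
    """Simplified best-path choice: shortest AS path wins among live sessions."""
    live = [p for p in paths if p.up]
    if not live:
        return []
    shortest = min(len(p.as_path) for p in live)
    winners = [p.tunnel for p in live if len(p.as_path) == shortest]
    return winners if ecmp else winners[:1]


# The cloud's view of your prefix: tunnel2's advertisement is prepended twice.
paths = [BgpPath("tunnel1", [65000]), BgpPath("tunnel2", [65000, 65000, 65000])]
print(forwarding_tunnels(paths))  # ['tunnel1']
paths[0].up = False               # tunnel1 fails; its routes are withdrawn
print(forwarding_tunnels(paths))  # ['tunnel2']
```

Without the prepend, both paths tie and ECMP uses both tunnels; with it, the second tunnel carries traffic only after the first session drops.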
For higher availability, use two on-premises customer gateways (different physical routers) and four tunnels total — two from each customer gateway to the cloud. This protects against single-router failure on your side.
Bandwidth limits and how to scale past them
A single IPsec tunnel tops out around 1.25 Gbps on AWS, and Azure per-tunnel throughput is in a similar range, varying with gateway SKU. GCP Cloud VPN tunnels, Classic or HA, top out around 3 Gbps each; HA VPN reaches higher aggregate throughput by adding tunnels, not by making each tunnel faster. These limits apply per tunnel, regardless of your internet bandwidth or customer gateway capability.
To scale past a single tunnel's limit:
- AWS Transit Gateway VPN with ECMP. Each VPN connection provides two tunnels; attach several VPN connections to the same transit gateway, run BGP on all tunnels, and enable ECMP so traffic spreads across them. Aggregate bandwidth then scales with the tunnel count, up to roughly 50 Gbps.
- Azure VPN Gateway in active-active mode. Higher SKUs (VpnGw3, VpnGw4, VpnGw5) provide more aggregate throughput across multiple tunnels.
- GCP HA VPN with multiple tunnels. Each HA VPN gateway supports multiple tunnel pairs.
Note that each additional tunnel requires CPU on your on-premises router too. Many small firewalls cap at 200-500 Mbps of IPsec throughput regardless of how many tunnels you configure. Confirm the on-premises bottleneck before scaling cloud-side.
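Sizing a multi-tunnel design is mostly arithmetic: the tunnel count times the per-tunnel cap, limited by whichever box is slowest. The sketch below defaults to the approximate 1.25 Gbps AWS per-tunnel figure; the on-premises and gateway caps are inputs you would fill in from your own hardware and SKU, and the function names are hypothetical.

```python
import math
from typing import Optional

AWS_PER_TUNNEL_GBPS = 1.25  # approximate per-tunnel ceiling


def tunnels_needed(target_gbps: float, per_tunnel_gbps: float = AWS_PER_TUNNEL_GBPS) -> int:
    return math.ceil(target_gbps / per_tunnel_gbps)


def aggregate_gbps(
    tunnels: int,
    per_tunnel_gbps: float = AWS_PER_TUNNEL_GBPS,
    onprem_ipsec_gbps: Optional[float] = None,
    gateway_cap_gbps: Optional[float] = None,
) -> float:
    """Best-case aggregate: tunnels x per-tunnel cap, limited by the slowest box."""
    total = tunnels * per_tunnel_gbps
    for cap in (onprem_ipsec_gbps, gateway_cap_gbps):
        if cap is not None:
            total = min(total, cap)
    return total


print(tunnels_needed(5))                         # 4
print(aggregate_gbps(4))                         # 5.0
print(aggregate_gbps(4, onprem_ipsec_gbps=0.5))  # 0.5 (small firewall is the bottleneck)
```

This is a best case: it assumes ECMP spreads load evenly. ECMP hashes per flow, so any single flow is still limited to one tunnel's capacity.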
MTU and MSS handling
IPsec tunnel mode adds roughly 54-78 bytes of overhead per packet with common AES-GCM and AES-CBC/SHA-2 suites, depending on the cipher and on whether NAT-T adds a UDP header. On a standard 1500-byte Ethernet path this leaves roughly 1422-1446 bytes for the inner packet. Mismatched MTU is the most common operational issue with cloud VPNs: small packets work, but large transfers (file uploads, database replication, video) stall or fail.
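The per-packet arithmetic can be sketched from the ESP packet layout: outer IPv4 header, optional NAT-T UDP header, ESP header, IV, the encrypted inner packet plus padding and a 2-byte trailer, and the integrity check value. The sketch assumes an outer IPv4 header without options and minimal padding; the IV, ICV, and alignment sizes are the usual values for AES-GCM and for AES-CBC with HMAC-SHA-256-128.

```python
def max_inner_packet(outer_mtu: int = 1500, iv: int = 8, icv: int = 16,
                     block: int = 4, nat_t: bool = False) -> int:
    """Largest inner IPv4 packet carried by one ESP tunnel-mode packet.

    Defaults model AES-GCM (8-byte IV, 16-byte ICV, 4-byte alignment).
    For AES-CBC with HMAC-SHA-256-128 use iv=16, icv=16, block=16.
    """
    outer_ip = 20               # new outer IPv4 header
    udp = 8 if nat_t else 0     # NAT-T UDP header (port 4500)
    esp_header = 8              # SPI + sequence number
    trailer = 2                 # pad length + next header
    room = outer_mtu - outer_ip - udp - esp_header - iv - icv
    room -= room % block        # inner + padding + trailer must fill whole blocks
    return room - trailer


cbc = {"iv": 16, "icv": 16, "block": 16}
for name, kwargs in [("GCM", {}), ("GCM + NAT-T", {"nat_t": True}),
                     ("CBC/SHA-256", cbc), ("CBC/SHA-256 + NAT-T", {**cbc, "nat_t": True})]:
    inner = max_inner_packet(**kwargs)
    print(f"{name}: inner {inner}, overhead {1500 - inner}")
```

The four cases print inner sizes of 1446, 1438, 1438, and 1422 bytes, i.e. 54 to 78 bytes of overhead, which is why a 1400-byte tunnel MTU leaves comfortable headroom.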
The fix is two-part:
- Set the tunnel interface MTU to 1400. That sits below the real ceiling and leaves headroom for any additional encapsulation later in the path.
- Enable TCP MSS clamping at 1360. Most firewalls can rewrite the maximum segment size option in TCP SYN packets passing through the tunnel. Setting MSS = MTU - 40 (a 20-byte IPv4 header plus a 20-byte TCP header, so 1400 - 40 = 1360) ensures TCP never tries to send segments larger than the tunnel can carry, even when path MTU discovery is broken.
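MSS clamping is a small rewrite applied to TCP SYN packets, sketched below. The clamp value is the tunnel MTU minus the IP and base TCP headers; the IPv6 variant (40-byte header) is included for comparison. The function names are hypothetical, not any firewall's configuration syntax.

```python
IPV4_HEADER = 20
IPV6_HEADER = 40
TCP_HEADER = 20  # base header, no options


def clamp_value(tunnel_mtu: int, ipv6: bool = False) -> int:
    """MSS that keeps a full-size TCP segment within the tunnel MTU."""
    return tunnel_mtu - (IPV6_HEADER if ipv6 else IPV4_HEADER) - TCP_HEADER


def clamp_syn(advertised_mss: int, clamp: int) -> int:
    """What an MSS-clamping device does to a SYN: lower the option, never raise it."""
    return min(advertised_mss, clamp)


clamp = clamp_value(1400)         # 1360
print(clamp_syn(1460, clamp))     # 1360: a typical Ethernet host's SYN is rewritten
print(clamp_syn(1200, clamp))     # 1200: already small enough, left alone
```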
AWS and Azure VPN tunnels enable MSS clamping automatically; you may need to configure it manually on the customer gateway side.
Common operational issues
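- Customer gateway behind NAT. If your customer gateway sits behind a NAT device, enable NAT Traversal (NAT-T), which encapsulates ESP in UDP on port 4500 (RFC 3948); IKEv2 detects NAT natively. Most modern firewalls do this automatically.
- Firewall rules blocking IKE/ESP. Allow UDP 500 (IKE), UDP 4500 (NAT-T), and IP protocol 50 (ESP) in both directions between your customer gateway's public IP and the cloud gateway's public IPs. Cloud-side firewall rules for the gateway are managed automatically.
- Time skew. With certificate authentication, a gateway clock that is far enough off makes valid certificates look expired or not yet valid, and IKE authentication fails; skew also makes it hard to correlate your logs with cloud-side events. Run NTP on your customer gateway.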
- Phase 2 rekey storms. Aggressive rekey timers (under 1 hour) can cause throughput drops during rekey. Use 1-hour Phase 2 lifetimes.
- BGP session flapping. If BGP timers are too aggressive (keepalives under 10 seconds) on a high-latency or lossy internet path, sessions flap. Use a 30-second keepalive and a 90-second hold timer as a baseline, and tighten them only on a stable, low-latency path. Keep in mind that BGP uses the smaller of the two peers' proposed hold times.
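The hold-time rule explains why timers configured on one side do not always take effect. Per RFC 4271, each BGP speaker uses the smaller of its own configured hold time and the one in the peer's OPEN message, and a nonzero hold time must be at least 3 seconds. The sketch below models that, plus the common convention of sending keepalives at one third of the hold time; the function names are hypothetical.

```python
def negotiated_hold_time(local_hold_s: int, peer_hold_s: int) -> int:
    """RFC 4271: a BGP speaker uses the smaller of its own and the peer's hold time."""
    hold = min(local_hold_s, peer_hold_s)
    if 0 < hold < 3:
        raise ValueError("hold time must be 0 or at least 3 seconds")
    return hold


def keepalive_interval(hold_s: int) -> int:
    # Common convention: keepalives at one third of the hold time.
    return hold_s // 3


hold = negotiated_hold_time(90, 30)    # your side asks for 90 s, the peer for 30 s
print(hold, keepalive_interval(hold))  # 30 10
```

In other words, if the cloud side proposes a shorter hold time than yours, the shorter value governs the session, so check both ends when diagnosing flaps.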
Frequently Asked Questions
What is the maximum throughput of a cloud site-to-site VPN tunnel?
A single IPsec tunnel on AWS is capped at approximately 1.25 Gbps, and Azure per-tunnel throughput is in a similar range depending on gateway SKU. GCP Cloud VPN tunnels, Classic or HA, cap at about 3 Gbps each; HA VPN scales by adding tunnels. To exceed the per-tunnel cap, run multiple tunnels with ECMP: AWS Transit Gateway can reach roughly 50 Gbps in aggregate, and Azure VPN Gateway aggregate throughput grows with SKU (up to about 10 Gbps on the largest). For sustained needs above 5-10 Gbps, dedicated connectivity is more cost-effective.
Do I need BGP for a site-to-site VPN to the cloud?
Strongly recommended for any production setup. BGP enables dynamic failover between redundant tunnels, automatic propagation of route changes on either side, and the ability to advertise new subnets without manual configuration. Static-routed VPNs work but require manual updates whenever the IP plan changes on either side and provide no automatic failover. AWS, Azure, and GCP all support BGP over IPsec; use it unless you have a hard constraint preventing it.
Can I run a VPN over my Direct Connect circuit?
Yes — and many regulated workloads do exactly this. The dedicated circuit provides predictable bandwidth and low latency; the IPsec layer adds the encryption required for compliance regimes like HIPAA or PCI. The trade-off is an extra encapsulation hop and lower throughput than the raw circuit. Plan MTU carefully — IPsec overhead reduces usable payload from 1500 bytes to roughly 1400 bytes.
What is the difference between policy-based and route-based VPN?
Policy-based VPN ties each IPsec security association to a specific source/destination CIDR pair — for every traffic selector pair, a new SA. Route-based VPN creates a virtual tunnel interface and uses routing (static or BGP) to decide what goes through it. Route-based is the modern default — simpler to manage, supports overlapping CIDR with NAT, and supports BGP for dynamic routing. All three clouds support route-based; policy-based is supported for compatibility with older on-premises firewalls.
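The operational difference shows up in the SA count. The sketch below contrasts the two models: a policy-based gateway needs one SA pair (inbound and outbound) per local/remote selector combination, while a route-based gateway typically negotiates a single pair with wildcard selectors and lets routing decide what enters the tunnel. The CIDRs are illustrative, and some cloud gateways let you narrow the route-based selectors.

```python
from ipaddress import ip_network
from itertools import product


def policy_based_selectors(local_cidrs, remote_cidrs):
    """Policy-based: one SA pair per local/remote traffic-selector combination."""
    return [(ip_network(src), ip_network(dst)) for src, dst in product(local_cidrs, remote_cidrs)]


def route_based_selectors():
    """Route-based: one SA pair with wildcard selectors; routing picks the traffic."""
    any_v4 = ip_network("0.0.0.0/0")
    return [(any_v4, any_v4)]


local = ["10.0.0.0/16", "10.1.0.0/16", "10.2.0.0/16"]
remote = ["172.16.0.0/16", "172.17.0.0/16"]
print(len(policy_based_selectors(local, remote)))  # 6
print(len(route_based_selectors()))                # 1
```

Adding a fourth local subnet grows the policy-based count to eight SA pairs, each of which must be configured identically on both sides; the route-based count stays at one.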
What MTU should I configure for an IPsec tunnel to the cloud?
Set 1400 bytes on the tunnel interface. Standard Ethernet MTU is 1500; IPsec tunnel-mode encapsulation adds roughly 54-78 bytes depending on the cipher suite and NAT-T, leaving about 1422-1446 bytes usable. Configuring 1400 leaves headroom for any additional encapsulation in the path and avoids fragmentation. Also enable MSS clamping (TCP MSS adjust) at 1360 so TCP connections keep working even when firewalls drop the ICMP messages that path MTU discovery depends on.