Where Internet Actually Matters
| Workflow | Network Need | Bottleneck Risk | Why |
|---|---|---|---|
| Dataset downloads (ImageNet, LAION, Common Crawl) | Fast download, no data cap | ISP cap and speed | Datasets range from GBs to TBs; a 100GB pull at 50 Mbps takes 4+ hours |
| Checkpoint and artifact upload to cloud | Strong upload speed | Asymmetric upload limits | Large model checkpoints move from local GPU to S3/GCS frequently |
| Remote GPU notebooks (Colab, Lambda, RunPod, Vast.ai) | Stable low-latency connection | Jitter and dropouts | Interactive Jupyter sessions feel sluggish or disconnected with variable latency |
| Local NAS dataset storage | Wired GbE or 2.5GbE LAN | Local network, not ISP | A data loader that can consume hundreds of MB/s will stall on a 100 Mbps LAN segment (~12 MB/s) |
| Package and dependency installs | Moderate download | Usually not — pip/conda are fast | PyTorch, TensorFlow, CUDA wheels are large (1–3 GB) but download once |
| Team collaboration (Weights & Biases, MLflow, Hugging Face) | Low — API calls and dashboard sync | Rarely | Metric logging and model card uploads are lightweight |
Dataset Download Time Reality Check
Before choosing an ISP plan or planning a dataset pull, it helps to understand the real time cost at different speeds:
| Dataset Size | 50 Mbps | 200 Mbps | 500 Mbps | 1 Gbps |
|---|---|---|---|---|
| 10 GB (small) | 27 min | 7 min | 3 min | 1.3 min |
| 100 GB (medium) | 4.4 hr | 67 min | 27 min | 13 min |
| 500 GB (large) | 22 hr | 5.5 hr | 2.2 hr | 67 min |
| 1 TB (very large) | 44 hr | 11 hr | 4.4 hr | 2.2 hr |
At 50 Mbps, a 500 GB dataset pull takes nearly a full day. At 500 Mbps, it takes about 2 hours. For researchers who pull large datasets regularly, this difference is significant — and it is not reflected in typical "streaming household" speed guidance that treats 100 Mbps as more than enough.
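The figures in the table follow from dividing the dataset size in bits by the line rate. A minimal sketch of that arithmetic, assuming decimal units (1 GB = 1000 MB) and an ideal link with no protocol overhead. The function names and the `efficiency` parameter are illustrative, not part of any tool mentioned here; set `efficiency` below 1.0 to model a link that does not reach its advertised rate.

```python
def transfer_time_seconds(size_gb: float, speed_mbps: float, efficiency: float = 1.0) -> float:
    """Estimate the seconds needed to move size_gb gigabytes over a speed_mbps link.

    efficiency is an assumed fraction (0 < efficiency <= 1) of the advertised
    rate that the link actually delivers.
    """
    if speed_mbps <= 0:
        raise ValueError("speed_mbps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    size_megabits = size_gb * 1000 * 8  # GB -> MB -> megabits
    return size_megabits / (speed_mbps * efficiency)


def format_duration(seconds: float) -> str:
    """Render a duration as minutes (under an hour) or hours."""
    if seconds < 3600:
        return f"{seconds / 60:.0f} min"
    return f"{seconds / 3600:.1f} hr"
```

For example, `format_duration(transfer_time_seconds(100, 50))` reproduces the 4.4 hr entry for a 100 GB pull at 50 Mbps. Real transfers are usually somewhat slower than these estimates because of protocol overhead and server-side limits.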
Data Cap Planning
Many ISPs impose monthly data caps (1 TB to 1.5 TB is common). AI/ML workflows can exhaust these caps quickly:
- ImageNet (full): ~150 GB download
- LAION-400M image dataset: ~240 GB
- Common Crawl (one crawl): ~80 TB (accessed in subsets, but large subsets are common)
- Llama 2 70B weights: ~130 GB
- Daily checkpoint uploads at 10 GB each: 300 GB/month
If your ISP has a data cap, treat dataset pulls as scheduled events rather than background noise. Pull large datasets overnight in off-peak hours. If you regularly exceed 500 GB/month on ML work, look for plans with no cap or a high cap, or consider a business-tier plan.
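A rough monthly budget can be worked out before the month starts. The sketch below sums one-off pulls and recurring daily uploads and compares the total against a cap. It assumes the ISP counts both download and upload traffic toward the cap, which you should confirm for your own plan; the function names are illustrative.

```python
def monthly_usage_gb(one_off_pulls_gb, daily_upload_gb=0.0, days=30):
    """Total expected traffic in GB: one-off pulls plus recurring daily uploads."""
    return sum(one_off_pulls_gb) + daily_upload_gb * days


def cap_report(usage_gb, cap_gb):
    """Compare expected usage against a monthly cap (both in GB)."""
    if cap_gb <= 0:
        raise ValueError("cap_gb must be positive")
    return {
        "usage_gb": usage_gb,
        "remaining_gb": cap_gb - usage_gb,
        "percent_used": 100 * usage_gb / cap_gb,
        "over_cap": usage_gb > cap_gb,
    }


# Figures taken from the list above: ImageNet, LAION-400M, Llama 2 70B,
# plus a 10 GB checkpoint upload every day.
usage = monthly_usage_gb([150, 240, 130], daily_upload_gb=10)
report = cap_report(usage, cap_gb=1000)
```

With those inputs the ML traffic alone comes to 820 GB, or 82% of a 1 TB cap, before any household streaming or video calls are counted.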
Remote GPU Session Requirements
Cloud GPU providers (Google Colab, Lambda GPU Cloud, RunPod, Vast.ai, CoreWeave) run the computation remotely while your local machine drives the interface. Bandwidth requirements are lower than you might expect because the heavy compute never crosses your connection, but latency and stability requirements are high because the session is interactive:
- Latency: Jupyter notebook interactions feel sluggish above 80–100ms round-trip. Use a provider with a data centre geographically close to you. Avoid VPNs that route through distant exit nodes during active sessions.
- Upload: Uploading a local dataset to a cloud GPU for a training run can require pushing 10–100 GB. Plan this as a separate upload step before the session, not during it.
- Stability: A disconnected notebook session may lose unsaved state depending on the provider. Use session-preserving tools (tmux, screen, nohup) for long-running jobs that should survive a connection drop.
- Bandwidth for model I/O: Streaming output tokens or intermediate activations during interactive inference sessions is low bandwidth. The bottleneck is latency, not throughput.
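One way to check the latency point above is to time a few TCP connections to the provider's endpoint and look at the median and the spread. This is a sketch under assumptions: TCP connect time only approximates round-trip latency (it also includes DNS lookup when a hostname is used), the 100 ms default threshold is the upper end of the range quoted above, and the function names are illustrative.

```python
import socket
import statistics
import time


def tcp_connect_ms(host, port=443, timeout=2.0):
    """Time one TCP connection to host:port, in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection established; close it immediately
    return (time.perf_counter() - start) * 1000


def summarize_latency(samples_ms, sluggish_above_ms=100.0):
    """Summarize latency samples: median, jitter (population std dev), and a flag."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    median = statistics.median(samples_ms)
    return {
        "median_ms": median,
        "jitter_ms": statistics.pstdev(samples_ms),
        "likely_sluggish": median > sluggish_above_ms,
    }
```

A typical use would be to collect around ten samples with `tcp_connect_ms("your-provider-host")` (a placeholder hostname) and pass the list to `summarize_latency`. A high `jitter_ms` relative to the median points at an unstable path, often Wi-Fi or a congested uplink, rather than pure distance.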
Local Network: LAN May Be the Real Bottleneck
When training data lives on a NAS, the local network speed — not the ISP plan — determines how fast the training loop can read data. Common bottlenecks:
| Local Setup | Max Throughput | Sufficient For |
|---|---|---|
| 100 Mbps switch (old hardware) | ~12 MB/s | Small models, image classification; not video or large language data |
| Gigabit Ethernet (GbE) | ~115 MB/s | Most training workloads; limited for multi-worker data loading |
| 2.5 GbE | ~290 MB/s | Comfortable for most deep learning data pipelines |
| 10 GbE | ~1.1 GB/s | High-throughput training, video datasets, multi-GPU setups |
If training feels slow and the GPU utilisation is low, check whether the data loader is waiting on disk or network I/O. A simple test: copy a large file from the NAS to the workstation and measure the transfer rate. If it is below 100 MB/s on a GbE link, investigate cable quality, NAS storage speed, and switch quality before assuming the ISP is the problem.
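The copy test described above can also be scripted. A minimal sketch that reads a file from a mounted NAS share and reports the sustained rate in MB/s; the function name and the example path are hypothetical. It assumes the file has not been read recently, since the operating system's page cache can make a repeated read look much faster than the network really is.

```python
import time


def measure_read_mb_per_s(path, chunk_size=8 * 1024 * 1024):
    """Read the file at path sequentially and return throughput in MB/s (decimal MB)."""
    total_bytes = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total_bytes += len(chunk)
    # Guard against a zero interval on very small files.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total_bytes / 1e6 / elapsed
```

Calling `measure_read_mb_per_s("/mnt/nas/datasets/sample.tar")` (a placeholder path) on a file of several gigabytes gives a number to compare against the table: a result near 115 MB/s means a GbE link is already saturated, while a much lower figure suggests the cable, switch, or NAS disks are the limit.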
Recommended Home Setup
Use Ethernet for every AI/ML workstation. Wi-Fi introduces jitter and variable throughput that are manageable for web browsing but disruptive for large transfers and remote sessions. Invest in:
- A wired NAS for dataset storage, with GbE minimum and 2.5 GbE or 10 GbE if you use large datasets regularly
- A router that can sustain your plan's full speed in both directions, ideally with QoS rules that stop dataset downloads from saturating the connection during meetings
- Fiber internet if available — symmetric speeds mean upload for checkpoints matches download for datasets
- A plan without a tight data cap, or a business plan with higher or no cap
Workflow Tips
- Keep commonly used datasets and model weights local when licensing allows — re-downloading a 100 GB dataset repeatedly wastes time and cap allowance.
- Set up a clear directory structure on the NAS early, before downloads scatter across workstation drives and laptops.
- Schedule large dataset pulls and checkpoint uploads outside peak working hours to avoid competing with meetings and collaborative sessions.
- Use rsync or rclone for dataset transfers; they are resumable and more efficient than browser downloads for large files.
- Pause cloud sync clients during training runs that push checkpoints; multiple simultaneous upload streams can saturate even a fast connection.
- Test both ISP speed (from workstation) and LAN speed (NAS transfer rate) before concluding the ISP is the bottleneck.
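For scheduled transfers it can help to wrap the rsync or rclone invocation in a small script so the same flags are used every time. The sketch below only builds and optionally runs the command line; the helper names, the default of four parallel transfers, and the example paths are assumptions for illustration. It uses rsync's `-a`, `--partial`, and `--progress` flags and rclone's `copy` subcommand with `--transfers` and `--progress`.

```python
import shlex
import subprocess


def rsync_command(source, destination):
    """Build an rsync command that keeps partial files so a retry can resume."""
    return ["rsync", "-a", "--partial", "--progress", source, destination]


def rclone_command(source, destination, transfers=4):
    """Build an rclone copy command with a fixed number of parallel transfers."""
    return ["rclone", "copy", source, destination,
            "--transfers", str(transfers), "--progress"]


def run(cmd, dry_run=True):
    """Return the shell-quoted command; execute it only when dry_run is False."""
    printable = shlex.join(cmd)
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return printable
```

Running `run(rsync_command("nas:/datasets/imagenet/", "/data/imagenet/"))` with the default `dry_run=True` just returns the command string, which is a convenient way to review it before putting it in a nightly job.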
Frequently Asked Questions
How much internet speed do I need for AI and ML work?
For serious dataset and cloud work, 200–500 Mbps download removes most waiting. Strong upload (50+ Mbps) matters as much if you push checkpoints and artifacts to cloud storage. Data cap generosity matters more than raw speed if you pull large datasets regularly.
Does internet speed affect local model training?
Not directly — once data is local, the GPU, CPU, and local storage determine training speed. Internet matters for everything around training: pulling datasets, installing packages, uploading checkpoints, using remote GPUs, collaborating via cloud tools, and backing up artifacts.
Should AI workstations use Ethernet?
Yes. Ethernet eliminates the variable throughput and latency of Wi-Fi, which is particularly noticeable during large NAS transfers, remote GPU sessions, and simultaneous uploads. Use Ethernet for any machine that runs training jobs or handles large file transfers regularly.
What if my ISP has a data cap?
Track your monthly usage carefully — ML dataset pulls and checkpoint sync can exhaust a 1 TB cap in days of active work. Schedule large pulls overnight, use compression where datasets support it, keep commonly needed data local, and consider a business plan or unlimited residential plan if you consistently exceed the cap.