RAID vs Erasure Coding
RAID was designed for arrays of 4-16 disks attached to a single controller. Erasure coding generalizes the same idea to distributed systems with hundreds or thousands of disks. The two share the underlying math — split data into chunks plus parity, lose some chunks, reconstruct the rest — but operate at different scales and with different operational profiles. Modern storage uses both: RAID for the home NAS and small server arrays, erasure coding for object storage and large distributed systems.
The shared idea
Both RAID and erasure coding answer the same question: how do we tolerate drive failures without keeping a full duplicate of every byte? The answer in both cases: store data plus computed parity. If a drive fails, reconstruct its data from the remaining drives plus parity.
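RAID-5 is the simplest example: an N+1 layout that holds N drives' worth of data plus one drive's worth of parity, with the parity blocks rotated across all drives rather than kept on a dedicated one. Lose any one drive and the array reconstructs its contents from the rest. Erasure coding generalizes this to N+M for any N and M.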
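To make the N+1 idea concrete, here is a minimal sketch of XOR parity over three small data blocks. The block contents and the `xor_blocks` helper are illustrative assumptions, not taken from any particular RAID implementation.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# N = 3 data blocks (made-up contents) plus one computed parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing block 1, then rebuild it from the survivors plus parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data[1])
```

XOR is its own inverse, so XORing the surviving blocks with the parity cancels everything except the missing block. The same property means a single parity block can only ever cover one loss.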
RAID at a glance
| Level | Layout | Failure tolerance | Overhead |
|---|---|---|---|
| RAID-0 | Striped, no parity | None (any drive loss = total loss) | 0% |
| RAID-1 | Mirrored | One drive per mirror pair | 50% |
| RAID-5 | N+1 parity | One drive | 1/(N+1) |
| RAID-6 | N+2 parity | Two drives | 2/(N+2) |
| RAID-10 | Striped mirrors | One per mirror | 50% |
For details see RAID levels explained.
Erasure coding at a glance
Erasure coding generalizes RAID:
- 10+4: 10 data chunks plus 4 parity chunks. Can lose any 4 of 14. Overhead 29%.
- 8+3: 8 data chunks plus 3 parity chunks. Can lose any 3 of 11. Overhead 27%.
- 16+4: 16 data chunks plus 4 parity chunks. Can lose any 4 of 20. Overhead 20%.
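The overhead figures in this list follow from one ratio: parity chunks divided by total chunks. A small sketch (the `overhead` function name is just for illustration):

```python
def overhead(n_data, m_parity):
    """Fraction of raw capacity consumed by parity in an N+M layout."""
    return m_parity / (n_data + m_parity)

for n, m in [(10, 4), (8, 3), (16, 4)]:
    print(f"{n}+{m}: tolerates {m} lost chunks, overhead {overhead(n, m):.0%}")
```

Widening the stripe (more data chunks per parity chunk) lowers the overhead, at the cost of touching more disks for every stripe.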
The math (Reed-Solomon, Cauchy-Reed-Solomon, locally-repairable codes) is more involved than RAID-5 parity, but the principle is the same. What the extra machinery buys is flexibility in choosing the overhead vs durability tradeoff.
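As a rough illustration of how "any N of N+M" recovery can work, the sketch below treats the data symbols as points on a polynomial and uses extra points as parity, which is the idea behind Reed-Solomon codes. It is a toy: it works over the prime field mod 257 for readability, whereas production systems typically use GF(2^8) so every symbol fits in a byte. The function names are assumptions made for this example, and it needs Python 3.8+ for the modular inverse via `pow`.

```python
P = 257  # toy prime field; parity symbols may not fit in one byte here

def interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    that passes through the given (xi, yi) points, modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def encode(data, m_parity):
    """Keep the data symbols as-is and append m_parity parity symbols."""
    n = len(data)
    points = list(enumerate(data))
    return list(data) + [interpolate(points, x) for x in range(n, n + m_parity)]

def decode(surviving, n_data):
    """Rebuild the data from a dict {chunk_index: symbol} with >= n_data entries."""
    points = list(surviving.items())[:n_data]
    return [interpolate(points, x) for x in range(n_data)]

data = [72, 105, 33, 7]            # N = 4 data symbols (made-up values)
chunks = encode(data, m_parity=2)  # 4+2 layout

# Lose chunks 0 and 2; any four survivors are enough to rebuild the data.
surviving = {i: chunks[i] for i in (1, 3, 4, 5)}
print(decode(surviving, n_data=4) == data)
```

Any N points determine a polynomial of degree below N, so it does not matter which chunks survive, only how many.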
Where each shines
| Property | RAID | Erasure coding |
|---|---|---|
| Scale | Small arrays (4-24 disks) | Large distributed systems (hundreds+) |
| Implementation | Single controller or md/dm | Distributed across many nodes |
| Operational complexity | Lower | Higher |
| Write amplification | Moderate (RAID-5 read-modify-write) | Higher (compute many parity chunks) |
| Rebuild speed | Slow on large drives (multi-day) | Faster (distributed reads from many disks) |
| Failure model | Drive failure within array | Drive, node, rack, datacenter — configurable |
Why RAID-5 doesn't scale
RAID-5 with multi-TB disks has a known problem: during rebuild, the array reads every remaining disk in full to reconstruct the failed one. Unrecoverable read errors (UREs) become statistically significant once a rebuild has to read tens of terabytes: consumer drives are commonly specified at no more than one URE per 10^14 bits read, which is roughly 12.5 TB. The bigger the drives and the older the array, the higher the chance of hitting a URE during rebuild, and with no redundancy left a single URE can fail the rebuild and take the whole array down.
This is why RAID-6 (two parity, can survive one URE during rebuild) became standard for arrays above ~10 TB capacity, and why RAID-5 is generally not recommended for arrays with disks larger than ~2-4 TB.
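The URE risk can be estimated with a back-of-the-envelope model. The sketch below assumes a rate of one URE per 10^14 bits read (a commonly quoted consumer-drive spec, used here as an assumption) and treats errors as independent per bit. Real drives often do better than their spec sheet, so read the output as a pessimistic bound rather than a prediction.

```python
import math

def p_ure_during_rebuild(surviving_disks, disk_tb, ure_per_bit=1e-14):
    """Probability of at least one URE while reading every surviving disk in full.

    ure_per_bit is an assumed spec-sheet rate, not a measured value.
    """
    bits_read = surviving_disks * disk_tb * 1e12 * 8
    # 1 - (1 - p)^bits, computed in a numerically stable way.
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))

for disk_tb in (1, 4, 12):
    p = p_ure_during_rebuild(surviving_disks=3, disk_tb=disk_tb)
    print(f"4-disk RAID-5 with {disk_tb} TB disks: P(URE during rebuild) ~ {p:.0%}")
```

Under these assumptions a 4-disk RAID-5 of 4 TB drives has a better-than-even chance of hitting a URE during rebuild, which is the reasoning behind the move to a second parity.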
Why erasure coding scales better
Erasure coding sidesteps the URE problem in two ways:
- More parity. 10+4 tolerates 4 simultaneous failures including UREs during rebuild.
- Distributed across many disks. Rebuild reads from many disks in parallel, each contributing a small portion. Rebuild time is bounded by the slowest of many reads, not the sum.
For storage systems with 1000+ disks, single-disk failures are routine and constant. Erasure coding handles this baseline rate transparently.
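The effect of parallel rebuild is simple arithmetic. The sketch below assumes a sustained 150 MB/s per disk and a perfectly even spread of rebuild work. Both numbers are illustrative, and real rebuilds are usually throttled to protect foreground I/O, which is how a one-day theoretical rebuild turns into several days.

```python
def rebuild_hours(data_tb, mb_per_s, parallel_disks=1):
    """Hours needed to move data_tb of rebuild traffic at mb_per_s per disk."""
    seconds = data_tb * 1e6 / (mb_per_s * parallel_disks)
    return seconds / 3600

# Classic RAID: one replacement disk absorbs every rebuilt byte.
print(f"single target disk: {rebuild_hours(16, 150):.1f} h")

# Distributed erasure coding: the lost chunks are re-created across many disks.
print(f"spread over 100 disks: {rebuild_hours(16, 150, parallel_disks=100):.2f} h")
```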
The compute cost
RAID-5 parity is XOR — extremely fast in software, often offloaded to controllers. RAID-6 adds a second parity computation (Reed-Solomon over GF(2^8)), still fast.
Erasure coding for N+M with M>2 involves more parity computations. Modern CPUs with SIMD instructions handle this efficiently; specialized hardware (FPGAs in some object storage systems) accelerates it further. For most workloads on modern CPUs, the compute cost is in the single-digit percent range.
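To show what the second RAID-6 parity involves, here is a minimal sketch of P and Q computed byte by byte. It assumes the reducing polynomial 0x11D and generator 2, the combination used by Linux software RAID-6. The helper names are illustrative, and real implementations use lookup tables or SIMD rather than bit-by-bit loops.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) with reducing polynomial 0x11D."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result

def gf_pow(a, n):
    """Raise a to the n-th power in GF(2^8) by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result

def gf_inv(a):
    """Inverse of a nonzero element: a^254, because a^255 == 1."""
    return gf_pow(a, 254)

def pq_parity(blocks):
    """Return (P, Q): P is plain XOR, Q weights block i by 2^i in GF(2^8)."""
    size = len(blocks[0])
    p, q = bytearray(size), bytearray(size)
    for i, block in enumerate(blocks):
        coeff = gf_pow(2, i)
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)

def recover_from_q(blocks, lost, q):
    """Rebuild data block `lost` from Q alone, as if P were also gone."""
    acc = bytearray(q)
    for i, block in enumerate(blocks):
        if i == lost:
            continue  # the content at this index is treated as unavailable
        coeff = gf_pow(2, i)
        for j, byte in enumerate(block):
            acc[j] ^= gf_mul(coeff, byte)
    inv = gf_inv(gf_pow(2, lost))
    return bytes(gf_mul(inv, x) for x in acc)

data = [b"raid", b"six!", b"demo"]  # made-up data blocks
p, q = pq_parity(data)
print(recover_from_q(data, lost=1, q=q) == data[1])
```

Because Q uses a different weight for each disk, P and Q give two independent equations per byte, which is what allows two simultaneous losses to be solved.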
Where in stack each lives
- Hardware RAID controller. RAID levels in firmware. Bound to specific controller hardware.
- Software RAID (mdadm). RAID levels in the OS. Portable across hardware.
- ZFS RAID-Z. RAID-equivalent integrated with the filesystem. Solves the RAID-5 write hole.
- Object storage (Ceph, MinIO, S3 internally). Erasure coding. Operates across nodes, not just disks.
- Distributed filesystems (HDFS, GlusterFS). Can use replication or erasure coding depending on configuration.
Failure domains
RAID protects against disk failures within one chassis. Erasure coding can be configured to protect across larger failure domains:
- 10+4 across 14 servers in one rack — survives 4 server failures but not the rack going down.
- 10+4 across 14 racks — survives 4 rack failures.
- 10+4 across multiple regions — survives losing entire datacenters.
The coding math is the same in each case; only the placement constraints differ. Traditional RAID cannot do this because the array is defined as a set of disks attached to one controller or host.
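The placement examples above reduce to one check: in the worst case, how many whole failure domains can be lost before more than M chunks are gone? A minimal sketch, where the domain labels and the function name are hypothetical:

```python
from collections import Counter

def domain_failures_tolerated(placement, m_parity):
    """Worst-case number of whole failure domains that can be lost.

    placement holds one domain label per chunk (for example a rack name).
    The worst case is that the domains holding the most chunks fail first.
    """
    lost_chunks = 0
    tolerated = 0
    for count in sorted(Counter(placement).values(), reverse=True):
        lost_chunks += count
        if lost_chunks > m_parity:
            break
        tolerated += 1
    return tolerated

one_rack = ["rack-0"] * 14                            # all 14 chunks in one rack
fourteen_racks = [f"rack-{i}" for i in range(14)]     # one chunk per rack
seven_racks = [f"rack-{i % 7}" for i in range(14)]    # two chunks per rack

for name, placement in [("one rack", one_rack),
                        ("14 racks", fourteen_racks),
                        ("7 racks", seven_racks)]:
    print(f"10+4 over {name}: survives "
          f"{domain_failures_tolerated(placement, 4)} rack failure(s)")
```

The same 10+4 code gives very different guarantees depending on how chunks are spread, so placement rules matter as much as the choice of N and M.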
Object storage and erasure coding
AWS S3, Google Cloud Storage, Azure Blob Storage all use erasure coding internally. They publish durability targets like "11 nines" (99.999999999%) — which is the math working: enough parity that the probability of losing data is one in a hundred billion per object per year.
The user doesn't choose N and M; the provider tunes it for the chosen durability target. For 11 nines you need a lot of parity across a lot of failure domains.
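A toy model gives a feel for how parity turns into "nines". The sketch below assumes independent chunk failures and asks how likely it is that more than M of the N+M chunks in a stripe fail within a single repair window. The 2% annual failure rate and one-day repair window are illustrative assumptions, and this is not how any provider actually derives its published figures.

```python
from math import comb

def p_stripe_loss(n_data, m_parity, p_chunk):
    """Probability that more than m_parity of n_data + m_parity chunks fail,
    assuming each chunk fails independently with probability p_chunk."""
    total = n_data + m_parity
    return sum(
        comb(total, k) * p_chunk**k * (1 - p_chunk) ** (total - k)
        for k in range(m_parity + 1, total + 1)
    )

# Assumed inputs: 2% annual disk failure rate, failed chunks repaired within a day.
p_chunk = 0.02 / 365

for n, m in [(4, 1), (6, 2), (10, 4)]:
    print(f"{n}+{m}: P(stripe loss per repair window) = {p_stripe_loss(n, m, p_chunk):.3g}")
```

Each additional parity chunk multiplies the loss probability by roughly another factor of `p_chunk`, which is why a handful of parity chunks is enough to reach very high durability when repairs are fast. Correlated failures (a bad drive batch, a rack power event) break the independence assumption, which is one reason providers also spread chunks across failure domains.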
When to use which
- Small home / SMB NAS (4-24 disks): RAID-Z2 or RAID-6. Operational simplicity wins.
- Enterprise storage array (24-256 disks): RAID-6 or vendor-proprietary protection schemes. Some support erasure coding.
- Large distributed object storage (hundreds+ disks): Erasure coding. The scale demands it.
- Distributed file system across nodes: Erasure coding or replication depending on access patterns.
Frequently Asked Questions
What is erasure coding?
A redundancy scheme that splits data into N data chunks plus M parity chunks (often expressed as N+M, like 10+4). The system can reconstruct the original data from any N of the N+M chunks. Storage overhead is M/(N+M) — much less than RAID-1 mirroring while tolerating more failures than RAID-5/6 at the same overhead.
How is erasure coding different from RAID?
RAID is typically rigid — fixed schemes (RAID-1, 5, 6, 10) with specific overhead ratios and failure tolerance. Erasure coding is parameterized — you choose N and M to match the durability vs cost tradeoff you want. Erasure coding generalizes well across many disks; RAID is usually applied to small arrays of 4-16 disks. Object storage systems use erasure coding; traditional NAS uses RAID.
Which is more space-efficient?
Erasure coding can be much more space-efficient than mirroring. RAID-1 (mirror) is 50% overhead. RAID-5 gives up one disk's worth of capacity to parity. Erasure coding can do 10+4 (29% overhead) or 8+3 (27% overhead) while tolerating more disk failures than RAID-6's two-parity scheme. The tradeoff is compute cost (encoding is more expensive) and more complex rebuild patterns.
Why doesn't a small NAS use erasure coding?
The advantages of erasure coding show up at scale: many disks, distributed systems, large data volumes. For a 4-disk home NAS, RAID-5 or RAID-Z1 is functionally equivalent to a 3+1 erasure code and operationally simpler. Erasure coding shines in distributed object storage, where data is spread across hundreds of disks and any N of the N+M chunks can serve a read.
What is the rebuild cost difference?
RAID rebuilds typically read every block from every remaining disk to reconstruct the failed disk. This is slow and stressful for the array, and rebuilds on multi-TB disks can take days. Erasure-coded systems can rebuild more efficiently because they can pull from any N of the N+M chunks; rebuilds are distributed across many disks and complete faster. Some erasure-coded systems also rebuild lazily, fixing failures as data is read rather than scanning the entire failed disk's data.