The Google File System [PDF], by Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung of Google. This technical paper explains Google's custom, scalable cluster file system, which stores their gigantic database of the entire Web across thousands of low-cost PCs.
First, component failures are the norm rather than the exception. The file system consists of hundreds or even thousands of storage machines built from inexpensive commodity parts and is accessed by a comparable number of client machines. The quantity and quality of the components virtually guarantee that some are not functional at any given time and some will not recover from their current failures. We have seen problems caused by application bugs, operating system bugs, human errors, and the failures of disks, memory, connectors, networking, and power supplies. Therefore, constant monitoring, error detection, fault tolerance, and automatic recovery must be integral to the system.
Probably only interesting to real geeks.
Posted by Aaron Swartz on September 30, 2003 05:52 AM