
The Evolution of File Systems

From Sequential Tapes to Cloud Storage

Paul Krzyzanowski – August 26, 2025

When I was putting together an overview of the history of operating systems, it occurred to me that file systems deserve their own story. They are, after all, a central component of any operating system and one that is especially visible to both users and programs. We experience them every time we save a document, organize folders, or retrieve data. Unlike some internal mechanisms of an operating system, file systems are part of everyday computing life.

This article is a companion to the operating systems overview. My goal is not to dissect the technical design of any one file system but to follow their broader evolution. File systems have gone from sequential tapes to hierarchical directories, and then on to journaling, copy-on-write, distributed architectures, and today’s cloud-native object storage. Each stage reflects the needs and limitations of its time, while laying the foundation for new ways of using computers.

As with the operating systems article, this is not a comprehensive or scholarly survey. It highlights major shifts and influential ideas, but it may skip over smaller contributions and niche systems. The intent is to capture the big picture: how the challenges of storage, reliability, and usability shaped the way file systems developed over the past seven decades.

The Evolution of File Systems

File systems represent one of the most fundamental aspects of computing: how we organize, store, and retrieve persistent data. Their evolution parallels the broader history of computing, with each generation solving specific problems while introducing new capabilities that enabled entirely new types of applications.

Sequential and Flat File Systems (1950s-1960s)

Early computers treated storage as sequential media: magnetic tapes that had to be read through from the beginning to find what you wanted.

Tape-based systems used various magnetic tape formats:

The process was cumbersome: to access a file in the middle of a tape, the computer had to read through all the preceding files, which could take many minutes for a single file access.

Early disk systems like IBM's RAMAC (1956) introduced random access storage:

The main limitation was scalability: as the number of files grew, finding specific files became increasingly difficult. Every file existed in a single, flat namespace with no organizational structure.

The Hierarchical Breakthrough (1960s-1970s)

The revolutionary insight came from recognizing that files should be organized like documents in filing cabinets, with folders, subfolders, and logical groupings:

Multics (1969) pioneered the hierarchical file system:

Unix (1970s) refined and popularized hierarchical file systems with elegant simplicity:

Key insight: Hierarchical organization matches how humans naturally organize information, making computers much more intuitive to use. The inode system's separation of names from data also provided flexibility that earlier systems lacked.
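
To make that separation of names from data concrete, here is a toy Python sketch (not real kernel code): a directory is just a table mapping names to inode numbers, and because two names can map to the same inode, hard links come almost for free. The file names and structures below are invented for illustration.

class Inode:
    def __init__(self, data):
        self.data = data        # stand-in for the file's data blocks
        self.link_count = 0     # how many directory entries point here

inodes = {}                     # inode number -> Inode (the "inode table")
root_dir = {}                   # name -> inode number (a directory is just this map)

def link(name, ino):
    root_dir[name] = ino
    inodes[ino].link_count += 1

def create(name, data):
    ino = len(inodes) + 1       # next free inode number
    inodes[ino] = Inode(data)
    link(name, ino)
    return ino

ino = create("notes.txt", b"hello")
link("backup.txt", ino)         # a hard link: second name, same inode
assert inodes[root_dir["backup.txt"]].data == b"hello"
assert inodes[ino].link_count == 2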

Berkeley Fast File System (1980s)

While Unix's hierarchical file system was conceptually elegant, early implementations had serious performance problems. The original Unix file system scattered related data randomly across the disk, causing excessive seek times that made file access painfully slow.

Storage technology context:

Berkeley Fast File System (FFS) (1984), developed by Marshall Kirk McKusick and colleagues at UC Berkeley, revolutionized Unix file system performance:

Cylinder groups: The breakthrough insight was organizing the disk into cylinder groups, regions containing related files and directories:
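
The placement policy can be illustrated with a small, purely hypothetical Python sketch: new blocks are allocated from the cylinder group that holds the file's directory whenever possible, spilling into other groups only when it fills up. The group count and sizes are made up.

GROUPS = 8                      # number of cylinder groups (arbitrary)
BLOCKS_PER_GROUP = 16           # free blocks per group (arbitrary)
free = {g: set(range(BLOCKS_PER_GROUP)) for g in range(GROUPS)}

def alloc_block(preferred):
    # Try the directory's group first, then fall back to the others.
    for g in [preferred] + [g for g in range(GROUPS) if g != preferred]:
        if free[g]:
            return (g, free[g].pop())
    raise OSError("disk full")

dir_group = 3                   # cylinder group holding the parent directory
file_blocks = [alloc_block(dir_group) for _ in range(5)]
print(file_blocks)              # all five blocks land in group 3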

Block size optimization:

Performance improvements were dramatic:

Industry impact: FFS became the foundation for most Unix file systems (ext2/ext3/ext4 on Linux, UFS on Solaris) and established performance optimization as a core file system concern.

The key lesson: Conceptual elegance must be combined with understanding of physical storage characteristics to achieve good performance.

Local File System Sophistication (1980s-1990s)

As personal computers proliferated and storage technology advanced, file systems gained sophisticated features to improve reliability and performance:

Storage media evolution:

File Allocation Table (FAT) systems for early PCs:
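
The core FAT mechanism is simple enough to sketch: a file's directory entry records its first cluster, and the allocation table chains each cluster to the next until an end-of-chain marker. The cluster numbers below are invented for illustration.

EOF = -1                        # end-of-chain marker
fat = {5: 9, 9: 2, 2: EOF}      # the allocation table: cluster -> next cluster
start_cluster = 5               # recorded in the file's directory entry

def clusters_of(start):
    chain, c = [], start
    while c != EOF:
        chain.append(c)
        c = fat[c]
    return chain

print(clusters_of(start_cluster))   # [5, 9, 2]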

Advanced file allocation schemes emerged to solve FAT's limitations:

Journaling file systems like ext3, NTFS, and HFS+ addressed the critical problem of corruption during system crashes:
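
The core journaling technique can be sketched in a few lines of Python. This is a simplified model, not how ext3 or NTFS are actually implemented: an update is recorded in a log before it touches the main structures, so recovery after a crash only has to replay committed log entries and discard the rest.

journal = []                    # stand-in for the on-disk journal
metadata = {}                   # stand-in for the file system's main structures

def journaled_update(key, value):
    entry = {"key": key, "value": value, "committed": False}
    journal.append(entry)       # 1. record the intent in the journal
    entry["committed"] = True   # 2. write the commit record
    metadata[key] = value       # 3. apply the change to its real location
    journal.remove(entry)       # 4. the entry can now be discarded

def recover():
    # After a crash: replay committed entries, drop uncommitted ones.
    for entry in list(journal):
        if entry["committed"]:
            metadata[entry["key"]] = entry["value"]
        journal.remove(entry)

journaled_update("/home/alice/notes.txt", "points to inode 42")
recover()                       # harmless here; the journal is already empty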

The driving force was protecting increasingly valuable data as computers moved from hobbyist tools to critical business systems.

Copy-on-Write and Advanced Features (1990s-2000s)

Storage technology continued advancing dramatically during this period:

Next-generation file systems introduced sophisticated data management capabilities:

Copy-on-write file systems like ZFS (2001) and Btrfs (2007) revolutionized data integrity:
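
A minimal sketch of the copy-on-write idea (again a toy model, not ZFS or Btrfs code): data blocks are never overwritten in place, so a snapshot is nothing more than a saved list of block pointers.

blocks = {}                     # block id -> contents ("the disk")
next_id = 0

def write_block(data):
    global next_id
    next_id += 1                # always a brand-new block, never an overwrite
    blocks[next_id] = data
    return next_id

file_v1 = [write_block(b"AAAA"), write_block(b"BBBB")]
snapshot = list(file_v1)        # a snapshot is just a saved copy of the pointers

# "Modify" the second block: write a new one and repoint the live file.
file_v2 = [file_v1[0], write_block(b"bbbb")]

assert blocks[snapshot[1]] == b"BBBB"   # the snapshot still sees the old data
assert blocks[file_v2[1]] == b"bbbb"    # the live file sees the new data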

Log-Structured File Systems emerged to solve the problem of random write performance:

Sprite LFS (1991) and NetApp WAFL introduced the revolutionary concept of treating the entire disk as a circular log:

- Sequential writes only: all changes written sequentially to improve performance
- Garbage collection: reclaiming space from obsolete data
- Crash recovery: simple reconstruction from the sequential log
- Write optimization: particularly beneficial for small, random writes
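
The mechanism is easy to model. The Python sketch below is a toy version of the idea, not Sprite LFS itself: every write appends to the log, an index tracks the newest copy of each block, and a cleaner copies live data forward to reclaim space.

log = []                        # append-only "disk"
latest = {}                     # block id -> position of its newest copy in the log

def write(block_id, data):
    log.append((block_id, data))        # every write is a sequential append
    latest[block_id] = len(log) - 1

def read(block_id):
    return log[latest[block_id]][1]

def clean():
    # Garbage collection: copy only live data into a fresh log.
    global log, latest
    live = [(bid, log[pos][1]) for bid, pos in latest.items()]
    log = live
    latest = {bid: i for i, (bid, _) in enumerate(live)}

write("inode 7", b"v1")
write("inode 7", b"v2")         # the older copy is now garbage in the log
assert read("inode 7") == b"v2"
clean()
assert len(log) == 1            # the stale copy has been reclaimed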

Benefits:

Modern influence: LFS concepts heavily influenced SSD-optimized file systems and database storage engines.

Versioning File Systems provided time-travel capabilities:

Advanced metadata features became critical as file systems grew more complex:

Extended attributes solved a limitation of traditional file systems, which could store only basic metadata (name, size, timestamps, permissions); they allow attaching arbitrary key-value pairs to files.
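
On Linux, extended attributes are exposed through system calls that Python wraps in the os module. The short example below should work on a Linux file system with xattr support; unprivileged attributes live in the "user." namespace, and the file name and attribute value here are made up.

import os

path = "report.pdf"
open(path, "a").close()                           # make sure the file exists

# Attach a key-value pair to the file, separate from its contents.
os.setxattr(path, "user.origin", b"downloaded from intranet")

print(os.listxattr(path))                         # ['user.origin', ...]
print(os.getxattr(path, "user.origin"))           # b'downloaded from intranet'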

Access Control Lists (ACLs) extended beyond Unix's simple read/write/execute permissions to support complex organizational security requirements:
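
A toy sketch of the evaluation model: the file carries a list of (principal, permissions) entries that are checked against the requester. Real POSIX and NTFS ACLs have more elaborate ordering and masking rules; the names and entries below are invented.

acl = [
    ("user:alice",  {"read", "write"}),
    ("group:audit", {"read"}),
    ("other",       set()),
]

def allowed(principals, wanted):
    # Grant access if any entry matching the requester carries the permission.
    return any(who in principals and wanted in perms for who, perms in acl)

print(allowed({"user:alice", "group:staff"}, "write"))   # True
print(allowed({"user:bob", "group:audit"}, "write"))     # False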

Apple’s APFS (2017), the successor to HFS+, introduced solid-state drive optimizations:

Network File Systems (1980s-1990s)

As computing became networked, file systems needed to span multiple machines, creating entirely new challenges around consistency, performance, and security:

Network infrastructure of the era:

Network File System (NFS) (1984) by Sun Microsystems enabled the first practical networked file access:

NFS limitations included:

Andrew File System (AFS) (1985) addressed many NFS limitations:

Server Message Block (SMB/CIFS) became dominant in Windows environments:

Distributed File Systems for Big Data (2000s)

The explosion of data at companies like Google and Yahoo drove the development of file systems designed to handle massive scale:

Storage technology advances:

Google File System (GFS) (2003) pioneered big data storage:

Hadoop Distributed File System (HDFS) (2006) brought GFS concepts to open source:
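
The layout idea behind both systems can be sketched in a few lines: files are cut into large fixed-size chunks, each chunk is stored on several data nodes, and a central metadata server only remembers the mapping. The replica count and node names below are arbitrary choices for illustration.

import itertools

CHUNK_SIZE = 64 * 2**20                 # 64 MB chunks, GFS's original size
REPLICAS = 3
nodes = itertools.cycle(["node-a", "node-b", "node-c", "node-d", "node-e"])

def place(file_size):
    n_chunks = -(-file_size // CHUNK_SIZE)          # ceiling division
    return {chunk: [next(nodes) for _ in range(REPLICAS)]
            for chunk in range(n_chunks)}

# A 200 MB file becomes 4 chunks, each stored on 3 different data nodes;
# only this small mapping needs to live on the master / namenode.
for chunk, replicas in place(200 * 2**20).items():
    print(chunk, replicas)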

Parallel File Systems served high-performance computing needs:

Lustre (2003) and IBM GPFS (now Spectrum Scale) addressed supercomputing requirements:

Key insights:

Object Storage Revolution (2000s-Present)

Cloud computing drove a fundamental rethinking of file system architecture, moving away from traditional hierarchical file systems toward web-native storage models:

Infrastructure evolution:

Amazon S3 (2006) popularized object storage:

Object storage characteristics:
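
To contrast this with a hierarchical file system, here is a toy model of the object interface: a flat map from keys to objects plus metadata, accessed with put/get/list rather than open/seek/write, where a "folder" is nothing more than a shared key prefix. The keys and metadata are invented.

store = {}                              # key -> (data, metadata): a flat namespace

def put(key, data, **metadata):
    store[key] = (data, metadata)

def get(key):
    return store[key]

def list_keys(prefix=""):
    return sorted(k for k in store if k.startswith(prefix))

put("photos/2025/cat.jpg", b"...", content_type="image/jpeg")
put("photos/2025/dog.jpg", b"...", content_type="image/jpeg")
put("backups/db.dump", b"...", retention="30d")

print(list_keys("photos/2025/"))        # a prefix listing stands in for a folder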

Software-defined storage systems like Ceph (2006) and GlusterFS (2005):

Container and Cloud-Native Storage (2010s-Present)

Modern application deployment through containers and microservices created new storage requirements:

Current storage technology:

Flash-Optimized File Systems emerged as SSDs became mainstream:

F2FS (Flash-Friendly File System, 2012) designed specifically for flash characteristics:

JFFS/JFFS2 served embedded flash storage:

Union and Overlay File Systems enabled new deployment models:

OverlayFS and UnionFS allow combining multiple file systems:
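
The lookup rule is simple enough to sketch: check the writable upper layer first, fall back to the read-only lower layer, and treat deletions as whiteout markers. This toy model ignores the many details of the real implementations; the paths are invented.

lower = {"/etc/hosts": "127.0.0.1 localhost", "/bin/tool": "<binary>"}   # read-only image
upper = {"/etc/hosts": "127.0.0.1 localhost myapp"}                      # writable layer
whiteouts = {"/bin/tool"}               # marks a lower-layer file as deleted

def read(path):
    if path in whiteouts:
        raise FileNotFoundError(path)
    if path in upper:
        return upper[path]              # the upper layer wins
    return lower[path]                  # otherwise fall through to the lower layer

print(read("/etc/hosts"))               # the modified copy from the upper layer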

Container-native storage addresses modern application deployment:

Modern challenges include:

Emerging technologies:

Research Directions and Experimental Systems

Several file system concepts have emerged from research that, while not achieving widespread adoption, represent important explorations of alternative approaches:

Database-Integrated File Systems attempted to merge file systems with database capabilities:

WinFS: Microsoft’s cancelled project for Windows Vista aimed to replace traditional file organization with a database-backed storage system layered over NTFS:

BeFS (BeOS File System) provided database-like functionality:

Semantic File Systems explored content-based organization:

These systems failed to achieve adoption due to:

Decentralized and Content-Addressed Storage represents current research into distributed alternatives:

InterPlanetary File System (IPFS) (2014), created by Juan Benet, represents a potential paradigm shift toward content-addressed storage:

IPFS characteristics:

Related decentralized storage projects:

Key insight: Moving from location-based addressing (URLs, file paths) to content-based addressing could solve problems of link rot, censorship, and data permanence that plague current web infrastructure.
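
The addressing model itself is easy to demonstrate. In the simplified sketch below, an object's address is just the SHA-256 hash of its bytes; real IPFS wraps the hash in a multihash-encoded CID, but the principle is the same: identical content yields an identical, verifiable address.

import hashlib

store = {}

def put(data):
    cid = hashlib.sha256(data).hexdigest()          # the address is derived from the content
    store[cid] = data
    return cid

def get(cid):
    data = store[cid]
    assert hashlib.sha256(data).hexdigest() == cid  # every fetch is self-verifying
    return data

addr = put(b"hello, distributed web")
print(addr)                     # identical bytes always yield this same address
print(get(addr))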

Current status: While still experimental, IPFS has achieved adoption in blockchain applications, distributed web projects, and research institutions. Whether content-addressed storage becomes mainstream depends on solving usability and performance challenges while maintaining decentralization benefits.

Current Trends and Future Directions

Modern file systems must address unprecedented challenges:

Performance and Scale:

Security and Privacy:

Artificial Intelligence Integration:

Sustainability:

Conclusion

File system evolution demonstrates a fascinating pattern: each generation solved the scalability and reliability problems of the previous generation while introducing new capabilities that enabled entirely new types of applications. We've progressed from sequential tapes that could store a few files to distributed systems managing exabytes across the globe.

The progression shows how technical constraints drive innovation: the need to find files quickly led to hierarchical directories, network requirements drove distributed file systems, and cloud computing created object storage. Today's file systems must balance traditional POSIX compatibility with cloud-scale performance, security, and global accessibility.

Looking forward, file systems continue evolving toward greater automation, intelligence, and integration with cloud services. The fundamental challenge remains the same as in the 1950s: how to efficiently organize and retrieve information. But the scale and complexity have grown exponentially. Modern file systems don’t just store data; they actively manage, protect, and optimize it across global infrastructure while maintaining the simple abstraction of "files and folders" that users understand.