Ceph: 20 Years of Cutting-Edge Storage at the Edge

Ceph started as a 40,000-line C++ implementation of the Ceph File System, and it has since evolved into a comprehensive storage solution used by organizations worldwide.

Sep 10th, 2024 5:55am by Steven J. Vaughan-Nichols

SUWON, South Korea — Dan van der Ster, CTO of CLYSO and Ceph Executive Council member, said in an OpenInfra Summit Asia keynote speech that as of 2023, “82% of open infrastructure users reported that they’re using Ceph” for data storage. In the beginning, though, Ceph was a University of California, Santa Cruz — Go Banana Slugs! — student project by Sage Weil, who started the project as part of his Ph.D. research.

While Ceph started as a 40,000-line C++ implementation of the Ceph File System (CephFS), it has since evolved into a comprehensive storage solution used by organizations worldwide.

From the start, Ceph had significant support from key institutions. From 2003 to 2007, Lawrence Livermore, Sandia and Los Alamos national laboratories backed Weil’s initial work. The goal at the time was to create a horizontally scalable, object-based file system for high-performance computing (HPC) workloads at data center scale.

Intelligence to the Edge

To do this, Weil took a novel approach. Instead of focusing on managing a large array of “dumb” disks, the idea was to push as much of the intelligence to the edges as possible. In addition, the design emphasized building consistent, reliable storage with no single point of failure.

These ideas made Ceph different from other storage approaches of its day, such as Lustre, Google File System (GFS) and Parallel Virtual File System (PVFS). They include the following features:

Distributed object storage: Ceph was designed from the ground up as a distributed object storage system, Reliable Autonomic Distributed Object Store (RADOS), rather than a traditional file system. This allowed it to scale to much larger capacities across multiple nodes.
Decoupled data and metadata: Ceph separated the management of file metadata from the storage of file data itself. This improved scalability by allowing metadata and data operations to be handled independently.
Dynamic distributed metadata management: Ceph used a novel approach called Dynamic Subtree Partitioning (DSP) to adaptively distribute metadata management across servers. This allowed it to scale metadata performance as the system grew.
CRUSH algorithm: Ceph introduced the Controlled Replication Under Scalable Hashing (CRUSH) algorithm to deterministically place data across the cluster. This eliminated the need for centralized data allocation tables.
Intelligent distributed object storage: Ceph delegated tasks like data migration, replication, failure detection and recovery to the storage nodes themselves, allowing the system to be more autonomous and scalable.
Unified storage: Ceph aimed to provide object, block and file storage interfaces from a single platform rather than separate systems for each.
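
To make the idea of deterministic placement concrete, here is a minimal sketch in Python. It is not CRUSH itself: it uses rendezvous (highest-random-weight) hashing as a much simpler stand-in, and the node names and the score and place functions are hypothetical. What it shares with the approach described above is that any client holding the same node list can compute where an object lives, so no central allocation table is needed.

```python
import hashlib


def score(object_name: str, node: str) -> int:
    """Deterministic pseudo-random score for an (object, node) pair."""
    digest = hashlib.sha256(f"{object_name}|{node}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(object_name: str, nodes: list[str], replicas: int = 3) -> list[str]:
    """Return the nodes that should hold the object, best score first.

    Rendezvous hashing, used here as a simplified stand-in for CRUSH:
    every caller ranks the same nodes the same way, so no lookup table
    has to be stored or consulted.
    """
    ranked = sorted(nodes, key=lambda n: score(object_name, n), reverse=True)
    return ranked[:replicas]


if __name__ == "__main__":
    # Hypothetical node names, for illustration only.
    nodes = [f"node-{i}" for i in range(6)]
    print("before:", place("photo-0001", nodes))

    # Adding a node can only displace the lowest-ranked replica;
    # the other copies stay where they are.
    print("after: ", place("photo-0001", nodes + ["node-6"]))
```

Because the ranking of the existing nodes never changes, adding a node moves at most one replica of any given object, which is the property that keeps rebalancing cheap in this simplified scheme.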

Then, between 2007 and 2011, DreamHost, a web hosting company co-founded by Weil, became a crucial supporter of Ceph’s development. During this time, Ceph’s core components gained stability, new features were implemented and a roadmap for the future was established. Key developers like Yehuda Sadeh-Weinraub, Gregory Farnum and Josh Durgin joined the project, contributing to its rapid growth.

As a Red Hat history of Ceph at 10 explained, “As Sage neared the end of his research, he started talking to many traditional storage vendors about Ceph and his work surrounding the project. After watching many of his peers get hired into industry and have their interesting and innovative work abandoned or absorbed into the large proprietary systems, he realized that the industry giants wanted ‘you’ and not your project.”

Figure: Ceph architecture

The Linux of Storage

As an open-source true believer who wanted Ceph to be the “Linux of storage,” Weil licensed Ceph under the LGPL version 2.1 in 2012. In addition, he didn’t require contributors to give their code copyright to the project.

At the same time, Weil founded Inktank to promote Ceph’s widespread adoption. This move brought enterprise-level support and expertise to Ceph implementations. The goal was to improve Ceph’s performance, make it production-ready and provide support for it.

Back then, Ceph wasn’t very fast on Linux. Its client relied on the slow Filesystem in Userspace (FUSE) instead of a Linux-native file system. Once performance had improved, Linus Torvalds merged Ceph into the mainline Linux kernel with the 2.6.34 release in 2010.

Inktank was successful in taking Ceph from academic and research organizations to businesses. Red Hat realized that storage would become increasingly important, saw Ceph’s growth, and decided to acquire Inktank in 2014.

Under Red Hat’s stewardship, Ceph became production-ready software for enterprises with professional support and continuous development. Within a few months of Red Hat acquiring Inktank, it released a major new Ceph version. As Red Hat CEO Jim Whitehurst told me in an interview at the time, “Red Hat is looking to create an open source stack for infrastructure and platform as a service.”

Over the years, Ceph has continued to see significant technical improvements. A major milestone was reached with Ceph version 12, Luminous, which introduced BlueStore. This storage back end manages SSDs and HDDs directly, without relying on conventional file systems. This innovation greatly enhanced Ceph’s performance and efficiency.

Since BlueStore consumes raw block devices and partitions, it avoids intervening layers of abstraction, such as local file systems, that can limit performance or add complexity. For storage metadata, BlueStore uses an embedded RocksDB key/value database. RocksDB holds the all-important mapping of object names to block locations on disk. All data and metadata are protected by one or more checksums, and no data or metadata is read from the disk or returned to the user without verification.
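
The verify-on-read idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not BlueStore's implementation: a bytearray stands in for the raw block device, a plain dict stands in for the RocksDB metadata store, zlib.crc32 stands in for whichever checksum the real system is configured to use, and the TinyObjectStore class is hypothetical.

```python
import zlib


class TinyObjectStore:
    """Toy object store: raw bytes plus a name-to-location metadata map."""

    def __init__(self) -> None:
        # Stand-in for a raw block device (an assumption for this sketch).
        self.device = bytearray()
        # Stand-in for the key/value metadata store:
        # object name -> (offset, length, checksum).
        self.metadata: dict[str, tuple[int, int, int]] = {}

    def write(self, name: str, payload: bytes) -> None:
        """Append the payload and record where it lives and its checksum."""
        offset = len(self.device)
        self.device += payload
        self.metadata[name] = (offset, len(payload), zlib.crc32(payload))

    def read(self, name: str) -> bytes:
        """Return the payload only if it still matches its checksum."""
        offset, length, expected = self.metadata[name]
        payload = bytes(self.device[offset:offset + length])
        if zlib.crc32(payload) != expected:
            raise IOError(f"checksum mismatch for object {name!r}")
        return payload


if __name__ == "__main__":
    store = TinyObjectStore()
    store.write("greeting", b"hello ceph")
    print(store.read("greeting"))

    # Simulate silent corruption by flipping one bit on the "device".
    store.device[0] ^= 0x01
    try:
        store.read("greeting")
    except IOError as err:
        print("refused to return bad data:", err)
```

The point of the design is that corruption is reported as an error rather than handed back to the caller as if it were valid data.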

The result, van der Ster said, is “magic,” which started with Ceph’s original concepts. “In the olden days of storage, you had legacy architecture with standby IP addresses, virtual IP and multipath. With CRUSH and DSP, you can specify where you have data centers, rooms and racks and make rules about where you put the data. This system is very fast and computes the locations of the objects very quickly. This means that you don’t need a large database to remember where all the objects are. You can work out where the data should be and find it quickly.”

Ceph also has its own approach to data replication and reliability. Van der Ster pointed out that there are two older approaches. One is “where you replicate disks, and if a disk fails, you can have its data. The other way you can do it is to replicate the objects.” That works but, he continued, “Both require keeping empty spare disks around, which wastes resources and maybe even more importantly, the recovery is very slow.”

“With Ceph,” van der Ster said, “you group objects into small groups. In practice, with Ceph, I’ve never seen a data loss on properly managed clusters, even when there are major failures. So, it allows you not only to make a very highly available system, but also to make sure that if any component in your cluster fails, all of the other components in the cluster work together to replicate the data. So, in real life, it doesn’t take hours to rebuild after a failure. It takes maybe one minute or maybe even 30 seconds.”
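
Why grouping objects speeds up recovery can be shown with a rough Python sketch. It is a simplification, not Ceph's actual placement logic: objects are hashed into a fixed number of groups, each group is assigned to a few disks by a simple hash ranking, and the disk names, group count and function names are all hypothetical. The sketch counts how many surviving disks hold copies of the groups that a failed disk belonged to.

```python
import hashlib
from collections import Counter


def stable_hash(text: str) -> int:
    """Hash that gives the same value on every machine and every run."""
    return int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")


def object_to_group(object_name: str, group_count: int) -> int:
    """Map an object to one of a fixed number of groups."""
    return stable_hash(object_name) % group_count


def group_to_disks(group_id: int, disks: list[str], replicas: int = 3) -> list[str]:
    """Pick the disks for a group by hash ranking (a stand-in for CRUSH)."""
    ranked = sorted(disks, key=lambda d: stable_hash(f"{group_id}|{d}"), reverse=True)
    return ranked[:replicas]


def recovery_sources(failed_disk: str, disks: list[str],
                     group_count: int, replicas: int = 3) -> Counter:
    """Count, per surviving disk, the affected groups it holds a copy of."""
    sources: Counter = Counter()
    for group_id in range(group_count):
        members = group_to_disks(group_id, disks, replicas)
        if failed_disk in members:
            sources.update(d for d in members if d != failed_disk)
    return sources


if __name__ == "__main__":
    disks = [f"disk-{i}" for i in range(12)]  # hypothetical cluster
    helpers = recovery_sources("disk-0", disks, group_count=256)
    print(f"{len(helpers)} of {len(disks) - 1} surviving disks can help rebuild")
```

With whole-disk mirroring, a single partner disk has to supply all of the lost data. When each group has its own set of peers, the work of re-replicating is spread across many disks in parallel, which is the effect van der Ster describes.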

The Ceph Foundation

With numbers like that, it’s no wonder many companies are now using Ceph.

Don’t think that Ceph’s success story is solely down to Red Hat’s support. The Ceph community has been instrumental in its success. In 2018, the Linux Foundation launched the broadly supported Ceph Foundation. Many companies and organizations are backing Ceph. It’s also no small matter that Weil continues to guide Ceph.

As Ceph enters its third decade, it has become a critical component in various environments, from enterprise deployments to cloud infrastructure. Looking ahead, Ceph is positioning itself as a key player in AI and machine learning. Ceph will only become more important for anyone who cares about data storage. And these days, that’s pretty much everyone in IT.

Steven J. Vaughan-Nichols, aka sjvn, has been writing about technology and the business of technology since CP/M-80 was the cutting-edge PC operating system, 300bps was a fast internet connection, WordStar was the state-of-the-art word processor, and we liked it.