Ceph Storage Cluster

Ceph Object Storage

As an alternative to a parallel file system (e.g. BeeGFS, GPFS/Spectrum Scale, Lustre), which stores data as a collection of files, Ceph stores data as a series of objects. A well-known example of object storage is Amazon S3. On top of its object store, Ceph provides additional layers of abstraction that offer alternative approaches to using the storage system:

Raw Object Storage (think Amazon S3)
Block Storage (think disk images)
Parallel POSIX Filesystem (such as the parallel filesystems mentioned above)

Note that Ceph is not a RAID controller: it does not add redundancy at the level of an individual disk. Instead, it implements data reliability by replicating objects across devices (OSDs) and/or hosts. Replication can be controlled independently for each storage pool, and the pools back the different types of storage listed above.
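
Because reliability comes from replication, the usable capacity of a replicated pool is roughly the raw capacity divided by the pool's replica count. The following is a minimal sketch of that arithmetic, not a sizing tool: the helper name usable_capacity and the pool name mypool are hypothetical, and the estimate ignores metadata overhead and the free space Ceph keeps in reserve.

```shell
# Rough usable capacity of a replicated pool: raw capacity / replica count.
# usable_capacity is a hypothetical helper; it ignores metadata overhead
# and the free-space headroom Ceph keeps before treating OSDs as full.
usable_capacity() {
    raw_tb=$1      # total raw capacity across OSDs, in TB
    replicas=$2    # the pool's "size" setting
    echo $(( raw_tb / replicas ))
}

usable_capacity 12 3    # 12 TB raw at 3 replicas, prints 4
usable_capacity 12 2    # same disks at 2 replicas, prints 6

# The replica count is a per-pool setting (pool name "mypool" is illustrative):
#   ceph osd pool set mypool size 3
#   ceph osd pool get mypool size
```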

Ceph can be used at many different scales, from a single node with attached storage to large clusters with hundreds of storage devices and dozens of nodes. Its architecture scales because it avoids central bottlenecks in the command, metadata, and data retrieval paths: clients calculate where data is stored (using the CRUSH algorithm) and communicate with the storage daemons directly.

From an overall system perspective, Ceph can be used as a standalone storage platform, integrated into a Kubernetes cluster, or in a hybrid approach where the Kubernetes cluster uses an externally managed storage cluster for its dynamic storage provisioning needs.

In the WilliamsNet environment, Ceph is used in several configurations:

A standalone cluster is hosted on storage1, which hosts the development filesystem 'workspace'.
The development Kubernetes cluster uses Rook to provision storage, with the Ceph cluster on storage1 as its platform.
The production Kubernetes cluster hosts its own Ceph cluster internally, providing the 'shared' filesystem and internal Kubernetes persistent storage needs.

Installing a Ceph Cluster

These instructions apply to the Octopus release of Ceph (version 15.2.0 and later), deployed with the cephadm orchestrator.

The requirements for installing and using ceph include:

at least one server, and preferably an odd number of servers, to host the management daemons for Ceph
direct attached disks (or iSCSI mounted volumes) on the Ceph cluster hosts
completely empty storage devices, with no partitions or LVM signatures, as Ceph uses the entire device
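
The preference for an odd number of management hosts follows from majority quorum: Ceph monitors need more than half of their number available to keep operating, so an even count adds a host without tolerating another failure. The arithmetic can be sketched as below; the helper name tolerated_failures is hypothetical.

```shell
# Number of monitor failures a cluster of N monitors can tolerate while
# still holding a strict majority (quorum). Hypothetical helper.
tolerated_failures() {
    count=$1
    quorum=$(( count / 2 + 1 ))     # smallest strict majority of count
    echo $(( count - quorum ))
}

for n in 1 2 3 4 5; do
    echo "$n monitors: quorum $(( n / 2 + 1 )), tolerates $(tolerated_failures "$n") failure(s)"
done
```

Three and four monitors both tolerate a single failure, which is why the fourth host buys nothing in terms of quorum.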

The Ceph master node is the one that receives the initial install. It hosts all of the different services to begin with, but as hosts are added to the cluster, Ceph deploys copies of the relevant daemons to the new hosts to provide redundancy and resiliency.
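
With cephadm, the initial install on the first node is a bootstrap step, after which further hosts are handed to the orchestrator. The sequence might look like the sketch below. It is written as a dry run that only prints the commands unless RUN=1 is set; the IP address and host name are placeholders, and it assumes cephadm is already installed on the first node and that root SSH access to the new host is available.

```shell
# Dry-run wrapper: print each command unless RUN=1 is set in the environment.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

MON_IP=${MON_IP:-192.0.2.10}     # placeholder address of the first node
NEW_HOST=${NEW_HOST:-storage2}   # placeholder name of a host to add

# 1. Bootstrap the cluster on the first node.
run cephadm bootstrap --mon-ip "$MON_IP"

# 2. Install the cluster's public SSH key on the new host.
run ssh-copy-id -f -i /etc/ceph/ceph.pub "root@$NEW_HOST"

# 3. Hand the new host to the orchestrator.
run ceph orch host add "$NEW_HOST"

# 4. List the hosts the orchestrator now manages.
run ceph orch host ls
```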

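To zap a disk (delete its partition table) in preparation for use with Ceph, execute the following. Note that zapping destroys all data on the disk, and that ceph-deploy is the older deployment tool that predates cephadm:

# ceph-deploy disk zap {osd-server-name} {disk-name}
ceph-deploy disk zap osdserver1 /dev/sdb /dev/sdc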
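
On a cephadm-managed cluster, which is the deployment method these instructions target, the orchestrator offers an equivalent to the ceph-deploy command above. The sketch below reuses the host and device names from that example; zap_devices is a hypothetical helper, and the script only prints the commands unless RUN=1 is set, since zapping is destructive.

```shell
# Dry-run wrapper: print each command unless RUN=1 is set in the environment.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# Hypothetical helper: zap one or more devices on a host via the orchestrator.
# Usage: zap_devices <host> <device> [<device> ...]
zap_devices() {
    host=$1
    shift
    for dev in "$@"; do
        # --force confirms the operation, which destroys all data on the device
        run ceph orch device zap "$host" "$dev" --force
    done
}

# List the devices the orchestrator knows about, then zap the two disks.
run ceph orch device ls
zap_devices osdserver1 /dev/sdb /dev/sdc
```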

Single Host Operation

Out of the box, Ceph requires replication to be across hosts, not just devices. For a single-node cluster this is a problem, because there is only one host on which to place replicas. The following steps add a new rule that allows replication across OSDs instead of hosts:

#
# commands to set ceph to handle replication on one node
#
# create new crush rule allowing OSD-level replication
# ceph osd crush rule create-replicated <rulename> <root> <level>
ceph osd crush rule create-replicated osd_replication default osd

# verify that rule exists and is correct
ceph osd crush rule ls
ceph osd crush rule dump

# set replication level on existing pools
ceph osd pool set device_health_metrics size 3 

# apply new rule to existing pools
ceph osd pool set device_health_metrics crush_rule osd_replication
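
Pools created later (for example, the data and metadata pools of a CephFS) will generally start out on the default rule, so the new rule has to be applied to them as well. One way to do that for every pool at once is sketched below. The helper apply_rule_to_pools is hypothetical, the cephfs pool names are illustrative, and the script only prints the commands unless RUN=1 is set.

```shell
# Dry-run wrapper: print each command unless RUN=1 is set in the environment.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# Hypothetical helper: read pool names on stdin, one per line, and assign
# the given CRUSH rule to each of them.
apply_rule_to_pools() {
    rule=$1
    while IFS= read -r pool; do
        [ -n "$pool" ] || continue
        # </dev/null keeps the command from consuming the pool list on stdin
        run ceph osd pool set "$pool" crush_rule "$rule" </dev/null
    done
}

# On a cluster node the pool list would come from Ceph itself:
#   ceph osd pool ls | apply_rule_to_pools osd_replication
# Illustrative pool names are used here so the sketch runs anywhere:
printf '%s\n' device_health_metrics cephfs_data cephfs_metadata |
    apply_rule_to_pools osd_replication
```

The result for a given pool can be checked with `ceph osd pool get <pool> crush_rule`.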

RBD Images

CephFS

Creating a CephFS

Mounting a Ceph FS

Mounting a ceph filesystem on a system outside the storage cluster requires four things:

The master Ceph config file (ceph.conf) from the /etc/ceph directory on any cluster node
The client keyring created on the ceph master node for client authentication
The 'mount.ceph' mount helper, available in the 'ceph-common' package
An entry in the /etc/fstab file
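
The four prerequisites above can be checked on a client before attempting a mount. The following is only a sketch: check_cephfs_client is a hypothetical helper, wsuser is a placeholder user name, and the optional second argument (a root prefix) exists purely so the check can be tried against a scratch directory.

```shell
# Hypothetical preflight check for the four prerequisites listed above.
# Arguments: client user name, and an optional root prefix for testing.
check_cephfs_client() {
    user=$1
    root=${2:-}
    missing=0
    [ -r "$root/etc/ceph/ceph.conf" ] ||
        { echo "missing: ceph.conf"; missing=1; }
    [ -e "$root/etc/ceph/ceph.client.$user.keyring" ] ||
        { echo "missing: keyring for client.$user"; missing=1; }
    command -v mount.ceph >/dev/null 2>&1 ||
        { echo "missing: mount.ceph (ceph-common)"; missing=1; }
    # look for a non-comment fstab entry whose filesystem type (field 3) is "ceph"
    awk '$1 !~ /^#/ && $3 == "ceph" { found = 1 } END { exit !found }' \
        "$root/etc/fstab" 2>/dev/null ||
        { echo "missing: ceph entry in /etc/fstab"; missing=1; }
    return "$missing"
}

if check_cephfs_client wsuser; then
    echo "all prerequisites present"
else
    echo "some prerequisites are missing"
fi
```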

Ceph Config File

This file should simply be copied over to the client system from a node in the storage cluster:

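# create the config directory if it does not exist yet
sudo mkdir -p /etc/ceph
# copy the cluster config from any node in the storage cluster
sudo scp <node>:/etc/ceph/ceph.conf /etc/ceph/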

Permissions on the file should be 644, as it needs to be readable by non-root users.
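
Setting and confirming that mode could look like the sketch below; set_conf_mode is a hypothetical helper, and it prints the permission string reported by ls so the result can be checked by eye.

```shell
# Hypothetical helper: make a config file world-readable (mode 644) and
# print the resulting permission string as shown by ls.
set_conf_mode() {
    conf=${1:-/etc/ceph/ceph.conf}
    chmod 644 "$conf" && ls -ld "$conf" | cut -c1-10
}

# On the client, as root:
#   set_conf_mode /etc/ceph/ceph.conf
# A mode of 644 shows up as: -rw-r--r--
```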

Client keyring

While the admin keyring/credentials could be used, they grant full control of the cluster, so a separate user should be created for mounting the Ceph FS. It is possible to create a separate user for each client system, but there is usually no need to go to that level of paranoia. The keyring must be created on a system with admin access to the cluster (generally a cluster node) and then copied to the client system:

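# log in to a cluster node with admin access (the commands below run there as root)
sudo ssh <cluster node>
# create the client user with read/write access to the filesystem root and save its keyring
ceph fs authorize <filesystem> client.<username> / rw > /etc/ceph/ceph.client.<username>.keyring
# copy the keyring to the client system
scp /etc/ceph/ceph.client.<username>.keyring <client>:/etc/ceph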

This same keyring file can then be copied over to each client system without recreating it.
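
Copying the one keyring to several clients can be scripted. The sketch below assumes it is run on the cluster node that holds the keyring, with SSH access to each client as a user allowed to write to /etc/ceph. The helper distribute_keyring and the user and host names are hypothetical, and the commands are only printed unless RUN=1 is set. Restricting the file to mode 600 is an addition not covered above; it is suggested because the keyring contains a secret key.

```shell
# Dry-run wrapper: print each command unless RUN=1 is set in the environment.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# Hypothetical helper: copy one client keyring to several client systems
# and restrict it to its owner there, since the file contains a secret key.
# Usage: distribute_keyring <username> <client> [<client> ...]
distribute_keyring() {
    user=$1
    shift
    keyring=/etc/ceph/ceph.client.$user.keyring
    for client in "$@"; do
        run scp "$keyring" "$client:/etc/ceph/"
        run ssh "$client" chmod 600 "$keyring"
    done
}

# User and host names below are placeholders.
distribute_keyring wsuser client1 client2
```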

Mount Helper

Only the 'mount.ceph' executable is needed, but it is not packaged on its own. Installing the 'ceph-common' package provides it, along with a number of dependencies that the client will not otherwise use:

sudo yum install -y ceph-common

/etc/fstab

This line will mount the Ceph FS on boot (<client id> is the <username> portion of client.<username>, without the 'client.' prefix):

:/                      /<mountpoint>      ceph    name=<client id> 0 0
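
For illustration, the entry can be generated rather than typed by hand. In the sketch below, cephfs_fstab_line is a hypothetical helper and the mount point and client id are placeholders. It also adds the standard _netdev mount option, which is an assumption beyond the line above: it marks the filesystem as network-dependent so that it is mounted only once networking is up.

```shell
# Hypothetical helper: build an fstab entry for a CephFS mount.
# _netdev is added so the mount waits for networking at boot.
cephfs_fstab_line() {
    mountpoint=$1
    client_id=$2
    printf ':/\t%s\tceph\tname=%s,_netdev\t0 0\n' "$mountpoint" "$client_id"
}

# Placeholder mount point and client id:
cephfs_fstab_line /mnt/workspace wsuser

# The same mount done once by hand, without touching /etc/fstab:
#   sudo mount -t ceph :/ /mnt/workspace -o name=wsuser
```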

At this point, simply mount the filesystem as normal:

sudo mount /<mountpoint>
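
Whether the mount succeeded can be confirmed from the kernel's mount table. A minimal sketch follows; is_cephfs_mounted is a hypothetical helper, /mnt/workspace is a placeholder mount point, and the optional second argument lets the check read a file other than /proc/mounts.

```shell
# Hypothetical helper: succeed if <mountpoint> is mounted with type "ceph".
# Reads a mount table in /proc/mounts format; the second argument allows a
# different file to be used.
is_cephfs_mounted() {
    mountpoint=$1
    table=${2:-/proc/mounts}
    awk -v mp="$mountpoint" \
        '$2 == mp && $3 == "ceph" { found = 1 } END { exit !found }' \
        "$table" 2>/dev/null
}

if is_cephfs_mounted /mnt/workspace; then
    echo "CephFS is mounted"
else
    echo "CephFS is not mounted"
fi
```

Running `df -h /<mountpoint>` afterwards shows the capacity reported by the cluster.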