
== Installing a Ceph Cluster ==
These instructions apply to the Octopus release of ceph, specifically version 15.2.0 and later, using the '''cephadm''' orchestrator.

The hardware/system requirements for installing and using ceph include:
# at least one, and preferably an odd number of, servers to host the management daemons for ceph (an odd number lets the monitors maintain a quorum)
# direct attached disks (or iSCSI mounted volumes) on the ceph cluster hosts
# storage devices that are to be used must be completely empty -- no partitions or LVM signatures -- as the entire storage device is used by ceph

The ceph master node is the host the cluster is bootstrapped on. It hosts all of the different services to begin with, but as hosts are added to the cluster, cephadm deploys copies of the relevant daemons to the new hosts to provide redundancy and resiliency.

These directions are extracted/distilled from the [https://docs.ceph.com/docs/master/cephadm/install/ main ceph documentation] ...

=== Prerequisites ===
These must be in place to install a ceph cluster:

* Systemd
* Podman or Docker for running containers
* Time synchronization (such as chrony or NTP)
* LVM2 for provisioning storage devices
* Python 3 to run the tools/daemons

LVM2 and Python 3 are not part of a minimal CentOS install, but they are easily added if another package hasn't already pulled them in:

 sudo yum -y install lvm2 python3

These packages must be in place on every node in the ceph cluster, whether it just runs daemons, hosts storage, or both.
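A quick way to confirm the prerequisites on each node before going further is a small shell check like the one below. It is only a sketch: the command names it probes (''systemctl'', ''podman''/''docker'', ''chronyd''/''ntpd'', ''lvcreate'', ''python3'') are assumptions about how each prerequisite is normally packaged, and it should be run as root so that the sbin directories are on the PATH.

```shell
#!/bin/sh
# Minimal prerequisite check (a sketch). The command names probed here are
# assumptions about how each prerequisite is usually packaged; run as root.

# have CMD: succeed if CMD can be found on the PATH.
have() {
    command -v "$1" >/dev/null 2>&1
}

# check_prereqs: print each missing prerequisite; return non-zero if any are missing.
check_prereqs() {
    missing=0
    have systemctl || { echo "missing: systemd (systemctl)"; missing=1; }
    if ! have podman && ! have docker; then
        echo "missing: podman or docker"; missing=1
    fi
    if ! have chronyd && ! have ntpd; then
        echo "missing: time sync (chronyd or ntpd)"; missing=1
    fi
    have lvcreate || { echo "missing: lvm2 (lvcreate)"; missing=1; }
    have python3 || { echo "missing: python3"; missing=1; }
    return "$missing"
}

# Example:
#   check_prereqs && echo "all prerequisites present"
```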

=== Cephadm ===
This tool is the secret sauce that makes ceph easy to install and use. It was introduced in v15.2.0 (Octopus). I once had occasion to install and try to configure the Nautilus release, and it was very challenging. Cephadm is first downloaded directly as a standalone Python script, then later installed as a package along with the rest of the ceph tools. It provides the orchestration and coordination needed to deploy a fully capable cluster in containers (Podman or Docker) with just a few commands.

Use curl to fetch the latest Octopus version of the standalone script:

 curl --silent --fail --remote-name --location https://github.com/ceph/ceph/raw/octopus/src/cephadm/cephadm
 chmod +x cephadm
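Unless ''--fail'' is given, curl treats an HTTP error as success and saves the server's error page under the name ''cephadm''. A small check like the hypothetical ''looks_like_cephadm'' helper below can catch that before the file is made executable; it assumes the real script begins with a python shebang line.

```shell
#!/bin/sh
# Sanity-check a downloaded cephadm script before running it (a sketch).
# Assumption: the real script starts with a "#!/usr/bin/python3"-style shebang.

# looks_like_cephadm FILE: succeed if FILE is non-empty and its first line
# is a shebang that mentions python.
looks_like_cephadm() {
    [ -s "$1" ] || return 1
    head -n 1 "$1" | grep -q '^#!.*python'
}

# Example:
#   looks_like_cephadm cephadm && chmod +x cephadm
```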

This script can be run directly from the current directory with:

 ./cephadm <arguments...>

Although the standalone script is sufficient to get a cluster started, it is convenient to have the cephadm command installed on the host. To add the repository for the current Octopus release and install the cephadm and ceph-common packages (as root):

 ./cephadm add-repo --release octopus
 ./cephadm install cephadm ceph-common

Unfortunately, cephadm does not properly set up the repository for Debian Buster, so you need to do it manually for those systems:

 wget -q -O- 'https://download.ceph.com/keys/release.asc' | sudo apt-key add -
 echo deb https://download.ceph.com/debian-octopus/ $(lsb_release -sc) main | sudo tee /etc/apt/sources.list.d/ceph.list
 sudo apt-get update
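The repository line above hard-codes the Octopus release. A small helper like the following (the name ''ceph_apt_source'' is hypothetical) builds the same line for another release or codename. It is a sketch that assumes download.ceph.com keeps the same ''debian-<release>'' layout for other releases.

```shell
#!/bin/sh
# Build the apt source line for a ceph release (a sketch).
# ceph_apt_source RELEASE [CODENAME]: CODENAME defaults to `lsb_release -sc`.
ceph_apt_source() {
    release=$1
    codename=${2:-$(lsb_release -sc)}
    echo "deb https://download.ceph.com/debian-${release}/ ${codename} main"
}

# Example (equivalent to the echo line above):
#   ceph_apt_source octopus | sudo tee /etc/apt/sources.list.d/ceph.list
```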

Note that on Fedora 33 (as of this update on 11/29/20), the latest version of the ceph utilities (15.2.6) from the ceph repository '''does not work'''. Fedora appears to keep a reasonably current version of the ceph packages in its standard repository, so it may be advisable to use the Fedora repository for all package installations. Fedora's 'updates-testing' repository will have the newest ceph version as it goes through validation for the platform, so it can be used if needed to get a new release before it reaches the main repository. DO NOT simply enable the 'updates-testing' repository, since it also contains unvalidated packages for other software; instead, use the '--enablerepo' option on yum/dnf as needed:

 sudo dnf install -y ceph-common --enablerepo=updates-testing

=== Bootstrapping the Cluster ===
It seems simple, but these commands (run as root) create a working ceph cluster, ready for hosts and storage devices to be added:

 mkdir -p /etc/ceph
 cephadm bootstrap --mon-ip <mon-ip>

One '''''very important''''' thing to do before bootstrapping the cluster is to set the hostname to '''just''' the base name (not a FQDN). There is an option that allows FQDNs to be used as hostnames, but it is not very reliable and isn't respected by all of the system components.
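Stripping a FQDN down to its base name is a one-liner in the shell. The sketch below (helper names are hypothetical) also sets it persistently with ''hostnamectl''; if something like DHCP or cloud-init rewrites the hostname at boot, that source would also need to be changed for the setting to survive a reboot.

```shell
#!/bin/sh
# Reduce a host name to its base (short) name and optionally apply it (a sketch).

# base_name NAME: strip everything from the first dot onward.
base_name() {
    printf '%s\n' "${1%%.*}"
}

# set_short_hostname: set the static hostname to the base name of the
# current one. Requires root and systemd's hostnamectl.
set_short_hostname() {
    short=$(base_name "$(hostname)")
    hostnamectl set-hostname "$short"
}

# Example (as root, before running cephadm bootstrap):
#   set_short_hostname && hostname
```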

At the end of the output from the bootstrap command will be a URL and password for the dashboard. SAVE THE PASSWORD; it won't be shown again, and you'll have to reset it if you lose it. The first time you log in with it, the dashboard will force you to change it to something you can remember.

The dashboard is extremely useful, but it doesn't completely replace the command line interface.

=== Ceph CLI ===
The ceph-common package installed above contains most of the tools needed to manage the cluster, but if something more complicated comes up and you need the full toolset, you can have cephadm spin up a container with all of the tools available:

 cephadm shell

Most of the time, however, the normal command line tools are sufficient:

 ceph <subcommand> <parameters>
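Because ceph-common may not be installed on every node, a small wrapper can fall back to the container when the host CLI is missing. This is a sketch; ''ceph_cmd'' is a hypothetical name, and it relies on ''cephadm shell'' accepting a command to run after ''--''.

```shell
#!/bin/sh
# Run a ceph subcommand with the host's CLI if it is installed, otherwise
# inside a throwaway cephadm shell container (a sketch).
ceph_cmd() {
    if command -v ceph >/dev/null 2>&1; then
        ceph "$@"
    else
        cephadm shell -- ceph "$@"
    fi
}

# Examples:
#   ceph_cmd -s          # overall cluster status
#   ceph_cmd orch ps     # daemons placed by the orchestrator
```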

=== Adding Hosts ===
While running a ceph cluster on a single host is possible, it is not recommended. Adding hosts to the cluster is very straightforward; cephadm (through the 'ceph orch' command set) takes care of installing the software the new host needs to run whatever daemons it decides to place there:

 ssh-copy-id -f -i /etc/ceph/ceph.pub root@<new-host>
 ceph orch host add <new-host> [<new-host-ip>]

The host IP is optional, but needed when the host name doesn't resolve to the interface to be used for ceph communications.
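Adding several hosts is just the two commands above in a loop. The sketch below is hypothetical: the ''NAME=IP'' argument form and the ''DRY_RUN'' variable are conventions invented for this example. When an IP is given, the key is copied to that address, on the assumption that this is the address cephadm will connect to.

```shell
#!/bin/sh
# Add several hosts in one pass (a sketch). Each argument is NAME or NAME=IP
# (a convention for this example). With DRY_RUN=1 the commands are printed
# instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

add_hosts() {
    for spec in "$@"; do
        name=${spec%%=*}
        ip=
        case $spec in *=*) ip=${spec#*=} ;; esac
        # Copy the cluster key to the IP if one was given, else to the name.
        run ssh-copy-id -f -i /etc/ceph/ceph.pub "root@${ip:-$name}" || return 1
        if [ -n "$ip" ]; then
            run ceph orch host add "$name" "$ip" || return 1
        else
            run ceph orch host add "$name" || return 1
        fi
    done
}

# Example:
#   add_hosts node2 node3=10.0.0.3
```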

=== Adding Disks ===
Drives are added individually as OSDs in ceph.  As mentioned above, ceph is rather particular about what it will accept for storage devices. 

An inventory of storage devices on all cluster hosts can be displayed with:

 ceph orch device ls

A storage device is considered available if all of the following conditions are met:
* The device must have no partitions.
* The device must not have any LVM state.
* The device must not be mounted.
* The device must not contain a file system.
* The device must not contain a Ceph BlueStore OSD.
* The device must be larger than 5 GB.
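The availability rules can be roughly checked on the host itself before asking the orchestrator. This sketch only approximates what ''ceph orch device ls'' reports; it uses ''lsblk'' and ''blkid'' from util-linux, the helper names are hypothetical, and it interprets the 5 GB limit as 5*10^9 bytes, which is an assumption.

```shell
#!/bin/sh
# Rough local pre-check of the availability rules (a sketch; run as root on
# the host that owns the disk). Not a replacement for `ceph orch device ls`.
MIN_BYTES=5000000000   # assumption: "5 GB" read as 5*10^9 bytes

# size_ok BYTES: succeed if BYTES is larger than the minimum.
size_ok() {
    [ "$1" -gt "$MIN_BYTES" ]
}

# device_precheck DEV: print each rule DEV violates; return non-zero if any.
device_precheck() {
    dev=$1
    bad=0
    [ -b "$dev" ] || { echo "$dev: not a block device"; return 1; }
    # Partitions and LVM volumes show up as extra lines under the disk.
    if [ "$(lsblk -n -o NAME "$dev" | wc -l)" -gt 1 ]; then
        echo "$dev: has partitions or LVM volumes"; bad=1
    fi
    # blkid -p probes the device directly; exit 0 means a signature was found.
    if blkid -p "$dev" >/dev/null 2>&1; then
        echo "$dev: has a filesystem, LVM or partition-table signature"; bad=1
    fi
    # Any non-empty MOUNTPOINT on the disk or its children means it is mounted.
    if lsblk -n -o MOUNTPOINT "$dev" | grep -q .; then
        echo "$dev: is mounted"; bad=1
    fi
    size=$(lsblk -b -d -n -o SIZE "$dev" | tr -d ' ')
    size_ok "$size" || { echo "$dev: too small ($size bytes)"; bad=1; }
    return "$bad"
}

# Example:
#   device_precheck /dev/sdb && echo "/dev/sdb looks available"
```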

Ceph refuses to provision an OSD on a device that is not available. 

If a disk has been used previously, you will need to zap it so that ceph will accept it. Either zap it through the orchestrator:
 ceph orch device zap --force <host> <device>
or wipe it directly on the host that owns the drive:
 dd if=/dev/zero of=/dev/<drive> bs=1M count=100
or
 wipefs -a /dev/<drive>
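When the drive is wiped locally, a small guard can keep the wrong device from being erased. This sketch (the name ''wipe_device'' is hypothetical; run it as root on the host that owns the drive) refuses anything that is not a block device or that is mounted, clears signatures with ''wipefs'', and zeroes the first 100 MiB. It does not make LVM forget a drive it already tracks; that still needs the procedure described below.

```shell
#!/bin/sh
# Guarded local wipe (a sketch; run as root on the host that owns the disk).
wipe_device() {
    dev=$1
    [ -b "$dev" ] || { echo "refusing: $dev is not a block device" >&2; return 1; }
    # Refuse if the disk or any partition on it has a mount point.
    if lsblk -n -o MOUNTPOINT "$dev" | grep -q .; then
        echo "refusing: $dev (or a partition on it) is mounted" >&2; return 1
    fi
    # Remove all known signatures, then zero the start of the device.
    wipefs -a "$dev" &&
        dd if=/dev/zero of="$dev" bs=1M count=100 conv=fsync status=none
}

# Example:
#   wipe_device /dev/sdb
```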

If a drive was used in a previous incarnation of ceph on that machine (i.e. you tore down a cluster and are trying to build it up again), you will need to do a bit more. This is especially the case when using the [[Rook Storage for Kubernetes|Rook]] operator to build an internal cluster. The operator does more than just check for magic numbers on the disk; it also checks what the OS thinks of the drive. Ceph uses LVM to manage its disks, and LVM keeps its own record of the drives it owns, so you can wipe a disk and LVM will still think it owns it, and therefore the Rook operator won't touch it. The process to make LVM forget about the drive is involved, but it works. See [[Removing a disk from LVM]] for the process.

Once you have a clean drive, add it by issuing this command on the ceph master node (not the host the drive is attached to):
 ceph orch daemon add osd <host>:<device>

The host must already have been added as described in the previous section; cephadm takes care of installing the OSD software to manage the drive. Also, if the host has been rebooted, you need to set the hostname to the base name (NOT the FQDN) again before adding the drive/OSD.
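Several drives on one host can be added in a loop. As an alternative, the Octopus orchestrator can also consume every available device automatically with ''ceph orch apply osd --all-available-devices''. The helper below is a hypothetical sketch; ''DRY_RUN'' is a convention for this example that prints the commands instead of running them.

```shell
#!/bin/sh
# Add several OSDs on one host (a sketch). Run on the ceph master node.
# add_osds HOST DEV...: with DRY_RUN=1, only print the commands.
add_osds() {
    host=$1; shift
    for dev in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "ceph orch daemon add osd $host:$dev"
        else
            ceph orch daemon add osd "$host:$dev" || return 1
        fi
    done
}

# Example:
#   add_osds node2 /dev/sdb /dev/sdc
```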

=== Single Host Operation ===
Out of the box, ceph requires replicas to be placed on different hosts, not just different devices. For a single-node cluster (or a small cluster with unbalanced disks), this can be problematic. The following steps add a new CRUSH rule that allows replication across OSDs instead of hosts:

 #
 # commands to set ceph to handle replication on one node
 #
 # create new crush rule allowing OSD-level replication
 # ceph osd crush rule create-replicated <rulename> <root> <level>
 ceph osd crush rule create-replicated osd_replication default osd
 
 # verify that the rule exists and is correct
 ceph osd crush rule ls
 
 # get the id of the new rule (probably '1', but check anyway)
 ceph osd crush rule dump
 
 # set the replication level on existing pools
 ceph osd pool set device_health_metrics size 3
 
 # apply the new rule to existing pools
 ceph osd pool set device_health_metrics crush_rule osd_replication
 
 # make the new rule the default, using the rule_id reported by the dump above
 # ceph config set global osd_pool_default_crush_rule <rule_id>
 ceph config set global osd_pool_default_crush_rule 1
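Rather than assuming the new rule's id is 1 and touching only ''device_health_metrics'', the rule id can be looked up and the rule applied to every existing pool. This is a sketch that assumes the ''osd_replication'' rule created above already exists and uses ''python3'' (already a prerequisite) to read the JSON rule dump; the helper names are hypothetical.

```shell
#!/bin/sh
# Apply the OSD-level rule to all pools and make it the default (a sketch).
RULE=osd_replication

# rule_id NAME: print the numeric id of the named CRUSH rule.
rule_id() {
    ceph osd crush rule dump "$1" --format json |
        python3 -c 'import json, sys; print(json.load(sys.stdin)["rule_id"])'
}

# apply_rule_everywhere: set RULE on every existing pool, then make its id
# the default for new pools.
apply_rule_everywhere() {
    id=$(rule_id "$RULE") || return 1
    for pool in $(ceph osd pool ls); do
        ceph osd pool set "$pool" crush_rule "$RULE" || return 1
    done
    ceph config set global osd_pool_default_crush_rule "$id"
}

# Example:
#   apply_rule_everywhere
```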