Rook Storage for Kubernetes
Background
For a long time, the kubernetes platform has supported the allocation and assignment of storage to pods. Initially, the storage had to be allocated and assigned manually, but a dynamic provisioning capability was added later (around kubernetes 1.11). Rook provides an implementation of dynamic provisioning within kubernetes, allowing storage to be requested by pods as they are created -- and making that storage accessible wherever the pod may be scheduled to run.
Rook can use several different sources for storage, but the one being used here is Ceph. The newly developed Rook Operator automates the installation of Rook and the configuration (and sometimes creation) of the underlying storage platform.
Installation Process
The scripts and manifests required to install Rook and Ceph (if needed) are located in the Rook repository that should be cloned for local access:
cd <working directory>
git clone --single-branch --branch release-1.7 https://github.com/rook/rook.git
cd rook/cluster/examples/kubernetes/ceph
NOTE: Rook versions 1.5 and later have problems that make them unusable for interfacing with external Ceph clusters.
NOTE: Rook version 1.3 is the last known-good version for using external clusters, but it has a bug in comparing versions of ceph clusters.
The installation process is divided into two parts: Installing the Rook Operator and then either installing a Ceph Cluster in the kubernetes cluster or connecting with an existing Ceph Cluster that has been installed previously.
Note that any scripts and manifests that are custom and/or not part of the rook distribution are included in the 'k8s-admin' repository in Gitlab.
Rook Operator
Creating the Rook Operator simply requires loading two manifests with no customization. First, set up the namespace and all the roles, bindings, and support definitions:
kubectl create -f crds.yaml -f common.yaml
Then create the rook operator itself and wait for it to settle into a Running state.
kubectl create -f operator.yaml
The Rook Operator manages the entire activity of the storage enterprise, so tailing the log in a separate window will be useful should any problems arise:
kubectl get pods -n rook-ceph
kubectl logs -f -n rook-ceph <pod name>
The beauty of the operator concept in kubernetes is that it is capable of accomplishing practically anything that can be done from the command line -- all in response to what it sees in the cluster configuration. In this case, the operator looks for the Custom Resource Definition (CRD) that defines a Ceph Cluster -- with the parameters set to define what that ceph cluster looks like. If it sees that a new cluster needs to be created, it will do all the steps needed to create and provision the ceph cluster as specified. If interfacing with an external ceph cluster is required, it takes the provided credentials and identifiers and connects to that cluster to get the storage service.
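For illustration, this is roughly the shape of the CephCluster resource the operator watches for -- a trimmed sketch only, with a placeholder image tag and counts; the cluster.yaml discussed below is the real starting point:
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: ceph/ceph:v16.2.5   # placeholder tag -- match the ceph release you intend to run
  dataDirHostPath: /var/lib/rook
  mon:
    count: 3
  storage:
    useAllNodes: false
    useAllDevices: false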
Creating an internal Ceph Cluster
The reliability of creating a ceph cluster within the kubernetes space has improved considerably -- and given that Rook is obviously biased AGAINST using external clusters these days (see above), this is really the only option if you want to use Rook to provision storage for Kubernetes. That said ...
See the instructions in the Ceph Adding Disks page for how to prepare disks to be included in the cluster ... it isn't easy. One thing that needs to be done on the hosts where the disks reside is to install the lvm2 package (it may or may not be automatically installed ... but it needs to be there).
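As a quick reference, installing lvm2 on each storage host looks something like this (the package manager obviously depends on the distribution):
# Debian/Ubuntu hosts
sudo apt-get install -y lvm2
# RHEL/CentOS hosts
sudo yum install -y lvm2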
In the same directory as the operator manifest, there is a cluster.yaml that will create a ceph cluster within the kubernetes cluster, but it has one major problem: it is configured to simply grab all available storage from every node in the kubernetes cluster. Not always what you want ... so you need to copy the file into a local directory and modify the 'storage' section. The storage section for the development cluster is shown here as an example:
storage: # cluster level storage configuration and selection
  useAllNodes: false
  useAllDevices: false
  #deviceFilter:
  config:
    # crushRoot: "custom-root" # specify a non-default root label for the CRUSH map
    # metadataDevice: "md0" # specify a non-rotational storage so ceph-volume will use it as block db device of bluestore.
    # databaseSizeMB: "1024" # uncomment if the disks are smaller than 100 GB
    # journalSizeMB: "1024" # uncomment if the disks are 20 GB or smaller
    # osdsPerDevice: "1" # this value can be overridden at the node or device level
    # encryptedDevice: "true" # the default value for this option is "false"
  # Individual nodes and their config can be specified as well, but 'useAllNodes' above must be set to false. Then, only the named
  # nodes below will be used as storage resources. Each node's 'name' field should match their 'kubernetes.io/hostname' label.
  nodes:
    - name: "storage1"
      devices: # specific devices to use for storage can be specified for each node
        - name: "sdc"
        - name: "sdd"
        - name: "sde"
        - name: "sdf"
        - name: "sdg"
        #- name: "nvme01" # multiple osds can be created on high performance devices
        #  config:
        #    osdsPerDevice: "5"
        #- name: "/dev/disk/by-id/ata-ST4000DM004-XXXX" # devices can be specified using full udev paths
      #config: # configuration can be specified at the node level which overrides the cluster level config
    - name: "controller"
      devices:
        - name: "sdb"
After editing a local copy of the cluster.yaml file, apply it like normal ...
kubectl apply -f cluster.yaml
It will take a while, it will create a zillion pods, but you'll end up with a cluster ... assuming that it liked your storage. If you have to try again to get storage to connect, all you need to do is slightly modify the cluster.yaml file and re-apply it. That will cause the operator to refresh the cluster and try again to assimilate the storage.
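A quick way to see whether the storage was assimilated is to check that an OSD pod came up for each device and to look at the overall cluster status (this assumes the default 'rook-ceph' cluster name from cluster.yaml):
kubectl -n rook-ceph get pods -l app=rook-ceph-osd
kubectl -n rook-ceph get cephcluster rook-ceph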
To get a ceph dashboard, you need to make sure it is enabled in the cluster.yaml file (it is by default) and install a service to make it accessible. Multiple options are provided in the distribution directory, but the simplest (if you have an IP controller installed) is to use a LoadBalancer:
kubectl create -f dashboard-loadbalancer.yaml
Since everything is contained in the kubernetes cluster, there is no external interface to the cluster for control/management ... so they provide the 'rook-toolbox' deployment that will allow you to exec into the resulting pod to get the familiar 'ceph' and other related utilities. This can stay alive as long as you want (forever), and comes in a 'rook-toolbox-job' variant to allow automation of activities. Installing the toolbox is as simple as:
kubectl create -f toolbox.yaml
... and then exec into it (or access through lens/k8dash or other means):
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
If you do not have storage on 3 or more hosts, you need to reset the default replication failureDomain in the cluster itself to keep ceph from complaining about it. Go into the rook toolbox and issue the same commands as described for a standalone ceph cluster (roughly sketched below).
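The gist of those commands -- an assumption here, since the standalone-cluster page has the authoritative steps -- is to create a CRUSH rule that replicates across OSDs instead of hosts and point the pools at it:
ceph osd crush rule create-replicated replicated_osd default osd
ceph osd pool ls
ceph osd pool set <pool name> crush_rule replicated_osd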
Now you can skip down to creating storage classes ... and other configuration activities.
Using Existing Ceph Cluster
One option provided by the Rook Operator is to interface with an already existing Ceph Cluster. Instructions for deploying a standalone Ceph cluster are included on a separate page; the operation of that cluster should be completely validated before attempting to connect it to Rook.
It should be noted that while earlier versions of Rook (1.3 and before) can successfully interface with external clusters, that capability seems to be lost in newer versions (1.5 and later). The configuration options are there, but from the log messages it is clear that the instructions and the reality of the rook operator are NOT in sync, and even forcing some configuration options did not allow a successful connection with an external cluster.
Before you think that you can just pull rook v1.3 out and use it to connect to an external ceph cluster, it seems to have a rather nasty bug where it (seriously) thinks that version 15.2.11 is older than version 15.2.8 -- making it unusable for newer versions of the Ceph Octopus clusters. Moving to Ceph Pacific may (temporarily) solve that problem, but I was not able to get a Pacific cluster to operate ... so ...
There is another package that will allow Kubernetes to use an external ceph cluster -- it's part of the ceph distribution called Ceph-CSI. This is a very much lighter deployment and serves the need very nicely.
External cluster instructions -- DEPRECATED, but retained for historical (hysterical) reasons in case Rook gets their act together
Though not absolutely required, it is recommended to use a separate namespace for the external cluster. The default scripts and manifests assume you will do this, so it is easier to just go with the flow ... the namespace in the provided manifests is 'rook-ceph-external'. As with the rook operator, support roles, role bindings, and such need to be created along with the actual separate namespace:
kubectl create -f common-external.yaml
The Rook Cluster definition needs authentication and identification data about the external cluster; this is loaded into kubernetes secrets with standard names so that the operator and storage provisioner can access the external ceph cluster. The data can be obtained from the ceph master node:
- Cluster FSID -- run this command and copy the results:
ceph fsid
- Cluster Monitor address -- located in the /etc/ceph/ceph.conf file
- Cluster admin secret -- located in the /etc/ceph/ceph.client.admin.keyring file (see the example after this list for pulling these values)
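Something like the following will pull those values, assuming the default file locations on the ceph master node:
grep -E 'mon[_ ]host' /etc/ceph/ceph.conf
grep key /etc/ceph/ceph.client.admin.keyring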
For convenience, the set of commands has been copied to a script in the k8s-admin repository; separate copies exist for the 'prod', 'dev' and 'test' clusters.
Put the information into environment variables at the top of the script as shown below, then run the script to create the secrets:
export NAMESPACE=rook-ceph-external
export ROOK_EXTERNAL_FSID=dc852252-bd6b-11ea-b7f2-503eaa02062c
export ROOK_EXTERNAL_CEPH_MON_DATA=storage1=10.1.0.9:6789
export ROOK_EXTERNAL_ADMIN_SECRET=AQAclf9e0ptoMBAAracpRwXomJ6LgiO6L8wqfw==
bash ./import-external-cluster.sh
Note that the above script adds too many secrets -- the operator tries to create them again when building the cluster interface -- and errors out since they can't be changed. We need to either edit the script to not create the excess secrets or delete the ones that aren't needed. For now, we will delete them. First find all the ones that are present in the new namespace:
kubectl get secret -n rook-ceph-external
These are the ones that need to be deleted (at least for now):
kubectl -n rook-ceph-external delete secret \
    rook-csi-cephfs-node rook-csi-cephfs-provisioner rook-csi-rbd-node rook-csi-rbd-provisioner
Watch the operator log as you create the cluster below to see if any additional secrets need to be deleted. Now we can create the cluster definition that the operator will use to create our interface:
kubectl create -f cluster-external-management.yaml
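Once the cluster resource is created, the connection can be verified by checking the CephCluster status (the phase should eventually report Connected):
kubectl -n rook-ceph-external get cephcluster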
Storage Classes
Rook will allow us to create StorageClasses for the different APIs provided by Ceph. The most common one is for RBD (block storage), but you can also use Ceph Filesystems.
RBD Storage Class
Next we create the StorageClass that kubernetes will use to request RBDs (block images) from the ceph cluster. The starting point is the manifest csi/rbd/storageclass.yaml in the rook distribution. Copy it to a local directory and customize it as described below.
The manifest creates two resources: a CephBlockPool resource that will be used to create a new pool (or use an existing pool) in the ceph cluster, and a StorageClass entry that will allow pods to request storage. Customize this file as follows (a sketch of the result appears after the list):
- change 'failureDomain' in the pool definition from 'host' to 'osd' ... this will enable replication within a host, which is necessary for a small cluster
- change the 'name' in the pool definition and the corresponding 'pool' in the storage class to something descriptive -- this will be the name of the pool that is created in the ceph cluster.
- set the namespace (if it isn't set already) to 'rook-ceph' to match whatever was used to create the cluster
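As a rough sketch of what those edits end up looking like -- the pool name 'kube-rbd' is just an example, and the provisioner/secret references assume the default 'rook-ceph' operator namespace:
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: kube-rbd
  namespace: rook-ceph
spec:
  failureDomain: osd
  replicated:
    size: 3
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  pool: kube-rbd
  imageFormat: "2"
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Delete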
When done, install the manifest:
kubectl create -f storageclass.yaml
At this point, the Storage Provisioner is ready.
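From here, any pod can request block storage with an ordinary PersistentVolumeClaim referencing the new class -- a minimal example (the claim name, size, and the 'rook-ceph-block' class name are just placeholders from the sketch above):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-rbd-claim
spec:
  storageClassName: rook-ceph-block
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi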
Ceph Filesystem Storage Class
There are limitations on using Ceph Filesystems within rook ...
instructions TBD
Default Storage Class (Optional)
Not all manifests specify a storage class -- this can be especially problematic with helm charts that don't expose the opportunity to specify a storage class. Kubernetes has the concept of a default storage class that is widely used by the cloud providers to point to their preferred storage solution. While not required, specifying a Rook StorageClass as default can simplify 'automatic' deployments. In reality, all that needs to be done is to set a flag on the storage class that you want to be default ... but there's some prep work ...
First, identify all the defined storage classes:
kubectl get sc
All the storage classes will be shown -- if one of them is already defined as a 'default' class, it will have (default) after the name. If there is a default class identified (and it isn't yours) you need to turn off the default status for that class (i.e. setting a new default does NOT reset the old default):
kubectl patch storageclass <old-default> -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'
Then you can set your storageclass as the default:
kubectl patch storageclass <new-default> -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
Validate that it took by looking at the storageclasses again:
kubectl get sc
Your class should now be marked '(default)'.
(excerpted from https://kubernetes.io/docs/tasks/administer-cluster/change-default-storage-class/)
Testing
The rook repository has some test manifests to quickly validate the successful implementation of Rook: a Wordpress installation using two deployments.
Deploy the test application; each deployment requests its own Persistent Volume. The wordpress service is a LoadBalancer, so it will allocate an IP address that should give direct access to the wordpress instance. The test manifests are in the kubernetes directory:
cd rook/cluster/examples/kubernetes
kubectl create -f mysql.yaml
kubectl create -f wordpress.yaml
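Both claims should bind within a minute or so, and the LoadBalancer address for wordpress shows up in the service listing (the service name comes from the example manifest):
kubectl get pvc
kubectl get svc wordpress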
Maintenance & Updates
There are two specific cases that need to be highlighted: updating the rook operator and updating Ceph.
Updating Rook
Unless otherwise noted due to extenuating requirements, upgrades from one patch release of Rook to another are as simple as updating the common resources and the image of the Rook operator. For example, when Rook v1.6.2 is released, the process of updating from v1.6.0 is as simple as running the following:
First get the latest common resources manifests that contain the latest changes for Rook v1.6.
git clone --single-branch --depth=1 --branch v1.6.2 https://github.com/rook/rook.git
cd rook/cluster/examples/kubernetes/ceph
If you have deployed the Rook Operator or the Ceph cluster into a different namespace than rook-ceph, see the Update common resources and CRDs section for instructions on how to change the default namespaces in common.yaml.
Then apply the latest changes from v1.6 and update the Rook Operator image.
kubectl apply -f common.yaml -f crds.yaml
kubectl -n rook-ceph set image deploy/rook-ceph-operator rook-ceph-operator=rook/ceph:v1.6.2
As exemplified above, it is a good practice to update Rook-Ceph common resources from the example manifests before any update. The common resources and CRDs might not be updated with every release, but K8s will only apply updates to the ones that changed.
Watch the logs from the operator pod to ensure that nothing is amiss, but that should be it.
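Tailing the operator log after the image bump looks like this (same deployment name as used in the set image command above):
kubectl -n rook-ceph logs -f deploy/rook-ceph-operator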
Updating Ceph
Updating the Ceph installation to the latest version requires only one change to the Rook deployment: update the image version in the 'cluster-external-management.yaml' file to match the new Ceph version and apply it.
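The relevant piece of that manifest is the cephVersion image; the tag shown here is only an example:
  cephVersion:
    image: ceph/ceph:v15.2.13   # example -- set this to match the new ceph release
After editing, re-apply the manifest:
kubectl apply -f cluster-external-management.yaml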
If you have to reinstall or rebuild the ceph deployment in an external cluster, more is required, as the identifying information and credentials for the ceph cluster will have changed.
- As above in the installation section, get the necessary information from the ceph cluster and put it into the environment variables in the 'import-xxxx-cluster.sh' script. Before you run the script, however, you must delete a bunch of things that the script will create:
kubectl -n rook-ceph-external delete secret \
    rook-csi-cephfs-node rook-csi-cephfs-provisioner rook-csi-rbd-node rook-csi-rbd-provisioner
kubectl -n rook-ceph-external delete configmap rook-ceph-mon-endpoints
bash ./import-xxxx-cluster.sh
After the script is executed, watch the operator logs to see that it uses the new information to connect with the new ceph cluster.