Cluster Operations
== Rolling Restart ==
The OpenSearch cluster is very resilient: it tries to keep all of its indices appropriately replicated at all times. It can handle nodes dropping out and being added, but sometimes we don't want that. If nodes are simply restarted (to change settings) or the system is rebooted (for patching), we don't want OpenSearch to move all its indices around.

This process works for a whole cluster restart (not recommended unless absolutely necessary) or for individual node restarts. If individual nodes are being restarted, do only one node at a time, so that no index ends up with all copies of a shard offline at once. The cluster '''''should''''' recover, but why stress it unnecessarily?

The process is described in the [https://www.elastic.co/guide/en/elasticsearch/reference/current/restart-cluster.html Elastic documentation], and summarized below.

=== Disable Shard Allocation ===
When you shut down a data node, the allocation process waits for <code>index.unassigned.node_left.delayed_timeout</code> (by default, one minute) before starting to replicate the shards on that node to other nodes in the cluster, which can involve a lot of I/O. Since the node is shortly going to be restarted, this I/O is unnecessary. You can avoid racing the clock by disabling allocation of replicas before shutting down data nodes:

 curl -X PUT "localhost:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d'
 {
   "persistent": {
     "cluster.routing.allocation.enable": "primaries"
   }
 }'

=== Flush Cache ===
This commits in-flight indexing operations to disk, which reduces the risk of data loss and shortens shard recovery after the restart:

 curl -X POST "localhost:9200/_flush?pretty"

Older instructions use <code>_flush/synced</code> here. That endpoint was deprecated in Elasticsearch 7.6 and is not available in newer releases, so the plain <code>_flush</code> (which the <code>node-restart.sh</code> script also uses) is the safer choice.

=== Shut Down/Restart Nodes ===
Now that things have settled out, we can stop/restart the opensearch services. Note that a full cluster shutdown should be avoided, since any activity on the cluster as it is going down or coming up could cause errors for the user.

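Before stopping any node, it can be worth confirming that the allocation setting from the first step is actually in place. The following is a minimal sketch, not part of the original procedure: <code>allocation_mode</code> is a hypothetical helper name, and it assumes the setting is present in only one scope (persistent or transient) and that the settings are requested with <code>flat_settings=true</code> so the key appears in dotted form.

```shell
#!/bin/bash
# Hypothetical helper (not from the original page): read _cluster/settings JSON
# on stdin and print the value of cluster.routing.allocation.enable, if any.
# Assumes flat_settings=true was used and the key is set in only one scope.
allocation_mode() {
  sed -n 's/.*"cluster\.routing\.allocation\.enable"[[:space:]]*:[[:space:]]*"\([a-z_]*\)".*/\1/p' | head -n 1
}
```

It could be used as <code>curl -s "localhost:9200/_cluster/settings?flat_settings=true" | allocation_mode</code>, which should print <code>primaries</code> once allocation has been restricted and print nothing after the setting is reset to <code>null</code>.
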
Use the systemd commands to stop/start/restart the nodes:

 # do this on all nodes
 sudo systemctl stop opensearch
 # ... do required activity on all nodes ...
 sudo systemctl start opensearch
 #
 # ... or ...
 #
 # restart a single node
 sudo systemctl restart opensearch

If it was a complete cluster shutdown, all nodes must rejoin the cluster before proceeding. In either case, wait until the cluster (or node) achieves a "YELLOW" state before proceeding; this means that all primary shards are accounted for, but replicas may not be.

=== Enable Shard Allocation ===
As each node restarts, it tries to identify which shards it had before. Unless something has happened to disrupt the node's data, it will simply reactivate those shards (most likely as replica shards if it is a single-node restart). To enable the cluster to get back to normal shard management, we remove the block we set above:

 curl -X PUT "localhost:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d'
 {
   "persistent": {
     "cluster.routing.allocation.enable": null
   }
 }'

=== Follow Through ===
When doing any activity that disrupts the cluster operations enough to cause it to drop into a "YELLOW" or "RED" state, wait until the cluster is "GREEN" again before proceeding. In the case of a full rolling restart (one node at a time), the cluster should be "GREEN" before proceeding to the next node.

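The "wait for YELLOW" and "wait for GREEN" steps can be scripted by polling the cluster health API. This is a rough sketch under stated assumptions: the function names <code>fetch_health</code>, <code>health_status</code> and <code>wait_for_status</code> are made up for illustration, the cluster is assumed to answer on <code>localhost:9200</code> without authentication (as in the curl examples above), and the retry counts are arbitrary defaults.

```shell
#!/bin/bash
# Hypothetical helpers (names are illustrative, not from the original page).

# Print the cluster health JSON; redefine this to add auth or a different host.
fetch_health() {
  curl -s "localhost:9200/_cluster/health"
}

# Read health JSON on stdin and print the value of its "status" field.
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# wait_for_status WANT [TRIES] [DELAY_SECONDS]
# Returns 0 once the cluster is at least as healthy as WANT (green or yellow),
# or 1 if that does not happen within TRIES polls.
wait_for_status() {
  local want=$1 tries=${2:-60} delay=${3:-5} status
  while [ "$tries" -gt 0 ]; do
    status=$(fetch_health | health_status)
    case "$want:$status" in
      green:green|yellow:yellow|yellow:green) return 0 ;;
    esac
    tries=$((tries - 1))
    sleep "$delay"
  done
  return 1
}
```

For example, <code>wait_for_status yellow</code> could run after the node restart and <code>wait_for_status green</code> after allocation is re-enabled. The health API also accepts <code>wait_for_status</code> and <code>timeout</code> query parameters, which may be a simpler alternative to the polling loop.
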
This process (for a single-node rolling restart, anyway) is captured in the script <code>node-restart.sh</code>:

 #!/bin/bash
 #
 # perform a soft restart of a node, minimizing the shard reallocation that
 # would normally occur when a node drops out and rejoins
 #
 # set to one of the master nodes
 MASTER=master-node
 # replace with either the real admin password or a cert/key combination
 AUTH="-u admin:admin"
 #
 # first shut off shard allocation (primaries only)
 curl -XPUT -H "Content-Type: application/json" <nowiki>https://$MASTER:9200/_cluster/settings</nowiki> $AUTH -d '
 {
   "persistent": {
     "cluster.routing.allocation.enable": "primaries"
   }
 }'
 echo
 #
 # now flush the cache
 curl -XPOST <nowiki>https://$MASTER:9200/_flush</nowiki> $AUTH
 echo
 #
 # restart the opensearch process
 sudo systemctl restart opensearch
 #
 # re-enable shard allocation
 curl -XPUT -H "Content-Type: application/json" <nowiki>https://$MASTER:9200/_cluster/settings</nowiki> $AUTH -d '
 {
   "persistent": {
     "cluster.routing.allocation.enable": null
   }
 }'
 echo

This script is intended to be run on the node being restarted, so <code>MASTER</code> should point at a different node that stays up while this one restarts.
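Note that the script re-enables allocation as soon as <code>systemctl restart</code> returns, which may be before the node has rejoined the cluster. One possible refinement, sketched here as an assumption rather than as part of the original script, is to wait until the node shows up in the <code>_cat/nodes</code> listing first. The names <code>list_nodes</code> and <code>wait_for_node</code> are hypothetical, and the defaults for <code>MASTER</code> and <code>AUTH</code> simply mirror the placeholders used in <code>node-restart.sh</code>.

```shell
#!/bin/bash
# Hypothetical addition to node-restart.sh (names are illustrative).
MASTER=${MASTER:-master-node}
AUTH=${AUTH:--u admin:admin}

# Print one node name per line, as reported by the cluster.
list_nodes() {
  # $AUTH is left unquoted on purpose so "-u admin:admin" splits into two words
  curl -s $AUTH "https://$MASTER:9200/_cat/nodes?h=name"
}

# wait_for_node NAME [TRIES] [DELAY_SECONDS]
# Returns 0 once NAME appears in the node list, 1 if TRIES polls pass without it.
wait_for_node() {
  local node=$1 tries=${2:-30} delay=${3:-5}
  while [ "$tries" -gt 0 ]; do
    # strip trailing padding, then require an exact whole-line match
    if list_nodes | sed 's/[[:space:]]*$//' | grep -qxF -- "$node"; then
      return 0
    fi
    tries=$((tries - 1))
    sleep "$delay"
  done
  return 1
}
```

In the script, a call such as <code>wait_for_node "$(hostname)"</code> could sit between the restart and the final settings call. This assumes the OpenSearch node name matches the host name, which is the default only when <code>node.name</code> has not been set explicitly.
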