With the following changes, a rolling restart takes about 2.5 hours instead of a day or more.

Optimisations to reduce the downtime

The following changes, which I have just applied, should reduce the downtime of the cluster during a restart:

curl -XPUT localhost:9200/_cluster/settings -d '{
  "persistent" : {
    "cluster.routing.allocation.node_concurrent_recoveries": 20,
    "indices.recovery.max_bytes_per_sec": "100mb",
    "indices.recovery.concurrent_streams": 5,
    "cluster.routing.allocation.node_initial_primaries_recoveries": 30
  }
}'
The default values (shown as the old values in the log lines below) are optimised for Amazon EC2 and not for physical hardware:

[2015-02-16 11:41:53,833][INFO ][cluster.routing.allocation.decider] [dashb-ai-661] updating [cluster.routing.allocation.node_concurrent_recoveries] from [2] to [20]

[2015-02-16 11:41:53,834][INFO ][indices.recovery ] [dashb-ai-661] updating [indices.recovery.max_bytes_per_sec] from [20mb] to [100mb]

[2015-02-16 11:41:53,834][INFO ][indices.recovery ] [dashb-ai-661] updating [indices.recovery.concurrent_streams] from [3] to [5]

Update: These settings have been added to the cluster config in Hiera. We also set cluster.routing.allocation.node_initial_primaries_recoveries: 30. It is usually recommended to set it higher than node_concurrent_recoveries (their defaults are 4 vs 2), but we already use a fairly optimistic node_concurrent_recoveries.
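
To double-check which values are currently active, you can query the cluster settings API (it only returns settings that have been explicitly set):

curl -XGET 'localhost:9200/_cluster/settings?pretty'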

Optimisations when doing a rolling restart on the cluster

When doing an upgrade, first upgrade the search nodes and the master nodes one by one, restarting the elasticsearch service on each. Then, before restarting each data node, do the following:

curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.enable": "new_primaries"}}'
then restart the node and, once the elasticsearch service is back up and has rejoined the cluster, set allocation back to all:
curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.enable": "all"}}'

Edit: cluster.routing.allocation.disable_allocation is deprecated; it is marginally better to use cluster.routing.allocation.enable. With it set to new_primaries, new indices can still be created in the cluster, but no existing shards will be reallocated. New documents still cannot be indexed into an existing index while that index's primary shard is on the restarting node, but the node restart should be quick.

Repeat the same procedure on every remaining data node, one at a time.
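
As a rough sketch, the full per-node sequence looks like the following (the sudo service command and the localhost endpoint are assumptions; adjust them to your setup):

# stop reallocation of existing shards before touching the node
curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.enable": "new_primaries"}}'

# restart elasticsearch on the data node (service name assumed)
sudo service elasticsearch restart

# wait until the node has rejoined and the cluster is at least yellow
curl -XGET 'localhost:9200/_cluster/health?wait_for_status=yellow&timeout=10m&pretty'

# re-enable allocation so replicas recover onto the restarted node
curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.enable": "all"}}'

# optionally wait for green before moving on to the next data node
curl -XGET 'localhost:9200/_cluster/health?wait_for_status=green&timeout=30m&pretty'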

Optimise number of shards for small indices

To get the templates:

curl -XGET localhost:9200/_template/?pretty

The default number of shards in our production cluster is 8. If you know in advance that an index will be small, it is pure overhead to have 8 shards, each multiplied by the replication factor. An example is the mig_status_template, whose indices are around 159 KB per day:

"mig_status_template" : {
  "order" : 0,
  "template" : "mig_status*",
  "settings" : { },
  "mappings" : {
    "mig_status" : {
      "dynamic_templates" : [ {
        "string_not_analyzed" : {
          "mapping" : {
            "index" : "not_analyzed",
            "type" : "string"
          },
          "match" : "*",
          "match_mapping_type" : "string"
        }
      } ],
      "_timestamp" : {
        "enabled" : true,
        "path" : "ts"
      },
      "_ttl" : {
        "enabled" : true
      },
      "properties" : {
        "port" : {
          "type" : "integer"
        },
        "ts" : {
          "type" : "date"
        }
      }
    }
  },
  "aliases" : { }
}

We see that this template matches all indices whose names start with mig_status. Let's say we want to change the number of shards from 8 to 1 for newly created mig_status* indices. First we have to delete the template:
curl -XDELETE localhost:9200/_template/mig_status_template
Then we recreate it with "index.number_of_shards" : "1" in its settings:
curl -XPUT localhost:9200/_template/mig_status_template -d '
{
    "order" : 0,
    "template" : "mig_status*",
    "settings" : {
      "index.number_of_shards" : "1"
    },
    "mappings" : {
      "mig_status" : {
        "dynamic_templates" : [ {
          "string_not_analyzed" : {
            "mapping" : {
              "index" : "not_analyzed",
              "type" : "string"
            },
            "match" : "*",
            "match_mapping_type" : "string"
          }
        } ],
        "_timestamp" : {
          "enabled" : true,
          "path" : "ts"
        },
        "_ttl" : {
          "enabled" : true
        },
        "properties" : {
          "port" : {
            "type" : "integer"
          },
          "ts" : {
            "type" : "date"
          }
        }
      }
    }
}
'
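
To verify that the change took effect, fetch the template again and check the shard count of newly created indices. Note that indices that already exist keep their original 8 shards; only indices created after the template change get 1. The pri column in the _cat output shows the number of primary shards per index:

curl -XGET 'localhost:9200/_template/mig_status_template?pretty'
curl -XGET 'localhost:9200/_cat/indices/mig_status*?v'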

Update: The templates are now stored in git under it-puppet-hostgroup-dashboard/code/files/elasticsearch/templates and managed by Puppet.

Backup

We have requested an NFS volume to use for backups of the cluster. A Puppet module takes care of configuring it; it takes its parameters from Hiera and, at the moment, is only configured for the development cluster:

elasticsearch_backup:
  qa:
    remote_host: dashb-es-backup
    remote_dir: /export/itsdc-dev
    mnt: /mnt/es_backup
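
Assuming the Puppet module turns these parameters into a plain NFS mount, the end result on a node is roughly equivalent to the following (illustrative only; the actual mount options are managed by the module):

mount -t nfs dashb-es-backup:/export/itsdc-dev /mnt/es_backup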

WARNING: For NFS to work, the UID of the elasticsearch user must be the same on all nodes! This was not the case in the development cluster.
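
A quick way to check this is to compare the UID of the elasticsearch user on every node (assuming the service runs as the elasticsearch user):

id -u elasticsearch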

Once the volume is mounted, the next step is to register it as a snapshot repository in Elasticsearch:

curl -XPUT 'http://localhost:9200/_snapshot/my_es_backup' -d '{"type":"fs", "settings":{"location":"/mnt/es_backup/es_snapshosts", "compress": true}}'
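
The registered repository can be verified with:

curl -XGET 'localhost:9200/_snapshot/my_es_backup?pretty'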

Now we are ready to create snapshots with:

curl -XPUT "localhost:9200/_snapshot/my_es_backup/snapshot_1_apr_2014?wait_for_completion=true"
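
Snapshots in the repository can be listed, and one can be restored, with the snapshot API. The restore command below is only a sketch: restoring over an index that is already open will fail, so the index has to be closed or deleted first.

curl -XGET 'localhost:9200/_snapshot/my_es_backup/_all?pretty'
curl -XPOST 'localhost:9200/_snapshot/my_es_backup/snapshot_1_apr_2014/_restore'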

For more info, see the Elasticsearch documentation.

References

http://gibrown.com/2013/12/05/managing-elasticsearch-cluster-restart-time/

http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_limiting_memory_usage.html#circuit-breaker
