Understanding Elasticsearch Cluster, Node, Index and Document using an example.
Overview
In the previous post we saw how to perform basic operations in Elasticsearch. In this post we look at the Elasticsearch cluster, node, index and document using an example.
Lets Begin
For this tutorial you will need to install the Elasticsearch head plugin. Follow the steps mentioned in Elasticsearch Head Plugin installation.
Start Elasticsearch and go to http://localhost:9200/_plugin/head/
Currently we have not indexed any data so no nodes are shown.
We will create a new index and insert a record.
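The step above can be sketched in Python as follows. This is only an illustration, not necessarily the request used in this post: the type name employee, the id 1 and the field names are assumptions, and the /index/type/id URL form assumes an older Elasticsearch release that still supports mapping types (the same releases that support site plugins such as head).

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # same host and port as used in this post


def build_index_request(index, doc_type, doc_id, document, base_url=ES_URL):
    """Build a PUT request that stores `document` at /index/type/id."""
    url = "{}/{}/{}/{}".format(base_url, index, doc_type, doc_id)
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def index_document(request):
    """Send the request to a running Elasticsearch and return the parsed reply."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical record; the field names and values are made up for illustration.
request = build_index_request(
    "employees", "employee", 1, {"name": "emp1", "department": "IT"}
)
print(request.get_method(), request.full_url)
# To actually create the index, with Elasticsearch running:
# print(index_document(request))
```

Indexing the first document into an index that does not exist yet creates that index with the default settings, which is why no separate "create index" call is made here.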
Now go to the URL http://localhost:9200/_plugin/head/
What we can observe here is:
- A cluster named elasticsearch is created.
- Index named employees is created.
- A node named Krista Marwan is created.
- 5 shards are shown as assigned and 5 as unassigned.
- The status of the cluster is yellow.
Next we configure the cluster name and node name. For this, go to the elasticsearch/config folder and modify elasticsearch.yml as follows:
cluster.name: cluster1
node.name: node
Restart Elasticsearch, insert the above record again and go to the URL http://localhost:9200/_plugin/head/
So a cluster named cluster1 has been created with a single node named node.
By default Elasticsearch creates 5 primary shards and one replica of each, i.e. 5 replica shards, for a new index. A replica shard is never allocated to the same node as its primary, so with a single node the 5 replica shards cannot be assigned and are shown as unassigned. Due to this the cluster health is shown as yellow. The cluster health rule is as follows:
|RED: Some or all of the primary shards are not ready.|
|YELLOW: All primary shards are ready, but some or all of the replica shards have not been allocated to any node.|
|GREEN: All the shards, both primary and replica, are ready and allocated to a node.|
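The rule in the table can be written down as a small function. This is a sketch of the rule only, not Elasticsearch code: the record layout (the keys "primary" and "assigned") is an assumption made for this example. On a running cluster the actual status is reported by the _cluster/health API.

```python
def cluster_health(shards):
    """Return "red", "yellow" or "green" for a list of shard records.

    Each record is a dict with two keys (names chosen for this sketch):
      "primary"  - True for a primary shard, False for a replica
      "assigned" - True if the shard is allocated to a node and ready
    """
    primaries_ok = all(s["assigned"] for s in shards if s["primary"])
    replicas_ok = all(s["assigned"] for s in shards if not s["primary"])
    if not primaries_ok:
        return "red"
    if not replicas_ok:
        return "yellow"
    return "green"


# Single-node situation from this post: 5 primaries assigned, 5 replicas not.
single_node = (
    [{"primary": True, "assigned": True} for _ in range(5)]
    + [{"primary": False, "assigned": False} for _ in range(5)]
)
print(cluster_health(single_node))  # yellow
```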
Next, without stopping this instance of Elasticsearch, start another Elasticsearch instance in the same way.
We now have two nodes in the cluster, so the 10 shards (5 primary and 5 replica) can be distributed among them and the health of the cluster becomes green.
We can see here that all the primary shards are in one node and the replica shards in another.
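To see why one node leaves 5 shards unassigned and two nodes leave none, here is a deliberately simplified allocation sketch. It only models one constraint, namely that a replica is not placed on the node holding its primary, and it keeps every primary on the first node to mirror what is observed above. The real allocator balances shards in more sophisticated ways, and the node name node2 is hypothetical.

```python
def allocate(num_primaries, nodes):
    """Place each primary and its single replica on `nodes` (at least one node).

    Simplified rule: a replica may not live on the node that holds its primary.
    Returns (assignment, unassigned), where assignment maps a node name to a
    list of shard labels such as "P0" (primary 0) or "R0" (replica of 0).
    """
    assignment = {node: [] for node in nodes}
    unassigned = []
    primary_node = nodes[0]  # all primaries on the first node, as observed above
    others = [n for n in nodes if n != primary_node]
    for shard in range(num_primaries):
        assignment[primary_node].append("P{}".format(shard))
        if others:
            assignment[others[0]].append("R{}".format(shard))
        else:
            unassigned.append("R{}".format(shard))
    return assignment, unassigned


one_node, left_over = allocate(5, ["node"])
print(len(left_over))  # 5 replicas cannot be placed

two_nodes, left_over = allocate(5, ["node", "node2"])
print(len(left_over))  # 0, every shard has a home
```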
Let's now look at the theoretical definitions:
- Cluster is a collection of one or more nodes (servers) that together hold your entire data and provide indexing and search capabilities across all nodes. A cluster is identified by a unique name, which by default is "elasticsearch". This name is important because a node can only be part of a cluster if the node is set up to join the cluster by its name.
- Node is a single server that is part of the cluster. It stores the data and participates in the cluster's indexing and search capabilities.
- Index is like a ‘database’ in a relational database system. It has a mapping which defines multiple types.
An index is a logical namespace which maps to one or more primary shards and can have zero or more replica shards.
MySQL => Databases
Elasticsearch => Indices
- Document is similar to a row in a relational database. The difference is that each document in an index can have a different structure (fields), but common fields should have the same data type.
MySQL => Databases => Tables => Columns/Rows
Elasticsearch => Indices => Types => Documents with Properties
- Type is a logical category/partition of an index whose semantics are completely up to the user.
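The point made in the Document definition, that documents may have different fields but a shared field should keep one data type, can be illustrated with a small check. This is only a sketch that compares Python value types. Elasticsearch itself enforces this through the index mapping and may coerce some values, so the function below (a hypothetical helper) is an approximation of the idea, not of the actual behaviour.

```python
def conflicting_fields(documents):
    """Return the names of fields whose value type differs between documents."""
    seen = {}          # field name -> type first seen for that field
    conflicts = set()
    for doc in documents:
        for field, value in doc.items():
            expected = seen.setdefault(field, type(value))
            if type(value) is not expected:
                conflicts.add(field)
    return sorted(conflicts)


# Made-up employee documents: the second one has a different set of fields.
docs = [
    {"name": "emp1", "age": 30},
    {"name": "emp2", "department": "IT"},
]
print(conflicting_fields(docs))  # []

# Here "age" is a string in one document and a number in another.
print(conflicting_fields(docs + [{"name": "emp3", "age": "thirty"}]))  # ['age']
```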