How to Configure Your Elasticsearch for Better Performance? from Emily Hill's blog

Full-text search is one of the most common requirements, and the open-source Elasticsearch (hereafter also referred to as Elastic) is currently a first choice among full-text search engines.


Elasticsearch is a RESTful distributed search and data analysis engine. It lets you store, search, and analyze large amounts of data quickly. Wikipedia, Stack Overflow, and GitHub all use it.


The lowest level of Elastic is Lucene, an open-source library. However, you cannot use Lucene out of the box: you need to write your own code to call its interfaces. Elastic wraps Lucene and exposes a REST API that you can use right away.


  • Query: Elasticsearch allows you to perform and combine multiple types of searches. You can freely mix structured, unstructured, geographic, and metric searches.


  • Analysis: Finding the 10 best documents for your query is one thing. But when you are faced with billions of log lines, how do you interpret them? Elasticsearch aggregations let you step back and see the trends and patterns in your data.
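
To make the two points above concrete, here is a minimal sketch of a search body that combines a full-text query with an aggregation. The field names "message" and "level" are assumptions made up for this example (and "level" is assumed to be a keyword field), and the helper name is hypothetical. The snippet only builds the JSON body; it does not contact a cluster.

```python
import json


def build_search_body(text, size=10):
    """Build a search body with a full-text query and a terms aggregation."""
    return {
        "size": size,
        # Full-text query against the hypothetical "message" field.
        "query": {"match": {"message": text}},
        # Bucket the matching documents by the hypothetical "level" field.
        "aggs": {"by_level": {"terms": {"field": "level"}}},
    }


body = build_search_body("timeout")
print(json.dumps(body, indent=2))
```

A body like this would typically be sent to the `_search` endpoint of an index. The query part answers "which documents match best", while the aggregation part summarizes all matches.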


Near Real-Time (NRT): Elasticsearch is a near real-time search platform. This means that there is only a slight delay (usually about one second) between indexing a document and that document becoming searchable.
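
The delay comes from the fact that newly indexed documents only become visible to searches after a refresh, which Elasticsearch performs periodically (the `index.refresh_interval` setting defaults to one second). The toy model below illustrates just that idea. `ToyIndex` is a made-up class for this post and says nothing about how Elasticsearch is actually implemented.

```python
class ToyIndex:
    """Toy model of near real-time search; not how Elasticsearch is implemented."""

    def __init__(self):
        self._buffer = []      # indexed, but not yet visible to searches
        self._searchable = []  # visible to searches

    def index(self, doc):
        self._buffer.append(doc)

    def refresh(self):
        """Make everything indexed so far visible to searches."""
        self._searchable.extend(self._buffer)
        self._buffer.clear()

    def search(self, word):
        return [d for d in self._searchable if word in d["text"]]


idx = ToyIndex()
idx.index({"text": "disk full on node-1"})
before = idx.search("disk")   # the document is not searchable yet
idx.refresh()                 # Elasticsearch does this periodically
after = idx.search("disk")    # now the document is found
print(len(before), len(after))
```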


A cluster is a collection of one or more nodes (servers) that together store all of your data and provide indexing and search capabilities across all nodes. A cluster is identified by a unique name, which is "elasticsearch" by default.


This name is important because a node can only be part of a cluster if it is configured to join that cluster by name. Do not use the same cluster name in different environments.


Otherwise, a node can end up joining the wrong cluster.


A node is a single server that is part of a cluster, stores data, and participates in the cluster's indexing and search capabilities. Like the cluster, a node is identified by name.


If you don't want to use the default node name, you can define any node name you like. This name is important for management because it lets you identify which server on your network corresponds to which node in your Elasticsearch cluster.


You can add a node to a specific cluster by setting the cluster name. By default, each node is configured to join a cluster called elasticsearch.


So, assuming you start several nodes and they can find each other, they will automatically form and join a cluster called "elasticsearch". If no other Elasticsearch node is running on the network, starting a single node forms a new single-node cluster called elasticsearch.
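
Both names are set in the node's `elasticsearch.yml` file through the `cluster.name` and `node.name` settings. As a rough sketch, the helper below renders those two lines. The function name and the example values are made up for this post, and refusing the default name is just a guard chosen here to follow the advice above.

```python
def render_node_config(cluster_name, node_name):
    """Return elasticsearch.yml lines that set the cluster and node names."""
    if cluster_name == "elasticsearch":
        # Guard chosen for this sketch: refuse the default name so that
        # nodes from different environments cannot end up in one cluster.
        raise ValueError("choose an explicit, environment-specific cluster name")
    return f"cluster.name: {cluster_name}\nnode.name: {node_name}\n"


config = render_node_config("logging-prod", "prod-node-1")
print(config)
```

Using a name such as "logging-prod" in production and a different one in staging keeps a test node from joining the production cluster by accident.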


An index (comparable to a database) is a collection of documents with similar properties. For example, you can create a customer data index, a product catalog index, and an order data index.


The top-level unit of data management in Elastic is therefore the index, which is roughly equivalent to a single database. The name of each index must be lowercase. Indexing a document (verb) means storing the document in an index (noun) so that it can be searched and retrieved.
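
The lowercase rule is easy to check before you try to create an index. The sketch below covers the lowercase requirement plus a subset of the other naming restrictions described in the Elasticsearch documentation. It is not a complete validator, and the function name is made up for this example.

```python
FORBIDDEN_CHARS = set('\\/*?"<>| ,#')
FORBIDDEN_PREFIXES = ("-", "_", "+")


def is_valid_index_name(name):
    """Check a name against a subset of Elasticsearch's index naming rules."""
    if not name or name in (".", ".."):
        return False
    if name != name.lower():          # index names must be lowercase
        return False
    if name.startswith(FORBIDDEN_PREFIXES):
        return False
    if any(ch in FORBIDDEN_CHARS for ch in name):
        return False
    return len(name.encode("utf-8")) <= 255


for candidate in ["customers", "Customers", "product-catalog", "_orders"]:
    print(candidate, is_valid_index_name(candidate))
```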


A single record in an index is called a document. Many documents form an index. A document is expressed in JSON format. Documents within an index can be grouped.
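
As an illustration, the sketch below builds (but does not send) a REST request that would store one JSON document. The index name "customer", the document fields, and the `http://localhost:9200` address of a local development node are all assumptions for this example, and the helper name is made up.

```python
import json
import urllib.request


def build_index_request(base_url, index, doc_id, document):
    """Build (but do not send) a request that stores one JSON document."""
    return urllib.request.Request(
        url=f"{base_url}/{index}/_doc/{doc_id}",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_index_request(
    "http://localhost:9200",          # assumed local development node
    "customer",                       # hypothetical index name
    1,
    {"name": "Jane Doe", "city": "Beijing"},
)
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request)
```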


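For example, a weather index can be grouped by city (Beijing and Shanghai) or by climate (sunny and rainy). This grouping, called a type, is a virtual logical grouping used to filter documents. Note that recent Elasticsearch versions have removed types: since 6.0 an index can contain only one type, and 8.0 dropped them entirely.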


Different types should have a similar structure (schema). For example, an id field cannot be a string in one type and a number in another. This is different from tables in a relational database.


Data with completely different properties (such as products and protocols) should be stored as two indexes rather than as two types in one index (although the latter is possible).


Shards and replicas: An index can store a large amount of data that can exceed the hardware limits of a single node.


For example, an index of a billion documents that occupies 1 TB of space may not fit on a single node, or it may be too slow to serve search requests from a single node alone. To solve this problem, Elasticsearch provides the ability to split an index into multiple pieces called shards.


When creating an index, you can easily define the number of shards you need. Each shard is itself a fully independent "index" and can be hosted on any node in the cluster. Sharding is important for two main reasons.

  1. It lets you split and scale your content volume horizontally.


  2. It lets you distribute and parallelize operations across shards (possibly on multiple nodes), improving performance and throughput.
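
To get a feel for how documents end up spread across shards, here is a small sketch of hash-based routing: a routing key (by default the document ID) is hashed and the result is taken modulo the number of primary shards. This is a simplified picture, `zlib.crc32` is only a stand-in for the hash function Elasticsearch really uses, and the function and document names are made up.

```python
import zlib


def pick_shard(routing_key, number_of_primary_shards):
    """Map a routing key (by default the document ID) to a shard number."""
    # zlib.crc32 is a stand-in hash for this sketch, not the hash
    # function Elasticsearch uses.
    return zlib.crc32(routing_key.encode("utf-8")) % number_of_primary_shards


doc_ids = [f"doc-{i}" for i in range(1000)]
placement = {}
for doc_id in doc_ids:
    placement.setdefault(pick_shard(doc_id, 5), []).append(doc_id)

for shard in sorted(placement):
    print(f"shard {shard}: {len(placement[shard])} documents")
```

Because the shard number depends on the shard count, changing the count would send the same document ID to a different shard, which is one way to see why the number of shards is fixed when the index is created.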


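Elasticsearch also lets you make one or more copies of an index's shards, called replica shards (or replicas for short). There are two main reasons why replication is important.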

  1. Provides high availability in the event of a shard/node failure. For this reason, it is important that the replica is not assigned to the same node as the original shard it is replicating. That is, the copy is placed on a different node.


  2. Searches can be performed in parallel on all replicas, which increases search volume/throughput.


Overall, each index can be split into several shards. An index can also be replicated zero times (that is, no replicas) or more. Once replicated, each index has primary shards (the originals) and replica shards (the copies of the primaries). When you create an index, you can define the number of shards and replicas for that index. After the index is created, you can change the number of replicas dynamically at any time, but you cannot change the number of primary shards of an existing index.
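
The two settings involved are `number_of_shards` and `number_of_replicas`. The sketch below builds the JSON bodies you would send when creating an index and when changing the replica count later. The helper names and the numbers are only examples, and nothing here talks to a real cluster.

```python
import json


def create_index_body(shards, replicas):
    """Settings body for creating an index (sent with PUT /<index>)."""
    return {"settings": {"number_of_shards": shards,
                         "number_of_replicas": replicas}}


def update_replicas_body(replicas):
    """Settings body for changing replicas later (PUT /<index>/_settings)."""
    return {"index": {"number_of_replicas": replicas}}


def total_shard_copies(shards, replicas):
    """Primaries plus all replica copies."""
    return shards * (1 + replicas)


print(json.dumps(create_index_body(3, 1)))
print(json.dumps(update_replicas_body(2)))
print(total_shard_copies(3, 1))
```

With 3 primary shards and 1 replica of each, the cluster holds 6 shard copies in total, and each replica needs to live on a different node than its primary to be of any use.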



By Emily Hill
Added Dec 30 '21
