ELK Stack — Elasticsearch, Logstash, Kibana

Sandun Dayananda
8 min read · Apr 13, 2024


ELK Stack — Image by author

Elastic, Elasticsearch, Kibana, Logstash, ELK stack, Elastic DB… maybe you have heard these different words and they still don’t make sense? Perhaps you have the following questions:

Is Elasticsearch a database or a search engine?

Why do we call it elastic?

Then what is ELK?

Are Elastic and ELK the same?

Okay, I’m going to answer these doubts if you have them. Let’s go.

The ELK stack has seen widespread adoption across the tech industry due to its flexibility, scalability, and ease of use. From monitoring system logs and application performance to analyzing user behavior and trends, organizations leverage ELK to gain valuable insights that drive decision-making and improve operational efficiency.

As an example, let’s say you have a data center. In a data center you have hundreds of servers, routers, switches, and so on. All of them generate logs, and you need a way to see and understand them. In this kind of scenario, we can use the ELK stack to deal with those logs.

Image by author

Then you will be able to spot anomalies in network traffic, track user engagement on a website, or monitor infrastructure performance in real time. ELK offers the tools necessary to handle these kinds of diverse data challenges effectively.

In essence, the ELK stack has become an indispensable asset for organizations seeking to harness the power of their data to drive innovation, optimize processes, and stay ahead in today’s competitive tech landscape.

What is ELK Stack?

The ELK stack is the combination of Logstash, Elasticsearch, and Kibana. These three can simply be considered as separate tools for transforming, storing, and visualizing data. Let’s see what each of them does in detail.

Logstash:

Logstash serves as a versatile data processing pipeline, allowing users to collect, parse, and transform data from different sources before sending it to Elasticsearch.

  • Input / transform / stash (store in the Elastic data store)

Here I should mention that Logstash usually works together with something else called Beats. Beats are set up as agents on the servers. They send data from hundreds or even thousands of machines/systems/servers to Logstash, or directly to Elasticsearch.

Based on the type of data, there are several types of Beats. For example, if the source data is log files, we can use Filebeat. Following is an overview of the different Beats based on source data type.

Image from docs.elastic.co
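
For example, here is a minimal Filebeat configuration that ships log files to Logstash. This is just a sketch: the log path and the Logstash host are hypothetical placeholders, and 5044 is the conventional Beats port.

filebeat.inputs:
  - type: filestream                 # reads lines from log files as they grow
    paths:
      - /var/log/myapp/*.log         # hypothetical path to your application logs

output.logstash:
  hosts: ["logstash-host:5044"]      # hypothetical Logstash endpoint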

Okay, here I should clarify some scenarios related to Beats and Logstash.

  • Generally, Beats are installed on the systems where the data sources reside (the server side), while Logstash typically runs as a central processing stage.
  • When we use Beats together with Logstash, Beats first collect and send data to Logstash, and Logstash acts as a processing stage (pipeline) for that data. It has a lot of input, filter, and output plugins for processing the data coming from Beats.
  • Beats can work alone, without Logstash, and send data directly to Elasticsearch. This is useful when we don’t need any additional or advanced processing or transformation before the data reaches Elasticsearch.
  • Logstash can also work without Beats. It can receive input data from message queue systems like RabbitMQ and Kafka, do the processing and transformation, and send the result to Elasticsearch (see the sketch after this list).
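
For instance, a minimal Logstash pipeline that reads from Kafka could look like the following. This is only a sketch: the broker address and the topic name are hypothetical placeholders.

input {
  kafka {
    bootstrap_servers => "kafka-host:9092"   # hypothetical Kafka broker address
    topics => ["web-logs"]                   # hypothetical topic name
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}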

Following is a simple Logstash transformation to help you understand how it works. Let’s consider a scenario where you have a web server that generates log files, and you want to use Logstash to parse these logs and send them to Elasticsearch for analysis. The log entries might look something like this:

127.0.0.1 - - [12/Apr/2024:09:22:44 +0200] "GET / HTTP/1.1" 200 2326

This is a standard format for web server logs (there are many other formats out there), known as the Common Log Format. Each part of the log entry provides useful information: the client’s IP address, the timestamp of the request, the HTTP method, the requested resource, the HTTP version, the response status code, and the size of the response in bytes.

Here’s a simple Logstash configuration that uses the grok filter plugin to parse these log entries:

input {
  file {
    path => "/path/to/your/logfile.log"
    start_position => "beginning"
  }
}

filter {
  grok {
    match => { "message" => "%{COMMONAPACHELOG}" }
  }
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    remove_field => [ "timestamp" ]
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}

If your Elasticsearch instance is running on a different machine, you would replace localhost with the IP address or hostname of that machine.

In this configuration:

  • The file input plugin is used to read from the log file.
  • The grok filter plugin is used to parse the log entries. COMMONAPACHELOG is a predefined pattern that matches the Common Log Format (the related COMBINEDAPACHELOG pattern covers the extended format that also includes the referrer and user agent).
  • The date filter plugin is used to parse the timestamp from the log entry and use it as the timestamp for the event in Elasticsearch. The original timestamp field is removed since it’s no longer needed.
  • The elasticsearch output plugin is used to send the processed events to Elasticsearch.
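
To try this pipeline, you would save the configuration to a file and start Logstash with the -f flag pointing at it (the file path here is just an example):

bin/logstash -f /path/to/pipeline.conf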

Elasticsearch:

Elasticsearch is a distributed, JSON-based search and analytics engine, not a traditional database. It is designed to handle large volumes of data, it can scale out as data grows (this elastic capability is where the name comes from), and it can continuously receive and process data while indexing and querying it quickly and efficiently.

Elasticsearch is built on top of Apache Lucene, an open-source Java library with powerful indexing and search features.

Characteristics:

  1. NoSQL, JSON-based data store
  2. RESTful (interactions with data are made via REST URLs)
  3. Use cases — the data can be logs, metrics generated from different systems, or application trace data

Scalability and resilience: clusters, nodes, and shards:

When we use Elasticsearch, we should understand the following key terms to grasp its basics.

  1. Node: Think of a node as a worker in an office. It’s a server that holds data and does the work of searching and indexing.
  2. Cluster: A cluster is like the entire office itself. It’s a group of nodes (workers) that work together to get the job done. You can add or remove workers (nodes) as needed.
  3. Index: An index is like a project in the office. It’s a collection of documents that are somewhat related to each other. Each project (index) is divided into smaller tasks (shards).
  4. Shard: A shard is like a task in a project. It’s a self-contained piece of the index. Tasks (shards) are spread out among workers (nodes) for efficiency and backup. There are two types of tasks (shards):
  • Primary Shard: This is the original task where the work first gets done.
  • Replica Shard: This is a copy of a task. It’s like having a backup worker who can step in if the original worker is unavailable, and can also help with the workload.

Remember, the number of tasks (primary shards) is set when a project (index) starts, but the number of backup tasks (replica shards) can be changed anytime. This setup ensures that the office (cluster) runs smoothly, can handle more work when needed, and doesn’t stop working if a worker (node) is unavailable.
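
To make this concrete, here is how these terms map onto the REST API. The index name and the numbers are hypothetical examples; note that the replica count can be changed on a live index, while the primary shard count is fixed at creation:

PUT /my-index
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}

PUT /my-index/_settings
{
  "index": { "number_of_replicas": 2 }
}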

The data ingested into Elasticsearch is usually gathered around an index template. An index template is defined with an index pattern that matches the names of the incoming source data.
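
A minimal index template could look like the following; the template name and the index pattern are hypothetical examples:

PUT _index_template/web-logs-template
{
  "index_patterns": ["web-logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 1
    }
  }
}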

Also, we can define a lifecycle policy for each index template in order to apply it to the indices gathered around that template. A lifecycle has several tiers:

Image from www.elastic.co

Also, we have the facility to back the frozen tier with object storage such as AWS S3 (through searchable snapshots). This helps to reduce costs, and the frozen data is still queryable, which is another advantage.
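
As a sketch, an index lifecycle management (ILM) policy with a hot phase and a delete phase could be defined like this; the policy name and the thresholds are hypothetical examples:

PUT _ilm/policy/web-logs-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_age": "7d" }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": { "delete": {} }
      }
    }
  }
}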

Elasticsearch provides a full Query Domain Specific Language (DSL) based on JSON to define queries (it offers several other methods too). Following is an example of how we can query Elasticsearch.

This example assumes that you have an index named blog that contains blog post documents. Each document has fields like title, content, author, and tags.

Let’s say you want to find all blog posts written by the author “Sandun W” that contain the word “Elasticsearch” in the title. Here’s how you can do it using Query DSL:

GET /blog/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "Elasticsearch" }},
        { "match": { "author": "Sandun W" }}
      ]
    }
  }
}

In this example, we’re using a bool query that combines multiple query clauses. The must clause is an array of queries that must match for a document to be included in the results. We have two match queries inside the must clause: one for the title field and one for the author field.

This query will return all blog posts where the title field contains the word “Elasticsearch” and the author field matches “Sandun W”.
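
Note that match performs full-text matching against analyzed text. If you want to filter on the exact author value instead, the usual approach is a term query against a keyword field; the author.keyword sub-field below assumes the index uses Elasticsearch’s default dynamic mapping:

GET /blog/_search
{
  "query": {
    "term": { "author.keyword": "Sandun W" }
  }
}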

Kibana:

We use Kibana to create visualizations based on data stored in Elasticsearch.

  • UI Dashboard
  • Visualization of data

Kibana is a dynamic and versatile data visualization tool that is part of the Elastic Stack, alongside Elasticsearch and Logstash. It provides a user-friendly interface for exploring, visualizing, and managing the data residing in Elasticsearch indices.

Kibana’s primary function is to provide an easy-to-use platform for creating real-time, interactive dashboards. These dashboards can include various types of visualizations such as bar charts, line graphs, pie charts, maps, and more. Each visualization is customizable, allowing users to drill down into specific aspects of their data.

In addition to data visualization, Kibana also offers features for data exploration and discovery. With its powerful search capabilities, users can perform complex queries on their data in Elasticsearch. This makes it an invaluable tool for data analysis and business intelligence.

Kibana also includes tools for managing Elasticsearch indices, such as the Index Management and Dev Tools interfaces. These features make it easier to interact with your Elasticsearch data and perform administrative tasks. Whether you’re a data analyst, a developer, or an IT administrator, Kibana has something to offer you.

I believe you now have answers to your doubts. The examples above are simple cases chosen to explain how these tools work.

If you think you learned something from this, follow me to read more useful articles like this. 📚


Sandun Dayananda

Big Data Engineer with passion for Machine Learning and DevOps | MSc Industrial Analytics at Uppsala University