Building custom search architecture for your site using ELK: Elasticsearch


Given Ishtar's specific requirements as an art-selling website, we needed a custom architecture to support search operations on the site. We adopted the ELK stack to implement it. Here's how we made it possible.

What is ELK Stack?

ELK is a stack comprising four free products: Elasticsearch, Logstash, Kibana, and Beats. It also offers additional paid plugins.

The free products, briefly:

  1. Logstash: the main input interface, though not the only one.
  2. Beats: another input interface; lightweight shippers that feed data to the star of the show.
  3. Elasticsearch: the search engine itself.
  4. Kibana: sits on top of the stack. It provides an interface to communicate with the search engine.

Why ELK stack?

Suppose you want a good search engine. Plain SQL is often not fast enough, because search queries take a long time to execute; this is where NoSQL comes into play. Basically, you can use SQL for storing data, especially security-sensitive data, and NoSQL for search operations because of its query speed. This leads you to a setup similar to ours: SQL for the user and painting tables, NoSQL for log data and log analysis.

NoSQL comes in many flavors based on your platform preferences. If you are lucky enough to customize your own workflow using Docker and Kubernetes, you can choose from an array of providers, many of which are open source.

However, the big players right now are MongoDB (a document store that underpins many other systems, known for its Atlas cloud service), DynamoDB (for AWS users), Cassandra (the Apache wide-column store), and Elasticsearch (by Elastic).

The best option for you depends on what kind of operations you want the database to handle and where you want to host it.

But since we aim for out-of-the-box text-search optimization with the ability to use Docker and Kubernetes, you will most likely end up with Elasticsearch, which is part of the larger ELK stack kindly provided by Elastic, the company.

When used together, the components of the ELK stack let you aggregate logs from all your systems, analyze them for problems, monitor system use, and find opportunities for improvement. The data analysis and visualization ELK provides are hard to beat.

Installing Elasticsearch and Kibana

Requirements

  1. Install the Java SDK, known as the JDK. We used version 12 in this project. (This is especially important for Kafka; choose the version wisely.)
  2. Install Node.js for Kibana. Elasticsearch doesn't need it.
  3. Download the Elasticsearch zip from the Elastic website. It's open source and cross-platform.
  4. Download the Kibana zip as well. It's not strictly required, but it provides a nice interface for executing code against the Elasticsearch engine.
  5. On Windows, set JDK_HOME in your environment variables; this is how the system knows where the JDK is.

The Engine

To start the engine, follow these steps carefully:

  1. Decompress the zip files. No specific directory is needed.
  2. After decompression, run ./bin/elasticsearch.bat from the Elasticsearch directory.
  3. Open the Kibana directory, then start ./bin/kibana.bat.
  4. This may take a while because of Node. Don't panic.
  5. Kibana will show a message containing localhost:<some port>; the port is generally 5601. Open it in your browser.
  6. Good work, Kibana is ready. Open Dev Tools from the sidebar.
  7. Start writing code to interact with Elasticsearch, as in the quick test below.
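For a quick smoke test in the Dev Tools console (assuming a default local install), ask the cluster for its basic info:

GET /

The response includes the node name, cluster name, and Elasticsearch version, which confirms the engine is up.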

Basic Command Structure in Kibana

<REST Verb> <Node>/<API Name>
{
  <additional data in JSON format>
}

Example

GET /bank/_search
{
  "query": {}
}

CRUD operations with Search Engine

Create Data

We create the data through the Bulk API of Elasticsearch.

Example data: the JSON example file reqs_json.json.

The commands can be sent from Postman: POST to the _bulk endpoint, and don't forget the body, which is where the JSON data goes. Hit Send and it's done.
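As a minimal sketch of that request (the two documents are illustrative, patterned on the sample accounts.json), POST to localhost:9200/bank/_bulk with the Content-Type header set to application/x-ndjson and a body of newline-delimited JSON, alternating an action line and a document line:

{ "index": { "_id": "1" } }
{ "account_number": 1, "balance": 39225, "firstname": "Amber", "lastname": "Duke", "age": 32, "state": "IL" }
{ "index": { "_id": "6" } }
{ "account_number": 6, "balance": 5686, "firstname": "Hattie", "lastname": "Bond", "age": 36, "state": "TN" }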

Create Data From Kibana Console

In the Kibana console, write the command POST /bank/_bulk followed by the JSON data, BUT don't forget to add a newline at the end of the JSON, or it will give you an error.

Another tip: specifying a type is no longer allowed, so keep that in mind when searching tutorials online.
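In the console, that looks like the sketch below (the document is again illustrative; note the newline required after the last line):

POST /bank/_bulk
{ "index": { "_id": "1" } }
{ "account_number": 1, "balance": 39225, "firstname": "Amber", "lastname": "Duke", "age": 32, "state": "IL" }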

Load Actual JSON File (For Shell (Linux) users only)

curl -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/bank/account/_bulk?pretty' --data-binary @accounts.json

But how can I check if it's there?

Read Data

Get Request

This command is quite simple: GET /bank

Enter this in Kibana and you'll see a response describing the structure of the index. To see some actual data, we need the Search API.
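A trimmed sketch of the response shape (abbreviated; your mappings will list more fields):

{
  "bank": {
    "aliases": {},
    "mappings": {
      "properties": {
        "account_number": { "type": "long" },
        "balance": { "type": "long" },
        "state": {
          "type": "text",
          "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
        }
      }
    },
    "settings": {
      "index": { "number_of_shards": "1", "number_of_replicas": "1" }
    }
  }
}

Note the keyword subfields generated by dynamic mapping; we rely on state.keyword later.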

Read Index

We use the bank data from now on; make sure you've posted it, e.g. from Postman, by copying the contents of accounts.json into the body of the request.

In Kibana, use the command: GET /_cat/indices

This command lists the indices of all the data we've entered. We can use any index name from here in the Search API.
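The output is one line per index; a sketch with illustrative values (the uuid is random):

health status index uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   bank  5iHfwIV0TXicT0eUaJZgFg   1   1       1000            0    379.2kb        379.2kb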

Search API

The basic syntax goes like this: GET /bank/_search {<query data>}

So:

  1. First, notice the _search, which indicates which API we are using.
  2. Second, the <query data> part, which is used with many verbs.
  3. Third, these commands are executed in Kibana.

Note: GET is case-sensitive; always write it in caps.

match

It goes like this in Kibana:

GET /bank/_search
{
  "query": {
    "match": {
      "age": 32
    }
  }
}

Inside the “match” JSON we specify the fields and the expected values.
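match is not limited to numbers. On a text field the input is analyzed, so the following sketch (using the address field from the same dataset) matches any document whose address contains "mill" or "lane":

GET /bank/_search
{
  "query": {
    "match": {
      "address": "mill lane"
    }
  }
}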

match_all

Shows everything the “Database” has in this index.

GET /bank/_search
{
  "query": {
    "match_all": {}
  }
}

Or you can use GET /bank/_search for short.

Queries With multiple Matches

This example illustrates the concept:

Notice that bank in GET /bank/_search serves as the index in which we are searching.

GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "city": "Brogan" } },
        { "match": { "state": "IL" } }
      ]
    }
  }
}

So, we used the match clause to replicate the = sign, or equals, from SQL- and PHP-style operations.

In addition, we used the must clause to indicate that all of these search terms are required.
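For completeness, bool also supports must_not for exclusions; a sketch that keeps the city condition but excludes one state (values illustrative):

GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "city": "Brogan" } }
      ],
      "must_not": [
        { "match": { "state": "IL" } }
      ]
    }
  }
}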

More Complex Queries

Let's say you want the list of accounts in the state of CA, and you would like to boost people named Smith in your search results. This is how you do it using inner queries.

GET /bank/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": { "state": "CA" }
        },
        {
          "match": {
            "lastname": { "query": "Smith", "boost": 3 }
          }
        }
      ]
    }
  }
}

In our case, the results included somebody who doesn't live in CA, and that's the difference between must and should. Notice also that I used an inner query (the object under lastname) in the command I sent; this will be explained in more detail later.

The interesting thing is the way Elasticsearch scores the relevance of each item with respect to the query: the boost of 3 makes the lastname term count as three times more important than the state term.

Terms Query

Understanding the term query proved essential for following the tutorials used here, so let's explain it now.

It works like this:

GET bank/account/_search
{
  "query": {
    "term": {
      "account_number": 516
    }
  }
}

Notice that we used term to ask for something similar to a match query, BUT be careful how you use it: a term query looks for the exact value and the input is not analyzed, so text-based search with term can silently fail. When you want to use term, use it for numeric fields, and be very specific so it gives you the result you want.

GET bank/account/_search
{
  "query": {
    "term": {
      "state": "RI"
    }
  }
}

As a result, the basic rule is: use term queries for numeric values. (The query above, looking for "RI" in the analyzed state text field, will likely return nothing, because the indexed tokens are lowercased.)
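If you do need an exact match against a text field, a common workaround (assuming the default dynamic mapping) is to target the keyword subfield, which is not analyzed:

GET /bank/_search
{
  "query": {
    "term": {
      "state.keyword": "RI"
    }
  }
}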

More than & Less Than Queries

If you want to get the accounts from 512 to 600, you will use the following format:

GET /bank/_search
{
  "query": {
    "range": {
      "account_number": {
        "gte": 512,
        "lte": 600
      }
    }
  }
}

There are a couple of things here:

  1. gte stands for Greater Than or Equal to.
  2. lte stands for Less than or Equal to.

And we used range instead of match here.
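range also composes with the bool queries shown earlier; a sketch that restricts the same account range to a single state (the state value is illustrative):

GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "state": "IL" } },
        { "range": { "account_number": { "gte": 512, "lte": 600 } } }
      ]
    }
  }
}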

Aggregations of Queries

Aggregations are used to sum, count, average, and so on.

Count Based on Field

Let’s say we want to know the number of accounts for every state. That would be the following:

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "states": {
      "terms": {
        "field": "state.keyword"
      }
    }
  }
}

That seems hard at first, so let's walk through it:

  1. size: 0 tells Elasticsearch that we don't want the matching documents back, just the aggregations.
  2. aggs is how we tell Elasticsearch to aggregate the data.
  3. terms here groups the documents into one bucket per distinct value of the field; we will discuss it more later.
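A trimmed sketch of what the response looks like (counts are illustrative):

{
  "hits": {
    "total": { "value": 1000, "relation": "eq" },
    "hits": []
  },
  "aggregations": {
    "states": {
      "buckets": [
        { "key": "TX", "doc_count": 30 },
        { "key": "MD", "doc_count": 28 }
      ]
    }
  }
}

Note that hits is empty because of size: 0; the interesting part is the buckets array.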

Averaging Data

Goal: Getting the average balance per state

Code:

GET bank/_search
{
  "size": 0,
  "aggs": {
    "states": {
      "terms": {
        "field": "state.keyword"
      },
      "aggs": {
        "avg_bal": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}

So, size: 0 means we want the aggregations only. states is the name of the aggregation; it works the same way AS does in SQL. terms is explained above: in a nutshell, it buckets the documents by the field's distinct values.

The aggs block nested after the terms is what calculates the average balance. How? First, avg_bal is just the name of the resulting field (again, like AS in SQL); avg is the tag that tells the search engine the aggregation type; then we just give the name of the field to average.
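One nice consequence of naming the sub-aggregation is that you can sort the buckets by it; a sketch ordering states by their average balance, descending:

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "states": {
      "terms": {
        "field": "state.keyword",
        "order": { "avg_bal": "desc" }
      },
      "aggs": {
        "avg_bal": { "avg": { "field": "balance" } }
      }
    }
  }
}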

For a full list of aggregations, see:

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-avg-aggregation.html.

Filtering Aggregations

We can use the match in the query as follows:

GET bank/account/_search
{
  "size": 0,
  "query": {
    "match": { "state.keyword": "CA" }
  },
  "aggs": {
    "over35": {
      "filter": {
        "range": { "age": { "gt": 35 } }
      },
      "aggs": {
        "avg_bal": { "avg": { "field": "balance" } }
      }
    }
  }
}

Notice that the main filter here lives inside the match, with the field name state.keyword; you usually find the exact field name by inspecting the result nodes before filtering. You can also filter with the gt part of this query: the filter restricts the balance average to people over 35. gt means strictly greater than; gt != gte, so keep that in mind.

Finish

That's enough commands. As for update and delete, I'm not going to discuss them here, for these reasons:

  1. It's usually not a good idea to manipulate logs from a client-side connection.
  2. Bad data should be flagged as an outlier by the search engine itself.
  3. Since a log entry is time-restricted (it is associated with a timestamp), you can't really update its values.

You can see the code on the Yes-Soft GitHub.
