Working With Elasticsearch in Go

Elasticsearch is one of the most popular, highly scalable full-text search engines on the market. It is based on the Lucene engine and lets you store, search, and analyze large volumes of data quickly and in near real time. It has a rich REST API and clients for most popular programming languages, including Go. In this post I’ll show by example how to index and search application logs using Go.

Installing Elasticsearch

If you haven’t set it up already, installing Elasticsearch is as easy as downloading it from the official download page and running the executable file. I installed it on localhost and will use that instance in my examples.

Once the installation is complete, let’s check that Elasticsearch is up and running:

curl http://localhost:9200

The response should be similar to:

{
  "name" : "Specialist",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "2.3.4",
    "build_hash" : "e455fd0c13dceca8dbbdbb1665d068ae55dabe3f",
    "build_timestamp" : "2016-06-30T11:24:31Z",
    "build_snapshot" : false,
    "lucene_version" : "5.5.0"
  },
  "tagline" : "You Know, for Search"
}

Quick intro to Elasticsearch

In a nutshell, Elasticsearch is a distributed document store. As a store, it lets you execute CRUD (create, read, update, delete) operations on documents, but, more importantly, it lets you efficiently search the documents it holds.

So what is a document? You have probably heard that Elasticsearch documents have fields and are represented by JSON objects. This is correct, but it is only half the truth. Under the hood, Elasticsearch flattens JSON objects, so a document is internally represented by a map from fields to their values. For example, the JSON document below

{
  "name" : "Zach",
  "car" : [
    {
      "make" : "Saturn",
      "model" : "SL"
    },
    {
      "make" : "Subaru",
      "model" : "Imprezza"
    }
  ]
}

will be represented internally by the following map

{
  "name" : "Zach",
  "car.make" : ["Saturn", "Subaru"],
  "car.model" : ["SL", "Imprezza"]
}
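
To make the flattening more concrete, here is a minimal sketch in Go that turns a nested JSON document into a map from dot-separated field paths to lists of values. It only illustrates the idea and is not how Elasticsearch or Lucene implement it; the function names flatten and flattenJSON are my own.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// flatten walks a decoded JSON value and collects leaf values under
// dot-separated field paths. Arrays do not add a path segment, so the
// values of all array elements end up under the same key.
func flatten(prefix string, v interface{}, out map[string][]interface{}) {
	switch t := v.(type) {
	case map[string]interface{}:
		for k, child := range t {
			key := k
			if prefix != "" {
				key = prefix + "." + k
			}
			flatten(key, child, out)
		}
	case []interface{}:
		for _, child := range t {
			flatten(prefix, child, out)
		}
	default:
		out[prefix] = append(out[prefix], t)
	}
}

// flattenJSON decodes a JSON document and returns its flattened form.
func flattenJSON(doc string) (map[string][]interface{}, error) {
	var v interface{}
	if err := json.Unmarshal([]byte(doc), &v); err != nil {
		return nil, err
	}
	out := make(map[string][]interface{})
	flatten("", v, out)
	return out, nil
}

func main() {
	doc := `{"name":"Zach","car":[{"make":"Saturn","model":"SL"},{"make":"Subaru","model":"Imprezza"}]}`
	fields, err := flattenJSON(doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(fields["name"])      // [Zach]
	fmt.Println(fields["car.make"])  // [Saturn Subaru]
	fmt.Println(fields["car.model"]) // [SL Imprezza]
}
```

Note that after flattening, the link between "Saturn" and "SL" is lost: the map only records that the document has both values somewhere in its car array.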

Every document field has a type. Elasticsearch supports simple data types such as string, numeric, date, boolean and binary, as well as complex data types. The string type is special, though, because string fields can be analyzed. When a field is analyzed, Elasticsearch lets you find a document not only by the whole string value, but also by a part of it. For example, if you index the following document

{
  "name" : "Zach Morrison",
  "company" : "Google"
}

and have configured the name field to be analyzed, then you can find this document by searching for "name" : "Morrison" or "name" : "Zach".
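
The following sketch shows roughly what analysis does to a string. It is a deliberately simplified stand-in that I wrote for illustration (split on non-alphanumeric characters, then lowercase); real Elasticsearch analyzers are configurable and do more. The helper names analyze and matches are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// analyze is a simplified stand-in for an analyzer: it splits the text
// on anything that is not a letter or a digit and lowercases each token.
func analyze(text string) []string {
	words := strings.FieldsFunc(text, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	for i, w := range words {
		words[i] = strings.ToLower(w)
	}
	return words
}

// matches reports whether at least one token of the analyzed query
// equals a token of the analyzed field value.
func matches(fieldValue, query string) bool {
	terms := make(map[string]bool)
	for _, t := range analyze(fieldValue) {
		terms[t] = true
	}
	for _, q := range analyze(query) {
		if terms[q] {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(analyze("Zach Morrison"))             // [zach morrison]
	fmt.Println(matches("Zach Morrison", "Morrison")) // true
	fmt.Println(matches("Zach Morrison", "Zach"))     // true
	fmt.Println(matches("Zach Morrison", "Mor"))      // false
}
```

In this sketch, "a part of the string" means a whole token: a fragment such as "Mor" does not match, because it is not one of the terms produced by analysis.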

Elasticsearch stores documents in an index, which is implemented as an inverted index. Simply put, an inverted index is a data structure that maps a field value to the collection of documents whose field has that value. There are usually multiple indices in Elasticsearch, and similar documents are stored in the same index. For example, a booking app could store hotels in a hotels index and plane flights in a flights index.
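
As an illustration, an inverted index for a single field can be sketched in Go as a map from a value to the IDs of the documents that contain it. This is a toy model under my own assumptions (integer document IDs, one value per document, made-up company names), not the structure Lucene actually uses on disk.

```go
package main

import (
	"fmt"
	"sort"
)

// invertedIndex maps a field value to the IDs of documents holding it.
type invertedIndex map[string][]int

// add records that the document with the given ID has this value.
func (idx invertedIndex) add(docID int, value string) {
	idx[value] = append(idx[value], docID)
}

// lookup returns the sorted IDs of all documents with this value.
func (idx invertedIndex) lookup(value string) []int {
	ids := append([]int(nil), idx[value]...)
	sort.Ints(ids)
	return ids
}

func main() {
	companies := invertedIndex{}
	companies.add(1, "Google")
	companies.add(2, "Amazon")
	companies.add(3, "Google")

	fmt.Println(companies.lookup("Google")) // [1 3]
	fmt.Println(companies.lookup("Apple"))  // []
}
```

Finding all documents for a value is a single map lookup, which is why searching by value is cheap compared with scanning every document.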

Example of working with Elasticsearch in Go

In the example below, I’ll show how Elasticsearch can be used for storing and searching application logs, which is one of its most popular use cases. We will create an index, store some log messages in it and find the messages using the Query API.

We will use olivere/elastic (imported as gopkg.in/olivere/elastic.v3), an Elasticsearch client written in Go.

package main

import (
	"errors"
	"fmt"
	"gopkg.in/olivere/elastic.v3"
	"reflect"
	"time"
)

const (
	indexName    = "applications"
	docType      = "log"
	appName      = "myApp"
	indexMapping = `{
						"mappings" : {
							"log" : {
								"properties" : {
									"app" : { "type" : "string", "index" : "not_analyzed" },
									"message" : { "type" : "string", "index" : "not_analyzed" },
									"time" : { "type" : "date" }
								}
							}
						}
					}`
)

type Log struct {
	App     string    `json:"app"`
	Message string    `json:"message"`
	Time    time.Time `json:"time"`
}

func main() {
	client, err := elastic.NewClient(elastic.SetURL("http://localhost:9200"))
	if err != nil {
		panic(err)
	}

	err = createIndexWithLogsIfDoesNotExist(client)
	if err != nil {
		panic(err)
	}

	err = findAndPrintAppLogs(client)
	if err != nil {
		panic(err)
	}
}

func createIndexWithLogsIfDoesNotExist(client *elastic.Client) error {
	exists, err := client.IndexExists(indexName).Do()
	if err != nil {
		return err
	}

	if exists {
		return nil
	}

	res, err := client.CreateIndex(indexName).
		Body(indexMapping).
		Do()

	if err != nil {
		return err
	}
	if !res.Acknowledged {
		return errors.New("CreateIndex was not acknowledged; check that the timeout value is correct")
	}

	return addLogsToIndex(client)
}

func addLogsToIndex(client *elastic.Client) error {
	for i := 0; i < 10; i++ {
		l := Log{
			App:     appName,
			Message: fmt.Sprintf("message %d", i),
			Time:    time.Now(),
		}

		_, err := client.Index().
			Index(indexName).
			Type(docType).
			BodyJson(l).
			Do()

		if err != nil {
			return err
		}
	}

	return nil
}

func findAndPrintAppLogs(client *elastic.Client) error {
	termQuery := elastic.NewTermQuery("app", appName)

	res, err := client.Search(indexName).
		Query(termQuery).
		Sort("time", true).
		Do()

	if err != nil {
		return err
	}

	fmt.Println("Logs found:")
	// Each returns the hits that could be deserialized into a Log.
	for _, item := range res.Each(reflect.TypeOf(Log{})) {
		l := item.(Log)
		fmt.Printf("time: %s message: %s\n", l.Time, l.Message)
	}

	return nil
}

Although the code is self-explanatory, a couple of notes could be worthwhile.

First, look at the indexName, docType and indexMapping constants. An Elasticsearch index lets you define a schema for each document type, known as a mapping, where you declare the document fields, their types and, for string fields, whether they are analyzed. In our example we create an index named applications, which stores documents of a single type named log with three fields: app, message and time. Both string fields are declared not_analyzed.

Second, pay attention to the following snippet:

termQuery := elastic.NewTermQuery("app", appName)

Note that we use TermQuery to search for logs. A TermQuery requires an exact match of the term. In our case, the app field of a log document must exactly match our application name, which guarantees that the returned documents belong to our application and not to one with a similar name. This works because app is declared not_analyzed, so its value is indexed as a single, unmodified term.
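
For readers more familiar with the REST API, a term query corresponds to a "term" clause in the Elasticsearch Query DSL. The sketch below builds just that query part with the standard library, so you can see its shape; the helper name termQueryBody is my own, and the request the client actually sends also carries other parts, such as the sort.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// termQueryBody builds the JSON for a search request that contains
// a single term query on the given field.
func termQueryBody(field, value string) (string, error) {
	body := map[string]interface{}{
		"query": map[string]interface{}{
			"term": map[string]interface{}{field: value},
		},
	}
	b, err := json.Marshal(body)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	body, err := termQueryBody("app", "myApp")
	if err != nil {
		panic(err)
	}
	fmt.Println(body) // {"query":{"term":{"app":"myApp"}}}
}
```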

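Finally, when you run the code, the output should be similar to the listing below. On the very first run some messages may be missing, because newly indexed documents only become searchable after the index is refreshed, which by default happens about once a second; running the program again should print all of them.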

Logs found:
time: 2016-07-16 16:38:25.128233427 +1000 AEST message: message 0
time: 2016-07-16 16:38:25.152318895 +1000 AEST message: message 1
time: 2016-07-16 16:38:25.156900546 +1000 AEST message: message 2
time: 2016-07-16 16:38:25.159455721 +1000 AEST message: message 3
time: 2016-07-16 16:38:25.164298397 +1000 AEST message: message 4
time: 2016-07-16 16:38:25.169695943 +1000 AEST message: message 5
time: 2016-07-16 16:38:25.172610227 +1000 AEST message: message 6
time: 2016-07-16 16:38:25.175208635 +1000 AEST message: message 7
time: 2016-07-16 16:38:25.181457148 +1000 AEST message: message 8
time: 2016-07-16 16:38:25.183462045 +1000 AEST message: message 9