Elasticsearch Slow Query Logging on Elastic Cloud

I have been doing a lot of work with Elasticsearch and WordPress lately and one of the common challenges that I’ve faced is setting up logging for queries that might take longer than expected. Luckily, Elastic Cloud makes this quite simple.

Logging in Elasticsearch

There are four ways to configure logging in Elasticsearch:

  1. On the command line, which is “most appropriate when you are temporarily debugging a problem on a single node (for example, a problem with startup, or during development)”.
  2. In the elasticsearch.yml configuration file either when you’re debugging a service / daemon or for permanent settings.
  3. In the log4j2.properties configuration file, which is both powerful but also rather uncommon to use.
  4. Through the Elasticsearch API. This is ideal when you don’t want to restart or otherwise disrupt your services. This also has the advantage of being easily configurable and it is applied to all nodes. (This is also the only option that is available on Elastic Cloud and is coincidentally the best fit for general use-cases, woo.

Slow Query Logging through the API

The documentation has examples both for logging slow search and index operations:

PUT study_group/_settings
{
    "index.search.slowlog.threshold.query.warn": "10s",
    "index.search.slowlog.threshold.query.info": "5s",
    "index.search.slowlog.threshold.query.debug": "2s",
    "index.search.slowlog.threshold.query.trace": "500ms",
    "index.search.slowlog.threshold.fetch.warn": "1s",
    "index.search.slowlog.threshold.fetch.info": "800ms",
    "index.search.slowlog.threshold.fetch.debug": "500ms",
    "index.search.slowlog.threshold.fetch.trace": "200ms",
    "index.indexing.slowlog.threshold.index.warn": "10s",
    "index.indexing.slowlog.threshold.index.info": "5s",
    "index.indexing.slowlog.threshold.index.debug": "2s",
    "index.indexing.slowlog.threshold.index.trace": "500ms",
    "index.indexing.slowlog.level": "info",
    "index.indexing.slowlog.source": "1000"
}

By default, the slowlog is disabled (value set to -1), and you can use warninfodebug, and trace to capture different thresholds. There are two additional tricks to collect more data:

  • To log all queries, set the threshold to 0.
  • To log all indices, use _all for the index name. But beware that this would also collect queries from internal indices (like .kibana, for example). This will only be helpful in specific debugging scenarios.

Both options will potentially create a lot of logs, so use them sparingly and change them back when you don’t need them anymore. Actually, neither of these should be used in a production context for prolonged periods of time. If they are needed in a production context, they should be used with caution.

Slow Query Logging on Elastic Cloud

Using a cluster on Elastic Cloud this example logs all queries for the index study_group, which must already exist for setting the slowlog:

# Create an index.
PUT study_group

# Log all queries.
PUT study_group/_settings
{
    "index.search.slowlog.threshold.query.warn": "0",
    "index.indexing.slowlog.threshold.index.warn": "0"
}

Creating some documents using the Bulk API and making a simple GET search request:

POST _bulk
{ "index" : { "_index" : "study_group" } }
{ "name" : "Troy" }
{ "index" : { "_index" : "study_group" } }
{ "name" : "Abed" }
{ "index" : { "_index" : "study_group" } }
{ "name" : "Jeff" }
{ "index" : { "_index" : "study_group" } }
{ "name" : "Annie" }
{ "index" : { "_index" : "study_group" } }
{ "name" : "Britta" }
{ "index" : { "_index" : "study_group" } }
{ "name" : "Pierce" }
{ "index" : { "_index" : "study_group" } }
{ "name" : "Shirley" }

GET study_group/_search
{
  "query": {
    "match": {
      "name": "Britta"
    }
  }
}

To see the log output you need to head to the Elastic Cloud admin UI and then <Cluster-Name> → Elasticsearch → Logs.

There you can find the indexing operation (per shard):

[instance-0000000000] [study_group/C2OBwoduS9SA_1EZ9ds4ow]
  took[746.5micros], took_millis[0], type[_doc], id[2], routing[],
  source[{"name":"Britta"}]
[instance-0000000001] [study_group/C2OBwoduS9SA_1EZ9ds4ow]
  took[2.7ms], took_millis[2], type[_doc], id[2], routing[],
  source[{"name":"Britta"}]

As well as the search operation — skipping the fetch phase:

[instance-0000000000] [study_group][0]
  took[147.2micros], took_millis[0],
  total_hits[1 hits], types[], stats[],
  search_type[QUERY_THEN_FETCH], total_shards[1],
  source[{
    "query": {
      "match": {
        "name": {
          "query": "Britta",
          "operator": "OR",
          "prefix_length": 0,
          "max_expansions": 50,
          "fuzzy_transpositions": true,
          "lenient": false,
          "zero_terms_query": "NONE",
          "auto_generate_synonyms_phrase_query": true,
          "boost": 1
        }
      }
    }
  }],
  id[],

This works, but admittedly the UI isn’t ideal, especially on a busy cluster. Once you’re done you should reset the slowlog to its default values:

PUT study_group/_settings
{
    "index.search.slowlog.threshold.query.warn": null,
    "index.indexing.slowlog.threshold.index.warn": null
}

Conclusion

Once you know which API to use and where to look for the logs, it’s quite easy to do. Bear in mind that if you use this method and you delete and reindex your indices, you will also need to re-PUT you slowlog settings!


Comments

Leave a Reply

Your email address will not be published. Required fields are marked *