
A Deep Dive into Significant Terms and Significant Text Bucket Aggregations in Elasticsearch
Posted by Kirill Goltsman, December 11, 2018
In this article, we’ll continue our overview of Elasticsearch bucket aggregations, focusing on significant terms and significant text aggregations. These aggregations are designed to search for interesting and/or unusual occurrences of terms in your datasets that can tell much about the hidden properties of your data. This functionality is especially useful for the following use cases:
- Identifying relevant documents for user queries containing synonyms, acronyms, etc. For example, the significant terms aggregation could suggest documents with “bird flu” when the user searches for H5N1.
- Identifying anomalies and interesting occurrences in your data. For example, by filtering documents based on location, we could identify the crime types that are unusually frequent in a particular area compared with the rest of the dataset.
- Identifying the most significant properties of a group of subjects using the significant terms aggregation on integer fields like height, weight, income, etc.
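To make the location-based use case concrete, here is a minimal sketch of a request body that asks for the crime types that stand out in one area. It is written as a Python dictionary only so that it can be printed as JSON. The field names `area` and `crime_type`, the value `"Downtown"`, and the aggregation name `significant_crime_types` are assumptions made for illustration and do not refer to a real index.

```python
import json

# Hypothetical field names and values, chosen only for illustration.
request_body = {
    "size": 0,  # skip the search hits; only the aggregation result is needed
    "query": {
        # The query selects the group of interest (documents for one area).
        "term": {"area": "Downtown"}
    },
    "aggs": {
        "significant_crime_types": {
            # Report values of crime_type that are unusually common
            # among the documents matched by the query.
            "significant_terms": {"field": "crime_type"}
        }
    },
}

print(json.dumps(request_body, indent=2))
```

The query defines the group of interest, and the aggregation then reports the values of `crime_type` that are over-represented within that group.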
It should be noted that both significant terms and significant text aggregations perform complex statistical computations on the documents retrieved by the direct query (foreground set) and compare them with a background set, which by default is all the documents in your index. Therefore, both aggregations are computationally intensive and should be properly configured to work fast. However, once you master them with the help of this tutorial, you’ll acquire a powerful tool for building useful features in your applications and extracting insights from your datasets. Let’s get started!
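Before moving on, it may help to see the foreground/background comparison as arithmetic. The sketch below is modeled on the JLH heuristic described in the Elasticsearch documentation, which multiplies the absolute change in a term's frequency by its relative change. It is a simplified illustration and not Elasticsearch's actual implementation; the function name `jlh_score` and all of the counts are hypothetical.

```python
def jlh_score(fg_count: int, fg_total: int, bg_count: int, bg_total: int) -> float:
    """Simplified JLH-style score for one term.

    fg_count / fg_total: documents containing the term in the foreground set.
    bg_count / bg_total: documents containing the term in the background set.
    """
    if fg_total == 0 or bg_total == 0 or bg_count == 0:
        return 0.0
    fg_pct = fg_count / fg_total
    bg_pct = bg_count / bg_total
    # A term that is no more common in the foreground is not significant.
    if fg_pct <= bg_pct:
        return 0.0
    # Absolute change multiplied by relative change.
    return (fg_pct - bg_pct) * (fg_pct / bg_pct)


# Hypothetical counts: a term found in 10% of the foreground documents
# but only 1% of the background documents.
rare_but_concentrated = jlh_score(10, 100, 100, 10_000)

# Hypothetical counts: a term equally common in both sets.
common_everywhere = jlh_score(50, 100, 5_000, 10_000)

print(rare_but_concentrated, common_everywhere)
```

A term that is merely popular everywhere scores zero here, while a term concentrated in the foreground set scores high. This is why these aggregations surface unusual terms rather than the most frequent ones.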