For the past four years or so, the term “Big Data” has been loosely thrown around marketing and tech conferences, publications, blog articles, and everywhere in between. The buzzword has since been defined and classified, but one particular distributed storage and processing ecosystem might as well be synonymous with it: Apache Hadoop.
Hadoop comprises a wide array of packages and tools that bulk-ingest and process data using distributed clusters of commodity hardware and/or container technologies. It comes as no surprise, then, that organizations have been combining Hadoop, for deeper analytics and “actionable insights,” with Elasticsearch, for robust log and performance-metric analysis.
In this tutorial, we will use the Elastic Hadoop connector to integrate Elasticsearch with a Hadoop cluster, and show how external tables in Hive work with Elasticsearch mappings and bulk-loaded documents.
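To preview what that integration looks like, here is a minimal sketch of a Hive external table backed by an Elasticsearch index via the ES-Hadoop storage handler. The table name, column list, index name (`logs/entry`), jar path, and node address are all illustrative assumptions, not values from this tutorial:

```sql
-- Register the ES-Hadoop connector jar with Hive (path is illustrative)
ADD JAR /path/to/elasticsearch-hadoop.jar;

-- An external table whose rows are read from, and written to,
-- an Elasticsearch index instead of HDFS files
CREATE EXTERNAL TABLE logs (
  ts      TIMESTAMP,
  level   STRING,
  message STRING
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.resource' = 'logs/entry',        -- hypothetical index/type
  'es.nodes'    = 'localhost:9200'     -- assumes a local ES node
);
```

Because the table is external, dropping it in Hive leaves the underlying Elasticsearch index untouched; queries against it are translated by the connector into Elasticsearch reads and writes.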