Integrating Hadoop and Elasticsearch - Best of Two Worlds

ES-Hadoop

Organizations often use Hadoop and Elasticsearch together for real-time analytics. Pushing data from Hadoop into Elasticsearch is a very common integration pattern for exposing Hadoop data through an API, and the ES-Hadoop connector provided by Elasticsearch makes it easy to get data flowing with very little work.

Download ES-Hadoop Connector

1. Add the JAR Files to Your Hive Session

ADD JAR hdfs:/user/elasticsearch-jars/elasticsearch-hadoop-hive-6.2.1.jar;
ADD JAR hdfs:/user/elasticsearch-jars/commons-httpclient-3.0.1.jar;
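
As a quick sanity check, you can list the resources registered in the current session. This is only a sketch; the output should include the two JAR paths added above, assuming both ADD JAR statements succeeded.

```sql
-- Show every JAR registered in the current Hive session.
LIST JARS;
```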

2. Ingest Data from Hadoop into Elasticsearch

CREATE EXTERNAL TABLE IF NOT EXISTS test_db.es_items(
  sku INT COMMENT 'SKU for an Item',
  location_id INT COMMENT 'Store Location Id',
  start_date STRING COMMENT 'Start Date for Item Location',
  end_date STRING COMMENT 'End Date for Item Location',
  channel STRING COMMENT 'Channel for Item Location',
  source_system STRING COMMENT 'Data Source for the Item Location'
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = '${hiveconf:ES_INDEX}/${hiveconf:ES_TYPE}',
              'es.nodes' = '${hiveconf:ES_NODE_LIST}',
              'es.port' = '${hiveconf:ES_PORT}');
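
The placeholders in TBLPROPERTIES have to be resolved before the CREATE statement runs. One way to do that, sketched here, is to define them with SET earlier in the same session; values set this way are read back through the hiveconf namespace, for example ${hiveconf:ES_PORT}. The index name, type name, and host names below are hypothetical and only illustrate the expected shape of each value.

```sql
-- Hypothetical values; replace them with your own index, type, and hosts.
SET ES_INDEX=items;                                        -- target Elasticsearch index
SET ES_TYPE=item;                                          -- mapping type within the index
SET ES_NODE_LIST=es-node1.example.com,es-node2.example.com; -- comma-separated node list
SET ES_PORT=9200;                                          -- Elasticsearch HTTP port

-- Print one value back to confirm it was set.
SET ES_INDEX;
```

Run these statements before the CREATE EXTERNAL TABLE statement so the substitution has values to work with.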

Now any data you insert into the es_items table is written to the corresponding index in your Elasticsearch cluster, where you can query it.
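
As a minimal sketch of the load step, the statements below copy rows from a regular Hive table into es_items and then read a few of them back. The source table test_db.items is hypothetical; it is assumed to have the same six columns with compatible types.

```sql
-- Push rows from a (hypothetical) Hive source table into Elasticsearch.
INSERT OVERWRITE TABLE test_db.es_items
SELECT sku,
       location_id,
       start_date,
       end_date,
       channel,
       source_system
FROM test_db.items;

-- Read a few documents back through the same external table.
SELECT sku, location_id, channel
FROM test_db.es_items
LIMIT 10;
```

The SELECT goes through the storage handler as well, so it is served from the Elasticsearch index rather than from files in HDFS.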