We’ve got some exciting news to share about Elasticsearch and Hadoop. Elasticsearch and Hortonworks have formally partnered, making Elasticsearch the first certified Hortonworks Apache YARN search vendor. With both Elasticsearch and Hortonworks being leaders in the open source space, this partnership will lead to a tighter integration between Elasticsearch and Hadoop that benefits anyone using the open source products.
This is what Hortonworks said about our newly formed partnership:
“The over 400,000 downloads a month is a testament to the level of trust put into Elasticsearch. Since Elasticsearch and Hadoop are such critical elements to the data analytics infrastructure for a huge number of users, we’re excited by this opportunity to join together and reach beyond our users’ expectations.” Shaun Connolly, Vice President, Corporate Strategy at Hortonworks
Elasticsearch’s real-time data exploration, analytics, logging and search features pair well with Hadoop, and the combination is very useful to anyone handling large volumes of data on a day-to-day basis.
Our customers can now enhance their Hortonworks Hadoop-based workflows with our rich query language, designed to help businesses ask better questions, get clearer answers and better analyze their business metrics, all in real time. Elasticsearch plays well with all Hadoop distributions, including MapR, Cloudera, Pivotal HD and Amazon EMR, and we have plans to provide specific integration with Hortonworks’ HDP platform.
This is what Shay had to say about our partnership with Hortonworks:
“We’re moving quickly here at Elasticsearch. Just a few weeks ago, we announced Elasticsearch for Apache Hadoop, so it’s exciting to announce Elasticsearch now works with Hortonworks’ HDP Hadoop distribution. We’re constantly working to improve our product so we can provide the most value to all of our users.” Shay Banon, founder and CTO of Elasticsearch
Data stores integrated with Hadoop often become a bottleneck because of the number of requests generated by the tasks running in the cluster for each job. The distributed Map/Reduce model fits well on top of Elasticsearch because we tie the number of Map/Reduce tasks to the number of Elasticsearch shards for a particular query. Every time a query is run, the system dynamically generates a number of Hadoop splits proportional to the number of available shards so that the work runs in parallel. As a result, your Hadoop cluster scales easily alongside Elasticsearch and vice versa.
Finally, Elasticsearch provides near real-time responses (think milliseconds) that can significantly reduce a Hadoop job’s execution time and the cost associated with it, especially when running on ‘rented resources’ such as Amazon EMR.
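To make the shard-to-split idea concrete, here is a minimal Python sketch. It is a toy model, not the actual Elasticsearch for Apache Hadoop code: the names `Split`, `plan_splits`, `run_map_task` and `run_job` are hypothetical, the "shards" are plain in-memory lists, and we assume the simplest proportion of exactly one split per shard.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Split:
    """Hypothetical stand-in for a Hadoop input split bound to one shard."""
    index: str
    shard_id: int


def plan_splits(index: str, shard_count: int) -> List[Split]:
    # Assumption: one split per shard, so parallelism grows with the shard count.
    return [Split(index, shard_id) for shard_id in range(shard_count)]


def run_map_task(split: Split,
                 shards: Dict[int, List[str]],
                 predicate: Callable[[str], bool]) -> int:
    # Each task reads only the documents of its own shard.
    return sum(1 for doc in shards[split.shard_id] if predicate(doc))


def run_job(index: str,
            shards: Dict[int, List[str]],
            predicate: Callable[[str], bool]) -> int:
    splits = plan_splits(index, len(shards))
    # Run one task per split in parallel, then combine the partial counts.
    with ThreadPoolExecutor(max_workers=max(1, len(splits))) as pool:
        partial_counts = list(
            pool.map(lambda s: run_map_task(s, shards, predicate), splits)
        )
    return sum(partial_counts)


if __name__ == "__main__":
    # Toy data: three shards of an imaginary "logs" index.
    toy_shards = {0: ["error a", "ok"], 1: ["error b"], 2: []}
    print(run_job("logs", toy_shards, lambda doc: doc.startswith("error")))
```

Adding a shard to the toy data adds a split, and with it one more parallel task, which is the scaling behavior described above.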
We’ve got even more news in the works about Elasticsearch’s integration with all things Hadoop. Follow this blog for future details! And let us know how you’re using Elasticsearch with Hadoop; we would love to hear about it.