So, you build a web site or an online system and you want to add search to it. But then it hits you: getting search to work is hard. You want the search solution to be fast, painless to set up, and easy to scale. You want to be able to index data simply using JSON over HTTP without having to pre-define schemas for the index. You want the search server to always be available, and to start small but potentially scale large, Big Data large. You want to create as many indices as you see fit, supporting a diverse set of document types. As you search, you want it to be as close to real-time as possible. Oh yeah… and it would be great if this search solution were built for the cloud.
Schema Free & Document Oriented
The data model of a search engine is rooted in schema-free, document-oriented databases, and as shown by the #nosql movement, this model proves to be very effective for building applications. Elasticsearch's model is JSON, which has been emerging as the de facto standard for representing data. Moreover, JSON makes it simple to represent semi-structured data with complex entities while remaining programming-language neutral, with first-class parsers.
Elasticsearch is schema-less: just toss it a typed JSON document and it will automatically index it. Types such as numbers and dates are automatically detected and treated accordingly. That said, as we all know, search engines are quite sophisticated. Fields in documents can have boost levels that affect scoring, analyzers can be used to control how text gets tokenized into terms, certain fields should not be analyzed at all, and so on. Elasticsearch allows you to completely control how a JSON document gets mapped into the search engine, at both the per-type and per-index level.
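As a rough illustration of indexing over HTTP, the sketch below builds (without sending) a PUT request for one JSON document. The index name `twitter`, the type `tweet`, the id `1`, the document fields, and the `localhost:9200` address are all assumptions made for the example, and `build_index_request` is a hypothetical helper, not part of Elasticsearch.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumed address of a local node


def build_index_request(index, doc_type, doc_id, document):
    """Build (but do not send) a PUT request that indexes one JSON document."""
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return Request(url, data=body, method="PUT",
                   headers={"Content-Type": "application/json"})


# Hypothetical document: a string, a date-like string and a number, with no
# schema declared up front.
tweet = {
    "user": "kimchy",
    "post_date": "2009-11-15T14:12:12",
    "message": "trying out Elasticsearch",
    "likes": 3,
}
index_request = build_index_request("twitter", "tweet", 1, tweet)
```

Passing `index_request` to `urllib.request.urlopen` would send it to a running node; with no mapping defined, the date-like and numeric fields would be left to automatic detection.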
GETting Some Data
Data is always indexed under a unique identifier (at the type level). This is very handy, since we often wish to update or delete the indexed data, or just GET it. Getting data could not be simpler: all that is needed is the index name, the type, and the id. What we get back is the actual JSON document that was used to index the data, so Elasticsearch effectively behaves as a distributed key/value store for structured documents.
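A minimal sketch of the lookup side, assuming the same hypothetical `twitter` index and `tweet` type on a local node: the URL is just index, type, and id, and the original document comes back under the `_source` key of the response. The sample response here is hand-written for illustration, not captured from a server, and both helper functions are made up for this example.

```python
import json
from urllib.parse import quote

BASE_URL = "http://localhost:9200"  # assumed address of a local node


def document_url(index, doc_type, doc_id):
    """Return the URL that addresses one document by index, type and id."""
    parts = [quote(str(part), safe="") for part in (index, doc_type, doc_id)]
    return f"{BASE_URL}/" + "/".join(parts)


def extract_source(response_text):
    """Pull the originally indexed JSON document out of a GET response body."""
    return json.loads(response_text)["_source"]


# Hand-written sample shaped like a GET response; not real server output.
sample_response = json.dumps({
    "_index": "twitter",
    "_type": "tweet",
    "_id": "1",
    "_source": {"user": "kimchy", "message": "trying out Elasticsearch"},
})

url = document_url("twitter", "tweet", 1)
original_document = extract_source(sample_response)
```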
What it all boils down to in the end is being able to search, and with Elasticsearch it could not be simpler. Issuing a query is a simple call that hides away the sophisticated distributed search support Elasticsearch provides. A search can be executed using either a simple Lucene-based query string or an extensive JSON-based search query DSL. Search does not end with just queries. Facets, highlighting, custom scripts, and more are all there to be used when needed.
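The two ways of expressing a search can be sketched side by side. The example below only builds the request pieces: a `_search` URL carrying a query string in the `q` parameter, and a JSON body using a single `term` query from the DSL. The index name, field name, and helper functions are assumptions for illustration.

```python
import json
from urllib.parse import urlencode

BASE_URL = "http://localhost:9200"  # assumed address of a local node


def query_string_search_url(index, query):
    """URL for a search expressed as a Lucene-style query string."""
    return f"{BASE_URL}/{index}/_search?" + urlencode({"q": query})


def dsl_search_body(field, value):
    """JSON body for a search expressed in the query DSL (one term query)."""
    return json.dumps({"query": {"term": {field: value}}})


search_url = query_string_search_url("twitter", "user:kimchy")
search_body = dsl_search_body("user", "kimchy")
```

The query string form is convenient for quick lookups, while the JSON body leaves room for nesting further clauses as a query grows.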
A single index is already a major step forward, but what happens when we need more than one? In many cases, multiple indices are required. Examples include storing one index per week of log files, or having different indices with different settings (one with memory storage, and one with file system storage). Elasticsearch makes it easy to create as many indices as required, to execute queries across indices, and to group indices using advanced aliasing functionality.
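To make the index-per-week idea concrete, here is a small sketch that names weekly log indices, builds a search path spanning several of them (index names joined with commas), and builds a body for the `_aliases` endpoint that points an alias at the newest one. The `logs-YYYY-wNN` naming scheme, the `logs-current` alias, and the helper names are all hypothetical.

```python
import json
from datetime import date


def weekly_index_name(day, prefix="logs"):
    """Name the index for the ISO week containing `day` (hypothetical scheme)."""
    year, week, _ = day.isocalendar()
    return f"{prefix}-{year}-w{week:02d}"


def multi_index_search_path(index_names):
    """Search several indices at once by joining their names with commas."""
    return "/" + ",".join(index_names) + "/_search"


def add_alias_body(index_name, alias):
    """Body for POST /_aliases that points `alias` at `index_name`."""
    return json.dumps(
        {"actions": [{"add": {"index": index_name, "alias": alias}}]}
    )


weeks = [weekly_index_name(date(2012, 1, 4)),
         weekly_index_name(date(2012, 1, 11))]
search_path = multi_index_search_path(weeks)
alias_body = add_alias_body(weeks[-1], "logs-current")
```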
The ability to configure is a double-edged sword. We want to be able to start working with the system as fast as possible, with no configuration, and still be able to control almost every aspect of the application if need be. Elasticsearch is built with this notion in mind. Almost everything is configurable and pluggable. Moreover, each index can have its own settings, which override the master settings. For example, one index can be configured with memory storage and have 10 shards with 1 replica each, while another index can have file-based storage with 1 shard and 10 replicas. All index-level settings can be specified when creating an index, using either YAML or JSON.
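The two configurations in that example can be written out as JSON settings bodies, as in the sketch below. It covers only the shard and replica counts, using the `number_of_shards` and `number_of_replicas` settings, and leaves the storage type out. The helper names are made up for illustration, and the arithmetic shows how many shard copies each layout produces in total.

```python
import json


def index_settings(shards, replicas):
    """Settings body for creating an index (sent as the body of PUT /<index>)."""
    return {"settings": {"number_of_shards": shards,
                         "number_of_replicas": replicas}}


def total_shard_copies(settings_body):
    """Primaries plus all of their replicas: shards * (1 + replicas)."""
    settings = settings_body["settings"]
    return settings["number_of_shards"] * (1 + settings["number_of_replicas"])


wide_index = index_settings(shards=10, replicas=1)        # 10 shards, 1 replica each
replicated_index = index_settings(shards=1, replicas=10)  # 1 shard, 10 replicas

create_body = json.dumps(wide_index)  # JSON form of the first configuration
```

Under these settings the first index spreads 20 shard copies across the cluster and the second spreads 11, which is one way to see the trade-off between partitioning data and duplicating it.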
One of the main features of Elasticsearch is its distributed nature. Indices are broken down into shards, and each shard has zero or more replicas. Each data node within the cluster hosts one or more shards and acts as a coordinator, delegating operations to the correct shard(s). Rebalancing and routing are done automatically, behind the scenes.
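The routing idea can be illustrated with a toy model: hash a document id and take it modulo the number of shards, so the same id always lands on the same shard. This is only a sketch of the general technique; CRC32 is used here as a stand-in hash and is an assumption, not the function Elasticsearch itself uses, and both helpers are hypothetical.

```python
import zlib
from collections import defaultdict


def shard_for(doc_id, number_of_shards):
    """Pick a shard by hashing the id; CRC32 is a stand-in, not the real hash."""
    return zlib.crc32(str(doc_id).encode("utf-8")) % number_of_shards


def group_by_shard(doc_ids, number_of_shards):
    """Show how a coordinating node could split a batch of ids across shards."""
    groups = defaultdict(list)
    for doc_id in doc_ids:
        groups[shard_for(doc_id, number_of_shards)].append(doc_id)
    return dict(groups)


# Twenty hypothetical document ids spread over five shards.
placement = group_by_shard(range(1, 21), number_of_shards=5)
```

Because the shard is derived from the id alone, any node can work out where a document lives without consulting a central lookup table.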
Elasticsearch has been purposely built with the cloud in mind. This ranges from options like auto discovery of nodes when running on AWS EC2 to being an adaptive distributed system that automatically handles machines coming and going and adjusts to a dynamic environment.