22 August 2016

This Week in Elasticsearch and Apache Lucene - 2016-08-22

Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.

Top News

“Less Code, More Nodes, More Features”
Application Scaling with Elasticsearch @ StockTwits | Elastic - https://t.co/MZCrn4OMtF
— Kraut Klíck (@QIMP3G) August 12, 2016

Elasticsearch Core

Changes in 2.x:

It should be possible to update the include_in_all setting on existing object fields.
The geohash options on geo-point fields are deprecated, as is the optimize_bbox parameter to the geo-point distance query.
Jackson has been upgraded to v2.8.1.

Changes in master:

Failing to allocate a primary shard 5 times should prevent further automated allocation attempts.
The default min and max heap sizes are now set to 2GB, which means we can remove this from the bootstrap checks.
The minimum_master_nodes setting has also been removed from bootstrap checks as it only checked that it had been set, not that it had been set correctly.
Bootstrap check exceptions no longer print stack traces, which were just obscuring the message of the exception.
Index names may no longer start with + or -, as these special characters are used in index wildcard matching.
Index creation requests must use PUT, not POST, and the type-exists request has changed from HEAD index/type to HEAD index/_mapping/type.
Reindex should work with the transport client.
The snapshot-status API now supports ignore_unavailable.
String fields with index_options or position_increment_gap were not being upgraded to text fields.
Plugins should be able to upgrade custom cluster state metadata on startup.
The routing changes API makes it easier for a node to determine which shard allocation changes have taken place.
LockObtainFailedException has been renamed to ShardLockObtainFailedException because it is an in-memory lock that has nothing to do with IO.
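
To illustrate why a leading + or - is reserved in index names, here is a minimal Python sketch of how a multi-index expression such as +test*,-test3 could be resolved. The helper name resolve_indices and the use of fnmatch for wildcard matching are assumptions made for illustration only; this is not how Elasticsearch implements it.

```python
from fnmatch import fnmatchcase


def resolve_indices(expression, existing):
    """Resolve a comma-separated index expression against existing index names.

    Hypothetical helper: a part prefixed with '-' removes matching indices,
    a part prefixed with '+' (or with no prefix) adds matching indices.
    """
    selected = []
    for part in expression.split(","):
        part = part.strip()
        if not part:
            continue
        exclude = part.startswith("-")
        # Strip the leading marker, if any, to get the wildcard pattern.
        pattern = part[1:] if part[0] in "+-" else part
        matches = [name for name in existing if fnmatchcase(name, pattern)]
        if exclude:
            selected = [name for name in selected if name not in matches]
        else:
            selected.extend(name for name in matches if name not in selected)
    return selected


indices = ["test1", "test2", "test3", "logs"]
print(resolve_indices("+test*,-test3", indices))  # ['test1', 'test2']
```

In a scheme like this, an index literally named -test3 could not be told apart from a request to exclude test3, which is the kind of ambiguity the naming restriction avoids.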

Ongoing:

Painless will be the new default script language in 5.0.
A big codebase cleanup is under way to reduce the number of packages that we have, and to remove the dependency on Guice.
SearchContext should use ref counting to prevent accessing an already closed index.
Response filtering will support exclusions like foo.*,-foo.bar.
Shards should only be marked as stale when there is a non-replicated write, not when the node shuts down.
The ingest node should be able to handle dots in field names.
A post-search hook will allow logging search requests once per request instead of once per shard.
Should only text and keyword fields be included in the _all field by default?
Setting stored_fields to _none_ would skip the stored-fields phase entirely, meaning that meta-fields such as _id, _type and _source would not be returned.
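
The response-filtering exclusions mentioned above (foo.*,-foo.bar) can be pictured with a small Python sketch. Everything here is an assumption for illustration: the function names, the use of fnmatch for wildcards, and the simplification that field names contain no dots. It is not the Elasticsearch implementation.

```python
from fnmatch import fnmatchcase


def flatten(node, prefix=""):
    """Yield (dotted_path, value) pairs for every leaf of a nested dict."""
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            yield from flatten(value, path)
    else:
        yield prefix, node


def matches(path, patterns):
    """True if the path, or any parent of it, matches one of the patterns."""
    parts = path.split(".")
    prefixes = [".".join(parts[:i]) for i in range(1, len(parts) + 1)]
    return any(fnmatchcase(p, pat) for p in prefixes for pat in patterns)


def filter_response(doc, filter_path):
    """Keep leaves matching the include patterns, minus the '-' exclusions."""
    parts = [p.strip() for p in filter_path.split(",") if p.strip()]
    includes = [p for p in parts if not p.startswith("-")]
    excludes = [p[1:] for p in parts if p.startswith("-")]
    result = {}
    for path, value in flatten(doc):
        if includes and not matches(path, includes):
            continue
        if matches(path, excludes):
            continue
        # Rebuild the nested structure for the leaves that survive.
        node = result
        keys = path.split(".")
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value
    return result


response = {"foo": {"bar": 1, "baz": 2}, "other": 3}
print(filter_response(response, "foo.*,-foo.bar"))  # {'foo': {'baz': 2}}
```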

Apache Lucene

The release process for Lucene 6.2.0 will begin shortly
The surprisingly massive drop in indexing performance (annotation AU), unexpectedly caused by an otherwise great change, turned out to be due to a pre-existing performance bug in Lucene that was uncovered only after much hunting
Lucene's legacy (postings based) numeric implementation has moved to the backwards-codecs module and will soon be removed entirely for 7.0
A new Lucene test case checks that you can close SearcherManager while it is refreshing, and open a new SearcherManager while IndexWriter is closing, all while searching, hopefully without risking SIGSEGV
Lucene now tries harder, in its best-effort check, to detect when MMapDirectory is used after being closed, since that can cause a SIGSEGV that terminates the JVM. Its stress test can still provoke a SIGSEGV, however, so the test has been disabled
IntRangeField, FloatRangeField and LongRangeField let you index a range and search by ranges overlapping the indexed ranges
Lucene tests had gotten too slow recently, especially TestBoolean2
We don't need an exemption in Lucene's tests security policy for loading the Wikipedia test documents
The flaky MoreLikeThisTest that keeps failing has finally been muzzled
Another tricky corner case geo3d test failure emerges
Stemming is tricky and it's hard to make changes without a formal analysis of the impact
If MultiPhraseQuery has only one clause, the classic highlighter will hit an IllegalArgumentException
BooleanQuery can optimize rewrite in a few cases
The APIs to track external data structures along with Lucene's LeafReaders are trappy
Nested span queries somehow broke between 4.10.x and today
Making delete-by-query work with doc-values queries is horribly complex and it may make more sense to remove doc-values queries instead, though some people disagree
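
As a rough picture of what the new range fields listed above make possible (index a range, then find indexed ranges that overlap a query range), here is a one-dimensional Python sketch. The IntRange class and search_overlapping function are hypothetical names for illustration; this shows only the overlap test on closed intervals, not Lucene's indexing structure or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntRange:
    """A closed integer interval [lo, hi] (hypothetical illustration type)."""
    lo: int
    hi: int

    def intersects(self, other):
        # Two closed intervals overlap when each starts before the other ends.
        return self.lo <= other.hi and other.lo <= self.hi


def search_overlapping(indexed, query):
    """Return the indexed ranges that overlap the query range (linear scan)."""
    return [r for r in indexed if r.intersects(query)]


indexed = [IntRange(1, 5), IntRange(10, 20), IntRange(4, 12)]
print(search_overlapping(indexed, IntRange(6, 9)))  # [IntRange(lo=4, hi=12)]
```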

Watch This Space

Stay tuned to this blog, where we'll share more on the whole Elastic ecosystem, including news, learning resources and cool use cases!
