Lucene, Solr, and Elasticsearch

Lucene, Solr, and Elasticsearch are all closely related technologies in the field of search and indexing, but they have different characteristics and use cases. Here’s a comparison:

Apache Lucene

Core Technology:
- Lucene is a high-performance, full-featured text search engine library written in Java. It’s the core foundation upon which Solr and Elasticsearch are built.
Use Case:
- Ideal for applications that require custom search functionality. It’s a library, so it requires significant development effort to implement as part of a larger application.
Features:
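- Provides advanced search capabilities such as full-text search, relevance ranking, and a rich query syntax. Parsing rich document formats (PDF, Word, and so on) is not part of Lucene core; that is usually delegated to a separate tool such as Apache Tika.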
- Very flexible but requires more programming to harness its full potential.
Scalability:
- On its own, Lucene does not provide distributed search, sharding, or replication; the application has to implement these.
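
Because Lucene leaves distribution to the application, a distributed layer built on it typically has to query each index partition and merge the partial results. The following is a minimal Python sketch of that merge step only. It assumes each partition returns its hits as (score, doc_id) pairs already sorted by descending score; the names `merge_top_k`, `hits_a`, and `hits_b` are hypothetical and none of this is Lucene API.

```python
import heapq
from itertools import islice

def merge_top_k(partition_hits, k):
    """Merge per-partition hit lists into one global top-k list.

    partition_hits: list of lists of (score, doc_id) pairs, each list
    sorted by descending score (an assumption of this sketch).
    """
    # heapq.merge lazily merges already-sorted inputs; reverse=True keeps
    # the descending order, and the key compares hits by score only.
    merged = heapq.merge(*partition_hits, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, k))

# Hypothetical results from two partitions of the same logical index.
hits_a = [(0.9, "a1"), (0.4, "a2")]
hits_b = [(0.7, "b1"), (0.5, "b2")]
top = merge_top_k([hits_a, hits_b], k=3)
print(top)
```

Scores from different partitions are only directly comparable when they are computed in a consistent way, which is one of the details a real distributed layer has to handle.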

Apache Solr

Based on Lucene:
- Solr is an open-source search platform built on top of Lucene. It extends Lucene and provides a search server with additional features.
Use Case:
- Suitable for enterprise-level search applications. Offers out-of-the-box search capabilities that are easier to implement compared to raw Lucene.
Features:
- Provides distributed search and indexing, replication, faceting, caching, a web admin interface, and more.
- Exposes a REST-like HTTP API, so clients can be written in any language.
Scalability:
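- Designed for high scalability and fault tolerance: in SolrCloud mode, collections are split into shards with replicas, and the cluster is coordinated through ZooKeeper.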
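
Since Solr is queried over HTTP, a search request is essentially a URL. The sketch below builds a request against Solr's `select` handler with an optional facet field, without sending it. The host `localhost:8983`, the collection name `products`, and the field names `title` and `category` are assumptions made for illustration, and `solr_select_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

def solr_select_url(base_url, collection, query, facet_field=None, rows=10):
    """Build a URL for Solr's /select handler (no request is sent)."""
    params = [("q", query), ("rows", rows), ("wt", "json")]
    if facet_field is not None:
        # facet=true switches faceting on; facet.field names the field to count.
        params += [("facet", "true"), ("facet.field", facet_field)]
    return f"{base_url}/solr/{collection}/select?{urlencode(params)}"

# Hypothetical collection and fields.
url = solr_select_url(
    "http://localhost:8983", "products", "title:laptop", facet_field="category"
)
print(url)
```

Any HTTP client in any language can issue the resulting request, which is what makes the API language agnostic.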

Elasticsearch

Based on Lucene:
- Like Solr, Elasticsearch is built on top of Lucene. It’s a distributed, RESTful search and analytics engine used for full-text search, log analytics, and similar workloads.
Use Case:
- Ideal for near-real-time, distributed search and analytics use cases. It is highly scalable and designed for cloud environments.
Features:
- Known for its ease of use, robustness, and good scalability. It supports complex search queries through a JSON-based query DSL, and newly indexed documents become searchable in near real time.
- Offers comprehensive REST APIs and a simple setup process.
Scalability:
- Highly scalable: indexes are split into shards that are distributed across nodes, and large clusters can manage petabytes of structured and unstructured data.
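
To give a feel for the REST API mentioned above, here is a small sketch that prepares a full-text search request: the path of the `_search` endpoint and a JSON body using a `match` query. Nothing is sent over the network. The index name `logs-2024.01.15` and the field name `message` are made up for illustration, and `build_search_request` is a hypothetical helper.

```python
import json

def build_search_request(index, field, text, size=10):
    """Return (path, json_body) for a match query against one index."""
    path = f"/{index}/_search"
    body = {
        "query": {"match": {field: text}},  # full-text match on a single field
        "size": size,                       # maximum number of hits to return
    }
    return path, json.dumps(body)

# Hypothetical index and field names.
path, body = build_search_request("logs-2024.01.15", "message", "timeout error", size=5)
print("POST", path)
print(body)
```

The same body can be sent with any HTTP client, for example as the payload of a POST request to the cluster's HTTP port.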

Summary

Lucene is the core library for full-text indexing and search technology but requires significant programming effort.
Solr is a scalable search server that offers a lot of out-of-the-box features and is more enterprise-oriented.
Elasticsearch is known for its easy setup, scalability, and suitability for search and analytics use cases, especially in near-real-time applications.

The choice among these three depends on specific project requirements, existing technical infrastructure, and the level of customization needed.

Elasticsearch has gained significant popularity for its ease of use and performance in various scenarios, including logging (the ELK stack), while Solr is often favored in traditional enterprise search environments.

Further reading

Elasticsearch

Elasticsearch from the bottom up, with some Lucene concepts: https://www.elastic.co/blog/found-elasticsearch-from-the-bottom-up
Elasticsearch from the “top”, as observed from a user’s perspective: https://www.elastic.co/blog/found-elasticsearch-top-down
- An Elasticsearch index is made up of shards. Each shard is a Lucene index, which in turn holds inverted indexes.
Optimizing Elasticsearch Searches: https://www.elastic.co/blog/found-optimizing-elasticsearch-searches/
Sizing Elasticsearch: https://www.elastic.co/blog/found-sizing-elasticsearch/
- For time-oriented data, such as logs, a common strategy is to partition data into indexes that each hold data for a certain time range.
- Another partitioning pattern is one index per user.
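
The time-range partitioning strategy can be sketched as a naming scheme: each day's data goes into its own index, and a query for a period only needs the indexes that cover it. The sketch below assumes daily indexes named `<prefix>-YYYY.MM.DD`; the prefix `logs` and the helper names are hypothetical.

```python
from datetime import date, timedelta

def index_for(prefix, day):
    """Name of the daily index that holds data for the given day."""
    return f"{prefix}-{day:%Y.%m.%d}"

def indexes_for_range(prefix, start, end):
    """Names of all daily indexes covering start..end (inclusive)."""
    days = (end - start).days
    return [index_for(prefix, start + timedelta(days=i)) for i in range(days + 1)]

names = indexes_for_range("logs", date(2024, 1, 30), date(2024, 2, 1))
print(names)  # ['logs-2024.01.30', 'logs-2024.01.31', 'logs-2024.02.01']
```

With this layout, expiring old data becomes a matter of deleting whole indexes rather than deleting individual documents.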

Solr

https://solr.apache.org/guide/7_5/how-solrcloud-works.html
https://www.tutorialspoint.com/apache_solr/apache_solr_architecture.htm

Lucene

https://alibaba-cloud.medium.com/analysis-of-lucene-basic-concepts-5ff5d8b90a53
- An index is composed of one or more sub-indexes. A sub-index is called a segment.
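
The relationship between an index, its segments, and the inverted index can be illustrated with a toy model. In the sketch below, each segment is a small inverted index mapping a term to the ids of the documents that contain it, and a search consults every segment and combines the results. This is a simplified illustration under those assumptions, not Lucene's actual data structures or file format; `build_segment` and `search` are hypothetical names.

```python
from collections import defaultdict

def build_segment(docs):
    """Build one segment: an inverted index of term -> sorted doc ids."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # naive whitespace tokenizer
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

def search(segments, term):
    """Look the term up in every segment and combine the matching doc ids."""
    hits = []
    for segment in segments:
        hits.extend(segment.get(term.lower(), []))
    return sorted(hits)

# An "index" made of two segments, each built from its own batch of documents.
segments = [
    build_segment({1: "Lucene is a search library", 2: "Solr builds on Lucene"}),
    build_segment({3: "Elasticsearch builds on Lucene too"}),
]
print(search(segments, "lucene"))  # [1, 2, 3]
print(search(segments, "builds"))  # [2, 3]
```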