Lucene, Solr, and Elasticsearch are all closely related technologies in the field of search and indexing, but they have different characteristics and use cases. Here’s a comparison:

Apache Lucene

  1. Core Technology:
    • Lucene is a high-performance, full-featured text search engine library written in Java. It’s the core foundation upon which Solr and Elasticsearch are built.
  2. Use Case:
    • Ideal for applications that require custom search functionality. It’s a library, so it requires significant development effort to implement as part of a larger application.
  3. Features:
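    • Provides full-text indexing and search, relevance ranking, and a rich set of query types (phrase, wildcard, fuzzy, range). Lucene indexes text fields; extracting text from formats such as PDF or Word is left to the application or to a separate tool such as Apache Tika.
    • Very flexible, but it takes more programming to harness its full potential.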
  4. Scalability:
    • Lucene itself has no built-in distributed search, sharding, or replication. Scaling beyond a single machine has to be implemented by the application.
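
To make concrete what a search library does at its core, here is a toy inverted index in Python. It is a conceptual sketch only: the names `TinyIndex`, `add`, and `search` are invented for illustration and are not Lucene’s API, the analysis step is a plain lowercase-and-split, and the scoring is a simple term-frequency sum rather than Lucene’s actual ranking.

```python
from collections import Counter, defaultdict


class TinyIndex:
    """Toy inverted index: maps each term to the documents containing it."""

    def __init__(self):
        # term -> {doc_id: number of occurrences in that document}
        self.postings = defaultdict(Counter)

    def add(self, doc_id, text):
        # Naive analysis step: lowercase and split on whitespace.
        for term in text.lower().split():
            self.postings[term][doc_id] += 1

    def search(self, query):
        # Score = sum of the term frequencies of the query terms.
        # This is an assumption chosen for brevity, not Lucene's formula.
        scores = Counter()
        for term in query.lower().split():
            for doc_id, freq in self.postings.get(term, {}).items():
                scores[doc_id] += freq
        # Highest score first; ties broken by doc_id for stable output.
        return sorted(scores, key=lambda d: (-scores[d], d))


index = TinyIndex()
index.add(1, "Lucene is a search library")
index.add(2, "Solr is a search server built on Lucene")
index.add(3, "search search search")
results = index.search("lucene search")
```

Everything around this structure (persistence, concurrency, distribution) is what an application built directly on a library has to provide itself.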

Apache Solr

  1. Based on Lucene:
    • Solr is an open-source search platform built on top of Lucene. It extends Lucene and provides a search server with additional features.
  2. Use Case:
    • Suitable for enterprise-level search applications. Offers out-of-the-box search capabilities that are easier to implement compared to raw Lucene.
  3. Features:
    • Provides distributed search and indexing, replication, faceting, caching, a web admin interface, and more.
    • Exposes a REST-like HTTP API, so it can be used from any programming language.
  4. Scalability:
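    • Designed for high scalability and fault tolerance: in SolrCloud mode, an index is split into shards that are replicated across nodes, with ZooKeeper coordinating the cluster.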
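
As an illustration of the HTTP API, the sketch below builds a query URL for Solr’s `/select` handler with one facet field. It only constructs the URL and sends nothing. The host, the collection name `products`, and the field names `title` and `brand` are assumptions made up for the example; `q`, `rows`, `wt`, `facet`, and `facet.field` are standard Solr query parameters.

```python
from urllib.parse import urlencode


def solr_select_url(base_url, collection, query, facet_field=None, rows=10):
    """Build the URL for a query against Solr's /select handler."""
    params = [("q", query), ("rows", rows), ("wt", "json")]
    if facet_field is not None:
        # Ask Solr to also return value counts for one field.
        params += [("facet", "true"), ("facet.field", facet_field)]
    return f"{base_url}/solr/{collection}/select?{urlencode(params)}"


# Hypothetical collection and field names, for illustration only.
url = solr_select_url(
    "http://localhost:8983", "products", "title:laptop", facet_field="brand"
)
```

Because the request is a plain HTTP GET, any language with an HTTP client can issue it.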

Elasticsearch

  1. Based on Lucene:
    • Like Solr, Elasticsearch is built on top of Lucene. It’s a distributed, RESTful search and analytics engine.
  2. Use Case:
    • Ideal for real-time, distributed search and analytics use cases. It is highly scalable and designed for cloud environments.
  3. Features:
    • Known for its ease of use, robustness, and good scalability. It supports complex search queries through a JSON-based query DSL, and newly indexed documents become searchable in near real time.
    • Offers comprehensive REST APIs and a simple setup process.
  4. Scalability:
    • Highly scalable: capacity grows by adding nodes, and large clusters can manage petabytes of structured and unstructured data.
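
The idea behind this kind of horizontal scaling can be sketched as follows: each document is routed to one shard by hashing its ID, and a search is sent to every shard and the results merged. This is a simplified model, not Elasticsearch code. The class `ShardedIndex` is hypothetical, CRC32 is a stand-in for the real routing hash, and all shards live in one process here, whereas a real cluster spreads them across nodes.

```python
import zlib
from collections import defaultdict


class ShardedIndex:
    """Toy model of an index split into several independent shards."""

    def __init__(self, num_shards):
        # Each shard has its own inverted index: term -> set of doc ids.
        self.shards = [defaultdict(set) for _ in range(num_shards)]

    def shard_for(self, doc_id):
        # Route by hash of the document ID modulo the shard count.
        # CRC32 is an assumption used here because it is deterministic.
        return zlib.crc32(doc_id.encode("utf-8")) % len(self.shards)

    def index(self, doc_id, text):
        shard = self.shards[self.shard_for(doc_id)]
        for term in text.lower().split():
            shard[term].add(doc_id)

    def search(self, term):
        # Scatter the query to every shard, then gather and merge the hits.
        hits = set()
        for shard in self.shards:
            hits |= shard.get(term.lower(), set())
        return sorted(hits)


cluster = ShardedIndex(num_shards=3)
cluster.index("doc-a", "distributed search engine")
cluster.index("doc-b", "search and analytics")
cluster.index("doc-c", "log analytics")
found = cluster.search("search")
```

Because routing depends on the shard count, changing the number of shards in this model would send existing IDs to different shards, which is why the shard count is fixed in the sketch.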

Summary

  • Lucene is the core library for full-text indexing and search, but it requires significant programming effort.
  • Solr is a scalable search server that offers a lot of out-of-the-box features and is more enterprise-oriented.
  • Elasticsearch is renowned for its easy setup, scalability, and suitability for search and analytics use cases, especially in real-time applications.

The choice among these three depends on specific project requirements, existing technical infrastructure, and the level of customization needed.

  • Elasticsearch has gained significant popularity for its ease of use and performance in various scenarios, including logging (the ELK stack: Elasticsearch, Logstash, Kibana).
  • Solr is often favored in traditional enterprise search environments.

Further reading

Elasticsearch

  • Elasticsearch from the bottom up, with some Lucene concepts https://www.elastic.co/blog/found-elasticsearch-from-the-bottom-up
  • Elasticsearch from the top down, as observed from a user’s perspective https://www.elastic.co/blog/found-elasticsearch-top-down
    • An Elasticsearch index is made up of shards, each of which is a Lucene index; those in turn contain inverted indexes.
  • Optimizing Elasticsearch Searches https://www.elastic.co/blog/found-optimizing-elasticsearch-searches/
  • Sizing Elasticsearch https://www.elastic.co/blog/found-sizing-elasticsearch/
    • For time-oriented data, such as logs, a common strategy is to partition data into indexes that each hold a certain time range.
    • Another partitioning option is one index per user.
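
The time-range partitioning mentioned above can be sketched as one index per day, with a query touching only the indexes that cover the requested dates. The helper names and the `prefix-YYYY.MM.DD` naming pattern are assumptions chosen for illustration, not something Elasticsearch requires.

```python
from datetime import date, timedelta


def daily_index_name(prefix, day):
    """Name of the index holding one day's data, e.g. logs-2024.01.30."""
    return f"{prefix}-{day:%Y.%m.%d}"


def indexes_for_range(prefix, start, end):
    """Names of all daily indexes covering start..end, both inclusive."""
    days = (end - start).days
    return [
        daily_index_name(prefix, start + timedelta(days=i))
        for i in range(days + 1)
    ]


# Hypothetical prefix "logs"; the range spans a month boundary.
names = indexes_for_range("logs", date(2024, 1, 30), date(2024, 2, 1))
```

With this layout, a search over recent data only needs the newest indexes, and old data can be removed one whole index at a time.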

Solr

  • How SolrCloud works https://solr.apache.org/guide/7_5/how-solrcloud-works.html
  • Apache Solr architecture https://www.tutorialspoint.com/apache_solr/apache_solr_architecture.htm

Lucene

  • Analysis of Lucene basic concepts https://alibaba-cloud.medium.com/analysis-of-lucene-basic-concepts-5ff5d8b90a53
    • An index is composed of one or more sub-indexes. A sub-index is called a segment.
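
The segment structure can be illustrated with a toy model in which every flushed batch of documents becomes its own small inverted index, a search consults all segments, and a merge folds them into one. The class `SegmentedIndex` and its methods are hypothetical names for this sketch and not Lucene’s API; deletions and on-disk storage are left out.

```python
class SegmentedIndex:
    """Toy index made of segments, each with its own inverted index."""

    def __init__(self):
        # Each segment maps term -> frozenset of doc ids and is never
        # modified after it has been written.
        self.segments = []

    def flush(self, docs):
        # Write a batch of (doc_id, text) pairs as one new segment.
        postings = {}
        for doc_id, text in docs:
            for term in text.lower().split():
                postings.setdefault(term, set()).add(doc_id)
        self.segments.append({t: frozenset(ids) for t, ids in postings.items()})

    def search(self, term):
        # A search has to consult every segment and combine the hits.
        hits = set()
        for segment in self.segments:
            hits |= segment.get(term.lower(), frozenset())
        return sorted(hits)

    def merge(self):
        # Combine all segments into a single new one.
        merged = {}
        for segment in self.segments:
            for term, ids in segment.items():
                merged[term] = merged.get(term, frozenset()) | ids
        self.segments = [merged]


idx = SegmentedIndex()
idx.flush([(1, "lucene index"), (2, "segment files")])
idx.flush([(3, "index segment")])
before = idx.search("index")
segments_before = len(idx.segments)
idx.merge()
after = idx.search("index")
```

Merging leaves the search results unchanged but reduces the number of segments each query has to visit.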