Spring AI and LangChain4j: Building Java-Powered RAG and Agentic LLM Applications
Related resources:
- AWS example: Build multi-agent systems with LangGraph and Amazon Bedrock
- RAG concepts workshop: https://github.com/aws-samples/rag-workshop-amazon-bedrock-knowledge-bases/tree/main/01-rag-concepts
- Example code ("Show Me the Code"): https://www.infoq.com/articles/spring-ai-1-0/
Introduction
The landscape of large language model (LLM) application development is evolving rapidly, and Java developers are increasingly in the game. Two frameworks stand out in the Java ecosystem: Spring AI and LangChain4j. In this article we explore how these tools enable retrieval-augmented generation (RAG), structured output, and tool/agent orchestration; compare their strengths; and walk through patterns for using them together (or choosing one) in an enterprise Java setting.
What is RAG (Retrieval-Augmented Generation)?
Before diving into the frameworks, a quick refresher. RAG is the pattern where you augment an LLM's prompt with context retrieved from external data (documents, vector stores, databases) so that the model generates more accurate, context-aware responses. In short:
- Retrieve relevant chunks (via vector search or other retrieval)
- Augment the user query with that context
- Generate a response using the LLM.
RAG helps address issues like the model's outdated training data or "hallucinations". Given this, frameworks that make RAG and related workflows easy are highly valuable in production. A minimal sketch of the pattern follows.
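To make the three steps concrete, here is a framework-agnostic sketch in plain Java. The `Retriever` and `Llm` interfaces are hypothetical placeholders, not Spring AI or LangChain4j types; they exist only to illustrate the retrieve, augment, generate loop.

```java
import java.util.List;

// Hypothetical interfaces, used only to illustrate the RAG steps;
// a real application would use a vector store client and an LLM SDK.
interface Retriever { List<String> topK(String query, int k); }
interface Llm { String complete(String prompt); }

class NaiveRag {
    private final Retriever retriever;
    private final Llm llm;

    NaiveRag(Retriever retriever, Llm llm) {
        this.retriever = retriever;
        this.llm = llm;
    }

    String answer(String question) {
        // 1. Retrieve relevant chunks
        List<String> chunks = retriever.topK(question, 5);
        // 2. Augment the user query with that context ("stuff the prompt")
        String prompt = "Answer using only this context:\n"
                + String.join("\n---\n", chunks)
                + "\n\nQuestion: " + question;
        // 3. Generate a response using the LLM
        return llm.complete(prompt);
    }
}
```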
Frameworks in focus
Spring AI
The Spring ecosystem has long been the backbone of Java enterprise applications. The new Spring AI project brings LLM integration to that world: embedding models, chat completion, vector stores, RAG flows, tool calls, structured output, observability, and more. Key features:
- Support for major model providers (OpenAI, Anthropic, AWS, local models, etc.) via a portable API.
- Vector store abstraction: the `VectorStore` interface, with similarity search and native client access.
- RAG flows via the Advisor API: out-of-the-box advisors such as `QuestionAnswerAdvisor` and `VectorStoreChatMemoryAdvisor`.
- Structured output conversion: mapping model responses into POJOs (plain old Java objects) for typed pipelines.
- Tool/function-calling support (the model can trigger external services) and Spring Boot starter integrations.
- All built in a familiar Spring style (beans, configuration, auto-configuration), so Java teams can pick it up quickly; a small sketch follows.
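To illustrate the "familiar Spring style" point, here is a minimal sketch assuming Spring AI 1.x with a model starter (for example, spring-ai-starter-model-openai) on the classpath, which auto-configures a `ChatClient.Builder` bean. The class name and system prompt are illustrative.

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.stereotype.Component;

@Component
class GreetingAssistant {

    private final ChatClient chatClient;

    // ChatClient.Builder is auto-configured by the Spring AI starter
    GreetingAssistant(ChatClient.Builder builder) {
        this.chatClient = builder
                .defaultSystem("You are a concise enterprise assistant.")
                .build();
    }

    String greet(String name) {
        return chatClient.prompt()
                .user("Write a one-line greeting for " + name)
                .call()
                .content();
    }
}
```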
LangChain4j
LangChain (in Python) has been a seminal framework in the LLM space for prompt chaining, tool/agent orchestration, retrieval, and memory. LangChain4j is the Java adaptation/port that brings similar capabilities to JVM teams. Highlights:
- A fluent API for building chains, agent orchestration, and tool invocation (see the sketch below).
- Good support for RAG workflows: vector store integration, retrieval plus generation, and memory.
- Integrations with Spring via the "langchain4j-spring" starter module.
- More flexibility for building multi-step agent workflows (e.g., deciding when and how to call tools or sub-agents) rather than just "index + retrieve + generate".
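As a taste of the fluent style, here is a minimal LangChain4j sketch assuming the langchain4j-open-ai module is on the classpath. The `Assistant` interface is our own; note that some class and builder names (for example, `ChatModel` vs. `ChatLanguageModel`) differ between LangChain4j versions.

```java
import dev.langchain4j.model.openai.OpenAiChatModel;
import dev.langchain4j.service.AiServices;

public class AssistantDemo {

    // Our own interface; LangChain4j generates the LLM-backed implementation.
    interface Assistant {
        String chat(String userMessage);
    }

    public static void main(String[] args) {
        var model = OpenAiChatModel.builder()
                .apiKey(System.getenv("OPENAI_API_KEY"))
                .modelName("gpt-4o-mini")
                .build();

        Assistant assistant = AiServices.create(Assistant.class, model);
        System.out.println(assistant.chat("Summarize RAG in one sentence."));
    }
}
```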
Key technical features
RAG with Spring AI
The "Retrieval-Augmented Generation" reference page of Spring AI outlines that the library supports modular architectures: you can build custom RAG flows or use prebuilt Advisor flows such as `QuestionAnswerAdvisor` and `VectorStoreChatMemoryAdvisor`.
From the concepts page: "The Spring AI library helps you implement solutions based on the 'stuffing the prompt' technique (i.e., RAG)".
Spring AI also provides the vector store abstractions (`VectorStore`, `similaritySearch`, etc.) so you can plug in Redis, PGVector, Milvus, and others.
Example flow (conceptual):
- Ingest your documents (PDFs, database extracts) into a vector store via Spring AI (see the ingestion sketch below).
- At runtime, when the user asks a question, use `VectorStore.similaritySearch(...)` to retrieve relevant documents.
- Use an Advisor (or a custom bean) to incorporate the retrieved documents and the user query into an LLM prompt, then call the model.
- Optionally, map the response into a POJO.
Spring Boot application support makes this fit well into enterprise deployments.
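For the ingestion step, here is a minimal sketch using Spring AI's `TextReader` and `TokenTextSplitter` (readers also exist for PDF, Tika, and others). The resource path is illustrative; the configured embedding model computes embeddings when documents are added to the store.

```java
import java.util.List;

import org.springframework.ai.document.Document;
import org.springframework.ai.reader.TextReader;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.core.io.DefaultResourceLoader;

public class IngestionSketch {

    // Read a document, split it into token-sized chunks, embed and store it
    public void ingest(VectorStore vectorStore) {
        var resource = new DefaultResourceLoader()
                .getResource("classpath:docs/handbook.txt"); // illustrative path
        List<Document> documents = new TextReader(resource).get();
        List<Document> chunks = new TokenTextSplitter().apply(documents);
        vectorStore.add(chunks);
    }
}
```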
Tool / Function calling & Structured output in Spring AI
From the docs: Spring AI supports tools/function calling, which "permits the model to request the execution of client-side tools and functions, thereby accessing necessary real-time information as required." There is also a Structured Output Converter API: you can ask the model to return JSON (or other structured formats) and map it into Java classes. This aligns with the structured output pattern documented for LangChain and LangGraph. Both features are sketched below.
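Here is a minimal sketch of both features together, assuming Spring AI 1.x with a `ChatClient` built from the auto-configured builder. The `WeatherTools` class, its stubbed method, and the `WeatherReport` record are illustrative, not part of Spring AI.

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.tool.annotation.Tool;

public class ToolAndStructuredOutputDemo {

    // Illustrative tool: Spring AI exposes @Tool-annotated methods to the
    // model, which can request their execution during the conversation.
    static class WeatherTools {
        @Tool(description = "Get the current temperature in Celsius for a city")
        double currentTemperature(String city) {
            return 21.5; // stubbed; a real tool would call a weather service
        }
    }

    // Illustrative target type for the structured output converter.
    record WeatherReport(String city, double temperatureCelsius, String summary) {}

    WeatherReport report(ChatClient chatClient, String city) {
        return chatClient.prompt()
                .user("Report the weather in " + city)
                .tools(new WeatherTools())     // register client-side tools
                .call()
                .entity(WeatherReport.class);  // map the model's JSON into a POJO
    }
}
```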
Structured Output & LangGraph (in context of LangChain4j)
In the Python/LangChain world you will often see the pattern of asking the LLM for structured output (JSON, YAML) that is then parsed into your domain classes. With LangChain4j, this pattern carries over: you build chains/pipelines that call the LLM, parse the structured output, and continue (see the sketch below). Combined with Spring AI's structured output support, you can build robust pipelines where the output is strongly typed in your Java application.
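In LangChain4j, the same idea can be expressed by giving an AiServices interface a POJO return type: the framework asks the model for JSON and deserializes it. A minimal sketch, assuming a recent LangChain4j version; `Person` and `PersonExtractor` are illustrative names, and record support depends on the version.

```java
import dev.langchain4j.model.openai.OpenAiChatModel;
import dev.langchain4j.service.AiServices;
import dev.langchain4j.service.UserMessage;

public class StructuredOutputDemo {

    // Illustrative domain type the model's JSON reply is mapped into.
    record Person(String name, int age, String city) {}

    // A POJO return type makes LangChain4j request and parse structured output.
    interface PersonExtractor {
        @UserMessage("Extract the person described in: {{it}}")
        Person extract(String text);
    }

    public static void main(String[] args) {
        var model = OpenAiChatModel.builder()
                .apiKey(System.getenv("OPENAI_API_KEY"))
                .modelName("gpt-4o-mini")
                .build();

        PersonExtractor extractor = AiServices.create(PersonExtractor.class, model);
        System.out.println(extractor.extract("Ana, 34, lives in Porto."));
    }
}
```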
Agentic & Tooling Patterns: Inline Agents
The AWS Bedrock Recipes entry "Create Dynamic Tooling Inline Agents" is relevant here: the idea is that the LLM selects or invokes tools/functions dynamically (e.g., "fetch customer info", "call database", "run calculation") rather than just generating text. Both Spring AI and LangChain4j support tool invocation and agent patterns:
- In Spring AI you can register tools/services and let the LLM call them; these integrate with Spring beans or REST services.
- In LangChain4j you have more explicit "Agent" abstractions and chaining of sub-tasks.
Hence, if your application needs dynamic invocation of real-world services (e.g., querying a microservice, updating a database, fetching real-time data), the agent/tool pattern is important; the sketch below shows the LangChain4j side.
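Here is a minimal LangChain4j tool-invocation sketch. The `HrTools` class and its stubbed leave-balance method are illustrative stand-ins for real Spring beans or REST clients, and builder method names (`chatModel` vs. `chatLanguageModel`) differ between LangChain4j versions.

```java
import dev.langchain4j.agent.tool.Tool;
import dev.langchain4j.model.openai.OpenAiChatModel;
import dev.langchain4j.service.AiServices;

public class AgentToolDemo {

    // Illustrative tool object; in an enterprise app this could wrap a
    // Spring service bean or a REST client.
    static class HrTools {
        @Tool("Returns the remaining leave balance in days for an employee id")
        int leaveBalance(String employeeId) {
            return 12; // stubbed lookup
        }
    }

    interface HrAssistant {
        String chat(String userMessage);
    }

    public static void main(String[] args) {
        var model = OpenAiChatModel.builder()
                .apiKey(System.getenv("OPENAI_API_KEY"))
                .modelName("gpt-4o-mini")
                .build();

        // The model can decide to call leaveBalance(...) while answering.
        HrAssistant assistant = AiServices.builder(HrAssistant.class)
                .chatModel(model)
                .tools(new HrTools())
                .build();

        System.out.println(assistant.chat("How many leave days does employee 12345 have left?"));
    }
}
```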
Sample architecture / how you might wire them
Here’s a rough architecture for a Java Spring Boot application combining both frameworks (or picking one) for a knowledge-driven chatbot or virtual assistant.
Scenario: “Enterprise Documentation Q&A + Agent Tools”
Suppose you work in an enterprise that wants a chat interface where employees can ask questions about internal documentation and then possibly call internal services (HR leave balance, expense check, etc.).
Flow:
- Ingestion: Use Spring AI's document readers and vector store abstraction to ingest PDFs, Confluence pages, and database exports into your vector store.
- Vector store retrieval: At runtime, when a user asks something, use Spring AI's `VectorStore.similaritySearch(...)` to pick the top N context documents.
- LLM prompt + tooling: Use LangChain4j to build a chain:
  - Step 1: Retrieve documents (via Spring AI).
  - Step 2: Create an LLM prompt combining the user query and the fetched context.
  - Step 3: Agent/tool invocation: if the LLM decides it needs to call a service ("fetch employee leave balance"), call a Spring service bean via tool invocation.
  - Step 4: Structured output: ask the model to return structured JSON, e.g. `{ "answer": "...", "action": { "type": "fetchLeaveBalance", "employeeId": "12345" } }`.
  - Step 5: Parse the JSON into POJOs (via Spring AI's structured output converter), execute the action if needed, then return the final answer.
- Return result: Send the response back to the user (chat UI, REST API).
- Observability/telemetry: Use Spring Boot metrics and log LLM calls, vector search latencies, etc.
Code snippet (very simplified)
// Ingestion + retrieval/generation (Spring AI, simplified)
import java.util.List;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

@Service
public class SimpleRag {

    private final VectorStore vectorStore;
    private final ChatClient chatClient;

    // ChatClient.Builder is auto-configured by the Spring AI starters
    public SimpleRag(VectorStore vectorStore, ChatClient.Builder builder) {
        this.vectorStore = vectorStore;
        this.chatClient = builder.build();
    }

    // Ingestion
    public void loadDocs(List<Document> docs) {
        vectorStore.add(docs);
    }

    // Retrieval + generation
    public String answerQuestion(String userQuery) {
        List<Document> context = vectorStore.similaritySearch(userQuery);
        // Stuff the retrieved context plus the question into one prompt
        StringBuilder prompt = new StringBuilder("Context:\n");
        context.forEach(doc -> prompt.append(doc.getText()).append("\n---\n"));
        prompt.append("\nQuestion: ").append(userQuery);
        return chatClient.prompt().user(prompt.toString()).call().content();
    }
}
// src/main/java/com/acme/ai/rag/RagService.java
package com.acme.ai.rag;

import org.springframework.ai.chat.client.ChatClient;
// In Spring AI 1.x, QuestionAnswerAdvisor ships in the
// spring-ai-advisors-vector-store module
import org.springframework.ai.chat.client.advisor.vectorstore.QuestionAnswerAdvisor;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

@Service
public class RagService {

    private final ChatClient chatClient;
    private final VectorStore vectorStore;

    // Assumes a ChatClient bean (e.g. built from the auto-configured
    // ChatClient.Builder) is available for injection
    public RagService(ChatClient chatClient, VectorStore vectorStore) {
        this.chatClient = chatClient;
        this.vectorStore = vectorStore;
    }

    public String answer(String question) {
        // Retrieve up to 6 chunks above a 0.70 similarity threshold and let
        // the advisor stuff them into the prompt
        var qaAdvisor = QuestionAnswerAdvisor.builder(vectorStore)
                .searchRequest(SearchRequest.builder().similarityThreshold(0.70).topK(6).build())
                .build();

        return chatClient.prompt()
                .advisors(qaAdvisor)
                .user(question)
                .call()
                .content();
    }
}
References
- Spring AI reference, Retrieval-Augmented Generation: https://docs.spring.io/spring-ai/reference/api/retrieval-augmented-generation.html
- Spring AI reference, Tools (function calling): https://docs.spring.io/spring-ai/reference/api/tools.html
- Spring AI reference, Structured Output Converter: https://docs.spring.io/spring-ai/reference/api/structured-output-converter.html
- Structured output with LangChain/LangGraph: [How to return structured data from a model 🦜️🔗 LangChain](https://python.langchain.com/docs/how_to/structured_output/)
- Create Dynamic Tooling Inline Agents, Amazon Bedrock Recipes