Introduction to Spring AI
An application framework for AI engineering. Apply Spring's portable and modular design principles to the AI domain with support for OpenAI, Azure, Ollama, and more.
The AI landscape has exploded with powerful models from OpenAI, Google, Anthropic, and others. But for Java developers, integrating these models has meant dealing with inconsistent APIs, manual HTTP calls, and the constant fear of vendor lock-in. Spring AI changes everything. It brings the same productivity and abstraction that Spring brought to enterprise Java development—now for artificial intelligence.
Just as Spring Data lets you switch between databases with minimal code changes, Spring AI's portable abstractions let you move between AI providers seamlessly. Start development with local Ollama models for free, switch to OpenAI for testing, and deploy to Azure OpenAI for enterprise compliance—all without rewriting your application logic.
Why Spring AI?
Python has dominated AI development, but enterprise applications are built in Java. Spring AI bridges this gap by providing production-grade AI integration with the reliability and patterns Java developers already know. You get proper dependency injection, transaction management, security integration, and testability—not just raw API calls.
The framework handles the complexity: automatic retries with exponential backoff when models are overloaded, streaming responses for real-time user feedback, structured output parsing to map AI responses to your domain objects, and conversation memory for stateful chatbots. These patterns took years to develop in the Python ecosystem—Spring AI gives them to you out of the box.
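For example, the automatic retry behavior is tunable through configuration. A minimal sketch, assuming the spring.ai.retry.* property names used by recent Spring AI releases (verify against your version):

# Retry tuning (illustrative values, not defaults)
spring.ai.retry.max-attempts=5
spring.ai.retry.backoff.initial-interval=2s
spring.ai.retry.backoff.multiplier=2
spring.ai.retry.backoff.max-interval=30s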
Core Philosophy
Portable API
Write once, run anywhere. Switch between AI providers (OpenAI, Azure, Bedrock, Ollama) with minimal code changes. Your business logic stays unchanged—only configuration differs.
Modular Design
Components for Models, Prompts, Output Parsers, and RAG are designed as loosely coupled modules. Mix and match providers—use OpenAI for chat but Ollama for embeddings.
POJO Support
Map AI outputs directly to your domain objects using sophisticated Output Parsers and JSON mode. No more string parsing—get type-safe POJOs from LLM responses.
Key Concepts
Models
Interfaces that abstract interaction with AI models including Chat, Image, Audio, and Embedding across providers.
Prompts
Encapsulates creation of inputs for AI models including PromptTemplate for parameter substitution.
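A minimal sketch of parameter substitution; it assumes a chatClient is already configured as shown in Basic Usage below:

import java.util.Map;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.chat.prompt.PromptTemplate;

// Placeholders in braces are filled from the map when the Prompt is created
PromptTemplate template = new PromptTemplate(
        "Summarize the return policy for {category} in a {tone} tone.");
Prompt prompt = template.create(Map.of("category", "electronics", "tone", "friendly"));
// chatClient is the configured ChatClient from the example further down
String answer = chatClient.prompt(prompt).call().content();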
RAG
Retrieval Augmented Generation — load documents, compute embeddings, store in Vector Databases.
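A sketch of the core flow, using VectorStore's simple similaritySearch(String) overload (richer SearchRequest options exist, and their builder API has shifted between versions):

import java.util.List;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.VectorStore;

// vectorStore is auto-configured from whichever store starter is on the classpath
void indexAndQuery(VectorStore vectorStore) {
    // Embeddings are computed by the configured EmbeddingModel during add()
    vectorStore.add(List.of(new Document("Our warranty covers parts for two years.")));
    List<Document> related = vectorStore.similaritySearch("How long is the warranty?");
}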
Advisors
Interceptors that modify requests/responses—add memory, inject RAG context, log interactions, or implement custom logic.
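As a concrete example, SimpleLoggerAdvisor ships with the framework and can be attached when the client is built; memory and RAG advisors hook in the same way, though their constructors have varied across versions:

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.SimpleLoggerAdvisor;

// builder is the injected ChatClient.Builder from the Basic Usage example below
ChatClient client = builder
        .defaultAdvisors(new SimpleLoggerAdvisor())  // logs each request and response
        .build();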
Function Calling
Let the LLM invoke your Java methods—query databases, call APIs, or perform calculations based on user intent.
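The exact API has evolved (earlier milestones registered @Bean Functions; recent releases annotate plain methods). A sketch assuming the @Tool style, with a hypothetical OrderTools helper:

import org.springframework.ai.tool.annotation.Tool;

// Hypothetical helper exposing a domain method as a tool
class OrderTools {
    @Tool(description = "Look up the status of an order by its id")
    String getOrderStatus(String orderId) {
        return "Order " + orderId + " has shipped";  // a real version would query your DB
    }
}

// The model decides when to invoke the tool based on the user's intent
String reply = chatClient.prompt()
        .user("Where is order 42?")
        .tools(new OrderTools())
        .call()
        .content();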
Streaming
Receive tokens as they're generated for real-time UX. Built on Project Reactor's Flux for reactive applications.
Understanding ChatClient
The ChatClient is the heart of Spring AI. It provides a fluent API for interacting with language models, inspired by Spring's RestClient and WebClient patterns. The builder pattern lets you configure defaults once—system prompts, model parameters, advisors—and reuse them across your application.
Unlike raw API calls, ChatClient handles the full lifecycle: constructing properly formatted messages, managing conversation context, parsing responses, and handling errors gracefully. It's the difference between writing boilerplate HTTP code and using Spring's abstractions—you focus on business logic, not infrastructure.
Quick Setup
Add Spring AI OpenAI dependency and configure your API key
<!-- Maven Dependency -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
</dependency>
# Application Properties
spring.ai.openai.api-key=${OPENAI_API_KEY}
spring.ai.openai.chat.options.model=gpt-4
spring.ai.openai.chat.options.temperature=0.7
Basic Usage
Using ChatClient
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.stereotype.Service;
import reactor.core.publisher.Flux;

@Service
public class AiService {

    private final ChatClient chatClient;

    public AiService(ChatClient.Builder builder) {
        this.chatClient = builder
                .defaultSystem("You are a helpful assistant for our e-commerce platform.")
                .build();
    }

    public String chat(String userMessage) {
        return chatClient.prompt()
                .user(userMessage)
                .call()
                .content();
    }

    public Flux<String> streamChat(String userMessage) {
        return chatClient.prompt()
                .user(userMessage)
                .stream()
                .content();
    }

    // Structured output - map to POJOs
    public Product extractProduct(String description) {
        return chatClient.prompt()
                .user("Extract product info from: " + description)
                .call()
                .entity(Product.class);
    }
}

Notice the .entity(Product.class) method—this is Spring AI's magic. It uses JSON mode to ensure the LLM returns valid JSON matching your POJO structure, then deserializes it automatically. No more regex parsing of AI responses. Your domain objects become the contract.
Supported Providers
Chat Models
- OpenAI — GPT-4, GPT-4o, GPT-3.5-Turbo
- Azure OpenAI — Enterprise deployment with compliance
- Anthropic — Claude 3 Opus, Sonnet, Haiku
- Google Vertex AI — Gemini Pro, Gemini Ultra
- Ollama — Local LLMs (Llama 3, Mistral, Phi-3)
- Amazon Bedrock — Claude, Titan, Command R+
Vector Stores
- Redis — With vector search module
- PostgreSQL/PGVector — Native vectors in Postgres
- Pinecone — Managed, serverless vector DB
- Weaviate — GraphQL-native vector search
- Chroma — Lightweight for development
- Oracle 23ai — Enterprise vector search
Provider portability in action: Develop locally with Ollama (free, no API keys), test with OpenAI (best quality), deploy to Azure OpenAI (enterprise compliance). Same code, different application.properties.
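Assuming spring-ai-ollama-spring-boot-starter is on the classpath for local work, the only change from the Quick Setup above is the property prefix:

# Local development profile: Ollama, no API key required
spring.ai.ollama.base-url=http://localhost:11434
spring.ai.ollama.chat.options.model=llama3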
Best Practices & Tips
💰 Cost Management
- Use maxTokens to limit response length and cost (see the snippet after this list).
- Prefer GPT-3.5-Turbo or Claude Haiku for simple tasks (10x cheaper).
- Implement semantic caching to avoid repetitive calls.
- Use streaming to fail fast on bad responses.
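The token cap is a one-line property (shown for OpenAI; other providers expose the same option under their own prefix):

# Limit completion length to control spend
spring.ai.openai.chat.options.max-tokens=256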
🛡️ Security
- Never expose API keys in frontend code—use backend proxies.
- Implement content moderation for user inputs.
- Rate limit per user to prevent abuse and cost overruns.
- Sanitize PII before sending to external models.
⚡ Performance
- Use async/streaming for better user experience.
- Cache embeddings—they're deterministic for the same input.
- Batch similar requests when possible.
- Monitor latency and set appropriate timeouts.
🧪 Testing
- Mock ChatClient in unit tests—don't call real APIs (see the sketch after this list).
- Use Ollama or local models for integration tests.
- Test edge cases: empty inputs, very long inputs, timeouts.
- Verify structured output parsing with varied responses.
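For the mocking tip above, one lightweight approach is Mockito deep stubs over the fluent ChatClient chain. A sketch, not the only way to do it:

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.mockito.Mockito.*;

import org.junit.jupiter.api.Test;
import org.springframework.ai.chat.client.ChatClient;

class AiServiceTest {

    @Test
    void chatReturnsStubbedContent() {
        // Deep stubs let us stub the whole fluent chain in one statement
        ChatClient chatClient = mock(ChatClient.class, RETURNS_DEEP_STUBS);
        when(chatClient.prompt().user(anyString()).call().content())
                .thenReturn("stubbed reply");

        // Wire chatClient into the service under test, then assert on its output
        assertEquals("stubbed reply",
                chatClient.prompt().user("Where is order 42?").call().content());
    }
}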