Introduction to Spring AI

    An application framework for AI engineering. Apply Spring's portable and modular design principles to the AI domain with support for OpenAI, Azure, Ollama, and more.

    The AI landscape has exploded with powerful models from OpenAI, Google, Anthropic, and others. But for Java developers, integrating these models has meant dealing with inconsistent APIs, manual HTTP calls, and the constant fear of vendor lock-in. Spring AI changes everything. It brings the same productivity and abstraction that Spring brought to enterprise Java development—now for artificial intelligence.

    Just as Spring Data lets you switch between databases with minimal code changes, Spring AI's portable abstractions let you move between AI providers seamlessly. Start development with local Ollama models for free, switch to OpenAI for testing, and deploy to Azure OpenAI for enterprise compliance—all without rewriting your application logic.

    Why Spring AI?

    Python has dominated AI development, but enterprise applications are built in Java. Spring AI bridges this gap by providing production-grade AI integration with the reliability and patterns Java developers already know. You get proper dependency injection, transaction management, security integration, and testability—not just raw API calls.

    The framework handles the complexity: automatic retries with exponential backoff when models are overloaded, streaming responses for real-time user feedback, structured output parsing to map AI responses to your domain objects, and conversation memory for stateful chatbots. These patterns took years to develop in the Python ecosystem—Spring AI gives them to you out of the box.

    Core Philosophy

    Portable API

    Write once, run anywhere. Switch between AI providers (OpenAI, Azure, Bedrock, Ollama) with minimal code changes. Your business logic stays unchanged—only configuration differs.

    Modular Design

    Components for Models, Prompts, Output Parsers, and RAG are designed as loosely coupled modules. Mix and match providers—use OpenAI for chat but Ollama for embeddings.

    POJO Support

    Map AI outputs directly to your domain objects using sophisticated Output Parsers and JSON mode. No more string parsing—get type-safe POJOs from LLM responses.

    Key Concepts

    Models

    Interfaces that abstract interaction with AI models (Chat, Image, Audio, and Embedding) across providers.

    Prompts

    Encapsulate the creation of inputs to AI models, including PromptTemplate for parameter substitution.
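    As a quick sketch of parameter substitution (the template text and variable names here are illustrative, and assume Spring AI's PromptTemplate with its default {placeholder} syntax):

```java
import java.util.Map;

import org.springframework.ai.chat.prompt.PromptTemplate;

public class PromptDemo {

    // Build a prompt by filling named placeholders from a map.
    public static String buildPrompt(String product, String tone) {
        PromptTemplate template = new PromptTemplate(
                "Summarize the return policy for {product} in a {tone} tone.");
        return template.render(Map.of("product", product, "tone", tone));
    }

    public static void main(String[] args) {
        System.out.println(buildPrompt("wireless headphones", "friendly"));
    }
}
```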

    RAG

    Retrieval Augmented Generation — load documents, compute embeddings, store them in a Vector Database, and retrieve the most relevant chunks at query time to ground the model's answers.
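    The retrieval half of that pipeline can be sketched against Spring AI's VectorStore abstraction. This assumes the 1.0-style SearchRequest builder API (earlier milestones used a different fluent form), and the document text is illustrative:

```java
import java.util.List;

import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;

public class RagSketch {

    // Index a document, then fetch the chunks most similar to the query.
    public static List<Document> retrieve(VectorStore store, String query) {
        store.add(List.of(new Document("Our return window is 30 days.")));
        return store.similaritySearch(
                SearchRequest.builder().query(query).topK(3).build());
    }
}
```

    The retrieved documents are then stuffed into the prompt as context, which is exactly what the QuestionAnswerAdvisor automates.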

    Advisors

    Interceptors that modify requests/responses—add memory, inject RAG context, log interactions, or implement custom logic.
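    For example, conversation memory can be attached as an advisor when the client is built. Class names have shifted across Spring AI milestones; this sketch assumes the 1.0 API (MessageChatMemoryAdvisor, MessageWindowChatMemory):

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.MessageChatMemoryAdvisor;
import org.springframework.ai.chat.memory.ChatMemory;
import org.springframework.ai.chat.memory.MessageWindowChatMemory;

public class MemoryConfig {

    // Wire a sliding-window memory into every call via a default advisor.
    public static ChatClient withMemory(ChatClient.Builder builder) {
        ChatMemory memory = MessageWindowChatMemory.builder()
                .maxMessages(20)   // keep only the last 20 messages per conversation
                .build();
        return builder
                .defaultAdvisors(MessageChatMemoryAdvisor.builder(memory).build())
                .build();
    }
}
```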

    Function Calling

    Let the LLM invoke your Java methods—query databases, call APIs, or perform calculations based on user intent.
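    A sketch using the 1.0-era @Tool annotation (the OrderTools class and its method are hypothetical): the description tells the model when the tool is relevant, and the framework invokes the method when the model asks for it.

```java
import org.springframework.ai.tool.annotation.Tool;

public class OrderTools {

    // The description guides the model's decision to call this method.
    @Tool(description = "Get the shipping status for an order by its id")
    public String getOrderStatus(String orderId) {
        // A real service would query the order database here.
        return "Order " + orderId + " has shipped.";
    }
}

// Usage sketch: register the tools on a prompt, then let the model decide.
// String answer = chatClient.prompt()
//         .user("Where is order 42?")
//         .tools(new OrderTools())
//         .call()
//         .content();
```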

    Streaming

    Receive tokens as they're generated for real-time UX. Built on Project Reactor's Flux for reactive applications.

    Understanding ChatClient

    The ChatClient is the heart of Spring AI. It provides a fluent API for interacting with language models, inspired by Spring's RestClient and WebClient patterns. The builder pattern lets you configure defaults once—system prompts, model parameters, advisors—and reuse them across your application.

    Unlike raw API calls, ChatClient handles the full lifecycle: constructing properly formatted messages, managing conversation context, parsing responses, and handling errors gracefully. It's the difference between writing boilerplate HTTP code and using Spring's abstractions—you focus on business logic, not infrastructure.

    Quick Setup

    Add the Spring AI OpenAI starter dependency and configure your API key.

    Configuration
    <!-- Maven Dependency -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
    </dependency>
    # Application Properties
    spring.ai.openai.api-key=${OPENAI_API_KEY}
    spring.ai.openai.chat.options.model=gpt-4
    spring.ai.openai.chat.options.temperature=0.7

    Basic Usage

    Using ChatClient

    AiService.java
    import org.springframework.ai.chat.client.ChatClient;
    import org.springframework.stereotype.Service;
    import reactor.core.publisher.Flux;

    @Service
    public class AiService {

        private final ChatClient chatClient;

        public AiService(ChatClient.Builder builder) {
            this.chatClient = builder
                    .defaultSystem("You are a helpful assistant for our e-commerce platform.")
                    .build();
        }

        public String chat(String userMessage) {
            return chatClient.prompt()
                    .user(userMessage)
                    .call()
                    .content();
        }

        public Flux<String> streamChat(String userMessage) {
            return chatClient.prompt()
                    .user(userMessage)
                    .stream()
                    .content();
        }

        // Structured output - map to POJOs
        public Product extractProduct(String description) {
            return chatClient.prompt()
                    .user("Extract product info from: " + description)
                    .call()
                    .entity(Product.class);
        }
    }

    Notice the .entity(Product.class) method—this is Spring AI's magic. It instructs the model to return JSON matching your POJO's structure (using the provider's JSON mode where available), then deserializes the response automatically. No more regex parsing of AI responses. Your domain objects become the contract.
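    The Product type referenced above can be a plain record; the field names here are illustrative:

```java
// A plain record is enough; Spring AI maps the model's JSON onto it.
public record Product(String name, String category, double price) {}
```

    Collections work too: .entity(new ParameterizedTypeReference<List<Product>>() {}) deserializes a JSON array of products.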

    Supported Providers

    Chat Models

    • OpenAI — GPT-4, GPT-4o, GPT-3.5-Turbo
    • Azure OpenAI — Enterprise deployment with compliance
    • Anthropic — Claude 3 Opus, Sonnet, Haiku
    • Google Vertex AI — Gemini Pro, Gemini Ultra
    • Ollama — Local LLMs (Llama 3, Mistral, Phi-3)
    • Amazon Bedrock — Claude, Titan, Command R+

    Vector Stores

    • Redis — With vector search module
    • PostgreSQL/PGVector — Native vectors in Postgres
    • Pinecone — Managed, serverless vector DB
    • Weaviate — GraphQL-native vector search
    • Chroma — Lightweight for development
    • Oracle 23ai — Enterprise vector search

    Provider portability in action: Develop locally with Ollama (free, no API keys), test with OpenAI (best quality), deploy to Azure OpenAI (enterprise compliance). Same code, different application.properties.

    Best Practices & Tips

    💰 Cost Management

    • Use maxTokens to limit response length and cost.
    • Prefer GPT-3.5-Turbo or Claude Haiku for simple tasks (10x cheaper).
    • Implement semantic caching to avoid repetitive calls.
    • Use streaming to fail fast on bad responses.

    🛡️ Security

    • Never expose API keys in frontend code—use backend proxies.
    • Implement content moderation for user inputs.
    • Rate limit per user to prevent abuse and cost overruns.
    • Sanitize PII before sending to external models.

    ⚡ Performance

    • Use async/streaming for better user experience.
    • Cache embeddings—they're deterministic for the same input.
    • Batch similar requests when possible.
    • Monitor latency and set appropriate timeouts.

    🧪 Testing

    • Mock ChatClient in unit tests—don't call real APIs.
    • Use Ollama or local models for integration tests.
    • Test edge cases: empty inputs, very long inputs, timeouts.
    • Verify structured output parsing with varied responses.
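    A minimal sketch of the first tip, assuming Mockito is available: deep stubs let you stub the whole fluent ChatClient chain in one statement, so unit tests never hit a real model.

```java
import static org.mockito.Mockito.*;

import org.springframework.ai.chat.client.ChatClient;

public class AiServiceTestSketch {

    public static void main(String[] args) {
        // RETURNS_DEEP_STUBS mocks every intermediate step of the fluent chain.
        ChatClient chatClient = mock(ChatClient.class, RETURNS_DEEP_STUBS);
        when(chatClient.prompt().user(anyString()).call().content())
                .thenReturn("stubbed reply");

        String result = chatClient.prompt().user("hello").call().content();
        assert result.equals("stubbed reply");
    }
}
```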

    Ready to Build with Spring AI?

    Explore our tutorials to build intelligent chatbots, implement RAG, add function calling, and create production-ready AI applications with Spring Boot.