Spring Commerce: Building a Resilient E-commerce System with Spring Boot Microservices

Hola 👋,

I recently completed a comprehensive microservices project using Spring Boot 3, implementing various modern patterns and technologies. Here’s an overview of what I built and the key learnings from this journey.

The project code is available on GitHub.

Project Overview

I developed a simple e-commerce application with a microservices architecture. The system allows customers to browse products, place orders, and receive notifications. Instead of building a monolithic application, I split functionality into specialized services that communicate with each other.

The project implements several crucial microservice architectural patterns:

  • Service Discovery
  • API Gateway
  • Circuit Breaker
  • Event-Driven Architecture
  • Observability

Project Architecture

The architecture consists of multiple services, an API Gateway, and supporting components like Kafka for messaging and Keycloak for security. Services communicate both synchronously (via REST) and asynchronously (via Kafka).

Core Services

Product Service

  • Acts as a product catalog
  • Built with Spring Boot, MongoDB
  • Provides REST APIs to create and view products
  • Implements CRUD operations for product management
  • Technical implementation:
@Document(value = "product")
public class Product {    
    @Id    
    private String id;    
    private String name;    
    private String description;    
    private BigDecimal price;
}
  • Uses records for DTOs (Java 16+ feature) for clean, immutable data transfer:
public record ProductRequest(String name, String description, BigDecimal price) {}
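
To illustrate why records suit DTOs, here is a minimal plain-Java sketch. The name `ProductRequestSketch` and the validation in the compact constructor are assumptions added for illustration; the project's `ProductRequest` shown above has no validation.

```java
import java.math.BigDecimal;
import java.util.Objects;

// Hypothetical DTO: same components as ProductRequest, plus illustrative validation.
record ProductRequestSketch(String name, String description, BigDecimal price) {
    // Compact constructor: runs before the fields are assigned.
    ProductRequestSketch {
        Objects.requireNonNull(name, "name must not be null");
        Objects.requireNonNull(price, "price must not be null");
        if (price.signum() < 0) {
            throw new IllegalArgumentException("price must not be negative");
        }
    }
}

class RecordDtoDemo {
    public static void main(String[] args) {
        var a = new ProductRequestSketch("iPhone 15", "Smartphone", new BigDecimal("999.00"));
        var b = new ProductRequestSketch("iPhone 15", "Smartphone", new BigDecimal("999.00"));
        // Records get value-based equals/hashCode/toString for free.
        System.out.println(a.equals(b));
        System.out.println(a);
    }
}
```

Because records have no setters, a request object cannot be modified after it has been created, which is the immutability the bullet above refers to.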

Order Service

  • Handles customer orders
  • Uses MySQL via Spring Data JPA with Flyway for database migrations
  • Communicates with Inventory Service to check product availability via REST
  • Sends events to Notification Service via Kafka
  • Implements Resilience4J for circuit breaking and retry mechanisms
  • Technical implementation includes transaction management:
@Service
@RequiredArgsConstructor
@Transactional
public class OrderService {    
    private final OrderRepository orderRepository;    
    private final InventoryClient inventoryClient;    
    private final KafkaTemplate<String, OrderPlacedEvent> kafkaTemplate;        
    public void placeOrder(OrderRequest orderRequest) {        
        // Check inventory        
        // Create and save order        
        // Send event via Kafka 
    }
}
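
The three commented steps can be sketched in plain Java as follows. This is not the project's code: `InventoryCheck`, `OrderStore`, `EventPublisher`, and `OrderSketch` are hypothetical stand-ins for the Spring-managed `InventoryClient`, `OrderRepository`, and `KafkaTemplate`, and the exception type is an assumption.

```java
import java.util.UUID;

// Hypothetical stand-ins for the injected collaborators.
interface InventoryCheck { boolean isInStock(String skuCode, int quantity); }
interface OrderStore { void save(OrderSketch order); }
interface EventPublisher { void publish(String topic, String orderNumber); }

record OrderSketch(String orderNumber, String skuCode, int quantity) {}

class OrderFlowSketch {
    private final InventoryCheck inventory;
    private final OrderStore store;
    private final EventPublisher publisher;

    OrderFlowSketch(InventoryCheck inventory, OrderStore store, EventPublisher publisher) {
        this.inventory = inventory;
        this.store = store;
        this.publisher = publisher;
    }

    OrderSketch placeOrder(String skuCode, int quantity) {
        // 1. Check inventory (a synchronous REST call in the real service).
        if (!inventory.isInStock(skuCode, quantity)) {
            throw new IllegalStateException("Product with SKU " + skuCode + " is not in stock");
        }
        // 2. Create and save the order.
        OrderSketch order = new OrderSketch(UUID.randomUUID().toString(), skuCode, quantity);
        store.save(order);
        // 3. Publish the event (an asynchronous Kafka message in the real service).
        publisher.publish("order-placed", order.orderNumber());
        return order;
    }
}
```

Nothing is saved or published when the stock check fails, so the Notification Service only ever hears about orders that were actually stored.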

Inventory Service

  • Manages product stock information
  • Provides REST API to check if products are in stock
  • Uses MySQL database with JPA
  • Uses Flyway for database schema migrations
  • Implementation example:
@Transactional(readOnly = true)
public boolean isInStock(String skuCode, Integer quantity) {    
    return inventoryRepository.existsBySkuCodeAndQuantityIsGreaterThanEqual(skuCode, quantity);
}
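
The derived query name reads as "a row exists with this SKU code and a quantity greater than or equal to the requested amount". A minimal in-memory sketch of that predicate, using the hypothetical types `InventoryRow` and `InStockSketch` in place of the JPA entity and repository:

```java
import java.util.List;

// Hypothetical stand-in for the inventory table row.
record InventoryRow(String skuCode, int quantity) {}

class InStockSketch {
    private final List<InventoryRow> rows;

    InStockSketch(List<InventoryRow> rows) {
        this.rows = List.copyOf(rows);
    }

    // Same condition as the derived query: skuCode matches AND quantity >= requested.
    boolean isInStock(String skuCode, int quantity) {
        return rows.stream()
                .anyMatch(row -> row.skuCode().equals(skuCode) && row.quantity() >= quantity);
    }
}
```

An unknown SKU simply yields `false`, the same as a known SKU with insufficient stock.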

Notification Service

  • Sends notification emails to customers
  • Listens to Kafka events from Order Service
  • Uses Spring Kafka and Avro for message serialization
  • Implementation with Kafka listener:
@KafkaListener(topics = "order-placed")
public void listen(OrderPlacedEvent orderPlacedEvent) {   
    // Send email notification
}

Key Patterns & Technologies

API Gateway (Spring Cloud Gateway MVC)

I implemented an API Gateway that serves as the entry point for all client requests. This simplified client interactions and provided a centralized point for cross-cutting concerns like security and routing.

The routing configuration uses the functional programming model in Spring:

@Configuration
public class Routes {
    @Bean
    public RouterFunction<ServerResponse> productServiceRoute() {
        return GatewayRouterFunctions.route("product_service")
                .route(RequestPredicates.path("/api/product"), 
                    HandlerFunctions.http("http://localhost:8080"))
                .filter(circuitBreaker("productServiceCircuitBreaker", "/fallbackRoute"))
                .build();
    }

    // More routes for other services...
}

Security with Keycloak

The API Gateway integrates with Keycloak for authentication and authorization, providing OAuth2 security for all services behind it. I configured the API Gateway as an OAuth2 Resource Server.

Security configuration in the API Gateway:

@Configuration
public class SecurityConfig {
    @Bean
    public SecurityFilterChain securityFilterChain(HttpSecurity httpSecurity) throws Exception {
        return httpSecurity.authorizeHttpRequests(authorize -> authorize
                .anyRequest().authenticated())
                .oauth2ResourceServer(oauth2 -> oauth2.jwt(Customizer.withDefaults()))
                .build();
    }
}

Application properties configuration:

spring.security.oauth2.resourceserver.jwt.issuer-uri=http://localhost:8181/realms/springcommerce

Circuit Breaker Pattern (Resilience4J)

To prevent cascading failures between services, I implemented circuit breakers using Resilience4J. This helps maintain system stability when individual services fail.

A Resilience4J circuit breaker moves between three main states:

  • Closed: Normal operation, requests flow through
  • Open: The failure threshold has been reached, so requests are rejected without calling the service
  • Half-Open: Testing recovery by allowing a limited number of requests

Configuration in application.properties:

# Circuit Breaker configuration
resilience4j.circuitbreaker.configs.default.registerHealthIndicator=true
resilience4j.circuitbreaker.configs.default.slidingWindowType=COUNT_BASED
resilience4j.circuitbreaker.configs.default.slidingWindowSize=10
resilience4j.circuitbreaker.configs.default.failureRateThreshold=50
resilience4j.circuitbreaker.configs.default.waitDurationInOpenState=5s
resilience4j.circuitbreaker.configs.default.permittedNumberOfCallsInHalfOpenState=3
resilience4j.circuitbreaker.configs.default.automaticTransitionFromOpenToHalfOpenEnabled=true

# Timeout configuration
resilience4j.timelimiter.configs.default.timeout-duration=3s

# Retry configuration
resilience4j.retry.configs.default.max-attempts=3
resilience4j.retry.configs.default.wait-duration=2s

With this configuration, if at least 50% of the last 10 calls fail, the circuit opens for 5 seconds and then transitions to half-open, where it allows 3 test calls.
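
To make the state transitions concrete, here is a simplified, single-threaded sketch of a count-based circuit breaker in plain Java. It is not Resilience4J's implementation: the class `CircuitBreakerSketch`, its constructor parameters, and the injected clock are assumptions for illustration, and the move from open to half-open is evaluated lazily when the state is next read.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Simplified illustration only; not thread-safe and not Resilience4J's code.
class CircuitBreakerSketch {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int windowSize;
    private final double failureRateThreshold; // percent
    private final long waitMillis;
    private final int halfOpenCalls;
    private final LongSupplier clock;          // current time in milliseconds
    private final Deque<Boolean> window = new ArrayDeque<>(); // true = failed call
    private State state = State.CLOSED;
    private long openedAt;
    private int halfOpenAttempts;
    private int halfOpenFailures;

    CircuitBreakerSketch(int windowSize, double failureRateThreshold, long waitMillis,
                         int halfOpenCalls, LongSupplier clock) {
        this.windowSize = windowSize;
        this.failureRateThreshold = failureRateThreshold;
        this.waitMillis = waitMillis;
        this.halfOpenCalls = halfOpenCalls;
        this.clock = clock;
    }

    // Returns the current state, moving OPEN -> HALF_OPEN once the wait has elapsed.
    State state() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= waitMillis) {
            state = State.HALF_OPEN;
            halfOpenAttempts = 0;
            halfOpenFailures = 0;
        }
        return state;
    }

    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state() == State.OPEN) {
            return fallback.get(); // rejected without calling the service
        }
        try {
            T result = action.get();
            recordOutcome(false);
            return result;
        } catch (RuntimeException e) {
            recordOutcome(true);
            return fallback.get();
        }
    }

    private void recordOutcome(boolean failed) {
        if (state == State.HALF_OPEN) {
            halfOpenAttempts++;
            if (failed) halfOpenFailures++;
            if (halfOpenAttempts == halfOpenCalls) {
                if (100.0 * halfOpenFailures / halfOpenCalls >= failureRateThreshold) {
                    open();
                } else {
                    state = State.CLOSED;
                    window.clear();
                }
            }
            return;
        }
        window.addLast(failed);
        if (window.size() > windowSize) window.removeFirst();
        if (window.size() == windowSize) {
            long failures = window.stream().filter(f -> f).count();
            if (100.0 * failures / windowSize >= failureRateThreshold) open();
        }
    }

    private void open() {
        state = State.OPEN;
        openedAt = clock.getAsLong();
        window.clear();
    }
}
```

Constructed as `new CircuitBreakerSketch(10, 50.0, 5000, 3, System::currentTimeMillis)`, the sketch mirrors the property values above: a window of 10 calls, a 50% threshold, a 5 second wait, and 3 half-open test calls.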

Asynchronous Communication (Kafka)

For order notifications, I set up async communication between Order Service and Notification Service using Kafka, implementing a true event-driven architecture.

I used Apache Avro for schema definition and Confluent Schema Registry for schema management:

{
  "type": "record",
  "name": "OrderPlacedEvent",
  "namespace": "com.springcommerce.order_service.event",
  "fields": [
    { "name": "orderNumber", "type": "string" },
    { "name": "email", "type": "string" },
    { "name": "firstName", "type": "string" },
    { "name": "lastName", "type": "string" }
  ]
}

Producing events in the Order Service:

// Inside OrderService.java
OrderPlacedEvent orderPlacedEvent = new OrderPlacedEvent(
    order.getOrderNumber(),
    orderRequest.userDetails().email(),
    orderRequest.userDetails().firstName(),
    orderRequest.userDetails().lastName()
);
kafkaTemplate.send("order-placed", orderPlacedEvent);

Consuming events in the Notification Service:

@Service
@RequiredArgsConstructor
public class NotificationService {
    private final JavaMailSender mailSender;

    @KafkaListener(topics = "order-placed")
    public void listen(OrderPlacedEvent event) {
        // Send email notification logic
    }
}

Observability with Grafana Stack

I integrated the complete Grafana stack to implement comprehensive observability across all services:

  • Grafana: Visualization dashboard for all observability data
  • Prometheus: Time-series database for collecting and querying metrics
  • Loki: Log aggregation system (similar to Elasticsearch)
  • Tempo: Distributed tracing backend

For logging, I configured Loki integration using logback:


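<configuration>
    <appender name="LOKI" class="com.github.loki4j.logback.Loki4jAppender">
        <http>
            <url>http://localhost:3100/loki/api/v1/push</url>
        </http>
        <format>
            <label>
                <pattern>application=${appName},host=${HOSTNAME},level=%level</pattern>
            </label>
            <message>
                <pattern>${FILE_LOG_PATTERN}</pattern>
            </message>
        </format>
    </appender>

    <root level="INFO">
        <appender-ref ref="LOKI"/>
    </root>
</configuration>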

For metrics, I used Spring Boot Actuator with Micrometer Prometheus registry:

management.endpoints.web.exposure.include=health, info, metrics, prometheus
management.metrics.distribution.percentiles-histogram.http.server.requests=true
management.observations.key-values.application=order-service

For tracing, I implemented distributed tracing with Micrometer Tracing:

@Observed
public class OrderRepository {
    // Repository methods
}

This setup provides a complete picture of system behavior, making debugging and performance analysis much easier.

Testing Strategy

Integration Testing with TestContainers

Rather than mocking databases, I used TestContainers to spin up actual database instances (MongoDB, MySQL) during tests. This provides a more realistic testing environment.

TestContainers implementation for Product Service:

@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT)
class ProductServiceApplicationTests {
    @ServiceConnection
    static MongoDBContainer mongoDBContainer = new MongoDBContainer("mongo:7.0.5");

    @LocalServerPort
    private Integer port;

    @BeforeEach
    void setup() {
        RestAssured.baseURI = "http://localhost";
        RestAssured.port = port;
    }

    static {
        mongoDBContainer.start();
    }

    @Test
    void shouldCreateProduct() {
        // Test code using RestAssured
    }
}

WireMock for Service Simulation

For testing the Order Service without depending on the actual Inventory Service, I used WireMock to simulate the Inventory Service’s responses, making tests more reliable and independent.

Setting up WireMock for testing:

@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT)
@AutoConfigureWireMock(port = 0)  // Dynamic port allocation
class OrderServiceApplicationTests {
    @Test
    void shouldSubmitOrder() {
        // Stub the inventory service call
        stubFor(get(urlEqualTo("/api/inventory?skuCode=iphone_15&quantity=1"))
            .willReturn(aResponse()
                .withStatus(200)
                .withHeader("Content-Type", "application/json")
                .withBody("true")));

        // Test order submission
    }
}

Test properties:

inventory.url=http://localhost:${wiremock.server.port}

This approach ensures that our Order Service tests aren’t affected by the availability or behavior of the actual Inventory Service.
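
The idea behind the stub (a throwaway HTTP server on a dynamically assigned port that returns a canned answer) can be sketched with only the JDK. This is not WireMock: `InventoryStubSketch` and its methods are hypothetical, and the stub answers every request under `/api/inventory` with `true` regardless of the query parameters.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class InventoryStubSketch {
    // Starts a stub on an OS-assigned free port (port 0), like @AutoConfigureWireMock(port = 0).
    static HttpServer startStub() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/api/inventory", exchange -> {
                byte[] body = "true".getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders().add("Content-Type", "application/json");
                exchange.sendResponseHeaders(200, body.length);
                try (var out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Plays the role of the Order Service's inventory client in a test.
    static String checkStock(int port, String skuCode, int quantity) {
        HttpRequest request = HttpRequest.newBuilder(URI.create(
                "http://127.0.0.1:" + port + "/api/inventory?skuCode=" + skuCode
                        + "&quantity=" + quantity)).GET().build();
        try {
            return HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString())
                    .body();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while calling the stub", e);
        }
    }
}
```

The port is read back from the running server with `server.getAddress().getPort()`, which is the same role `${wiremock.server.port}` plays in the test properties.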

Development Challenges & Solutions

Challenge 1: Implementing Retry Mechanisms in Order Service

When the Inventory Service was temporarily unavailable, the Order Service would fail completely. I needed a way to make it more resilient.

Solution:

  • Implemented Resilience4J’s retry mechanism in the Order Service
  • Configured the retry parameters (up to 3 attempts, with a wait between attempts)
  • Added fallback mechanisms for when retries were exhausted
  • Used Circuit Breaker to prevent overwhelming the system with retries

Implementation in the Inventory Client:

@CircuitBreaker(name = "inventory", fallbackMethod = "fallbackMethod")
@Retry(name = "inventory")
boolean isInStock(@RequestParam String skuCode, @RequestParam Integer quantity);

default boolean fallbackMethod(String code, Integer quantity, Throwable exception) {
    log.info("Cannot get inventory for skucode {}, failure reason: {}", 
             code, exception.getMessage());
    return false;
}

Configuration for the retry mechanism:

resilience4j.retry.instances.inventory.max-attempts=3
resilience4j.retry.instances.inventory.wait-duration=5s

The trickiest part was finding the right balance – too aggressive retrying would flood the system, while too cautious an approach would affect user experience. Testing different configurations under various failure scenarios was essential to find the optimal setup.
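
The retry-then-fallback behavior can be sketched in a few lines of plain Java. This is a simplified illustration, not Resilience4J's implementation: `RetrySketch.callWithRetry` is a hypothetical helper, and it uses a fixed wait between attempts, matching the `wait-duration` property shown above.

```java
import java.util.function.Supplier;

class RetrySketch {
    // Tries the action up to maxAttempts times, waiting waitMillis between attempts.
    // Returns the fallback value once every attempt has failed.
    static <T> T callWithRetry(Supplier<T> action, int maxAttempts, long waitMillis, T fallback) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                if (attempt == maxAttempts) {
                    break; // attempts exhausted, fall through to the fallback
                }
                try {
                    Thread.sleep(waitMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return fallback; // stop retrying if the thread is interrupted
                }
            }
        }
        return fallback;
    }
}
```

With `maxAttempts = 3` and `waitMillis = 5000`, a call that keeps failing spends about 10 seconds waiting before the fallback is returned, which is the user-experience cost the paragraph above weighs against flooding the system.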

Challenge 2: Setting Up Distributed Tracing

Tracking requests as they traveled between services was difficult. When issues occurred, I couldn’t easily see which service was causing the problem.

Solution:

  • Integrated Spring Boot’s tracing capabilities using Micrometer
  • Set up Grafana Tempo to collect and visualize traces
  • Added trace IDs to logs for correlation
  • Ensured proper propagation of trace context between services

Dependencies added to implement tracing:


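<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-tracing-bridge-brave</artifactId>
</dependency>
<dependency>
    <groupId>io.zipkin.reporter2</groupId>
    <artifactId>zipkin-reporter-brave</artifactId>
</dependency>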

Configuration for tracing:

@Configuration
public class ObservationConfig {
    @Bean
    ObservedAspect observedAspect(ObservationRegistry registry) {
        return new ObservedAspect(registry);
    }
}

Property configuration:

management.tracing.sampling.probability=1.0

Tempo configuration in docker-compose.yml:

tempo:
  image: grafana/tempo:2.2.2
  command: ['-config.file=/etc/tempo.yaml']
  volumes:
    - ./docker/tempo/tempo.yml:/etc/tempo.yaml:ro
    - ./docker/tempo/tempo-data:/tmp/tempo
  ports:
    - '3110:3100'  # Tempo
    - '9411:9411'  # zipkin

This gave me an end-to-end view of request flows, making it much easier to debug issues across service boundaries. The most challenging aspect was ensuring that trace context properly propagated across all services, especially when using different communication methods (REST vs Kafka).
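
Conceptually, propagation means the caller writes its trace context into message headers and the receiver continues the same trace with a new span. The sketch below shows that idea in plain Java, assuming the W3C Trace Context `traceparent` header format; the class `TraceContextSketch` and its methods are hypothetical, and in the project the tracing library does this work, with the header format depending on its configuration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

class TraceContextSketch {
    // traceId: 32 hex characters shared by the whole request; spanId: 16 hex characters per hop.
    record SpanContext(String traceId, String spanId) {
        String toTraceparent() {
            return "00-" + traceId + "-" + spanId + "-01"; // version-traceId-spanId-flags
        }

        static SpanContext parse(String header) {
            String[] parts = header.split("-");
            if (parts.length != 4 || parts[1].length() != 32 || parts[2].length() != 16) {
                throw new IllegalArgumentException("Malformed traceparent: " + header);
            }
            return new SpanContext(parts[1], parts[2]);
        }

        // Same trace, new span: this is what links the hops together.
        SpanContext newChild() {
            return new SpanContext(traceId, randomHex(16));
        }
    }

    static SpanContext newRoot() {
        return new SpanContext(randomHex(32), randomHex(16));
    }

    static String randomHex(int length) {
        StringBuilder hex = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            hex.append(Character.forDigit(ThreadLocalRandom.current().nextInt(16), 16));
        }
        return hex.toString();
    }

    // Sender side: the same map could back HTTP headers or Kafka record headers.
    static Map<String, String> inject(SpanContext context) {
        Map<String, String> headers = new HashMap<>();
        headers.put("traceparent", context.toTraceparent());
        return headers;
    }

    // Receiver side: continue the caller's trace with a fresh span.
    static SpanContext extract(Map<String, String> headers) {
        return SpanContext.parse(headers.get("traceparent")).newChild();
    }
}
```

Because the trace ID survives every hop while each service gets its own span ID, a tracing backend can stitch the REST call and the Kafka message into one end-to-end trace.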

Key Learnings

This project reinforced that successful microservices architectures need more than just splitting services – they require careful implementation of patterns for resilience, observability, and communication.

Key technical insights gained:

  1. Design for Failure: In distributed systems, failures are inevitable. Implementing circuit breakers, retries, and fallbacks is essential.
  2. Observability is Not Optional: Without proper logging, metrics, and tracing, debugging distributed systems becomes nearly impossible.
  3. Data Consistency Challenges: Managing data consistency across services requires careful design. Use eventual consistency where appropriate.
  4. Test Infrastructure Matters: Using tools like TestContainers and WireMock resulted in more reliable tests that better reflect production behavior.
  5. Security Requires Planning: Implementing OAuth2 with Keycloak showed the importance of designing security from the beginning.

Happy Springing! 🚀

Built with ❤️ by Rahul