Distributed Caching in Microservices with Redis

Modern microservices architectures are designed for scalability, resilience, and rapid deployment. However, as systems grow, repeated database calls and inter-service communication can become performance bottlenecks. This is where distributed caching becomes critical.

In this article, we’ll explore:

What distributed caching is
Why microservices need it
Distributed cache design principles
Using Redis as a distributed cache
Common caching patterns and challenges

What Is Distributed Caching?

A distributed cache is a caching system shared across multiple application instances or microservices. Instead of each service maintaining its own local cache, all services interact with a centralized or clustered cache layer.

Unlike in-memory local caches, distributed caches provide:

Shared access across services
Horizontal scalability
High availability
Consistent cached data

In microservices, distributed caching helps reduce latency, lower database load, and improve system responsiveness.

Why Microservices Need Distributed Caching

Microservices typically involve:

Multiple services communicating over the network
Independent databases
High read traffic
Frequent API calls

Local, in-memory caches are private to each service instance. Once a service is scaled horizontally, the instances' caches drift apart, so two requests for the same data can return different results depending on which instance handles them. A shared, external, in-memory data store such as Redis or Memcached gives every instance the same cached view of the data, cuts repeated inter-service calls, and shields each service's database from excess read traffic. The database remains the source of truth; the cache holds a fast, expiring copy.

Key Reasons Microservices Require Distributed Caching

Data Consistency Across Instances: Horizontally scaled microservices run inside multiple isolated containers or pods. Local caching creates "islands" of stale data, whereas a distributed cache provides a shared state so all service instances read identical, updated values.
Mitigating Inter-Service Latency: Fulfilling a single user request through an API Gateway often requires synchronous, chained network calls across multiple downstream services. Caching frequently requested responses lets a service answer from the cache instead of repeating those downstream calls.
Alleviating Database Bottlenecks: In a microservice design, each service typically owns its specific database. Read-heavy traffic spikes can quickly saturate these isolated databases; a distributed cache absorbs the majority of these read queries.
Stateless Scaling and Session Sharing: To scale fluidly, microservice instances should hold no per-user state of their own. Moving session state, shopping carts, or security tokens into a distributed cache lets any instance handle any user's request, including after a container crash.
Independent Scalability: The caching layer can be scaled horizontally by adding nodes to its cluster, independently of the microservices compute layer.

Distributed Cache Design Principles

A distributed cache stores data across a cluster of servers (nodes) so that it can scale horizontally, pool the RAM of many machines, and serve reads with very low latency (often under a millisecond within the same data center). Designing an effective distributed cache requires managing data routing, synchronization, and eviction across an inherently unreliable network.

The core architectural design principles of a distributed cache include:

Data Partitioning and Routing
To scale memory capacity horizontally beyond a single server, data must be partitioned (sharded) across multiple cache nodes.
- Consistent Hashing: Simple modulo hashing (hash(key) mod N) remaps almost all keys when a node is added or removed. Many distributed caches instead use consistent hashing, which maps both keys and nodes onto a circular hash ring; each key belongs to the first node found clockwise from its position. When the cluster size changes, only about 1/N of the keys move. Redis Cluster uses a related scheme: keys are assigned to 16,384 fixed hash slots, and slots are moved between nodes.
- Virtual Nodes: To prevent hotspots and uneven data distribution, each physical node is placed at many virtual positions on the ring. This evens out the key distribution, and a larger server can be given more virtual positions than a smaller one.
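The ring described above can be sketched in a few lines of C#. This is a minimal illustration, not how any particular cache implements it: the HashRing type, the node names, the choice of MD5 as the hash, and the default of 100 virtual nodes are all assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Demo: count how many keys change owner when a fourth node joins.
var ring = new HashRing(new[] { "cache-a", "cache-b", "cache-c" });
var keys = Enumerable.Range(0, 10_000).Select(i => $"product:{i}").ToList();
var before = keys.ToDictionary(k => k, k => ring.GetNode(k));

ring.AddNode("cache-d");
int moved = keys.Count(k => ring.GetNode(k) != before[k]);
Console.WriteLine($"{moved} of {keys.Count} keys moved after adding a node");

// Hypothetical illustration type; not part of Redis or any client library.
public sealed class HashRing
{
    private readonly SortedDictionary<uint, string> _ring = new();
    private readonly int _virtualNodes;
    private uint[] _positions = Array.Empty<uint>();

    public HashRing(IEnumerable<string> nodes, int virtualNodes = 100)
    {
        _virtualNodes = virtualNodes;
        foreach (var node in nodes) AddNode(node);
    }

    public void AddNode(string node)
    {
        // Each physical node occupies many positions on the ring.
        for (int i = 0; i < _virtualNodes; i++)
            _ring[Hash($"{node}#{i}")] = node;
        _positions = _ring.Keys.ToArray(); // already sorted ascending
    }

    public string GetNode(string key)
    {
        if (_positions.Length == 0)
            throw new InvalidOperationException("Ring is empty.");
        // Find the first ring position at or after the key's hash,
        // wrapping around to the start of the ring if there is none.
        int index = Array.BinarySearch(_positions, Hash(key));
        if (index < 0) index = ~index;
        if (index == _positions.Length) index = 0;
        return _ring[_positions[index]];
    }

    private static uint Hash(string value)
    {
        // MD5 is used only for its even distribution, not for security.
        byte[] digest = MD5.HashData(Encoding.UTF8.GetBytes(value));
        return BitConverter.ToUInt32(digest, 0);
    }
}
```

With four nodes, the count printed should be in the region of a quarter of the keys, since only keys that now fall to the new node change owner. With modulo hashing, going from three to four nodes would move roughly three quarters of them.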
High Availability and Replication
Since memory is volatile and network hardware fails, caches must introduce fault tolerance without severely impacting low-latency guarantees.
- Master-Replica Sharding: Each shard consists of a primary (master) node that handles writes and one or more replicas that can serve reads and take over if the primary fails.
- Replication Trade-offs:
  - Synchronous replication acknowledges a write only after the replicas have confirmed it, providing stronger consistency at the expense of higher write latency.
  - Asynchronous replication acknowledges the write as soon as the master is updated and copies data to the replicas in the background. This keeps writes fast, but a write acknowledged just before a failover can be lost. Redis replicates asynchronously by default.
Cache Access Strategies
The system must determine how the application layer coordinates data synchronization between the cache and the primary database.
- Cache-Aside (Lazy Loading): The application checks the cache first. On a cache miss, it pulls from the database and manually populates the cache. This avoids wasting memory on unused data.
- Write-Through: The application writes to the cache layer, which synchronously updates the underlying database before confirming success. This keeps the cache and the database consistent but slows down writes.
- Write-Back (Write-Behind): The application updates the cache and receives an immediate success confirmation. A background service asynchronously flushes batched changes to the database later. This achieves maximum write throughput but carries a risk of data loss during unexpected node crashes.
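The difference between write-through and write-behind is easiest to see side by side. The sketch below uses in-memory dictionaries in place of both Redis and the database; FakeDatabase, WriteThroughCache, and WriteBehindCache are hypothetical names invented for this illustration, and a real write-behind cache would flush from a background worker rather than on demand.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

var db = new FakeDatabase();

var writeThrough = new WriteThroughCache(db);
await writeThrough.SetAsync("user:1", "Ada");
Console.WriteLine(db.Contains("user:1")); // True: persisted before SetAsync returned

var writeBehind = new WriteBehindCache(db);
writeBehind.Set("user:2", "Grace");
Console.WriteLine(db.Contains("user:2")); // False: only the cache has it so far
await writeBehind.FlushAsync();
Console.WriteLine(db.Contains("user:2")); // True: flushed to the database

// Stand-in for the service's real database.
public sealed class FakeDatabase
{
    private readonly ConcurrentDictionary<string, string> _rows = new();
    public Task SaveAsync(string key, string value)
    {
        _rows[key] = value;
        return Task.CompletedTask;
    }
    public bool Contains(string key) => _rows.ContainsKey(key);
}

public sealed class WriteThroughCache
{
    private readonly ConcurrentDictionary<string, string> _cache = new();
    private readonly FakeDatabase _db;
    public WriteThroughCache(FakeDatabase db) => _db = db;

    public async Task SetAsync(string key, string value)
    {
        await _db.SaveAsync(key, value); // the caller waits for the database
        _cache[key] = value;
    }
}

public sealed class WriteBehindCache
{
    private readonly ConcurrentDictionary<string, string> _cache = new();
    private readonly ConcurrentQueue<string> _dirtyKeys = new();
    private readonly FakeDatabase _db;
    public WriteBehindCache(FakeDatabase db) => _db = db;

    public void Set(string key, string value)
    {
        _cache[key] = value;     // acknowledged immediately
        _dirtyKeys.Enqueue(key); // remembered for a later flush
    }

    // A real system would call this from a background worker on a timer.
    public async Task FlushAsync()
    {
        while (_dirtyKeys.TryDequeue(out var key))
            if (_cache.TryGetValue(key, out var value))
                await _db.SaveAsync(key, value);
    }
}
```

The window between Set and FlushAsync is exactly where write-behind can lose data: if the process holding the dirty keys crashes first, the database never sees the write.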
Memory Management and Eviction
RAM is expensive and finite; a distributed cache must automatically prune data when it runs out of memory.
- Time-to-Live (TTL): Every key should have a defined expiration timestamp to ensure data does not stay stale indefinitely.
- Eviction Policies: When the cache reaches its memory limit, an eviction policy decides which keys to remove (in Redis, this is the maxmemory-policy setting):
  - Least Recently Used (LRU): Evicts keys that haven't been accessed for the longest period.
  - Least Frequently Used (LFU): Tracks hit counters and evicts keys with the lowest access frequency.
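To make the LRU policy concrete, here is a minimal in-process sketch built from a dictionary and a linked list. LruCache is a hypothetical type written for this article; it shows exact LRU, whereas Redis itself uses an approximation based on sampling keys.

```csharp
using System;
using System.Collections.Generic;

var cache = new LruCache<string, string>(capacity: 2);
cache.Set("a", "1");
cache.Set("b", "2");
cache.TryGet("a", out _);                    // "a" becomes most recently used
cache.Set("c", "3");                         // cache is full, so "b" is evicted
Console.WriteLine(cache.TryGet("b", out _)); // False
Console.WriteLine(cache.TryGet("a", out _)); // True

// Hypothetical illustration type.
public sealed class LruCache<TKey, TValue> where TKey : notnull
{
    private readonly int _capacity;
    private readonly Dictionary<TKey, LinkedListNode<(TKey Key, TValue Value)>> _map = new();
    // Front of the list = most recently used, back = least recently used.
    private readonly LinkedList<(TKey Key, TValue Value)> _order = new();

    public LruCache(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        _capacity = capacity;
    }

    public bool TryGet(TKey key, out TValue? value)
    {
        if (_map.TryGetValue(key, out var node))
        {
            // A hit moves the entry to the front of the list.
            _order.Remove(node);
            _order.AddFirst(node);
            value = node.Value.Value;
            return true;
        }
        value = default;
        return false;
    }

    public void Set(TKey key, TValue value)
    {
        if (_map.TryGetValue(key, out var existing))
        {
            // Overwrite: drop the old node, re-add at the front below.
            _order.Remove(existing);
            _map.Remove(key);
        }
        else if (_map.Count >= _capacity)
        {
            // Full: evict the entry at the back of the list.
            var oldest = _order.Last!;
            _order.RemoveLast();
            _map.Remove(oldest.Value.Key);
        }
        _map[key] = _order.AddFirst((key, value));
    }
}
```

An LFU policy would replace the linked list with per-key hit counters and evict the key with the lowest count instead of the oldest access.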
Mitigating Edge Cases & Distributed Failures
- Cache Stampede (Thundering Herd): When a highly popular key expires, thousands of concurrent requests can miss at once and hit the primary database simultaneously. Common mitigations are a lock, so that only one caller rebuilds the entry while the others wait, or probabilistic early recomputation, which refreshes the entry in the background shortly before it expires.
- Cache Penetration: Occurs when requests target keys that exist in neither the cache nor the database, so every request falls through to a database lookup. A Bloom filter placed in front of the cache can reject keys that definitely do not exist, at the cost of occasional false positives.
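The lock-based stampede mitigation can be sketched inside a single process with one SemaphoreSlim per key. SingleFlightCache and the simulated database delay are assumptions for this example. Across several service instances the same idea needs a distributed lock (for example, one taken in Redis itself), which this sketch does not attempt.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

int dbCalls = 0;
var cache = new SingleFlightCache();

// 50 concurrent requests for the same key, which is not cached yet.
var requests = Enumerable.Range(0, 50).Select(_ =>
    cache.GetOrLoadAsync("product:42", async () =>
    {
        Interlocked.Increment(ref dbCalls);
        await Task.Delay(100); // simulated slow database query
        return "Laptop";
    }));
await Task.WhenAll(requests);
Console.WriteLine($"Database calls: {dbCalls}"); // Database calls: 1

// Hypothetical illustration type.
public sealed class SingleFlightCache
{
    private readonly ConcurrentDictionary<string, string> _values = new();
    private readonly ConcurrentDictionary<string, SemaphoreSlim> _locks = new();

    public async Task<string> GetOrLoadAsync(string key, Func<Task<string>> loadFromDb)
    {
        // Fast path: no locking on a cache hit.
        if (_values.TryGetValue(key, out var cached)) return cached;

        // One gate per key, so unrelated keys do not block each other.
        var gate = _locks.GetOrAdd(key, _ => new SemaphoreSlim(1, 1));
        await gate.WaitAsync();
        try
        {
            // Re-check: another caller may have filled the cache while we waited.
            if (_values.TryGetValue(key, out cached)) return cached;

            var value = await loadFromDb();
            _values[key] = value;
            return value;
        }
        finally
        {
            gate.Release();
        }
    }
}
```

The second lookup inside the lock is what turns 50 misses into one database call: every waiter that gets the gate after the first finds the value already cached.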

Redis as Distributed Cache in Microservices

When discussing distributed caching, Redis is often the first technology engineers consider. Redis suits microservices well: it keeps data in memory, is simple to deploy, and has client libraries for most languages.

Using Redis as a distributed cache in a microservices architecture solves the limitation of local in-memory caching. Local caches cannot synchronize data across horizontally scaled service replicas. Redis serves as a centralized, high-performance, in-memory data store. It provides low-latency data access while preventing redundant database hits.

Caching Topologies in Microservices

When implementing Redis, choosing how your services interact with the cache dictates your deployment structure.

Centralized Shared Cache: A single Redis Cluster handles caching requirements for all microservices. Services separate data logically using structured namespaces (e.g., orders:123, users:456).
Isolated Per-Service Cache: Each microservice is deployed alongside its own dedicated Redis instance. This maintains strict domain isolation and aligns with the database-per-service pattern.
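Two-Tier (L1/L2) Cache: Combines a local in-memory cache such as Caffeine (Java) or MemoryCache (.NET) as L1, which answers hits without a network call, with a centralized Redis instance as L2, which holds the shared copy. L1 entries need short TTLs or explicit invalidation so that instances do not serve stale data for long.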
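A two-tier lookup might be structured like the following sketch. To keep it runnable without a Redis server, the shared L2 tier is an in-memory stand-in behind a hypothetical ISharedCache interface; TwoTierCache, FakeSharedCache, and the five-second L1 lifetime are likewise assumptions for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

var shared = new FakeSharedCache(); // stands in for Redis (L2)
var instanceA = new TwoTierCache(shared, TimeSpan.FromSeconds(5));
var instanceB = new TwoTierCache(shared, TimeSpan.FromSeconds(5));

await instanceA.SetAsync("users:456", "Ada");
Console.WriteLine(await instanceB.GetAsync("users:456")); // Ada (L1 miss, L2 hit)

public interface ISharedCache
{
    Task<string?> GetAsync(string key);
    Task SetAsync(string key, string value);
}

public sealed class FakeSharedCache : ISharedCache
{
    private readonly ConcurrentDictionary<string, string> _data = new();
    public Task<string?> GetAsync(string key) =>
        Task.FromResult<string?>(_data.TryGetValue(key, out var v) ? v : null);
    public Task SetAsync(string key, string value)
    {
        _data[key] = value;
        return Task.CompletedTask;
    }
}

public sealed class TwoTierCache
{
    private readonly ConcurrentDictionary<string, (string Value, DateTimeOffset ExpiresAt)> _l1 = new();
    private readonly ISharedCache _l2;
    private readonly TimeSpan _l1Ttl;

    public TwoTierCache(ISharedCache l2, TimeSpan l1Ttl)
    {
        _l2 = l2;
        _l1Ttl = l1Ttl;
    }

    public async Task<string?> GetAsync(string key)
    {
        // L1: in-process lookup, no network hop.
        if (_l1.TryGetValue(key, out var entry) && entry.ExpiresAt > DateTimeOffset.UtcNow)
            return entry.Value;

        // L2: shared cache. On a hit, keep a short-lived local copy.
        var value = await _l2.GetAsync(key);
        if (value is not null)
            _l1[key] = (value, DateTimeOffset.UtcNow + _l1Ttl);
        return value;
    }

    public async Task SetAsync(string key, string value)
    {
        await _l2.SetAsync(key, value);
        _l1[key] = (value, DateTimeOffset.UtcNow + _l1Ttl);
    }
}
```

In this sketch, an update made through instanceA is not visible to instanceB until instanceB's local copy expires. That staleness window is the price of the L1 tier; shortening the L1 lifetime or broadcasting invalidations between instances narrows it.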

What Is a Redis Distributed Cache?

A Redis distributed cache is a caching system where multiple applications or microservices share cached data using Redis as a centralized in-memory data store. Instead of every application keeping its own local cache, Redis acts as a common cache layer that all services can access over the network.

How It Works

Application requests data
Service checks Redis first
If data exists in Redis → return immediately
If data is missing:
- Fetch from database
- Store result in Redis
- Return response

How Redis Improves Microservices Performance

Redis improves microservices performance primarily by reducing latency, offloading repeated work, and enabling fast communication between distributed services. In a microservices architecture, many small services communicate over the network, which can create bottlenecks. Redis acts as an in-memory data layer that helps eliminate those bottlenecks.

Key Mechanisms of Performance Improvement

Distributed Caching & Shared State
Microservices run in multiple isolated instances. Redis acts as a unified Distributed Cache across all instances:
- Eliminating DB Bottlenecks: Serves frequently read data (like product catalogs or user profiles) directly from RAM. This sharply reduces the number of expensive relational database queries.
- Centralized Session Management: Stores user session state externally so that the services themselves stay stateless. Any instance can then handle any incoming request without replicating session state between instances.
High-Speed Asynchronous Messaging
Synchronous calls between services add latency and couple services together. Redis offers two asynchronous alternatives:
- Redis Streams: Acts as a lightweight event broker that lets microservices publish and consume events asynchronously, so the producer does not block on a synchronous HTTP/gRPC call. Stream entries are stored, so consumers can read them later.
- Pub/Sub Messaging: Delivers real-time notifications and configuration changes to all currently connected subscribers. Delivery is fire-and-forget: a subscriber that is offline misses the message.
Data Integration Patterns (CQRS)
In a Command Query Responsibility Segregation (CQRS) pattern, microservices separate write operations from read operations.
- Redis serves as the read store, holding denormalized, precomputed views that can be fetched by key.
- Changes written to the primary persistent database are asynchronously synced to Redis, preserving peak read performance.
Network and Query Optimization
- Pipelining: Microservices can send multiple commands without waiting for each reply by using Redis pipelining. The replies come back together, so a batch costs roughly one network round-trip time (RTT) instead of one per command.
- API Gateway Caching: A cache at the API gateway stores responses to repeated requests, so cache hits never reach the downstream services.

Code Example: Distributed Caching in Microservices with Redis

Here’s a practical example of using Redis as a distributed cache in a C# microservices setup using ASP.NET Core. This example covers:

Redis configuration
Cache service abstraction
Writing/reading cached data
Sharing cache across multiple microservices
Serialization with JSON

Install Required Packages


dotnet add package Microsoft.Extensions.Caching.StackExchangeRedis
dotnet add package StackExchange.Redis

Configure Redis in appsettings.json


{
  "Redis": {
    "ConnectionString": "localhost:6379"
  }
}

For Docker:


{
  "Redis": {
    "ConnectionString": "redis:6379"
  }
}

Register Redis in Program.cs


var builder = WebApplication.CreateBuilder(args);
// Register Redis as the IDistributedCache implementation.
builder.Services.AddStackExchangeRedisCache(options =>
{
  options.Configuration = builder.Configuration["Redis:ConnectionString"];
  // Prefix added to every key this application writes.
  options.InstanceName = "MicroserviceDemo:";
});
builder.Services.AddControllers();
builder.Services.AddScoped<ICacheService, CacheService>();
var app = builder.Build();
app.MapControllers();
app.Run();

Create Cache Service Abstraction (ICacheService.cs)


public interface ICacheService
{
  Task<T?> GetAsync<T>(string key);
  Task SetAsync<T>(string key, T value, TimeSpan? expiry = null);
  Task RemoveAsync(string key);
}

Implement Cache Service (CacheService.cs)


using System.Text.Json;
using Microsoft.Extensions.Caching.Distributed;
public class CacheService : ICacheService
{
  private readonly IDistributedCache _cache;
  public CacheService(IDistributedCache cache)
  {
    _cache = cache;
  }
  // Returns default(T) on a cache miss.
  public async Task<T?> GetAsync<T>(string key)
  {
    var cachedData = await _cache.GetStringAsync(key);
    if (string.IsNullOrEmpty(cachedData))
      return default;
    return JsonSerializer.Deserialize<T>(cachedData);
  }
  // Serializes the value to JSON and stores it with an expiry (5 minutes by default).
  public async Task SetAsync<T>(string key, T value, TimeSpan? expiry = null)
  {
    var options = new DistributedCacheEntryOptions
    {
      AbsoluteExpirationRelativeToNow = expiry ?? TimeSpan.FromMinutes(5)
    };
    var jsonData = JsonSerializer.Serialize(value);
    await _cache.SetStringAsync(key, jsonData, options);
  }
  public async Task RemoveAsync(string key)
  {
    await _cache.RemoveAsync(key);
  }
}

Example Product Service Using Redis Cache (ProductsController.cs)


using Microsoft.AspNetCore.Mvc;
[ApiController]
[Route("api/products")]
public class ProductsController : ControllerBase
{
  private readonly ICacheService _cacheService;
  public ProductsController(ICacheService cacheService)
  {
    _cacheService = cacheService;
  }
  [HttpGet("{id}")]
  public async Task<IActionResult> GetProduct(int id)
  {
    string cacheKey = $"product:{id}";
    // 1. Try cache first
    var cachedProduct = await _cacheService.GetAsync<Product>(cacheKey);
    if (cachedProduct != null)
    {
      return Ok(new
      {
        Source = "Redis Cache",
        Data = cachedProduct
      });
    }
    // 2. Simulate DB call
    await Task.Delay(2000);
    var product = new Product
    {
      Id = id,
      Name = "Laptop",
      Price = 1200
    };
    // 3. Store in cache
    await _cacheService.SetAsync(cacheKey, product, TimeSpan.FromMinutes(10));
    return Ok(new
    {
      Source = "Database",
      Data = product
    });
  }
}

Product Model (Product.cs)

public class Product
{
  public int Id { get; set; }
  public string Name { get; set; } = string.Empty;
  public decimal Price { get; set; }
}

In the example, Redis is used as a distributed cache between the microservice and the database to improve performance. Normally, every API request would directly query the database, which can become slow under heavy traffic.

With Redis, the service first checks whether the requested data already exists in the cache. If the data is found (cache hit), it is returned immediately from Redis without touching the database. If the data is not found (cache miss), the service fetches it from the database, stores it in Redis for future requests, and returns it to the client.

In the C# code, AddStackExchangeRedisCache() configures the application to connect to Redis, while IDistributedCache provides methods to store and retrieve cached data. Since Redis stores strings or bytes, objects are converted to JSON using serialization before saving and deserialized back into C# objects when reading.

The CacheService class acts as a reusable abstraction layer so controllers do not directly interact with Redis logic. Cache expiration is added to ensure outdated data is automatically removed after a specified time.

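This approach is especially useful in microservices because multiple services can share the same Redis server and access common cached data, reducing database load and improving response times. Services that need to read each other's entries must agree on the key format and use the same InstanceName prefix, since that prefix is added to every key.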

Summary

Distributed caching is essential for building scalable microservices architectures. A well-planned distributed cache design can dramatically improve performance, reduce database load, and enhance user experience.

Among available technologies, Redis remains one of the most popular choices for implementing a distributed cache due to its speed, scalability, and operational simplicity.

Using Redis as a distributed cache allows microservices to share fast-access data efficiently while supporting high availability and horizontal scaling.

Thanks