What is the Circuit Breaker?
In short, the Circuit Breaker is a design pattern, named after the electrical circuit breaker, that adds resilience and fault tolerance to distributed systems. It acts as a proxy between the calling service and the target service, preventing cascading failures from bringing down the entire application.
The Problem It Solves
In a microservices architecture, when service A depends on service B and B becomes unavailable, service A starts to accumulate blocked threads waiting for responses that never arrive. This leads to resource exhaustion, which in turn makes service A unavailable as well, so the failure propagates in a domino effect throughout the application. The Circuit Breaker solves this by failing fast instead of letting requests hang for long periods.
The Three States
| State | Behavior | Transition |
|---|---|---|
| Closed | Requests flow normally; failures are monitored | -> Open, when the error threshold is reached |
| Open | Requests are blocked immediately; returns fallback or error | -> Half-Open, after the recovery timeout |
| Half-Open | A limited number of test requests is allowed | -> Closed (success) or Open (failure) |
In the Closed state, the system operates normally while the circuit breaker monitors failures. If the number of failures reaches a configured threshold (e.g., 15 failures in 60 seconds), it transitions to Open and rejects requests immediately. After a waiting period (the recovery timeout, also called the cooling-off period), it moves to Half-Open, where it allows a few test requests to check whether the service has recovered. If they succeed, the circuit returns to Closed; if they fail, it goes back to Open.
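The state transitions described above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the class name `CircuitBreaker`, the exception `CircuitOpenError`, and the parameters `failure_threshold`, `window_seconds`, and `recovery_timeout` are all hypothetical names chosen for this example. It also assumes a single trial request in Half-Open, treats any exception as a failure, and is not thread-safe.

```python
import time
from collections import deque


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Illustrative three-state breaker (hypothetical names, not thread-safe)."""

    def __init__(self, failure_threshold=15, window_seconds=60.0,
                 recovery_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # failures that trip the circuit
        self.window_seconds = window_seconds        # sliding window for counting failures
        self.recovery_timeout = recovery_timeout    # how long to stay Open
        self.clock = clock                          # injectable clock, useful in tests
        self.state = "closed"
        self.failures = deque()                     # timestamps of recent failures
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        now = self.clock()
        if self.state == "open":
            if now - self.opened_at < self.recovery_timeout:
                # Fail fast: do not touch the target service at all.
                raise CircuitOpenError("circuit is open")
            # Recovery timeout elapsed: let one trial request through.
            self.state = "half_open"
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure(now)
            raise
        self._on_success()
        return result

    def _on_success(self):
        if self.state == "half_open":
            # The trial request succeeded: resume normal operation.
            self.state = "closed"
            self.failures.clear()

    def _on_failure(self, now):
        if self.state == "half_open":
            # The trial request failed: reopen immediately.
            self._trip(now)
            return
        self.failures.append(now)
        # Forget failures that fell out of the sliding window.
        while self.failures and now - self.failures[0] > self.window_seconds:
            self.failures.popleft()
        if len(self.failures) >= self.failure_threshold:
            self._trip(now)

    def _trip(self, now):
        self.state = "open"
        self.opened_at = now
        self.failures.clear()
```

A caller would wrap each remote call, for example `breaker.call(fetch_profile, user_id)`, where `fetch_profile` stands for any function that talks to the target service. Counting failures in a sliding window is only one option; some implementations track a failure percentage over the last N calls instead.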
Benefits and Challenges
Benefits
- Prevents cascading failures
- Improves overall system stability by reducing load on failing services
- Provides insights into the health and reliability of services
Challenges
- Configuring ideal thresholds and timeouts requires deep knowledge of service behavior
- Requires well-defined fallback strategies
- The pattern assumes the service recovers over time, which is not always true
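As one illustration of a fallback strategy, the sketch below serves the last successful result whenever the protected call fails or is rejected. The names `CachedFallback`, `CircuitOpenError`, and `rejected` are hypothetical and exist only for this example; whether stale data is an acceptable fallback depends entirely on the service.

```python
class CircuitOpenError(Exception):
    """Hypothetical error a breaker raises when it rejects a call."""


class CachedFallback:
    """Serve the last successful result when the protected call fails."""

    def __init__(self, default=None):
        self._last_good = default

    def call(self, guarded_call):
        try:
            result = guarded_call()
        except Exception:
            # Covers both a rejected call (CircuitOpenError) and a real failure.
            return self._last_good
        self._last_good = result
        return result


def rejected():
    # Stands in for a call that the breaker refuses while it is Open.
    raise CircuitOpenError("circuit is open")


prices = CachedFallback(default={})
prices.call(lambda: {"sku-1": 9.99})  # success: the result is cached
print(prices.call(rejected))          # prints {'sku-1': 9.99}
```

Other common fallbacks include returning a static default, queuing the request for later, or surfacing a clear error to the user.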
When to Use
- Synchronous calls between microservices
- Integrations with external or third-party APIs
- Services with high risk of overload or variable latency
- High-availability environments where downtime is costly