Health checks for an application are generally designed to determine whether a specific component or service is functioning correctly. Whether a health check should call other health checks depends on the design of your system and the level of granularity you need. There are two common approaches:
Independent health checks: Each service or component has its own health check, and these checks do not call others.
Advantages:
Simplifies the design of each health check.
Avoids cascading failures caused by issues in downstream services.
Easier to pinpoint issues because the checks are scoped to specific components.
Use Case: Ideal for microservices or systems with independent components.
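As a minimal sketch of an independent check, the Python below runs only local probes and never calls another service's health endpoint. The `HealthCheck` class, the probe names, and the result layout are all hypothetical, chosen for illustration rather than taken from any particular framework.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class HealthCheck:
    """Health check scoped to one component (hypothetical helper)."""

    name: str
    # Each probe inspects local state only and returns True when healthy.
    probes: Dict[str, Callable[[], bool]] = field(default_factory=dict)

    def run(self) -> dict:
        results = {}
        for probe_name, probe in self.probes.items():
            try:
                results[probe_name] = bool(probe())
            except Exception:
                # A probe that raises is reported as failed, not propagated.
                results[probe_name] = False
        status = "healthy" if all(results.values()) else "unhealthy"
        return {"service": self.name, "status": status, "probes": results}
```

Because the result names each probe, a failure points directly at the component that reported it.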
Aggregated health checks: A "parent" health check queries the health checks of its dependencies.
Advantages:
Provides a high-level view of overall system health.
Simplifies monitoring by aggregating results in one place.
Challenges:
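Couples services together: the parent's reported health now depends on each dependency's health check being reachable.
If not properly managed, failures can propagate: one unhealthy dependency can mark the parent, and everything that checks the parent, as unhealthy.
Adds latency: the parent check takes at least as long as its slowest dependency when checks run in parallel, and as long as all of them combined when they run sequentially.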
Use Case: Useful in systems where high-level health summaries are needed, such as in a gateway or orchestrator.
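One way to sketch a parent check is to run the dependency checks in parallel under a single shared deadline, so a slow dependency is reported as a timeout instead of stalling the whole response. The function name `aggregate_health`, the `timeout_seconds` parameter, and the status strings below are assumptions made for this example; each dependency is represented by a plain callable standing in for a real call to that service's health endpoint.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable, Dict


def aggregate_health(
    dependencies: Dict[str, Callable[[], bool]],
    timeout_seconds: float = 1.0,
) -> dict:
    """Run all dependency checks in parallel and summarize the results."""
    results = {}
    pool = ThreadPoolExecutor(max_workers=max(1, len(dependencies)))
    try:
        futures = {name: pool.submit(check) for name, check in dependencies.items()}
        # One deadline for the whole aggregate, not one per dependency.
        deadline = time.monotonic() + timeout_seconds
        for name, future in futures.items():
            remaining = max(0.0, deadline - time.monotonic())
            try:
                ok = future.result(timeout=remaining)
                results[name] = "healthy" if ok else "unhealthy"
            except FutureTimeout:
                results[name] = "timeout"
            except Exception:
                # A check that raises is treated as unhealthy.
                results[name] = "unhealthy"
    finally:
        # Return without waiting for checks that are still running.
        pool.shutdown(wait=False)
    healthy = all(state == "healthy" for state in results.values())
    return {"status": "healthy" if healthy else "unhealthy", "dependencies": results}
```

Reporting per-dependency states alongside the overall status keeps the high-level summary while still showing which dependency caused a failure. Note that this sketch marks the parent unhealthy if any dependency fails; a real system might weigh dependencies differently.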
Best Practices
Check Only Critical Dependencies: If a service can still function in a degraded state without a dependency, its health check might not query that dependency.
Timeouts and Circuit Breakers: When calling other health checks, use timeouts to avoid hanging on failures, and a circuit breaker to stop repeatedly calling a dependency that keeps failing.
Granularity: Keep individual checks simple and lightweight. Avoid deep dependency chains in health checks.
Separate Liveness vs. Readiness:
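Liveness checks: Ensure the application is running. They should rarely call external dependencies, because a failed liveness check typically leads to a restart, and restarting the application cannot fix an outage in another service.
Readiness checks: Ensure the application is ready to handle requests, which may depend on the state of its dependencies.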
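The liveness and readiness split, together with the idea of not failing on non-critical dependencies, could be sketched as follows. The function names, the `critical` and `optional` groupings, the status strings, and the use of HTTP-style 200 and 503 codes are assumptions for illustration; the callables stand in for real dependency checks.

```python
from typing import Callable, Dict, Tuple

Check = Callable[[], bool]


def _passes(check: Check) -> bool:
    """Treat a check that raises as a failed check."""
    try:
        return bool(check())
    except Exception:
        return False


def liveness() -> Tuple[int, dict]:
    """Liveness: the process is up and can answer. No dependency calls."""
    return 200, {"status": "alive"}


def readiness(critical: Dict[str, Check], optional: Dict[str, Check]) -> Tuple[int, dict]:
    """Readiness: fail only when a critical dependency is unavailable."""
    failed_critical = [name for name, check in critical.items() if not _passes(check)]
    failed_optional = [name for name, check in optional.items() if not _passes(check)]
    if failed_critical:
        return 503, {"status": "not ready", "failed": failed_critical}
    if failed_optional:
        # The service can still handle requests, so report degraded but ready.
        return 200, {"status": "degraded", "failed": failed_optional}
    return 200, {"status": "ready", "failed": []}
```

For example, `readiness({"db": db_check}, {"cache": cache_check})` would return 503 only when the hypothetical `db_check` fails, while a failing `cache_check` yields a 200 response marked as degraded.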