Could the "context deadline exceeded" error be related to network latency, request timeouts, or server overload?
The "context deadline exceeded" error you're seeing is a client-side timeout, but in this scenario it's almost certainly a symptom of a bottleneck on the server side or in the network path, rather than a simple RPC timeout. With 5,000+ connections, you are likely hitting a resource limit.
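As background on where that exact wording comes from: "context deadline exceeded" is how Go-based gRPC clients report an expired call deadline; grpc-java surfaces the same condition as `DEADLINE_EXCEEDED`. A minimal client-side configuration sketch (`EventServiceGrpc` is a hypothetical generated stub and `my-grpc-service` a placeholder host):

```java
import java.util.concurrent.TimeUnit;

import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

// Sketch: attaching a per-call deadline to a stub.
// EventServiceGrpc stands in for your own generated service class.
class DeadlineExample {
    static EventServiceGrpc.EventServiceStub stubWithDeadline() {
        ManagedChannel channel = ManagedChannelBuilder
                .forAddress("my-grpc-service", 443)
                .build();
        // If the call does not complete within 30 s, the client fails
        // it with Status.DEADLINE_EXCEEDED -- the Java-side analogue
        // of Go's "context deadline exceeded".
        return EventServiceGrpc.newStub(channel)
                .withDeadlineAfter(30, TimeUnit.SECONDS);
    }
}
```

If the deadline is generous and you still see the error under load, that points at the server or the path in between, not the deadline itself.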
Are there specific configurations for Spring Boot Reactive gRPC to handle a large number of concurrent streams efficiently?
For a large number of long-lived streams, you'll need to tune both your gRPC server and client: thread-pool sizing, HTTP/2 flow-control windows, and keepalive pings all matter.
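As a server-side configuration sketch using grpc-java's Netty transport (the method names are real `NettyServerBuilder` options, but the specific values here are illustrative starting points, not recommendations):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import io.grpc.Server;
import io.grpc.netty.NettyServerBuilder;

class TunedServer {
    static Server build() {
        return NettyServerBuilder.forPort(9090)
                // Keepalive pings detect half-open connections among
                // thousands of mostly idle streams.
                .keepAliveTime(30, TimeUnit.SECONDS)
                .keepAliveTimeout(10, TimeUnit.SECONDS)
                .permitKeepAliveWithoutCalls(true)
                // HTTP/2 flow-control window: larger windows trade
                // memory for throughput on streaming RPCs.
                .flowControlWindow(1 << 20) // 1 MiB
                // Cap streams multiplexed on a single connection.
                .maxConcurrentCallsPerConnection(1000)
                // Offload application work from the Netty event loops.
                .executor(Executors.newFixedThreadPool(
                        Runtime.getRuntime().availableProcessors() * 4))
                .build();
    }
}
```

Client-side keepalive must stay above the server's `permitKeepAliveTime` policy, or the server will close connections with `too_many_pings`.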
Are there limits on concurrent connections imposed by GKE's networking or my Ingress configuration?
Yes, both the node and the load-balancing path can impose limits. The e2-standard-4 machine type (4 vCPUs, 16 GB RAM) may be insufficient for 100,000 concurrent streams, since each connection consumes memory and a file descriptor. On the networking side, GKE Ingress is backed by Google Cloud Load Balancing, whose backend service timeout (30 seconds by default) will sever long-lived streams unless you raise it, e.g. via a BackendConfig's `timeoutSec`.
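To make the memory concern concrete, here is a back-of-envelope sketch; the 64 KiB per-stream overhead is an illustrative assumption (buffers plus connection state), not a measured value:

```java
// Rough capacity check for an e2-standard-4 (4 vCPU, 16 GB RAM).
// BYTES_PER_STREAM is an assumed figure for illustration only;
// measure your own workload before trusting any such number.
public class StreamCapacityEstimate {
    static final long BYTES_PER_STREAM = 64 * 1024; // assumed 64 KiB

    static long totalBytes(long streams) {
        return streams * BYTES_PER_STREAM;
    }

    public static void main(String[] args) {
        long streams = 100_000;
        long gib = totalBytes(streams) / (1024L * 1024 * 1024);
        // Under this assumption, stream state alone eats a large
        // slice of a 16 GB node before the app does any real work.
        System.out.println(streams + " streams ~ " + gib + " GiB");
    }
}
```

Each stream's connection also holds a file descriptor, so check the pod's `ulimit -n` as well; a low soft limit will cap you long before memory does.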
What strategies can I use to optimize gRPC performance for a large number of subscribers in a reactive Spring Boot application (e.g., connection pooling, flow control)?
Reuse gRPC channels rather than opening one per subscriber, tune HTTP/2 flow control so slow consumers exert backpressure, and, most importantly at this scale, consider a service mesh (e.g. Istio) so long-lived streams are balanced across pods instead of pinned to whichever pod accepted the connection.
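Channel reuse on the client side might look like the configuration sketch below (`my-grpc-service` is a placeholder host). One `ManagedChannel` multiplexes many concurrent streams over a single HTTP/2 connection, so per-subscriber channels waste TCP connections and TLS handshakes:

```java
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

// One shared channel for the whole application; gRPC multiplexes
// concurrent streams over it instead of opening a fresh connection
// per subscriber.
public final class GrpcChannels {
    private static final ManagedChannel SHARED =
            ManagedChannelBuilder.forAddress("my-grpc-service", 443)
                    .build();

    private GrpcChannels() {}

    public static ManagedChannel shared() {
        return SHARED;
    }
}
```

Create stubs freely from the shared channel; stubs are cheap, channels are not.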
Also, to get more insight into what's happening at the gRPC level, enable verbose logging. The GRPC_VERBOSITY environment variable (set to DEBUG) controls this for C-core-based gRPC implementations; a grpc-java server uses java.util.logging instead, so raise the `io.grpc` logger level (e.g. to FINE) there. Either way you get detailed logs about connections, RPCs, and transport-level events, which can be invaluable for debugging.
Are there alternative solutions I should consider for this use case, given the high fan-out requirement?
WebSockets are one alternative worth evaluating: Spring WebFlux has reactive WebSocket support, proxy and load-balancer handling of long-lived WebSocket connections is broad, and you sidestep HTTP/2 per-connection stream limits. The trade-off is losing gRPC's typed contracts and built-in flow control, so you would handle message framing and backpressure yourself.