How can I ensure that ClickHouse distributes queries across all replicas?
By default, a distributed query uses only one live replica per shard.
You can change this per query with SETTINGS max_parallel_replicas=3.
To check which replicas served the query, run:
SELECT hostName(), count() FROM db.distributed_table GROUP BY ALL
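A minimal sketch of combining the two, assuming the table name db.distributed_table from this thread and a shard with at least three live replicas. The column aliases are only illustrative. Depending on the ClickHouse version and table definition, reading one shard from several replicas may need extra setup (for example a sampling key, or the parallel replicas feature enabled), so compare the result against a run without the setting.

```sql
-- Ask for up to 3 replicas per shard for this query only,
-- then list which hosts actually returned rows.
SELECT hostName() AS host, count() AS row_count
FROM db.distributed_table
GROUP BY host
ORDER BY host
SETTINGS max_parallel_replicas = 3;
```

If the output lists more than one host per shard, the query was spread across replicas.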
Do I need to configure a specific load balancing strategy for read queries?
Use <load_balancing> only if you understand why you need it.
For example, nearest_hostname is usually used in geo-distributed clusters to prefer replicas in the same region, where the region is encoded in the DNS hostnames given in the <host> entries of <remote_servers>.
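Before changing the server configuration, the strategy can be tried on a single query. This is a sketch, assuming the same db.distributed_table and that the replica hostnames really encode the region; the aliases are illustrative.

```sql
-- Apply the strategy to this query only; the server config is not changed.
SELECT hostName() AS host, count() AS row_count
FROM db.distributed_table
GROUP BY host
ORDER BY host
SETTINGS load_balancing = 'nearest_hostname';
```

If the hosts returned are the ones whose names are closest to the hostname of the server you connected to, the setting behaves as intended and can then be moved into a settings profile.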
Is there any way to confirm which replica is being used for queries?
Run the same check query; each row in the result is a replica that processed part of the query:
SELECT hostName(), count() FROM db.distributed_table GROUP BY ALL