Async execution of queries does not necessarily imply true parallelism. While executeAsync() is non-blocking, spawning 300 queries could overwhelm the system. Not all threads will be in the RUNNING state simultaneously. Many may be waiting or queued. This queuing likely explains the drastic increase in response time for APIs executing large numbers of queries.
I would recommend checking CPU utilization, thread pool stats (via nodetool tpstats), and capturing a thread dump to confirm thread contention or queuing bottlenecks on the Cassandra nodes.
Additionally, Cassandra does not support batch reads (Refer: Batch select in Cassandra). Also, upgrading hardware may not help much if the application logic and query patterns are inefficient. It would be more effective to optimize the queries, avoid wide rows or fetching 300 columns unless necessary, and limit concurrency with bounded async execution strategies rather than spawning hundreds of concurrent queries without control.