Assuming it isn't simply luck of timing, seeing higher throughput with more streams implies that the TCP window size settings at the sender and the receiver are too small to cover the full bandwidth-delay product with the smaller number of streams (e.g. a single stream). Each stream can have at most one window of data in flight per round trip, so adding streams raises the aggregate amount in flight.
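As a rough back-of-the-envelope sketch of that reasoning: the link speed, RTT, and window size below are made-up illustrative numbers, not anything taken from your setup, and the function names are my own. It assumes every stream gets the full window and ignores loss and congestion control, so treat it as an upper bound rather than a prediction.

```python
def bdp_bytes(bandwidth_bps, rtt_sec):
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    return bandwidth_bps * rtt_sec / 8


def window_limited_bps(window_bytes, rtt_sec, streams=1):
    """Upper bound on aggregate throughput when each stream can have
    at most one window of data in flight per round trip."""
    return streams * window_bytes * 8 / rtt_sec


# Hypothetical example values (assumptions, not measurements).
LINK_BPS = 10e9                 # 10 Gbit/s path
RTT_SEC = 0.050                 # 50 ms round-trip time
WINDOW_BYTES = 4 * 1024 * 1024  # 4 MiB effective window per stream

print(f"BDP: {bdp_bytes(LINK_BPS, RTT_SEC) / 1e6:.1f} MB")
for n in (1, 4, 16):
    # Aggregate throughput cannot exceed the link rate itself.
    ceiling = min(LINK_BPS, window_limited_bps(WINDOW_BYTES, RTT_SEC, n))
    print(f"{n:2d} stream(s): at most {ceiling / 1e9:.2f} Gbit/s")
```

If the per-stream window is smaller than the BDP, a single stream tops out well below the link rate, and the ceiling grows roughly linearly with the stream count until the link itself becomes the limit.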
There are likely plenty of references on TCP tuning for networks with a high bandwidth-delay product. One that touches on the topic, which is near and dear to my heart for some reason :), is at https://services.google.com/fh/files/misc/considerations_when_benchmarking_tcp_bulk_flows.pdf