I got my own answer. Turns out, dragonfly tries to use 6 threads per program, and with 6 nodes, dragon fly used 36 threads, and my cpu cores were being maxed out on htop. When I reduced threads using --proactor_threads (2 for master and 1 for slave), the performance easily improved.
ops/sec: 165,611
latency: 3.62 ms
throughput: 9.11 MB/sec
edit: so issue was running on the same machine, if it was actual different machines, performance would have improved.