Found the answer using strace. I was setting TCP_CORK incorrectly: it seems the option also has to be disabled again afterwards, or at least that is what Nginx does.
Here's the code I used to solve the issue (sending PSH FIN in one go):
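// enable and disable are assumed to be ints holding 1 and 0
setsockopt(event->data.fd, SOL_TCP, TCP_CORK, &enable, sizeof(int));   // cork: hold back partial frames
send(event->data.fd, response.c_str(), response.size(), MSG_NOSIGNAL); // MSG_NOSIGNAL: no SIGPIPE if the peer is gone
shutdown(event->data.fd, SHUT_WR);                                      // queue the FIN behind the response
setsockopt(event->data.fd, SOL_TCP, TCP_CORK, &disable, sizeof(int));  // uncork afterwards, as Nginx does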
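For anyone who wants to try the same sequence outside my event loop, here is a minimal sketch with error checking. The helper name `send_and_finish` is hypothetical (it is not in my server code), and it assumes a blocking, connected TCP socket on Linux; a non-blocking epoll server would additionally have to handle EAGAIN and partial sends across events.

```cpp
#include <netinet/in.h>    // IPPROTO_TCP
#include <netinet/tcp.h>   // TCP_CORK (Linux-specific)
#include <sys/socket.h>
#include <string>

// Hypothetical helper, not part of the original server: queue the response
// and the FIN while the socket is corked, then remove the cork again.
// Returns true only if every call succeeded and the whole response was queued.
bool send_and_finish(int fd, const std::string& response) {
    int enable = 1;
    int disable = 0;

    // IPPROTO_TCP has the same value as SOL_TCP on Linux.
    if (setsockopt(fd, IPPROTO_TCP, TCP_CORK, &enable, sizeof(enable)) != 0)
        return false;

    // send() may accept fewer bytes than requested, so loop until done.
    std::size_t sent = 0;
    while (sent < response.size()) {
        ssize_t n = send(fd, response.data() + sent, response.size() - sent,
                         MSG_NOSIGNAL);  // EPIPE instead of SIGPIPE
        if (n < 0)
            return false;  // sketch only: EINTR/EAGAIN are not retried here
        sent += static_cast<std::size_t>(n);
    }

    // Half-close: no more data from our side, so the FIN is queued.
    if (shutdown(fd, SHUT_WR) != 0)
        return false;

    // Remove the cork afterwards, mirroring what Nginx does.
    return setsockopt(fd, IPPROTO_TCP, TCP_CORK, &disable, sizeof(disable)) == 0;
}
```

Calling `send_and_finish(event->data.fd, response)` would replace the four lines above. Whether PSH and FIN really end up in one packet is best confirmed with tcpdump, since that depends on the response fitting into a single segment.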
As one commenter predicted, this did not solve the WRK performance issue, but it does what I asked about in the question.
I am now working on the performance issue. By capturing the packets from a WRK run instead of a single Apache Bench packet, I noticed that Nginx also uses Connection: keep-alive to reuse connections when there are many inbound connections. I now believe that is the right way to optimize for many inbound connections, and it should improve the WRK results as well.
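As a rough sketch of that next step, the server first has to decide per request whether the connection may stay open. The helper below is hypothetical (the name `wants_keep_alive` and the idea of passing in the raw request head are my assumptions, not existing code). It follows the usual HTTP rule that HTTP/1.1 connections are persistent unless the client sends Connection: close, while HTTP/1.0 clients must ask for keep-alive explicitly.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: given the raw request head (request line plus headers),
// report whether the client allows the connection to be reused.
bool wants_keep_alive(const std::string& request_head) {
    // Header names and these values are case-insensitive, so compare in lowercase.
    std::string lower(request_head.size(), '\0');
    std::transform(request_head.begin(), request_head.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    // HTTP/1.1 is persistent by default, HTTP/1.0 is not.
    const std::size_t line_end = lower.find("\r\n");
    const bool http11 = lower.substr(0, line_end).find("http/1.1") != std::string::npos;

    const std::string name = "\r\nconnection:";
    const std::size_t pos = lower.find(name);
    if (pos == std::string::npos)
        return http11;  // no Connection header: use the protocol default

    const std::size_t value_start = pos + name.size();
    const std::size_t value_end = lower.find("\r\n", value_start);
    const std::string value = lower.substr(value_start, value_end - value_start);

    if (value.find("close") != std::string::npos) return false;
    if (value.find("keep-alive") != std::string::npos) return true;
    return http11;
}
```

When this returns true, the response would need a Content-Length (or chunked encoding) so the client can tell where it ends, and the shutdown call would be skipped so the socket can serve the next request.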