I recently found that returning a std::pair from a function requires extra moves compared to returning an aggregate struct. See this example:
https://godbolt.org/z/b63K9bzfs
All of f1(), f2(), and f3() call the constructor twice, since each one constructs two new A objects. In f1(), the objects are constructed directly in the caller's storage. In f3(), however, the move constructor is called two extra times. I don't know how to optimize those moves away.
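
To make the difference concrete without relying on the link, here is a minimal sketch of the pattern. The names in it (`A`, `Agg`, `g_moves`, `moves_during`, and the `make_*` functions) are hypothetical stand-ins and may not match the Godbolt code. It also includes one candidate workaround, `std::piecewise_construct`, which should build both pair members in place from the constructor arguments, assuming the arguments can be forwarded rather than passed as already-constructed `A` objects.

```cpp
#include <tuple>
#include <utility>

// Hypothetical instrumented type: counts move-constructor calls.
int g_moves = 0;

struct A {
    int value;
    A(int v) : value(v) {}
    A(const A& other) : value(other.value) {}
    A(A&& other) noexcept : value(other.value) { ++g_moves; }
};

// Plain aggregate holding two A objects.
struct Agg {
    A first;
    A second;
};

// Aggregate initialization: each member is initialized directly
// from its prvalue, so no move is needed (guaranteed in C++17).
Agg make_aggregate() {
    return {A{1}, A{2}};
}

// std::pair's constructor takes its arguments by reference, so the
// temporaries are materialized and then moved into first and second.
std::pair<A, A> make_pair_from_temporaries() {
    return {A{1}, A{2}};
}

// Piecewise construction forwards the arguments to A's constructor,
// so both members are built in place inside the pair.
std::pair<A, A> make_pair_piecewise() {
    return std::pair<A, A>(std::piecewise_construct,
                           std::forward_as_tuple(1),
                           std::forward_as_tuple(2));
}

// Helper: resets the counter, calls the factory, and reports how many
// moves happened while producing the result.
template <class Factory>
int moves_during(Factory make) {
    g_moves = 0;
    auto result = make();
    (void)result;
    return g_moves;
}
```

Calling `moves_during(make_pair_from_temporaries)` should report two moves, while the aggregate and piecewise versions should report none. The piecewise form only helps when the members can be built from forwarded arguments; if the `A` objects already exist before the `return`, at least one move or copy per member seems unavoidable with `std::pair`.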