After several attempts, I finally fixed the problem. The noise had two causes. First, the SwrContext has to be created once and reused for every packet, rather than set up again for each one. Second, the return value of swr_convert (ret) is the number of samples per channel that the resampler actually produced, which can differ from the size of the output buffer that was allocated. I therefore had to use ret to compute the real size of the converted data, and append only that many bytes from each call to build valid PCM.
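The second point can be sketched roughly as follows. This is not my exact code, only a minimal illustration in plain C with no FFmpeg dependency: `PcmBuffer`, `pcm_bytes` and `pcm_append` are hypothetical helpers, and `converted` stands in for the value returned by swr_convert. It assumes interleaved (packed) output, so all channels live in a single buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer that collects interleaved PCM across packets. */
typedef struct {
    uint8_t *data;
    size_t   size;      /* number of valid bytes */
    size_t   capacity;  /* number of allocated bytes */
} PcmBuffer;

/* Bytes occupied by nb_samples samples per channel of interleaved audio. */
size_t pcm_bytes(int nb_samples, int channels, int bytes_per_sample)
{
    if (nb_samples <= 0 || channels <= 0 || bytes_per_sample <= 0)
        return 0;
    return (size_t)nb_samples * (size_t)channels * (size_t)bytes_per_sample;
}

/*
 * Append only the samples the resampler actually produced.
 * `converted` plays the role of swr_convert's return value:
 * samples per channel written to `out`, or a negative value on error.
 * Returns 0 on success, -1 on error.
 */
int pcm_append(PcmBuffer *buf, const uint8_t *out, int converted,
               int channels, int bytes_per_sample)
{
    assert(buf != NULL);

    if (converted < 0 || channels <= 0 || bytes_per_sample <= 0)
        return -1;                      /* resampler error or bad format */

    size_t n = pcm_bytes(converted, channels, bytes_per_sample);
    if (n == 0)
        return 0;                       /* nothing produced for this packet */

    if (buf->size + n > buf->capacity) {
        /* Grow geometrically so repeated appends stay cheap. */
        size_t cap = buf->capacity ? buf->capacity : 4096;
        while (cap < buf->size + n)
            cap *= 2;
        uint8_t *p = realloc(buf->data, cap);
        if (p == NULL)
            return -1;
        buf->data = p;
        buf->capacity = cap;
    }

    /* Copy n bytes, not the full size of the output buffer. */
    memcpy(buf->data + buf->size, out, n);
    buf->size += n;
    return 0;
}
```

In a real decode loop, `out` would be the output buffer passed to swr_convert and `bytes_per_sample` could come from av_get_bytes_per_sample for the output sample format; av_samples_get_buffer_size can compute the same byte count directly from ret. Copying the whole allocated output buffer instead of these n bytes leaves stale or uninitialized data between packets, which is one way to end up with noise.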