When you pass `nil` to `AVAssetReaderAudioMixOutput`, like this:

```objc
self.audioOutput = [AVAssetReaderAudioMixOutput assetReaderAudioMixOutputWithAudioTracks:audioTracks audioSettings:nil];
```
You're telling AVFoundation:
“Just give me the original audio format, exactly as stored in the asset. No conversion. I’ll handle it myself.”
When you're working with spatial audio, here's what happens:
The source audio is stored in a complex multichannel format — not just stereo.
It might be:

- 4 channels (from mic arrays)
- Ambisonic B-format
- a custom layout (like mic A + mic B + directional data)
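If you're not sure what the source actually contains, you can inspect the track's native format descriptions before choosing reader settings. A minimal sketch, assuming an `AVAsset` named `asset`:

```objc
#import <AVFoundation/AVFoundation.h>

// Print the native channel count and sample rate of the first audio track
// (`asset` is an assumed variable from your own setup code).
AVAssetTrack *track = [[asset tracksWithMediaType:AVMediaTypeAudio] firstObject];
for (id desc in track.formatDescriptions) {
    CMAudioFormatDescriptionRef fmt = (__bridge CMAudioFormatDescriptionRef)desc;
    const AudioStreamBasicDescription *asbd =
        CMAudioFormatDescriptionGetStreamBasicDescription(fmt);
    if (asbd != NULL) {
        // Spatial captures often report something like 4 channels at 48000 Hz.
        NSLog(@"Native format: %u channels at %.0f Hz",
              (unsigned)asbd->mChannelsPerFrame, asbd->mSampleRate);
    }
}
```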
When you pass `nil` settings to the reader, AVFoundation says:
“Alright, here's your raw multichannel format (e.g. 4ch at 48kHz). Have fun!”
But your writer input is expecting:
```objc
AVNumberOfChannelsKey: @1,  // or @2
AVSampleRateKey: @44100
```
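For context, a typical writer input in this kind of pipeline looks something like the sketch below; the exact AAC settings here are assumptions rather than something taken from your code:

```objc
// A sketch of a stereo AAC writer input (all settings are assumptions).
NSDictionary *writerSettings = @{
    AVFormatIDKey: @(kAudioFormatMPEG4AAC),
    AVSampleRateKey: @(44100),
    AVNumberOfChannelsKey: @(2),
    AVEncoderBitRateKey: @(128000)
};
AVAssetWriterInput *input =
    [AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeAudio
                                       outputSettings:writerSettings];
input.expectsMediaDataInRealTime = NO;
```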
So when you do:
```objc
[input appendSampleBuffer:sampleBuffer];
```
It fails with:
-11800 (AVErrorUnknown, "the operation could not be completed") / -12780 (format mismatch)
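You can confirm which side is unhappy by checking both objects' status after a failed append. A minimal sketch, assuming `reader` and `writer` are your `AVAssetReader` and `AVAssetWriter`:

```objc
// Surface the underlying error when an append fails
// (`reader`, `writer`, `input`, and `sampleBuffer` are assumed names).
if (![input appendSampleBuffer:sampleBuffer]) {
    if (writer.status == AVAssetWriterStatusFailed) {
        NSLog(@"Writer failed: %@", writer.error);  // the -11800 typically shows up here
    }
    if (reader.status == AVAssetReaderStatusFailed) {
        NSLog(@"Reader failed: %@", reader.error);
    }
}
```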
Because the reader is handing the writer raw multichannel spatial buffers, while the writer input was configured to encode mono or stereo. The formats simply don't match.

The fix: provide explicit settings for the reader (e.g. downmix to 2-channel PCM), like this:
```objc
NSDictionary *audioReaderSettings = @{
    AVFormatIDKey: @(kAudioFormatLinearPCM),
    AVSampleRateKey: @(44100),
    AVNumberOfChannelsKey: @(2),
    AVLinearPCMBitDepthKey: @(16),
    AVLinearPCMIsFloatKey: @(NO),
    AVLinearPCMIsBigEndianKey: @(NO),
    AVLinearPCMIsNonInterleaved: @(NO)
};

self.audioOutput = [AVAssetReaderAudioMixOutput assetReaderAudioMixOutputWithAudioTracks:audioTracks audioSettings:audioReaderSettings];
```
Then AVFoundation knows:
“Ah, okay, I’ll decode and downmix this spatial audio into regular stereo for you.”
Now the writer is happy because it gets standard 2-channel PCM and can encode it to AAC smoothly.
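Putting it all together, the read/write loop itself doesn't change; only the reader settings do. A condensed sketch of the audio pass, with error handling trimmed and with `reader`, `writer`, and `input` assumed from the earlier snippets:

```objc
// Condensed audio pass: the reader decodes and downmixes to stereo PCM,
// the writer encodes it to AAC (names follow the assumed snippets above).
[reader addOutput:self.audioOutput];
[writer addInput:input];

[reader startReading];
[writer startWriting];
[writer startSessionAtSourceTime:kCMTimeZero];

dispatch_queue_t queue = dispatch_queue_create("audio.pass", DISPATCH_QUEUE_SERIAL);
[input requestMediaDataWhenReadyOnQueue:queue usingBlock:^{
    while (input.isReadyForMoreMediaData) {
        CMSampleBufferRef buffer = [self.audioOutput copyNextSampleBuffer];
        if (buffer == NULL) {
            // End of stream (or a reader failure): close out the writer.
            [input markAsFinished];
            [writer finishWritingWithCompletionHandler:^{ /* done */ }];
            break;
        }
        [input appendSampleBuffer:buffer];  // now receives standard 2-channel PCM
        CFRelease(buffer);
    }
}];
```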