A solution that does not require looking at the transcription strings and guessing whether the words actually represent a new piece of speech:
I have only tested this on macOS and not on iOS, so take it with a grain of salt, but I have found that the bestTranscription will generally be emptied/reset after a result whose speechRecognitionMetadata field is not nil.
This means that gathering the complete transcription is simply a matter of concatenating the transcriptions from every result whose speechRecognitionMetadata is present:
import Speech

var combinedResult = ""

func combineResults(result: SFSpeechRecognitionResult) {
    if result.speechRecognitionMetadata != nil {
        // Metadata is present, so this segment is final; fold it into the total.
        // Note: this also prepends ". " before the very first segment.
        combinedResult += ". " + result.bestTranscription.formattedString
    } else {
        // I still want to print intermediate results; you might not want this.
        let intermediate = combinedResult + ". " + result.bestTranscription.formattedString
        print(intermediate)
    }
}
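
For completeness, here is a minimal sketch of how combineResults might be driven from a recognition task. The recognizer and request setup below are my own assumptions for illustration, and the audio-engine code that feeds buffers into the request is omitted:

import Speech

// Sketch only: assumes audio buffers are appended to the request elsewhere,
// e.g. from an AVAudioEngine input-node tap.
let recognizer = SFSpeechRecognizer()!  // nil if the default locale is unsupported
let request = SFSpeechAudioBufferRecognitionRequest()
request.shouldReportPartialResults = true

// Keep a reference to the task so it can be cancelled or finished later.
let task = recognizer.recognitionTask(with: request) { result, error in
    if let result = result {
        // Every callback carries the current bestTranscription;
        // combineResults folds it in once metadata marks the segment as final.
        combineResults(result: result)
    }
}

Once the task finishes, combinedResult should hold the full concatenated transcription.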