A solution that does not require looking at the transcription strings and guessing whether the words actually represent a new piece of speech:
I have only tested this on macOS and not on iOS, so take it with a grain of salt, but I have found that the bestTranscription will generally be emptied/reset after a result whose speechRecognitionMetadata field is not nil.
This means that gathering the complete transcription is simply a matter of concatenating the transcriptions from every result whose speechRecognitionMetadata is present:
import Speech

var combinedResult = ""

func combineResults(result: SFSpeechRecognitionResult) {
    if result.speechRecognitionMetadata != nil {
        // Metadata is present, so this segment is final; fold it into the total.
        // Note: this also prepends ". " before the very first segment.
        combinedResult += ". " + result.bestTranscription.formattedString
    } else {
        // I still want to print intermediate results; you might not want this.
        let intermediate = combinedResult + ". " + result.bestTranscription.formattedString
        print(intermediate)
    }
}
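
For completeness, here is a minimal sketch of how combineResults might be driven from a recognition task. The recognizer and request setup below are my own assumptions for illustration, and the audio-engine code that feeds buffers into the request is omitted:

import Speech

// Sketch only: assumes audio buffers are appended to the request elsewhere,
// e.g. from an AVAudioEngine input-node tap.
let recognizer = SFSpeechRecognizer()!  // nil if the default locale is unsupported
let request = SFSpeechAudioBufferRecognitionRequest()
request.shouldReportPartialResults = true

// Keep a reference to the task so it can be cancelled or finished later.
let task = recognizer.recognitionTask(with: request) { result, error in
    if let result = result {
        // Every callback carries the current bestTranscription;
        // combineResults folds it in once metadata marks the segment as final.
        combineResults(result: result)
    }
}

Once the task finishes, combinedResult should hold the full concatenated transcription.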