Thanks @Hamed Jimoh and @Salketer for your comments. After studying the ricky0123 VAD code base, I switched to NonRealTimeVAD,
following the example (https://github.com/ricky0123/vad/blob/master/test-site/src/non-real-time-test.ts#L31). Here is the code used in a Web Worker:
import { NonRealTimeVAD, NonRealTimeVADOptions, defaultNonRealTimeVADOptions, utils } from "@ricky0123/vad-web";
var concatArrays = (arrays: Float32Array[]): Float32Array => {
  const sizes = arrays.reduce((out, next) => {
    out.push(out.at(-1) as number + next.length);
    return out;
  }, [0]);
  const outArray = new Float32Array(sizes.at(-1) as number);
  arrays.forEach((arr, index) => {
    const place = sizes[index];
    outArray.set(arr, place);
  });
  return outArray;
};
// const options: Partial<NonRealTimeVADOptions> = {
//   // FrameProcessorOptions defaults
//   positiveSpeechThreshold: 0.5,
//   negativeSpeechThreshold: 0.5 - 0.15,
//   preSpeechPadFrames: 3,
//   redemptionFrames: 24,
//   frameSamples: 512,
//   minSpeechFrames: 9,
//   submitUserSpeechOnPause: false,
// };
var Ricky0123VadWorker = class {
  vad: NonRealTimeVAD | null;
  sampleRate: number = 16000;

  constructor() {
    this.vad = null;
    this.init = this.init.bind(this);
    this.process = this.process.bind(this);
  }

  public async init(sampleRate: number) {
    console.log("VAD initialization request.");
    try {
      this.sampleRate = sampleRate;
      const baseAssetPath = '/vad-models/';
      defaultNonRealTimeVADOptions.modelURL = baseAssetPath + 'silero_vad_v5.onnx';
      // defaultNonRealTimeVADOptions.modelURL = baseAssetPath + 'silero_vad_legacy.onnx';
      this.vad = await NonRealTimeVAD.new(defaultNonRealTimeVADOptions); // default options
      console.log("VAD instantiated.");
      self.postMessage({ type: "initComplete" });
    } catch (error: any) {
      self.postMessage({ type: 'error', error: error.message });
    }
  }

  public async process(chunk: Float32Array) {
    // Received an audio chunk from the AudioWorkletNode.
    let segmentNumber = 0;
    let buffer: Float32Array[] = [];
    for await (const { audio, start, end } of this.vad!.run(chunk, this.sampleRate)) {
      segmentNumber++;
      // do stuff with
      //   audio (float32array of audio)
      //   start (milliseconds into audio where speech starts)
      //   end (milliseconds into audio where speech ends)
      buffer.push(audio);
    }
    if (segmentNumber > 0) {
      console.log("Speech segments detected");
      const audio = concatArrays(buffer);
      self.postMessage({ type: 'speech', data: audio });
    } else {
      console.log("No speech segments detected");
    }
  }

  // Finalize the VAD process.
  public finish() {
    this.vad = null;
  }
};
var vadWorkerInstance = new Ricky0123VadWorker();
self.onmessage = (event) => {
  const { type, data } = event.data;
  switch (type) {
    case "init":
      vadWorkerInstance.init(data);
      break;
    case "chunk":
      vadWorkerInstance.process(data);
      break;
    case "finish":
      vadWorkerInstance.finish();
      break;
  }
};
The worker creation in the main thread:
const vadWorker = new Worker(
  new URL('../lib/workers/ricky0123VadWorker.tsx', import.meta.url),
  { type: 'module' }
);
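For completeness, here is a minimal sketch of how the main thread could listen to this worker. It routes the three message types the worker posts (initComplete, speech, error) and also logs errors that escape the worker's try/catch, such as a failed module import. The names attachVadHandlers and WorkerLike are hypothetical and not part of @ricky0123/vad-web; the event parameters are typed loosely as an assumption so that a real Worker can be passed in.

```typescript
// Hypothetical minimal shape of a Worker (loosely typed on purpose),
// so the helper can also be exercised with a plain object outside a browser.
interface WorkerLike {
  onmessage: ((event: any) => void) | null;
  onerror: ((event: any) => void) | null;
}

// The message shapes posted by the worker code above.
type VadWorkerMessage =
  | { type: "initComplete" }
  | { type: "speech"; data: Float32Array }
  | { type: "error"; error: string };

// Hypothetical helper: wires up message routing and uncaught-error logging.
function attachVadHandlers(
  worker: WorkerLike,
  onSpeech: (audio: Float32Array) => void,
  log: (line: string) => void = console.log,
): void {
  worker.onmessage = (event) => {
    const msg = event.data as VadWorkerMessage;
    switch (msg.type) {
      case "initComplete":
        log("VAD ready");
        break;
      case "speech":
        onSpeech(msg.data);
        break;
      case "error":
        // Errors caught inside init() arrive here.
        log(`VAD worker reported: ${msg.error}`);
        break;
    }
  };
  // Errors thrown outside the worker's try/catch arrive here instead.
  worker.onerror = (event) => {
    log(`Uncaught worker error: ${event?.message ?? "unknown"}`);
  };
}
```

With the worker created above, this would be called as attachVadHandlers(vadWorker, (audio) => { /* use audio */ }) before posting the "init" message.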
Upon running the web page, it still hangs on this.vad = await NonRealTimeVAD.new(), as the console.log afterwards never prints its trace message. I tried both silero_vad_legacy.onnx and silero_vad_v5.onnx. I also copied the following files into the public/vad-models/ folder:
silero_vad_v5.onnx
silero_vad_legacy.onnx
vad.worklet.bundle.min.js
ort-wasm-simd-threaded.wasm
ort-wasm-simd-threaded.mjs
ort-wasm-simd-threaded.jsep.wasm
ort.js
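To rule out a missing or misrouted file, a check along these lines could confirm that each asset is actually reachable at the URL the worker would request. This is only a sketch: findSuspectAssets and FetchLike are hypothetical names, the use of HEAD requests assumes the server answers them, and the text/html check assumes a dev server that returns a fallback page instead of a 404 for unknown paths.

```typescript
// Hypothetical diagnostic types: just the parts of fetch() that the check needs.
type HeadResponse = {
  ok: boolean;
  status: number;
  headers: { get(name: string): string | null };
};
type FetchLike = (url: string, init?: { method?: string }) => Promise<HeadResponse>;

// Returns one human-readable line per asset that looks wrong:
// a non-2xx status, an HTML response, or a failed request.
async function findSuspectAssets(
  baseUrl: string,
  fileNames: string[],
  fetchFn: FetchLike,
): Promise<string[]> {
  const suspects: string[] = [];
  for (const name of fileNames) {
    const url = baseUrl + name;
    try {
      const res = await fetchFn(url, { method: "HEAD" });
      const contentType = res.headers.get("content-type") ?? "";
      if (!res.ok) {
        suspects.push(`${url}: HTTP ${res.status}`);
      } else if (contentType.startsWith("text/html")) {
        // Assumption: a model or wasm file served as HTML is a fallback page.
        suspects.push(`${url}: served as HTML (possible fallback page)`);
      }
    } catch (err) {
      suspects.push(`${url}: request failed (${String(err)})`);
    }
  }
  return suspects;
}

// Possible usage in the page or the worker, passing the global fetch:
// findSuspectAssets("/vad-models/", ["silero_vad_v5.onnx", "ort-wasm-simd-threaded.wasm"], fetch)
//   .then((lines) => console.log(lines));
```

An empty result only shows that the files are served; it does not prove that the library requests them from that path.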
I suspect something is wrong with the underlying model loading. Without any error messages, it is hard to tell exactly where the problem is. Could anyone point out what I am missing that causes the hang?
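One way to at least turn the silent hang into a visible error is to race the pending promise against a timer. The sketch below uses a hypothetical helper named withTimeout; it does not cancel the underlying work or explain the cause, it only rejects if the promise has not settled in time, so the existing catch block in init() would post an error message.

```typescript
// Hypothetical helper: rejects if `promise` has not settled within `ms` milliseconds.
async function withTimeout<T>(promise: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} did not settle within ${ms} ms`)),
      ms,
    );
  });
  try {
    // Whichever settles first wins; the loser is ignored.
    return await Promise.race([promise, timeout]);
  } finally {
    // Always clear the timer so it does not fire after the race is decided.
    clearTimeout(timer);
  }
}

// Possible usage inside init(), with an arbitrary 10 second limit:
// this.vad = await withTimeout(
//   NonRealTimeVAD.new(defaultNonRealTimeVADOptions),
//   10000,
//   "NonRealTimeVAD.new",
// );
```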
Thanks