If anybody is still struggling with this: I ended up writing my own Swift binary that uses both ScreenCaptureKit (for macOS 13.0-14.1) and Core Audio hardware taps (for macOS 14.2+). Here are the docs:
ScreenCaptureKit (you can ignore the video frames and set the capture size to 2x2 px with a low frame rate to save resources): https://developer.apple.com/documentation/screencapturekit/
Core Audio (audio only and better quality, but, unlike virtual devices, not supported by many libraries): https://developer.apple.com/documentation/coreaudio/audiohardwarecreateprocesstap(_:_:)
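To make the version split concrete, here is a rough sketch of how the Electron side could work out which of the two APIs applies on a given machine. The function name and the backend labels are made up for illustration; only the version boundaries (13.0 and 14.2) come from the notes above.

```javascript
// Hypothetical helper: maps a macOS version string such as "14.2.1"
// to the capture API that applies, using the boundaries listed above.
// The returned labels are illustrative names, not real identifiers.
function pickCaptureBackend(macosVersion) {
  const [major = 0, minor = 0] = macosVersion.split(".").map(Number);

  // Core Audio hardware taps: macOS 14.2 and later.
  if (major > 14 || (major === 14 && minor >= 2)) return "core-audio-tap";

  // ScreenCaptureKit: macOS 13.0 up to 14.1.
  if (major >= 13) return "screencapturekit";

  // Older systems (or an unparsable version): neither API applies.
  return null;
}
```

In Electron's main process you could feed it `process.getSystemVersion()`, which returns the macOS version as a string. A `null` result means you are in the "older macOS" situation described next.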
AFAIK, to support older macOS versions, you either need to write a kernel extension in C++/Objective-C (which needs certification) or use some kind of virtual device (BlackHole, Loopback, SoundPusher). If you know a better way, please let me know.
Writing this Swift binary wasn't easy, though, especially in Swift 6 with its new strict concurrency checking. LLMs won't help much either: Swift is a niche language, so they don't know it very well, especially the newest features like actors and Sendable. Here are some examples for SCK and Node.js:
You can then run this binary from your Electron code with Node's child_process module or a library like execa: https://github.com/sindresorhus/execa
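A minimal sketch of the child_process route, under a few assumptions that are mine and not part of the original setup: the binary writes raw audio bytes to stdout, logs to stderr, and stops cleanly on SIGTERM. The `startCapture` name is made up.

```javascript
const { spawn } = require("node:child_process");

// Hypothetical wrapper around the capture binary.
// Assumption: audio bytes arrive on stdout, diagnostics on stderr.
function startCapture(binaryPath, args, onChunk) {
  const child = spawn(binaryPath, args, { stdio: ["ignore", "pipe", "pipe"] });

  // Each chunk is a Buffer; forward it to the caller as it arrives.
  child.stdout.on("data", onChunk);
  child.stderr.on("data", (data) => console.error(`[capture] ${data}`));

  // Resolves once the process has exited and its streams are closed.
  const done = new Promise((resolve, reject) => {
    child.on("error", reject); // e.g. binary not found or not executable
    child.on("close", (code, signal) => resolve({ code, signal }));
  });

  return { stop: () => child.kill("SIGTERM"), done };
}
```

You would call it with the path to your own binary and whatever arguments it expects, push the chunks into a file or a stream, and call `stop()` when recording ends. execa gives you roughly the same thing with a promise-based API.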