You're on the right track. To build a video streaming app, break the problem into stages: capture → encode → package → deliver → playback. Start by defining your use case: is it low-latency, interactive video like conferencing, or a one-to-many broadcast stream?
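The use-case question mostly decides which delivery technology you reach for. Here is a rough Python sketch of that decision, assuming the WebRTC vs. HLS/DASH split suggested in this answer. The `UseCase` fields, the `choose_delivery` helper, and the 1-second cutoff are hypothetical illustrations, not a standard rule:

```python
from dataclasses import dataclass
from enum import Enum


class Delivery(Enum):
    WEBRTC = "WebRTC"        # real-time media, typically sub-second latency
    HLS_DASH = "HLS/DASH"    # HTTP segments, CDN-friendly, seconds of latency


@dataclass
class UseCase:
    interactive: bool        # do viewers talk back in real time (calls, auctions)?
    max_latency_s: float     # end-to-end delay you can tolerate, in seconds


def choose_delivery(uc: UseCase) -> Delivery:
    """Heuristic only (assumed 1 s threshold): interactivity or a sub-second
    latency budget points to WebRTC; otherwise segment-based HLS/DASH."""
    if uc.interactive or uc.max_latency_s < 1.0:
        return Delivery.WEBRTC
    return Delivery.HLS_DASH


if __name__ == "__main__":
    print(choose_delivery(UseCase(interactive=True, max_latency_s=0.5)).value)
    print(choose_delivery(UseCase(interactive=False, max_latency_s=10.0)).value)
```

Real projects often mix both, for example WebRTC for a small group of hosts and HLS for the large audience watching them.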
Instead of reinventing the wheel, build on existing protocols: WebRTC for real-time communication, and HLS or DASH for traditional streaming, where video is served as small HTTP segments that scale well through CDNs at the cost of a few seconds of latency. FFmpeg is a must-know tool for encoding and muxing. A basic grasp of networking (TCP vs. UDP) helps, but you won't need to design protocols from scratch.
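As a concrete starting point for the encode and package stages, the sketch below builds (but does not run) an FFmpeg command that transcodes a file to H.264/AAC and packages it as a video-on-demand HLS playlist with `.ts` segments. The file names, the `out` directory, the 6-second segment length, and the `hls_command` helper are assumptions for illustration:

```python
import shlex


def hls_command(src: str, out_dir: str, segment_s: int = 6) -> list[str]:
    """Return an FFmpeg argument list that encodes `src` and muxes it into
    a VOD HLS playlist plus numbered .ts segments inside `out_dir`."""
    return [
        "ffmpeg",
        "-i", src,
        "-c:v", "libx264",                    # encode video as H.264
        "-c:a", "aac",                        # encode audio as AAC
        "-f", "hls",                          # use FFmpeg's HLS muxer
        "-hls_time", str(segment_s),          # target segment duration (seconds)
        "-hls_playlist_type", "vod",          # complete playlist ending in #EXT-X-ENDLIST
        "-hls_segment_filename", f"{out_dir}/seg_%03d.ts",
        f"{out_dir}/index.m3u8",              # the playlist a player fetches first
    ]


if __name__ == "__main__":
    cmd = hls_command("input.mp4", "out")
    print(shlex.join(cmd))
    # To execute it, you need ffmpeg on PATH and an existing out/ directory:
    # import subprocess; subprocess.run(cmd, check=True)
```

Building the arguments as a list (rather than one shell string) avoids quoting problems with odd file names. Note that `-hls_time` is a target: segments can only be cut at keyframes, so actual durations may vary.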
If you want a ready-made route, platforms like VPlayed or Muvi can handle encoding, delivery, and multi-device support, and are marketed as leaving you in control of infrastructure and monetization. That can save you from building everything from the ground up while still leaving room to customize.
Once you understand the layers (media capture, codecs, containers, delivery), you'll be well equipped to build a video streaming app. A longer walkthrough is here: https://www.vplayed.com/build-video-streaming-app.php
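To see how the container and delivery layers meet on the playback side, here is a minimal Python sketch of what an HLS player reads first: a media playlist that pairs each `#EXTINF` duration with a segment URI. The `parse_media_playlist` helper and the sample playlist are hypothetical, and the parser deliberately ignores every other tag (encryption keys, byte ranges, discontinuities, multivariant playlists), so treat it as a teaching aid rather than a real player:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    uri: str
    duration_s: float


def parse_media_playlist(text: str) -> List[Segment]:
    """Pair each '#EXTINF:<duration>,[title]' line with the next URI line."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or lines[0] != "#EXTM3U":
        raise ValueError("not an M3U8 playlist")
    segments: List[Segment] = []
    pending: Optional[float] = None
    for ln in lines[1:]:
        if ln.startswith("#EXTINF:"):
            # Keep only the duration before the comma; the title is optional.
            pending = float(ln[len("#EXTINF:"):].split(",", 1)[0])
        elif not ln.startswith("#") and pending is not None:
            segments.append(Segment(ln, pending))
            pending = None
    return segments


SAMPLE = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXT-X-PLAYLIST-TYPE:VOD
#EXTINF:6.0,
seg_000.ts
#EXTINF:4.5,
seg_001.ts
#EXT-X-ENDLIST
"""

if __name__ == "__main__":
    segs = parse_media_playlist(SAMPLE)
    print(len(segs), sum(s.duration_s for s in segs))  # 2 10.5
```

A real player would then fetch each segment over HTTP, demux it, and hand the decoded frames to the display, which is exactly the deliver → playback end of the pipeline described above.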