In theory, one could use computer vision to detect what has changed from one frame to the next, and then generate the corresponding canonical transformations (translation, scaling, color changes) on the SVG elements. The result would be an animated SVG file that is much smaller than an equivalent video or GIF. One challenge is making the object tracking work flawlessly; another is the computing power it requires, and solving it might mean training neural networks. Essentially, you have to track every change correctly: things that moved, grew, changed color, or even temporarily disappeared.
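As a rough illustration of the second step only, here is a minimal sketch that assumes the tracking problem is already solved. A hypothetical tracker reports, for one object and each frame, an axis-aligned bounding box and fill color, or `None` when the object is not visible. The hypothetical `track_to_animations` function turns that track into SVG SMIL animation elements (`animateTransform` for translation and scaling, `animate` for fill and visibility) and emits nothing for attributes that never change. It does not handle rotation, shape deformation, or elements that already carry a `transform` of their own.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TrackedObject:
    """Hypothetical per-frame state of one object, as reported by a tracker."""
    x: float   # top-left corner of the bounding box, SVG user units
    y: float
    w: float   # bounding-box width and height (assumed > 0)
    h: float
    fill: str  # fill color, e.g. "#ff0000"


def _num(v: float) -> str:
    # Compact formatting ("10" rather than "10.0") keeps the output small.
    return f"{v:g}"


def track_to_animations(frames: List[Optional[TrackedObject]],
                        frame_dur: float) -> List[str]:
    """Turn one object's track (None = not visible in that frame) into SMIL
    animation elements to nest inside the object's SVG element. The element
    is assumed to be drawn at the first visible frame's geometry and to have
    no transform attribute of its own."""
    if len(frames) < 2:
        return []
    known = [f for f in frames if f is not None]
    if not known:
        return []
    # Hold the last known state through gaps so the geometry does not jump
    # while the object is hidden.
    filled: List[TrackedObject] = []
    last = known[0]
    for f in frames:
        last = f if f is not None else last
        filled.append(last)

    base = known[0]
    n = len(frames)
    key_times = ";".join(_num(i / (n - 1)) for i in range(n))
    timing = (f'dur="{_num(frame_dur * (n - 1))}s" '
              f'keyTimes="{key_times}" fill="freeze"')

    # For each frame, translate(tx, ty) scale(sx, sy) maps the base box onto
    # that frame's box; the translation compensates for scale() acting about
    # the origin. Interpolating both linearly moves the corner linearly.
    scales = [(f.w / base.w, f.h / base.h) for f in filled]
    shifts = [(f.x - sx * base.x, f.y - sy * base.y)
              for f, (sx, sy) in zip(filled, scales)]

    anims: List[str] = []

    def emit(tag: str, attr: str, values: List[str], extra: str = "") -> None:
        # Skip attributes that never change: there is nothing to encode.
        if len(set(values)) > 1:
            anims.append(f'<{tag} attributeName="{attr}" {extra}'
                         f'values="{";".join(values)}" {timing}/>')

    # additive="sum" composes the two transforms as translate(...) scale(...).
    emit("animateTransform", "transform",
         [f"{_num(tx)} {_num(ty)}" for tx, ty in shifts],
         'type="translate" additive="sum" ')
    emit("animateTransform", "transform",
         [f"{_num(sx)} {_num(sy)}" for sx, sy in scales],
         'type="scale" additive="sum" ')
    emit("animate", "fill", [f.fill for f in filled])
    emit("animate", "visibility",
         ["hidden" if f is None else "visible" for f in frames],
         'calcMode="discrete" ')
    return anims
```

For example, a red square that moves right while doubling in width, turns green, vanishes for one frame, and then reappears yields four elements, each holding a single `values` list for the whole clip. That per-attribute encoding, rather than per-frame pixel data, is where the size advantage would come from.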