Turns out it works when I use RGBA image instead of greyscale image for the mask. Even though MoviePy docs explicitly mention that mask should always be greyscale image, but it seems to work with RGBA images only.