The key is to keep the video frames from the past couple of seconds, e.g. in a ring buffer. Once you have detected a distinct playing card, apply block motion detection backwards through the buffered frames. You will get a lot of redundant motion vectors (a single one is sufficient to tell the origin), so filter them, and you can recover the direction the card originally came from.
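As a rough illustration of the idea, here is a minimal sketch in plain Python. It assumes grayscale frames stored as nested lists, follows a single block with an exhaustive sum-of-absolute-differences search, and uses a median as the filter; those are my choices for the example, not the only way to do it. The names `sad`, `match_block`, `trace_back` and `make_frame` are hypothetical helpers, and the synthetic "card" is just a textured square moving right by 2 pixels per frame.

```python
from collections import deque
from statistics import median_low


def sad(a, b, ay, ax, by, bx, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(a[ay + r][ax + c] - b[by + r][bx + c])
        for r in range(size)
        for c in range(size)
    )


def match_block(cur, prev, y, x, size, radius):
    """Find where the block at (y, x) in `cur` was located in `prev`.

    Returns the backward displacement (dy, dx) with the lowest SAD.
    """
    h, w = len(prev), len(prev[0])
    best, best_cost = (0, 0), None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            py, px = y + dy, x + dx
            # Skip candidates that would fall outside the frame.
            if py < 0 or px < 0 or py + size > h or px + size > w:
                continue
            cost = sad(cur, prev, y, x, py, px, size)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best


def trace_back(frames, y, x, size=4, radius=3):
    """Follow one block from the newest frame to the oldest.

    Collects one backward vector per frame pair and returns the
    component-wise median as the filtered result.
    """
    frames = list(frames)
    vectors = []
    for cur, prev in zip(reversed(frames[1:]), reversed(frames[:-1])):
        dy, dx = match_block(cur, prev, y, x, size, radius)
        vectors.append((dy, dx))
        y, x = y + dy, x + dx  # continue from the matched position
    return (median_low(v[0] for v in vectors),
            median_low(v[1] for v in vectors))


def make_frame(x0, y0=4, size=4, height=12, width=24):
    """Synthetic test frame: a textured square on a black background."""
    frame = [[0] * width for _ in range(height)]
    for r in range(size):
        for c in range(size):
            frame[y0 + r][x0 + c] = 50 + 10 * (r * size + c)
    return frame


# Ring buffer: only the 5 most recent frames are kept.
buffer = deque(maxlen=5)
for i in range(7):
    buffer.append(make_frame(2 + 2 * i))

# The "card" was detected at (y=4, x=14) in the newest frame.
backward = trace_back(buffer, 4, 14)
print(backward)
```

The result is a backward vector, so a negative `dx` means the card came from the left. With a real camera you would track several blocks on the card and filter across all of them, since individual matches will be noisy.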