The setup confuses me a bit. So you have two cameras pointing downward at different locations, the table covers the FOV of camera A, and when you click on a point, the table moves over to camera B and presents that point in camera B's much narrower field of view, right? If that is the case, all you need to do is map the pixel coordinates from both cameras to world coordinates. Camera B has a narrow FOV, so you could just map the center point of that FOV to world coordinates, or better, mark that point on the ground with some tape and measure its world coordinates with a ruler. You can then get the world coordinates of the clicked point on the table by applying the inverse (image-to-world) transformation to its pixel in camera A, calculate the offset between the two points, and drive the motor to move the table by that offset, from the start point to the end point.
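Here is a rough sketch of that idea in Python, assuming the table surface is a flat plane so that a homography fitted from four measured reference points is good enough for camera A. All the numbers, and the helper names `fit_homography` and `pixel_to_world`, are made up for illustration; you would substitute your own measured pixel/world pairs and the taped target point under camera B.

```python
import numpy as np

def fit_homography(pixels, world):
    """Estimate a 3x3 homography mapping pixel (u, v) to world (x, y) on the table plane (DLT)."""
    rows = []
    for (u, v), (x, y) in zip(pixels, world):
        rows.append([-u, -v, -1, 0, 0, 0, x * u, x * v, x])
        rows.append([0, 0, 0, -u, -v, -1, y * u, y * v, y])
    # The homography is the null vector of the stacked constraints.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]  # assumes h[2, 2] is not (near) zero

def pixel_to_world(h, pixel):
    """Apply the homography to one pixel and return its world (x, y)."""
    p = h @ np.array([pixel[0], pixel[1], 1.0])
    return p[:2] / p[2]

# Hypothetical calibration for camera A: four pixel corners and their
# ruler-measured world positions in millimetres.
pixels_a = [(0, 0), (640, 0), (640, 480), (0, 480)]
world_a = [(0, 0), (320, 0), (320, 240), (0, 240)]
H_A = fit_homography(pixels_a, world_a)

# Hypothetical world position of the taped mark under camera B's FOV center.
target_world = np.array([500.0, 50.0])

clicked_pixel = (200, 100)                 # where the user clicked in camera A
start_world = pixel_to_world(H_A, clicked_pixel)
offset = target_world - start_world        # how far the table has to travel
print("move table by (mm):", offset)
```

The sign of `offset` depends on how your motor axes are oriented relative to the world axes, so you may need to flip one or both components. If the lens distortion of camera A is noticeable, you would undistort the clicked pixel before applying the homography.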