About the first problem, my guess is that the LLM uses the DOM structure and visual hints to infer which element matches your instruction. When elements are nested or visually adjacent (for example, an icon or a span rendered inside a button), the LLM can pick the wrong node, such as the inner span instead of the clickable button. This is especially likely when accessibility labels (such as aria-label) or semantic tags are missing.
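To make that guess more concrete, here is a toy model of the ambiguity. It is only an illustration: `Node`, `match_by_text` and `nearest_interactive` are hypothetical helpers I made up, and `accessible_name` is a very crude stand-in for the real W3C accessible-name computation. It is not how any particular automation tool or LLM actually builds its prompt. It shows two things: the text "Save" belongs to several nested nodes at once, so a text-based match is ambiguous, and an icon-only button has no name at all until you give it an aria-label.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Optional


# Hypothetical, heavily simplified DOM node (real DOM nodes carry far more).
@dataclass(eq=False)
class Node:
    tag: str
    text: str = ""
    attrs: dict[str, str] = field(default_factory=dict)
    children: list[Node] = field(default_factory=list)
    parent: Optional[Node] = field(default=None, repr=False)

    def add(self, child: Node) -> Node:
        """Attach a child and return self so calls can be chained."""
        child.parent = self
        self.children.append(child)
        return self


INTERACTIVE = {"button", "a", "input", "select", "textarea"}


def accessible_name(node: Node) -> str:
    """Crude name: aria-label wins, otherwise own text plus descendants' names."""
    if "aria-label" in node.attrs:
        return node.attrs["aria-label"]
    parts = [node.text] + [accessible_name(c) for c in node.children]
    return " ".join(p for p in parts if p).strip()


def walk(node: Node) -> Iterator[Node]:
    """Depth-first, document-order traversal."""
    yield node
    for child in node.children:
        yield from walk(child)


def match_by_text(root: Node, query: str) -> list[Node]:
    """Naive matcher: every node whose name contains the query is a candidate."""
    q = query.lower()
    return [n for n in walk(root) if q in accessible_name(n).lower()]


def nearest_interactive(node: Node) -> Optional[Node]:
    """Climb from a matched node to the closest clickable node (itself included)."""
    cur: Optional[Node] = node
    while cur is not None:
        if cur.tag in INTERACTIVE or cur.attrs.get("role") == "button":
            return cur
        cur = cur.parent
    return None


# <div><button><svg/><span>Save</span></button><button><svg/></button></div>
root = Node("div")
save_btn = Node("button").add(Node("svg")).add(Node("span", text="Save"))
icon_btn = Node("button").add(Node("svg"))
root.add(save_btn).add(icon_btn)

hits = match_by_text(root, "save")
print([n.tag for n in hits])          # ['div', 'button', 'span']: three candidates
print(repr(accessible_name(icon_btn)))  # '': the icon-only button has no name

# Adding an accessibility label gives the icon button something to match on.
icon_btn.attrs["aria-label"] = "Delete"
print(accessible_name(icon_btn))      # Delete

# If the matcher lands on the inner <span>, climbing up recovers the button.
print(nearest_interactive(hits[-1]) is save_btn)  # True
```

In practice this suggests two things to try: give icon-only controls an aria-label and use real button/link elements instead of clickable divs and spans, and, if you post-process whatever node gets picked, map it to its nearest interactive ancestor before clicking.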