A good reference for this is the Python Universal Feed Parser (feedparser). Its sanitizer is thorough: it works from a list of HTML elements and attributes that are allowed through, and it drops everything else, including any element or attribute that would allow script to run.
If you want a test suite for this, that Python project includes an extensive one covering both the allowed and the disallowed attributes.
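
To make the allowlist idea concrete, here is a minimal sketch using only the standard library's `html.parser`. It is not feedparser's code: the names `ALLOWED_TAGS`, `ALLOWED_ATTRS`, `SAFE_SCHEMES`, `AllowlistSanitizer` and `sanitize` are made up for illustration, and the lists are far shorter than anything you would want in production.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical, deliberately tiny allowlists (illustration only).
ALLOWED_TAGS = {"a", "b", "i", "em", "strong", "p", "br", "ul", "ol", "li"}
ALLOWED_ATTRS = {"href", "title"}
# Only absolute URLs with these prefixes survive; relative URLs are dropped too.
SAFE_SCHEMES = ("http://", "https://", "mailto:")
# Elements whose text content must be discarded, not just the tags.
DROP_WITH_CONTENT = {"script", "style"}
VOID_TAGS = {"br"}


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tag: drop the tag, keep its text
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_ATTRS or value is None:
                continue  # e.g. onclick, style, onerror
            if name == "href" and not value.strip().lower().startswith(SAFE_SCHEMES):
                continue  # e.g. javascript: URLs
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            if self.skip_depth:
                self.skip_depth -= 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS or tag in VOID_TAGS:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Text was entity-decoded by the parser, so re-escape it.
            self.out.append(escape(data, quote=False))


def sanitize(html_text):
    parser = AllowlistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


dirty = '<p onclick="evil()">Hi <script>alert(1)</script><a href="javascript:alert(1)" title="t">x</a></p>'
print(sanitize(dirty))  # <p>Hi <a title="t">x</a></p>
```

The point of the design is that nothing passes unless it is explicitly listed, so a new or obscure scripting vector is rejected by default. For real input, prefer a maintained sanitizer with a real test suite over a hand-rolled one like this.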