It seems @JasonTrue's answer no longer works because of the "//body//text()" XPath.
Accessing the document's child nodes and then filtering out the ones whose text is empty or whitespace may be the way to go.
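    // Requires the HtmlAgilityPack NuGet package; HttpUtility lives in System.Web.
    using System;
    using System.Linq;
    using System.Web;

    public static string StripInnerText(string html)
    {
        // Return an empty string (rather than null) for null or empty input.
        if (string.IsNullOrEmpty(html))
            return string.Empty;

        HtmlAgilityPack.HtmlDocument doc = new();
        doc.LoadHtml(html);

        // Take the text of each top-level node, dropping empty or whitespace-only entries.
        var texts = doc.DocumentNode.ChildNodes
            .Select(node => node.InnerText)
            .Where(text => !string.IsNullOrWhiteSpace(text))
            .Select(text => text.Trim());

        var output = string.Join(Environment.NewLine, texts);

        // Decode HTML entities such as &amp; into plain characters.
        return HttpUtility.HtmlDecode(output);
    }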
Test it with the following fiddle: https://dotnetfiddle.net/NQC2Y5
Sorry for posting a new answer: I don't have the 50 reputation needed to comment yet, and this question and its answers were so useful to me that I felt I had a duty to contribute.