Hi,
I found out that the issue is with the JSON input format. The actual input looks like this:
{
  "values": [
    {
      "recordId": "1",
      "data": {
        "file_data": {
          "data": "<base64-encoded-pdf>"
        }
      }
    }
  ]
}
However, I had assumed it followed the format shown in the documentation for #Microsoft.Skills.Util.DocumentExtractionSkill:
https://learn.microsoft.com/en-us/azure/search/cognitive-search-skill-document-extraction
Also, I found a similar question here:
How do I read the original pdf file in set indexer datasource in a custom WebApiSkill after enabling "Allow Skillset to read file data"
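
In case it helps anyone else, here is a minimal sketch of how a custom WebApiSkill handler could read that input. It assumes the request body has already been parsed into a Python dict with exactly the shape shown above. The function name `handle_skill_request` and the output field `pdf_size` are made up for illustration, and the response follows the usual custom skill layout (`values`, `recordId`, `data`, `errors`, `warnings`) as I understand it, so please check it against your own skillset definition.

```python
import base64
import binascii


def handle_skill_request(body: dict) -> dict:
    """Decode file_data for each record and report the decoded size in bytes."""
    results = []
    for record in body.get("values", []):
        out = {
            "recordId": record.get("recordId"),
            "data": {},
            "errors": [],
            "warnings": [],
        }
        try:
            # The base64 string sits under data -> file_data -> data
            encoded = record["data"]["file_data"]["data"]
            pdf_bytes = base64.b64decode(encoded, validate=True)
            # Hypothetical output field; replace with your real processing
            out["data"]["pdf_size"] = len(pdf_bytes)
        except (KeyError, TypeError, binascii.Error) as exc:
            # Report the problem per record instead of failing the whole batch
            out["errors"].append({"message": f"Could not read file_data: {exc}"})
        results.append(out)
    return {"values": results}
```

Records where `file_data` is missing or is not valid base64 come back with an entry in `errors` rather than raising, so one bad document should not break the rest of the batch.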