Your regex needs to treat a base character plus its combining marks as part of the same "word." \w
doesn't match combining characters, so a \w-based match stops at the first combining mark.
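(let ((m (format nil "[~C-~C]" (code-char #x300) (code-char #x36F)))) ; U+0300-U+036F
  (cl-ppcre:scan (format nil "([a-zA-Z]~A*|~A+)+" m m) str :start 10))
Standard Lisp string syntax has no \u escape (the reader turns "\u0300" into a plain "u0300"), so the range endpoints are spliced in with code-char instead.
Adjust ranges as needed.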
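To see the pattern at work, here is a minimal sketch on a decomposed string. It assumes CL-PPCRE is already loaded (for example through Quicklisp); the names *word-regex* and *sample* are made up for illustration, and the group is written as non-capturing since only the whole match is used.

```lisp
;; Assumes CL-PPCRE is loaded, e.g. (ql:quickload "cl-ppcre").
;; *word-regex* and *sample* are hypothetical names for this sketch.
(defparameter *word-regex*
  (let ((marks (format nil "[~C-~C]" (code-char #x300) (code-char #x36F))))
    ;; A word is a run of ASCII letters, each optionally followed by
    ;; combining marks from U+0300-U+036F, or a run of bare marks.
    (cl-ppcre:create-scanner
     (format nil "(?:[a-zA-Z]~A*|~A+)+" marks marks))))

;; "cafe" + U+0301 COMBINING ACUTE ACCENT, followed by " au lait".
(defparameter *sample*
  (format nil "cafe~C au lait" (code-char #x301)))

;; Each match keeps the combining mark attached to its base letter,
;; so the first word is 5 characters long rather than 4.
(mapcar #'length
        (cl-ppcre:all-matches-as-strings *word-regex* *sample*))
;; => (5 2 4)
```

The same scanner can be passed to cl-ppcre:scan with :start and :end if you only need the bounds of the first word from a given position.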