You don't actually need PCRE to identify office documents. For example, a PDF can be identified with this simple rule:
rule pdf {
    strings:
        $pdf = "%PDF-"
    condition:
        $pdf at 0
}
Other office formats, such as OOXML (.docx, .xlsx, .pptx) and OpenDocument files, are actually ZIP archives, so you can check for the ZIP magic (PK\x03\x04) at offset 0 and then search for the paths that identify the document type as strings in your YARA rule.
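The ZIP-based approach could be sketched roughly as follows for a Word document. This is a minimal sketch, not a vetted detection rule: the rule name docx_sketch is made up for illustration, and the two paths assume a standard OOXML package layout, where archive member names are stored uncompressed in the ZIP headers and so show up as plain strings.

```yara
rule docx_sketch {
    strings:
        // ZIP local file header magic: "PK\x03\x04"
        $zip_magic = { 50 4B 03 04 }
        // Assumption: present in every OOXML package
        $content_types = "[Content_Types].xml"
        // Assumption: main document part of a .docx
        $word_doc = "word/document.xml"
    condition:
        $zip_magic at 0 and $content_types and $word_doc
}
```

For spreadsheets or presentations you would swap the last string for the path that identifies that type (for example xl/workbook.xml or ppt/presentation.xml). Since the strings are matched anywhere in the file, a ZIP that merely contains an office document stored without compression may also match.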