Reports

This is a follow-up to the answer by a deleted user (user12635841).

I run a website in German and have analyzed my spam for a few days. Here is a figure of the analysis I did in R:

As you can see, within my sample:

only spam fills in the honeypot
only spam is not in German
only a few legitimate messages have a URL in the message body
all spam is submitted in under 27 seconds, while all legitimate messages take more than 30 seconds

To identify the language of a message, I use a list of the 500 or so most common German word forms, which I have cleaned of strings that also appear in the most common Latin-alphabet spam languages (English, Dutch, French, Spanish, etc.). I calculate the percentage of German-appearing words in the message, because some languages have strings that look like German words (e.g. English "die" and German "die"). Most spam has 0% German-appearing words, while German messages consist of about 30% or more German-appearing "words".
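The percentage calculation could be sketched roughly like this in Python. This is not my production code: `GERMAN_WORDS` is a tiny hypothetical stand-in for the real cleaned list of about 500 word forms, and the function name `german_share` and the tokenization rule are assumptions made for illustration.

```python
import re

# Hypothetical, heavily truncated stand-in for the cleaned list of
# roughly 500 common German word forms described above.
GERMAN_WORDS = {
    "und", "nicht", "ich", "das", "ist", "sie",
    "mit", "für", "auch", "eine", "wir", "haben",
}

def german_share(message: str) -> float:
    """Return the fraction (0.0 to 1.0) of tokens found in GERMAN_WORDS."""
    # Runs of Unicode letters count as tokens; digits and punctuation are skipped.
    tokens = re.findall(r"[^\W\d_]+", message.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in GERMAN_WORDS)
    return hits / len(tokens)
```

A message with no letters at all yields 0.0, so it is treated like non-German text. Whether that is the right choice depends on your own traffic.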

I use a combination of all this information to filter out the majority of spam. I have deliberately set the filter to let some spam through (e.g. I don't filter on URLs in the message body) so that I don't filter out any legitimate messages.
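One way such a combination might look, as a minimal Python sketch. The `Submission` fields, the function name `is_spam`, and both thresholds are assumptions for illustration; the 27-second cutoff comes from my sample and the 10% language cutoff is simply a value between the 0% and 30% observed above. Your own data will suggest different numbers.

```python
from dataclasses import dataclass

@dataclass
class Submission:
    honeypot: str             # value of the hidden form field, empty for humans
    seconds_to_submit: float  # time between page load and form submission
    german_share: float       # fraction of German-appearing words, 0.0 to 1.0

# Assumed thresholds, derived from one site's sample; tune them to your data.
MIN_SECONDS = 27.0
MIN_GERMAN_SHARE = 0.10

def is_spam(s: Submission) -> bool:
    """Flag a submission if any of the three signals fires."""
    if s.honeypot.strip():
        return True   # hidden field was filled in
    if s.seconds_to_submit < MIN_SECONDS:
        return True   # submitted faster than any legitimate message in the sample
    if s.german_share < MIN_GERMAN_SHARE:
        return True   # message does not look German
    # URLs in the body are intentionally not checked, because a few
    # legitimate messages contain them.
    return False
```

Each rule here only fires on a signal that no legitimate message in the sample showed, which is what keeps the false-positive rate at zero for that sample.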

What I recommend is:

Analyze your spam and your legitimate messages for long enough to understand what differentiates the two. Then install a filter that fits your use case. Do not blindly employ one or all of the procedures given in the other answers without understanding your spam situation, or you may miss important messages. Customers will buy elsewhere if you don't reply to them in a timely manner, so avoid false positives!
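To check a candidate rule against hand-labelled messages before deploying it, something like the following Python sketch could be used. The function name `evaluate` and the sample values are made up purely for illustration; they are not my real data.

```python
from typing import Callable, Dict, Iterable, Tuple

def evaluate(
    samples: Iterable[Tuple[float, bool]],
    flags_as_spam: Callable[[float], bool],
) -> Dict[str, int]:
    """Count how a rule treats labelled (seconds_to_submit, really_spam) pairs."""
    counts = {"spam_caught": 0, "spam_missed": 0,
              "legit_blocked": 0, "legit_passed": 0}
    for seconds, really_spam in samples:
        flagged = flags_as_spam(seconds)
        if really_spam:
            counts["spam_caught" if flagged else "spam_missed"] += 1
        else:
            counts["legit_blocked" if flagged else "legit_passed"] += 1
    return counts

# Made-up sample for illustration only: (seconds to submit, labelled as spam?)
sample = [(3.0, True), (12.0, True), (26.0, True), (31.0, False), (95.0, False)]

result = evaluate(sample, lambda seconds: seconds < 27.0)
print(result)
```

The number to watch is `legit_blocked`: if it is anything other than zero, the rule is costing you real messages and should be loosened.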
