- Should we store invoices directly in the database or use Kafka as a buffer first? As with all architectural decisions, the answer is "it depends". Is your volume going to be high enough to outrun your database? If so, does buffering actually fix it? Could you change the database itself to fix it? I've seen MySQL databases with minimal indexes handle insert rates of 20K-200K records/second, which is pretty high. Kafka buffers in that range will hit memory limits and network I/O issues very quickly. There are a LOT of variables here, and your best bet is to run an experiment, figure out where your real constraints are, and start optimizing those.
- Is using Redis for temporary storage of invoices before full persistence a good idea? It depends... What are your durability requirements? How do you have Redis configured (are you using Enterprise)? What do you do if the write to the "real" database fails? The user interactions get complex once you have "provisionally written" invoices vs. "actually committed" invoices, so expect much more client complexity.
- What is the best way to connect the persistor and validator services: Kafka, RabbitMQ, or REST? It depends... There are too many considerations to cover here. The use cases for these technologies differ wildly, and there are questions about scalability, producer/consumer patterns, and team capability. You are asking which tech to pick instead of thinking about what problems you are trying to solve and then selecting a tech to match those problems. Start with the simplest solution, then figure out why it doesn't work and solve that problem.
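The experiment mentioned in the first bullet can start very small. Here is a rough sketch of an insert-rate measurement, assuming an in-memory SQLite database as a stand-in for your real one; the table layout, the `measure_insert_rate` name, and the batch size are all made up for illustration. The number it prints says nothing about MySQL, so point it at your actual database and schema before drawing conclusions.

```python
import sqlite3
import time


def measure_insert_rate(total_rows: int = 50_000, batch_size: int = 1_000) -> tuple[int, float]:
    """Insert synthetic invoices in batches; return (rows stored, rows per second)."""
    # Assumption: in-memory SQLite stands in for the real database under test.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE invoices ("
        "id INTEGER PRIMARY KEY, customer_id INTEGER, amount_cents INTEGER)"
    )

    start = time.perf_counter()
    for first in range(0, total_rows, batch_size):
        last = min(first + batch_size, total_rows)
        # Synthetic rows; replace with data shaped like your real invoices.
        rows = [(i, i % 1000, 100 + i % 5000) for i in range(first, last)]
        with conn:  # one transaction per batch, committed on exit
            conn.executemany(
                "INSERT INTO invoices (id, customer_id, amount_cents) VALUES (?, ?, ?)",
                rows,
            )
    elapsed = time.perf_counter() - start

    stored = conn.execute("SELECT COUNT(*) FROM invoices").fetchone()[0]
    conn.close()
    rate = stored / elapsed if elapsed > 0 else float("inf")
    return stored, rate


if __name__ == "__main__":
    stored, rate = measure_insert_rate()
    print(f"stored {stored} rows at roughly {rate:,.0f} rows/second")
```

Vary the batch size and the indexes on the table and watch how the rate moves. That tells you more about where your constraint is than guessing at it.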
Personally, I'd take a much simpler approach to get started. Have a REST call insert invoices into a table, then drop a message on Kafka or RabbitMQ for validation and processing. See what the capacity looks like and where throughput gets overwhelmed. Then look at where caching can improve performance, if needed. Don't add complexity until you know you need it.
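As a minimal sketch of that flow, under some loud assumptions: SQLite stands in for the database, a plain `queue.Queue` stands in for Kafka or RabbitMQ, and the validation rule (amount must be positive) is invented. `submit_invoice` is what the REST handler would call; `validate_next` is what the validator would run for each message. None of these names come from a real library.

```python
import json
import queue
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS invoices ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "payload TEXT NOT NULL, "
        "status TEXT NOT NULL DEFAULT 'received')"
    )


def submit_invoice(conn: sqlite3.Connection, broker: queue.Queue, invoice: dict) -> int:
    """Persist the invoice first, then publish a message that points at it."""
    with conn:  # commit before publishing so the validator can always find the row
        cur = conn.execute(
            "INSERT INTO invoices (payload) VALUES (?)", (json.dumps(invoice),)
        )
    invoice_id = cur.lastrowid
    # Stand-in for a Kafka/RabbitMQ publish; only the id travels on the queue.
    broker.put(json.dumps({"invoice_id": invoice_id}))
    return invoice_id


def validate_next(conn: sqlite3.Connection, broker: queue.Queue) -> int:
    """Consume one message, validate the stored invoice, and record the outcome."""
    message = json.loads(broker.get_nowait())
    invoice_id = message["invoice_id"]
    row = conn.execute(
        "SELECT payload FROM invoices WHERE id = ?", (invoice_id,)
    ).fetchone()
    invoice = json.loads(row[0])
    # Hypothetical rule purely for illustration.
    status = "valid" if invoice.get("amount_cents", 0) > 0 else "rejected"
    with conn:
        conn.execute("UPDATE invoices SET status = ? WHERE id = ?", (status, invoice_id))
    return invoice_id


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    broker = queue.Queue()
    submit_invoice(conn, broker, {"customer": "acme", "amount_cents": 1200})
    validate_next(conn, broker)
    print(conn.execute("SELECT id, status FROM invoices").fetchall())
```

One thing this sketch glosses over: if the process dies between the commit and the publish, the row exists but no message was sent. A periodic sweep for rows still marked `received` is one way to cover that, and it is worth deciding how you handle it before you load test.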