This looks like a good use case for Apache Spark. Spark jobs are typically written in Python or Scala, but there is no reason you couldn't do this in Java as well (Spark provides a Java API). I'm not sure this fully answers your question, but I think the approach is worth looking into.