My goal is to load transactions only for those customers that are present in the input view, and then do some processing on them later.
The current implementation is:
1. HDFS has data for customers with columns (cust_id, date, uuid).
First, I read this data and create an input view on it.
2. Later in the pipeline, I create a view from the transactions table of the DB, which has the schema
(transaction_id, customer_id, date, transaction_amt, tran_type, current_amt).
3. At this point I have both views, input and transactions. Then I run Spark SQL on them (the full pipeline is sketched below): "SELECT i.cust_id, t.transaction_id, t.transaction_amt, t.tran_type FROM transactions t JOIN input i ON i.cust_id = t.customer_id AND i.date = t.date"
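For reference, the current pipeline looks roughly like this (a minimal sketch; the HDFS path, file format, JDBC URL, and credentials are placeholders, not the real values):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("txn-pipeline").getOrCreate()

// Step 1: read the customer data from HDFS and register it as the "input" view.
// Path and format are placeholders for illustration only.
val inputDf = spark.read.parquet("hdfs:///data/customers")
inputDf.createOrReplaceTempView("input")

// Step 2: read the whole transactions table over JDBC and register it as a view.
// URL and credentials are placeholders.
val txnDf = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://dbhost:5432/mydb")
  .option("dbtable", "transactions")
  .option("user", "user")
  .option("password", "password")
  .load()
txnDf.createOrReplaceTempView("transactions")

// Step 3: join the two views with Spark SQL.
val joined = spark.sql(
  """SELECT i.cust_id, t.transaction_id, t.transaction_amt, t.tran_type
    |FROM transactions t
    |JOIN input i
    |  ON i.cust_id = t.customer_id AND i.date = t.date""".stripMargin)
```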
The problem is that Spark loads the entire transactions table from the database just to create the view, and only applies the join filter afterwards, which is not efficient.
I want to achieve filter pushdown into the RDBMS as well, similar to the pushdown Spark does for HDFS sources, so that only the matching customers' transactions are read from the database.
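The closest workaround I am aware of is to collect the join keys from the input view and embed them in a subquery passed through the JDBC dbtable option, roughly like the sketch below (connection details are placeholders again, and it assumes the distinct key set is small enough to inline in a literal IN list). Is there a cleaner way to get Spark to push the join filter down itself?

```scala
// Collect the distinct join keys from the input view. This assumes cust_id is
// a string column and that the distinct key set is small and non-empty
// (an empty IN () list would be invalid SQL).
val custIds = spark.table("input")
  .select("cust_id")
  .distinct()
  .collect()
  .map(_.getString(0))
  .mkString("'", "','", "'")

// Pass a subquery as "dbtable" so the WHERE clause executes inside the
// database and only matching customers' rows cross the wire. The date
// condition is still applied by the Spark join afterwards.
val pushdownQuery =
  s"(SELECT * FROM transactions WHERE customer_id IN ($custIds)) AS t"

val filteredTxns = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://dbhost:5432/mydb") // placeholder URL
  .option("dbtable", pushdownQuery)
  .option("user", "user")
  .option("password", "password")
  .load()
filteredTxns.createOrReplaceTempView("transactions")
```

As far as I can tell, this only helps when the key list is small; for large key sets it seems the choices are staging the keys in a temp table on the database side or accepting the full scan.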