My goal is to load transactions only for those customers that are present in the input view, and then do some processing on them later.

The current implementation is:

1. HDFS has customer data with the columns (cust_id, date, uuid). First, I read this data and create an input view over it.

2. Later in the pipeline, I create a view from the transactions table in the database, which has the schema (transaction_id, customer_id, date, transaction_amt, tran_type, current_amt).

3. At this point I have both views, input and transactions. I then run Spark SQL on them (a sketch of the full pipeline follows this list): "SELECT i.cust_id, t.transaction_id, t.transaction_amt, t.tran_type FROM transactions t JOIN input i ON i.cust_id = t.customer_id AND i.date = t.date"
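
For reference, here is a minimal Scala sketch of the pipeline described in the three steps above. The HDFS path, file format, and JDBC connection details are assumptions, since the question does not specify them:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("TransactionJoin")
  .getOrCreate()

// Step 1: read the customer data from HDFS (path and format are assumed)
// and register it as the "input" view.
val inputDf = spark.read.parquet("hdfs:///data/customers")
inputDf.createOrReplaceTempView("input")

// Step 2: read the whole transactions table over JDBC (connection details
// are placeholders) and register it as the "transactions" view.
val transactionsDf = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://dbhost:5432/mydb")
  .option("dbtable", "transactions")
  .option("user", "user")
  .option("password", "password")
  .load()
transactionsDf.createOrReplaceTempView("transactions")

// Step 3: join the two views on customer id and date.
val result = spark.sql(
  """SELECT i.cust_id, t.transaction_id, t.transaction_amt, t.tran_type
    |FROM transactions t
    |JOIN input i
    |  ON i.cust_id = t.customer_id AND i.date = t.date""".stripMargin)
result.show()
```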

The problem is that Spark loads all of the data from the transactions table in order to build the view, which is not efficient.
I want to achieve filter pushdown into the RDBMS as well, similar to what Spark does for HDFS sources.
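
One common workaround, shown below as a sketch (this is an assumption about a viable approach, not necessarily the solution the author arrived at): collect the distinct customer IDs from the input view on the driver and embed them in a subquery passed through the JDBC dbtable option, so that the filter is evaluated inside the database and only matching rows are shipped to Spark:

```scala
// Sketch of manual filter pushdown to the RDBMS. Assumes the set of
// customer IDs is small enough to inline into a single SQL statement.
val custIds = spark.table("input")
  .select("cust_id")
  .distinct()
  .collect()
  .map(_.get(0).toString)
  .mkString("'", "','", "'")

// The JDBC source accepts a parenthesized subquery as "dbtable"; the
// WHERE clause then runs inside the database, not in Spark.
val pushdownQuery =
  s"(SELECT * FROM transactions WHERE customer_id IN ($custIds)) AS t"

val filteredTransactions = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://dbhost:5432/mydb")
  .option("dbtable", pushdownQuery)
  .option("user", "user")
  .option("password", "password")
  .load()
filteredTransactions.createOrReplaceTempView("transactions")
```

This only scales while the ID list fits in one statement; for larger key sets, staging the IDs in a table on the database side and joining there is the usual alternative. Note that in this sketch the date predicate is still applied by Spark during the join.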
