From what I’ve gathered so far, the overall requirement is to build a pipeline that identifies which users should be contacted in real time for vKYC calls based on these conditions: 1. Start with the most recent data from your base table (loan_taken_vkyc_strategy_base). You either: • Take only the rows for the latest eligibility date overall, or • For each user, take their most recent eligibility record. 2. Remove users who already have a completed vKYC event, by left-joining the base table to the videokycevent table and excluding any users present there (we only want people not in vKYC yet). 3. Filter to “active” users in MoEngage (for instance, those who clicked an email or notification) within a certain recent timeframe—often the last hour or last day. 4. Check who booked a vKYC slot (also via MoEngage events) on the current date. 5. Check who has been “surfing” the app within the last hour (by looking at log_master_3 and seeing recent activity). 6. Combine all those conditions: • The user is from the latest base subset, • Not already in vKYC, • Has “active” (clicked) events in the last hour, • Booked a slot today, • Has been “surfing” in the last hour.
Only the users fulfilling all of these criteria end up in the final user set that you want to call in real-time for vKYC.
Key Points • How you do each step can vary, but typically you’ll do: 1. CTE #1: get the latest base data, 2. CTE #2: remove those in vKYC (left join + WHERE vkyc.userid IS NULL), 3. CTE #3: identify “active clickers” in MoEngage (last hour), 4. CTE #4: identify “slot-booked” events (today), 5. CTE #5: identify “surfers” in log_master_3 (last hour), 6. CTE #6 (final): intersect these sets to produce the final user list. • Time windows (last hour, current date, etc.) can be changed as needed. • You’ll eventually run one or more queries in Athena, possibly with multiple CTEs or smaller queries that you then merge in Python/pandas.
Essentially, you want to narrow down the user population from the base table, step by step, so that you end up with a small group of currently engaged, potentially interested users who haven’t yet done vKYC—and then you call them to complete that process.