To retrieve the full set of records inserted or updated between this run and the last time you executed a DML statement using the stream, the stream has to do a full outer join against the source table, effectively scanning the table twice. On smaller tables this is less of an issue (assuming the warehouse is sized appropriately and the temporary dataset fits into memory). But, as you saw in your second query against the table with 6.5B records, the memory available to the warehouse wasn't sufficient: the query sent 6 TB of data over the network and spilled 4.3 TB to local disk, both of which are far slower than reading data from memory.

You could try:

1) Significantly increasing the size of the warehouse the stream runs on.
2) Modifying the data flow into the source table to be "insert-only" and recreating the stream as append-only, so it only has to track inserts.
3) Not using a stream for this use case at all and instead using timestamps to find the latest updates on the source table (assuming "update_dt" is a column in the table), a metadata table to track the last run, and a temporary table that uses time travel to retrieve the changes.

A sketch of options 2 and 3 follows below.
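For reference, here's roughly what options 2 and 3 could look like. These are sketches only: the stream, table, and column names (big_table_stream, etl_run_log, loaded_through_ts, src_db.src_schema.big_table) are placeholders, and it assumes update_dt is reliably populated by every write path.

```sql
-- Option 2 (sketch): recreate the stream as append-only so it only tracks inserts.
CREATE OR REPLACE STREAM big_table_stream
  ON TABLE src_db.src_schema.big_table
  APPEND_ONLY = TRUE;
```

```sql
-- Option 3 (sketch): timestamp-driven incremental load with a metadata table.

-- High-water mark recorded by the previous successful run.
-- (For the very first run, seed etl_run_log with an early timestamp.)
SET last_run_ts = (SELECT MAX(loaded_through_ts)
                   FROM etl_run_log
                   WHERE table_name = 'BIG_TABLE');

-- Capture "now" once so the snapshot and the bookkeeping agree.
SET this_run_ts = CURRENT_TIMESTAMP();

-- Time travel (the AT clause) pins the read to a single point in time,
-- so rows that land mid-run are picked up by the next run instead.
CREATE OR REPLACE TEMPORARY TABLE big_table_delta AS
SELECT *
FROM src_db.src_schema.big_table AT (TIMESTAMP => $this_run_ts::timestamp_ltz)
WHERE update_dt > $last_run_ts;

-- ... MERGE big_table_delta into the target here ...

-- Advance the high-water mark for the next run.
INSERT INTO etl_run_log (table_name, loaded_through_ts)
VALUES ('BIG_TABLE', $this_run_ts);
```

With option 3 you give up the automatic offset tracking a stream provides, so it's worth confirming that update_dt is set on every insert and update before relying on it.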