79579545

Date: 2025-04-17 15:06:12
Score: 1.5
Natty:
Report link

Solution using modern array-based formulas, generalizes to any-size tables

Coming to this question after the advent of array-based formulas like MAP, BYROW, LAMBDA, etc., which seem to be faster than Apps Script functions (when less than 106 cells are involved). I want to offer an alternative solution that uses only formulas, and does not require "string hacking" (2), because some people need such features. This solution will work on tables with different shapes.

Definitions. In your example, we'll assume Table A is TableA!A1:B3 and Table B is TableB!A1:B5, and we're going to use LET to define four variables for clarity:

In either table, the key values are not unique. Our goal is to find all matches of key values (e.g. x1 and x2) between the two tables, and display only the corresponding values from TableA!A1:A3 and TableB!B1:B5.

Formula. The formula below generates a new array containing values from the first column of table A and second column of table B. Each row represents a match between keys_left and keys_right, with proper duplication when a key appears in multiple rows.

= LET(
  data_left, TableA!A1:A3, data_right, TableB!B1:B5,
  keys_left, TableA!B1:B3, keys_right, TableB!A1:A5,
  index_left, SEQUENCE( ROWS( keys_left ) ),
  prefilter, ARRAYFORMULA( MATCH( keys_left, keys_right, 0 ) ),
  index_left_filtered, FILTER( index_left, prefilter ),
  keys_left_filtered, FILTER( keys_left, prefilter ),
  matches, MAP( index_left_filtered, keys_left_filtered, LAMBDA( id_left, key_left,
    LET(
      row_left, XLOOKUP( id_left, index_left, data_left ),
      matches_right, FILTER( data_right, keys_right = key_left ),
      TOROW( BYROW( matches_right, LAMBDA( row_right,
        HSTACK( row_left, row_right )
      ) ) )
    )
  ) ),
  wrapped, WRAPROWS( FLATTEN(matches), COLUMNS(data_right) + COLUMNS(data_left) ),
  notblank, FILTER( wrapped, NOT(ISBLANK(CHOOSECOLS(wrapped, 1))) ),
  notblank
)

How it works? A few tricks are necessary to make this both accurate and fast:

Generalize. To apply this formula to other tables, just specify the desired ranges (or the results of ARRAYFORMULA or QUERY operations) for the first four variables; keys_left and keys_right must be single column but data_left and data_right can be multi-column. (Or create a Named Function and specify the four variables as parameters as "Argument placeholders".)

Named Function. If you just want to use this, you can import the Named Function INNERJOIN from my spreadsheet functions. That version assumes the first row contains column headers. See documentation at this GitHub repo.

Notes.

(1) I loved string-hacking approaches back when they were the only option, but doubleunary pointed out that they convert numeric types to strings and cause undesirable side effects.

(2) This is counterintuitive because it means you search the keys_right twice overall; but I found in testing that if you include unmatched rows in the joining step, is much costlier.

Reasons:
  • RegEx Blacklisted phrase (1): I want
  • Contains signature (1):
  • Long answer (-1):
  • Has code block (-0.5):
  • Contains question mark (0.5):
  • Low reputation (0.5):
Posted by: garcias