With some help from @Rory (many thanks), I think I can now explain this behaviour.
The key things to understand are:
In Excel, some function parameters can accept either reference or array arguments. The behaviour of the function may differ depending on whether a reference or an array is passed.
INDEX()
is such a function. The documentation refers to the "array form" and the "reference form". The syntax is almost identical (except for the rare "reference form" case where multiple ranges are passed).
The documentation states that the array
parameter of the "array form" can accept "a range of cells or an array constant". Crucially, "range of cells" here does not mean a reference to a cell range on a worksheet such as A1:B2
(this would be a reference). Rather, it means an in-memory cell range, as might be returned by a nested function call to an outer function (examples below).
Therefore, INDEX(A1:B2, ...)
always calls the "reference form" of the INDEX()
fucntion, never the "array form".
As pointed out by @Rory in the comments, there is a minor difference in the documentation between the two forms: in the "array form", the column_num
parameter is optional for a 2D range ("2D" meaning > 1 row, > 1 column). This is not the case for the "reference form".
This explains why, in this unique case of a 2D reference with an omitted column_num
, a #REF!
error is returned, whereas the equivalent call on a literal array (or an in-memory cell range) succeeds because the "array form" supports this.
Examples:
=INDEX(A1:B2, 0) : fails (reference form; col_num is required for a 2D reference)
=INDEX(A1:B2, 0, ) : succeeds (reference form; the comma means that col_num is present, even though 'missing')
=INDEX({1, 2; 3, 4}, 0) : succeeds (array form; col_num is optional for a 2D array)
=INDEX(INDEX(A1:B2, 0, 0), 0) : fails (a _reference_ is passed to the outer INDEX() call)
=INDEX(SQRT(INDEX(A1:B2, 0, 0)), 0) : succeeds (an _array_ is passed to the outer INDEX() call)
This last example is the "range of cells" case: the inner INDEX()
function returns a reference, which is processed by the SQRT()
function, returning an array ("range of cells"), which then calls the "array form" of the outer INDEX()
function.
I still do not understand why this difference is necessary; I can see no reason why the "reference form" should not behave identically to the array form in this regard. Presumably there is a reason, which I would love to hear if anyone knows it!