I figured out how to get the output that I needed. I'll post it here for others to see and comment on.
I added jq as a required provider, which let me use a jq_query data source. Here is the full end-to-end conversion of the data sources:
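locals {
  # Collect the outputs of every remote state into a single JSON string.
  instances_json = jsonencode([for value in data.terraform_remote_state.instances : value.outputs])
}

# Recursively find every object with a non-null "id" and emit that value.
data "jq_query" "all_ids" {
  data  = local.instances_json
  query = ".[] | .. | select(.id? != null) | .id"
}

locals {
  # Turn the newline-separated, quoted results into a plain list of IDs.
  instances = split(",", replace(replace(data.jq_query.all_ids.result, "\n", ","), "\"", ""))
}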
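For completeness, requiring the provider could look roughly like the sketch below. The source address is an assumption on my part (I believe the provider that exposes a jq_query data source is published as massdriver-cloud/jq), so check the registry entry for whichever provider you actually use.

```hcl
terraform {
  required_providers {
    jq = {
      # Assumption: the post above does not state the source address.
      source = "massdriver-cloud/jq"
    }
  }
}
```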
The last locals block (the one that defines instances) is needed because the jq_query data source returns multiple values as a newline-separated string of quoted values, not as a single valid JSON document. That means we can't decode the string from JSON and simply have to work around it. I replaced the "\n" characters with commas and then replaced the \" characters with nothing, which leaves a comma-separated string that the split function can turn into a list.
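To make the string handling easier to follow, here is a minimal sketch that runs the same steps on a hard-coded sample. The sample_* names and the i-aaa style IDs are hypothetical stand-ins for data.jq_query.all_ids.result, and the compact() call is an extra I added in case the real result contains a trailing newline; it is not part of the code above.

```hcl
locals {
  # Hypothetical stand-in for data.jq_query.all_ids.result:
  # three quoted IDs separated by newlines.
  sample_result = "\"i-aaa\"\n\"i-bbb\"\n\"i-ccc\""

  # Step 1: newlines become commas      => "i-aaa","i-bbb","i-ccc"
  sample_commas = replace(local.sample_result, "\n", ",")

  # Step 2: double quotes are removed   => i-aaa,i-bbb,i-ccc
  sample_plain = replace(local.sample_commas, "\"", "")

  # Step 3: split on commas; compact() drops any empty elements.
  sample_ids = compact(split(",", local.sample_plain))
}

output "sample_ids" {
  value = local.sample_ids
}
```

With this sample input, sample_ids evaluates to ["i-aaa", "i-bbb", "i-ccc"]. Note that the approach assumes none of the IDs contain commas or double quotes themselves.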