Reports

I have a similar issue unnesting a dataset where the column with the data to unnest contains unequal rows and columns instead of equal sized data as in the original question.

Example dataset:

DT <- data.table::data.table(
  a = c("a1", "a2"),
  df1 = list(data.frame(
    b = c("b1", "b2")
  ))
)
n <- 5 #1e5
set.seed(42)
dt1 <- DT[sample(seq_len(nrow(DT)), n, TRUE),]

# Add unequal data to unnest column
DT2 <- data.table::data.table(
  a = c("a3"),
  df1 = list(data.frame(
    b = c("b1", "b2", "b3"),
    c = c("c1", "c2", "c3")
  ))
)
n <- 5
dt2 <- DT2[sample(seq_len(nrow(DT2)), n, TRUE),]

dt1  <- rbind(dt1, dt2)

Using the data.table solutions I get the following results:

dt1[, data.table::rbindlist(df1, fill = TRUE), by = .(a)]
Error in `[.data.table`(dt1, , data.table::rbindlist(df1, fill = TRUE),  : 
  j doesn't evaluate to the same number of columns for each group

dt1[, unlist(df1, TRUE, FALSE), .(a)]
# Works but all unnested data is placed in a single column

Other data.table solution works but is very slow. Much slower than tidyr::unnest(dt1, cols = c(df1)) which can handle this dataset.

unnested <- rbindlist(
  lapply(seq_len(nrow(dt1)), function(i) {
    inner_dt <- as.data.table(dt1$df1[[i]])  # Convert to data.table
    inner_dt[, a := dt1$a[i]]                # Add outer column
    return(inner_dt)
  }),
  fill = TRUE  # Fill missing columns with NA
)
setcolorder(unnested, c("a", setdiff(names(unnested), "a")))

Any ideas to unnest this type of dataset fast using data.table?

79766410