I have a similar issue unnesting a dataset where the column with the data to unnest contains unequal rows and columns instead of equal sized data as in the original question.
Example dataset:
DT <- data.table::data.table(
a = c("a1", "a2"),
df1 = list(data.frame(
b = c("b1", "b2")
))
)
n <- 5 #1e5
set.seed(42)
dt1 <- DT[sample(seq_len(nrow(DT)), n, TRUE),]
# Add unequal data to unnest column
DT2 <- data.table::data.table(
a = c("a3"),
df1 = list(data.frame(
b = c("b1", "b2", "b3"),
c = c("c1", "c2", "c3")
))
)
n <- 5
dt2 <- DT2[sample(seq_len(nrow(DT2)), n, TRUE),]
dt1 <- rbind(dt1, dt2)
Using the data.table
solutions I get the following results:
dt1[, data.table::rbindlist(df1, fill = TRUE), by = .(a)]
Error in `[.data.table`(dt1, , data.table::rbindlist(df1, fill = TRUE), :
j doesn't evaluate to the same number of columns for each group
dt1[, unlist(df1, TRUE, FALSE), .(a)]
# Works but all unnested data is placed in a single column
Other data.table
solution works but is very slow.
Much slower than tidyr::unnest(dt1, cols = c(df1))
which can handle this dataset.
unnested <- rbindlist(
lapply(seq_len(nrow(dt1)), function(i) {
inner_dt <- as.data.table(dt1$df1[[i]]) # Convert to data.table
inner_dt[, a := dt1$a[i]] # Add outer column
return(inner_dt)
}),
fill = TRUE # Fill missing columns with NA
)
setcolorder(unnested, c("a", setdiff(names(unnested), "a")))
Any ideas to unnest this type of dataset fast using data.table?