The issue in your code arises because df1$gene contains concatenated gene names (e.g., "TMEM201;PIK3CD"), while df2$gene has individual gene names. The %in% operator checks for exact matches, so "PIK3CD" %in% "TMEM201;PIK3CD" returns FALSE.
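The difference can be seen on a single value. This is only a minimal illustration; the variable names `exact_match` and `substring_match` are made up for this sketch, and base R's `grepl()` stands in for any substring test:

```r
# %in% compares whole elements: the only element here is "TMEM201;PIK3CD"
exact_match <- "PIK3CD" %in% "TMEM201;PIK3CD"
print(exact_match)      # FALSE

# A substring search (fixed = TRUE: literal text, not a regular expression)
substring_match <- grepl("PIK3CD", "TMEM201;PIK3CD", fixed = TRUE)
print(substring_match)  # TRUE
```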

To fix this, test whether any gene from df2$gene occurs inside each df1$gene string. One way is to combine sapply() with str_detect() from the stringr package:

**Solution**

library(stringr)

df1 <- data.frame(gene = c('TMEM201;PIK3CD','BRCA1','MECP2','TMEM201', 'HDAC4','TMEM201'))

df2 <- data.frame(gene = c('PIK3CD','GRIN2B','BRCA2'))

df1_common_df2 <- df1[sapply(df1$gene, function(x) any(str_detect(x, df2$gene))), , drop = FALSE]  # drop = FALSE keeps the one-column result a data frame instead of a vector

print(df1_common_df2)

--------------

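str_detect(x, df2$gene): For a single df1$gene string x, returns one TRUE/FALSE per value in df2$gene, indicating whether that value occurs in x as a substring. Each df2$gene value is interpreted as a regular expression.

sapply(..., any(...)): Runs this check for every row of df1$gene and returns TRUE for a row if at least one df2 gene matched, so only those rows are kept in df1_common_df2.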
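One caveat with substring matching is that a shorter name also matches inside a longer one (a hypothetical df2 entry "BRCA" would match "BRCA1"). If whole gene names should match exactly, a possible alternative is to split each df1 entry and reuse %in%. This is only a sketch in base R, assuming ";" is the only separator; `gene_lists`, `has_match` and `df1_exact_df2` are names made up for this example:

```r
df1 <- data.frame(gene = c("TMEM201;PIK3CD", "BRCA1", "MECP2", "TMEM201", "HDAC4", "TMEM201"))
df2 <- data.frame(gene = c("PIK3CD", "GRIN2B", "BRCA2"))

# Split every df1 entry on ";" (assumed separator) into individual gene names.
# as.character() guards against factor columns in R versions before 4.0.
gene_lists <- strsplit(as.character(df1$gene), ";", fixed = TRUE)

# TRUE for rows where at least one whole gene name is present in df2$gene
has_match <- vapply(gene_lists, function(genes) any(genes %in% df2$gene), logical(1))

# drop = FALSE keeps a data frame even though df1 has a single column
df1_exact_df2 <- df1[has_match, , drop = FALSE]
print(df1_exact_df2)
```

With the sample data this keeps only the "TMEM201;PIK3CD" row, the same as the str_detect() version, but "BRCA1" and "BRCA2" can never match each other's prefixes.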
