I understand that this is not the most appropriate answer to this question. I'm only posting this because I found this useful and the accepted solution was too slow in my case. I would've liked to comment on the accepted answer to understand why it was slow but I don't have enough reputation to do so.
Kindly do not downvote; instead, ask me in a comment to remove this answer and I will do so.
I have a pandas DataFrame named df with some text like below:
import pandas as pd
df = pd.DataFrame({'a':['president ronald regan', 'george'], 'b':['george bush', 'president george']})
I want to run the query '"president"*("ronald"+("george"*~"bush"))', which is my interpretation of the query in the question, on df.
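To make the operators in this query concrete: `*` acts as AND, `+` as OR and `~` as NOT, and each quoted term is a substring test. Below is a minimal standard-library sketch of that reading applied to a single string. The helper name `cell_matches` is hypothetical and only for illustration; it is not part of pandas or of the code in this answer.

```python
import re

def cell_matches(query, text):
    """Evaluate a query like '"a"*("b"+~"c")' against one string (illustrative helper)."""
    # Replace every quoted term with True/False depending on whether text contains it.
    expr = re.sub(r'"([^"]*)"', lambda m: str(m.group(1) in text), query)
    # Map the query operators onto Python's boolean keywords.
    expr = expr.replace('*', ' and ').replace('+', ' or ').replace('~', ' not ')
    # Refuse anything other than booleans, keywords and parentheses before calling eval.
    if not re.fullmatch(r'(?:True|False|and|or|not|[()\s])*', expr):
        raise ValueError('unsupported token in query: ' + query)
    return eval(expr)

query = '"president"*("ronald"+("george"*~"bush"))'
print(cell_matches(query, 'president george'))  # president AND (ronald OR (george AND NOT bush))
```

Python's `not` binds tighter than `and`, and `and` tighter than `or`, which mirrors the precedence of `~`, `*` and `+` in the original query, so no extra parentheses are needed.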
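import re

class logical_search:
    def __init__(self, expression=""):
        self.expression = expression
    def evaluate(self):
        # Names such as self.president resolve to the boolean frames set below.
        return eval(self.expression)

def add_self(match):
    # Turn a quoted term like "president" into the attribute reference self.president.
    # This only works when the term is a valid Python identifier.
    self_added = 'self.'+match.group(2)
    return self_added

expression = '"president"*("ronald"+("george"*~"bush"))'
self_expression = re.sub(r'(")([^"]*)(")', add_self, expression)
ls = logical_search(self_expression)
var_nms = re.findall(r'"([^"]*)"', expression)
for var_nm in var_nms:
    # One boolean frame per term: True where the cell contains the term.
    # Note: applymap is deprecated since pandas 2.1 in favor of DataFrame.map.
    setattr(ls, var_nm, df.applymap(lambda x: var_nm in x))
# On boolean frames, * acts as AND, + as OR and ~ as NOT.
result = ls.evaluate()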
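For a fixed query, the same result can be sketched without `eval` by combining boolean frames with pandas' own `&`, `|` and `~` operators. This is an alternative formulation, not the approach used above; the helper name `contains` is hypothetical, and it assumes every cell is a string.

```python
import pandas as pd

df = pd.DataFrame({'a': ['president ronald regan', 'george'],
                   'b': ['george bush', 'president george']})

def contains(frame, term):
    # Plain (non-regex) substring test for every cell, applied column by column.
    return frame.apply(lambda col: col.str.contains(term, regex=False))

president = contains(df, 'president')
ronald = contains(df, 'ronald')
george = contains(df, 'george')
bush = contains(df, 'bush')

# Same logic as '"president"*("ronald"+("george"*~"bush"))'.
result = president & (ronald | (george & ~bush))
print(result)
```

With the sample data, only the cells 'president ronald regan' and 'president george' satisfy the query.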
I would be grateful to hear any feedback about this. Thanks!