Date: 2020-08-08 14:03:39
Score: 4

I realize this may not be the most appropriate answer to the question. I'm posting it only because I found it useful and the accepted solution was too slow in my case. I would have liked to comment on the accepted answer to understand why it was slow, but I don't have enough reputation to do so.

Kindly do not downvote; instead, ask me in a comment to remove this answer and I will do so.

I have a pandas DataFrame named df containing some text, like this:

import re
import pandas as pd

df = pd.DataFrame({'a':['president ronald regan', 'george'], 'b':['george bush', 'president george']})

I want to run the query '"president"*("ronald"+("george"*~"bush"))' on df; this is my interpretation of the query in the question. Here * stands for AND, + for OR, and ~ for NOT.
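To make the intended meaning of the query concrete, here is a minimal sketch that spells it out by hand with pandas' boolean operators (& for AND, | for OR, ~ for NOT). The helper name contains is my own and purely illustrative; it assumes every cell holds a string.

```python
import pandas as pd

df = pd.DataFrame({'a': ['president ronald regan', 'george'],
                   'b': ['george bush', 'president george']})

def contains(frame, word):
    # Hypothetical helper: boolean frame that is True where the
    # cell text contains `word` as a plain substring.
    return frame.apply(lambda col: col.str.contains(word, regex=False))

president = contains(df, 'president')
ronald = contains(df, 'ronald')
george = contains(df, 'george')
bush = contains(df, 'bush')

# "president" AND ("ronald" OR ("george" AND NOT "bush"))
mask = president & (ronald | (george & ~bush))
print(mask)
```

With the sample data, only 'president ronald regan' and 'president george' should satisfy the query.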

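class logical_search:
    # Holds one boolean DataFrame per search term as an attribute,
    # so the rewritten expression can refer to them as self.<term>.
    def __init__(self, expression=""):
        self.expression = expression
    def evaluate(self):
        # eval runs arbitrary code; only use it on expressions you trust.
        return eval(self.expression)

def add_self(match):
    # Turn a quoted term such as "president" into self.president.
    # This only works for terms that are valid Python identifiers.
    self_added = 'self.'+match.group(2)
    return self_added

expression = '"president"*("ronald"+("george"*~"bush"))'

# Becomes 'self.president*(self.ronald+(self.george*~self.bush))'
self_expression = re.sub(r'(")([^"]*)(")', add_self, expression)
ls = logical_search(self_expression)

# One boolean mask per term: True where the cell text contains the term.
# (Newer pandas versions deprecate applymap in favor of DataFrame.map.)
var_nms = re.findall(r'"([^"]*)"', expression)
for var_nm in var_nms:
    setattr(ls, var_nm, df.applymap(lambda x: var_nm in x))

# The boolean frames are combined element-wise by *, + and ~.
result = ls.evaluate()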
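For comparison, here is a rough variant of the same idea that skips the class and setattr and hands eval a dictionary of masks instead. This is only a sketch under my own assumptions: the function name search and the placeholder names t0, t1, ... are made up for illustration. Using placeholders means a term no longer has to be a valid Python identifier. Emptying __builtins__ only limits which names the expression can see; it does not make eval safe for untrusted input.

```python
import re
import pandas as pd

def search(frame, expression):
    # Hypothetical helper: evaluate a query such as '"a"*("b"+~"c")'
    # against every cell of `frame` and return a boolean DataFrame.
    masks = {}

    def replace_term(match):
        term = match.group(1)
        name = f"t{len(masks)}"  # placeholder name: t0, t1, ...
        # True where the cell text contains the term as a substring.
        masks[name] = frame.apply(
            lambda col: col.str.contains(term, regex=False))
        return name

    # '"president"*("ronald"+...)' becomes 't0*(t1+...)'
    rewritten = re.sub(r'"([^"]*)"', replace_term, expression)
    # Only the placeholder names are visible to the expression.
    return eval(rewritten, {"__builtins__": {}}, masks)

df = pd.DataFrame({'a': ['president ronald regan', 'george'],
                   'b': ['george bush', 'president george']})
result = search(df, '"president"*("ronald"+("george"*~"bush"))')
print(result)
```

I have not benchmarked this against the version above, so I can't say whether it is any faster.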

I would be grateful to hear any feedback about this. Thanks!

  • Blacklisted phrase (0.5): Thanks
  • Blacklisted phrase (1): to comment
  • RegEx Blacklisted phrase (1): I don't have enough reputation
  • RegEx Blacklisted phrase (2): I would be grateful
  • Long answer (-1):
  • Has code block (-0.5):
  • Low reputation (1):
Posted by: PSK