79542278

Date: 2025-03-28 19:40:43

The best method I found compares the strings for runs of subsequent (consecutive) characters. All other methods I tried did not really work when the string lengths differed too much.

In ComfyUI I had to find a LoRA by its name. The name was extracted from an image's metadata and had to be compared against the list of installed LoRAs. The main difficulty is that the locally installed LoRAs all have modified names: they are still similar, but they do not match.
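Both names have to be reduced to the same form before they can be compared. The cleaning that the code below does inline can be sketched as a small helper; `clean_name` and its `drop_word` parameter are hypothetical names, not part of ComfyUI. Note that `\W` does not match the underscore, so underscores survive the cleaning; `[\W_]+` would treat them as separators too.

```python
import os
import re


def clean_name(path, drop_word="lora"):
    """Reduce a LoRA path or file name to a comparable form (hypothetical helper).

    Keeps only the file name, drops the extension, lowercases it,
    removes `drop_word` and replaces every run of non-word characters
    by a single space.
    """
    name = os.path.splitext(os.path.basename(str(path).lower()))[0]
    if drop_word:
        name = name.replace(drop_word, "")
    # \W does not match "_", so underscores are kept as they are
    return re.sub(r"\W+", " ", name).strip()


print(clean_name("loras/My-Style LoRA v2.safetensors"))  # my style v2
```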

Methods like Jaccard similarity or other similarity measures did not work. Sometimes they even returned a completely different name, because it shared more characters with the searched name than the correct one did.
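To illustrate why a set-based measure can mislead, here is a minimal sketch of Jaccard similarity on character sets. It is an assumption that characters (not words) were compared; the answer does not say. Because sets ignore both order and repetition, two different words built from the same letters get a perfect score:

```python
def jaccard(a, b):
    """Jaccard similarity of the character sets of a and b."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0  # two empty strings are treated as identical
    return len(sa & sb) / len(sa | sb)


# Same letters in a different order look identical to a set-based measure
print(jaccard("listen", "silent"))  # 1.0
print(jaccard("abc", "abd"))        # 0.5
```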

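So I wrote a method that compares two strings for subsequent characters. A run of characters from the shorter name only counts if it is at least two characters long and also occurs in the longer name; the score is the total length of these runs divided by the length of the shorter name. To make it a bit more complicated, the LoRA names to find are in one list, which is compared against another list containing the locally installed LoRAs. The best matches are then stored in a dict.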
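The scoring step can be pulled out of the loop into a pure function that is easy to test without ComfyUI. This is a sketch of the same idea rather than the exact code below; `run_score` and `best_match` are hypothetical helper names, and both inputs are assumed to be already cleaned names.

```python
def run_score(a, b):
    """Share of the shorter string covered by runs of 2+ consecutive
    characters that also occur as a substring of the longer string."""
    n1, n2 = (b, a) if len(a) > len(b) else (a, b)  # n1 is the shorter one
    if not n1:
        return 0.0
    common = set(n1) & set(n2)  # characters present in both strings
    word, size, total = "", 0, 0
    for letter in n1:
        if letter in common:
            word += letter
            if word in n2:  # the run still occurs in the longer string
                size += 1
                continue
            # run broken: count it if long enough, restart at this letter
            if size > 1:
                total += size
            word, size = letter, 1
        else:
            # letter missing from n2: close the current run
            if size > 1:
                total += size
            word, size = "", 0
    if size > 1:  # a run reaching the end of n1
        total += size
    return total / len(n1)


def best_match(name, candidates):
    """Return the candidate with the highest run_score, or None if all score 0."""
    best, best_score = None, 0.0
    for cand in candidates:
        score = run_score(name, cand)
        if score > best_score:  # strict ">" keeps the first candidate on ties
            best, best_score = cand, score
    return best


print(best_match("detail tweaker", ["add detail", "detail tweaker xl", "anime style"]))
# detail tweaker xl
```

For "listen" and "silent", `run_score` returns about 0.33, because only the run "en" appears in the same order in both, whereas a set-based measure treats the two words as identical. Because the score is divided by the length of the shorter string, a short candidate that is fully contained in the longer name also reaches 1.0.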

```python
import os
import re

import folder_paths  # ComfyUI module that resolves the model folders

# lora_list is assumed to be a dict built earlier from the image metadata,
# mapping each LoRA name to its value (e.g. its strength)

# get the list of locally installed loras
model_list = folder_paths.get_filename_list("loras")
loras = {}
for lora in lora_list:
    similarity = 0
    best_match = None  # reset per lora, so a match from the previous lora is never reused
    # keep only the file name without extension, set it to lowercase,
    # remove the word "lora" and replace all non-word characters by spaces
    lora_name = os.path.splitext(os.path.basename(str(lora).lower()))[0]
    lora_name = re.sub(r'\W+', ' ', lora_name.replace("lora", "")).strip()

    for item in model_list:
        # clean the installed name the same way and set it to lowercase
        item_name = os.path.splitext(os.path.basename(item.lower()))[0]
        item_name = re.sub(r'\W+', ' ', item_name).strip()

        # get the shorter string first
        n1, n2 = (item_name, lora_name) if len(lora_name) > len(item_name) else (lora_name, item_name)
        if not n1:
            continue  # empty name: nothing to compare, avoids a division by zero
        set0 = set(n1) & set(n2)  # characters occurring in both strings

        n1_word = ""  # current run of n1 that also occurs in n2
        n1_size = 0   # substring size
        n1_sum = 0    # similarity counter

        # check for subsequent characters
        for letter in n1:
            if letter in set0:  # if it exists in both strings ...
                # ... extend the current run
                n1_word += letter
                if n1_word in n2:  # run still occurs in n2
                    n1_size += 1
                else:  # end of similarity
                    if n1_size > 1:  # if 2 or more were found before
                        n1_sum += n1_size
                    # restart the run with the current letter
                    n1_size = 1
                    n1_word = letter
            else:  # does not exist in both strings: end of similarity
                if n1_size > 1:  # if 2 or more were found before
                    n1_sum += n1_size
                # prepare for the next letter
                n1_size = 0
                n1_word = ""
        if n1_size > 1:  # if 2 or more were found at the end
            n1_sum += n1_size

        # score relative to the length of the first (shorter) string
        n1_score = n1_sum / len(n1)

        if n1_score > similarity:
            similarity = n1_score
            best_match = item
    if best_match is not None:  # store only if some installed lora scored above 0
        loras[best_match] = lora_list[lora]
```

This gives me the best results and only fails if there really is no locally installed LoRA that resembles the name from the base list.

Posted by: dschoni