Search Results With "Best Result" Score Sorting?

Hello everyone!

I’m working on a script that searches the Render Materials in the Rhino document, returns the names of the Render Materials that match the search text, and orders the results by “best match”.

For example, if the Rhino doc contains render materials named:
“Oak redwood”, “Red Oak”, “Red Oak Polished”, “Polished Dark Oak”, “Red Tile Polished”, “Red Metal Polished”, “Red Corrugated Metal Polished”

And I input the search text:
“oak”

I would expect Oak redwood first, as that is the “closest match”, followed by Red Oak, then Red Oak Polished, and lastly Polished Dark Oak.

If I input the search text:
Red Polished

I would expect Red Oak Polished, Red Tile Polished, and Red Metal Polished to be weighted equally score-wise, with ties probably broken alphabetically.
I would expect Red Metal Polished to come before Red Corrugated Metal Polished.

If I input the search text:
polished oak

I would expect Polished Dark Oak before Red Oak Polished.

Here’s my code thus far. I tried to set up a scoring method based on the sum of “found words”, but I’m a bit stuck at the moment. I’m also curious whether I’m “overthinking” all of this and whether there is an existing method that already handles this kind of “best match” sorting.

Code:

import scriptcontext

search_text = "oak polished"
search_limit = 3

# Finds materials that are actively in the model and match the search text partially and case-insensitively
def FindMaterialsByPartialName(mat_name, limit):
    mat_table = scriptcontext.doc.Materials
    render_mat_table = scriptcontext.doc.RenderMaterials
    matched_materials = {}
    matched_render_materials = {}

    search_words = mat_name.lower().split()  # Split search text into individual words

    # Search for regular materials
    for i in range(mat_table.Count):
        mat = mat_table[i]
        # Check if the material is in use by any objects
        in_use = False
        for obj in scriptcontext.doc.Objects:
            if obj.Attributes.MaterialIndex == i:
                in_use = True
                break
        if not in_use:
            continue

        # Calculate the score based on the number of matched words
        score = sum(word in mat.Name.lower() for word in search_words)

        # Check for partial and case-insensitive match with any of the search words
        if score > 0:
            if mat.Name in matched_materials:
                matched_materials[mat.Name].append((i, score))
            else:
                matched_materials[mat.Name] = [(i, score)]

    # Search for render materials
    for i in range(render_mat_table.Count):
        render_mat = render_mat_table[i]
        # Calculate the score based on the number of matched words
        score = sum(word in render_mat.Name.lower() for word in search_words)

        # Check for partial and case-insensitive match with any of the search words
        if score > 0:
            render_mat_name = render_mat.Name
            if render_mat_name in matched_render_materials:
                matched_render_materials[render_mat_name].append((i, score))
            else:
                matched_render_materials[render_mat_name] = [(i, score)]

    # Print render materials found
    if matched_render_materials:
        print("RENDER MATERIALS:")
        count = 0
        # Sort render materials by their highest score before printing
        for mat_name, indices in sorted(matched_render_materials.items(), key=lambda x: max(score for _, score in x[1]), reverse=True):
            if count >= limit:
                break
            indices_str = ','.join(str(index[0]) for index in indices)
            print("{0} found at Render Mat List[{1}]".format(mat_name, indices_str))
            count += 1
    else:
        print("No render materials matching '{}' found".format(mat_name))

if __name__ == "__main__":
    FindMaterialsByPartialName(search_text, search_limit)

Thank you all for your response!

Hi,

You could try regex instead, which is meant for text searching.
Here’s an example:

import re

test_data = [
    "Oak redwood",
    "Red Oak",
    "Red Oak Polished",
    "Polished Dark Oak",
    "Red Tile Polished",
    "Red Metal Polished",
    "Red Corrugated Metal Polished"
]

searches = [
    "oak",
    "red polished",
    "polished oak",
    "apricot"
]


for search in searches: 
    print(f"\nSearching for '{search}'") 
    search_terms = search.split(' ')
    pattern = r"|".join([rf"({s})" for s in search_terms])
    
    rated_data = []
    for data in test_data:
        matches = re.findall(pattern, data.lower())
        if not matches:
            continue

        hits = len(list(filter(lambda m: m != '', matches)))
        rated_data.append((hits, data))
    
    if not rated_data:
        print(" x nothing found!")
        continue

    rated_data.sort(key=lambda t: t[0], reverse=True)
    max_rating = rated_data[0][0]
    
    best_results = filter(lambda t: t[0] == max_rating, rated_data)
    for rating, data in best_results:
        print(f" * {data} ({rating})")

This is the output:

Searching for 'oak'
 * Oak redwood (1)
 * Red Oak (1)
 * Red Oak Polished (1)
 * Polished Dark Oak (1)

Searching for 'red polished'
 * Red Oak Polished (2)
 * Red Tile Polished (2)
 * Red Metal Polished (2)
 * Red Corrugated Metal Polished (2)

Searching for 'polished oak'
 * Red Oak Polished (2)
 * Polished Dark Oak (2)

Searching for 'apricot'
 x nothing found!

The script seems to meet most of your criteria.
I’ve forgotten to sort the best matches alphabetically, but that should be easy to add.
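
As a rough idea of how that sorting could be added, here is a minimal sketch. The helper name rank_names and the order of the tie-breakers (number of matched words, then position of the first matched word, then number of words in the name, then alphabetical) are assumptions of mine, chosen to reproduce the orderings described in the question. It uses plain substring checks rather than the regex above, and it returns every match rather than only the top-rated ones.

```python
def rank_names(names, search):
    """Return the names matching any search word, best match first.

    Hypothetical helper: the tie-break order is an assumption, not an
    established "best match" algorithm.
    """
    terms = search.lower().split()
    ranked = []
    for name in names:
        lowered = name.lower()
        # Start position of each search word found in the name, in query order
        positions = [lowered.find(t) for t in terms if t in lowered]
        if not positions:
            continue
        key = (
            -len(positions),       # more matched search words first
            positions[0],          # first matched word closer to the start
            len(lowered.split()),  # names with fewer words first
            lowered,               # alphabetical tie-breaker
        )
        ranked.append((key, name))
    ranked.sort(key=lambda pair: pair[0])
    return [name for _, name in ranked]


test_data = [
    "Oak redwood",
    "Red Oak",
    "Red Oak Polished",
    "Polished Dark Oak",
    "Red Tile Polished",
    "Red Metal Polished",
    "Red Corrugated Metal Polished",
]

for search in ["oak", "red polished", "polished oak", "apricot"]:
    print("Searching for '{}'".format(search))
    for name in rank_names(test_data, search):
        print(" * {}".format(name))
```

Because the key is a tuple, Python compares it element by element, so each later entry only matters when all earlier ones are tied. To cap the number of results, slice the returned list, e.g. rank_names(test_data, "oak")[:3].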

Amazing @diff-arch, thank you! This appears to do it, and yes, I will add the alphabetical sorting and post back here.

Thanks again!

You’re welcome!

Keep in mind that you still might want to refine the regex pattern.
My suggestion seems to work, but there may be some edge cases that you also want to handle.

The pattern that the code above builds looks like this: (search1)|(search2)|(...)|(searchX), e.g. (red)|(polished) or (oak).

This simply means “match group (search1) or (search2) or (...) or (searchX)”.
What’s between the parentheses is a group to search for and the pipe character (i.e. vertical bar) means “or”, as in this group or that group.
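
To make the grouping concrete, here is a small sketch of two possible refinements. The helper names build_pattern and count_distinct_hits are made up for illustration. The first escapes each search word with re.escape so that characters such as "+" or "(" are matched literally. The second counts how many different groups matched, using the match object's lastindex attribute, so a word that occurs twice in a name is only counted once.

```python
import re


def build_pattern(search):
    """Build "(word1)|(word2)|..." from the search text (hypothetical helper)."""
    # Escape each word so regex metacharacters are treated as plain text
    terms = [re.escape(t) for t in search.lower().split()]
    return "|".join("({})".format(t) for t in terms)


def count_distinct_hits(pattern, name):
    """Count how many different groups of the pattern occur in the name."""
    # lastindex is the number of the group that matched, so the set holds
    # one entry per distinct search word, however often that word repeats
    return len({m.lastindex for m in re.finditer(pattern, name.lower())})


pattern = build_pattern("red polished")
print(pattern)
print(count_distinct_hits(pattern, "Red Oak Polished"))
print(count_distinct_hits(build_pattern("red"), "Red redwood"))
```

Note that this sketch assumes a non-empty search text; an empty pattern matches everywhere and would need its own check.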

regexr.com is a neat website to learn about and test regex patterns.
re is an invaluable tool to know. It’s so handy when dealing with strings, and I think it is rather efficient in a lot of cases.

Anyway, have a nice Sunday!

Thank you! I’ve been refining the pattern with groups as you mention and getting really close to the result I’m after.

Much appreciated! Regex is definitely powerful and I had forgotten I had used it before in some other similar scripts. It’s a bit cryptic to me syntactically but the website you shared certainly helps!

Thank you for your time and detailed response, have a great Sunday!
