Python - Struggling With Data Trees/Iterating Nested Lists

Hello,

I’ve found quite a few topics on datatrees aka lists of lists in Python and stumbled across the tree helper functions which are great!

However, I’m stuck on understanding how to loop within a tree in a basic sense…

Here are the example files I’m working in and the Python script I have thus far:

Currently my component outputs (B) as a data tree how I want it, my next step is to loop through (B) and add additional functionality to the script at those locations based on certain criteria.

If I return (B) as a list instead of a data tree, the data isn’t nested how I need it (see A), but I think this is okay because when I print this list in Python, it appears to contain nested lists as expected (like the data tree).

How do I iterate through the list in the list though? I tried a for loop nested inside another for loop, but that (perhaps user error) didn’t seem to return the results I expected with the if and elif statements (see the definition of unit_check(B)).

Is the key to success working only with data trees, or am I just missing the fundamentals of iterating and appending nested lists?
Perhaps I’m overthinking all of this but I feel like this is my biggest hurdle to moving forward in GH Python.
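
For reference, a list of lists like (B) can be walked with two nested loops: the outer loop visits each branch and the inner loop visits each item in that branch. The sketch below uses made-up sample data and a hypothetical `classify` helper (neither is from the script above). Appending one result list per branch keeps the output nested the same way as the input, so it should convert back to a tree with matching branches.

```python
# Hypothetical sample data shaped like the output of split_string(V)
branches = [["15", "1", "5/8"], ["12'", "3", '3/4"'], ["15"]]

def classify(item):
    """Made-up rule for illustration: label an item by what it contains."""
    if "/" in item:
        return "fraction"
    if item.endswith(("'", '"')):
        return "has unit mark"
    return "plain number"

def label_branches(nested):
    """Return a list of lists with one label per item, nested like the input."""
    results = []
    for branch_index, branch in enumerate(nested):  # outer loop: one branch at a time
        branch_results = []
        for item_index, item in enumerate(branch):  # inner loop: items in that branch
            branch_results.append((branch_index, item_index, classify(item)))
        results.append(branch_results)  # one list per branch keeps the nesting
    return results

labels = label_branches(branches)
```

Using `enumerate` gives the position of each item directly, which avoids looking it up again with `list.index` (that returns the first match only, so repeated values in a branch would get the wrong position).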

Thank you all for any help and pointers!

Graph Space:

Code Thus Far:

Functions Remaining (pseudocode, shared to help explain my intent):

#Get_Frac_Values
If “/” in List
Get Item before and after “/” and divide them by one another to get the fraction decimal value as a single number
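
A minimal sketch of that fraction step, assuming the item is a simple "numerator/denominator" string. The function name `fraction_to_float` is hypothetical, and anything that isn't a clean fraction returns None rather than raising:

```python
def fraction_to_float(text):
    """Convert a string like "3/4" to 0.75; return None if it is not a simple fraction."""
    parts = text.split("/")
    if len(parts) != 2:
        return None  # no slash, or more than one slash
    try:
        return float(parts[0]) / float(parts[1])
    except (ValueError, ZeroDivisionError):
        return None  # non-numeric pieces or a zero denominator

# Summing a whole number and a fraction, as in 1 3/4
secondary_total = 1 + fraction_to_float("3/4")
```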

#Sum_Secondary_Values
Index 1 and 2 in each data tree branch of (B) are secondary values (think 1 3/4"), so I will sum 1 + .75 (the decimal equivalent).

#Convert_To_Document_Units
Since index 0 (Primary Value) and index 1+2 (Secondary Value) can be different units (think 3’ 7cm), I want to keep index 0 separate until I convert it to the document units, convert the secondary values to document units, and then sum the result of both converted values to finally return a single value.
Example: branch {0;3} [0. 35’, 1. 3, 2. 3/4cm] (35’ 3 3/4cm) (stupid value, I know…)
In this example I will see that 35 contains " ’ ", meaning it is unit system index 9 (feet), and index 1 and 2 contain “cm”, meaning it is unit system index 3 (centimeters). Therefore I will get the UnitScale of the systems against my model units and multiply the results, leaving me with all 3 items in this branch converted to feet; then I can sum the result.
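
The arithmetic for that example can be sketched outside Rhino like this. The `METERS_PER_UNIT` table and the `scale_factor` helper are assumptions standing in for rs.UnitScale so the snippet runs anywhere; only the five unit system indices mentioned in this thread are included.

```python
# Unit system indices named in this thread, mapped to meters per unit.
# Assumption: this table replaces rs.UnitScale so the sketch runs outside Rhino.
METERS_PER_UNIT = {2: 0.001, 3: 0.01, 4: 1.0, 8: 0.0254, 9: 0.3048}

def scale_factor(to_system, from_system):
    """Factor that converts a value in from_system into to_system."""
    return METERS_PER_UNIT[from_system] / METERS_PER_UNIT[to_system]

def to_document_units(values, unit_systems, document_system):
    """Convert each value to the document units and return the sum."""
    total = 0.0
    for value, system in zip(values, unit_systems):
        total += value * scale_factor(document_system, system)
    return total

# 35 ft + 3 cm + 0.75 cm, with the document set to feet (index 9)
result = to_document_units([35.0, 3.0, 0.75], [9, 3, 3], 9)
```

Here the 3.75 cm contributes roughly 0.123 ft, so the branch collapses to a single value a little over 35.12 ft.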

Naughty Strings To Parse (Some naughtier than others):

1515/8
15-1 5/8
1'-0"
15 1 5/8
15 .75
15 3/4
15 3
15
15" 6.75
15 6.75
15m 6.75mm
15m 6.75
15m6.75cm
15 6.75cm
15 3/4 cm
15 -3/4 cm
15ft6.75"
15ft 6.75"
15 ft 6.75"
12' 3 3/4
12' 3 3/4"
12ft 3 3/4"
12 feet 3 3/4cm
12 feet 3 3/4centimeters

Code Thus Far:

import rhinoscriptsyntax as rs
import scriptcontext as sc
import Rhino.RhinoDoc as rd

import re
import ghpythonlib.treehelpers as th



#Parse String & Return Number Values Converted To Document Units

#Get Document Units
sc.doc = rd.ActiveDoc
model_units = rs.UnitSystem()
model_units_name = rs.UnitSystemName(True,False)
ghenv.Component.Message = str(model_units_name)


#Split Input String With Spaces or Hyphens
def split_string(V):
    results = []
    for item in V:
        if "-" in item:
            values = re.split(r"[- ]", item)
            results.append(values)
        elif " " in item:
            values = re.split(r"\s+", item.strip())
            results.append(values)
        else:
            results.append([item])
    return results

#Result A As Data Tree
A = th.list_to_tree(split_string(V))
#Result B As Python List Of Lists
B = (split_string(V))

def unit_check(B):
    #Create New Blank Unit List As Ul
    Ul = []
    #Search Keys For Each Unit
    ft_list = ["'","ft","feet"]
    in_list = ['"',"in","inches"]
    mm_list = ["mm","millimeters"]
    cm_list = ["cm","centimeters"]
    m_list = ["m","meters"]
    
    #If Value Contains Keys, Return Unit System, Else Return Default Unit System
    #Outer Loop Visits Each Branch, Inner Loop Visits Each Item In That Branch
    for value in B:
        for index, i in enumerate(value):
            if any(key in i for key in ft_list):
                Ul.append(9)
            elif any(key in i for key in in_list):
                Ul.append(8)
            elif any(key in i for key in mm_list):
                Ul.append(2)
            elif any(key in i for key in cm_list):
                Ul.append(3)
            elif any(key in i for key in m_list):
                Ul.append(4)
            elif index == 0:
                #First Item In A Branch Defaults To The Model Units
                Ul.append(model_units)
            else:
                #Later Items Default To One Unit Size Smaller
                Ul.append(int(model_units)-1)
    return Ul

Vu = th.list_to_tree(unit_check(B))

Hi,

You can set your GHPython component input V to Item Access and thus treat the incoming data line by line, item by item, instead of as a list or tree. The component will now run 22 cycles, one for each item in your data list. Of course, you have to refactor your Python code to match this. For instance, your current split_string() function, which seems to accept a list of strings, should be rewritten to take in a single string and split it.
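
As a rough sketch of that refactor, a single-string version could look like the following. The name `split_item` is hypothetical, and unlike the original it splits on any run of spaces and hyphens in one pass and drops empty pieces:

```python
import re

def split_item(text):
    """Split one input string on spaces and/or hyphens, dropping empty pieces."""
    pieces = re.split(r"[-\s]+", text.strip())
    return [piece for piece in pieces if piece]

# With Item Access, the component would call this once per incoming string
parts = split_item("15-1 5/8")
```

Note that this treats every hyphen as a separator, so a leading minus sign would be discarded as well.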

You could make this a lot simpler by using regex for parsing everything. :wink:

Hi @diff-arch, thanks for the response! I was messing around with item access vs list access and ultimately it seemed like list access opened up the most doors.

So I’m actually trying to create a python version of a somewhat verbose GH script I was working on recently and someone on the forums set me up with regex that worked well for the front end string parsing:

So in that GH script I basically parsed the string for numbers, and if a string contained “inches”, let’s say, it would be converted from inches to model units (feet in my case).

So essentially I am trying to make a string parser in python that takes text inputs and converts them to the document units. Very similar to the “convert to document units” component in the human plugin but I’m trying to learn how to do this myself to learn python more and because I use this component ALL the time in my definitions but it is the only Human component I ever use.

Your help is much appreciated and any RegEx/parsing tips you could share I would love to glean off of, thank you!

Something else I found interesting is passing a number node after a text panel will actually solve strings by treating them as expressions in some cases:

I then began a quest of seeing if I could “force” strings to become numbers in Python without parsing but I quickly realized the reason this panel method works is probably a result of behind the scenes parsing/pattern matching haha so then I went back to the drawing board…

Just in case you missed it, the McNeel documentation covers the basics:

And if you want to dig a bit deeper, have a look here.


Here’s a simple example using regex and the aforementioned cycle processing of input V:

import re

SYNONYMS = [
    "ft", "feet", "'",
    "in", "inches", '"',
    "m", "meters",
    "cm", "centimeters",
    "mm", "millimeters"
]

if __name__ == "__main__":
    pattern = r"([-+]?\d+[\/\d.]*|\d)\s*({})?".format(
        "|".join(SYNONYMS)
    )
    # print pattern
    
    match = re.findall(pattern, V)
    print V, "->", match

Thanks as always for the information @AndersDeleuran I appreciate your insight!


Awesome that RegEx looks like exactly what I need to start off the logic. Thank you very much! I’ll explore and integrate


I’ve extended the code a bit with a Unit type for unit identification. It’s an idea I already had on Friday, but I had no time to implement it.

import itertools
import re


UNIT_DATA = [
    ("feet", "ft", "'"),
    ("inches", "in", '"'),
    ("meters", "m"), 
    ("centimeters", "cm"), 
    ("millimeters", "mm")
]


class Unit(object):
    # A tuple default avoids sharing one mutable list between instances
    def __init__(self, name, symbols=()):
        self.name = name.lower()
        self.symbols = [s.lower() for s in symbols]
        self.symbols.sort()
    
    def __eq__(self, other):
        if isinstance(other, Unit):
            return self.name == other.name and \
                   self.symbols == other.symbols
        elif isinstance(other, str):
            return self.name == other.lower() or \
                   other.lower() in self.symbols
        else:
            return False
        
    def __repr__(self):
        return "{} ({})".format(
            self.name.title(), ", ".join(self.symbols)
        )
    
    @property
    def info(self):
        return [self.name] + self.symbols


if __name__ == "__main__":
    units = [Unit(d[0], d[1:]) for d in UNIT_DATA]
    
    pattern = r"([-+]?\d+[\/\d.]*|\d)\s*({})?".format(
        "|".join(itertools.chain.from_iterable([u.info for u in units]))
    )
    #print pattern
    
    match = re.findall(pattern, V)
    #print V, "->", match
    
    print V
    
    for str_num, symbol in match:
        unit = None
        for u in units:
            if u == symbol:
                unit = u
        
        msg = "  * {}, {} -> {}"
        if unit is not None:
            print msg.format(str_num, symbol, unit)
        else:
            print msg.format(str_num, "''", "(unit undefined)")

At this point it might make sense to transition to list access for input V, but I kept it at item access since it’s way easier to debug like this.
Currently the units get re-instantiated at each cycle which could pose a performance issue, especially if you’re dealing with large data sets, but for testing this is fine.
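
One possible way around the re-instantiation, sketched under assumptions: inside Rhino, `scriptcontext.sticky` is a dictionary that persists between component runs, so an object can be built once and fetched on later cycles. The key name and the shortened `SYMBOLS` list below are made up for illustration, and the fallback to a plain dict is only there so the snippet runs outside Rhino. I haven't measured whether this makes a noticeable difference.

```python
import re

try:
    from scriptcontext import sticky  # persistent dict available inside Rhino/Grasshopper
except ImportError:
    sticky = {}  # plain dict so the sketch also runs outside Rhino

SYMBOLS = ["feet", "ft", "'", "inches", "in", '"']  # shortened list for illustration

def get_pattern():
    """Compile the regex on first use and reuse the stored object afterwards."""
    key = "unit_parser_pattern"  # hypothetical key name
    if key not in sticky:
        alternatives = "|".join(re.escape(s) for s in SYMBOLS)
        sticky[key] = re.compile(
            r"([-+]?\d+[\/\d.]*|\d)\s*({})?".format(alternatives)
        )
    return sticky[key]

matches = get_pattern().findall('12ft 3"')
```

The same idea would apply to the list of Unit objects: store it under its own key and only build it when the key is missing.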

Thank you! I’ll explore what you’ve come up with here and share my own explorations as well.

I modified the RegEx pattern a bit as I didn’t need the hyphen and it wasn’t properly working in strings with m and mm present in the same string. It would return m, m, m instead of m, mm.
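
For anyone following along, the m/mm mix-up comes from how regex alternation works: alternatives are tried left to right and the first one that matches wins, so "m" is accepted before "mm" is ever tried. One common remedy (a sketch, not necessarily the change I made) is to sort the synonyms longest first before joining them. The `build_pattern` name is hypothetical.

```python
import re

SYNONYMS = [
    "ft", "feet", "'",
    "in", "inches", '"',
    "m", "meters",
    "cm", "centimeters",
    "mm", "millimeters"
]

def build_pattern(synonyms):
    """Put longer synonyms first so "mm" and "meters" are tried before "m"."""
    ordered = sorted(synonyms, key=len, reverse=True)
    alternatives = "|".join(re.escape(s) for s in ordered)
    return r"([-+]?\d+[\/\d.]*|\d)\s*({})?".format(alternatives)

# Pattern built in the original list order, for comparison
naive = r"([-+]?\d+[\/\d.]*|\d)\s*({})?".format("|".join(SYNONYMS))

before = re.findall(naive, "15m 6.75mm")
after = re.findall(build_pattern(SYNONYMS), "15m 6.75mm")
```

With the original order, the second value comes back with "m" as its unit; with the sorted order it comes back with "mm".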

I’m also working on a function that returns an rs.UnitSystem index when the pattern gets a hit. More interestingly, if the primary value has no match, it returns the default model units, and if the secondary value has no match, it returns one “size” smaller than the primary unit.

So, for example, if you input the string “15m 32.5” it will assume you mean 15 meters, 32.5 centimeters, and if you input the string “23 6.75in” it will assume you mean 23 feet, 6.75 inches (with the model units set to feet).

I am using this function to ensure that all values have a unit system index. In the next function I will use this index to get a conversion factor via rs.UnitScale and return all the values converted to document units, finally summing any values as needed.

I think item access is okay, as this component will mostly be used with a single input. For example, if the user is creating a wall and wants it to be 10.5 ft tall, they can type 10’ - 6", 10.5, 10ft 6 1/2 in, and so on. I’m trying to make it easy for users to provide input in whatever form they are used to.

More importantly if they are more familiar with metric dimensions but working in a model set to feet it is nice to be able to type 750mm and have the script logic handle the conversion for you. (in my opinion)

Here’s my WIP code so far:

import rhinoscriptsyntax as rs
import re

#Get Document Units
model_units = rs.UnitSystem()
model_units_name = rs.UnitSystemName(True,False)
ghenv.Component.Message = "Model Units: " + str(model_units_name)

def Parse_String(V):
    # Unit Values To Search For
    SYNONYMS = [
        "ft", "feet", "'",
        "in", "inches", '"',
        "m", "meters",
        "cm", "centimeters",
        "mm", "millimeters"
    ]
    pattern = r"(\d*\.?\d+(?:[\/\d.]*)?)(?:\s*({})(?!\w))?".format("|".join(SYNONYMS))
    
    match = re.findall(pattern, V)
#    print V, "->", match
    return match

Ps = Parse_String(V)

def Get_Unit_System(Ps):
    Unit_Sys = {
        "ft": 9,
        "feet": 9,
        "'": 9,
        "in": 8,
        "inches": 8,
        '"': 8,
        "mm": 2,
        "millimeters": 2,
        "cm": 3,
        "centimeters": 3,
        "m": 4,
        "meters": 4,
        # Assign other values as needed
    }
    
    Us = []
    Prim_Val = None

    for index, sublist in enumerate(Ps):
        if index == 0:
            if sublist[1] in Unit_Sys:
                Prim_Val = Unit_Sys[sublist[1]]
                Us.append(Prim_Val)
            elif sublist[1] == '':
                Prim_Val = model_units
                Us.append(Prim_Val)
        else:
            if sublist[1] in Unit_Sys:
                Sec_Val = Unit_Sys[sublist[1]]
                Us.append(Sec_Val)
                if Prim_Val is None:
                    Prim_Val = Sec_Val + 1
            elif sublist[1] == '':
                if Prim_Val is not None:
                    Us.append(Prim_Val - 1)

    return Us

Us = Get_Unit_System(Ps)

Graph Space:

I appreciate your help @diff-arch, I’ll explore what you put together and report back

Okay I’ve added the additional functions and I think everything is working!
Now I need to peel through the code and look for ways to be more efficient

Graph Space:

Code:

import rhinoscriptsyntax as rs
from re import findall

#Get Document Units
model_units = rs.UnitSystem()
model_units_name = rs.UnitSystemName(True,False)
ghenv.Component.Message = "Model Units: " + str(model_units_name)



def Parse_String(V):
    # Unit Values To Search For
    SYNONYMS = [
        "ft", "feet", "'",
        "in", "inches", '"',
        "m", "meters",
        "cm", "centimeters",
        "mm", "millimeters"
    ]
    pattern = r"(\d*\.?\d+(?:[\/\d.]*)?)(?:\s*({})(?!\w))?".format("|".join(SYNONYMS))

    match = findall(pattern, V)
    #    print V, "->", match
    return match

#Ps = Parse_String(V)

def Get_Unit_System(Ps):
    Unit_Sys = {
        "ft": 9,
        "feet": 9,
        "'": 9,
        "in": 8,
        "inches": 8,
        '"': 8,
        "mm": 2,
        "millimeters": 2,
        "cm": 3,
        "centimeters": 3,
        "m": 4,
        "meters": 4,
        # Assign other values as needed
    }

    Us = []
    Prim_Val = None

    for index, sublist in enumerate(Ps):
        if index == 0:
            if sublist[1] in Unit_Sys:
                Prim_Val = Unit_Sys[sublist[1]]
                Us.append(Prim_Val)
            elif sublist[1] == '':
                Prim_Val = model_units
                Us.append(Prim_Val)
        else:
            if sublist[1] in Unit_Sys:
                Sec_Val = Unit_Sys[sublist[1]]
                Us.append(Sec_Val)
                if Prim_Val is None:
                    Prim_Val = Sec_Val + 1
            elif sublist[1] == '':
                if Prim_Val is not None:
                    Us.append(Prim_Val - 1)

    return Us

#Us = Get_Unit_System(Ps)

def Convert_Units(Ps):
    sub_con_vals = []  # Store the converted values for each sublist

    for sublist, unit_sys in zip(Ps, Us):
        sublist_con_vals = []  # Store the converted values within each sublist
        for value in sublist:
            # If Value Is Fraction, Return Value As Such
            if "/" in value:
                try:
                    # Unpack inside the try so malformed fractions are skipped too
                    numerator, denominator = value.split("/")
                    value = float(numerator) / float(denominator)
                except (ValueError, ZeroDivisionError):
                    value = None
            else:
                try:
                    value = float(value)
                except ValueError:
                    value = None

            if value is not None:
                # Calculate the scale factor for the current unit system
                Scale_Factor = rs.UnitScale(model_units, unit_sys)

                con_val = value * Scale_Factor
                sublist_con_vals.append(con_val)

        # Append the sum of sublist_con_vals to sub_con_vals
        sub_con_vals.append(sum(sublist_con_vals))

    # Calculate the final con_vals as the sum of sub_con_vals
    con_vals = sum(sub_con_vals)

    return con_vals

#V = Convert_Units(Ps)

#Check V
if V:
    Ps = Parse_String(V)
    Us = Get_Unit_System(Ps)
    V = Convert_Units(Ps)
else:
    V = None

Don’t you need the hyphen to catch the negative values in your example data?
You’re right. Nice find! This is because regex matches a character or character sequence, not really a word, and can be remedied like this, which should cover both cases, millimeters and meters:

([-+]?\d+[\/\d.]*|\d)\s*(m*)

In terms of imperial that makes sense, although I’m not really familiar with the system, but you often hear Americans on, say, YouTube talk about “this many feet and that fraction of inches”. In metric units, though, I wouldn’t write something like 15m (and) 32.5cm, but rather 15.325m or 1532.5cm. I don’t deconstruct measurements like that.

Yes, that’s a neat idea.

Where do the unit system indices stem from? Did you define these?


Here are two functions I wrote a while back to verify whether a string value can be cast to either float or int, which might come in handy:

def is_integer_num(val):
    """Returns True if a value is an integer number, otherwise False."""
    try:
        float(val)
    except ValueError:
        return False
    else:
        return float(val).is_integer()

def is_num(val):
    """Returns True if a value is a valid number, otherwise False."""
    try:
        float(val)
    except ValueError:
        return False
    return True

Feel free to use or modify them.


-Yes, you are correct, anyone using metric would of course write 1503mm or 15.03m. I just wanted to cover cases where someone doesn’t do that, but mostly it’s intended to help with imperial values. Thanks for the updated RegEx, I’ll take a look at that!

Oftentimes in the US we will write dimensions as 15’ - 2 5/8". I guess the hyphen is just to denote the stepping down of the unit size to inches for legibility?

-So the hyphen is not needed and with this component I don’t want to allow negative values anyways because operations downstream will have logic to “flip” the direction of things by inverting/using a negative value. So I want to avoid people getting confused with double negatives or something like that. (Maybe I’ll reconsider later after testing)
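
A small sketch of what ignoring the hyphen could look like as a pre-processing step, assuming every hyphen is a separator and never a minus sign. The function name `normalize_separators` is hypothetical:

```python
import re

def normalize_separators(text):
    """Replace each hyphen (and any spaces around it) with a single space."""
    return re.sub(r"\s*-\s*", " ", text).strip()

cleaned = normalize_separators("15' - 2 5/8\"")
```

A side effect of this choice is that a leading minus sign is dropped too, which fits the decision above not to allow negative values.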

-Yes, the unit system indices are derived from RhinoScriptSyntax.UnitSystem(); this function has indices for all possible Rhino unit systems.

-I only go as far as searching against feet, inches, meters, cm, mm but you could expand it to more exotic values such as “Astronomical Units”. Who doesn’t want to build a chair in AU? :laughing:

Thank you!


Oh, I see. I read that as a negative 2. :wink: My bad!

OK. I haven’t played with that so far.

:rofl:

You’re welcome.

To use the functions above in a workflow, just to be clear, you would first call is_num to see if the string is a number at all. Then, if you still want to distinguish between integers and floats, you can additionally call is_integer_num to check whether it’s an integer; if it’s not, it must be a float.
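
That order of checks can be sketched as follows. The two helper functions are repeated from the post above so the snippet runs on its own; the `cast_number` wrapper is a hypothetical name added for illustration.

```python
def is_integer_num(val):
    """Returns True if a value is an integer number, otherwise False."""
    try:
        float(val)
    except ValueError:
        return False
    else:
        return float(val).is_integer()

def is_num(val):
    """Returns True if a value is a valid number, otherwise False."""
    try:
        float(val)
    except ValueError:
        return False
    return True

def cast_number(text):
    """Return an int or float for numeric strings, or None for anything else."""
    if not is_num(text):
        return None  # not a number at all, e.g. "3/4" or "12ft"
    if is_integer_num(text):
        return int(float(text))  # whole numbers such as "15" or "15.0"
    return float(text)

whole = cast_number("15")
decimal = cast_number("6.75")
rejected = cast_number("3/4")
```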

Just when I thought I wouldn’t need the is_num function (at least for this script) I came across an issue when feeding a slider value into the component and realized I’ve been so focused on strings I forgot to allow it to pass a simple number value! :upside_down_face:
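
A rough sketch of one way to handle that, assuming the component should pass plain numbers straight through and only send text to the parser. The `read_input` function and its `parse_text` argument are hypothetical names, and the stand-in parser exists only to make the snippet runnable:

```python
def read_input(value, parse_text):
    """Pass plain numbers through; send everything else to the string parser."""
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if is_number:
        return float(value)  # already a number, assumed to be in model units
    return parse_text(str(value))

def stand_in_parser(text):
    """Placeholder for the real parsing function (assumption for this sketch)."""
    return float(text)

from_slider = read_input(10.5, stand_in_parser)
from_panel = read_input("7", stand_in_parser)
```

Related to this, a check like `if V:` treats a slider value of 0 as empty, so testing `V is not None` is the safer way to detect a missing input.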
