I read a text file line by line with a GHPython script. What I get is a list of strings, one string per line. I now need to parse each string into more manageable data (i.e. strings, integers) that I can use to place geometry and annotate it.
The strings look like this:
- “the description (number)” (e.g. “door (0)”)
- “the description (number|number|number)” (e.g. “window (1|22|4)”)
- “the description (number|number|number|number)” (e.g. “toilet (2|6|5|10)”)
The description refers to the geometry type, the first integer to the floor number, and the second through last integers to the rooms in which one of the described objects is to be placed.
And no, the structuring of the data in the text file was not my idea!
Now what I want is a list of split/parsed strings for each line from the text file that I can process further, for instance:
- “window (1|22|4)” -> [ “window”, “1”, “22”, “4” ]
I guess regular expressions are the best fit to accomplish this, and I have already managed to come up with this:
(.+)\s+\((\d+)\), which perfectly matches [ “door”, “0” ] for “door (0)”
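For reference, a minimal sketch of the simple case with Python's `re` module. It assumes the opening parenthesis in the pattern is escaped, i.e. `(.+)\s+\((\d+)\)`; the variable names are only illustrative.

```python
import re

# Assumed form of the simple pattern: both literal parentheses are escaped.
simple_pattern = re.compile(r"(.+)\s+\((\d+)\)")

match = simple_pattern.match("door (0)")
# groups() returns the two captured parts: the description and the number.
result = list(match.groups())
print(result)  # ['door', '0']

# The same pattern does not match the extended form at all.
extended = simple_pattern.match("window (1|22|4)")
print(extended)  # None
```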
However, some items have more data to parse:
(.+)\s((\d+)+\|\), which matches only [ “window”, “1” ] for “window (1|22|4)”
How can I repeat the pattern matching for the part
(\d+)+\| (i.e. “1|”) up to the closing parenthesis for an undefined number of repetitions of this pattern? The last item to match would be an integer, which could be caught separately with
Also, is there a way to match either the simple or the extended case with a single regular expression?
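One possible approach, sketched here rather than offered as the definitive answer: Python's `re` keeps only the last match of a repeated capture group, so instead of capturing each number separately, the whole content of the parentheses can be captured as one group and then split on `|`. The pattern and the helper name `parse_line` are assumptions made for illustration, and the sketch assumes every line ends with the closing parenthesis.

```python
import re

# Assumed pattern: a description, optional whitespace, then one or more
# integers separated by "|" inside a single pair of parentheses.
LINE_PATTERN = re.compile(r"^(.+?)\s*\((\d+(?:\|\d+)*)\)$")

def parse_line(line):
    """Return [description, number, ...] as strings, or None if no match."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    description, numbers = match.groups()
    # Split the captured "1|22|4" part into its individual numbers.
    return [description] + numbers.split("|")

for sample in ["door (0)", "window (1|22|4)", "toilet (2|6|5|10)"]:
    print(parse_line(sample))
```

The same pattern handles both the single-number and the multi-number form, because `(?:\|\d+)*` allows zero or more additional numbers. The results stay strings; converting them with `int()` would be a separate step.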