Match Text - Returning Too Many Matches

Hello,

I can’t find an answer to this one anywhere, so I’m hoping someone can help me here.

I’m using the RegEx pattern input on the Match Text component to parse some string values based on a list of possible keys. In this case I am checking my input text against symbols/abbreviations of units of measure (imperial/metric).

Input examples:
1’-3.75"
1’ 3 3/4"
10m 3.5mm (I know I know, this can just be 10.035m, please bear with me)

Pattern to check against:

"
ft
in
m
cm
mm
feet
inches
meters
centimeters
millimeters

It works great for most of the strings I have tested. For example, if I have a value like 1’ 3", the symbols are distinct, so it’s an easy true/false match.

However, when I try to match a value like 1’ 3m (1 foot, 3 meters), it returns too many options, because the Match Text component returns m, cm, meters, centimeters and any other unit description that contains the letter m.

Is there a RegEx syntax or another method that will return only the closest match? For example, m should return m only, not cm, even though cm contains m. Likewise, centimeters should not return meters, even though it contains the word meters.

I hope that makes sense?

Graph space:

Example showing too many values returned:

Please see attached GH script for your reference.

Thank you for any and all leads!

20230623_Match_Text_Too_Many_Results_Question_01a.gh (11.8 KB)

Here is one way.

Remove all empty spaces from your strings to be tested.
Then your regex should express: “a unit preceded by a digit between 0 and 9.”

Hmmm… Still not very robust if you have to test mm and m in the same string. Right now it only works if m is before mm in the test list.
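
For anyone reading without the file open, the idea above might look roughly like this in Python. It is only a sketch, not the contents of the attached definition: the function name `units_after_digit`, the straight-quote symbols and the shortened unit list are assumptions made for illustration.

```python
import re

# Hypothetical, shortened unit list (straight quotes stand in for the foot/inch marks).
units = ["'", '"', "ft", "in", "m", "cm", "mm"]

def units_after_digit(text, units):
    """Return the units that appear directly after a digit once spaces are removed."""
    compact = text.replace(" ", "")
    # re.escape keeps any symbol in a unit from being read as regex syntax.
    return [u for u in units if re.search("[0-9]" + re.escape(u), compact)]

print(units_after_digit("1' 3m", units))  # cm is no longer reported
print(units_after_digit("3.5mm", units))  # m is still reported, since "5m" is a digit followed by m
```

The second call shows the weakness mentioned above: requiring a digit in front rules out cm for 3m, but m still matches the start of mm.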

20230623_Match_Text_Too_Many_Results_Question_01a.gh (17.7 KB)


Thanks, @magicteddy, I’m going to give this a go. Actually, I’m not too concerned about the order of mm/m: I can use a Sort List component and compare the index values of where those fall in the original list, so that they get organized correctly, if that makes sense.

I’ll check back in shortly,

Thanks!

Hi @michaelvollrath,

You need to delimit your pattern values as words. You can do this by preceding and following each of them with sets of values including space, digits 0 to 9 and maybe the minus sign. You then also need to append a space to the end of the input string to ensure you have a final delimiter.
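
As a rough illustration of the delimiting described above (a sketch in Python, not the contents of the attached file; the delimiter set `[0-9 -]`, the function name and the unit list are assumptions):

```python
import re

# Assumed delimiter set: any digit, a space, or a minus sign.
DELIM = "[0-9 -]"

# Hypothetical unit list (straight quotes stand in for the foot/inch marks).
units = ["'", '"', "ft", "in", "m", "cm", "mm", "meters", "centimeters"]

def delimited_units(text, units):
    """Return the units that appear with a delimiter on both sides."""
    padded = text + " "  # trailing space so the last unit has a closing delimiter
    return [u for u in units
            if re.search(DELIM + re.escape(u) + DELIM, padded)]

print(delimited_units("1' 3m", units))          # m only, not cm or meters
print(delimited_units("3.5mm", units))          # mm only, not m
print(delimited_units("2 centimeters", units))  # centimeters only, not meters
```

Because the character on each side of the unit must be a delimiter, m can no longer match inside cm, mm or meters.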

20230623_Match_Text_Too_Many_Results_Question_01a_J.gh (23.0 KB)

HTH
Jeremy

Thank you for your response, @jeremy5. I’ll give this a go. What does the 0-9 achieve/mean? Is it a RegEx formatting thing?

Yes, in Regex if you have a position where a choice of values is acceptable you put the alternatives between square brackets. 0-9 is shorthand for any digit in the range from 0 to 9. The space means a space is acceptable and the final - represents a minus sign (as opposed to a range) by dint of being at the very beginning or end of the list.
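
A quick way to see this for yourself, sketched in Python (Grasshopper itself uses .NET regex, but bracketed sets behave the same way for this case; the variable names are just for illustration):

```python
import re

# Digits, a space, and a literal minus sign (the - comes last, so it is not a range).
delimiter = re.compile(r"[0-9 -]")

for ch in ["7", " ", "-", "m", "."]:
    print(repr(ch), bool(delimiter.fullmatch(ch)))  # True for the first three, False for m and .

# With the - between two characters it becomes a range instead:
# [ -9] means every character from space up to 9, which includes . and ' and ".
as_range = re.compile(r"[ -9]")
print(bool(as_range.fullmatch(".")))  # True
```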

For more info on Regex, see the great resource at Regular-Expressions.info (Regex Tutorial, Examples and Reference).

Regards
Jeremy


Very helpful, thank you for the additional information!


Here is the same thing with C#:

20230623_Match_Text_Too_Many_Results_Question_01a_c.gh (13.9 KB)


Hi Michael,

To demonstrate the power of Regex, here’s a variant which groups the synonymous patterns and outputs a preferred term regardless of which synonym is found. In this case that’s the short alpha form (ft, cm, etc.).

20230623_Match_Text_Too_Many_Results_Question_01a_J2.gh (25.8 KB)

You can extend this by adding synonyms separated by vertical bars (these being the regex for alternatives). The first synonym on each row will be the preferred term. The sets of patterns have brackets added and word delimiters, this time using a lookahead and a lookbehind syntax (which makes the intent clearer to a regex practitioner).
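
For reference, the grouping idea could be sketched in Python roughly as follows. This is not the contents of the attached file: the synonym rows, the delimiter set `[0-9 -]` and the function name are assumptions, and the lookahead here also accepts the end of the string, so no trailing space is needed.

```python
import re

# Hypothetical synonym rows: preferred term first, alternatives separated by |.
ROWS = ["ft|feet|'", 'in|inches|"', "m|meters", "cm|centimeters", "mm|millimeters"]

def preferred_units(text, rows=ROWS):
    """Return the preferred term of every row that has a synonym in the text."""
    found = []
    for row in rows:
        synonyms = row.split("|")
        alternation = "|".join(re.escape(s) for s in synonyms)
        # Lookbehind: a delimiter must come before the unit.
        # Lookahead: a delimiter or the end of the string must come after it.
        pattern = r"(?<=[0-9 -])(?:" + alternation + r")(?=[0-9 -]|$)"
        if re.search(pattern, text):
            found.append(synonyms[0])
    return found

print(preferred_units("1 feet 3 inches"))  # ft and in
print(preferred_units("10m 3.5mm"))        # m and mm
print(preferred_units("2 centimeters"))    # cm only
```

The lookarounds check the neighbouring characters without consuming them, which is why they read as "word delimiters" rather than as part of the match.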

Regards
Jeremy


Incredible! This makes it even better to work with the initial data, and since I’ll be calling functions downstream that handle unit conversion, it’s great to reduce the output to the abbreviated list.

I created some logic as a next step to this that will infill missing unit values.

For example, if a string is 15’ 3, it assumes the 3 is the next unit “down” and parses it as inches. If a string reads 15 3mm, it assumes the 15 is cm. I guess that could be odd if you were intending meters, but I figure that’s the user’s fault, and they should update the string to 15m 3mm or 15.003m if that’s what they want.
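
In Python terms, the infill rule is roughly the sketch below. It is not the Grasshopper logic itself: it assumes the values have already been split into (value, unit) pairs with None for a missing unit, and the names `LADDERS`, `step` and `infill_units` are made up for this illustration.

```python
# Units ordered from largest to smallest within each system (assumed ordering).
LADDERS = [["ft", "in"], ["m", "cm", "mm"]]

def step(unit, offset):
    """Return the unit `offset` places further down (+) or up (-) its ladder, if any."""
    for ladder in LADDERS:
        if unit in ladder:
            i = ladder.index(unit) + offset
            if 0 <= i < len(ladder):
                return ladder[i]
    return None

def infill_units(pairs):
    """Fill missing units from the neighbouring value: one step down from the
    value before it, or one step up from the value after it."""
    units = [u for _, u in pairs]
    for i, u in enumerate(units):
        if u is not None:
            continue
        if i > 0 and units[i - 1] is not None:
            units[i] = step(units[i - 1], +1)
        elif i + 1 < len(units) and units[i + 1] is not None:
            units[i] = step(units[i + 1], -1)
    return [(value, unit) for (value, _), unit in zip(pairs, units)]

print(infill_units([("15", "ft"), ("3", None)]))  # the 3 becomes inches
print(infill_units([("15", None), ("3", "mm")]))  # the 15 becomes cm
```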

I’ll share the updated script when I finalize the next part of it.

I really appreciate your help and everyone else’s. This has been really informative.

Much appreciated, @anon39580149! This works well and is a great insight into the coded version of the logic. Thanks for your contribution!