Can’t find an answer to this one anywhere, hoping someone can help me here.
I’m using a RegEx pattern as the input to the Match Text component to parse some string values against a list of possible keys. In this case, I am checking my input text against symbols/abbreviations of units of measure (imperial/metric).
Input examples:
1’-3.75"
1’ 3 3/4"
10m 3.5mm (I know, I know, this could just be 10.0035m; please bear with me)
Patterns to check against:
’
"
ft
in
m
cm
mm
feet
inches
meters
centimeters
millimeters
It works great for most of the strings I have tested. For example, if I have a value like 1’ 3", the symbols are distinct, so it’s an easy true/false match.
However, when I try to match a value like 1’ 3m (1 foot, 3 meters), it returns too many options, because the Match Text component returns m, cm, meters, centimeters and any other value description that contains the letter m.
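A minimal Python sketch of why this over-matches, assuming the extracted unit text is used as the pattern and each key in the list is the text being searched. A bare regex like `m` succeeds anywhere the letter appears, so every key containing "m" comes back. The function name `naive_matches` is hypothetical.

```python
import re

# Unit keys from the question, in their original order.
KEYS = ["’", '"', "ft", "in", "m", "cm", "mm",
        "feet", "inches", "meters", "centimeters", "millimeters"]

def naive_matches(unit_text):
    """Treat unit_text as a regex and return every key it is found inside."""
    return [key for key in KEYS if re.search(unit_text, key)]

print(naive_matches("m"))
# → ['m', 'cm', 'mm', 'meters', 'centimeters', 'millimeters']
```

The same thing happens for longer words: `naive_matches("meters")` also returns centimeters and millimeters, because nothing in the pattern says where a unit must start or stop.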
Is there a RegEx syntax or another method that will return only the exact match? For example, m should return only m, not cm, even though cm contains m; and centimeters should not return meters, even though it contains the word meters.
Thanks @magicteddy, I’m going to give this a go. Actually, I’m not too concerned about the order of mm/m, because I can use a Sort List component and compare the index values of where those fall in the original list so that they get organized correctly, if that makes sense.
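The reordering idea can be sketched in Python as sorting the matched units by their index in the original key list. This is only an analogy for the Sort List approach described above, not the Grasshopper component itself; `order_like_keys` is a hypothetical name.

```python
# Unit keys in the order of the original pattern list.
KEYS = ["’", '"', "ft", "in", "m", "cm", "mm",
        "feet", "inches", "meters", "centimeters", "millimeters"]

def order_like_keys(found):
    """Reorder matched units by the position each one has in KEYS."""
    return sorted(found, key=KEYS.index)

print(order_like_keys(["mm", "m"]))  # → ['m', 'mm']
```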
You need to delimit your pattern values as words. You can do this by preceding and following each of them with a set of characters that includes a space, the digits 0 to 9 and perhaps the minus sign. You then also need to append a space to the end of the input string so that the last value has a final delimiter.
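Here is a minimal Python sketch of that idea, using Python's `re` module rather than Match Text. Each key is wrapped in the delimiter set `[0-9 -]` on both sides, and a space is appended to the input before searching. The function name `matching_keys` is hypothetical.

```python
import re

# Unit keys from the question, in their original order.
KEYS = ["’", '"', "ft", "in", "m", "cm", "mm",
        "feet", "inches", "meters", "centimeters", "millimeters"]

# Delimiter set: any digit, a space, or a minus sign.
DELIM = "[0-9 -]"

def matching_keys(text):
    """Return the keys that appear in text as whole 'words'."""
    padded = text + " "  # trailing space so a final unit still has a delimiter after it
    hits = []
    for key in KEYS:
        pattern = DELIM + re.escape(key) + DELIM
        if re.search(pattern, padded):
            hits.append(key)
    return hits

print(matching_keys("1’ 3m"))      # → ['’', 'm']
print(matching_keys("10m 3.5mm"))  # → ['m', 'mm']
```

In "10m 3.5mm", the `m` pattern only matches "0m " because the "m" in "5mm" is followed by another "m", which is not in the delimiter set.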
Yes. In regex, if you have a position where any one of several characters is acceptable, you put the alternatives between square brackets. 0-9 is shorthand for any digit in the range from 0 to 9. The space means a space is acceptable, and the final - represents a literal minus sign (as opposed to a range) because it sits at the very beginning or end of the list.
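A small Python check of which single characters the set `[0-9 -]` accepts; `is_delimiter` is a hypothetical helper name.

```python
import re

# One character that is a digit, a space, or a literal minus sign.
# The "-" is literal because it is the last character in the set.
DELIM = re.compile(r"[0-9 -]")

def is_delimiter(ch):
    """True if ch is exactly one character from the delimiter set."""
    return DELIM.fullmatch(ch) is not None

for ch in ["7", " ", "-", "m", "."]:
    print(repr(ch), is_delimiter(ch))
```

The decimal point is not in the set, but "3.5mm" still works because the character directly before the unit is the digit "5".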
To demonstrate the power of Regex, here’s a variant that groups the synonymous patterns and outputs a preferred term regardless of which synonym is found. In this case, that’s the short alpha form (ft, cm, etc.).
You can extend this by adding synonyms separated by vertical bars (the regex syntax for alternatives). The first synonym on each row will be the preferred term. Each set of patterns is wrapped in round brackets and given word delimiters, this time using lookbehind and lookahead syntax, which checks the neighbouring characters without including them in the match (and makes the intent clearer to a regex practitioner).
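A rough Python equivalent of the grouped-synonym approach, assuming a synonym table like the one described: one row per unit, synonyms separated by "|", preferred term first. The table contents, `TEMPLATE` and `preferred_units` are hypothetical, and sorting the results by position is an added choice so the units come out in reading order.

```python
import re

# Hypothetical synonym table. Each row is used as raw regex alternation;
# the first synonym on the row is the preferred term that gets reported.
ROWS = [
    "ft|’|feet",
    'in|"|inches',
    "m|meters",
    "cm|centimeters",
    "mm|millimeters",
]

# Lookbehind/lookahead: the unit must sit between a digit, space or minus,
# but those neighbouring characters are not part of the match itself.
TEMPLATE = r"(?<=[0-9 -])({})(?=[0-9 -])"

PATTERNS = [(row.split("|")[0], re.compile(TEMPLATE.format(row))) for row in ROWS]

def preferred_units(text):
    """Return the preferred term for every unit found, in reading order."""
    padded = text + " "  # trailing space gives the last unit a delimiter
    found = []
    for preferred, pattern in PATTERNS:
        for match in pattern.finditer(padded):
            found.append((match.start(), preferred))
    return [unit for _, unit in sorted(found)]

print(preferred_units("1’ 3m"))            # → ['ft', 'm']
print(preferred_units("1 feet 3 inches"))  # → ['ft', 'in']
```

Within a row, if "in" is tried first on "inches", the lookahead fails on the "c" and the regex engine moves on to the "inches" alternative, so the order of synonyms on a row does not cause partial-word matches.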
Incredible! This makes the initial data even easier to work with, and since I’ll be calling downstream functions that handle unit conversion, it’s great to reduce the output to the abbreviated list.
As a next step, I created some logic that fills in missing unit values.
For example, if a string is 15’ 3, it assumes the 3 is the next unit “down” and parses it as inches. If a string reads 15 3mm, it assumes the 15 is cm. I guess if you intended meters that could be odd, but I figure that’s the user’s fault, and they should update the string to 15m 3mm (or 15.003m) if that’s what they want.
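The script itself isn’t shared here, so this is only a hypothetical Python sketch of the two rules described: a value after a known unit takes the next smaller unit, and a value before a known unit takes the next larger one. It assumes the string has already been split into (value, unit) pairs with units normalized to the short forms; `LADDERS`, `neighbour` and `infill` are illustrative names.

```python
# Hypothetical unit "ladders", largest to smallest, within each system.
LADDERS = [["ft", "in"], ["m", "cm", "mm"]]

def neighbour(unit, step):
    """Return the unit `step` rungs away (+1 = smaller, -1 = larger), or None."""
    for ladder in LADDERS:
        if unit in ladder:
            i = ladder.index(unit) + step
            if 0 <= i < len(ladder):
                return ladder[i]
    return None

def infill(tokens):
    """Fill missing units in a list of (value, unit or None) pairs."""
    filled = list(tokens)
    for i, (value, unit) in enumerate(filled):
        if unit is not None:
            continue
        if i > 0 and filled[i - 1][1] is not None:
            # After a known unit: assume the next unit "down".
            unit = neighbour(filled[i - 1][1], +1)
        elif i + 1 < len(filled) and filled[i + 1][1] is not None:
            # Before a known unit: assume the next unit "up".
            unit = neighbour(filled[i + 1][1], -1)
        filled[i] = (value, unit)
    return filled

print(infill([("15", "ft"), ("3", None)]))  # → [('15', 'ft'), ('3', 'in')]
print(infill([("15", None), ("3", "mm")]))  # → [('15', 'cm'), ('3', 'mm')]
```

If no smaller or larger unit exists (for example, a bare value after inches), the sketch leaves the unit as None rather than guessing.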
I’ll share the updated script when I finalize the next part of it.
I really appreciate your help and everyone else’s. This has been really informative.