Hello,
Is there a GH component that can check the syntax of a text string?
Let me explain:
I have strings such as WA_STR_cls_01 or SL_NST_wod_05, which are composed of 2 capital letters, underscore, 3 capital letters, underscore, 3 lowercase letters, underscore, 2 digits. Is there a component that checks whether a string's syntax matches a reference pattern like AB_ABC_abc_00?
Thank you for any suggestions
you can Regex that… I’m not good at all with Regex but I guess you could do something like this for AB_ABC_abc_00
[what character] {how many times if more than 1}
\b = word boundary (here: word start)
AB = [A-Z]{2} = I want two uppercase letters A to Z
_ = [_] = I want one underscore
ABC = [A-Z]{3} = I want three uppercase letters A to Z
_ = [_]
abc = [a-z]{3} = I want three lowercase letters a to z
_ = [_]
00 = [0-9]{2} = I want two digits 0 to 9
\b = word boundary (here: word end)
final Regex = \b[A-Z]{2}[_][A-Z]{3}[_][a-z]{3}[_][0-9]{2}\b
Regex.gh (8.9 KB)
In the file, the first two lines are the correct ones; all the other lines are wrong variations of the second one.
Maybe you can shorten/simplify that; some Regex Shoguns in the forum will come to your rescue.
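For anyone who wants to try the pattern outside the .gh file, here is a minimal Python sketch that assembles it piece by piece, following the breakdown above. The names `parts`, `pattern` and `looks_valid` are made up for illustration, and the three invalid samples are my own variations, not the ones in the attachment.

```python
import re

# Assemble the pattern from the pieces described above.
parts = [
    r"\b",        # word boundary
    r"[A-Z]{2}",  # two uppercase letters
    r"[_]",       # one underscore
    r"[A-Z]{3}",  # three uppercase letters
    r"[_]",
    r"[a-z]{3}",  # three lowercase letters
    r"[_]",
    r"[0-9]{2}",  # two digits
    r"\b",        # word boundary
]
pattern = re.compile("".join(parts))


def looks_valid(text):
    """Return True if the pattern is found somewhere in text."""
    return pattern.search(text) is not None


samples = [
    "WA_STR_cls_01",  # from the question
    "SL_NST_wod_05",  # from the question
    "WA_STR_CLS_01",  # uppercase where lowercase is expected
    "W_STR_cls_01",   # first block too short
    "WA_STR_cls_1",   # only one digit
]
for s in samples:
    print(s, looks_valid(s))
```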
That’s elementary via C# (and the char methods). Let me know if you want an example.
Obviously, in the general case you should compare char by char, so that any reference string can be accepted and used (and/or modified on the fly).
String_CompareToReferenceString_V1A.gh (125.5 KB)
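The char-by-char idea can also be sketched without opening the attachment. The following is a rough Python illustration, not the C# from the .gh file; the function name `matches_reference` and the rule "same character class at every position, separators compared literally" are assumptions about what such a comparison would do.

```python
def matches_reference(text, reference="AB_ABC_abc_00"):
    """Compare text to reference position by position.

    Uppercase in the reference demands an uppercase letter, lowercase
    demands a lowercase letter, a digit demands a digit, and anything
    else (e.g. the underscore) must match literally.
    """
    if len(text) != len(reference):
        return False
    for ch, ref in zip(text, reference):
        if ref.isupper():
            ok = ch.isupper()
        elif ref.islower():
            ok = ch.islower()
        elif ref.isdigit():
            ok = ch.isdigit()
        else:
            ok = ch == ref  # separators must be identical
        if not ok:
            return False
    return True


print(matches_reference("WA_STR_cls_01"))
print(matches_reference("WA_STR_cls_1"))
print(matches_reference("AB-12", reference="XY-00"))
```

Because the reference string is just a parameter, swapping in a different template needs no new pattern, which is the point of the general case mentioned above.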
Hi @inno,
You could shorten it slightly by removing the square brackets around the underscores, but I’d leave them.
If I understand the requirement correctly you should not use the \b’s - they will flag, e.g.
WA_STR_cls_01,WA_STR_cls_02
as True. Instead use a caret at the beginning and a dollar at the end:
^[A-Z]{2}[_][A-Z]{3}[_][a-z]{3}[_][0-9]{2}$
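The difference between the two versions is easy to check in a few lines. This is a small Python sketch rather than the Grasshopper definition; `BODY`, `with_boundaries` and `anchored` are illustrative names.

```python
import re

BODY = r"[A-Z]{2}[_][A-Z]{3}[_][a-z]{3}[_][0-9]{2}"
with_boundaries = re.compile(r"\b" + BODY + r"\b")
anchored = re.compile("^" + BODY + "$")

joined = "WA_STR_cls_01,WA_STR_cls_02"

# \b only asks for a word boundary, so a valid code inside a longer
# string is still found.
print(bool(with_boundaries.search(joined)))       # True

# ^ and $ require the whole string to be exactly one code.
print(bool(anchored.search(joined)))              # False
print(bool(anchored.search("WA_STR_cls_01")))     # True
```

One caveat on the Python side: `$` also matches just before a trailing newline, so `re.fullmatch` with the un-anchored body is the stricter choice if that matters.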
This works fine with the ASCII character set, but for interest, if you want to cater for Unicode, with accented characters etc., then you need to move into another ballpark:
^(\p{Lu}\p{M}*){2}_(\p{Lu}\p{M}*){3}_(\p{Ll}\p{M}*){3}_\p{N}{2}$
Accented characters in Unicode can be precomposed as a single code point or composite, with a character code point followed by one or more marks. The É in row 19 of the test data below is the former, that in row 20 is the latter. In the above regex \p{Lu} matches an upper case letter, \p{Ll} matches a lower case one and \p{M}* matches zero or more marks. \p{N} matches a numeral, including non-western ones.
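To see the two forms of É side by side, here is a short Python sketch. As far as I know Python's built-in `re` module does not support `\p{...}` classes, so this uses the standard `unicodedata` module to print the general category of each code point instead of running the regex itself. The variable names are illustrative.

```python
import unicodedata

precomposed = "\u00C9"  # É as a single code point
composite = "E\u0301"   # E followed by COMBINING ACUTE ACCENT

for label, text in (("precomposed", precomposed), ("composite", composite)):
    categories = [unicodedata.category(c) for c in text]
    print(label, len(text), categories)
# precomposed 1 ['Lu']
# composite 2 ['Lu', 'Mn']

# NFC normalisation folds the composite form into the precomposed one.
print(unicodedata.normalize("NFC", composite) == precomposed)  # True
```

The `Lu` category corresponds to \p{Lu}, and `Mn` is one of the mark categories covered by \p{M}, which is why the regex above needs \p{M}* after each letter to accept the composite form.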
If at some stage you want to allow alternative separators like - or :, but not mix them, you can do:
^(\p{Lu}\p{M}*){2}(?'Sep'[_\-:])(\p{Lu}\p{M}*){3}\k'Sep'(\p{Ll}\p{M}*){3}\k'Sep'\p{N}{2}$
Here the round brackets that start with ?'Sep' enclose a named group of separators, where ?'Sep' indicates the name. \k'Sep' refers back to the specific character matched by that group. (Note: this is the syntax used by .NET; other environments, e.g. Python, use different syntaxes.)
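For comparison, here is roughly how the same back-reference idea looks in Python, where a named group is written `(?P<name>...)` and referenced with `(?P=name)`. This sketch is deliberately ASCII-only (it uses [A-Z], [a-z] and [0-9] rather than the \p{...} classes), and the group name `sep` and the sample strings are my own.

```python
import re

pattern = re.compile(
    r"^[A-Z]{2}(?P<sep>[_\-:])[A-Z]{3}(?P=sep)[a-z]{3}(?P=sep)[0-9]{2}$"
)

samples = [
    "WA_STR_cls_01",  # underscores throughout
    "WA-STR-cls-01",  # hyphens throughout
    "WA:STR:cls:01",  # colons throughout
    "WA_STR-cls_01",  # mixed separators, should be rejected
]
for s in samples:
    print(s, bool(pattern.search(s)))
```

The first separator decides what the other two must be, so the mixed example fails even though each individual separator is allowed.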
Regex 2.gh (24.2 KB)
Regards
Jeremy