Sort a number list which contains text

Hi all,

I’m trying to sort a list of numbers with letters, but no luck so far.

Any help is much appreciated. Thank you

Rule number 1 when working with names that contain numbers: zero-stuffing!

I added zeros between the first one or two characters and the integers, so the Sort Text component orders the list properly.

SortText.gh (11.0 KB)
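
For readers who cannot open the attachment, here is a minimal Python sketch of the zero-stuffing idea. It is not taken from SortText.gh: the helper name `zero_pad`, the fixed width of 3 and the sample names are assumptions for illustration only.

```python
import re

def zero_pad(name, width=3):
    """Pad the first run of digits in `name` with leading zeros (hypothetical helper)."""
    # count=1 limits the substitution to the first digit run; names without digits are returned unchanged.
    return re.sub(r"\d+", lambda m: m.group(0).zfill(width), name, count=1)

names = ["TC10", "TC2", "TC100", "TC1"]

# Once every number has the same width, a plain text sort gives the numeric order.
padded = sorted(zero_pad(n) for n in names)
print(padded)
```

The width has to be at least as long as the longest number in the list, otherwise the text sort goes wrong again.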

I have here a Python script component that uses a regular expression to help with the sorting.

Essentially, it splits each value into the form name + number + extra string, and then lets the sort work on those parts.

The reason that your attempt fails is that everything is handled as one string.

The code in the component is:

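import re

# Split each value into leading non-digits, digits, and trailing non-digits.
pattern = re.compile(r"(?P<first>\D*)(?P<number>\d*)(?P<second>\D*)")

# Values is the component input; each entry becomes a (first, number, second) tuple of strings.
vals = [pattern.match(v).groups() for v in Values]
vals.sort()
SortedValues = [v[0] + v[1] + v[2] for v in vals]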

sort_using_regex.gh (10.3 KB)

Thanks magicteddy.
I like your approach.

Thanks Nathan.
This one is good, but it places TC10 & TC100… in front of TC2, unfortunately.

Nice solution @magicteddy! I don’t think I have ever used the Sort Text component; you used it wisely!
I worked on your definition…


SortText.gh (10.6 KB)

Right, then first sort on the number, then on the rest, I guess. It shouldn’t be too hard to switch things around in the Python component.
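
One possible reading of this suggestion, sketched in Python: keep the same regular expression, but compare the numeric part as an integer through a sort key. The function name `sort_key`, the sample values and the choice to treat a missing number as -1 are assumptions, not part of the posted component.

```python
import re

PATTERN = re.compile(r"(?P<first>\D*)(?P<number>\d*)(?P<second>\D*)")

def sort_key(value):
    first, number, second = PATTERN.match(value).groups()
    # Assumption: an entry without digits gets -1, so it sorts before numbered entries with the same prefix.
    return (first, int(number) if number else -1, second)

values = ["TC10", "TC2", "TC100", "TC2b", "TC1"]

plain = sorted(values)                  # text comparison: "TC10" comes before "TC2"
natural = sorted(values, key=sort_key)  # integer comparison on the numeric part

print(plain)
print(natural)
```

The key is only used for comparison, so the original strings come back unchanged, including any leading zeros.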

Hi @hldk,

A bit late perhaps, but for interest here’s a short Python script that will correctly sort your list.

CustomSortHLDK.gh (10.4 KB)

It combines concepts from Nathan’s and magicteddy’s solutions: regex to split the strings into alpha and numeric parts, and padding the numeric parts to a common length so they can be sorted as strings.
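
The attachment itself is not reproduced here, but the combination described above could look roughly like the following sketch. The names `SPLIT` and `sort_with_padding` are hypothetical, and the actual script in CustomSortHLDK.gh may differ.

```python
import re

# Leading non-digits, then digits, then whatever is left.
SPLIT = re.compile(r"(\D*)(\d*)(.*)")

def sort_with_padding(values):
    parts = [SPLIT.match(v).groups() for v in values]
    # Common width = length of the longest numeric part (0 for an empty list).
    width = max([len(number) for _, number, _ in parts] or [0])
    # Build a padded text key per entry and keep the original string next to it.
    # Entries without digits are padded to all zeros, so they sort before numbered ones.
    keyed = [(first + number.zfill(width) + rest, original)
             for (first, number, rest), original in zip(parts, values)]
    keyed.sort()
    return [original for _, original in keyed]

print(sort_with_padding(["TC10", "TC2", "TC100", "TC1"]))
```

Because the width is computed from the data, there is no need to guess a padding length in advance.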

Regards
Jeremy

Oh, and for a much more comprehensive embedded numerics sorting solution for Python, you can import the natsort module from PyPI. This parses your list values and recognises various number formats so you don’t need to split them out yourself.

Note that for Python 2.7 you need an old version: pip install natsort==3.3.0. The current version (8.2.0) only works with Python 3.

Jeremy

Thanks Jeremy. I’ll give it a go.

Hi. I am trying to use your script in my definition, but I get a "Solution exception: 'NoneType' object has no attribute 'groups'".

Just used a random list like:

100a
10b
34
89y
79xxx
1z_random
5b
10a
14fang
10x
asd
uty

Hi @Cumberland,

That routine was set up to work with data in hldk’s format:
Character(s) + Number(s) + MaybeMoreCharacter(s).

Your data format looks like it is:
( MaybeNumber(s) + Character(s) ) OR ( Number(s) + MaybeCharacter(s) )

So the routine needs to be rewritten to work with your format. If you’re not familiar with regular expressions, I may be able to take a look at it this evening and make some suggestions.

Regards
Jeremy

Here’s a modified sort that works with the data sample you posted.

CustomSortCumberland.gh (20.0 KB)

Remember, if you change the nature of the data you will need to rewrite the routine to accommodate the new format.
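
As an illustration of what such a format-specific routine might look like, here is a sketch for the sample list posted above (optional leading number, optional trailing text). It is not the code from CustomSortCumberland.gh; `sort_key` is a hypothetical name, and placing the entries without a number at the end is an assumption.

```python
import re

# Every part is optional, so match() always succeeds and never returns None.
PATTERN = re.compile(r"(?P<number>\d*)(?P<text>.*)")

def sort_key(value):
    match = PATTERN.match(value)
    number, text = match.group("number"), match.group("text")
    if number:
        return (0, int(number), text)
    # Assumption: entries without a leading number go after all numbered entries.
    return (1, 0, text)

values = ["100a", "10b", "34", "89y", "79xxx", "1z_random",
          "5b", "10a", "14fang", "10x", "asd", "uty"]

result = sorted(values, key=sort_key)
print(result)
```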

HTH
Jeremy

Thank you very much. It is working as expected.

Maybe we can go a step further and also sort the geometries that correspond to these names? This is my ultimate goal: to sort my geometries by names, layers, etc., as needed.

Here is a mockup of what I need to achieve:

Two questions that need to be answered:

Would your keys be unique or could you have multiple instances of a key?

If you can have multiple instances of a key, will they have the same value associated?

Hi Jeremy.

Thank you for your reply.

  1. I can have one or multiple instances (up to a few hundred) of the same geometry sharing the same names, keys, layers, and properties.

  2. Yes, objects that share the same name also share the same properties (Attribute User Text keys/values, layers, color, etc.), except for the shape, which can differ slightly.

I am doing furniture mechanization, and I need to generate cutting lists and group the parts based on the names, layers, or Attribute User Text keys/values associated with them.

The names of the parts and layers can be a mix of numbers and letters, so right now it is difficult to sort by name, for example, because Grasshopper has two different sorting components, one for numbers and another for text. I need to be able to sort both numbers and letters alphabetically, as if the numbers were zero-padded (I need 2 to come before 10 and 30 before 299, not after).

Hi @Cumberland,

Here’s an updated version that sorts associated values along with the keys:

CustomSortCumberlandPairs.gh (36.6 KB)

Because your keys are not necessarily unique, and two items with the same key can have different values, during processing the routine temporarily appends a number to each key to make them unique. This requires the Key length to be limited to a common value. This is set arbitrarily to 2048 characters.
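
A different way to handle duplicate keys, shown here only as a sketch and not as the method used in the attachment, is to sort the positions of the items rather than the items themselves. Python's sort is stable, so items with equal keys keep their input order and the values (geometry, for example) are never compared. The names `natural_key` and `sort_pairs` and the sample data are hypothetical.

```python
import re

PATTERN = re.compile(r"(\d*)(.*)")

def natural_key(name):
    number, text = PATTERN.match(name).groups()
    # Assumption: names without a leading number sort after the numbered ones.
    return (0, int(number), text) if number else (1, 0, text)

def sort_pairs(keys, values):
    # Sort the indices by the key they point to; duplicates stay in input order.
    order = sorted(range(len(keys)), key=lambda i: natural_key(keys[i]))
    return [keys[i] for i in order], [values[i] for i in order]

keys = ["10a", "2b", "10a", "1c"]
values = ["geo0", "geo1", "geo2", "geo3"]  # placeholders for the associated geometry

sorted_keys, sorted_values = sort_pairs(keys, values)
print(sorted_keys)
print(sorted_values)
```

With this approach there is no need to make the keys unique or to limit their length.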

As this topic popped up again…just wanted to point out:

regex

As regex is not my everyday business -

This pattern uses “named groups” that are explained here
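
As a quick illustration of named groups with Python's `re` module, the sketch below reads the same match by name, as a dictionary, and by position. The sample string "TC10b" is an assumption chosen to fit the pattern.

```python
import re

pattern = re.compile(r"(?P<first>\D*)(?P<number>\d*)(?P<second>\D*)")
match = pattern.match("TC10b")

by_name = match.group("number")  # one group, looked up by its name
as_dict = match.groupdict()      # all named groups as a dictionary
by_position = match.groups()     # all groups as a tuple, in pattern order

print(by_name)
print(as_dict)
print(by_position)
```

The scripts in this thread use `groups()`, so the names mainly serve as documentation of what each part means.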

To play around with regex - I recommend online expression validators / interfaces like
regex101
or
regexr
… there are many more. Which is your favourite?

cast to int

vals2 = [(v[0], int(v[1]), v[2], v[1]) for v in vals]
# int(v[1]): cast to int for sorting
# v[1] at index 3: keep the original format for the final output

Nathan’s script can be extended by additionally casting to int before sorting.
To keep formatting with leading zeros (like 1, 01, 001), I store the formatted number as a string in position 4 (index 3) of the tuple.

So my adaptation looks like this:

import re

pattern = re.compile(r"(?P<first>\D*)(?P<number>\d*)(?P<second>\D*)")

vals = [pattern.match(v).groups() for v in Values]
# Sort on the integer value, but keep the original digit string at index 3 for output.
vals2 = [(v[0], int(v[1]), v[2], v[1]) for v in vals]
vals2.sort()
SortedValues = [v[0] + v[3] + v[2] for v in vals2]

kind regards -tom

If I’m reading that correctly, the middle group takes zero or more digits and saves them as ‘number’, which in the case of zero digits is “”. Then you use the Python int() function to convert ‘number’ to an integer.

I believe (not a Python expert) that int() of an empty string is invalid and would cause a runtime error.
Add a default option to int() to use if conversion fails - Ideas - Discussions on Python.org

import re

def TryToInt(numberStr, defaultInt=0):
    # int("") raises ValueError, so fall back to the default when there are no digits.
    try:
        return int(numberStr)
    except ValueError:
        return defaultInt

pattern = re.compile(r"(?P<first>\D*)(?P<number>\d*)(?P<second>\D*)")
vals = [pattern.match(v).groups() for v in Values]
vals2 = [(v[0], TryToInt(v[1]), v[2], v[1]) for v in vals]
vals2.sort()
SortedValues = [v[0] + v[3] + v[2] for v in vals2]

For sure it takes an extra step to handle strings with no digits at all, but that was not part of the initial dataset, so I skipped it.
Not sure if Python has built-in TryCast… functionality.

kind regards - tom