Extract numbers from Text

Hello
To extract numbers from this list, I use the regular Text Split and Replace Text components, or Wombat's Replace Text Multiple.
Is there an easier way with Python or Grasshopper?

text.gh (12.4 KB)

Not necessarily easier, but a more parametric approach:


ExtractNumber.gh (15.9 KB)


Thank you. I also tried it with Python, though I know there are simpler ways:

import re
# x is the input text; collect every run of digits and dots as a string
a = re.findall(r'[0-9.]+', x)
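
A minimal sketch of how those matches could be turned into numbers, assuming `x` is a single string. The helper name `extract_numbers` and the sample text are made up for illustration. Because `[0-9.]+` also matches a lone `.` or a run like `1.2.3`, the conversion is guarded:

```python
import re

NUMBER_RUN = re.compile(r'[0-9.]+')  # same pattern as in the snippet above


def extract_numbers(text):
    """Return every digit/dot run in text that parses as a float."""
    numbers = []
    for token in NUMBER_RUN.findall(text):
        try:
            numbers.append(float(token))
        except ValueError:
            # e.g. "." or "1.2.3": matched by the pattern but not a number
            pass
    return numbers


print(extract_numbers("Beam 12.5 x 300 mm"))  # prints [12.5, 300.0]
```

Skipping unparsable tokens is only one option; depending on the data, you may prefer to raise an error instead.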

ExtractNumber.gh (15.7 KB)


Thank you

You can also use \d to represent digits in your regular expression, and use r"[-\d.]+" if you also want to extract negative numbers:

200330_ExtractNegativeIntegersAndFloats_00.gh (5.0 KB)
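
Note that a character class such as `[-\d.]+` also matches a stray `-` or `.` on its own. As a hedged alternative (not what the attached file does, as far as this thread shows), a stricter pattern can require an optional sign followed by actual digits. The name `extract_signed_numbers` is hypothetical:

```python
import re

# Optional sign, then either digits with an optional fraction,
# or a bare fraction such as ".75".
SIGNED_NUMBER = re.compile(r'[-+]?(?:\d+\.?\d*|\.\d+)')


def extract_signed_numbers(text):
    """Return all signed integers and decimals found in text as floats."""
    return [float(token) for token in SIGNED_NUMBER.findall(text)]


print(extract_signed_numbers("A-12.5, B.75, C-3"))  # prints [-12.5, 0.75, -3.0]
```

One caveat: a hyphen used as a separator is read as a sign, so "10-20" yields 10.0 and -20.0.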


For speed, you could always use C# if your collection is very big. For a collection of 100,000 strings, the execution time is about 1 second with this one-liner:

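 // Requires: using System.Linq; using System.Text.RegularExpressions;
 // Skip null/empty items, then strip everything except digits, dots and minus signs.
 A = x.Where(s => !string.IsNullOrEmpty(s))
      .Select(s => Regex.Replace(s, "[^0-9.-]", "")).ToArray();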

Python will probably take ages… lol maybe one of you can test and prove me wrong :smile:
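
For anyone who wants to try that comparison, here is a rough Python counterpart with a timer, assuming the intent of the one-liner is to keep only digits, dots, and minus signs. The function names and the generated test strings are made up, and the timing will depend on the machine and on whether it runs in GhPython or elsewhere:

```python
import re
from timeit import default_timer

# Assumed intent: drop everything except digits, dots and minus signs.
NON_NUMERIC = re.compile(r'[^0-9.-]')


def strip_non_numeric(strings):
    """Skip empty items and strip non-numeric characters from the rest."""
    return [NON_NUMERIC.sub('', s) for s in strings if s]


def time_strip(count=100000):
    """Time strip_non_numeric on `count` made-up strings; returns seconds."""
    data = ['length %d.5 mm' % i for i in range(count)]
    start = default_timer()
    strip_non_numeric(data)
    return default_timer() - start


if __name__ == '__main__':
    print('%.1f ms' % (time_strip() * 1000.0))
```

Like the C# version, this returns strings, not numbers, and it concatenates every kept character, so an item containing two separate numbers produces one merged string.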


Sure thing:

200330_ExtractNegativeIntegersAndFloats_01.gh (8.6 KB)

Next time, please provide a test case if you’re looking to compare options. Also, I suspect that running the regular expression on one multiline string will likely be more efficient (in Python at least).
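
A minimal sketch of the multiline idea, assuming the items themselves contain no newlines. The pattern is the `[-\d.]+` class mentioned earlier in the thread; the function names are hypothetical. Since that class never matches a newline, joining the items cannot merge tokens from neighbouring strings:

```python
import re

TOKEN = re.compile(r'[-\d.]+')  # never matches "\n"


def extract_joined(strings):
    """Run the regex once over all items joined by newlines."""
    return TOKEN.findall('\n'.join(strings))


def extract_per_item(strings):
    """Reference version: one findall call per item."""
    tokens = []
    for s in strings:
        tokens.extend(TOKEN.findall(s))
    return tokens
```

Both functions return the same flat list of tokens. The trade-off is that the flat list no longer records which item each token came from, which matters if some items hold zero or several numbers.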

A probably faster implementation

The Regex class was actually causing performance issues in this case, so I changed it to a more classical approach. The two versions are roughly the same with the GH profiler turned on, but when I use Stopwatch the run time is quite different: 53-56 ms. Could you try to profile your version?

private void RunScript(List<string> x, ref object A)
  {
    // One slot per input string; a slot stays null when parsing fails.
    GH_Number[] formated = new GH_Number[x.Count];

    System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();

    sw.Start();
    for (int i = 0; i < x.Count; i++)
    {
      //string ne = Regex.Replace(x[i], "[^0-9.-]", "");
      // Drop letters only (char.IsLetter already covers uppercase letters).
      string ne = new String(x[i].ToCharArray().Where(c => !char.IsLetter(c)).ToArray());
      double r = 0;
      if (double.TryParse(ne, out r))
        formated[i] = new GH_Number(r);
    }
    sw.Stop();

    Print(string.Format("Elapsed time {0} ms", sw.ElapsedMilliseconds));
    A = formated;
  }
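
For comparison, the same filter-then-parse idea can be sketched in Python without any regex. This is only an approximation of the C# loop: `float()` and `double.TryParse` do not accept exactly the same inputs, and the function name is made up for illustration:

```python
def parse_by_filtering(strings):
    """Drop letters from each item, then try to parse the rest as a float."""
    results = []
    for s in strings:
        candidate = ''.join(c for c in s if not c.isalpha())
        try:
            results.append(float(candidate))
        except ValueError:
            # Mirrors the empty slot left when TryParse fails.
            results.append(None)
    return results


print(parse_by_filtering(["Length 12.5mm", "abc", "x-3"]))  # prints [12.5, None, -3.0]
```

Because only letters are removed, any other leftover character (for example "=" or "/") makes the parse fail for that item.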

200330_ExtractNegativeIntegersAndFloats_01.gh (7.1 KB)

I suspect the additional profiling time you’re experiencing is due to the casting cost of using type hints, which can scale up quite quickly with large lists (removed the output to rule that out):

(Screenshot: 2020-03-31 15_08_31-Grasshopper - 200330_ExtractNegativeIntegersAndFloats_01_)

That is an important issue you pointed out. I will read your post more closely when I have more time. But to be fair, for a real test this should ultimately be a compiled component built in Visual Studio. Anyway, let’s suppose the real running time for this case is 52 ms.

By the way, removing the output did not affect performance. But, removing it wouldn’t make sense anyways, since having access to the data is what matters.

    // Presumably x has no type hint here, so each item arrives as a plain object.
    GH_Number[] formated = new GH_Number[x.Count];

    System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();

    sw.Start();
    for (int i = 0; i < x.Count; i++)
    {
      // Skip anything that is not a string; its slot stays null.
      string s = x[i] as string;
      if (s != null)
      {
        //string ne = Regex.Replace(s, "[^0-9.-]", "");
        string ne = new String(s.ToCharArray().Where(c => !char.IsLetter(c)).ToArray());

        double r = 0;
        if (double.TryParse(ne, out r))
          formated[i] = new GH_Number(r);
      }
    }
    sw.Stop();

    Print(string.Format("Elapsed time {0} ms", sw.ElapsedMilliseconds));
    A = formated;

Indeed, that’s what I was referring to with the “… (removed the output to rule that out) …” bit.

Indeed, that’s where wrapping output data (as per the bottleneck thread) comes in.

Indeed, that’s where you’ll really start to demonstrate the benefit of a statically/strongly typed language.