This simple Python script gets significantly slower each time the input wire is connected (or the data changes?), by two or three tenths of a second each time! It is only one line using ‘statistics’:
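Something like this, in the GhPython component (a guess at the one-liner, which is not shown here; ‘x’ is the input list, ‘a’ the output):

import statistics
a = statistics.variance(x)  # reconstruction; the original line is not reproduced in the thread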
Oddly, the Profiler time seems to be retained from the last time the GH file was saved. After rebooting Windows, the last Profiler time shows at first, but when the input wire is reconnected, the time drops to less than one second, then increments again with each reconnection.
This value would be very useful if not for the degrading Profiler times. Why does this happen?
Is there a C# alternative that doesn’t suffer the same behavior?
I would look for a native GH component in math (the same menu section where the GhPython component itself lives). Otherwise, the variance formulae are not difficult to implement manually: Variance - Wikipedia
statistics is not actually included in IronPython 2.7. Have McNeel included it in Rhino?
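A quick way to check from the GhPython editor whether the module is importable at all (just a sketch):

try:
    import statistics
    print(statistics.__file__)  # where the module was loaded from, if present
except ImportError:
    print("statistics not found on sys.path")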
statistics is an unusual pure-Python module added in CPython 3.4: “aimed at the level of graphing and scientific calculators” and “not intended to be a competitor to third-party libraries such as NumPy, SciPy, or proprietary full-featured statistics packages”
Looking at the source, it goes out of its way to support both Fraction and Decimal, so it will never be as fast as math or any core library written in C.
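For a rough sense of the overhead, a micro-benchmark sketch to run under CPython 3 outside Rhino (the data size and repeat count are arbitrary):

import random
import statistics
import timeit

data = [random.random() for _ in range(10000)]

def manual_variance(x):
    # plain-float two-pass sample variance
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / (len(x) - 1)

# statistics.variance dispatches on type (Fraction/Decimal support),
# so expect it to be noticeably slower than the float-only version
print(timeit.timeit(lambda: statistics.variance(data), number=100))
print(timeit.timeit(lambda: manual_variance(data), number=100))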
@Joseph_Oster Why this increasing slowness happens is a mystery indeed. My best guess is that something in the statistics module and/or its dependencies causes IronPython to do weird things. Maybe in combination with how the tree is automagically chopped into its branches for list access (lots of “New Implicit Grasshopper Cycles” messages in the script window output pane). But instead of sinking time into IronPython or the statistics module and its dependencies:
Here is a simple Python implementation that should work just fine:
Thank you very much, that’s way, WAY FASTER! Very nice. I can just barely understand it but am too much of a dilettante at Python to ever write it myself.
# two-pass variance, from https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Two-pass_algorithm
# see example explanation also at https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Example
# regarding floating point issues
s = sum(x)
m = s / float(len(x))  # float() guards against integer division in Python 2
variance = sum([(i - m) * (i - m) for i in x]) / (len(x) - 1.0)
a = variance
Let me rewrite it a bit so it is hopefully a little clearer:
n = len(x)
the_sum = sum(x)
mean = the_sum / float(n)  # float() guards against integer division in Python 2
# use list comprehension to get all the squares of difference from mean
squares_of_difference_from_mean = [(xi - mean) * (xi - mean) for xi in x]
# below this is called just sum_of_squares, but as noted above
# it is really the sum of squared differences from the mean
sum_of_squares = sum(squares_of_difference_from_mean)
# using 1.0 here to ensure we get a float result in n_1,
# which itself stands for n - 1
n_1 = n - 1.0
# Finally the variance in the two-pass variance algorithm
variance = sum_of_squares / n_1
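If it helps, here is the same logic wrapped as a reusable function (a hypothetical helper, not part of the snippet above), with a guard for fewer than two samples:

def sample_variance(values):
    # two-pass sample variance; undefined for fewer than two values
    n = len(values)
    if n < 2:
        raise ValueError("sample variance requires at least two values")
    mean = sum(values) / float(n)
    return sum((v - mean) * (v - mean) for v in values) / (n - 1.0)

a = sample_variance(x)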
Thank you again, it was clear enough, though Python has some peculiar ways of writing “for” loops after the fact, among other things. I haven’t translated this style of notation to code in almost fifty years, but it says the same thing as both Python examples you posted:
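That is, presumably the standard sample-variance formula:

s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2

where \bar{x} is the mean of the n samples.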