Could someone please verify that the following code takes superlong in Rhinos IronPython:
offset = int(1E6)
array = range(100000) + [n + offset for n in range(100000)]
set(array)
Please note that:
the creation of the set is the slow part
the size of the set is only 200k, so nothing special
the execution time drops dramatically when the inner offset is set from 1E6 to 1E5
the same code is rocket fast in cPython and shows absolutely no offset dependency
converting the integers into strings before applying set is reasonably fast
Here is a more detailed script that shows the timings for different set sizes and offsets:
import timeit
def _timeit_setup(size, offset):
return "a = range(%d) + [i + %d for i in range(%d)]" % (size,offset,size)
sizes = [i*10000 for i in range(1,10)]
offsets = [int(10**i) for i in range(7)]
for size in sizes:
for offset in offsets:
setup = _timeit_setup(size, offset)
time = timeit.timeit("set(a)", setup=setup, number=10)
print("size=%d, offset=%d: %.6fs" % (size, offset, time))
The last step takes more than 45s (!) on my computer. That is really annoying.
I noticed the same issue for integer dictionary keys. So, there is obviously something odd with the (Rhino?)IronPython implementation of integer hashables.
A possible solution for that would be to use System.Collection.Generic.HashSet instead of the native set implementation. Unfortunately, the interface is pretty different and not as ‘mighty’ (e.g. set.intersection_update …) as the set interface. Any other ideas?
I believe IronPython uses a different hashing algorithm compared to CPython. I agree that It is painfully slow. The easiest solution would be to make a 2ndary plugin in C++/C# to run specific functions which are extremely slow.
Hi Trademacher
IronPython can be slow at times.
I think there are two ways to improve your code.
1: you can use xrange instead of range .xrange
2: Replace list()+list() with set()|set()
import timeit
def _timeit_setup(size, offset):
return "a = set(range(%d)) | set(i + %d for i in xrange(%d))" % (size,offset,size)
sizes = [i*10000 for i in xrange(1,10)]
offsets = [int(10**i) for i in xrange(7)]
for size in sizes:
for offset in offsets:
setup = _timeit_setup(size, offset)
time = timeit.timeit("set(a)", setup=setup, number=10)
print("size=%d, offset=%d: %.6fs" % (size, offset, time))
Unfortunately, I think it’s not the same thing. If I’m not mistaking the setup-argument code in timeit is excluded from the overall timed execution. So, in your example the critical set-creation is done outside of the timed scope.
Actually, you need to run:
for size in sizes:
for offset in offsets:
timedCode = "set(range(%d)) | set(i + %d for i in xrange(%d))" % (size, offset, size)
time = timeit.timeit(timedCode, number=10)
print("size=%d, offset=%d: %.6fs" % (size, offset, time))
I had to terminate Rhino via Task Manager before your code finished. So yes I can indeed verify that it takes super long.
First of all, do you really need to hash integers? If you need to store a collection of unique numbers, isn’t computational effort better spent maintaining a sorted list, rather than computing a hash for every single number?
Otherwise, I think you’re on the right lines trying to pick a better concrete implementation for your needs than IronPython’s one size fits all. I was trying to find out how sets are implemented in Iron Python and found the old module sets. I couldn’t find a HashSet in Rhino but there is a SortedSet. This, sets.Set, (and a dict with all values None) were all much faster for me (with much smaller test data):
thanks for your response. And yes I/we need to work with sets and dicts because we are using labels (that get stored in RhinoGeometry-objects using some tricks) to identify (Finite)Elements.
You can load the HashSets like this:
import clr
import System
clr.AddReference("System.Core")
from System.Collections.Generic import HashSet
This seems to work fine. But I’m not sure about yet.
By the way, I looked a bit more closely at the image I posted - the SortedSet actually seems to speed up by a factor of 20 as the offset size increases. So if it doesn’t spend too long on an initial sort, it’s perfect for your application. I tried the tests in reverse order to rule out cacheing, and picked a wider range of offsets and it confirms it:
set is quirky in Iron Python (I actually reported a bug in set comprehensions). But its .Net backend does have its uses.
We use sets very often. And large parts of our code framework are used in cPython and ironPython. By now, it seems that i’ll end up in subclassing the HashSet-Classes and giving them the same interface as the python-set. Than we can easily switch between both implementations. Whew! That’s a long way to go.