Is there a Python dictionary without values?

(I tried to change the category to developer but I couldn’t…)
Hello, this probably sounds a bit stupid, but I often create a dictionary only to do quick lookups on the keys, and the values are never used.

For example

pnts  # list of points

def makekey(P):
    return (round(P.X, 3), round(P.Y, 3), round(P.Z, 3))

dict = {}  # note: this name shadows the built-in dict
for P in pnts:
    dict[makekey(P)] = P

anotherpnts  # another list of points
for Q in anotherpnts:
    if makekey(Q) in dict:
        pass  # do something
    else:
        pass  # do something else

This code quickly checks whether point Q exists in the first point list. In this case, the right-hand side of dict[makekey(P)] = P can actually be anything, because it’s never used.
I know makekey() using round() is a little problematic, but that’s another topic.
Is there any native Python data type that is essentially dictionary minus values?
I think I understand the basic concept of hash table.

Thank you.

You are probably looking for set()
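For example, the snippet from the question could be rewritten with a set built from the rounded keys. A minimal sketch (the `Point` namedtuple and the sample coordinates here are just stand-ins for the real point lists):

```python
from collections import namedtuple

Point = namedtuple('Point', 'X Y Z')

def makekey(P):
    # round to 3 decimals so nearly identical points produce the same key
    return (round(P.X, 3), round(P.Y, 3), round(P.Z, 3))

pnts = [Point(0.0001, 1.0, 2.0), Point(5.0, 5.0, 5.0)]
keys = {makekey(P) for P in pnts}  # set comprehension: keys only, no values

anotherpnts = [Point(0.0, 1.0, 2.0), Point(9.0, 9.0, 9.0)]
matches = [Q for Q in anotherpnts if makekey(Q) in keys]  # O(1) lookups
```

Membership testing on a set uses the same hash table machinery as a dictionary, just without storing values.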

Wouldn’t a dictionary without the values be equivalent to a list of keys?
If you had a dictionary like {"value1": None, "value2": None, …, "valueX": None}, it could be represented as a list: ["value1", "value2", …, "valueX"]. You could then check "valueX" in ["value1", "value2", …, "valueX"] to test for membership.

Hi, that’s actually not true: a dictionary is backed by a hash table, which makes key in dictionary much faster than item in list.
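A quick way to see the difference with the standard library (the sizes here are arbitrary; the element chosen is the worst case for the list, sitting at the very end):

```python
import timeit

data = list(range(100000))
as_list = data
as_set = set(data)

# list membership scans element by element; set membership hashes straight to it
t_list = timeit.timeit(lambda: 99999 in as_list, number=100)
t_set = timeit.timeit(lambda: 99999 in as_set, number=100)
print(t_set < t_list)  # the hash lookup wins by orders of magnitude
```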


Hello Nathan, thanks for the quick reply. It seems like set() is exactly what I’m looking for! Thank you! Sounds like you can simply write it like

_set = {"A", "B"}

Also, key in dictionary and key in dictionary.keys() differ in speed in Python 2: the former uses the hash table (fast), while the latter builds a list and searches it one element at a time (slow). (In Python 3, .keys() returns a set-like view, so both are fast.)
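A small illustration (Python 3 shown): both forms below use the hash table, while forcing a list brings back the linear scan.

```python
d = {"A": 1, "B": 2}

print("A" in d)          # hash lookup, O(1)
print("A" in d.keys())   # Python 3: set-like view, also O(1)
print("A" in list(d))    # materialised list: compares one by one, O(n)
```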


You can also use a frozenset() which is identical to a set but is immutable and can therefore itself be used as a key in a further set or dict.
You can construct your set with a comprehension if you prefer: _set = {p for p in [1,2,3]}
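For instance, a frozenset can serve as a dictionary key, which a plain set cannot (a minimal sketch):

```python
fs = frozenset({1, 2, 3})
lookup = {fs: "triangle"}            # works: a frozenset is hashable
print(lookup[frozenset([3, 2, 1])])  # equal frozensets hash alike: "triangle"

try:
    {set([1, 2]): "x"}               # a mutable set is unhashable
except TypeError:
    print("plain sets cannot be dict keys")
```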


Thanks for clearing that up, @mikity_kogekoge. I didn’t know that!

I did some digging on this topic and found out that what I recommended above is called ‘simple search’, which simply means checking each item in a collection one by one. It takes O(n), i.e. linear, time to compute, which translates to: the bigger your list gets, the longer it takes.
You can optimise this by sorting your list before performing searches on it. As I understand, searching a sorted list this way is called ‘binary search’ and takes O(log n) time.
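A sketch of that binary search using the standard bisect module:

```python
import bisect

def contains_sorted(sorted_seq, x):
    """O(log n) membership test on an already-sorted sequence."""
    i = bisect.bisect_left(sorted_seq, x)
    return i < len(sorted_seq) and sorted_seq[i] == x

nums = sorted([7, 1, 5, 3])   # must be sorted before searching
print(contains_sorted(nums, 5))  # True
print(contains_sorted(nums, 4))  # False
```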

Although binary search is pretty fast, keeping a list sorted so you can search it can be tedious too. Binary search finds items in O(log n) time; if you want to find items in O(1) time, that’s where hash functions come into play. A hash function takes, for instance, a string and returns a number, which serves as the index into a hash table (in Python, a dictionary).
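A toy illustration of the principle (not how CPython actually implements dict, just the idea of hashing into buckets):

```python
def ht_insert(buckets, key, value):
    # the hash picks the bucket directly, so the cost does not grow with n
    buckets[hash(key) % len(buckets)].append((key, value))

def ht_get(buckets, key):
    # only the one bucket is scanned, which stays short on average
    for k, v in buckets[hash(key) % len(buckets)]:
        if k == key:
            return v
    return None

buckets = [[] for _ in range(8)]
ht_insert(buckets, "A", 1)
print(ht_get(buckets, "A"))        # 1
print(ht_get(buckets, "missing"))  # None
```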

Example for the imaginary capability of evaluating 10 items per second:

| # of items | Simple search O(n) | Binary search O(log n) | Hash function O(1) |
|-----------:|--------------------|------------------------|--------------------|
| 100        | 10 sec.            | 1 sec.                 | instant            |
| 1,000      | 1.6 min.           | 1 sec.                 | instant            |
| 10,000     | 16.6 min.          | 2 sec.                 | instant            |

You can check out Aditya Bhargava’s excellent ‘Grokking Algorithms’ book for more information on this topic.

Hi @diff-arch I definitely recommend you read about set operations (.union(), …) in Python as well


:sunglasses: this is a good thread. :smiley:


Let’s keep it going then! Looking at the real difference between the methods, with 10,000 elements in each list:

import time
from collections import namedtuple
from random import randint as rand

Point = namedtuple('Point', 'X Y Z')

pnts = [Point(rand(0, 100), rand(0, 100), rand(0, 100)) for j in range(10000)]  # list of points
this_set = set(pnts)

anotherpnts = [Point(rand(0, 100), rand(0, 100), rand(0, 100)) for j in range(10000)]  # another list of points
another_set = set(anotherpnts)

def listcheck():
    intersection = []
    for Q in anotherpnts:
        if Q in pnts:
            intersection.append(Q)
        else:
            pass  # do something else
    # print(intersection)

def setcheck():
    intersection = []
    for Q in anotherpnts:
        if Q in this_set:
            intersection.append(Q)
        else:
            pass  # do something else
    # print(intersection)

def set_intersect():
    intersection = this_set.intersection(another_set)
    # print(intersection)

# call the functions and time them (time.clock and the print statement are Python 2 / IronPython):
for func in [listcheck, setcheck, set_intersect]:
    t1 = time.clock()
    func()
    t2 = time.clock()
    print '%s : %.4f seconds' % (func.__name__.upper().ljust(15), t2 - t1)


Gives:

LISTCHECK : 9.6451 seconds
SETCHECK : 0.0078 seconds
SET_INTERSECT : 0.0029 seconds


Yeah, sets are fast! But you’ll have to take creating it into account. Rounding the point coordinates may take a while. With loads of points I’d use RTree: https://developer.rhino3d.com/5/api/RhinoCommonWin/html/M_Rhino_Geometry_RTree_Search_3.htm


Also see here for more about how dicts and hash tables work in Python in a beautifully visual way.


I get the fastest performance with a .NET dictionary for more complex keys (like Point3d) and values (like lists). In those cases they build 2.5X to 4X faster than a Python dictionary or set, and access in about the same time or a little faster.

from System.Collections.Generic import Dictionary
# Definition example:
i2j = Dictionary[int, int]()
# Build example (using keys from the v_indices dictionary to build the i2j dictionary):
i = 0
for j in v_indices.Keys: i2j[i] = j; i += 1
# Test example (but see below; may be faster to use TryGetValue):
if i2j.ContainsKey(i): a = b
# Read example for when you know the key is in the dictionary:
j = i2j[i]
# Read example for when you are unsure the key is in the dictionary (nearly as fast):
found, j = i2j.TryGetValue(i)
if found: a = 6 * j
# Get key and value:
for kvp in i2j:
    i = kvp.Key
    j = kvp.Value
# Get all keys or values:
keys = i2j.Keys
values = i2j.Values

The dictionary definition will take many types: int, float, Point3d, list, tuple. So it is pretty flexible.

Regards,
Terry.


Interesting
And presumably a HashSet will give comparable performance advantages in cases where values are not needed.

I cannot get HashSet to import from the module System.Collections.Generic even though the .NET documentation says it exists.

Also note that I updated my original post to point out that the speed advantage of .NET dictionary is only for cases with more complex keys or values. You need to test each case of key and value for your application to decide. I use a mesh with around 1M vertices for testing and time randomly accessing 1M vertices.


This sometimes trips me up too; not all .NET assemblies are “pre-referenced” by the clr:
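For example (IronPython in Rhino; HashSet lives in the System.Core assembly, so it needs an explicit reference first). The snippet is guarded so it is harmless outside a .NET Python:

```python
def load_hashset():
    """Return the .NET HashSet type, or None when no CLR is available."""
    try:
        import clr
        clr.AddReference('System.Core')  # HashSet is not pre-referenced
        from System.Collections.Generic import HashSet
        return HashSet
    except ImportError:
        return None

HashSet = load_hashset()
if HashSet is not None:
    hs = HashSet[int]()
    hs.Add(42)
```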

Also, small side note, you can make your code snippets explicitly highlight Python. So purdy.


For 10M elements, I am getting the .NET dictionary as the fastest for loading data. It is important to declare a big enough initial size for the .NET dictionary otherwise it can be slower. This is easy for all my use cases.

Time to build HashSet = 0.4787 sec
Time to build Python dictionary = 0.5406 sec
Time to build .NET dictionary = 0.2055 sec 2.6X faster than Python dictionary
Time to build set = 0.9315 sec

Accessing and testing for membership using integers are all within 10% for these 4 structures, at around 0.14 sec for 1M operations, with the HashSet winning by 0.9% over the .NET dictionary (which has an almost 2X faster build time). The time of 0.14 sec is after subtracting the time of an empty loop. So each operation takes about 140 ns on my 4 GHz machine, or 560 machine cycles that Python has to chew through to get one done. If you need better performance, then C# is a much better choice; I have seen over 100X speed-ups for simple math operations. The downside is that you have to leave the most popular language on the planet and do some heavy lifting in C#. I have done this a few times, but 99% of the time I exploit Python’s productivity for my development work. Thus knowing its best methods is useful for avoiding going to C#. I really like using Python after all the decades since I first learned to program in Fortran with 80-column punch cards on an IBM 1620 running at 0.05 MHz:

(photos: IBM 1620 and a punch card)

Rhino with Python is hard to beat, especially with all the good help from the Rhino Forum.

Regards,
Terry.


Question:
Did you test Python using Rhino’s embedded engine? That means you had Rhino running, which would most definitely slow the solution down, since it all runs within a single process.

That’d be the environment to test in anyway, since this is about finding an optimal solution inside Rhino.