Is there a Python dictionary without values?

developer
python

#1

(I tried to change the category to developer but I couldn’t…)
Hello, probably a bit sounds stupid, but I often create a dictionary only to do a quick search of keys and values are never used.

For example

pnts #list of points
def makekey(P):
    return (round(P.X,3),round(P.Y,3),round(P.Z,3))
dict={}
for P in pnts:
    dict[makekey(P)]=P
anotherpnts #another list of points
for Q in anotherpnts:
    if makekey(Q) in dict:
        #do something
    else:
        #do something else

This code quickly searches if point Q exists in the first point list. In this case, right-hand side of dict[makekey§]=P can be anything actually becasue it’s never used.
I know makekey() using round() is a little problematic, but that’s another topic.
Is there any native Python data type that is essentially dictionary minus values?
I think I understand the basic concept of hash table.

Thank you.


(Nathan 'jesterKing' Letwory) #2

You are probably looking for set()


(P1r4t3b0y) #3

Wouldn’t a dictionary without the values be equivalent to a list of keys?
If you had a dictionary, as follows {“value1”: None, “value2”: None, …, “valueX”: None}, then it could be represented in a list, like this [“value1”, “value2”, "valueX]. You could also call if “valueX” in [“value1”, “value2”, "valueX], to search for inclusions.


#4

Hi, that’s actually not true. dictionary creates a hashtable which makes “key in dictionary” faster than “item in list”.


#5

Hello Nathan, thanks for the quick reply. Seems like set() is exactly what I’m looking for! Thank you! Sounds like you can simply write like

_set={"A","B"}

#6

Also, key in dictionary and key in dictionary.keys() are different in speed. The former uses hashtable (fast), the latter is same as a search of a list (compare one by one, slow).


(Graham) #7

You can also use a frozenset() which is identical to a set but is immutable and can therefore itself be used as a key in a further set or dict.
You can construct your set with a comprehension if you prefer: _set = {p for p in [1,2,3]}


(P1r4t3b0y) #8

Thanks, for clearing that up @mikity_kogekoge . I didn’t know that!

I did some digging on this topic and found out that, what I recommended above, is called ‘simple search’, which simply means checking each item in a collection one by one. It takes O(n) time, meaning linear time to compute, which translates to the bigger your list gets the longer it takes.
You can optimise this by sorting your list, before performing searches on it. As I understand, this is called ‘binary search’ and takes O(Log n) time.

Although, binary search is pretty fast, looking up things in a sorted list can be tedious too. Binary search allows you to find stuff in O(log n) time. If you want to find items in O(1) time that’s where hash functions come into play. These are for instance functions that you put in a string and you get back a a number from a hash table (in python a dictionary).

Example for the imaginary capability of evaluating 10 items per second:

Simple Search Binary Search Hash Functions
# of item O(n) O(log n) O(1)
100 10 sec. 1 sec. instant
1000 1.6 min. 1 sec. instant
10,000 16.6 min. 2 sec. instant

You can check out Aditya Bhargava’s excellent ‘Grokking Algorithms’ book for more information on this topic.


(Graham) #9

Hi @p1r4t3b0y I definitely recommend you read about set operations (.union(), …) in Python as well


#10

:sunglasses: this is a good thread. :smiley:


(Graham) #11

Let’s keep it going then ! Looking at the real difference between the methods with 10,000 elements in each list:

import time
from collections import namedtuple
from random import randint as rand
from dis import dis

Point = namedtuple('Point', 'X Y Z')

pnts = [Point(rand(0,100), rand(0,100), rand(0,100)) for j in range(10000)] #list of points
this_set=set(pnts)

anotherpnts = [Point(rand(0,100), rand(0,100), rand(0,100)) for j in range(10000)] # another list of points
another_set = set(anotherpnts)

def listcheck():
	intersection = []
	for Q in anotherpnts:
	   if Q in pnts:
	   	intersection.append(Q)
	   else:
	   	pass
	       #do something else
#	print(intersection)

def setcheck():
	intersection = []
	for Q in anotherpnts:
	   if Q in this_set:
	   	intersection.append(Q)
	   else:
	   	pass
	       #do something else
#	print(intersection)
	
def set_intersect():
    intersection = this_set.intersection(another_set)
#	print(intersection)

# call the fuctions and time them :
for func in [listcheck, setcheck, set_intersect]:
    t1= time.clock()
    func()
    t2 = time.clock()
    print '%s : %.4f seconds' % (func.__name__.upper().ljust(15), t2 - t1)


Gives:

LISTCHECK : 9.6451 seconds
SETCHECK : 0.0078 seconds
SET_INTERSECT : 0.0029 seconds


#12

Yeah, sets are fast! But you’ll have to take creating it into account. Rounding the point coordinates may take a while. With loads of points I’d use RTree: https://developer.rhino3d.com/5/api/RhinoCommonWin/html/M_Rhino_Geometry_RTree_Search_3.htm


(Graham) #13

Also see here for more about how dicts and hash tables work in Python in a beautifully visual way.