Check each item's repeat times within a list


#1

Hi All,

Actually, I have two questions. I guess both can be solved with some basic Python syntax.
When I have a list like [0,0,1,1,1,1,3,3,3,3,3,3,3,4,4,4,4,4,5,5 … ], how can I remove the repeated items to get a list like [0,1,3,4,5 … ]? Is there a way that works for int, str and even geometries?

My other question is: how can we count how many times each item repeats within a list?
Take the same list for example, [0,0,1,1,1,1,3,3,3,3,3,3,3,4,4,4,4,4,5,5 … ] to [2,4,7,5,2 … ].

Thanks for any help,

Jack


(Nathan 'jesterKing' Letwory) #2

Seed a set() with your list(), then convert it back to a list: https://docs.python.org/2/library/stdtypes.html#set

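l = [0,0,3,3,2,2]
l.sort()
s = set(l)      # a set keeps only one copy of each item
ll = list(s)
ll.sort()       # a set is unordered, so sort the unique items

# for each unique item, count how often it occurs in the original list
c = [l.count(i) for i in ll]

print(ll)  # unique items: [0, 2, 3]
print(c)   # counts: [2, 2, 2]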

edit: note that the counts are only really useful when they line up with a sorted list of unique items, hence the explicit sorts in the snippet.
edit2: this should work for any data type you put in the list, as long as the items are hashable (needed for the set) and can be compared (needed for the sort).
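A rough sketch of the same idea on other types. The helper name unique_and_counts and the sample data are made up for illustration, and the tuples only stand in for point coordinates. Whether real geometry objects can go into a set depends on whether they are hashable, which I haven't checked:

```python
def unique_and_counts(items):
    """Return (sorted unique items, how often each one occurs in items)."""
    unique = sorted(set(items))                 # set() drops the duplicates
    counts = [items.count(u) for u in unique]   # one count per unique item
    return unique, counts

# Hypothetical sample data: strings, and tuples standing in for points.
words = ["b", "a", "b", "c", "a", "b"]
points = [(0, 0), (1, 2), (0, 0), (1, 2), (1, 2)]

print(unique_and_counts(words))   # (['a', 'b', 'c'], [2, 3, 1])
print(unique_and_counts(points))  # ([(0, 0), (1, 2)], [2, 3])
```

Items that are not hashable (lists, for example) would raise a TypeError in set(), so those would need to be converted to tuples first.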


#3

great! Thanks a lot nathanletwory!
Can I ask why we need to sort() twice? I tried leaving out the second sort() and it still seems to work.

Jack


(Nathan 'jesterKing' Letwory) #4

I added the second sort as well because a set is an unordered collection of items. Creating a list from a set does not guarantee that the elements come out sorted. In many cases with simple data like ints it seems to work OK, but that won’t necessarily always be the case. Sorting the list created from the set ensures you have sorted elements.
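If what you want is the order in which items first appear rather than sorted order, a set cannot give you that either. A minimal sketch of one alternative, using OrderedDict.fromkeys from the standard library (this is a different technique from the set approach above, and the sample data is made up):

```python
from collections import OrderedDict

data = [3, 3, 0, 0, 5, 1, 1, 0]

# Sorted unique items: the set drops duplicates, sorted() fixes the order.
in_sorted_order = sorted(set(data))

# Unique items in first-seen order: dict keys are unique, and an
# OrderedDict remembers the order in which keys were inserted.
in_first_seen_order = list(OrderedDict.fromkeys(data))

print(in_sorted_order)      # [0, 1, 3, 5]
print(in_first_seen_order)  # [3, 0, 5, 1]
```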

/Nathan


#5

As Nathan did (I was just too slow):

orig_list=[1,2,2,4,5,4,4,5,2,1,2,4,1,4,3,4,3,4,1,2,2,3,5,4,4,4,5,4,1,5,4,6]
unique=list(set(orig_list))
unique.sort()
repeats=[]
print "Unique elements in orig_list:"
print unique
print "\nIn original list:"
for element in unique:
    count=orig_list.count(element)
    repeats.append(count)
    print "{} repeated {} times".format(element,count)
Unique elements in orig_list:
[1, 2, 3, 4, 5, 6]

In original list:
1 repeated 5 times
2 repeated 6 times
3 repeated 3 times
4 repeated 12 times
5 repeated 5 times
6 repeated 1 times

–Mitch


(Nathan 'jesterKing' Letwory) #6

The repeats list is unnecessary (and so is the append() to it), but otherwise nice and verbose :slight_smile:

/Nathan


#7

That was just in case someone actually wanted to do something with that data later… Not much overhead involved…

–Mitch


(Nathan 'jesterKing' Letwory) #8

Fair enough :slight_smile:


#9

I would do this:

from collections import Counter
from itertools import groupby

u = [1,2,2,4,5,4,4,5,2,1,2,4,1,4,3,4,3,4,1,2,2,3,5,4,4,4,5,4,1,5,4,6]
s = [1,1,1,1,1,2,2,2,2,2,2,3,3,3,4,4,4,4,4,4,4,4,4,4,4,4,5,5,5,5,5,6]

for element, count in sorted(Counter(u).items()):
    print (element, count)

print ('\n---\n')

for element, group in groupby(s):
    print (element, len(list(group)))
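For the two lists Jack asked for in the first post, a Counter can be unpacked like this. Just a sketch; the variable names are my own:

```python
from collections import Counter

data = [0, 0, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5]

tally = Counter(data)                        # maps each item to its count
unique = sorted(tally)                       # iterating a Counter yields its keys
counts = [tally[value] for value in unique]  # counts in the same order

print(unique)  # [0, 1, 3, 4, 5]
print(counts)  # [2, 4, 7, 5, 2]
```

Unlike groupby, this does not need the input list to be sorted.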

#10

Or this:

from collections import defaultdict

mylist = [6,0,0,1,1,1,1,3,3,3,3,3,3,3,4,4,4,4,4,5,5,2,2,4]
d = defaultdict(list)

for n in mylist: d[n].append(n)

print "Unique items:", d.keys()

for key, values in d.items():
    print "{} repeated {} times".format(key, len(values))

1 question, 100 solutions. :wink:

c.


#11

Sorry, no, but in Python there should be one – and preferably only one – obvious way to do it. :innocent:


#12

Oh yeah, just noted the next line:

“Although that way may not be obvious at first unless you’re Dutch”

:thinking:

c.


#13

I don’t really get that line, but Guido van Rossum is Dutch.


(Nathan 'jesterKing' Letwory) #14

I do. I am Dutch, too. ^.^


(Nathan 'jesterKing' Letwory) #15

Btw, I was doing some timings on the proposed methods, but suffered a RUB (rapid unscheduled boot, spacex style). But I do recall some numbers.

First I created a list of 1,000,000 randomly picked ints between 0 and 13, with random.seed(13) (13 for no obvious reason).

Then I timed each proposed solution (without the printing, just creating lists of the counts).

Fastest was the Counter method, at around 0.09 s on my machine (start = time.time() … time.time() - start). Slowest was groupby with 2.something. My list comprehension was a bit slower than the list.append() from @Helvetosaur (around 0.32 vs around 0.31). I don’t recall all of them, and I’m too lazy to redo it at this moment. But indeed list comprehensions aren’t the fastest around - but I do like them.
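Something along these lines should reproduce that kind of measurement. It is only a sketch: the function names are made up, the list is smaller than the original one to keep the run short, and timing the sort together with groupby is my assumption about how it was measured. The numbers will differ per machine:

```python
import random
import time
from collections import Counter
from itertools import groupby

random.seed(13)
N = 100000  # the original test used 1,000,000 items; smaller here
data = [random.randint(0, 13) for _ in range(N)]

def counts_with_counter(items):
    return [count for _, count in sorted(Counter(items).items())]

def counts_with_list_count(items):
    return [items.count(value) for value in sorted(set(items))]

def counts_with_groupby(items):
    # groupby needs sorted input, so the sort is part of the timing here
    return [len(list(group)) for _, group in groupby(sorted(items))]

def timed(func, items):
    """Return (elapsed seconds, result) for a single call of func(items)."""
    start = time.time()
    result = func(items)
    return time.time() - start, result

results = {}
for func in (counts_with_counter, counts_with_list_count, counts_with_groupby):
    elapsed, counts = timed(func, data)
    results[func.__name__] = counts
    print("{}: {:.4f} s".format(func.__name__, elapsed))
```

For more reliable numbers, the timeit module repeats each call several times instead of measuring a single run.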


(Willem Derks) #16

My 3 line take:

orig_list=[1,2,2,4,5,4,4,5,2,1,2,4,1,4,3,4,3,4,1,2,2,3,5,4,4,4,5,4,1,5,4,6]

values_counts = [(value,orig_list.count(value)) for value in sorted(set(orig_list))]

for v,c in values_counts : print '{} repeated {}'.format (v,c)

-Willem

(I’m Dutch too BTW)


#17

Using groupby to solve the counting problem only works when the list is sorted, as it groups consecutive equal elements. My guess is that in that case it’s faster than Counter.

To eliminate duplicates, @Jack_Zeng, you would, btw, also use groupby:

from itertools import groupby

l = [0,0,1,1,1,1,3,3,3,3,3,3,3,4,4,4,4,4,5,5,6]
l = [element for element, _ in groupby(l)]

print (l) 
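To see why the sorting matters, here is a small sketch with a made-up unsorted list. On unsorted input groupby reports each run of equal neighbours separately, so the same value can show up more than once:

```python
from itertools import groupby

unsorted = [1, 2, 2, 1, 1, 3]

# Without sorting: one entry per run of consecutive equal elements.
runs = [(key, len(list(group))) for key, group in groupby(unsorted)]

# With sorting: equal elements are adjacent, so each value appears once.
totals = [(key, len(list(group))) for key, group in groupby(sorted(unsorted))]

print(runs)    # [(1, 1), (2, 2), (1, 2), (3, 1)]
print(totals)  # [(1, 3), (2, 2), (3, 1)]
```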

Okay, I got it, when someone understands he is Dutch. So right now, I’m Dutch, too. That’s cool!


#18

Wow, cool!
I didn’t expect there would be so many ways to achieve it.
Amazing!!
I will have a look at groupby as well, it seems to be an advanced tool I should know~~:grin:
Thanks guys!!

Jack