 # [Python] List processing algorithm out of ideas

Theory:
I have two lists with equal number of items:

``````
L1 = [0,1,2,3,4,5,6,7,7,8,8,8,9,10,10]
L2 = [12,412,51,523,52,54,65,74,35,22,14,1,3,76,159]
``````

How can I find the duplicate items in `L1`, get their indices, and use those indices to acquire the corresponding items in `L2`, then sum the values of those items?
After that, remove the duplicates from `L1` or create another list (say `L1a`) without duplicates. Also create another list (`L2a`) where the items at the indices of the duplicates in `L1` are replaced with their sum.

Practical case:
I have a number of lines, all in the XY plane and parallel to the Y axis.
Ergo, each has a constant X coordinate, but some lines, just like the values in `L1`, overlap but have different lengths.

What I need is a sum of the lengths for each unique X coordinate.

So far I figured out:

• How to get the unique items,
• How to pop the unique items in a different list
• How to remove duplicates in L1 by converting to set then back to list

What I cannot figure out is how to get two lists, of X coordinates and of lengths, that have an equal number of items.
I always get either new duplicates or a different number of items.

I assume the answer is hidden in the `collections` module or in using `enumerate()`, but I am not familiar with them and I don’t know where to look.

I’m not sure I got exactly what you want, but if I’m not mistaken you can achieve the same result with a much simpler algorithm:

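``````
import rhinoscriptsyntax as rs

# `lines` is assumed to be the list of line curves supplied as a component input
yCoordinates = []
for line in lines:
    yCoordinates.append(rs.CurveStartPoint(line).Y)
    yCoordinates.append(rs.CurveEndPoint(line).Y)
maxY = max(yCoordinates)
minY = min(yCoordinates)
sumY = abs(minY-maxY)  # overall Y extent, lowest to highest endpoint
``````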

IVELIN PEYCHEV.gh (13.6 KB)

Hi, can there be gaps between lines on the same X? Can they overlap but with different lengths? Can they be exact duplicates?

Thanks Mahdiyar,

But I know how to find the items with equal X coordinates and sum them up, I need the lists though with equal number of items.

Yes I tried that, but how to combine the arrays such that zip stays the same and zip is a sum of all items that have equal zip?

This is the main question.

I tried something with 4-5 levels of for and if loops and ended up with lots of duplicates.

for the practical case,

Is there a way to find all lines with X==i, Y = 0 and Z = 0 and sum them up?
RhinoCommon, Grasshopper?

I think I found a solution here:

But I still get one phantom item more in one of the lists:

``````
import rhinoscriptsyntax as rs
from ghpythonlib.treehelpers import list_to_tree

from collections import Counter
A = []
c = Counter()
zlst = []
for i in range(len(x)):
    zlst.append([str(y[i].X),round(x[i],0)])

for j,k in zlst:
    c.update({j:k})

for l in range(len(zlst)):
    A.append(zlst[l])

a,b = zip(*A)

AA = list(set(a))
BB = list(set(b))

#len(AA) = 221
#len(BB) = 222
``````
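A likely reason for the mismatch, sketched here on plain lists rather than the Grasshopper inputs: `set(a)` de-duplicates the keys and `set(b)` de-duplicates the values independently, so the two results need not have the same length, and the `Counter` that actually holds the sums is never read back. Reading both lists from the `Counter` keeps them paired. The data is the `L1`/`L2` from the first post; the names `keys` and `totals` are made up for this illustration.

```python
from collections import Counter

L1 = [0, 1, 2, 3, 4, 5, 6, 7, 7, 8, 8, 8, 9, 10, 10]
L2 = [12, 412, 51, 523, 52, 54, 65, 74, 35, 22, 14, 1, 3, 76, 159]

c = Counter()
for key, value in zip(L1, L2):
    c[key] += value  # a missing key starts at 0, so repeated keys accumulate

# Take both lists from the same Counter so they always have equal length
keys = sorted(c)
totals = [c[k] for k in keys]

print(keys)
print(totals)
```

With this data the three entries for `8` collapse into one key whose total is `22 + 14 + 1`.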

I was thinking of this data structure:

``````
from collections import defaultdict
lines = defaultdict(list)
for tup in zip(L1,L2):
    lines[tup[0]].append(tup[1])  # key on the L1 value, collect the matching L2 value
``````
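As a rough sketch of where this structure leads, assuming the dict is keyed on the `L1` value and collects the matching `L2` values: once the values are grouped, the two output lists can be built from the same set of keys, so they cannot drift apart in length. `L1a` and `L2a` are the names from the first post.

```python
from collections import defaultdict

L1 = [0, 1, 2, 3, 4, 5, 6, 7, 7, 8, 8, 8, 9, 10, 10]
L2 = [12, 412, 51, 523, 52, 54, 65, 74, 35, 22, 14, 1, 3, 76, 159]

# Group every L2 value under its L1 value
lines = defaultdict(list)
for xval, length in zip(L1, L2):
    lines[xval].append(length)

# One entry per unique L1 value, in sorted order, with the summed L2 values
L1a = sorted(lines)
L2a = [sum(lines[xval]) for xval in L1a]

print(L1a)
print(L2a)
```

Keeping the individual values in a list (rather than summing straight away) also leaves the count per key available via `len(lines[xval])`.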

Based on the practical case I’d try:

``````
import rhinoscriptsyntax as rs
import scriptcontext as sc
import Rhino as R

line_guids = rs.GetObjects('select lines')

# get line geometry
lines = []
for lg in line_guids:
    lines.append(rs.coerceline(lg))

# dict to hold list of lines keyed on x coordinate
line_lists = {}
# populate the dict
for line in lines:
    if line.FromX not in line_lists:  # should probably test x to some window
        line_lists[line.FromX] = [line]
    else:
        line_lists[line.FromX].append(line)

# loop through the line_lists and do calcs
for x_start, line_list in line_lists.items():
    total = 0  # named `total` to avoid shadowing the built-in sum()
    count = len(line_list)
    for line in line_list:
        total += line.Length
    print('x coord of {} has {} lines with summed length of {}'.format(x_start, count, total))
``````

Output:

``````x coord of 0.0 has 3 lines with summed length of 20.9950916265
x coord of -6.61721090246 has 1 lines with summed length of 5.94563031763
x coord of -3.77186291026 has 3 lines with summed length of 14.1364263773
``````

Should probably think about the x coordinate key precision though. Also it doesn’t use list indices, if that is a requirement.
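One possible way to handle the key precision, as a sketch only: snap each x coordinate to an integer bucket of a chosen tolerance and use that bucket as the dict key. The helper `x_bucket`, the tolerance of `0.001`, and the sample coordinates are all assumptions made up for this example, not values from the model.

```python
from collections import defaultdict

def x_bucket(x, tolerance=0.001):
    """Return an integer bucket index for x (hypothetical helper)."""
    return int(round(x / tolerance))

# Made-up sample data: two lines near x = 0 and two near x = -3.772
xs = [0.0, 0.00004, -3.7719, -3.77186, 5.0]
lengths = [1.0, 2.0, 3.0, 4.0, 5.0]

totals = defaultdict(float)
for x, length in zip(xs, lengths):
    totals[x_bucket(x)] += length

for bucket in sorted(totals):
    # bucket * 0.001 recovers the approximate x coordinate of the group
    print('x ~ {:.3f}: summed length {}'.format(bucket * 0.001, totals[bucket]))
```

Two values that sit on either side of a bucket boundary still end up in different groups, so the tolerance should be comfortably larger than the noise in the coordinates.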


Or `defaultdict(set)` to remove duplicates. Then you can use `for xval, line in lines.iteritems()` to iterate over your list / set.
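A small sketch of the `defaultdict(set)` variant, on made-up data that contains one exact duplicate pair. It uses `.items()`, which works in both Python 2 and Python 3, instead of `.iteritems()`.

```python
from collections import defaultdict

# Made-up data: the pair (7, 74) appears twice
L1 = [7, 7, 7, 8, 8]
L2 = [74, 74, 35, 22, 14]

lines = defaultdict(set)
for xval, length in zip(L1, L2):
    lines[xval].add(length)  # a set silently ignores a repeated length

for xval, lengths in sorted(lines.items()):
    print('{} {}'.format(xval, sorted(lengths)))
```

One caveat: a set also drops two genuinely different lines on the same x that happen to have the same length, so this only fits if equal lengths always mean the same line.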


Yes, that is a requirement, since the values in L1 could be scrambled.

Thank you @nathancoatney, @Dancergraham ,

I need to think through the ideas and how I can apply them.

Plus, that thread on Stack Overflow seems similar to mine. I need to see if I can combine all the ideas.

What I mean is that it doesn’t correlate indexes, as it works with the geometry. One could do the same with indexes, just the comparing would be different, so you would have:

``````
L1 = [0,1,2,3,4,5,6,7,7,8,8,8,9,10,10]
L2 = [12,412,51,523,52,54,65,74,35,22,14,1,3,76,159]

L1_indicies = {}
for i, l in enumerate(L1):
    if l not in L1_indicies:
        L1_indicies[l] = [i]
    else:
        L1_indicies[l].append(i)

print(L1_indicies)

L1a = []
L2a = []

for value, indices in L1_indicies.items():
    total = 0  # named `total` to avoid shadowing the built-in sum()
    for i in indices:
        total += L2[i]
    L1a.append(value)
    L2a.append(total)

print(L1a)
print(L2a)
``````

That stackoverflow link has more pythonic ways to do the same it looks like.


You’re welcome. For the next step it may be helpful to sort the lines by y value, e.g.:

``````
for xval,listt in lines.iteritems():
    print(sorted(listt,key = lambda x: x))
``````

I added a few lines and it’s now useful in my case:

``````
import collections

# L1 and L2 are the two input lists from the first post
L1_indicies = {}

for i, l in enumerate(L1):
    if l not in L1_indicies:
        L1_indicies[l] = [i]
    else:
        L1_indicies[l].append(i)

print(L1_indicies)

L1a = []
L2a = []

for value, indices in L1_indicies.items():
    total = 0
    for i in indices:
        total += L2[i]
    L1a.append(value)
    L2a.append(total)

print(L1a)
print(L2a)

a = []
b = []

d = dict(zip(L1a,L2a))
od = collections.OrderedDict(sorted(d.items()))

for key, value in od.items():
    temp = [key,value]
    a.append(temp[0])  # sorted unique L1 values
    b.append(temp[1])  # matching sums from L2
``````

Thanks Graham,

Your examples are a bit advanced. I don’t know in which cases I have to use `enumerate`, let alone `lambda`. Thanks anyway. Some day I’ll understand them better.


Maybe slightly more compact: to avoid the zipping and the `OrderedDict`, sort the keys of the dict instead of looping through the items:

``````
for key in sorted(L1_indicies.keys()):
    indices = L1_indicies[key]
    sum = 0
    for i in indices:
        sum += L2[i]
    L1a.append(key)
    L2a.append(sum)
``````

I think that will give you two lists in sorted order, which I think is the same as your a and b lists, and should save quite a bit of overhead if that is important.

I’ll dial it back a bit!

Simple is better than complex.

(`import this`)