# Calculating the distance between all combinations within one column using Python 3

I need to calculate the distance between all possible contact points in a dataframe using Python 3. My code works, but it is very slow. How can I reduce the run time?

Condition: distances are calculated only between genes whose `chr` value is the same. `chr1` appears at row indices 0, 3, and 5, so three combinations are made:

1) MK1 MI5 45 (62~17)

2) MK1 MR4 11 (62~51)

3) MI5 MR4 34 (17~51)

Similarly, `chr2` appears at indices 1 and 2, so the only combination is LC1 LI6 18 (16~34).

```
INPUT
      chr   st  st1  gene
0     chr1  62  62   MK1
1     chr2  16  16   LC1
2     chr2  34  34   LI6
3     chr1  17  17   MI5
4     chr3  15  15   LI6
5     chr1  51  51   MR4

OUTPUT
gene1 gene2  dist
MK1   MI5    45
MK1   MR4    11
MI5   MK1    45
```

```python
import pandas as pd

data1 = {'chr': ['chr1', 'chr2', 'chr2', 'chr1', 'chr3', 'chr1'],
         'st': [62, 16, 34, 17, 15, 51],
         'st1': [62, 16, 34, 17, 15, 51],
         'gene': ['MK1', 'LC1', 'LI6', 'MI5', 'LI6', 'MR4']}
data = pd.DataFrame(data1)
chr = pd.Series(data.chr.unique())

cols = ['gene1', 'gene2', 'dist']
# Rows are collected in a list because DataFrame.append was removed in pandas 2.0.
rows = []

for chr_num in chr:
    for i in range(0, len(data)):
        for j in range(0, len(data)):
            # Compare every ordered pair of distinct rows on the same chromosome.
            if i != j and chr_num == data.iloc[i, 0] and chr_num == data.iloc[j, 0]:
                dist = abs(data.iloc[i, 1] - data.iloc[j, 1])
                rows.append([data.iloc[i, 3], data.iloc[j, 3], dist])

all_dist_comb = pd.DataFrame(rows, columns=cols)
all_dist_comb['dist'] = all_dist_comb['dist'].astype(int)

print(all_dist_comb)
```

Hello,

How does this relate to Rhino or Grasshopper?

Have you tried profiling your code using `cProfile`?
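As a minimal sketch of what that could look like, assuming the nested loops are wrapped in a function (the name `all_pair_distances` is hypothetical and not from the original post), the standard-library `cProfile` and `pstats` modules can report where the time goes:

```python
import cProfile
import io
import pstats

import pandas as pd


def all_pair_distances(data):
    """Hypothetical wrapper around the nested-loop approach from the question."""
    rows = []
    for chr_num in data["chr"].unique():
        for i in range(len(data)):
            for j in range(len(data)):
                if i != j and chr_num == data.iloc[i, 0] and chr_num == data.iloc[j, 0]:
                    dist = abs(data.iloc[i, 1] - data.iloc[j, 1])
                    rows.append((data.iloc[i, 3], data.iloc[j, 3], dist))
    return pd.DataFrame(rows, columns=["gene1", "gene2", "dist"])


data = pd.DataFrame({
    "chr": ["chr1", "chr2", "chr2", "chr1", "chr3", "chr1"],
    "st": [62, 16, 34, 17, 15, 51],
    "st1": [62, 16, 34, 17, 15, 51],
    "gene": ["MK1", "LC1", "LI6", "MI5", "LI6", "MR4"],
})

# Profile only the call of interest.
profiler = cProfile.Profile()
profiler.enable()
result = all_pair_distances(data)
profiler.disable()

# Write the ten most expensive entries, sorted by cumulative time, into a string.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

On a dataframe this small the numbers mean little; the point is to run the same thing on a realistic input and see which calls dominate the cumulative time.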

See here for general advice on speeding up your code with Pandas:

If the dataframe is large, then printing it will also take a while.

Also, you can move your `data.iloc[i,0]` and similar lookups out of the j loop and up into the i loop, so they are not repeated on every iteration.
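Taking that idea a step further, here is one possible sketch (not the original poster's code) that groups the rows by `chr` first, so rows are only ever paired within the same chromosome, and reads each column once as a plain list instead of calling `iloc` inside the loops. The function name `pairwise_distances` is hypothetical, and the sketch assumes the distance is taken on the `st` column, as in the question's code.

```python
from itertools import combinations

import pandas as pd


def pairwise_distances(data):
    """Hypothetical helper: distance between every pair of genes sharing a chr."""
    rows = []
    # groupby visits each chromosome's rows once; sort=False keeps first-seen order.
    for _, group in data.groupby("chr", sort=False):
        # Pull the columns out once per group instead of using iloc per pair.
        genes = group["gene"].tolist()
        starts = group["st"].tolist()
        # combinations yields each unordered pair of positions exactly once.
        for i, j in combinations(range(len(genes)), 2):
            rows.append((genes[i], genes[j], abs(starts[i] - starts[j])))
    return pd.DataFrame(rows, columns=["gene1", "gene2", "dist"])


data = pd.DataFrame({
    "chr": ["chr1", "chr2", "chr2", "chr1", "chr3", "chr1"],
    "st": [62, 16, 34, 17, 15, 51],
    "st1": [62, 16, 34, 17, 15, 51],
    "gene": ["MK1", "LC1", "LI6", "MI5", "LI6", "MR4"],
})

result = pairwise_distances(data)
print(result)
```

`itertools.combinations` produces each pair once. If both orderings are wanted (for example both MK1/MI5 and MI5/MK1), `itertools.permutations` could be swapped in instead.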