Proximity in hierarchy
In the function cluster_dotted_names, the local proximity function is:
def proximity(col_a, col_b):
    return sum(a == b for a, b in zip(col_a, col_b))
# proximity('a.b.c.d.e', 'b.b.c.d.e') => 8
Since a dotted name reflects a position in a hierarchy, should the count stop at the first difference when reading from left to right?
def proximity(col_a, col_b):
    res = 0
    for a, b in zip(col_a, col_b):
        if a == b:
            res += 1
        else:
            break
    return res
# proximity('a.b.c.d.e', 'b.b.c.d.e') => 0
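A related point, offered only as a sketch: both versions above compare character by character, so 'ab.c' and 'ac.c' would still score 1 with the prefix version even though their top-level segments differ. If the intent is to measure shared depth in the hierarchy, one option might be to compare whole dotted segments instead. The name proximity_segments and the sep parameter below are hypothetical and are not part of cluster_dotted_names.

```python
def proximity_segments(col_a, col_b, sep="."):
    """Count the leading dotted segments shared by both names.

    Hypothetical variant: compares whole segments (split on `sep`)
    and stops at the first segment that differs.
    """
    count = 0
    for a, b in zip(col_a.split(sep), col_b.split(sep)):
        if a != b:
            break
        count += 1
    return count


print(proximity_segments("a.b.c.d.e", "a.b.x.d.e"))  # => 2
print(proximity_segments("a.b.c.d.e", "b.b.c.d.e"))  # => 0
print(proximity_segments("ab.c", "ac.c"))            # => 0
```

Whether this is preferable depends on how the clustering uses the score: a segment count is much coarser than a character count, so any threshold applied to it would need to be revisited.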