i find your method very interesting , however in a case where you don't know anything about your data, meaning you don't know the optimal distance threshold , how would you encounter the problem ( also you can't choose it based on intuition with no profound study , since you can have a lot of clusters/sets of points with big variety of deviations , very dense ones vs very sparse ones ) ? My goal is: given a numerical variable ( 1 dimension only ) , do an agglomerative clustering and get the optimal number of clusters to do a binning just after that and transform it to a categorical one , I'm thinking about using some indicators of information purity in order to measure the impact of each merge, but i haven't figured it out yet.