Hierarchical clustering of 1 million objects

0 votes

Can anyone point me to a hierarchical clustering tool (preferably in Python) that can cluster ~1 million objects? I have tried hcluster and Orange.

hcluster had trouble with 18k objects. Orange was able to cluster 18k objects in seconds, but failed with 100k objects (saturated memory and eventually crashed).

I am running on a 64-bit Xeon CPU (2.53 GHz) with 8 GB of RAM + 3 GB swap on Ubuntu 11.10.

Feb 24, 2022 in Machine Learning by Dev
• 6,000 points
525 views

1 answer to this question.

0 votes
Consider switching algorithms: instead of hierarchical clustering, try DBSCAN or OPTICS.
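As a rough illustration of the DBSCAN suggestion, here is a minimal sketch using scikit-learn. The synthetic data and the `eps` and `min_samples` values are assumptions chosen for the demo, not recommendations for the asker's data. Note that DBSCAN returns a flat set of clusters plus noise points rather than a dendrogram.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): two well-separated 2-D blobs.
X = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(500, 2)),
    rng.normal(loc=5.0, scale=0.3, size=(500, 2)),
])

# eps and min_samples are illustrative values and must be tuned per dataset.
db = DBSCAN(eps=0.5, min_samples=5).fit(X)
labels = db.labels_  # one label per point; -1 marks noise

# Count clusters, excluding the noise label if present.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("clusters found:", n_clusters)
```

`sklearn.cluster.OPTICS` exposes a similar `fit` interface and may be worth trying when a single `eps` does not suit all regions of the data.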
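Standard agglomerative hierarchical clustering is expensive in both space and time: it typically works from all pairwise distances, so memory grows as O(n²) in the number of objects n, and running time grows at least that fast.
This is the main drawback of hierarchical clustering: it is not suitable for large datasets.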
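To make the memory cost concrete, here is a back-of-the-envelope sketch. It assumes the tool stores a condensed pairwise distance matrix (n(n-1)/2 entries) as 8-byte floats; the function name is hypothetical, and real tools may store more or less than this.

```python
def condensed_matrix_bytes(n_objects, bytes_per_entry=8):
    """Bytes needed for a condensed pairwise distance matrix.

    Assumes n*(n-1)/2 stored distances at bytes_per_entry each
    (8 bytes corresponds to float64).
    """
    n_pairs = n_objects * (n_objects - 1) // 2
    return n_pairs * bytes_per_entry


# Sizes taken from the question: 18k, 100k and ~1 million objects.
for n in (18_000, 100_000, 1_000_000):
    gib = condensed_matrix_bytes(n) / 2**30
    print(f"{n:>9,} objects -> {gib:,.1f} GiB")
```

Under these assumptions the matrix needs roughly 1.2 GiB for 18k objects, about 37 GiB for 100k, and about 3.6 TiB for 1 million, which is consistent with a machine with 8 GB of RAM coping with 18k objects but not 100k.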
One way to handle a huge dataset with the same algorithm is to first divide the data into smaller clusters and then construct hierarchical trees from them. How well this works depends on the number of levels, the size, and the shape of the resulting tree.
Available hardware and system software (memory in particular) also have to be taken into account here.
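One possible reading of the divide-then-build-trees idea is a two-stage sketch: pre-cluster the points into a few hundred groups with `MiniBatchKMeans`, then run SciPy's agglomerative `linkage` on the group centroids only. The choice of k-means for the first stage, the group count, the Ward linkage, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(20_000, 5))  # stand-in data (assumption)

# Stage 1: compress the data into a manageable number of groups.
n_groups = 200  # illustrative; small enough for a full distance matrix
km = MiniBatchKMeans(n_clusters=n_groups, batch_size=1024,
                     n_init=3, random_state=0)
group_of_point = km.fit_predict(X)  # group index (0..n_groups-1) per point

# Stage 2: build the hierarchy over the group centroids only.
Z = linkage(km.cluster_centers_, method="ward")

# Cut the tree into at most 10 clusters (fcluster labels start at 1)
# and map each original point to the cluster of its group.
group_to_cluster = fcluster(Z, t=10, criterion="maxclust")
point_cluster = group_to_cluster[group_of_point]
print(np.bincount(point_cluster)[1:])  # points per top-level cluster
```

The tree `Z` only describes how the groups merge, so any structure finer than a first-stage group is lost. Raising `n_groups` trades memory and time for resolution.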
answered Feb 24, 2022 by Nandini
• 5,480 points
