Hello,

I am pleased to announce the availability of the Ruby library 'clusterer', which implements the basic K-Means and Hierarchical Clustering algorithms for text data.

The project's RubyForge page is:

http://rubyforge.org/projects/clusterer/

The library can be installed using:

gem install clusterer

More information can be found in the following blog entry:

http://cuttingtheredtape.blogspot.com/2006/08/ruby-clustering-library-for-text-data.html

Happy hacking.
--
Surendra Singhi
http://ssinghi.kreeti.com, http://www.kreeti.com

Read my blog at: http://cuttingtheredtape.blogspot.com/
,----
| "All animals are equal, but some animals are more equal than others."
|                                  -- Orwell, Animal Farm, 1945
`----
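To make "K-Means for text data" concrete, here is a minimal sketch in Ruby. It is not the clusterer gem's code: the bag-of-words term vectors, cosine similarity, deterministic seeding with the first k documents, and the helper names (term_vector, cosine, centroid, kmeans) are all assumptions chosen for illustration. It returns clusters as arrays of document indices.

```ruby
# Illustrative K-Means for short text documents.
# NOT the clusterer gem's implementation; all names here are hypothetical.

# Turn a document into a term-frequency hash, e.g. {"hello" => 1, "world" => 1}.
def term_vector(doc)
  doc.downcase.scan(/\w+/).each_with_object(Hash.new(0)) { |w, h| h[w] += 1 }
end

# Cosine similarity between two sparse vectors stored as hashes.
def cosine(a, b)
  dot = a.sum { |term, weight| weight * b.fetch(term, 0) }
  norm_a = Math.sqrt(a.values.sum { |x| x * x })
  norm_b = Math.sqrt(b.values.sum { |x| x * x })
  return 0.0 if norm_a.zero? || norm_b.zero?
  dot / (norm_a * norm_b)
end

# Centroid = element-wise average of the member vectors.
def centroid(vectors)
  sum = Hash.new(0.0)
  vectors.each { |v| v.each { |term, weight| sum[term] += weight } }
  sum.transform_values { |w| w / vectors.size }
end

# Returns clusters as arrays of document indices, e.g. [[0, 2], [1]].
def kmeans(docs, k, max_iterations = 20)
  vectors = docs.map { |d| term_vector(d) }
  # Assumed seeding: the first k documents become the initial centroids.
  centroids = vectors.first(k)
  assignment = nil
  max_iterations.times do
    # Assign each document to the most similar centroid.
    new_assignment = vectors.map do |v|
      (0...k).max_by { |c| cosine(v, centroids[c]) }
    end
    break if new_assignment == assignment # converged
    assignment = new_assignment
    # Recompute each centroid; keep the old one if its cluster is empty.
    centroids = (0...k).map do |c|
      members = vectors.each_index.select { |i| assignment[i] == c }.map { |i| vectors[i] }
      members.empty? ? centroids[c] : centroid(members)
    end
  end
  (0...k).map { |c| assignment.each_index.select { |i| assignment[i] == c } }.reject(&:empty?)
end

docs = ["hello world", "mea culpa", "goodbye world"]
p kmeans(docs, 2) # => [[0, 2], [1]]
p kmeans(docs, 1) # => [[0, 1, 2]]
```

With k = 2 the sketch groups "hello world" with "goodbye world" because they share the term "world"; with k = 1 every document necessarily lands in a single cluster. The sketch uses Enumerable#sum and Hash#transform_values, so it needs Ruby 2.4 or later rather than the Ruby 1.8 of this thread.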
on 2006-08-22 19:53
on 2006-08-22 20:19
On 8/22/06, Surendra Singhi <efuzzyone@netscape.net> wrote:
> I am pleased to announce the availability of the Ruby library 'clusterer'
> which implements the basic K-Means and Hierarchical Clustering algorithms for
> text data.

I've installed the gem but am not getting very good results with my limited use. In particular, I tried the example you posted on your blog:

Clusterer::Clustering.kmeans_clustering(["hello world","mea culpa","goodbye world"])

but it appears to have placed all three strings in the same cluster; the result was [[0, 1, 2]]. I get a similar result ([[1, 0, 2]]) if I try the hierarchical clustering instead.

This is on Mac OS X 10.4, running Ruby 1.8.4.
on 2006-08-23 12:01
On Tuesday 22 August 2006 18:51, Surendra Singhi wrote:
> The library can be installed using:
>
> gem install clusterer
>
> More information can be found in the following blog entry:
>
> http://cuttingtheredtape.blogspot.com/2006/08/ruby-clustering-library-for-text-data.html
>
> Happy hacking.

This looks very interesting, great work.

Alex
on 2006-08-23 15:13
Hello,

"Lyle Johnson" <lyle.johnson@gmail.com> writes:

> Clusterer::Clustering.kmeans_clustering(["hello world","mea culpa","goodbye world"])
>
> but it appears to have placed all three strings in the same cluster;
> the result was [[0, 1, 2]]. I get a similar result ([[1, 0, 2]]) if I
> try the hierarchical clustering instead.

The examples were just meant to show how to use the algorithms.

Clustering can also be thought of as finding representative points for a given set of points. If you want to preserve all the information, you can make every point its own cluster; if you want maximum compression, you can have just one cluster. So there is a trade-off. I chose the default number of clusters to be Math.sqrt(number of documents). For your three-document example that is about 1.73, which reduces to the integer 1, hence one cluster.

If you want a custom number of clusters, pass it as the second argument:

Clusterer::Clustering.kmeans_clustering(["hello world","mea culpa","goodbye world"],2)

Also try the algorithms on a larger corpus to really evaluate their merit. They may also need some additional customisation depending on the problem domain.

Cheers,
--
Surendra Singhi
http://ssinghi.kreeti.com, http://www.kreeti.com

Read my blog at: http://cuttingtheredtape.blogspot.com/
,----
| By all means marry; if you get a good wife, you'll be happy. If you
| get a bad one, you'll become a philosopher.
|                                   -- Socrates
`----
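The default described in this reply can be written down directly. This is a small sketch, not the gem's source: the method name default_cluster_count is hypothetical, truncation via to_i is assumed (it matches three documents yielding one cluster), and the lower bound of 1 is an added guard so an empty corpus does not ask for zero clusters.

```ruby
# Hypothetical helper mirroring the default described above:
# k = sqrt(number of documents), truncated to an integer.
# The minimum of 1 is an assumption, not something stated in the thread.
def default_cluster_count(num_docs)
  [Math.sqrt(num_docs).to_i, 1].max
end

[3, 10, 100, 1000].each do |n|
  puts "#{n} docs -> #{default_cluster_count(n)} cluster(s)"
end
# 3 docs -> 1 cluster(s)
# 10 docs -> 3 cluster(s)
# 100 docs -> 10 cluster(s)
# 1000 docs -> 31 cluster(s)
```

This shows why the three-string example collapsed into a single cluster, and why a larger corpus (or an explicit cluster count) is needed to see any grouping at all.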