I want to generate a “text cloud”, but a couple of things about what I
am doing are out of the ordinary.
First, it is not for viewing on the web but for offline use. Output to
an image would be great, but HTML is just fine (especially as I imagine
it would be much easier to accomplish).
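
To show what I mean by the HTML route, here is a rough sketch of the
output step, assuming the word counts have already been collected into a
hash. The method name `cloud_html` and the `min_px`/`max_px` parameters
are made up for illustration, and the linear font scaling is just one
possible choice.

```ruby
require 'cgi'

# Turn a { word => count } hash into a standalone HTML page.
# Font size is scaled linearly between min_px and max_px (hypothetical
# parameters); words are listed in alphabetical order.
def cloud_html(counts, min_px: 12, max_px: 60)
  return "<html><body></body></html>\n" if counts.empty?

  lo, hi = counts.values.minmax
  range = [hi - lo, 1].max # avoid dividing by zero when all counts match

  spans = counts.sort.map do |word, n|
    px = min_px + (n - lo) * (max_px - min_px) / range
    %(<span style="font-size:#{px}px">#{CGI.escapeHTML(word)}</span>)
  end

  "<html><body>\n#{spans.join("\n")}\n</body></html>\n"
end
```

Writing the returned string to a file with `File.write` would give a
page that opens in a browser with no web server involved.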
Second, I am looking to do this with extremely large data sets
(compared with what most text cloud generators I have seen handle): on
the order of 2 gigabytes of text at a minimum, and as much as 10
gigabytes or more. So obviously I will need some sort of sliding
threshold that a “word” must cross before it shows up, as I won’t want
every string to appear in the cloud.
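
Roughly, the counting and threshold step I have in mind looks like the
sketch below. It is only a starting point under my own assumptions: the
method names `count_words` and `frequent_words`, the `min_share`
parameter, and the simple letters-and-apostrophes definition of a
“word” are all placeholders.

```ruby
# Count words one line at a time, so the input itself is never loaded
# into memory all at once. `io` is anything that responds to each_line
# (a File, $stdin, ARGF, a StringIO).
def count_words(io)
  counts = Hash.new(0)
  io.each_line do |line|
    # scrub replaces invalid byte sequences so scan does not raise
    line.scrub.downcase.scan(/[a-z']+/) { |word| counts[word] += 1 }
  end
  counts
end

# Keep only words that account for at least `min_share` of all words
# counted, so the cutoff slides with the size of the input.
def frequent_words(counts, min_share)
  cutoff = counts.values.sum * min_share
  counts.select { |_word, n| n >= cutoff }
end
```

From the command line this could be driven with something like
`frequent_words(count_words(ARGF), 0.001)`. Memory use then grows with
the number of distinct words rather than with the size of the input,
though I do not know how large that hash gets on 10 gigabytes of real
text.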
As most of the cloud generators I have seen seem aimed at much smaller
data sets, I am worried about how they would scale and how much memory
they would consume. I have seen a number of Ruby-based cloud generators;
however, most of them are integrated into Rails applications. I am
hoping to do this standalone, ideally from the command line.
I don’t have to do this in Ruby, for that matter. I am just working on
the assumption that I can’t find a tool to do what I need, so I am going
to have to write one. So I was hoping at the very least to find some
code I can alter to do what I want. If anyone knows of a tool in any
language that does what I am looking for, that would be even better!
Thanks for any help.