A piece of software that badly needs to be written is an easy-to-use statistics package. Apparently, in the social sciences the industry standard is a program called SPSS, which is a chore to use, quite expensive, and carries all sorts of vestiges that no modern program should. The essential functions are importing (and 'massaging') data, running regressions, and making pretty graphs.

When I got thinking about this idea I realized it would be really cool as a web app, because then datasets would be public, and different researchers could use them for different purposes. The essence of social science is asking people to answer a series of questions, correlating their answers, and attempting to make causal inferences. While researchers have theories and biases that lead them to expect certain causal culprits, their data can speak for themselves (in cases where the wording of the survey questions is smart enough to allow it). Perhaps if other sets of eyes happen across these datasets, with the regressions ready to go, and other datasets from similar surveys are readily available, a less biased researcher can come along, aggregate the data, and find a deeper truth about human nature.

These studies always have the numbers working against them: the cost per subject is high, so researchers don't get as many datapoints as they want, but a ton of datapoints is exactly what they need to move past deceptive ('confounded') results and find the subtler stuff that isn't trivial or outright false.

So I feel that a browser-based datacruncher would be cool because it would give people the freedom to work on/show off their findings from any computer, and it would force them to make their data public for the betterment of social science as explained above. Tags could also indicate which datasets might be compatible for aggregation, and making all this stuff browsable would help inspire new questions in readers and new directions for research.
The only other 'cool' idea I've had thus far is to have the thing 'automatically' run all the logical regressions and succinctly inform the user which ones are significant.

So I'm looking for someone to shoot ideas back and forth with, someone who may have experience that could be applicable to this sort of project, or who may have a mature understanding of how statistics are used in the real world. I don't exactly have time to get cracking on this yet, but I do want to be actively planning it.

-Mike
on 2005-12-02 09:48
on 2005-12-02 11:24
michael.schwab wrote:
> So I'm looking for someone to shoot ideas back and forth, someone who
> may have experience that could be applicable to this sort of project,
> or who may have a mature understanding of how statistics are used in
> the real world. I don't exactly have time to get cracking on this yet,
> but I do want to be actively planning it.
>
> -Mike

In the biological sciences SPSS is used a lot too, though I myself have only limited experience with it. If you are really serious about this then I would use the r-project as the backbone for all the statistical tests. I think there are some ruby-rproject bindings available, but they all seem to be too limited/unmaintained/undocumented (someone please correct me if I'm wrong). So this might be the first step to take.

I am somewhat hesitant about the whole statistical package thing. I know that in the biological sciences statistics are often badly understood and people mostly use these packages wrongly. I know that some statisticians looked at a couple of papers, and in a large share of them (80%?) the statistical methods used were completely wrong. I know you can't really blame SPSS for this, but the fact is that people will get answers from SPSS even if they don't understand what they are actually doing. If you have to use a higher-level language you are forced to learn about what you are doing -> fewer mistakes. On the other hand, it is about time that we had a good open source statistics package that is also available on Linux.

I would also be hesitant about forcing people to open up their data. I myself would be hesitant to share all my data before I had analysed it/published an article about it.