A piece of software that badly needs to be written is an easy-to-use
statistics package. Apparently, in the social sciences the industry
standard is this program called SPSS, which is a chore to use, is quite
expensive, and suffers from all sorts of vestiges that no modern
program should have.
The essential functions are importing (and ‘massaging’) data, running
regressions, and making pretty graphs. When I got thinking about this
idea I realized it would be really cool as a web app, because then
datasets would be public, and different researchers could use them for
different purposes. The essence of social science is asking people to
answer a series of questions, correlating their answers, and attempting to
make causal inferences. While people have their theories and biases
that lead them to expect certain causal culprits, their data can speak
for themselves (in cases where the wording of their survey questions is
smart enough to let them). Perhaps if other sets of eyes happen across
these datasets, with the regressions ready to go, and other datasets
from similar surveys are readily available, a less biased researcher
can come along, aggregate the data and find a deeper truth about human
nature. These studies always have the numbers working against them:
the cost per subject is high, so they don’t get as many datapoints as
they want, but a ton of datapoints is exactly what they need to move
past deceptive results (chance findings, or ‘confounded’ ones that only
come apart once there is enough data to control for other variables)
and find the subtler stuff that isn’t trivial or outright false.
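
To make the aggregation idea concrete, here is a rough sketch of pooling two surveys. It assumes each dataset is just a dict mapping a question name to a list of answers, and that a shared name really means the same question asked on the same scale. Both are my assumptions for illustration, and the function name `pool` and the numbers are made up.

```python
def pool(datasets):
    """Stack rows from several datasets, keeping only the columns they all share."""
    shared = set.intersection(*(set(d) for d in datasets))
    # Concatenate each shared column in the order the datasets were given.
    return {col: [v for d in datasets for v in d[col]] for col in sorted(shared)}


# Made-up toy data: two small studies that overlap on two questions.
study_a = {"age": [21, 34, 45], "trust_score": [3, 4, 2], "income": [30, 52, 61]}
study_b = {"age": [29, 38], "trust_score": [5, 3]}

pooled = pool([study_a, study_b])
print(pooled)
```

The unshared `income` column is dropped rather than padded with blanks, so every pooled column has one value per subject.
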
So I feel that a browser-based datacruncher would be cool because it
would give people the freedom to work on/show off their findings from
any computer, and it would force them to make their data public for the
betterment of social science as explained above. Also, datasets could
be tagged to indicate which ones might be compatible for aggregation,
and making all this stuff browsable would help inspire new questions in
readers and new directions for research. The only other ‘cool’ idea I’ve
had thus far was to have the thing ‘automatically’ run all the logical
regressions and succinctly inform the user which ones are significant.
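
A minimal sketch of what that ‘automatic’ screen might look like, assuming a dataset is a dict of equal-length numeric columns. It uses pairwise Pearson correlations as a stand-in for ‘all the logical regressions’ (for a one-predictor linear regression, testing the slope and testing the correlation amount to the same thing). The p-values come from the Fisher z-transform, which is only an approximation, and the threshold gets a Bonferroni correction because testing every pair will otherwise turn up ‘significant’ results by chance. The function names and the survey numbers are made up for illustration.

```python
import math
from itertools import combinations
from statistics import NormalDist, mean


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences (neither may be constant)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Approximate two-sided p-value for r, using the Fisher z-transform."""
    if n < 4:
        raise ValueError("need at least 4 datapoints for this approximation")
    if abs(r) >= 1.0:
        return 0.0  # perfect correlation; atanh would blow up
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def screen_pairs(columns, alpha=0.05):
    """Test every pair of columns; return the significant ones, smallest p first."""
    pairs = list(combinations(sorted(columns), 2))
    threshold = alpha / len(pairs)  # Bonferroni correction for multiple tests
    hits = []
    for a, b in pairs:
        r = pearson_r(columns[a], columns[b])
        p = approx_p_value(r, len(columns[a]))
        if p < threshold:
            hits.append((a, b, r, p))
    return sorted(hits, key=lambda hit: hit[3])


# Made-up toy survey: one strong relationship and one unrelated column.
survey = {
    "hours_tv": [1, 2, 3, 4, 5, 6, 7, 8],
    "test_score": [90, 85, 83, 78, 70, 68, 60, 55],
    "shoe_size": [9, 7, 10, 8, 11, 7, 9, 8],
}

hits = screen_pairs(survey)
for a, b, r, p in hits:
    print(f"{a} vs {b}: r = {r:.2f}, p = {p:.2g}")
```

On this toy data only the `hours_tv` / `test_score` pair gets reported. A real version would need multiple regression, handling for categorical answers, and some way of deciding which regressions count as ‘logical’ in the first place.
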
So I’m looking for someone to shoot ideas back and forth with, someone who
may have experience that could be applicable to this sort of project,
or who may have a mature understanding of how statistics are used in
the real world. I don’t exactly have time to get cracking on this yet,
but I do want to be actively planning it.
-Mike