SPSS

michael.schwab · December 2, 2005, 9:48am

A piece of software that badly needs to be written is an easy-to-use
statistics package. Apparently, in the social sciences the industry
standard is this program called SPSS, which is a chore to use, is quite
expensive, and suffers from all sorts of vestiges that no modern
program should have.

The essential functions are importing (and ‘massaging’) data, running
regressions, and making pretty graphs. When I got thinking about this
idea I realized it would be really cool as a web app, because then
datasets would be public, and different researchers could use them for
different purposes. The essence of social science is asking people to
answer series of questions, correlating their answers and attempting to
make causal inferences. While people have their theories and biases
that lead them to expect certain causal culprits, their data can speak
for themselves (in cases where the wording of their survey questions is
smart enough to let them). Perhaps if other sets of eyes happen across
these datasets, with the regressions ready to go, and other datasets
from similar surveys are readily available, a less biased researcher
can come along, aggregate the data and find a deeper truth about human
nature. These studies always have the numbers working against them;
the cost per subject is high so they don’t get as many datapoints as
they want, but a ton of datapoints is exactly what they need to move
past deceptive (‘confounded’) results and find the subtler stuff that
isn’t trivial or outright false.

So I feel that a browser-based datacruncher would be cool because it
would give people the freedom to work on/show off their findings from
any computer, and it would force them to make their data public for the
betterment of social science as explained above. Tags could also mark
which datasets might be compatible for aggregation, and making all this
stuff browsable would help inspire new questions in readers and new
directions for research. The only other ‘cool’ idea I’ve had thus far
was to have the thing ‘automatically’ run all the logical regressions
and succinctly inform the user which ones are significant.
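
To make the "run all the regressions automatically" idea concrete, here is a
minimal sketch in Python. It is only an illustration under stated assumptions,
not a design for the actual app: the function names and the sample columns are
hypothetical, "all the logical regressions" is narrowed to simple pairwise
linear regressions, the p-value uses the Fisher z approximation rather than an
exact t-test, and a Bonferroni correction is added because testing every pair
inflates the number of false positives.

```python
import math
from itertools import combinations
from statistics import NormalDist, mean


def simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x, plus Pearson's r."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


def approx_p_value(r, n):
    """Two-sided p-value for r via the Fisher z approximation (needs n > 3)."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def significant_pairs(columns, alpha=0.05):
    """Regress every column on every other one; keep the significant pairs.

    `columns` maps a column name to a list of numbers (all the same length).
    The threshold is Bonferroni-corrected: alpha divided by the number of tests.
    """
    pairs = list(combinations(columns, 2))
    threshold = alpha / len(pairs)
    found = []
    for a, b in pairs:
        slope, intercept, r = simple_regression(columns[a], columns[b])
        p = approx_p_value(r, len(columns[a]))
        if p < threshold:
            found.append((a, b, round(slope, 3), round(r, 3), p))
    return found


# Hypothetical survey data, made up purely for illustration.
survey = {
    "hours_tv": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "happiness": [9.1, 8.7, 8.2, 7.4, 7.1, 6.3, 5.8, 5.2, 4.1, 3.9],
    "shoe_size": [42, 38, 44, 39, 41, 43, 37, 40, 45, 36],
}

for a, b, slope, r, p in significant_pairs(survey):
    print(f"{b} ~ {a}: slope={slope}, r={r}, p={p:.2g}")
```

With the made-up data above, only the hours_tv/happiness pair should clear the
corrected threshold. A real tool would also want multiple regression and
handling of missing answers, and it would probably hand the statistics to an
established package rather than reimplement them.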

So I’m looking for someone to shoot ideas back and forth, someone who
may have experience that could be applicable to this sort of project,
or who may have a mature understanding of how statistics are used in
the real world. I don’t exactly have time to get cracking on this yet,
but I do want to be actively planning it.

-Mike

michael.schwab · December 2, 2005, 11:24am

michael.schwab wrote:

So I’m looking for someone to shoot ideas back and forth, someone who
may have experience that could be applicable to this sort of project,
or who may have a mature understanding of how statistics are used in
the real world. I don’t exactly have time to get cracking on this yet,
but I do want to be actively planning it.

-Mike

In biological sciences SPSS is used a lot too, though I myself have only
limited experience with it. If you are really serious about this then I
would use the R project as the backbone for all the statistical tests. I
think there are some Ruby bindings for R available, but they all seem
too limited/unmaintained/undocumented (someone please correct me if I’m
wrong). So this might be the first step to take.

I am somewhat hesitant on the whole statistical package thing. I know
that in biological sciences statistics are often badly understood and
people mostly use these packages wrongly. I know that some statisticians
looked at a couple of papers, and in a large share of them (80%?) the
statistical methods used were completely wrong. I know you can’t really
blame SPSS for this, but the fact is that people will get answers from
SPSS even if they don’t understand what they are actually doing. If you
want to use a higher level language you are forced to learn about what
you are doing -> fewer mistakes.
On the other hand it is about time that we have a good open source
statistics package that is also available on Linux.

I would also be hesitant about forcing people to open up their data. I
myself would be hesitant to share all my data before I had analysed
it/published an article about it.

michael.schwab · December 3, 2005, 1:37am

One letter: R

http://www.r-project.org/

(I think that’s it. Anyway, it does everything you want and is also a
sort of nice scripting language).

Zed A. Shaw

On Fri, 2 Dec 2005 17:45:46 +0900