This may be long, so first things first :
- Can I apply ?
- Can RubyCentral be my mentor ?
- If so, should I rather take a real Ruby project instead of my own ?
- Does my project sound good ?

I've posted this on the ruby-talk Google group, but I'm duplicating it here. Please tell me if this is forbidden, but since I don't see my message appearing I wouldn't want it to be lost again ^^; OK, so I had free time on my hands and I'm posting it again ;)

= APPLICATION, TIME REQUIRED =

I don't know whether I'll have enough time to do GSoC, because I have school work until mid-July. But it can still be possible if school work does not take too much time. Last year was hardcore and I am used to working 70+ hours a week. So if my school only takes 30 to 40 hours per week in June-July, I can still do a great project ;)

Are GSoC students supposed to be working harder than regular employees in companies (50h/week) or are they considered as students during summer that will go to parties every night or even have weekends (25h/week) ? If more than 40h a week are a minimum then I guess I'll just have to forget about it.

There is still the possibility that I ask my school to count GSoC as part of my coursework. That would give me more time to work on it, and a grade on my GSoC would replace a project my friends are going to do, provided my GSoC project gives me skills in the same field as that project. But the school takes a while to decide and applications are due soon.

= RUBYCENTRAL AS A MENTOR =

My project is not a project *for* the Ruby community, but merely *using* Ruby, which is my favorite language and is quite good for manipulating text. Does that still make it eligible for RubyCentral to be my mentor ?
I like the spirit and philosophy of Ruby, and its community has always looked great, so it would really motivate me if I could work with Ruby guys :)

= PROJECTS I CAN HELP WORK ON =

There are projects I'd be glad to help improve for the community, such as ZenSpider's tools (RubyInline, Ruby2C) or some ambitious VM projects (YARV, Rubinius). The question is : I think I'm quite good, but when I look at them they seem too hard. But I like challenges and I've done complicated stuff before, so are these the kind of projects that are :
- super hard, but when you really look at it you find that you can do it !
- super hard, only semi-gods can even understand it (which I'm not) !

Sorry for forgetting to be humble, but I think I've done quite hard stuff before. I have a good understanding of some subjects : my school made us recode many parts of the C standard library with only a very small set of available functions (e.g. recoding malloc with only brk/sbrk; the other allowed functions were assert, perror, exit, write and getenv). I value this because when shit happens, I can understand what I've done wrong :)

I also had opportunities to discover many things : I'm no expert, but I tried fun stuff such as distributed programming, image processing, functional languages (OCaml), or just stuff that is interesting and challenging to do (Objective-C, tiny tiny bits of Lisp, ...), which made me curious and allows me to quickly match what I'm learning now to things I've already heard of.

Now that you semi-gods know me, is your project still too hard for me, or do you think I just have to read a lot of docs and then I can join in ?

That said, I don't think the other ideas are bad :) I just don't know them all, and I ask about the projects that motivate me the most, that's only natural.

= MY PROJECT IDEA =

OK, now my project idea : when students are given a report to write, I want to help the teacher spot the cheaters (massive copy-paste from Wikipedia or other docs).
This is really resource-consuming, so I want to avoid diffing every file against every other file. Instead I'd like to implement heuristics that produce a "signature" of the document that is easy to compare with many other "signatures", so that I can show the teacher the parts of documents that are highly suspicious. I think the "extract-the-doc's-signature" part can be slow and complicated, but I'd really like the signature comparison to be super fast. I would also rather let cheaters go unsuspected than overwhelm the teacher with many cheat warnings (or my tool would defeat its purpose, which is easing the teacher's life), but that can simply be a parameter of the heuristics.

There are some sub-parts around it, such as asking the teacher for the other students' documents, asking for keywords and fetching the first few docs from Google : cheaters are lazy ;) I also don't want my program to be too "google-heavy".

What I am thinking of as a first heuristic is taking the lengths of the words. If two documents contain the same sequence of 20 words with the exact same lengths, this really seems suspicious. Of course, since I have said that I want to greatly reduce the number of suspicious parts, I can spend time on the 'critical' parts and run some other algorithm on them, so I can see whether the signature resemblance was only chance (i.e. thinking this mail has been copied-pasted from Hamlet).

In my school there is such a tool for comparing students' source code (we are not allowed to look at it of course, and maybe that's just a bluff ^^). That's easy to do with code since there is a strict grammar, preprocessor tools and so on : a basic attempt at concealing cheating, such as changing the variables' names, does not work. Of course two source files with the same AST would be VERY suspicious. I'm willing to try this approach during my GSoC if this is necessary, but I know natural language processing is hard (impossible ?)
even for researchers that are far more intelligent than I am ;) But hey, maybe I'll even be able to catch people who are merely paraphrasing Wikipedia !

As you see, my thinking is not complete and I have many points to study. If anyone is interested in that, even outside GSoC, please feel free to contact me ! I may not have lots of free time but hey, let's try !

= THANKS ! =

Thanks to anybody who has read this far :) Now you understand why I did not want to write it all down again, but don't worry, I made some copies elsewhere ;)

I'm looking forward to your answers and I'm beginning to enjoy Ruby-talk, but alas it's quite time-consuming, and my school is forcing me to do an awful J2EE project due very soon. I miss Rails so much :'(

Thanks again everyone !
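
To make the word-length heuristic from the MY PROJECT IDEA section a bit more concrete, here is a minimal Ruby sketch. It is only an illustration under assumptions, not a finished design: the method names (`length_signature`, `windows`, `suspicious_offsets`) are made up for this example, a "word" is assumed to be any run of letters (so "don't" counts as two words), and the window defaults to the 20 words mentioned above.

```ruby
require "set"

# Hypothetical helper: the "signature" of a text is simply the sequence
# of its word lengths. A word is assumed to be any run of letters.
def length_signature(text)
  text.scan(/[[:alpha:]]+/).map(&:length)
end

# Every run of `window` consecutive word lengths, stored in a Set so that
# looking one up later is a fast hash lookup instead of a full diff.
def windows(signature, window)
  signature.each_cons(window).to_set
end

# Word offsets in `suspect` where a run of `window` word lengths also
# occurs somewhere in `reference`. These are only candidates: a slower
# check would still have to confirm them.
def suspicious_offsets(suspect, reference, window: 20)
  known = windows(length_signature(reference), window)
  length_signature(suspect).each_cons(window).each_with_index
    .select { |run, _index| known.include?(run) }
    .map { |_run, index| index }
end

# Tiny example with a short window so the match is visible.
source = "the quick brown fox jumps over the lazy dog"
copy   = "well, the quick brown fox jumps over it"
p suspicious_offsets(copy, source, window: 4) # => [1, 2, 3]
```

The slow part (building the set of windows for each reference document) happens once per document, while checking a suspect text is one set lookup per word position. A short window like 4 will produce chance matches, which is why the flagged offsets would only be handed to a second, more careful comparison.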
on 2007-03-21 18:02
on 2007-03-22 06:04
Sylvain Abélard wrote:
> companies (50h/week) or are they considered as students during summer
> that will go to parties every night or even have weekends (25h/week) ?
> If more than 40h a week are a minimum then I guess I'll just have to
> forget about it.
>

My 2 cents here would say that I doubt if anyone cares if you work 7 hours a week or 70 hours a week. What impresses me (and I'll wager Google and others) is the results. If you can do a great project then we would all benefit and would appreciate your efforts. If the project sucks and fails miserably then it doesn't matter how much work you've done - nobody benefits, no fame, and no Google job offer for you.

There is always the personal benefit of what you learned but the GSoC projects I read about in Dr. Dobbs highlighted the contributions to the community. I don't remember any mention about the hours clocked.

-Jim