[Slightly OT] Mashup, Legal & Technical Inquiry

michael · November 21, 2006, 2:45am

All,

I have a rails app that I want to build which is a mashup of a few
different data sources, each requiring their own user logons to the 3rd
party data. What I would like to find out is what others in a similar
situation have done and get advice as to the various legal and technical
issues that may be involved.

1. Is it okay to act as a proxy to the third party website on behalf of
the user requesting the data? This user is obviously a registered user
of their service, and instead of requesting the data from the end user’s
machine, they would request it from my server. I don’t want to do this,
but unless someone has a nice workaround to the cross-domain scripting
restrictions then I don’t know what else to do. AFAIK, JSON isn’t
supported on these third party sites. My primary concern here is that
the third party site will block access to my server’s IP address if it
sees too many hits in a day. They would get the hits anyway, just not
consolidated under my IP address, and I don’t want my service to get
shut down in an instant if/when they determine that I’m doing too many
requests in a day. Also, I want to make sure that I’m within legal
boundaries to do this. Again, I’m just acting as an agent on behalf of
their real customer.

2. Is it acceptable to STORE this data on my server so that I don’t
need to do constant queries to the third party system? I would only
give access to it to users who are registered users of the third party
system. I’m being both selfish and nice in my thinking here. First, I
may have a bunch of registered users that are requesting the same set of
data. Instead of querying it over and over when I already have a recent
copy of it, I would prefer to store it in my own database and
determine whether a new query to the third party site should be
performed or not. This will save me the overhead of scraping the
returned HTML pages and it will save unnecessary hits to the third party
site. Ok…mostly I’m being selfish. Explained further in “3” below.

3. This application will be a sort of “monitoring” application. The
end user will load up the site in their browser and then it will be set
to auto-refresh every “nn” seconds/minutes/whatever. I want to
HIGHLIGHT NEW items that become available. This will be easier to
figure out if I have them in my own database. Otherwise, does anyone
have any neat tricks on how to tell new items from old ones when they
come back from AJAX calls? The only thing I can think of is a
JavaScript array that holds the old items and is compared to the list
of new items. Session variables aren’t really an option here (bad
design, I think???). If an item is in the new array but not in the old
array then I can apply some special effects to it. If an item is in the
old array but not in the new array then I can remove it. Any other
thoughts? Instead of dealing with the JavaScript, I would rather query
my own database to pull back the items that have changed since the last
query.

4. Assuming all of the above is okay (which is a big assumption), what’s
the best tool in Ruby/Rails to use to act as a proxy? Currently, I have
experience with rubyful_soup and mechanize. However, I have used them
in single-use/single-call situations and have not had to use them with
potentially hundreds of simultaneous instances. Guidance?
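
To make the store-and-compare idea above a bit more concrete, here is a
rough Ruby sketch of what I have in mind. Everything in it is an
assumption for illustration only: the class name ItemCache, the
60-second maximum age, and the in-memory hash standing in for a real
database table. It does not touch any third party site; it only shows
the "is my copy recent enough?" check and the "which items are new since
the last refresh?" lookup.

```ruby
require 'set'

# Hypothetical in-memory stand-in for a database table of scraped items.
class ItemCache
  def initialize(max_age_seconds)
    @max_age    = max_age_seconds
    @items      = {}   # item id => time the id was first seen
    @fetched_at = nil  # time of the last third party fetch
  end

  # True when the stored copy is missing or older than the allowed age,
  # i.e. when a new query to the third party site is warranted.
  def stale?(now = Time.now)
    @fetched_at.nil? || (now - @fetched_at) > @max_age
  end

  # Record a freshly scraped list of item ids. Ids that are no longer
  # listed are dropped; ids already known keep their first-seen time.
  def store(ids, now = Time.now)
    current = ids.to_set
    @items.delete_if { |id, _| !current.include?(id) }
    ids.each { |id| @items[id] ||= now }
    @fetched_at = now
  end

  # Ids first seen after the given time: the ones to highlight.
  def new_since(time)
    @items.select { |_, seen| seen > time }.keys
  end
end

cache = ItemCache.new(60)
t0 = Time.at(0)
cache.store(%w[a b], t0)
cache.store(%w[b c], t0 + 30)
cache.new_since(t0)  # => ["c"]
```

Each browser refresh would then only need to send the time of its last
refresh, and many users asking for the same data would share one stored
copy until it goes stale.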

Any input is greatly appreciated. If anyone has a better approach to a
mashup design then I’m open to all suggestions. My primary goal is to
enhance the data, add helpful information to it and give the user a
better overall experience (definition of mashup, right?).
Unfortunately, I don’t know the best ways to work around browser
restrictions and I know this community has some of the best talent out
there - so I’m hopeful that I will get some good feedback, especially as
it relates to the various ruby/rails tools available.

Thanks,

Michael