[Slightly OT] Mashup, Legal & Technical Inquiry

All,

I want to build a Rails app that is a mashup of a few different data
sources, each of which requires its own user logon to the 3rd-party
data. I’d like to find out what others in a similar situation have
done and get advice on the various legal and technical issues that may
be involved.

  1. Is it okay to act as a proxy to the third-party website on behalf of
    the user requesting the data? This user is obviously a registered user
    of their service; instead of requesting the data from the end user’s
    machine, they would request it from my server. I don’t want to do this,
    but unless someone has a nice workaround for the cross-domain scripting
    restrictions, I don’t know what else to do. AFAIK, JSON isn’t
    supported on these third-party sites. My primary concern here is that
    the third-party site will block my server’s IP address if it sees too
    many hits from it in a day. They would get the hits anyway, just not
    consolidated under my IP address, and I don’t want my service to be
    shut down in an instant if/when they decide that I’m making too many
    requests in a day. I also want to make sure that I’m within legal
    boundaries. Again, I’m just acting as an agent on behalf of their real
    customer.

  2. Is it acceptable to STORE this data on my server so that I don’t
    need to run constant queries against the third-party system? I would
    only give access to it to users who are also registered users of the
    third-party system. I’m being both selfish and nice in my thinking here.
    First, I may have a bunch of registered users requesting the same set
    of data. Rather than querying it over and over when I already have a
    recent copy, I would prefer to store it in my own database and decide
    whether a new query to the third-party site is needed. This will save
    me the overhead of scraping the returned HTML pages, and it will save
    unnecessary hits to the third-party site. OK… mostly I’m being
    selfish. Explained further in “3” below.

  3. This application will be a sort of “monitoring” application. The
    end user will load the site in their browser, and it will be set to
    auto-refresh every “nn” seconds/minutes/whatever. I want to
    HIGHLIGHT NEW items as they become available. This will be easier to
    figure out if I have them in my own database. Otherwise, does anyone
    have any neat tricks for telling new items from old ones returned by
    AJAX calls? The only thing I can think of is a JavaScript array
    that holds the old items and is compared to the list of new items.
    Session variables aren’t really an option here (bad design, I think?).
    If an item isn’t in the old array, I can apply some special effects to
    it. If an item is in the old array but not in the new one, I can
    remove it. Any other thoughts? Instead of dealing with the JavaScript,
    I would rather query my own database to pull back items that have
    changed since the last query.

  4. Assuming all of the above is okay (which is a big assumption), what’s
    the best Ruby/Rails tool for acting as a proxy? Currently, I have
    experience with rubyful_soup and mechanize. However, I have only used
    them in single-use/single-call situations and have not had to run
    potentially hundreds of simultaneous instances. Guidance?
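
For item 2, the “do I already have a recent copy?” decision can be sketched as a small cache keyed by query. This is a minimal, hypothetical illustration: FreshnessCache, the 60-second max age and the "listings" key are made up, and a Hash stands in for what would be a database table (data plus a fetched_at timestamp) in the real app. In Rails, Rails.cache.fetch(key, expires_in: ...) { ... } gives similar behavior out of the box.

```ruby
# Hypothetical in-memory stand-in for a cache table: one entry per query
# key, holding the scraped data and the time it was fetched.
class FreshnessCache
  Entry = Struct.new(:data, :fetched_at)

  # max_age: seconds a stored copy still counts as "recent".
  # clock:   injectable so the staleness rule is easy to test.
  def initialize(max_age, clock: -> { Time.now })
    @max_age = max_age
    @clock = clock
    @entries = {}
  end

  # Returns the stored copy if it is recent enough; otherwise runs the
  # block (where the real query/scrape of the third-party site would go)
  # and stores the result. Not thread-safe as written.
  def fetch(key)
    now = @clock.call
    entry = @entries[key]
    if entry.nil? || now - entry.fetched_at > @max_age
      entry = Entry.new(yield, now)
      @entries[key] = entry
    end
    entry.data
  end
end

# Usage with a fake clock: only two of the three calls hit the "site".
now = Time.at(0)
cache = FreshnessCache.new(60, clock: -> { now })
hits = 0
cache.fetch("listings") { hits += 1; %w[a b] }    # nothing stored: scrapes
cache.fetch("listings") { hits += 1; %w[a b] }    # 0s old: served from cache
now = Time.at(120)
cache.fetch("listings") { hits += 1; %w[a b c] }  # 120s old: scrapes again
puts hits # prints 2
```

Because every registered user asking for the same data goes through the same key, the third-party site sees at most one request per key per max-age window, however many users are watching.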
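
The old-versus-new comparison in item 3 amounts to two set differences. The sketch below assumes each scraped item has a stable identifier (the numeric IDs are hypothetical); the same logic ports directly to the JavaScript-array approach, or to a server-side check against items stored in the database.

```ruby
require "set"

# Compares the item IDs seen on the previous poll with those on the
# current poll. Items only in the current list are new (highlight them);
# items only in the previous list have disappeared (remove them).
# `added` keeps the order of the current list.
def diff_items(previous_ids, current_ids)
  prev_set = previous_ids.to_set
  curr_set = current_ids.to_set
  {
    added:   current_ids.reject { |id| prev_set.include?(id) },
    removed: previous_ids.reject { |id| curr_set.include?(id) }
  }
end

p diff_items([101, 102, 103], [102, 103, 104])
```

Using sets keeps each membership check cheap, which matters once a refresh returns hundreds of items.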
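
For item 4, one option (a swapped-in technique, not anything specific to rubyful_soup or mechanize) is to push many user requests through a small, fixed pool of worker threads built from Ruby’s Thread and Queue, rather than running hundreds of simultaneous instances. The worker count also caps how many requests hit the third-party site at once, which bears on the IP-blocking concern in item 1. In this sketch fetch_all is a hypothetical helper and the block stands in for the real scrape. Giving each worker its own mechanize agent rather than sharing one is an assumption on my part.

```ruby
# Runs the given block for each job using a fixed number of worker
# threads and returns a Hash of job => result. A job whose block raises
# gets the exception object as its result, so the failure is recorded
# instead of killing the worker.
def fetch_all(jobs, workers: 4, &fetcher)
  queue = Queue.new
  jobs.each { |job| queue << job }
  results = {}
  lock = Mutex.new

  threads = Array.new(workers) do
    Thread.new do
      loop do
        job = begin
          queue.pop(true) # non-blocking pop; raises ThreadError when empty
        rescue ThreadError
          break           # queue drained: this worker is done
        end
        value = begin
          fetcher.call(job)
        rescue StandardError => e
          e
        end
        lock.synchronize { results[job] = value }
      end
    end
  end
  threads.each(&:join)
  results
end

# Usage: the block would wrap the real scraping call in the app.
pages = %w[page1 page2 page3]
p fetch_all(pages, workers: 2) { |page| "scraped #{page}" }
```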

Any input is greatly appreciated. If anyone has a better approach to
mashup design, I’m open to all suggestions. My primary goal is to
enhance the data, add helpful information to it and give the user a
better overall experience (the definition of a mashup, right?).
Unfortunately, I don’t know the best ways to work around browser
restrictions, and I know this community has some of the best talent out
there, so I’m hopeful that I will get some good feedback, especially as
it relates to the various Ruby/Rails tools available.

Thanks,

Michael