Question on the best approach for gathering tweets based on time

Hello,

I want to know the best way to gather tweets from a specific
date until ‘time.now’.

I have a sqlite3 database into which I dumped a user's entire tweet
history. My db fields are tweet.created_at, tweet.text and tweet.id,
plus an integer as key.

I use tweet.id to perform a match test before accepting new tweets into
the database.

However, the script currently tries to dump all possible
tweets from twitter’s API every time, does the match and adds the ones
that are missing (which are the new ones, of course). This procedure, as
you can imagine, causes big delays.

The created_at date string is like this: “Tue Jul 06 10:08:23 +0000
2010”

Time matters, I can’t deal only with dates.

I have a couple of solutions in mind, but I’d like to know from more
experienced users which way to approach this:

  1. Convert the ‘created_at’ string to YYYY-MM-DD date? This could be
    tricky because there’s also the exact time of the tweet to consider.
    (didn’t try it out yet)

  2. Using sqlite3’s “id integer primary key” which uses the biggest
    number for the latest entry and extract date from there?

  3. Any smarter way?
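
Regarding option 1, the time of day need not be lost. Here is a minimal sketch, assuming Ruby's standard `time` library: it parses a created_at string into a full timestamp and renders it as ISO 8601 text, which sorts chronologically and so can be compared as a string in sqlite3. The constant and method names are only illustrative.

```ruby
require 'time'

# Layout of the created_at field, e.g. "Tue Jul 06 10:08:23 +0000 2010"
TWITTER_TIME_FORMAT = '%a %b %d %H:%M:%S %z %Y'

# Parse the string into a Time (date plus time of day), normalised to UTC.
# parse_created_at is a hypothetical helper name, not part of any library.
def parse_created_at(str)
  Time.strptime(str, TWITTER_TIME_FORMAT).utc
end

created = parse_created_at('Tue Jul 06 10:08:23 +0000 2010')
iso = created.iso8601        # ISO 8601 text keeps chronological order when sorted
puts iso
puts created < Time.now      # Time objects compare directly against Time.now
```

Storing the ISO 8601 form next to (or instead of) the raw string would let a plain `WHERE created_at >= ?` comparison work in SQL.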

Thanks

On Mon, May 2, 2011 at 3:22 PM, Panagiotis A. wrote:

> I have a couple of solutions in mind, but I’d like to know from more
> experienced users which way to approach this:
>
>   1. Convert the ‘created_at’ string to YYYY-MM-DD date? This could be
>     tricky because there’s also the exact time of the tweet to consider.
>     (didn’t try it out yet)
>
>   2. Using sqlite3’s “id integer primary key” which uses the biggest
>     number for the latest entry and extract date from there?
>
>   3. Any smarter way?

It seems this is rather a question for the Twitter API. If that API
provides an id for every tweet and a mechanism to
query “all tweets after <id>”, then the solution is obvious: store the
Twitter id in your table and do the fetch accordingly.
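
A rough sketch of that idea, under stated assumptions: the table and column names in the SQL string are hypothetical, the endpoint URL is assumed from the API of that period, and `since_id` is assumed to be the name of the "tweets after this id" parameter. The sketch only builds the request URI; running the SQL and doing the HTTP fetch are left out.

```ruby
require 'uri'

# Hypothetical schema: a "tweets" table with the Twitter id in "tweet_id".
# Running this query once gives the newest tweet already stored locally.
MAX_ID_SQL = 'SELECT MAX(tweet_id) FROM tweets'

# Assumed endpoint; check the API documentation before relying on it.
TIMELINE_URL = 'http://api.twitter.com/1/statuses/user_timeline.xml'

# Build the request URI. When since_id is given, only newer tweets are
# requested, so nothing already stored has to be fetched and matched again.
def timeline_uri(screen_name, since_id = nil)
  params = { 'screen_name' => screen_name }
  params['since_id'] = since_id.to_s if since_id
  uri = URI(TIMELINE_URL)
  uri.query = URI.encode_www_form(params)
  uri
end

puts timeline_uri('example_user', 18_012_345_678)
```

On the very first run the table is empty, MAX returns NULL (nil in Ruby), and the sketch falls back to requesting the timeline without a lower bound.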

http://apiwiki.twitter.com/w/page/22554679/Twitter-API-Documentation

Turns out that id does exist (see XML format):
http://dev.twitter.com/doc/get/statuses/public_timeline

And there is also the “since” query type:
http://dev.twitter.com/doc/get/statuses/user_timeline

Happy coding!

Kind regards

robert