An algorithm for finding the number

Hi members,

Maybe this is not a question specific to Ruby, but since my Ruby app
has met this algorithm problem, please give suggestions if you’d like.

I have this data in a MySQL table:

mysql> select startNum,endNum from ip_data limit 10;
+----------+----------+
| startNum | endNum   |
+----------+----------+
| 16777216 | 16777471 |
| 16843008 | 16843263 |
| 16909056 | 16909311 |
| 17367040 | 17498111 |
| 17498112 | 17563647 |
| 17563648 | 17825791 |
| 17825792 | 18153471 |
| 18153472 | 18219007 |
| 18219008 | 18350079 |
| 18350080 | 18874367 |
+----------+----------+

(Those are actually the bigint representations of IP addresses.)

Given a number, say 17498200, I want to find which row of the table it
falls in.

I could run the SQL with Ruby’s MySQL API (DBI and DBD::Mysql):

select * from ip_data where startNum <= #{number} and endNum >=
#{number}
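
That is, roughly like this (the connection details here are just
placeholders):

    require 'dbi'

    # Placeholder connection details -- adjust database, host, user, password.
    dbh = DBI.connect('DBI:Mysql:mydb:localhost', 'user', 'password')

    number = 17498200
    row = dbh.select_one(
      "select * from ip_data where startNum <= #{number} and endNum >= #{number}")
    p row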

But I found that it is quite slow.

The table is not so large; it has only 339542 rows in total.

So I was thinking of loading the whole data set into memory, discarding
MySQL, and finding an algorithm for doing the lookup quickly.

Any suggestion please? Thanks.

Regards,
zuerrong

The common sense thing to do is exactly what you have done. Make sure
you have indexes on the fields. If it’s still slow, try a different
database. Five minutes and you should be able to move the data into
Access (Jet), like this:

        require 'win32ole'

        # Open the Access (Jet) database via ADO.
        connection = WIN32OLE.new('ADODB.Connection')
        conn_str = 'Provider=Microsoft.Jet.OLEDB.4.0;' +
                   'Data Source=c:\Ruby\RubyAccess.mdb'
        connection.Open(conn_str)

        recordset = WIN32OLE.new('ADODB.Recordset')
        sql = 'SELECT * FROM T1'
        recordset.Open(sql, connection)

        # GetRows returns the data column-major; transpose gives one array per row.
        data = recordset.GetRows.transpose
        puts data

        recordset.Close
        connection.Close

2010/12/1 Mike S. [email protected]:

The common sense thing to do is exactly what you have done. Make sure
you have indexes on the fields. If it’s still slow, try a different
database.

Yes, I have added keys to the columns in MySQL, but it’s still slow.
So I was thinking of moving the whole data set into memory and finding a
good algorithm for it.

Thanks.

I don’t know how SQLite works but Access Jet would only fetch across the
network the index pages and the target data page(s). Perhaps SQLite is
bringing the whole database across. A normal server-based database would
obviously solve your problem.

On Wed, Dec 1, 2010 at 3:50 PM, zuerrong [email protected] wrote:

So I was thinking of loading the whole data set into memory, discarding
MySQL, and finding an algorithm for doing the lookup quickly.
Any suggestion

try doing a binary search on the first column (startNum): find the last
range whose startNum <= your number, then check its endNum.
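
a rough sketch of that idea (untested; the DBI connection values are
placeholders, and it assumes the ranges are sorted and non-overlapping):

    require 'dbi'

    # Placeholder connection details -- adjust database, host, user, password.
    dbh = DBI.connect('DBI:Mysql:mydb:localhost', 'user', 'password')

    # Load all ranges once, ordered by startNum, as integer pairs.
    rows   = dbh.select_all('SELECT startNum, endNum FROM ip_data ORDER BY startNum')
    ranges = rows.map { |r| [r[0].to_i, r[1].to_i] }

    # Binary search: return the [startNum, endNum] pair containing number, or nil.
    def find_range(ranges, number)
      lo, hi = 0, ranges.size - 1
      while lo <= hi
        mid = (lo + hi) / 2
        start_num, end_num = ranges[mid]
        if number < start_num
          hi = mid - 1         # number lies below this range
        elsif number > end_num
          lo = mid + 1         # number lies above this range
        else
          return ranges[mid]   # number falls inside this range
        end
      end
      nil
    end

    p find_range(ranges, 17498200)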

best regards -botp

Anyone know how I unsubscribe from this?

On Wed, Dec 1, 2010 at 1:39 AM, Mike S. [email protected]
wrote:

I don’t know how SQLite works but Access Jet would only fetch across the
network the index pages and the target data page(s). Perhaps SQLite is
bringing the whole database across. A normal server-based database would
obviously solve your problem.

They’ve said twice they’re using mysql, not sqlite. And remember that
most folks aren’t on Windows and can’t use Access.

Quite aside from that, sqlite doesn’t do any network access at all, it
is always filesystem access only.

Ben

zuerrong wrote in post #965316:

+----------+----------+
| startNum | endNum   |
+----------+----------+
| 16777216 | 16777471 |
| 16843008 | 16843263 |
| 16909056 | 16909311 |
| 17367040 | 17498111 |
| 17498112 | 17563647 |
| 17563648 | 17825791 |
| 17825792 | 18153471 |
| 18153472 | 18219007 |
| 18219008 | 18350079 |
| 18350080 | 18874367 |
+----------+----------+

select * from ip_data where startNum <= #{number} and endNum >=
#{number}

try with

select * from ip_data where startNum = (select max(startnum) from
ip_data where startnum <= #{number})
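
A rough sketch of running that from Ruby via DBI (the connection values
are placeholders; the extra endNum check is my addition, to guard against
a number that falls in a gap between two ranges):

    require 'dbi'

    # Placeholder connection details.
    dbh = DBI.connect('DBI:Mysql:mydb:localhost', 'user', 'password')

    number = 17498200
    sql = 'SELECT * FROM ip_data ' +
          'WHERE startNum = (SELECT MAX(startNum) FROM ip_data WHERE startNum <= ?) ' +
          'AND endNum >= ?'
    row = dbh.select_one(sql, number, number)
    p row   # nil if no range contains the number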

Ben B. wrote in post #965416:

They’ve said twice they’re using mysql, not sqlite. And remember that
most folks aren’t on Windows and can’t use Access.

Well spotted. However I don’t know why you think Windows is not widely
used. It is in the corporate world.

Quite aside from that, sqlite doesn’t do any network access at all, it
is always filesystem access only.

In the corporate world you rarely - if ever - store data except across a
network.

On Wed, Dec 1, 2010 at 8:05 AM, Mike S. [email protected]
wrote:

Well spotted. However I don’t know why you think Windows is not widely
used. It is in the corporate world.

Call it long experience in the Ruby community. At any rate, giving a
platform-specific solution, whether it’s windows or not, is a big
assumption.

In the corporate world you rarely - if ever - store data except across a
network.

Yes, of course. But since sqlite has no concept of the network, it all
appears to be local disk access as far as the database is concerned.
If you happen to be accessing it over a network filesystem, sqlite has
no idea. My point was that sqlite will never “bring the whole database
across” since it has no idea what that even means.

Ben

Ben B. wrote in post #965426:

My point was that sqlite will never “bring the whole database
across” since it has no idea what that even means.

Whilst this is not a Ruby issue, it is worth pointing out, for those who
don’t already know, that client databases like Jet (and, I would wager,
SQLite) do not read the whole database file if there are indexes they can
use.

Your program will read the first bit of the file and detect if there are
applicable indexes. It will then fetch the index pages and look up the
addresses (typically blocks that hold data pages) and fetch those from
the file system.

If you are looking at mickey mouse databases or stuff off a local drive,
this might not be of any consequence, but if you start to get hundreds of
thousands of records, or millions, it will dramatically cut down your
fetches across the network and the work your program has to do to set up
the data.

If you have joins, the program might be able to do them purely off the
indexes, so that next to no real data has to be fetched.

The original poster was talking about MySQL as you rightly said. The
first issue is therefore to check why the Select was running slowly -
outside of Ruby. There are many reasons why that might be.

Databases are sophisticated components that have millions of man hours
lovingly invested in them. It would be entirely illogical to attempt to
reinvent the wheel in your Ruby program.