Aaf batch_size limits indexing on mssql to 1000 records

hi!

after wondering why i can’t find alot of records i eventually found the
problem in the sqlserver_adapters implementation of “add_limit_offset!”.

the problem is when using MSSQL with the sqlserver_adapter paging will
only work when at least one column is defined in “:order”.

for example i was indexing a table with 2912 records, the generated sql
for the batches were these:

SELECT * FROM (SELECT TOP 1000 * FROM (SELECT TOP 1000 * FROM
table_name) AS tmp1 ) AS tmp2
SELECT * FROM (SELECT TOP 1000 * FROM (SELECT TOP 2000 * FROM
table_name) AS tmp1 ) AS tmp2
SELECT * FROM (SELECT TOP 912 * FROM (SELECT TOP 2912 * FROM table_name)
AS tmp1 ) AS tmp2

as you can imagine it was indexing the same top 1000 records 3 times :frowning:

i think a default ordering by the primary key would help to eliminate
that problem.

On Wed, Mar 14, 2007 at 04:45:04PM +0100, neongrau __ wrote:

i think a default ordering by the primary key would help to eliminate
that problem.

I just committed this, so could you please try if current trunk fixes
this for you?

Jens


Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
[email protected] | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

Jens K. wrote:

I just committed this, so could you please try if current trunk fixes
this for you?

hi!

thanks for your fast response.
yes it looks like everything gets indexed now.

but now somehow find_id_by_contents doesnt work as before, has it
changed?

i was using this to get all id’s as an array:
.find_id_by_contents(search, {:limit => :all}).collect {|x| x[:id]}

On Wed, Mar 14, 2007 at 05:43:35PM +0100, neongrau __ wrote:

changed?

i was using this to get all id’s as an array:
.find_id_by_contents(search, {:limit => :all}).collect {|x| x[:id]}

if used without a block, find_id_by_contents returns a 2-element-array
where the first element is the number of total hits and the last element
is the results array, so

.find_id_by_contents(search, {:limit => :all}).last.collect {|x| x[:id]}

should work.

I’m not sure when this behaviour has been introduced, but I guess it’s
been a while ago…

Jens


Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
[email protected] | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

Jens K. wrote:

if used without a block, find_id_by_contents returns a 2-element-array
where the first element is the number of total hits and the last element
is the results array, so

ok, found that while checking on the console.

but whats the difference to just ask for .length on the resulting array
on the old behavior?

On Wed, Mar 14, 2007 at 06:11:17PM +0100, neongrau __ wrote:

Jens K. wrote:

if used without a block, find_id_by_contents returns a 2-element-array
where the first element is the number of total hits and the last element
is the results array, so

ok, found that while checking on the console.

but whats the difference to just ask for .length on the resulting array
on the old behavior?

if you use :limit to only fetch 20 or so results, total_hits will still
tell you the total number of results, while results.length will be 20.

Jens


Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
[email protected] | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa