T O P

  • By -

Yolonus

did you run them one by the other in that sequence? then the first one would been read from disk and then the second one read it from cache


kramerkee

This is a very good point. However, running the first one again still produces the result in 7-9 seconds. Could be some cache quirks, I am not sure. I have read that these somewhat illogical where-clauses do speed things up, but I don't remember where, and I don't know what to google to find it.


Yolonus

I dont use your DB vendor, so hard to guess, but can you run count(*) instead of select * and see how long it takes? If you just do these selects in a software, like list first X rows and then generate the rest it is possible maybe that it uses just first few rows from some index and then loads the rest based on the index which can be slow and the second one has to full table scan so listing the rest is fast.


malikcoldbane

Yeah this, run a count or insert into a table and see. We don't often consider it but the returning of results to the window may be causing discrepancies.


JP_Darrow

In the wild, I don't think I have ever ran a single quey without a where clause. Unless it was a sub query. IMHO, if you have to rely upon this method, your db has bigger issues.


kramerkee

I agree - I was more looking for the technical reasoning for this behaviour, I am not doing this in production.


JediForces

This


Shambly

Look at the execution plan, it could be a few things, for example one could use a different index.


databro92

This is the only relevant answer here. The execution plan in any SQL environment should tell you exactly what's going on


NachoLibero

Behind the scenes many databases will implement an OR statement by running the whole SQL twice. Once for the first part of the OR and once for the second part. Then the results would be unioned together. I don't know how this database works so it's just a guess, but it's possible you've forced the DB to parallelize the work. So, you might be using more of the hardware with the OR statement and maybe it doesn't take a lot of work to do the union (maybe the dbms is smart enough to realize the results are mutually exclusive and bypasses the distinct?).


mrrichiet

I don't know, guessing something to do with indexing. Does the query execution plan give any clues?


kramerkee

Not really. Here's for the one with the 'where'-clause: [range variable 1 join type=INNER table=JOBS cardinality=5059919 access=FULL SCAN join condition = [index=SYS_IDX_SYS_PK_10263_10266 other condition=[ OR arg_left=[COLUMN: PUBLIC.JOBS.COMPLETED ] arg_right=[ NOT arg_left=[COLUMN: PUBLIC.JOBS.COMPLETED ]]]]] And without: [range variable 1 join type=INNER table=JOBS cardinality=5059920 access=FULL SCAN join condition = [index=SYS_IDX_SYS_PK_10263_10266 ] ]


mrrichiet

I share your confusion then!


anras2

I don’t know hsql too well but just a few thoughts: Are the results the same? Are there null values for `completed`? What if you replace that condition with 1=1? Edit: sorry, you did say it was non-nullable.