Performance difference in accessing different columns in a Postgres Table

2018-07-28 Thread Dinesh Kumar
Hello All,

I created a table with 200 bigint columns and 200 varchar columns (Postgres
10.4):

create table i200c200 ( pk bigint primary key, int1 bigint, int2
bigint, ..., int200 bigint, char1 varchar(255), ..., char200
varchar(255) );

I inserted values only into the pk, int1, and int200 columns, with some random
data (from generate_series); the remaining columns are all null. The table has
1,000,000 rows.
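
(The exact INSERT was not included in the mail; the following is a minimal
sketch of how such a table could be populated. The random() expressions and
the row count are illustrative only.)

insert into i200c200 (pk, int1, int200)
select g,
       (random() * 100000)::bigint,   -- arbitrary random data for int1
       (random() * 100000)::bigint    -- arbitrary random data for int200
from generate_series(1, 1000000) as g;

analyze i200c200;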

I found a quite large performance difference between accessing the int1 and
int200 columns.

Reports from pg_stat_statements:

                   query                    | total_time | min_time | max_time | mean_time |    stddev_time
--------------------------------------------+------------+----------+----------+-----------+--------------------
 select pk,int1 from i200c200 limit 200     |       0.65 |    0.102 |    0.138 |      0.13 | 0.0140142784330839
 select pk,int199 from i200c200 limit $1    |      1.207 |     0.18 |    0.332 |    0.2414 | 0.0500583659341773
 select pk,int200 from i200c200 limit 200   |       1.67 |    0.215 |    0.434 |     0.334 | 0.0697825193010399
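
(A hedged sketch of the kind of query that could produce the report above; it
assumes the pg_stat_statements extension is installed and uses the PostgreSQL
10 column names.)

select query, total_time, min_time, max_time, mean_time, stddev_time
from pg_stat_statements
where query like 'select pk,int% from i200c200%'   -- filter pattern is illustrative
order by mean_time;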

Explain Analyse:

explain analyse select pk,int1 from i200c200 limit 1000;
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..23.33 rows=1000 width=16) (actual time=0.014..0.390 rows=1000 loops=1)
   ->  Seq Scan on i200c200  (cost=0.00..23334.00 rows=1000000 width=16) (actual time=0.013..0.268 rows=1000 loops=1)
 Planning time: 0.066 ms
 Execution time: 0.475 ms

explain analyse select pk,int200 from i200c200 limit 1000;
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..23.33 rows=1000 width=16) (actual time=0.012..1.001 rows=1000 loops=1)
   ->  Seq Scan on i200c200  (cost=0.00..23334.00 rows=1000000 width=16) (actual time=0.011..0.894 rows=1000 loops=1)
 Planning time: 0.049 ms
 Execution time: 1.067 ms

I am curious about this Postgres behaviour and its internals.

Note: I have tried the same query with the int199 column, which is null in
all rows, and the performance still varies. Since Postgres doesn't store null
values in the tuple data but records them in a null bitmap, I expected no such
variation (I have data only for pk, int1 and int200). I am wondering whether
the null bitmap lookup is what slows this down, because each row in my table
has a null bitmap of 408 bits. As a newbie, I am wondering whether the null
bitmap lookup for a later column is taking too much time (just to scan the
null bitmap itself). Am I thinking in the right way?

Thanks in advance,

Dineshkumar.P

Postgres Newbie.


Re: Performance difference in accessing different columns in a Postgres Table

2018-09-04 Thread Dinesh Kumar
Hi All,
I was wondering whether this case has been resolved or is still open. As a
Postgres newbie, I can't understand any of the terms (JIT, tuple
deformation) mentioned above. Could anyone please let me know what the
current status is?

Thanks,
Dineshkumar.

On Wed, Aug 1, 2018 at 8:51 PM Jeff Janes  wrote:

> On Mon, Jul 30, 2018 at 3:02 PM, Andres Freund  wrote:
>
>> Hi,
>>
>> On 2018-07-30 13:31:33 -0400, Jeff Janes wrote:
>> > I don't know where the time is going with the as-committed JIT.  None of
>> > the JIT-specific timings reported by EXPLAIN (ANALYZE) add up to
>> anything
>> > close to the slow-down I'm seeing.  Shouldn't compiling and optimization
>> > time show up there?
>>
>> As my timings showed, I don't see the slowdown you're reporting. Could
>> you post a few EXPLAIN ANALYZEs?
>>
>
>
> I don't think you showed any timings where jit_above_cost < query cost <
> jit_optimize_above_cost, which is where I saw the slowdown.  (That is also
> where things naturally land for me using default settings.)
>
> I've repeated my test case on a default build (./configure --with-llvm
> --prefix=) and default postgresql.conf, using the post-11BETA2 commit
> 5a71d3e.
>
>
> I've attached the full test case, and the full output.
>
> Here are the last two executions, with jit=on and jit=off, respectively.
> Doing it with TIMING OFF doesn't meaningfully change things, nor does
> increasing shared_buffers beyond the default.
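>
> (A hedged sketch of the toggling between the two runs below; the exact
> statements are in the attached test case, which isn't reproduced here, and
> the query text is an assumption:)
>
> set jit = on;
> explain (analyze) select pk, int200 from i200c200;
>
> set jit = off;
> explain (analyze) select pk, int200 from i200c200;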
>
>
>
>                                           QUERY PLAN
> ----------------------------------------------------------------------------------------------------
>  Seq Scan on i200c200  (cost=0.00..22.28 rows=828 width=16) (actual time=29.317..11966.291 rows=1000 loops=1)
>  Planning Time: 0.034 ms
>  JIT:
>Functions: 2
>Generation Time: 1.589 ms
>Inlining: false
>Inlining Time: 0.000 ms
>Optimization: false
>Optimization Time: 9.002 ms
>Emission Time: 19.948 ms
>  Execution Time: 12375.493 ms
> (11 rows)
>
> Time: 12376.281 ms (00:12.376)
> SET
> Time: 1.955 ms
>                                           QUERY PLAN
> ----------------------------------------------------------------------------------------------------
>  Seq Scan on i200c200  (cost=0.00..22.28 rows=828 width=16) (actual time=0.063..3897.302 rows=1000 loops=1)
>  Planning Time: 0.037 ms
>  Execution Time: 4292.400 ms
> (3 rows)
>
> Time: 4293.196 ms (00:04.293)
>
> Cheers,
>
> Jeff
>


Re: Performance difference in accessing different columns in a Postgres Table

2018-09-06 Thread Dinesh Kumar
Ok, will do that. Thanks a lot.

On Wed, Sep 5, 2018 at 9:37 PM Jeff Janes  wrote:

>
>
> On Wed, Sep 5, 2018 at 12:00 PM Jeff Janes  wrote:
>
>> On Wed, Sep 5, 2018 at 12:21 AM Dinesh Kumar  wrote:
>>
>>> Hi All,
>>> I was wondering whether this case has been resolved or is still open. As a
>>> Postgres newbie, I can't understand any of the terms (JIT, tuple
>>> deformation) mentioned above. Could anyone please let me know what the
>>> current status is?
>>>
>>>
>> JIT is just-in-time compilation, which will be new in v11.  Tuple
>> deforming is how you get the row from the on-disk format to the in-memory
>> format.
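>>
>> (A hedged aside: in the v11 betas this behaviour is governed by settings
>> such as jit, jit_above_cost and jit_optimize_above_cost, which you can
>> inspect with, e.g.:)
>>
>> show jit;
>> show jit_above_cost;
>> show jit_optimize_above_cost;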
>>
>> Some people see small improvements in tuple deforming using JIT in your
>> situation, some see large decreases, depending on settings and apparently
>> on hardware.  But regardless, JIT is not going to reduce your particular
>> use case (many nullable and actually null columns, referencing a
>> high-numbered column) down to being a constant-time operation in the number
>> of preceding columns.  Maybe JIT will reduce the penalty for accessing a
>> high-numbered column by 30%, but it won't reduce the penalty 30-fold.  Put
>> your NOT NULL columns first and then your most frequently accessed NULLable
>> columns right after them, if you can.
>>
>
> Correction: NOT NULL columns with fixed-width types first.  Then, of the
> columns which are either nullable or have variable-width types, put the most
> frequently accessed ones earlier.
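>
> (A hedged sketch of what that ordering could look like for a table of this
> shape; the table and column names below are illustrative, not from the
> thread:)
>
> create table example_reordered (
>     pk          bigint primary key,      -- NOT NULL, fixed width: first
>     created_at  timestamptz not null,    -- other NOT NULL fixed-width columns next
>     hot_int     bigint,                  -- frequently read nullable column
>     hot_label   varchar(255),            -- frequently read variable-width column
>     rare_note   text                     -- rarely read columns go last
> );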
>
>