Re: Selection not "pushed down into" CTE

2024-01-07 Thread Tom Lane
Clemens Eisserer  writes:
> running postgresql 15.5 I was recently surpised postgresql didn't
> perform an optimization which I thought would be easy to apply.

It is not.

> running the following query results in a full sort (caused by lead
> over order by) as the ts > '2024-01-04' selection doesn't seem to be
> applied to the CTE but only later:

> with cte as (select ts, lead(ts, 1) over (order by ts) as ts2 from smartmeter)
> select ts, ts2 from cte where ts > '2024-01-04' and  extract(epoch
> from ts2) - extract(epoch from ts) > 9;

The ts restriction is not pushed down because of the rules in
allpaths.c:

 * 4. If the subquery has any window functions, we must not push down quals
 * that reference any output columns that are not listed in all the subquery's
 * window PARTITION BY clauses.  We can push down quals that use only
 * partitioning columns because they should succeed or fail identically for
 * every row of any one window partition, and totally excluding some
 * partitions will not change a window function's results for remaining
 * partitions.  (Again, this also requires nonvolatile quals, but
 * subquery_is_pushdown_safe handles that.)

To conclude that it'd be safe with this particular window function
requires deep knowledge of that function's semantics, which the
planner has not got.

regards, tom lane




Re: Slow GroupAggregate and Sort

2024-01-07 Thread Darwin Correa
Hello, my answers in blue again 







 El Wed, 03 Jan 2024 21:43:15 -0500, Jeff Janes  
escribió 



On Mon, Jan 1, 2024 at 9:57 AM Darwin Correa  wrote:

Hello, Happy New Year! I add my responses in blue.







 El Thu, 28 Dec 2023 13:06:18 -0500, Jeff Janes 
 escribió 




I thought the point of sharding was to bring more CPU and RAM to bear than can 
feasibly be obtained in one machine.  Doesn't that make 24 shards per machine 
completely nuts?








Based o citus docs the recommended shards is 2x cpu cores in my case I've 
tested with few shards and 1:1, 2:1 shards but always have slow query time in 
the last step (sorting and grouping) in máster node.








That might make sense if PostgreSQL didn't do parallelization itself.  But 
according to your plan, PostgreSQL itself tries to parallelize 4 ways (although 
fails, as it can't find any available workers) and then you have 24 nodes all 
doing the same thing, all with only 12 CPU.  That doesn't seem good. although 
it now does seem unrelated to the issue at hand.








But the coordinator (who make sort and aggr) are in separate server (each node 
si phisically other server) I no understand why if cooridnator not aree too 
busy, and I've already test with less shards, and time increment.







I'd break this down into more manageable chunks for investigation.  Populate 
one scratch table (on one node, not a hypertable) with all 2.6 million rows.  
See how long it takes to populate it based on the citus query, and separately 
see how long it takes to run the aggregate query on the populated scratch table.









After scratch table filled sort took 32s, explain 
(https://explain.dalibo.com/plan/8a3h26hcc6328c11)







So that plan shows the sort to be egregiously slow, and with no involvement of 
citus and no apparent reason for slowness.  I'm thinking you have a 
pathological collation being used.  What is your default collation?  (Your DDL 
shows that no non-default collations are in use, but doesn't indicate what the 
default is)








The collation is en_US.UFT-8, can you give more detail or which refer to 
"pathological collation" please to research about that? and the data store in 
this tables and this column specific are only alphanumeric charactres a-z,A-Z 
and numbers, nothing special



Cheers,



Jeff










 


Darwin

 






Correa P.   



//     software architect




























 



 



 


Veintimilla y Leonidas Plaza   



065925       



mailto:[email protected]  /  mailto:[email protected]   



  //      DESARROLLO E INNOVACIÓN TECNOLÓGICA