RalfJL commented on issue #12949: URL: https://github.com/apache/pinot/issues/12949#issuecomment-2079194061
to reduce the complexity of the execution plan I removed the subselect in the select. This subselect is not necessary but was there only for debugging. The result stays the same. With the filter to a specific acctuniquesessionid the result is correct, without the filter, the result is incorrect this is the reduced query: ``` -- explain plan for select a.acctuniquesessionid, (a.eventtime - b.eventtime)/1000 as Zeit, a.acctstatustype, b.acctstatustype, a.eventtime, b.eventtime -- ,( select count(c.eventtime) from radius_json c where b.acctuniquesessionid = c.acctuniquesessionid -- and a.acctuniquesessionid = c.acctuniquesessionid -- and a.eventtime > 0 and b.eventtime > 0 -- and (c.acctstatustype = 'Stop' or c.acctstatustype = 'Start' ) -- and c.eventtime between b.eventtime+1 and a.eventtime-1 -- ) from radius_json a join radius_json b on b.acctuniquesessionid = a.acctuniquesessionid and a.acctstatustype = 'Stop' and b.acctstatustype = 'Start' -- and a._3gppimsi <> 'null' and a.eventtime > b.eventtime where not exists ( select 1 from radius_json c where b.acctuniquesessionid = c.acctuniquesessionid and a.acctuniquesessionid = c.acctuniquesessionid and a.eventtime > 0 and b.eventtime > 0 and (c.acctstatustype = 'Stop' or c.acctstatustype = 'Start' ) and c.eventtime between b.eventtime+1 and a.eventtime-1 ) -- and a.acctuniquesessionid = 'fac07ae7853810946d87e868e463af2c' -- and a.acctuniquesessionid = 'ac6baa744130522c1eb1eec161114d1b' -- and a.acctuniquesessionid = 'da0f86138ec36b2364889f048bf2ac82' order by Zeit desc ``` The query plan is: ``` Execution Plan LogicalSort(sort0=[$1], dir0=[DESC]) PinotLogicalSortExchange(distribution=[hash], collation=[[1 DESC]], isSortOnSender=[false], isSortOnReceiver=[true]) LogicalProject(acctuniquesessionid=[$14], Zeit=[/(-($17, $36), 1000)], acctstatustype=[$12], acctstatustype0=[$31], eventtime=[$17], eventtime0=[$36]) LogicalFilter(condition=[IS NULL($48)]) LogicalJoin(condition=[AND(=($14, $42), =($33, $43), =($38, $44), =($39, $45), =($40, $46), =($41, $47))], joinType=[left]) PinotLogicalExchange(distribution=[hash[14, 33, 38, 39, 40, 41]]) LogicalProject($docId=[$0], $hostName=[$1], $segmentName=[$2], USERNAME=[$3], _3gppimsi=[$4], _3gppimsimccmnc=[$5], _3gppsgsnmccmnc=[$6], acctinputoctets=[$7], acctinputpackets=[$8], acctoutputoctets=[$9], acctoutputpackets=[$10], acctsessiontime=[$11], acctstatustype=[$12], acctterminatecause=[$13], acctuniquesessionid=[$14], calledstationid=[$15], callingstationid=[$16], eventtime=[$17], eventtimestamp=[$18], $docId0=[$21], $hostName0=[$22], $segmentName0=[$23], USERNAME0=[$24], _3gppimsi0=[$25], _3gppimsimccmnc0=[$26], _3gppsgsnmccmnc0=[$27], acctinputoctets0=[$28], acctinputpackets0=[$29], acctoutputoctets0=[$30], acctoutputpackets0=[$31], acctsessiontime0=[$32], acctstatustype0=[$33], acctterminatecause0=[$34], acctuniquesessionid0=[$35], calledstationid0=[$36], callingstationid0=[$37], eventtime0=[$38], eventtimestamp0=[$39], $f40=[$19], $f41=[$40], $f42=[$41], $f43=[$20]) LogicalJoin(condition=[AND(=($35, $14), >($17, $38))], joinType=[inner]) PinotLogicalExchange(distribution=[hash[14]]) LogicalProject($docId=[$0], $hostName=[$1], $segmentName=[$2], USERNAME=[$3], _3gppimsi=[$4], _3gppimsimccmnc=[$5], _3gppsgsnmccmnc=[$6], acctinputoctets=[$7], acctinputpackets=[$8], acctoutputoctets=[$9], acctoutputpackets=[$10], acctsessiontime=[$11], acctstatustype=[$12], acctterminatecause=[$13], acctuniquesessionid=[$14], calledstationid=[$15], callingstationid=[$16], eventtime=[$17], eventtimestamp=[$18], $f40=[>($17, 0)], $f43=[-($17, 1)]) LogicalFilter(condition=[=($12, _UTF-8'Stop')]) LogicalTableScan(table=[[radius_json]]) PinotLogicalExchange(distribution=[hash[14]]) LogicalProject($docId=[$0], $hostName=[$1], $segmentName=[$2], USERNAME=[$3], _3gppimsi=[$4], _3gppimsimccmnc=[$5], _3gppsgsnmccmnc=[$6], acctinputoctets=[$7], acctinputpackets=[$8], acctoutputoctets=[$9], acctoutputpackets=[$10], acctsessiontime=[$11], acctstatustype=[$12], acctterminatecause=[$13], acctuniquesessionid=[$14], calledstationid=[$15], callingstationid=[$16], eventtime=[$17], eventtimestamp=[$18], $f41=[>($17, 0)], $f42=[+($17, 1)]) LogicalFilter(condition=[=($12, _UTF-8'Start')]) LogicalTableScan(table=[[radius_json]]) PinotLogicalExchange(distribution=[hash[0, 1, 2, 3, 4, 5]]) LogicalAggregate(group=[{0, 1, 2, 3, 4, 5}], agg#0=[MIN($6)]) PinotLogicalExchange(distribution=[hash[0, 1, 2, 3, 4, 5]]) LogicalAggregate(group=[{0, 1, 2, 3, 4, 5}], agg#0=[MIN($6)]) LogicalProject(acctuniquesessionid0=[$2], acctuniquesessionid00=[$3], $f40=[$4], $f41=[$5], $f42=[$6], $f43=[$7], $f0=[true]) LogicalJoin(condition=[AND(=($3, $0), =($2, $0), >=($1, $6), <=($1, $7))], joinType=[inner]) PinotLogicalExchange(distribution=[hash[0, 0]]) LogicalProject(acctuniquesessionid=[$14], eventtime=[$17]) LogicalFilter(condition=[OR(=($12, _UTF-8'Start'), =($12, _UTF-8'Stop'))]) LogicalTableScan(table=[[radius_json]]) PinotLogicalExchange(distribution=[hash[0, 1]]) LogicalProject(acctuniquesessionid=[$0], acctuniquesessionid0=[$3], $f40=[$1], $f41=[$4], $f42=[$5], $f43=[$2]) LogicalAggregate(group=[{0, 1, 2, 3, 4, 5}]) PinotLogicalExchange(distribution=[hash[0, 1, 2, 3, 4, 5]]) LogicalAggregate(group=[{0, 2, 3, 4, 6, 7}]) LogicalJoin(condition=[AND(=($4, $0), >($1, $5))], joinType=[inner]) PinotLogicalExchange(distribution=[hash[0]]) LogicalProject(acctuniquesessionid=[$14], eventtime=[$17], $f40=[>($17, 0)], $f43=[-($17, 1)]) LogicalFilter(condition=[AND(=($12, _UTF-8'Stop'), >($17, 0))]) LogicalTableScan(table=[[radius_json]]) PinotLogicalExchange(distribution=[hash[0]]) LogicalProject(acctuniquesessionid=[$14], eventtime=[$17], $f41=[>($17, 0)], $f42=[+($17, 1)]) LogicalFilter(condition=[AND(=($12, _UTF-8'Start'), >($17, 0))]) LogicalTableScan(table=[[radius_json]]) ``` The overall goal is to compute the Bytes sent and received by a device in a defined timeframe from freeradius accounting data. So, for e.g., we want to know how many bytes where transfered by device "XYZ" in the month march. We will have to do this for more than 1000000 devices. This is the easiest request out of 4 requests. The other 3 requests will be more complicated We are still in the POC phase to see if Pinot is the rigth tool for it -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org