atifiu commented on PR #6252:
URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1764246045

   @huaxingao Based on your suggestion, I have narrowed the filter criteria so 
that, even accounting for the timezone problem, the filter spans no more than 
two partitions. The intent is that the filter can either be pushed down 
completely or, once the timestamp is adjusted to UTC, cover a complete 
partition. In either case I still see post-scan filters and no aggregate 
pushdown, even though the log contains the line quoted below. Please let me 
know what I am missing here.
   
   > "Evaluating completely on Iceberg side: IsNotNull(initial_page_view_dtm)"
   
   ```
   23/10/16 06:36:08 INFO SparkScanBuilder: Evaluating completely on Iceberg 
side: IsNotNull(initial_page_view_dtm)
   23/10/16 06:36:08 INFO V2ScanRelationPushDown:
   Pushing operators to spark_catalog.schema.table1
   Pushed Filters: IsNotNull(initial_page_view_dtm), 
GreaterThanOrEqual(initial_page_view_dtm,2023-06-02 06:00:00.0), 
LessThanOrEqual(initial_page_view_dtm,2023-06-02 08:59:59.0)
   Post-Scan Filters: (initial_page_view_dtm#3 >= 2023-06-02 
06:00:00),(initial_page_view_dtm#3 <= 2023-06-02 08:59:59)
   
   23/10/16 06:36:08 INFO V2ScanRelationPushDown:
   Output: pageviewdate#0, initial_page_view_dtm#3
   
   23/10/16 06:36:09 INFO SnapshotScan: Scanning table 
spark_catalog.schema.table1 snapshot 3251312493606204579 created at 
2023-10-05T08:25:16.490+00:00 with filter ((initial_page_view_dtm IS NOT NULL 
AND initial_page_view_dtm >= (16-digit-int)) AND initial_page_view_dtm <= 
(16-digit-int))
   23/10/16 06:36:09 INFO LoggingMetricsReporter: Received metrics report: 
ScanReport{tableName=spark_catalog.schema.table1, 
snapshotId=3251312493606204579, 
filter=((not_null(ref(name="initial_page_view_dtm")) and 
ref(name="initial_page_view_dtm") >= "(16-digit-int)") and 
ref(name="initial_page_view_dtm") <= "(16-digit-int)"), schemaId=0, 
projectedFieldIds=[1, 4], projectedFieldNames=[pageviewdate, 
initial_page_view_dtm], 
scanMetrics=ScanMetricsResult{totalPlanningDuration=TimerResult{timeUnit=NANOSECONDS, 
totalDuration=PT0.383991592S, count=1}, 
resultDataFiles=CounterResult{unit=COUNT, value=1}, 
resultDeleteFiles=CounterResult{unit=COUNT, value=0}, 
totalDataManifests=CounterResult{unit=COUNT, value=68}, 
totalDeleteManifests=CounterResult{unit=COUNT, value=0}, 
scannedDataManifests=CounterResult{unit=COUNT, value=1}, 
skippedDataManifests=CounterResult{unit=COUNT, value=67}, 
totalFileSizeInBytes=CounterResult{unit=BYTES, value=340185692}, 
totalDeleteFileSizeInBytes=CounterResult{unit=BYTES, value=0}, 
skippedDataFiles=CounterResult{unit=COUNT, value=30}, 
skippedDeleteFiles=CounterResult{unit=COUNT, value=0}, 
scannedDeleteManifests=CounterResult{unit=COUNT, value=0}, 
skippedDeleteManifests=CounterResult{unit=COUNT, value=0}, 
indexedDeleteFiles=CounterResult{unit=COUNT, value=0}, 
equalityDeleteFiles=CounterResult{unit=COUNT, value=0}, 
positionalDeleteFiles=CounterResult{unit=COUNT, value=0}}, 
metadata={engine-version=3.3.1, iceberg-version=Apache Iceberg 1.3.0 (commit 
7dbdfd33a667a721fbb21c7c7d06fec9daa30b88), 
app-id=application_1689900894764_104752, engine-name=spark}}
   ```
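
For anyone checking the timezone reasoning above, here is a minimal sketch of how the two bounds in the pushed filter could be mapped to epoch microseconds (the unit Iceberg uses for timestamp values) and to UTC day partitions. It is not taken from Iceberg or Spark code. The helper names, the `utc_offset_hours` parameter, and the assumption that the table is partitioned at day granularity on UTC days are all hypothetical; the actual session time zone and the 16-digit values are not shown in the log.

```python
from datetime import datetime, timedelta, timezone

# Reference point for epoch-based values (1970-01-01T00:00:00Z).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MICROS_PER_DAY = 86_400_000_000


def to_epoch_micros(local_ts: str, utc_offset_hours: float) -> int:
    """Interpret local_ts in a fixed-offset zone and return microseconds since epoch.

    utc_offset_hours is an assumed session offset, not a value from the log.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    dt = datetime.strptime(local_ts, "%Y-%m-%d %H:%M:%S").replace(tzinfo=tz)
    return (dt - EPOCH) // timedelta(microseconds=1)


def utc_day_ordinal(epoch_micros: int) -> int:
    """Days since epoch in UTC, i.e. the value a day-granularity transform would yield."""
    return epoch_micros // MICROS_PER_DAY


def day_partitions_touched(lower: str, upper: str, utc_offset_hours: float) -> list[str]:
    """List the UTC days covered by the inclusive range [lower, upper]."""
    first = utc_day_ordinal(to_epoch_micros(lower, utc_offset_hours))
    last = utc_day_ordinal(to_epoch_micros(upper, utc_offset_hours))
    return [
        (EPOCH + timedelta(days=d)).date().isoformat()
        for d in range(first, last + 1)
    ]


if __name__ == "__main__":
    # Try a few hypothetical session offsets against the bounds from the log.
    for offset in (0, -7, 8):
        days = day_partitions_touched(
            "2023-06-02 06:00:00", "2023-06-02 08:59:59", offset
        )
        print(f"UTC offset {offset:+}: {days}")
```

Under these assumptions a three-hour window can touch at most two UTC days regardless of the offset, which matches the "no more than two partitions" reasoning. It also shows that the range covers only part of a day, so the bounds do not align with a full partition.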


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

