[
https://issues.apache.org/jira/browse/IMPALA-11810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Smith updated IMPALA-11810:
-----------------------------------
Component/s: Frontend
(was: fe)
> The mem estimate is incorrect in HdfsScanNode
> ----------------------------------------------
>
> Key: IMPALA-11810
> URL: https://issues.apache.org/jira/browse/IMPALA-11810
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Affects Versions: Impala 4.2.0, Impala 4.1.1
> Reporter: jhkcool
> Priority: Major
> Attachments: after_used_materialized.png, current_totalBytes.png
>
>
> This concerns how perInstanceMemEstimate is calculated in the following
> method of the HdfsScanNode class:
> {code:java}
> @Override
> public void computeNodeResourceProfile(TQueryOptions queryOptions) {
> ...
> long avgScanRangeBytes = (long) Math.ceil(sumValues(totalBytesPerFs_) /
> (double) scanRangeSize);
> ...
> }{code}
> The HDFS scan memory estimate is computed from the total size of all of the
> table's data files, which is unreasonable. Not all of the table's data is
> read into memory; only the materialized fields are read. The estimate should
> therefore use only the data file size occupied by the materialized fields.
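
To make the proposal concrete, here is a minimal sketch rather than Impala's actual code. It assumes that scanRangeSize in the snippet above is the number of scan ranges, and that the share of file bytes occupied by the materialized fields is known (realistic mainly for columnar formats such as Parquet). The class ScanRangeEstimateSketch, both methods, and the materializedFraction parameter are hypothetical names introduced only for illustration.

```java
public class ScanRangeEstimateSketch {

    // Mirrors the quoted calculation: average bytes per scan range,
    // computed over the full size of all data files.
    static long avgScanRangeBytes(long totalBytes, long numScanRanges) {
        if (numScanRanges <= 0) {
            throw new IllegalArgumentException("numScanRanges must be positive");
        }
        return (long) Math.ceil(totalBytes / (double) numScanRanges);
    }

    // Hypothetical variant: count only the share of the file bytes that
    // belongs to materialized fields. materializedFraction is an assumed
    // input in [0, 1]; how it would be derived is not covered here.
    static long avgMaterializedScanRangeBytes(
            long totalBytes, long numScanRanges, double materializedFraction) {
        if (numScanRanges <= 0) {
            throw new IllegalArgumentException("numScanRanges must be positive");
        }
        if (materializedFraction < 0.0 || materializedFraction > 1.0) {
            throw new IllegalArgumentException("materializedFraction must be in [0, 1]");
        }
        return (long) Math.ceil(totalBytes * materializedFraction / numScanRanges);
    }

    public static void main(String[] args) {
        long totalBytes = 10L * 1024 * 1024 * 1024; // 10 GiB of data files
        long numScanRanges = 80;

        // Estimate based on all file bytes.
        System.out.println(avgScanRangeBytes(totalBytes, numScanRanges));
        // Estimate when materialized fields occupy a quarter of the bytes.
        System.out.println(
                avgMaterializedScanRangeBytes(totalBytes, numScanRanges, 0.25));
    }
}
```

With these illustrative numbers, the average scan range shrinks from 128 MiB to 32 MiB, so any memory estimate derived from it would shrink accordingly.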