[
https://issues.apache.org/jira/browse/HADOOP-19641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18012186#comment-18012186
]
Steve Loughran commented on HADOOP-19641:
-----------------------------------------
Are you using the openFile seek policy as suggested? Parquet will tell you when
it's a Parquet file, and its read pattern is a common one: 8-byte footer, real
footer, row groups.
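
To make the read pattern above concrete, here is a minimal Java sketch of the two tail reads a reader performs when opening an unencrypted Parquet file. It relies on the Parquet file layout, in which the last 8 bytes are a 4-byte little-endian footer length followed by the magic "PAR1". The class and method names (ParquetTailPlan, tailOffset, footerLength, footerOffset) are hypothetical and are not part of Hadoop or Parquet.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: plans the tail reads issued when a Parquet file is opened. */
final class ParquetTailPlan {
    /** 4-byte little-endian footer length followed by the 4-byte "PAR1" magic. */
    static final int TAIL_LENGTH = 8;

    /** Offset of the first read: the last 8 bytes of the file. */
    static long tailOffset(long fileLength) {
        if (fileLength < TAIL_LENGTH) {
            throw new IllegalArgumentException("file too short: " + fileLength);
        }
        return fileLength - TAIL_LENGTH;
    }

    /** Decodes the footer length from the 8-byte tail after checking the magic. */
    static int footerLength(byte[] tail) {
        if (tail.length != TAIL_LENGTH) {
            throw new IllegalArgumentException("expected an 8-byte tail");
        }
        String magic = new String(tail, 4, 4, StandardCharsets.US_ASCII);
        if (!"PAR1".equals(magic)) {
            throw new IllegalArgumentException("unexpected magic: " + magic);
        }
        return ByteBuffer.wrap(tail, 0, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    /** Offset of the second read: the real footer, located just before the tail. */
    static long footerOffset(long fileLength, int footerLength) {
        return fileLength - TAIL_LENGTH - footerLength;
    }
}
```

Both of these reads sit at the very end of the file, and only afterwards does the reader seek back to the row groups, which is why the very first read on such a stream is a poor predictor for sequential read-ahead.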
> ABFS: [ReadAheadV2] First Read should bypass ReadBufferManager
> --------------------------------------------------------------
>
> Key: HADOOP-19641
> URL: https://issues.apache.org/jira/browse/HADOOP-19641
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure
> Affects Versions: 3.4.1
> Reporter: Anuj Modi
> Assignee: Anuj Modi
> Priority: Major
> Labels: Performance
>
> Across multiple workload runs, we have observed that when we start reading
> data from an input stream, the first read that reaches the stream has to be
> performed synchronously even if we trigger a prefetch request for that
> particular offset. Most of the time we end up doing the extra work of checking
> whether the prefetch was triggered, removing the prefetch from the pending
> queue, and then doing a direct remote read in the workload thread itself.
> To avoid all this overhead, we will always bypass read-ahead for the very
> first read of each input stream and trigger read-aheads from the second read
> onwards.
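
The proposed behaviour can be sketched as a small self-contained Java model. This is only an illustration under stated assumptions: FirstReadBypassSketch, its log, and the "direct", "prefetch" and "buffered" labels are hypothetical and do not correspond to the actual ABFS classes or to ReadBufferManager. The model also assumes that prefetches are queued starting at the requested offset, one block per unit of read-ahead depth.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a stream's read path; not the ABFS implementation. */
final class FirstReadBypassSketch {
    private final int readAheadDepth;
    private boolean firstReadDone = false;

    /** Records what the stream did, in order, so the behaviour can be inspected. */
    final List<String> log = new ArrayList<>();

    FirstReadBypassSketch(int readAheadDepth) {
        this.readAheadDepth = readAheadDepth;
    }

    void read(long offset, int length) {
        if (!firstReadDone) {
            // Very first read: skip the read-ahead machinery entirely and
            // fetch synchronously in the calling (workload) thread.
            firstReadDone = true;
            log.add("direct@" + offset);
            return;
        }
        // Second read onwards: queue prefetches (assumed here to start at the
        // requested offset), then serve the read from the prefetch buffers.
        for (int i = 0; i < readAheadDepth; i++) {
            log.add("prefetch@" + (offset + (long) i * length));
        }
        log.add("buffered@" + offset);
    }
}
```

With a read-ahead depth of 2, calling read(0, 4) followed by read(4, 4) leaves the log as direct@0, prefetch@4, prefetch@8, buffered@4: the first read queues nothing, so there is no pending prefetch to check or cancel.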