[ 
https://issues.apache.org/jira/browse/HADOOP-16202?focusedWorklogId=753410&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753410
 ]

ASF GitHub Bot logged work on HADOOP-16202:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 06/Apr/22 14:00
            Start Date: 06/Apr/22 14:00
    Worklog Time Spent: 10m 
      Work Description: steveloughran commented on PR #2584:
URL: https://github.com/apache/hadoop/pull/2584#issuecomment-1090308409

   I really need reviews of this: @mukund-thakur @mehakmeet @bibinchundatt 
@dannycjones @surendralilhore
   
   This patch needs to go in before any other input stream optimisations so that
   1. we can cut the HEAD request overhead on small files
   2. distcp and fsshell can tell the streams that they are reading the whole
      file, so the streams should do big reads and expect no backwards seeks.
   3. parquet and orc libs can switch to this to get 
   
   Although #2975 sets it up, this PR does not add abfs support for the file 
length option as an alternative to the file status.
   
   I've looked at it but need a plan for etag tracking: we will have to 
replicate the part of the s3a code where the first GET's etag is picked up and 
used from then on. That is a future piece of work. This PR does contain the 
tests that are needed there, though.
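
To illustrate the HEAD saving in point 1, here is a minimal, self-contained toy model. It is not the S3A implementation and uses no Hadoop classes: `ToyStore`, `ToyStream` and `open()` are hypothetical names invented for this sketch. The only idea it shows is that an open call which is told the length up front can skip the metadata probe it would otherwise issue.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

/** Toy model only: none of these names are Hadoop or S3A classes. */
class OpenFileSketch {

    /** Hypothetical in-memory object store that counts HEAD-style probes. */
    static class ToyStore {
        private final Map<String, byte[]> objects = new HashMap<>();
        private int headCount = 0;

        void put(String key, byte[] data) {
            objects.put(key, data.clone());
        }

        /** Metadata probe: the round trip a declared length is meant to avoid. */
        long head(String key) {
            headCount++;
            byte[] data = objects.get(key);
            if (data == null) {
                throw new IllegalArgumentException("no such object: " + key);
            }
            return data.length;
        }

        int headCount() {
            return headCount;
        }
    }

    /** Hypothetical stream handle; it only records the length it was opened with. */
    static class ToyStream {
        final String key;
        final long length;

        ToyStream(String key, long length) {
            this.key = key;
            this.length = length;
        }
    }

    /** Opens a key; the store is probed only when the caller declared no length. */
    static ToyStream open(ToyStore store, String key, OptionalLong declaredLength) {
        long length = declaredLength.isPresent()
                ? declaredLength.getAsLong()   // trust the caller, no probe issued
                : store.head(key);             // fall back to a metadata probe
        return new ToyStream(key, length);
    }

    public static void main(String[] args) {
        ToyStore store = new ToyStore();
        store.put("data/small.bin", new byte[] {1, 2, 3});

        open(store, "data/small.bin", OptionalLong.empty());
        System.out.println("HEAD probes after plain open: " + store.headCount());

        open(store, "data/small.bin", OptionalLong.of(3L));
        System.out.println("HEAD probes after open with declared length: " + store.headCount());
    }
}
```

In this model the second open never touches the store, so a wrong or stale declared length, or a missing object, would only surface on a later read.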
   




Issue Time Tracking
-------------------

    Worklog Id:     (was: 753410)
    Time Spent: 16h 20m  (was: 16h 10m)

> Enhance openFile() for better read performance against object stores 
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-16202
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16202
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs, fs/s3, tools/distcp
>    Affects Versions: 3.3.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 16h 20m
>  Remaining Estimate: 0h
>
> The {{openFile()}} builder API lets us add new options when reading a file.
> Add an option {{"fs.s3a.open.option.length"}} which takes a long and allows 
> the length of the file to be declared. If set, *no check for the existence of 
> the file is issued when opening the file*.
> Also: withFileStatus() to take any FileStatus implementation, rather than 
> only S3AFileStatus, and not check that the path matches the path being 
> opened. Needed to support viewFS-style wrapping and mounting.
> Also adopt the API where appropriate, to stop clusters with S3A reads 
> switched to random IO from killing download/localization:
> * fs shell copyToLocal
> * distcp
> * IOUtils.copy



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

