[ 
https://issues.apache.org/jira/browse/HADOOP-18842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759123#comment-17759123
 ] 

Steve Loughran commented on HADOOP-18842:
-----------------------------------------

ok, so you are proposing we split the output files by dest directory, for 
parallelised reading and better scale there?

good
* you can switch from memory storage to disk storage once some threshold is 
reached.
* many readers can read files independently
* if a job commit fails, more partitions are likely to be preserved or updated
* bad: lots of files to create and open
* bad: complexity when reading in the manifest of a task to determine which 
file to update.

I suppose a tactic would be to generate a map of (dir -> accumulator), where 
the accumulator is updated with the list of files from that TA. If the 
accumulator gets above a certain size, the switch to saving to files kicks in. 
You could probably avoid the need for the cross-thread queue / async record 
write by having whichever thread is trying to update the accumulator acquire a 
lock on it, then do the create (if needed), plus the record writes.

Another thing to consider: how efficient is the current SinglePendingCommit 
structure? We do use the file format as the record format, don't we? A more 
efficient design for any accumulator would be possible, wouldn't it? Something 
of the form (path, uploadID, array[part-info]).
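A sketch of that proposed shape as Java records, purely illustrative (field names are assumptions, and this is not the actual SinglePendingCommit wire format):

```java
import java.util.List;

/** Hypothetical compact commit record in the (path, uploadID,
 *  array[part-info]) shape suggested above. */
record PartInfo(int partNumber, String etag) {}

record PendingUpload(String destPath, String uploadId, List<PartInfo> parts) {}

public class CompactRecordDemo {
    public static void main(String[] args) {
        PendingUpload u = new PendingUpload(
            "s3://bucket/out/part-0000",
            "upload-123",
            List.of(new PartInfo(1, "etag-1"), new PartInfo(2, "etag-2")));
        System.out.println(u.destPath() + " has " + u.parts().size() + " parts");
    }
}
```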

In the manifest committer I hadn't worried about the preservation of dirs until 
commit; having a single file listing all commits was just a way to avoid 
running out of memory, relying on file buffering/caching to keep the cost of 
building the file low.

We did hit memory problems without it, though. The big issue is on a Spark 
driver with many active jobs: the memory requirement of multiple job commits 
going on at the same time was causing OOM failures not seen with the older 
committer, even though the entry size for each file to commit was much smaller 
(src, dest path, etag).

> Support Overwrite Directory On Commit For S3A Committers
> --------------------------------------------------------
>
>                 Key: HADOOP-18842
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18842
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.4.0
>            Reporter: Syed Shameerur Rahman
>            Assignee: Syed Shameerur Rahman
>            Priority: Major
>              Labels: pull-request-available
>
> The goal is to add a new kind of commit mechanism in which the destination 
> directory is cleared before committing the files.
> *Use Case*
> In the case of dynamicPartition insert overwrite queries, the destination 
> directories which need to be overwritten are not known before execution, 
> and hence it becomes a challenge to clear them.
>  
> One approach is for the underlying engines/clients to clear all the 
> destination directories before calling the commitJob operation, but the 
> issue with this approach is that, in case of failures while committing the 
> files, we might end up with the whole of the previous data being deleted, 
> making recovery difficult or time-consuming.
>  
> *Solution*
> Based on the mode of the commit operation, either *INSERT* or *OVERWRITE*: 
> during commitJob, the committer will map each destination directory to the 
> commits which need to be added to that directory, and if the mode is 
> *OVERWRITE*, the committer will delete the directory recursively and then 
> commit each of the files in the directory. So in the case of failures (worst 
> case), the number of destination directories deleted will equal the number 
> of threads (if done in a multi-threaded way), as compared to the whole of 
> the data if it was done on the engine side.
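The quoted solution could be sketched as follows; FileOps and every name in this block are hypothetical stand-ins for illustration, not the S3A committer API:

```java
import java.util.List;
import java.util.Map;

/** Commit mode, per the issue description. */
enum CommitMode { INSERT, OVERWRITE }

/** Stand-in for filesystem operations; not a real Hadoop interface. */
interface FileOps {
    void deleteRecursive(String dir);
    void commitFile(String dir, String file);
}

public class OverwriteCommit {
    /**
     * Group pending commits by destination directory; in OVERWRITE mode,
     * delete each directory just before committing its files, so a failure
     * mid-job loses at most the directories processed so far, not the
     * whole previous dataset.
     */
    public static void commitJob(CommitMode mode,
                                 Map<String, List<String>> commitsByDir,
                                 FileOps fs) {
        for (Map.Entry<String, List<String>> e : commitsByDir.entrySet()) {
            if (mode == CommitMode.OVERWRITE) {
                fs.deleteRecursive(e.getKey());  // clear only this partition
            }
            for (String f : e.getValue()) {
                fs.commitFile(e.getKey(), f);
            }
        }
    }
}
```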



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
