[ https://issues.apache.org/jira/browse/PIG-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15603433#comment-15603433 ]
Rohini Palaniswamy commented on PIG-5036:
-----------------------------------------
bq. The tail columns of prerank are not necessary.
Got it. The prerank file size without tail would be 667, which will still
launch 2 mappers with a split size of 300. Since the purpose of tail is to
create more mappers, and we have achieved that with the split size, it is no
longer required. Can we remove tail from everywhere then (generate_data.pl and
nightly.conf)?
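
For reference, the split arithmetic can be sketched roughly as below. This is a
simplified model of how Hadoop's FileInputFormat derives splits (split size =
max(minSize, min(maxSize, blockSize)), with a 1.1 slop factor on the last
split), not the e2e harness code. The function names, the 128 MB block size,
and the assumption that 667 and 300 are in the same unit are illustrative
only. The actual number of mappers Pig launches can differ, for example when
split combination is enabled.

```python
# Simplified model of FileInputFormat split counting (illustrative only).
SPLIT_SLOP = 1.1  # slack factor Hadoop allows on the final split


def compute_split_size(block_size, min_size, max_size):
    """Split size as FileInputFormat computes it from the three limits."""
    return max(min_size, min(max_size, block_size))


def count_splits(file_size, split_size):
    """Count the splits for one splittable file under this model."""
    splits = 0
    remaining = file_size
    # Carve off full-size splits while the remainder exceeds 1.1 x split size.
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    # Whatever is left becomes the final (possibly smaller) split.
    if remaining > 0:
        splits += 1
    return splits


if __name__ == "__main__":
    # Assumed values: 128 MB block size, min split size 1, max split size 300.
    split_size = compute_split_size(128 * 1024 * 1024, 1, 300)
    print("split size:", split_size)
    print("splits for a file of size 667:", count_splits(667, split_size))
```

Under this model, any file noticeably larger than the max split size yields
more than one split, which is the property the Rank tests rely on.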
> Remove biggish from e2e input dataset
> -------------------------------------
>
> Key: PIG-5036
> URL: https://issues.apache.org/jira/browse/PIG-5036
> Project: Pig
> Issue Type: Improvement
> Components: e2e harness
> Reporter: Daniel Dai
> Assignee: Daniel Dai
> Fix For: 0.17.0
>
> Attachments: PIG-5036-1.patch, PIG-5036-2.patch
>
>
> The goal is to reduce e2e runtime. It takes around 10 min to generate the
> biggish file, and more time to run the tests that use it (Rank_4, Rank_5).
> The file is not actually necessary: its purpose is to run multiple map
> tasks, and we can do that with the
> "mapreduce.input.fileinputformat.split.maxsize" parameter.