[ 
https://issues.apache.org/jira/browse/PIG-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15904852#comment-15904852
 ] 

Adam Szita commented on PIG-5177:
---------------------------------

[~kellyzly]
Basically, the issue is caused by the backend not being able to find the script file 
(e.g. in _ScriptEngine#getScriptAsStream_).

1. This is only an issue in yarn-client mode. In local mode it works because 
the script file is available in the local FS at its original location.

2. Script files have to be carried along to the (backend) executor nodes. This is 
done differently in MR/Tez mode vs. Spark mode.
In all cases the script file paths are available in 
pigContext().getScriptFiles() (after they were registered on the frontend). In 
MR/Tez modes _JarManager#createPigScriptUDFJar(PigContext)_ creates a jar 
file and puts the script files into it. This jar is distributed to the 
backend nodes, and upon job execution the script files are accessed through a 
ClassLoader (e.g. here: 
https://github.com/apache/pig/blob/spark/src/org/apache/pig/scripting/ScriptEngine.java#L146).
In Spark mode we use _LoadConverter#registerUdfFiles_ on the frontend and let 
Spark do the job of distributing the script files to the executor nodes. Later, 
on the backend, an executor can retrieve the path of the script file using 
SparkFiles.get(originalFileName). This points to the file in the executor's 
container, and we can use it to open a FileInputStream.
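
To illustrate the two lookup strategies described above, here is a minimal, 
self-contained sketch. It is not the code from the patch: the class name 
ScriptLocator, the method openScript, and the order of the two lookups are 
assumptions made for illustration. To keep it runnable without Spark on the 
classpath, the Spark lookup is passed in as a plain function; on a real 
executor that function would presumably be SparkFiles::get.

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.function.Function;

// Hypothetical helper, not part of Pig.
public class ScriptLocator {

    /**
     * Opens a script file on the backend.
     *
     * @param scriptPath   path of the script as registered on the frontend
     * @param fileResolver maps a bare file name to a local path on this node;
     *                     stands in for SparkFiles.get(fileName) (assumption)
     */
    public static InputStream openScript(String scriptPath,
                                         Function<String, String> fileResolver)
            throws IOException {
        // MR/Tez style: the script was packed into a jar, so try the classpath first.
        InputStream fromClasspath =
                ScriptLocator.class.getClassLoader().getResourceAsStream(scriptPath);
        if (fromClasspath != null) {
            return fromClasspath;
        }

        // Spark style: distributed files are looked up by file name only,
        // not by their original frontend path.
        Path name = Paths.get(scriptPath).getFileName();
        if (name != null) {
            String localPath = fileResolver.apply(name.toString());
            if (localPath != null && Files.isReadable(Paths.get(localPath))) {
                return new FileInputStream(localPath);
            }
        }
        throw new FileNotFoundException("Script file not found: " + scriptPath);
    }

    public static void main(String[] args) throws IOException {
        // Simulate an executor container that received udf.py from the frontend.
        Path container = Files.createTempDirectory("executor-container");
        Files.write(container.resolve("udf.py"), "print('hi')".getBytes("UTF-8"));
        Function<String, String> resolver = n -> container.resolve(n).toString();

        // The original frontend path does not exist here; only the file name matters.
        try (InputStream in = openScript("/frontend/only/path/udf.py", resolver)) {
            System.out.println("first byte: " + (char) in.read());
        }
    }
}
```

The point of the sketch is that the original frontend path is never opened 
directly on the backend, which is what fails in yarn-client mode.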

This patch fixes about 30 E2E test case failures, since this problem is common 
to the scripting functionalities.

> Scripting and StreamingPythonUDFs fail with Spark exec type
> -----------------------------------------------------------
>
>                 Key: PIG-5177
>                 URL: https://issues.apache.org/jira/browse/PIG-5177
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: Adam Szita
>            Assignee: Adam Szita
>             Fix For: spark-branch
>
>         Attachments: PIG-5177.0.patch, PIG-5177.1.patch, PIG-5177.2.patch
>
>
> An exception is thrown because the Python script file is not found on the 
> backend side (on Spark executors).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
