[jira] [Updated] (PIG-2700) Unit tests fail against Hadoop 2.0.0

Cheolsoo Park (JIRA) Tue, 15 May 2012 21:50:41 -0700

     [ https://issues.apache.org/jira/browse/PIG-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Cheolsoo Park updated PIG-2700:
-------------------------------

    Description: 
I am running Pig unit tests against Hadoop 2.0.0-SNAPSHOT as follows:

{code}
--- ivy/libraries.properties
+++ ivy/libraries.properties
@@ -37,9 +37,9 @@ guava.version=11.0
 jersey-core.version=1.8
 hadoop-core.version=1.0.0
 hadoop-test.version=1.0.0
-hadoop-common.version=0.23.1
-hadoop-hdfs.version=0.23.1
-hadoop-mapreduce.version=0.23.1
+hadoop-common.version=2.0.0-SNAPSHOT
+hadoop-hdfs.version=2.0.0-SNAPSHOT
+hadoop-mapreduce.version=2.0.0-SNAPSHOT
{code}

And I see the following issues:

1) copyFromLocalToCluster fails:
{code}
fs command '-put AccumulatorInput.txt AccumulatorInput.txt' failed. Please check output logs for details
java.io.IOException: fs command '-put AccumulatorInput.txt AccumulatorInput.txt' failed. Please check output logs for details
    at org.apache.pig.tools.grunt.GruntParser.processFsCommand(GruntParser.java:1012)
{code}

I am getting around this problem by explicitly creating intermediate 
directories that do not exist. (Please see the attached patch.)
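
The idea behind the workaround can be sketched on a local file system. This
is only an analogue: it uses java.nio.file instead of the Hadoop FileSystem
API, and the class name PutWithParents and the directory layout are
hypothetical, not taken from the patch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class PutWithParents {
    // Copies src to dest, first creating any parent directories of dest
    // that do not exist yet (local-file-system analogue of the workaround).
    static Path putWithParents(Path src, Path dest) throws IOException {
        Path parent = dest.getParent();
        if (parent != null) {
            // Does nothing if the directory is already there.
            Files.createDirectories(parent);
        }
        return Files.copy(src, dest, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("put-demo");
        Path src = Files.write(root.resolve("AccumulatorInput.txt"), "a\t1\n".getBytes());
        // Hypothetical destination whose parent directories are missing.
        Path dest = root.resolve("user/test/AccumulatorInput.txt");
        putWithParents(src, dest);
        System.out.println("copied: " + Files.exists(dest));
    }
}
```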


2) Many tests, including TestAccumulator, hang and eventually time out. The JVM 
thread dump shows the following call stack:

{code}
[junit]    java.lang.Thread.State: TIMED_WAITING (sleeping)
[junit]     at java.lang.Thread.sleep(Native Method)
[junit]     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:245)
[junit]     at org.apache.pig.PigServer.launchPlan(PigServer.java:1314)
[junit]     at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1299)
[junit]     at org.apache.pig.PigServer.storeEx(PigServer.java:996)
[junit]     at org.apache.pig.PigServer.store(PigServer.java:963)
[junit]     at org.apache.pig.PigServer.openIterator(PigServer.java:876)
[junit]     at org.apache.pig.test.TestAccumulator.testAccumBasic(TestAccumulator.java:150)
{code}

This happens because test jobs never finish successfully in the mini cluster: 
they fail with a ClassNotFound exception while being executed.
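
The thread dump is consistent with a caller that sleeps between status
checks while waiting for a job. The sketch below is not Pig's
MapReduceLauncher code; it is a generic, hypothetical polling helper
(PollWithDeadline, waitFor) showing how such a wait spends its time in
Thread.sleep when the job never reports completion, and how a deadline
bounds the wait.

```java
import java.util.function.BooleanSupplier;

public class PollWithDeadline {
    // Polls isDone every pollMillis until it returns true or timeoutMillis
    // has elapsed. Returns true if the work completed, false on timeout.
    static boolean waitFor(BooleanSupplier isDone, long pollMillis, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!isDone.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up: the work never completed
            }
            // A thread dump taken here shows TIMED_WAITING (sleeping).
            Thread.sleep(pollMillis);
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        // A job that never completes: the wait ends only at the deadline.
        System.out.println("never finishing job completed: " + waitFor(() -> false, 10, 100));
        // A job that is already complete: the wait returns immediately.
        System.out.println("finished job completed: " + waitFor(() -> true, 10, 100));
    }
}
```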

In fact, this is a regression from HADOOP-6963, in which Hadoop introduced a 
dependency on the Apache Commons IO library:

{code:title=FileUtil.java}
isSymLink = org.apache.commons.io.FileUtils.isSymlink(allFiles[i]);
{code}

But the Apache Commons IO library is missing from Pig's dependencies, so test 
jobs keep failing in the mini cluster until the test times out.
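
A quick way to confirm whether a class is visible is to try loading it. The
sketch below is a minimal, hypothetical diagnostic (CommonsIoCheck and
isOnClasspath are not part of Pig or Hadoop). It only inspects the classpath
of the JVM that runs it, which may differ from the classpath of the task
JVMs in the mini cluster.

```java
public class CommonsIoCheck {
    // Returns true if the named class can be found by this class's loader.
    // The class is located but not initialized.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, CommonsIoCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The class referenced from Hadoop's FileUtil.java in the snippet above.
        String name = "org.apache.commons.io.FileUtils";
        System.out.println(name + " on classpath: " + isOnClasspath(name));
    }
}
```

Running this with the same classpath the unit tests use shows whether
commons-io is being picked up.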

I am fixing this issue by adding commons-io-2.3.jar to ivy.xml and 
libraries.properties. (Please see the attached patch.)

  was:
I am running Pig unit tests against Hadoop 2.0.0-SNAPSHOT as follows:

{code}
--- ivy/libraries.properties
+++ ivy/libraries.properties
@@ -37,9 +37,9 @@ guava.version=11.0
 jersey-core.version=1.8
 hadoop-core.version=1.0.0
 hadoop-test.version=1.0.0
-hadoop-common.version=0.23.1
-hadoop-hdfs.version=0.23.1
-hadoop-mapreduce.version=0.23.1
+hadoop-common.version=2.0.0-SNAPSHOT
+hadoop-hdfs.version=2.0.0-SNAPSHOT
+hadoop-mapreduce.version=2.0.0-SNAPSHOT
{code}

And see the following issues:

1) copyFromLocalToCluster fails:
{code}
fs command '-put AccumulatorInput.txt AccumulatorInput.txt' failed. Please check output logs for details
java.io.IOException: fs command '-put AccumulatorInput.txt AccumulatorInput.txt' failed. Please check output logs for details
    at org.apache.pig.tools.grunt.GruntParser.processFsCommand(GruntParser.java:1012)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:117)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
    at org.apache.pig.test.Util.copyFromLocalToCluster(Util.java:538)
    at org.apache.pig.test.TestAccumulator.createFiles(TestAccumulator.java:83)
    at org.apache.pig.test.TestAccumulator.setUp(TestAccumulator.java:63)
{code}

2) TestAccumulator times out with the following error message in the log:

{code}
Testcase: testAccumBasic took 794.69 sec
    Caused an ERROR
Unable to open iterator for alias C
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias C
    at org.apache.pig.PigServer.openIterator(PigServer.java:901)
    at org.apache.pig.test.TestAccumulator.testAccumBasic(TestAccumulator.java:150)
Caused by: java.io.IOException: Job terminated with anomalous status FAILED
    at org.apache.pig.PigServer.openIterator(PigServer.java:893)
{code}

The JVM thread dump shows the following call stack:

{code}
[junit]    java.lang.Thread.State: TIMED_WAITING (sleeping)
[junit]         at java.lang.Thread.sleep(Native Method)
[junit]         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:245)
[junit]         at org.apache.pig.PigServer.launchPlan(PigServer.java:1314)
[junit]         at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1299)
[junit]         at org.apache.pig.PigServer.storeEx(PigServer.java:996)
[junit]         at org.apache.pig.PigServer.store(PigServer.java:963)
[junit]         at org.apache.pig.PigServer.openIterator(PigServer.java:876)
[junit]         at org.apache.pig.test.TestAccumulator.testAccumBasic(TestAccumulator.java:150)
{code}


As for the 1st issue, I am getting around it with the following change:

{code}
diff --git test/org/apache/pig/test/Util.java test/org/apache/pig/test/Util.java
index ca168ca..e88eb4a 100644
--- test/org/apache/pig/test/Util.java
+++ test/org/apache/pig/test/Util.java
@@ -531,7 +531,14 @@ public class Util {
         PigServer ps = new PigServer(ExecType.MAPREDUCE, cluster.getProperties());
         String script = "fs -put " + localFileName + " " + fileNameOnCluster;
 
-       GruntParser parser = new GruntParser(new StringReader(script));
+        FileSystem fs = cluster.getFileSystem();
+        Path clusterFile = new Path(fileNameOnCluster);
+        Path clusterFileParent = clusterFile.getParent();
+        if (!fs.exists(clusterFileParent)) {
+          fs.mkdirs(clusterFileParent);
+        }
+
+        GruntParser parser = new GruntParser(new StringReader(script));
         parser.setInteractive(false);
         parser.setParams(ps);
         try {
{code}

But I am not sure what's happening with the 2nd issue.


     Patch Info: Patch Available
    
> Unit tests fail against Hadoop 2.0.0
> ------------------------------------
>
>                 Key: PIG-2700
>                 URL: https://issues.apache.org/jira/browse/PIG-2700
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.3
>            Reporter: Cheolsoo Park
>         Attachments: PIG-2460.patch

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
