[ 
https://issues.apache.org/jira/browse/HADOOP-9774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381979#comment-14381979
 ] 

Ha Son Hai commented on HADOOP-9774:
------------------------------------

Hi,
Kmeans algorythm also has problem with this in Centos with HDP 2.2. When kmeans 
is running, the intermediate result was stored on HDFS, but it look for the 
cluster directory on local file system and then fire out FileNotFound 
exception. To solve this, we had to specify the absolute path when using 
kmeans. Can anyone reopen the issues and check it? it took me several hours 
just to figure out the problem.

> RawLocalFileSystem.listStatus() return absolute paths when input path is 
> relative on Windows
> --------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-9774
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9774
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 3.0.0, 2.1.0-beta
>            Reporter: shanyu zhao
>            Assignee: shanyu zhao
>             Fix For: 2.1.1-beta
>
>         Attachments: HADOOP-9774-2.patch, HADOOP-9774-3.patch, 
> HADOOP-9774-4.patch, HADOOP-9774-5.patch, HADOOP-9774.patch
>
>
> On Windows, when using RawLocalFileSystem.listStatus() to enumerate a 
> relative path (without drive spec), e.g., "file:///mydata", the resulting 
> paths become absolute paths, e.g., ["file://E:/mydata/t1.txt", 
> "file://E:/mydata/t2.txt"...].
> Note that if we use it to enumerate an absolute path, e.g., 
> "file://E:/mydata" then the we get the same results as above.
> This breaks some hive unit tests which uses local file system to simulate 
> HDFS when testing, therefore the drive spec is removed. Then after 
> listStatus() the path is changed to absolute path, hive failed to find the 
> path in its map reduce job.
> You'll see the following exception:
> [junit] java.io.IOException: cannot find dir = 
> pfile:/E:/GitHub/hive-monarch/build/ql/test/data/warehouse/src/kv1.txt in 
> pathToPartitionInfo: 
> [pfile:/GitHub/hive-monarch/build/ql/test/data/warehouse/src]
> [junit]       at 
> org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:298)
> This problem is introduced by this JIRA:
> HADOOP-8962
> Prior to the fix for HADOOP-8962 (merged in 0.23.5), the resulting paths are 
> relative paths if the parent paths are relative, e.g., 
> ["file:///mydata/t1.txt", "file:///mydata/t2.txt"...]
> This behavior change is a side effect of the fix in HADOOP-8962, not an 
> intended change. The resulting behavior, even though is legitimate from a 
> function point of view, break consistency from the caller's point of view. 
> When the caller use a relative path (without drive spec) to do listStatus() 
> the resulting path should be relative. Therefore, I think this should be 
> fixed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to