[ https://issues.apache.org/jira/browse/PIG-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-4821:
------------------------------------
    Description: SedesHelper.writeChararray writes the value with writeUTF, 
but we read it back with str1 = new String(bb1.array(), bb1.position(), casz1, 
BinInterSedes.UTF8); in the BinInterSedesTupleRawComparator: 
https://github.com/apache/pig/blob/e0c5f265c68491395d8303c86195445be3d8aecf/src/org/apache/pig/data/BinInterSedes.java#L959-L964
 For some reason, this works fine on my Mac (both JDK 7 and JDK 8) but not on 
Linux. I am not sure of the actual cause and have not dug into it. I suspect 
either the charset environment or the specific JDK 8 update, which differs 
between my Mac and Linux.  (was: SedesHelper.writeChararray does writeUTF, but we do str1 = 
new String(bb1.array(), bb1.position(), casz1, BinInterSedes.UTF8); when 
reading it in the BinInterSedesTupleRawComparator. For some reason, this works 
fine in my MAC (both jdk7 and jdk8) but not in Linux. Not sure about the actual 
cause and have not dug into it. Suspecting either charset environment or the 
specific update of jdk 8 (different in my MAC and Linux).)

> Pig chararray field with special UTF-8 chars as part of tuple join key 
> produces wrong results in Tez
> ----------------------------------------------------------------------------------------------------
>
>                 Key: PIG-4821
>                 URL: https://issues.apache.org/jira/browse/PIG-4821
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.16.0
>
>
> SedesHelper.writeChararray writes the value with writeUTF, but we read it 
> back with str1 = new String(bb1.array(), bb1.position(), casz1, 
> BinInterSedes.UTF8); in the BinInterSedesTupleRawComparator: 
> https://github.com/apache/pig/blob/e0c5f265c68491395d8303c86195445be3d8aecf/src/org/apache/pig/data/BinInterSedes.java#L959-L964
>  For some reason, this works fine on my Mac (both JDK 7 and JDK 8) but not on 
> Linux. I am not sure of the actual cause and have not dug into it. I suspect 
> either the charset environment or the specific JDK 8 update, which differs 
> between my Mac and Linux.
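
One documented difference that may be relevant to the description above: 
DataOutput.writeUTF emits Java's "modified UTF-8" (U+0000 as two bytes, 
supplementary characters as two 3-byte surrogate encodings), while new 
String(bytes, UTF-8) decodes standard UTF-8. The sketch below is not Pig code 
and does not establish the root cause of this issue. It assumes the "special" 
characters in the key are ones where the two encodings differ, and the class 
and method names (ModifiedUtf8Mismatch, writeUtf, decodeAsStandardUtf8, 
readUtf) are hypothetical, chosen only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only mirrors the write/read pairing described in the issue.
public class ModifiedUtf8Mismatch {

    /** Bytes produced by DataOutput.writeUTF: a 2-byte length prefix, then modified UTF-8. */
    public static byte[] writeUtf(String s) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeUTF(s);
            out.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Skips the 2-byte length prefix and decodes the payload as standard UTF-8. */
    public static String decodeAsStandardUtf8(byte[] written) {
        return new String(written, 2, written.length - 2, StandardCharsets.UTF_8);
    }

    /** Reads the same bytes back with the matching DataInput.readUTF. */
    public static String readUtf(byte[] written) {
        try {
            return new DataInputStream(new ByteArrayInputStream(written)).readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // 'a' followed by U+1F600, a supplementary character (assumed example key).
        String key = "a\uD83D\uDE00";
        byte[] written = writeUtf(key);

        System.out.println("writeUTF payload bytes:   " + (written.length - 2));
        System.out.println("standard UTF-8 bytes:     "
                + key.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("readUTF round-trips:      " + readUtf(written).equals(key));
        System.out.println("new String(.., UTF_8) ok: "
                + decodeAsStandardUtf8(written).equals(key));
    }
}
```

For plain ASCII and BMP characters other than U+0000 the two encodings 
produce the same bytes, so a mismatch of this kind would only show up for keys 
containing U+0000 or supplementary characters. How a standard UTF-8 decoder 
replaces the malformed surrogate bytes can vary between JDK releases, which is 
one possible (unconfirmed) reason for different results on different machines.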



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
