[ 
https://issues.apache.org/jira/browse/PIG-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4821:
------------------------------------
    Attachment: PIG-4821-1.patch

  Attaching patch. I have not been able to come up with a test case for it and 
had postponed uploading the patch for a long time. The patch is simple. I have 
been struggling with the test case because of two issues:
    - The issue is not easily reproducible on my laptop. I tried varying the 
encodings (sun.io.unicode.encoding), the JDK, and different subsets of the 
data, and none of that reproduced it reliably. The issue does show up 
occasionally, but it is rare and not repeatable.
    - I could not narrow the input down to a smaller dataset for the test. The 
issue occurs during sorting and only when the data goes through comparison in 
a specific order. I have not been able to isolate the minimal set of records 
from the larger data.
   
 This patch is important and needs to go into the release. I will create a 
separate JIRA to add a test case later. 

> Pig chararray field with special UTF-8 chars as part of tuple join key 
> produces wrong results in Tez
> ----------------------------------------------------------------------------------------------------
>
>                 Key: PIG-4821
>                 URL: https://issues.apache.org/jira/browse/PIG-4821
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.16.0
>
>         Attachments: PIG-4821-1.patch
>
>
> SedesHelper.writeChararray writes the string with writeUTF, but 
> BinInterSedesTupleRawComparator reads it back with str1 = new 
> String(bb1.array(), bb1.position(), casz1, BinInterSedes.UTF8); 
> https://github.com/apache/pig/blob/e0c5f265c68491395d8303c86195445be3d8aecf/src/org/apache/pig/data/BinInterSedes.java#L959-L964.
>  For some reason, this works fine on my Mac (both jdk7 and jdk8) but not on 
> Linux. I am not sure about the actual cause and have not dug into it. I 
> suspect either the charset environment or the specific update of jdk 8 
> (different on my Mac and Linux).
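
For illustration only: the write/read mismatch described above can be 
sketched with plain JDK classes. DataOutput.writeUTF emits "modified UTF-8", 
which differs from standard UTF-8 for U+0000 and for characters outside the 
BMP (those are written as two 3-byte surrogate encodings rather than one 
4-byte sequence). The class and method names below (ModifiedUtf8Demo, 
writeUtfPayload, decodeAsStandardUtf8, roundTripWithReadUtf) are hypothetical 
and are not part of Pig. Whether this difference is the actual cause of 
PIG-4821 is an assumption; the description above says the cause was not 
investigated.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not part of Pig.
public class ModifiedUtf8Demo {

    // Encode with writeUTF and drop the 2-byte length prefix it writes first.
    static byte[] writeUtfPayload(String s) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeUTF(s);
        }
        byte[] all = bos.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    // Decode the payload as standard UTF-8, similar in spirit to the
    // new String(..., UTF8) call quoted in the issue description.
    static String decodeAsStandardUtf8(byte[] payload) {
        return new String(payload, StandardCharsets.UTF_8);
    }

    // Decode with the matching reader, which understands modified UTF-8.
    static String roundTripWithReadUtf(String s) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeUTF(s);
        }
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
            return in.readUTF();
        }
    }

    public static void main(String[] args) throws IOException {
        String bmpOnly = "caf\u00e9";                // all characters inside the BMP
        String supplementary = "key-\uD83D\uDE00";   // ends with U+1F600, outside the BMP

        for (String s : new String[] {bmpOnly, supplementary}) {
            byte[] modified = writeUtfPayload(s);
            byte[] standard = s.getBytes(StandardCharsets.UTF_8);
            System.out.println("writeUTF bytes: " + modified.length
                + ", standard UTF-8 bytes: " + standard.length
                + ", standard decode matches: " + decodeAsStandardUtf8(modified).equals(s)
                + ", readUTF matches: " + roundTripWithReadUtf(s).equals(s));
        }
    }
}
```

For BMP-only strings the two encodings agree, so the mismatch would only show 
up for keys containing supplementary characters or U+0000. The exact 
characters that the standard decoder substitutes for the surrogate bytes are 
left out of the sketch on purpose, since that detail may vary between JDK 
versions.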



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
