Multi-valued CSV fields are double encoded.

We start with: "aaa ""bbb""ccc"
Decoding one level, we get: aaa "bbb"ccc
Decoding again to get individual values results in a decode error
because the encapsulator appears unescaped in the middle of the second
value (i.e. invalid CSV).
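The two decoding levels can be reproduced outside Solr. The sketch below uses Python's standard `csv` module as a stand-in for Solr's CSV loader, so error messages will differ. `decode_row` is a hypothetical helper, and `strict=True` makes the parser reject a stray encapsulator instead of guessing.

```python
import csv

def decode_row(line, delimiter, quotechar='"'):
    """Hypothetical helper: parse one CSV line, rejecting malformed quoting.

    strict=True makes Python's csv module raise csv.Error instead of
    silently accepting a stray encapsulator.
    """
    reader = csv.reader([line], delimiter=delimiter, quotechar=quotechar,
                        strict=True)
    return next(reader)

# Level 1: split the record into fields on ','.
fields = decode_row('"1","aaa ""bbb""ccc"', ",")
print(fields)  # ['1', 'aaa "bbb"ccc']

# Level 2: split the title field into sub-values on ' '.
# The '"' closing bbb is followed by 'c', not by the separator, so this fails.
try:
    decode_row(fields[1], " ")
except csv.Error as err:
    print("level-2 decode failed:", err)

# With a space after bbb"", both levels decode cleanly.
ok = decode_row(decode_row('"1","aaa ""bbb"" ccc"', ",")[1], " ")
print(ok)  # ['aaa', 'bbb', 'ccc']
```

This mirrors the two curl commands in the quoted message: the only difference is whether a separator follows the closing encapsulator of the second sub-value.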

One easy way to fix this is to use a different encapsulator for the
sub-values of a multi-valued field by adding f.title.encapsulator=%27
(a URL-encoded single quote character).
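A minimal sketch of the effect, again using Python's `csv` module in place of Solr's loader and assuming the per-field encapsulator applies only when the title is split into sub-values. Once the sub-value encapsulator is `'`, the `"` characters are ordinary data and the second decode succeeds.

```python
import csv

def decode_row(line, delimiter, quotechar):
    # Hypothetical helper: parse one CSV line with the given delimiter/quotechar.
    return next(csv.reader([line], delimiter=delimiter, quotechar=quotechar,
                           strict=True))

# Outer level still uses '"' (encapsulator=%22).
title = decode_row('"1","aaa ""bbb""ccc"', ",", '"')[1]

# Sub-values use "'" (f.title.encapsulator=%27), so '"' is an ordinary character.
values = decode_row(title, " ", "'")
print(values)  # ['aaa', '"bbb"ccc']
```

Note that the quotes around bbb are kept in the second value; whether that is the desired result depends on what the values are supposed to be.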

But I can't really tell you exactly how to encode the data or which
options to pass to the CSV loader when I don't know what actual values
you want after "aaa ""bbb""ccc" is decoded.
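To show why the encoding depends on the intended values, here is a sketch that starts from a hypothetical target (two title values, the second containing a literal `"`), encodes the sub-values first and then the whole record, and decodes twice to get them back. It uses Python's `csv` module and assumes Solr's loader follows the same doubled-encapsulator escaping; `encode_row` and `decode_row` are hypothetical helpers.

```python
import csv
import io

def encode_row(values, delimiter):
    # Hypothetical helper: CSV-encode one row, quoting only where needed.
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, quotechar='"',
                        quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerow(values)
    return buf.getvalue()[:-1]  # drop the "\n" the writer appends

def decode_row(line, delimiter):
    # Hypothetical helper: strict parse of one CSV line.
    return next(csv.reader([line], delimiter=delimiter, quotechar='"',
                           strict=True))

# Hypothetical target: two title values, the second containing a literal '"'.
wanted = ["aaa", 'bbb"ccc']

inner = encode_row(wanted, " ")          # encode the sub-values first
record = encode_row(["1", inner], ",")   # then encode the whole record
print(inner)   # aaa "bbb""ccc"
print(record)  # 1,"aaa ""bbb""""ccc"""

# Decoding twice gets the original values back.
assert decode_row(decode_row(record, ",")[1], " ") == wanted
```

A different target (for example three values aaa, bbb, ccc) would need a different encoded string, which is why the intended values have to be known first.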

-Yonik
http://www.lucidimagination.com



On Mon, Jun 20, 2011 at 5:46 PM, Rafał Kuć <r....@solr.pl> wrote:
> Hi!
>
>  Yonik, thanks for the reply. I just realized that the example I gave
> was not complete: the error is returned by Solr only when the field is
> multivalued and its values are split. For example, the
> following curl command gives me the mentioned error:
>
> curl 'http://localhost:8983/solr/update/csv?fieldnames=id,title&commit=true&encapsulator=%22&f.title.split=true&f.title.separator=%20'
>  -H 'Content-type:text/plain' -d '"1","aaa ""bbb""ccc"'
>
> while the following is executed without any problem:
> curl 'http://localhost:8983/solr/update/csv?fieldnames=id,title&commit=true&encapsulator=%22&f.title.split=true&f.title.separator=%20'
>  -H 'Content-type:text/plain' -d '"1","aaa ""bbb"" ccc"'
>
> The only difference between those two is the additional space
> character in between bbb"" and ccc in the second example.
>
> Am I doing something wrong? ;)
>
> --
> Regards,
>  Rafał Kuć
>  http://solr.pl
>
>> This works fine for me:
>
>> curl http://localhost:8983/solr/update/csv -H
>> 'Content-type:text/plain' -d 'id,name
>> "1","aaa ""bbb"" ccc"'
>
>> -Yonik
>> http://www.lucidimagination.com
>
>
>> On Mon, Jun 20, 2011 at 3:17 PM, Rafał Kuć <r....@solr.pl> wrote:
>>> Hello!
>>>
>>>  I have a question about the CSV update handler. Let's say I have the
>>> following file sent to CSV update handler using curl:
>>>
>>> id,name
>>> "1","aaa ""bbb""ccc"
>>>
>>> It throws an error, saying that:
>>> Error 400 java.io.IOException: (line 0) invalid char between encapsulated 
>>> token end delimiter
>>>
>>> If I change the contents of the file to:
>>>
>>> id,name
>>> "1","aaa ""bbb"" ccc"
>>>
>>> it works without a problem. Has anyone encountered this? Is it known
>>> behavior?
>>>
>>> --
>>> Regards,
>>>  Rafał Kuć