Solr solr.JSONResponseWriter not escaping backslash '\' characters

2014-01-23 Thread stevenNabble
Hello,

I am finding that if any fields in a document returned by a Solr query
(*wt=json* to get a JSON response) contain backslash *'\'* characters, they
are not being escaped (to make them valid JSON).

e.g. Solr returns this: 'A quoted value *\"XXX\"*, plus these are
backslashes *\r\n* which should be escaped but aren't :-('

Any ideas? I shouldn't need to escape these values before submitting to the
Solr index but I can't see any other way at the moment...
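
For reference, this is roughly what a correctly escaped JSON string looks like. It is a minimal sketch using PHP's json_encode() and json_decode(); the sample value is made up for illustration and is not taken from the index.

```php
<?php
// Illustrative value only: a double quote, a backslash and a real newline.
$raw = "quote: (\") backslash: (\\) newline: (\n)";

// json_encode() escapes each of these characters with a leading backslash.
$json = json_encode($raw);
echo $json, "\n"; // prints "quote: (\") backslash: (\\) newline: (\n)"

// Decoding the escaped form gives back the original value unchanged.
var_dump(json_decode($json) === $raw); // bool(true)
```

A response is only suspect if a quote, backslash or raw newline shows up inside a string value without such a leading backslash.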

Regards
Steven



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-solr-JSONResponseWriter-not-escaping-backslash-characters-tp4112990.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr solr.JSONResponseWriter not escaping backslash '\' characters

2014-01-23 Thread stevenNabble
Hi Chris,

thanks for the fast response. I'll try to be more specific about the
problem I am having.

# cat tmp.xml


9553522
quote: (") backslash: (\)
backslash-quote: (\")
newline: (
) backslash-n: (\n)





# curl 'http://localhost:8983/solr/collection1/update?commit=true' --data-binary @tmp.xml -H 'Content-Type: application/xml'


<lst name="responseHeader"><int name="status">0</int><int name="QTime">134</int></lst>




# curl 'http://localhost:8983/solr/collection1/select?q=id:9553522&indent=true&omitHeader=true&wt=xml'




  
9553522
quote: (") backslash: (\)
backslash-quote: (\")
newline: (
) backslash-n: (\n)

  quote: (") backslash: (\)
backslash-quote: (\")
newline: (
) backslash-n: (\n)


  quote: (") backslash: (\)
backslash-quote: (\")
newline: (
) backslash-n: (\n)


  quote: (") backslash: (\)
backslash-quote: (\")
newline: (
) backslash-n: (\n)

1458042122530717696





# curl 'http://localhost:8983/solr/collection1/select?q=id:9553522&indent=true&omitHeader=true&wt=json&fl=id,comments,_version_'
{
  "response":{"numFound":1,"start":0,"docs":[
  {
"id":"9553522",
"comments":"quote: (\") backslash: (\\) \nbackslash-quote: (\\\") \nnewline: ( \n) backslash-n: (\\n)",
"_version_":1458042122530717696}]
  }}



So my setup gives the same responses as yours.

The problem I have is that if I try to parse this response in *php* using
*json_decode()* I get a syntax error because of the '*\n*'s that are in
the response. I could escape them before doing the *json_decode()* or at the
point of submitting to the index but this seems wrong...
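
To see why json_decode() rejects a given response, it can help to surface PHP's own error message. The following is a minimal sketch, assuming PHP 5.5 or later for json_last_error_msg(); the helper name decodeOrFail and the sample strings are hypothetical.

```php
<?php
// Hypothetical helper: decode a JSON document or fail with PHP's own error message.
function decodeOrFail($json)
{
    $decoded = json_decode($json);
    if ($decoded === null && json_last_error() !== JSON_ERROR_NONE) {
        throw new RuntimeException('json_decode failed: ' . json_last_error_msg());
    }
    return $decoded;
}

// Single quotes: PHP keeps backslash + n as two characters, so the JSON is valid.
$escaped = '{"comments": "line one\nline two"}';
// Double quotes: PHP turns \n into a raw newline inside the JSON string, which is invalid.
$raw = "{\"comments\": \"line one\nline two\"}";

var_dump(decodeOrFail($escaped)->comments);

try {
    decodeOrFail($raw);
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n";
}
```

If the second form fails while the first succeeds, something between Solr and json_decode() is probably expanding the escape sequences.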

I am probably doing something silly and a good night's sleep will reveal
what I am doing wrong ;-)

Thanks
Steven



On 23 January 2014 16:15, Chris Hostetter-3 [via Lucene] <
ml-node+s472066n4113017...@n3.nabble.com> wrote:

>
> : I am finding that if any fields in a document returned by a Solr query
> : (*wt=json* to get a JSON response) contain backslash *'\'* characters, they
> : are not being escaped (to make then valid JSON).
>
> you're going to have to give us more concrete specifics on how you are
> indexing your data, and how you are looking at the response, because i
> can't reproduce anything close to what you are describing (see below)
>
> https://wiki.apache.org/solr/UsingMailingLists
>
>
>
>
> hossman@frisbee:~$ cat tmp/tmp.xml
> 
>   
> HOSS
> quote: (") backslash: (\) backslash-quote: (\")
> newline: (
> ) backslash-n: (\n)
>   
> 
>
>
> hossman@frisbee:~$ curl 'http://localhost:8983/solr/collection1/update?commit=true' --data-binary @tmp/tmp.xml -H 'Content-Type: application/xml'
> 
> 
> <lst name="responseHeader"><int name="status">0</int><int name="QTime">678</int></lst>
> 
>
>
>
> hossman@frisbee:~$ curl 'http://localhost:8983/solr/collection1/select?q=id:HOSS&indent=true&omitHeader=true&wt=xml'
> 
> 
>
> 
>   
> HOSS
> quote: (") backslash: (\) backslash-quote: (\")
> newline: (
> ) backslash-n: (\n)
> 1458038035233898496
> 
> 
>
>
> hossman@frisbee:~$ curl 'http://localhost:8983/solr/collection1/select?q=id:HOSS&indent=true&omitHeader=true&wt=json'
> {
>   "response":{"numFound":1,"start":0,"docs":[
>   {
> "id":"HOSS",
> "name":"quote: (\") backslash: (\\) backslash-quote: (\\\")
> newline: (\n) backslash-n: (\\n)",
> "_version_":1458038035233898496}]
>   }}
>
>
> hossman@frisbee:~$ cat tmp/tmp.json
> [
>  {"id" : "HOSS",
>   "name" : "quote: (\") backslash: (\\) backslash-quote: (\\\") newline:
> (\n) backslash-n: (\\n)"}
> ]
>
>
> hossman@frisbee:~$ curl 'http://localhost:8983/solr/collection1/update?commit=true' --data-binary @tmp/tmp.json -H 'Content-Type: application/json'
> {"responseHeader":{"status":0,"QTime":605}}
>
>
> hossman@frisbee:~$ curl 'http://localhost:8983/solr/collection1/select?q=id:HOSS&indent=true&omitHeader=true&wt=xml'
> 
> 
>
> 
>   
> HOSS
> quote: (") backslash: (\) backslash-quote: (\")
> newline: (
> ) backslash-n: (\n)
> 1458038130437259264
> 
> 
>
>
> hossman@frisbee:~$ curl 'http://localhost:8983/solr/collection1/select?q=id:HOSS&indent=true&omitHeader=true&wt=json'
> {
>   "response":{"numFound":1,"start":0,"docs":[
>   {
> "id":"HOSS",
> "name":"quote: (\") backslash: (\\) backslash-quote: (\\\")
> newline: (\n) backslash-n: (\\n)",
> "_version_":1458038130437259264}]
>   }}
>
>
>
>
> -Hoss
> http://www.lucidworks.com/
>
>
> --
>  If you reply to this email, your message will be added to the discussion
> below:
>
> http://lucene.472066.n3.nabble.com/Solr-solr-JSONResponseWriter-not-escaping-backslash-characters-tp4112990p4113017.html

Re: Solr solr.JSONResponseWriter not escaping backslash '\' characters

2014-01-24 Thread stevenNabble
Hello,

thanks to all for the help :-)

We have managed to narrow down exactly what is going wrong. My initial
thinking that backslashes within field values were the problem was
incorrect. The source of the problem is in fact submitting a document with
a blank field value. The problematic JSON is returned when a facet search
is run against a query that matches such a document. Details below:

# cat test.xml


<add>
<doc>
<field name="id">9553524</field>
<field name="year"></field>
</doc>
</add>






# curl 'http://localhost:8983/solr/collection1/update?commit=true' --data-binary @test.xml -H 'Content-Type: application/xml'


<lst name="responseHeader"><int name="status">0</int><int name="QTime">369</int></lst>




# curl 'http://localhost:8983/solr/collection1/select?wt=json&facet=true&facet.field=year&facet.mincount=1&json.nl=map&q=id%3A9553524&start=0&rows=3&indent=true'
{
"responseHeader": {
"status": 0,
"QTime": 8669,
"params": {
"facet": "true",
"facet.mincount": "1",
"start": "0",
"q": "id:9553524",
"facet.field": ["year"],
"json.nl": "map",
"wt": "json",
"rows": "3"
}
},
"response": {
"numFound": 1,
"start": 0,
"docs": [{
"id": "9553524",
"year": [""],
"_version_": 1458116227706650624
}]
},
"facet_counts": {
"facet_queries": {
 },
"facet_fields": {
"year": {
"": 1
}
},
"facet_dates": {
 },
"facet_ranges": {
 }
}
}



As you can see above, the facet count for the '*year*' field contains a
blank JSON field name. This causes an error when parsing with
*PHP's json_decode(...)*.

*Fatal error*: Cannot access empty property in 
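
One way to sidestep the empty property name on the PHP side may be to decode into associative arrays rather than objects. This is a minimal sketch, assuming the error comes from handling the decoded response as objects; the JSON below is a trimmed copy of the facet section above.

```php
<?php
// Trimmed copy of the facet section of the response shown above.
$json = '{"facet_counts": {"facet_fields": {"year": {"": 1}}}}';

// Passing true as the second argument makes json_decode() return nested
// associative arrays instead of stdClass objects, so "" is an ordinary array key.
$data = json_decode($json, true);

if ($data === null && json_last_error() !== JSON_ERROR_NONE) {
    throw new RuntimeException('json_decode failed with error code ' . json_last_error());
}

foreach ($data['facet_counts']['facet_fields']['year'] as $value => $count) {
    // $value is the empty string for the blank facet bucket.
    $label = ($value === '') ? '(blank)' : $value;
    echo $label, ': ', $count, "\n"; // prints (blank): 1
}
```

Another option, untested here, is to request a different json.nl style such as json.nl=arrarr, which returns facet values as array elements rather than as JSON field names.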



The workaround is to not submit empty field values into the index but this
isn't a great solution :-(
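
If the workaround is applied on the client side, blank values can be filtered out just before the document is built. This is a rough sketch only; the helper names removeBlankFields and isNotBlank are hypothetical, and it assumes the document is held as a PHP array of field name to value.

```php
<?php
// Hypothetical helper: true when a value is neither null nor blank.
function isNotBlank($value)
{
    return $value !== null && trim((string) $value) !== '';
}

// Hypothetical helper: drop fields whose value is null, an empty string,
// or whitespace only, before the document is sent to Solr.
function removeBlankFields(array $doc)
{
    $clean = array();
    foreach ($doc as $field => $value) {
        if (is_array($value)) {
            // Multi-valued field: keep only the non-blank values.
            $value = array_values(array_filter($value, 'isNotBlank'));
            if (count($value) === 0) {
                continue;
            }
        } elseif (!isNotBlank($value)) {
            continue;
        }
        $clean[$field] = $value;
    }
    return $clean;
}

$doc = array('id' => '9553524', 'year' => '');
print_r(removeBlankFields($doc)); // only the id field remains
```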

Kind Regards
Steven



On 23 January 2014 18:49, Chris Hostetter-3 [via Lucene] <
ml-node+s472066n4113050...@n3.nabble.com> wrote:

>
> : The problem I have is if I try to parse this response in *php *using
> : *json_decode()* I get a syntax error because of the '*\n*' s that are in
> : the response. I could escape the before doing the *json_decode() *or at the
> : point of submitting to the index but this seems wrong...
>
> I don't really know anything about PHP, but i managed to muddle my way
> through both of the little experiments below and couldn't reproduce any
> error from json_decode when the response contains "\n" (ie: the two byte
> sequence representing an escaped newline character) inside of a JSON
> string, but i do get the expected error if a literal, one byte, newline
> character is in the string. (something that Solr doesn't do)
>
> are you sure when you fetch the data from Solr you aren't pre-parsing it
> in some way that's evaluating the "\n" and converting it to a real
> newline?
>
> : I am probably doing something silly and a good nights sleep will reveal
> : what I am doing wrong ;-)
>
> Good luck.
>
> ### Experiment #1, locally created strings, one bogus json
>
> hossman@frisbee:~$ php -a
> Interactive shell
>
> php > $valid = '{"id": "newline: (\n)"}';
> php > $bogus = "{\"id\": \"newline: (\n)\"}";
> php > var_dump($valid);
> string(23) "{"id": "newline: (\n)"}"
> php > var_dump($bogus);
> string(22) "{"id": "newline: (
> )"}"
> php > var_dump(json_decode($valid));
> object(stdClass)#1 (1) {
>   ["id"]=>
>   string(12) "newline: (
> )"
> }
> php > var_dump(json_decode($bogus));
> NULL
> php > var_dump(json_last_error());
> int(4)
>
>
> ### Experiment #2, fetching json data from Solr...
>
> hossman@frisbee:~$ curl 'http://localhost:8983/solr/collection1/select?q=id:HOSS&wt=json&indent=true&omitHeader=true'
>
> {
>   "response":{"numFound":1,"start":0,"docs":[
>   {
> "id":"HOSS",
> "name":"quote: (\") backslash: (\\) backslash-quote: (\\\")
> newline: (\n) backslash-n: (\\n)",
> "_version_":1458038130437259264}]
>   }}
> hossman@frisbee:~$ php -a
> Interactive shell
>
> php > $data = file_get_contents('http://localhost:8983/solr/collection1/select?q=id:HOSS&wt=json&indent=true&omitHeader=true');
>
> php > var_dump($data);
> string(227) "{
>   "response":{"numFound":1,"start":0,"docs":[
>   {
> "id":"HOSS",
> "name":"quote: (\") backslash: (\\) backslash-quote: (\\\")
> newline: (\n) backslash-n: (\\n)",
> "_version_":1458038130437259264}]
>   }}
> "
> php > var_dump(json_decode($data));
> object(stdClass)#1 (1) {
>   ["response"]=>
>   object(stdClass)#2 (3) {
> ["numFound"]=>
> int(1)
> ["start"]=>
> int(0)
> ["docs"]=>
> array(1) {
>   [0]=>
>   object(stdClass)#3 (3) {
> ["id"]=>
> string(4) "HOSS"
> ["name"]=>
> string(78) "quote: (") backslash: (\) backslash-quote: (\")
> newline: (
> ) backslash-n: (\n)"
> ["_version_"]=>
> int(1458038130437259264)
>   }
> }
>   }
> }
>
>
>
> -Hoss
> http://www.lucidworks.com/
>
>