I did run queries with and without escaping in the admin query browser; it made no difference. I seem to recall that the system did not work without escaping, but it does seem worth turning escaping off and testing again.
Many thanks
Jack

On Sun, Feb 24, 2013 at 1:16 PM, Michael Della Bitta <michael.della.bi...@appinions.com> wrote:
> Hello Jack,
>
> I'm not sure if this is an option for you, but if you submit and
> retrieve your documents using only SolrJ, you won't have to worry
> about escaping them for encoding into a particular document format.
> SolrJ would handle that for you.
>
> Michael Della Bitta
>
> ------------------------------------------------
> Appinions
> 18 East 41st Street, 2nd Floor
> New York, NY 10017-6271
>
> www.appinions.com
>
> Where Influence Isn’t a Game
>
>
> On Sun, Feb 24, 2013 at 12:29 AM, Jack Park <jackp...@topicquests.org> wrote:
>> Ok. I have revisited this issue as deeply as possible using simplistic
>> unit tests, tossing out indexes, and starting fresh.
>>
>> A typical Solr document might have a label, e.g. the string inside the
>> quotes: "Node Type". That would be queried, according to what I've
>> been able to read, as a Phrase Query, which means, include the quotes
>> around the text.
>>
>> When I use the admin query panel with this query:
>> label:"Node Type"
>> A fragment of the full document is returned. it is this:
>>
>> <doc>
>>   <str name="locator">NodeType</str>
>>   <arr name="label">
>>     <str>Node Type</str>
>>   </arr>
>>
>> In my code using SolrJ, I have printlines just as the "escaped" query
>> string comes in, and one which shows what the SolrQuery looks like
>> after setting it up to go online.
>> I then show what came back:
>>
>> Solr3Client.runQuery- label:"Node Type" 0 10
>> Solr3Client.runQuery-1 q=label%3A%22Node+Type%22&start=0&rows=10
>> ZZZZ {numFound=1,start=0,docs=[SolrDocument{locator=NodeType,
>> smallIcon=cogwheel.png, subOf=ClassType, details=The TopicQuests
>> typology node type., isPrivate=false, creatorId=SystemUser, label=Node
>> Type, largeIcon=cogwheel.png, lastEditDate=Sat Feb 23 20:43:22 PST
>> 2013, createdDate=Sat Feb 23 20:43:22 PST 2013,
>> _version_=1427826019119661056}]}
>>
>> What that says is that SolrQuery inserted a + inside the query string,
>> and that it found 1 document, but did not return it.
>>
>> In the largest picture, I have returned to using XMLResponseParser on
>> the theory that I will now be able to take advantage of partialUpdates
>> on multi-valued fields (List<String>) but haven't tested that yet. I
>> am not yet escaping such things as "<" or ">" but just escaping those
>> things mentioned in the Solr documents which are reserved characters.
>>
>> So, the current update is this: learning about phrase queries, and
>> judicious escaping of reserved characters seems to be helping. Next up
>> entails two issues: more robust testing of escaped characters, and
>> trying to discover what is the best approach to dealing with
>> characters that must be escaped to get past XML, e.g. '<', '>', and
>> others.
>>
>> Many thanks
>> Jack
>>
>>
>> On Fri, Feb 22, 2013 at 2:44 PM, Jack Park <jackp...@topicquests.org> wrote:
>>> Michael,
>>> I don't think you misunderstood. I will soon give a full response here, but
>>> am on the road at the moment.
>>>
>>> Many thanks
>>> Jack
>>>
>>>
>>> On Friday, February 22, 2013, Michael Della Bitta
>>> <michael.della.bi...@appinions.com> wrote:
>>>> My mistake, I misunderstood the problem.
>>>>
>>>> Michael Della Bitta
>>>>
>>>> ------------------------------------------------
>>>> Appinions
>>>> 18 East 41st Street, 2nd Floor
>>>> New York, NY 10017-6271
>>>>
>>>> www.appinions.com
>>>>
>>>> Where Influence Isn’t a Game
>>>>
>>>>
>>>> On Fri, Feb 22, 2013 at 3:55 PM, Chris Hostetter
>>>> <hossman_luc...@fucit.org> wrote:
>>>>>
>>>>> : If you're submitting documents as XML, you're always going to have to
>>>>> : escape meaningful XML characters going in. If you ask for them back as
>>>>> : XML, you should be prepared to unescape special XML characters as
>>>>>
>>>>> that still wouldn't explain the discrepancy he's claiming to see between
>>>>> the json & xml responses (the json containing an empty string)
>>>>>
>>>>> Jack: please elaborate with specifics about your solr version, field,
>>>>> field type, how you indexed your doc, and what the request urls & raw
>>>>> responses that you get are (ie: don't trust the XML you see in your
>>>>> browser, it may be unescaping escaped sequences in element text to be
>>>>> "helpful" .. use something like curl)
>>>>>
>>>>> For example...
>>>>>
>>>>> ----BEGIN GOOD EXAMPLE OF SPECIFICS---
>>>>>
>>>>> I'm using Solr 4.x with the 4.x example schema which has the following
>>>>> field...
>>>>>
>>>>> <field name="cat" type="string" indexed="true" stored="true" multiValued="true"/>
>>>>> <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
>>>>>
>>>>> I indexed a doc like this...
>>>>>
>>>>> $ curl "http://localhost:8983/solr/update?commit=true" \
>>>>>     -H 'Content-type:application/json' \
>>>>>     -d '[{"id":"hoss", "cat":"<Something to use as a source node>" } ]'
>>>>>
>>>>> And this is what I get from the following requests...
>>>>>
>>>>> $ curl "http://localhost:8983/solr/select?q=id:hoss&wt=xml&indent=true&omitHeader=true"
>>>>> <?xml version="1.0" encoding="UTF-8"?>
>>>>> <response>
>>>>>
>>>>> <result name="response" numFound="1" start="0">
>>>>>   <doc>
>>>>>     <str name="id">hoss</str>
>>>>>     <arr name="cat">
>>>>>       <str>&lt;Something to use as a source node&gt;</str>
>>>>>     </arr>
>>>>>     <long name="_version_">1427705631375097856</long></doc>
>>>>> </result>
>>>>> </response>
>>>>>
>>>>> $ curl "http://localhost:8983/solr/select?q=id:hoss&wt=json&indent=true&omitHeader=true"
>>>>> {
>>>>>   "response":{"numFound":1,"start":0,"docs":[
>>>>>       {
>>>>>         "id":"hoss",
>>>>>         "cat":["<Something to use as a source node>"],
>>>>>         "_version_":1427705631375097856}]
>>>>>   }}
>>>>>
>>>>> $ curl "http://localhost:8983/solr/select?q=cat:%22<Something+to+use+as+a+source+node>%22&wt=json&indent=true&omitHeader=true"
>>>>> {
>>>>>   "response":{"numFound":1,"start":0,"docs":[
>>>>>       {
>>>>>         "id":"hoss",
>>>>>         "cat":["<Something to use as a source node>"],
>>>>>         "_version_":1427705631375097856}]
>>>>>   }}
>>>>>
>>>>> ----END GOOD EXAMPLE OF SPECIFICS---
>>>>>
>>>>> : > Even more curious, if I use this query at the console:
>>>>> : >
>>>>> : > details:<Something to use as a source node>
>>>>> : >
>>>>> : > I get nothing back.
>>>>>
>>>>> note in my last example above the importance of using quotes (or the
>>>>> {!term} qparser) to query string fields that contain special characters
>>>>> like whitespace -- whitespace is syntactically meaningful to the lucene query
>>>>> parser, it separates clauses of a boolean query.
>>>>>
>>>>>
>>>>> -Hoss
>>>>
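
A side note on the `+` in the logged `q=label%3A%22Node+Type%22` line quoted above: that is how a space is written in URL form encoding, so decoding it gives back the same quoted phrase that was typed into the admin panel. The sketch below uses only the JDK and no SolrJ. The class name `QueryEncodingDemo` and the `phrase` helper are hypothetical, made up for illustration, and the helper assumes that backslashes and double quotes are the only characters needing a backslash inside a quoted phrase. It is not SolrJ's own escaping.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class, not part of Solr or SolrJ.
final class QueryEncodingDemo {

    // Build field:"value" so the query parser sees one quoted phrase.
    // Assumption: only backslash and double quote need escaping inside the quotes.
    public static String phrase(String field, String value) {
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        return field + ":\"" + escaped + "\"";
    }

    // URL form encoding, as applied to a request parameter value.
    public static String encode(String query) {
        try {
            return URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Reverse of encode(): what the server side recovers from the parameter.
    public static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String query = phrase("label", "Node Type");
        String onTheWire = encode(query);
        System.out.println(query);             // the query as typed
        System.out.println(onTheWire);         // the encoded form; the space becomes '+'
        System.out.println(decode(onTheWire)); // decodes back to the query as typed
    }
}
```

If the encoded form matches what the client logs, the `+` is only transport encoding and any remaining difference in results has to come from somewhere else, such as the field type or the response parser.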