Further experimenting led me to the discovery that /dataimport does *not* index words with a preceding %20 (a URL-encoded space), or in fact *any* preceding %xx encoding. I can probably replace each %20 with a '+' in each record of my database -- the dataimporter/indexer doesn't sneeze at those -- but using some sort of encoding is important for certain characters such as double and single quotes, because many non-alphanumeric characters have special meanings to the shell and/or PostgreSQL and need to be escaped.

So now that I know what the issue is, I need to find a work-around. Does Solr have any built-in processors that will handle the URL-encoding? Being new to Solr, I'm not sure I have the skill to write my own. Or is there another kind of encoding I can use that Solr doesn't react badly to?
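
For illustration only, here is a minimal sketch of one possible direction: decoding the %xx sequences back to plain text before the records reach the indexer. It is not a Solr feature. It assumes the records can be passed through a small Python step, and the function name decode_field and the sample string are made up for the example. It uses urllib.parse.unquote, which leaves a literal '+' untouched.

```python
from urllib.parse import unquote


def decode_field(text):
    """Turn percent-encoded sequences (%20, %27, ...) back into plain characters.

    Hypothetical helper: unquote() decodes %xx escapes as UTF-8 and,
    unlike unquote_plus(), does not convert '+' into a space.
    """
    return unquote(text)


if __name__ == "__main__":
    # Made-up sample record, not taken from the real database.
    sample = "John%20Drzal%27s%20notes"
    print(decode_field(sample))
```

Whether this fits depends on where the encoding gets applied in the first place; if the quoting is only there to get values past the shell or PostgreSQL, a parameterized insert might remove the need for it altogether.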

Mark

On 9/11/2015 12:11 PM, Erick Erickson wrote:
Several ideas, all shots in the dark because to analyze this we
need the schema definitions and the result of your query with
&debug=true added. In particular you'll see the "parsed query"
section near the bottom, and often the parsed query isn't
quite what you think it is. In particular, this is often the issue:
you query q=Drzal, and this translates into q=default_search_field:Drzal
where default_search_field is the "df" parameter in your search
handler ("query" or "select" in solrconfig.xml).
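
As a rough sketch of what adding debug=true looks like, the snippet below only builds the request URL. The host, port, and core name in it are placeholders, not taken from this thread, and build_debug_url is a made-up helper name.

```python
from urllib.parse import urlencode


def build_debug_url(handler_url, query):
    """Return a search URL with debug=true appended.

    handler_url is the full URL of the search handler; the value used
    below is a placeholder and has to be replaced with the real one.
    """
    params = {"q": query, "debug": "true"}
    return handler_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Placeholder host and core name (assumptions, adjust to your setup).
    print(build_debug_url("http://localhost:8983/solr/mycore/select", "Drzal"))
```

Opening the resulting URL in a browser should show the debug section, including the parsed query, after the normal results.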

Next most frequent thing: Your analysis chain does things you're
not expecting. Simple example is whether the analysis lower-cases
or not. For this kind of problem, the Admin UI>>core>>analysis page
is _really_ your friend.

Best,
Erick
