Oh my. I'll leave it to the DIH guys to suggest whether there's
something that can be done with pure DIH, and offer a couple
of alternatives:

1> You could put a MappingCharFilterFactory in your analysis
chain. In the mapping file you can map things like
"%20" => " ". That would work with DIH as well.

2> You could use SolrJ rather than DIH and unescape the
data before writing it to Solr. Here's an example:
http://lucidworks.com/blog/indexing-with-solrj/
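
For option 2, the unescaping step itself might look roughly like the
sketch below. It is not taken from the blog post above. The class,
method, and field names (UrlUnescape, unescape, decodeRow, "id",
"author") are made up for illustration, and it relies only on the
JDK's java.net.URLDecoder (the Charset overload needs Java 10 or
later). Note that URLDecoder also turns '+' into a space, and that
malformed input such as a lone '%' is passed through unchanged here,
which may or may not be what you want.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: decodes %xx escapes in database values before
// they are handed to Solr. Names are illustrative only.
final class UrlUnescape {

    // Decode one value. URLDecoder throws IllegalArgumentException on
    // malformed escapes (e.g. a trailing '%'); in that case the raw
    // value is returned untouched rather than failing the whole import.
    static String unescape(String raw) {
        if (raw == null) {
            return null;
        }
        try {
            return URLDecoder.decode(raw, StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            return raw;
        }
    }

    // Decode every value of one database row (field name -> raw value).
    // With SolrJ, each decoded entry would then be copied into a
    // SolrInputDocument via addField(name, value) before sending it.
    static Map<String, String> decodeRow(Map<String, String> row) {
        Map<String, String> decoded = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : row.entrySet()) {
            decoded.put(e.getKey(), unescape(e.getValue()));
        }
        return decoded;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("id", "42");
        row.put("author", "John%20Drzal");
        // Prints the decoded field values, one per line.
        decodeRow(row).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Doing the decoding in the client keeps the stored and indexed values
clean, so the analysis chain never sees the %xx sequences at all.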

What's really happening here isn't that Solr fails to index
these words; rather, it's just not splitting your input up the
way you expect. Take a look at the adminUI/analysis page for one
of the fields in question and you'll see what I mean. The actual
tokens indexed may be things like 20Drzal or similar.

Best,
Erick

On Fri, Sep 11, 2015 at 10:14 AM, Mark Fenbers <mark.fenb...@noaa.gov> wrote:
> Additional experimenting led me to the discovery that /dataimport does
> *not* index words with a preceding %20 (a URL-encoded space), or in fact
> *any* preceding %xx encoding.  I can probably replace each %20 with a '+' in
> each record of my database -- the dataimporter/indexer doesn't sneeze at
> those -- but using some sort of encoding is important for certain characters
> such as double and single quotes, because many non-alphanumeric characters
> have special meanings to the shell and/or PostgreSQL and need to be escaped.
>
> So now that I know what the issue is, I need to find a work-around. Does
> Solr have any baseline processors that will handle the URL-encoding?  Being
> new to Solr, I'm not sure I have the skill to write my own.  Or, is there
> another kind of encoding I can use that Solr doesn't adversely react to??
>
> Mark
>
> On 9/11/2015 12:11 PM, Erick Erickson wrote:
>>
>> Several ideas, all shots in the dark because to analyze this we
>> need the schema definitions and the result of your query with
>> &debug=true added. In particular you'll see the "parsed query"
>> section near the bottom, and often the parsed query isn't
>> quite what you think it is. In particular this is often the issue:
>> you query q=Drzal. This translates into q=default_search_field:Drzal
>> where default_search_field is the "df" parameter in your search
>> handler ("query" or "select" in solrconfig.xml).
>>
>> Next most frequent thing: Your analysis chain does things you're
>> not expecting. Simple example is whether the analysis lower-cases
>> or not. For this kind of problem, the Admin UI>>core>>analysis page
>> is _really_ your friend.
>>
>> Best,
>> Erick
>>
>
