Oh my. I'll leave it to the DIH guys to suggest whether there's something that can be done with pure DIH, and offer a couple of alternatives:

1> You could put a MappingCharFilterFactory in your analysis chain. In the
mapping file you can map things like:
"%20" => " "
That would work with DIH as well.

2> You could use SolrJ rather than DIH and unescape the data before
writing it to Solr. Here's an example:
http://lucidworks.com/blog/indexing-with-solrj/

What's really happening here isn't that Solr isn't indexing these; rather,
it's just not splitting your input up. Take a look at the adminUI/analysis
page for one of the fields in question and you'll see what I mean. The
actual tokens indexed may be things like 20Drzal or similar.

Best,
Erick

On Fri, Sep 11, 2015 at 10:14 AM, Mark Fenbers <mark.fenb...@noaa.gov> wrote:
> Additional experimenting led me to the discovery that /dataimport does
> *not* index words with a preceding %20 (a URL-encoded space), or in fact
> *any* preceding %xx encoding. I can probably replace each %20 with a '+' in
> each record of my database -- the dataimporter/indexer doesn't sneeze at
> those -- but using some sort of encoding is important for certain characters
> such as double and single quotes, because many non-alphanumeric characters
> have special meanings to the shell and/or PostgreSQL and need to be escaped.
>
> So now that I know what the issue is, I need to find a work-around. Does
> Solr have any baseline processors that will handle the URL-encoding? Being
> new to Solr, I'm not sure I have the skill to write my own. Or, is there
> another kind of encoding I can use that Solr doesn't adversely react to??
>
> Mark
>
> On 9/11/2015 12:11 PM, Erick Erickson wrote:
>>
>> Several ideas, all shots in the dark because to analyze this we
>> need the schema definitions and the result of your query with
>> &debug=true added. In particular you'll see the "parsed query"
>> section near the bottom, and often the parsed query isn't
>> quite what you think it is. In particular this is often the issue:
>> you query q=Drzal. This translates into q=default_search_field:Drzal
>> where default_search_field is the "df" parameter in your search
>> handler ("query" or "select" in solrconfig.xml).
>>
>> Next most frequent thing: Your analysis chain does things you're
>> not expecting. Simple example is whether the analysis lower-cases
>> or not. For this kind of problem, the Admin UI>>core>>analysis page
>> is _really_ your friend.
>>
>> Best,
>> Erick
>>
>
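
P.S. For the "unescape the data before writing it to Solr" route, the
decoding step itself needs nothing beyond the JDK. This is only a minimal
sketch, assuming the database values are percent-encoded as UTF-8; the class
and method names are made up for illustration, and the SolrJ indexing call
that would follow is left out. Note that URLDecoder also turns '+' into a
space and throws IllegalArgumentException on a malformed escape such as a
bare '%'.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class UnescapeBeforeIndexing {

    // Hypothetical helper: turn %xx escapes (and '+') back into plain text
    // so the analysis chain sees real spaces and quotes, not "%20".
    static String unescape(String raw) {
        try {
            // Assumption: the stored values were encoded as UTF-8.
            return URLDecoder.decode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A made-up field value as it might come out of the database.
        String stored = "Dan%20Drzal%20%22quoted%22";
        // The decoded string is what you would put in the Solr document.
        System.out.println(unescape(stored));
    }
}
```

If the data can contain literal '%' or '+' characters that were never
encoded, this would mangle them or throw, so check a sample of records first.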