Take a look at WordDelimiterFilterFactory. It has a bunch of options to allow this kind of thing to be indexed and searched.
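To make that concrete, here is a rough sketch of a field type that uses it. The name "text_docref" and the choice of WhitespaceTokenizerFactory are my assumptions for illustration, not something taken from your Nutch schema, and I haven't tested this against your data. The WDFF attribute values follow the pattern used in the stock example schema, where catenation is on at index time and off at query time.

```xml
<!-- Hypothetical field type; "text_docref" is a made-up name for illustration. -->
<fieldType name="text_docref" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Assumption: a whitespace tokenizer keeps "-" and "_" in the token,
         so the word delimiter filter is the one that splits on them. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Splits on non-alphanumeric delimiters, so IAE-UPC-0001 and IAE_UPC_0001
         should both yield the parts IAE, UPC and 0001. -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Same splitting at query time, but without the catenated variants. -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

If you try something like this, you'll need to reindex after changing the field's analysis chain, since documents already in the index keep the tokens they were indexed with.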
Note that in the default schema, the index-time analyzer in the fieldType definition uses slightly different WordDelimiterFilterFactory parameters than the query-time analyzer. That's a good place to start.

WARNING: WDFF is a bit complex. You _really_ would be well served by spending some time with the Admin/Analysis page to understand the effects of these parameters...

Best,
Erick

On Thu, Jul 31, 2014 at 9:31 AM, Paul Rogers <paul.roge...@gmail.com> wrote:

> Hi Guys
>
> I have a Solr application searching on data uploaded by Nutch. The search
> I wish to carry out is for a particular document reference contained within
> the "url" field, e.g. IAE-UPC-0001.
>
> The problem is that the file names that comprise the URLs are not
> consistent, so a URL might contain the reference as IAE-UPC-0001 or
> IAE_UPC_0001 (i.e. using either the minus or underscore as the delimiter) but
> not both.
>
> I have created the query (in the Solr admin interface):
>
> url:"IAE-UPC-0001"
>
> which works (returning the single expected document), as do:
>
> url:"IAE*UPC*0001"
> url:"IAE?UPC?0001"
>
> when the doc ref is in the format IAE-UPC-0001 (i.e. using the minus sign as
> a delimiter).
>
> However:
>
> url:"IAE_UPC_0001"
> url:"IAE*UPC*0001"
> url:"IAE?UPC?0001"
>
> do not work (returning zero documents) when the doc ref is in the format
> IAE_UPC_0001 (i.e. using the underscore character as the delimiter).
>
> I'm assuming the underscore is a special character, but I have tried looking
> at the Solr wiki and can't find anything to say what the problem is. The
> minus sign also has a specific meaning, but that is nullified by adding the
> quotes.
>
> Can anyone suggest what I'm doing wrong?
>
> Many thanks
>
> Paul
>