Hi Jack,

Thanks for the info. I'll take a look and see if I can figure it out (just purchased the book).
P

On 31 July 2014 17:16, Jack Krupansky <j...@basetechnology.com> wrote:

> And I have a lot more explanation and examples for word delimiter filter
> in my e-book:
> http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Erick Erickson
> Sent: Thursday, July 31, 2014 12:58 PM
> To: solr-user@lucene.apache.org
> Subject: Re: How to search for phrase "IAE_UPC_0001"
>
> Take a look at WordDelimiterFilterFactory. It has a bunch of
> options to allow this kind of thing to be indexed and searched.
>
> Note that in the default schema, the index-time part of the fieldType
> definition uses slightly different WordDelimiterFilterFactory parameters
> than the query-time part; that's a good place to start.
>
> WARNING: WDFF is a bit complex; you _really_ would be well
> served by spending some time with the Admin/Analysis page to
> understand the effects of these parameters...
>
> Best,
> Erick
>
> On Thu, Jul 31, 2014 at 9:31 AM, Paul Rogers <paul.roge...@gmail.com>
> wrote:
>
>> Hi Guys,
>>
>> I have a Solr application searching on data uploaded by Nutch. The search
>> I wish to carry out is for a particular document reference contained
>> within the "url" field, e.g. IAE-UPC-0001.
>>
>> The problem is that the file names that make up the URLs are not
>> consistent, so a URL might contain the reference as IAE-UPC-0001 or
>> IAE_UPC_0001 (i.e. using either the minus sign or the underscore as the
>> delimiter), but not both.
>>
>> I have created the query (in the Solr admin interface):
>>
>> url:"IAE-UPC-0001"
>>
>> which works (returning the single expected document), as do:
>>
>> url:"IAE*UPC*0001"
>> url:"IAE?UPC?0001"
>>
>> when the doc ref is in the format IAE-UPC-0001 (i.e. using the minus sign
>> as a delimiter).
>>
>> However:
>>
>> url:"IAE_UPC_0001"
>> url:"IAE*UPC*0001"
>> url:"IAE?UPC?0001"
>>
>> do not work (returning zero documents) when the doc ref is in the format
>> IAE_UPC_0001 (i.e. using the underscore character as the delimiter).
>>
>> I'm assuming the underscore is a special character, but I've looked at
>> the Solr wiki and can't find anything that says what the problem is. The
>> minus sign also has a specific meaning, but that is nullified by adding
>> the quotes.
>>
>> Can anyone suggest what I'm doing wrong?
>>
>> Many thanks
>>
>> Paul
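Erick's suggestion rests on the idea that a word-delimiter-style filter breaks a token into sub-words at non-alphanumeric characters. As a result, IAE-UPC-0001 and IAE_UPC_0001 end up as the same run of terms, and a phrase query on those terms can match either. The Python below is a minimal, simplified model of that idea, not Solr's implementation. The function names, the example.com URLs, and the rule "any run of non-alphanumeric characters is a delimiter" are assumptions for illustration only. Real WDFF output also depends on the tokenizer that runs before it and on options such as generateWordParts, splitOnCaseChange, splitOnNumerics and catenateAll, so the Admin/Analysis page is still the place to confirm what a given fieldType actually produces.

```python
import re


# Hypothetical, simplified model of word-delimiter splitting.
# Assumption: any run of non-alphanumeric characters is a delimiter.
# Real WDFF also splits on case changes and letter/digit transitions,
# which this sketch deliberately ignores.
def split_on_delimiters(token: str) -> list[str]:
    """Return the alphanumeric sub-words of token, in order."""
    return [part for part in re.split(r"[^0-9A-Za-z]+", token) if part]


def catenate_all(token: str) -> str:
    """Join all sub-words into one term (loosely like catenateAll)."""
    return "".join(split_on_delimiters(token))


def contains_phrase(tokens: list[str], phrase: list[str]) -> bool:
    """True if phrase occurs as a contiguous run inside tokens."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


if __name__ == "__main__":
    # The query is split the same way as the indexed URLs.
    query = split_on_delimiters("IAE_UPC_0001")
    for url in ["http://example.com/docs/IAE-UPC-0001.pdf",
                "http://example.com/docs/IAE_UPC_0001.pdf"]:
        print(url, contains_phrase(split_on_delimiters(url), query))
```

Running it prints True for both hypothetical URLs, because each one yields the sub-word run IAE, UPC, 0001 regardless of which delimiter the file name used.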