I also have a lot more explanation and examples for the word delimiter
filter in my e-book:
http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html
-- Jack Krupansky
-----Original Message-----
From: Erick Erickson
Sent: Thursday, July 31, 2014 12:58 PM
To: solr-user@lucene.apache.org
Subject: Re: How to search for phrase "IAE_UPC_0001"
Take a look at WordDelimiterFilterFactory. It has a bunch of
options to allow this kind of thing to be indexed and searched.
Note that in the default schema, the index-time part of the fieldType
definition uses slightly different WordDelimiterFilterFactory
parameters than the query-time part. That's a good place to start.
WARNING: WDFF is a bit complex. You _really_ would be well
served by spending some time with the Admin/Analysis page to
understand the effects of these parameters...
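To make the idea concrete, here is a rough Python sketch of only the
delimiter-splitting part of what a word delimiter filter does. It is not
Solr code: split_on_delimiters is a made-up helper for illustration, and
the real filter also handles case changes, letter/number transitions and
catenation, which this ignores. It only shows why both spellings of the
reference could end up as the same sequence of tokens.

```python
import re


def split_on_delimiters(token):
    """Split a token on runs of non-alphanumeric characters such as '-' or '_'.

    Hypothetical helper: a simplified stand-in for delimiter splitting,
    not the actual WordDelimiterFilterFactory behaviour.
    """
    return [part for part in re.split(r"[^A-Za-z0-9]+", token) if part]


# Both spellings of the document reference reduce to the same parts.
for ref in ("IAE-UPC-0001", "IAE_UPC_0001"):
    print(ref, "->", split_on_delimiters(ref))
```

If the index-time and query-time analysis both reduce the reference to
the same parts, a phrase query for either spelling can match either form.
The Admin/Analysis page shows what the real analysis chain actually emits.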
Best,
Erick
On Thu, Jul 31, 2014 at 9:31 AM, Paul Rogers <paul.roge...@gmail.com> wrote:
Hi Guys
I have a Solr application searching on data uploaded by Nutch. The search
I wish to carry out is for a particular document reference contained
within the "url" field, e.g. IAE-UPC-0001.
The problem is that the file names that make up the URLs are not
consistent, so a URL might contain the reference as IAE-UPC-0001 or
IAE_UPC_0001 (i.e. using either the minus sign or the underscore as the
delimiter), but not both.
I have created the query (in the Solr admin interface):
url:"IAE-UPC-0001"
which works (returning the single expected document), as do:
url:"IAE*UPC*0001"
url:"IAE?UPC?0001"
when the doc ref is in the format IAE-UPC-0001 (i.e. using the minus sign
as a delimiter).
However:
url:"IAE_UPC_0001"
url:"IAE*UPC*0001"
url:"IAE?UPC?0001"
do not work (returning zero documents) when the doc ref is in the format
IAE_UPC_0001 (i.e. using the underscore character as the delimiter).
I'm assuming the underscore is a special character, but I have looked at
the Solr wiki and can't find anything that says what the problem is. The
minus sign also has a specific meaning, but that is nullified by adding
the quotes.
Can anyone suggest what I'm doing wrong?
Many thanks
Paul