Re: Wildcards and fuzzy/phonetic query

Haagen Hasle Mon, 10 Dec 2012 05:16:55 -0800

Lowercasing actually seems to work with wildcard queries, but not with fuzzy
queries. Is there any reason I should see such a difference?



Regards, Haagen


On 10 Dec 2012, at 13:24, Haagen Hasle wrote:

> 
> It's been two months since I asked about wildcards and phonetic filters, and
> the task of upgrading Solr to version 4.0 has finally been prioritized in our
> project, so I've spent the last couple of days working on it. Another team
> member upgraded Solr from 3.4 to 4.0, and I've been making changes to
> schema.xml to accommodate the new multiterm functionality.
> 
> However, it doesn't seem to work. Lowercasing is still not applied when I do
> a fuzzy search: neither through the regular index analyzer and its
> MultiTermAwareComponent support, nor when I define a dedicated multiterm
> analyzer.
> 
> Do I have to do anything special to enable the multiterm functionality in 
> Solr 4.0?
> 
> 
> Regards, 
> 
> Hågen
> 
> On 8 Oct 2012, at 18:09, Erick Erickson wrote:
> 
>> On whether phonetic filters can be multiterm aware:
>> 
>> I'd be leery of this, as I don't quite know how that would behave. You'd
>> have to ensure that the algorithms changed the first parts of the words
>> uniformly, regardless of what followed. I'm pretty sure that _some_ phonetic
>> algorithms do not follow this pattern, e.g. eric wouldn't necessarily have
>> the same beginning as erickson. That said, some of the algorithms _may_
>> follow this rule and might be OK candidates for being MultiTermAware.
>> 
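A minimal Python sketch of the concern above, assuming classic American Soundex as it is commonly described. The `soundex` function is written here purely for illustration; Solr's phonetic filters use their own encoder implementations, which may differ in details such as the H/W rule. Because Soundex pads short codes with zeros, the code for the prefix "eric" is not a prefix of the code for "erickson", so encoding a wildcard prefix on its own would not line up with the indexed codes:

```python
# Classic American Soundex, written out here for illustration only
# (not Solr's encoder; real implementations may differ in edge cases).
SOUNDEX_DIGITS = {
    ch: digit
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6"))
    for ch in letters
}


def soundex(word: str) -> str:
    """Return the 4-character Soundex code of a non-empty alphabetic word."""
    word = word.lower()
    code = word[0].upper()                  # the first letter is kept as-is
    prev = SOUNDEX_DIGITS.get(word[0], "")
    for ch in word[1:]:
        if ch in "hw":
            continue                        # h/w do not break a run of equal digits
        digit = SOUNDEX_DIGITS.get(ch, "")  # vowels map to ""
        if digit and digit != prev:
            code += digit
        prev = digit                        # a vowel resets the run
    return (code + "000")[:4]               # pad with zeros or truncate to 4 chars


if __name__ == "__main__":
    for prefix, word in (("eric", "erickson"), ("chr", "christian")):
        p, w = soundex(prefix), soundex(word)
        print(f"{prefix}->{p}  {word}->{w}  prefix preserved: {w.startswith(p)}")
```

Running it gives E620 vs E625 and C600 vs C623, so the prefix check fails in both cases. Soundex also keeps the first letter, so christian (C623) and kristian (K623) get different codes under this scheme.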
>> But you don't need this in order to try it out. See the "Expert Level
>> Schema Possibilities" section at:
>> http://searchhub.org/dev/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/
>> 
>> You can define your own analysis chain for wildcards as part of your
>> <fieldType> definition (an <analyzer type="multiterm"> section) and include
>> whatever you want, whether or not it's MultiTermAware, and it will be applied
>> at query time. Use the <analyzer type="query"> entry as a basis. _But_ you
>> shouldn't include anything in this section that produces more than one
>> output per input token. Note: "token", not "field". A really bad candidate
>> for this section, for example, is WordDelimiterFilterFactory. If you use the
>> admin/analysis page (which you'll get to know intimately), look at a type
>> that has WordDelimiterFilterFactory in its chain, and enter something like
>> erickErickson1234, you'll see what I mean. Make sure to check the "verbose"
>> box.
>> 
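To make the "one output per input token" rule concrete, here is a minimal Python sketch. Both filters are hypothetical stand-ins written for illustration, not Solr classes: `lowercase_filter` maps every token to exactly one token, while `split_on_case_and_digits` roughly imitates the kind of splitting WordDelimiterFilterFactory does on erickErickson1234 (the real filter has many more options). `is_one_to_one` is likewise a hypothetical helper:

```python
import re
from typing import Callable, Iterable, List


# Hypothetical token filters: each takes one token and returns its output tokens.
def lowercase_filter(token: str) -> List[str]:
    return [token.lower()]  # always exactly one output token


def split_on_case_and_digits(token: str) -> List[str]:
    # Rough imitation of word-delimiter splitting: break at lower->upper
    # case changes and at letter/digit boundaries.
    return re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", token)


def is_one_to_one(filt: Callable[[str], List[str]], samples: Iterable[str]) -> bool:
    """True if the filter yields exactly one token for every sample input."""
    return all(len(filt(s)) == 1 for s in samples)


if __name__ == "__main__":
    samples = ["erickErickson1234", "Christian"]
    print(split_on_case_and_digits("erickErickson1234"))  # ['erick', 'Erickson', '1234']
    print(is_one_to_one(lowercase_filter, samples))          # True
    print(is_one_to_one(split_on_case_and_digits, samples))  # False
```

A wildcard term such as erickErick* run through the second filter would come out as several tokens, leaving no single term to attach the * to, which is why that kind of filter does not belong in a multiterm chain.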
>> If you can determine that some of the phonetic algorithms _should_ be
>> MultiTermAware, please feel free to raise a JIRA and we can discuss. I
>> suspect it'll be on a case-by-case basis.
>> 
>> Best
>> Erick
>> 
>> On Mon, Oct 8, 2012 at 11:21 AM, Hågen Pihlstrøm Hasle
>> <haagenha...@gmail.com> wrote:
>>> Hi!
>>> 
>>> I'm quite new to Solr; I was recently asked to help out on a project where
>>> the previous "Solr person" quit quite suddenly. I've noticed that some of
>>> our searches don't return the expected results, and I'm hoping you can
>>> help me out.
>>> 
>>> We've indexed a lot of names and would like to search for people in our
>>> system using these names. We previously used Oracle Text for this, and we
>>> find that Solr is much faster. So far so good! :) But when we try to use
>>> wildcards, things start to go wrong.
>>> 
>>> We're using Solr 3.4, and I see that some of our problems are solved in
>>> 3.6; see SOLR-2438:
>>> https://issues.apache.org/jira/browse/SOLR-2438
>>> 
>>> But we would also like to be able to combine wildcards with fuzzy searches,
>>> and wildcards with a phonetic filter. I don't see anything about phonetic
>>> filters in SOLR-2438 or SOLR-2921
>>> (https://issues.apache.org/jira/browse/SOLR-2921).
>>> Is it possible to make the phonetic filters MultiTermAware?
>>> 
>>> Regarding fuzzy queries: in Oracle Text I can search for "chr%" ("chr*" in
>>> Solr) and find both christian and kristian. As far as I understand, this is
>>> not possible in Solr, because WildcardQuery and FuzzyQuery cannot be
>>> combined. Is this correct, or have I misunderstood something? Are there any
>>> workarounds or filter combinations I can use to achieve the same result?
>>> I've seen people suggest using a boolean query to combine the two, but I
>>> don't really see how that would solve my "chr*" problem.
>>> 
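For illustration only, here is a minimal Python sketch of what a combined "fuzzy prefix" match for chr* could mean: keep a term if the query is within `max_edits` Levenshtein edits of some prefix of that term. This is not a Solr or Lucene feature; `prefix_edit_distance` and `fuzzy_prefix_matches` are hypothetical helpers, and the name list is made-up sample data:

```python
from typing import Iterable, List


def prefix_edit_distance(query: str, term: str) -> int:
    """Smallest Levenshtein distance between `query` and any prefix of `term`."""
    # prev[j] = edit distance between query[:i] and term[:j] (row i of the DP table)
    prev = list(range(len(term) + 1))
    for i, qc in enumerate(query, start=1):
        cur = [i]
        for j, tc in enumerate(term, start=1):
            cur.append(min(prev[j] + 1,                # delete qc
                           cur[j - 1] + 1,             # insert tc
                           prev[j - 1] + (qc != tc)))  # substitute or match
        prev = cur
    # The last row holds the distance from the whole query to every prefix of term.
    return min(prev)


def fuzzy_prefix_matches(query: str, terms: Iterable[str], max_edits: int) -> List[str]:
    """Terms that have some prefix within max_edits edits of the query."""
    return [t for t in terms if prefix_edit_distance(query, t) <= max_edits]


if __name__ == "__main__":
    names = ["christian", "kristian", "christopher", "robert", "anna"]
    print(fuzzy_prefix_matches("chr", names, max_edits=1))
    print(fuzzy_prefix_matches("chr", names, max_edits=2))
```

With one edit only christian and christopher match; with two edits kristian (substitute c with k, delete h) matches as well, but so does robert (delete c and h, leaving r). With a three-letter prefix, two edits are already enough to let in fairly unrelated names.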
>>> As I mentioned earlier, I'm quite new to this, so I apologize if my
>>> questions only show my ignorance.
>>> 
>>> 
>>> Regards, Hågen
>
