Lowercasing actually seems to work with wildcard queries, but not with fuzzy queries. Is there any reason why I would see such a difference?
Regards,
Haagen

On 10 Dec 2012, at 13:24, Haagen Hasle wrote:

> It's been two months since I asked about wildcards and phonetic filters,
> and finally the task of upgrading Solr to version 4.0 was prioritized in
> our project. So the last couple of days I've been working on it. Another
> team member upgraded Solr from 3.4 to 4.0, and I've been making changes
> to schema.xml to accommodate the new multiterm functionality.
>
> However, it doesn't seem to work. Lowercasing is still not done when I do
> a fuzzy search, neither through the regular index analyzer and its
> support of MultiTermAwareComponents, nor when I try to define a special
> multiterm analyzer.
>
> Do I have to do anything special to enable the multiterm functionality in
> Solr 4.0?
>
> Regards,
> Hågen
>
> On 8 Oct 2012, at 18:09, Erick Erickson wrote:
>
>> whether phonetic filters can be multiterm aware:
>>
>> I'd be leery of this, as I basically don't quite know how that would
>> behave. You'd have to ensure that the algorithms changed the first parts
>> of the words uniformly, regardless of what followed. I'm pretty sure
>> that _some_ phonetic algorithms do not follow this pattern, i.e. eric
>> wouldn't necessarily have the same beginning as erickson. That said,
>> some of the algorithms _may_ follow this rule and might be OK candidates
>> for being MultiTermAware....
>>
>> But you don't need this in order to try it out. See the "Expert Level
>> Schema Possibilities" at:
>> http://searchhub.org/dev/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/
>>
>> You can define your own analysis chain for wildcards as part of your
>> <fieldType> definition and include whatever you want, whether or not
>> it's MultiTermAware, and it will be applied at query time. Use the
>> <analyzer type="query"> entry as a basis. _But_ you shouldn't include
>> anything in this section that produces more than one output per input
>> token. Note, "token", not "field". I.e. a really bad candidate for this
>> section is WordDelimiterFilterFactory. If you use the admin/analysis
>> page (which you'll get to know intimately), look at a type that has
>> WordDelimiterFilterFactory in its chain, and put in something like
>> erickErickson1234, you'll see what I mean. Make sure to check the
>> "verbose" box....
>>
>> If you can determine that some of the phonetic algorithms _should_ be
>> MultiTermAware, please feel free to raise a JIRA and we can discuss...
>> I suspect it'll be on a case-by-case basis.
>>
>> Best
>> Erick
>>
>> On Mon, Oct 8, 2012 at 11:21 AM, Hågen Pihlstrøm Hasle
>> <haagenha...@gmail.com> wrote:
>>> Hi!
>>>
>>> I'm quite new to Solr. I was recently asked to help out on a project
>>> where the previous "Solr-person" quit quite suddenly. I've noticed that
>>> some of our searches don't return the expected result, and I'm hoping
>>> you guys can help me out.
>>>
>>> We've indexed a lot of names, and would like to search for a person in
>>> our system using these names. We previously used Oracle Text for this,
>>> and we find that Solr is much faster. So far so good! :) But when we
>>> try to use wildcards, things start to go wrong.
>>>
>>> We're using Solr 3.4, and I see that some of our problems are solved in
>>> 3.6. See SOLR-2438:
>>> https://issues.apache.org/jira/browse/SOLR-2438
>>>
>>> But we would also like to be able to combine wildcards with fuzzy
>>> searches, and wildcards with a phonetic filter. I don't see anything
>>> about phonetic filters in SOLR-2438 or SOLR-2921
>>> (https://issues.apache.org/jira/browse/SOLR-2921).
>>> Is it possible to make the phonetic filters MultiTermAware?
>>>
>>> Regarding fuzzy queries, in Oracle Text I can search for "chr%" ("chr*"
>>> in Solr) and find both christian and kristian. As far as I understand,
>>> this is not possible in Solr: WildcardQuery and FuzzyQuery cannot be
>>> combined. Is this correct, or have I misunderstood something? Are there
>>> any workarounds or filter combinations I can use to achieve the same
>>> result? I've seen people suggest using a boolean query to combine the
>>> two, but I don't really see how that would solve my "chr*" problem.
>>>
>>> As I mentioned earlier, I'm quite new to this, so I apologize if what
>>> I'm asking about only shows my ignorance.
>>>
>>> Regards,
>>> Hågen
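
For readers following the thread, the separate analysis chain for wildcards that Erick describes might look roughly like the schema.xml fragment below. This is only a sketch under assumptions: the field type name "person_name" and the particular tokenizers and filters are hypothetical and are not taken from the poster's schema. The multiterm analyzer uses a keyword tokenizer plus lowercasing so that each input token yields exactly one output token, in line with Erick's warning about filters such as WordDelimiterFilterFactory.

```xml
<!-- Hypothetical field type; the name and filter choices are illustrative only. -->
<fieldType name="person_name" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- Explicit chain for multi-term queries such as wildcards.
       Keep it to components that emit one token per input token. -->
  <analyzer type="multiterm">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The sketch does not settle the question raised at the top of the thread, namely whether fuzzy queries are passed through this chain in Solr 4.0. The admin/analysis page mentioned above is probably the quickest way to check what a given chain does to a term.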