The problem with wildcard searches is that the input is not analyzed. For English, this might not be such a problem (except if you expect case-insensitive search). But then again, you don't get that with LIKE, either. Ngrams bring that and more.
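To make the contrast concrete, here is a minimal Python sketch using the four sample names discussed in this thread. It is not Solr's code. `analyze` is a hypothetical stand-in that roughly mimics a whitespace tokenizer followed by lowercase and diacritic-folding filters, and `edge_ngrams` roughly mimics an edge n-gram filter with assumed sizes of 2 to 15. `like_prefix` assumes a database collation that is case- and accent-sensitive, which depends on your DB. All function names are illustrative only.

```python
import unicodedata

NAMES = ["Van Hinden", "van Hinden", "Música", "Musil"]

def analyze(text):
    """Hypothetical analysis chain (assumption, not Solr's implementation):
    split on whitespace, strip diacritics, lowercase."""
    tokens = []
    for tok in text.split():
        decomposed = unicodedata.normalize("NFKD", tok)
        stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
        tokens.append(stripped.lower())
    return tokens

def edge_ngrams(token, min_gram=2, max_gram=15):
    """Leading n-grams of a token, e.g. 'hinden' -> 'hi', 'hin', 'hind', ..."""
    return {token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)}

def sort_key(value):
    # Sort on the analyzed form, so case and diacritics are ignored.
    return " ".join(analyze(value))

def like_prefix(values, prefix):
    # SQL LIKE 'prefix%' under an assumed case- and accent-sensitive collation.
    return [v for v in values if v.startswith(prefix)]

def build_index(values):
    # Map every n-gram of every analyzed token to the ids of matching values.
    index = {}
    for doc_id, value in enumerate(values):
        for tok in analyze(value):
            for gram in edge_ngrams(tok):
                index.setdefault(gram, set()).add(doc_id)
    return index

def ngram_search(index, values, query):
    # The query goes through the same analysis; no wildcard is involved.
    ids = None
    for tok in analyze(query):
        hits = index.get(tok, set())
        ids = hits if ids is None else ids & hits
    return sorted((values[i] for i in sorted(ids or ())), key=sort_key)
```

With `index = build_index(NAMES)`, `like_prefix(NAMES, "Mu")` finds only `"Musil"` (the second letter of "Música" is "ú", not "u"), while `ngram_search(index, NAMES, "mu")` returns `["Música", "Musil"]`. Note that a plain code-point sort would put "Musil" before "Música"; sorting on the analyzed key is what makes the order ignore diacritics.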
What I think is often forgotten when comparing LIKE and Solr search is that Solr's analyzers allow not only for case-insensitive search but also for other analysis, such as removing diacritics, and this analysis is also applied when sorting (in the DB you would have to create a separate index as well if you wanted that).

Say you have the following names:
'Van Hinden'
'van Hinden'
'Música'
'Musil'

With LIKE:
like 'mu%' - no hits
like 'Mu%' - 1 hit
like 'van%' - 1 hit
like 'hin%' - no hits

With the Solr whitespace or standard tokenizer, ngrams, and a diacritics and lowercase filter (no wildcard search):
'mu'/'Mu' - 2 hits, sorted ignoring case and diacritics
'van' - 2 hits
'hin' - 2 hits

(This is written down from experience. I haven't checked those examples explicitly.)

Cheers,
Chantal

On Fri, 2011-12-30 at 02:00 +0100, Chris Hostetter wrote:
> : Thanks. I know I'll be able to utilize some of Solr's free text
> : searching capabilities in other search types in this project. The
> : product manager wants this particular search to exactly mimic LIKE%.
> ...
> : Ex: If I search "Albatross" I want "Albert" to be excluded completely,
> : rather than having a low score.
>
> please be specific about the types of queries you want. ie: we need more
> than one example of the type of input you want to provide, the type of
> matches you want to see for that input, and the type of matches you want
> to get back.
>
> in your first message you said you need to match company titles "pretty
> exactly" but then seem to contradict yourself by saying the SQL LIKE
> command fits the bill -- even though the SQL LIKE command exists
> specifically for inexact matches on field values.
>
> Based on your one example above of Albatross, you don't need anything
> special: don't use ngrams, don't use stemming, don't use fuzzy anything --
> just search for "Albatross" and it will match "Albatross" but not
> "Albert". if you want "Albatross" to match "Albatross Road" use some
> basic tokenization.
>
> If all you really care about is prefix searching (which seems suggested by
> your "LIKE%" comment above, which i'm guessing is shorthand for something
> similar to "LIKE 'ABC%'"), so that queries like "abc" and "abcd" both
> match "abcdef" and "abcdzzzz" but neither of them match "xxxxabcdyyyy"
> then just use prefix queries (ie: "abcd*") -- they should be plenty
> efficient for your purposes. you only need to worry about ngrams when you
> want to efficiently match in the middle of a string. (ie: "TITLE LIKE
> '%ABC%'")
>
>
> -Hoss
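The prefix-versus-infix distinction in Hoss's reply can be illustrated with a small Python sketch. It is only loosely modeled on a sorted term dictionary and is not Lucene's actual data structure. A prefix query ("abcd*") can binary-search to the first candidate term and walk forward while terms still share the prefix. A match in the middle of a string ("%ABC%") has no such starting point, so this sketch builds a hypothetical trigram index instead. The function names, the trigram size, and the sample terms (taken from the reply) are illustrative assumptions.

```python
import bisect

# Sample terms from the reply, kept sorted like a term dictionary.
TERMS = sorted(["abcdef", "abcdzzzz", "xxxxabcdyyyy", "abd"])

def prefix_terms(sorted_terms, prefix):
    """Prefix query: binary-search the first candidate, then walk forward
    while terms still start with the prefix."""
    i = bisect.bisect_left(sorted_terms, prefix)
    out = []
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix):
        out.append(sorted_terms[i])
        i += 1
    return out

def trigrams(s):
    # All 3-character substrings of s (empty for strings shorter than 3).
    return {s[i:i + 3] for i in range(len(s) - 2)}

def build_trigram_index(terms):
    # Map each trigram to the set of terms containing it.
    index = {}
    for t in terms:
        for g in trigrams(t):
            index.setdefault(g, set()).add(t)
    return index

def infix_terms(index, all_terms, fragment):
    """Infix match: intersect the postings of the fragment's trigrams, then
    verify each candidate, since sharing trigrams does not prove containment."""
    grams = trigrams(fragment)
    if not grams:
        # Fragment shorter than 3 characters: fall back to a full scan.
        candidates = set(all_terms)
    else:
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    return sorted(t for t in candidates if fragment in t)
```

Here `prefix_terms(TERMS, "abc")` and `prefix_terms(TERMS, "abcd")` both return `["abcdef", "abcdzzzz"]` and skip "xxxxabcdyyyy", matching the behavior described above, while `infix_terms(build_trigram_index(TERMS), TERMS, "abcd")` also finds "xxxxabcdyyyy". The extra index is the price of matching in the middle of a string.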