On Thu, Feb 11, 2010 at 8:39 AM, Ahmet Arslan <iori...@yahoo.com> wrote:
>> I am using SOLR 1.3 and my server is embedded and accessed using SOLRJ.
>> I would like to set up my searches so that exact matches are the first
>> results returned, followed by near matches, and finally token-based
>> matches.
>> For example, if I have a summary field in schema which is created
>> using copyField from a bunch of other fields:
>> "My item title, keyword, other, stuff"
>>
>> I want this search to match the item above first and foremost:
>> 1) "My item title*"
>>
>> Then this one:
>> 2) "my item*"
>
> Wildcards inside phrases are not supported by default. You can use SOLR-1604
> for that in Solr 1.4.0, but I am not sure it will work with 1.3. Can you try?

I might be able to try this out, though in general the project has a policy of only using released code (no trunk/unstable).
https://issues.apache.org/jira/browse/SOLR-1604

It looks like the kind of searching I want to do is not really supported in SOLR by default, though. Is that correct?

>> I tried creating a field to hold exact match data (summaryExact) which
>> actually works if I paste in the precise text but stops working as
>> soon as I add any wildcard to it.
>
> Your <fieldType name="exact" definition is wrong. You can directly use the
> string field type, which is not analyzed/tokenized. The string definition is:
>
> <fieldType name="string" class="solr.StrField" sortMissingLast="true"
> omitNorms="true"/>

I thought that was what my exact definition was doing, except that I also want the exact field to be lowercased and trimmed (which I don't want for all strings). Can you explain what is wrong with the current definition so I can fix it?

>> I could not quite figure out which tokenizer to use if I don't want
>> any tokens created but just want to trim and lowercase the string, so
>> let me know if you have ideas on this.
>
> The KeywordTokenizerFactory + TrimFilterFactory + LowerCaseFilterFactory
> combination can do that, but punctuation won't be removed between tokens.
>
>> Basically, I want something similar to DB "like" matching without case
>> sensitivity, and probably trimmed as well. I don't really want the field
>> to be tokenized though.
>
> Your examples suggest you want something like startsWith? Can you
> explain in more detail?

What I really want is the equivalent of a match like this, along with the normal tokenized matching (where the query has been lowercased and trimmed as well):

select * from blah where lower(column) like '%query%';

I think this is called a phrase match or something like that.
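
To make the KeywordTokenizerFactory + TrimFilterFactory + LowerCaseFilterFactory suggestion concrete, here is a plain-Java sketch (no Solr classes; the class and method names are hypothetical) of what that chain does to a field value: the whole value stays one token, which is then trimmed and lowercased, punctuation included. It also shows why such a field naturally gives a startsWith match (what a trailing-wildcard prefix query does), while the SQL `like '%query%'` behaviour would also need a leading wildcard.

```java
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical plain-Java emulation (not Solr code) of the
 * KeywordTokenizer + TrimFilter + LowerCaseFilter analysis chain.
 */
public class ExactFieldSketch {

    /** The whole value becomes ONE token, which is trimmed and lowercased. */
    static List<String> analyzeExact(String fieldValue) {
        String token = fieldValue;                      // KeywordTokenizer: no splitting
        token = token.trim();                           // TrimFilter: strip outer whitespace
        token = token.toLowerCase(Locale.ROOT);         // LowerCaseFilter
        return List.of(token);
    }

    /** Roughly what a trailing-wildcard prefix query ("my item*") matches. */
    static boolean prefixMatch(String indexedToken, String userInput) {
        return indexedToken.startsWith(userInput.trim().toLowerCase(Locale.ROOT));
    }

    /** SQL lower(column) like '%query%': would need "*query*" in Solr. */
    static boolean containsMatch(String indexedToken, String userInput) {
        return indexedToken.contains(userInput.trim().toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String token = analyzeExact("  My item title, keyword, other, stuff ").get(0);
        System.out.println(token);                           // my item title, keyword, other, stuff
        System.out.println(prefixMatch(token, "My item"));   // true
        System.out.println(prefixMatch(token, "keyword"));   // false
        System.out.println(containsMatch(token, "keyword")); // true
    }
}
```

Note that the commas survive analysis, which is the "punctuation won't be removed" caveat from the quoted reply.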

However, wildcards cannot be used at the beginning of a query, so I guess I can live with startsWith-type matching only until that is fixed. For now I have tried to do that using this:

query = (summary:"my item" || summaryExact:"my item*"^3)

but I would do this if I could:

query = (summary:"my item" || summaryExact:"*my item*"^3)

The idea is that a "phrase" match would be boosted over the normal token matches and would show up first in the listing. Let me know if more examples would help; I am happy to provide them.

> Your <fieldType name="name" class="solr.StrField" ..> declaration is also
> wrong. It should use class="solr.TextField".

OK, I will see if I can figure out how to correct that.

Thanks for all the help so far
-AZ

--
Aaron Zeckoski (azeckoski (at) vt.edu)
Senior Research Engineer - CARET - University of Cambridge
https://twitter.com/azeckoski - http://www.linkedin.com/in/azeckoski
http://aaronz-sakai.blogspot.com/ - http://tinyurl.com/azprofile
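
A hedged sketch of building the boosted query string client-side. It assumes that, by default in Solr 1.x, wildcard and prefix terms are not passed through the field's analyzer, so the input is trimmed and lowercased in Java first. Because summaryExact holds a single token, the space is backslash-escaped (`my\ item*`) instead of quoted, since a quoted phrase turns the `*` into a literal character rather than a wildcard. The helper names are hypothetical; the field names and the `^3` boost come from the message above. Later SolrJ releases offer `ClientUtils.escapeQueryChars` for the escaping step.

```java
import java.util.Locale;

/** Hypothetical helper that builds the boosted Solr query as a plain string. */
public class BoostedQuerySketch {

    // Lucene query-syntax characters that need a backslash; whitespace is handled separately.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;/";

    /** Backslash-escapes special characters and whitespace so the input stays one term. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /**
     * Tokenized phrase match on summary, OR a boosted prefix match on summaryExact.
     * Assumes the prefix term is NOT analyzed, hence the manual trim + lowercase.
     */
    static String buildQuery(String userInput) {
        String normalized = userInput.trim().toLowerCase(Locale.ROOT);
        // Inside a quoted phrase only backslash and double quote need escaping.
        String phrase = "\"" + normalized.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
        String prefix = escape(normalized) + "*";
        return "(summary:" + phrase + " || summaryExact:" + prefix + "^3)";
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("  My Item "));
        // (summary:"my item" || summaryExact:my\ item*^3)
    }
}
```

Only a trailing wildcard is generated, matching the startsWith-style fallback described above; the resulting string could then be passed to a SolrJ query.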