Nope, that should do it (although I haven't tried that exact set of steps). But you do have to reindex from scratch....
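For reference, those steps might look roughly like the schema.xml fragment below. This is only an untested sketch: the names text_edge and title_edge are placeholders made up for illustration, the gram sizes are guesses, and it tokenizes on whitespace so that each word gets its own prefixes (an assumption; the fieldtype from the blog post may suit you better).

```xml
<!-- Goes inside <types>. "text_edge" is a hypothetical name. -->
<fieldType name="text_edge" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Index every prefix of each word: "technocrane" gives te, tec, tech, ... -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <!-- No n-gramming at query time: the typed text is matched against the indexed prefixes -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Goes inside <fields>. "title_edge" is a hypothetical name. -->
<field name="title_edge" type="text_edge" indexed="true" stored="false"/>

<!-- Goes next to the other copyField declarations. -->
<copyField source="title" dest="title_edge"/>
```

With something like that in place, a query such as title_edge:(super AND techn) should match "super technocrane 30", because each query term only has to equal one of the indexed prefixes. No wildcards and no stemming are involved on this field.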
Best
Erick

On Fri, Jul 8, 2011 at 1:36 PM, Christopher Cato
<christopher.c...@minimedia.se> wrote:
> Thanks for that pointer, that's really more what I want to do. And actually,
> EdgeNGrams is stuck somewhere in the back of my head :) Yes, simple at first
> thought but not as easy to implement as I have discovered.
>
> Well, so how do I implement something like this? I took the fieldtype
> declaration from that blog post, added it to my schema.xml within the
> fieldtypes part.
>
> So, if I get it all correctly, all I have to do now is to add a new field
> with the newly added fieldtype, a copyfield from the original title field,
> change the query to use the new field and restart / reindex. Or am I missing
> something?
>
> //Christopher
>
>
> On 8 Jul 2011, at 18:59, Erick Erickson wrote:
>
>> Yeah, the analysis page takes a bit of getting used to, but it's well
>> worth the time. Be sure to check the "verbose" box. Taking some time
>> to understand what it's telling you is one of the best investments
>> you'll make.
>>
>> Your "parts of words" is the issue. One approach is to use ngrams or
>> edgengrams. Here's a writeup about edgengrams from Lucid:
>> http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/
>>
>> It's written for autosuggest, but you get the idea. If "partial" words
>> could be not at the start then ngrams are a possibility....
>>
>> Your problem is one of those
>> conceptually-simple-but-annoyingly-difficult-to-implement
>> ones that takes far longer to fully understand/implement than
>> it seems like it should.
>>
>> Best
>> Erick
>>
>> On Fri, Jul 8, 2011 at 12:44 PM, Christopher Cato
>> <christopher.c...@minimedia.se> wrote:
>>> Hi Briggs, thanks for being patient with me!
>>>
>>> Yeah, I saw I had a typo there in the OR clause. Fixed it but still no
>>> perfect results.
>>> I'm looking at the analysis.jsp page and can't really figure it out.
>>> Feeling a bit overwhelmed by all the output.
>>> I also don't know how to check if stemming is used for the title field.
>>>
>>> Theoretically, given the field type I'm using and also given that "super
>>> technocrane 30" is the title of one of the docs - how would one write the
>>> query so that it finds that doc if the user searches for "super techn" or
>>> "super technocrane"? Right now it stops matching in the middle of the word
>>> "technocrane" or rather after the "r".
>>>
>>> Darnit, I just want to return all docs that contain the search terms either
>>> as whole words or parts of words.
>>> Is it possible?
>>>
>>> Regards,
>>> Christopher
>>>
>>> On 8 Jul 2011, at 16:57, Briggs Thompson wrote:
>>>
>>>> Hey Chris,
>>>> Removing the ORs in each query might help narrow down the problem, but I
>>>> suggest you run this through the query analyzer in order to see where it
>>>> is dropping out. It is a great tool for troubleshooting issues like these.
>>>>
>>>> I see a few things here.
>>>>
>>>> - For leading wildcard queries, you should include the
>>>>   ReversedWildcardFilterFactory. Check out the documentation here:
>>>>   http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ReversedWildcardFilterFactory
>>>> - Your result might get dropped out because you are trying to do wildcard
>>>>   searches on a stemmed field. Wildcard searches on a stemmed field are
>>>>   counter-intuitive because if you index "computers", it may stem to
>>>>   "comput", in which case a wildcard query of "computer*" would not match.
>>>> - If you want to support stemming and wildcard searches, I suggest
>>>>   creating a copy field with an un-stemmed field type definition.
>>>>
>>>> Don't forget if you modify your field type definition, you need to
>>>> re-index.
>>>>
>>>> In response to your question about text_ws, this is just a different field
>>>> type definition that essentially splits on whitespace. You should use that
>>>> if that is what the desired search logic is, but it probably isn't.
>>>> Check out the documentation on each of the tokenizers and filter factories
>>>> in your "text" field type and see what you need and what you don't to
>>>> satisfy your use cases.
>>>>
>>>> Hope that helps,
>>>> Briggs Thompson
>>>>
>>>>
>>>> On Fri, Jul 8, 2011 at 9:03 AM, Christopher Cato <
>>>> christopher.c...@minimedia.se> wrote:
>>>>
>>>>> Hi Briggs. Thanks for taking the time. I have the query nearly working
>>>>> now, currently this is how it looks when it matches on the title "Super
>>>>> Technocrane 30" and others with similar names:
>>>>>
>>>>> INFO: [] webapp=/solr path=/select/
>>>>> params={qf=title^40.0&hl.fl=title&wt=json&rows=10&fl=*,score&start=0&q=(title:*super*+AND+*technocran*)+OR+(title:*super*+AND+*technocran)&qt=standard&fq=type:product+AND+language:sv}
>>>>> hits=3 status=0 QTime=1
>>>>>
>>>>> Adding another letter stops it matching:
>>>>>
>>>>> INFO: [] webapp=/solr path=/select/
>>>>> params={qf=title^40.0&hl.fl=title&wt=json&rows=10&fl=*,score&start=0&q=(title:*super*+AND+*technocrane*)+OR+(title:*super*+AND+*technocrane)&qt=standard&fq=type:product+AND+language:sv}
>>>>> hits=0 status=0 QTime=0
>>>>>
>>>>> The field type definitions are as follows:
>>>>>
>>>>> <field name="title" type="text" indexed="true" stored="true"
>>>>>        termVectors="true" omitNorms="true"/>
>>>>>
>>>>> <fieldType name="text" class="solr.TextField"
>>>>>            positionIncrementGap="100">
>>>>>   <analyzer type="index">
>>>>>     <charFilter class="solr.MappingCharFilterFactory"
>>>>>                 mapping="mapping-ISOLatin1Accent.txt"/>
>>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>>     <!-- in this example, we will only use synonyms at query time
>>>>>     <filter class="solr.SynonymFilterFactory"
>>>>>             synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
>>>>>     -->
>>>>>     <!-- Case insensitive stop word removal.
>>>>>          add enablePositionIncrements=true in both the index and query
>>>>>          analyzers to leave a 'gap' for more accurate phrase queries.
>>>>>     -->
>>>>>     <filter class="solr.StopFilterFactory"
>>>>>             ignoreCase="true"
>>>>>             words="stopwords.txt"
>>>>>             enablePositionIncrements="true"
>>>>>             />
>>>>>     <filter class="solr.WordDelimiterFilterFactory"
>>>>>             generateWordParts="1"
>>>>>             generateNumberParts="1"
>>>>>             catenateWords="1"
>>>>>             catenateNumbers="1"
>>>>>             catenateAll="0"
>>>>>             splitOnCaseChange="1"
>>>>>             preserveOriginal="1"/>
>>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>>     <filter class="solr.SnowballPorterFilterFactory" language="English"
>>>>>             protected="protwords.txt"/>
>>>>>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>>>>   </analyzer>
>>>>>   <analyzer type="query">
>>>>>     <charFilter class="solr.MappingCharFilterFactory"
>>>>>                 mapping="mapping-ISOLatin1Accent.txt"/>
>>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>>>>             ignoreCase="true" expand="true"/>
>>>>>     <filter class="solr.StopFilterFactory"
>>>>>             ignoreCase="true"
>>>>>             words="stopwords.txt"
>>>>>             enablePositionIncrements="true"
>>>>>             />
>>>>>     <filter class="solr.WordDelimiterFilterFactory"
>>>>>             generateWordParts="1"
>>>>>             generateNumberParts="1"
>>>>>             catenateWords="0"
>>>>>             catenateNumbers="0"
>>>>>             catenateAll="0"
>>>>>             splitOnCaseChange="1"
>>>>>             preserveOriginal="1"/>
>>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>>     <filter class="solr.SnowballPorterFilterFactory" language="English"
>>>>>             protected="protwords.txt"/>
>>>>>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>>>>   </analyzer>
>>>>> </fieldType>
>>>>>
>>>>>
>>>>> There is also a type definition that is called text_ws, should I use that
>>>>> instead and change text to text_ws in the field definition for title?
>>>>>
>>>>> <!-- A text field that only splits on whitespace for exact matching of
>>>>>      words -->
>>>>> <fieldType name="text_ws" class="solr.TextField"
>>>>>            positionIncrementGap="100">
>>>>>   <analyzer>
>>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>>   </analyzer>
>>>>> </fieldType>
>>>>>
>>>>>
>>>>> Best regards
>>>>>
>>>>> Christopher Cato
>>>>> Head of Technology
>>>>> -----------------------------------
>>>>> MiniMedia
>>>>> Phone: +46761927603
>>>>> www.minimedia.se
>>>>>
>>>>> On 7 Jul 2011, at 23:16, Briggs Thompson wrote:
>>>>>
>>>>>> Hello Christopher,
>>>>>>
>>>>>> Can you provide the exact query sent to Solr for the one word query and
>>>>>> also the two word query? The field type definition for your title field
>>>>>> would be useful too.
>>>>>>
>>>>>> From what I understand, Solr should be able to handle your use case. I
>>>>>> am guessing it is a problem with how the field is defined, assuming the
>>>>>> query is correct.
>>>>>>
>>>>>> Briggs Thompson
>>>>>>
>>>>>> On Thu, Jul 7, 2011 at 12:22 PM, Christopher Cato <
>>>>>> christopher.c...@minimedia.se> wrote:
>>>>>>
>>>>>>> Hi, I'm running Solr 3.2 with edismax under Tomcat 6 via Drupal.
>>>>>>>
>>>>>>> I'm having some problems writing a query that matches a specific field
>>>>>>> on several words. I have implemented an AJAX search that basically
>>>>>>> takes whatever is in a form field and attempts to match documents. I'm
>>>>>>> not having much luck though. First word always matches correctly but as
>>>>>>> soon as I enter the second word I'm losing matches, the third word
>>>>>>> doesn't give any matches at all.
>>>>>>>
>>>>>>> The title field that I'm searching contains a product name that may or
>>>>>>> may not have several words.
>>>>>>>
>>>>>>> The requirement is that the search should be progressive, i.e. as the
>>>>>>> user inputs words I should always return results that contain all of
>>>>>>> the words entered.
>>>>>>> I also have to correct bad input like an erroneous space in the
>>>>>>> product name, e.g. "product name" instead of "productname".
>>>>>>>
>>>>>>> I'm wondering if there isn't an easier way to query Solr? Ideally I'd
>>>>>>> want to say "give me all docs that have the following text in their
>>>>>>> titles". Is that possible?
>>>>>>>
>>>>>>>
>>>>>>> I'd really appreciate any help!
>>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>> Christopher Cato
>>>>>
>>>>>
>>>
>>>
>
>