Hello Bernd,

Thanks for your advice.

I have one question: how did you manage to map one word to a multi-word synonym? In synonyms.txt I've tried

mairie, hotel de ville
mairie, hotel\ de\ ville
mairie => mairie, hotel de ville
mairie => mairie, hotel\ de\ ville

but nothing prevents "mairie" from matching "hotel".

The only way I found is to use tokenizerFactory="solr.KeywordTokenizerFactory" in my synonyms declaration in schema.xml, but then, since "mairie" is not alone in my indexed field, it doesn't match.

Best regards,
Elisabeth

2012/5/15 Bernd Fehling <bernd.fehl...@uni-bielefeld.de>

> Without reading the whole thread let me say that you should not trust
> the solr admin analysis. It takes the whole multiword search and runs
> it all together at once through each analyzer step (factory).
> But this is not how the real system works. First pitfall, the query parser
> is also splitting at white space (if not a phrase query). Due to this,
> a multiword query is sent chunk after chunk through the analyzer and,
> second pitfall, each chunk runs through the whole analyzer on its own.
>
> So if you are dealing with multiword synonyms you have the following
> problems. Either you turn your query into a phrase so that the whole
> phrase is analyzed at once and therefore looked up as multiword synonym
> but phrase queries are not analyzed !!! OR you send your query chunk
> by chunk through the analyzer but then they are not multiwords anymore
> and are not found in your synonyms.txt.
>
> From my experience I can say that it requires some deep work to get it done
> but it is possible. I have connected a thesaurus to solr which is doing
> query time expansion (no need to reindex if the thesaurus changes).
> The thesaurus holds synonyms and "used for terms" in 24 languages. So
> it is also some kind of language translation. And naturally the thesaurus
> translates from single term to multi term synonyms and vice versa.
>
> Regards,
> Bernd
>
>
> On 14.05.2012 13:54, elisabeth benoit wrote:
> > Just for the record, I'd like to conclude this thread.
> >
> > First, you were right, there was no behaviour difference between fq and q
> > parameters.
> >
> > I realized that:
> >
> > 1) my synonym (hotel de ville) has a stopword in it (de) and since I used
> > tokenizerFactory="solr.KeywordTokenizerFactory" in my synonyms declaration,
> > there was no stopword removal in the indexed expression, so when requesting
> > "hotel de ville", after stopword removal in the query, Solr was comparing
> > "hotel de ville"
> > with "hotel ville"
> >
> > but my queries never even got to that point since
> >
> > 2) I made a mistake using "mairie" alone in the admin interface when
> > testing my schema. The real field was something like "collectivités
> > territoriales mairie",
> > so the synonym "hotel de ville" was not even applied, because of the
> > tokenizerFactory="solr.KeywordTokenizerFactory" in my synonym definition
> > not splitting the field into words when parsing
> >
> > So my problem is not solved, and I'm considering solving it outside of Solr
> > scope, unless someone else has a clue
> >
> > Thanks again,
> > Elisabeth
> >
> >
> >
> > 2012/4/25 Erick Erickson <erickerick...@gmail.com>
> >
> >> A little farther down the debug info output you'll find something
> >> like this (I specified fq=name:features)
> >>
> >> <arr name="parsed_filter_queries">
> >> <str>name:features</str>
> >> </arr>
> >>
> >>
> >> so it may well give you some clue. But unless I'm reading things wrong,
> >> your q is going against a field that has much more information than the
> >> CATEGORY_ANALYZED field, is it possible that the data from your
> >> test cases simply isn't _in_ CATEGORY_ANALYZED?
> >>
> >> Best
> >> Erick
> >>
> >> On Wed, Apr 25, 2012 at 9:39 AM, elisabeth benoit
> >> <elisaelisael...@gmail.com> wrote:
> >>> I'm not at the office until next Wednesday, and I don't have my Solr at
> >>> hand, but isn't debugQuery=on giving information only about q parameter
> >>> matching and nothing about the fq parameter? Or do you mean
> >>> "parsed_filter_queries" gives information about fq?
> >>>
> >>> CATEGORY_ANALYZED is being populated by a copyField instruction in
> >>> schema.xml, and has the same field type as my catchall field, the search
> >>> field for my searchHandler (the one being used by q parameter).
> >>>
> >>> CATEGORY (a string) is copied in CATEGORY_ANALYZED (field type is text)
> >>>
> >>> CATEGORY (a string) is copied in catchall field (field type is text), and a
> >>> lot of other fields are copied too in that catchall field.
> >>>
> >>> So as far as I can see, the same analysis should be done in both cases, but
> >>> obviously I'm missing something, and the only thing I can think of is a
> >>> different behavior between q and fq parameter.
> >>>
> >>> I'll check that parsed_filter_queries first thing in the morning next
> >>> Wednesday.
> >>>
> >>> Thanks a lot for your help.
> >>>
> >>> Elisabeth
> >>>
> >>>
> >>> 2012/4/24 Erick Erickson <erickerick...@gmail.com>
> >>>
> >>>> Elisabeth:
> >>>>
> >>>> What shows up in the debug section of the response when you add
> >>>> &debugQuery=on? There should be some bit of that section like:
> >>>> "parsed_filter_queries"
> >>>>
> >>>> My other question is "are you absolutely sure that your
> >>>> CATEGORY_ANALYZED field has the correct content?". How does it
> >>>> get populated?
> >>>>
> >>>> Nothing jumps out at me here....
> >>>>
> >>>> Best
> >>>> Erick
> >>>>
> >>>> On Tue, Apr 24, 2012 at 9:55 AM, elisabeth benoit
> >>>> <elisaelisael...@gmail.com> wrote:
> >>>>> yes, thanks, but this is NOT my question.
> >>>>>
> >>>>> I was wondering why I have multiple matches with q="hotel de ville" and no
> >>>>> match with fq=CATEGORY_ANALYZED:"hotel de ville", since in both cases I'm
> >>>>> searching in the same solr fieldType.
> >>>>>
> >>>>> Why is q parameter behaving differently in that case? Why do the quotes
> >>>>> work in one case and not in the other?
> >>>>>
> >>>>> Does anyone know?
> >>>>>
> >>>>> Thanks,
> >>>>> Elisabeth
> >>>>>
> >>>>> 2012/4/24 Jeevanandam <je...@myjeeva.com>
> >>>>>
> >>>>>>
> >>>>>> usage of q and fq
> >>>>>>
> >>>>>> q => is typically the main query for the search request
> >>>>>>
> >>>>>> fq => is Filter Query; generally used to restrict the super set of
> >>>>>> documents without influencing score (more info:
> >>>>>> http://wiki.apache.org/solr/CommonQueryParameters#q)
> >>>>>>
> >>>>>> For example:
> >>>>>> ------------
> >>>>>> q="hotel de ville" ===> returns 100 documents
> >>>>>>
> >>>>>> q="hotel de ville"&fq=price:[100 TO *]&fq=roomType:"King size Bed" ===>
> >>>>>> returns 40 documents from super set of 100 documents
> >>>>>>
> >>>>>>
> >>>>>> hope this helps!
> >>>>>>
> >>>>>> - Jeevanandam
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 24-04-2012 3:08 pm, elisabeth benoit wrote:
> >>>>>>
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> I'd like to resume this post.
> >>>>>>>
> >>>>>>> The only way I found not to split synonyms into words in synonyms.txt is
> >>>>>>> to use the line
> >>>>>>>
> >>>>>>> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> >>>>>>> ignoreCase="true" expand="true"
> >>>>>>> tokenizerFactory="solr.KeywordTokenizerFactory"/>
> >>>>>>>
> >>>>>>> in schema.xml
> >>>>>>>
> >>>>>>> where tokenizerFactory="solr.KeywordTokenizerFactory"
> >>>>>>>
> >>>>>>> instructs SynonymFilterFactory not to break synonyms into words on white
> >>>>>>> spaces when parsing synonyms file.
> >>>>>>>
> >>>>>>> So now it works fine, "mairie" is mapped into "hotel de ville" and when I
> >>>>>>> send request q="hotel de ville" (quotes are mandatory to prevent the analyzer
> >>>>>>> from splitting hotel de ville on white spaces), I get answers with word
> >>>>>>> "mairie".
> >>>>>>>
> >>>>>>> But when I use fq parameter (fq=CATEGORY_ANALYZED:"hotel de ville"), it
> >>>>>>> doesn't work!!!
> >>>>>>>
> >>>>>>> CATEGORY_ANALYZED is same field type as default search field. This means
> >>>>>>> that when I send q="hotel de ville" and fq=CATEGORY_ANALYZED:"hotel de
> >>>>>>> ville", solr uses the same analyzer, the one with the line
> >>>>>>>
> >>>>>>> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> >>>>>>> ignoreCase="true" expand="true"
> >>>>>>> tokenizerFactory="solr.KeywordTokenizerFactory"/>.
> >>>>>>>
> >>>>>>> Anyone has a clue what is different between q analysis behaviour and fq
> >>>>>>> analysis behaviour?
> >>>>>>>
> >>>>>>> Thanks a lot
> >>>>>>> Elisabeth
> >>>>>>>
> >>>>>>> 2012/4/12 elisabeth benoit <elisaelisael...@gmail.com>
> >>>>>>>
> >>>>>>>> oh, that's right.
> >>>>>>>>
> >>>>>>>> thanks a lot,
> >>>>>>>> Elisabeth
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> 2012/4/11 Jeevanandam Madanagopal <je...@myjeeva.com>
> >>>>>>>>
> >>>>>>>>> Elisabeth -
> >>>>>>>>>
> >>>>>>>>> As you described, below mapping might suit for your need.
> >>>>>>>>> mairie => hotel de ville, mairie
> >>>>>>>>>
> >>>>>>>>> mairie gets expanded to "hotel de ville" and "mairie" at index time. So
> >>>>>>>>> "mairie" and "hotel de ville" are searchable on the document.
> >>>>>>>>>
> >>>>>>>>> However, still white space tokenizer splits at query time will be a
> >>>>>>>>> problem as described by Markus.
> >>>>>>>>>
> >>>>>>>>> --Jeevanandam
> >>>>>>>>>
> >>>>>>>>> On Apr 11, 2012, at 12:30 PM, elisabeth benoit wrote:
> >>>>>>>>>
> >>>>>>>>>> <<Have you tried the "=>" mapping instead?
> >>>>>>>>>> <<Something like
> >>>>>>>>>> <<hotel de ville => mairie
> >>>>>>>>>> <<might work for you.
> >>>>>>>>>>
> >>>>>>>>>> Yes, thanks, I've tried it but from what I understand it doesn't solve my
> >>>>>>>>>> problem, since this means hotel de ville will be replaced by mairie at
> >>>>>>>>>> index time (I use synonyms only at index time). So when user will ask
> >>>>>>>>>> "hôtel de ville", it won't match.
> >>>>>>>>>>
> >>>>>>>>>> In fact, at index time I have mairie in my data, but I want user to be able
> >>>>>>>>>> to request "mairie" or "hôtel de ville" and have mairie as answer, and not
> >>>>>>>>>> have mairie as an answer when requesting "hôtel".
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> <<To map `mairie` to `hotel de ville` as single token you must escape your
> >>>>>>>>>> <<white space.
> >>>>>>>>>>
> >>>>>>>>>> <<mairie, hotel\ de\ ville
> >>>>>>>>>>
> >>>>>>>>>> <<This results in a problem if your tokenizer splits on white space at
> >>>>>>>>>> <<query time.
> >>>>>>>>>>
> >>>>>>>>>> Ok, I guess this means I have a problem. No simple solution since at query
> >>>>>>>>>> time my tokenizer does split on white spaces.
> >>>>>>>>>>
> >>>>>>>>>> I guess my problem is more or less one of the problems discussed in
> >>>>>>>>>>
> >>>>>>>>>> http://lucene.472066.n3.nabble.com/Multi-word-synonyms-td3716292.html#a3717215
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Thanks a lot for your answers,
> >>>>>>>>>> Elisabeth
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> 2012/4/10 Erick Erickson <erickerick...@gmail.com>
> >>>>>>>>>>
> >>>>>>>>>>> Have you tried the "=>" mapping instead?
> >>>>>>>>>>> Something like
> >>>>>>>>>>> hotel de ville => mairie
> >>>>>>>>>>> might work for you.
> >>>>>>>>>>>
> >>>>>>>>>>> Best
> >>>>>>>>>>> Erick
> >>>>>>>>>>>
> >>>>>>>>>>> On Tue, Apr 10, 2012 at 1:41 AM, elisabeth benoit
> >>>>>>>>>>> <elisaelisael...@gmail.com> wrote:
> >>>>>>>>>>>> Hello,
> >>>>>>>>>>>>
> >>>>>>>>>>>> I've read several posts on this issue, but can't find a real solution to my
> >>>>>>>>>>>> multi-word synonym matching problem.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I have in my synonyms.txt an entry like
> >>>>>>>>>>>>
> >>>>>>>>>>>> mairie, hotel de ville
> >>>>>>>>>>>>
> >>>>>>>>>>>> and my index time analyzer is configured as follows for synonyms.
> >>>>>>>>>>>>
> >>>>>>>>>>>> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> >>>>>>>>>>>> ignoreCase="true" expand="true"/>
> >>>>>>>>>>>>
> >>>>>>>>>>>> The problem I have is that now "mairie" matches with "hotel" and I would
> >>>>>>>>>>>> only want "mairie" to match with "hotel de ville" and "mairie".
> >>>>>>>>>>>>
> >>>>>>>>>>>> When I look into the analyzer, I see that "mairie" is mapped into "hotel",
> >>>>>>>>>>>> and words "de ville" are added in second and third position. To change
> >>>>>>>>>>>> that, I tried to do
> >>>>>>>>>>>>
> >>>>>>>>>>>> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> >>>>>>>>>>>> ignoreCase="true" expand="true"
> >>>>>>>>>>>> tokenizerFactory="solr.KeywordTokenizerFactory"/> (as I read in one post)
> >>>>>>>>>>>>
> >>>>>>>>>>>> and I can see now in the analyzer that "mairie" is mapped to "hotel de
> >>>>>>>>>>>> ville", but now when I have query "hotel de ville", it doesn't match at all
> >>>>>>>>>>>> with "mairie".
> >>>>>>>>>>>>
> >>>>>>>>>>>> Anyone has a clue of what I'm doing wrong?
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'm using Solr 3.4.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks,
> >>>>>>>>>>>> Elisabeth
>
> --
> *************************************************************
> Bernd Fehling                    Universitätsbibliothek Bielefeld
> Dipl.-Inform. (FH)               Universitätsstr. 25
> Tel. +49 521 106-4060            Fax. +49 521 106-4052
> bernd.fehl...@uni-bielefeld.de   33615 Bielefeld
>
> BASE - Bielefeld Academic Search Engine - www.base-search.net
> *************************************************************
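
For the record, the two pitfalls Bernd describes (the query parser splitting on white space before analysis, and each chunk then going through the analyzer on its own) can be illustrated with a small toy model in Python. This is only a sketch and not Solr code: the names SYNONYMS, analyze, admin_analysis and parsed_query are hypothetical, and the synonym table is a simplified stand-in for synonyms.txt.

```python
# Toy model only, not Solr code. All names here are hypothetical.
# Simplified stand-in for synonyms.txt: token sequence -> synonyms to add.
SYNONYMS = {
    ("hotel", "de", "ville"): ["mairie"],
    ("mairie",): ["hotel de ville"],
}


def analyze(text):
    """Tokenize on white space, then expand the longest matching token run."""
    tokens = text.lower().split()
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest run of tokens starting at position i first.
        for n in range(len(tokens) - i, 0, -1):
            key = tuple(tokens[i:i + n])
            if key in SYNONYMS:
                out.append(" ".join(key))   # keep the original expression
                out.extend(SYNONYMS[key])   # add its synonyms
                i += n
                break
        else:
            # No synonym entry starts here: emit the token unchanged.
            out.append(tokens[i])
            i += 1
    return out


def admin_analysis(query):
    """What the admin page shows: the whole input analyzed in one pass."""
    return analyze(query)


def parsed_query(query):
    """Unquoted query: split on white space first, then analyze each chunk."""
    out = []
    for chunk in query.split():
        out.extend(analyze(chunk))
    return out


print(admin_analysis("hotel de ville"))  # ['hotel de ville', 'mairie']
print(parsed_query("hotel de ville"))    # ['hotel', 'de', 'ville']
print(parsed_query("mairie"))            # ['mairie', 'hotel de ville']
```

In this model the one-pass analysis finds the multi-word entry, while the chunk-by-chunk path never sees "hotel de ville" as a unit, which would be consistent with the admin analysis page looking fine while the real query does not match.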