I recently had the same use case. Here is what I wound up doing: at both index and query time, the synonym filter is set to expand=false, and within each group all multi-word synonyms map to one single-word synonym. This way, only the main word is ever indexed or queried.
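Roughly, the setup looks like the sketch below. The field type name "text_syn" and the tokenizer choice are placeholders made up for illustration, not my actual schema, and I have not tested this exact fragment. The point is only that one analyzer (so the same chain at index and query time) carries the synonym filter with expand="false".

```xml
<!-- Sketch only: "text_syn" and the whitespace tokenizer are illustrative
     assumptions. An <analyzer> element with no type attribute is used at
     both index time and query time. -->
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- With expand="false", every entry of a comma-separated group is
         reduced to the first entry of that group. A synonyms.txt line
         such as
             mairie, hotel de ville
         therefore turns both "mairie" and "hotel de ville" into "mairie". -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="false"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The main word has to come first in each group, since that is the one everything collapses to. The query-side whitespace splitting discussed further down the thread presumably still applies to unquoted multi-word queries.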
If the synonym file changes, you have to re-index the matching content.

On Tue, May 29, 2012 at 1:27 PM, elisabeth benoit
<elisaelisael...@gmail.com> wrote:
> Hello Bernd,
>
> Thanks a lot for your answer. I'll work on this.
>
> Best regards,
> Elisabeth
>
> 2012/5/29 Bernd Fehling <bernd.fehl...@uni-bielefeld.de>
>
>> Hello Elisabeth,
>>
>> my synonyms.txt is like your 2nd example:
>>
>> naturwald, φυσικό\ δάσος, естествена\ гора, prírodný\ les, naravni\ gozd,
>> foresta\ naturale, natuurbos, natural\ forest, bosque\ natural,
>> természetes\ erdő, natūralus\ miškas, prirodna\ šuma, dabiskais\ mežs,
>> floresta\ natural, naturskov, forêt\ naturelle, naturskog, přírodní\ les,
>> luonnonmetsä, pădure\ naturală, las\ naturalny, natürlicher\ wald
>>
>> An example from my system with debugging turned on and searching for
>> "naturwald":
>>
>> <lst name="debug">
>> <str name="rawquerystring">naturwald</str>
>> <str name="querystring">naturwald</str>
>> <str name="parsedquery">textth:naturwald textth:"φυσικό δάσος"
>> textth:"естествена гора" textth:"prírodný les" textth:"naravni gozd"
>> textth:"foresta naturale" textth:natuurbos textth:"natural forest"
>> textth:"bosque natural" textth:"természetes erdő" textth:"natūralus miškas"
>> textth:"prirodna šuma" textth:"dabiskais mežs" textth:"floresta natural"
>> textth:naturskov textth:"forêt naturelle" textth:naturskog
>> textth:"přírodní les" textth:luonnonmetsä textth:"pădure naturală"
>> textth:"las naturalny" textth:"natürlicher wald"</str>
>> ...
>>
>> As you can see my search for "naturwald" extends to single and multiword
>> synonyms e.g. "forêt naturelle"
>>
>> My SynonymFilterFactory has the following settings:
>>
>> org.apache.solr.analysis.SynonymFilterFactory
>> {tokenizerFactory=solr.KeywordTokenizerFactory,
>> synonyms=synonyms_eurovoc_desc_desc_ufall.txt, expand=true, format=solr,
>> ignoreCase=true,
>> luceneMatchVersion=LUCENE_36}
>>
>> But as I already mentioned, there is much more work to be done to get it
>> running than just using SynonymFilterFactory.
>>
>> Regards
>> Bernd
>>
>> On 23.05.2012 08:49, elisabeth benoit wrote:
>> > Hello Bernd,
>> >
>> > Thanks for your advice.
>> >
>> > I have one question: how did you manage to map one word to a multiwords
>> > synonym???
>> >
>> > I've tried (in synonyms.txt)
>> >
>> > mairie, hotel de ville
>> >
>> > mairie, hotel\ de\ ville
>> >
>> > mairie => mairie, hotel de ville
>> >
>> > mairie => mairie, hotel\ de\ ville
>> >
>> > but nothing prevents mairie from matching with "hotel"...
>> >
>> > The only way I found is to use
>> > tokenizerFactory="solr.KeywordTokenizerFactory" in my synonyms declaration
>> > in schema.xml, but then since "mairie" is not alone in my index field, it
>> > doesn't match.
>> >
>> > best regards,
>> > Elisabeth
>> >
>> > the only way I found, I schema.xml, is to use
>> >
>> > 2012/5/15 Bernd Fehling <bernd.fehl...@uni-bielefeld.de>
>> >
>> >> Without reading the whole thread let me say that you should not trust
>> >> the solr admin analysis. It takes the whole multiword search and runs
>> >> it all together at once through each analyzer step (factory).
>> >> But this is not how the real system works. First pitfall, the query parser
>> >> is also splitting at white space (if not a phrase query). Due to this,
>> >> a multiword query is sent chunk after chunk through the analyzer and,
>> >> second pitfall, each chunk runs through the whole analyzer on its own.
>> >>
>> >> So if you are dealing with multiword synonyms you have the following
>> >> problems.
>> >> Either you turn your query into a phrase so that the whole
>> >> phrase is analyzed at once and therefore looked up as multiword synonym
>> >> but phrase queries are not analyzed !!! OR you send your query chunk
>> >> by chunk through the analyzer but then they are not multiwords anymore
>> >> and are not found in your synonyms.txt.
>> >>
>> >> From my experience I can say that it requires some deep work to get it done
>> >> but it is possible. I have connected a thesaurus to solr which is doing
>> >> query time expansion (no need to reindex if the thesaurus changes).
>> >> The thesaurus holds synonyms and "used for terms" in 24 languages. So
>> >> it is also some kind of language translation. And naturally the thesaurus
>> >> translates from single term to multi term synonyms and vice versa.
>> >>
>> >> Regards,
>> >> Bernd
>> >>
>> >> On 14.05.2012 13:54, elisabeth benoit wrote:
>> >>> Just for the record, I'd like to conclude this thread
>> >>>
>> >>> First, you were right, there was no behaviour difference between fq and q
>> >>> parameters.
>> >>>
>> >>> I realized that:
>> >>>
>> >>> 1) my synonym (hotel de ville) has a stopword in it (de) and since I used
>> >>> tokenizerFactory="solr.KeywordTokenizerFactory" in my synonyms declaration,
>> >>> there was no stopword removal in the indexed expression, so when requesting
>> >>> "hotel de ville", after stopwords removal in query, Solr was comparing
>> >>> "hotel de ville"
>> >>> with "hotel ville"
>> >>>
>> >>> but my queries never even got to that point since
>> >>>
>> >>> 2) I made a mistake using "mairie" alone in the admin interface when
>> >>> testing my schema.
>> >>> The real field was something like "collectivités territoriales mairie",
>> >>> so the synonym "hotel de ville" was not even applied, because of the
>> >>> tokenizerFactory="solr.KeywordTokenizerFactory" in my synonym definition
>> >>> not splitting field into words when parsing
>> >>>
>> >>> So my problem is not solved, and I'm considering solving it outside of Solr
>> >>> scope, unless someone else has a clue
>> >>>
>> >>> Thanks again,
>> >>> Elisabeth
>> >>>
>> >>> 2012/4/25 Erick Erickson <erickerick...@gmail.com>
>> >>>
>> >>>> A little farther down the debug info output you'll find something
>> >>>> like this (I specified fq=name:features)
>> >>>>
>> >>>> <arr name="parsed_filter_queries">
>> >>>> <str>name:features</str>
>> >>>> </arr>
>> >>>>
>> >>>> so it may well give you some clue. But unless I'm reading things wrong,
>> >>>> your q is going against a field that has much more information than the
>> >>>> CATEGORY_ANALYZED field, is it possible that the data from your
>> >>>> test cases simply isn't _in_ CATEGORY_ANALYZED?
>> >>>>
>> >>>> Best
>> >>>> Erick
>> >>>>
>> >>>> On Wed, Apr 25, 2012 at 9:39 AM, elisabeth benoit
>> >>>> <elisaelisael...@gmail.com> wrote:
>> >>>>> I'm not at the office until next Wednesday, and I don't have my Solr under
>> >>>>> hand, but isn't debugQuery=on giving informations only about q parameter
>> >>>>> matching and nothing about fq parameter? Or do you mean
>> >>>>> "parsed_filter_querie"s gives information about fq?
>> >>>>>
>> >>>>> CATEGORY_ANALYZED is being populated by a copyField instruction in
>> >>>>> schema.xml, and has the same field type as my catchall field, the search
>> >>>>> field for my searchHandler (the one being used by q parameter).
>> >>>>>
>> >>>>> CATEGORY (a string) is copied in CATEGORY_ANALYZED (field type is text)
>> >>>>>
>> >>>>> CATEGORY (a string) is copied in catchall field (field type is text), and a
>> >>>>> lot of other fields are copied too in that catchall field.
>> >>>>>
>> >>>>> So as far as I can see, the same analysis should be done in both cases, but
>> >>>>> obviously I'm missing something, and the only thing I can think of is a
>> >>>>> different behavior between q and fq parameter.
>> >>>>>
>> >>>>> I'll check that parsed_filter_querie first thing in the morning next
>> >>>>> Wednesday.
>> >>>>>
>> >>>>> Thanks a lot for your help.
>> >>>>>
>> >>>>> Elisabeth
>> >>>>>
>> >>>>> 2012/4/24 Erick Erickson <erickerick...@gmail.com>
>> >>>>>
>> >>>>>> Elisabeth:
>> >>>>>>
>> >>>>>> What shows up in the debug section of the response when you add
>> >>>>>> &debugQuery=on? There should be some bit of that section like:
>> >>>>>> "parsed_filter_queries"
>> >>>>>>
>> >>>>>> My other question is "are you absolutely sure that your
>> >>>>>> CATEGORY_ANALYZED field has the correct content?". How does it
>> >>>>>> get populated?
>> >>>>>>
>> >>>>>> Nothing jumps out at me here....
>> >>>>>>
>> >>>>>> Best
>> >>>>>> Erick
>> >>>>>>
>> >>>>>> On Tue, Apr 24, 2012 at 9:55 AM, elisabeth benoit
>> >>>>>> <elisaelisael...@gmail.com> wrote:
>> >>>>>>> yes, thanks, but this is NOT my question.
>> >>>>>>>
>> >>>>>>> I was wondering why I have multiple matches with q="hotel de ville" and no
>> >>>>>>> match with fq=CATEGORY_ANALYZED:"hotel de ville", since in both case I'm
>> >>>>>>> searching in the same solr fieldType.
>> >>>>>>>
>> >>>>>>> Why is q parameter behaving differently in that case? Why do the quotes
>> >>>>>>> work in one case and not in the other?
>> >>>>>>>
>> >>>>>>> Does anyone know?
>> >>>>>>>
>> >>>>>>> Thanks,
>> >>>>>>> Elisabeth
>> >>>>>>>
>> >>>>>>> 2012/4/24 Jeevanandam <je...@myjeeva.com>
>> >>>>>>>
>> >>>>>>>> usage of q and fq
>> >>>>>>>>
>> >>>>>>>> q => is typically the main query for the search request
>> >>>>>>>>
>> >>>>>>>> fq => is Filter Query; generally used to restrict the super set of
>> >>>>>>>> documents without influencing score (more info:
>> >>>>>>>> http://wiki.apache.org/solr/CommonQueryParameters#q)
>> >>>>>>>>
>> >>>>>>>> For example:
>> >>>>>>>> ------------
>> >>>>>>>> q="hotel de ville" ===> returns 100 documents
>> >>>>>>>>
>> >>>>>>>> q="hotel de ville"&fq=price:[100 TO *]&fq=roomType:"King size Bed" ===>
>> >>>>>>>> returns 40 documents from super set of 100 documents
>> >>>>>>>>
>> >>>>>>>> hope this helps!
>> >>>>>>>>
>> >>>>>>>> - Jeevanandam
>> >>>>>>>>
>> >>>>>>>> On 24-04-2012 3:08 pm, elisabeth benoit wrote:
>> >>>>>>>>
>> >>>>>>>>> Hello,
>> >>>>>>>>>
>> >>>>>>>>> I'd like to resume this post.
>> >>>>>>>>>
>> >>>>>>>>> The only way I found to not split synonyms into words in synonyms.txt is
>> >>>>>>>>> to use the line
>> >>>>>>>>>
>> >>>>>>>>> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>> >>>>>>>>> ignoreCase="true" expand="true"
>> >>>>>>>>> tokenizerFactory="solr.KeywordTokenizerFactory"/>
>> >>>>>>>>>
>> >>>>>>>>> in schema.xml
>> >>>>>>>>>
>> >>>>>>>>> where tokenizerFactory="solr.KeywordTokenizerFactory"
>> >>>>>>>>>
>> >>>>>>>>> instructs SynonymFilterFactory not to break synonyms into words on white
>> >>>>>>>>> spaces when parsing synonyms file.
>> >>>>>>>>>
>> >>>>>>>>> So now it works fine, "mairie" is mapped into "hotel de ville" and when I
>> >>>>>>>>> send request q="hotel de ville" (quotes are mandatory to prevent analyzer
>> >>>>>>>>> to split hotel de ville on white spaces), I get answers with word
>> >>>>>>>>> "mairie".
>> >>>>>>>>>
>> >>>>>>>>> But when I use fq parameter (fq=CATEGORY_ANALYZED:"hotel de ville"), it
>> >>>>>>>>> doesn't work!!!
>> >>>>>>>>>
>> >>>>>>>>> CATEGORY_ANALYZED is same field type as default search field. This means
>> >>>>>>>>> that when I send q="hotel de ville" and fq=CATEGORY_ANALYZED:"hotel de
>> >>>>>>>>> ville", solr uses the same analyzer, the one with the line
>> >>>>>>>>>
>> >>>>>>>>> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>> >>>>>>>>> ignoreCase="true" expand="true"
>> >>>>>>>>> tokenizerFactory="solr.KeywordTokenizerFactory"/>.
>> >>>>>>>>>
>> >>>>>>>>> Anyone has a clue what is different between q analysis behaviour and fq
>> >>>>>>>>> analysis behaviour?
>> >>>>>>>>>
>> >>>>>>>>> Thanks a lot
>> >>>>>>>>> Elisabeth
>> >>>>>>>>>
>> >>>>>>>>> 2012/4/12 elisabeth benoit <elisaelisael...@gmail.com>
>> >>>>>>>>>
>> >>>>>>>>>> oh, that's right.
>> >>>>>>>>>>
>> >>>>>>>>>> thanks a lot,
>> >>>>>>>>>> Elisabeth
>> >>>>>>>>>>
>> >>>>>>>>>> 2012/4/11 Jeevanandam Madanagopal <je...@myjeeva.com>
>> >>>>>>>>>>
>> >>>>>>>>>>> Elisabeth -
>> >>>>>>>>>>>
>> >>>>>>>>>>> As you described, below mapping might suit for your need.
>> >>>>>>>>>>> mairie => hotel de ville, mairie
>> >>>>>>>>>>>
>> >>>>>>>>>>> mairie gets expanded to "hotel de ville" and "mairie" at index time. So
>> >>>>>>>>>>> "mairie" and "hotel de ville" searchable on document.
>> >>>>>>>>>>>
>> >>>>>>>>>>> However, still white space tokenizer splits at query time will be a
>> >>>>>>>>>>> problem as described by Markus.
>> >>>>>>>>>>>
>> >>>>>>>>>>> --Jeevanandam
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Apr 11, 2012, at 12:30 PM, elisabeth benoit wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>>> <<Have you tried the "=>" mapping instead? Something
>> >>>>>>>>>>>> <<like
>> >>>>>>>>>>>> <<hotel de ville => mairie
>> >>>>>>>>>>>> <<might work for you.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Yes, thanks, I've tried it but from what I understand it doesn't solve my
>> >>>>>>>>>>>> problem, since this means hotel de ville will be replaced by mairie at
>> >>>>>>>>>>>> index time (I use synonyms only at index time). So when user will ask
>> >>>>>>>>>>>> "hôtel de ville", it won't match.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> In fact, at index time I have mairie in my data, but I want user to be able
>> >>>>>>>>>>>> to request "mairie" or "hôtel de ville" and have mairie as answer, and not
>> >>>>>>>>>>>> have mairie as an answer when requesting "hôtel".
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> <<To map `mairie` to `hotel de ville` as single token you must escape your
>> >>>>>>>>>>>> <<white space.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> <<mairie, hotel\ de\ ville
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> <<This results in a problem if your tokenizer splits on white space at
>> >>>>>>>>>>>> <<query time.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Ok, I guess this means I have a problem. No simple solution since at query
>> >>>>>>>>>>>> time my tokenizer does split on white spaces.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> I guess my problem is more or less one of the problems discussed in
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> http://lucene.472066.n3.nabble.com/Multi-word-synonyms-td3716292.html#a3717215
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Thanks a lot for your answers,
>> >>>>>>>>>>>> Elisabeth
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> 2012/4/10 Erick Erickson <erickerick...@gmail.com>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>> Have you tried the "=>" mapping instead? Something
>> >>>>>>>>>>>>> like
>> >>>>>>>>>>>>> hotel de ville => mairie
>> >>>>>>>>>>>>> might work for you.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Best
>> >>>>>>>>>>>>> Erick
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> On Tue, Apr 10, 2012 at 1:41 AM, elisabeth benoit
>> >>>>>>>>>>>>> <elisaelisael...@gmail.com> wrote:
>> >>>>>>>>>>>>>> Hello,
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> I've read several post on this issue, but can't find a real solution to my
>> >>>>>>>>>>>>>> multi-words synonyms matching problem.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> I have in my synonyms.txt an entry like
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> mairie, hotel de ville
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> and my index time analyzer is configured as followed for synonyms.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>> >>>>>>>>>>>>>> ignoreCase="true" expand="true"/>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> The problem I have is that now "mairie" matches with "hotel" and I would
>> >>>>>>>>>>>>>> only want "mairie" to match with "hotel de ville" and "mairie".
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> When I look into the analyzer, I see that "mairie" is mapped into "hotel",
>> >>>>>>>>>>>>>> and words "de ville" are added in second and third position. To change
>> >>>>>>>>>>>>>> that, I tried to do
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>> >>>>>>>>>>>>>> ignoreCase="true" expand="true"
>> >>>>>>>>>>>>>> tokenizerFactory="solr.KeywordTokenizerFactory"/> (as I read in one post)
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> and I can see now in the analyzer that "mairie" is mapped to "hotel de
>> >>>>>>>>>>>>>> ville", but now when I have query "hotel de ville", it doesn't match at all
>> >>>>>>>>>>>>>> with "mairie".
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Anyone has a clue of what I'm doing wrong?
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> I'm using Solr 3.4.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Thanks,
>> >>>>>>>>>>>>>> Elisabeth
>> >>
>> >> --
>> >> *************************************************************
>> >> Bernd Fehling                Universitätsbibliothek Bielefeld
>> >> Dipl.-Inform. (FH)           Universitätsstr. 25
>> >> Tel. +49 521 106-4060        Fax. +49 521 106-4052
>> >> bernd.fehl...@uni-bielefeld.de   33615 Bielefeld
>> >>
>> >> BASE - Bielefeld Academic Search Engine - www.base-search.net
>> >> *************************************************************
>>
>> --
>> *************************************************************
>> Bernd Fehling                Universitätsbibliothek Bielefeld
>> Dipl.-Inform. (FH)           Universitätsstr. 25
>> Tel. +49 521 106-4060        Fax. +49 521 106-4052
>> bernd.fehl...@uni-bielefeld.de   33615 Bielefeld
>>
>> BASE - Bielefeld Academic Search Engine - www.base-search.net
>> *************************************************************

--
Lance Norskog
goks...@gmail.com