Hello Elisabeth, my synonyms.txt is like your 2nd example:
naturwald, φυσικό\ δάσος, естествена\ гора, prírodný\ les, naravni\ gozd, foresta\ naturale, natuurbos, natural\ forest, bosque\ natural, természetes\ erdő, natūralus\ miškas, prirodna\ šuma, dabiskais\ mežs, floresta\ natural, naturskov, forêt\ naturelle, naturskog, přírodní\ les, luonnonmetsä, pădure\ naturală, las\ naturalny, natürlicher\ wald

An example from my system with debugging turned on, searching for "naturwald":

<lst name="debug">
<str name="rawquerystring">naturwald</str>
<str name="querystring">naturwald</str>
<str name="parsedquery">textth:naturwald textth:"φυσικό δάσος" textth:"естествена гора" textth:"prírodný les" textth:"naravni gozd" textth:"foresta naturale" textth:natuurbos textth:"natural forest" textth:"bosque natural" textth:"természetes erdő" textth:"natūralus miškas" textth:"prirodna šuma" textth:"dabiskais mežs" textth:"floresta natural" textth:naturskov textth:"forêt naturelle" textth:naturskog textth:"přírodní les" textth:luonnonmetsä textth:"pădure naturală" textth:"las naturalny" textth:"natürlicher wald"</str>
...

As you can see, my search for "naturwald" expands to single-word and multiword synonyms, e.g. "forêt naturelle".

My SynonymFilterFactory has the following settings:

org.apache.solr.analysis.SynonymFilterFactory {tokenizerFactory=solr.KeywordTokenizerFactory, synonyms=synonyms_eurovoc_desc_desc_ufall.txt, expand=true, format=solr, ignoreCase=true, luceneMatchVersion=LUCENE_36}

But as I already mentioned, getting it running takes much more work than just using SynonymFilterFactory.

Regards
Bernd

On 23.05.2012 08:49, elisabeth benoit wrote:
> Hello Bernd,
>
> Thanks for your advice.
>
> I have one question: how did you manage to map one word to a multiword synonym???
>
> I've tried (in synonyms.txt)
>
> mairie, hotel de ville
>
> mairie, hotel\ de\ ville
>
> mairie => mairie, hotel de ville
>
> mairie => mairie, hotel\ de\ ville
>
> but nothing prevents mairie from matching with "hotel"...
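Bernd's synonyms line relies on backslash-escaped spaces, and Elisabeth tried both the escaped and the unescaped forms. The Python sketch below is a simplified, hypothetical model. The helper names are invented, and it is not Solr's own synonym parser. It splits one Solr-format line on unescaped commas, removes the escapes, and then applies one of two stand-ins for the tokenizerFactory. In this model, escaping the spaces yields the same entries as leaving them unescaped. Only the tokenizer choice decides whether "hotel de ville" stays one entry. That is consistent with Elisabeth's report that both forms still let "mairie" match "hotel". The helper render_parsed_query only imitates the shape of the parsedquery debug string above. It does not reproduce how Solr actually builds the query.

```python
def split_unescaped(text: str, sep: str = ",") -> list[str]:
    """Split text on sep, skipping separators escaped with a backslash.
    Escape pairs are kept verbatim so unescape() can handle them later."""
    parts: list[str] = []
    buf: list[str] = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            buf.append(text[i:i + 2])  # keep backslash + next char together
            i += 2
        elif text.startswith(sep, i):
            parts.append("".join(buf))
            buf = []
            i += len(sep)
        else:
            buf.append(text[i])
            i += 1
    parts.append("".join(buf))
    return parts


def unescape(entry: str) -> str:
    """Drop the backslash of each escape pair (backslash-space becomes a space)."""
    out: list[str] = []
    i = 0
    while i < len(entry):
        if entry[i] == "\\" and i + 1 < len(entry):
            out.append(entry[i + 1])
            i += 2
        else:
            out.append(entry[i])
            i += 1
    return "".join(out)


def parse_synonym_line(line: str) -> list[str]:
    """Turn one comma-separated synonyms.txt line into its entries."""
    return [unescape(p.strip()) for p in split_unescaped(line) if p.strip()]


def tokenize_entry(entry: str, keyword: bool) -> list[str]:
    """Hypothetical stand-in for the tokenizerFactory applied to each entry:
    keyword=True keeps it whole (KeywordTokenizerFactory-like),
    keyword=False splits it on white space."""
    return [entry] if keyword else entry.split()


def render_parsed_query(field: str, entries: list[str]) -> str:
    """Imitate the parsedquery shape: multiword entries shown as phrases."""
    return " ".join(f'{field}:"{e}"' if " " in e else f"{field}:{e}" for e in entries)


entries = parse_synonym_line(r"naturwald, natural\ forest, naturskov, forêt\ naturelle")
print(entries)
print(render_parsed_query("textth", entries))
```
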
>
> The only way I found is to use tokenizerFactory="solr.KeywordTokenizerFactory" in my synonyms declaration in schema.xml, but then, since "mairie" is not alone in my index field, it doesn't match.
>
> best regards,
> Elisabeth
>
> 2012/5/15 Bernd Fehling <bernd.fehl...@uni-bielefeld.de>
>
>> Without reading the whole thread, let me say that you should not trust the Solr admin analysis. It takes the whole multiword search and runs it all together at once through each analyzer step (factory). But this is not how the real system works. First pitfall: the query parser also splits at white space (if it is not a phrase query). Because of this, a multiword query is sent chunk after chunk through the analyzer and, second pitfall, each chunk runs through the whole analyzer on its own.
>>
>> So if you are dealing with multiword synonyms, you have the following problems. Either you turn your query into a phrase so that the whole phrase is analyzed at once and therefore looked up as a multiword synonym, but phrase queries are not analyzed !!! OR you send your query chunk by chunk through the analyzer, but then the chunks are not multiwords anymore and are not found in your synonyms.txt.
>>
>> From my experience I can say that it requires some deep work to get it done, but it is possible. I have connected a thesaurus to Solr which does query-time expansion (no need to reindex if the thesaurus changes). The thesaurus holds synonyms and "used for" terms in 24 languages, so it is also some kind of language translation. And naturally the thesaurus translates from single-term to multi-term synonyms and vice versa.
>>
>> Regards,
>> Bernd
>>
>>
>> On 14.05.2012 13:54, elisabeth benoit wrote:
>>> Just for the record, I'd like to conclude this thread.
>>>
>>> First, you were right, there was no behaviour difference between the fq and q parameters.
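Bernd's two pitfalls can be illustrated with a toy model in Python. Everything here is hypothetical. The synonym map, analyze and parse_query are invented for illustration and are far simpler than Lucene's query parser and SynonymFilter. The only question the model answers is whether the analyzer ever sees the whole string "hotel de ville". A quoted query reaches the analyzer in one piece, so the multiword key is found. An unquoted query is split on white space first, so each chunk is analyzed alone and the key never matches. The model only covers that first step. It says nothing about how the resulting phrase is then turned into a query.

```python
# Hypothetical synonym map: a token sequence -> extra terms to emit.
SYNONYMS = {("hotel", "de", "ville"): ["mairie"]}


def analyze(text: str, synonyms: dict[tuple[str, ...], list[str]]) -> list[str]:
    """Toy analyzer: lowercase, split on white space, then append the
    synonyms of the longest token sequence that appears as a key."""
    tokens = text.lower().split()
    out: list[str] = []
    i = 0
    while i < len(tokens):
        for n in range(len(tokens) - i, 0, -1):  # try the longest match first
            key = tuple(tokens[i:i + n])
            if key in synonyms:
                out.extend(tokens[i:i + n])
                out.extend(synonyms[key])
                i += n
                break
        else:  # no key starts at this token
            out.append(tokens[i])
            i += 1
    return out


def parse_query(query: str, synonyms: dict[tuple[str, ...], list[str]]) -> list[list[str]]:
    """Mimic the behaviour Bernd describes: a quoted phrase goes to the
    analyzer in one piece; otherwise the query is split on white space
    first and each chunk is analyzed on its own."""
    q = query.strip()
    if len(q) >= 2 and q[0] == q[-1] == '"':
        return [analyze(q[1:-1], synonyms)]
    return [analyze(chunk, synonyms) for chunk in q.split()]


print(parse_query('"hotel de ville"', SYNONYMS))  # one chunk: synonym found
print(parse_query("hotel de ville", SYNONYMS))    # three chunks: no synonym
```
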
>>>
>>> I realized that:
>>>
>>> 1) my synonym (hotel de ville) has a stopword in it (de), and since I used tokenizerFactory="solr.KeywordTokenizerFactory" in my synonyms declaration, there was no stopword removal in the indexed expression. So when requesting "hotel de ville", after stopword removal in the query, Solr was comparing "hotel de ville" with "hotel ville"
>>>
>>> but my queries never even got to that point, since
>>>
>>> 2) I made a mistake using "mairie" alone in the admin interface when testing my schema. The real field was something like "collectivités territoriales mairie", so the synonym "hotel de ville" was not even applied, because of the tokenizerFactory="solr.KeywordTokenizerFactory" in my synonym definition not splitting the field into words when parsing.
>>>
>>> So my problem is not solved, and I'm considering solving it outside of Solr's scope, unless someone else has a clue.
>>>
>>> Thanks again,
>>> Elisabeth
>>>
>>>
>>> 2012/4/25 Erick Erickson <erickerick...@gmail.com>
>>>
>>>> A little farther down the debug info output you'll find something like this (I specified fq=name:features)
>>>>
>>>> <arr name="parsed_filter_queries">
>>>> <str>name:features</str>
>>>> </arr>
>>>>
>>>> so it may well give you some clue. But unless I'm reading things wrong, your q is going against a field that has much more information than the CATEGORY_ANALYZED field. Is it possible that the data from your test cases simply isn't _in_ CATEGORY_ANALYZED?
>>>>
>>>> Best
>>>> Erick
>>>>
>>>> On Wed, Apr 25, 2012 at 9:39 AM, elisabeth benoit
>>>> <elisaelisael...@gmail.com> wrote:
>>>>> I'm not at the office until next Wednesday, and I don't have my Solr at hand, but doesn't debugQuery=on give information only about q parameter matching and nothing about the fq parameter? Or do you mean "parsed_filter_queries" gives information about fq?
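Point 1) of Elisabeth's 14.05 conclusion (the stopword inside the synonym) can be sketched the same way. This is a hypothetical, position-free model. The stopword set, the synonym map and the helpers are invented for illustration. The field is the single token "mairie", as in her admin-interface test. On the index side, the synonym is kept as one term, so the stop filter never sees "de" on its own. On the query side, the phrase is split and "de" is dropped. The two sides therefore never line up.

```python
STOPWORDS = {"de"}                         # hypothetical stopword list
SYNONYMS = {"mairie": ["hotel de ville"]}  # synonym kept whole, KeywordTokenizer-style


def index_terms(value: str) -> list[str]:
    """Index side: split on white space, add each token's synonyms as single
    terms, then drop stopwords. "hotel de ville" survives intact because
    the stop filter only ever sees it as one term."""
    terms: list[str] = []
    for tok in value.lower().split():
        terms.append(tok)
        terms.extend(SYNONYMS.get(tok, []))
    return [t for t in terms if t not in STOPWORDS]


def query_terms(phrase: str) -> list[str]:
    """Query side: the phrase is split into words and stopwords are removed."""
    return [t for t in phrase.lower().split() if t not in STOPWORDS]


def contains_sequence(terms: list[str], wanted: list[str]) -> bool:
    """True if wanted occurs as a contiguous run in terms (positions simplified)."""
    n = len(wanted)
    return any(terms[i:i + n] == wanted for i in range(len(terms) - n + 1))


indexed = index_terms("mairie")
asked = query_terms("hotel de ville")
print(indexed, asked, contains_sequence(indexed, asked))
```
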
>>>>>
>>>>> CATEGORY_ANALYZED is populated by a copyField instruction in schema.xml, and has the same field type as my catchall field, the search field for my searchHandler (the one used by the q parameter).
>>>>>
>>>>> CATEGORY (a string) is copied into CATEGORY_ANALYZED (field type is text).
>>>>>
>>>>> CATEGORY (a string) is copied into the catchall field (field type is text), and a lot of other fields are copied into that catchall field too.
>>>>>
>>>>> So as far as I can see, the same analysis should be done in both cases, but obviously I'm missing something, and the only thing I can think of is a different behavior between the q and fq parameters.
>>>>>
>>>>> I'll check that parsed_filter_queries first thing in the morning next Wednesday.
>>>>>
>>>>> Thanks a lot for your help.
>>>>>
>>>>> Elisabeth
>>>>>
>>>>>
>>>>> 2012/4/24 Erick Erickson <erickerick...@gmail.com>
>>>>>
>>>>>> Elisabeth:
>>>>>>
>>>>>> What shows up in the debug section of the response when you add &debugQuery=on? There should be some bit of that section like: "parsed_filter_queries"
>>>>>>
>>>>>> My other question is "are you absolutely sure that your CATEGORY_ANALYZED field has the correct content?". How does it get populated?
>>>>>>
>>>>>> Nothing jumps out at me here....
>>>>>>
>>>>>> Best
>>>>>> Erick
>>>>>>
>>>>>> On Tue, Apr 24, 2012 at 9:55 AM, elisabeth benoit
>>>>>> <elisaelisael...@gmail.com> wrote:
>>>>>>> yes, thanks, but this is NOT my question.
>>>>>>>
>>>>>>> I was wondering why I have multiple matches with q="hotel de ville" and no match with fq=CATEGORY_ANALYZED:"hotel de ville", since in both cases I'm searching in the same Solr fieldType.
>>>>>>>
>>>>>>> Why is the q parameter behaving differently in that case? Why do the quotes work in one case and not in the other?
>>>>>>>
>>>>>>> Does anyone know?
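Erick's suggestion amounts to sending the same request with debugQuery=on and reading the debug section. That section shows parsedquery for q and parsed_filter_queries for each fq. The minimal Python sketch below only builds such a request URL and sends nothing. The host and the /solr/select path are assumptions for a default local install. The field name CATEGORY_ANALYZED is taken from the thread.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def debug_url(base: str, q: str, filters: list[str]) -> str:
    """Build a select URL with debugQuery=on and one fq parameter per filter."""
    params = [("q", q), ("debugQuery", "on")] + [("fq", f) for f in filters]
    return base + "?" + urlencode(params)  # quotes, colons and spaces get encoded


url = debug_url(
    "http://localhost:8983/solr/select",  # assumed host and handler path
    '"hotel de ville"',
    ['CATEGORY_ANALYZED:"hotel de ville"'],
)
print(url)
```

Repeating the ("fq", …) pair keeps several filter queries separate, which matches how the thread writes them (&fq=…&fq=…).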
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Elisabeth
>>>>>>>
>>>>>>> 2012/4/24 Jeevanandam <je...@myjeeva.com>
>>>>>>>
>>>>>>>> usage of q and fq
>>>>>>>>
>>>>>>>> q => is typically the main query for the search request
>>>>>>>>
>>>>>>>> fq => is the Filter Query; generally used to restrict the superset of documents without influencing the score (more info: http://wiki.apache.org/solr/CommonQueryParameters#q)
>>>>>>>>
>>>>>>>> For example:
>>>>>>>> ------------
>>>>>>>> q="hotel de ville" ===> returns 100 documents
>>>>>>>>
>>>>>>>> q="hotel de ville"&fq=price:[100 TO *]&fq=roomType:"King size Bed" ===> returns 40 documents from the superset of 100 documents
>>>>>>>>
>>>>>>>> hope this helps!
>>>>>>>>
>>>>>>>> - Jeevanandam
>>>>>>>>
>>>>>>>>
>>>>>>>> On 24-04-2012 3:08 pm, elisabeth benoit wrote:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I'd like to come back to this thread.
>>>>>>>>>
>>>>>>>>> The only way I found to keep synonyms in synonyms.txt from being split into words is to use the line
>>>>>>>>>
>>>>>>>>> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" tokenizerFactory="solr.KeywordTokenizerFactory"/>
>>>>>>>>>
>>>>>>>>> in schema.xml,
>>>>>>>>>
>>>>>>>>> where tokenizerFactory="solr.KeywordTokenizerFactory" instructs SynonymFilterFactory not to break synonyms into words on white space when parsing the synonyms file.
>>>>>>>>>
>>>>>>>>> So now it works fine: "mairie" is mapped to "hotel de ville", and when I send the request q="hotel de ville" (the quotes are mandatory to prevent the analyzer from splitting hotel de ville on white space), I get answers with the word "mairie".
>>>>>>>>>
>>>>>>>>> But when I use the fq parameter (fq=CATEGORY_ANALYZED:"hotel de ville"), it doesn't work!!!
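Jeevanandam's description of q versus fq can be modelled in a few lines of Python. This is a toy sketch under stated assumptions. The documents, the term-count score and the search helper are hypothetical, and real Solr scoring is far more involved. The sketch shows the property he describes. Each filter can only remove documents from the set matched by q, and a document's score is the same with or without filters.

```python
from typing import Any, Callable

Doc = dict[str, Any]


def search(docs: dict[str, Doc], score: Callable[[Doc], float],
           filters: list[Callable[[Doc], bool]]) -> list[tuple[str, float]]:
    """The query (score) decides which documents match and their score;
    each filter (fq) can only remove documents, never change a score."""
    hits = {doc_id: score(doc) for doc_id, doc in docs.items()}
    kept = {doc_id: s for doc_id, s in hits.items()
            if s > 0 and all(f(docs[doc_id]) for f in filters)}
    return sorted(kept.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical documents reusing the field names from Jeevanandam's example.
docs: dict[str, Doc] = {
    "a": {"text": "hotel de ville", "price": 150, "roomType": "King size Bed"},
    "b": {"text": "hotel", "price": 80, "roomType": "King size Bed"},
    "c": {"text": "hotel de ville paris", "price": 120, "roomType": "Twin"},
}
QUERY_TERMS = ["hotel", "de", "ville"]


def score(doc: Doc) -> float:
    """Toy relevance: how many query terms occur in the text field."""
    words = doc["text"].split()
    return float(sum(term in words for term in QUERY_TERMS))


unfiltered = search(docs, score, [])
filtered = search(docs, score, [
    lambda d: d["price"] >= 100,                 # roughly fq=price:[100 TO *]
    lambda d: d["roomType"] == "King size Bed",  # roughly fq=roomType:"King size Bed"
])
print(unfiltered)
print(filtered)
```
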
>>>>>>>>>
>>>>>>>>> CATEGORY_ANALYZED has the same field type as the default search field. This means that when I send q="hotel de ville" and fq=CATEGORY_ANALYZED:"hotel de ville", Solr uses the same analyzer, the one with the line
>>>>>>>>>
>>>>>>>>> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" tokenizerFactory="solr.KeywordTokenizerFactory"/>.
>>>>>>>>>
>>>>>>>>> Does anyone have a clue what is different between the q analysis behaviour and the fq analysis behaviour?
>>>>>>>>>
>>>>>>>>> Thanks a lot
>>>>>>>>> Elisabeth
>>>>>>>>>
>>>>>>>>> 2012/4/12 elisabeth benoit <elisaelisael...@gmail.com>
>>>>>>>>>
>>>>>>>>>> oh, that's right.
>>>>>>>>>>
>>>>>>>>>> thanks a lot,
>>>>>>>>>> Elisabeth
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2012/4/11 Jeevanandam Madanagopal <je...@myjeeva.com>
>>>>>>>>>>
>>>>>>>>>>> Elisabeth -
>>>>>>>>>>>
>>>>>>>>>>> As you described, the mapping below might suit your need.
>>>>>>>>>>> mairie => hotel de ville, mairie
>>>>>>>>>>>
>>>>>>>>>>> mairie gets expanded to "hotel de ville" and "mairie" at index time, so both "mairie" and "hotel de ville" are searchable on the document.
>>>>>>>>>>>
>>>>>>>>>>> However, the white space tokenizer splitting at query time will still be a problem, as described by Markus.
>>>>>>>>>>>
>>>>>>>>>>> --Jeevanandam
>>>>>>>>>>>
>>>>>>>>>>> On Apr 11, 2012, at 12:30 PM, elisabeth benoit wrote:
>>>>>>>>>>>
>>>>>>>>>>>> <<Have you tried the "=>" mapping instead? Something like
>>>>>>>>>>>> <<hotel de ville => mairie
>>>>>>>>>>>> <<might work for you.
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, thanks, I've tried it, but from what I understand it doesn't solve my problem, since this means hotel de ville will be replaced by mairie at index time (I use synonyms only at index time).
>>>>>>>>>>>> So when a user asks for "hôtel de ville", it won't match.
>>>>>>>>>>>>
>>>>>>>>>>>> In fact, at index time I have mairie in my data, but I want the user to be able to request "mairie" or "hôtel de ville" and get mairie as an answer, and not get mairie as an answer when requesting "hôtel".
>>>>>>>>>>>>
>>>>>>>>>>>> <<To map `mairie` to `hotel de ville` as a single token you must escape your white space.
>>>>>>>>>>>>
>>>>>>>>>>>> <<mairie, hotel\ de\ ville
>>>>>>>>>>>>
>>>>>>>>>>>> <<This results in a problem if your tokenizer splits on white space at query time.
>>>>>>>>>>>>
>>>>>>>>>>>> Ok, I guess this means I have a problem. There is no simple solution, since at query time my tokenizer does split on white space.
>>>>>>>>>>>>
>>>>>>>>>>>> I guess my problem is more or less one of the problems discussed in
>>>>>>>>>>>>
>>>>>>>>>>>> http://lucene.472066.n3.nabble.com/Multi-word-synonyms-td3716292.html#a3717215
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks a lot for your answers,
>>>>>>>>>>>> Elisabeth
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 2012/4/10 Erick Erickson <erickerick...@gmail.com>
>>>>>>>>>>>>
>>>>>>>>>>>>> Have you tried the "=>" mapping instead? Something like
>>>>>>>>>>>>> hotel de ville => mairie
>>>>>>>>>>>>> might work for you.
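The two explicit mappings proposed in the thread can be compared with a small Python sketch. These are Erick's "hotel de ville => mairie" and Jeevanandam's "mairie => hotel de ville, mairie". The sketch follows the synonyms.txt semantics as documented for Solr. With "=>", the left side is replaced by all alternatives on the right. A comma-only line expands to all terms with expand=true, or reduces to the first term with expand=false. The helpers parse_rule and apply_rules are hypothetical. They treat the whole field value as one entry and ignore escapes. That simplification hides exactly the query-time white space splitting that Elisabeth and Jeevanandam point out.

```python
def parse_rule(line: str, expand: bool = True) -> dict[str, list[str]]:
    """Map each left-hand term of one synonyms.txt line to its replacements.
    'a => b, c' : a is replaced by b and c (explicit mapping).
    'a, b, c'   : expand=True maps every term to all terms;
                  expand=False maps every term to the first one.
    Escapes and multi-line files are not handled in this sketch."""
    if "=>" in line:
        left, right = line.split("=>", 1)
        targets = [t.strip() for t in right.split(",")]
        return {s.strip(): targets for s in left.split(",")}
    terms = [t.strip() for t in line.split(",")]
    return {t: (terms if expand else terms[:1]) for t in terms}


def apply_rules(rules: dict[str, list[str]], value: str) -> list[str]:
    """Index-time rewrite of a whole value treated as a single entry."""
    return rules.get(value, [value])


for rule in ["hotel de ville => mairie", "mairie => hotel de ville, mairie"]:
    r = parse_rule(rule)
    print(rule, "|", apply_rules(r, "mairie"), apply_rules(r, "hotel de ville"))
```

With the first rule, an indexed "hotel de ville" only leaves "mairie" behind. That is Elisabeth's objection when synonyms are applied at index time only. The second rule keeps both terms for a document containing "mairie".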
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best
>>>>>>>>>>>>> Erick
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Apr 10, 2012 at 1:41 AM, elisabeth benoit
>>>>>>>>>>>>> <elisaelisael...@gmail.com> wrote:
>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I've read several posts on this issue, but can't find a real solution to my multi-word synonym matching problem.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have an entry like this in my synonyms.txt:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> mairie, hotel de ville
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> and my index-time analyzer is configured as follows for synonyms:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The problem I have is that now "mairie" matches with "hotel", and I would only want "mairie" to match with "hotel de ville" and "mairie".
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> When I look into the analyzer, I see that "mairie" is mapped to "hotel", and the words "de ville" are added in second and third position. To change that, I tried
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" tokenizerFactory="solr.KeywordTokenizerFactory"/> (as I read in one post)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> and I can now see in the analyzer that "mairie" is mapped to "hotel de ville", but now when I send the query "hotel de ville", it doesn't match "mairie" at all.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Does anyone have a clue what I'm doing wrong?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm using Solr 3.4.
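What Elisabeth saw in the analysis page can be reproduced with a toy position model in Python. Both helpers are hypothetical, and they only cover a field holding the single token "mairie". The model makes no claim about how real Lucene positions any tokens that follow. In the first configuration, synonyms.txt is split on white space. "hotel" is then stacked on the position of "mairie", and "de" and "ville" take the next two positions, so a query for "hotel" matches. In the second configuration, KeywordTokenizerFactory is used. The synonym then stays one term, and none of the white-space-split query chunks "hotel", "de" and "ville" can match it.

```python
def expand_token(token: str, synonyms: dict[str, list[str]],
                 keyword: bool = False) -> list[tuple[int, str]]:
    """Return (position, term) pairs for a field holding the single token
    `token`. keyword=False models a synonyms file split on white space:
    the synonym's first word shares position 0 and the rest follow.
    keyword=True keeps the synonym as one term at position 0."""
    out = [(0, token)]
    for syn in synonyms.get(token, []):
        words = [syn] if keyword else syn.split()
        out.extend((pos, word) for pos, word in enumerate(words))
    return out


def has_term(tokens: list[tuple[int, str]], term: str) -> bool:
    """True if any position holds exactly this term."""
    return any(t == term for _, t in tokens)


syn = {"mairie": ["hotel de ville"]}
split_form = expand_token("mairie", syn)
whole_form = expand_token("mairie", syn, keyword=True)
print(split_form)
print(whole_form)
```
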
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Elisabeth
>>
>> --
>> *************************************************************
>> Bernd Fehling                    Universitätsbibliothek Bielefeld
>> Dipl.-Inform. (FH)               Universitätsstr. 25
>> Tel. +49 521 106-4060            Fax. +49 521 106-4052
>> bernd.fehl...@uni-bielefeld.de   33615 Bielefeld
>>
>> BASE - Bielefeld Academic Search Engine - www.base-search.net
>> *************************************************************