Hello Bernd,

Thanks for your advice.

I have one question: how did you manage to map a single word to a multi-word
synonym?

I've tried each of the following (in synonyms.txt):

mairie, hotel de ville

mairie, hotel\ de\ ville

mairie => mairie, hotel de ville

mairie => mairie, hotel\ de\ ville

but none of these prevents "mairie" from matching "hotel" on its own.
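
To make the symptom concrete, here is a toy Python model of what the analysis
page shows for the first entry. It is only a sketch under my own assumptions,
not Solr's code: the function name expand_at_index_time is made up, and it
simply splits each synonym entry on whitespace and overlays the alternatives
starting at the position of the matched word.

```python
def expand_at_index_time(field_tokens, synonym_line):
    """Toy model (not Solr): overlay every entry of a comma-separated
    synonym line at the position of a matching single-word token."""
    # Each entry is split on whitespace, so "hotel de ville" becomes 3 tokens.
    entries = [entry.split() for entry in synonym_line.split(",")]
    single_word_keys = {entry[0] for entry in entries if len(entry) == 1}

    postings = []  # list of (position, term) pairs
    position = 0
    for token in field_tokens:
        if token in single_word_keys:
            for entry in entries:
                for offset, term in enumerate(entry):
                    postings.append((position + offset, term))
            position += max(len(entry) for entry in entries)
        else:
            postings.append((position, token))
            position += 1
    return postings


postings = expand_at_index_time(["mairie"], "mairie, hotel de ville")
indexed_terms = {term for _, term in postings}
print(postings)
print("hotel" in indexed_terms)
```

In this model "hotel" ends up as a term of its own at the same position as
"mairie", which is why a query for "hotel" alone finds the document.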

The only way I found is to use
tokenizerFactory="solr.KeywordTokenizerFactory" in my synonyms declaration
in schema.xml, but then, since "mairie" is not the only word in my indexed
field, it doesn't match.
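
The query side has its own pitfall, the one Bernd describes in the message
quoted below: an unquoted query is split on whitespace before analysis, so each
chunk is looked up separately. The following is a minimal Python sketch of that
behaviour only, under my own assumptions: expand_unquoted_query and the
dictionary are hypothetical stand-ins, not Solr's query parser or its synonym
filter.

```python
def expand_unquoted_query(query, synonyms):
    """Toy model (not Solr): split on whitespace first, then look up
    each chunk in the synonym table independently."""
    expanded = []
    for chunk in query.split():
        # A multi-word key such as "hotel de ville" can never equal a chunk.
        expanded.append([chunk] + synonyms.get(chunk, []))
    return expanded


# Hypothetical synonym table keyed by whole entries.
synonyms = {"hotel de ville": ["mairie"], "mairie": ["hotel de ville"]}

multi = expand_unquoted_query("hotel de ville", synonyms)
single = expand_unquoted_query("mairie", synonyms)
print(multi)
print(single)
```

The single word still finds its multi-word synonym, but the multi-word query
arrives as three unrelated chunks and never reaches the "hotel de ville" entry.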


best regards,
Elisabeth




2012/5/15 Bernd Fehling <bernd.fehl...@uni-bielefeld.de>

> Without reading the whole thread let me say that you should not trust
> the solr admin analysis. It takes the whole multiword search and runs
> it all together at once through each analyzer step (factory).
> But this is not how the real system works. First pitfall, the query parser
> is also splitting at white space (if not a phrase query). Due to this,
> a multiword query is sent chunk after chunk through the analyzer and,
> second pitfall, each chunk runs through the whole analyzer on its own.
>
> So if you are dealing with multiword synonyms you have the following
> problems. Either you turn your query into a phrase so that the whole
> phrase is analyzed at once and therefore looked up as multiword synonym
> but phrase queries are not analyzed !!! OR you send your query chunk
> by chunk through the analyzer but then they are not multiwords anymore
> and are not found in your synonyms.txt.
>
> From my experience I can say that it requires some deep work to get it done
> but it is possible. I have connected a thesaurus to solr which is doing
> query time expansion (no need to reindex if the thesaurus changes).
> The thesaurus holds synonyms and "used for terms" in 24 languages. So
> it is also some kind of language translation. And naturally the thesaurus
> translates from single term to multi term synonyms and vice versa.
>
> Regards,
> Bernd
>
>
> Am 14.05.2012 13:54, schrieb elisabeth benoit:
> > Just for the record, I'd like to conclude this thread
> >
> > First, you were right, there was no behaviour difference between fq and q
> > parameters.
> >
> > I realized that:
> >
> > 1) my synonym (hotel de ville) has a stopword in it (de) and since I used
> > tokenizerFactory="solr.KeywordTokenizerFactory" in my synonyms declaration,
> > there was no stopword removal in the indexed expression, so when requesting
> > "hotel de ville", after stopword removal in the query, Solr was comparing
> > "hotel de ville"
> > with "hotel ville"
> >
> > but my queries never even got to that point since
> >
> > 2) I made a mistake using "mairie" alone in the admin interface when
> > testing my schema. The real field was something like "collectivités
> > territoriales mairie",
> > so the synonym "hotel de ville" was not even applied, because of the
> > tokenizerFactory="solr.KeywordTokenizerFactory" in my synonym definition
> > not splitting field into words when parsing
> >
> > So my problem is not solved, and I'm considering solving it outside of
> > Solr's scope, unless someone else has a clue
> >
> > Thanks again,
> > Elisabeth
> >
> >
> >
> > 2012/4/25 Erick Erickson <erickerick...@gmail.com>
> >
> >> A little farther down the debug info output you'll find something
> >> like this (I specified fq=name:features)
> >>
> >> <arr name="parsed_filter_queries">
> >> <str>name:features</str>
> >> </arr>
> >>
> >>
> >> so it may well give you some clue. But unless I'm reading things wrong,
> >> your
> >> q is going against a field that has much more information than the
> >> CATEGORY_ANALYZED field, is it possible that the data from your
> >> test cases simply isn't _in_ CATEGORY_ANALYZED?
> >>
> >> Best
> >> Erick
> >>
> >> On Wed, Apr 25, 2012 at 9:39 AM, elisabeth benoit
> >> <elisaelisael...@gmail.com> wrote:
> >>> I'm not at the office until next Wednesday, and I don't have my Solr at
> >>> hand, but isn't debugQuery=on giving information only about q parameter
> >>> matching and nothing about the fq parameter? Or do you mean
> >>> "parsed_filter_queries" gives information about fq?
> >>>
> >>> CATEGORY_ANALYZED is being populated by a copyField instruction in
> >>> schema.xml, and has the same field type as my catchall field, the search
> >>> field for my searchHandler (the one being used by the q parameter).
> >>>
> >>> CATEGORY (a string) is copied into CATEGORY_ANALYZED (field type is text)
> >>>
> >>> CATEGORY (a string) is copied into the catchall field (field type is text),
> >>> and a lot of other fields are copied into that catchall field too.
> >>>
> >>> So as far as I can see, the same analysis should be done in both cases,
> >>> but obviously I'm missing something, and the only thing I can think of is a
> >>> different behavior between the q and fq parameters.
> >>>
> >>> I'll check that parsed_filter_queries first thing in the morning next
> >>> Wednesday.
> >>>
> >>> Thanks a lot for your help.
> >>>
> >>> Elisabeth
> >>>
> >>>
> >>> 2012/4/24 Erick Erickson <erickerick...@gmail.com>
> >>>
> >>>> Elisabeth:
> >>>>
> >>>> What shows up in the debug section of the response when you add
> >>>> &debugQuery=on? There should be some bit of that section like:
> >>>> "parsed_filter_queries"
> >>>>
> >>>> My other question is "are you absolutely sure that your
> >>>> CATEGORY_ANALYZED field has the correct content?". How does it
> >>>> get populated?
> >>>>
> >>>> Nothing jumps out at me here....
> >>>>
> >>>> Best
> >>>> Erick
> >>>>
> >>>> On Tue, Apr 24, 2012 at 9:55 AM, elisabeth benoit
> >>>> <elisaelisael...@gmail.com> wrote:
> >>>>> yes, thanks, but this is NOT my question.
> >>>>>
> >>>>> I was wondering why I have multiple matches with q="hotel de ville" and
> >>>>> no match with fq=CATEGORY_ANALYZED:"hotel de ville", since in both cases
> >>>>> I'm searching in the same Solr fieldType.
> >>>>>
> >>>>> Why is the q parameter behaving differently in that case? Why do the
> >>>>> quotes work in one case and not in the other?
> >>>>>
> >>>>> Does anyone know?
> >>>>>
> >>>>> Thanks,
> >>>>> Elisabeth
> >>>>>
> >>>>> 2012/4/24 Jeevanandam <je...@myjeeva.com>
> >>>>>
> >>>>>>
> >>>>>> usage of q and fq
> >>>>>>
> >>>>>> q => is typically the main query for the search request
> >>>>>>
> >>>>>> fq => is Filter Query; generally used to restrict the super set of
> >>>>>> documents without influencing score (more info:
> >>>>>> http://wiki.apache.org/solr/CommonQueryParameters#q )
> >>>>>>
> >>>>>> For example:
> >>>>>> ------------
> >>>>>> q="hotel de ville" ===> returns 100 documents
> >>>>>>
> >>>>>> q="hotel de ville"&fq=price:[100 TO *]&fq=roomType:"King size Bed" ===>
> >>>>>> returns 40 documents from super set of 100 documents
> >>>>>>
> >>>>>>
> >>>>>> hope this helps!
> >>>>>>
> >>>>>> - Jeevanandam
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 24-04-2012 3:08 pm, elisabeth benoit wrote:
> >>>>>>
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> I'd like to resume this post.
> >>>>>>>
> >>>>>>> The only way I found to avoid splitting synonyms into words in
> >>>>>>> synonyms.txt is to use the line
> >>>>>>>
> >>>>>>>  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> >>>>>>> ignoreCase="true" expand="true"
> >>>>>>> tokenizerFactory="solr.KeywordTokenizerFactory"/>
> >>>>>>>
> >>>>>>> in schema.xml
> >>>>>>>
> >>>>>>> where tokenizerFactory="solr.KeywordTokenizerFactory"
> >>>>>>>
> >>>>>>> instructs SynonymFilterFactory not to break synonyms into words on
> >>>>>>> white spaces when parsing the synonyms file.
> >>>>>>>
> >>>>>>> So now it works fine, "mairie" is mapped into "hotel de ville" and
> >>>>>>> when I send the request q="hotel de ville" (quotes are mandatory to
> >>>>>>> prevent the analyzer from splitting hotel de ville on white spaces),
> >>>>>>> I get answers with the word "mairie".
> >>>>>>>
> >>>>>>> But when I use the fq parameter
> >>>>>>> (fq=CATEGORY_ANALYZED:"hotel de ville"), it doesn't work!!!
> >>>>>>>
> >>>>>>> CATEGORY_ANALYZED is the same field type as the default search field.
> >>>>>>> This means that when I send q="hotel de ville" and
> >>>>>>> fq=CATEGORY_ANALYZED:"hotel de ville", Solr uses the same analyzer,
> >>>>>>> the one with the line
> >>>>>>>
> >>>>>>> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> >>>>>>> ignoreCase="true" expand="true"
> >>>>>>> tokenizerFactory="solr.KeywordTokenizerFactory"/>.
> >>>>>>>
> >>>>>>> Does anyone have a clue what is different between q analysis behaviour
> >>>>>>> and fq analysis behaviour?
> >>>>>>>
> >>>>>>> Thanks a lot
> >>>>>>> Elisabeth
> >>>>>>>
> >>>>>>> 2012/4/12 elisabeth benoit <elisaelisael...@gmail.com>
> >>>>>>>
> >>>>>>>  oh, that's right.
> >>>>>>>>
> >>>>>>>> thanks a lot,
> >>>>>>>> Elisabeth
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> 2012/4/11 Jeevanandam Madanagopal <je...@myjeeva.com>
> >>>>>>>>
> >>>>>>>>  Elisabeth -
> >>>>>>>>>
> >>>>>>>>> As you described, the mapping below might suit your needs.
> >>>>>>>>> mairie => hotel de ville, mairie
> >>>>>>>>>
> >>>>>>>>> mairie gets expanded to "hotel de ville" and "mairie" at index time,
> >>>>>>>>> so both "mairie" and "hotel de ville" are searchable on the document.
> >>>>>>>>>
> >>>>>>>>> However, the whitespace tokenizer splitting at query time will still
> >>>>>>>>> be a problem, as described by Markus.
> >>>>>>>>>
> >>>>>>>>> --Jeevanandam
> >>>>>>>>>
> >>>>>>>>> On Apr 11, 2012, at 12:30 PM, elisabeth benoit wrote:
> >>>>>>>>>
> >>>>>>>>>> <<Have you tried the "=>' mapping instead? Something
> >>>>>>>>>> <<like
> >>>>>>>>>> <<hotel de ville => mairie
> >>>>>>>>>> <<might work for you.
> >>>>>>>>>>
> >>>>>>>>>> Yes, thanks, I've tried it but from what I understand it doesn't
> >>>>>>>>>> solve my problem, since this means hotel de ville will be replaced by
> >>>>>>>>>> mairie at index time (I use synonyms only at index time). So when a
> >>>>>>>>>> user asks for "hôtel de ville", it won't match.
> >>>>>>>>>>
> >>>>>>>>>> In fact, at index time I have mairie in my data, but I want the user
> >>>>>>>>>> to be able to request "mairie" or "hôtel de ville" and have mairie as
> >>>>>>>>>> an answer, and not have mairie as an answer when requesting "hôtel".
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> <<To map `mairie` to `hotel de ville` as single token you must
> >>>>>>>>>> <<escape your white space.
> >>>>>>>>>>
> >>>>>>>>>> <<mairie, hotel\ de\ ville
> >>>>>>>>>>
> >>>>>>>>>> <<This results in a problem if your tokenizer splits on white space
> >>>>>>>>>> <<at query time.
> >>>>>>>>>>
> >>>>>>>>>> Ok, I guess this means I have a problem. No simple solution, since
> >>>>>>>>>> at query time my tokenizer does split on white spaces.
> >>>>>>>>>>
> >>>>>>>>>> I guess my problem is more or less one of the problems discussed in
> >>>>>>>>>>
> >>>>>>>>>> http://lucene.472066.n3.nabble.com/Multi-word-synonyms-td3716292.html#a3717215
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Thanks a lot for your answers,
> >>>>>>>>>> Elisabeth
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> 2012/4/10 Erick Erickson <erickerick...@gmail.com>
> >>>>>>>>>>
> >>>>>>>>>>> Have you tried the "=>' mapping instead? Something
> >>>>>>>>>>> like
> >>>>>>>>>>> hotel de ville => mairie
> >>>>>>>>>>> might work for you.
> >>>>>>>>>>>
> >>>>>>>>>>> Best
> >>>>>>>>>>> Erick
> >>>>>>>>>>>
> >>>>>>>>>>> On Tue, Apr 10, 2012 at 1:41 AM, elisabeth benoit
> >>>>>>>>>>> <elisaelisael...@gmail.com> wrote:
> >>>>>>>>>>>> Hello,
> >>>>>>>>>>>>
> >>>>>>>>>>>> I've read several posts on this issue, but can't find a real
> >>>>>>>>>>>> solution to my multi-word synonym matching problem.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I have in my synonyms.txt an entry like
> >>>>>>>>>>>>
> >>>>>>>>>>>> mairie, hotel de ville
> >>>>>>>>>>>>
> >>>>>>>>>>>> and my index time analyzer is configured as follows for synonyms.
> >>>>>>>>>>>>
> >>>>>>>>>>>> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> >>>>>>>>>>>> ignoreCase="true" expand="true"/>
> >>>>>>>>>>>>
> >>>>>>>>>>>> The problem I have is that now "mairie" matches with "hotel" and I
> >>>>>>>>>>>> would only want "mairie" to match with "hotel de ville" and "mairie".
> >>>>>>>>>>>>
> >>>>>>>>>>>> When I look into the analyzer, I see that "mairie" is mapped into
> >>>>>>>>>>>> "hotel", and the words "de ville" are added in second and third
> >>>>>>>>>>>> position. To change that, I tried to do
> >>>>>>>>>>>>
> >>>>>>>>>>>> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> >>>>>>>>>>>> ignoreCase="true" expand="true"
> >>>>>>>>>>>> tokenizerFactory="solr.KeywordTokenizerFactory"/> (as I read in
> >>>>>>>>>>>> one post)
> >>>>>>>>>>>>
> >>>>>>>>>>>> and I can see now in the analyzer that "mairie" is mapped to "hotel
> >>>>>>>>>>>> de ville", but now when I have the query "hotel de ville", it doesn't
> >>>>>>>>>>>> match at all with "mairie".
> >>>>>>>>>>>>
> >>>>>>>>>>>> Does anyone have a clue what I'm doing wrong?
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'm using Solr 3.4.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks,
> >>>>>>>>>>>> Elisabeth
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>
> >>
> >
>
> --
> *************************************************************
> Bernd Fehling                Universitätsbibliothek Bielefeld
> Dipl.-Inform. (FH)                        Universitätsstr. 25
> Tel. +49 521 106-4060                   Fax. +49 521 106-4052
> bernd.fehl...@uni-bielefeld.de                33615 Bielefeld
>
> BASE - Bielefeld Academic Search Engine - www.base-search.net
> *************************************************************
>
