Ok, thanks a lot. After making a few tests, I finally understood what you meant.
Best regards,
Elisabeth

2011/5/2 Jonathan Rochkind <rochk...@jhu.edu>

> So if you have a field that IS tokenized, regardless of what it's called,
> then when you send "My Great Restaurant" to it for _indexing_, it gets
> _tokenized upon indexing_ to separate tokens: "My", "Great", "Restaurant".
> Depending on what other analysis you have, it may get further analyzed,
> perhaps to: "my", "great", "restaurant".
>
> You don't need to separate it into tokens yourself before sending it to
> Solr for indexing; if you define the field using a tokenizer, Solr will do
> that when you index. This is a VERY common thing to do with Solr; pretty
> much any field that you want to be effectively searchable you have Solr
> tokenize like this.
>
> Solr pretty much always matches on individual tokens; that's the
> fundamental way Solr works.
> Those separate tokens are what allow you to SEARCH on the field, and get a
> match on "my" or on "restaurant". If the field were non-tokenized, you'd
> ONLY get a hit if the user entered "My Great Restaurant" (and really not
> even then unless you take other actions: because of the way Solr query
> parsers work, you'll have trouble getting ANY hits to a user-entered search
> with the 'lucene' or 'dismax' query parsers if you don't tokenize).
>
> That tokenized field won't facet very well though -- if you faceted on a
> tokenized field with that example entered in it, you'd get a facet "my"
> with that item in it, and another facet "great" with that item in it, and
> another facet "restaurant" with that item in it.
>
> Which is why you likely want to use a separate _untokenized_ field for
> faceting. Which is why you end up wanting/needing two separate fields --
> one that is tokenized for searching, and one that is not tokenized (and
> usually not analyzed at all) for faceting.
>
> Hope this helps.
>
>
> On 5/2/2011 2:43 AM, elisabeth benoit wrote:
>
>> I'm a bit confused here.
>>
>> What is the difference between CATEGORY and CATEGORY_TOKENIZED if I just
>> do a copyField from one field to another? And how can I search only for
>> Restaurant (fq=CATEGORY_TOKENIZED:Restaurant)? Shouldn't I have something
>> like
>> <field name="CATEGORY_TOKENIZED">Hotel</field>, if I want this to work?
>> And from what I understand, this means I should do more than just copy
>> <field name="CATEGORY">Restaurant Hotel</field>
>> to CATEGORY_TOKENIZED.
>>
>> Thanks,
>> Elisabeth
>>
>>
>> 2011/4/28 Erick Erickson <erickerick...@gmail.com>
>>
>>> See below:
>>>
>>>
>>> On Thu, Apr 28, 2011 at 9:03 AM, elisabeth benoit
>>> <elisaelisael...@gmail.com> wrote:
>>>
>>>> yes, the multivalued field is not broken up into tokens.
>>>>
>>>> so, if I understand well what you mean, I could have
>>>>
>>>> a field CATEGORY with multiValued="true"
>>>> a field CATEGORY_TOKENIZED with multiValued="true"
>>>>
>>>> and then some POI
>>>>
>>>> <field name="NAME">POI_Name</field>
>>>> ...
>>>> <field name="CATEGORY">Restaurant Hotel</field>
>>>> <field name="CATEGORY_TOKENIZED">Restaurant</field>
>>>> <field name="CATEGORY_TOKENIZED">Hotel</field>
>>>>
>>> [EOE] If the above is the document you're sending, then no. The
>>> document would be indexed with
>>> <field name="CATEGORY">Restaurant Hotel</field>
>>> <field name="CATEGORY_TOKENIZED">Restaurant Hotel</field>
>>>
>>>
>>> Or even just:
>>> <field name="CATEGORY">Restaurant Hotel</field>
>>>
>>> and set up a <copyField> to copy the value from CATEGORY to
>>> CATEGORY_TOKENIZED.
>>>
>>> The multiValued part comes from:
>>> "And a single POIs might have different categories so your document could
>>> have"
>>> which would look like:
>>> <field name="CATEGORY">Restaurant Hotel</field>
>>> <field name="CATEGORY">Health Spa</field>
>>> <field name="CATEGORY">Dance Hall</field>
>>>
>>> and your document would be counted for each of those entries, while
>>> searches against CATEGORY_TOKENIZED would match things like "dance",
>>> "spa", etc.
>>>
>>> But do notice that if you did NOT want a search for "restaurant hall"
>>> (no quotes) to match, then you could do proximity searches for less than
>>> your increment gap, e.g. (this time with the quotes)
>>> "restaurant hall"~50, which would then NOT match if your increment gap
>>> were 100.
>>>
>>> Best
>>> Erick
>>>
>>>
>>>> do faceting on CATEGORY and fq on CATEGORY_TOKENIZED.
>>>>
>>>> But then, wouldn't it be possible to do faceting on CATEGORY_TOKENIZED?
>>>>
>>>> Best regards
>>>> Elisabeth
>>>>
>>>>
>>>> 2011/4/28 Erick Erickson <erickerick...@gmail.com>
>>>>
>>>>> So, I assume your CATEGORY field is multiValued but each value is not
>>>>> broken up into tokens, right? If that's the case, would it work to have
>>>>> a second field CATEGORY_TOKENIZED and run your fq against that
>>>>> field instead?
>>>>>
>>>>> You could have this be a multiValued field with an increment gap if you
>>>>> wanted to prevent matches across separate entries, and have your fq do a
>>>>> proximity search where the proximity was less than the increment gap....
>>>>>
>>>>> Best
>>>>> Erick
>>>>>
>>>>> On Thu, Apr 28, 2011 at 6:03 AM, elisabeth benoit
>>>>> <elisaelisael...@gmail.com> wrote:
>>>>>
>>>>>> Hi Stefan,
>>>>>>
>>>>>> Thanks for answering.
>>>>>>
>>>>>> In more detail, my problem is the following. I'm working on searching
>>>>>> points of interest (POIs), which can be hotels, restaurants, plumbers,
>>>>>> psychologists, etc.
>>>>>>
>>>>>> Those POIs can be identified among other things by categories or by
>>>>>> brand. And a single POI might have different categories (no maximum
>>>>>> number). A user might enter a query like
>>>>>>
>>>>>> McDonald’s Paris
>>>>>>
>>>>>> or
>>>>>>
>>>>>> Restaurant Paris
>>>>>>
>>>>>> or
>>>>>>
>>>>>> many other possible queries
>>>>>>
>>>>>> First I want to do a facet search on brand and categories, to find out
>>>>>> which case is the current case.
>>>>>>
>>>>>> http://localhost:8080/solr/select?q=restaurant paris&facet=true&facet.field=BRAND&facet.field=CATEGORY
>>>>>>
>>>>>> and get an answer like
>>>>>>
>>>>>> <lst name="facet_fields">
>>>>>> <lst name="CATEGORY">
>>>>>> <int name="Restaurant">598</int>
>>>>>> <int name="Restaurant Hotel">451</int>
>>>>>>
>>>>>> Then I want to send a request with fq=CATEGORY:Restaurant and still get
>>>>>> answers with CATEGORY = Restaurant Hotel.
>>>>>>
>>>>>> One solution would be to modify the data to add a new document every
>>>>>> time we have a new category, so a POI with three different categories
>>>>>> would be indexed three times, each time with a different category.
>>>>>>
>>>>>> But I was wondering if there was another way around.
>>>>>>
>>>>>> Thanks again,
>>>>>>
>>>>>> Elisabeth
>>>>>>
>>>>>>
>>>>>> 2011/4/28 Stefan Matheis <matheis.ste...@googlemail.com>
>>>>>>
>>>>>>> Hi Elisabeth,
>>>>>>>
>>>>>>> that's not what FilterQueries are made for :) What's against using that
>>>>>>> criterion in the query?
>>>>>>> Perhaps you want to describe your use case and we'll see if there's
>>>>>>> another way to solve it?
>>>>>>>
>>>>>>> Regards
>>>>>>> Stefan
>>>>>>>
>>>>>>> On Thu, Apr 28, 2011 at 9:09 AM, elisabeth benoit
>>>>>>> <elisaelisael...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I would like to know if there is a way to use the fq parameter with
>>>>>>>> a partial value.
>>>>>>>>
>>>>>>>> For instance, I have a request with fq=NAME:Joe, and I would like to
>>>>>>>> retrieve all answers where NAME contains Joe, including those with
>>>>>>>> NAME = Joe Smith.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Elisabeth
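
For anyone reading this thread later, the two-field approach discussed above can be sketched with a small toy model in Python. This is not Solr code: it only imitates what an untokenized CATEGORY field and a tokenized CATEGORY_TOKENIZED copy would do. The sample documents, the function names and the gap of 100 are assumptions made up for illustration, and the proximity check is a simplification of how phrase slop really works in Lucene.

```python
from collections import Counter

# Hypothetical sample POIs: doc id -> values of a multiValued CATEGORY field.
DOCS = {
    1: ["Restaurant"],
    2: ["Restaurant Hotel"],
    3: ["Restaurant Hotel", "Health Spa", "Dance Hall"],
}

GAP = 100  # assumed position increment gap between separate values


def tokenize(values, gap=GAP):
    """Lowercase and whitespace-split each value into (token, position) pairs.

    Positions jump by `gap` between values, imitating an increment gap.
    """
    out, pos = [], 0
    for i, value in enumerate(values):
        if i > 0:
            pos += gap
        for tok in value.lower().split():
            out.append((tok, pos))
            pos += 1
    return out


def facet_counts(docs):
    """Facet on the untokenized field: each whole value is one bucket."""
    counts = Counter()
    for values in docs.values():
        counts.update(set(values))
    return counts


def token_facet_counts(docs):
    """Facet on the tokenized field: every token becomes its own bucket."""
    counts = Counter()
    for values in docs.values():
        counts.update({tok for tok, _ in tokenize(values)})
    return counts


def filter_by_token(docs, term):
    """Rough analogue of fq=CATEGORY_TOKENIZED:term."""
    term = term.lower()
    return sorted(
        doc_id
        for doc_id, values in docs.items()
        if any(tok == term for tok, _ in tokenize(values))
    )


def filter_by_proximity(docs, first, second, slop):
    """Rough analogue of fq=CATEGORY_TOKENIZED:"first second"~slop.

    Simplified: a doc matches when the two tokens are at most
    slop + 1 positions apart, in either order.
    """
    first, second = first.lower(), second.lower()
    hits = []
    for doc_id, values in docs.items():
        toks = tokenize(values)
        p1 = [p for t, p in toks if t == first]
        p2 = [p for t, p in toks if t == second]
        if any(abs(b - a) <= slop + 1 for a in p1 for b in p2):
            hits.append(doc_id)
    return sorted(hits)
```

With this sample data, filter_by_token(DOCS, "Restaurant") returns all three POIs, including the ones whose category is "Restaurant Hotel", while facet_counts(DOCS) still keeps "Restaurant" and "Restaurant Hotel" as separate buckets. filter_by_proximity(DOCS, "restaurant", "hall", 50) returns nothing, because those two tokens come from different values and are separated by more than the gap.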