Ok, thanks a lot.

After making a few tests, I finally understood what you meant.

Best regards,
Elisabeth

2011/5/2 Jonathan Rochkind <rochk...@jhu.edu>

> So if you have a field that IS tokenized, regardless of what it's called,
> then when you send "My Great Restaurant" to it for _indexing_, it gets
> _tokenized upon indexing_ to separate tokens:  "My", "Great", "Restaurant".
>  Depending on what other analysis you have, it may get further analyzed,
> perhaps to: "my", "great", "restaurant".
>
> You don't need to separate the text into tokens yourself before sending it
> to Solr for indexing. If you define the field using a tokenizer, Solr will
> do that when you index. This is a VERY common thing to do with Solr; pretty
> much any field that you want to be effectively searchable you have Solr
> tokenize like this.
>
> Solr pretty much always matches on individual tokens; that's the
> fundamental way Solr works.
> Those separate tokens are what allow you to SEARCH on the field, and get a
> match on "my" or on "restaurant".   If the field were non-tokenized, you'd
> ONLY get a hit if the user entered "My Great Restaurant" (and really not
> even then unless you take other actions, because of the way Solr query
> parsers work you'll have trouble getting ANY hits to a user-entered search
> with the 'lucene' or 'dismax' query parsers if you don't tokenize).
>
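A toy model of the point above, assuming a very simple analyzer (split on non-word characters, then lowercase). This is only a sketch of the idea, not how Solr or Lucene is implemented; `tokenize`, `build_index` and `search` are hypothetical helper names.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Toy analyzer: split on non-word characters, then lowercase."""
    return [t.lower() for t in re.findall(r"\w+", text)]


def build_index(docs, tokenized):
    """Map each indexed term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        # An untokenized field is indexed as one single term: the whole value.
        terms = tokenize(value) if tokenized else [value]
        for term in terms:
            index[term].add(doc_id)
    return dict(index)


def search(index, term):
    """Return the sorted ids of documents indexed under exactly this term."""
    return sorted(index.get(term, ()))


docs = {1: "My Great Restaurant", 2: "My Hotel"}
tokenized_index = build_index(docs, tokenized=True)
string_index = build_index(docs, tokenized=False)

print(search(tokenized_index, "restaurant"))       # matches document 1
print(search(string_index, "restaurant"))          # no match at all
print(search(string_index, "My Great Restaurant")) # only the exact value matches
```

In the untokenized index the only terms are the complete field values, which is why a single word finds nothing there.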
> That tokenized field won't facet very well though -- if you faceted on a
> tokenized field with that example entered in it, you'd get a facet "my"
> with that item in it, and another facet "great" with that item in it, and
> another facet "restaurant" with that item in it.
>
> Which is why you likely want to use a separate _untokenized_ field for
> faceting. Which is why you end up wanting/needing two separate fields --
> one that is tokenized for searching, and one that is not tokenized (and
> usually not analyzed at all) for faceting.
>
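A rough illustration of why the two fields facet so differently. It assumes whitespace tokenization plus lowercasing for the tokenized field, and the three sample documents are invented for the example; `facet_counts` is a hypothetical helper, not a Solr API.

```python
from collections import Counter

# Invented sample data: each document has a multi-valued CATEGORY field.
docs = [
    {"id": 1, "CATEGORY": ["Restaurant Hotel"]},
    {"id": 2, "CATEGORY": ["Restaurant"]},
    {"id": 3, "CATEGORY": ["Restaurant Hotel", "Health Spa"]},
]


def facet_counts(docs, tokenized):
    """Count, for every indexed term, how many documents contain it."""
    counts = Counter()
    for doc in docs:
        terms = set()
        for value in doc["CATEGORY"]:
            if tokenized:
                terms.update(value.lower().split())
            else:
                terms.add(value)
        # A document is counted once per distinct term it contains.
        counts.update(terms)
    return counts


untokenized = facet_counts(docs, tokenized=False)
tokenized = facet_counts(docs, tokenized=True)

print(dict(untokenized))  # facets are whole category values
print(dict(tokenized))    # facets are single words
```

The untokenized counts keep "Restaurant Hotel" as one facet value, while the tokenized counts break it into "restaurant" and "hotel", which is rarely what you want to show to a user.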
> Hope this helps.
>
>
> On 5/2/2011 2:43 AM, elisabeth benoit wrote:
>
>> I'm a bit confused here.
>>
>> What is the difference between CATEGORY and CATEGORY_TOKENIZED if I just
>> do a copyField from one field to another? And how can I search only for
>> Restaurant (fq=CATEGORY_TOKENIZED:Restaurant)? Shouldn't I have something
>> like
>> <field name="CATEGORY_TOKENIZED">Hotel</field>, if I want this to work?
>> And from what I understand, this means I should do more than just copy
>> <field name="CATEGORY">Restaurant Hotel</field>
>> to CATEGORY_TOKENIZED.
>>
>> Thanks,
>> Elisabeth
>>
>>
>> 2011/4/28 Erick Erickson<erickerick...@gmail.com>
>>
>>  See below:
>>>
>>>
>>> On Thu, Apr 28, 2011 at 9:03 AM, elisabeth benoit
>>> <elisaelisael...@gmail.com>  wrote:
>>>
>>>> yes, the multivalued field is not broken up into tokens.
>>>>
>>>> so, if I understand well what you mean, I could have
>>>>
>>>> a field CATEGORY with  multiValued="true"
>>>> a field CATEGORY_TOKENIZED with  multiValued="true"
>>>>
>>>> and then some POI
>>>>
>>>> <field name="NAME">POI_Name</field>
>>>> ...
>>>> <field name="CATEGORY">Restaurant Hotel</field>
>>>> <field name="CATEGORY_TOKENIZED">Restaurant</field>
>>>> <field name="CATEGORY_TOKENIZED">Hotel</field>
>>>>
>>> [EOE] If the above is the document you're sending, then no. The
>>> document would be indexed with
>>> <field name="CATEGORY">Restaurant Hotel</field>
>>> <field name="CATEGORY_TOKENIZED">Restaurant Hotel</field>
>>>
>>>
>>> Or even just:
>>> <field name="CATEGORY">Restaurant Hotel</field>
>>>
>>> and set up a <copyField> to copy the value from CATEGORY to
>>> CATEGORY_TOKENIZED.
>>>
>>> The multiValued part comes from:
>>> "And a single POIs might have different categories so your document could
>>> have"
>>> which would look like:
>>> <field name="CATEGORY">Restaurant Hotel</field>
>>> <field name="CATEGORY">Health Spa</field>
>>> <field name="CATEGORY">Dance Hall</field>
>>>
>>> and your document would be counted for each of those entries while
>>> searches
>>> against CATEGORY_TOKENIZED would match things like "dance" "spa" etc.
>>>
>>> But do notice that if you did NOT want a search for "restaurant hall"
>>> (no quotes) to match, then you could do proximity searches for less than
>>> your increment gap, e.g. (this time with the quotes)
>>> "restaurant hall"~50, which would then NOT match if your increment gap
>>> were 100.
>>>
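To make the increment gap concrete, here is a simplified model. It assumes that the first token of each later value is pushed `gap` extra positions past the last token of the previous value, and it treats a proximity match as "the second word follows the first with at most `slop` positions in between". Real sloppy phrase matching in Lucene is more general than this, so read it as a sketch only; both function names are hypothetical.

```python
def index_positions(values, gap):
    """Assign token positions across the values of one multiValued field."""
    positions = {}
    pos = -1
    for i, value in enumerate(values):
        if i > 0:
            pos += gap  # jump ahead between separate values (assumed behaviour)
        for token in value.lower().split():
            pos += 1
            positions.setdefault(token, []).append(pos)
    return positions


def sloppy_match(positions, first, second, slop):
    """True if `second` follows `first` with at most `slop` positions between."""
    return any(
        0 <= b - a - 1 <= slop
        for a in positions.get(first, [])
        for b in positions.get(second, [])
    )


values = ["Restaurant Hotel", "Health Spa", "Dance Hall"]
with_gap = index_positions(values, gap=100)
no_gap = index_positions(values, gap=0)

print(with_gap)
print(sloppy_match(with_gap, "restaurant", "hall", 50))  # words are in different values
print(sloppy_match(with_gap, "dance", "hall", 50))       # words are in the same value
print(sloppy_match(no_gap, "restaurant", "hall", 50))    # no gap to separate the values
```

With a gap of 100, "restaurant" and "hall" end up far more than 50 positions apart, so the ~50 proximity search only matches words that come from the same value.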
>>> Best
>>> Erick
>>>
>>>
>>>> do faceting on CATEGORY and fq on CATEGORY_TOKENIZED.
>>>>
>>>> But then, wouldn't it be possible to do faceting on CATEGORY_TOKENIZED?
>>>>
>>>> Best regards
>>>> Elisabeth
>>>>
>>>>
>>>> 2011/4/28 Erick Erickson<erickerick...@gmail.com>
>>>>
>>>>> So, I assume your CATEGORY field is multiValued but each value is not
>>>>> broken up into tokens, right? If that's the case, would it work to have
>>>>> a second field CATEGORY_TOKENIZED and run your fq against that
>>>>> field instead?
>>>>>
>>>>> You could have this be a multiValued field with an increment gap if you
>>>>> wanted to prevent matches across separate entries and have your fq do a
>>>>> proximity search where the proximity was less than the increment gap....
>>>>>
>>>>> Best
>>>>> Erick
>>>>>
>>>>> On Thu, Apr 28, 2011 at 6:03 AM, elisabeth benoit
>>>>> <elisaelisael...@gmail.com>  wrote:
>>>>>
>>>>>> Hi Stefan,
>>>>>>
>>>>>> Thanks for answering.
>>>>>>
>>>>>> In more detail, my problem is the following. I'm working on searching
>>>>>> points of interest (POIs), which can be hotels, restaurants, plumbers,
>>>>>> psychologists, etc.
>>>>>>
>>>>>> Those POIs can be identified, among other things, by categories or by
>>>>>> brand. And a single POI might have different categories (no maximum
>>>>>> number). A user might enter a query like
>>>>>>
>>>>>>
>>>>>> McDonald’s Paris
>>>>>>
>>>>>>
>>>>>> or
>>>>>>
>>>>>>
>>>>>> Restaurant Paris
>>>>>>
>>>>>>
>>>>>> or
>>>>>>
>>>>>>
>>>>>> many other possible queries
>>>>>>
>>>>>>
>>>>>> First I want to do a facet search on brand and categories, to find out
>>>>>> which case applies to the current query.
>>>>>>
>>>>>>
>>>>>> http://localhost:8080/solr/select?q=restaurant paris
>>>>>> &facet=true&facet.field=BRAND&facet.field=CATEGORY
>>>>>>
>>>>>> and get an answer like
>>>>>>
>>>>>> <lst name="facet_fields">
>>>>>>
>>>>>> <lst name="CATEGORY">
>>>>>>
>>>>>> <int name="Restaurant">598</int>
>>>>>>
>>>>>> <int name="Restaurant Hotel">451</int>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Then I want to send a request with fq=CATEGORY:Restaurant and still
>>>>>> get answers with CATEGORY = Restaurant Hotel.
>>>>>>
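The two requests could be assembled roughly as below. This is a sketch that assumes the host and path from the URL in this message and the CATEGORY_TOKENIZED copy field suggested elsewhere in this thread; `facet_url` and `filter_url` are hypothetical helpers. It only builds the URLs and does not contact a server.

```python
from urllib.parse import urlencode

# Assumption: same host, port and handler as in the request quoted above.
BASE = "http://localhost:8080/solr/select"


def facet_url(query):
    """First request: run the query and facet on BRAND and CATEGORY."""
    params = [
        ("q", query),
        ("facet", "true"),
        ("facet.field", "BRAND"),
        ("facet.field", "CATEGORY"),
    ]
    return BASE + "?" + urlencode(params)


def filter_url(query, category_word):
    """Second request: filter on one word of the tokenized category field."""
    params = [
        ("q", query),
        ("fq", "CATEGORY_TOKENIZED:" + category_word),
    ]
    return BASE + "?" + urlencode(params)


print(facet_url("restaurant paris"))
print(filter_url("restaurant paris", "Restaurant"))
```

`urlencode` takes care of the space in the query and the colon in the filter. A category word that contains query-syntax characters would still need escaping for the query parser, which this sketch does not attempt.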
>>>>>>
>>>>>>
>>>>>> One solution would be to modify the data to add a new document every
>>>>>> time we have a new category, so a POI with three different categories
>>>>>> would be indexed three times, each time with a different category.
>>>>>>
>>>>>>
>>>>>> But I was wondering if there was another way around.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks again,
>>>>>>
>>>>>> Elisabeth
>>>>>>
>>>>>>
>>>>>> 2011/4/28 Stefan Matheis<matheis.ste...@googlemail.com>
>>>>>>
>>>>>>  Hi Elisabeth,
>>>>>>>
>>>>>>> that's not what filter queries are made for :) What speaks against
>>>>>>> using that criterion in the query itself?
>>>>>>> Perhaps you want to describe your UseCase and we'll see if there's
>>>>>>> another way to solve it?
>>>>>>>
>>>>>>> Regards
>>>>>>> Stefan
>>>>>>>
>>>>>>> On Thu, Apr 28, 2011 at 9:09 AM, elisabeth benoit
>>>>>>> <elisaelisael...@gmail.com>  wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I would like to know if there is a way to use the fq parameter with
>>>>>>>> a partial value.
>>>>>>>>
>>>>>>>> For instance, if I have a request with fq=NAME:Joe, I would like to
>>>>>>>> retrieve all answers where NAME contains Joe, including those with
>>>>>>>> NAME = Joe Smith.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Elisabeth
>>>>>>>>
>>>>>>>>
