Re: fq parameter with partial value

Jonathan Rochkind Mon, 02 May 2011 14:02:19 -0700

So if you have a field that IS tokenized, regardless of what it's called, then when you send "My Great Restaurant" to it for _indexing_, it gets _tokenized upon indexing_ into separate tokens: "My", "Great", "Restaurant". Depending on what other analysis you have, it may get further analyzed, perhaps to: "my", "great", "restaurant".
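
As a rough illustration of what index-time analysis does, here is a minimal Python sketch. It is a toy model, not Solr's code: the names `tokenize` and `analyze` are made up for this example, and whitespace splitting followed by lowercasing only stands in for whatever tokenizer and filters your field type actually configures.

```python
def tokenize(value):
    """Split a field value on whitespace (toy stand-in for a Solr tokenizer)."""
    return value.split()


def analyze(value):
    """Tokenize, then lowercase each token (toy stand-in for a lowercase filter)."""
    return [token.lower() for token in tokenize(value)]


# The client sends the whole string; the splitting happens at index time.
print(tokenize("My Great Restaurant"))
print(analyze("My Great Restaurant"))
```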

You don't need to separate it into tokens yourself before sending it to Solr for indexing; if you define the field using a tokenizer, Solr will do that when you index. This is a VERY common thing to do with Solr; pretty much any field that you want to be effectively searchable, you have Solr tokenize like this.

Solr pretty much always matches on individual tokens; that's the fundamental way Solr works. Those separate tokens are what allow you to SEARCH on the field, and get a match on "my" or on "restaurant". If the field were non-tokenized, you'd ONLY get a hit if the user entered "My Great Restaurant" (and really not even then unless you take other actions: because of the way Solr query parsers work, you'll have trouble getting ANY hits for a user-entered search with the 'lucene' or 'dismax' query parsers if you don't tokenize).
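
To make the "matches on individual tokens" point concrete, here is a small sketch of a toy inverted index in Python. Everything in it (`build_index`, `search`, the two sample documents) is hypothetical and heavily simplified; it only shows that a lookup hits whatever terms were stored at index time, so an untokenized field can only be hit by the complete original value.

```python
from collections import defaultdict


def build_index(docs, tokenized):
    """Map each indexed term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        # Tokenized: one lowercased term per word. Untokenized: the whole value.
        terms = value.lower().split() if tokenized else [value]
        for term in terms:
            index[term].add(doc_id)
    return index


def search(index, term):
    """Return the ids of documents whose indexed terms include `term` exactly."""
    return sorted(index.get(term, set()))


docs = {1: "My Great Restaurant", 2: "Great Hotel"}
tokenized_index = build_index(docs, tokenized=True)
untokenized_index = build_index(docs, tokenized=False)

print(search(tokenized_index, "restaurant"))             # found via the single token
print(search(untokenized_index, "restaurant"))           # no such term was indexed
print(search(untokenized_index, "My Great Restaurant"))  # only the full value matches
```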

That tokenized field won't facet very well though -- if you faceted on a tokenized field with that example entered in it, you'd get a facet "my" with that item in it, and another facet "great" with that item in it, and another facet "restaurant" with that item in it.

Which is why you likely want to use a separate _untokenized_ field for faceting. That is how you end up wanting/needing two separate fields -- one that is tokenized for searching, and one that is not tokenized (and usually not analyzed at all) for faceting.
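
The difference in facet output can be sketched in a few lines of Python. This is only a toy count over indexed terms, with a made-up `facet_counts` helper and sample category values borrowed from this thread; real Solr faceting has many more options.

```python
from collections import Counter


def facet_counts(docs, tokenized):
    """Count, for each indexed term, how many documents contain it."""
    counts = Counter()
    for values in docs:  # each document holds a list of field values
        terms = set()
        for value in values:
            terms.update(value.lower().split() if tokenized else [value])
        counts.update(terms)  # a document counts once per distinct term
    return counts


docs = [
    ["Restaurant"],
    ["Restaurant Hotel"],
    ["Restaurant Hotel", "Health Spa"],
]

# Untokenized field: one facet per complete category value.
print(facet_counts(docs, tokenized=False))
# Tokenized field: one facet per word, which is rarely what you want to display.
print(facet_counts(docs, tokenized=True))
```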


Hope this helps.

On 5/2/2011 2:43 AM, elisabeth benoit wrote:

I'm a bit confused here.

What is the difference between CATEGORY and CATEGORY_TOKENIZED if I just do
a copyField from one field to another? And how can I search only for
Restaurant (fq=CATEGORY_TOKENIZED:Restaurant)? Shouldn't I have something
like
<field name="CATEGORY_TOKENIZED">Hotel</field> if I want this to work? And
from what I understand, this means I should do more than just copy
<field name="CATEGORY">Restaurant Hotel</field>
to CATEGORY_TOKENIZED.

Thanks,
Elisabeth


2011/4/28 Erick Erickson<[email protected]>

See below:


On Thu, Apr 28, 2011 at 9:03 AM, elisabeth benoit
<[email protected]>  wrote:

yes, the multivalued field is not broken up into tokens.

so, if I understand well what you mean, I could have

a field CATEGORY with multiValued="true"
a field CATEGORY_TOKENIZED with multiValued="true"

and then some POI

<field name="NAME">POI_Name</field>
...
<field name="CATEGORY">Restaurant Hotel</field>
<field name="CATEGORY_TOKENIZED">Restaurant</field>
<field name="CATEGORY_TOKENIZED">Hotel</field>

[EOE] If the above is the document you're sending, then no. The
document would be indexed with
<field name="CATEGORY">Restaurant Hotel</field>
<field name="CATEGORY_TOKENIZED">Restaurant Hotel</field>


Or even just:
<field name="CATEGORY">Restaurant Hotel</field>

and set up a <copyField> to copy the value from CATEGORY to
CATEGORY_TOKENIZED.
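
A rough Python model of why a plain copy is enough: the copy hands the destination field the same raw value, and each field then runs its own analysis on it. The `SCHEMA` and `COPY_FIELDS` tables and the `index_document` function are invented for this sketch, and the two analyzers are toy stand-ins for an untokenized string type and a tokenized text type.

```python
def analyze_string(value):
    """Untokenized field: the whole value is the single indexed term."""
    return [value]


def analyze_text(value):
    """Tokenized field: split on whitespace and lowercase."""
    return value.lower().split()


# Hypothetical schema: field name -> analysis applied at index time.
SCHEMA = {"CATEGORY": analyze_string, "CATEGORY_TOKENIZED": analyze_text}
# Hypothetical copy rules: (source field, destination field).
COPY_FIELDS = [("CATEGORY", "CATEGORY_TOKENIZED")]


def index_document(doc):
    """Copy raw values first, then analyze every field with its own analyzer."""
    raw = {name: list(values) for name, values in doc.items()}
    for source, dest in COPY_FIELDS:
        raw.setdefault(dest, []).extend(raw.get(source, []))
    return {
        name: [term for value in values for term in SCHEMA[name](value)]
        for name, values in raw.items()
    }


# The client only sends CATEGORY; the tokenized copy is derived from it.
indexed = index_document({"CATEGORY": ["Restaurant Hotel"]})
print(indexed)
```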

The multiValued part comes from:
"And a single POI might have different categories so your document could
have"
which would look like:
<field name="CATEGORY">Restaurant Hotel</field>
<field name="CATEGORY">Health Spa</field>
<field name="CATEGORY">Dance Hall</field>

and your document would be counted for each of those entries while searches
against CATEGORY_TOKENIZED would match things like "dance" "spa" etc.

But do notice that if you did NOT want a search for "restaurant hall"
(no quotes) to match, then you could do proximity searches with a slop
less than your increment gap, e.g. (this time with the quotes)
"restaurant hall"~50, which would then NOT match if your increment gap
were 100.
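
The interaction between the increment gap and the proximity slop can be sketched in Python. This is a simplified model with made-up helpers (`index_positions`, `sloppy_match`): it only handles two terms in order, whereas Lucene's real slop is an edit distance over positions, and the exact position arithmetic in Solr may differ slightly.

```python
def index_positions(values, gap):
    """Assign a position to every token of a multiValued field.

    Positions advance by 1 inside a value and jump by an extra `gap`
    when crossing from one value to the next.
    """
    positions = {}
    pos = -1
    for i, value in enumerate(values):
        if i > 0:
            pos += gap
        for token in value.lower().split():
            pos += 1
            positions.setdefault(token, []).append(pos)
    return positions


def sloppy_match(positions, first, second, slop):
    """True if `second` follows `first` with at most `slop` positions in between."""
    return any(
        0 <= b - a - 1 <= slop
        for a in positions.get(first, [])
        for b in positions.get(second, [])
    )


values = ["Restaurant Hotel", "Health Spa", "Dance Hall"]
with_gap = index_positions(values, gap=100)
without_gap = index_positions(values, gap=0)

print(sloppy_match(with_gap, "restaurant", "hall", slop=50))     # terms are in different values
print(sloppy_match(with_gap, "dance", "hall", slop=50))          # terms are in the same value
print(sloppy_match(without_gap, "restaurant", "hall", slop=50))  # no gap, so values run together
```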

Best
Erick

do faceting on CATEGORY and fq on CATEGORY_TOKENIZED.

But then, wouldn't it be possible to do faceting on CATEGORY_TOKENIZED?

Best regards
Elisabeth


2011/4/28 Erick Erickson<[email protected]>

So, I assume your CATEGORY field is multiValued but each value is not
broken up into tokens, right? If that's the case, would it work to have
a second field CATEGORY_TOKENIZED and run your fq against that
field instead?

You could have this be a multiValued field with an increment gap if you
wanted to prevent matches across separate entries and have your fq do a
proximity search where the proximity was less than the increment gap....

Best
Erick

On Thu, Apr 28, 2011 at 6:03 AM, elisabeth benoit
<[email protected]>  wrote:

Hi Stefan,

Thanks for answering.

In more detail, my problem is the following. I'm working on searching
points of interest (POIs), which can be hotels, restaurants, plumbers,
psychologists, etc.

Those POIs can be identified, among other things, by categories or by
brand.

And a single POI might have different categories (no maximum number).

A user might enter a query like


McDonald’s Paris


or


Restaurant Paris


or


many other possible queries


First I want to do a facet search on brand and categories, to find out
which case is the current case.


http://localhost:8080/solr/select?q=restaurant paris&facet=true&facet.field=BRAND&facet.field=CATEGORY

and get an answer like

<lst name="facet_fields">

<lst name="CATEGORY">

<int name="Restaurant">598</int>

<int name="Restaurant Hotel">451</int>



Then I want to send a request with fq=CATEGORY:Restaurant and still get
answers with CATEGORY=Restaurant Hotel.
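
For reference, the two requests can be built with properly encoded parameters as in the Python sketch below. The base URL is the one from this message; the helper names `facet_url` and `filtered_url` are made up, and filtering on CATEGORY_TOKENIZED assumes the second, tokenized field proposed in the replies actually exists in the schema.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8080/solr/select"


def facet_url(query):
    """First request: run the query and facet on BRAND and CATEGORY."""
    params = [
        ("q", query),
        ("facet", "true"),
        ("facet.field", "BRAND"),
        ("facet.field", "CATEGORY"),  # repeated parameter, one per facet field
    ]
    return BASE_URL + "?" + urlencode(params)


def filtered_url(query, category_token):
    """Second request: filter on a single token of the tokenized field."""
    params = [("q", query), ("fq", "CATEGORY_TOKENIZED:" + category_token)]
    return BASE_URL + "?" + urlencode(params)


print(facet_url("restaurant paris"))
print(filtered_url("restaurant paris", "restaurant"))
```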



One solution would be to modify the data to add a new document every
time we have a new category, so a POI with three different categories
would be indexed three times, each time with a different category.


But I was wondering if there was another way around.



Thanks again,

Elisabeth


2011/4/28 Stefan Matheis<[email protected]>

Hi Elisabeth,

that's not what filter queries are made for :) What speaks against using
that criterion in the query?
Perhaps you want to describe your use case and we'll see if there's
another way to solve it?

Regards
Stefan

On Thu, Apr 28, 2011 at 9:09 AM, elisabeth benoit
<[email protected]>  wrote:

Hello,

I would like to know if there is a way to use the fq parameter with a
partial value.

For instance, if I have a request with fq=NAME:Joe, I would like to
retrieve all answers where NAME contains Joe, including those with
NAME = Joe Smith.

Thanks,
Elisabeth
