Re: Strategies for effective prefix queries?

Jorge Luis Betancourt Gonzalez Wed, 16 Jul 2014 18:24:07 -0700

Perhaps what you’re trying to do could be addressed by using the 
EdgeNGramFilterFactory filter? For query suggestions I’m using a very similar 
approach; this is an extract of the configuration I’m using:


<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" 
generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" 
splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" maxGramSize="10" minGramSize="1"/>

Basically this allows you to get partial matches on the beginning of any word 
in the string. Let’s say the field gets this content at index time: "A brown 
fox"; this document will then be matched by the query "bro", for instance. My 
personal recommendation is to use this in a separate field that gets populated 
through a copyField, so that you can apply different boosts.
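
To make the idea concrete, here is a rough Python sketch of what ends up in the index. It is only an approximation and not Solr's actual analysis chain: the function names are made up for illustration, the regex split is a crude stand-in for the tokenizer and word delimiter filter, and it assumes the query side is only tokenized and lowercased (no n-grams).

```python
import re


def edge_ngrams(token, min_gram=1, max_gram=10):
    """Return the leading substrings of `token`, from min_gram to max_gram chars."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def index_terms(text, min_gram=1, max_gram=10):
    """Hypothetical stand-in for the index-time analyzer.

    Splits on anything that is not a letter or digit, lowercases each
    token, then expands each token into its edge n-grams.
    """
    tokens = [t.lower() for t in re.split(r"[^0-9A-Za-z]+", text) if t]
    terms = set()
    for token in tokens:
        terms.update(edge_ngrams(token, min_gram, max_gram))
    return terms


def matches(query, text):
    """True if every whitespace-separated query word is an indexed term."""
    terms = index_terms(text)
    return all(word.lower() in terms for word in query.split())


if __name__ == "__main__":
    print(sorted(index_terms("A brown fox")))
    print(matches("bro", "A brown fox"))   # "bro" is a prefix of "brown"
    print(matches("row", "A brown fox"))   # "row" is not a prefix of any word
    print(matches("bo sm", "bob-smith"))   # each query word is a prefix of a token
```

Because the prefixes are already stored as terms, the query words can be looked up as plain terms and no wildcard has to be built on the client.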

Greetings,

On Jul 16, 2014, at 2:00 PM, Hayden Muhl <haydenm...@gmail.com> wrote:

> A copy field does not address my problem, and this has nothing to do with
> stored fields. This is a query parsing problem, not an indexing problem.
> 
> Here's the use case.
> 
> If someone has a username like "bob-smith", I would like it to match
> prefixes of "bo" and "sm". I tokenize the username into the tokens "bob"
> and "smith". Everything is fine so far.
> 
> If someone enters "bo sm" as a search string, I would like "bob-smith" to
> be one of the results. The query to do this is straightforward,
> "username:bo* username:sm*". Here's the problem. In order to construct that
> query, I have to tokenize the search string "bo sm" **on the client**. I
> don't want to reimplement tokenization on the client. Is there any way to
> give Solr the string "bo sm", have Solr do the tokenization, then treat
> each token like a prefix?
> 
> 
> On Tue, Jul 15, 2014 at 4:55 PM, Alexandre Rafalovitch <arafa...@gmail.com>
> wrote:
> 
>> So copyField it to another and apply alternative processing there. Use
>> eDismax to search both. No need to store the copied field, just index it.
>> 
>> Regards,
>>     Alex
>> On 16/07/2014 2:46 am, "Hayden Muhl" <haydenm...@gmail.com> wrote:
>> 
>>> Both fields? There is only one field here: username.
>>> 
>>> 
>>> On Mon, Jul 14, 2014 at 6:17 PM, Alexandre Rafalovitch <arafa...@gmail.com>
>>> wrote:
>>> 
>>>> Search against both fields (one split, one not split)? Keep original
>>>> and tokenized form? I am doing something similar with class name
>>>> autocompletes here:
>>>> 
>>>> https://github.com/arafalov/Solr-Javadoc/blob/master/JavadocIndex/JavadocCollection/conf/schema.xml#L24
>>>> 
>>>> Regards,
>>>>   Alex.
>>>> Personal: http://www.outerthoughts.com/ and @arafalov
>>>> Solr resources: http://www.solr-start.com/ and @solrstart
>>>> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>>>> 
>>>> 
>>>> On Tue, Jul 15, 2014 at 8:04 AM, Hayden Muhl <haydenm...@gmail.com> wrote:
>>>>> I'm working on using Solr for autocompleting usernames. I'm running into a
>>>>> problem with the wildcard queries (e.g. username:al*).
>>>>> 
>>>>> We are tokenizing usernames so that a username like "solr-user" will be
>>>>> tokenized into "solr" and "user", and will match both "sol" and "use"
>>>>> prefixes. The problem is when we get "solr-u" as a prefix, I'm having to
>>>>> split that up on the client side before I construct a query "username:solr*
>>>>> username:u*". I'm basically using a regex as a poor man's tokenizer.
>>>>> 
>>>>> Is there a better way to approach this? Is there a way to tell Solr to
>>>>> tokenize a string and use the parts as prefixes?
>>>>> 
>>>>> - Hayden
>>>> 
>>> 
>> 

