Ah, okay!
Well, then I suggest you index the field in two different ways if you want both ways of searching. In one field, treat the entire name as a single lowercase token; then you can search for avera* and match, for instance, "average joe". In a second field, tokenize on whitespace, if you want or need that possibility as well. Look at the Solr copy fields (copyField) and try it out, it works like a charm :)
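
To make the difference between the two fields concrete, here is a small Python sketch. It does not talk to Solr at all; it only imitates the two kinds of analysis described above, and the function names and sample names are made up for illustration.

```python
def index_exact(name):
    """Whole name as a single lowercase token (keyword-style analysis)."""
    return [name.lower()]


def index_words(name):
    """Lowercase tokens split on whitespace."""
    return name.lower().split()


def prefix_match(tokens, prefix):
    """True if any indexed token starts with the lowercased prefix."""
    prefix = prefix.lower()
    return any(tok.startswith(prefix) for tok in tokens)


# Hypothetical sample data, not taken from the real dataset.
names = ["Average Joe", "Joe Average", "Carsten Larsen"]

# avera* against the single-token field: matches only names that start with it.
exact_hits = [n for n in names if prefix_match(index_exact(n), "avera")]

# A prefix containing a space can only match on the single-token field.
phrase_hits = [n for n in names if prefix_match(index_exact(n), "carsten l")]

# joe* against the whitespace-tokenized field: matches any word in the name.
word_hits = [n for n in names if prefix_match(index_words(n), "joe")]
```

The single-token field behaves like LIKE 'prefix%', while the tokenized field also finds names where the word appears later in the value.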

Cheers,
 Aleksander

On Tue, 18 Nov 2008 10:40:24 +0100, Carsten L <[EMAIL PROTECTED]> wrote:


Thanks for the quick reply!

It is supposed to work a little like Google Suggest or field
autocompletion.

I know I mentioned email and userid, but the problem lies with the name
field, because of the whitespace in combination with the wildcard.

I looked at the solr.WordDelimiterFilterFactory, but its documentation does
not mention anything about whitespace or wildcards.

A quick recap:
I would like to mimic the LIKE functionality from MySQL, using a wildcard
at the end of the search query.
In MySQL, whitespace is treated as an ordinary character, not as a "splitter".
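
One possible way to get that behaviour on the query side is sketched below in Python. It rests on assumptions: the name is indexed as a single lowercase token in a field called name_exact (a hypothetical field name), the wildcard term is not run through the analyzer (hence the client-side lowercasing), and a backslash in front of the space keeps the query parser from splitting the term.

```python
import re

# Characters with special meaning in the Lucene/Solr query syntax, plus whitespace.
_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/\s])')


def like_prefix_query(field, term):
    """Build a prefix query that mimics MySQL's  field LIKE 'term%'."""
    # Lowercase here, assuming wildcard terms bypass the index-time analyzer.
    lowered = term.strip().lower()
    # Backslash-escape special characters and whitespace in the user's input.
    escaped = _SPECIAL.sub(r'\\\1', lowered)
    return f"{field}:{escaped}*"


# "name_exact" is a hypothetical field name used only for this example.
query = like_prefix_query("name_exact", "Carsten L")
```

Escaping the user's input this way also keeps characters such as ":" or "(" from being read as query syntax.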


Aleksander M. Stensby wrote:

Hi there,

You should use LowerCaseTokenizerFactory as you point out yourself. As far as I know, the StandardTokenizer "recognizes email addresses and internet
hostnames as one token". In your case, I guess you want an email, say
"[EMAIL PROTECTED]" to be split into four tokens: average joe apache
org, or something like that, which would indeed allow you to search for
"joe" or "average j*" and match. To do so, you could use the
WordDelimiterFilterFactory and split on intra-word delimiters (I think the
defaults here are non-alphanumeric chars).
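
As a rough illustration of that splitting, here is a Python sketch. It only approximates the idea of breaking on non-alphanumeric characters; the real WordDelimiterFilterFactory has more options than this, and the address used below is a made-up placeholder.

```python
import re


def split_on_delimiters(text):
    """Lowercase the text and split it on runs of non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


# Hypothetical address, used only for illustration.
tokens = split_on_delimiters("Average.Joe@example.org")
```

With tokens like these, a search for "joe" can match the address on its own.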

Take a look at http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
for more info on tokenizers and filters.

cheers,
  Aleks

On Tue, 18 Nov 2008 08:35:31 +0100, Carsten L <[EMAIL PROTECTED]> wrote:


Hello.

The data:
I have a dataset containing ~500,000 documents.
In each document there is an email, a name and a user ID.

The problem:
I would like to be able to search in it, but it should be like the "MySQL
LIKE".

So when a user enters the search term: "carsten", then the query looks
like:
        "name:(carsten) OR name:(carsten*) OR email:(carsten) OR
email:(carsten*) OR userid:(carsten) OR userid:(carsten*)"

Then it should match:
carsten l
carsten larsen
Carsten Larsen
Carsten
CARSTEN
etc.

And when the user enters the term: "carsten l" the query looks like:
        "name:(carsten l) OR name:(carsten l*) OR email:(carsten l) OR
email:(carsten l*) OR userid:(carsten l) OR userid:(carsten l*)"

Then it should match:
carsten l
carsten larsen
Carsten Larsen

Or, written in MySQL syntax: "... WHERE `name` LIKE 'carsten%' OR
`email` LIKE 'carsten%' OR `userid` LIKE 'carsten%'..."

I know that I need to use the "solr.LowerCaseTokenizerFactory" on my name
and email fields, to ensure case-insensitive behavior.
The problem seems to be the combination of wildcards and whitespace.



--
Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no





