Re: How to effectively search inside fields that should be indexed with changing them.

Brian Carmalt Fri, 14 Dec 2007 02:27:54 -0800

Hello Otis,

The example I provided was a simplified one. The real use case is that we will have to dynamically adapt to field values whose form we cannot know in advance. So unfortunately, a custom tokenizer will not work. I changed the n-gram values to min=max=2, and I can match sub-terms inside the fields that are analyzed with the NGramTokenizer. But I haven't had the time to test it completely.
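To illustrate the min=max=2 setting, here is a minimal Python sketch of character bigrams. It is not Lucene's NGramTokenizer; the `ngrams` helper and the sample query strings are assumptions made up for illustration. It shows that every bigram of '1231' occurs among the bigrams of the title, which is why sub-term matching can work.

```python
def ngrams(text, min_n, max_n):
    """Return all character n-grams of text with lengths min_n..max_n (hypothetical helper)."""
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(text) - n + 1):
            grams.append(text[i:i + n])
    return grams


title = "ABC0001231-This is an important doc.pdf"
indexed = set(ngrams(title, 2, 2))

# Every bigram of the query is present in the indexed bigrams.
query_grams = ngrams("1231", 2, 2)
all_present = all(g in indexed for g in query_grams)

# Caveat: if bigrams are matched without regard to their positions,
# a string that is NOT a substring of the title can still match.
false_positive = all(g in indexed for g in ngrams("3123", 2, 2)) and "3123" not in title
```

Whether the position-independent false match above actually occurs depends on how the query is built (for example, as a phrase or as independent terms), so treat it as something to test rather than a certainty.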

Can you quickly outline why n-grams are not a good solution for my problem?


Thanks, Brian

Otis Gospodnetic wrote:

Brian,

This is not really a job for n-grams.  It sounds like you'll want to write a 
custom Tokenizer that has knowledge about this particular pattern, knows how to 
split input like the one in your example, and produce multiple tokens out of 
it.  For the natural language part you can probably get away with one of the 
existing tokenizers/analyzers/factories.  For the first part you'll likely want 
to extract (W+)0+ -- 1 or more letters followed by 1 or more zeros as one 
token, and then 0+(D+) -- 1 or more zeros followed by 1 or more digits.
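The splitting described above could be sketched roughly as follows. This is plain Python, not a Lucene Tokenizer, and it is only one reading of the two patterns: the regular expression, the `tokenize_title` name, the lowercasing, and the fallback for titles that do not fit the pattern are all assumptions made for illustration.

```python
import re

# Assumed layout: letters + zeros, then digits, then "-", then free text.
TITLE_PATTERN = re.compile(r"^([A-Za-z]+0+)(\d+)-(.*)$")


def tokenize_title(title):
    """Split a title into an ID prefix token, a number token, and word tokens."""
    match = TITLE_PATTERN.match(title)
    if match is None:
        # Title does not follow the assumed pattern: fall back to plain words.
        return re.findall(r"\w+", title.lower())
    prefix, number, rest = match.groups()
    return [prefix.lower(), number] + re.findall(r"\w+", rest.lower())


tokens = tokenize_title("ABC0001231-This is an important doc.pdf")
# tokens == ['abc000', '1231', 'this', 'is', 'an', 'important', 'doc', 'pdf']
```

With tokens like these, 'important' and '1231' match as whole terms and 'ABC000*' can be served by a prefix query, without generating n-grams.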

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Brian Carmalt <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Tuesday, December 11, 2007 9:17:32 AM
Subject: How to effectively search inside fields that should be indexed with 
changing them.

Hello all,
The titles of our docs have the form "ABC0001231-This is an important doc.pdf". I would like to be able to search for 'important', or '1231', or 'ABC000*', or 'This is an important doc' in the title field. I looked at the NGramTokenizer and tried to use it. In the index it doesn't seem to work; I cannot get any hits. The analysis tool on the admin pages shows me that the n-gram tokenizing works by highlighting the matches between the indexed value and a query. I have set the min and max n-gram size to 2 and 6, with side equal to left.
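As a rough illustration of the settings above, the following Python sketch assumes that "side equal to left" means only n-grams anchored at the start of the value are produced. That is an assumption about the configuration, not a statement of how the Solr tokenizer behaves, and `edge_ngrams` is a hypothetical helper.

```python
def edge_ngrams(text, min_n, max_n):
    """Return the prefixes of text with lengths min_n..max_n (front edge only)."""
    upper = min(max_n, len(text))
    return [text[:n] for n in range(min_n, upper + 1)]


title = "ABC0001231-This is an important doc.pdf"
grams = edge_ngrams(title, 2, 6)
# grams == ['AB', 'ABC', 'ABC0', 'ABC00', 'ABC000']
```

Under that assumption, only prefixes of the whole title would be indexed, so terms such as 'important' or '1231' would never appear as tokens. This is one possible explanation for the missing hits, not a confirmed diagnosis.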

Can anyone recommend a procedure that will allow me to search as stated
above?
I would also like to find out more about how to use the NGramTokenizer,
but have found little in the form of documentation. Does anyone know of
any good sources?

Thanks,

Brian
