Re: Smart way of indexing for Better performance

Yerraguntla Wed, 19 Mar 2008 06:54:30 -0700

The data set (number of documents) is not large: about 100k. The number of fields
could go up to 10, and the average size of an indexed field could be 200 characters.
I tried creating multiple indexes (separate fields populated via copyField).
Let me see how the performance will be with EdgeNGramTokenFilter or
EdgeNGramTokenizer.
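Here is a rough sketch of what the edge n-gram approach does, written as plain Python rather than Solr configuration or the Lucene API. At index time every leading substring (prefix) of the normalized value is stored as its own term, so at query time a prefix search becomes a single exact term lookup. The normalization (whole value as one token, non-letters dropped, lowercased) follows the KeywordTokenizer chain from the quoted reply. `MAX_GRAM = 25` and all function names are hypothetical choices for illustration.

```python
import re
from collections import defaultdict

# Sketch only: plain Python standing in for Solr's analysis chain, not the Lucene API.
MAX_GRAM = 25  # hypothetical cap on prefix length; tune for your data


def normalize(value: str) -> str:
    """Treat the whole value as one token, drop non-letters, lowercase it."""
    return re.sub(r"[^A-Za-z]", "", value).lower()


def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = MAX_GRAM) -> list[str]:
    """Return every leading substring of `token` whose length is in [min_gram, max_gram]."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Index time: store each prefix as its own term (this is what grows the index)."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, value in docs.items():
        for gram in edge_ngrams(normalize(value)):
            index[gram].add(doc_id)
    return dict(index)


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Query time: a single exact term lookup instead of expanding a prefix query.

    Queries that normalize to more than MAX_GRAM letters find nothing here,
    because no prefix that long was indexed.
    """
    return set(index.get(normalize(query), set()))
```

For example, with `docs = {1: "WAL MART STORES INC", 2: "wal-mart-stores-inc", 3: "Walgreen Co"}`, `search(build_index(docs), "wal m")` returns `{1, 2}`, while a search for `"mart"` returns an empty set because "mart" is not a prefix of any normalized value. With the figures above (100k documents, up to 10 such fields) and the assumed cap of 25, the prefix terms add at most 100,000 × 10 × 25 = 25 million postings; values shorter than 25 letters contribute fewer.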


Thanks for the suggestions.


hossman wrote:
> 
> : I have the following use case. I could implement a solution, but
> : performance suffers. I need a smarter way of doing this.
> : Use case:
> : Incoming data has two fields, with values like 'WAL MART STORES INC'
> : and 'wal-mart-stores-inc'.
> : Users can search the data as 'walmart', 'wal mart' or 'wal-mart', and
> : also partially on any part of the name from the start of a word, like
> : 'wal', 'walm', 'wal m', etc. I could get a solution by using two
> : indexes: one as a text field for the first field ('wal mart') and one
> : with sub-words for 'wal-mart-stores' (with the WordDelimiterFilterFactory
> : filter).
> 
> There are lots of solutions that could work, all depending on what *else*
> you need to be able to match on besides just prefix queries where
> whitespace/punctuation are ignored.
> 
> One example: using KeywordTokenizer, along with a PatternReplaceFilter
> that throws away non-letter characters and a LowerCaseFilter, and then
> issuing all your queries as PrefixQueries will get w*, wa*, wal* and walm*
> to all match "wal mart", "WALMART", "WAL-mart", etc. But that won't
> let "mart" match a document containing "wal mart". You can always use
> copyField and hit one field for the first type of query, and the other
> field for "normal" queries.
> 
> Depending on the nature of your data (i.e. how many documents, how common
> certain prefixes are, etc.), you might get better performance at the
> expense of a larger index if you use something like the
> EdgeNGramTokenFilter or EdgeNGramTokenizer to index all the prefixes of
> various sizes, so you don't need to do a prefix query.
> 
> The bottom line: there are *lots* of options; you'll need to experiment to
> find the right solution that matches when you want it to match, and doesn't
> when you don't.
> 
> 
> 
> -Hoss
> 
> 
> 
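The first approach in Hoss's reply can be emulated in plain Python to show which queries hit which field. This is only an illustration, not Solr configuration or the Lucene API: `keyword_analyze` stands in for the KeywordTokenizer, PatternReplaceFilter and LowerCaseFilter chain, `text_analyze` is a rough stand-in for a "normal" tokenized field, and the field names `name_prefix` and `name_text` are hypothetical.

```python
import re

# Sketch only: these functions imitate analyzer chains; they are not Solr/Lucene calls.


def keyword_analyze(value: str) -> str:
    """Whole value as one token, non-letters removed, then lowercased."""
    return re.sub(r"[^A-Za-z]", "", value).lower()


def text_analyze(value: str) -> list[str]:
    """Rough stand-in for a 'normal' text field: split on non-letters, lowercase."""
    return [t.lower() for t in re.split(r"[^A-Za-z]+", value) if t]


def index_doc(value: str) -> dict[str, object]:
    """copyField idea: one source value feeds two differently analyzed (hypothetical) fields."""
    return {"name_prefix": keyword_analyze(value), "name_text": text_analyze(value)}


def prefix_match(doc: dict, query: str) -> bool:
    """PrefixQuery on the single keyword token; the query is normalized the same way."""
    q = keyword_analyze(query)
    return bool(q) and doc["name_prefix"].startswith(q)


def word_match(doc: dict, query: str) -> bool:
    """'Normal' query: every query word must appear as a token in the text field."""
    words = text_analyze(query)
    return bool(words) and all(w in doc["name_text"] for w in words)
```

For `doc = index_doc("WAL MART STORES INC")`, queries such as "wal", "walm", "wal m" and "wal-mart" all satisfy `prefix_match`, because they normalize to prefixes of "walmartstoresinc". "mart" fails `prefix_match` but passes `word_match`, which is why one field serves the prefix-style queries and the copyField target serves the "normal" ones.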

-- 
View this message in context: 
http://www.nabble.com/Smart-way-of-indexing-for-Better-performance-tp16092886p16143967.html
Sent from the Solr - User mailing list archive at Nabble.com.
