The data set is not large: about 100k documents. There are at most 10 fields, and an indexed field averages about 200 characters. I tried creating multiple indexes using copyField. I will see how the performance looks with EdgeNGramTokenFilter or EdgeNGramTokenizer.
Thanks for the suggestions.

hossman wrote:
>
> : I have the following use case. I could implement the solution but
> : performance is affected. I need some smart ways of doing this.
> : Use Case :
> : Incoming data has two fields which have values like 'WAL MART STORES INC'
> : and 'wal-mart-stores-inc'.
> : Users can search the data either in 'walmart' 'wal mart' or 'wal-mart'
> : also partially on any part of the name from the start of word like 'wal',
> : 'walm' 'wal m' etc . I could get the solution by using two indexes, one
> : as text field for the first field (wal mart ) column and sub word
> : wal-mart-stores (with WordDelimiterFilterFactory filter).
>
> there are lots of solutions that could work, all depending on what *else*
> you need to be able to match on besides just prefix queries where
> whitespace/punctuation are ignored.
>
> One example: using KeywordTokenizer, along with a PatternReplaceFilter
> that throws away non letter characters and a LowercaseFilter and then
> issuing all your queries as PrefixQueries will get w* wa* wal* and walm*
> to all match "wal mart", "WALMART", "WAL-mart", etc.... but that won't
> let "mart" match a document containing "wal mart" .. but you can always use
> copyField and hit one field for the first type of query, and the other
> field for "normal" queries.
>
> depending on the nature of your data (ie: how many documents, how common
> certain prefixes are, etc...) you might get better performance at the
> expense of a larger index if you use something like the
> EdgeNGramTokenFilter or EdgeNGramTokenizer to index all the prefixes of
> various sizes so you don't need to do a prefix query
>
> The bottom line: there are *lots* of options, you'll need to experiment to
> find the right solution that matches when you want to match, and doesn't
> when you don't
>
>
> -Hoss
>

--
View this message in context: http://www.nabble.com/Smart-way-of-indexing-for-Better-performance-tp16092886p16143967.html
Sent from the Solr - User mailing list archive at Nabble.com.
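
For anyone following the thread, here is a rough sketch of the two ideas Hoss describes, written in plain Python rather than as a Solr schema. It only imitates the analysis chain: `normalize` stands in for KeywordTokenizer + PatternReplaceFilter + LowercaseFilter, and `edge_ngrams` stands in for what an edge n-gram filter would put in the index. All function names, the regex, and the `min_gram`/`max_gram` defaults are my own assumptions for illustration, not Solr settings.

```python
import re


def normalize(value):
    """Treat the whole value as one token, drop non-letters, lowercase it.

    Assumed stand-in for KeywordTokenizer + PatternReplaceFilter +
    LowercaseFilter; the regex is an illustrative choice.
    """
    return re.sub(r"[^A-Za-z]", "", value).lower()


def prefix_match(query, stored_value):
    """Query-time approach: normalize both sides, then do a prefix check."""
    q = normalize(query)
    return bool(q) and normalize(stored_value).startswith(q)


def edge_ngrams(token, min_gram=1, max_gram=20):
    """Index-time approach: emit every prefix of the token within the size limits."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def build_prefix_index(values, min_gram=1, max_gram=20):
    """Map each indexed prefix to the set of document ids containing it."""
    index = {}
    for doc_id, value in enumerate(values):
        for gram in edge_ngrams(normalize(value), min_gram, max_gram):
            index.setdefault(gram, set()).add(doc_id)
    return index


def lookup(index, query):
    """Exact lookup of the normalized query; no prefix scan is needed.

    Queries longer than max_gram find nothing, which is the trade-off
    of capping the gram size.
    """
    return index.get(normalize(query), set())


if __name__ == "__main__":
    docs = ["WAL MART STORES INC", "wal-mart-stores-inc", "Target Corp"]
    index = build_prefix_index(docs)
    for q in ["wal", "walm", "wal m", "wal-mart", "mart"]:
        print(q, prefix_match(q, docs[0]), sorted(lookup(index, q)))
```

As in the quoted reply, "mart" does not match here because only prefixes of the whole normalized name are considered; that case would still need the second, normally tokenized field populated through copyField. The index-time variant stores one entry per prefix length, which is where the larger index comes from.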