Re: Solr - WordDelimiterFactory with Custom Tokenizer to split only on Boundaries

2013-04-30 Thread meghana
re whether/how those patterns could be combined. Also, that doesn't allow the case of a single ".", "&", or "_" as a word - but you didn't specify how that case should be handled. -- Jack Krupansky -Original Mes

Re: Solr - WordDelimiterFactory with Custom Tokenizer to split only on Boundaries

2013-04-24 Thread Jack Krupansky
I'm not a regular expression expert, so I'm not sure whether/how those patterns could be combined. Also, that doesn't allow the case of a single ".", "&", or "_" as a word - but you didn't specify how that case should be handled. -
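
One possible alternative to combining regex patterns (a sketch only; the field type name, file name, and tokenizer choice are hypothetical and were not part of this thread): WordDelimiterFilterFactory accepts a types attribute that remaps individual characters, so ".", "&", and "_" can be declared ALPHA and will then stay inside tokens instead of acting as split points. This does not by itself cover the single ".", "&", or "_" case mentioned above.

    <fieldType name="text_keep_punct" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- wdfftypes.txt remaps selected characters so WDF does not split on them -->
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1"
                generateNumberParts="1"
                types="wdfftypes.txt"/>
      </analyzer>
    </fieldType>

    # wdfftypes.txt (one mapping per line)
    . => ALPHA
    & => ALPHA
    _ => ALPHA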

Solr - WordDelimiterFactory with Custom Tokenizer to split only on Boundaries

2013-04-24 Thread meghana
Can I set the WordDelimiterFactory configuration to fulfill my requirement? Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-WordDelimiterFactory-with-Custom-Tokenizer-to-split-only-on-Boundires-tp4058557.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: WordDelimiterFactory

2013-04-19 Thread Ashok
ok -- View this message in context: http://lucene.472066.n3.nabble.com/WordDelimiterFactory-tp4056529p4057349.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: WordDelimiterFactory

2013-04-19 Thread Erick Erickson
Ashok: You really, _really_ need to dive into the admin/analysis page. That'll show you exactly what WDFF (and all the other elements of your chain) do to input tokens. Understanding the index and query-time implications of all the settings in WDFF takes a while. But from what you're describing,

Re: WordDelimiterFactory

2013-04-16 Thread Shawn Heisey
On 4/16/2013 8:12 PM, Ashok wrote: > It looks like any 'word' that starts with a digit is treated as a numeric string. Setting generateNumberParts="1" instead of "0" seems to generate the right tokens in this case, but need to see if it has any other impacts on the finalized token list..
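
A hedged configuration sketch (the field type name and tokenizer choice are illustrative, and the splitOnNumerics setting goes beyond what the thread discussed): with generateWordParts="1", generateNumberParts="1", and splitOnNumerics="0", input like "20x-30y" should come through WordDelimiterFilterFactory as "20x" and "30y" rather than being split at the digit/letter transitions or dropped. The admin Analysis screen is the place to confirm the exact output.

    <fieldType name="text_wdf_example" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- keep digit/letter runs together and emit both word and number parts -->
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1"
                generateNumberParts="1"
                splitOnNumerics="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>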

Re: WordDelimiterFactory

2013-04-16 Thread Ashok
this case but need to see if it has any other impacts on the finalized token list... Thanks - ashok -- View this message in context: http://lucene.472066.n3.nabble.com/WordDelimiterFactory-tp4056529p4056544.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: WordDelimiterFactory

2013-04-16 Thread Jack Krupansky
To: solr-user@lucene.apache.org Subject: WordDelimiterFactory Hi, Why does WDF swallow all 'words' that start with a 'digit'? My config is: For some text like 20x-30y I am expecting (& want) '20x' & '30y' to be returned & retained as the toke

WordDelimiterFactory

2013-04-16 Thread Ashok
analysis page. Any idea why? I am using 4.1. Thanks - ashok -- View this message in context: http://lucene.472066.n3.nabble.com/WordDelimiterFactory-tp4056529.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Regarding WordDelimiterFactory

2010-09-09 Thread Grijesh.singh
set splitWordsPart=0,splitNumberPart=0 - Grijesh -- View this message in context: http://lucene.472066.n3.nabble.com/Regarding-WordDelimiterFactory-tp1444694p1444742.html Sent from the Solr - User mailing list archive at Nabble.com.
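
The attribute names in this reply do not match what WordDelimiterFilterFactory actually accepts; the suggestion most likely corresponds to the standard generateWordParts and generateNumberParts attributes. A sketch of that reading (the preserveOriginal setting is an addition beyond the reply; without it, and with no catenate options enabled, a delimited token can disappear from the output entirely):

    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="0"
            generateNumberParts="0"
            preserveOriginal="1"/>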

Re: Regarding WordDelimiterFactory

2010-09-09 Thread Robert Muir
On Thu, Sep 9, 2010 at 3:57 AM, Sandhya Agarwal wrote: > Hello, I have a file with the input string "91{40}9490949090", and I wanted to return this file when I search for the query string "+91?40?9*". The problem is that the input string is getting indexed as 3 terms "91", "40", "94909

Regarding WordDelimiterFactory

2010-09-09 Thread Sandhya Agarwal
Hello, I have a file with the input string "91{40}9490949090", and I wanted to return this file when I search for the query string "+91?40?9*". The problem is that the input string is getting indexed as 3 terms "91", "40", "9490949090". Is there a way to consider "{" and "}" as part of the
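
A sketch of one way to make such a wildcard query matchable (this goes beyond what the thread itself discussed; the field type name is illustrative): preserveOriginal="1" keeps the whole input token in the index alongside the split parts, so "91{40}9490949090" is indexed as a single term in addition to "91", "40", and "9490949090". A wildcard such as "+91?40?9*" can then match that preserved term, assuming the wildcard term is not itself re-split at query time.

    <fieldType name="text_preserve" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- preserveOriginal keeps the unsplit token in addition to the parts -->
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1"
                generateNumberParts="1"
                preserveOriginal="1"/>
      </analyzer>
    </fieldType>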