Re: WordDelimiterFactory

2013-04-19 Thread Ashok
Yes, thank you Erick. The analysis/document handlers hold the key to deciding the type & order of the filters to employ given one's document set, & subject matter at hand. The finalized terms they produce for SOLR search, mlt etc... are crucial to the quality of the results. - ashok

Re: WordDelimiterFactory

2013-04-19 Thread Erick Erickson
Ashok: You really, _really_ need to dive into the admin/analysis page. That'll show you exactly what WDFF (and all the other elements of your chain) do to input tokens. Understanding the index and query-time implications of all the settings in WDFF takes a while. But from what you're describing,

Re: WordDelimiterFactory

2013-04-16 Thread Shawn Heisey
On 4/16/2013 8:12 PM, Ashok wrote: > It looks like any 'word' that starts with a digit is treated as a numeric > string. > > Setting generateNumberParts="1" instead of "0" seems to generate the right > tokens in this case but need to see if it has any other impacts on the > finalized token list..

Re: WordDelimiterFactory

2013-04-16 Thread Ashok
Thank you Jack, yes it is tricky. If my text is x20-y30 I get two nice tokens x20 & y30 that I need to keep. But the text 20x-30y is treated differently and I get nothing. 20x-y30 gives me just 'y30'. The docs on LucidWorks say generateNumberParts: (integer, default 1) If non-zero, splits num
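
To make the behaviour reported in this thread concrete, here is a rough Python sketch. It is not Lucene's WordDelimiterFilter; it is a toy model built only on the observations above, and it assumes that the input is split on '-', that letter/digit transitions are not split, and that a part starting with a digit counts as a number part. The names wdf_sketch and classify are made up for illustration.

```python
def classify(part):
    # Assumption taken from the thread: a part that starts with a digit
    # is treated as a number part, everything else as a word part.
    return "number" if part[0].isdigit() else "word"


def wdf_sketch(text, generate_word_parts=1, generate_number_parts=0):
    """Toy model of the reported behaviour, not the real filter."""
    # Split on the '-' delimiter only; empty pieces are discarded.
    parts = [p for p in text.split("-") if p]
    tokens = []
    for part in parts:
        kind = classify(part)
        if kind == "word" and generate_word_parts:
            tokens.append(part)
        elif kind == "number" and generate_number_parts:
            tokens.append(part)
    return tokens


if __name__ == "__main__":
    for text in ("x20-y30", "20x-30y", "20x-y30"):
        print(text, wdf_sketch(text, generate_number_parts=0))
    # x20-y30 ['x20', 'y30']
    # 20x-30y []
    # 20x-y30 ['y30']
    print(wdf_sketch("20x-30y", generate_number_parts=1))
    # ['20x', '30y']
```

With generate_number_parts=0 the sketch reproduces the three cases described above, and setting it to 1 brings back 20x and 30y. The real filter has more options than this model covers, so the admin/analysis page remains the place to check what an actual chain emits.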

Re: WordDelimiterFactory

2013-04-16 Thread Jack Krupansky
Because you told it to!!! With: generateNumberParts="0" WDF is tricky... tell us exactly what rules you want it to follow and then we can tell you how to set the options. Maybe more to the point: why exactly do you think you want to use WDF? Not that there aren't good reasons, but what specif