Re: WordDelimiterFilterFactory with Wildcards

2017-07-27 Thread Erick Erickson
bq: To me this seems like a design flaw. The Solr fieldtypes seem like they allow a developer to create types that should handle wildcards intelligently. Well, that's pretty impossible. WordDelimiter(Graph)FilterFactory is a case in point. It's designed to break up on uppercase/lowercase/numeric/n

Re: WordDelimiterFilterFactory with Wildcards

2017-07-27 Thread Webster Homer
It doesn't seem to matter what you do in the query analyzer: if the query has a wildcard, Solr won't use the analyzer. That is exactly the behavior I observed. The solution was to set preserveOriginal="1" and change the ETL process to not strip the dashes, letting the index analyzer do that. We have a lot of lega
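
A minimal sketch of why preserving the original token helps here, assuming (as the message reports) that a wildcard term is compared against indexed terms without going through the query analyzer. The function name `wildcard_hits` and the sample value `1234-56-7` are hypothetical, and Python's `fnmatch` only stands in for Solr's wildcard matching.

```python
from fnmatch import fnmatchcase

def wildcard_hits(pattern, indexed_terms):
    """Return the indexed terms that match the raw, unanalyzed wildcard pattern."""
    return [term for term in indexed_terms if fnmatchcase(term, pattern)]

# Hypothetical index contents for one made-up value, "1234-56-7".
stripped_only = ["1234567"]               # dashes removed before or during indexing
with_original = ["1234-56-7", "1234567"]  # original kept next to the stripped form

print(wildcard_hits("1234-5*", stripped_only))  # nothing matches the dashed pattern
print(wildcard_hits("1234-5*", with_original))  # the preserved original matches
```

Because the pattern is used verbatim, a user who types the dashes can only match a term that still contains them.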

Re: WordDelimiterFilterFactory with Wildcards

2017-07-27 Thread Saurabh Sethi
Webster, did you try escaping the special character (assuming you did not do what Shawn did by replacing - with some other text and your indexed tokens have -)? On Thu, Jul 27, 2017 at 12:03 PM, Webster Homer wrote: > Shawn, > Thank you for that. I didn't know about that feature of the WDF. It d

Re: WordDelimiterFilterFactory with Wildcards

2017-07-27 Thread Webster Homer
Shawn, Thank you for that. I didn't know about that feature of the WDF. It doesn't help my situation but it's great to know about. Googling solr wildcard searches I found this link http://lucene.472066.n3.nabble.com/Wildcard-search-not-working-with-search-term-having-special-characters-and-digits-t

Re: WordDelimiterFilterFactory with Wildcards

2017-07-26 Thread Erick Erickson
The Admin/Analysis page is useful here. It'll show you what each bit of your query analysis chain does and may well point you to the part of the chain that's the problem. Best, Erick On Wed, Jul 26, 2017 at 11:33 AM, Webster Homer wrote: > checked the Pattern Replace it's OK. Can't use the prese

Re: WordDelimiterFilterFactory with Wildcards

2017-07-26 Thread Webster Homer
Checked the Pattern Replace; it's OK. Can't use preserveOriginal since it preserves the hyphens too, which I don't want. It would be best if it didn't touch the * at all. On Wed, Jul 26, 2017 at 1:30 PM, Saurabh Sethi wrote: > My guess is PatternReplaceFilterFactory is most likely the cause.

Re: WordDelimiterFilterFactory with Wildcards

2017-07-26 Thread Saurabh Sethi
My guess is PatternReplaceFilterFactory is most likely the cause. Also, based on your query, you might want to set preserveOriginal=1. You can take one filter out at a time and see which one is altering the query. On Wed, Jul 26, 2017 at 11:13 AM, Webster Homer wrote: > 1. KeywordTokenizer - we

Re: WordDelimiterFilterFactory with Wildcards

2017-07-26 Thread Webster Homer
1. KeywordTokenizer - we want to treat the entire field as a single term to parse 2. preserveOriginal = "0" Thought about changing this to 1 3. 6.2.2 This is the fieldtype

Re: WordDelimiterFilterFactory with Wildcards

2017-07-26 Thread Saurabh Sethi
1. What tokenizer are you using? 2. Do you have preserveOriginal="1" flag set in your filter? 3. Which version of solr are you using? On Wed, Jul 26, 2017 at 10:48 AM, Webster Homer wrote: > I have several fieldtypes that use the WordDelimiterFilterFactory > > We have a fieldtype for cas numbers

Re: WordDelimiterFilterFactory - tokenizer question

2015-04-05 Thread Mike L.
From: Jack Krupansky To: solr-user@lucene.apache.org; Mike L. Sent: Sunday, April 5, 2015 8:23 AM Subject: Re: WordDelimiterFilterFactory - tokenizer question You have to tell the filter what types of tokens to generate - words, numbers. You told it to generate... nothing. You did te

Re: WordDelimiterFilterFactory - tokenizer question

2015-04-05 Thread Jack Krupansky
You have to tell the filter what types of tokens to generate - words, numbers. You told it to generate... nothing. You did tell it to preserve the original, unfiltered token though, which is fine. -- Jack Krupansky On Sun, Apr 5, 2015 at 3:39 AM, Mike L. wrote: > Solr User Group, > I have a
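
The point about telling the filter what to generate can be sketched with a toy model. This is not the Lucene implementation: the function `word_delimiter`, its keyword flags, and the sample token `AB-123` are illustrative assumptions that only mimic the generateWordParts, generateNumberParts and preserveOriginal options.

```python
import re

def word_delimiter(token, generate_word_parts=False,
                   generate_number_parts=False, preserve_original=False):
    """Toy model: split on non-alphanumerics and on letter/digit boundaries."""
    parts = re.findall(r"[A-Za-z]+|[0-9]+", token)
    if parts == [token]:
        # Nothing to split: pass the token through unchanged.
        return [token]
    out = [token] if preserve_original else []
    for part in parts:
        wanted = generate_number_parts if part.isdigit() else generate_word_parts
        if wanted and part not in out:
            out.append(part)
    return out

print(word_delimiter("AB-123", preserve_original=True))
print(word_delimiter("AB-123", generate_word_parts=True,
                     generate_number_parts=True, preserve_original=True))
print(word_delimiter("AB-123"))
```

With only `preserve_original` set, the sketch emits just the untouched token; with every flag off, a token that would be split yields nothing at all.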

Re: WordDelimiterFilterFactory and position increment.

2015-02-04 Thread Dmitry Kan
Hi, Could you enable it on the querying side and re-test your case? The rule of thumb I usually follow is to make the index and query side transformations as close as possible. HTH, Dmitry On Wed, Feb 4, 2015 at 6:14 AM, Modassar Ather wrote: > Hi, > > No I am not using WordDelimiterFilter on

Re: WordDelimiterFilterFactory and position increment.

2015-02-03 Thread Modassar Ather
Hi, No I am not using WordDelimiterFilter on query side. Regards, Modassar On Fri, Jan 30, 2015 at 5:12 PM, Dmitry Kan wrote: > Hi, > > Do you use WordDelimiterFilter on query side as well? > > On Fri, Jan 30, 2015 at 12:51 PM, Modassar Ather > wrote: > > > Hi, > > > > An insight in the behav

Re: WordDelimiterFilterFactory and position increment.

2015-01-30 Thread Dmitry Kan
Hi, Do you use WordDelimiterFilter on query side as well? On Fri, Jan 30, 2015 at 12:51 PM, Modassar Ather wrote: > Hi, > > An insight in the behavior of WordDelimiterFilter will be very helpful. > Please share your inputs. > > Thanks, > Modassar > > On Thu, Jan 22, 2015 at 2:54 PM, Modassar At

Re: WordDelimiterFilterFactory and position increment.

2015-01-30 Thread Modassar Ather
Hi, An insight in the behavior of WordDelimiterFilter will be very helpful. Please share your inputs. Thanks, Modassar On Thu, Jan 22, 2015 at 2:54 PM, Modassar Ather wrote: > Hi, > > I am using WordDelimiterFilter while indexing. Parser used is edismax. > Phrase search is failing for terms li

Re: WordDelimiterFilterFactory and StandardTokenizer

2014-05-20 Thread Diego Fernandez
Hey Ahmet, Yeah I had missed Shawn's response, I'll have to give that a try as well. As for the version, we're using 4.4. StandardTokenizer sets type for HANGUL, HIRAGANA, IDEOGRAPHIC, KATAKANA, and SOUTHEAST_ASIAN and you're right, we're using TypeTokenFilter to remove those. Diego Fernand

Re: WordDelimiterFilterFactory and StandardTokenizer

2014-05-20 Thread Ahmet Arslan
Hi Diego, Did you miss Shawn's response? His ICUTokenizerFactory solution is better than mine. By the way, what Solr version are you using? Does StandardTokenizer set the type attribute for CJK words? To filter out given types, you do not need a custom filter. Type Token filter serves exactly that

Re: WordDelimiterFilterFactory and StandardTokenizer

2014-05-20 Thread Diego Fernandez
Great, thanks for the information! Right now we're using the StandardTokenizer types to filter out CJK characters with a custom filter. I'll test using MappingCharFilters, although I'm a little concerned with possible adverse scenarios. Diego Fernandez - 爱国 Software Engineer US GSS Supporta

Re: WordDelimiterFilterFactory and StandardTokenizer

2014-05-16 Thread Shawn Heisey
On 5/16/2014 9:24 AM, aiguofer wrote: > Jack Krupansky-2 wrote >> Typically the white space tokenizer is the best choice when the word >> delimiter filter will be used. >> >> -- Jack Krupansky > > If we wanted to keep the StandardTokenizer (because we make use of the token > types) but wanted to

Re: WordDelimiterFilterFactory and StandardTokenizer

2014-05-16 Thread Ahmet Arslan
Hi Aiguofer, You mean ClassicTokenizer? Because StandardTokenizer does not set token types (e-mail, url, etc). I wouldn't go with the JFlex edit, mainly because of maintenance costs. It will be a burden to maintain a custom tokenizer. MappingCharFilters could be used to manipulate tokenizer beha

Re: WordDelimiterFilterFactory and StandardTokenizer

2014-05-16 Thread aiguofer
Jack Krupansky-2 wrote > Typically the white space tokenizer is the best choice when the word > delimiter filter will be used. > > -- Jack Krupansky If we wanted to keep the StandardTokenizer (because we make use of the token types) but wanted to use the WDFF to get combinations of words that ar

Re: WordDelimiterFilterFactory and StandardTokenizer

2014-04-16 Thread Jack Krupansky
Typically the white space tokenizer is the best choice when the word delimiter filter will be used. -- Jack Krupansky -Original Message- From: Shawn Heisey Sent: Wednesday, April 16, 2014 11:03 PM To: solr-user@lucene.apache.org Subject: Re: WordDelimiterFilterFactory and

Re: WordDelimiterFilterFactory and StandardTokenizer

2014-04-16 Thread Shawn Heisey
On 4/16/2014 8:37 PM, Bob Laferriere wrote: >> I am seeing odd behavior from WordDelimiterFilterFactory (WDFF) when >> used in conjunction with StandardTokenizerFactory (STF). >> I see the following results for the document of “wi-fi”: >> >> Index: “wi”, “fi” >> Query: “wi”,”fi”,”wifi” >> >

Re: WordDelimiterFilterFactory splits up hyphenated terms although splitOnNumerics, generateWordParts and generateNumberParts are set to 0 (false)

2014-04-09 Thread Erick Erickson
l. Best, Erick On Wed, Apr 9, 2014 at 7:38 AM, Malte Hübner wrote: >> -Original Message- >> From: Erick Erickson [mailto:erickerick...@gmail.com] >> Sent: Saturday, 29 March 2014 16:09 >> To: solr-user@lucene.apache.org >> Subject: Re: Word

Re: WordDelimiterFilterFactory splits up hyphenated terms although splitOnNumerics, generateWordParts and generateNumberParts are set to 0 (false)

2014-03-29 Thread Erick Erickson
Why do you say at the indexing part: The given search term is: *X-002-99-495* WordDelimiterFilterFactory indexes the following word parts: * X (shouldn't be there) * 00299495 (shouldn't be there) ?? You've set catenateNumbers="1" in your fieldType for the indexing part, so WDFF is doing exactly wh
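
As a rough illustration of where 00299495 comes from, here is a toy model of number catenation. It is a sketch, not the WDFF code: the helper `catenate_number_runs` is a hypothetical name, and it only joins adjacent all-digit parts after splitting on non-alphanumeric characters.

```python
import re

def catenate_number_runs(token):
    """Toy model of catenateNumbers: join each run of adjacent all-digit parts."""
    parts = [p for p in re.split(r"[^0-9A-Za-z]+", token) if p]
    runs, current = [], []
    for part in parts:
        if part.isdigit():
            current.append(part)
        elif current:
            # A non-numeric part ends the current run of numbers.
            runs.append("".join(current))
            current = []
    if current:
        runs.append("".join(current))
    return runs

print(catenate_number_runs("X-002-99-495"))  # ['00299495']
```

The parts 002, 99 and 495 are adjacent numeric subwords, so catenation glues them into a single token even when the individual number parts are not emitted.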

Re: WordDelimiterFilterFactory

2011-02-04 Thread Jay Hill
You can always try something like this out in the analysis.jsp page, accessible from the Solr Admin home. Check out that page and see how it allows you to enter text to represent what was indexed, and text for a query. You can then see if there are matches. Very handy to see how the various filters

Re: WordDelimiterFilterFactory + CamelCase query

2010-11-19 Thread Peter Karich
Hi, the final solution is explained here in context: http://mail-archives.apache.org/mod_mbox/lucene-dev/201011.mbox/%3caanlktimatgvplph_mgfbsughdoedc8tc2brrwxhid...@mail.gmail.com%3e "If you are using Solr branch_3x or trunk, you can turn this off by setting autoGeneratePhraseQueries to fa

Re: WordDelimiterFilterFactory + CamelCase query

2010-11-18 Thread Peter Karich
Peter, I recently had this issue, and I had to set splitOnCaseChange="0" to keep the word delimiter filter from doing what you describe. Can you try that and see if it helps? - Ken Hi Ken, yes this would solve my problem, but then I would lose a match for 'SuperMario' if I query 'mario', r

Re: WordDelimiterFilterFactory + CamelCase query

2010-11-18 Thread Ken Stanley
On Thu, Nov 18, 2010 at 3:22 PM, Peter Karich wrote: > >> Hi, >> >> Please add preserveOriginal="1" to your WDF [1] definition and reindex >> (or >> just try with the analysis page). > > but it is already there!? > > generateWordParts="1" generateNumberParts="1" > catenat

Re: WordDelimiterFilterFactory + CamelCase query

2010-11-18 Thread Peter Karich
Hi, Please add preserveOriginal="1" to your WDF [1] definition and reindex (or just try with the analysis page). but it is already there!? Regards, Peter. Hi, Please add preserveOriginal="1" to your WDF [1] definition and reindex (or just try with the analysis page). This will make

Re: WordDelimiterFilterFactory + CamelCase query

2010-11-18 Thread Markus Jelsma
Hi, Please add preserveOriginal="1" to your WDF [1] definition and reindex (or just try with the analysis page). This will make sure the original input token is being preserved along the newly generated tokens. If you then pass it all through a lowercase filter, it should match both documents
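
The effect described above can be sketched with a toy index-time chain: split on case changes, optionally keep the original, then lowercase. The function `index_tokens` and its regular expression are assumptions made for illustration and do not reproduce the real WDF rules.

```python
import re

def index_tokens(text, preserve_original=True):
    """Toy chain: split on case change, optionally keep the original, lowercase."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|[0-9]+", text)
    tokens = list(parts)
    if preserve_original and text not in tokens:
        tokens.insert(0, text)
    return [t.lower() for t in tokens]

camel = index_tokens("SuperMario")
plain = index_tokens("supermario")
print(camel)
print(plain)
print("supermario" in camel and "supermario" in plain)
```

With the original preserved, a lowercased query for supermario finds both spellings, while mario still finds the camel-case document through its generated part.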

Re: WordDelimiterFilterFactory removes words when options set to 0

2009-04-28 Thread Chris Hostetter
: In trying to understand the various options for : WordDelimiterFilterFactory, I tried setting all options to 0. This seems : to prevent a number of words from being output at all. In particular : "can't" and "99dxl" don't get output, nor do any words containing hyphens. : Is this correct behav