Re: WordDelimiter in extended way.

2019-10-23 Thread servus01
got it, thank you -- Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: WordDelimiter in extended way.

2019-10-23 Thread Shawn Heisey
On 10/23/2019 9:41 AM, servus01 wrote: Hey, thank you for helping me: Thanks in advance for any help, really appreciate it. It is not the WordDelimiter filter th

Re: WordDelimiter in extended way.

2019-10-23 Thread servus01
Hey, thank you for helping me: Thanks in advance for any help, really appreciate it. -- Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.h

Re: WordDelimiter in extended way.

2019-10-23 Thread Shawn Heisey
On 10/23/2019 7:43 AM, servus01 wrote: Now Solr behaves in such a way that hyphens with a blank before and after them are not indexed, and a search for blank - blank does not return any results. With the WordDelimiter I have already covered the cases

Re: WordDelimiter Works differently in solr3X vs SolrCloud..?

2015-01-14 Thread gouthsmsimhadri
Thanks Ahmet, that works. - -goutham -- View this message in context: http://lucene.472066.n3.nabble.com/WordDelimiter-Works-differently-in-solr3X-vs-SolrCloud-tp4179647p4179662.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: WordDelimiter Works differently in solr3X vs SolrCloud..?

2015-01-14 Thread Ahmet Arslan
Hi, You could try passing luceneMatchVersion argument to WordDelimiterFilterFactory and see if it works for you. Factory returns Lucene47WordDelimiterFilter before LUCENE_4_8_0. Ahmet On Wednesday, January 14, 2015 11:10 PM, gouthsmsimhadri wrote: Problem: While migrating the solr version

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-30 Thread Walter Underwood
There are two approaches for the query “mixedCase” to match “mixed Case” in the original document. 1. Add an index time synonym. 2. Add a ShingleFilterFactory to the index analysis chain. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ On Dec 30, 2014, at 9:50 AM,

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-30 Thread Michael Sokolov
On 12/30/14 12:42 PM, Jonathan Rochkind wrote: On 12/30/14 12:35 PM, Walter Underwood wrote: You want preserveOriginal=“1”. You should only do this processing at index time. If I only do this processing at index time, then "mixedCase" at query time will no longer match "mixed Case" in the in

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-30 Thread Jonathan Rochkind
On 12/30/14 12:35 PM, Walter Underwood wrote: You want preserveOriginal=“1”. You should only do this processing at index time. If I only do this processing at index time, then "mixedCase" at query time will no longer match "mixed Case" in the index/source material. I think I'm having troubl

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-30 Thread Walter Underwood
You want preserveOriginal=“1”. You should only do this processing at index time. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ On Dec 30, 2014, at 9:33 AM, Jonathan Rochkind wrote: > Okay, thanks. I'm not sure if it's my lack of understanding, but I feel like

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-30 Thread Jonathan Rochkind
Okay, thanks. I'm not sure if it's my lack of understanding, but I feel like I'm having a very hard time getting straight answers out of you all, here. I want the query "mixedCase" to match both/either "mixed Case" and "mixedCase" in the index. What configuration of WDF at index/query time w

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-30 Thread Jack Krupansky
I do have a more thorough discussion of WDF in my Solr Deep Dive e-book: http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html You're not "wrong" about anything here... you just need to accept that WDF is not magic and can't handle every

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-30 Thread Jonathan Rochkind
On 12/30/14 11:45 AM, Alexandre Rafalovitch wrote: On 30 December 2014 at 11:12, Jonathan Rochkind wrote: I'm a bit confused about what splitOnCaseChange combined with catenateWords is meant to do at all. It _is_ generating both the split and single-word tokens at query time Have you tried o

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-30 Thread Alexandre Rafalovitch
On 30 December 2014 at 11:12, Jonathan Rochkind wrote: > I'm a bit confused about what splitOnCaseChange combined with catenateWords > is meant to do at all. It _is_ generating both the split and single-word > tokens at query time Have you tried only having WDF during indexing with both options

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-30 Thread Jonathan Rochkind
I guess I don't understand what the four use cases are, or the three out of four use cases, or whatever. What the intended uses of the WDF are. Can you explain what the intended use of setting: generateWordParts="1" catenateWords="1" splitOnCaseChange="1" Is that supposed to do something usefu

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-30 Thread Jack Krupansky
Right, that's what I meant by WDF not being "magic" - you can configure it to match any three out of four use cases as you choose, but there is no choice that matches all of the use cases. To be clear, this is not a "bug" in WDF, but simply a limitation. -- Jack Krupansky On Tue, Dec 30, 2014 a

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-30 Thread Jonathan Rochkind
Thanks Erick! Yes, if I set splitOnCaseChange=0, then of course it'll work -- but then query for "mixedCase" will no longer also match "mixed Case". I think I want WDF to... kind of do all of the above. Specifically, I had thought that it would allow a query for "mixedCase" to match both/eit

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-29 Thread Alexandre Rafalovitch
On 29 December 2014 at 18:07, Jonathan Rochkind wrote: > I do not understand what separate query/index analysis you are suggesting to > accomplish what I wanted. I am sure you do know that, but just in case. At the moment, you have only one analyzer chain, so it applies at both index and query ti

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-29 Thread Erick Erickson
Jonathan: Well, it works if you set splitOnCaseChange="0" in just the query part of the analysis chain. I probably misled you a bit months ago; WDFF is intended for this case iff you expect the case change to generate _tokens_ that are individually meaningful. And unfortunately "significant" in

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-29 Thread Jonathan Rochkind
On 12/29/14 5:24 PM, Jack Krupansky wrote: WDF is powerful, but it is not magic. In general, the indexed data is expected to be clean while the query might be sloppy. You need to separate the index and query analyzers and they need to respect that distinction I do not understand what separate q

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-29 Thread Alexandre Rafalovitch
> splitOnCaseChange="1" So, it does not get split during indexing because there is no case change. But does get split during search and now you are looking for partial tokens against a combined single-token in the index. And not matching. The WordDelimiterFilterFactory is more for product IDs tha
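
The mismatch described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `split_on_case_change` is a hypothetical helper, not Solr's WordDelimiterFilter, and it ignores the catenate options, token positions, and the query parser entirely. It also assumes a lowercase step after the split.

```python
import re

def split_on_case_change(token):
    """Toy stand-in for splitOnCaseChange="1" (hypothetical helper, not Solr code).

    Inserts a break wherever a lowercase letter is followed by an
    uppercase one, then lowercases each part.
    """
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", token)
    return [part.lower() for part in spaced.split()]

# Index side: the single-case source has no case change, so it stays whole.
index_terms = set(split_on_case_change("mixedcase"))

# Query side: the mixed-case query is split into partial tokens.
query_terms = split_on_case_change("mixedCase")

# Partial query tokens that have no counterpart in the index.
missing = [term for term in query_terms if term not in index_terms]
print(index_terms, query_terms, missing)
```

In this toy model every query part is missing from the index, which is the "partial tokens against a combined single-token" situation. What real Solr does additionally depends on the catenate settings and on how the query parser combines the tokens.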

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-29 Thread Jack Krupansky
WDF is powerful, but it is not magic. In general, the indexed data is expected to be clean while the query might be sloppy. You need to separate the index and query analyzers and they need to respect that distinction - the index analyzer would index as you have indicated, indexing both the unitary

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-29 Thread Jonathan Rochkind
Okay, some months later I've come back to this with an isolated reproduction case. Thanks very much for any advice or debugging help you can give. The WordDelimiter filter is making a mixed-case query NOT match the single-case source, when it ought to. I am in Solr 4.3 (sorry, that's what we

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-09-03 Thread Erick Erickson
Jonathan: If at all possible, delete your collection/data directory (the whole directory, including data) between runs after you've changed your schema (at least any of your analysis that pertains to indexing). Mixing old and new schema definitions can add to the confusion! Good luck! Erick On W

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-09-03 Thread Jonathan Rochkind
Thanks Erick and Diego. Yes, I noticed in my last message I'm not actually using defaults, not sure why I chose non-defaults originally. I still need to find time to make a smaller isolation/reproduction case, I'm getting confusing results that suggest some other part of my field def may be pe

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-09-02 Thread Diego Fernandez
Although not a solution, this may help in trying to find the problem. In http://solr.pl/en/2010/08/16/what-is-schema-xml/ it says: "It is worth noting that there is an additional attribute for the text field type: autoGeneratePhraseQueries This attribute is responsible for telling filters h

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-09-02 Thread Erick Erickson
What happens if you append &debug=query to your query? IOW, what does the _parsed_ query look like? Also note that the defaults for WDFF are _not_ identical. catenateWords and catenateNumbers are 1 in the index portion and 0 in the query section. Still, this shouldn't be a problem all other things

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-09-02 Thread Jonathan Rochkind
On 9/2/14 1:51 PM, Erick Erickson wrote: bq: In my actual index, query "MacBook" is matching ONLY "mac book", and not "macbook" I suspect your query parameters for WordDelimiterFilterFactory don't have catenate words set. What do you see when you enter these in both the index and query portio

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-09-02 Thread Erick Erickson
bq: In my actual index, query "MacBook" is matching ONLY "mac book", and not "macbook" I suspect your query parameters for WordDelimiterFilterFactory don't have catenate words set. What do you see when you enter these in both the index and query portions of the admin/analysis page? Best, Erick

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-09-02 Thread Jonathan Rochkind
Yes, thanks, I realize I can twiddle those parameters, but it will probably result in "MacBook" no longer matching "mac book" at all, but ONLY matching "macbook". My understanding of the default settings of WordDelimiterFilterFactory is that they are intended for "MacBook" to match both "mac book"

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-09-02 Thread Michael Della Bitta
If that's your problem, I bet all you have to do is twiddle on one of the catenate options, either catenateWords or catenateAll. Michael Della Bitta Applications Developer o: +1 646 532 3062 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appin

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-09-02 Thread Jonathan Rochkind
Thanks for the response. I understand the problem a little bit better after investigating more. Posting my full field definitions is, I think, going to be confusing, as they are long and complicated. I can narrow it down to an isolation case if I need to. My indexed field in question is relati

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-09-02 Thread Michael Della Bitta
Hi Jonathan, Little confused by this line: > And, what I think it's trying to do, is match text indexed as "d elalain" as well as text indexed by "delalain". In this case, I don't know how WordDelimiterFilter will help, as you're likely tokenizing on spaces somewhere, and that input text has a s

Re: WordDelimiter

2014-08-08 Thread Jack Krupansky
The word delimiter filter is actually combining "100-001" into "11". You have BOTH catenateNumbers AND catenateAll, so "100-R8989" should generate THREE tokens: the concatenated numbers "100", the concatenated words "R8989", and both numbers and words concatenated, "100R8989". -- Jack Krup

Re: WordDelimiter

2014-08-08 Thread Erick Erickson
You haven't really explained what you want to _do_. If you don't want to split words up, just take WordDelimiterFilterFactory out. Or do you want to split sometimes but not others? Best, Erick On Fri, Aug 8, 2014 at 12:27 PM, EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions) wrote: >

Re: WordDelimiter and stemEnglishPossessive doesn't work

2011-06-14 Thread roySolr
THANK YOU!! I thought I could only use one character for the pattern. Now I use a regular expression :) I don't need the wordDelimiter anymore. It's split on # and whitespace dataset: mcdonald's#burgerking#Free record shop#h&m mcdonald's burgerking free record shop h&m This is exactly how we
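
The split on # and whitespace can be tried outside Solr with the same pattern, `#|\s`. This is a minimal sketch using Python's `re` module rather than PatternTokenizerFactory; the `tokenize` helper is hypothetical, and the lowercasing step is an assumption based on the lowercased result shown in the message (the filter that produces it is not stated in this thread).

```python
import re

def tokenize(record):
    """Split a record on '#' or any single whitespace character.

    Empty strings are dropped and tokens are lowercased; the
    lowercasing is an assumption, not part of the pattern itself.
    """
    return [token.lower() for token in re.split(r"#|\s", record) if token]

tokens = tokenize("mcdonald's#burgerking#Free record shop#h&m")
print(tokens)
```

Note that the apostrophe and the ampersand survive, since neither is part of the pattern.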

Re: WordDelimiter and stemEnglishPossessive doesn't work

2011-06-14 Thread Erick Erickson
It's a little obscure, but you can use http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PatternReplaceCharFilterFactory in front of WhitespaceTokenizer if you prefer. Note that a CharFilterFactory is different than a FilterFactory, so read carefully .. Best Erick On Tue, Jun 14,

Re: WordDelimiter and stemEnglishPossessive doesn't work

2011-06-14 Thread lee carroll
Do you need the word delimiter? #|\s I think it's just a regex in the pattern tokeniser; I might be wrong though. On 14 June 2011 11:15, roySolr wrote: > Ok, with catenatewords the index term will be mcdonalds. But that's not what > i want. > > I only use the wordDelimiter to split on whitespa

Re: WordDelimiter and stemEnglishPossessive doesn't work

2011-06-14 Thread roySolr
Ok, with catenatewords the index term will be mcdonalds. But that's not what I want. I only use the wordDelimiter to split on whitespace. I have already used the PatternTokenizerFactory so I can't use the whitespacetokenizer. I want my index to look like this: dataset: mcdonald's#burgerking#Free r

Re: WordDelimiter and stemEnglishPossessive doesn't work

2011-06-10 Thread Erick Erickson
Hmmm, that is confusing. The stemEnglishPossessive=0 setting actually leaves the 's' in the index, just not attached to the word. The admin/analysis page can help show this. Setting it equal to 1 removes it entirely from the stream. If you set catenateWords=1, you'll get "mcdonalds" in your index if s