Re: WordDelimiterFilterFactory with Wildcards

2017-07-27 Thread Erick Erickson
bq: To me this seems like a design flaw. The Solr fieldtypes seem like they allow a developer to create types that should handle wildcards intelligently. Well, that's pretty impossible. WordDelimiter(Graph)FilterFactory is a case in point. It's designed to break up on uppercase/lowercase/numeric/n

Re: WordDelimiterFilterFactory with Wildcards

2017-07-27 Thread Webster Homer
It doesn't seem to matter what you do in the query analyzer: if you have a wildcard, it won't use it, which is exactly the behavior I observed. The solution was to set preserveOriginal="1" and change the ETL process to not strip the dashes, letting the index analyzer do that. We have a lot of lega
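A minimal sketch of the index-side change this message describes, assuming a hypothetical field type named `cas_number` and a whitespace tokenizer (neither is shown in the thread). The attribute values other than `preserveOriginal="1"` follow the configuration fragments quoted elsewhere in this thread and are illustrative only.

```xml
<!-- Hypothetical field type: the name and the tokenizer are assumptions. -->
<fieldType name="cas_number" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- preserveOriginal="1" keeps 1234-12-1 as a token next to the
         catenated form 1234121 produced by catenateNumbers="1". -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="0"
            generateNumberParts="0"
            catenateWords="0"
            catenateNumbers="1"
            catenateAll="0"
            preserveOriginal="1"/>
  </analyzer>
</fieldType>
```

With the hyphenated original in the index, a wildcard term such as `1234-12*` has something to match even though, as noted above, the wildcard term itself is not run through the query analyzer.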

Re: WordDelimiterFilterFactory with Wildcards

2017-07-27 Thread Saurabh Sethi
Webster, did you try escaping the special character (assuming you did not do what Shawn did by replacing - with some other text and your indexed tokens have -)? On Thu, Jul 27, 2017 at 12:03 PM, Webster Homer wrote: > Shawn, > Thank you for that. I didn't know about that feature of the WDF. It d

Re: WordDelimiterFilterFactory with Wildcards

2017-07-27 Thread Webster Homer
Shawn, Thank you for that. I didn't know about that feature of the WDF. It doesn't help my situation but it's great to know about. Googling solr wildcard searches I found this link http://lucene.472066.n3.nabble.com/Wildcard-search-not-working-with-search-term-having-special-characters-and-digits-t

Re: WordDelimiterFilterFactory with Wildcards

2017-07-26 Thread Erick Erickson
ateWordParts="0" >> >splitOnCaseChange="0" >> >splitOnNumerics="1" >> >generateNumberParts="0" >> > catenateWords="0" >> >

Re: WordDelimiterFilterFactory with Wildcards

2017-07-26 Thread Webster Homer
generateNumberParts="0" > >catenateWords="0" > >catenateNumbers="1" > >catenateAll="0" > >preserveOriginal="0" > >

Re: WordDelimiterFilterFactory with Wildcards

2017-07-26 Thread Saurabh Sethi
catenateAll="0" >preserveOriginal="0" >stemEnglishPossessive="0"/> > > > > > On Wed, Jul 26, 2017 at 12:56 PM, Saurabh Sethi < > saurabh.se...@sendgrid.com> > wrote: > > >

Re: WordDelimiterFilterFactory with Wildcards

2017-07-26 Thread Webster Homer
omer > wrote: > > > I have several fieldtypes that use the WordDelimiterFilterFactory > > > > We have a fieldtype for cas numbers. which look like 1234-12-1, numbers > > separated by hyphens, users often leave out the hyphens and either use > > spaces or just st

Re: WordDelimiterFilterFactory with Wildcards

2017-07-26 Thread Saurabh Sethi
1. What tokenizer are you using? 2. Do you have preserveOriginal="1" flag set in your filter? 3. Which version of solr are you using? On Wed, Jul 26, 2017 at 10:48 AM, Webster Homer wrote: > I have several fieldtypes that use the WordDelimiterFilterFactory > > We have

WordDelimiterFilterFactory with Wildcards

2017-07-26 Thread Webster Homer
I have several fieldtypes that use the WordDelimiterFilterFactory. We have a fieldtype for CAS numbers, which look like 1234-12-1: numbers separated by hyphens. Users often leave out the hyphens and either use spaces or just string the numbers together. The WDF seemed like a great solution

Payload doesn't apply to WordDelimiterFilterFactory-generated tokens

2015-10-26 Thread Jamie Johnson
I came across this post ( http://lucene.472066.n3.nabble.com/Payload-doesn-t-apply-to-WordDelimiterFilterFactory-generated-tokens-td3136748.html) and tried to find a JIRA for this task. Was one ever created? If not I'd be happy to create it if this is still something that makes sense

Re: WordDelimiterFilterFactory - tokenizer question

2015-04-05 Thread Mike L.
rom: Jack Krupansky To: solr-user@lucene.apache.org; Mike L. Sent: Sunday, April 5, 2015 8:23 AM Subject: Re: WordDelimiterFilterFactory - tokenizer question You have to tell the filter what types of tokens to generate - words, numbers. You told it to generate... nothing. You did te

Re: WordDelimiterFilterFactory - tokenizer question

2015-04-05 Thread Jack Krupansky
> this was a quick hit for somebody and also I'm reindexing. > WordDelimiterFilterFactory doesn't seem to be working as expected. Hoping > to get some clarification or if something sticks out here. > > Below is the field type definition being used: >

WordDelimiterFilterFactory - tokenizer question

2015-04-05 Thread Mike L.
o I'm reindexing. WordDelimiterFilterFactory doesn't seem to be working as expected. Hoping to get some clarification or if something sticks out here. Below is the field type defini

Re: WordDelimiterFilterFactory and position increment.

2015-02-04 Thread Dmitry Kan
Hi, Could you enable it on the querying side and re-test your case? The rule of thumb I usually follow is to make the index and query side transformations as close as possible. HTH, Dmitry On Wed, Feb 4, 2015 at 6:14 AM, Modassar Ather wrote: > Hi, > > No I am not using WordDelimiterFilter on

Re: WordDelimiterFilterFactory and position increment.

2015-02-03 Thread Modassar Ather
Hi, No I am not using WordDelimiterFilter on query side. Regards, Modassar On Fri, Jan 30, 2015 at 5:12 PM, Dmitry Kan wrote: > Hi, > > Do you use WordDelimiterFilter on query side as well? > > On Fri, Jan 30, 2015 at 12:51 PM, Modassar Ather > wrote: > > > Hi, > > > > An insight in the behav

Re: WordDelimiterFilterFactory and position increment.

2015-01-30 Thread Dmitry Kan
Hi, Do you use WordDelimiterFilter on query side as well? On Fri, Jan 30, 2015 at 12:51 PM, Modassar Ather wrote: > Hi, > > An insight in the behavior of WordDelimiterFilter will be very helpful. > Please share your inputs. > > Thanks, > Modassar > > On Thu, Jan 22, 2015 at 2:54 PM, Modassar At

Re: WordDelimiterFilterFactory and position increment.

2015-01-30 Thread Modassar Ather
Hi, Any insight into the behavior of WordDelimiterFilter would be very helpful. Please share your inputs. Thanks, Modassar On Thu, Jan 22, 2015 at 2:54 PM, Modassar Ather wrote: > Hi, > > I am using WordDelimiterFilter while indexing. Parser used is edismax. > Phrase search is failing for terms li

WordDelimiterFilterFactory and position increment.

2015-01-22 Thread Modassar Ather
Hi, I am using WordDelimiterFilter while indexing. Parser used is edismax. Phrase search is failing for terms like "3d image". On the analysis page it shows the following four tokens for *3d* and their positions. *token position* 3d 1 3 1 3d 1 d

WordDelimiterFilterFactory and PatternReplaceCharFilterFactory

2014-11-05 Thread Jae Joo
Hi, Once I apply PatternReplaceCharFilterFactory to the input string, the position of token is changed. Here is an example. In the analysis page, p-xylene and p-xylene (without xml tags) have different positions. for p-xylene, p-xylene --> 1 xylene --> 2 p --> 2 pxylene --> However, for the t

Re: questions on Solr WordBreakSolrSpellChecker and WordDelimiterFilterFactory

2014-07-18 Thread benjelloun
Hello, for WordDelimiterFilterFactory: this is an example in schema.xml to follow: and for

Re: questions on Solr WordBreakSolrSpellChecker and WordDelimiterFilterFactory

2014-07-17 Thread Erick Erickson
> > > > > >Ahmet > > > > > >On Wednesday, July 16, 2014 3:07 AM, "j...@ece.ubc.ca" > wrote: > > > > > > > >Hello everyone :) > > > >I have a product called "xbox" indexed, and when the user search for > >

Re: questions on Solr WordBreakSolrSpellChecker and WordDelimiterFilterFactory

2014-07-17 Thread jiag
ednesday, July 16, 2014 3:07 AM, "j...@ece.ubc.ca" wrote: > > > >Hello everyone :) > >I have a product called "xbox" indexed, and when the user search for >either "x-box" or "x box" i want the "xbox" product to be

Re: questions on Solr WordBreakSolrSpellChecker and WordDelimiterFilterFactory

2014-07-16 Thread Diego Fernandez
"xbox" before the tokenizer. Diego Fernandez - 爱国 Software Engineer US GSS Supportability - Diagnostics - Original Message - > Jia, > > I agree that for the spellcheckers to work, you need name="last-components"> instead of . > > But the "x-bo

RE: questions on Solr WordBreakSolrSpellChecker and WordDelimiterFilterFactory

2014-07-16 Thread Dyer, James
Jia, I agree that for the spellcheckers to work, you need instead of . But the "x-box" => "xbox" example ought to be solved by analyzing using WordDelimiterFilterFactory and "catenateWords=1" at query-time. Did you re-index after changing your analysis chai
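As a rough illustration of the "catenateWords=1 at query-time" suggestion, the fragment below sketches a query analyzer that joins the parts of "x-box" into "xbox". The whitespace tokenizer, the lowercase filter, and the choice of `generateWordParts="0"` are assumptions for this sketch, not settings taken from the thread.

```xml
<!-- Goes inside a <fieldType>; the tokenizer choice is an assumption. -->
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- catenateWords="1" joins x and box into xbox;
       generateWordParts="0" suppresses the separate parts, so the
       query carries only the joined form. -->
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="0"
          catenateWords="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```

The analysis page in the Solr admin UI is a quick way to confirm what a given query string actually produces, and a reindex is needed only when the index-side chain changes.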

Re: questions on Solr WordBreakSolrSpellChecker and WordDelimiterFilterFactory

2014-07-16 Thread Ahmet Arslan
"xbox" product to be returned. I'm new to Solr, and from reading online, I thought I need to use WordDelimiterFilterFactory for "x-box" case, and WordBreakSolrSpellChecker for "x box" case. Is this correct? (1) In my schema file, this is what I changed: But I don

questions on Solr WordBreakSolrSpellChecker and WordDelimiterFilterFactory

2014-07-15 Thread jiag
Hello everyone :) I have a product called "xbox" indexed, and when the user searches for either "x-box" or "x box" I want the "xbox" product to be returned. I'm new to Solr, and from reading online, I thought I need to use WordDelimiterFilterFactory fo

Re: WordDelimiterFilterFactory and StandardTokenizer

2014-05-20 Thread Diego Fernandez
to keep the StandardTokenizer (because we make use of the > > token > > types) but wanted to use the WDFF to get combinations of words that are > > split with certain characters (mainly - and /, but possibly others as > > well), > > what is the suggested way of accomplishing this? Would we just have to > > extend the JFlex file for the tokenizer and re-compile it? > > > > > > > > -- > > View this message in context: > > http://lucene.472066.n3.nabble.com/WordDelimiterFilterFactory-and-StandardTokenizer-tp4131628p4136146.html > > Sent from the Solr - User mailing list archive at Nabble.com. > > > > >

Re: WordDelimiterFilterFactory and StandardTokenizer

2014-05-20 Thread Ahmet Arslan
lit with certain characters (mainly - and /, but possibly others as well), > what is the suggested way of accomplishing this? Would we just have to > extend the JFlex file for the tokenizer and re-compile it? > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/WordDelimiterFilterFactory-and-StandardTokenizer-tp4131628p4136146.html > Sent from the Solr - User mailing list archive at Nabble.com. > >

Re: WordDelimiterFilterFactory and StandardTokenizer

2014-05-20 Thread Diego Fernandez
because we make use of the token > types) but wanted to use the WDFF to get combinations of words that are > split with certain characters (mainly - and /, but possibly others as well), > what is the suggested way of accomplishing this? Would we just have to > extend the JFlex file for th

Re: WordDelimiterFilterFactory and StandardTokenizer

2014-05-16 Thread Shawn Heisey
On 5/16/2014 9:24 AM, aiguofer wrote: > Jack Krupansky-2 wrote >> Typically the white space tokenizer is the best choice when the word >> delimiter filter will be used. >> >> -- Jack Krupansky > > If we wanted to keep the StandardTokenizer (because we make use of the token > types) but wanted to

Re: WordDelimiterFilterFactory and StandardTokenizer

2014-05-16 Thread Ahmet Arslan
we just have to extend the JFlex file for the tokenizer and re-compile it? -- View this message in context: http://lucene.472066.n3.nabble.com/WordDelimiterFilterFactory-and-StandardTokenizer-tp4131628p4136146.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: WordDelimiterFilterFactory and StandardTokenizer

2014-05-16 Thread aiguofer
3.nabble.com/WordDelimiterFilterFactory-and-StandardTokenizer-tp4131628p4136146.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Not allowing exact match with WordDelimiterFilterFactory

2014-04-25 Thread Kashish
(titleName:"fast five"^20.0 akaName:"fast five"^10.0) Why is the hyphen getting removed? I have no clue. - Thanks, Kashish -- View this message in context: http://lucene.472066.n3.nabble.com/Not-allowing-exact-match-with-WordDelimiterFilterFactory-tp4133193p4133235.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Not allowing exact match with WordDelimiterFilterFactory

2014-04-25 Thread Jack Krupansky
" for your text field type. -- Jack Krupansky -Original Message- From: Kashish Sent: Friday, April 25, 2014 2:49 PM To: solr-user@lucene.apache.org Subject: Not allowing exact match with WordDelimiterFilterFactory Hi, I am having some problem with WordDelimiterFilte

Not allowing exact match with WordDelimiterFilterFactory

2014-04-25 Thread Kashish
Hi, I am having a problem with WordDelimiterFilterFactory. This is my fieldType. So now, if I search for a word like fast-five across this field, the debug shows me (((titleName:fast-five)^20 OR (akaName:fast-five)^10

Re: WordDelimiterFilterFactory and StandardTokenizer

2014-04-16 Thread Jack Krupansky
Typically the white space tokenizer is the best choice when the word delimiter filter will be used. -- Jack Krupansky -Original Message- From: Shawn Heisey Sent: Wednesday, April 16, 2014 11:03 PM To: solr-user@lucene.apache.org Subject: Re: WordDelimiterFilterFactory and

Re: WordDelimiterFilterFactory and StandardTokenizer

2014-04-16 Thread Shawn Heisey
On 4/16/2014 8:37 PM, Bob Laferriere wrote: >> I am seeing odd behavior from WordDelimiterFilterFactory (WDFF) when >> used in conjunction with StandardTokenizerFactory (STF). >> I see the following results for the document of “wi-fi”: >> >> Index: “wi”,

WordDelimiterFilterFactory and StandardTokenizer

2014-04-16 Thread Bob Laferriere
 I am seeing odd behavior from WordDelimiterFilterFactory  (WDFF) when used in conjunction with StandardTokenizerFactory (STF). If I use the following configuration

Re: WordDelimiterFilterFactory splits up hyphenated terms although splitOnNumerics, generateWordParts and generateNumberParts are set to 0 (false)

2014-04-09 Thread Erick Erickson
you often get a much better sense of what actually happens if you look at the docs for the filter rather than the factory, in this case the WordDelimiterFilter rather than WordDelimiterFilterFactory. This latter is not where the action is, but it's what's available for definitions in schema.xm

AW: WordDelimiterFilterFactory splits up hyphenated terms although splitOnNumerics, generateWordParts and generateNumberParts are set to 0 (false)

2014-04-09 Thread Malte Hübner
> -Ursprüngliche Nachricht- > Von: Erick Erickson [mailto:erickerick...@gmail.com] > Gesendet: Samstag, 29. März 2014 16:09 > An: solr-user@lucene.apache.org > Betreff: Re: WordDelimiterFilterFactory splits up hyphenated terms although > splitOnNumerics, gen

Re: WordDelimiterFilterFactory splits up hyphenated terms although splitOnNumerics, generateWordParts and generateNumberParts are set to 0 (false)

2014-03-29 Thread Erick Erickson
Why do you say at the indexing part: The given search term is: *X-002-99-495* WordDelimiterFilterFactory indexes the following word parts: * X (shouldn't be there) * 00299495 (shouldn't be there) ?? You've set catenateNumbers="1" in your fieldType for the indexing part,

WordDelimiterFilterFactory splits up hyphenated terms although splitOnNumerics, generateWordParts and generateNumberParts are set to 0 (false)

2014-03-27 Thread Malte Hübner
I am using Solr 4.7 and have got a serious problem with WordDelimiterFilterFactory. WordDelimiterFilterFactory behaves differently on hyphenated terms if they contain characters (a-Z) or characters AND numbers. Splitting up hyphenated terms is deactivated in my configuration: *This is the

Re: Question on WordDelimiterFilterFactory use

2012-12-26 Thread Anirudha Jadhav
ts there too. > > Best, > > Dmitry Kan > > On Wed, Dec 26, 2012 at 10:08 AM, Jose Yadao wrote: > > > Hi and Happy Holidays to everyone. > > > > I have a question regarding the use of WordDelimiterFilterFactory. > > so if i set it to split on intra word de

Re: Question on WordDelimiterFilterFactory use

2012-12-26 Thread Dmitry Kan
he use of WordDelimiterFilterFactory. > so if i set it to split on intra word delimiter, generateWordparts="1" and > catenateWords="1", for the word is i-pod, the ff query will return a result > => "i" "pod" and "ipod"? > > Thanks >

Re: SynonymFilterFactory breaking WordDelimiterFilterFactory output

2012-11-23 Thread Yonik Seeley
recently upgraded from Solr 1.4.1 to 3.6.1 and am running into > a problem with a specific query. When I search for "8mile" or "8-mile" > without the quotes, and I use just the WordDelimiterFilterFactory as > configured below, I get this query which is as expected: album:&

Re: SynonymFilterFactory breaking WordDelimiterFilterFactory output

2012-11-23 Thread Erick Erickson
e, Nov 20, 2012 at 10:55 PM, Chris Book wrote: > Hello, I've recently upgraded from Solr 1.4.1 to 3.6.1 and am running into > a problem with a specific query. When I search for "8mile" or "8-mile" > without the quotes, and I use just the WordDelimiterFilterFactory as

Re: Unexpected query rewrite from WordDelimiterFilterFactory and SynonymFilterFactory

2012-05-14 Thread Jack Krupansky
apache.org Subject: Re: Unexpected query rewrite from WordDelimiterFilterFactory and SynonymFilterFactory Thanks Jack! It's too bad I can't have catenate and generateParts both set to "1" at query time. If I set catenate to "0", then I miss the case where "wifi&

Re: Unexpected query rewrite from WordDelimiterFilterFactory and SynonymFilterFactory

2012-05-14 Thread Chung Wu
ex analyzer and do them only in the query analyzer - and multi-term > synonyms don't work well, except for replacement synonyms at index time. > > See the "text_en_splitting" field type in the example schema. > > -- Jack Krupansky > > -Original Message- Fro

Re: Unexpected query rewrite from WordDelimiterFilterFactory and SynonymFilterFactory

2012-05-14 Thread Jack Krupansky
ple schema. -- Jack Krupansky -Original Message- From: Chung Wu Sent: Monday, May 14, 2012 7:01 PM To: solr-user@lucene.apache.org Subject: Unexpected query rewrite from WordDelimiterFilterFactory and SynonymFilterFactory Hi all! I'm using Solr 3.6, and I'm seeing unex

RE: DisMax and WordDelimiterFilterFactory (limitations of MultiPhraseQuery)

2011-10-27 Thread Demian Katz
tion that might be worth overcoming -- I'm sure my use case is not the only one where this could matter. Has anyone given this any thought? - Demian > -Original Message- > From: Erick Erickson [mailto:erickerick...@gmail.com] > Sent: Thursday, October 27, 2011 8:21 AM >

Re: DisMax and WordDelimiterFilterFactory

2011-10-27 Thread Erick Erickson
  generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" > splitOnCaseChange="1" /> >         words="stopwords.txt" enablePositionIncrements="true"/> >         >         protected="protwo

DisMax and WordDelimiterFilterFactory

2011-10-25 Thread Demian Katz
like this: The important feature here is the use of WordDelimiterFilterFactory, which allows a search for "WiFi" to match an indexed

Re: Payload doesn't apply to WordDelimiterFilterFactory-generated tokens

2011-07-18 Thread Chris Hostetter
: It seems that the payloads are applied only to the original word that I : index and the WordDelimiterFilter doesn't apply the payloads to the tokens : it generates. I believe you are correct. I think the general rule for most TokenFilters that you will find in Lucene/Solr is that they don't t

Payload doesn't apply to WordDelimiterFilterFactory-generated tokens

2011-07-04 Thread Lox
Hi, I have a problem with the WordDelimiterFilterFactory and the DelimitedPayloadTokenFilterFactory. It seems that the payloads are applied only to the original word that I index and the WordDelimiterFilter doesn't apply the payloads to the tokens it generates. For example, imagine I inde

Re: term position question from analyzer stack for WordDelimiterFilterFactory

2011-04-26 Thread Erick Erickson
2:37 PM > To: solr-user@lucene.apache.org > Subject: Re: term position question from analyzer stack for > WordDelimiterFilterFactory > > Hi Robert, > > I'm no WDFF expert, but all these zero look suspicious: > > org.apache.solr.analysis.WordDelimiterFilterFac

RE: term position question from analyzer stack for WordDelimiterFilterFactory

2011-04-26 Thread Robert Petersen
-user@lucene.apache.org Subject: Re: term position question from analyzer stack for WordDelimiterFilterFactory Hi Robert, I'm no WDFF expert, but all these zero look suspicious: org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=0, generateNumberParts=0, catenateWords=0, generateW

Re: term position question from analyzer stack for WordDelimiterFilterFactory

2011-04-26 Thread Otis Gospodnetic
ucene ecosystem search :: http://search-lucene.com/ - Original Message > From: Robert Petersen > To: solr-user@lucene.apache.org; yo...@lucidimagination.com > Sent: Tue, April 26, 2011 4:39:49 PM > Subject: RE: term position question from analyzer stack for >WordDelimiterFilter

RE: term position question from analyzer stack for WordDelimiterFilterFactory

2011-04-26 Thread Robert Petersen
n question from analyzer stack for WordDelimiterFilterFactory Aha! I knew something must be awry, but when I looked at the analysis page output, well it sure looked like it should match. :) OK here is the query side WDF that finally works, I just turned everything off. (yay) First I tried just

RE: term position question from analyzer stack for WordDelimiterFilterFactory

2011-04-25 Thread Robert Petersen
l.com] On Behalf Of Yonik Seeley Sent: Monday, April 25, 2011 9:24 AM To: solr-user@lucene.apache.org Subject: Re: term position question from analyzer stack for WordDelimiterFilterFactory On Mon, Apr 25, 2011 at 12:15 PM, Robert Petersen wrote: > The search and index analyzer stack are the sam

Re: term position question from analyzer stack for WordDelimiterFilterFactory

2011-04-25 Thread Yonik Seeley
On Mon, Apr 25, 2011 at 12:15 PM, Robert Petersen wrote: > The search and index analyzer stack are the same. Ahhh, they should not be! Using both generate and catenate in WDF at query time is a no-no. Same reason you can't have multi-word synonyms at query time: http://wiki.apache.org/solr/Analyz
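One way to read this advice is as separate index and query analyzers: generate and catenate together at index time, generate only at query time. The sketch below assumes a hypothetical field type name (`text_wdf`), a whitespace tokenizer, and a lowercase filter; it is modeled loosely on the split-analyzer pattern in Solr's example schema and is not the configuration from this thread.

```xml
<!-- Hypothetical field type; the name and tokenizer are assumptions. -->
<fieldType name="text_wdf" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Index side: emit both the parts (Apple, TV) and the joined form (AppleTV). -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Query side: generate parts only; all catenate options are off. -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Under these assumptions a query for "appletv" passes through as a single term and can match the catenated index token, while a query for "AppleTV" is split into parts that can match the generated index tokens.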

RE: term position question from analyzer stack for WordDelimiterFilterFactory

2011-04-25 Thread Robert Petersen
-Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Friday, April 22, 2011 5:55 PM To: Robert Petersen Cc: solr-user@lucene.apache.org Subject: Re: term position question from analyzer stack for WordDelimiterFilterFactory On Fri, Apr 22, 2011 at

Re: term position question from analyzer stack for WordDelimiterFilterFactory

2011-04-22 Thread Yonik Seeley
On Fri, Apr 22, 2011 at 8:24 PM, Robert Petersen wrote: > I can repeatedly demonstrate this in my dev environment, where I get > entirely different results searching for AppleTV vs. appletv You originally said "I cannot get a match between AppleTV on the indexing side and appletv on the search si

RE: term position question from analyzer stack for WordDelimiterFilterFactory

2011-04-22 Thread Robert Petersen
this work... thanks for the help! -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Thursday, April 21, 2011 5:54 PM To: solr-user@lucene.apache.org Subject: Re: term position question from analyzer stack for WordDelimiterFilterFactor

Re: term position question from analyzer stack for WordDelimiterFilterFactory

2011-04-21 Thread Yonik Seeley
On Thu, Apr 21, 2011 at 8:06 PM, Robert Petersen wrote: > So if I don't put preserveOriginal=1 in my WordDelimiterFilterFactory > settings I cannot get a match between AppleTV on the indexing side and > appletv on the search side. Hmmm, that shouldn't be the case. The "

term position question from analyzer stack for WordDelimiterFilterFactory

2011-04-21 Thread Robert Petersen
So if I don't put preserveOriginal=1 in my WordDelimiterFilterFactory settings I cannot get a match between AppleTV on the indexing side and appletv on the search side. Without that setting the all lowercase version of AppleTV is in term position two due to the catenateWords=1 o

Re: Is WordDelimiterFilterFactory applicable to non-english language?

2011-03-14 Thread Ahmet Arslan
> Does it make sense to apply > WordDelimiterFilterFactory to non-english > language, such as spanish?  Yes it makes sense. WDF is especially good for product names; like i-phone, iphone4 etc.

Is WordDelimiterFilterFactory applicable to non-english language?

2011-03-14 Thread cyang2010
Does it make sense to apply WordDelimiterFilterFactory to non-English languages, such as Spanish? What about Asian languages? The following are the typical use cases for WordDelimiterFilterFactory. Are 1, 2, 3, and 4 applicable to all Western languages (including Spanish)? For Asian languages, is

Re: WordDelimiterFilterFactory

2011-02-04 Thread Jay Hill
m type that uses the WordDelimiterFilterFactory) (let's name the field "foo") - create a string field (let's name it "foo_string") - create a "copyField" with the source being "foo" and the dest being "foo_string". - use dismax (or edismax) to search both
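The steps listed in this message could look roughly like the schema fragment below. The field names `foo` and `foo_string` come from the message; the type name `text_wdf` is a hypothetical stand-in for whatever custom type uses the WordDelimiterFilterFactory, and the `indexed`/`stored` settings are assumptions.

```xml
<!-- Analyzed field; text_wdf is a hypothetical type that uses WordDelimiterFilterFactory. -->
<field name="foo" type="text_wdf" indexed="true" stored="true"/>

<!-- Unanalyzed copy of the same value, used to reward exact matches. -->
<field name="foo_string" type="string" indexed="true" stored="false"/>

<!-- Copy the raw input of foo into foo_string at index time. -->
<copyField source="foo" dest="foo_string"/>
```

A dismax request could then search both fields with something like `qf=foo foo_string^10`, where the boost value is purely illustrative. Note that a `string` field matches only when the whole field value equals the query text.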

WordDelimiterFilterFactory

2011-02-04 Thread John kim
If I use WordDelimiterFilterFactory during indexing and at query time, will a search for "cls500" find "cls 500" and "cls500x"? If so, will it find and score exact matches higher? If not, how do you get exact matches to display first?

Re: WordDelimiterFilterFactory + CamelCase query

2010-11-19 Thread Peter Karich
Hi, the final solution is explained here in context: http://mail-archives.apache.org/mod_mbox/lucene-dev/201011.mbox/%3caanlktimatgvplph_mgfbsughdoedc8tc2brrwxhid...@mail.gmail.com%3e " /If you are using Solr branch_3x or trunk, you can turn this off, by setting autoGeneratePhraseQueries to fa

Re: WordDelimiterFilterFactory + CamelCase query

2010-11-18 Thread Peter Karich
Peter, I recently had this issue, and I had to set splitOnCaseChange="0" to keep the word delimiter filter from doing what you describe. Can you try that and see if it helps? - Ken Hi Ken, yes this would solve my problem, but then I would lose a match for 'SuperMario' if I query 'mario', r

Re: WordDelimiterFilterFactory + CamelCase query

2010-11-18 Thread Ken Stanley
On Thu, Nov 18, 2010 at 3:22 PM, Peter Karich wrote: > >> Hi, >> >> Please add preserveOriginal="1"  to your WDF [1] definition and reindex >> (or >> just try with the analysis page). > > but it is already there!? > >                         generateWordParts="1" generateNumberParts="1" > catenat

Re: WordDelimiterFilterFactory + CamelCase query

2010-11-18 Thread Peter Karich
Hi, Please add preserveOriginal="1" to your WDF [1] definition and reindex (or just try with the analysis page). but it is already there!? Regards, Peter. Hi, Please add preserveOriginal="1" to your WDF [1] definition and reindex (or just try with the analysis page). This will make

Re: WordDelimiterFilterFactory + CamelCase query

2010-11-18 Thread Markus Jelsma
Hi, Please add preserveOriginal="1" to your WDF [1] definition and reindex (or just try with the analysis page). This will make sure the original input token is being preserved along the newly generated tokens. If you then pass it all through a lowercase filter, it should match both documents
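A sketch of the chain this suggestion describes: the word delimiter filter with `preserveOriginal="1"`, followed by a lowercase filter. The whitespace tokenizer and the other attribute values are assumptions, and the replies in this thread indicate that this setting alone did not resolve the original poster's problem.

```xml
<!-- Goes inside a <fieldType>; the tokenizer and the other attributes are assumptions. -->
<analyzer>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- preserveOriginal="1" keeps aBc alongside the parts split on the case change. -->
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1"
          splitOnCaseChange="1"
          preserveOriginal="1"/>
  <!-- Lowercasing after the split turns the preserved aBc into abc. -->
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```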

WordDelimiterFilterFactory + CamelCase query

2010-11-18 Thread Peter Karich
Hi, I am going crazy but which config is necessary to include the missing doc 2? I have: doc1 tw:aBc doc2 tw:abc Now a query "aBc" returns only doc 1 although when I try doc2 from admin/analysis.jsp then the term text 'abc' of the index gets highlighted as intended. I even indexed a simple ex

Re: A question on WordDelimiterFilterFactory

2010-09-14 Thread yandong yao
wrote: >> >> > Hi Guys, >> > >> > I encountered a problem when enabling WordDelimiterFilterFactory for >> both >> > index and query (pasted relative part of schema.xml at the bottom of >> > email). >> > >> > *1. Steps to reproduce:* >

Re: A question on WordDelimiterFilterFactory

2010-09-14 Thread yandong yao
se/SOLR-1852 , which was fixed in 1.4.1 > > On Tue, Sep 14, 2010 at 5:40 AM, yandong yao wrote: > > > Hi Guys, > > > > I encountered a problem when enabling WordDelimiterFilterFactory for both > > index and query (pasted relative part of schema.xml at the bottom

Re: A question on WordDelimiterFilterFactory

2010-09-14 Thread Robert Muir
enabling WordDelimiterFilterFactory for both > index and query (pasted relative part of schema.xml at the bottom of > email). > > *1. Steps to reproduce:* >1.1 The indexed sample document contains only one sentence: "This is a > TechNote." >1.2 Query is: q=Tech

Re: A question on WordDelimiterFilterFactory

2010-09-14 Thread Erick Erickson
Really well done problem statement, by the way. On Tue, Sep 14, 2010 at 5:40 AM, yandong yao wrote: > Hi Guys, > > I encountered a problem when enabling WordDelimiterFilterFactory for both > index and query (pasted relative part of schema.xml at the bottom of > email).

A question on WordDelimiterFilterFactory

2010-09-14 Thread yandong yao
Hi Guys, I encountered a problem when enabling WordDelimiterFilterFactory for both index and query (pasted relevant part of schema.xml at the bottom of email). *1. Steps to reproduce:* 1.1 The indexed sample document contains only one sentence: "This is a TechNote." 1.2 Q

Re: Multiple passes with WordDelimiterFilterFactory

2010-08-30 Thread Shawn Heisey
On 8/30/2010 9:01 AM, Shawn Heisey wrote: On 8/29/2010 2:17 PM, Erick Erickson wrote: <<>> Try putting this after any instances of, say, WhiteSpaceTokenizerFactory in your analyzer definition, and I believe you'll see that this is not true. At least looking at this in the analysis page from S

Re: Multiple passes with WordDelimiterFilterFactory

2010-08-30 Thread Shawn Heisey
On 8/29/2010 2:17 PM, Erick Erickson wrote: <<>> Try putting this after any instances of, say, WhiteSpaceTokenizerFactory in your analyzer definition, and I believe you'll see that this is not true. At least looking at this in the analysis page from SOLR admin sure doesn't seem to support that

Re: Multiple passes with WordDelimiterFilterFactory

2010-08-29 Thread Erick Erickson
There's nothing built into SOLR that I know of that'll deal with auto-detecting multiple languages and "doing the right thing". I know there's been discussion of that, searching the users' list might help... You may have to write your own analyzer that tries to do this, but I have no clue how you'd

Re: Multiple passes with WordDelimiterFilterFactory

2010-08-29 Thread Shawn Heisey
Thank you for taking the time to help. The way I've got the word delimiter index filter set up with only one pass, "wolf-biederman" will result in wolf, biederman, wolfbiederman, and wolf-biederman. With two passes, the last one is not present. One pass changes "gremlin's" to gremlin and gr

Re: Multiple passes with WordDelimiterFilterFactory

2010-08-29 Thread Erick Erickson
Look at the tokenizer/filter chain that makes up your analyzers, and see: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters for other tokenizer/analyzer/filter options. You're on the right track looking at the various choices provided, and I suspect you'll find what you need... Be a l

Re: Multiple passes with WordDelimiterFilterFactory

2010-08-29 Thread Shawn Heisey
On 8/28/2010 7:59 PM, Shawn Heisey wrote: The only drop in term quality that I noticed was that possessive words (apostrophe-s) no longer have the original preserved. I haven't yet decided whether that's a problem. I finally did notice another drop in term quality from the dual pass - words

Re: Multiple passes with WordDelimiterFilterFactory

2010-08-29 Thread Shawn Heisey
It's metadata for a collection of 45 million documents that is mostly photos, with some videos and text. The data is imported from a MySQL database and split among six large shards (each nearly 13GB) and a small shard with data added in the last week. That works out to between 300,000 and 50

Re: Multiple passes with WordDelimiterFilterFactory

2010-08-28 Thread Shawn Heisey
It's metadata for a collection of 45 million documents that is mostly photos, with some videos and text. The data is imported from a MySQL database and split among six large shards (each nearly 13GB) and a small shard with data added in the last week, which usually works out to between 300,000

Re: Multiple passes with WordDelimiterFilterFactory

2010-08-27 Thread Erick Erickson
roach. > > > On Thursday 26 August 2010 17:45:45 Shawn Heisey wrote: > > Can I pass my data through WordDelimiterFilterFactory more than once? > > It occurs to me that I might get better results if I can do some of the > > filters separately and use preserveOriginal on some

Re: Multiple passes with WordDelimiterFilterFactory

2010-08-27 Thread Markus Jelsma
pass my data through WordDelimiterFilterFactory more than once? > It occurs to me that I might get better results if I can do some of the > filters separately and use preserveOriginal on some of them but not others. > > Currently I am using the following definition on both indexing and &

Multiple passes with WordDelimiterFilterFactory

2010-08-26 Thread Shawn Heisey
Can I pass my data through WordDelimiterFilterFactory more than once? It occurs to me that I might get better results if I can do some of the filters separately and use preserveOriginal on some of them but not others. Currently I am using the following definition on both indexing and

Re: dismax and WordDelimiterFilterFactory with PreserveOriginal = 1

2010-03-11 Thread Erick Erickson
Kind of a shot in the dark here, but your parameters for index and query on WordDelimiterFilterFactory are different, especially suspicious is catenateWords. You could test this by looking in your index with the SOLR admin page and/or Luke to see what your actual terms are. And don't f

RE: dismax and WordDelimiterFilterFactory with PreserveOriginal = 1

2010-03-11 Thread Ya-Wen Hsu
Yonik, thank you for your reply. When I don't use PreserveOriginal = 1 for WordDelimiterFilterFactory, the query "ain't" is parsed as "ain t" and no match is found in this case too. If I remove ' from the query, then I can get results. I used the anal

Re: dismax and WordDelimiterFilterFactory with PreserveOriginal = 1

2010-03-11 Thread Yonik Seeley
On Thu, Mar 11, 2010 at 1:07 PM, Ya-Wen Hsu wrote: > Hi all, > > I'm facing the same issue as previous post here: > http://www.mail-archive.com/solr-user@lucene.apache.org/msg19511.html. Since > no one answers this post, I thought I'll ask again. In my case, I use below > setting for index > g

dismax and WordDelimiterFilterFactory with PreserveOriginal = 1

2010-03-11 Thread Ya-Wen Hsu
Hi all, I'm facing the same issue as previous post here: http://www.mail-archive.com/solr-user@lucene.apache.org/msg19511.html. Since no one answers this post, I thought I'll ask again. In my case, I use below setting for index and for query. When I use query with word "ain't", no result is

Re: Trouble Configuring WordDelimiterFilterFactory

2009-11-30 Thread Erick Erickson
if I set > generateWordParts and catenateWords to "0", the way term texts are created > for ".355" does not change. > > Thank you for your time. > > Regards > Rahul > > On Sun, Nov 29, 2009 at 1:07 AM, Steven A Rowe wrote: > > > Hi Rahu

Re: Trouble Configuring WordDelimiterFilterFactory

2009-11-29 Thread Rahul R
rds to "0", the way term texts are created for ".355" does not change. Thank you for your time. Regards Rahul On Sun, Nov 29, 2009 at 1:07 AM, Steven A Rowe wrote: > Hi Rahul, > > On 11/26/2009 at 12:53 AM, Rahul R wrote: > > Is there a way by which I can

RE: Trouble Configuring WordDelimiterFilterFactory

2009-11-28 Thread Steven A Rowe
Hi Rahul, On 11/26/2009 at 12:53 AM, Rahul R wrote: > Is there a way by which I can prevent the WordDelimiterFilterFactory > from totally acting on numerical data ? "prevent ... from totally acting on" is pretty vague, and nowhere AFAICT do you say precisely what it is you want

Re: Trouble Configuring WordDelimiterFilterFactory

2009-11-25 Thread Rahul R
a > combination of numbers, alphabets, special characters etc. I have a > requirement wherein the WordDelimiterFilterFactory does not work on numbers, > especially those with decimal points. Accuracy of results with relevance to > numerical data is quite important, So if the text fiel
