bq: To me this seems like a design flaw. The Solr fieldtypes seem like they
allow a developer to create types that should handle wildcards
intelligently.
Well, that's pretty impossible. WordDelimiter(Graph)FilterFactory is a
case in point. It's designed to break up on
uppercase/lowercase/numeric/n
It doesn't seem to matter what you do in the query analyzer: if the query
contains a wildcard, the analyzer won't be applied to it. Which is exactly the
behavior I observed.
The solution was to set preserveOriginal="1" and change the ETL process to
not strip the dashes, letting the index analyzer do that. We have a lot of
lega
Webster, did you try escaping the special character (assuming you did not
do what Shawn did by replacing - with some other text and your indexed
tokens have -)?
On Thu, Jul 27, 2017 at 12:03 PM, Webster Homer
wrote:
Shawn,
Thank you for that. I didn't know about that feature of the WDF. It doesn't
help my situation but it's great to know about.
Googling solr wildcard searches I found this link
http://lucene.472066.n3.nabble.com/Wildcard-search-not-working-with-search-term-having-special-characters-and-digits-t
The Admin/Analysis page is useful here. It'll show you what each bit
of your query analysis chain does and may well point you to the part
of the chain that's the problem.
Best,
Erick
On Wed, Jul 26, 2017 at 11:33 AM, Webster Homer wrote:
I checked the Pattern Replace filter; it's OK. I can't use preserveOriginal
since it preserves the hyphens too, which I don't want. It would be best if it
didn't touch the * at all.
On Wed, Jul 26, 2017 at 1:30 PM, Saurabh Sethi
wrote:
My guess is PatternReplaceFilterFactory is most likely the cause.
Also, based on your query, you might want to set preserveOriginal=1
You can take one filter out at a time and see which one is altering the
query.
On Wed, Jul 26, 2017 at 11:13 AM, Webster Homer
wrote:
1. KeywordTokenizer - we want to treat the entire field as a single term to
parse
2. preserveOriginal = "0" Thought about changing this to 1
3. 6.2.2
This is the fieldtype
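(The schema snippet did not survive in the archive. A minimal sketch of roughly
what a cas-number fieldtype with the answers above might look like; the name
and filter order here are assumptions, not the poster's actual schema:)

```xml
<!-- Hypothetical sketch: KeywordTokenizer keeps the whole value as one
     token, then WordDelimiterFilter splits it; preserveOriginal="0"
     as stated in answer 2 above. -->
<fieldType name="cas_number" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateNumbers="1" preserveOriginal="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```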
1. What tokenizer are you using?
2. Do you have preserveOriginal="1" flag set in your filter?
3. Which version of solr are you using?
On Wed, Jul 26, 2017 at 10:48 AM, Webster Homer
wrote:
> I have several fieldtypes that use the WordDelimiterFilterFactory
>
> We have a fieldtype for cas numbers
From: Jack Krupansky
To: solr-user@lucene.apache.org; Mike L.
Sent: Sunday, April 5, 2015 8:23 AM
Subject: Re: WordDelimiterFilterFactory - tokenizer question
You have to tell the filter what types of tokens to generate - words,
numbers. You told it to generate... nothing. You did tell it to preserve
the original, unfiltered token though, which is fine.
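(To illustrate Jack's point: a filter configuration along these lines, shown
here only as a sketch, enables word and number generation rather than leaving
everything off:)

```xml
<!-- With every generate/catenate option at 0, WDF emits nothing except a
     preserved original. Enable at least one kind of output: -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1"
        preserveOriginal="1"/>
```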
-- Jack Krupansky
On Sun, Apr 5, 2015 at 3:39 AM, Mike L.
wrote:
> Solr User Group,
> I have a
Hi,
Could you enable it on the querying side and re-test your case?
The rule of thumb I usually follow is to make the index and query side
transformations as close as possible.
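(Dmitry's rule of thumb, sketched as a schema fragment; the fieldtype name and
filter options are hypothetical, the point is the mirrored chains:)

```xml
<fieldType name="text_wdf" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"/>
  </analyzer>
  <analyzer type="query">
    <!-- mirror the index-side chain as closely as possible -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"/>
  </analyzer>
</fieldType>
```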
HTH,
Dmitry
On Wed, Feb 4, 2015 at 6:14 AM, Modassar Ather
wrote:
Hi,
No I am not using WordDelimiterFilter on query side.
Regards,
Modassar
On Fri, Jan 30, 2015 at 5:12 PM, Dmitry Kan wrote:
Hi,
Do you use WordDelimiterFilter on query side as well?
On Fri, Jan 30, 2015 at 12:51 PM, Modassar Ather
wrote:
Hi,
An insight into the behavior of WordDelimiterFilter would be very helpful.
Please share your inputs.
Thanks,
Modassar
On Thu, Jan 22, 2015 at 2:54 PM, Modassar Ather
wrote:
> Hi,
>
> I am using WordDelimiterFilter while indexing. Parser used is edismax.
> Phrase search is failing for terms li
Hey Ahmet,
Yeah I had missed Shawn's response, I'll have to give that a try as well. As
for the version, we're using 4.4. StandardTokenizer sets type for HANGUL,
HIRAGANA, IDEOGRAPHIC, KATAKANA, and SOUTHEAST_ASIAN and you're right, we're
using TypeTokenFilter to remove those.
Diego Fernandez
Hi Diego,
Did you miss Shawn's response? His ICUTokenizerFactory solution is better than
mine.
By the way, what solr version are you using? Does StandardTokenizer set type
attribute for CJK words?
To filter out given types, you do not need a custom filter; the Type Token
filter serves exactly that purpose.
Great, thanks for the information! Right now we're using the StandardTokenizer
types to filter out CJK characters with a custom filter. I'll test using
MappingCharFilters, although I'm a little concerned with possible adverse
scenarios.
Diego Fernandez - 爱国
Software Engineer
US GSS Support
On 5/16/2014 9:24 AM, aiguofer wrote:
Hi Aiguofer,
You mean ClassicTokenizer? Because StandardTokenizer does not set token types
(e-mail, url, etc).
I wouldn't go with the JFlex edit, mainly because of the maintenance costs. It
will be a burden to maintain a custom tokenizer.
MappingCharFilters could be used to manipulate tokenizer behavior.
Jack Krupansky-2 wrote
If we wanted to keep the StandardTokenizer (because we make use of the token
types) but wanted to use the WDFF to get combinations of words that ar
Typically the white space tokenizer is the best choice when the word
delimiter filter will be used.
-- Jack Krupansky
-----Original Message-----
From: Shawn Heisey
Sent: Wednesday, April 16, 2014 11:03 PM
To: solr-user@lucene.apache.org
Subject: Re: WordDelimiterFilterFactory and
On 4/16/2014 8:37 PM, Bob Laferriere wrote:
>> I am seeing odd behavior from WordDelimiterFilterFactory (WDFF) when
>> used in conjunction with StandardTokenizerFactory (STF).
>> I see the following results for the document of “wi-fi”:
>>
>> Index: “wi”, “fi”
>> Query: “wi”,”fi”,”wifi”
>>
>
Best,
Erick
On Wed, Apr 9, 2014 at 7:38 AM, Malte Hübner wrote:
>> -----Original Message-----
>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>> Sent: Saturday, 29 March 2014 16:09
>> To: solr-user@lucene.apache.org
>> Subject: Re: Word
Why do you say at the indexing part:
The given search term is: *X-002-99-495*
WordDelimiterFilterFactory indexes the following word parts:
* X (shouldn't be there)
* 00299495 (shouldn't be there)
??
You've set catenateNumbers="1" in your fieldType for the indexing part,
so WDFF is doing exactly what you've told it to.
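(A sketch of what such an index-side config produces for that term; the exact
filter line is an assumption based on the attributes mentioned above:)

```xml
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateNumbers="1"/>
<!-- "X-002-99-495" then yields roughly: X, 002, 99, 495, 00299495
     (catenateNumbers joins the adjacent number parts into 00299495,
      which is why those tokens show up in the index) -->
```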
You can always try something like this out in the analysis.jsp page,
accessible from the Solr Admin home. Check out that page and see how it
allows you to enter text to represent what was indexed, and text for a
query. You can then see if there are matches. Very handy to see how the
various filters
Hi,
the final solution is explained here in context:
http://mail-archives.apache.org/mod_mbox/lucene-dev/201011.mbox/%3caanlktimatgvplph_mgfbsughdoedc8tc2brrwxhid...@mail.gmail.com%3e
"If you are using Solr branch_3x or trunk, you can turn this off, by
setting autoGeneratePhraseQueries to false"
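(That setting goes on the fieldType itself; a minimal sketch, with the
fieldtype name and analyzer contents left as placeholders:)

```xml
<fieldType name="text_general" class="solr.TextField"
           autoGeneratePhraseQueries="false">
  <!-- analyzer chain as before -->
</fieldType>
```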
Peter,
I recently had this issue, and I had to set splitOnCaseChange="0" to
keep the word delimiter filter from doing what you describe. Can you
try that and see if it helps?
- Ken
Hi Ken,
yes, that would solve my problem, but then I would lose the match for
'SuperMario' when I query 'mario', right?
On Thu, Nov 18, 2010 at 3:22 PM, Peter Karich wrote:
> generateWordParts="1" generateNumberParts="1"
> catenat
Hi,
Please add preserveOriginal="1" to your WDF [1] definition and reindex (or
just try with the analysis page).
but it is already there!?
Regards,
Peter.
Hi,
Please add preserveOriginal="1" to your WDF [1] definition and reindex (or
just try with the analysis page).
This will make sure the original input token is being preserved along the
newly generated tokens. If you then pass it all through a lowercase filter, it
should match both documents
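(The suggestion above, sketched as a filter chain; the splitOnCaseChange
default of 1 is assumed here:)

```xml
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" preserveOriginal="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<!-- "SuperMario" -> supermario (preserved original, lowercased),
     super, mario -- so a query for either 'mario' or 'supermario' matches -->
```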
: In trying to understand the various options for
: WordDelimiterFilterFactory, I tried setting all options to 0. This seems
: to prevent a number of words from being output at all. In particular
: "can't" and "99dxl" don't get output, nor do any words containing hyphens.
: Is this correct behavior?