Nawab
Look at ClassicTokenizer. It is a good choice if you have part numbers
with hyphens. It is the second tokenizer on this page:
https://lucene.apache.org/solr/guide/6_6/tokenizers.html
Cheers -- Rick
On 01/03/2018 04:52 PM, Shawn Heisey wrote:
On 1/3/2018 1:56 PM, Nawab Zada Asad Iqbal wrote:
Thanks Emir, Erick.
What I want to do is remove empty tokens after WordDelimiterGraphFilter.
Is there an option in WordDelimiterGraphFilter to not generate empty
tokens?
I use LengthFilterFactory with a minimum of 1 and a maximum of 512
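To illustrate what a length filter with those bounds does to empty tokens, here is a minimal Python sketch. It only simulates the idea and is not Solr or Lucene code; the function name length_filter and the sample token list are made up for illustration.

```python
def length_filter(tokens, min_len=1, max_len=512):
    """Keep only tokens whose length lies within [min_len, max_len]."""
    return [t for t in tokens if min_len <= len(t) <= max_len]


# Hypothetical token stream containing empty tokens.
tokens = ["p", "", "n", "", "HSC0424PP"]

# Empty strings have length 0, so a minimum of 1 drops them.
print(length_filter(tokens))
```

With a minimum of 1, anything of length zero is discarded, and the maximum of 512 only matters for very long tokens.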
WordDelimiterGraphFilterFactory is a new implementation so it's also
quite possible that the behavior just changed.
I just took a look and indeed it does. WordDelimiterFilterFactory
(run on "p / n whatever") produces
token:    p  n  whatever
position: 1  2  3
whereas WordDelimiterGraphFilt
Thanks Emir, Erick.
What I want to do is remove empty tokens after WordDelimiterGraphFilter.
Is there an option in WordDelimiterGraphFilter to not generate empty
tokens?
This index field is intended to be used for strange strings, e.g. part numbers:
P/N HSC0424PP
The benefit of removing the emp
Hi Nawab,
The reason you do not get a shingle is that there is an empty token: after
the tokenizer you have 3 tokens, ‘abc’, ‘-’ and ‘def’, so the tokens that you
are interested in are not next to each other and cannot form a shingle.
What you can do is apply char filter before tokenization to re
If the input is regular, you could try using a PatternReplaceCharFilterFactory
(PRCFF), which gets applied to the input before tokenization (note,
this is NOT PatternReplaceFilterFactory, which gets applied after
tokenization).
I don't really see how you could make this work though.
WhitespaceTokenizer wil
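The difference between rewriting the input before tokenization and leaving the hyphen in place can be sketched in plain Python. This is only a simulation under stated assumptions, not Solr code: the regular expression, the function names, and the simplified two-word shingling are all made up for illustration, and the real filter chain from the original message is not reproduced here.

```python
import re


def pattern_replace_char_filter(text, pattern, replacement):
    """Simulate a char filter: rewrite the raw input before tokenization."""
    return re.sub(pattern, replacement, text)


def whitespace_tokenize(text):
    """Simulate whitespace tokenization: split on runs of whitespace."""
    return text.split()


def shingles(tokens, size=2, sep=" "):
    """Build word n-grams (shingles) from adjacent tokens."""
    return [sep.join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


raw = "abc - def"

# Without a char filter the lone hyphen is a token of its own, so
# 'abc' and 'def' are never adjacent.
print(shingles(whitespace_tokenize(raw)))

# Assumed pattern: collapse a hyphen surrounded by whitespace into one space
# before the tokenizer sees the text.
cleaned = pattern_replace_char_filter(raw, r"\s+-\s+", " ")
print(shingles(whitespace_tokenize(cleaned)))
```

In the second case the tokenizer only ever sees "abc def", so the two words sit next to each other and can form a single shingle.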
Hi,
So, I have a string for indexing:
abc - def (notice the space on either side of the hyphen)
which is being processed with this filter list:
I get two shingle tokens at the e