lysis/common/src/java/org/apache/lucene/analysis/cjk/CJKBigramFilter.java#L64
)
ClassicTokenizer also assigns the obsolete TOKEN_TYPES "CJ" to CJ tokens and
"ALPHANUM" to the Korean alphabet, but neither is a target for
CJKBigramFilter...
Thanks,
Yasufumi
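
To illustrate the point about token types, here is a minimal plain-Java sketch of a type-gated bigram step. It is not the Lucene implementation: the Token class, the apply method, and the TARGET_TYPES set are hypothetical names made up for this example, and the type labels (including the angle-bracket spelling "<CJ>") are assumptions modelled on the labels mentioned above. The sketch assumes a StandardTokenizer-style stream hands over one token per CJK character, while a ClassicTokenizer-style stream hands over a whole run as one "<CJ>" token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class TypeGatedBigrams {
    // Hypothetical stand-in for a token: its text plus the type label from the tokenizer.
    static final class Token {
        final String text;
        final String type;
        Token(String text, String type) { this.text = text; this.type = type; }
    }

    // Assumed set of type labels that this sketch treats as bigram targets.
    static final Set<String> TARGET_TYPES =
        Set.of("<IDEOGRAPHIC>", "<HIRAGANA>", "<KATAKANA>", "<HANGUL>");

    // Joins consecutive target-typed tokens into overlapping bigrams;
    // tokens with any other type pass through unchanged.
    static List<String> apply(List<Token> tokens) {
        List<String> out = new ArrayList<>();
        List<String> run = new ArrayList<>();
        for (Token t : tokens) {
            if (TARGET_TYPES.contains(t.type)) {
                run.add(t.text);
            } else {
                flush(run, out);
                out.add(t.text);
            }
        }
        flush(run, out);
        return out;
    }

    // Emits the bigrams for the collected run (a lone token is emitted as-is), then resets it.
    private static void flush(List<String> run, List<String> out) {
        if (run.size() == 1) {
            out.add(run.get(0));
        }
        for (int i = 0; i + 1 < run.size(); i++) {
            out.add(run.get(i) + run.get(i + 1));
        }
        run.clear();
    }

    public static void main(String[] args) {
        List<Token> standardStyle = List.of(
            new Token("日", "<IDEOGRAPHIC>"),
            new Token("本", "<IDEOGRAPHIC>"),
            new Token("語", "<IDEOGRAPHIC>"));
        List<Token> classicStyle = List.of(new Token("日本語", "<CJ>"));

        System.out.println(apply(standardStyle)); // bigrams are formed
        System.out.println(apply(classicStyle));  // token passes through untouched
    }
}
```

Under these assumptions, the "<CJ>" token is never bigrammed simply because its type is not in the target set, which matches the behaviour described above.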
On Tue, Oct 2, 2018 at 0:05, Shawn H
On 9/30/2018 10:14 PM, Yasufumi Mizoguchi wrote:
I am looking for a way to create CJK bigram tokens with ClassicTokenizer.
I tried using CJKBigramFilter, but it only supports
StandardTokenizer...
CJKBigramFilter shouldn't care what tokenizer you're using. It should
Hi,
I am looking for a way to create CJK bigram tokens with ClassicTokenizer.
I tried using CJKBigramFilter, but it only supports
StandardTokenizer...
So, is there any good way to do that?
Thanks,
Yasufumi
ose are Lucene classes, not Solr. Maybe someone
> who was around for whatever discussions happened on Lucene lists back in
> those days will comment.
>
> I wasn't able to find the issue where ClassicTokenizer was created, and I
> couldn't find any information discussing the ch
break
on hyphens, when it seems to me to work better the old way?
I really have no idea. Those are Lucene classes, not Solr. Maybe
someone who was around for whatever discussions happened on Lucene lists
back in those days will comment.
I wasn't able to find the issue where ClassicTokeni
me to work better the old way?
Thanks
Rick
On January 9, 2018 7:07:59 PM EST, Shawn Heisey wrote:
>On 1/9/2018 9:36 AM, Rick Leir wrote:
>> A while ago the default was changed to StandardTokenizer from
>ClassicTokenizer. The biggest difference seems to be that Classic does
>not b
On 1/9/2018 9:36 AM, Rick Leir wrote:
> A while ago the default was changed to StandardTokenizer from
> ClassicTokenizer. The biggest difference seems to be that Classic does not
> break on hyphens. There is also a different character pr(mumble). I prefer
> the Classic's non
Hi all
A while ago the default was changed to StandardTokenizer from ClassicTokenizer.
The biggest difference seems to be that Classic does not break on hyphens.
There is also a different character pr(mumble). I prefer the Classic's
non-break on hyphens.
What was the reason for changing
-ClassicTokenizer-instead-of-StandardTokenizer-tp3990249p3990278.html
Sent from the Solr - User mailing list archive at Nabble.com.
ello,
>
> I need to know what I will lose if I use ClassicTokenizer instead of
> StandardTokenizer. Will ClassicTokenizer be deprecated in future Solr
> versions, or is development on ClassicTokenizer going to halt? Please
> let me know.
Hello,
I need to know what I will lose if I use ClassicTokenizer instead of
StandardTokenizer. Will ClassicTokenizer be deprecated in future Solr
versions, or is development on ClassicTokenizer going to halt? Please
let me know.