The GUI is not built yet, so the jury is still out. I plan to include switches to run MoreLikeThis both ways (with and without the shingled fields), but I expect the shingled version to do a better job, because this is a specific case study/example in classification in the book Taming Text by Grant Ingersoll. It is a reasonable assumption that he knows more than I do.

-----Original Message-----
From: David Hastings [mailto:hastings.recurs...@gmail.com]
Sent: Tuesday, June 20, 2017 12:13 PM
To: solr-user@lucene.apache.org
Subject: Re: How are people using the ICUTokenizer?

Have you successfully used the shingles with the MoreLikeThis query? Really curious whether this would return the "interesting Phrases".

On Tue, Jun 20, 2017 at 12:01 PM, Davis, Daniel (NIH/NLM) [C] <daniel.da...@nih.gov> wrote:

> Joel,
>
> I think the issue is doing word-breaking according to ICU rules. So, if
> you are trying to make sure your index breaks words properly on eastern
> languages, just use ICU Tokenizer. Unless your text is already in an ICU
> normal form, you should always use the ICUNormalizer character filter
> along with this:
>
> https://cwiki.apache.org/confluence/display/solr/CharFilterFactories#CharFilterFactories-solr.ICUNormalizer2CharFilterFactory
>
> I think that this would be good with Shingles when you are not
> removing stop words, maybe in an alternate analysis of the same content.
>
> I'm using it in this way, with shingles for phrase recognition and
> only doc freq and term freq - my possibly naïve idea is that I do not
> need positions and offsets if I'm using shingles, and my main goal is
> to do a MoreLikeThis query using the shingled versions of fields.
>
> -----Original Message-----
> From: Joel Bernstein [mailto:joels...@gmail.com]
> Sent: Tuesday, June 20, 2017 11:52 AM
> To: solr-user@lucene.apache.org
> Subject: How are people using the ICUTokenizer?
>
> It seems that there are some powerful capabilities in the
> ICUTokenizer. I was wondering how the community is making use of it.
>
> Does anyone have experience working with the ICUTokenizer that they
> can share?
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
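
For anyone following the thread who has not worked with shingles: a rough Python sketch of the idea discussed above, where each shingle (word n-gram) becomes a single indexed term, so phrase-like units can be counted using term frequency alone, without positions or offsets. This is only an illustration, not Solr's implementation. The function name shingles, its parameters, and the whitespace split are all assumptions made for the example; a plain split is not a substitute for ICU word-breaking.

```python
from collections import Counter


def shingles(tokens, min_size=2, max_size=3):
    """Return word n-grams (shingles) of min_size..max_size tokens.

    Hypothetical helper for illustration only; it mimics the general idea
    of a shingle filter, not the behaviour of any specific Solr factory.
    """
    out = []
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            if start + size <= len(tokens):
                # Each shingle is joined into one term, so word order is
                # captured inside the term itself.
                out.append(" ".join(tokens[start:start + size]))
    return out


# Assumption: naive whitespace tokenization stands in for a real tokenizer.
tokens = "how are people using the icu tokenizer".split()

# Term frequencies over shingles: the only statistic this sketch keeps.
term_freq = Counter(shingles(tokens))

for term, count in term_freq.most_common(5):
    print(count, term)
```

Because the shingle already encodes adjacency, a similarity query that compares documents by shared high-weight terms can pick up shared phrases without consulting position data, which is the intuition behind dropping positions and offsets on the shingled fields.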