[jira] [Commented] (LUCENE-2458) queryparser makes all CJK queries phrase queries regardless of analyzer

Mr. Aleem (Jira) Mon, 10 Aug 2020 08:17:30 -0700


    [ https://issues.apache.org/jira/browse/LUCENE-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17174386#comment-17174386 ]


Mr. Aleem commented on LUCENE-2458:
-----------------------------------

I think the best way forward is to add a CJK field to Solr that by default 
has the opposite behavior (i.e., treats split tokens as completely separate). 
As Koji noted, it seems [...]

> queryparser makes all CJK queries phrase queries regardless of analyzer
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-2458
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2458
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/queryparser
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>            Priority: Blocker
>             Fix For: 3.1, 4.0-ALPHA
>
>         Attachments: LUCENE-2458.patch, LUCENE-2458.patch, LUCENE-2458.patch, 
> LUCENE-2458.patch
>
>
> The queryparser automatically makes *ALL* CJK, Thai, Lao, Myanmar, Tibetan, 
> ... queries into phrase queries, even though you didn't ask for one, and 
> there isn't a way to turn this off.
> This completely breaks Lucene for these languages, as it treats all queries 
> like 'grep'.
> Example: if you query for f:abcd with StandardAnalyzer, where a, b, c, d are 
> Chinese characters, you get a phrase query of "a b c d". If you use the CJK 
> analyzer, it's no better: you get a phrase query of "ab bc cd". If you use 
> the smartchinese analyzer, you get a phrase query like "ab cd". But the user 
> didn't ask for one, and they cannot turn it off.
> The reason is that the code that forms phrase queries is not internationally 
> appropriate and assumes whitespace tokenization. If more than one token comes 
> out of whitespace-delimited text, it's automatically a phrase query no matter 
> what.
> The proposed patch fixes the core queryparser (with all backwards compat 
> kept) to only form phrase queries when the double quote operator is used. 
> Subclasses can always extend the QP and auto-generate whatever kind of 
> queries they want, even ones that might completely break search for 
> languages they don't care about, but core general-purpose QPs should be 
> language independent.
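
The behavior described in the quoted issue can be illustrated with a small toy model. This is only a sketch and does not use Lucene: `unigrams`, `bigrams`, and `build_query` are hypothetical helpers, the tokenizers only approximate what the issue reports for StandardAnalyzer and the CJK analyzer, and the `auto_phrase` flag is an assumed name for "form a phrase whenever one whitespace-delimited chunk yields several tokens". How the real parser combines the separate terms in the non-phrase case is not stated in the issue, so the sketch just lists them side by side.

```python
from typing import Callable, List


def unigrams(text: str) -> List[str]:
    """One token per character (approximates the StandardAnalyzer case in the issue)."""
    return list(text)


def bigrams(text: str) -> List[str]:
    """Overlapping character pairs (approximates the CJK analyzer case in the issue)."""
    if len(text) < 2:
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]


def build_query(field: str,
                text: str,
                analyze: Callable[[str], List[str]],
                quoted: bool = False,
                auto_phrase: bool = False) -> str:
    """Render a query string for one whitespace-delimited chunk of user input.

    quoted      -- the user wrapped the text in double quotes
    auto_phrase -- hypothetical switch modelling the behavior the issue complains
                   about: several tokens from one chunk always become a phrase
    """
    tokens = analyze(text)
    if not tokens:
        return ""
    if len(tokens) == 1:
        return f"{field}:{tokens[0]}"
    if quoted or auto_phrase:
        phrase = " ".join(tokens)
        return f'{field}:"{phrase}"'
    # Assumption: without a phrase, each token becomes its own term clause.
    return " ".join(f"{field}:{t}" for t in tokens)


if __name__ == "__main__":
    # "abcd" stands in for four Chinese characters, as in the issue text.
    print(build_query("f", "abcd", unigrams, auto_phrase=True))
    print(build_query("f", "abcd", bigrams, auto_phrase=True))
    print(build_query("f", "abcd", bigrams))
    print(build_query("f", "abcd", bigrams, quoted=True))
```

With `auto_phrase=True` the user gets a phrase whether or not they typed quotes, which is the complaint in the issue; with it off, a phrase appears only when `quoted=True`, which is what the proposed patch aims for.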



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
