[ https://issues.apache.org/jira/browse/LUCENE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020802#comment-17020802 ]
Akanksha Jain edited comment on LUCENE-9153 at 1/22/20 6:11 AM:
----------------------------------------------------------------

While debugging the query parsing, I found that control reaches the following method of the QueryBuilder.java class:

*protected final Query createFieldQuery*(Analyzer analyzer, BooleanClause.Occur operator, String field, String queryText, boolean quoted, int phraseSlop) {}

The arguments are:
*analyzer:* Analyzer analyzer = new WhitespaceAnalyzer(Version.LUCENE_47);
*field:* "P_id"
*queryText:* "/content/usergenerated/asi/cloud/content/sites/collaboration-for-development/en/groups/eastern-partnership-transport-panel/groups/eastern-partnership-transport-panel-road-safety/forum/jcr:content/content/primary/forum/test_my_asrp-1KN5/l2-kQnn/l3-kiVI/l4-wZeN/checking_again-tUzu"
*quoted:* true
*phraseSlop:* 0

Inside this method, the query text is split into two terms (termAtt values):
# "/content/usergenerated/asi/cloud/content/sites/collaboration-for-development/en/groups/eastern-partnership-transport-panel/groups/eastern-partnership-transport-panel-road-safety/forum/jcr:content/content/primary/forum/test_my_asrp-1KN5/l2-kQnn/l3-kiVI/l4-"
# "wZeN/checking_again-tUzu"

The string is cut after 255 characters.
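The 255-character cut looks consistent with the fixed maximum token length of Lucene's CharTokenizer (the base class of WhitespaceTokenizer), which in 4.x, to our understanding, emits a token as soon as its buffer reaches 255 characters and starts a new token with the next character. The sketch below is a simplified, standalone re-implementation of that behavior under this assumption, not Lucene's code: `MaxTokenLengthSketch`, `tokenize` and `MAX_TOKEN_LEN` are hypothetical names, and it counts Java `char`s rather than code points (for this ASCII path, characters and bytes coincide).

```java
import java.util.ArrayList;
import java.util.List;

final class MaxTokenLengthSketch {
    // Assumed cap, mirroring the 255-character limit observed in the comment.
    static final int MAX_TOKEN_LEN = 255;

    /** Splits on whitespace and additionally cuts any run longer than maxLen into maxLen-sized pieces. */
    static List<String> tokenize(String text, int maxLen) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                // Whitespace ends the current token (if any) and is never part of a token.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
                continue;
            }
            current.append(c);
            if (current.length() >= maxLen) {
                // Buffer is full: emit the token and continue with a fresh one at the next char.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String path = "/content/usergenerated/asi/cloud/content/sites/collaboration-for-development/en/groups/eastern-partnership-transport-panel/groups/eastern-partnership-transport-panel-road-safety/forum/jcr:content/content/primary/forum/test_my_asrp-1KN5/l2-kQnn/l3-kiVI/l4-wZeN/checking_again-tUzu";
        List<String> tokens = tokenize(path, MAX_TOKEN_LEN);
        System.out.println(path.length() + " chars -> " + tokens.size() + " tokens");
        for (String t : tokens) {
            System.out.println(t.length() + ": " + t);
        }
    }
}
```

Run against the path from the comment (279 characters, no whitespace), this yields two tokens of 255 and 24 characters, the same split reported above.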
Both terms are then added to the PhraseQuery object. Code snippet from the Lucene source:

{code:java}
for (int i = 0; i < numTokens; i++) {
  int positionIncrement = 1;
  try {
    boolean hasNext = buffer.incrementToken();
    assert hasNext == true;
    termAtt.fillBytesRef();
    if (posIncrAtt != null) {
      positionIncrement = posIncrAtt.getPositionIncrement();
    }
  } catch (IOException e) {
    // safe to ignore, because we know the number of tokens
  }

  // each analyzed token becomes a separate term of the phrase
  if (enablePositionIncrements) {
    position += positionIncrement;
    pq.add(new Term(field, BytesRef.deepCopyOf(bytes)), position);
  } else {
    pq.add(new Term(field, BytesRef.deepCopyOf(bytes)));
  }
}
{code}

The code above creates a PhraseQuery with the following value:

p_id:"/content/usergenerated/asi/cloud/content/sites/collaboration-for-development/en/groups/eastern-partnership-transport-panel/groups/eastern-partnership-transport-panel-road-safety/forum/jcr:content/content/primary/forum/test_my_asrp-1KN5/l2-kQnn/l3-kiVI/l4- wZeN/checking_again-tUzu"

which has a space after the first 255 characters (between "l4-" and "wZeN").

> Lucene Query parser append space if query length is greater than 255
> --------------------------------------------------------------------
>
>                 Key: LUCENE-9153
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9153
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Akanksha Jain
>            Priority: Major
>
> Hello Everyone
>
> I am working with Lucene 4.7.1.
> When I parse a query with the Lucene query parser and the query is longer than
> 255 bytes, the parsed query has a space inserted after every 255 bytes, which
> is causing further issues in my project.
>
> Can you please let me know why the maximum length of a term (the parsed query
> contains an ArrayList<Term>) is 255 bytes?
> Why is a space inserted in the middle of the query?
>
> I would really appreciate it if someone could help me with this.
> Do let me know if my question is unclear and you need a reference.
>
> For analysis, please check the QueryBuilder.java class, which has the method
> createFieldQuery(....)
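On the question of why a space appears: in the quoted loop each analyzed token is added to the PhraseQuery as its own term, and the printed form of a phrase separates its terms with a space, so the space is presumably a separator between two terms rather than a character inside either term. The sketch below mirrors that bookkeeping and rendering in simplified form; `PhraseRenderingSketch`, `PositionedTerm`, `assignPositions` and `render` are hypothetical names, it is not Lucene's PhraseQuery, and the starting counter value of -1 is an assumption since the quoted snippet does not show how `position` is initialized.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PhraseRenderingSketch {
    /** Hypothetical stand-in for one phrase term: its text and its position in the phrase. */
    static final class PositionedTerm {
        final String text;
        final int position;

        PositionedTerm(String text, int position) {
            this.text = text;
            this.position = position;
        }
    }

    /**
     * Mirrors the position bookkeeping of the quoted loop with the default
     * increment of 1, so consecutive tokens land at consecutive positions.
     * Assumption: the counter starts at -1, so the first term sits at position 0.
     */
    static List<PositionedTerm> assignPositions(List<String> tokens) {
        List<PositionedTerm> terms = new ArrayList<>();
        int position = -1;
        for (String token : tokens) {
            int positionIncrement = 1;
            position += positionIncrement;
            terms.add(new PositionedTerm(token, position));
        }
        return terms;
    }

    /** Renders field:"t1 t2 ..." with one space between terms (simplified: no slop, no position gaps). */
    static String render(String field, List<PositionedTerm> terms) {
        StringBuilder sb = new StringBuilder(field).append(":\"");
        for (int i = 0; i < terms.size(); i++) {
            if (i > 0) {
                sb.append(' '); // separator between two distinct terms, not part of either term
            }
            sb.append(terms.get(i).text);
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        // The two tokens reported in the comment.
        List<String> tokens = Arrays.asList(
                "/content/usergenerated/asi/cloud/content/sites/collaboration-for-development/en/groups/eastern-partnership-transport-panel/groups/eastern-partnership-transport-panel-road-safety/forum/jcr:content/content/primary/forum/test_my_asrp-1KN5/l2-kQnn/l3-kiVI/l4-",
                "wZeN/checking_again-tUzu");
        List<PositionedTerm> terms = assignPositions(tokens);
        for (PositionedTerm t : terms) {
            System.out.println("position " + t.position + ", " + t.text.length() + " chars");
        }
        System.out.println(render("p_id", terms));
    }
}
```

The printed phrase shows the same "l4- wZeN" gap as the comment, while each stored term still contains no space.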