[jira] [Commented] (SOLR-5669) queries containing \u return error: "Truncated unicode escape sequence."
[ https://issues.apache.org/jira/browse/SOLR-5669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017813#comment-17017813 ] Jacek Kikiewicz commented on SOLR-5669: --- Hi, It seems that many versions later (8.3) the bug is still there. Any chance for an update? > queries containing \u return error: "Truncated unicode escape sequence." > - > > Key: SOLR-5669 > URL: https://issues.apache.org/jira/browse/SOLR-5669 > Project: Solr > Issue Type: Bug > Components: query parsers >Affects Versions: 4.4 >Reporter: Dorin Oltean >Priority: Minor > > When I do the following query: > /select?q=\ujb > I get > {quote} > "org.apache.solr.search.SyntaxError: Non-hex character in Unicode escape > sequence: j", > {quote} > To make it work I have to put another '\' in front of the query > {noformat}\\ujb{noformat} > which in fact leads to a different query in solr. > I use the edismax qparser.
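A minimal SolrJ sketch of the escaping workaround, assuming the goal is to match the literal text {{\ujb}} (the class name is illustrative): {{ClientUtils.escapeQueryChars}} escapes the backslash along with the other query-parser metacharacters before the string reaches the edismax parser.
{code:java}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.util.ClientUtils;

public class EscapeBackslashSketch {
  public static void main(String[] args) {
    String rawInput = "\\ujb"; // one literal backslash followed by "ujb"
    // Escape every query-parser metacharacter, including '\', so the parser
    // does not try to read "\u" as a truncated unicode escape sequence.
    String escaped = ClientUtils.escapeQueryChars(rawInput);
    SolrQuery query = new SolrQuery(escaped);
    query.set("defType", "edismax");
    System.out.println(query); // prints the URL-encoded query parameters
  }
}
{code}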
[jira] [Created] (LUCENE-9147) Move the stored fields index off-heap
Adrien Grand created LUCENE-9147: Summary: Move the stored fields index off-heap Key: LUCENE-9147 URL: https://issues.apache.org/jira/browse/LUCENE-9147 Project: Lucene - Core Issue Type: Task Reporter: Adrien Grand Now that the terms index is off-heap by default, it's almost embarrassing that many indices spend most of their memory usage on the stored fields index or the term vectors index, which are much less performance-sensitive than the terms index. Should we move them off-heap too?
[GitHub] [lucene-solr] jpountz opened a new pull request #1179: LUCENE-9147: Move the stored fields index off-heap.
jpountz opened a new pull request #1179: LUCENE-9147: Move the stored fields index off-heap. URL: https://github.com/apache/lucene-solr/pull/1179 This replaces the index of stored fields and term vectors with two `DirectMonotonic` arrays. `DirectMonotonicWriter` requires knowing the number of values to write up-front, so incoming doc IDs and file pointers are buffered on disk using temporary files that never get fsynced, but have index headers and footers to make sure any corruption in these files wouldn't propagate to the index. `DirectMonotonicReader` gets a specialized `binarySearch` implementation that leverages the metadata in order to go to the IndexInput as rarely as possible. In the common case it only goes to a single sub `DirectReader`, which, combined with the block size of 1k values, helps bound the number of page faults to 2.
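Here is a simplified, self-contained sketch of the two-level lookup described above (array-backed purely for illustration; the actual patch works against `DirectMonotonicReader` and an `IndexInput`): in-memory per-block minima select a single 1k-value block, and only that block is binary-searched.

```java
import java.util.Arrays;

public class TwoLevelBinarySearch {
  static final int BLOCK_SHIFT = 10; // 1024 values per block, as in the PR description

  /** Returns the index of the first value >= target, or values.length if none. */
  static int search(long[] values, long[] blockMins, long target) {
    // Level 1: in-memory metadata narrows the search to one block.
    int block = Arrays.binarySearch(blockMins, target);
    if (block < 0) block = Math.max(0, -block - 2); // last block whose min is <= target
    int lo = block << BLOCK_SHIFT;
    int hi = Math.min(values.length, lo + (1 << BLOCK_SHIFT)) - 1;
    // Level 2: binary search within a single block; with disk-backed storage
    // this touches a bounded number of pages.
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      if (values[mid] < target) lo = mid + 1; else hi = mid - 1;
    }
    return lo;
  }

  public static void main(String[] args) {
    long[] values = new long[4096];
    for (int i = 0; i < values.length; i++) values[i] = 3L * i; // monotonic file pointers
    long[] blockMins = new long[values.length >> BLOCK_SHIFT];
    for (int b = 0; b < blockMins.length; b++) blockMins[b] = values[b << BLOCK_SHIFT];
    System.out.println(search(values, blockMins, 3000)); // 1000
  }
}
```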
[jira] [Comment Edited] (SOLR-12859) DocExpirationUpdateProcessorFactory does not work with BasicAuth
[ https://issues.apache.org/jira/browse/SOLR-12859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017689#comment-17017689 ] Cao Manh Dat edited comment on SOLR-12859 at 1/17/20 9:31 AM: -- To be honest, I never fully understood the current authentication framework of Solr. When I did the HTTP/2 work, I basically converted the existing Apache HttpClient interceptor to an equivalent version. After spending some time looking at the current code and the documentation, I'm guessing that {{isSolrThread()}} is a naive/workaround way to check whether a request that is about to be sent to another node was actually issued by a Solr node. Let's look at this comment {quote} //if this is not running inside a Solr threadpool (as in testcases) // then no need to add any header {quote} The comment above makes sense once we notice how interceptors were added for Apache HttpClient: {{HttpClientUtil.addRequestInterceptor(interceptor)}} adds the interceptor to a static variable. This is fine when a JVM hosts only one node, but in tests a JVM hosts several nodes, so several PKI interceptors get added to that static variable. Moreover, every Apache HttpClient created by HttpClientUtil shares the same list of interceptors, even clients created in tests. So before sending a request, a client created by a test will go through n generateToken() interceptors (n = number of Solr nodes). In that case, how can generateToken() tell a request sent from a node apart from a request sent from a test method if all clients use the same list of interceptors? The naive solution was to set a flag called {{isSolrThread}} to distinguish these two cases. In most cases, a request sent by a node comes from a thread in a threadPool created by {{ExecutorUtil}}. So to make auth tests pass, {{isServerPool.set(Boolean.TRUE);}} is set before calling any {{Runnable}}. With all of this context, let's review the mystery code again {code}
SolrRequestInfo reqInfo = getRequestInfo();
String usr;
if (reqInfo != null) {
  // 1. Author's idea: OK, the thread is holding a request; if authentication is enabled, the req must hold a Principal
  Principal principal = reqInfo.getUserPrincipal();
  if (principal == null) {
    // 2. Author's idea: the req did not pass authentication since no Principal is set, so nothing needs to be done here!
    // my comment: this is not true; SolrRequestInfo is also used as a place to stash data, and many places rely on the
    // data inside SolrRequestInfo, so the presence of a SolrRequestInfo does not mean the request comes from outside.
    return Optional.empty();
  } else {
    usr = principal.getName();
  }
} else {
  if (!isSolrThread()) {
    // 3. Author's idea: the req was not sent from a thread created by ExecutorUtil, so it must come from test code or the outside world
    // my comment: this is not true, since DocExpirationUpdateProcessorFactory uses a ScheduledThreadPoolExecutor
    // instead of a threadPool created by ExecutorUtil
    return Optional.empty();
  }
  // 4. Author's idea: if the req was sent by ExecutorUtil, it must come out from this node.
  usr = "$"; //special name to denote the user is the node itself
}
{code} But with the new HTTP/2 client, the interceptor is added to each client object, so there is no single static variable here -> no sharing of interceptors between clients of nodes and clients of tests -> if the interceptor's code is called, the request must have been sent from a node.
So for the interceptor of the HTTP/2 client, the mystery block can be reduced to {code}
SolrRequestInfo reqInfo = getRequestInfo();
String usr = NODE_IS_USER;
if (reqInfo != null && reqInfo.getUserPrincipal() != null) usr = reqInfo.getUserPrincipal().getName();
{code} was (Author: caomanhdat): To be honest, I'm never fully understand the current authentication framework of Solr. When I did the HTTP/2 things, I basically convert the current interceptor of Apache HttpClient to an equivalent version. After spend sometime to look at the current code and the documentation. I'm guessing that {{isSolrThread()}} is a naive/workaround way to check whether the request is about to send to another node was actually sent by a Solr node or not? Let's look into this comment {quote} //if this is not running inside a Solr threadpool (as in testcases) // then no need to add any header {quote} above comment will make sense if we notice how the interceptors was added for Apache HttpClient {{HttpClientUtil.addRequestInterceptor(interceptor)}} -> interceptor is added into a static variable. This is ok if a JVM only host one node, but with test, a JVM will host several nodes, so there are several PKI interceptors will be added to that static variable. Moreoever every Apache HttpClient created
[jira] [Commented] (SOLR-12859) DocExpirationUpdateProcessorFactory does not work with BasicAuth
[ https://issues.apache.org/jira/browse/SOLR-12859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017840#comment-17017840 ] Cao Manh Dat commented on SOLR-12859: - After having a chat with [~shalin], we kinda think that Hoss's initial approach is more valid than mine because * it makes a less significant change to the code base. * even though {{DefaultSolrThreadFactory}} belongs to solr-core, we can't force tests not to use it. > DocExpirationUpdateProcessorFactory does not work with BasicAuth > > > Key: SOLR-12859 > URL: https://issues.apache.org/jira/browse/SOLR-12859 > Project: Solr > Issue Type: Bug >Affects Versions: 7.5 >Reporter: Varun Thacker >Priority: Major > Attachments: SOLR-12859.patch > > > I setup a cluster with basic auth and then wanted to use Solr's TTL feature ( > DocExpirationUpdateProcessorFactory ) to auto-delete documents. > > Turns out it doesn't work when Basic Auth is enabled. I get the following > stacktrace from the logs > {code:java} > 2018-10-12 22:06:38.967 ERROR (autoExpireDocs-42-thread-1) [ ] > o.a.s.u.p.DocExpirationUpdateProcessorFactory Runtime error in periodic > deletion of expired docs: Async exception during distributed update: Error > from server at http://192.168.0.8:8983/solr/gettingstarted_shard2_replica_n6: > require authentication > request: > http://192.168.0.8:8983/solr/gettingstarted_shard2_replica_n6/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.0.8%3A8983%2Fsolr%2Fgettingstarted_shard1_replica_n2%2F&wt=javabin&version=2 > org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException: > Async exception during distributed update: Error from server at > http://192.168.0.8:8983/solr/gettingstarted_shard2_replica_n6: require > authentication > request: > http://192.168.0.8:8983/solr/gettingstarted_shard2_replica_n6/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.0.8%3A8983%2Fsolr%2Fgettingstarted_shard1_replica_n2%2F&wt=javabin&version=2 > at > org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:964) > ~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - > jimczi - 2018-09-18 13:07:55] > at > org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1976) > ~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - > jimczi - 2018-09-18 13:07:55] > at > org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:182) > ~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - > jimczi - 2018-09-18 13:07:55] > at > org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80) > ~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - > jimczi - 2018-09-18 13:07:55] > at > org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80) > ~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - > jimczi - 2018-09-18 13:07:55] > at > org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80) > ~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - > jimczi - 2018-09-18 13:07:55] > at > org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80) > ~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - > jimczi - 2018-09-18 13:07:55] > at > 
org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80) > ~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - > jimczi - 2018-09-18 13:07:55] > at > org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80) > ~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - > jimczi - 2018-09-18 13:07:55] > at > org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80) > ~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - > jimczi - 2018-09-18 13:07:55] > at > org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80) > ~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - > jimczi - 2018-09-18 13:07:55] > at > org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80) > ~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - > jimczi - 2018-09-18 13:07:55] > at > org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80) > ~[solr-core-7.5.0.jar:7.5.0
[jira] [Commented] (LUCENE-9139) TestXYMultiPolygonShapeQueries test failures
[ https://issues.apache.org/jira/browse/LUCENE-9139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017853#comment-17017853 ] Adrien Grand commented on LUCENE-9139: -- The subtraction {{by - ay}} is indeed already inaccurate in spite of the promotion from floats to doubles, since the exponents differ by more than the number of mantissa bits of a double. And things might get worse with the multiplications. I wonder whether your proposal would actually address the problem though: the problem is not so much the absolute values of the coordinates as their relative values. For instance, I believe you could have the same issue if some coordinates were close but not equal to zero? I haven't looked at the test, but how does it know that these lines don't intersect? Is it using better logic? > TestXYMultiPolygonShapeQueries test failures > > > Key: LUCENE-9139 > URL: https://issues.apache.org/jira/browse/LUCENE-9139 > Project: Lucene - Core > Issue Type: Test >Reporter: Ignacio Vera >Priority: Major > > We recently have two failures on CI from the test method > TestXYMultiPolygonShapeQueries. The reproduction lines are: > > {code:java} > ant test -Dtestcase=TestXYMultiPolygonShapeQueries > -Dtests.method=testRandomMedium -Dtests.seed=F1E142C2FBB612AF > -Dtests.multiplier=3 -Dtests.slow=true -Dtests.badapples=true > -Dtests.locale=el -Dtests.timezone=EST5EDT -Dtests.asserts=true > -Dtests.file.encoding=US-ASCII{code} > {code:java} > ant test -Dtestcase=TestXYMultiPolygonShapeQueries > -Dtests.method=testRandomMedium -Dtests.seed=363603A0428EC788 > -Dtests.multiplier=3 -Dtests.slow=true -Dtests.badapples=true > -Dtests.locale=sv-SE -Dtests.timezone=America/Yakutat -Dtests.asserts=true > -Dtests.file.encoding=UTF-8{code} > > I dug into the failures and they seem to be due to numerical errors in the > GeoUtils.orient method. The method is detecting intersections of two very > long lines when it shouldn't. For example: > Line 1: > {code:java} > double ax = 3.31439550712E38; > double ay = -1.4151510014141656E37; > double bx = 3.4028234663852886E38; > double by = 9.641030236797581E20;{code} > Line 2: > {code:java} > double cx = 3.4028234663852886E38; > double cy = -0.0; > double dx = 3.4028234663852886E38; > double dy = -2.7386422951137726E38;{code} > My proposal to prevent those numerical errors is to modify the shape > generator to prevent creating shapes that expand more than half the float > space. > > > >
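A quick standalone check of the cancellation, using the coordinates from the issue description (the demo class itself is mine, not from the test): {{by}} is smaller than half an ulp of {{ay}}, so the subtraction discards it entirely.
{code:java}
public class OrientPrecisionDemo {
  public static void main(String[] args) {
    double ay = -1.4151510014141656E37;
    double by = 9.641030236797581E20;
    // ulp(ay) is ~2.4E21, more than twice |by|, so (by - ay) rounds to exactly -ay:
    System.out.println(Math.ulp(ay));     // ~2.4E21
    System.out.println((by - ay) == -ay); // true: by vanishes from the result
  }
}
{code}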
[jira] [Commented] (SOLR-14090) Schema API - Delete copy field not working when the field contains an underscore
[ https://issues.apache.org/jira/browse/SOLR-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017863#comment-17017863 ] Frank Iversen commented on SOLR-14090: -- [~sarowe] Hi Steven, do I need to add any further information on this ticket, or is it enough? > Schema API - Delete copy field not working when the field contains an > underscore > > > Key: SOLR-14090 > URL: https://issues.apache.org/jira/browse/SOLR-14090 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: v2 API >Affects Versions: 8.3.1 >Reporter: Frank Iversen >Priority: Major > Attachments: image-2019-12-16-08-54-46-719.png > > > Copy field delete statements for the Schema API are not working when the > source field contains an underscore (haven't tested this for "dest"). If the > underscore is removed, deletion works as intended. Another user of Solr > seems to have had the same problem and was asked to create a Jira ticket. I > can, however, not find that ticket anywhere, hence this one. > [https://lucene.472066.n3.nabble.com/Error-deleting-copy-field-td4393097.html > |https://lucene.472066.n3.nabble.com/Error-deleting-copy-field-td4393097.html] > I upgraded to the newest version and the problem still persists. > It should be simple to reproduce: just include an underscore in a copy field > statement in your schema file and try to delete the copy field using the API, > as seen below: > !image-2019-12-16-08-54-46-719.png!
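For reference, a reproduction sketch against a stock install (the collection and field names are made up): create a copy field whose source contains an underscore, then try to delete it through the Schema API.
{noformat}
curl -X POST -H 'Content-type:application/json' http://localhost:8983/solr/mycollection/schema -d '{
  "add-copy-field": { "source": "my_field", "dest": "_text_" }
}'
curl -X POST -H 'Content-type:application/json' http://localhost:8983/solr/mycollection/schema -d '{
  "delete-copy-field": { "source": "my_field", "dest": "_text_" }
}'
{noformat}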
[jira] [Commented] (LUCENE-9138) Behaviour of concurrent calls to IndexInput#clone is unclear
[ https://issues.apache.org/jira/browse/LUCENE-9138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017864#comment-17017864 ] Adrien Grand commented on LUCENE-9138: -- Your analysis sounds right to me. It looks like we don't have similar warnings on slice() because, unlike clone(), it doesn't need the current position of the input. > Behaviour of concurrent calls to IndexInput#clone is unclear > > > Key: LUCENE-9138 > URL: https://issues.apache.org/jira/browse/LUCENE-9138 > Project: Lucene - Core > Issue Type: Improvement > Components: core/store >Affects Versions: 8.4 >Reporter: David Turner >Priority: Minor > > I think this is a documentation issue, rather than anything actually wrong, > but I need expert guidance to propose a fix. > The Javadocs for {{IndexInput#clone}} warn that it is not thread safe: > * This method is NOT thread safe, so if the current \{@code IndexInput} > * is being used by one thread while \{@code clone} is called by another, > * disaster could strike. > */ > @Override > public IndexInput clone() { > > However, there are places where {{clone()}} may be called concurrently. For > instance I believe {{SegmentReader#getFieldsReader}} clones an {{IndexInput}} > and requires no extra synchronization. I think this comment is supposed to > mean that you should not {{clone()}} an {{IndexInput}} while you're _reading > or seeking from it_ concurrently, but the precise guarantees aren't totally > clear. > > Furthermore there's no mention of the thread-safety of {{slice()}}, and there > seem to be similar concurrent usages of it in e.g. > {{Lucene80DocValuesProducer}}. Does this have the same guarantees as > {{clone()}}? > >
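A sketch of the usage pattern the warning seems to permit (my reading, not something the Javadocs state explicitly; the index path and file name are placeholders): the parent input is never read or seeked by any thread, and each clone stays confined to a single thread.
{code:java}
import java.nio.file.Paths;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexInput;

public class CloneConfinementSketch {
  public static void main(String[] args) throws Exception {
    ExecutorService executor = Executors.newFixedThreadPool(4);
    try (Directory dir = FSDirectory.open(Paths.get("/tmp/index"))) {
      IndexInput main = dir.openInput("_0.fdt", IOContext.READ);
      for (int i = 0; i < 4; i++) {
        executor.submit(() -> {
          IndexInput in = main.clone(); // `main` itself is never read or seeked
          in.seek(0);
          return in.readByte();         // the clone is used by this thread only
        });
      }
      executor.shutdown();
      executor.awaitTermination(10, TimeUnit.SECONDS);
      main.close();
    }
  }
}
{code}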
[jira] [Commented] (LUCENE-9138) Behaviour of concurrent calls to IndexInput#clone is unclear
[ https://issues.apache.org/jira/browse/LUCENE-9138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017865#comment-17017865 ] Adrien Grand commented on LUCENE-9138: -- Would you like to open a pull request that clarifies this documentation? > Behaviour of concurrent calls to IndexInput#clone is unclear > > > Key: LUCENE-9138 > URL: https://issues.apache.org/jira/browse/LUCENE-9138 > Project: Lucene - Core > Issue Type: Improvement > Components: core/store >Affects Versions: 8.4 >Reporter: David Turner >Priority: Minor > > I think this is a documentation issue, rather than anything actually wrong, > but I need expert guidance to propose a fix. > The Javadocs for {{IndexInput#clone}} warn that it is not thread safe: > * This method is NOT thread safe, so if the current \{@code IndexInput} > * is being used by one thread while \{@code clone} is called by another, > * disaster could strike. > */ > @Override > public IndexInput clone() { > > However, there are places where {{clone()}} may be called concurrently. For > instance I believe {{SegmentReader#getFieldsReader}} clones an {{IndexInput}} > and requires no extra synchronization. I think this comment is supposed to > mean that you should not {{clone()}} an {{IndexInput}} while you're _reading > or seeking from it_ concurrently, but the precise guarantees aren't totally > clear. > > Furthermore there's no mention of the thread-safety of {{slice()}}, and there > seem to be similar concurrent usages of it in e.g. > {{Lucene80DocValuesProducer}}. Does this have the same guarantees as > {{clone()}}? > >
[jira] [Resolved] (LUCENE-9137) Broken link 'Change log' for 8.4.1 on https://lucene.apache.org/core/downloads.html
[ https://issues.apache.org/jira/browse/LUCENE-9137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand resolved LUCENE-9137. -- Resolution: Fixed I just pushed a fix; it might take a couple of minutes to be applied. Thanks for reporting [~sebb]! cc [~ichattopadhyaya] > Broken link 'Change log' for 8.4.1 on > https://lucene.apache.org/core/downloads.html > --- > > Key: LUCENE-9137 > URL: https://issues.apache.org/jira/browse/LUCENE-9137 > Project: Lucene - Core > Issue Type: Bug > Environment: Broken link 'Change log' for 8.4.1 on > https://lucene.apache.org/core/downloads.html >Reporter: Sebb >Priority: Major >
[jira] [Commented] (LUCENE-8615) Can LatLonShape's tessellator create more search-efficient triangles?
[ https://issues.apache.org/jira/browse/LUCENE-8615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017875#comment-17017875 ] Adrien Grand commented on LUCENE-8615: -- This sounds like an interesting idea! > Can LatLonShape's tessellator create more search-efficient triangles? > - > > Key: LUCENE-8615 > URL: https://issues.apache.org/jira/browse/LUCENE-8615 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Attachments: 2-tessellations.png, re-tessellate-triangle.png, > screenshot-1.png > > > The triangular mesh produced by LatLonShape's Tessellator creates reasonable > numbers of triangles, which is helpful for indexing speed. However I'm > wondering whether there are conditions when it might be beneficial to run > tessellation slightly differently in order to create triangles that are more > search-friendly. Given that we only index the minimum bounding rectangle for > each triangle, we always check for intersection between the query and the > triangle if the query intersects with the MBR of the triangle. So the smaller > the area of the triangle compared to its MBR, the higher the likelihood of > false positives when querying. > For instance see the following shape; there are two ways that it can be > tessellated into two triangles. LatLonShape's Tessellator is going to return > either of them depending on which point is listed first in the polygon. Yet > the first one is more efficient than the second one: with the second one, > both triangles have roughly the same MBR (which is also the MBR of the > polygon), so both triangles will need to be checked all the time whenever the > query intersects with this shared MBR. On the other hand, with the first way, > both MBRs are smaller and don't overlap, which makes it more likely that only > one triangle needs to be checked at query time. > !2-tessellations.png! > Another example is the following polygon. It can be tessellated into a single > triangle. Yet at times it might be a better idea to create more triangles so > that the overall area of MBRs is smaller and queries are less likely to run > into false positives. > !re-tessellate-triangle.png!
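One way to make the intuition concrete (a hypothetical cost measure, not something proposed in this issue): score each candidate triangle by the MBR area it wastes, and prefer the tessellation that minimizes the total.
{code:java}
public class TriangleMbrWaste {
  /** MBR area minus triangle area: an assumed proxy for query-time false positives. */
  static double wastedArea(double ax, double ay, double bx, double by, double cx, double cy) {
    double minX = Math.min(ax, Math.min(bx, cx)), maxX = Math.max(ax, Math.max(bx, cx));
    double minY = Math.min(ay, Math.min(by, cy)), maxY = Math.max(ay, Math.max(by, cy));
    double mbrArea = (maxX - minX) * (maxY - minY);
    double triArea = Math.abs((bx - ax) * (cy - ay) - (cx - ax) * (by - ay)) / 2;
    return mbrArea - triArea;
  }

  public static void main(String[] args) {
    // A thin right triangle wastes half of its bounding rectangle:
    System.out.println(wastedArea(0, 0, 10, 0, 10, 1)); // 5.0
  }
}
{code}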
[jira] [Created] (LUCENE-9148) Move the BKD index to its own file.
Adrien Grand created LUCENE-9148: Summary: Move the BKD index to its own file. Key: LUCENE-9148 URL: https://issues.apache.org/jira/browse/LUCENE-9148 Project: Lucene - Core Issue Type: Task Reporter: Adrien Grand Lucene60PointsWriter stores both inner nodes and leaf nodes in the same file, interleaved: for instance if you have two fields, the inner nodes and leaves of both fields end up interleaved in a single file. It's not ideal since leaves and inner nodes have quite different access patterns. Should we split this into two files? When the BKD index is off-heap, this would also help force it into RAM with {{MMapDirectory#setPreload}}. Note that Lucene60PointsFormat already has a file that it calls "index", but it's really only about mapping fields to file pointers in the other file, not what I'm discussing here. But we could possibly store the BKD indices in this existing file if we want to avoid creating a new one.
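For context, the preload knob mentioned above (the directory path is a placeholder). It applies to files opened through the directory rather than to one file in particular, which is part of why a dedicated file for the BKD index would make selective preloading practical:
{code:java}
import java.nio.file.Paths;
import org.apache.lucene.store.MMapDirectory;

public class PreloadSketch {
  public static void main(String[] args) throws Exception {
    MMapDirectory dir = new MMapDirectory(Paths.get("/path/to/index"));
    dir.setPreload(true); // affects files opened after this call
    dir.close();
  }
}
{code}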
[jira] [Commented] (LUCENE-9139) TestXYMultiPolygonShapeQueries test failures
[ https://issues.apache.org/jira/browse/LUCENE-9139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017896#comment-17017896 ] Ignacio Vera commented on LUCENE-9139: -- You are totally right, my proposal does not solve the issue. What I have done to check that the lines do not intersect is to change the intersection logic to use BigDecimals instead of doubles. In that case the lines do not intersect (and the tests don't fail for those seeds). > TestXYMultiPolygonShapeQueries test failures > > > Key: LUCENE-9139 > URL: https://issues.apache.org/jira/browse/LUCENE-9139 > Project: Lucene - Core > Issue Type: Test >Reporter: Ignacio Vera >Priority: Major > > We recently have two failures on CI from the test method > TestXYMultiPolygonShapeQueries. The reproduction lines are: > > {code:java} > ant test -Dtestcase=TestXYMultiPolygonShapeQueries > -Dtests.method=testRandomMedium -Dtests.seed=F1E142C2FBB612AF > -Dtests.multiplier=3 -Dtests.slow=true -Dtests.badapples=true > -Dtests.locale=el -Dtests.timezone=EST5EDT -Dtests.asserts=true > -Dtests.file.encoding=US-ASCII{code} > {code:java} > ant test -Dtestcase=TestXYMultiPolygonShapeQueries > -Dtests.method=testRandomMedium -Dtests.seed=363603A0428EC788 > -Dtests.multiplier=3 -Dtests.slow=true -Dtests.badapples=true > -Dtests.locale=sv-SE -Dtests.timezone=America/Yakutat -Dtests.asserts=true > -Dtests.file.encoding=UTF-8{code} > > I dug into the failures and they seem to be due to numerical errors in the > GeoUtils.orient method. The method is detecting intersections of two very > long lines when it shouldn't. For example: > Line 1: > {code:java} > double ax = 3.31439550712E38; > double ay = -1.4151510014141656E37; > double bx = 3.4028234663852886E38; > double by = 9.641030236797581E20;{code} > Line 2: > {code:java} > double cx = 3.4028234663852886E38; > double cy = -0.0; > double dx = 3.4028234663852886E38; > double dy = -2.7386422951137726E38;{code} > My proposal to prevent those numerical errors is to modify the shape > generator to prevent creating shapes that expand more than half the float > space. > > > >
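For illustration, a sketch of what an exact orientation check looks like with BigDecimal (assumed shape; the actual change to the test may differ): the {{BigDecimal(double)}} constructor is exact, so the sign of the cross product is computed without cancellation.
{code:java}
import java.math.BigDecimal;

public class ExactOrient {
  /** Sign of the cross product (b-a) x (c-a): 1 = CCW, -1 = CW, 0 = collinear. */
  static int orient(double ax, double ay, double bx, double by, double cx, double cy) {
    BigDecimal det = new BigDecimal(bx).subtract(new BigDecimal(ax))
        .multiply(new BigDecimal(cy).subtract(new BigDecimal(ay)))
        .subtract(new BigDecimal(by).subtract(new BigDecimal(ay))
            .multiply(new BigDecimal(cx).subtract(new BigDecimal(ax))));
    return det.signum();
  }

  public static void main(String[] args) {
    // Points from line 1 and line 2 of the issue description:
    System.out.println(orient(3.31439550712E38, -1.4151510014141656E37,
        3.4028234663852886E38, 9.641030236797581E20,
        3.4028234663852886E38, -0.0));
  }
}
{code}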
[jira] [Commented] (LUCENE-9130) Failed to match when creating PhraseQuery with terms analyzed from long query text
[ https://issues.apache.org/jira/browse/LUCENE-9130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017895#comment-17017895 ] Chen Zhixiang commented on LUCENE-9130: --- In Lucene's SloppyPhraseMatcher.java: {code:java}
public boolean nextMatch() throws IOException {
  if (!positioned) {
    return false;
  }
  PhrasePositions pp = pq.pop();
  assert pp != null; // if the pq is not full, then positioned == false
  captureLead(pp);
  matchLength = end - pp.position;
  int next = pq.top().position;
  while (advancePP(pp)) {
    if (hasRpts && !advanceRpts(pp)) {
      break; // pps exhausted
    }
    if (pp.position > next) { // done minimizing current match-length
      pq.add(pp);
      if (matchLength <= slop) {
        return true;
      }
      pp = pq.pop();
      next = pq.top().position;
      assert pp != null; // if the pq is not full, then positioned == false
      matchLength = end - pp.position;
    } else {
      int matchLength2 = end - pp.position;
      if (matchLength2 < matchLength) {
        matchLength = matchLength2;
      }
    }
    captureLead(pp);
  }
  positioned = false;
  return matchLength <= slop;
}
{code} The {{while (advancePP(pp))}} condition doesn't match and the loop is skipped directly; matchLength=3 and slop=2, so it returns false. I believe there is a bug here, but I cannot figure out why. > Failed to match when creating PhraseQuery with terms analyzed from long query > text > > > Key: LUCENE-9130 > URL: https://issues.apache.org/jira/browse/LUCENE-9130 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 8.4 >Reporter: Chen Zhixiang >Priority: Major > Attachments: LongTextFieldSearchTest.java > > > When I use a long text (which is equal to the doc's StringField at indexing > time) to build a PhraseQuery, I cannot match the document. But a BooleanQuery > with MUST/AND mode succeeds. > > The long query text is an address string: > "申长路988弄虹桥万科中心地下停车场LG2层2179-2184车位(锡虹路入,LG1层开到底下LG2)" > test case is attached. 
> logs: > > 15:46:11.940 [main] INFO test.LongTextFieldSearchTest - indexed terms: 开, 层, > 心, 弄, 万, 停车场, 地下, 科, 虹桥, 底下, 锡, 入, 2184, 中, 路, 到, 1, 2, 申, 2179, 车位, 988, 虹, > lg, 长 > 15:46:11.956 [main] INFO test.LongTextFieldSearchTest - terms: 申, 长, 路, 988, > 弄, 虹桥, 万, 科, 中, 心, 地下, 停车场, lg, 2, 层, 2179, 2184, 车位, 锡, 虹, 路, 入, lg, 1, 层, > 开, 到, 底下, lg, 2 > 15:46:11.962 [main] INFO test.LongTextFieldSearchTest - query: +(+address:申 > +address:长 +address:路 +address:988 +address:弄 +address:虹桥 +address:万 > +address:科 +address:中 +address:心 +address:地下 +address:停车场 +address:lg > +address:2 +address:层 +address:2179 +address:2184 +address:车位 +address:锡 > +address:虹 +address:路 +address:入 +address:lg +address:1 +address:层 +address:开 > +address:到 +address:底下 +address:lg +address:2) > 15:46:11.988 [main] INFO test.LongTextFieldSearchTest - > results.totalHits.value=1 > 15:46:12.181 [main] INFO test.LongTextFieldSearchTest - indexed terms: 开, 层, > 心, 弄, 万, 停车场, 地下, 科, 虹桥, 底下, 锡, 入, 2184, 中, 路, 到, 1, 2, 申, 2179, 车位, 988, 虹, > lg, 长 > 15:46:12.185 [main] INFO test.LongTextFieldSearchTest - terms: 申, 长, 路, 988, > 弄, 虹桥, 万, 科, 中, 心, 地下, 停车场, lg, 2, 层, 2179, 2184, 车位, 锡, 虹, 路, 入, lg, 1, 层, > 开, 到, 底下, lg, 2 > 15:46:12.188 [main] INFO test.LongTextFieldSearchTest - query: +address:"申 长 > 路 988 弄 虹桥 万 科 中 心 地下 停车场 lg 2 层 2179 2184 车位 锡 虹 路 入 lg 1 层 开 到 底下 lg 2"~2 > 15:46:12.210 [main] INFO test.LongTextFieldSearchTest - > results.totalHits.value=0 > 15:46:12.214 [main] INFO test.LongTextFieldSearchTest - no matching phrase
[jira] [Commented] (LUCENE-9130) Failed to match when creating PhraseQuery with terms analyzed from long query text
[ https://issues.apache.org/jira/browse/LUCENE-9130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017898#comment-17017898 ] Chen Zhixiang commented on LUCENE-9130: --- PhraseQuery.java: {code:java}
public Weight createWeight(IndexSearcher searcher, ScoreMode scoreMode, float boost) throws IOException {
  return new PhraseWeight(this, field, searcher, scoreMode) {
    private transient TermStates states[];

    @Override
    protected Similarity.SimScorer getStats(IndexSearcher searcher) throws IOException {
      final int[] positions = PhraseQuery.this.getPositions();
{code} Here the positions' values are 0,1,2,3,4. Why are they initialized this way? Is there any documentation for PhraseQuery's sloppy match algorithm? > Failed to match when creating PhraseQuery with terms analyzed from long query > text > > > Key: LUCENE-9130 > URL: https://issues.apache.org/jira/browse/LUCENE-9130 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 8.4 >Reporter: Chen Zhixiang >Priority: Major > Attachments: LongTextFieldSearchTest.java > > > When I use a long text (which is equal to the doc's StringField at indexing > time) to build a PhraseQuery, I cannot match the document. But a BooleanQuery > with MUST/AND mode succeeds. > > The long query text is an address string: > "申长路988弄虹桥万科中心地下停车场LG2层2179-2184车位(锡虹路入,LG1层开到底下LG2)" > test case is attached. > logs: > > 15:46:11.940 [main] INFO test.LongTextFieldSearchTest - indexed terms: 开, 层, > 心, 弄, 万, 停车场, 地下, 科, 虹桥, 底下, 锡, 入, 2184, 中, 路, 到, 1, 2, 申, 2179, 车位, 988, 虹, > lg, 长 > 15:46:11.956 [main] INFO test.LongTextFieldSearchTest - terms: 申, 长, 路, 988, > 弄, 虹桥, 万, 科, 中, 心, 地下, 停车场, lg, 2, 层, 2179, 2184, 车位, 锡, 虹, 路, 入, lg, 1, 层, > 开, 到, 底下, lg, 2 > 15:46:11.962 [main] INFO test.LongTextFieldSearchTest - query: +(+address:申 > +address:长 +address:路 +address:988 +address:弄 +address:虹桥 +address:万 > +address:科 +address:中 +address:心 +address:地下 +address:停车场 +address:lg > +address:2 +address:层 +address:2179 +address:2184 +address:车位 +address:锡 > +address:虹 +address:路 +address:入 +address:lg +address:1 +address:层 +address:开 > +address:到 +address:底下 +address:lg +address:2) > 15:46:11.988 [main] INFO test.LongTextFieldSearchTest - > results.totalHits.value=1 > 15:46:12.181 [main] INFO test.LongTextFieldSearchTest - indexed terms: 开, 层, > 心, 弄, 万, 停车场, 地下, 科, 虹桥, 底下, 锡, 入, 2184, 中, 路, 到, 1, 2, 申, 2179, 车位, 988, 虹, > lg, 长 > 15:46:12.185 [main] INFO test.LongTextFieldSearchTest - terms: 申, 长, 路, 988, > 弄, 虹桥, 万, 科, 中, 心, 地下, 停车场, lg, 2, 层, 2179, 2184, 车位, 锡, 虹, 路, 入, lg, 1, 层, > 开, 到, 底下, lg, 2 > 15:46:12.188 [main] INFO test.LongTextFieldSearchTest - query: +address:"申 长 > 路 988 弄 虹桥 万 科 中 心 地下 停车场 lg 2 层 2179 2184 车位 锡 虹 路 入 lg 1 层 开 到 底下 lg 2"~2 > 15:46:12.210 [main] INFO test.LongTextFieldSearchTest - > results.totalHits.value=0 > 15:46:12.214 [main] INFO test.LongTextFieldSearchTest - no matching phrase
[jira] [Comment Edited] (LUCENE-9137) Broken link 'Change log' for 8.4.1 on https://lucene.apache.org/core/downloads.html
[ https://issues.apache.org/jira/browse/LUCENE-9137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017902#comment-17017902 ] Ishan Chattopadhyaya edited comment on LUCENE-9137 at 1/17/20 10:46 AM: Oh, I'm sorry that this happened. Thanks a lot, Sebb & Adrien. I'll take a look tomorrow as to how I missed it. was (Author: ichattopadhyaya): Oh, I'm sorry that this happened. Thanks a lot, Adrien. I'll take a look tomorrow as to how I missed it. > Broken link 'Change log' for 8.4.1 on > https://lucene.apache.org/core/downloads.html > --- > > Key: LUCENE-9137 > URL: https://issues.apache.org/jira/browse/LUCENE-9137 > Project: Lucene - Core > Issue Type: Bug > Environment: Broken link 'Change log' for 8.4.1 on > https://lucene.apache.org/core/downloads.html >Reporter: Sebb >Priority: Major >
[jira] [Commented] (LUCENE-9137) Broken link 'Change log' for 8.4.1 on https://lucene.apache.org/core/downloads.html
[ https://issues.apache.org/jira/browse/LUCENE-9137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017902#comment-17017902 ] Ishan Chattopadhyaya commented on LUCENE-9137: -- Oh, I'm sorry that this happened. Thanks a lot, Adrien. I'll take a look tomorrow as to how I missed it. > Broken link 'Change log' for 8.4.1 on > https://lucene.apache.org/core/downloads.html > --- > > Key: LUCENE-9137 > URL: https://issues.apache.org/jira/browse/LUCENE-9137 > Project: Lucene - Core > Issue Type: Bug > Environment: Broken link 'Change log' for 8.4.1 on > https://lucene.apache.org/core/downloads.html >Reporter: Sebb >Priority: Major >
[jira] [Commented] (LUCENE-9137) Broken link 'Change log' for 8.4.1 on https://lucene.apache.org/core/downloads.html
[ https://issues.apache.org/jira/browse/LUCENE-9137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017907#comment-17017907 ] Ishan Chattopadhyaya commented on LUCENE-9137: -- Seems like sloppy grep search/replace work on my end. Instead of %s/8\.4\.0/8.4.1/g, I must've done %s/8.4.0/8.4.1/g, which also replaced the underscores with dots. > Broken link 'Change log' for 8.4.1 on > https://lucene.apache.org/core/downloads.html > --- > > Key: LUCENE-9137 > URL: https://issues.apache.org/jira/browse/LUCENE-9137 > Project: Lucene - Core > Issue Type: Bug > Environment: Broken link 'Change log' for 8.4.1 on > https://lucene.apache.org/core/downloads.html >Reporter: Sebb >Priority: Major >
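To illustrate the slip (the changelog path below is hypothetical): with the dots unescaped, {{.}} matches any character, so underscored version strings get rewritten too.
{noformat}
" unescaped: '.' matches any char, so changes-8_4_0 also becomes changes-8.4.1 (broken link)
:%s/8.4.0/8.4.1/g
" escaped: only the literal string 8.4.0 is replaced
:%s/8\.4\.0/8.4.1/g
{noformat}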
[jira] [Commented] (LUCENE-9130) Failed to match when creating PhraseQuery with terms analyzed from long query text
[ https://issues.apache.org/jira/browse/LUCENE-9130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017914#comment-17017914 ] Chen Zhixiang commented on LUCENE-9130: --- PhraseQuery.Builder: {code:java}
public Builder add(Term term) {
  return add(term, positions.isEmpty() ? 0 : 1 + positions.get(positions.size() - 1));
}

/**
 * Adds a term to the end of the query phrase.
 * The relative position of the term within the phrase is specified explicitly, but must be greater than
 * or equal to that of the previously added term.
 * A greater position allows phrases with gaps (e.g. in connection with stopwords).
 * If the position is equal, you most likely should be using
 * {@link MultiPhraseQuery} instead which only requires one term at each position to match; this class requires
 * all of them.
 */
public Builder add(Term term, int position) { ...
{code} I used the former API, add(term), but there is another API that lets you specify an explicit position argument. In this case, maybe I should pass in positions 0 2 4 6 7, which can be obtained from analyzing the raw query text... > Failed to match when creating PhraseQuery with terms analyzed from long query > text > > > Key: LUCENE-9130 > URL: https://issues.apache.org/jira/browse/LUCENE-9130 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 8.4 >Reporter: Chen Zhixiang >Priority: Major > Attachments: LongTextFieldSearchTest.java > > > When I use a long text (which is equal to the doc's StringField at indexing > time) to build a PhraseQuery, I cannot match the document. But a BooleanQuery > with MUST/AND mode succeeds. > > The long query text is an address string: > "申长路988弄虹桥万科中心地下停车场LG2层2179-2184车位(锡虹路入,LG1层开到底下LG2)" > test case is attached. > logs: > > 15:46:11.940 [main] INFO test.LongTextFieldSearchTest - indexed terms: 开, 层, > 心, 弄, 万, 停车场, 地下, 科, 虹桥, 底下, 锡, 入, 2184, 中, 路, 到, 1, 2, 申, 2179, 车位, 988, 虹, > lg, 长 > 15:46:11.956 [main] INFO test.LongTextFieldSearchTest - terms: 申, 长, 路, 988, > 弄, 虹桥, 万, 科, 中, 心, 地下, 停车场, lg, 2, 层, 2179, 2184, 车位, 锡, 虹, 路, 入, lg, 1, 层, > 开, 到, 底下, lg, 2 > 15:46:11.962 [main] INFO test.LongTextFieldSearchTest - query: +(+address:申 > +address:长 +address:路 +address:988 +address:弄 +address:虹桥 +address:万 > +address:科 +address:中 +address:心 +address:地下 +address:停车场 +address:lg > +address:2 +address:层 +address:2179 +address:2184 +address:车位 +address:锡 > +address:虹 +address:路 +address:入 +address:lg +address:1 +address:层 +address:开 > +address:到 +address:底下 +address:lg +address:2) > 15:46:11.988 [main] INFO test.LongTextFieldSearchTest - > results.totalHits.value=1 > 15:46:12.181 [main] INFO test.LongTextFieldSearchTest - indexed terms: 开, 层, > 心, 弄, 万, 停车场, 地下, 科, 虹桥, 底下, 锡, 入, 2184, 中, 路, 到, 1, 2, 申, 2179, 车位, 988, 虹, > lg, 长 > 15:46:12.185 [main] INFO test.LongTextFieldSearchTest - terms: 申, 长, 路, 988, > 弄, 虹桥, 万, 科, 中, 心, 地下, 停车场, lg, 2, 层, 2179, 2184, 车位, 锡, 虹, 路, 入, lg, 1, 层, > 开, 到, 底下, lg, 2 > 15:46:12.188 [main] INFO test.LongTextFieldSearchTest - query: +address:"申 长 > 路 988 弄 虹桥 万 科 中 心 地下 停车场 lg 2 层 2179 2184 车位 锡 虹 路 入 lg 1 层 开 到 底下 lg 2"~2 > 15:46:12.210 [main] INFO test.LongTextFieldSearchTest - > results.totalHits.value=0 > 15:46:12.214 [main] INFO test.LongTextFieldSearchTest - no matching phrase
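A sketch of that approach (the helper shape is assumed, not taken from the attached test): replay the analyzer's position increments so that query-time positions match the index-time ones, instead of the consecutive positions that add(Term) assigns.
{code:java}
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.PhraseQuery;

public class PositionAwarePhrase {
  static PhraseQuery build(Analyzer analyzer, String field, String text, int slop) throws Exception {
    PhraseQuery.Builder builder = new PhraseQuery.Builder();
    try (TokenStream ts = analyzer.tokenStream(field, text)) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      PositionIncrementAttribute posIncr = ts.addAttribute(PositionIncrementAttribute.class);
      ts.reset();
      int pos = -1;
      while (ts.incrementToken()) {
        pos += posIncr.getPositionIncrement(); // preserves gaps, e.g. 0, 2, 4, 6, 7
        builder.add(new Term(field, term.toString()), pos);
      }
      ts.end();
    }
    builder.setSlop(slop);
    return builder.build();
  }
}
{code}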
[jira] [Commented] (LUCENE-9130) Failed to match when creating PhraseQuery with terms analyzed from long query text
[ https://issues.apache.org/jira/browse/LUCENE-9130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017917#comment-17017917 ] Chen Zhixiang commented on LUCENE-9130: --- After I changed the API call to add(term, position) and passed 0,2,4,6,7, the same positions as produced at indexing time (because they are the same text string), it matches! Now I'm confused: what's the relation between a Term's position value and PhraseQuery's slop parameter? > Failed to match when creating PhraseQuery with terms analyzed from long query > text > > > Key: LUCENE-9130 > URL: https://issues.apache.org/jira/browse/LUCENE-9130 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 8.4 >Reporter: Chen Zhixiang >Priority: Major > Attachments: LongTextFieldSearchTest.java > > > When I use a long text (which is equal to the doc's StringField at indexing > time) to build a PhraseQuery, I cannot match the document. But a BooleanQuery > with MUST/AND mode succeeds. > > The long query text is an address string: > "申长路988弄虹桥万科中心地下停车场LG2层2179-2184车位(锡虹路入,LG1层开到底下LG2)" > test case is attached. > logs: > > 15:46:11.940 [main] INFO test.LongTextFieldSearchTest - indexed terms: 开, 层, > 心, 弄, 万, 停车场, 地下, 科, 虹桥, 底下, 锡, 入, 2184, 中, 路, 到, 1, 2, 申, 2179, 车位, 988, 虹, > lg, 长 > 15:46:11.956 [main] INFO test.LongTextFieldSearchTest - terms: 申, 长, 路, 988, > 弄, 虹桥, 万, 科, 中, 心, 地下, 停车场, lg, 2, 层, 2179, 2184, 车位, 锡, 虹, 路, 入, lg, 1, 层, > 开, 到, 底下, lg, 2 > 15:46:11.962 [main] INFO test.LongTextFieldSearchTest - query: +(+address:申 > +address:长 +address:路 +address:988 +address:弄 +address:虹桥 +address:万 > +address:科 +address:中 +address:心 +address:地下 +address:停车场 +address:lg > +address:2 +address:层 +address:2179 +address:2184 +address:车位 +address:锡 > +address:虹 +address:路 +address:入 +address:lg +address:1 +address:层 +address:开 > +address:到 +address:底下 +address:lg +address:2) > 15:46:11.988 [main] INFO test.LongTextFieldSearchTest - > results.totalHits.value=1 > 15:46:12.181 [main] INFO test.LongTextFieldSearchTest - indexed terms: 开, 层, > 心, 弄, 万, 停车场, 地下, 科, 虹桥, 底下, 锡, 入, 2184, 中, 路, 到, 1, 2, 申, 2179, 车位, 988, 虹, > lg, 长 > 15:46:12.185 [main] INFO test.LongTextFieldSearchTest - terms: 申, 长, 路, 988, > 弄, 虹桥, 万, 科, 中, 心, 地下, 停车场, lg, 2, 层, 2179, 2184, 车位, 锡, 虹, 路, 入, lg, 1, 层, > 开, 到, 底下, lg, 2 > 15:46:12.188 [main] INFO test.LongTextFieldSearchTest - query: +address:"申 长 > 路 988 弄 虹桥 万 科 中 心 地下 停车场 lg 2 层 2179 2184 车位 锡 虹 路 入 lg 1 层 开 到 底下 lg 2"~2 > 15:46:12.210 [main] INFO test.LongTextFieldSearchTest - > results.totalHits.value=0 > 15:46:12.214 [main] INFO test.LongTextFieldSearchTest - no matching phrase
[jira] [Resolved] (LUCENE-9130) Failed to match when creating PhraseQuery with terms analyzed from long query text
[ https://issues.apache.org/jira/browse/LUCENE-9130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Zhixiang resolved LUCENE-9130. --- Resolution: Not A Bug > Failed to match when creating PhraseQuery with terms analyzed from long query > text > > > Key: LUCENE-9130 > URL: https://issues.apache.org/jira/browse/LUCENE-9130 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 8.4 >Reporter: Chen Zhixiang >Priority: Major > Attachments: LongTextFieldSearchTest.java > > > When I use a long text (which is equal to the doc's StringField at indexing > time) to build a PhraseQuery, I cannot match the document. But a BooleanQuery > with MUST/AND mode succeeds. > > The long query text is an address string: > "申长路988弄虹桥万科中心地下停车场LG2层2179-2184车位(锡虹路入,LG1层开到底下LG2)" > test case is attached. > logs: > > 15:46:11.940 [main] INFO test.LongTextFieldSearchTest - indexed terms: 开, 层, > 心, 弄, 万, 停车场, 地下, 科, 虹桥, 底下, 锡, 入, 2184, 中, 路, 到, 1, 2, 申, 2179, 车位, 988, 虹, > lg, 长 > 15:46:11.956 [main] INFO test.LongTextFieldSearchTest - terms: 申, 长, 路, 988, > 弄, 虹桥, 万, 科, 中, 心, 地下, 停车场, lg, 2, 层, 2179, 2184, 车位, 锡, 虹, 路, 入, lg, 1, 层, > 开, 到, 底下, lg, 2 > 15:46:11.962 [main] INFO test.LongTextFieldSearchTest - query: +(+address:申 > +address:长 +address:路 +address:988 +address:弄 +address:虹桥 +address:万 > +address:科 +address:中 +address:心 +address:地下 +address:停车场 +address:lg > +address:2 +address:层 +address:2179 +address:2184 +address:车位 +address:锡 > +address:虹 +address:路 +address:入 +address:lg +address:1 +address:层 +address:开 > +address:到 +address:底下 +address:lg +address:2) > 15:46:11.988 [main] INFO test.LongTextFieldSearchTest - > results.totalHits.value=1 > 15:46:12.181 [main] INFO test.LongTextFieldSearchTest - indexed terms: 开, 层, > 心, 弄, 万, 停车场, 地下, 科, 虹桥, 底下, 锡, 入, 2184, 中, 路, 到, 1, 2, 申, 2179, 车位, 988, 虹, > lg, 长 > 15:46:12.185 [main] INFO test.LongTextFieldSearchTest - terms: 申, 长, 路, 988, > 弄, 虹桥, 万, 科, 中, 心, 地下, 停车场, lg, 2, 层, 2179, 2184, 车位, 锡, 虹, 路, 入, lg, 1, 层, > 开, 到, 底下, lg, 2 > 15:46:12.188 [main] INFO test.LongTextFieldSearchTest - query: +address:"申 长 > 路 988 弄 虹桥 万 科 中 心 地下 停车场 lg 2 层 2179 2184 车位 锡 虹 路 入 lg 1 层 开 到 底下 lg 2"~2 > 15:46:12.210 [main] INFO test.LongTextFieldSearchTest - > results.totalHits.value=0 > 15:46:12.214 [main] INFO test.LongTextFieldSearchTest - no matching phrase
[GitHub] [lucene-solr] balaji-s opened a new pull request #1180: Update solr-tutorial.adoc
balaji-s opened a new pull request #1180: Update solr-tutorial.adoc URL: https://github.com/apache/lucene-solr/pull/1180 When executing the tutorial on the Windows 10 platform, line 664 throws the error "& was unexpected at this time.", so this adds escape characters for "&" and "|". # Description Please provide a short description of the changes you're making with this pull request. # Solution Please provide a short description of the approach taken to implement your solution. # Tests Please describe the tests you've developed or run to confirm this patch implements the feature or solves the problem. # Checklist Please review the following and check all that apply: - [ ] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability. - [ ] I have created a Jira issue and added the issue ID to my pull request title. - [ ] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended) - [ ] I have developed this patch against the `master` branch. - [ ] I have run `ant precommit` and the appropriate test suite. - [ ] I have added tests for my changes. - [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only).
[GitHub] [lucene-solr] balaji-s closed pull request #1180: Update solr-tutorial.adoc
balaji-s closed pull request #1180: Update solr-tutorial.adoc URL: https://github.com/apache/lucene-solr/pull/1180
[jira] [Commented] (LUCENE-9130) Failed to match when creating PhraseQuery with terms analyzed from long query text
[ https://issues.apache.org/jira/browse/LUCENE-9130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017956#comment-17017956 ] Gus Heck commented on LUCENE-9130: -- This is an issue tracker, not a support portal. It's for when you are *certain* of a specific behavior that contravenes published documentation, or for clear errors (like unexpected stack traces). When you are *confused* or don't know something, you will get better responses using the mailing list. > Failed to match when creating PhraseQuery with terms analyzed from long query > text > > > Key: LUCENE-9130 > URL: https://issues.apache.org/jira/browse/LUCENE-9130 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 8.4 >Reporter: Chen Zhixiang >Priority: Major > Attachments: LongTextFieldSearchTest.java > > > When I use a long text (which is equal to the doc's StringField at indexing > time) to build a PhraseQuery, I cannot match the document. But a BooleanQuery > with MUST/AND mode succeeds. > > The long query text is an address string: > "申长路988弄虹桥万科中心地下停车场LG2层2179-2184车位(锡虹路入,LG1层开到底下LG2)" > test case is attached. > logs: > > 15:46:11.940 [main] INFO test.LongTextFieldSearchTest - indexed terms: 开, 层, > 心, 弄, 万, 停车场, 地下, 科, 虹桥, 底下, 锡, 入, 2184, 中, 路, 到, 1, 2, 申, 2179, 车位, 988, 虹, > lg, 长 > 15:46:11.956 [main] INFO test.LongTextFieldSearchTest - terms: 申, 长, 路, 988, > 弄, 虹桥, 万, 科, 中, 心, 地下, 停车场, lg, 2, 层, 2179, 2184, 车位, 锡, 虹, 路, 入, lg, 1, 层, > 开, 到, 底下, lg, 2 > 15:46:11.962 [main] INFO test.LongTextFieldSearchTest - query: +(+address:申 > +address:长 +address:路 +address:988 +address:弄 +address:虹桥 +address:万 > +address:科 +address:中 +address:心 +address:地下 +address:停车场 +address:lg > +address:2 +address:层 +address:2179 +address:2184 +address:车位 +address:锡 > +address:虹 +address:路 +address:入 +address:lg +address:1 +address:层 +address:开 > +address:到 +address:底下 +address:lg +address:2) > 15:46:11.988 [main] INFO test.LongTextFieldSearchTest - > results.totalHits.value=1 > 15:46:12.181 [main] INFO test.LongTextFieldSearchTest - indexed terms: 开, 层, > 心, 弄, 万, 停车场, 地下, 科, 虹桥, 底下, 锡, 入, 2184, 中, 路, 到, 1, 2, 申, 2179, 车位, 988, 虹, > lg, 长 > 15:46:12.185 [main] INFO test.LongTextFieldSearchTest - terms: 申, 长, 路, 988, > 弄, 虹桥, 万, 科, 中, 心, 地下, 停车场, lg, 2, 层, 2179, 2184, 车位, 锡, 虹, 路, 入, lg, 1, 层, > 开, 到, 底下, lg, 2 > 15:46:12.188 [main] INFO test.LongTextFieldSearchTest - query: +address:"申 长 > 路 988 弄 虹桥 万 科 中 心 地下 停车场 lg 2 层 2179 2184 车位 锡 虹 路 入 lg 1 层 开 到 底下 lg 2"~2 > 15:46:12.210 [main] INFO test.LongTextFieldSearchTest - > results.totalHits.value=0 > 15:46:12.214 [main] INFO test.LongTextFieldSearchTest - no matching phrase
[GitHub] [lucene-solr] jpountz merged pull request #1158: LUCENE-9116: Remove long[] from `PostingsWriterBase#encodeTerm`.
jpountz merged pull request #1158: LUCENE-9116: Remove long[] from `PostingsWriterBase#encodeTerm`. URL: https://github.com/apache/lucene-solr/pull/1158 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9116) Simplify postings API by removing long[] metadata
[ https://issues.apache.org/jira/browse/LUCENE-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017968#comment-17017968 ] ASF subversion and git services commented on LUCENE-9116: - Commit fb3ca8d000d6e5203a57625942b754f1d5757fac in lucene-solr's branch refs/heads/master from Adrien Grand [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=fb3ca8d ] LUCENE-9116: Remove long[] from `PostingsWriterBase#encodeTerm`. (#1149) (#1158) All the metadata can be directly encoded in the `DataOutput`. > Simplify postings API by removing long[] metadata > - > > Key: LUCENE-9116 > URL: https://issues.apache.org/jira/browse/LUCENE-9116 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Priority: Minor > Time Spent: 50m > Remaining Estimate: 0h > > The postings API allows to store metadata about a term either in a long[] or > in a byte[]. This is unnecessary as all information could be encoded in the > byte[], which is what most codecs do in practice. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
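For context, the shape of the API change (signatures paraphrased from memory of the issue, so treat this as a sketch rather than the exact committed code):
{code:java}
// Before: term metadata split between a long[] and a DataOutput
public abstract void encodeTerm(long[] longs, DataOutput out, FieldInfo fieldInfo,
                                BlockTermState state, boolean absolute) throws IOException;

// After: all metadata written directly to the DataOutput
public abstract void encodeTerm(DataOutput out, FieldInfo fieldInfo,
                                BlockTermState state, boolean absolute) throws IOException;
{code}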
[jira] [Created] (SOLR-14193) Update tutorial.adoc (line no. 664) so that the command executes in a Windows environment
balaji sundaram created SOLR-14193: -- Summary: Update tutorial.adoc (line no. 664) so that the command executes in a Windows environment Key: SOLR-14193 URL: https://issues.apache.org/jira/browse/SOLR-14193 Project: Solr Issue Type: Bug Components: documentation Affects Versions: 8.4 Reporter: balaji sundaram When executing the following command in Windows 10: "java -jar -Dc=films -Dparams=f.genre.split=true&f.directed_by.split=true&f.genre.separator=|&f.directed_by.separator=| -Dauto example\exampledocs\post.jar example\films\*.csv", it throws the error "& was unexpected at this time." Fix: the command should escape (or quote) the "&" and "|" symbols. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
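For reference, one way to make that command work in cmd.exe is to quote the whole -Dparams value so that "&" and "|" are no longer treated as shell operators (a sketch of the proposed fix, not yet the tutorial's wording; caret-escaping each special character would also work):
{noformat}
java -jar -Dc=films "-Dparams=f.genre.split=true&f.directed_by.split=true&f.genre.separator=|&f.directed_by.separator=|" -Dauto example\exampledocs\post.jar example\films\*.csv
{noformat}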
[jira] [Resolved] (LUCENE-9116) Simplify postings API by removing long[] metadata
[ https://issues.apache.org/jira/browse/LUCENE-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand resolved LUCENE-9116. -- Fix Version/s: 8.5 Resolution: Fixed > Simplify postings API by removing long[] metadata > - > > Key: LUCENE-9116 > URL: https://issues.apache.org/jira/browse/LUCENE-9116 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Priority: Minor > Fix For: 8.5 > > Time Spent: 50m > Remaining Estimate: 0h > > The postings API allows to store metadata about a term either in a long[] or > in a byte[]. This is unnecessary as all information could be encoded in the > byte[], which is what most codecs do in practice. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-9125) Improve Automaton.step() with binary search and introduce Automaton.next()
[ https://issues.apache.org/jira/browse/LUCENE-9125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017089#comment-17017089 ] Bruno Roustant edited comment on LUCENE-9125 at 1/17/20 12:44 PM: -- Here it is: (wikimedium10K)
{noformat}
Task                      QPS trunk  StdDev   QPS patch  StdDev   Pct diff
HighIntervalsOrdered 463.57 (13.2%) 443.74 (19.6%) -4.3% ( -32% - 32%)
Respell 382.45 (14.7%) 374.88 (21.3%) -2.0% ( -33% - 39%)
OrHighLow 1746.37 (6.8%) 1737.44 (7.0%) -0.5% ( -13% - 14%)
AndHighLow 4208.34 (6.1%) 4186.85 (5.8%) -0.5% ( -11% - 12%)
HighTerm 5697.99 (7.5%) 5673.66 (5.1%) -0.4% ( -12% - 13%)
BrowseMonthTaxoFacets 4679.40 (3.7%) 4664.60 (2.6%) -0.3% ( -6% - 6%)
Prefix3 442.09 (17.3%) 441.77 (16.6%) -0.1% ( -28% - 40%)
BrowseDateTaxoFacets 4104.50 (3.4%) 4102.05 (2.8%) -0.1% ( -6% - 6%)
OrHighMed 681.54 (11.8%) 681.70 (10.6%) 0.0% ( -20% - 25%)
AndHighHigh 978.85 (8.3%) 979.47 (9.9%) 0.1% ( -16% - 19%)
BrowseDayOfYearTaxoFacets 3615.56 (2.8%) 3620.94 (2.4%) 0.1% ( -4% - 5%)
MedTerm 5964.33 (5.7%) 5980.59 (5.8%) 0.3% ( -10% - 12%)
LowTerm 6555.56 (4.8%) 6576.49 (5.3%) 0.3% ( -9% - 10%)
Fuzzy2 73.24 (16.4%) 73.55 (16.1%) 0.4% ( -27% - 39%)
Fuzzy1 887.86 (5.3%) 892.14 (2.7%) 0.5% ( -7% - 8%)
HighPhrase 901.57 (5.7%) 905.94 (6.6%) 0.5% ( -11% - 13%)
OrHighHigh 741.70 (11.5%) 745.44 (8.4%) 0.5% ( -17% - 23%)
BrowseMonthSSDVFacets 3462.54 (4.2%) 3480.43 (3.0%) 0.5% ( -6% - 8%)
HighSloppyPhrase 617.51 (6.9%) 620.74 (7.8%) 0.5% ( -13% - 16%)
PKLookup 275.55 (5.2%) 277.01 (5.0%) 0.5% ( -9% - 11%)
MedSloppyPhrase 1843.18 (4.7%) 1853.23 (3.8%) 0.5% ( -7% - 9%)
LowSloppyPhrase 2085.07 (4.3%) 2098.25 (3.9%) 0.6% ( -7% - 9%)
BrowseDayOfYearSSDVFacets 2985.60 (2.5%) 3009.10 (2.6%) 0.8% ( -4% - 6%)
AndHighMed 1712.96 (5.8%) 1729.47 (4.5%) 1.0% ( -8% - 12%)
LowSpanNear 2006.25 (6.2%) 2029.83 (6.0%) 1.2% ( -10% - 14%)
MedSpanNear 814.10 (12.3%) 823.97 (10.1%) 1.2% ( -18% - 26%)
HighSpanNear 593.47 (10.3%) 600.77 (10.6%) 1.2% ( -17% - 24%)
HighTermDayOfYearSort 1035.41 (7.8%) 1050.76 (6.5%) 1.5% ( -11% - 17%)
Wildcard 772.44 (10.7%) 791.42 (12.7%) 2.5% ( -18% - 28%)
MedPhrase 806.70 (8.7%) 827.27 (8.1%) 2.5% ( -13% - 21%)
LowPhrase 805.91 (7.9%) 831.26 (5.3%) 3.1% ( -9% - 17%)
IntNRQ 1898.15 (8.1%) 1967.24 (9.8%) 3.6% ( -13% - 23%)
HighTermMonthSort 3150.77 (12.1%) 3300.42 (13.5%) 4.7% ( -18% - 34%)
{noformat}
[jira] [Commented] (LUCENE-9125) Improve Automaton.step() with binary search and introduce Automaton.next()
[ https://issues.apache.org/jira/browse/LUCENE-9125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017978#comment-17017978 ] Bruno Roustant commented on LUCENE-9125: In the benchmark above I mistakenly used wikimedium10k (I edited the comment to mention that). Here is the benchmark for wikimediumall:
{noformat}
Task                      QPS trunk  StdDev   QPS patch  StdDev   Pct diff
OrHighNotHigh 769.84 (4.8%) 756.84 (5.0%) -1.7% ( -10% - 8%)
OrNotHighLow 664.03 (4.2%) 653.64 (3.4%) -1.6% ( -8% - 6%)
OrNotHighMed 574.56 (3.0%) 566.90 (2.5%) -1.3% ( -6% - 4%)
MedTerm 1373.80 (3.9%) 1359.30 (5.1%) -1.1% ( -9% - 8%)
AndHighHigh 19.84 (3.6%) 19.67 (2.9%) -0.9% ( -7% - 5%)
AndHighLow 474.49 (2.9%) 470.36 (3.6%) -0.9% ( -7% - 5%)
Fuzzy1 69.27 (10.7%) 68.75 (11.0%) -0.7% ( -20% - 23%)
OrNotHighHigh 569.30 (3.4%) 565.26 (5.0%) -0.7% ( -8% - 7%)
MedPhrase 36.97 (2.4%) 36.76 (2.7%) -0.6% ( -5% - 4%)
HighTerm 1133.65 (4.2%) 1128.30 (4.3%) -0.5% ( -8% - 8%)
OrHighLow 227.08 (2.9%) 226.24 (3.3%) -0.4% ( -6% - 6%)
OrHighHigh 24.17 (2.6%) 24.08 (2.4%) -0.4% ( -5% - 4%)
Prefix3 25.30 (3.8%) 25.22 (3.7%) -0.3% ( -7% - 7%)
OrHighMed 48.26 (3.1%) 48.11 (3.1%) -0.3% ( -6% - 6%)
LowTerm 1087.75 (3.4%) 1084.44 (3.3%) -0.3% ( -6% - 6%)
AndHighMed 69.62 (3.9%) 69.44 (4.1%) -0.3% ( -7% - 7%)
HighSloppyPhrase 15.11 (2.6%) 15.08 (2.6%) -0.2% ( -5% - 5%)
Respell 43.34 (2.0%) 43.28 (2.3%) -0.1% ( -4% - 4%)
OrHighNotLow 666.79 (3.4%) 665.98 (4.9%) -0.1% ( -8% - 8%)
HighSpanNear 8.21 (1.8%) 8.20 (2.0%) -0.1% ( -3% - 3%)
HighIntervalsOrdered 14.46 (1.2%) 14.45 (1.4%) -0.1% ( -2% - 2%)
HighPhrase 333.99 (3.3%) 333.74 (3.9%) -0.1% ( -7% - 7%)
MedSpanNear 12.08 (1.8%) 12.07 (2.0%) -0.1% ( -3% - 3%)
LowPhrase 481.10 (2.5%) 481.14 (3.4%) 0.0% ( -5% - 6%)
MedSloppyPhrase 6.78 (2.9%) 6.78 (2.9%) 0.0% ( -5% - 6%)
PKLookup 157.80 (2.5%) 157.83 (2.5%) 0.0% ( -4% - 5%)
LowSpanNear 21.48 (2.1%) 21.48 (2.3%) 0.0% ( -4% - 4%)
OrHighNotMed 590.59 (3.9%) 591.21 (3.8%) 0.1% ( -7% - 8%)
BrowseMonthTaxoFacets 1.06 (1.1%) 1.06 (0.9%) 0.1% ( -1% - 2%)
LowSloppyPhrase 40.57 (2.1%) 40.63 (2.2%) 0.1% ( -4% - 4%)
IntNRQ 124.31 (4.2%) 124.53 (4.9%) 0.2% ( -8% - 9%)
BrowseDateTaxoFacets 1.00 (1.0%) 1.00 (0.7%) 0.2% ( -1% - 1%)
BrowseDayOfYearTaxoFacets 0.99 (0.9%) 1.00 (0.7%) 0.2% ( -1% - 1%)
HighTermDayOfYearSort 18.57 (6.2%) 18.62 (6.0%) 0.3% ( -11% - 13%)
BrowseMonthSSDVFacets 4.38 (1.0%) 4.40 (0.9%) 0.4% ( -1% - 2%)
BrowseDayOfYearSSDVFacets 3.92 (0.7%) 3.94 (0.7%) 0.5% ( 0% - 1%)
Wildcard 52.17 (4.0%) 52.47 (5.0%) 0.6% ( -8% - 9%)
Fuzzy2 57.57 (9.5%) 58.32 (9.3%) 1.3% ( -16% - 22%)
HighTermMonthSort 40.51 (14.2%) 41.47 (13.9%) 2.4% ( -22% - 35%)
{noformat}
{quote}There's an option for lucene-util to format the output for JIRA {quote} Last time I used this option Jira interpreted some tags and the resulting display was not better than this basic one. {quote}Looking at the results you posted, the optimization seems fairly invisible {quote} Yes. The change optimizes only the construction of the CompiledAutomaton, so this is a tiny part of the fuzzy query execution. {quote}that's 4.7% of "noise" {quote} Yes, there is noise. I tried baseline vs baseline and got the same noise. Maybe with wikimediumall this time there is less noise. > Improve Automaton.step() with binary search and introduce Auto
[jira] [Created] (SOLR-14194) Allow Highlighting to work for indexes with uniqueKey that is not stored
Andrzej Wislowski created SOLR-14194: Summary: Allow Highlighting to work for indexes with uniqueKey that is not stored Key: SOLR-14194 URL: https://issues.apache.org/jira/browse/SOLR-14194 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Components: highlighter Affects Versions: master (9.0) Reporter: Andrzej Wislowski Fix For: master (9.0) Highlighting requires the uniqueKey to be a stored field. I have changed the Highlighter to allow returning results on indexes whose uniqueKey field is not stored but is saved as a docValues type. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
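A hypothetical schema sketch of the setup this targets (field name and type are illustrative, not taken from the patch):
{code:xml}
<!-- uniqueKey readable from docValues instead of stored fields -->
<field name="id" type="string" indexed="true" stored="false" docValues="true"/>
<uniqueKey>id</uniqueKey>
{code}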
[jira] [Commented] (LUCENE-9147) Move the stored fields index off-heap
[ https://issues.apache.org/jira/browse/LUCENE-9147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018007#comment-17018007 ] Erick Erickson commented on LUCENE-9147: If you only knew how much of my time with clients is spent dealing with "how much memory should I allocate" ;). So while I don't have an opinion on the technical aspects, anything we can do to reduce heap requirements is welcome. > Move the stored fields index off-heap > - > > Key: LUCENE-9147 > URL: https://issues.apache.org/jira/browse/LUCENE-9147 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > Now that the terms index is off-heap by default, it's almost embarrassing > that many indices spend most of their memory usage on the stored fields index > or the term vectors index, which are much less performance-sensitive than the > terms index. We should move them off-heap too? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-14194) Allow Highlighting to work for indexes with uniqueKey that is not stored
[ https://issues.apache.org/jira/browse/SOLR-14194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Wislowski updated SOLR-14194: - Attachment: SOLR-14194.patch Status: Open (was: Open) > Allow Highlighting to work for indexes with uniqueKey that is not stored > > > Key: SOLR-14194 > URL: https://issues.apache.org/jira/browse/SOLR-14194 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: highlighter >Affects Versions: master (9.0) >Reporter: Andrzej Wislowski >Priority: Minor > Fix For: master (9.0) > > Attachments: SOLR-14194.patch > > > Highlighting requires uniqueKey to be a stored field. I have changed > Highlighter allow returning results on indexes with uniqueKey that is a not > stored field, but saved as a docvalue type. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9147) Move the stored fields index off-heap
[ https://issues.apache.org/jira/browse/LUCENE-9147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018022#comment-17018022 ] Adrien Grand commented on LUCENE-9147: -- [~erickerickson] Yeah I have similar motivations, with many users who want to open terabytes of indices on rather small nodes. In my case the main heap user is usually the terms index of a primary/foreign key, so the ability to load the terms index off-heap addresses most of the problem. But since it should be an even less contentious move for stored fields and term vectors, I thought we should do it! :) > Move the stored fields index off-heap > - > > Key: LUCENE-9147 > URL: https://issues.apache.org/jira/browse/LUCENE-9147 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > Now that the terms index is off-heap by default, it's almost embarrassing > that many indices spend most of their memory usage on the stored fields index > or the term vectors index, which are much less performance-sensitive than the > terms index. We should move them off-heap too? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14184) replace DirectUpdateHandler2.commitOnClose with (negated) TestInjection.skipIndexWriterCommitOnClose
[ https://issues.apache.org/jira/browse/SOLR-14184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018027#comment-17018027 ] ASF subversion and git services commented on SOLR-14184: Commit 5f2d7c4855987670489d68884c787e4cfb377fa9 in lucene-solr's branch refs/heads/gradle-master from Chris M. Hostetter [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5f2d7c4 ] SOLR-14184: Internal 'test' variable DirectUpdateHandler2.commitOnClose has been removed and replaced with TestInjection.skipIndexWriterCommitOnClose > replace DirectUpdateHandler2.commitOnClose with (negated) > TestInjection.skipIndexWriterCommitOnClose > > > Key: SOLR-14184 > URL: https://issues.apache.org/jira/browse/SOLR-14184 > Project: Solr > Issue Type: Test > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Chris M. Hostetter >Assignee: Chris M. Hostetter >Priority: Major > Fix For: master (9.0), 8.5 > > Attachments: SOLR-14184.patch, SOLR-14184.patch > > > {code:java} > public static volatile boolean commitOnClose = true; // TODO: make this a > real config option or move it to TestInjection > {code} > Lots of tests muck with this (to simulate unclean shutdown and force tlog > replay on restart) but there's no guarantee that it is reset properly. > It should be replaced by logic in {{TestInjection}} that is correctly cleaned > up by {{TestInjection.reset()}} > > It's been replaced with the (negated) option > {{TestInjection.skipIndexWriterCommitOnClose}} which is automatically reset > to its default value of {{false}} by {{TestInjection.reset()}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
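For tests being migrated, the intended pattern looks roughly like this (a sketch based on the names in this issue; exact package and field visibility not verified here):
{code:java}
// simulate an unclean shutdown: skip the IndexWriter commit when the core closes
TestInjection.skipIndexWriterCommitOnClose = true;
try {
  // ... restart the core and assert that tlog replay happens ...
} finally {
  TestInjection.reset(); // restores the default (false) along with all other injection points
}
{code}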
[jira] [Commented] (SOLR-14130) Add postlogs command line tool for indexing Solr logs
[ https://issues.apache.org/jira/browse/SOLR-14130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018025#comment-17018025 ] ASF subversion and git services commented on SOLR-14130: Commit 35d8e3de6d5931bfd6cba3221cfd0dca7f97c1a1 in lucene-solr's branch refs/heads/gradle-master from Joel Bernstein [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=35d8e3d ] SOLR-14130: Continue to improve log parsing logic > Add postlogs command line tool for indexing Solr logs > - > > Key: SOLR-14130 > URL: https://issues.apache.org/jira/browse/SOLR-14130 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Joel Bernstein >Assignee: Joel Bernstein >Priority: Major > Attachments: SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, > SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, > Screen Shot 2019-12-19 at 2.04.41 PM.png, Screen Shot 2019-12-19 at 2.16.01 > PM.png, Screen Shot 2019-12-19 at 2.35.41 PM.png, Screen Shot 2019-12-21 at > 8.46.51 AM.png > > > This ticket adds a simple command line tool for posting Solr logs to a solr > index. The tool works with the out of the box Solr log format. Still a work > in progress but currently indexes: > * queries > * updates > * commits > * new searchers > * errors - including stack traces > Attached are some sample visualizations using Solr Streaming Expressions and > Math Expressions after the data has been loaded. The visualizations show: > time series, scatter plots, histograms and quantile plots, but really this is > just scratching the surface of the visualizations that can be done with the > Solr logs. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8369) Remove the spatial module as it is obsolete
[ https://issues.apache.org/jira/browse/LUCENE-8369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018026#comment-17018026 ] ASF subversion and git services commented on LUCENE-8369: - Commit 78655239c58a1ed72d6e015dd05a0b355c936999 in lucene-solr's branch refs/heads/gradle-master from Nicholas Knize [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=7865523 ] LUCENE-8369: Remove obsolete spatial module > Remove the spatial module as it is obsolete > --- > > Key: LUCENE-8369 > URL: https://issues.apache.org/jira/browse/LUCENE-8369 > Project: Lucene - Core > Issue Type: Task > Components: modules/spatial >Reporter: David Smiley >Assignee: David Smiley >Priority: Major > Attachments: LUCENE-8369.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > The "spatial" module is at this juncture nearly empty with only a couple > utilities that aren't used by anything in the entire codebase -- > GeoRelationUtils, and MortonEncoder. Perhaps it should have been removed > earlier in LUCENE-7664 which was the removal of GeoPointField which was > essentially why the module existed. Better late than never. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9144) Error message on OneDimensionBKDWriter is wrong when adding too many points
[ https://issues.apache.org/jira/browse/LUCENE-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018024#comment-17018024 ] ASF subversion and git services commented on LUCENE-9144: - Commit eb13d5bc8b3b0497ce2aca3d99e37884dc54599a in lucene-solr's branch refs/heads/gradle-master from Ignacio Vera [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=eb13d5b ] LUCENE-9144: Fix error message on OneDimensionBKDWriter when too many points are added to the writer. (#1178) > Error message on OneDimensionBKDWriter is wrong when adding too many points > --- > > Key: LUCENE-9144 > URL: https://issues.apache.org/jira/browse/LUCENE-9144 > Project: Lucene - Core > Issue Type: Bug >Reporter: Ignacio Vera >Assignee: Ignacio Vera >Priority: Minor > Fix For: 8.5 > > Time Spent: 20m > Remaining Estimate: 0h > > The error message for the 1D BKD writer when adding too many points is wrong > because: > 1) It uses pointCount (which is always 0 at that point) instead of valueCount > 2) It concatenates the numbers as a string instead of adding them. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
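Illustrating the second point (my own example, not the Lucene source): in Java, + after a String concatenates, so the numeric operands must be parenthesized to be added:
{code:java}
long pointCount = 0;
long numValues = 10;
String wrong = "count=" + pointCount + numValues;   // "count=010" (string concatenation)
String right = "count=" + (pointCount + numValues); // "count=10"  (arithmetic addition)
{code}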
[jira] [Commented] (LUCENE-9125) Improve Automaton.step() with binary search and introduce Automaton.next()
[ https://issues.apache.org/jira/browse/LUCENE-9125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018049#comment-17018049 ] Michael McCandless commented on LUCENE-9125: {quote}Here is the benchmark for wikimediumall: {quote} Thanks – these results look more realistic! Looks like mostly noise ... The Automaton queries only use the {{step}} API while constructing the {{RunAutomaton}} which is then used to (quickly) walk the transitions right? > Improve Automaton.step() with binary search and introduce Automaton.next() > -- > > Key: LUCENE-9125 > URL: https://issues.apache.org/jira/browse/LUCENE-9125 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Bruno Roustant >Assignee: Bruno Roustant >Priority: Major > Fix For: 8.5 > > Time Spent: 40m > Remaining Estimate: 0h > > Implement the existing todo in Automaton.step() (lookup a transition from a > source state depending on a given label) to use binary search since the > transitions are sorted. > Introduce new method Automaton.next() to optimize iteration & lookup over all > the transitions of a state. This will be used in RunAutomaton constructor and > in MinimizationOperations.minimize(). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
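For readers following the change, a minimal sketch of a binary-search transition lookup (my own illustration, assuming the state's transitions are sorted by min label and non-overlapping; not the actual Lucene code):
{code:java}
/** Returns the destination state reached from the current state via
 *  `label`, or -1 if there is no matching transition. */
static int step(int[] min, int[] max, int[] dest, int numTransitions, int label) {
  int lo = 0, hi = numTransitions - 1;
  while (lo <= hi) {
    int mid = (lo + hi) >>> 1;
    if (max[mid] < label) {
      lo = mid + 1;      // label lies above this transition's range
    } else if (min[mid] > label) {
      hi = mid - 1;      // label lies below this transition's range
    } else {
      return dest[mid];  // min[mid] <= label <= max[mid]
    }
  }
  return -1;
}
{code}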
[jira] [Commented] (SOLR-13749) Implement support for joining across collections with multiple shards ( XCJF )
[ https://issues.apache.org/jira/browse/SOLR-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018052#comment-17018052 ] Kevin Watters commented on SOLR-13749: -- Hey [~gus] Dan Fox just did the backport. It's available here: [https://github.com/apache/lucene-solr/pull/1175] Was curious if you wouldn't mind giving it a merge? There were no code changes between master and 8x for this pull request. > Implement support for joining across collections with multiple shards ( XCJF ) > -- > > Key: SOLR-13749 > URL: https://issues.apache.org/jira/browse/SOLR-13749 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Kevin Watters >Assignee: Gus Heck >Priority: Major > Fix For: master (9.0) > > Time Spent: 1h 10m > Remaining Estimate: 0h > > This ticket includes 2 query parsers. > The first one is the "Cross collection join filter" (XCJF) parser. This is > the "Cross-collection join filter" query parser. It can do a call out to a > remote collection to get a set of join keys to be used as a filter against > the local collection. > The second one is the Hash Range query parser that you can specify a field > name and a hash range, the result is that only the documents that would have > hashed to that range will be returned. > This query parser will do an intersection based on join keys between 2 > collections. > The local collection is the collection that you are searching against. > The remote collection is the collection that contains the join keys that you > want to use as a filter. > Each shard participating in the distributed request will execute a query > against the remote collection. If the local collection is setup with the > compositeId router to be routed on the join key field, a hash range query is > applied to the remote collection query to only match the documents that > contain a potential match for the documents that are in the local shard/core. > > > Here's some vocab to help with the descriptions of the various parameters. > ||Term||Description|| > |Local Collection|This is the main collection that is being queried.| > |Remote Collection|This is the collection that the XCJFQuery will query to > resolve the join keys.| > |XCJFQuery|The lucene query that executes a search to get back a set of join > keys from a remote collection| > |HashRangeQuery|The lucene query that matches only the documents whose hash > code on a field falls within a specified range.| > > > ||Param ||Required ||Description|| > |collection|Required|The name of the external Solr collection to be queried > to retrieve the set of join key values ( required )| > |zkHost|Optional|The connection string to be used to connect to Zookeeper. > zkHost and solrUrl are both optional parameters, and at most one of them > should be specified. > If neither of zkHost or solrUrl are specified, the local Zookeeper cluster > will be used. ( optional )| > |solrUrl|Optional|The URL of the external Solr node to be queried ( optional > )| > |from|Required|The join key field name in the external collection ( required > )| > |to|Required|The join key field name in the local collection| > |v|See Note|The query to be executed against the external Solr collection to > retrieve the set of join key values. > Note: The original query can be passed at the end of the string or as the > "v" parameter. 
> It's recommended to use query parameter substitution with the "v" parameter > to ensure no issues arise with the default query parsers.| > |routed| |true / false. If true, the XCJF query will use each shard's hash > range to determine the set of join keys to retrieve for that shard. > This parameter improves the performance of the cross-collection join, but > it depends on the local collection being routed by the toField. If this > parameter is not specified, > the XCJF query will try to determine the correct value automatically.| > |ttl| |The length of time that an XCJF query in the cache will be considered > valid, in seconds. Defaults to 3600 (one hour). > The XCJF query will not be aware of changes to the remote collection, so > if the remote collection is updated, cached XCJF queries may give inaccurate > results. > After the ttl period has expired, the XCJF query will re-execute the join > against the remote collection.| > |_All others_| |Any normal Solr parameter can also be specified as a local > param.| > > Example Solr Config.xml changes:
> {code:xml}
> <cache name="hash_vin"
>        class="solr.LRUCache"
>        size="128"
>        initialSize="0"
>        regenerator="solr.NoOpRegenerator"/>
> {code}
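To make the parameters above concrete, here is a hypothetical request sketch (collection and field names invented for illustration, using the recommended v-parameter substitution):
{noformat}
q={!xcjf collection=remoteVehicles from=vin to=vin v=$joinQuery}&joinQuery=color:red
{noformat}
Each local shard would query remoteVehicles for color:red, collect the vin join keys, and keep only the local documents whose vin matches one of them.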
[jira] [Commented] (SOLR-5669) queries containing \u return error: "Truncated unicode escape sequence."
[ https://issues.apache.org/jira/browse/SOLR-5669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018051#comment-17018051 ] Gus Heck commented on SOLR-5669: This is more an undocumented feature than a bug. That error message comes from SolrQueryParserBase: {code} /** Returns the numeric value of the hexadecimal character */ static final int hexToInt(char c) throws ParseException { if ('0' <= c && c <= '9') { return c - '0'; } else if ('a' <= c && c <= 'f'){ return c - 'a' + 10; } else if ('A' <= c && c <= 'F') { return c - 'A' + 10; } else { throw new ParseException("Non-hex character in Unicode escape sequence: " + c); } } {code} I don't find documentation of it in the ref guide however, so that could be added. For edismax one might also request an enhancement to not error on this, which would be consistent with the stated goal in the edismax docs of being tolerant of errors. This ticket however should probably only document the feature in the ref guide. (or point to said documentation if my quick search through the guide failed to reveal it) > queries containing \u return error: "Truncated unicode escape sequence." > - > > Key: SOLR-5669 > URL: https://issues.apache.org/jira/browse/SOLR-5669 > Project: Solr > Issue Type: Bug > Components: query parsers >Affects Versions: 4.4 >Reporter: Dorin Oltean >Priority: Minor > > When I do the following query: > /select?q=\ujb > I get > {quote} > "org.apache.solr.search.SyntaxError: Non-hex character in Unicode escape > sequence: j", > {quote} > To make it work i have to put in fornt of the query nother '\' > {noformat}\\ujb{noformat} > wich in fact leads to a different query in solr. > I use edismax qparser. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
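To illustrate the parsing rule with the method above (my own sketch, not Solr code):
{code:java}
// A well-formed escape supplies four hex digits, e.g. \u0041:
int code = (hexToInt('0') << 12) | (hexToInt('0') << 8)
         | (hexToInt('4') << 4) | hexToInt('1'); // 0x0041 -> 'A'
// \ujb fails on the very first digit because 'j' is not in [0-9a-fA-F],
// producing "Non-hex character in Unicode escape sequence: j"; running out
// of characters before four digits are read produces the
// "Truncated unicode escape sequence." error in the summary.
{code}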
[jira] [Commented] (LUCENE-9130) Failed to match when create PhraseQuery with terms analyzed from long query text
[ https://issues.apache.org/jira/browse/LUCENE-9130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018057#comment-17018057 ] Michael McCandless commented on LUCENE-9130: {quote}After i change the api call to add(term, position), and pass 0,2,4,6,7 which is the same as analyzed in indexing (because they are the same text string), it matches! {quote} If you use the same analyzer during query parsing as you used during indexing, this (setting the right positions for each term in the {{PhraseQuery}}) should have happened "for free". {quote}Now i'm confused: what's the relation between Term's position value and PhraseQuery's slop parameter? {quote} The {{slop}} parameter states how precisely the term positions in the document must match the positions in the query. A {{slop}} of 0 means the match must be identical, a {{slop}} of 1 means it can tolerate one term being in the wrong position in the document, etc. It's an edit-distance measure, at the term level. > Failed to match when create PhraseQuery with terms analyzed from long query > text > > > Key: LUCENE-9130 > URL: https://issues.apache.org/jira/browse/LUCENE-9130 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 8.4 >Reporter: Chen Zhixiang >Priority: Major > Attachments: LongTextFieldSearchTest.java > > > When i use a long text (which is euqual to doc's StringField at indexing > time) to build a PhraseQuery, i cannot match the document. But BooleanQuery > with MUST/AND mode successes. > > long query text is a address string: > "申长路988弄虹桥万科中心地下停车场LG2层2179-2184车位(锡虹路入,LG1层开到底下LG2)" > test case is attached. > logs: > > 15:46:11.940 [main] INFO test.LongTextFieldSearchTest - indexed terms: 开, 层, > 心, 弄, 万, 停车场, 地下, 科, 虹桥, 底下, 锡, 入, 2184, 中, 路, 到, 1, 2, 申, 2179, 车位, 988, 虹, > lg, 长 > 15:46:11.956 [main] INFO test.LongTextFieldSearchTest - terms: 申, 长, 路, 988, > 弄, 虹桥, 万, 科, 中, 心, 地下, 停车场, lg, 2, 层, 2179, 2184, 车位, 锡, 虹, 路, 入, lg, 1, 层, > 开, 到, 底下, lg, 2 > 15:46:11.962 [main] INFO test.LongTextFieldSearchTest - query: +(+address:申 > +address:长 +address:路 +address:988 +address:弄 +address:虹桥 +address:万 > +address:科 +address:中 +address:心 +address:地下 +address:停车场 +address:lg > +address:2 +address:层 +address:2179 +address:2184 +address:车位 +address:锡 > +address:虹 +address:路 +address:入 +address:lg +address:1 +address:层 +address:开 > +address:到 +address:底下 +address:lg +address:2) > 15:46:11.988 [main] INFO test.LongTextFieldSearchTest - > results.totalHits.value=1 > 15:46:12.181 [main] INFO test.LongTextFieldSearchTest - indexed terms: 开, 层, > 心, 弄, 万, 停车场, 地下, 科, 虹桥, 底下, 锡, 入, 2184, 中, 路, 到, 1, 2, 申, 2179, 车位, 988, 虹, > lg, 长 > 15:46:12.185 [main] INFO test.LongTextFieldSearchTest - terms: 申, 长, 路, 988, > 弄, 虹桥, 万, 科, 中, 心, 地下, 停车场, lg, 2, 层, 2179, 2184, 车位, 锡, 虹, 路, 入, lg, 1, 层, > 开, 到, 底下, lg, 2 > 15:46:12.188 [main] INFO test.LongTextFieldSearchTest - query: +address:"申 长 > 路 988 弄 虹桥 万 科 中 心 地下 停车场 lg 2 层 2179 2184 车位 锡 虹 路 入 lg 1 层 开 到 底下 lg 2"~2 > 15:46:12.210 [main] INFO test.LongTextFieldSearchTest - > results.totalHits.value=0 > 15:46:12.214 [main] INFO test.LongTextFieldSearchTest - no matching phrase -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
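A minimal sketch of the API being discussed (the field name is from the attached test; positions and slop are illustrative):
{code:java}
import org.apache.lucene.index.Term;
import org.apache.lucene.search.PhraseQuery;

PhraseQuery.Builder builder = new PhraseQuery.Builder();
// Positions must be the ones the analyzer produced at index time,
// not the term's ordinal in the query text.
builder.add(new Term("address", "申"), 0);
builder.add(new Term("address", "长"), 1);
builder.add(new Term("address", "路"), 2);
builder.setSlop(2); // tolerate up to two position edits when matching
PhraseQuery query = builder.build();
{code}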
[jira] [Commented] (LUCENE-9130) Failed to match when create PhraseQuery with terms analyzed from long query text
[ https://issues.apache.org/jira/browse/LUCENE-9130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018061#comment-17018061 ] Michael McCandless commented on LUCENE-9130: {quote}This is an issue tracker not a support portal. It's for when you are *certain* of a specific behavior in contravention of published documentation, or clear errors (like unexpected stack traces). When you are *confused* or don't know something you will get better responses using the mailing list. {quote} +1, but really [~mkhl] should have stated this when he originally resolved the issue as INVALID. > Failed to match when create PhraseQuery with terms analyzed from long query > text > > > Key: LUCENE-9130 > URL: https://issues.apache.org/jira/browse/LUCENE-9130 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 8.4 >Reporter: Chen Zhixiang >Priority: Major > Attachments: LongTextFieldSearchTest.java > > > When i use a long text (which is euqual to doc's StringField at indexing > time) to build a PhraseQuery, i cannot match the document. But BooleanQuery > with MUST/AND mode successes. > > long query text is a address string: > "申长路988弄虹桥万科中心地下停车场LG2层2179-2184车位(锡虹路入,LG1层开到底下LG2)" > test case is attached. > logs: > > 15:46:11.940 [main] INFO test.LongTextFieldSearchTest - indexed terms: 开, 层, > 心, 弄, 万, 停车场, 地下, 科, 虹桥, 底下, 锡, 入, 2184, 中, 路, 到, 1, 2, 申, 2179, 车位, 988, 虹, > lg, 长 > 15:46:11.956 [main] INFO test.LongTextFieldSearchTest - terms: 申, 长, 路, 988, > 弄, 虹桥, 万, 科, 中, 心, 地下, 停车场, lg, 2, 层, 2179, 2184, 车位, 锡, 虹, 路, 入, lg, 1, 层, > 开, 到, 底下, lg, 2 > 15:46:11.962 [main] INFO test.LongTextFieldSearchTest - query: +(+address:申 > +address:长 +address:路 +address:988 +address:弄 +address:虹桥 +address:万 > +address:科 +address:中 +address:心 +address:地下 +address:停车场 +address:lg > +address:2 +address:层 +address:2179 +address:2184 +address:车位 +address:锡 > +address:虹 +address:路 +address:入 +address:lg +address:1 +address:层 +address:开 > +address:到 +address:底下 +address:lg +address:2) > 15:46:11.988 [main] INFO test.LongTextFieldSearchTest - > results.totalHits.value=1 > 15:46:12.181 [main] INFO test.LongTextFieldSearchTest - indexed terms: 开, 层, > 心, 弄, 万, 停车场, 地下, 科, 虹桥, 底下, 锡, 入, 2184, 中, 路, 到, 1, 2, 申, 2179, 车位, 988, 虹, > lg, 长 > 15:46:12.185 [main] INFO test.LongTextFieldSearchTest - terms: 申, 长, 路, 988, > 弄, 虹桥, 万, 科, 中, 心, 地下, 停车场, lg, 2, 层, 2179, 2184, 车位, 锡, 虹, 路, 入, lg, 1, 层, > 开, 到, 底下, lg, 2 > 15:46:12.188 [main] INFO test.LongTextFieldSearchTest - query: +address:"申 长 > 路 988 弄 虹桥 万 科 中 心 地下 停车场 lg 2 层 2179 2184 车位 锡 虹 路 入 lg 1 层 开 到 底下 lg 2"~2 > 15:46:12.210 [main] INFO test.LongTextFieldSearchTest - > results.totalHits.value=0 > 15:46:12.214 [main] INFO test.LongTextFieldSearchTest - no matching phrase -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13749) Implement support for joining across collections with multiple shards ( XCJF )
[ https://issues.apache.org/jira/browse/SOLR-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018063#comment-17018063 ] Gus Heck commented on SOLR-13749: - Sure, I'll look at it tomorrow > Implement support for joining across collections with multiple shards ( XCJF ) > -- > > Key: SOLR-13749 > URL: https://issues.apache.org/jira/browse/SOLR-13749 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Kevin Watters >Assignee: Gus Heck >Priority: Major > Fix For: master (9.0) > > Time Spent: 1h 10m > Remaining Estimate: 0h > > This ticket includes 2 query parsers. > The first one is the "Cross collection join filter" (XCJF) parser. This is > the "Cross-collection join filter" query parser. It can do a call out to a > remote collection to get a set of join keys to be used as a filter against > the local collection. > The second one is the Hash Range query parser that you can specify a field > name and a hash range, the result is that only the documents that would have > hashed to that range will be returned. > This query parser will do an intersection based on join keys between 2 > collections. > The local collection is the collection that you are searching against. > The remote collection is the collection that contains the join keys that you > want to use as a filter. > Each shard participating in the distributed request will execute a query > against the remote collection. If the local collection is setup with the > compositeId router to be routed on the join key field, a hash range query is > applied to the remote collection query to only match the documents that > contain a potential match for the documents that are in the local shard/core. > > > Here's some vocab to help with the descriptions of the various parameters. > ||Term||Description|| > |Local Collection|This is the main collection that is being queried.| > |Remote Collection|This is the collection that the XCJFQuery will query to > resolve the join keys.| > |XCJFQuery|The lucene query that executes a search to get back a set of join > keys from a remote collection| > |HashRangeQuery|The lucene query that matches only the documents whose hash > code on a field falls within a specified range.| > > > ||Param ||Required ||Description|| > |collection|Required|The name of the external Solr collection to be queried > to retrieve the set of join key values ( required )| > |zkHost|Optional|The connection string to be used to connect to Zookeeper. > zkHost and solrUrl are both optional parameters, and at most one of them > should be specified. > If neither of zkHost or solrUrl are specified, the local Zookeeper cluster > will be used. ( optional )| > |solrUrl|Optional|The URL of the external Solr node to be queried ( optional > )| > |from|Required|The join key field name in the external collection ( required > )| > |to|Required|The join key field name in the local collection| > |v|See Note|The query to be executed against the external Solr collection to > retrieve the set of join key values. > Note: The original query can be passed at the end of the string or as the > "v" parameter. > It's recommended to use query parameter substitution with the "v" parameter > to ensure no issues arise with the default query parsers.| > |routed| |true / false. If true, the XCJF query will use each shard's hash > range to determine the set of join keys to retrieve for that shard. 
> This parameter improves the performance of the cross-collection join, but > it depends on the local collection being routed by the toField. If this > parameter is not specified, > the XCJF query will try to determine the correct value automatically.| > |ttl| |The length of time that an XCJF query in the cache will be considered > valid, in seconds. Defaults to 3600 (one hour). > The XCJF query will not be aware of changes to the remote collection, so > if the remote collection is updated, cached XCJF queries may give inaccurate > results. > After the ttl period has expired, the XCJF query will re-execute the join > against the remote collection.| > |_All others_| |Any normal Solr parameter can also be specified as a local > param.| > > Example Solr Config.xml changes:
> {code:xml}
> <cache name="hash_vin"
>        class="solr.LRUCache"
>        size="128"
>        initialSize="0"
>        regenerator="solr.NoOpRegenerator"/>
>
> <queryParser name="xcjf"
>              class="org.apache.solr.search.join.XCJFQueryParserPlugin">
>   <str name="routerField">vin
> {code}
[GitHub] [lucene-solr] mikemccand commented on a change in pull request #1179: LUCENE-9147: Move the stored fields index off-heap.
mikemccand commented on a change in pull request #1179: LUCENE-9147: Move the stored fields index off-heap. URL: https://github.com/apache/lucene-solr/pull/1179#discussion_r367955319 ## File path: lucene/core/src/java/org/apache/lucene/util/packed/DirectMonotonicReader.java ## @@ -101,20 +104,99 @@ public static LongValues getInstance(Meta meta, RandomAccessInput data) throws I readers[i] = DirectReader.getInstance(data, meta.bpvs[i], meta.offsets[i]); } } -final int blockShift = meta.blockShift; - -final long[] mins = meta.mins; -final float[] avgs = meta.avgs; -return new LongValues() { - - @Override - public long get(long index) { -final int block = (int) (index >>> blockShift); -final long blockIndex = index & ((1 << blockShift) - 1); -final long delta = readers[block].get(blockIndex); -return mins[block] + (long) (avgs[block] * blockIndex) + delta; + +return new DirectMonotonicReader(meta.blockShift, readers, meta.mins, meta.avgs, meta.bpvs); + } + + private final int blockShift; + private final LongValues[] readers; + private final long[] mins; + private final float[] avgs; + private final byte[] bpvs; + private final int nonZeroBpvs; + + private DirectMonotonicReader(int blockShift, LongValues[] readers, long[] mins, float[] avgs, byte[] bpvs) { +this.blockShift = blockShift; +this.readers = readers; +this.mins = mins; +this.avgs = avgs; +this.bpvs = bpvs; +if (readers.length != mins.length || readers.length != avgs.length || readers.length != bpvs.length) { + throw new IllegalArgumentException(); +} +int nonZeroBpvs = 0; +for (byte b : bpvs) { + if (b != 0) { +nonZeroBpvs++; + } +} +this.nonZeroBpvs = nonZeroBpvs; + } + + @Override + public long get(long index) { +final int block = (int) (index >>> blockShift); +final long blockIndex = index & ((1 << blockShift) - 1); +final long delta = readers[block].get(blockIndex); +return mins[block] + (long) (avgs[block] * blockIndex) + delta; + } + + /** Get lower/upper bounds for the value at a given index without hitting the direct reader. */ + private long[] getBounds(long index) { +final int block = (int) (index >>> blockShift); Review comment: Do we know this incoming `long index` is small enough not to overflow `int` after right shift? Should we use `Math.toIntExact` instead to confirm? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
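A sketch of the reviewer's suggestion (not the committed fix):
```java
// Math.toIntExact throws ArithmeticException on overflow instead of
// silently truncating the high bits:
final int block = Math.toIntExact(index >>> blockShift);
```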
[jira] [Commented] (LUCENE-9077) Gradle build
[ https://issues.apache.org/jira/browse/LUCENE-9077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018082#comment-17018082 ] David Smiley commented on LUCENE-9077: -- In general I suggest jumping between major versions on a single checkout/worktree. On my machine I have multiple "git worktree" for the major branches. If I want to go between, say, branch_8x and perhaps the 8.4 release branch, then I do it on that worktree and *not* master. It's just too disruptive to things like expected JDK, modules, IDE issues etc. > Gradle build > > > Key: LUCENE-9077 > URL: https://issues.apache.org/jira/browse/LUCENE-9077 > Project: Lucene - Core > Issue Type: Task >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Major > Fix For: master (9.0) > > Time Spent: 2.5h > Remaining Estimate: 0h > > This task focuses on providing a gradle-based build equivalent for Lucene and > Solr (on master branch). See notes below on why this respin is needed. > The code lives on *gradle-master* branch. It is kept in sync with *master*. > Try running the following to see an overview of helper guides concerning > typical workflow, testing and ant-migration helpers: > gradlew :help > A list of items that need to be added or require work. If you'd like to > work on any of these, please add your name to the list. Once you have a > patch/ pull request let me (dweiss) know - I'll try to coordinate the merges. > * (/) Apply forbiddenAPIs > * (/) Generate hardware-aware gradle defaults for parallelism (count of > workers and test JVMs). > * (/) Fail the build if --tests filter is applied and no tests execute > during the entire build (this allows for an empty set of filtered tests at > single project level). > * (/) Port other settings and randomizations from common-build.xml > * (/) Configure security policy/ sandboxing for tests. > * (/) test's console output on -Ptests.verbose=true > * (/) add a :helpDeps explanation to how the dependency system works > (palantir plugin, lockfile) and how to retrieve structured information about > current dependencies of a given module (in a tree-like output). > * (/) jar checksums, jar checksum computation and validation. This should be > done without intermediate folders (directly on dependency sets). > * (/) verify min. JVM version and exact gradle version on build startup to > minimize odd build side-effects > * (/) Repro-line for failed tests/ runs. > * (/) add a top-level README note about building with gradle (and the > required JVM). > * (/) add an equivalent of 'validate-source-patterns' > (check-source-patterns.groovy) to precommit. > * (/) add an equivalent of 'rat-sources' to precommit. > * (/) add an equivalent of 'check-example-lucene-match-version' (solr only) > to precommit. > * (/) javadoc compilation > Hard-to-implement stuff already investigated: > * (/) (done) -*Printing console output of failed tests.* There doesn't seem > to be any way to do this in a reasonably efficient way. There are onOutput > listeners but they're slow to operate and solr tests emit *tons* of output so > it's an overkill.- > * (!) (LUCENE-9120) *Tests working with security-debug logs or other > JVM-early log output*. Gradle's test runner works by redirecting Java's > stdout/ syserr so this just won't work. Perhaps we can spin the ant-based > test runner for such corner-cases. > Of lesser importance: > * Add an equivalent of 'documentation-lint" to precommit. 
> * (/) add rendering of javadocs (gradlew javadoc) > * Attach javadocs to maven publications. > * Add test 'beasting' (rerunning the same suite multiple times). I'm afraid > it'll be difficult to run it sensibly because gradle doesn't offer cwd > separation for the forked test runners. > * if you diff solr packaged distribution against ant-created distribution > there are minor differences in library versions and some JARs are excluded/ > moved around. I didn't try to force these as everything seems to work (tests, > etc.) – perhaps these differences should be fixed in the ant build instead. > * [EOE] identify and port various "regenerate" tasks from ant builds > (javacc, precompiled automata, etc.) > * Fill in POM details in gradle/defaults-maven.gradle so that they reflect > the previous content better (dependencies aside). > * Add any IDE integration layers that should be added (I use IntelliJ and it > imports the project out of the box, without the need for any special tuning). > * Add Solr packaging for docs/* (see TODO in packaging/build.gradle; > currently XSLT...) > * I didn't bother adding Solr dist/test-framework to packaging (who'd use it > from a binary distribution? > > *{color:#ff}Note:{color}* this builds on the work done by Mark Miller and > Cao Mạnh Đạt
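To illustrate the multi-worktree setup described above (paths and the release-branch name are examples):
{noformat}
# from an existing lucene-solr checkout on master
git worktree add ../lucene-solr-8x branch_8x
git worktree add ../lucene-solr-8.4 branch_8_4
{noformat}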
[jira] [Comment Edited] (LUCENE-9077) Gradle build
[ https://issues.apache.org/jira/browse/LUCENE-9077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018082#comment-17018082 ] David Smiley edited comment on LUCENE-9077 at 1/17/20 2:41 PM: --- In general I suggest NOT jumping between major versions on a single checkout/worktree. On my machine I have multiple "git worktree" for the major branches. If I want to go between, say, branch_8x and perhaps the 8.4 release branch, then I do it on that worktree and *not* master. It's just too disruptive to things like expected JDK, modules, IDE issues etc. was (Author: dsmiley): In general I suggest jumping between major versions on a single checkout/worktree. On my machine I have multiple "git worktree" for the major branches. If I want to go between, say, branch_8x and perhaps the 8.4 release branch, then I do it on that worktree and *not* master. It's just too disruptive to things like expected JDK, modules, IDE issues etc. > Gradle build > > > Key: LUCENE-9077 > URL: https://issues.apache.org/jira/browse/LUCENE-9077 > Project: Lucene - Core > Issue Type: Task >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Major > Fix For: master (9.0) > > Time Spent: 2.5h > Remaining Estimate: 0h > > This task focuses on providing a gradle-based build equivalent for Lucene and > Solr (on master branch). See notes below on why this respin is needed. > The code lives on *gradle-master* branch. It is kept in sync with *master*. > Try running the following to see an overview of helper guides concerning > typical workflow, testing and ant-migration helpers: > gradlew :help > A list of items that need to be added or require work. If you'd like to > work on any of these, please add your name to the list. Once you have a > patch/ pull request let me (dweiss) know - I'll try to coordinate the merges. > * (/) Apply forbiddenAPIs > * (/) Generate hardware-aware gradle defaults for parallelism (count of > workers and test JVMs). > * (/) Fail the build if --tests filter is applied and no tests execute > during the entire build (this allows for an empty set of filtered tests at > single project level). > * (/) Port other settings and randomizations from common-build.xml > * (/) Configure security policy/ sandboxing for tests. > * (/) test's console output on -Ptests.verbose=true > * (/) add a :helpDeps explanation to how the dependency system works > (palantir plugin, lockfile) and how to retrieve structured information about > current dependencies of a given module (in a tree-like output). > * (/) jar checksums, jar checksum computation and validation. This should be > done without intermediate folders (directly on dependency sets). > * (/) verify min. JVM version and exact gradle version on build startup to > minimize odd build side-effects > * (/) Repro-line for failed tests/ runs. > * (/) add a top-level README note about building with gradle (and the > required JVM). > * (/) add an equivalent of 'validate-source-patterns' > (check-source-patterns.groovy) to precommit. > * (/) add an equivalent of 'rat-sources' to precommit. > * (/) add an equivalent of 'check-example-lucene-match-version' (solr only) > to precommit. > * (/) javadoc compilation > Hard-to-implement stuff already investigated: > * (/) (done) -*Printing console output of failed tests.* There doesn't seem > to be any way to do this in a reasonably efficient way. There are onOutput > listeners but they're slow to operate and solr tests emit *tons* of output so > it's an overkill.- > * (!) 
(LUCENE-9120) *Tests working with security-debug logs or other > JVM-early log output*. Gradle's test runner works by redirecting Java's > stdout/ syserr so this just won't work. Perhaps we can spin the ant-based > test runner for such corner-cases. > Of lesser importance: > * Add an equivalent of 'documentation-lint" to precommit. > * (/) add rendering of javadocs (gradlew javadoc) > * Attach javadocs to maven publications. > * Add test 'beasting' (rerunning the same suite multiple times). I'm afraid > it'll be difficult to run it sensibly because gradle doesn't offer cwd > separation for the forked test runners. > * if you diff solr packaged distribution against ant-created distribution > there are minor differences in library versions and some JARs are excluded/ > moved around. I didn't try to force these as everything seems to work (tests, > etc.) – perhaps these differences should be fixed in the ant build instead. > * [EOE] identify and port various "regenerate" tasks from ant builds > (javacc, precompiled automata, etc.) > * Fill in POM details in gradle/defaults-maven.gradle so that they reflect > the previous content better (dependencies aside). > * Add any IDE integrat
[jira] [Updated] (LUCENE-9134) Port ant-regenerate tasks to Gradle build
[ https://issues.apache.org/jira/browse/LUCENE-9134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-9134: Attachment: gen-kuromoji.patch > Port ant-regenerate tasks to Gradle build > - > > Key: LUCENE-9134 > URL: https://issues.apache.org/jira/browse/LUCENE-9134 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > Attachments: LUCENE-9134.patch, gen-kuromoji.patch > > > Here are the "regenerate" targets I found in the ant version. There are a > couple that I don't have evidence for or against being rebuilt > // Very top level > {code:java} > ./build.xml: > ./build.xml: failonerror="true"> > ./build.xml: depends="regenerate,-check-after-regeneration"/> > {code} > // top level Lucene. This includes the core/build.xml and > test-framework/build.xml files > {code:java} > ./lucene/build.xml: > ./lucene/build.xml: inheritall="false"> > ./lucene/build.xml: > {code} > // This one has quite a number of customizations to > {code:java} > ./lucene/core/build.xml: depends="createLevAutomata,createPackedIntSources,jflex"/> > {code} > // This one has a bunch of code modifications _after_ javacc is run on > certain of the > // output files. Save this one for last? > {code:java} > ./lucene/queryparser/build.xml: > {code} > // the files under ../lucene/analysis... are pretty self contained. I expect > these could be done as a unit > {code:java} > ./lucene/analysis/build.xml: > ./lucene/analysis/build.xml: > ./lucene/analysis/common/build.xml: depends="jflex,unicode-data"/> > ./lucene/analysis/icu/build.xml: depends="gen-utr30-data-files,gennorm2,genrbbi"/> > ./lucene/analysis/kuromoji/build.xml: depends="build-dict"/> > ./lucene/analysis/nori/build.xml: depends="build-dict"/> > ./lucene/analysis/opennlp/build.xml: depends="train-test-models"/> > {code} > > // These _are_ regenerated from the top-level regenerate target, but for -- > LUCENE-9080//the changes were only in imports so there are no > //corresponding files checked in in that JIRA > {code:java} > ./lucene/expressions/build.xml: depends="run-antlr"/> > {code} > // Apparently unrelated to ./lucene/analysis/opennlp/build.xml > "train-test-models" target > // Apparently not rebuilt from the top level, but _are_ regenerated when > executed from > // ./solr/contrib/langid > {code:java} > ./solr/contrib/langid/build.xml: depends="train-test-models"/> > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9134) Port ant-regenerate tasks to Gradle build
[ https://issues.apache.org/jira/browse/LUCENE-9134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018088#comment-17018088 ] Dawid Weiss commented on LUCENE-9134: - I wanted to start with Kuromoji dictionary regeneration but it turned out it uses a patch command (which Windows lacks). I leave the patch here for now, will pick it up again when I figure out how to do this in a platform-independent way. > Port ant-regenerate tasks to Gradle build > - > > Key: LUCENE-9134 > URL: https://issues.apache.org/jira/browse/LUCENE-9134 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > Attachments: LUCENE-9134.patch, gen-kuromoji.patch > > > Here are the "regenerate" targets I found in the ant version. There are a > couple that I don't have evidence for or against being rebuilt > // Very top level > {code:java} > ./build.xml: > ./build.xml: failonerror="true"> > ./build.xml: depends="regenerate,-check-after-regeneration"/> > {code} > // top level Lucene. This includes the core/build.xml and > test-framework/build.xml files > {code:java} > ./lucene/build.xml: > ./lucene/build.xml: inheritall="false"> > ./lucene/build.xml: > {code} > // This one has quite a number of customizations to > {code:java} > ./lucene/core/build.xml: depends="createLevAutomata,createPackedIntSources,jflex"/> > {code} > // This one has a bunch of code modifications _after_ javacc is run on > certain of the > // output files. Save this one for last? > {code:java} > ./lucene/queryparser/build.xml: > {code} > // the files under ../lucene/analysis... are pretty self contained. I expect > these could be done as a unit > {code:java} > ./lucene/analysis/build.xml: > ./lucene/analysis/build.xml: > ./lucene/analysis/common/build.xml: depends="jflex,unicode-data"/> > ./lucene/analysis/icu/build.xml: depends="gen-utr30-data-files,gennorm2,genrbbi"/> > ./lucene/analysis/kuromoji/build.xml: depends="build-dict"/> > ./lucene/analysis/nori/build.xml: depends="build-dict"/> > ./lucene/analysis/opennlp/build.xml: depends="train-test-models"/> > {code} > > // These _are_ regenerated from the top-level regenerate target, but for -- > LUCENE-9080//the changes were only in imports so there are no > //corresponding files checked in in that JIRA > {code:java} > ./lucene/expressions/build.xml: depends="run-antlr"/> > {code} > // Apparently unrelated to ./lucene/analysis/opennlp/build.xml > "train-test-models" target > // Apparently not rebuilt from the top level, but _are_ regenerated when > executed from > // ./solr/contrib/langid > {code:java} > ./solr/contrib/langid/build.xml: depends="train-test-models"/> > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9134) Port ant-regenerate tasks to Gradle build
[ https://issues.apache.org/jira/browse/LUCENE-9134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018090#comment-17018090 ] Dawid Weiss commented on LUCENE-9134: - Note to self: use jgit. Treat the expanded dictionary as a fresh repo and apply the patch with jgit's patch command. > Port ant-regenerate tasks to Gradle build > - > > Key: LUCENE-9134 > URL: https://issues.apache.org/jira/browse/LUCENE-9134 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > Attachments: LUCENE-9134.patch, gen-kuromoji.patch > > > Here are the "regenerate" targets I found in the ant version. There are a > couple that I don't have evidence for or against being rebuilt > // Very top level > {code:java} > ./build.xml: > ./build.xml: failonerror="true"> > ./build.xml: depends="regenerate,-check-after-regeneration"/> > {code} > // top level Lucene. This includes the core/build.xml and > test-framework/build.xml files > {code:java} > ./lucene/build.xml: > ./lucene/build.xml: inheritall="false"> > ./lucene/build.xml: > {code} > // This one has quite a number of customizations to > {code:java} > ./lucene/core/build.xml: depends="createLevAutomata,createPackedIntSources,jflex"/> > {code} > // This one has a bunch of code modifications _after_ javacc is run on > certain of the > // output files. Save this one for last? > {code:java} > ./lucene/queryparser/build.xml: > {code} > // the files under ../lucene/analysis... are pretty self contained. I expect > these could be done as a unit > {code:java} > ./lucene/analysis/build.xml: > ./lucene/analysis/build.xml: > ./lucene/analysis/common/build.xml: depends="jflex,unicode-data"/> > ./lucene/analysis/icu/build.xml: depends="gen-utr30-data-files,gennorm2,genrbbi"/> > ./lucene/analysis/kuromoji/build.xml: depends="build-dict"/> > ./lucene/analysis/nori/build.xml: depends="build-dict"/> > ./lucene/analysis/opennlp/build.xml: depends="train-test-models"/> > {code} > > // These _are_ regenerated from the top-level regenerate target, but for -- > LUCENE-9080//the changes were only in imports so there are no > //corresponding files checked in in that JIRA > {code:java} > ./lucene/expressions/build.xml: depends="run-antlr"/> > {code} > // Apparently unrelated to ./lucene/analysis/opennlp/build.xml > "train-test-models" target > // Apparently not rebuilt from the top level, but _are_ regenerated when > executed from > // ./solr/contrib/langid > {code:java} > ./solr/contrib/langid/build.xml: depends="train-test-models"/> > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9134) Port ant-regenerate tasks to Gradle build
[ https://issues.apache.org/jira/browse/LUCENE-9134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018096#comment-17018096 ] Erick Erickson commented on LUCENE-9134: BTW, I've got the javacc bits working, just trying to clean up enough so we don't need to hand-edit the results afterwards. Having real trouble getting IntelliJ to recompile on demand even when the files haven't changed, which it used to do. Also having trouble getting the gradle build to use -Xlint options. Digging... > Port ant-regenerate tasks to Gradle build > - > > Key: LUCENE-9134 > URL: https://issues.apache.org/jira/browse/LUCENE-9134 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > Attachments: LUCENE-9134.patch, gen-kuromoji.patch > > > Here are the "regenerate" targets I found in the ant version. There are a > couple that I don't have evidence for or against being rebuilt > // Very top level > {code:java} > ./build.xml: > ./build.xml: failonerror="true"> > ./build.xml: depends="regenerate,-check-after-regeneration"/> > {code} > // top level Lucene. This includes the core/build.xml and > test-framework/build.xml files > {code:java} > ./lucene/build.xml: > ./lucene/build.xml: inheritall="false"> > ./lucene/build.xml: > {code} > // This one has quite a number of customizations to > {code:java} > ./lucene/core/build.xml: depends="createLevAutomata,createPackedIntSources,jflex"/> > {code} > // This one has a bunch of code modifications _after_ javacc is run on > certain of the > // output files. Save this one for last? > {code:java} > ./lucene/queryparser/build.xml: > {code} > // the files under ../lucene/analysis... are pretty self contained. I expect > these could be done as a unit > {code:java} > ./lucene/analysis/build.xml: > ./lucene/analysis/build.xml: > ./lucene/analysis/common/build.xml: depends="jflex,unicode-data"/> > ./lucene/analysis/icu/build.xml: depends="gen-utr30-data-files,gennorm2,genrbbi"/> > ./lucene/analysis/kuromoji/build.xml: depends="build-dict"/> > ./lucene/analysis/nori/build.xml: depends="build-dict"/> > ./lucene/analysis/opennlp/build.xml: depends="train-test-models"/> > {code} > > // These _are_ regenerated from the top-level regenerate target, but for -- > LUCENE-9080//the changes were only in imports so there are no > //corresponding files checked in in that JIRA > {code:java} > ./lucene/expressions/build.xml: depends="run-antlr"/> > {code} > // Apparently unrelated to ./lucene/analysis/opennlp/build.xml > "train-test-models" target > // Apparently not rebuilt from the top level, but _are_ regenerated when > executed from > // ./solr/contrib/langid > {code:java} > ./solr/contrib/langid/build.xml: depends="train-test-models"/> > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9142) Add documentation to Operations.determinize, SortedIntSet, and FrozenSet
[ https://issues.apache.org/jira/browse/LUCENE-9142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018103#comment-17018103 ] Michael McCandless commented on LUCENE-9142: This is indeed dangerously sneaky code – {{SortedIntSet.equals}} has special logic to compare only to a {{FrozenIntSet}} ... it's kinda weird that it cannot compare against another {{SortedIntSet}}, while {{FrozenIntSet.equals}} is symmetric (can compare against either class). Maybe we could at least fix both of these {{equals}} methods to invoke the same (static) {{equals}}? Hmm, and {{FrozenIntSet.equals}} looks buggy when it's comparing to a {{SortedIntSet}} – it's checking the length of the {{values}} array in the {{SortedIntSet}} when I think it should instead check against {{upto}}? If that's really a bug, it may indeed be causing our determinize/minimize to not collapse as many states as it should? > Add documentation to Operations.determinize, SortedIntSet, and FrozenSet > > > Key: LUCENE-9142 > URL: https://issues.apache.org/jira/browse/LUCENE-9142 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Reporter: Mike Drob >Priority: Major > > Was tracing through the fuzzy query code, and IntelliJ helpfully pointed out > that we have mismatched types when trying to reuse states, and so we may be > creating more states than we need to. > Relevant snippets: > {code:title=Operations.java} > Map newstate = new HashMap<>(); > final SortedIntSet statesSet = new SortedIntSet(5); > Integer q = newstate.get(statesSet); > {code} > {{q}} is always going to be null in this path because there are no > SortedIntSet keys in the map. > There is also very little javadoc on SortedIntSet, so I'm having trouble > following the precise relationship between all the pieces here. > cc: [~mikemccand] [~romseygeek] - I would appreciate any pointers if you have > them -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
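A minimal sketch of the shared static equals idea suggested above, with hypothetical parameter names (the actual SortedIntSet/FrozenIntSet fields may differ); both equals methods could delegate to it, which would make the comparison symmetric and avoid the suspected values.length-vs-upto bug:
{code:java}
// Sketch only: compare the first aUpto used entries of a against the
// first bUpto used entries of b, ignoring any unused tail of the arrays.
static boolean intSetsEqual(int[] a, int aUpto, int[] b, int bUpto) {
  if (aUpto != bUpto) {
    return false;
  }
  for (int i = 0; i < aUpto; i++) {
    if (a[i] != b[i]) {
      return false;
    }
  }
  return true;
}
{code}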
[jira] [Commented] (LUCENE-9123) JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter
[ https://issues.apache.org/jira/browse/LUCENE-9123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018114#comment-17018114 ] Michael McCandless commented on LUCENE-9123: This solution would fix Kuromoji to create a simple chain of tokens, all with position increment 1 (no overlapping compound tokens)? Would you only use that mode when parsing the synonyms to build the synonym filter (or synonym graph filter)? (Since that seems to be where the error is occurring here). Or would you also use that as your primary Tokenizer (which would mean you don't also get compound words directly out of Kuromoji). Net/net it's disappointing that neither synonym filter nor synonym graph filter can correctly consume an incoming token graph; it'd be somewhat tricky to fix, but is important. I thought we had a dedicated issue for that but I cannot locate it now. > JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter > --- > > Key: LUCENE-9123 > URL: https://issues.apache.org/jira/browse/LUCENE-9123 > Project: Lucene - Core > Issue Type: Bug > Components: modules/analysis >Affects Versions: 8.4 >Reporter: Kazuaki Hiraga >Priority: Major > Attachments: LUCENE-9123.patch, LUCENE-9123_revised.patch > > > JapaneseTokenizer with `mode=search` or `mode=extended` doesn't work with > both of SynonymGraphFilter and SynonymFilter when JT generates multiple > tokens as an output. If we use `mode=normal`, it should be fine. However, we > would like to use decomposed tokens that can maximize to chance to increase > recall. > Snippet of schema: > {code:xml} > positionIncrementGap="100" autoGeneratePhraseQueries="false"> > > > synonyms="lang/synonyms_ja.txt" > tokenizerFactory="solr.JapaneseTokenizerFactory"/> > > > tags="lang/stoptags_ja.txt" /> > > > > > > minimumLength="4"/> > > > > > {code} > An synonym entry that generates error: > {noformat} > 株式会社,コーポレーション > {noformat} > The following is an output on console: > {noformat} > $ ./bin/solr create_core -c jp_test -d ../config/solrconfs > ERROR: Error CREATEing SolrCore 'jp_test': Unable to create core [jp_test3] > Caused by: term: 株式会社 analyzed to a token (株式会社) with position increment != 1 > (got: 0) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9137) Broken link 'Change log' for 8.4.1 on https://lucene.apache.org/core/downloads.html
[ https://issues.apache.org/jira/browse/LUCENE-9137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018122#comment-17018122 ] Sebb commented on LUCENE-9137: -- The published web page has yet to be updated: $ curl -Is https://lucene.apache.org/core/downloads.html HTTP/1.1 200 OK Date: Fri, 17 Jan 2020 15:29:26 GMT Last-Modified: Tue, 14 Jan 2020 13:08:01 GMT > Broken link 'Change log' for 8.4.1 on > https://lucene.apache.org/core/downloads.html > --- > > Key: LUCENE-9137 > URL: https://issues.apache.org/jira/browse/LUCENE-9137 > Project: Lucene - Core > Issue Type: Bug > Environment: Broken link 'Change log' for 8.4.1 on > https://lucene.apache.org/core/downloads.html >Reporter: Sebb >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9125) Improve Automaton.step() with binary search and introduce Automaton.next()
[ https://issues.apache.org/jira/browse/LUCENE-9125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018125#comment-17018125 ] Bruno Roustant commented on LUCENE-9125: {quote}The Automaton queries only use the {{step}} API while constructing the {{RunAutomaton}} {quote} Correct. Automaton.step() was also used in the minimize() Operation, which is now a bit faster. > Improve Automaton.step() with binary search and introduce Automaton.next() > -- > > Key: LUCENE-9125 > URL: https://issues.apache.org/jira/browse/LUCENE-9125 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Bruno Roustant >Assignee: Bruno Roustant >Priority: Major > Fix For: 8.5 > > Time Spent: 40m > Remaining Estimate: 0h > > Implement the existing todo in Automaton.step() (lookup a transition from a > source state depending on a given label) to use binary search since the > transitions are sorted. > Introduce new method Automaton.next() to optimize iteration & lookup over all > the transitions of a state. This will be used in RunAutomaton constructor and > in MinimizationOperations.minimize(). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
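For illustration, the binary search over a state's sorted transitions might look like the following sketch (hypothetical parallel arrays, not the actual Automaton internals):
{code:java}
// Sketch: a state's transitions are sorted, non-overlapping [min, max]
// label ranges; return the destination state for label, or -1 if none.
static int step(int[] min, int[] max, int[] dest, int label) {
  int lo = 0, hi = min.length - 1;
  while (lo <= hi) {
    int mid = (lo + hi) >>> 1;
    if (max[mid] < label) {
      lo = mid + 1;      // range ends before label
    } else if (min[mid] > label) {
      hi = mid - 1;      // range starts after label
    } else {
      return dest[mid];  // min <= label <= max
    }
  }
  return -1;
}
{code}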
[jira] [Reopened] (LUCENE-9098) Report problematic term value when fuzzy query is too complex
[ https://issues.apache.org/jira/browse/LUCENE-9098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reopened LUCENE-9098: Reopen so we can fix the failing seed ... > Report problematic term value when fuzzy query is too complex > - > > Key: LUCENE-9098 > URL: https://issues.apache.org/jira/browse/LUCENE-9098 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Reporter: Mike Drob >Assignee: Mike Drob >Priority: Minor > Fix For: master (9.0) > > Time Spent: 0.5h > Remaining Estimate: 0h > > This is the Lucene complement to SOLR-13190: when fuzzy query gets a term > that expands to too many states, we throw an exception but don't provide > insight on the problematic term. We should improve the error reporting. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9098) Report problematic term value when fuzzy query is too complex
[ https://issues.apache.org/jira/browse/LUCENE-9098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018138#comment-17018138 ] Michael McCandless commented on LUCENE-9098: Indeed, this is still failing on current master (fb3ca8d000d6e5203a57625942b754f1d5757fac). Looks like the test tries to make a random string that for sure will attempt to use too many states during determinize, yet this particular random string does not... here's the full failure: {noformat} [junit4:pickseed] Seed property 'tests.seed' already defined: CE3DF037C6D29401 [junit4] says Привет! Master seed: CE3DF037C6D29401 [junit4] Executing 1 suite with 1 JVM. [junit4] [junit4] Started J0 PID(16174@localhost). [junit4] Suite: org.apache.lucene.search.TestFuzzyQuery [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestFuzzyQuery -Dtests.method=testErrorMessage -Dtests.seed=CE3DF037C6D29401 -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=fr-GN -Dtests.timezone=US/Pacific-New -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1 [junit4] FAILURE 0.23s | TestFuzzyQuery.testErrorMessage <<< [junit4] > Throwable #1: junit.framework.AssertionFailedError: Unexpected exception type, expected FuzzyTermsException but got java.lang.UnsupportedOperationException [junit4] > at __randomizedtesting.SeedInfo.seed([CE3DF037C6D29401:1836CAB94FFCBD4F]:0) [junit4] > at org.apache.lucene.util.LuceneTestCase.expectThrows(LuceneTestCase.java:2752) [junit4] > at org.apache.lucene.util.LuceneTestCase.expectThrows(LuceneTestCase.java:2740) [junit4] > at org.apache.lucene.search.TestFuzzyQuery.testErrorMessage(TestFuzzyQuery.java:507) [junit4] > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [junit4] > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) [junit4] > at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [junit4] > at java.base/java.lang.reflect.Method.invoke(Method.java:566) [junit4] > at java.base/java.lang.Thread.run(Thread.java:834) [junit4] > Caused by: java.lang.UnsupportedOperationException [junit4] > at org.apache.lucene.search.TestFuzzyQuery$1.iterator(TestFuzzyQuery.java:511) [junit4] > at org.apache.lucene.index.Terms.intersect(Terms.java:70) [junit4] > at org.apache.lucene.search.FuzzyTermsEnum.getAutomatonEnum(FuzzyTermsEnum.java:205) [junit4] > at org.apache.lucene.search.FuzzyTermsEnum.bottomChanged(FuzzyTermsEnum.java:232) [junit4] > at org.apache.lucene.search.FuzzyTermsEnum.(FuzzyTermsEnum.java:131) [junit4] > at org.apache.lucene.search.FuzzyQuery.getTermsEnum(FuzzyQuery.java:196) [junit4] > at org.apache.lucene.search.MultiTermQuery.getTermsEnum(MultiTermQuery.java:303) [junit4] > at org.apache.lucene.search.TestFuzzyQuery.lambda$testErrorMessage$6(TestFuzzyQuery.java:508) [junit4] > at org.apache.lucene.util.LuceneTestCase._expectThrows(LuceneTestCase.java:2870) [junit4] > at org.apache.lucene.util.LuceneTestCase.expectThrows(LuceneTestCase.java:2745) [junit4] > ... 38 more [junit4] 2> NOTE: test params are: codec=DummyCompressingStoredFields(storedFieldsFormat=CompressingStoredFieldsFormat(compressionMode=DUMMY, chunkSize=5, maxDocsPerChunk=10, blockSize=8), termVectorsFormat=CompressingTermVectorsFormat(compressionMode=DUMMY, chunkSize=5, blockSize=8)), sim=Asserting(org.apache.lucene.search.similarities.AssertingSimilarity@3f2c18f9), locale=fr-GN, timezone=US/Pacific-New [junit4] 2> NOTE: Linux 4.4.0-165-generic amd64/Oracle Corporation 11.0.2 (64-bit)/cpus=8,threads=1,free=477446800,total=536870912 [junit4] 2> NOTE: All tests run in this JVM: [TestFuzzyQuery] [junit4] Completed [1/1 (1!)] in 0.45s, 1 test, 1 failure <<< FAILURES!{noformat} > Report problematic term value when fuzzy query is too complex > - > > Key: LUCENE-9098 > URL: https://issues.apache.org/jira/browse/LUCENE-9098 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Reporter: Mike Drob >Assignee: Mike Drob >Priority: Minor > Fix For: master (9.0) > > Time Spent: 0.5h > Remaining Estimate: 0h > > This is the Lucene complement to SOLR-13190: when fuzzy query gets a term > that expands to too many states, we throw an exception but don't provide > insight on the problematic term. We should improve the error reporting. -- This message was sent by Atlassian Jira (v8.3.4#803005) -
[jira] [Commented] (SOLR-14073) Fix segment look ahead NPE in CollapsingQParserPlugin
[ https://issues.apache.org/jira/browse/SOLR-14073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018150#comment-17018150 ] Joel Bernstein commented on SOLR-14073: --- The initial patch seems to work but it's quite hard to reason about. I'm going to try a different approach that is easier to reason about: prepopulate the contexts array rather than populating it as the segments are visited. This should eliminate the look-ahead NPE as well. > Fix segment look ahead NPE in CollapsingQParserPlugin > - > > Key: SOLR-14073 > URL: https://issues.apache.org/jira/browse/SOLR-14073 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Joel Bernstein >Assignee: Joel Bernstein >Priority: Major > Attachments: SOLR-14073.patch > > > The CollapsingQParserPlugin has a bug where, if every segment is not visited > during the collect, it throws an NPE. This causes the CollapsingQParserPlugin > to not work when used with any feature that short circuits the segments > during the collect. This includes using the CollapsingQParserPlugin twice in > the same query and the time limiting collector. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-14073) Fix segment look ahead NPE in CollapsingQParserPlugin
[ https://issues.apache.org/jira/browse/SOLR-14073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018150#comment-17018150 ] Joel Bernstein edited comment on SOLR-14073 at 1/17/20 4:03 PM: The initial patch seems to work but it's quite hard to reason about. I'm going to try a different approach that is easier to reason about: pre-populate the contexts array rather than populating it as the segments are visited. This should eliminate the look-ahead NPE as well. was (Author: joel.bernstein): The initial patch seems to work but it's quite hard to reason about. I'm going to try a different approach that is easier to reason about: prepopulate the contexts array rather than populating it as the segments are visited. This should eliminate the look-ahead NPE as well. > Fix segment look ahead NPE in CollapsingQParserPlugin > - > > Key: SOLR-14073 > URL: https://issues.apache.org/jira/browse/SOLR-14073 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Joel Bernstein >Assignee: Joel Bernstein >Priority: Major > Attachments: SOLR-14073.patch > > > The CollapsingQParserPlugin has a bug where, if every segment is not visited > during the collect, it throws an NPE. This causes the CollapsingQParserPlugin > to not work when used with any feature that short circuits the segments > during the collect. This includes using the CollapsingQParserPlugin twice in > the same query and the time limiting collector. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
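A rough sketch of the pre-population approach described above (hypothetical names, not the actual CollapsingQParserPlugin code): build the per-segment contexts array once from the reader's leaves, so a collector that skips segments never dereferences a null entry.
{code:java}
import java.util.List;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.IndexSearcher;

// Sketch: fill the contexts array up front instead of lazily while
// collecting, so short-circuited segments still have non-null entries.
static LeafReaderContext[] prepopulateContexts(IndexSearcher searcher) {
  List<LeafReaderContext> leaves = searcher.getTopReaderContext().leaves();
  return leaves.toArray(new LeafReaderContext[0]);
}
{code}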
[jira] [Commented] (LUCENE-9123) JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter
[ https://issues.apache.org/jira/browse/LUCENE-9123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018159#comment-17018159 ] Kazuaki Hiraga commented on LUCENE-9123: {quote} This solution would fix Kuromoji to create a simple chain of tokens, all with position increment 1 (no overlapping compound tokens)? {quote} Yes. Although I may need to test more documents to ensure that the fix will produce a simple chain of tokens, it seems to be working fine so far. {quote} Would you only use that mode when parsing the synonyms to build the synonym filter (or synonym graph filter)? (Since that seems to be where the error is occurring here). Or would you also use that as your primary Tokenizer (which would mean you don't also get compound words directly out of Kuromoji). {quote} In my case, I use this mode as my primary Tokenizer configuration since I usually want to have decompound tokens. It would be nice if synonym filter and synonym graph filter could work with this mode without the patch. However, I don't think there are many situations that we need original tokens along with decompound ones (I cannot say we will never need though). The current workaround for this issue is using normal mode, which will not produce decompound tokens. But, for example, we cannot get a document that contains 株式会社 by using a query 会社 because 株式会社 will be one token and normal mode doesn't produce the decompound tokens 株式 and 会社 (in this case, we can use n-gram in addition to the tokenized field to get the document, but that has other issues). I will try to find the dedicated issue for the filter. If there isn't one, I will create a ticket to record the issue. > JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter > --- > > Key: LUCENE-9123 > URL: https://issues.apache.org/jira/browse/LUCENE-9123 > Project: Lucene - Core > Issue Type: Bug > Components: modules/analysis >Affects Versions: 8.4 >Reporter: Kazuaki Hiraga >Priority: Major > Attachments: LUCENE-9123.patch, LUCENE-9123_revised.patch > > > JapaneseTokenizer with `mode=search` or `mode=extended` doesn't work with > both of SynonymGraphFilter and SynonymFilter when JT generates multiple > tokens as an output. If we use `mode=normal`, it should be fine. However, we > would like to use decomposed tokens that can maximize to chance to increase > recall. > Snippet of schema: > {code:xml} > positionIncrementGap="100" autoGeneratePhraseQueries="false"> > > > synonyms="lang/synonyms_ja.txt" > tokenizerFactory="solr.JapaneseTokenizerFactory"/> > > > tags="lang/stoptags_ja.txt" /> > > > > > > minimumLength="4"/> > > > > > {code} > An synonym entry that generates error: > {noformat} > 株式会社,コーポレーション > {noformat} > The following is an output on console: > {noformat} > $ ./bin/solr create_core -c jp_test -d ../config/solrconfs > ERROR: Error CREATEing SolrCore 'jp_test': Unable to create core [jp_test3] > Caused by: term: 株式会社 analyzed to a token (株式会社) with position increment != 1 > (got: 0) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9123) JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter
[ https://issues.apache.org/jira/browse/LUCENE-9123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018187#comment-17018187 ] Tomoko Uchida commented on LUCENE-9123: --- {{quote}} However, I don't think there are many situations that we need original tokens along with decompound ones {{quote}} Personally I agree with that. Concerning full text searching, we rarely need original tokens when we use the "search mode". Why don't we set "discardCompoundToken" to true by default from here (I think this minor change in behaviour is Okay for next 8.x release)? > JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter > --- > > Key: LUCENE-9123 > URL: https://issues.apache.org/jira/browse/LUCENE-9123 > Project: Lucene - Core > Issue Type: Bug > Components: modules/analysis >Affects Versions: 8.4 >Reporter: Kazuaki Hiraga >Priority: Major > Attachments: LUCENE-9123.patch, LUCENE-9123_revised.patch > > > JapaneseTokenizer with `mode=search` or `mode=extended` doesn't work with > both of SynonymGraphFilter and SynonymFilter when JT generates multiple > tokens as an output. If we use `mode=normal`, it should be fine. However, we > would like to use decomposed tokens that can maximize to chance to increase > recall. > Snippet of schema: > {code:xml} > positionIncrementGap="100" autoGeneratePhraseQueries="false"> > > > synonyms="lang/synonyms_ja.txt" > tokenizerFactory="solr.JapaneseTokenizerFactory"/> > > > tags="lang/stoptags_ja.txt" /> > > > > > > minimumLength="4"/> > > > > > {code} > An synonym entry that generates error: > {noformat} > 株式会社,コーポレーション > {noformat} > The following is an output on console: > {noformat} > $ ./bin/solr create_core -c jp_test -d ../config/solrconfs > ERROR: Error CREATEing SolrCore 'jp_test': Unable to create core [jp_test3] > Caused by: term: 株式会社 analyzed to a token (株式会社) with position increment != 1 > (got: 0) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-9123) JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter
[ https://issues.apache.org/jira/browse/LUCENE-9123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018187#comment-17018187 ] Tomoko Uchida edited comment on LUCENE-9123 at 1/17/20 4:59 PM: {quote} However, I don't think there are many situations that we need original tokens along with decompound ones {quote} Personally I agree with that. Concerning full text searching, we rarely need original tokens when we use the "search mode". Why don't we set "discardCompoundToken" to true by default from here (I think this minor change in behaviour is Okay for next 8.x release)? was (Author: tomoko uchida): {{quote}} However, I don't think there are many situations that we need original tokens along with decompound ones {{quote}} Personally I agree with that. Concerning full text searching, we rarely need original tokens when we use the "search mode". Why don't we set "discardCompoundToken" to true by default from here (I think this minor change in behaviour is Okay for next 8.x release)? > JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter > --- > > Key: LUCENE-9123 > URL: https://issues.apache.org/jira/browse/LUCENE-9123 > Project: Lucene - Core > Issue Type: Bug > Components: modules/analysis >Affects Versions: 8.4 >Reporter: Kazuaki Hiraga >Priority: Major > Attachments: LUCENE-9123.patch, LUCENE-9123_revised.patch > > > JapaneseTokenizer with `mode=search` or `mode=extended` doesn't work with > both of SynonymGraphFilter and SynonymFilter when JT generates multiple > tokens as an output. If we use `mode=normal`, it should be fine. However, we > would like to use decomposed tokens that can maximize to chance to increase > recall. > Snippet of schema: > {code:xml} > positionIncrementGap="100" autoGeneratePhraseQueries="false"> > > > synonyms="lang/synonyms_ja.txt" > tokenizerFactory="solr.JapaneseTokenizerFactory"/> > > > tags="lang/stoptags_ja.txt" /> > > > > > > minimumLength="4"/> > > > > > {code} > An synonym entry that generates error: > {noformat} > 株式会社,コーポレーション > {noformat} > The following is an output on console: > {noformat} > $ ./bin/solr create_core -c jp_test -d ../config/solrconfs > ERROR: Error CREATEing SolrCore 'jp_test': Unable to create core [jp_test3] > Caused by: term: 株式会社 analyzed to a token (株式会社) with position increment != 1 > (got: 0) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
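For reference, a sketch of what the proposed flag could look like at the Java level; the discardCompoundToken parameter name comes from this thread's attached patch, so the exact constructor signature is an assumption:
{code:java}
import org.apache.lucene.analysis.ja.JapaneseTokenizer;

// Sketch: search mode with the proposed discardCompoundToken=true emits
// only the decompounded tokens (e.g. 株式 + 会社), never the overlapping
// compound token (株式会社), so a downstream synonym (graph) filter sees a
// simple chain of tokens with position increment 1.
static JapaneseTokenizer newSearchModeTokenizer() {
  return new JapaneseTokenizer(
      null,   // no user dictionary
      true,   // discardPunctuation
      true,   // discardCompoundToken (proposed flag from this thread)
      JapaneseTokenizer.Mode.SEARCH);
}
{code}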
[GitHub] [lucene-solr] madrob commented on issue #1176: LUCENE-9143 Add error-prone checks to build, but disabled
madrob commented on issue #1176: LUCENE-9143 Add error-prone checks to build, but disabled URL: https://github.com/apache/lucene-solr/pull/1176#issuecomment-575721644 Yea, I'll split this out into the warnings separately and the build changes into an individual file as well. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] madrob opened a new pull request #1181: LUCENE-9145 First pass addressing static analysis
madrob opened a new pull request #1181: LUCENE-9145 First pass addressing static analysis URL: https://github.com/apache/lucene-solr/pull/1181 Fixed a bunch of the smaller warnings found by error-prone compiler plugin, while ignoring a lot of the bigger ones. This is just the warnings found by #1176 without the build changes This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jpountz commented on issue #1179: LUCENE-9147: Move the stored fields index off-heap.
jpountz commented on issue #1179: LUCENE-9147: Move the stored fields index off-heap. URL: https://github.com/apache/lucene-solr/pull/1179#issuecomment-575722955 We have this information for a few datasets at https://elasticsearch-benchmarks.elastic.co/index.html#tracks/geonames/nightly/default/30d. For instance 3.8MB for the geonames dataset or 6MB for the HTTP logs dataset. It's not much, but these datasets are not very large either (3.1GB and 18.6GB respectively). As you can see it's the main contributor to memory usage after the terms index (which we still load on-heap for the `_id` field for now), so if a user loads the terms index off-heap, it may well be that stored fields are the main heap user. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9123) JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter
[ https://issues.apache.org/jira/browse/LUCENE-9123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018207#comment-17018207 ] Kazuaki Hiraga commented on LUCENE-9123: [~tomoko], Hm.. it sounds tricky and difficult to draw a line between an acceptable change and an unacceptable one. I think changing the default behavior has more impact on the Tokenizer's output than modifying the signature of constructors, which users may want to change intentionally. I thought we didn't want to do that in a point release. That's the reason for setting this option to false by default. What do you think? Can we change the behavior? > JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter > --- > > Key: LUCENE-9123 > URL: https://issues.apache.org/jira/browse/LUCENE-9123 > Project: Lucene - Core > Issue Type: Bug > Components: modules/analysis >Affects Versions: 8.4 >Reporter: Kazuaki Hiraga >Priority: Major > Attachments: LUCENE-9123.patch, LUCENE-9123_revised.patch > > > JapaneseTokenizer with `mode=search` or `mode=extended` doesn't work with > both of SynonymGraphFilter and SynonymFilter when JT generates multiple > tokens as an output. If we use `mode=normal`, it should be fine. However, we > would like to use decomposed tokens that can maximize to chance to increase > recall. > Snippet of schema: > {code:xml} > positionIncrementGap="100" autoGeneratePhraseQueries="false"> > > > synonyms="lang/synonyms_ja.txt" > tokenizerFactory="solr.JapaneseTokenizerFactory"/> > > > tags="lang/stoptags_ja.txt" /> > > > > > > minimumLength="4"/> > > > > > {code} > An synonym entry that generates error: > {noformat} > 株式会社,コーポレーション > {noformat} > The following is an output on console: > {noformat} > $ ./bin/solr create_core -c jp_test -d ../config/solrconfs > ERROR: Error CREATEing SolrCore 'jp_test': Unable to create core [jp_test3] > Caused by: term: 株式会社 analyzed to a token (株式会社) with position increment != 1 > (got: 0) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dweiss commented on issue #1181: LUCENE-9145 First pass addressing static analysis
dweiss commented on issue #1181: LUCENE-9145 First pass addressing static analysis URL: https://github.com/apache/lucene-solr/pull/1181#issuecomment-575727247 I'd run a full test suite and if it passes just commit it in. Most of these look like legitimate bug fixes! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-12859) DocExpirationUpdateProcessorFactory does not work with BasicAuth
[ https://issues.apache.org/jira/browse/SOLR-12859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018213#comment-17018213 ] Chris M. Hostetter commented on SOLR-12859: --- [~caomanhdat] - i can't find your patch, did you already delete it? (or forget to add it) I think what you were saying... {quote}if isSolrThread == true, set usr = "$" even incase of principal == null {quote} ...is pretty similar to what i had hypothesized... {quote}My initial reaction was to "swap" the order of the Principal/isSolrThread() checks... {quote} ...but i still have the same concerns... {quote}...that seems risky (especially since AFAICT, every SolrClient used for distributed Solr requests will return "true" for isSolrThread() – meaning PKI would probably completely stop forwarding credentials if we did that? {quote} To put it another way: * Assume {{blockUnknown=false}} * So an unauthenticated request that gets accepted by the AuthenticationPlugin will have a null Principal. * as things stand right now, if that "principal==null" request gets forwarded to a distributed node, it will _never_ have a PKI header added * With your proposed change, if it does get forwarded _by another thread_ such as ConcurrentUpdateSolrClient (where {{isSolrThread == true}}) then a PKI header will be added indicating it is a "principal == '$' (Super User)" request ** ie: anonymous requests will be "promoted" to being super user requests on the distributed nodes. i have no idea what practical problems that may cause, but it certainly sounds bad. {quote}...think that the Hoss's initial approach is more valid than mine because * it makes less significant change to the code base.{quote} Yeah, it seems like there are a lot of weird edge cases in the way PKI for "background server threads" works that really need to be discussed/re-considered -- particularly since even forwarding updates from leader to replicas happens in a background thread (inside of the update client) ... but that should probably happen in a new jira with wider discussion & visibility. My patch seems like the minimum amount of change needed to fix the current bug, so unless someone sees a security problem with it (that doesn't already exist) i would suggest we move forward using my patch and re-consider the overall "isSolrThread" concept in a new jira? > DocExpirationUpdateProcessorFactory does not work with BasicAuth > > > Key: SOLR-12859 > URL: https://issues.apache.org/jira/browse/SOLR-12859 > Project: Solr > Issue Type: Bug >Affects Versions: 7.5 >Reporter: Varun Thacker >Priority: Major > Attachments: SOLR-12859.patch > > > I setup a cluster with basic auth and then wanted to use Solr's TTL feature ( > DocExpirationUpdateProcessorFactory ) to auto-delete documents. > > Turns out it doesn't work when Basic Auth is enabled. 
I get the following > stacktrace from the logs > {code:java} > 2018-10-12 22:06:38.967 ERROR (autoExpireDocs-42-thread-1) [ ] > o.a.s.u.p.DocExpirationUpdateProcessorFactory Runtime error in periodic > deletion of expired docs: Async exception during distributed update: Error > from server at http://192.168.0.8:8983/solr/gettingstarted_shard2_replica_n6: > require authentication > request: > http://192.168.0.8:8983/solr/gettingstarted_shard2_replica_n6/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.0.8%3A8983%2Fsolr%2Fgettingstarted_shard1_replica_n2%2F&wt=javabin&version=2 > org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException: > Async exception during distributed update: Error from server at > http://192.168.0.8:8983/solr/gettingstarted_shard2_replica_n6: require > authentication > request: > http://192.168.0.8:8983/solr/gettingstarted_shard2_replica_n6/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.0.8%3A8983%2Fsolr%2Fgettingstarted_shard1_replica_n2%2F&wt=javabin&version=2 > at > org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:964) > ~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - > jimczi - 2018-09-18 13:07:55] > at > org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1976) > ~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - > jimczi - 2018-09-18 13:07:55] > at > org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:182) > ~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - > jimczi - 2018-09-18 13:07:55] > at > org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80) > ~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32
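For illustration, the promotion hazard described above boils down to the ordering of the two checks (a simplified sketch, not the actual PKIAuthenticationPlugin code):
{code:java}
import java.security.Principal;
import java.util.Optional;

// Sketch: with blockUnknown=false an accepted-but-anonymous request has a
// null Principal. If isSolrThread alone decides the identity, forwarding
// such a request from a background client thread stamps it as super user.
static Optional<String> userForPkiHeader(Principal principal, boolean isSolrThread) {
  if (principal != null) {
    return Optional.of(principal.getName());
  }
  if (isSolrThread) {
    return Optional.of("$"); // anonymous request promoted to super user!
  }
  return Optional.empty();   // no PKI header added
}
{code}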
[jira] [Comment Edited] (LUCENE-9123) JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter
[ https://issues.apache.org/jira/browse/LUCENE-9123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018159#comment-17018159 ] Kazuaki Hiraga edited comment on LUCENE-9123 at 1/17/20 5:56 PM: - {quote} This solution would fix Kuromoji to create a simple chain of tokens, all with position increment 1 (no overlapping compound tokens)? {quote} Yes. Although I may need to test more documents to ensure that the fix will produce a simple chain of tokens, it seems to be working fine so far. {quote} Would you only use that mode when parsing the synonyms to build the synonym filter (or synonym graph filter)? (Since that seems to be where the error is occurring here). Or would you also use that as your primary Tokenizer (which would mean you don't also get compound words directly out of Kuromoji). {quote} In my case, I use this mode as my primary Tokenizer configuration since I usually want to have decompound tokens. It would be nice if synonym filter and synonym graph filter could work with this mode without the patch. However, I don't think there are many situations that we need original tokens along with decompound ones (I cannot say we will never need though). The current workaround for this issue is using normal mode, which will not produce decompound tokens. But, for example, we cannot get a document that contains 株式会社 by using a query 会社 because 株式会社 will be one token and normal mode doesn't produce the decompound tokens 株式 and 会社 (in this case, we can use n-gram in addition to the tokenized field to get the document, but that has other issues). Therefore, there are two issues. #1 Kuromoji produces compound and decompound tokens in both search mode and extended mode, where the compound one is rarely needed. #2 Neither synonym filter nor synonym graph filter can work with tokens that overlap positions. [~mikemccand], I will try to find the ticket for #2. If there isn't one, I will create one. And I will change the title of this ticket to focus on #1. was (Author: h.kazuaki): {quote} This solution would fix Kuromoji to create a simple chain of tokens, all with position increment 1 (no overlapping compound tokens)? {quote} Yes. Although I may need to test more documents to ensure that the fix will produce a simple chain of tokens, it seems to be working fine so far. {quote} Would you only use that mode when parsing the synonyms to build the synonym filter (or synonym graph filter)? (Since that seems to be where the error is occurring here). Or would you also use that as your primary Tokenizer (which would mean you don't also get compound words directly out of Kuromoji). {quote} In my case, I use this mode as my primary Tokenizer configuration since I usually want to have decompound tokens. It would be nice if synonym filter and synonym graph filter could work with this mode without the patch. However, I don't think there are many situations that we need original tokens along with decompound ones (I cannot say we will never need though). The current workaround for this issue is using normal mode, which will not produce decompound tokens. But, for example, we cannot get a document that contains 株式会社 by using a query 会社 because 株式会社 will be one token and normal mode doesn't produce the decompound tokens 株式 and 会社 (in this case, we can use n-gram in addition to the tokenized field to get the document, but that has other issues). I will try to find the dedicated issue for the filter. If there isn't one, I will create a ticket to record the issue. 
> JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter > --- > > Key: LUCENE-9123 > URL: https://issues.apache.org/jira/browse/LUCENE-9123 > Project: Lucene - Core > Issue Type: Bug > Components: modules/analysis >Affects Versions: 8.4 >Reporter: Kazuaki Hiraga >Priority: Major > Attachments: LUCENE-9123.patch, LUCENE-9123_revised.patch > > > JapaneseTokenizer with `mode=search` or `mode=extended` doesn't work with > both of SynonymGraphFilter and SynonymFilter when JT generates multiple > tokens as an output. If we use `mode=normal`, it should be fine. However, we > would like to use decomposed tokens that can maximize to chance to increase > recall. > Snippet of schema: > {code:xml} > positionIncrementGap="100" autoGeneratePhraseQueries="false"> > > > synonyms="lang/synonyms_ja.txt" > tokenizerFactory="solr.JapaneseTokenizerFactory"/> > > > tags="lang/stoptags_ja.txt" /> > > > > > > minimumLength="4"/> > > > > > {code} > An synonym
[jira] [Commented] (LUCENE-4702) Terms dictionary compression
[ https://issues.apache.org/jira/browse/LUCENE-4702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018228#comment-17018228 ] Adrien Grand commented on LUCENE-4702: -- I did some research on what's taking space in the terms dictionary: while suffixes take a fair amount of space for text fields, for ID fields it tends to be the stats instead. So I made a couple of changes to also use LZ4 to run-length encode doc freqs (there are frequent runs of 1s for ids, and interestingly many runs of 1s for the body field of wikibigall too) and suffix lengths, which are also frequently the same, especially for ID fields (always the same for UUID or flake IDs, and very little variance for auto-increment IDs). Finally, we were wasting some space with the pulsing optimization too, since we kept writing file-pointer deltas even though these deltas are almost always zero for ID fields, given that we don't write postings in the doc file but in the terms dictionary. The compression is significantly better now as the size of the tim file goes down by 18% from 937MB to 767MB. Here are the stats for the body and id fields if you are curious: {code} "id" field index FST: 72 bytes terms: 6647577 terms 39885462 bytes (6.0 bytes/term) blocks: 189932 blocks 184655 terms-only blocks 5277 sub-block-only blocks 0 mixed blocks 0 floor blocks 189932 non-floor blocks 0 floor sub-blocks 14059850 term suffix bytes before compression (52.8 suffix-bytes/block) 10023973 compressed term suffix bytes (0.71 compression ratio - compression count by algorithm: NO_COMPRESSION: 189932) 6647577 term stats bytes before compression (11.7 stats-bytes/block) 2226414 compressed term stats bytes (0.33 compression ratio) 26962631 other bytes (142.0 other-bytes/block) {code} {code} "body" field index FST: 72 bytes terms: 46916528 terms 595069147 bytes (12.7 bytes/term) blocks: 1507239 blocks 1158537 terms-only blocks 471 sub-block-only blocks 348231 mixed blocks 318391 floor blocks 491775 non-floor blocks 1015464 floor sub-blocks 359880365 term suffix bytes before compression (196.3 suffix-bytes/block) 295898442 compressed term suffix bytes (0.82 compression ratio - compression count by algorithm: NO_COMPRESSION: 252273, LOWERCASE_ASCII: 1190011, LZ4: 64955) 94426201 term stats bytes before compression (45.1 stats-bytes/block) 68022105 compressed term stats bytes (0.72 compression ratio) 213996755 other bytes (142.0 other-bytes/block) {code} I see a 10% slowdown on PKLookup that I'll look into. > Terms dictionary compression > > > Key: LUCENE-4702 > URL: https://issues.apache.org/jira/browse/LUCENE-4702 > Project: Lucene - Core > Issue Type: Wish >Reporter: Adrien Grand >Assignee: Adrien Grand >Priority: Trivial > Attachments: LUCENE-4702.patch, LUCENE-4702.patch > > Time Spent: 3h 20m > Remaining Estimate: 0h > > I've done a quick test with the block tree terms dictionary by replacing a > call to IndexOutput.writeBytes to write suffix bytes with a call to > LZ4.compressHC to test the performance hit. Interestingly, search performance > was very good (see comparison table below) and the tim files were 14% smaller > (from 150432 bytes overall to 129516). 
> {noformat} > TaskQPS baseline StdDevQPS compressed StdDev > Pct diff > Fuzzy1 111.50 (2.0%) 78.78 (1.5%) > -29.4% ( -32% - -26%) > Fuzzy2 36.99 (2.7%) 28.59 (1.5%) > -22.7% ( -26% - -18%) > Respell 122.86 (2.1%) 103.89 (1.7%) > -15.4% ( -18% - -11%) > Wildcard 100.58 (4.3%) 94.42 (3.2%) > -6.1% ( -13% -1%) > Prefix3 124.90 (5.7%) 122.67 (4.7%) > -1.8% ( -11% -9%) >OrHighLow 169.87 (6.8%) 167.77 (8.0%) > -1.2% ( -15% - 14%) > LowTerm 1949.85 (4.5%) 1929.02 (3.4%) > -1.1% ( -8% -7%) > AndHighLow 2011.95 (3.5%) 1991.85 (3.3%) > -1.0% ( -7% -5%) > OrHighHigh 155.63 (6.7%) 154.12 (7.9%) > -1.0% ( -14% - 14%) > AndHighHigh 341.82 (1.2%) 339.49 (1.7%) > -0.7% ( -3% -2%) >OrHighMed 217.55 (6.3%) 216.16 (7.1%) > -0.6% ( -13% - 13%) > IntNRQ 53.10 (10.9%) 52.90 (8.6%) > -0.4% ( -17% - 21%) > MedTerm 998.11 (3.8%) 994.82 (5.6%) > -0.3% ( -9% -9%) >
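As a toy illustration of why the run-length idea above pays off (this is not the actual LZ4-based scheme in the patch): a column of doc freqs that is mostly 1s, as is typical for ID fields, collapses to a handful of (value, length) pairs.
{code:java}
import java.util.ArrayList;
import java.util.List;

// Toy RLE: encode values as (value, runLength) pairs. A block of doc
// freqs for an ID field, e.g. [1, 1, ..., 1], becomes the single pair (1, n).
static List<int[]> runLengthEncode(int[] values) {
  List<int[]> runs = new ArrayList<>();
  int i = 0;
  while (i < values.length) {
    int j = i;
    while (j < values.length && values[j] == values[i]) {
      j++;
    }
    runs.add(new int[] {values[i], j - i});
    i = j;
  }
  return runs;
}
{code}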
[jira] [Updated] (SOLR-13965) Adding new functions to GraphHandler should be same as Streamhandler
[ https://issues.apache.org/jira/browse/SOLR-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christine Poerschke updated SOLR-13965: --- Attachment: SOLR-13965.01.patch > Adding new functions to GraphHandler should be same as Streamhandler > > > Key: SOLR-13965 > URL: https://issues.apache.org/jira/browse/SOLR-13965 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: streaming expressions >Affects Versions: 8.3 >Reporter: David Eric Pugh >Priority: Minor > Attachments: SOLR-13965.01.patch > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Currently you add new functions to GraphHandler differently than you do in > StreamHandler. We should have one way of extending the handlers that support > streaming expressions. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-13965) Adding new functions to GraphHandler should be same as Streamhandler
[ https://issues.apache.org/jira/browse/SOLR-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christine Poerschke updated SOLR-13965: --- Status: Patch Available (was: Open) > Adding new functions to GraphHandler should be same as Streamhandler > > > Key: SOLR-13965 > URL: https://issues.apache.org/jira/browse/SOLR-13965 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: streaming expressions >Affects Versions: 8.3 >Reporter: David Eric Pugh >Priority: Minor > Attachments: SOLR-13965.01.patch > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Currently you add new functions to GraphHandler differently than you do in > StreamHandler. We should have one way of extending the handlers that support > streaming expressions. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13965) Adding new functions to GraphHandler should be same as Streamhandler
[ https://issues.apache.org/jira/browse/SOLR-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018234#comment-17018234 ] Christine Poerschke commented on SOLR-13965: GraphHandler and StreamHandler code sharing/duplication was mentioned both here and on the pull request. SOLR-13965.01.patch factors out a static {{StreamHandler.addExpressiblePlugins}} method which GraphHandler could then use too. (Note that this does _not_ fix the {{SolrConfig.classVsSolrPluginInfo.get(Expressible.class)}} suspected bug that [~mdrob] mentioned on the PR – {{Expressible.class}} vs. {{Expressible.class.getName()}} was the suspected type mismatch there, right?) > Adding new functions to GraphHandler should be same as Streamhandler > > > Key: SOLR-13965 > URL: https://issues.apache.org/jira/browse/SOLR-13965 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: streaming expressions >Affects Versions: 8.3 >Reporter: David Eric Pugh >Priority: Minor > Attachments: SOLR-13965.01.patch > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Currently you add new functions to GraphHandler differently than you do in > StreamHandler. We should have one way of extending the handlers that support > streaming expressions. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
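A sketch of the shape such a shared helper could take, mirroring the two branches shown in the pull request discussion below; the exact signature in SOLR-13965.01.patch may differ, and it uses Expressible.class.getName() for the lookup per the suspected type mismatch noted above:
{code:java}
import java.util.List;
import org.apache.solr.client.solrj.io.stream.expr.Expressible;
import org.apache.solr.client.solrj.io.stream.expr.StreamFactory;
import org.apache.solr.core.PluginInfo;
import org.apache.solr.core.SolrConfig;
import org.apache.solr.core.SolrCore;

// Sketch: one shared entry point that both StreamHandler and GraphHandler
// call to register Expressible plugins from solrconfig.xml into a factory.
public static void addExpressiblePlugins(StreamFactory streamFactory, SolrCore core) {
  List<PluginInfo> pluginInfos = core.getSolrConfig().getPluginInfos(Expressible.class.getName());
  for (PluginInfo pluginInfo : pluginInfos) {
    if (pluginInfo.pkgName != null) {
      // package-managed plugin: resolve the class lazily through a holder
      StreamHandler.ExpressibleHolder holder = new StreamHandler.ExpressibleHolder(
          pluginInfo, core, SolrConfig.classVsSolrPluginInfo.get(Expressible.class.getName()));
      streamFactory.withFunctionName(pluginInfo.name, () -> holder.getClazz());
    } else {
      Class<? extends Expressible> clazz =
          core.getMemClassLoader().findClass(pluginInfo.className, Expressible.class);
      streamFactory.withFunctionName(pluginInfo.name, clazz);
    }
  }
}
{code}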
[jira] [Commented] (LUCENE-4499) Multi-word synonym filter (synonym expansion)
[ https://issues.apache.org/jira/browse/LUCENE-4499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018233#comment-17018233 ] Alessandro Benedetti commented on LUCENE-4499: -- hi [~Tagar], sorry for the late reply, I contributed a patch that is still waiting for a review: [https://issues.apache.org/jira/browse/SOLR-12238|https://issues.apache.org/jira/browse/SOLR-12238] It's a bit old, so it may require some porting effort, but it could help you. > Multi-word synonym filter (synonym expansion) > - > > Key: LUCENE-4499 > URL: https://issues.apache.org/jira/browse/LUCENE-4499 > Project: Lucene - Core > Issue Type: Improvement > Components: core/other >Affects Versions: 4.1, 6.0 >Reporter: Roman Chyla >Priority: Major > Labels: analysis, multi-word, synonyms > Fix For: 6.0 > > Attachments: LUCENE-4499.patch, LUCENE-4499.patch > > > I apologize for bringing the multi-token synonym expansion up again. There is > an old, unresolved issue at LUCENE-1622 [1] > While solving the problem for our needs [2], I discovered that the current > SolrSynonym parser (and the wonderful FTS) have almost everything to > satisfactorily handle both the query and index time synonym expansion. It > seems that people often need to use the synonym filter *slightly* differently > at indexing and query time. > In our case, we must do different things during indexing and querying. > Example sentence: Mirrors of the Hubble space telescope pointed at XA5 > This is what we need (comma marks position bump): > indexing: mirrors,hubble|hubble space > telescope|hst,space,telescope,pointed,xa5|astroobject#5 > querying: +mirrors +(hubble space telescope | hst) +pointed > +(xa5|astroboject#5) > This translated to following needs: > indexing time: > single-token synonyms => return only synonyms > multi-token synonyms => return original tokens *AND* the synonyms > query time: > single-token: return only synonyms (but preserve case) > multi-token: return only synonyms > > We need the original tokens for the proximity queries, if we indexed 'hubble > space telescope' > as one token, we cannot search for 'hubble NEAR telescope' > You may (not) be surprised, but Lucene already supports ALL of these > requirements. The patch is an attempt to state the problem differently. I am > not sure if it is the best option, however it works perfectly for our needs > and it seems it could work for general public too. Especially if the > SynonymFilterFactory had a preconfigured sets of SynonymMapBuilders - and > people would just choose what situation they use. Please look at the unittest. > links: > [1] https://issues.apache.org/jira/browse/LUCENE-1622 > [2] http://labs.adsabs.harvard.edu/trac/ads-invenio/ticket/158 > [3] seems to have similar request: > http://lucene.472066.n3.nabble.com/Proposal-Full-support-for-multi-word-synonyms-at-query-time-td4000522.html -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1033: SOLR-13965: Use Plugin to add new expressions to GraphHandler
cpoerschke commented on a change in pull request #1033: SOLR-13965: Use Plugin to add new expressions to GraphHandler URL: https://github.com/apache/lucene-solr/pull/1033#discussion_r368076257 ## File path: solr/core/src/java/org/apache/solr/handler/GraphHandler.java ## @@ -92,24 +104,29 @@ public void inform(SolrCore core) { } // This pulls all the overrides and additions from the config +List pluginInfos = core.getSolrConfig().getPluginInfos(Expressible.class.getName()); + +// Check deprecated approach. Object functionMappingsObj = initArgs.get("streamFunctions"); if(null != functionMappingsObj){ + log.warn("solrconfig.xml: is deprecated for adding additional streaming functions to GraphHandler."); NamedList functionMappings = (NamedList)functionMappingsObj; for(Entry functionMapping : functionMappings) { String key = functionMapping.getKey(); PluginInfo pluginInfo = new PluginInfo(key, Collections.singletonMap("class", functionMapping.getValue())); - -if (pluginInfo.pkgName == null) { - Class clazz = core.getResourceLoader().findClass((String) functionMapping.getValue(), - Expressible.class); - streamFactory.withFunctionName(key, clazz); -} else { - StreamHandler.ExpressibleHolder holder = new StreamHandler.ExpressibleHolder(pluginInfo, core, SolrConfig.classVsSolrPluginInfo.get(Expressible.class)); - streamFactory.withFunctionName(key, () -> holder.getClazz()); -} - +pluginInfos.add(pluginInfo); } +} +for (PluginInfo pluginInfo : pluginInfos) { + if (pluginInfo.pkgName != null) { +ExpressibleHolder holder = new ExpressibleHolder(pluginInfo, core, SolrConfig.classVsSolrPluginInfo.get(Expressible.class)); +streamFactory.withFunctionName(pluginInfo.name, +() -> holder.getClazz()); + } else { +Class clazz = core.getMemClassLoader().findClass(pluginInfo.className, Expressible.class); Review comment: > Since this code is duplicated between Stream & Graph, can we factor it out into a common method? Attached SOLR-13965.01.patch to the JIRA ticket. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9053) java.lang.AssertionError: inputs are added out of order lastInput=[f0 9d 9c 8b] vs input=[ef ac 81 67 75 72 65]
[ https://issues.apache.org/jira/browse/LUCENE-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018237#comment-17018237 ] ASF subversion and git services commented on LUCENE-9053: - Commit 8147e491ce3905bb3543f2c7e34a4ecb60382b49 in lucene-solr's branch refs/heads/master from Michael McCandless [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8147e49 ] LUCENE-9053: improve FST's package-info.java comment to clarify required (Unicode code point) sort order for FST.Builder > java.lang.AssertionError: inputs are added out of order lastInput=[f0 9d 9c > 8b] vs input=[ef ac 81 67 75 72 65] > --- > > Key: LUCENE-9053 > URL: https://issues.apache.org/jira/browse/LUCENE-9053 > Project: Lucene - Core > Issue Type: Bug >Reporter: gitesh >Priority: Minor > > Even if the inputs are sorted in unicode order, I get following exception > while creating FST: > > {code:java} > // Input values (keys). These must be provided to Builder in Unicode sorted > order! > String inputValues[] = {"𝐴", "figure", "flagship"}; > long outputValues[] = {5, 7, 12}; > PositiveIntOutputs outputs = PositiveIntOutputs.getSingleton(); > Builder<Long> builder = new Builder<>(FST.INPUT_TYPE.BYTE1, outputs); > BytesRefBuilder scratchBytes = new BytesRefBuilder(); > IntsRefBuilder scratchInts = new IntsRefBuilder(); > for (int i = 0; i < inputValues.length; i++) { > scratchBytes.copyChars(inputValues[i]); > builder.add(Util.toIntsRef(scratchBytes.get(), scratchInts), > outputValues[i]); > } > FST<Long> fst = builder.finish(); > Long value = Util.get(fst, new BytesRef("figure")); > System.out.println(value); > {code} > Please note that figure and flagship are using the ligature character fl above. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9053) java.lang.AssertionError: inputs are added out of order lastInput=[f0 9d 9c 8b] vs input=[ef ac 81 67 75 72 65]
[ https://issues.apache.org/jira/browse/LUCENE-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018239#comment-17018239 ] ASF subversion and git services commented on LUCENE-9053: - Commit 155b099caaa786f9e9c84c72eca2ee6683d289b1 in lucene-solr's branch refs/heads/branch_8x from Michael McCandless [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=155b099 ] LUCENE-9053: improve FST's package-info.java comment to clarify required (Unicode code point) sort order for FST.Builder > java.lang.AssertionError: inputs are added out of order lastInput=[f0 9d 9c > 8b] vs input=[ef ac 81 67 75 72 65] > --- > > Key: LUCENE-9053 > URL: https://issues.apache.org/jira/browse/LUCENE-9053 > Project: Lucene - Core > Issue Type: Bug >Reporter: gitesh >Priority: Minor > > Even if the inputs are sorted in unicode order, I get following exception > while creating FST: > > {code:java} > // Input values (keys). These must be provided to Builder in Unicode sorted > order! > String inputValues[] = {"𝐴", "figure", "flagship"}; > long outputValues[] = {5, 7, 12}; > PositiveIntOutputs outputs = PositiveIntOutputs.getSingleton(); > Builder<Long> builder = new Builder<>(FST.INPUT_TYPE.BYTE1, outputs); > BytesRefBuilder scratchBytes = new BytesRefBuilder(); > IntsRefBuilder scratchInts = new IntsRefBuilder(); > for (int i = 0; i < inputValues.length; i++) { > scratchBytes.copyChars(inputValues[i]); > builder.add(Util.toIntsRef(scratchBytes.get(), scratchInts), > outputValues[i]); > } > FST<Long> fst = builder.finish(); > Long value = Util.get(fst, new BytesRef("figure")); > System.out.println(value); > {code} > Please note that figure and flagship are using the ligature character fl above. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
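The root cause in the snippet above is that Java's {{String.compareTo}} sorts by UTF-16 code units rather than code points, so the surrogate-pair character sorts first even though its code point is largest. A minimal sketch (using escaped literals for the surrogate-pair and ligature characters) of pre-sorting inputs into the code point order {{FST.Builder}} requires; UTF-8 byte order is equivalent, so comparing {{BytesRef}} encodings works:
{code:java}
import java.util.Arrays;
import java.util.Comparator;
import org.apache.lucene.util.BytesRef;

public class CodePointOrderSketch {
  public static void main(String[] args) {
    // U+1D434 is a surrogate pair (\uD835\uDC34); UTF-16 code unit order puts
    // it BEFORE U+FB01/U+FB02 (the "fi"/"fl" ligatures), as in the report
    // above, even though its code point is larger.
    String[] inputValues = {"\uD835\uDC34", "\uFB01gure", "\uFB02agship"};
    // UTF-8 byte order equals Unicode code point order, so sorting the UTF-8
    // encoded forms yields the order FST.Builder actually requires:
    Arrays.sort(inputValues, Comparator.comparing((String s) -> new BytesRef(s)));
    // Prints: ["\uFB01gure", "\uFB02agship", "\uD835\uDC34"]
    System.out.println(Arrays.toString(inputValues));
  }
}
{code}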
[jira] [Updated] (SOLR-14073) Fix segment look ahead NPE in CollapsingQParserPlugin
[ https://issues.apache.org/jira/browse/SOLR-14073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-14073: -- Attachment: SOLR-14073.patch > Fix segment look ahead NPE in CollapsingQParserPlugin > - > > Key: SOLR-14073 > URL: https://issues.apache.org/jira/browse/SOLR-14073 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Joel Bernstein >Assignee: Joel Bernstein >Priority: Major > Attachments: SOLR-14073.patch, SOLR-14073.patch > > > The CollapsingQParserPlugin has a bug that if every segment is not visited > during the collect it throws an NPE. This causes the CollapsingQParserPlugin > to not work when used with any feature that short circuits the segments > during the collect. This includes using the CollapsingQParserPlugin twice in > the same query and the time limiting collector. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
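For illustration, a request of the shape below (field names illustrative) matches the description above: applying the collapse post-filter twice in one query can short-circuit collection so that some segments are never visited, triggering the pre-patch NPE:
{noformat}
q=*:*&fq={!collapse field=group_s}&fq={!collapse field=category_s}
{noformat}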
[jira] [Updated] (SOLR-12859) DocExpirationUpdateProcessorFactory does not work with BasicAuth
[ https://issues.apache.org/jira/browse/SOLR-12859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris M. Hostetter updated SOLR-12859: -- Attachment: SOLR-12859.patch Status: Patch Available (was: Patch Available) Attaching a slightly updated patch with javadoc additions that make it clear the new {{setUserPrincipalName()}} method in LocalSolrQueryRequest is experimental and subject to change, in case we want to remove it if/when more comprehensive changes are made to PKI/isSolrThread. > DocExpirationUpdateProcessorFactory does not work with BasicAuth > > > Key: SOLR-12859 > URL: https://issues.apache.org/jira/browse/SOLR-12859 > Project: Solr > Issue Type: Bug >Affects Versions: 7.5 >Reporter: Varun Thacker >Priority: Major > Attachments: SOLR-12859.patch, SOLR-12859.patch > > > I setup a cluster with basic auth and then wanted to use Solr's TTL feature ( > DocExpirationUpdateProcessorFactory ) to auto-delete documents. > > Turns out it doesn't work when Basic Auth is enabled. I get the following > stacktrace from the logs > {code:java} > 2018-10-12 22:06:38.967 ERROR (autoExpireDocs-42-thread-1) [ ] > o.a.s.u.p.DocExpirationUpdateProcessorFactory Runtime error in periodic > deletion of expired docs: Async exception during distributed update: Error > from server at http://192.168.0.8:8983/solr/gettingstarted_shard2_replica_n6: > require authentication > request: > http://192.168.0.8:8983/solr/gettingstarted_shard2_replica_n6/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.0.8%3A8983%2Fsolr%2Fgettingstarted_shard1_replica_n2%2F&wt=javabin&version=2 > org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException: > Async exception during distributed update: Error from server at > http://192.168.0.8:8983/solr/gettingstarted_shard2_replica_n6: require > authentication > request: > http://192.168.0.8:8983/solr/gettingstarted_shard2_replica_n6/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.0.8%3A8983%2Fsolr%2Fgettingstarted_shard1_replica_n2%2F&wt=javabin&version=2 > at > org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:964) > ~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - > jimczi - 2018-09-18 13:07:55] > at > org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1976) > ~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - > jimczi - 2018-09-18 13:07:55] > at > org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:182) > ~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - > jimczi - 2018-09-18 13:07:55] > at > org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80) > ~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - > jimczi - 2018-09-18 13:07:55] > at > org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80) > ~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - > jimczi - 2018-09-18 13:07:55] > at > org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80) > ~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - > jimczi - 2018-09-18 13:07:55] > at > org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80) > ~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - >
jimczi - 2018-09-18 13:07:55] > at > org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80) > ~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - > jimczi - 2018-09-18 13:07:55] > at > org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80) > ~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - > jimczi - 2018-09-18 13:07:55] > at > org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80) > ~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - > jimczi - 2018-09-18 13:07:55] > at > org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80) > ~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - > jimczi - 2018-09-18 13:07:55] > at > org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80) > ~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - > jimczi - 2018-09-18 13:07:55] > at > org.apache.solr.update.processor.UpdateRequestProcess
[GitHub] [lucene-solr] nknize commented on issue #1174: LUCENE-8621: Refactor LatLonShape, XYShape, and all query and utility classes to core
nknize commented on issue #1174: LUCENE-8621: Refactor LatLonShape, XYShape, and all query and utility classes to core URL: https://github.com/apache/lucene-solr/pull/1174#issuecomment-575759956 Thanks @mikemccand. Since this is a refactor from sandbox to the core lucene jar I'm planning to backport this to 8x. This way master to 8x backports are reduced to bug fixes only. Any objections? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] nknize edited a comment on issue #1174: LUCENE-8621: Refactor LatLonShape, XYShape, and all query and utility classes to core
nknize edited a comment on issue #1174: LUCENE-8621: Refactor LatLonShape, XYShape, and all query and utility classes to core URL: https://github.com/apache/lucene-solr/pull/1174#issuecomment-575759956 Thanks @mikemccand. Since this is a refactor from sandbox to the core lucene jar I'm planning to backport this to 8x. This way future master to 8x backports are reduced to bug fixes only. Any objections? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9145) Address warnings found by static analysis
[ https://issues.apache.org/jira/browse/LUCENE-9145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018260#comment-17018260 ] ASF subversion and git services commented on LUCENE-9145: - Commit 338d386ae08a1edecb89df5497cb46d0abf154ad in lucene-solr's branch refs/heads/master from Mike [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=338d386 ] LUCENE-9145 First pass addressing static analysis (#1181) Fixed a bunch of the smaller warnings found by error-prone compiler plugin, while ignoring a lot of the bigger ones. > Address warnings found by static analysis > - > > Key: LUCENE-9145 > URL: https://issues.apache.org/jira/browse/LUCENE-9145 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Mike Drob >Assignee: Mike Drob >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] madrob merged pull request #1181: LUCENE-9145 First pass addressing static analysis
madrob merged pull request #1181: LUCENE-9145 First pass addressing static analysis URL: https://github.com/apache/lucene-solr/pull/1181 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9123) JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter
[ https://issues.apache.org/jira/browse/LUCENE-9123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018278#comment-17018278 ] Tomoko Uchida commented on LUCENE-9123: --- I thought the change in behavior would have very small or no impact for users who use the Tokenizer for searching, but yes, it would affect users who use it for pure tokenization purposes. While keeping backward compatibility (within the same major version) is important, not emitting compound tokens would be preferred in order to get along with succeeding token filters, and compound tokens are not needed for most use cases. I think it'd be better that we change the behavior at some point. How about this proposal: we can create two patches, one for master and one for 8x. On the 8x branch, add the new constructor so you can use it from the next update. There is no change in the default behavior. On the master branch, switch the default behavior (users who don't like the change can still switch back by using the full constructor). > JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter > --- > > Key: LUCENE-9123 > URL: https://issues.apache.org/jira/browse/LUCENE-9123 > Project: Lucene - Core > Issue Type: Bug > Components: modules/analysis >Affects Versions: 8.4 >Reporter: Kazuaki Hiraga >Priority: Major > Attachments: LUCENE-9123.patch, LUCENE-9123_revised.patch > > > JapaneseTokenizer with `mode=search` or `mode=extended` doesn't work with > both of SynonymGraphFilter and SynonymFilter when JT generates multiple > tokens as an output. If we use `mode=normal`, it should be fine. However, we > would like to use decomposed tokens that can maximize to chance to increase > recall. > Snippet of schema: > {code:xml} > positionIncrementGap="100" autoGeneratePhraseQueries="false"> > > > synonyms="lang/synonyms_ja.txt" > tokenizerFactory="solr.JapaneseTokenizerFactory"/> > > > tags="lang/stoptags_ja.txt" /> > > > > > > minimumLength="4"/> > > > > > {code} > An synonym entry that generates error: > {noformat} > 株式会社,コーポレーション > {noformat} > The following is an output on console: > {noformat} > $ ./bin/solr create_core -c jp_test -d ../config/solrconfs > ERROR: Error CREATEing SolrCore 'jp_test': Unable to create core [jp_test3] > Caused by: term: 株式会社 analyzed to a token (株式会社) with position increment != 1 > (got: 0) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
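A minimal sketch of the current constructor and the proposed opt-in switch (the flag name is hypothetical until a patch lands):
{code:java}
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.ja.JapaneseTokenizer;
import org.apache.lucene.analysis.ja.JapaneseTokenizer.Mode;

public class JaTokenizerSketch {
  public static Tokenizer searchModeTokenizer() {
    // Current public constructor (no user dictionary, discardPunctuation=true):
    // Mode.SEARCH emits compound tokens AND their decompounded parts, which is
    // what trips SynonymGraphFilter.
    return new JapaneseTokenizer(null, true, Mode.SEARCH);
    // The proposal amounts to an additional flag (name hypothetical) that
    // suppresses the compound token, keeping the stream graph-filter friendly:
    // return new JapaneseTokenizer(null, true, /* discardCompoundToken */ true, Mode.SEARCH);
  }
}
{code}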
[GitHub] [lucene-solr] asfgit merged pull request #1174: LUCENE-8621: Refactor LatLonShape, XYShape, and all query and utility classes to core
asfgit merged pull request #1174: LUCENE-8621: Refactor LatLonShape, XYShape, and all query and utility classes to core URL: https://github.com/apache/lucene-solr/pull/1174 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8621) Move LatLonShape and XYShape out of sandbox
[ https://issues.apache.org/jira/browse/LUCENE-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018308#comment-17018308 ] ASF subversion and git services commented on LUCENE-8621: - Commit aad849bf87ab69c1bd0eb34518181e1f3c1c42f2 in lucene-solr's branch refs/heads/master from Nicholas Knize [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=aad849b ] LUCENE-8621: Refactor LatLonShape, XYShape, and all query and utility classes from sandbox to core > Move LatLonShape and XYShape out of sandbox > --- > > Key: LUCENE-8621 > URL: https://issues.apache.org/jira/browse/LUCENE-8621 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Assignee: Nick Knize >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h > > LatLonShape has matured a lot over the last months, I'd like to start > thinking about moving it out of sandbox so that it doesn't stay there for too > long like what happened to LatLonPoint. I am pretty happy with the current > encoding. To my knowledge, we might just need to do a minor modification > because of > LUCENE-8620. > {{XYShape}} and foundation classes will also need to be refactored. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-9149) Increase data dimension limit in BKD
Nick Knize created LUCENE-9149: -- Summary: Increase data dimension limit in BKD Key: LUCENE-9149 URL: https://issues.apache.org/jira/browse/LUCENE-9149 Project: Lucene - Core Issue Type: Improvement Reporter: Nick Knize LUCENE-8496 added selective indexing; the ability to designate the first K <= N dimensions for driving the construction of the BKD internal nodes. Follow-on work stored the "data dimensions" only for the leaf nodes, while only the "index dimensions" are stored for the internal nodes. While {{maxPointsInLeafNode}} is still important for managing the BKD heap memory footprint (thus we don't want this to get too large), I'd like to propose increasing the {{MAX_DIMENSIONS}} limit (to something not too crazy like 16, i.e. twice the index dimension limit) while maintaining the {{MAX_INDEX_DIMENSIONS}} at 8. Doing this will enable us to encode higher-dimension data within a lower-dimension index (e.g., 3D tessellated triangles as a 10 dimension point using only the first 6 dimensions for index construction) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
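To make the data/index dimension split concrete, a short sketch against the public {{FieldType}} API; the 10/6 split shown is exactly what this proposal would newly allow:
{code:java}
import org.apache.lucene.document.FieldType;

public class SelectiveIndexingSketch {
  public static FieldType tessellatedTriangleType() {
    FieldType type = new FieldType();
    // 10 "data" dimensions stored at the leaves, with only the first 6 "index"
    // dimensions driving BKD internal node construction. Note: today the data
    // dimension count is capped at 8, so this call only becomes legal once
    // MAX_DIMENSIONS is raised to 16 as proposed.
    type.setDimensions(10, 6, Integer.BYTES);
    type.freeze();
    return type;
  }
}
{code}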
[jira] [Commented] (SOLR-11746) numeric fields need better error handling for prefix/wildcard syntax -- consider uniform support for "foo:* == foo:[* TO *]"
[ https://issues.apache.org/jira/browse/SOLR-11746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018314#comment-17018314 ] Houston Putman commented on SOLR-11746: --- I've been updating the patch with everything detailed above (docValues, norms, and the specialized [* TO *] functionality for floats and doubles), as well as extended tests. I have run into a snag with the {{NormsFieldExistsQuery}}. For PointFields (not TrieFields), the behavior of a field's {{SchemaField.indexOptions}} does not line up with the {{FieldInfo.indexOptions}} for the same field. This means that when [FieldInfo.hasNorms()|https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/index/FieldInfo.java#L321] is called in {{NormsFieldExistsQuery}}, for PointFields, *false* will be returned even if the same logic {{(omitNorms == false && IndexOptions != IndexOptions.None)}} used with the data in {{SchemaField}} returns *true*. Since this "hasNorms" logic is different in {{FieldType}}, which uses SchemaField, and {{NormsFieldExistsQuery}}, which uses {{FieldInfo}}, the logic in FieldType cannot accurately determine if the NormsFieldExistsQuery is the correct method to use. I've been unable so far to figure out how FieldInfo and SchemaField have received different values for IndexOptions. (This seems to be why the logic produces different results; {{omitNorms}} has the correct value in both classes.) Any advice here beyond just omitting the {{NormsFieldExistsQuery}} entirely? To be clear this issue only exists for PointFields. > numeric fields need better error handling for prefix/wildcard syntax -- > consider uniform support for "foo:* == foo:[* TO *]" > > > Key: SOLR-11746 > URL: https://issues.apache.org/jira/browse/SOLR-11746 > Project: Solr > Issue Type: Bug >Affects Versions: 7.0 >Reporter: Chris M. Hostetter >Assignee: Houston Putman >Priority: Major > Fix For: master (9.0), 8.5 > > Attachments: SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch, > SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch > > > On the solr-user mailing list, Torsten Krah pointed out that with Trie > numeric fields, query syntax such as {{foo_d:\*}} has been functionality > equivilent to {{foo_d:\[\* TO \*]}} and asked why this was not also supported > for Point based numeric fields. > The fact that this type of syntax works (for {{indexed="true"}} Trie fields) > appears to have been an (untested, undocumented) fluke of Trie fields given > that they use indexed terms for the (encoded) numeric terms and inherit the > default implementation of {{FieldType.getPrefixQuery}} which produces a > prefix query against the {{""}} (empty string) term. > (Note that this syntax has aparently _*never*_ worked for Trie fields with > {{indexed="false" docValues="true"}} ) > In general, we should assess the behavior users attempt a prefix/wildcard > syntax query against numeric fields, as currently the behavior is largely > non-sensical: prefix/wildcard syntax frequently match no docs w/o any sort > of error, and the aformentioned {{numeric_field:*}} behaves inconsistently > between points/trie fields and between indexed/docValued trie fields. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
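A small standalone probe (hypothetical test code at the Lucene level, not part of the attached patch) that demonstrates the index-side half of the mismatch: {{FieldInfo.hasNorms()}} reports false for a pure point field regardless of what the schema believes about omitNorms:
{code:java}
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.IntPoint;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.FieldInfo;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class HasNormsProbe {
  public static void main(String[] args) throws Exception {
    try (Directory dir = new ByteBuffersDirectory();
         IndexWriter w = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
      Document doc = new Document();
      doc.add(new IntPoint("point_f", 42));                          // point field
      doc.add(new TextField("text_f", "some text", Field.Store.NO)); // text field, norms on
      w.addDocument(doc);
      w.commit();
      try (DirectoryReader r = DirectoryReader.open(w)) {
        for (LeafReaderContext ctx : r.leaves()) {
          for (FieldInfo fi : ctx.reader().getFieldInfos()) {
            // Expected: point_f hasNorms=false, text_f hasNorms=true.
            System.out.println(fi.name + " hasNorms=" + fi.hasNorms());
          }
        }
      }
    }
  }
}
{code}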
[jira] [Updated] (LUCENE-9149) Increase data dimension limit in BKD
[ https://issues.apache.org/jira/browse/LUCENE-9149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Knize updated LUCENE-9149: --- Attachment: LUCENE-9149.patch Status: Open (was: Open) Attached patch: * refactors most {{numDataDim}} variables to more accurate name {{numDims}} * increases {{MAX_DIMENSIONS}} to 16, keeps {{MAX_INDEX_DIMENSIONS}} at 8 * updates random test suites to test with new increased limit Will open a PR for review > Increase data dimension limit in BKD > > > Key: LUCENE-9149 > URL: https://issues.apache.org/jira/browse/LUCENE-9149 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Nick Knize >Priority: Major > Attachments: LUCENE-9149.patch > > > LUCENE-8496 added selective indexing; the ability to designate the first K <= > N dimensions for driving the construction of the BKD internal nodes. Follow > on work stored the "data dimensions" for only the leaf nodes and only the > "index dimensions" are stored for the internal nodes. While > {{maxPointsInLeafNode}} is still important for managing the BKD heap memory > footprint (thus we don't want this to get too large), I'd like to propose > increasing the {{MAX_DIMENSIONS}} limit (to something not too crazy like 16; > effectively doubling the index dimension limit) while maintaining the > {{MAX_INDEX_DIMENSIONS}} at 8. > Doing this will enable us to encode higher dimension data within a lower > dimension index (e.g., 3D tessellated triangles as a 10 dimension point using > only the first 6 dimensions for index construction) > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] nknize opened a new pull request #1182: LUCENE-9149: Increase data dimension limit in BKD
nknize opened a new pull request #1182: LUCENE-9149: Increase data dimension limit in BKD URL: https://github.com/apache/lucene-solr/pull/1182 [LUCENE-8496](https://issues.apache.org/jira/browse/LUCENE-8496) added selective indexing; the ability to designate the first K <= N dimensions for driving the construction of the BKD internal nodes. Follow-on work stored the "data dimensions" only for the leaf nodes, while only the "index dimensions" are stored for the internal nodes. While maxPointsInLeafNode is still important for managing the BKD heap memory footprint (thus we don't want this to get too large), I'd like to propose increasing the `MAX_DIMENSIONS` limit (to something not too crazy like 16, i.e. twice the index dimension limit) while maintaining the `MAX_INDEX_DIMENSIONS` at 8. Doing this will enable us to encode higher-dimension data within a lower-dimension index (e.g., 3D tessellated triangles as a 10 dimension point using only the first 6 dimensions for index construction) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9077) Gradle build
[ https://issues.apache.org/jira/browse/LUCENE-9077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated LUCENE-9077: --- Description: This task focuses on providing a gradle-based build equivalent for Lucene and Solr (on master branch). See notes below on why this respin is needed. The code lives on *gradle-master* branch. It is kept in sync with *master*. Try running the following to see an overview of helper guides concerning typical workflow, testing and ant-migration helpers: gradlew :help A list of items that need to be added or require work. If you'd like to work on any of these, please add your name to the list. Once you have a patch/ pull request let me (dweiss) know - I'll try to coordinate the merges. * (/) Apply forbiddenAPIs * (/) Generate hardware-aware gradle defaults for parallelism (count of workers and test JVMs). * (/) Fail the build if --tests filter is applied and no tests execute during the entire build (this allows for an empty set of filtered tests at single project level). * (/) Port other settings and randomizations from common-build.xml * (/) Configure security policy/ sandboxing for tests. * (/) test's console output on -Ptests.verbose=true * (/) add a :helpDeps explanation to how the dependency system works (palantir plugin, lockfile) and how to retrieve structured information about current dependencies of a given module (in a tree-like output). * (/) jar checksums, jar checksum computation and validation. This should be done without intermediate folders (directly on dependency sets). * (/) verify min. JVM version and exact gradle version on build startup to minimize odd build side-effects * (/) Repro-line for failed tests/ runs. * (/) add a top-level README note about building with gradle (and the required JVM). * (/) add an equivalent of 'validate-source-patterns' (check-source-patterns.groovy) to precommit. * (/) add an equivalent of 'rat-sources' to precommit. * (/) add an equivalent of 'check-example-lucene-match-version' (solr only) to precommit. * (/) javadoc compilation Hard-to-implement stuff already investigated: * (/) (done) -*Printing console output of failed tests.* There doesn't seem to be any way to do this in a reasonably efficient way. There are onOutput listeners but they're slow to operate and solr tests emit *tons* of output so it's an overkill.- * (!) (LUCENE-9120) *Tests working with security-debug logs or other JVM-early log output*. Gradle's test runner works by redirecting Java's stdout/ syserr so this just won't work. Perhaps we can spin the ant-based test runner for such corner-cases. Of lesser importance: * Add an equivalent of 'documentation-lint" to precommit. * Do not require files to be committed before running precommit. * (/) add rendering of javadocs (gradlew javadoc) * Attach javadocs to maven publications. * Add test 'beasting' (rerunning the same suite multiple times). I'm afraid it'll be difficult to run it sensibly because gradle doesn't offer cwd separation for the forked test runners. * if you diff solr packaged distribution against ant-created distribution there are minor differences in library versions and some JARs are excluded/ moved around. I didn't try to force these as everything seems to work (tests, etc.) – perhaps these differences should be fixed in the ant build instead. * [EOE] identify and port various "regenerate" tasks from ant builds (javacc, precompiled automata, etc.)
* Fill in POM details in gradle/defaults-maven.gradle so that they reflect the previous content better (dependencies aside). * Add any IDE integration layers that should be added (I use IntelliJ and it imports the project out of the box, without the need for any special tuning). * Add Solr packaging for docs/* (see TODO in packaging/build.gradle; currently XSLT...) * I didn't bother adding Solr dist/test-framework to packaging (who'd use it from a binary distribution?) *Note:* this builds on the work done by Mark Miller and Cao Mạnh Đạt but also applies lessons learned from those two efforts: * *Do not try to do too many things at once*. If we deviate too far from master, the branch will be hard to merge. * *Do everything in baby-steps* and add small, independent build fragments replacing the old ant infrastructure. * *Try to engage people to run, test and contribute early*. It can't be a one-man effort. The more people understand and can contribute to the build, the healthier it will be. was: This task focuses on providing gradle-based build equivalent for Lucene and Solr (on master branch). See notes below on why this respin is needed. The code lives on *gradle-master* branch. It is kept with sync with *master*. Try running the following to see an overview of helper guides concerning typical workflow, testing and ant-migration helpers: gradlew :help A lis
[jira] [Commented] (SOLR-11746) numeric fields need better error handling for prefix/wildcard syntax -- consider uniform support for "foo:* == foo:[* TO *]"
[ https://issues.apache.org/jira/browse/SOLR-11746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018350#comment-17018350 ] Chris M. Hostetter commented on SOLR-11746: --- [~houston] - i'm having a little trouble following what exactly the problem is ... whether you're referring to: * a (modified) patch you have in progress that seems to have a bug but you can't figure out where ** (posting the patch for discussion would help) *OR* * that independent of any of the changes discussed here, you've found a bug in that trying to use {{NormsFieldExistsQuery}} doesn't seem to work with PointFields (in solr) at all because {{FieldInfo.hasNorms()}} on an existing index is returning false when it shouldn't based on the indexOptions set (in solr) when indexing that field ** in which case perhaps make a Sub-Task and post an isolated "test only" white box patch asserting what {{FieldInfo.hasNorms()}} returns for a variety of field types? Either way: a patch with a failing test case and some nocommit comments drawing attention to the problematic bits would be helpful for further discussion. > numeric fields need better error handling for prefix/wildcard syntax -- > consider uniform support for "foo:* == foo:[* TO *]" > > > Key: SOLR-11746 > URL: https://issues.apache.org/jira/browse/SOLR-11746 > Project: Solr > Issue Type: Bug >Affects Versions: 7.0 >Reporter: Chris M. Hostetter >Assignee: Houston Putman >Priority: Major > Fix For: master (9.0), 8.5 > > Attachments: SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch, > SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch > > > On the solr-user mailing list, Torsten Krah pointed out that with Trie > numeric fields, query syntax such as {{foo_d:\*}} has been functionality > equivilent to {{foo_d:\[\* TO \*]}} and asked why this was not also supported > for Point based numeric fields. > The fact that this type of syntax works (for {{indexed="true"}} Trie fields) > appears to have been an (untested, undocumented) fluke of Trie fields given > that they use indexed terms for the (encoded) numeric terms and inherit the > default implementation of {{FieldType.getPrefixQuery}} which produces a > prefix query against the {{""}} (empty string) term. > (Note that this syntax has aparently _*never*_ worked for Trie fields with > {{indexed="false" docValues="true"}} ) > In general, we should assess the behavior users attempt a prefix/wildcard > syntax query against numeric fields, as currently the behavior is largely > non-sensical: prefix/wildcard syntax frequently match no docs w/o any sort > of error, and the aformentioned {{numeric_field:*}} behaves inconsistently > between points/trie fields and between indexed/docValued trie fields. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-9150) Restore support for dynamic PlanetModel in Geo3D
Nick Knize created LUCENE-9150: -- Summary: Restore support for dynamic PlanetModel in Geo3D Key: LUCENE-9150 URL: https://issues.apache.org/jira/browse/LUCENE-9150 Project: Lucene - Core Issue Type: Improvement Reporter: Nick Knize LUCENE-7072 removed dynamic planet model support in Geo3D. This was logical at the time (given the state of Lucene and spatial projections and coordinate reference systems). Since then, however, there have been a lot of new developments within the OGC community around [Coordinate Reference Systems|https://docs.opengeospatial.org/as/18-005r4/18-005r4.html], [Dynamic Coordinate Reference Systems|http://docs.opengeospatial.org/DRAFTS/18-058.html], and [Updated ISO Standards|https://www.iso.org/obp/ui/#iso:std:iso:19111:ed-3:v1:en]. It would be useful for Geo3D (and eventually LatLon*) to support different geographic datums to make lucene a viable option for indexing/searching in different spatial reference systems (e.g., more accurately computing query shape relations to BKD's internal nodes using datum consistent with the spatial projection). This would also provide an alternative to other limitations of the {{LatLon*/XY*}} implementation (e.g., pole/dateline crossing, quantization of small polygons). I'd like to propose keeping the current WGS84 static datum as the default for Geo3D but adding back the constructors to accept custom planet models. Perhaps this could be listed as an "expert" API feature? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
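The geometry layer still accepts arbitrary ellipsoids today; a short sketch of building points against a custom {{PlanetModel}} (the scale factors below are illustrative, not an official datum; they are expressed relative to the mean radius). The proposal amounts to re-exposing this through the Geo3D field/query constructors, with WGS84 remaining the default:
{code:java}
import org.apache.lucene.spatial3d.geom.GeoPoint;
import org.apache.lucene.spatial3d.geom.PlanetModel;

public class PlanetModelSketch {
  public static void main(String[] args) {
    PlanetModel wgs84 = PlanetModel.WGS84; // the currently hard-wired default
    // Custom ellipsoid: ab = equatorial scaling, c = polar scaling, both
    // relative to the mean radius (illustrative, roughly WGS84-like values):
    PlanetModel custom = new PlanetModel(1.0011188, 0.9977622);
    // Points (lat/lon in radians) are then computed against that datum:
    GeoPoint p = new GeoPoint(custom, Math.toRadians(40.7), Math.toRadians(-74.0));
    System.out.println(p);
  }
}
{code}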
[jira] [Updated] (SOLR-11746) numeric fields need better error handling for prefix/wildcard syntax -- consider uniform support for "foo:* == foo:[* TO *]"
[ https://issues.apache.org/jira/browse/SOLR-11746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Houston Putman updated SOLR-11746: -- Attachment: SOLR-11746.patch > numeric fields need better error handling for prefix/wildcard syntax -- > consider uniform support for "foo:* == foo:[* TO *]" > > > Key: SOLR-11746 > URL: https://issues.apache.org/jira/browse/SOLR-11746 > Project: Solr > Issue Type: Bug >Affects Versions: 7.0 >Reporter: Chris M. Hostetter >Assignee: Houston Putman >Priority: Major > Fix For: master (9.0), 8.5 > > Attachments: SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch, > SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch > > > On the solr-user mailing list, Torsten Krah pointed out that with Trie > numeric fields, query syntax such as {{foo_d:\*}} has been functionality > equivilent to {{foo_d:\[\* TO \*]}} and asked why this was not also supported > for Point based numeric fields. > The fact that this type of syntax works (for {{indexed="true"}} Trie fields) > appears to have been an (untested, undocumented) fluke of Trie fields given > that they use indexed terms for the (encoded) numeric terms and inherit the > default implementation of {{FieldType.getPrefixQuery}} which produces a > prefix query against the {{""}} (empty string) term. > (Note that this syntax has aparently _*never*_ worked for Trie fields with > {{indexed="false" docValues="true"}} ) > In general, we should assess the behavior users attempt a prefix/wildcard > syntax query against numeric fields, as currently the behavior is largely > non-sensical: prefix/wildcard syntax frequently match no docs w/o any sort > of error, and the aformentioned {{numeric_field:*}} behaves inconsistently > between points/trie fields and between indexed/docValued trie fields. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-11746) numeric fields need better error handling for prefix/wildcard syntax -- consider uniform support for "foo:* == foo:[* TO *]"
[ https://issues.apache.org/jira/browse/SOLR-11746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018375#comment-17018375 ] Houston Putman commented on SOLR-11746: --- Sorry about that Hoss, it's #2. I've posted my patch with a comment around the test that fails {{TestSolrQueryParser.testDocsWithValuesInField()}}. {code:java} reproduce with: ant test -Dtestcase=TestSolrQueryParser -Dtests.method=testDocsWithValuesInField -Dtests.seed=8945CFEE0F9CB0A8 -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=kab-DZ -Dtests.timezone=Brazil/East -Dtests.asserts=true -Dtests.file.encoding=UTF-8{code} I'm pretty sure it's a Solr bug with creating FieldInfo objects, but I'm new to this part of Solr and haven't been able to track down how the IndexOptions get populated yet. > numeric fields need better error handling for prefix/wildcard syntax -- > consider uniform support for "foo:* == foo:[* TO *]" > > > Key: SOLR-11746 > URL: https://issues.apache.org/jira/browse/SOLR-11746 > Project: Solr > Issue Type: Bug >Affects Versions: 7.0 >Reporter: Chris M. Hostetter >Assignee: Houston Putman >Priority: Major > Fix For: master (9.0), 8.5 > > Attachments: SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch, > SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch > > > On the solr-user mailing list, Torsten Krah pointed out that with Trie > numeric fields, query syntax such as {{foo_d:\*}} has been functionality > equivilent to {{foo_d:\[\* TO \*]}} and asked why this was not also supported > for Point based numeric fields. > The fact that this type of syntax works (for {{indexed="true"}} Trie fields) > appears to have been an (untested, undocumented) fluke of Trie fields given > that they use indexed terms for the (encoded) numeric terms and inherit the > default implementation of {{FieldType.getPrefixQuery}} which produces a > prefix query against the {{""}} (empty string) term. > (Note that this syntax has aparently _*never*_ worked for Trie fields with > {{indexed="false" docValues="true"}} ) > In general, we should assess the behavior users attempt a prefix/wildcard > syntax query against numeric fields, as currently the behavior is largely > non-sensical: prefix/wildcard syntax frequently match no docs w/o any sort > of error, and the aformentioned {{numeric_field:*}} behaves inconsistently > between points/trie fields and between indexed/docValued trie fields. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13965) Adding new functions to GraphHandler should be same as Streamhandler
[ https://issues.apache.org/jira/browse/SOLR-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018391#comment-17018391 ] Lucene/Solr QA commented on SOLR-13965: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Release audit (RAT) {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Check forbidden APIs {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Validate source patterns {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 44m 53s{color} | {color:green} core in the patch passed. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 48m 39s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | SOLR-13965 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12991242/SOLR-13965.01.patch | | Optional Tests | compile javac unit ratsources checkforbiddenapis validatesourcepatterns | | uname | Linux lucene1-us-west 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | ant | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh | | git revision | master / aad849bf87a | | ant | version: Apache Ant(TM) version 1.10.5 compiled on March 28 2019 | | Default Java | LTS | | Test Results | https://builds.apache.org/job/PreCommit-SOLR-Build/652/testReport/ | | modules | C: solr/core U: solr/core | | Console output | https://builds.apache.org/job/PreCommit-SOLR-Build/652/console | | Powered by | Apache Yetus 0.7.0 http://yetus.apache.org | This message was automatically generated. > Adding new functions to GraphHandler should be same as Streamhandler > > > Key: SOLR-13965 > URL: https://issues.apache.org/jira/browse/SOLR-13965 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: streaming expressions >Affects Versions: 8.3 >Reporter: David Eric Pugh >Priority: Minor > Attachments: SOLR-13965.01.patch > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Currently you add new functions to GraphHandler differently than you do in > StreamHandler. We should have one way of extending the handlers that support > streaming expressions. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-11746) numeric fields need better error handling for prefix/wildcard syntax -- consider uniform support for "foo:* == foo:[* TO *]"
[ https://issues.apache.org/jira/browse/SOLR-11746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018406#comment-17018406 ] Chris M. Hostetter commented on SOLR-11746: --- [~houston] - i have to step away for a minute, but based on my poking around a bit i think that fundamentally the problem is that – at a lucene level – Point fields just don't seem to ever support (or care about) norms? Unlike other solr fieldtypes, none of the {{...solr.schema.*PointField}} impls ever _create_ or pass a {{...lucene.document.FieldType}} instance (containing the "omitNorms" setting from the SchemaField) to the underlying {{...lucene.document.Field}} instance that they create in their {{...solr.schema.FieldType.createField()}} method – because the underlying classes (like {{...lucene.document.IntPoint}}) don't _allow_ you to specify your own FieldType (where you set things like {{omitNorms}}) ... instead that's all private & internal to the point {{Field}} impl. There are no existing lucene layer test cases of using Point Fields + NormsFieldExistsQuery, and I'm pretty sure if you tried to write one you'd find that you can never make it pass, because it's physically impossible to create a "Point" field with {{omitNorms=false}}? BUT ... I don't think this is a bug? ... If you look back at what Uwe said above when he suggested using NormsFieldExistsQuery he was very specific... {quote}If the field has norms enabled (e.g. a text field or StringField with norms), then you can also use NormsFieldExistsQuery: {quote} ...i think fundamentally your patch should be restructured to ensure it never attempts to use NormsFieldExistsQuery with Point based fields? Off the top of my head, i would straw man suggest eliminating the concept of {{getSpecializedExistenceQuery}} and instead just make FieldType use... {code:java} public Query getExistenceQuery(QParser parser, SchemaField field) { if (field.hasDocValues()) { return new DocValuesFieldExistsQuery(field.getName()); } else if (!isPointField() && !field.omitNorms() && field.indexOptions() != IndexOptions.NONE) { return new NormsFieldExistsQuery(field.getName()); } // Default to an unbounded range query return getRangeQuery(...); // ? getSpecializedRangeQuery ? } {code} And let subclasses (like the point fields) override getExistenceQuery as needed. (Although I generally hate the existence of that {{isPointField()}} method, so i'm not fully sold on this idea ... I'm also not really clear on the purpose/need of getSpecializedRangeQuery as opposed to just letting subclasses override {{getRangeQuery(...)}} ... so take this entire suggestion with a grain of salt) > numeric fields need better error handling for prefix/wildcard syntax -- > consider uniform support for "foo:* == foo:[* TO *]" > > > Key: SOLR-11746 > URL: https://issues.apache.org/jira/browse/SOLR-11746 > Project: Solr > Issue Type: Bug >Affects Versions: 7.0 >Reporter: Chris M. Hostetter >Assignee: Houston Putman >Priority: Major > Fix For: master (9.0), 8.5 > > Attachments: SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch, > SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch > > > On the solr-user mailing list, Torsten Krah pointed out that with Trie > numeric fields, query syntax such as {{foo_d:\*}} has been functionality > equivilent to {{foo_d:\[\* TO \*]}} and asked why this was not also supported > for Point based numeric fields.
> The fact that this type of syntax works (for {{indexed="true"}} Trie fields) > appears to have been an (untested, undocumented) fluke of Trie fields given > that they use indexed terms for the (encoded) numeric terms and inherit the > default implementation of {{FieldType.getPrefixQuery}} which produces a > prefix query against the {{""}} (empty string) term. > (Note that this syntax has aparently _*never*_ worked for Trie fields with > {{indexed="false" docValues="true"}} ) > In general, we should assess the behavior users attempt a prefix/wildcard > syntax query against numeric fields, as currently the behavior is largely > non-sensical: prefix/wildcard syntax frequently match no docs w/o any sort > of error, and the aformentioned {{numeric_field:*}} behaves inconsistently > between points/trie fields and between indexed/docValued trie fields. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
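Following the straw man above, the per-type override might look like the sketch below. This is hypothetical code against the patch under discussion; {{getExistenceQuery}} is the method the patch introduces, not an existing Solr API, and the concrete subclass shown is purely illustrative:
{code:java}
import org.apache.lucene.search.DocValuesFieldExistsQuery;
import org.apache.lucene.search.Query;
import org.apache.solr.schema.IntPointField;
import org.apache.solr.schema.SchemaField;
import org.apache.solr.search.QParser;

// Illustrative subclass: point fields never carry norms, so existence is
// answered either from docValues or from an unbounded range query.
public class ExistenceAwareIntPointField extends IntPointField {
  @Override // overrides the getExistenceQuery introduced by the patch
  public Query getExistenceQuery(QParser parser, SchemaField field) {
    if (field.hasDocValues()) {
      return new DocValuesFieldExistsQuery(field.getName());
    }
    // No NormsFieldExistsQuery branch here: FieldInfo.hasNorms() is always
    // false for point fields, so fall through to [* TO *] semantics.
    return getRangeQuery(parser, field, null, null, true, true);
  }
}
{code}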
[jira] [Commented] (SOLR-12859) DocExpirationUpdateProcessorFactory does not work with BasicAuth
[ https://issues.apache.org/jira/browse/SOLR-12859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018407#comment-17018407 ] Lucene/Solr QA commented on SOLR-12859: --- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 4s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Release audit (RAT) {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Check forbidden APIs {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Validate source patterns {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 63m 0s{color} | {color:green} core in the patch passed. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 68m 1s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | SOLR-12859 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12991247/SOLR-12859.patch | | Optional Tests | compile javac unit ratsources checkforbiddenapis validatesourcepatterns | | uname | Linux lucene2-us-west.apache.org 4.4.0-170-generic #199-Ubuntu SMP Thu Nov 14 01:45:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | ant | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh | | git revision | master / aad849b | | ant | version: Apache Ant(TM) version 1.9.6 compiled on July 20 2018 | | Default Java | LTS | | Test Results | https://builds.apache.org/job/PreCommit-SOLR-Build/653/testReport/ | | modules | C: solr/core U: solr/core | | Console output | https://builds.apache.org/job/PreCommit-SOLR-Build/653/console | | Powered by | Apache Yetus 0.7.0 http://yetus.apache.org | This message was automatically generated. > DocExpirationUpdateProcessorFactory does not work with BasicAuth > > > Key: SOLR-12859 > URL: https://issues.apache.org/jira/browse/SOLR-12859 > Project: Solr > Issue Type: Bug >Affects Versions: 7.5 >Reporter: Varun Thacker >Priority: Major > Attachments: SOLR-12859.patch, SOLR-12859.patch > > > I setup a cluster with basic auth and then wanted to use Solr's TTL feature ( > DocExpirationUpdateProcessorFactory ) to auto-delete documents. > > Turns out it doesn't work when Basic Auth is enabled. 
I get the following > stacktrace from the logs > {code:java} > 2018-10-12 22:06:38.967 ERROR (autoExpireDocs-42-thread-1) [ ] > o.a.s.u.p.DocExpirationUpdateProcessorFactory Runtime error in periodic > deletion of expired docs: Async exception during distributed update: Error > from server at http://192.168.0.8:8983/solr/gettingstarted_shard2_replica_n6: > require authentication > request: > http://192.168.0.8:8983/solr/gettingstarted_shard2_replica_n6/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.0.8%3A8983%2Fsolr%2Fgettingstarted_shard1_replica_n2%2F&wt=javabin&version=2 > org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException: > Async exception during distributed update: Error from server at > http://192.168.0.8:8983/solr/gettingstarted_shard2_replica_n6: require > authentication > request: > http://192.168.0.8:8983/solr/gettingstarted_shard2_replica_n6/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.0.8%3A8983%2Fsolr%2Fgettingstarted_shard1_replica_n2%2F&wt=javabin&version=2 > at > org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:964) > ~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - > jimczi - 2018-09-18 13:07:55] > at > org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1976) > ~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - > jimczi -
[GitHub] [lucene-solr] msfroh commented on a change in pull request #1155: LUCENE-8962: Add ability to selectively merge on commit
msfroh commented on a change in pull request #1155: LUCENE-8962: Add ability to selectively merge on commit URL: https://github.com/apache/lucene-solr/pull/1155#discussion_r368182168 ## File path: lucene/core/src/test/org/apache/lucene/index/TestIndexWriterMergePolicy.java ## @@ -277,6 +285,92 @@ public void testSetters() { assertSetters(new LogDocMergePolicy()); } + public void testMergeOnCommit() throws IOException, InterruptedException { +Directory dir = newDirectory(); +IndexWriter firstWriter = new IndexWriter(dir, newIndexWriterConfig(new MockAnalyzer(random())) +.setMergePolicy(NoMergePolicy.INSTANCE)); +for (int i = 0; i < 5; i++) { + TestIndexWriter.addDoc(firstWriter); + firstWriter.flush(); +} +DirectoryReader firstReader = DirectoryReader.open(firstWriter); +assertEquals(5, firstReader.leaves().size()); +firstReader.close(); +firstWriter.close(); + +MergePolicy mergeOnCommitPolicy = new LogDocMergePolicy() { + @Override + public MergeSpecification findCommitMerges(SegmentInfos segmentInfos, MergeContext mergeContext) throws IOException { +// Optimize down to a single segment on commit +MergeSpecification mergeSpecification = new MergeSpecification(); +List nonMergingSegments = new ArrayList<>(); +for (SegmentCommitInfo sci : segmentInfos) { + if (mergeContext.getMergingSegments().contains(sci) == false) { +nonMergingSegments.add(sci); + } +} +mergeSpecification.add(new OneMerge(nonMergingSegments)); +return mergeSpecification; + } +}; + +IndexWriter writerWithMergePolicy = new IndexWriter(dir, newIndexWriterConfig(new MockAnalyzer(random())) +.setMergePolicy(mergeOnCommitPolicy)); + +writerWithMergePolicy.commit(); + +DirectoryReader unmergedReader = DirectoryReader.open(writerWithMergePolicy); +assertEquals(5, unmergedReader.leaves().size()); // Don't merge unless there's a change +unmergedReader.close(); + +TestIndexWriter.addDoc(writerWithMergePolicy); +writerWithMergePolicy.commit(); + +DirectoryReader mergedReader = DirectoryReader.open(writerWithMergePolicy); +assertEquals(1, mergedReader.leaves().size()); // Now we merge on commit +mergedReader.close(); + +LineFileDocs lineFileDocs = new LineFileDocs(random()); +int docCount = atLeast(1000); +AtomicInteger indexedDocs = new AtomicInteger(0); +int numIndexingThreads = atLeast(2); +CountDownLatch startingGun = new CountDownLatch(1); +Collection indexingThreads = new ArrayList<>(); +for (int i = 0; i < numIndexingThreads; i++) { + Thread t = new Thread(() -> { +try { + while (indexedDocs.getAndIncrement() < docCount) { +writerWithMergePolicy.addDocument(lineFileDocs.nextDoc()); +if (rarely()) { + writerWithMergePolicy.commit(); +} + } +} catch (IOException e) { + e.printStackTrace(); + fail(); +} + }); + t.start(); + indexingThreads.add(t); +} +startingGun.countDown(); +for (Thread t : indexingThreads) { + t.join(); +} +writerWithMergePolicy.commit(); +assertEquals(1, writerWithMergePolicy.listOfSegmentCommitInfos().size()); Review comment: I just found that this assertion sometimes fails. If there are some pending/running merges left over from the indexing threads, the segments associated with those merges will be excluded from merging on commit. I'll update this test to wait for pending merges to finish before committing. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
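One possible shape for the "wait for pending merges" fix mentioned in the review comment above (a hedged sketch, assuming the writer uses the default {{ConcurrentMergeScheduler}}): drain running background merges before the final commit so that {{findCommitMerges}} sees every segment as mergeable:
{code:java}
import org.apache.lucene.index.ConcurrentMergeScheduler;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.MergeScheduler;

final class MergeDrain {
  static void commitAfterDrainingMerges(IndexWriter writer) throws Exception {
    MergeScheduler scheduler = writer.getConfig().getMergeScheduler();
    if (scheduler instanceof ConcurrentMergeScheduler) {
      // Blocks until any running merge threads finish, so their segments are
      // no longer excluded from the commit-time merge specification.
      ((ConcurrentMergeScheduler) scheduler).sync();
    }
    writer.commit();
  }
}
{code}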