[GitHub] [lucene-solr] dweiss commented on a change in pull request #1157: Add RAT check using Gradle
dweiss commented on a change in pull request #1157: Add RAT check using Gradle
URL: https://github.com/apache/lucene-solr/pull/1157#discussion_r365108552

## File path: gradle/validation/rat-sources.gradle

@@ -0,0 +1,167 @@
+import org.gradle.api.internal.project.IsolatedAntBuilder
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+// This applies the Apache RAT plugin to our source and test files
+
+// Largely copied from Apache Kafka
+apply plugin: RatPlugin
+// This invocation needs to go out to each project instead of being here
+rat {

Review comment:
This technically isn't an invocation; typically it configures defaults for some task.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dweiss commented on a change in pull request #1157: Add RAT check using Gradle
dweiss commented on a change in pull request #1157: Add RAT check using Gradle
URL: https://github.com/apache/lucene-solr/pull/1157#discussion_r365109858

## File path: gradle/validation/rat-sources.gradle

@@ -0,0 +1,167 @@
+import org.gradle.api.internal.project.IsolatedAntBuilder
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+// This applies the Apache RAT plugin to our source and test files
+
+// Largely copied from Apache Kafka
+apply plugin: RatPlugin
+// This invocation needs to go out to each project instead of being here
+rat {
+}
+
+class RatTask extends DefaultTask {
+  @Input
+  List includes = []
+
+  @Input
+  List excludes = []
+
+  def reportDir = project.file('build/rat')
+  def xmlReport = new File(reportDir, 'rat-report.xml')
+
+  def generateXmlReport(File reportDir) {
+    // Probably better to use the IsolatedAntBuilder if we can, but it seems to have issues with substringMatcher
+    // def antBuilder = services.get(IsolatedAntBuilder)
+
+    def ratClasspath = project.configurations.rat
+    def projectPath = project.getRootDir().getAbsolutePath()
+    ant.taskdef(resource: 'org/apache/rat/anttasks/antlib.xml', classpath: ratClasspath.asPath)
+    ant.report(format: 'xml', reportFile: xmlReport, addDefaultLicenseMatchers: true) {
+      fileset(dir: projectPath) {
+        patternset {
+          includes.each {
+            include(name: it)
+          }
+          excludes.each {
+            exclude(name: it)
+          }
+        }
+      }
+
+      // The license rules below were manually copied from lucene/common-build.xml, there is currently no mechanism to sync them
+
+      // BSD 4-clause stuff (is disallowed below)
+      substringMatcher(licenseFamilyCategory: "BSD4 ", licenseFamilyName: "Original BSD License (with advertising clause)") {
+        pattern(substring: "All advertising materials")
+      }
+
+      // BSD-like stuff
+      substringMatcher(licenseFamilyCategory: "BSD ", licenseFamilyName: "Modified BSD License") {
+        // brics automaton
+        pattern(substring: "Copyright (c) 2001-2009 Anders Moeller")
+        // snowball
+        pattern(substring: "Copyright (c) 2001, Dr Martin Porter")
+        // UMASS kstem
+        pattern(substring: "THIS SOFTWARE IS PROVIDED BY UNIVERSITY OF MASSACHUSETTS AND OTHER CONTRIBUTORS")
+        // Egothor
+        pattern(substring: "Egothor Software License version 1.00")
+        // JaSpell
+        pattern(substring: "Copyright (c) 2005 Bruno Martins")
+        // d3.js
+        pattern(substring: "THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS")
+        // highlight.js
+        pattern(substring: "THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS")
+      }
+
+      // MIT-like
+      substringMatcher(licenseFamilyCategory: "MIT ", licenseFamilyName: "Modified BSD License") {
+        // ICU license
+        pattern(substring: "Permission is hereby granted, free of charge, to any person obtaining a copy")
+      }
+
+      // Apache
+      substringMatcher(licenseFamilyCategory: "AL ", licenseFamilyName: "Apache") {
+        pattern(substring: "Licensed to the Apache Software Foundation (ASF) under")
+        // this is the old-school one under some files
+        pattern(substring: "Licensed under the Apache License, Version 2.0 (the "License")")
+      }
+
+      substringMatcher(licenseFamilyCategory: "GEN ", licenseFamilyName: "Generated") {
+        //
+        pattern(substring: "Produced by GNUPLOT")
+        //
+        pattern(substring: "This file was generated automatically by the Snowball to Java compiler")
+        //
+        pattern(substring: "ANTLR GENERATED CODE")
+      }
+
+      approvedLicense(familyName: "Apache"
[jira] [Commented] (SOLR-14165) SolrResponse serialVersionUID has changed
[ https://issues.apache.org/jira/browse/SOLR-14165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012551#comment-17012551 ]

Ishan Chattopadhyaya commented on SOLR-14165:
Thank you very much [~andywebb1975] for discovering and fixing this bug. And also for pushing us for inclusion into 8.4.1, thus ensuring we don't drop the ball.

> SolrResponse serialVersionUID has changed
>
> Key: SOLR-14165
> URL: https://issues.apache.org/jira/browse/SOLR-14165
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Affects Versions: 8.4
> Reporter: Andy Webb
> Assignee: Noble Paul
> Priority: Blocker
> Fix For: 8.4.1
> Time Spent: 40m
> Remaining Estimate: 0h
>
> SOLR-13821 changed the signature of {{org.apache.solr.client.solrj.SolrResponse}}, making serialisations of the class incompatible between versions.
>
> Original text from SOLR-13821:
> {quote}
> hi,
> We've been experimenting with doing a rolling in-place upgrade from Solr 8.3.1 to 8.4.0 on a non-production system, but have found that we get this exception for some operations, including when requesting /solr/admin/collections?action=overseerstatus on a node whose version is inconsistent with the overseer:
>
> java.io.InvalidClassException: org.apache.solr.client.solrj.SolrResponse; local class incompatible: stream classdesc serialVersionUID = -7931100103360242645, local class serialVersionUID = 2239939671435624715
>
> As far as I can see, this is due to the change to the SolrResponse class's signature in commit e3bd5a7. My experimentation has shown that if the serialVersionUID of that class is set explicitly to its previous value the exception no longer occurs.
> I'm not sure if this is a necessary or good fix, but I wanted to share this issue with you in case it's something that you think needs resolving.
> thanks,
> Andy
> {quote}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
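The InvalidClassException above arises because Java derives serialVersionUID from the class signature when it is not declared, so any signature change breaks wire compatibility. The sketch below is a minimal, self-contained illustration of the fix's mechanism (the class and field here are hypothetical stand-ins, not Solr's actual SolrResponse; only the pinned UID value is taken from the report above):

```java
import java.io.*;

public class SerialUidDemo {
    // A Serializable class that pins its stream identity explicitly.
    // Without this field the JVM computes serialVersionUID from the class
    // signature, so adding or changing members silently changes it and
    // cross-version deserialization fails with InvalidClassException.
    static class Response implements Serializable {
        // The SOLR-14165 fix declares SolrResponse's previous auto-derived
        // UID as an explicit constant like this one.
        private static final long serialVersionUID = -7931100103360242645L;
        String status = "ok";
    }

    public static void main(String[] args) throws Exception {
        // Round-trip through Java serialization; with the pinned UID this
        // keeps working even after compatible signature changes.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new Response());
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            Response r = (Response) in.readObject();
            System.out.println(r.status); // prints "ok"
        }
    }
}
```

Declaring the constant restores compatibility only if the class change was otherwise serialization-compatible, which was the case here.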
[jira] [Updated] (SOLR-14066) Deprecate DIH
[ https://issues.apache.org/jira/browse/SOLR-14066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ishan Chattopadhyaya updated SOLR-14066:

Description:
DataImportHandler has outlived its utility. DIH doesn't need to remain inside Solr anymore. Plan is to deprecate DIH in 8.5, remove from 9.0. Also, work on handing it off to volunteers in the community (so far, [~rohitcse] has volunteered to maintain it).

(was: DataImportHandler has outlived its utility. DIH doesn't need to remain inside Solr anymore. Let us deprecate DIH in 8.4 (and remove it from the Solr distro in 9x or 10x).)

> Deprecate DIH
>
> Key: SOLR-14066
> URL: https://issues.apache.org/jira/browse/SOLR-14066
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Components: contrib - DataImportHandler
> Reporter: Ishan Chattopadhyaya
> Assignee: Ishan Chattopadhyaya
> Priority: Major
> Attachments: image-2019-12-14-19-58-39-314.png
> Time Spent: 40m
> Remaining Estimate: 0h
>
> DataImportHandler has outlived its utility. DIH doesn't need to remain inside Solr anymore. Plan is to deprecate DIH in 8.5, remove from 9.0. Also, work on handing it off to volunteers in the community (so far, [~rohitcse] has volunteered to maintain it).
[GitHub] [lucene-solr] jpountz opened a new pull request #1158: LUCENE-9116: Remove long[] from `PostingsWriterBase#encodeTerm`.
jpountz opened a new pull request #1158: LUCENE-9116: Remove long[] from `PostingsWriterBase#encodeTerm`.
URL: https://github.com/apache/lucene-solr/pull/1158

All the metadata can be directly encoded in the `DataOutput`.
[jira] [Commented] (LUCENE-9116) Simplify postings API by removing long[] metadata
[ https://issues.apache.org/jira/browse/LUCENE-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012561#comment-17012561 ]

Adrien Grand commented on LUCENE-9116:
[~dsmiley] The attached pull request resurrects the FST postings format. Note that because of the other change, it now stores all outputs on final arcs. I suspect this was probably mostly the case previously already, and it doesn't seem to be a requirement for SolrTagger, which seems to use this postings format only because it needs a fast terms dictionary.

> Simplify postings API by removing long[] metadata
>
> Key: LUCENE-9116
> URL: https://issues.apache.org/jira/browse/LUCENE-9116
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Adrien Grand
> Priority: Minor
> Time Spent: 40m
> Remaining Estimate: 0h
>
> The postings API allows to store metadata about a term either in a long[] or in a byte[]. This is unnecessary as all information could be encoded in the byte[], which is what most codecs do in practice.
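The issue's premise is that long-valued term metadata can always be carried in the byte[] instead. A standard way to do that compactly is variable-length encoding, the scheme Lucene's DataOutput.writeVLong uses: 7 payload bits per byte, low-order bits first, high bit set on every byte except the last. The sketch below is a generic, standalone illustration of that scheme, not Lucene's actual PostingsWriterBase or DataOutput code:

```java
import java.io.ByteArrayOutputStream;

public class VLongDemo {
    // Encode a non-negative long as a variable-length byte sequence:
    // 7 payload bits per byte, least-significant group first, continuation
    // bit (0x80) set on all bytes except the last.
    static void writeVLong(ByteArrayOutputStream out, long i) {
        while ((i & ~0x7FL) != 0L) {
            out.write((int) ((i & 0x7FL) | 0x80L));
            i >>>= 7;
        }
        out.write((int) i);
    }

    // Decode one value starting at pos[0]; pos[0] is advanced past the
    // bytes consumed so several values can be read back-to-back.
    static long readVLong(byte[] bytes, int[] pos) {
        long value = 0;
        int shift = 0;
        byte b;
        do {
            b = bytes[pos[0]++];
            value |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return value;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Two hypothetical per-term metadata values (e.g. file pointers)
        // written into a single byte stream.
        writeVLong(out, 42L);
        writeVLong(out, 1_234_567_890L);
        byte[] encoded = out.toByteArray();
        int[] pos = {0};
        System.out.println(readVLong(encoded, pos)); // 42
        System.out.println(readVLong(encoded, pos)); // 1234567890
    }
}
```

Since small values take fewer bytes, packing metadata this way is typically no larger than a fixed long[] and removes the need for a second metadata channel in the API.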
[jira] [Commented] (LUCENE-9098) Report problematic term value when fuzzy query is too complex
[ https://issues.apache.org/jira/browse/LUCENE-9098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012567#comment-17012567 ]

Jim Ferenczi commented on LUCENE-9098:
The CI found a reproducible failure with this change:

{noformat}
ant test -Dtestcase=TestFuzzyQuery -Dtests.method=testErrorMessage -Dtests.seed=CE3DF037C6D29401 -Dtests.slow=true -Dtests.locale=fr-GN -Dtests.timezone=US/Pacific-New -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
{noformat}

[~mdrob] can you take a look?

> Report problematic term value when fuzzy query is too complex
>
> Key: LUCENE-9098
> URL: https://issues.apache.org/jira/browse/LUCENE-9098
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/search
> Reporter: Mike Drob
> Assignee: Mike Drob
> Priority: Minor
> Fix For: master (9.0)
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> This is the Lucene complement to SOLR-13190: when fuzzy query gets a term that expands to too many states, we throw an exception but don't provide insight on the problematic term. We should improve the error reporting.
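The improvement the issue asks for is simply to carry the offending term in the exception message rather than throwing a bare "too complex" error. A minimal sketch of that pattern follows; the class, method, and limit here are hypothetical stand-ins, not Lucene's actual fuzzy-query code:

```java
public class TermErrorDemo {
    // Hypothetical stand-in for the "term expands to too many automaton
    // states" check. The point, as in LUCENE-9098, is that the error
    // message names the problematic term so users can find it in a large
    // query instead of guessing.
    static void checkExpansion(String term, int states, int maxStates) {
        if (states > maxStates) {
            throw new IllegalArgumentException(
                "Term is too complex to expand (" + states + " states > max "
                + maxStates + "): \"" + term + "\"");
        }
    }

    public static void main(String[] args) {
        try {
            // Simulated over-limit expansion for an illustrative term.
            checkExpansion("supercalifragilistic~2", 15000, 10000);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The same principle applies to any limit-enforcing code path: include the input that tripped the limit, plus the observed and allowed values, in the exception text.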
[jira] [Updated] (LUCENE-9077) Gradle build
[ https://issues.apache.org/jira/browse/LUCENE-9077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss updated LUCENE-9077:

Description:
This task focuses on providing a gradle-based build equivalent for Lucene and Solr (on the master branch). See notes below on why this respin is needed. The code lives on the *gradle-master* branch. It is kept in sync with *master*.

Try running the following to see an overview of helper guides concerning typical workflow, testing and ant-migration helpers:

gradlew :help

A list of items that need to be added or require work follows. If you'd like to work on any of these, please add your name to the list. Once you have a patch/pull request let me (dweiss) know - I'll try to coordinate the merges.

* (/) Apply forbiddenAPIs
* (/) Generate hardware-aware gradle defaults for parallelism (count of workers and test JVMs).
* (/) Fail the build if the --tests filter is applied and no tests execute during the entire build (this allows for an empty set of filtered tests at single project level).
* (/) Port other settings and randomizations from common-build.xml
* (/) Configure security policy/sandboxing for tests.
* (/) test's console output on -Ptests.verbose=true
* (/) add a :helpDeps explanation of how the dependency system works (palantir plugin, lockfile) and how to retrieve structured information about current dependencies of a given module (in a tree-like output).
* (/) jar checksums, jar checksum computation and validation. This should be done without intermediate folders (directly on dependency sets).
* (/) verify min. JVM version and exact gradle version on build startup to minimize odd build side-effects
* (/) Repro-line for failed tests/runs.
* (/) add a top-level README note about building with gradle (and the required JVM).
* (/) add an equivalent of 'validate-source-patterns' (check-source-patterns.groovy) to precommit.
* add an equivalent of 'rat-sources' to precommit.
* add an equivalent of 'check-example-lucene-match-version' (solr only) to precommit.
* add an equivalent of 'documentation-lint' to precommit.

Hard-to-implement stuff already investigated:

* (/) (done) -*Printing console output of failed tests.* There doesn't seem to be any way to do this in a reasonably efficient way. There are onOutput listeners but they're slow to operate and solr tests emit *tons* of output so it's overkill.-
* (!) (LUCENE-9120) *Tests working with security-debug logs or other JVM-early log output*. Gradle's test runner works by redirecting Java's stdout/stderr so this just won't work. Perhaps we can spin up the ant-based test runner for such corner cases.

Of lesser importance:

* add rendering of javadocs (gradlew javadoc) and attach them to maven publications.
* Add test 'beasting' (rerunning the same suite multiple times). I'm afraid it'll be difficult to run it sensibly because gradle doesn't offer cwd separation for the forked test runners.
* if you diff the solr packaged distribution against the ant-created distribution there are minor differences in library versions and some JARs are excluded/moved around. I didn't try to force these as everything seems to work (tests, etc.) - perhaps these differences should be fixed in the ant build instead.
* identify and port any other "check" utilities that may be called from ant. (Mark's branch has some of this stuff already implemented)
* [EOE] identify and port various "regenerate" tasks from ant builds (javacc, precompiled automata, etc.)
* fill in POM details in gradle/defaults-maven.gradle so that they reflect the previous content better (dependencies aside).
* Add any IDE integration layers that should be added (I use IntelliJ and it imports the project out of the box, without the need for any special tuning).
* *Clean up dependencies, especially for Solr*: any \{ transitive = false } should just explicitly exclude whatever they don't need (and their dependencies currently declared explicitly should be folded). Figure out which scope to import a dependency to.
* Add Solr packaging for docs/* (see TODO in packaging/build.gradle; currently XSLT...)
* I didn't bother adding Solr dist/test-framework to packaging (who'd use it from a binary distribution?)

*{color:#ff}Note:{color}* this builds on the work done by Mark Miller and Cao Mạnh Đạt but also applies lessons learned from those two efforts:

* *Do not try to do too many things at once*. If we deviate too far from master, the branch will be hard to merge.
* *Do everything in baby steps* and add small, independent build fragments replacing the old ant infrastructure.
* *Try to engage people to run, test and contribute early*. It can't be a one-man effort. The more people understand and can contribute to the build, the healthier it will be.

was: This task focuses on providing gradle-based build equivalent for Lucene and Solr (on ma
[GitHub] [lucene-solr] dweiss commented on a change in pull request #1157: Add RAT check using Gradle
dweiss commented on a change in pull request #1157: Add RAT check using Gradle
URL: https://github.com/apache/lucene-solr/pull/1157#discussion_r365184348

## File path: gradle/validation/rat-sources.gradle

+def ratClasspath = project.configurations.rat
+def projectPath = project.getRootDir().getAbsolutePath()

Review comment:
This is wrong I think. It should be project.projectDir
[GitHub] [lucene-solr] dweiss commented on a change in pull request #1157: Add RAT check using Gradle
dweiss commented on a change in pull request #1157: Add RAT check using Gradle
URL: https://github.com/apache/lucene-solr/pull/1157#discussion_r365185129

## File path: gradle/validation/rat-sources.gradle
[GitHub] [lucene-solr] dweiss commented on a change in pull request #1157: Add RAT check using Gradle
dweiss commented on a change in pull request #1157: Add RAT check using Gradle
URL: https://github.com/apache/lucene-solr/pull/1157#discussion_r365185988

## File path: gradle/validation/rat-sources.gradle
[jira] [Updated] (LUCENE-9077) Gradle build
[ https://issues.apache.org/jira/browse/LUCENE-9077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss updated LUCENE-9077:

Description:
This task focuses on providing a gradle-based build equivalent for Lucene and Solr (on the master branch). See notes below on why this respin is needed. The code lives on the *gradle-master* branch. It is kept in sync with *master*.

Try running the following to see an overview of helper guides concerning typical workflow, testing and ant-migration helpers:

gradlew :help

A list of items that need to be added or require work follows. If you'd like to work on any of these, please add your name to the list. Once you have a patch/pull request let me (dweiss) know - I'll try to coordinate the merges.

* (/) Apply forbiddenAPIs
* (/) Generate hardware-aware gradle defaults for parallelism (count of workers and test JVMs).
* (/) Fail the build if the --tests filter is applied and no tests execute during the entire build (this allows for an empty set of filtered tests at single project level).
* (/) Port other settings and randomizations from common-build.xml
* (/) Configure security policy/sandboxing for tests.
* (/) test's console output on -Ptests.verbose=true
* (/) add a :helpDeps explanation of how the dependency system works (palantir plugin, lockfile) and how to retrieve structured information about current dependencies of a given module (in a tree-like output).
* (/) jar checksums, jar checksum computation and validation. This should be done without intermediate folders (directly on dependency sets).
* (/) verify min. JVM version and exact gradle version on build startup to minimize odd build side-effects
* (/) Repro-line for failed tests/runs.
* (/) add a top-level README note about building with gradle (and the required JVM).
* (/) add an equivalent of 'validate-source-patterns' (check-source-patterns.groovy) to precommit.
* add an equivalent of 'rat-sources' to precommit.
* (/) add an equivalent of 'check-example-lucene-match-version' (solr only) to precommit.
* add an equivalent of 'documentation-lint' to precommit.

Hard-to-implement stuff already investigated:

* (/) (done) -*Printing console output of failed tests.* There doesn't seem to be any way to do this in a reasonably efficient way. There are onOutput listeners but they're slow to operate and solr tests emit *tons* of output so it's overkill.-
* (!) (LUCENE-9120) *Tests working with security-debug logs or other JVM-early log output*. Gradle's test runner works by redirecting Java's stdout/stderr so this just won't work. Perhaps we can spin up the ant-based test runner for such corner cases.

Of lesser importance:

* add rendering of javadocs (gradlew javadoc) and attach them to maven publications.
* Add test 'beasting' (rerunning the same suite multiple times). I'm afraid it'll be difficult to run it sensibly because gradle doesn't offer cwd separation for the forked test runners.
* if you diff the solr packaged distribution against the ant-created distribution there are minor differences in library versions and some JARs are excluded/moved around. I didn't try to force these as everything seems to work (tests, etc.) - perhaps these differences should be fixed in the ant build instead.
* identify and port any other "check" utilities that may be called from ant. (Mark's branch has some of this stuff already implemented)
* [EOE] identify and port various "regenerate" tasks from ant builds (javacc, precompiled automata, etc.)
* fill in POM details in gradle/defaults-maven.gradle so that they reflect the previous content better (dependencies aside).
* Add any IDE integration layers that should be added (I use IntelliJ and it imports the project out of the box, without the need for any special tuning).
* *Clean up dependencies, especially for Solr*: any \{ transitive = false } should just explicitly exclude whatever they don't need (and their dependencies currently declared explicitly should be folded). Figure out which scope to import a dependency to.
* Add Solr packaging for docs/* (see TODO in packaging/build.gradle; currently XSLT...)
* I didn't bother adding Solr dist/test-framework to packaging (who'd use it from a binary distribution?)

*{color:#ff}Note:{color}* this builds on the work done by Mark Miller and Cao Mạnh Đạt but also applies lessons learned from those two efforts:

* *Do not try to do too many things at once*. If we deviate too far from master, the branch will be hard to merge.
* *Do everything in baby steps* and add small, independent build fragments replacing the old ant infrastructure.
* *Try to engage people to run, test and contribute early*. It can't be a one-man effort. The more people understand and can contribute to the build, the healthier it will be.

was: This task focuses on providing gradle-based build equivalent for Lucene and Solr (o
[jira] [Commented] (SOLR-14158) package manager to read keys from packagestore and not ZK
[ https://issues.apache.org/jira/browse/SOLR-14158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012773#comment-17012773 ]

ASF subversion and git services commented on SOLR-14158:

Commit 6fb085943c6e9c6f82db67c6ccfe641e64e1899e in lucene-solr's branch refs/heads/gradle-master from Ishan Chattopadhyaya
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=6fb0859 ]

SOLR-14158: Package manager to read keys from package store, not ZK

> package manager to read keys from packagestore and not ZK
> --
>
> Key: SOLR-14158
> URL: https://issues.apache.org/jira/browse/SOLR-14158
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Components: packages
> Affects Versions: 8.4
> Reporter: Noble Paul
> Assignee: Noble Paul
> Priority: Blocker
> Labels: packagemanager
> Fix For: 8.4.1
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> The security of the package system relies on securing ZK. It's much easier for users to secure the file system than to secure ZK. We provide an option to read public keys from the file store. This will:
> * Have a special directory called {{_trusted_}}. Direct writes to that directory over http are forbidden.
> * The CLI directly writes the keys to the {{/filestore/_trusted_/keys/}} directory. Other nodes are asked to fetch the public key files from that node.
> * Package artifacts will continue to be uploaded over http.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-11554) Support handling OPTIONS request for Hadoop authentication filter
[ https://issues.apache.org/jira/browse/SOLR-11554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gézapeti updated SOLR-11554:
Attachment: SOLR-11554.patch

> Support handling OPTIONS request for Hadoop authentication filter
> --
>
> Key: SOLR-11554
> URL: https://issues.apache.org/jira/browse/SOLR-11554
> Project: Solr
> Issue Type: Bug
> Affects Versions: 6.4
> Reporter: Hrishikesh Gadre
> Priority: Minor
> Attachments: SOLR-11554.patch
>
> As part of SOLR-9513 we added a Solr authentication plugin which uses the Hadoop security framework. The HTTP client interface provided by the Hadoop framework does not send the authentication information preemptively. Instead it sends an OPTIONS request first; if the server responds with a 401 error, it resends the request with the proper authentication information. This jira is to handle the OPTIONS request as part of the Solr authentication plugin for Hadoop.
[jira] [Created] (SOLR-14182) Move metric reporters config from solr.xml to ZK cluster properties
Andrzej Bialecki created SOLR-14182:
---

Summary: Move metric reporters config from solr.xml to ZK cluster properties
Key: SOLR-14182
URL: https://issues.apache.org/jira/browse/SOLR-14182
Project: Solr
Issue Type: Improvement
Security Level: Public (Default Security Level. Issues are Public)
Affects Versions: 8.4
Reporter: Andrzej Bialecki
Assignee: Andrzej Bialecki
Fix For: 8.5

Metric reporters are currently configured statically in solr.xml, which makes them difficult to change dynamically or in a containerized environment. We should move this section to ZK /cluster.properties and add a back-compat migration shim.
[jira] [Commented] (SOLR-14130) Add postlogs command line tool for indexing Solr logs
[ https://issues.apache.org/jira/browse/SOLR-14130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012875#comment-17012875 ]

ASF subversion and git services commented on SOLR-14130:

Commit d68f3e1a441b39485900e9d94e9686fb12b4ff87 in lucene-solr's branch refs/heads/master from Joel Bernstein
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=d68f3e1 ]

SOLR-14130: Improve robustness of the logs parser

> Add postlogs command line tool for indexing Solr logs
> --
>
> Key: SOLR-14130
> URL: https://issues.apache.org/jira/browse/SOLR-14130
> Project: Solr
> Issue Type: Task
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Joel Bernstein
> Assignee: Joel Bernstein
> Priority: Major
> Attachments: SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, Screen Shot 2019-12-19 at 2.04.41 PM.png, Screen Shot 2019-12-19 at 2.16.01 PM.png, Screen Shot 2019-12-19 at 2.35.41 PM.png, Screen Shot 2019-12-21 at 8.46.51 AM.png
>
> This ticket adds a simple command line tool for posting Solr logs to a Solr index. The tool works with the out-of-the-box Solr log format. Still a work in progress, but it currently indexes:
> * queries
> * updates
> * commits
> * new searchers
> * errors - including stack traces
>
> Attached are some sample visualizations using Solr Streaming Expressions and Math Expressions after the data has been loaded. The visualizations show time series, scatter plots, histograms and quantile plots, but really this is just scratching the surface of the visualizations that can be done with the Solr logs.
[jira] [Commented] (SOLR-14130) Add postlogs command line tool for indexing Solr logs
[ https://issues.apache.org/jira/browse/SOLR-14130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012881#comment-17012881 ]

ASF subversion and git services commented on SOLR-14130:

Commit 1cb085afcbc04e861b76955bfd4944141c47d1ad in lucene-solr's branch refs/heads/branch_8x from Joel Bernstein
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=1cb085a ]

SOLR-14130: Improve robustness of the logs parser

> Add postlogs command line tool for indexing Solr logs
> --
>
> Key: SOLR-14130
> URL: https://issues.apache.org/jira/browse/SOLR-14130
> Project: Solr
> Issue Type: Task
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Joel Bernstein
> Assignee: Joel Bernstein
> Priority: Major
> Attachments: SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, Screen Shot 2019-12-19 at 2.04.41 PM.png, Screen Shot 2019-12-19 at 2.16.01 PM.png, Screen Shot 2019-12-19 at 2.35.41 PM.png, Screen Shot 2019-12-21 at 8.46.51 AM.png
>
> This ticket adds a simple command line tool for posting Solr logs to a Solr index. The tool works with the out-of-the-box Solr log format. Still a work in progress, but it currently indexes:
> * queries
> * updates
> * commits
> * new searchers
> * errors - including stack traces
>
> Attached are some sample visualizations using Solr Streaming Expressions and Math Expressions after the data has been loaded. The visualizations show time series, scatter plots, histograms and quantile plots, but really this is just scratching the surface of the visualizations that can be done with the Solr logs.
[jira] [Commented] (SOLR-13892) Add postfilter support to {!join} queries
[ https://issues.apache.org/jira/browse/SOLR-13892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012909#comment-17012909 ]

ASF subversion and git services commented on SOLR-13892:

Commit 4712524860553504a0557810eebc43db54bb8ce9 in lucene-solr's branch refs/heads/jira/SOLR-13892 from Jason Gerlowski
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=4712524 ]

SOLR-13892: Add "join" postfilter implementation

> Add postfilter support to {!join} queries
> --
>
> Key: SOLR-13892
> URL: https://issues.apache.org/jira/browse/SOLR-13892
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Components: query parsers
> Affects Versions: master (9.0)
> Reporter: Jason Gerlowski
> Assignee: Jason Gerlowski
> Priority: Major
> Attachments: SOLR-13892.patch, SOLR-13892.patch
>
> The JoinQParserPlugin would be a lot more performant in many use-cases if it could operate as a post-filter, especially when doc-values for the involved fields are available.
> With this issue, I'd like to propose a post-filter implementation for the {{join}} qparser.
[jira] [Created] (LUCENE-9125) Improve Automaton.step() with binary search and introduce Automaton.next()
Bruno Roustant created LUCENE-9125:
--

Summary: Improve Automaton.step() with binary search and introduce Automaton.next()
Key: LUCENE-9125
URL: https://issues.apache.org/jira/browse/LUCENE-9125
Project: Lucene - Core
Issue Type: Improvement
Reporter: Bruno Roustant

Implement the existing TODO in Automaton.step() (look up a transition from a source state depending on a given label) to use binary search, since the transitions are sorted.

Introduce a new method Automaton.next() to optimize iteration & lookup over all the transitions of a state. This will be used in the RunAutomaton constructor and in MinimizationOperations.minimize().
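To make the described lookup concrete, here is a minimal, hypothetical sketch of binary search over a state's transitions, assuming they are sorted by minimum label and non-overlapping (as in a deterministic automaton). The class and array layout are illustrative only, not Lucene's actual Automaton code:

```java
// Simplified model: transition i accepts labels in [min[i], max[i]]
// and leads to state dest[i]. Transitions are sorted by min and do
// not overlap, so a containing interval can be found by binary search.
public class TransitionLookup {
    final int[] min, max, dest;

    TransitionLookup(int[] min, int[] max, int[] dest) {
        this.min = min; this.max = max; this.dest = dest;
    }

    /** Returns the destination state for label, or -1 if no transition matches. */
    int step(int label) {
        int lo = 0, hi = min.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (max[mid] < label) {
                lo = mid + 1;      // transition entirely below label
            } else if (min[mid] > label) {
                hi = mid - 1;      // transition entirely above label
            } else {
                return dest[mid];  // min[mid] <= label <= max[mid]
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Transitions: [a-c]->1, [f-h]->2, [m-z]->3
        TransitionLookup t = new TransitionLookup(
            new int[] {'a', 'f', 'm'},
            new int[] {'c', 'h', 'z'},
            new int[] {1, 2, 3});
        System.out.println(t.step('b')); // 1
        System.out.println(t.step('g')); // 2
        System.out.println(t.step('k')); // -1
    }
}
```

This is the O(log n) replacement for a linear scan over the transition array; the ticket's Automaton.next() additionally exploits callers that look labels up in increasing order.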
[GitHub] [lucene-solr] gerlowskija commented on issue #1159: SOLR-13892: Add top-level docValues "join" implementation
gerlowskija commented on issue #1159: SOLR-13892: Add top-level docValues "join" implementation URL: https://github.com/apache/lucene-solr/pull/1159#issuecomment-573054962 This is not ready yet. But I figured it was far enough along to let others provide in-line review if they'd like. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] gerlowskija opened a new pull request #1159: SOLR-13892: Add top-level docValues "join" implementation
gerlowskija opened a new pull request #1159: SOLR-13892: Add top-level docValues "join" implementation
URL: https://github.com/apache/lucene-solr/pull/1159

# Description

Many "join" use-cases can be made more performant by using "top-level" docValues data structures instead of the per-segment structures that are used currently. Users should have the ability to pick between top-level and per-segment, based on the particulars of their index and use case.

# Solution

This PR introduces a "top-level" implementation in the form of a "join" postfilter. Users get the "top-level" behavior by specifying `cache=false cost=101` as local params on their join. We may decide to repackage this implementation as a Two-Phase Iterator before merging, though that is still up in the air.

# Tests

Functional tests have been added in TestJoin.java. Performance tests validating the performance improvement in select use-cases can be found in the comments on SOLR-13892.

# Checklist

Please review the following and check all that apply:

- [X] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability.
- [X] I have created a Jira issue and added the issue ID to my pull request title.
- [X] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
- [X] I have developed this patch against the `master` branch.
- [X] I have run `ant precommit` and the appropriate test suite.
- [X] I have added tests for my changes.
- [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only).
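Based on the PR description, a join filter opting into the top-level postfilter implementation would look something like the following (the field names and query here are hypothetical, made up for illustration):

```text
fq={!join from=manu_id_s to=id cache=false cost=101}inStock:true
```

In Solr, `cache=false` combined with a `cost` of 100 or more is the conventional signal that a filter query should run as a PostFilter (the same mechanism used by the collapse parser), which is presumably why the PR picks `cost=101`.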
[GitHub] [lucene-solr] gerlowskija edited a comment on issue #1159: SOLR-13892: Add top-level docValues "join" implementation
gerlowskija edited a comment on issue #1159: SOLR-13892: Add top-level docValues "join" implementation
URL: https://github.com/apache/lucene-solr/pull/1159#issuecomment-573054962

This is not ready yet, but I figured it was far enough along to let others provide in-line review if they'd like. Still needed:

- review of tests
- clarify/unify ref-guide coverage on types of joins and when each should be used
- minor cleanup
[GitHub] [lucene-solr] zsgyulavari commented on issue #1144: SOLR-13756 updated restlet mvn repository url.
zsgyulavari commented on issue #1144: SOLR-13756 updated restlet mvn repository url.
URL: https://github.com/apache/lucene-solr/pull/1144#issuecomment-573055429

@joel-bernstein Cloudera is hosting a mirror of the restlet repository now, so I've added the new repo url instead of the old one, which has not been in use since 6.6/7.0. (The dependent feature was added in https://issues.apache.org/jira/browse/SOLR-1301 and removed in https://issues.apache.org/jira/browse/SOLR-9221.)
[GitHub] [lucene-solr] bruno-roustant opened a new pull request #1160: LUCENE-9125: Improve Automaton.step() with binary search
bruno-roustant opened a new pull request #1160: LUCENE-9125: Improve Automaton.step() with binary search
URL: https://github.com/apache/lucene-solr/pull/1160

and introduce Automaton.next().
[jira] [Commented] (SOLR-13756) ivy cannot download org.restlet.ext.servlet jar
[ https://issues.apache.org/jira/browse/SOLR-13756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012918#comment-17012918 ]

Zsolt Gyulavari commented on SOLR-13756:

[~jbernste] I'm happy to announce that Cloudera is hosting a mirror of the restlet repository from now on. I've added the new repo url instead of the old one, which has not been in use since 6.6/7.0. (The dependent feature was added in https://issues.apache.org/jira/browse/SOLR-1301 and removed in https://issues.apache.org/jira/browse/SOLR-9221.)

> ivy cannot download org.restlet.ext.servlet jar
> --
>
> Key: SOLR-13756
> URL: https://issues.apache.org/jira/browse/SOLR-13756
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Chongchen Chen
> Priority: Major
> Time Spent: 20m
> Remaining Estimate: 0h
>
> I checked out the project and ran `ant idea`, which tries to download jars, but https://repo1.maven.org/maven2/org/restlet/jee/org.restlet.ext.servlet/2.3.0/org.restlet.ext.servlet-2.3.0.jar now returns a 404:
> [ivy:retrieve] public: tried
> [ivy:retrieve] https://repo1.maven.org/maven2/org/restlet/jee/org.restlet.ext.servlet/2.3.0/org.restlet.ext.servlet-2.3.0.jar
> [ivy:retrieve] :: FAILED DOWNLOADS ::
> [ivy:retrieve] :: ^ see resolution messages for details ^ ::
> [ivy:retrieve] :: org.restlet.jee#org.restlet;2.3.0!org.restlet.jar
> [ivy:retrieve] :: org.restlet.jee#org.restlet.ext.servlet;2.3.0!org.restlet.ext.servlet.jar
[jira] [Commented] (LUCENE-9125) Improve Automaton.step() with binary search and introduce Automaton.next()
[ https://issues.apache.org/jira/browse/LUCENE-9125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012926#comment-17012926 ]

Bruno Roustant commented on LUCENE-9125:

I benchmarked using non-trivial automata for automaton/fuzzy queries. Making Automaton.step() use binary search reduces step() call time by 20% (and obviously it becomes O(log n)).

By introducing Automaton.next() to iterate & look up more efficiently through a state's transitions, I measured 40% fewer binary-search loop executions. This is a net gain with the same functional logic, because each time we increase the lower bound for the binary search instead of always starting from the first transition.

This will result in faster AutomatonQuery and FuzzyQuery construction.

> Improve Automaton.step() with binary search and introduce Automaton.next()
> --
>
> Key: LUCENE-9125
> URL: https://issues.apache.org/jira/browse/LUCENE-9125
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Bruno Roustant
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Implement the existing todo in Automaton.step() (lookup a transition from a source state depending on a given label) to use binary search since the transitions are sorted.
> Introduce new method Automaton.next() to optimize iteration & lookup over all the transitions of a state. This will be used in RunAutomaton constructor and in MinimizationOperations.minimize().
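The "increase the lower bound" idea in the comment above can be sketched as follows. This is a hypothetical toy model, not Lucene's Automaton.next(): when labels are looked up in non-decreasing order, each binary search can start from the transition found for the previous label rather than from index 0, shrinking the searched range over time:

```java
// Toy model: transitions sorted by min label, non-overlapping.
// next() remembers where the previous lookup landed and never
// searches below that point again, so repeated lookups over
// increasing labels execute fewer binary-search iterations overall.
public class IncreasingLookup {
    final int[] min, max, dest;
    int from = 0;  // lower bound reused across calls

    IncreasingLookup(int[] min, int[] max, int[] dest) {
        this.min = min; this.max = max; this.dest = dest;
    }

    /** Labels must be passed in non-decreasing order. Returns dest state or -1. */
    int next(int label) {
        int lo = from, hi = min.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (max[mid] < label) lo = mid + 1;
            else if (min[mid] > label) hi = mid - 1;
            else { from = mid; return dest[mid]; }  // remember as next lower bound
        }
        from = lo;  // transitions below lo are too small for any future label
        return -1;
    }

    public static void main(String[] args) {
        // Transitions: [a-c]->1, [f-h]->2, [m-z]->3
        IncreasingLookup it = new IncreasingLookup(
            new int[] {'a', 'f', 'm'},
            new int[] {'c', 'h', 'z'},
            new int[] {1, 2, 3});
        System.out.println(it.next('b')); // 1
        System.out.println(it.next('k')); // -1
        System.out.println(it.next('p')); // 3
    }
}
```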
[jira] [Updated] (SOLR-13892) Add postfilter support to {!join} queries
[ https://issues.apache.org/jira/browse/SOLR-13892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Gerlowski updated SOLR-13892:
---
Attachment: join-increasing-from-matches1.png

> Add postfilter support to {!join} queries
> --
>
> Key: SOLR-13892
> URL: https://issues.apache.org/jira/browse/SOLR-13892
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Components: query parsers
> Affects Versions: master (9.0)
> Reporter: Jason Gerlowski
> Assignee: Jason Gerlowski
> Priority: Major
> Attachments: SOLR-13892.patch, SOLR-13892.patch, join-increasing-from-matches1.png
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> The JoinQParserPlugin would be a lot more performant in many use-cases if it could operate as a post-filter, especially when doc-values for the involved fields are available.
> With this issue, I'd like to propose a post-filter implementation for the {{join}} qparser.
[jira] [Commented] (LUCENE-9004) Approximate nearest vector search
[ https://issues.apache.org/jira/browse/LUCENE-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012942#comment-17012942 ] Tomoko Uchida commented on LUCENE-9004: --- Hi, it seems that some devs are strongly interested in this issue and I privately have received feedback (and expectations). So I just wanted to share my latest WIP branch. [https://github.com/mocobeta/lucene-solr-mirror/commits/jira/LUCENE-9004-aknn-2] And an usage code snippet for that is: [https://gist.github.com/mocobeta/a5b18506ebc933c0afa7ab61d1dd2295] I introduced a brand new codecs and indexer for vector search so this no longer depends on DocValues, though it's still on pretty early stage (especially, segment merging is not yet implemented). I intend to continue to work and I'll do my best, but to be honest I am not sure if my approach is the best - or I can create a great patch that can be merged to Lucene core... I welcome that someone takes over it in some different, more sophisticated/efficient ways. My current attempt might be useful as a reference or the starting point. > Approximate nearest vector search > - > > Key: LUCENE-9004 > URL: https://issues.apache.org/jira/browse/LUCENE-9004 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Michael Sokolov >Priority: Major > Attachments: hnsw_layered_graph.png > > > "Semantic" search based on machine-learned vector "embeddings" representing > terms, queries and documents is becoming a must-have feature for a modern > search engine. SOLR-12890 is exploring various approaches to this, including > providing vector-based scoring functions. This is a spinoff issue from that. > The idea here is to explore approximate nearest-neighbor search. Researchers > have found an approach based on navigating a graph that partially encodes the > nearest neighbor relation at multiple scales can provide accuracy > 95% (as > compared to exact nearest neighbor calculations) at a reasonable cost. 
This > issue will explore implementing HNSW (hierarchical navigable small-world) > graphs for the purpose of approximate nearest vector search (often referred > to as KNN or k-nearest-neighbor search). > At a high level the way this algorithm works is this. First assume you have a > graph that has a partial encoding of the nearest neighbor relation, with some > short and some long-distance links. If this graph is built in the right way > (has the hierarchical navigable small world property), then you can > efficiently traverse it to find nearest neighbors (approximately) in log N > time where N is the number of nodes in the graph. I believe this idea was > pioneered in [1]. The great insight in that paper is that if you use the > graph search algorithm to find the K nearest neighbors of a new document > while indexing, and then link those neighbors (undirectedly, ie both ways) to > the new document, then the graph that emerges will have the desired > properties. > The implementation I propose for Lucene is as follows. We need two new data > structures to encode the vectors and the graph. We can encode vectors using a > light wrapper around {{BinaryDocValues}} (we also want to encode the vector > dimension and have efficient conversion from bytes to floats). For the graph > we can use {{SortedNumericDocValues}} where the values we encode are the > docids of the related documents. Encoding the interdocument relations using > docids directly will make it relatively fast to traverse the graph since we > won't need to look up through an id-field indirection. This choice limits us > to building a graph-per-segment since it would be impractical to maintain a > global graph for the whole index in the face of segment merges. However > graph-per-segment is very natural at search time - we can traverse each > segment's graph independently and merge results as we do today for term-based > search. > At index time, however, merging graphs is somewhat challenging.
While > indexing we build a graph incrementally, performing searches to construct > links among neighbors. When merging segments we must construct a new graph > containing elements of all the merged segments. Ideally we would somehow > preserve the work done when building the initial graphs, but at least as a > start I'd propose we construct a new graph from scratch when merging. The > process is going to be limited, at least initially, to graphs that can fit > in RAM since we require random access to the entire graph while constructing > it: In order to add links bidirectionally we must continually update existing > documents. > I think we want to express this API to users as a single joint > {{KnnGraphField}} abstraction that joins together the vectors and the graph > as a single joi
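The insert-by-search recipe described above (use the graph search to find the K nearest neighbors of each new document, then link them undirectedly) can be sketched with a toy single-layer graph. This is only an illustrative simplification of the idea, not the proposed Lucene implementation: real HNSW adds hierarchy layers and neighbor pruning, and the class/method names and fixed entry point here are made up.

```java
import java.util.*;

// Toy single-layer navigable-small-world index: each new vector is linked
// (both ways) to the k nearest neighbors found by greedy graph search.
// Hypothetical sketch only; real HNSW adds layers and neighbor pruning.
class ToyNswIndex {
    final List<float[]> vectors = new ArrayList<>();
    final List<Set<Integer>> neighbors = new ArrayList<>();
    final int k;

    ToyNswIndex(int k) { this.k = k; }

    static float dist(float[] a, float[] b) {
        float s = 0;
        for (int i = 0; i < a.length; i++) { float d = a[i] - b[i]; s += d * d; }
        return s;  // squared Euclidean distance
    }

    // Best-first traversal from an entry node; returns up to k closest doc ids.
    List<Integer> search(float[] q, int entry) {
        PriorityQueue<Integer> frontier =
            new PriorityQueue<>(Comparator.comparingDouble(i -> dist(q, vectors.get(i))));
        Set<Integer> visited = new HashSet<>();
        List<Integer> reached = new ArrayList<>();
        frontier.add(entry);
        visited.add(entry);
        while (!frontier.isEmpty() && reached.size() < 4 * k) {  // small beam
            int cur = frontier.poll();
            reached.add(cur);
            for (int nb : neighbors.get(cur)) {
                if (visited.add(nb)) frontier.add(nb);
            }
        }
        reached.sort(Comparator.comparingDouble(i -> dist(q, vectors.get(i))));
        return reached.subList(0, Math.min(k, reached.size()));
    }

    void add(float[] v) {
        int id = vectors.size();
        vectors.add(v);
        neighbors.add(new HashSet<>());
        if (id == 0) return;                   // first node: nothing to link to
        for (int nb : search(v, 0)) {          // search current graph for neighbors
            neighbors.get(id).add(nb);         // link undirectedly, i.e. both ways
            neighbors.get(nb).add(id);
        }
    }

    public static void main(String[] args) {
        ToyNswIndex idx = new ToyNswIndex(2);
        idx.add(new float[]{0, 0});
        idx.add(new float[]{1, 0});
        idx.add(new float[]{0, 1});
        idx.add(new float[]{10, 10});
        System.out.println(idx.search(new float[]{9, 9}, 0));  // nearest ids first
    }
}
```

The key property is visible even in this toy: insertion reuses the same search routine as querying, so the graph's short- and long-range links emerge from the insertion order rather than from any global construction step.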
[jira] [Comment Edited] (LUCENE-9004) Approximate nearest vector search
[ https://issues.apache.org/jira/browse/LUCENE-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012942#comment-17012942 ] Tomoko Uchida edited comment on LUCENE-9004 at 1/10/20 3:02 PM: Hi, it seems that some devs are strongly interested in this issue and I have privately received feedback (and expectations). So I just wanted to share my latest WIP branch. [https://github.com/mocobeta/lucene-solr-mirror/commits/jira/LUCENE-9004-aknn-2] And here is a usage code snippet for it: [https://gist.github.com/mocobeta/a5b18506ebc933c0afa7ab61d1dd2295] I introduced a brand-new codec and indexer for vector search, so this no longer depends on DocValues, though it's still at a pretty early stage (in particular, segment merging is not yet implemented). I intend to continue working on it and I'll do my best, but to be honest I am not sure my approach is the best, or that I can create a great patch that can be merged into Lucene core... I'd welcome someone taking it over in different, more sophisticated/efficient ways. My current attempt might be useful as a reference or a starting point. > Approximate nearest vector search > - > > Key: LUCENE-9004 > URL: https://issues.apache.org/jira/browse/LUCENE-9004 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Michael Sokolov >Priority: Major > Attachments: hnsw_layered_graph.png
[jira] [Commented] (SOLR-13892) Add postfilter support to {!join} queries
[ https://issues.apache.org/jira/browse/SOLR-13892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012945#comment-17012945 ] Jason Gerlowski commented on SOLR-13892: Revisiting this after some time away. I've moved the code to a PR here: https://github.com/apache/lucene-solr/pull/1159. I've found this confusing on some jiras, so to be explicit: *all patches on this jira predate the PR and are out of date*. The main (albeit, temporary) addition to the PR at this point is a performance test driver I threw together to show a use-case where the top-level DV approach shines performance-wise. The performance test mimics a common setup for doing document-level authorization: a "user_acls" collection has users and the groups they belong to, and a "products" collection has product records with a field representing the groups that record is visible to. The performance test stages the "user_acls" data with 100 users (user1, user2 ...user100): each belonging to an increasing number of groups. This lets us show one advantage of the top-level DV approach: it scales much better with the number of "from" matches. !join-increasing-from-matches1.png! The takeaway here isn't that the top-level approach is better or worse than existing approaches. This perf test is only one specific use case after all. But it's pretty clear that it serves some use-cases better and it's worth getting in. Next steps for this jira are: * consider Two-Phase Iterator instead of postfilter. (I'm not sure TPI makes as much sense here as it did on SOLR-13890, but still thinking through some of the lessons learned there.) * cleanup * clarify (unify?) ref-guide coverage on different joins, and when each can/should be used. > Add postfilter support to {!join} queries > - > > Key: SOLR-13892 > URL: https://issues.apache.org/jira/browse/SOLR-13892 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. 
Issues are Public) > Components: query parsers >Affects Versions: master (9.0) >Reporter: Jason Gerlowski >Assignee: Jason Gerlowski >Priority: Major > Attachments: SOLR-13892.patch, SOLR-13892.patch, > join-increasing-from-matches1.png > > Time Spent: 0.5h > Remaining Estimate: 0h > > The JoinQParserPlugin would be a lot more performant in many use-cases if it could > operate as a post-filter, especially when doc-values for the involved fields > are available. > With this issue, I'd like to propose a post-filter implementation for the > {{join}} qparser. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
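The ACL perf-test scenario above is essentially the postfilter pattern: run the cheap main query first, then apply the expensive membership check only to documents that survived it, never to the whole index. A conceptual sketch of that pattern in plain Java (this is NOT Solr's actual PostFilter/DelegatingCollector API; class names and data here are made up):

```java
import java.util.*;
import java.util.function.IntPredicate;

// Toy illustration of the postfilter pattern: the expensive per-document
// check (here, an ACL visibility test) runs only on documents the cheaper
// main query already matched.
class PostFilterSketch {
    static List<Integer> postFilter(List<Integer> mainQueryMatches, IntPredicate expensiveCheck) {
        List<Integer> out = new ArrayList<>();
        for (int doc : mainQueryMatches) {
            if (expensiveCheck.test(doc)) out.add(doc);  // runs once per match
        }
        return out;
    }

    public static void main(String[] args) {
        // Doc ids visible to the current user's groups (hypothetical data).
        Set<Integer> visible = Set.of(2, 5, 7);
        List<Integer> matches = List.of(1, 2, 3, 5, 8);
        System.out.println(postFilter(matches, visible::contains));  // prints [2, 5]
    }
}
```

This is also why the cost scales with the number of query matches rather than with the number of "from"-side ACL entries, which is the axis the perf test above varies.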
[jira] [Created] (LUCENE-9126) Javadoc linting options silently swallow documentation errors
Dawid Weiss created LUCENE-9126: --- Summary: Javadoc linting options silently swallow documentation errors Key: LUCENE-9126 URL: https://issues.apache.org/jira/browse/LUCENE-9126 Project: Lucene - Core Issue Type: Bug Reporter: Dawid Weiss I tried to compile javadocs in gradle and I couldn't do it... The output was full of errors. I eventually narrowed the problem down to lint options – how they are interpreted and parsed just doesn't make any sense to me. Try this: {code} # Examples below use plain javadoc from Java 11. cd lucene/core {code} This emulates what we have in Ant (this is roughly the options Ant emits): {code} javadoc -d build\output -encoding "UTF-8" -sourcepath src\java -subpackages org -quiet -Xdoclint:all -Xdoclint:-missing -Xdoclint:-accessibility => no errors. {code} Now rerun it with this syntax: {code} javadoc -d build\output -encoding "UTF-8" -sourcepath src\java -subpackages org -quiet -Xdoclint:all,-missing,-accessibility => 100 errors, 5 warnings {code} This time javadoc displays errors about undefined tags (unknown tag: lucene.experimental), HTML warnings (warning: empty tag), etc. 
Let's add our custom tags and an overview file: {code} javadoc -overview "src/java/overview.html" -tag "lucene.experimental:a:xxx" -tag "lucene.internal:a:xxx" -tag "lucene.spi:t:xxx" -d build\output -encoding "UTF-8" -sourcepath src\java -subpackages org -quiet -Xdoclint:all,-missing,-accessibility => 100 errors, 5 warnings => still HTML warnings {code} Let's get rid of html linting: {code} javadoc -overview "src/java/overview.html" -tag "lucene.experimental:a:xxx" -tag "lucene.internal:a:xxx" -tag "lucene.spi:t:xxx" -d build\output -encoding "UTF-8" -sourcepath src\java -subpackages org -quiet -Xdoclint:all,-missing,-accessibility,-html => 3 errors => malformed HTML syntax in overview.html: src\java\overview.html:150: error: bad use of '>' (>) {code} Finally, let's get rid of syntax linting: {code} javadoc -overview "src/java/overview.html" -tag "lucene.experimental:a:xxx" -tag "lucene.internal:a:xxx" -tag "lucene.spi:t:xxx" -d build\output -encoding "UTF-8" -sourcepath src\java -subpackages org -quiet -Xdoclint:all,-missing,-accessibility,-html,-syntax => passes {code} There are definitely bugs in our documentation -- look at the extra ">" in the overview file, for example: https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/overview.html#L150 What I can't understand is why the first syntax suppresses pretty much ALL the errors, including missing custom tag definitions. This should work, given what's written in [1]? [1] https://docs.oracle.com/en/java/javase/11/tools/javadoc.html
[jira] [Commented] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012946#comment-17012946 ] David Smiley commented on LUCENE-3069: -- FYI [~billy] the FSTOrd postings format is slated for removal in LUCENE-9116. It's a matter of maintenance, so if you or someone wishes to keep it around, some work is needed. Fortunately FSTPostingsFormat is staying. > Lucene should have an entirely memory resident term dictionary > -- > > Key: LUCENE-3069 > URL: https://issues.apache.org/jira/browse/LUCENE-3069 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index, core/search >Affects Versions: 4.0-ALPHA >Reporter: Simon Willnauer >Assignee: Han Jiang >Priority: Major > Labels: gsoc2013 > Fix For: 4.7 > > Attachments: LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, > LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, > LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, > LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, df-ttf-estimate.txt, > example.png > > > FST based TermDictionary has been a great improvement yet it still uses a > delta codec file for scanning to terms. Some environments have enough memory > available to keep the entire FST based term dict in memory. We should add a > TermDictionary implementation that encodes all needed information for each > term into the FST (custom fst.Output) and builds a FST from the entire term > not just the delta.
[jira] [Commented] (LUCENE-9116) Simplify postings API by removing long[] metadata
[ https://issues.apache.org/jira/browse/LUCENE-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012950#comment-17012950 ] David Smiley commented on LUCENE-9116: -- Indeed; the SolrTextTagger doesn't fundamentally require any particular postingsFormat but it beats so hard on the term dictionary that "FST50" performs really well. I approved your PR; changes look good. I love the simplification! I commented on LUCENE-3069, which introduced this format, to alert Watchers there about this matter. Let's be fully transparent about decisions to remove things. Let's not commit this for a week. > Simplify postings API by removing long[] metadata > - > > Key: LUCENE-9116 > URL: https://issues.apache.org/jira/browse/LUCENE-9116 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Priority: Minor > Time Spent: 40m > Remaining Estimate: 0h > > The postings API allows storing metadata about a term either in a long[] or > in a byte[]. This is unnecessary as all information could be encoded in the > byte[], which is what most codecs do in practice.
[jira] [Commented] (SOLR-13756) ivy cannot download org.restlet.ext.servlet jar
[ https://issues.apache.org/jira/browse/SOLR-13756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012967#comment-17012967 ] Joel Bernstein commented on SOLR-13756: --- I think that makes sense until the restlet artifacts make it to Maven Central, which they may never do (it's been 12 years and they haven't made it yet). [~uschindler], any thoughts on this? > ivy cannot download org.restlet.ext.servlet jar > --- > > Key: SOLR-13756 > URL: https://issues.apache.org/jira/browse/SOLR-13756 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Chongchen Chen >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > I check out the project and run `ant idea`; it tries to download jars. But > https://repo1.maven.org/maven2/org/restlet/jee/org.restlet.ext.servlet/2.3.0/org.restlet.ext.servlet-2.3.0.jar > will return 404 now. > [ivy:retrieve] public: tried > [ivy:retrieve] > https://repo1.maven.org/maven2/org/restlet/jee/org.restlet.ext.servlet/2.3.0/org.restlet.ext.servlet-2.3.0.jar > [ivy:retrieve]:: > [ivy:retrieve]:: FAILED DOWNLOADS:: > [ivy:retrieve]:: ^ see resolution messages for details ^ :: > [ivy:retrieve]:: > [ivy:retrieve]:: > org.restlet.jee#org.restlet;2.3.0!org.restlet.jar > [ivy:retrieve]:: > org.restlet.jee#org.restlet.ext.servlet;2.3.0!org.restlet.ext.servlet.jar > [ivy:retrieve]::
[GitHub] [lucene-solr] uschindler commented on issue #1144: SOLR-13756 updated restlet mvn repository url.
uschindler commented on issue #1144: SOLR-13756 updated restlet mvn repository url. URL: https://github.com/apache/lucene-solr/pull/1144#issuecomment-573084882 This is not everything. You also need to change the Pom.xml.template files. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] asfgit closed pull request #1146: SOLR-6613: TextField.analyzeMultiTerm does not throw an exception…
asfgit closed pull request #1146: SOLR-6613: TextField.analyzeMultiTerm does not throw an exception… URL: https://github.com/apache/lucene-solr/pull/1146
[jira] [Commented] (SOLR-6613) TextField.analyzeMultiTerm should not throw exception when analyzer returns no term
[ https://issues.apache.org/jira/browse/SOLR-6613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012981#comment-17012981 ] ASF subversion and git services commented on SOLR-6613: --- Commit 0b072ecedb93202a132612e72cd880fdcc51ea25 in lucene-solr's branch refs/heads/master from Bruno Roustant [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=0b072ec ] SOLR-6613: TextField.analyzeMultiTerm does not throw an exception when Analyzer returns no terms. (Bruno Roustant) Closes #1146 > TextField.analyzeMultiTerm should not throw exception when analyzer returns > no term > --- > > Key: SOLR-6613 > URL: https://issues.apache.org/jira/browse/SOLR-6613 > Project: Solr > Issue Type: Bug > Components: Schema and Analysis >Affects Versions: 4.3.1, 4.10.2, 6.0 >Reporter: Bruno Roustant >Assignee: Bruno Roustant >Priority: Major > Attachments: TestTextField.java > > Time Spent: 50m > Remaining Estimate: 0h > > In TextField.analyzeMultiTerm() > at line > try { > if (!source.incrementToken()) > throw new SolrException(); > The method should not throw an exception if there is no token, because having > no token is legitimate: all tokens may be filtered out (e.g. with a > blocking Filter such as StopFilter). > In this case it should simply return null (as it already returns null in some > cases, see the first line of the method). However, SolrQueryParserBase also needs to > be fixed to correctly handle null returned by TextField.analyzeMultiTerm(). > See the attached TestTextField for the corresponding new test class.
[GitHub] [lucene-solr] dweiss commented on a change in pull request #1160: LUCENE-9125: Improve Automaton.step() with binary search
dweiss commented on a change in pull request #1160: LUCENE-9125: Improve Automaton.step() with binary search URL: https://github.com/apache/lucene-solr/pull/1160#discussion_r365306207 ## File path: lucene/core/src/java/org/apache/lucene/util/automaton/Automaton.java ## @@ -658,22 +658,84 @@ public String toDot() { public int step(int state, int label) { assert state >= 0; assert label >= 0; -int trans = states[2*state]; -int limit = trans + 3*states[2*state+1]; -// TODO: we could do bin search; transitions are sorted -while (trans < limit) { - int dest = transitions[trans]; - int min = transitions[trans+1]; - int max = transitions[trans+2]; - if (min <= label && label <= max) { -return dest; +int stateIndex = 2 * state; +int firstTransitionIndex = states[stateIndex]; +int numTransitions = states[stateIndex + 1]; + +// Since transitions are sorted, +// binary search the transition for which label is within [minLabel, maxLabel]. +int low = 0; +int high = numTransitions - 1; +while (low <= high) { + int mid = (low + high) >>> 1; + int transitionIndex = firstTransitionIndex + 3 * mid; + int minLabel = transitions[transitionIndex + 1]; + if (minLabel > label) { +high = mid - 1; + } else { +int maxLabel = transitions[transitionIndex + 2]; +if (maxLabel < label){ + low = mid + 1; +} else { + return transitions[transitionIndex]; +} } - trans += 3; } - return -1; } + /** + * Looks for the next transition that matches the provided label, assuming determinism. + * + * This method is similar to {@link #step(int, int)} but is used more efficiently + * when iterating over multiple transitions from the same source state. It keeps + * the latest reached transition index in {@code transition.transitionUpto} so + * the next call to this method can continue from there instead of restarting + * from the first transition. + * + * @param transition The transition to start the lookup from (inclusive, using its + * {@link Transition#source} and {@link Transition#transitionUpto}). 
+ * It is updated with the matched transition; + or with {@link Transition#dest} = -1 if no match. + * @param label The codepoint to look up. + * @return The destination state; or -1 if no matching outgoing transition. + */ + public int next(Transition transition, int label) { +// Copy of step() method with +// - binary search 'low' bound initialized to transition.transitionUpto. +// - param transition .dest/.min/.max/.transitionUpto set to the matching transition. +assert transition.source >= 0; +assert label >= 0; +int stateIndex = 2 * transition.source; +int firstTransitionIndex = states[stateIndex]; +int numTransitions = states[stateIndex + 1]; + +// Since transitions are sorted, Review comment: can we extract the binary search subroutine into a separate method? It'd very likely inline anyway, and it'd result in less code.
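The extraction dweiss asks for could look roughly like the standalone sketch below. It reuses the patch's flat layout (transitions stored as [dest, minLabel, maxLabel] triplets sorted by minLabel, with `first` the offset of a state's first triplet); the helper's name and exact signature are made up for illustration.

```java
// Standalone sketch of the shared binary-search helper suggested in the review.
// Transitions are [dest, minLabel, maxLabel] triplets in a flat int[], sorted
// by minLabel. The 'low' parameter lets next() resume the search from
// transition.transitionUpto, while step() would simply pass 0.
class TransitionBinarySearch {
    /** Returns the offset of the triplet whose [min, max] contains label, or -1. */
    static int findTransition(int[] transitions, int first, int numTransitions,
                              int low, int label) {
        int high = numTransitions - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1;
            int t = first + 3 * mid;
            if (transitions[t + 1] > label) {          // minLabel > label
                high = mid - 1;
            } else if (transitions[t + 2] < label) {   // maxLabel < label
                low = mid + 1;
            } else {
                return t;                              // label within [min, max]
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // One state with three transitions: labels 0-9 -> 5, 10-19 -> 6, 20-29 -> 7.
        int[] transitions = {5, 0, 9, 6, 10, 19, 7, 20, 29};
        int t = findTransition(transitions, 0, 3, 0, 15);
        System.out.println(t >= 0 ? transitions[t] : -1);  // prints 6
    }
}
```

With this in place, step() reduces to a single call with low = 0 returning `transitions[t]` (or -1), and next() to a call with low = transition.transitionUpto followed by filling in the Transition fields from the returned offset.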
[jira] [Commented] (LUCENE-9116) Simplify postings API by removing long[] metadata
[ https://issues.apache.org/jira/browse/LUCENE-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013007#comment-17013007 ] Adrien Grand commented on LUCENE-9116: -- David, I'm worried that we might be setting a precedent here. Today we are very permissive when it comes to adding a new postings format to Lucene, like UniformSplit recently. I like it this way, it helps drive innovation, and hopefully some of the ideas of these experimental formats eventually get merged into the default codec like the pulsing optimization. I don't think we should make experimental formats harder to remove than to add, otherwise they get in the way of improving the default codec, which is wrong. I called out that I was removing these formats in a comment; I can send a notice to the dev list next time to give it more visibility. On a separate note, I think the Solr docs should be explicit that non-default codecs/formats are not supported backward-compatibility-wise. Users might otherwise be surprised to get corruption errors when upgrading to a new minor. > Simplify postings API by removing long[] metadata > - > > Key: LUCENE-9116 > URL: https://issues.apache.org/jira/browse/LUCENE-9116 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Priority: Minor > Time Spent: 40m > Remaining Estimate: 0h
[jira] [Commented] (SOLR-13756) ivy cannot download org.restlet.ext.servlet jar
[ https://issues.apache.org/jira/browse/SOLR-13756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013009#comment-17013009 ] Uwe Schindler commented on SOLR-13756: -- We do not need to change anything here, except changing the repository URLs of Cloudera. But very important: the resolve order should have Cloudera last if it also has stuff from other repos; otherwise it's not good security-wise, as I trust Maven Central more than Cloudera. But I would really like to keep the Cloudera local repo, as it contains only its own stuff. I think when restlet fixes its redirects we are fine with their server as well. We can't fix older releases. > ivy cannot download org.restlet.ext.servlet jar > --- > > Key: SOLR-13756 > URL: https://issues.apache.org/jira/browse/SOLR-13756 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Chongchen Chen >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h
[GitHub] [lucene-solr] uschindler commented on issue #1144: SOLR-13756 updated restlet mvn repository url.
uschindler commented on issue #1144: SOLR-13756 updated restlet mvn repository url. URL: https://github.com/apache/lucene-solr/pull/1144#issuecomment-573098779 Please don't change the symbolic names; only update the URLs. I am not fully happy with adding a repository that contains stuff mirrored from Maven Central. So the newly added Cloudera one should not contain anything except Cloudera stuff and restlet. Please don't add a mirror of significant parts of Maven Central!
[jira] [Commented] (SOLR-13756) ivy cannot download org.restlet.ext.servlet jar
[ https://issues.apache.org/jira/browse/SOLR-13756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013025#comment-17013025 ] Joel Bernstein commented on SOLR-13756: --- Restlet has fixed the redirects and I was able to do a maven build of [https://github.com/lucidworks/zeppelin-solr], which has dependencies on Solr. I believe the older versions of Solr can now be built as well. The restlet dependencies may become a problem again in the future. So having Cloudera hosting them as well is a good insurance policy. And agreed Cloudera should resolve last. > ivy cannot download org.restlet.ext.servlet jar > --- > > Key: SOLR-13756 > URL: https://issues.apache.org/jira/browse/SOLR-13756 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Chongchen Chen >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h
[GitHub] [lucene-solr] gerlowskija commented on a change in pull request #1151: SOLR-13890: Add "top-level" DVTQ implementation
gerlowskija commented on a change in pull request #1151: SOLR-13890: Add "top-level" DVTQ implementation URL: https://github.com/apache/lucene-solr/pull/1151#discussion_r365315479 ## File path: solr/core/src/java/org/apache/solr/search/TermsQParserPlugin.java ## @@ -88,7 +91,20 @@ Query makeFilter(String fname, BytesRef[] byteRefs) { docValuesTermsFilter {//on 4x this is FieldCacheTermsFilter but we use the 5x name any way @Override Query makeFilter(String fname, BytesRef[] byteRefs) { -return new DocValuesTermsQuery(fname, byteRefs);//constant scores +// TODO Further tune this heuristic number +return (byteRefs.length > 700) ? new TopLevelDocValuesTermsQuery(fname, byteRefs) : new DocValuesTermsQuery(fname, byteRefs); + } +}, +docValuesTermsFilterTopLevel { + @Override + Query makeFilter(String fname, BytesRef[] byteRefs) { +return new TopLevelDocValuesTermsQuery(fname, byteRefs); Review comment: I'd forgotten about this, done. FWIW I buy Joel's argument that most real-world use-cases are going to require `cache=false`. But I worry that users who want caching will be bitten by this. I guess it depends how hands-on you expect users to be with their query tuning. For hands-off users, this change is best. But for hands-on users, we've just created a "gotcha". I'll make sure to add this to the ref-guide docs at the least.
[GitHub] [lucene-solr] zsgyulavari commented on issue #1144: SOLR-13756 updated restlet mvn repository url.
zsgyulavari commented on issue #1144: SOLR-13756 updated restlet mvn repository url. URL: https://github.com/apache/lucene-solr/pull/1144#issuecomment-573104015 Thanks for the quick review. I will revert the symbolic name of the restlet repository, but for the Cloudera one it's not just a URL update but a whole different mirror, so it might be justified if it doesn't break something I'm unaware of. I'll have to check back next week about whether we can set up a new mirror just for the restlets and nothing else. The original Cloudera releases repo is no longer needed, IMHO. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-9127) index migration from 7 to 8 failing
Niranjan created LUCENE-9127: Summary: index migration from 7 to 8 failing Key: LUCENE-9127 URL: https://issues.apache.org/jira/browse/LUCENE-9127 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 8.4 Reporter: Niranjan We have been using Solr 4 for more than 16 years, and now it is time to upgrade. When we decided to upgrade, we started by migrating the index from 4 -> 5, 5 -> 6, and 6 -> 7, which worked as expected, but the 7 -> 8 step gives me errors. Exception in thread "main" org.apache.lucene.index.IndexFormatTooOldException: Format version is not supported (resource BufferedChecksumIndexInput(MMapIndexInput(path="/deploy/solrmaster/data/data//index/segments_4n"))): This index was initially created with Lucene 6.x while the current version is 8.4.0 and Lucene only supports reading the current and previous major versions.. This version of Lucene only supports indexes created with release 7.0 and later. The command I'm trying: {noformat} java -cp ./lucene-core-8.4.0.jar:./lucene-backward-codecs-8.4.0.jar org.apache.lucene.index.IndexUpgrader /deploy/solrmaster/data/data/job/index -verbose {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9127) index migration from 7 to 8 failing
[ https://issues.apache.org/jira/browse/LUCENE-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niranjan updated LUCENE-9127: - Description: we have been using solr4 for more than 16 year, now it is to time to upgrade, when we have decided to upgrade, started with migrating index from 4 -> 5, 5 -> 6, 6 -> 7, it was working as expected, but when it comes to 7 -> 8 , it give me errors. {noformat} Exception in thread "main" org.apache.lucene.index.IndexFormatTooOldException: Format version is not supported (resource BufferedChecksumIndexInput(MMapIndexInput(path="/deploy/solrmaster/data/data//index/segments_4n"))): This index was initially created with Lucene 6.x while the current version is 8.4.0 and Lucene only supports reading the current and previous major versions.. This version of Lucene only supports indexes created with release 7.0 and later.{noformat} {noformat} java -cp ./lucene-core-8.4.0.jar:./lucene-backward-codecs-8.4.0.jar org.apache.lucene.index.IndexUpgrader /deploy/solrmaster/data/data/job/index -verbose {noformat} was: we have been using solr4 for more than 16 year, now it is to time to upgrade, when we have decided to upgrade, started with migrating index from 4 -> 5, 5 -> 6, 6 -> 7, it was working as expected, but when it comes to 7 -> 8 , it give me errors. Exception in thread "main" org.apache.lucene.index.IndexFormatTooOldException: Format version is not supported (resource BufferedChecksumIndexInput(MMapIndexInput(path="/deploy/solrmaster/data/data//index/segments_4n"))): This index was initially created with Lucene 6.x while the current version is 8.4.0 and Lucene only supports reading the current and previous major versions.. This version of Lucene only supports indexes created with release 7.0 and later. 
command I'm trying {noformat} java -cp ./lucene-core-8.4.0.jar:./lucene-backward-codecs-8.4.0.jar org.apache.lucene.index.IndexUpgrader /deploy/solrmaster/data/data/job/index -verbose {noformat} > index migration from 7 to 8 failing > --- > > Key: LUCENE-9127 > URL: https://issues.apache.org/jira/browse/LUCENE-9127 > Project: Lucene - Core > Issue Type: Bug > Components: core/index >Affects Versions: 8.4 >Reporter: Niranjan >Priority: Major > > we have been using solr4 for more than 16 year, now it is to time to upgrade, > when we have decided to upgrade, started with migrating index from 4 -> 5, 5 > -> 6, 6 -> 7, it was working as expected, but when it comes to 7 -> 8 , it > give me errors. > {noformat} > Exception in thread "main" > org.apache.lucene.index.IndexFormatTooOldException: Format version is not > supported (resource > BufferedChecksumIndexInput(MMapIndexInput(path="/deploy/solrmaster/data/data//index/segments_4n"))): > This index was initially created with Lucene 6.x while the current version > is 8.4.0 and Lucene only supports reading the current and previous major > versions.. This version of Lucene only supports indexes created with release > 7.0 and later.{noformat} > {noformat} > java -cp ./lucene-core-8.4.0.jar:./lucene-backward-codecs-8.4.0.jar > org.apache.lucene.index.IndexUpgrader /deploy/solrmaster/data/data/job/index > -verbose {noformat} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] madrob commented on a change in pull request #1157: Add RAT check using Gradle
madrob commented on a change in pull request #1157: Add RAT check using Gradle URL: https://github.com/apache/lucene-solr/pull/1157#discussion_r365319087 ## File path: gradle/validation/rat-sources.gradle ## @@ -0,0 +1,167 @@ +import org.gradle.api.internal.project.IsolatedAntBuilder + +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +// This applies the Apache RAT plugin to our source and test files + +// Largely copied from Apache Kafka +apply plugin: RatPlugin +// This invocation needs to go out to each project instead of being here +rat { +} + +class RatTask extends DefaultTask { +@Input +List includes = [] + +@Input +List excludes = [] + +def reportDir = project.file('build/rat') +def xmlReport = new File(reportDir, 'rat-report.xml') + +def generateXmlReport(File reportDir) { +// Probably better to use the IsolatedAntBuilder if we can, but it seems to have issues with substringMatcher +// def antBuilder = services.get(IsolatedAntBuilder) + +def ratClasspath = project.configurations.rat +def projectPath = project.getRootDir().getAbsolutePath() +ant.taskdef(resource: 'org/apache/rat/anttasks/antlib.xml', classpath: ratClasspath.asPath) +ant.report(format: 'xml', reportFile: xmlReport, addDefaultLicenseMatchers: true) { +fileset(dir: projectPath) { +patternset { +includes.each { +include(name: it) +} +excludes.each { +exclude(name: it) +} +} +} + +// The license rules below were manually copied from lucene/common-build.xml, there is currently no mechanism to sync them + +// BSD 4-clause stuff (is disallowed below) +substringMatcher(licenseFamilyCategory: "BSD4 ", licenseFamilyName: "Original BSD License (with advertising clause)") { +pattern(substring: "All advertising materials") +} + +// BSD-like stuff +substringMatcher(licenseFamilyCategory: "BSD ", licenseFamilyName: "Modified BSD License") { +// brics automaton +pattern(substring: "Copyright (c) 2001-2009 Anders Moeller") +// snowball +pattern(substring: "Copyright (c) 2001, Dr Martin Porter") +// UMASS kstem +pattern(substring: "THIS SOFTWARE IS PROVIDED BY UNIVERSITY OF MASSACHUSETTS AND OTHER CONTRIBUTORS") +// Egothor +pattern(substring: "Egothor Software License version 1.00") +// JaSpell +pattern(substring: "Copyright (c) 2005 Bruno Martins") +// d3.js +pattern(substring: "THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS 
AND CONTRIBUTORS") +// highlight.js +pattern(substring: "THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS") +} + +// MIT-like +substringMatcher(licenseFamilyCategory: "MIT ", licenseFamilyName:"Modified BSD License") { +// ICU license +pattern(substring: "Permission is hereby granted, free of charge, to any person obtaining a copy") +} + +// Apache +substringMatcher(licenseFamilyCategory: "AL ", licenseFamilyName: "Apache") { +pattern(substring: "Licensed to the Apache Software Foundation (ASF) under") +// this is the old - school one under some files +pattern(substring: "Licensed under the Apache License, Version 2.0 (the "License")") +} + +substringMatcher(licenseFamilyCategory: "GEN ", licenseFamilyName: "Generated") { +// +pattern(substring: "Produced by GNUPLOT") +// +pattern(substring: "This file was generated automatically by the Snowball to Java compiler") +// +pattern(substring: "ANTLR GENERATED CODE") +} + +approvedLicense(familyName: "Apache"
[jira] [Commented] (LUCENE-9004) Approximate nearest vector search
[ https://issues.apache.org/jira/browse/LUCENE-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013041#comment-17013041 ] Michael Sokolov commented on LUCENE-9004: - [~tomoko] you are too modest! The [LUCENE-9004-aknn-2|https://github.com/mocobeta/lucene-solr-mirror/commits/jira/LUCENE-9004-aknn-2] is in pretty good shape! It is functionally correct now, implementing the hierarchical version of the paper cited in the LUCENE-9004 issue. Also, I believe with the patch I posted there, we now have merging working, and I think search across multiple segments falls out naturally since you implemented a Query that can be collected in the usual way across segments. I also did some comparisons with the C/C++ version in [https://github.com/nmslib/hnswlib], the reference implementation, and got similar overlap results with vanilla hyper-parameter choices, so I am pretty confident you have faithfully reproduced that algorithm. Now it has to be said that performance is not what we would want - it's quite a bit slower than the C/C++ version. I haven't had a chance to dig into the cause yet, but I suspect we could be helped by ensuring that vector computations are done using vectorized instructions. We might also be able to reduce object instantiation here and there. I think it's time to post back to a branch in the Apache git repository so we can enlist contributions from the community here to help this go forward. I'll try to get that done this weekend. > Approximate nearest vector search > - > > Key: LUCENE-9004 > URL: https://issues.apache.org/jira/browse/LUCENE-9004 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Michael Sokolov >Priority: Major > Attachments: hnsw_layered_graph.png > > > "Semantic" search based on machine-learned vector "embeddings" representing > terms, queries and documents is becoming a must-have feature for a modern > search engine. 
SOLR-12890 is exploring various approaches to this, including > providing vector-based scoring functions. This is a spinoff issue from that. > The idea here is to explore approximate nearest-neighbor search. Researchers > have found an approach based on navigating a graph that partially encodes the > nearest neighbor relation at multiple scales can provide accuracy > 95% (as > compared to exact nearest neighbor calculations) at a reasonable cost. This > issue will explore implementing HNSW (hierarchical navigable small-world) > graphs for the purpose of approximate nearest vector search (often referred > to as KNN or k-nearest-neighbor search). > At a high level the way this algorithm works is this. First assume you have a > graph that has a partial encoding of the nearest neighbor relation, with some > short and some long-distance links. If this graph is built in the right way > (has the hierarchical navigable small world property), then you can > efficiently traverse it to find nearest neighbors (approximately) in log N > time where N is the number of nodes in the graph. I believe this idea was > pioneered in [1]. The great insight in that paper is that if you use the > graph search algorithm to find the K nearest neighbors of a new document > while indexing, and then link those neighbors (undirectedly, ie both ways) to > the new document, then the graph that emerges will have the desired > properties. > The implementation I propose for Lucene is as follows. We need two new data > structures to encode the vectors and the graph. We can encode vectors using a > light wrapper around {{BinaryDocValues}} (we also want to encode the vector > dimension and have efficient conversion from bytes to floats). For the graph > we can use {{SortedNumericDocValues}} where the values we encode are the > docids of the related documents. 
Encoding the interdocument relations using > docids directly will make it relatively fast to traverse the graph since we > won't need to lookup through an id-field indirection. This choice limits us > to building a graph-per-segment since it would be impractical to maintain a > global graph for the whole index in the face of segment merges. However > graph-per-segment is a very natural at search time - we can traverse each > segments' graph independently and merge results as we do today for term-based > search. > At index time, however, merging graphs is somewhat challenging. While > indexing we build a graph incrementally, performing searches to construct > links among neighbors. When merging segments we must construct a new graph > containing elements of all the merged segments. Ideally we would somehow > preserve the work done when building the initial graphs, but at least as a > start I'd propose we construct a new graph from scratch
[GitHub] [lucene-solr] gerlowskija commented on issue #1151: SOLR-13890: Add "top-level" DVTQ implementation
gerlowskija commented on issue #1151: SOLR-13890: Add "top-level" DVTQ implementation URL: https://github.com/apache/lucene-solr/pull/1151#issuecomment-573107306 Ready for review again when you get a chance. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9100) JapaneseTokenizer produces inconsistent tokens
[ https://issues.apache.org/jira/browse/LUCENE-9100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013045#comment-17013045 ] Michael McCandless commented on LUCENE-9100: {quote}Maybe a solution here is to use the tokenizer with `discardPunctuation==false`, then stripping the punctuation tokens in a filter. {quote} +1, that sounds like a possible workaround. But it's still spooky that tokens can be formed across (deleted) punctuation ... > JapaneseTokenizer produces inconsistent tokens > -- > > Key: LUCENE-9100 > URL: https://issues.apache.org/jira/browse/LUCENE-9100 > Project: Lucene - Core > Issue Type: Bug > Components: modules/analysis >Affects Versions: 7.2 >Reporter: Elbek Kamoliddinov >Priority: Major > > We use {{JapaneseTokenizer}} on prod and seeing some inconsistent behavior. > With this text: > {{"マギアリス【単版話】 4話 (Unlimited Comics)"}} I get different results if I insert > space before `【` char. Here is the small code snippet demonstrating the case > (note we use our own dictionary and connection costs): > {code:java} > Analyzer analyzer = new Analyzer() { > @Override > protected TokenStreamComponents createComponents(String > fieldName) { > //Tokenizer tokenizer = new > JapaneseTokenizer(newAttributeFactory(), null, true, > JapaneseTokenizer.Mode.SEARCH); > Tokenizer tokenizer = new > JapaneseTokenizer(newAttributeFactory(), dictionaries.systemDictionary, > dictionaries.unknownDictionary, dictionaries.connectionCosts, null, true, > JapaneseTokenizer.Mode.SEARCH); > return new TokenStreamComponents(tokenizer, new > LowerCaseFilter(tokenizer)); > } > }; > String text1 = "マギアリス【単版話】 4話 (Unlimited Comics)"; > String text2 = "マギアリス 【単版話】 4話 (Unlimited Comics)"; //inserted space > try (TokenStream tokens = analyzer.tokenStream("field", new > StringReader(text1))) { > CharTermAttribute chars = > tokens.addAttribute(CharTermAttribute.class); > tokens.reset(); > while (tokens.incrementToken()) { > 
System.out.println(chars.toString()); > } > tokens.end(); > } catch (IOException e) { > // should never happen with a StringReader > throw new RuntimeException(e); > } {code} > Output is: > {code:java} > //text1 > マギ > アリス > 単 > 版 > 話 > 4 > 話 > unlimited > comics > //text2 > マギア > リス > 単 > 版 > 話 > 4 > 話 > unlimited > comics{code} > It looks like the tokenizer doesn't view the punctuation ({{【}} is > {{Character.START_PUNCTUATION}} type) as an indicator that there should be a > token break, and somehow the 【 punctuation char causes a difference in the > output. > If I use the {{JapaneseTokenizer}} tokenizer then this problem doesn't > manifest because it doesn't tokenize {{マギアリス}} into multiple tokens and > outputs it as is. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
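The workaround suggested above — run the tokenizer with `discardPunctuation == false` so punctuation still acts as a token break, then strip the punctuation-only tokens afterwards — can be modeled without Lucene. In real code this would be a Lucene `TokenFilter`; here plain strings stand in for the token stream, and the helper names are illustrative:

```java
// Models the suggested workaround: keep punctuation during tokenization
// so it breaks tokens, then drop the punctuation-only tokens afterwards.
// Plain strings stand in for a Lucene TokenStream here.
import java.util.List;
import java.util.stream.Collectors;

public class PunctuationStrip {
    // True if every code point in the token is a punctuation character
    static boolean isPunctuationOnly(String token) {
        return !token.isEmpty()
                && token.codePoints().allMatch(cp -> {
                    int t = Character.getType(cp);
                    return t == Character.START_PUNCTUATION   // e.g. '【'
                        || t == Character.END_PUNCTUATION     // e.g. '】'
                        || t == Character.OTHER_PUNCTUATION;
                });
    }

    static List<String> strip(List<String> tokens) {
        return tokens.stream()
                .filter(t -> !isPunctuationOnly(t))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // With punctuation kept, '【' and '】' become their own tokens
        // instead of being silently deleted mid-stream.
        List<String> tokens = List.of("マギアリス", "【", "単", "版", "話", "】");
        System.out.println(strip(tokens)); // [マギアリス, 単, 版, 話]
    }
}
```

The point of the two-step approach is that the punctuation is still visible to the tokenizer when it decides where tokens break, so the spooky cross-punctuation tokens cannot form.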
[GitHub] [lucene-solr] madrob commented on issue #1157: Add RAT check using Gradle
madrob commented on issue #1157: Add RAT check using Gradle URL: https://github.com/apache/lucene-solr/pull/1157#issuecomment-573107997 Cleaned a bunch of stuff up, included your review feedback. Thanks, @dweiss! One piece that I'm still struggling with is that `./gradlew rat` will execute on the root but doesn't delegate to `:lucene:rat` and `:solr:rat` (and in fact those targets don't even exist), and then I'd like those to delegate to rat in all the sub-modules. I tried going through our other validation tasks to look for examples and quickly got lost - can you point me to the best practices here? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1151: SOLR-13890: Add "top-level" DVTQ implementation
dsmiley commented on a change in pull request #1151: SOLR-13890: Add "top-level" DVTQ implementation URL: https://github.com/apache/lucene-solr/pull/1151#discussion_r365336343 ## File path: solr/core/src/java/org/apache/solr/search/TermsQParserPlugin.java ## @@ -88,7 +91,20 @@ Query makeFilter(String fname, BytesRef[] byteRefs) { docValuesTermsFilter {//on 4x this is FieldCacheTermsFilter but we use the 5x name any way @Override Query makeFilter(String fname, BytesRef[] byteRefs) { -return new DocValuesTermsQuery(fname, byteRefs);//constant scores +// TODO Further tune this heuristic number +return (byteRefs.length > 700) ? new TopLevelDocValuesTermsQuery(fname, byteRefs) : new DocValuesTermsQuery(fname, byteRefs); + } +}, +docValuesTermsFilterTopLevel { + @Override + Query makeFilter(String fname, BytesRef[] byteRefs) { +return new TopLevelDocValuesTermsQuery(fname, byteRefs); Review comment: This is just about a default. So-called "Hands-on users" (experts) will be able to be explicit with cache=true. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] gerlowskija commented on a change in pull request #1151: SOLR-13890: Add "top-level" DVTQ implementation
gerlowskija commented on a change in pull request #1151: SOLR-13890: Add "top-level" DVTQ implementation URL: https://github.com/apache/lucene-solr/pull/1151#discussion_r365341958 ## File path: solr/core/src/java/org/apache/solr/search/TermsQParserPlugin.java ## @@ -88,7 +91,20 @@ Query makeFilter(String fname, BytesRef[] byteRefs) { docValuesTermsFilter {//on 4x this is FieldCacheTermsFilter but we use the 5x name any way @Override Query makeFilter(String fname, BytesRef[] byteRefs) { -return new DocValuesTermsQuery(fname, byteRefs);//constant scores +// TODO Further tune this heuristic number +return (byteRefs.length > 700) ? new TopLevelDocValuesTermsQuery(fname, byteRefs) : new DocValuesTermsQuery(fname, byteRefs); + } +}, +docValuesTermsFilterTopLevel { + @Override + Query makeFilter(String fname, BytesRef[] byteRefs) { +return new TopLevelDocValuesTermsQuery(fname, byteRefs); Review comment: Sure, setting`cache=true` is trivial. It's _knowing_ that you need to be explicit for this one query in particular that's the tricky part. But sure, I'm just hand-wringing; I've already made the change. It's documented in the ref-guide, so that's as good as we're going to get. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-9127) index migration from 7 to 8 failing
[ https://issues.apache.org/jira/browse/LUCENE-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson resolved LUCENE-9127. Resolution: Information Provided Upgrading more than one major version of Lucene is explicitly not supported. Starting with Lucene 6, a marker is written into segments indicating which version of Lucene it was created with. When segments are merged, the earliest marker is preserved, so even if you rewrite all your segments with 7x, the 6x marker is preserved. Lucene will refuse to open any index that has a marker lower than X-1, so if Lucene 8x sees any segment where the earliest marker is 6x or earlier (or is missing), it'll throw an exception. You have to re-index your data from the system of record into a fresh 8x collection. > index migration from 7 to 8 failing > --- > > Key: LUCENE-9127 > URL: https://issues.apache.org/jira/browse/LUCENE-9127 > Project: Lucene - Core > Issue Type: Bug > Components: core/index >Affects Versions: 8.4 >Reporter: Niranjan >Priority: Major > > we have been using solr4 for more than 16 year, now it is to time to upgrade, > when we have decided to upgrade, started with migrating index from 4 -> 5, 5 > -> 6, 6 -> 7, it was working as expected, but when it comes to 7 -> 8 , it > give me errors. > {noformat} > Exception in thread "main" > org.apache.lucene.index.IndexFormatTooOldException: Format version is not > supported (resource > BufferedChecksumIndexInput(MMapIndexInput(path="/deploy/solrmaster/data/data//index/segments_4n"))): > This index was initially created with Lucene 6.x while the current version > is 8.4.0 and Lucene only supports reading the current and previous major > versions.. 
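The marker rule described in the resolution can be captured in a small sketch, assuming simplified integer major versions; the names here are illustrative, not Lucene's actual API:

```java
// Toy model of the index-version marker rule: every segment records the
// major version it was *created* with, merges keep the minimum marker,
// and Lucene N refuses to open an index whose minimum marker is older
// than N - 1. Illustrative names only, not Lucene's real classes.
import java.util.List;

public class VersionMarker {
    static final int CURRENT_MAJOR = 8;

    // Merging preserves the earliest creation marker, so rewriting 6.x
    // segments with 7.x does not erase the 6.x marker.
    static int mergedMarker(List<Integer> createdMajors) {
        return createdMajors.stream().min(Integer::compare).orElseThrow();
    }

    static boolean canOpen(int minCreatedMajor) {
        return minCreatedMajor >= CURRENT_MAJOR - 1;
    }

    public static void main(String[] args) {
        int marker = mergedMarker(List.of(6, 7, 7)); // one old 6.x segment
        System.out.println(marker);                  // 6
        System.out.println(canOpen(marker));         // false: must re-index
    }
}
```

This is why IndexUpgrader cannot help across two majors: upgrading rewrites segment formats but, by design, does not rewrite the creation marker, so re-indexing from the system of record is the only supported path.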
This version of Lucene only supports indexes created with release > 7.0 and later.{noformat} > {noformat} > java -cp ./lucene-core-8.4.0.jar:./lucene-backward-codecs-8.4.0.jar > org.apache.lucene.index.IndexUpgrader /deploy/solrmaster/data/data/job/index > -verbose {noformat} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1151: SOLR-13890: Add "top-level" DVTQ implementation
dsmiley commented on a change in pull request #1151: SOLR-13890: Add "top-level" DVTQ implementation URL: https://github.com/apache/lucene-solr/pull/1151#discussion_r365344397 ## File path: solr/core/src/java/org/apache/solr/search/TermsQParserPlugin.java ## @@ -92,22 +92,28 @@ Query makeFilter(String fname, BytesRef[] byteRefs) { @Override Query makeFilter(String fname, BytesRef[] byteRefs) { // TODO Further tune this heuristic number -return (byteRefs.length > 700) ? new TopLevelDocValuesTermsQuery(fname, byteRefs) : new DocValuesTermsQuery(fname, byteRefs); +return disableCacheByDefault((byteRefs.length > 700) ? new TopLevelDocValuesTermsQuery(fname, byteRefs) : new DocValuesTermsQuery(fname, byteRefs)); Review comment: right here simply call docValuesTermsFilterTopLevel.makeFilter or docValuesTermsFilterPerSegment.makeFilter. Yes, enum methods may refer to other enums in the same enum :-) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
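The refactoring suggested above — one enum constant's `makeFilter` delegating to a sibling constant's override — is legal Java and keeps the heuristic in one place. A self-contained sketch with hypothetical constant names (not the real `TermsQParserPlugin` enum):

```java
// Sketch of an enum constant delegating to sibling constants' overrides
// of the same abstract method. Names are illustrative only.
public class EnumDelegation {
    enum Method {
        PER_SEGMENT {
            @Override String makeFilter(int termCount) { return "per-segment(" + termCount + ")"; }
        },
        TOP_LEVEL {
            @Override String makeFilter(int termCount) { return "top-level(" + termCount + ")"; }
        },
        // The auto-selecting constant simply forwards to one of its siblings.
        AUTO {
            @Override String makeFilter(int termCount) {
                return termCount > 700
                        ? TOP_LEVEL.makeFilter(termCount)
                        : PER_SEGMENT.makeFilter(termCount);
            }
        };

        abstract String makeFilter(int termCount);
    }

    public static void main(String[] args) {
        System.out.println(Method.AUTO.makeFilter(10));   // per-segment(10)
        System.out.println(Method.AUTO.makeFilter(1000)); // top-level(1000)
    }
}
```

Delegating this way means the explicit `docValuesTermsFilterTopLevel` and `docValuesTermsFilterPerSegment` methods stay the single source of truth, and the auto-selecting constant only encodes the threshold.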
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1151: SOLR-13890: Add "top-level" DVTQ implementation
dsmiley commented on a change in pull request #1151: SOLR-13890: Add "top-level" DVTQ implementation URL: https://github.com/apache/lucene-solr/pull/1151#discussion_r365345185 ## File path: solr/solr-ref-guide/src/other-parsers.adoc ## @@ -1037,11 +1037,11 @@ An optional parameter used to determine which of several query implementations s + `termsFilter` the default `method`. Uses a `BooleanQuery` or a `TermInSetQuery` depending on the number of terms. Scales well with index size, but only moderately with the number of query terms. + -`docValuesTermsFilter` can only be used on fields with docValues data. Chooses between the `docValuesTermsFilterTopLevel` and `docValuesTermsFilterPerSegment` methods using the number of query terms as a rough heuristic. Users should typically use this method instead of using `docValuesTermsFilterTopLevel` or `docValuesTermsFilterPerSegment` directly, unless they've done performance testing to validate one of the methods on queries of all sizes. Depending on the implementation picked, this method may rely on expensive data structures which are lazily populated after each commit. If you commit frequently and your use-case can tolerate a static warming query, consider adding one to `solrconfig.xml` so that this work is done as a part of the commit itself and not attached directly to user requests. +`docValuesTermsFilter` can only be used on fields with docValues data. The `cache` parameter is false by default. Chooses between the `docValuesTermsFilterTopLevel` and `docValuesTermsFilterPerSegment` methods using the number of query terms as a rough heuristic. Users should typically use this method instead of using `docValuesTermsFilterTopLevel` or `docValuesTermsFilterPerSegment` directly, unless they've done performance testing to validate one of the methods on queries of all sizes. Depending on the implementation picked, this method may rely on expensive data structures which are lazily populated after each commit. 
If you commit frequently and your use-case can tolerate a static warming query, consider adding one to `solrconfig.xml` so that this work is done as a part of the commit itself and not attached directly to user requests. Review comment: Just a side-comment about our ref docs: In the ref docs I write or update, I convert them to one sentence per line. This makes the diffs easy to read! The pain of no newlines is very apparent here. Change or not as you wish. CC @ctargett This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-11369) Zookeeper credentials are showed up on the Solr Admin GUI
[ https://issues.apache.org/jira/browse/SOLR-11369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013085#comment-17013085 ] Jason Gerlowski commented on SOLR-11369: This is fixed in 7.x and above. SOLR-12976 isn't about hiding specific properties, it's about combining a few settings to make them easier to use/understand. > Zookeeper credentials are showed up on the Solr Admin GUI > - > > Key: SOLR-11369 > URL: https://issues.apache.org/jira/browse/SOLR-11369 > Project: Solr > Issue Type: Bug > Components: Admin UI, security >Reporter: Ivan Pekhov >Priority: Major > > Hello Guys, > We've been noticing this problem with Solr version 5.4.1 and it's still the > case for the version 6.6.0. The problem is that we're using SolrCloud with > secured Zookeeper and our users are granted access to Solr Admin GUI, and, at > the same time, they are not supposed to have access to Zookeeper credentials, > i.e. usernames and passwords. However, we (and some of our users) have found > out that Zookeeper credentials are displayed on at least two sections of the > Solr Admin GUI, i.e. "Dashboard" and "Java Properties". > Having taken a look at the JavaScript code that runs behind the scenes for > those pages, we can see that the sensitive parameters ( -DzkDigestPassword, > -DzkDigestReadonlyPassword, -DzkDigestReadonlyUsername, -DzkDigestUsername ) > are fetched via AJAX from the following two URL paths: > /solr/admin/info/system > /solr/admin/info/properties > Could you please consider for the future Solr releases removing the Zookeeper > parameters mentioned above from the output of these URLs and from other URLs > that contain this information in their output, if there are any besides the > ones mentioned? 
We find that it is pretty challenging (and probably > impossible) to restrict users from accessing some particular paths with the > security.json mechanism, and we think that it would be beneficial for > overall Solr security to hide Zookeeper credentials. > Thank you so much for your consideration! > Best regards, > Ivan Pekhov -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
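The concern above — ZK credentials leaking through `/solr/admin/info/system` and `/solr/admin/info/properties` — is essentially a property-redaction problem. A minimal, self-contained sketch of the denylist approach might look like the following; the class and method names here are illustrative, not Solr's actual implementation (which, per the comment above, addressed this in 7.x and above):

```java
import java.util.Map;
import java.util.TreeMap;

public class PropRedactor {
    // Illustrative denylist of the sensitive system-property names from the
    // report above; Solr's real fix uses its own configurable mechanism.
    private static final String[] SENSITIVE = {
        "zkDigestPassword", "zkDigestReadonlyPassword",
        "zkDigestUsername", "zkDigestReadonlyUsername"
    };

    /** Returns a copy of props with sensitive values replaced by a placeholder. */
    public static Map<String, String> redact(Map<String, String> props) {
        Map<String, String> out = new TreeMap<>(props);
        for (String key : SENSITIVE) {
            if (out.containsKey(key)) {
                out.put(key, "--REDACTED--");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> props = new TreeMap<>();
        props.put("zkDigestPassword", "secret");
        props.put("java.version", "11");
        System.out.println(redact(props)); // password redacted, java.version untouched
    }
}
```

The point of redacting in the handler (rather than restricting the URL paths via security.json, which the reporter found impractical) is that the endpoints stay usable while the secrets never leave the server.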
[jira] [Commented] (SOLR-11746) numeric fields need better error handling for prefix/wildcard syntax -- consider uniform support for "foo:* == foo:[* TO *]"
[ https://issues.apache.org/jira/browse/SOLR-11746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013095#comment-17013095 ] Chris M. Hostetter commented on SOLR-11746: --- {quote}I believe that is the only backwards incompatibility that was introduced by the reverted patch. {quote} That was _my_ only backcompat concern: redefining how NaN behaved in range queries (which has since been reverted). I agree, fixing {{foo:*}} in point fields and docValues-only trie fields to behave consistently is a bug fix – I wasn't trying to suggest that fixing that bug was a backcompat problem. My point about that old comment where I referred to the "current" (circa 2017) behavior of point fields is that the only one that seems to be tested in the current code/patches is the simple {{foo:*}} case being fixed to do something useful -- we should also make sure we have tests that the other nonsensical input cases give informative (and consistent) errors. > numeric fields need better error handling for prefix/wildcard syntax -- > consider uniform support for "foo:* == foo:[* TO *]" > > > Key: SOLR-11746 > URL: https://issues.apache.org/jira/browse/SOLR-11746 > Project: Solr > Issue Type: Bug >Affects Versions: 7.0 >Reporter: Chris M. Hostetter >Assignee: Houston Putman >Priority: Major > Fix For: master (9.0), 8.5 > > Attachments: SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch, > SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch > > > On the solr-user mailing list, Torsten Krah pointed out that with Trie > numeric fields, query syntax such as {{foo_d:\*}} has been functionally > equivalent to {{foo_d:\[\* TO \*]}} and asked why this was not also supported > for Point based numeric fields. 
> The fact that this type of syntax works (for {{indexed="true"}} Trie fields) > appears to have been an (untested, undocumented) fluke of Trie fields given > that they use indexed terms for the (encoded) numeric terms and inherit the > default implementation of {{FieldType.getPrefixQuery}} which produces a > prefix query against the {{""}} (empty string) term. > (Note that this syntax has apparently _*never*_ worked for Trie fields with > {{indexed="false" docValues="true"}} ) > In general, we should assess the behavior when users attempt a prefix/wildcard > syntax query against numeric fields, as currently the behavior is largely > nonsensical: prefix/wildcard syntax frequently matches no docs w/o any sort > of error, and the aforementioned {{numeric_field:*}} behaves inconsistently > between points/trie fields and between indexed/docValued trie fields.
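The {{foo:* == foo:[* TO *]}} equivalence discussed above can be sketched as a simple query-string rewrite. This is only an illustration of the intended semantics — the real fix operates on Query objects in {{FieldType.getPrefixQuery}} and related methods, not on strings:

```java
public class WildcardNormalizer {

    /**
     * Illustrative rewrite of the bare-wildcard syntax "field:*" into the
     * explicit match-all range "field:[* TO *]" from SOLR-11746. Any other
     * input is passed through unchanged; per the comment above, a real
     * implementation should instead raise an informative, consistent error
     * for nonsensical prefix/wildcard queries on numeric fields.
     */
    public static String normalize(String query) {
        if (query.matches("[\\w.]+:\\*")) {
            String field = query.substring(0, query.lastIndexOf(':'));
            return field + ":[* TO *]";
        }
        return query;
    }

    public static void main(String[] args) {
        System.out.println(normalize("foo_d:*"));        // foo_d:[* TO *]
        System.out.println(normalize("foo_d:[1 TO 2]")); // unchanged
    }
}
```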
[GitHub] [lucene-solr] gerlowskija commented on a change in pull request #1151: SOLR-13890: Add "top-level" DVTQ implementation
gerlowskija commented on a change in pull request #1151: SOLR-13890: Add "top-level" DVTQ implementation URL: https://github.com/apache/lucene-solr/pull/1151#discussion_r36535 ## File path: solr/solr-ref-guide/src/other-parsers.adoc ## @@ -1037,11 +1037,11 @@ An optional parameter used to determine which of several query implementations s + `termsFilter` the default `method`. Uses a `BooleanQuery` or a `TermInSetQuery` depending on the number of terms. Scales well with index size, but only moderately with the number of query terms. + -`docValuesTermsFilter` can only be used on fields with docValues data. Chooses between the `docValuesTermsFilterTopLevel` and `docValuesTermsFilterPerSegment` methods using the number of query terms as a rough heuristic. Users should typically use this method instead of using `docValuesTermsFilterTopLevel` or `docValuesTermsFilterPerSegment` directly, unless they've done performance testing to validate one of the methods on queries of all sizes. Depending on the implementation picked, this method may rely on expensive data structures which are lazily populated after each commit. If you commit frequently and your use-case can tolerate a static warming query, consider adding one to `solrconfig.xml` so that this work is done as a part of the commit itself and not attached directly to user requests. +`docValuesTermsFilter` can only be used on fields with docValues data. The `cache` parameter is false by default. Chooses between the `docValuesTermsFilterTopLevel` and `docValuesTermsFilterPerSegment` methods using the number of query terms as a rough heuristic. Users should typically use this method instead of using `docValuesTermsFilterTopLevel` or `docValuesTermsFilterPerSegment` directly, unless they've done performance testing to validate one of the methods on queries of all sizes. Depending on the implementation picked, this method may rely on expensive data structures which are lazily populated after each commit. 
If you commit frequently and your use-case can tolerate a static warming query, consider adding one to `solrconfig.xml` so that this work is done as a part of the commit itself and not attached directly to user requests. Review comment: I thought about this too when I saw the diff. I remember it coming up as a discussion point when the ref-guide was initially being hashed out too. I won't change my docs here, as it's probably worth keeping the file as a whole consistent in the strategy it uses. But maybe it's worth revisiting at some point.
[GitHub] [lucene-solr] gerlowskija commented on a change in pull request #1151: SOLR-13890: Add "top-level" DVTQ implementation
gerlowskija commented on a change in pull request #1151: SOLR-13890: Add "top-level" DVTQ implementation URL: https://github.com/apache/lucene-solr/pull/1151#discussion_r365358715 ## File path: solr/core/src/java/org/apache/solr/search/TermsQParserPlugin.java ## @@ -92,22 +92,28 @@ Query makeFilter(String fname, BytesRef[] byteRefs) { @Override Query makeFilter(String fname, BytesRef[] byteRefs) { // TODO Further tune this heuristic number -return (byteRefs.length > 700) ? new TopLevelDocValuesTermsQuery(fname, byteRefs) : new DocValuesTermsQuery(fname, byteRefs); +return disableCacheByDefault((byteRefs.length > 700) ? new TopLevelDocValuesTermsQuery(fname, byteRefs) : new DocValuesTermsQuery(fname, byteRefs)); Review comment: Hah. I don't know whether to think that's awesome or revolting. Done.
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1151: SOLR-13890: Add "top-level" DVTQ implementation
dsmiley commented on a change in pull request #1151: SOLR-13890: Add "top-level" DVTQ implementation URL: https://github.com/apache/lucene-solr/pull/1151#discussion_r365364487 ## File path: solr/core/src/java/org/apache/solr/search/TermsQParserPlugin.java ## @@ -92,22 +92,28 @@ Query makeFilter(String fname, BytesRef[] byteRefs) { @Override Query makeFilter(String fname, BytesRef[] byteRefs) { // TODO Further tune this heuristic number -return (byteRefs.length > 700) ? new TopLevelDocValuesTermsQuery(fname, byteRefs) : new DocValuesTermsQuery(fname, byteRefs); +return disableCacheByDefault((byteRefs.length > 700) ? new TopLevelDocValuesTermsQuery(fname, byteRefs) : new DocValuesTermsQuery(fname, byteRefs)); Review comment: LOL there is something slightly unsettling about it, I admit. Though it's not confusing or anything. Anyway, I value brevity a lot and I'll take it!
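The pattern being reviewed in the diff above — wrapping the term-count heuristic in a {{disableCacheByDefault(...)}} helper — can be sketched with stand-in types. The real code uses Lucene's {{DocValuesTermsQuery}}/{{TopLevelDocValuesTermsQuery}} and Solr's extended-query cache flag; only the 700-term threshold and the wrap-and-return shape come straight from the diff, the rest is illustrative:

```java
public class TermsFilterHeuristic {

    /** Stand-in for a Solr query; the real code returns Lucene Query subclasses. */
    static class StubQuery {
        final String impl;
        boolean cache = true; // Solr caches filter queries by default
        StubQuery(String impl) { this.impl = impl; }
    }

    /** Mirrors the helper from the PR: mark the query uncacheable, pass it through. */
    static StubQuery disableCacheByDefault(StubQuery q) {
        q.cache = false;
        return q;
    }

    /** The term-count heuristic: many terms -> the "top-level" docValues impl. */
    static StubQuery makeFilter(int termCount) {
        return disableCacheByDefault(termCount > 700
            ? new StubQuery("docValuesTermsFilterTopLevel")
            : new StubQuery("docValuesTermsFilterPerSegment"));
    }

    public static void main(String[] args) {
        StubQuery q = makeFilter(1000);
        System.out.println(q.impl + " cache=" + q.cache);
    }
}
```

The appeal of the one-liner debated above is that every branch of the ternary flows through the same cache-disabling call, so no code path can forget it.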
[jira] [Commented] (LUCENE-9116) Simplify postings API by removing long[] metadata
[ https://issues.apache.org/jira/browse/LUCENE-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013121#comment-17013121 ] David Smiley commented on LUCENE-9116: -- +1 for a notice to the dev list and users too. My point is about notice so that others might potentially volunteer or convey to us that the format is more useful than we are aware of. Ultimately we should be able to remove what we don't want to maintain, however. I was very sincere when I volunteered to port FST50, so thank you for stepping up! The Solr ref guide {{solr/solr-ref-guide/src/field-type-definitions-and-properties.adoc:112}} already has a good notice on back-compat for these formats. The tagger-handler.adoc is probably the only spot that advises setting this, so perhaps it should also have a notice. > Simplify postings API by removing long[] metadata > - > > Key: LUCENE-9116 > URL: https://issues.apache.org/jira/browse/LUCENE-9116 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Priority: Minor > Time Spent: 40m > Remaining Estimate: 0h > > The postings API allows to store metadata about a term either in a long[] or > in a byte[]. This is unnecessary as all information could be encoded in the > byte[], which is what most codecs do in practice.
[jira] [Comment Edited] (LUCENE-9116) Simplify postings API by removing long[] metadata
[ https://issues.apache.org/jira/browse/LUCENE-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013121#comment-17013121 ] David Smiley edited comment on LUCENE-9116 at 1/10/20 6:35 PM: --- +1 for a notice to the dev list and users too. My point is about notice so that others might potentially volunteer or convey to us that the format is more useful than we are aware of. Ultimately we should be able to remove what we don't want to maintain, however. I was very sincere when I volunteered to port FST50, so thank you for stepping up! The Solr ref guide {{solr/solr-ref-guide/src/field-type-definitions-and-properties.adoc:112}} already has a good notice on back-compat for these formats. The tagger-handler.adoc is probably the only spot that advices setting this, so perhaps it should also have a notice. was (Author: dsmiley): +1 for a notice to the dev list and users too. My point is about notice so that others might potentially volunteer or convey to us that the format is more useful than we are aware of. Ultimately we should be able to remove what we want to maintain, however. I was very sincere when I volunteered to port FST50, so thank you for stepping up! The Solr ref guide {{solr/solr-ref-guide/src/field-type-definitions-and-properties.adoc:112}} already has a good notice on back-compat for these formats. The tagger-handler.adoc is probably the only spot that advices setting this, so perhaps it should also have a notice. > Simplify postings API by removing long[] metadata > - > > Key: LUCENE-9116 > URL: https://issues.apache.org/jira/browse/LUCENE-9116 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Priority: Minor > Time Spent: 40m > Remaining Estimate: 0h > > The postings API allows to store metadata about a term either in a long[] or > in a byte[]. This is unnecessary as all information could be encoded in the > byte[], which is what most codecs do in practice. 
[GitHub] [lucene-solr] yonik merged pull request #1131: SOLR-14134: Add lazy and time-based evictiction of shared core concurrency metada…
yonik merged pull request #1131: SOLR-14134: Add lazy and time-based evictiction of shared core concurrency metada… URL: https://github.com/apache/lucene-solr/pull/1131
[jira] [Commented] (SOLR-14134) Clear shared core's concurrency cache
[ https://issues.apache.org/jira/browse/SOLR-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013144#comment-17013144 ] ASF subversion and git services commented on SOLR-14134: Commit 66ec4228908dcabf60d3e6069967e576325829c6 in lucene-solr's branch refs/heads/jira/SOLR-13101 from Andy Vuong [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=66ec422 ] SOLR-14134: Add lazy and time-based evictiction of shared core concurrency metada… (#1131) * Add lazy and time-based evictiction of shared core concurrency metadata from in-memory cache * Switch back to simple map hash, evict on close, and evict on register * Evict from unload * Address comments > Clear shared core's concurrency cache > - > > Key: SOLR-14134 > URL: https://issues.apache.org/jira/browse/SOLR-14134 > Project: Solr > Issue Type: Sub-task > Components: SolrCloud >Reporter: Andy Vuong >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > In shared collections, each replica's core has an associated entry in a > metadata cache we call the shared core's concurrency cache (see > SharedCoreConcurrencyController) that is used to facilitate concurrent > indexing support of a single shard and associated optimizations. > Entries are currently created on demand - i.e. when request triggered > pulls/pushes are initiated but there's no way of clearing the cache unless > the node goes down and JVM restarts. > Eviction from this cache is needed to facilitate things such as collection > name reuse. Currently if you delete a collection and then recreate, you can > create a Replica containing the same core name as a previously active > collection/replica and have a pre-existing entry in the concurrency cache > (barring restarts between this point). The net effect is at least one > indexing batch failure before the cache returns to a correct state. 
> Eviction will also support scale - say 50k collections and thousands of > entries for cores located in memory is highly ineffective especially if a > large number of collections are accessed infrequently.
[jira] [Resolved] (SOLR-13932) Review directory locking and Blob interactions
[ https://issues.apache.org/jira/browse/SOLR-13932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley resolved SOLR-13932. - Resolution: Fixed > Review directory locking and Blob interactions > -- > > Key: SOLR-13932 > URL: https://issues.apache.org/jira/browse/SOLR-13932 > Project: Solr > Issue Type: Sub-task >Reporter: Ilan Ginzburg >Priority: Major > Time Spent: 3h 10m > Remaining Estimate: 0h > > Review resolution of local index directory content vs Blob copy. > There has been wrong understanding of following line acquiring a lock on > index directory. > {{solrCore.getDirectoryFactory().get(indexDirPath, > DirectoryFactory.DirContext.DEFAULT, > solrCore.getSolrConfig().indexConfig.lockType);}} > From Yonik: > _A couple things about Directory locking the locks were only ever to > prevent more than one IndexWriter from trying to modify the same index. The > IndexWriter grabs a write lock once when it is created and does not release > it until it is closed._ > _Directories are not locked on acquisition of the Directory from the > DirectoryFactory. See the IndexWriter constructor, where the lock is > explicitly grabbed._ > Review CorePushPull#pullUpdateFromBlob, ServerSideMetadata and other classes > as relevant. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (SOLR-14134) Clear shared core's concurrency cache
[ https://issues.apache.org/jira/browse/SOLR-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley resolved SOLR-14134. - Resolution: Fixed
[GitHub] [lucene-solr] ctargett commented on a change in pull request #1151: SOLR-13890: Add "top-level" DVTQ implementation
ctargett commented on a change in pull request #1151: SOLR-13890: Add "top-level" DVTQ implementation URL: https://github.com/apache/lucene-solr/pull/1151#discussion_r365406299 ## File path: solr/solr-ref-guide/src/other-parsers.adoc ## @@ -1037,11 +1037,11 @@ An optional parameter used to determine which of several query implementations s + `termsFilter` the default `method`. Uses a `BooleanQuery` or a `TermInSetQuery` depending on the number of terms. Scales well with index size, but only moderately with the number of query terms. + -`docValuesTermsFilter` can only be used on fields with docValues data. Chooses between the `docValuesTermsFilterTopLevel` and `docValuesTermsFilterPerSegment` methods using the number of query terms as a rough heuristic. Users should typically use this method instead of using `docValuesTermsFilterTopLevel` or `docValuesTermsFilterPerSegment` directly, unless they've done performance testing to validate one of the methods on queries of all sizes. Depending on the implementation picked, this method may rely on expensive data structures which are lazily populated after each commit. If you commit frequently and your use-case can tolerate a static warming query, consider adding one to `solrconfig.xml` so that this work is done as a part of the commit itself and not attached directly to user requests. +`docValuesTermsFilter` can only be used on fields with docValues data. The `cache` parameter is false by default. Chooses between the `docValuesTermsFilterTopLevel` and `docValuesTermsFilterPerSegment` methods using the number of query terms as a rough heuristic. Users should typically use this method instead of using `docValuesTermsFilterTopLevel` or `docValuesTermsFilterPerSegment` directly, unless they've done performance testing to validate one of the methods on queries of all sizes. Depending on the implementation picked, this method may rely on expensive data structures which are lazily populated after each commit. 
If you commit frequently and your use-case can tolerate a static warming query, consider adding one to `solrconfig.xml` so that this work is done as a part of the commit itself and not attached directly to user requests. Review comment: When I think of it, I've been changing to one sentence per line when I'm editing also (but I often forget). I don't know that it really changes that much to have a whole page consistent - we should probably encourage one sentence per line, but if it's a big file and you can only do a little, I think that's fine and maybe someone else will get inspired later to do the rest.
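The "one sentence per line" convention discussed in this thread is mechanical enough to automate for simple prose. A naive, punctuation-based sketch follows; abbreviations and version numbers like "8.3" would need smarter handling before running this over real ref-guide files:

```java
public class SentencePerLine {

    /**
     * Naive reflow for diff-friendly docs: start a new line after
     * sentence-ending punctuation (., !, ?) followed by whitespace.
     * Good enough to illustrate the convention, not production-ready.
     */
    public static String reflow(String paragraph) {
        return paragraph.trim().replaceAll("([.!?])\\s+", "$1\n");
    }

    public static void main(String[] args) {
        System.out.println(reflow("This makes the diffs easy to read! Each sentence gets its own line."));
    }
}
```

Because a changed sentence then touches exactly one line, review diffs show only the edited sentence rather than a rewrapped paragraph — which is the benefit argued for above.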
[jira] [Commented] (SOLR-14173) Ref Guide Redesign
[ https://issues.apache.org/jira/browse/SOLR-14173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013180#comment-17013180 ] Jason Gerlowski commented on SOLR-14173: bq. I did however put up files at http://home.apache.org/~ctargett/RefGuideRedesign/index.html I get a 404 on that link, just a heads up in case anyone else tries. > Ref Guide Redesign > -- > > Key: SOLR-14173 > URL: https://issues.apache.org/jira/browse/SOLR-14173 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: documentation >Reporter: Cassandra Targett >Assignee: Cassandra Targett >Priority: Major > > The current design of the Ref Guide was essentially copied from a > Jekyll-based documentation theme > (https://idratherbewriting.com/documentation-theme-jekyll/), which had a > couple important benefits for that time: > * It was well-documented and since I had little experience with Jekyll and > its Liquid templates and since I was the one doing it, I wanted to make it as > easy on myself as possible > * It was designed for documentation specifically so took care of all the > things like inter-page navigation, etc. > * It helped us get from Confluence to our current system quickly > It had some drawbacks, though: > * It wasted a lot of space on the page > * The theme was built for Markdown files, so did not take advantage of the > features of the {{jekyll-asciidoc}} plugin we use (the in-page TOC being one > big example - the plugin could create it at build time, but the theme > included JS to do it as the page loads, so we use the JS) > * It had a lot of JS and overlapping CSS files. 
While it used Bootstrap it > used a customized CSS on top of it for theming that made modifications > complex (it was hard to figure out how exactly a change would behave) > * With all the stuff I'd changed in my bumbling way just to get things to > work back then, I broke a lot of the stuff Bootstrap is supposed to give us > in terms of responsiveness and making the Guide usable even on smaller screen > sizes. > After upgrading the Asciidoctor components in SOLR-12786 and stopping the PDF > (SOLR-13782), I wanted to try to set us up for a more flexible system. We > need it for things like Joel's work on the visual guide for streaming > expressions (SOLR-13105), and in order to implement other ideas we might have > on how to present information in the future. > I view this issue as a phase 1 of an overall redesign that I've already > started in a local branch. I'll explain in a comment the changes I've already > made, and will use this issue to create and push a branch where we can > discuss in more detail. > Phase 1 here will be under-the-hood CSS/JS changes + overall page layout > changes. > Phase 2 (issue TBD) will be a wholesale re-organization of all the pages of > the Guide. > Phase 3 (issue TBD) will explore moving us from Jekyll to another static site > generator that is better suited for our content format, file types, and build > conventions. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13934) Documentation on SimplePostTool for Windows users is pretty brief
[ https://issues.apache.org/jira/browse/SOLR-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013189#comment-17013189 ] Jason Gerlowski commented on SOLR-13934: Well, whether or not we want to keep the {{bin/post}} tool eventually, we have it now so we should maintain the docs for it. I'm going to merge this PR today (with a few tweaks). > Documentation on SimplePostTool for Windows users is pretty brief > - > > Key: SOLR-13934 > URL: https://issues.apache.org/jira/browse/SOLR-13934 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: SimplePostTool >Affects Versions: 8.3 >Reporter: David Eric Pugh >Priority: Minor > Fix For: master (9.0) > > Time Spent: 10m > Remaining Estimate: 0h > > SimplePostTool on windows doesn't have enough documentation, you end up > googling to get it to work. Need to provide better example. > https://lucene.apache.org/solr/guide/8_3/post-tool.html#simpleposttool -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Assigned] (SOLR-13934) Documentation on SimplePostTool for Windows users is pretty brief
[ https://issues.apache.org/jira/browse/SOLR-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gerlowski reassigned SOLR-13934: -- Assignee: Jason Gerlowski
[jira] [Commented] (SOLR-14173) Ref Guide Redesign
[ https://issues.apache.org/jira/browse/SOLR-14173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013190#comment-17013190 ] Cassandra Targett commented on SOLR-14173: -- bq. I get a 404 on that link Bah, it's http://people.apache.org/~ctargett/RefGuideRedesign/index.html
[jira] [Comment Edited] (SOLR-13934) Documentation on SimplePostTool for Windows users is pretty brief
[ https://issues.apache.org/jira/browse/SOLR-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013189#comment-17013189 ] Jason Gerlowski edited comment on SOLR-13934 at 1/10/20 8:37 PM: - Well, whether or not we want to keep the {{bin/post}} tool longer term, we have it now so we should maintain the docs for it. I'm going to merge this PR today (with a few tweaks). was (Author: gerlowskija): Well, whether or not we want to keep the {{bin/post}} tool eventually, we have it now so we should maintain the docs for it. I'm going to merge this PR today (with a few tweaks).
[jira] [Comment Edited] (SOLR-14173) Ref Guide Redesign
[ https://issues.apache.org/jira/browse/SOLR-14173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009848#comment-17009848 ]

Cassandra Targett edited comment on SOLR-14173 at 1/10/20 8:38 PM:
-------------------------------------------------------------------

My work so far is in a branch ({{jira/solr-14173}}). Github link:
https://github.com/apache/lucene-solr/tree/jira/solr-14173. It's still a WIP,
so I won't create a PR for it yet unless someone wants one.

I did, however, put up files at:
http://people.apache.org/~ctargett/RefGuideRedesign/index.html. Feel free to
take a look and let me know your thoughts on the overall look & feel and if
you find buggy behavior. *There are still bugs* - I'll list them below.

h3. What's Changed

*Updated dependencies*

Updated:
* Bootstrap 3.3.7 to 4.1.3
* JQuery 2.1.4 to 3.3.1
* AnchorJS 2.0.0 to 4.2.0

Added:
* Malihu Custom Scrollbar 3.1.5 - to make the new sidebar scroll
* PopperJS 1.14.3 - required by Bootstrap

Removed:
* {{toc.js}} - no longer used
* {{ref-guide-toc.js}} - no longer used
* TOC-related includes
* Print-related layouts, includes, and CSS
* Leftover PDF-only fonts

*New Layout*
* Sidebar nav is now fixed to the left side of the screen, and the content to
its right is designed to use as much space as the browser has available (up
to 1238px or so, I think)
* Top nav does not span the page, but stays on top of the content to give
room for the sidebar nav
* Changed the in-page TOCs to be built at the time the HTML is generated.
This makes all in-page TOCs always float to the right side of the page (IOW,
we lost the ability to choose where to put it, but IMO this is a
simplification)
* Moved the "search" bar to the sidebar nav instead of the top nav

*CSS Cleanup*
* Removed lavish-bootstrap.css and replaced it with Bootstrap's native CSS
(which will be easier to upgrade in the future)
* Re-organized all the CSS files and separated them into broad groups:
** decoration.css: buttons, forms, horizontal lines, lead paragraphs,
tabs/pills
** navs.css: all navigation elements such as the top nav, sidebar nav,
footer, in-page TOC
** ref-guide.css: all elements which impact the display of content, such as
the overall body, tables, lists, links, code samples, etc.
** search.css: all elements related to the page-title lookup
* Moved CSS elements from other files into the above files and organized
them by what they control
* Added significant comments to CSS files about what the rules are
controlling and how those elements are used (more to do here)

h3. Known Issues
* The fancy tab thing for multiple code examples in one section isn't styled
right when you click other tabs
* The top nav won't be responsive on smaller screens
* Behavior of the sidebar on smaller screens could be improved
* Still many overlapping CSS rules for elements and many unused CSS rules to
be cleaned up
* Sidebar requires too much scrolling - Phase 2 will trim this down
* Now-unused CSS/JS files haven't been deleted yet
* Search box shows results in the sidebar nav - I wasn't able to see this
until yesterday and I'm not sure how I feel about it. At any rate, I haven't
worked with it much yet and it needs more work
* Home page (index.html) needs some additional love
[jira] [Commented] (SOLR-6613) TextField.analyzeMultiTerm should not throw exception when analyzer returns no term
[ https://issues.apache.org/jira/browse/SOLR-6613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013199#comment-17013199 ]

ASF subversion and git services commented on SOLR-6613:
--------------------------------------------------------

Commit 72dea4919ebc79721167d451e7c7afa022aeee05 in lucene-solr's branch
refs/heads/branch_8x from Bruno Roustant
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=72dea49 ]

SOLR-6613: TextField.analyzeMultiTerm does not throw an exception when the
Analyzer returns no terms. (Bruno Roustant)

> TextField.analyzeMultiTerm should not throw exception when analyzer returns
> no term
> ---------------------------------------------------------------------------
>
>                 Key: SOLR-6613
>                 URL: https://issues.apache.org/jira/browse/SOLR-6613
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 4.3.1, 4.10.2, 6.0
>            Reporter: Bruno Roustant
>            Assignee: Bruno Roustant
>            Priority: Major
>         Attachments: TestTextField.java
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> In TextField.analyzeMultiTerm(), at the lines:
>
>   try {
>     if (!source.incrementToken())
>       throw new SolrException();
>
> the method should not throw an exception if there is no token, because
> having no token is legitimate: all tokens may be filtered out (e.g. with a
> blocking Filter such as StopFilter). In this case it should simply return
> null (as it already returns null in some cases; see the first line of the
> method). However, SolrQueryParserBase also needs to be fixed to correctly
> handle null returned by TextField.analyzeMultiTerm(). See the attached
> TestTextField for the corresponding new test class.
[jira] [Resolved] (SOLR-6613) TextField.analyzeMultiTerm should not throw exception when analyzer returns no term
[ https://issues.apache.org/jira/browse/SOLR-6613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bruno Roustant resolved SOLR-6613.
----------------------------------
    Fix Version/s: 8.5
       Resolution: Fixed
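The fix described in SOLR-6613 — returning null instead of throwing when analysis produces no tokens — can be illustrated with a minimal, hypothetical sketch. The names below and the `Function`-based "analyzer" are illustrative only; Solr's real `TextField.analyzeMultiTerm` operates on a Lucene Analyzer/TokenStream:

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of the fixed behavior: when analysis of a multi-term
// part yields no tokens (e.g. all tokens removed by a StopFilter), return
// null instead of throwing an exception. Callers (in Solr's case,
// SolrQueryParserBase) must then handle the null return.
public class MultiTermAnalyzing {

  /** Returns the single analyzed term, or null when analysis yields no token. */
  public static String analyzeMultiTerm(String part, Function<String, List<String>> analyzer) {
    List<String> tokens = analyzer.apply(part);
    if (tokens.isEmpty()) {
      // Before the fix, this situation threw a SolrException; having no
      // token is legitimate, so we return null instead.
      return null;
    }
    return tokens.get(0);
  }
}
```

With a toy analyzer that drops the stopword "the" and lowercases everything else, analyzing "the" yields null rather than an exception, while analyzing any other term returns its lowercased form.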
[jira] [Commented] (SOLR-13934) Documentation on SimplePostTool for Windows users is pretty brief
[ https://issues.apache.org/jira/browse/SOLR-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013214#comment-17013214 ]

Jason Gerlowski commented on SOLR-13934:
----------------------------------------

I made a few tweaks visible on the PR: in particular, I combined the
SimplePostTool and "Windows Support" sections, which had some duplication. I
also changed the asciidoc formatting to have one sentence per line, which
allows subsequent changes to show up a bit more nicely in diffs.

bq. Related maybe, should a lot of the code in the shell script be moved
into the post.jar?

I agree. If we keep the bin/post tool around longer term (or someone gets a
chance now), that logic should all move into the jar. There's no reason to
duplicate it. The {{bin/solr}} and {{bin/solr.cmd}} scripts could benefit
from this in much the same way, coincidentally. It's been on my backburner
for a long time. (It's an easy change but a hard one to verify, since there
are no tests to check bin/solr behavior.)

I'll merge the doc changes soon. If anyone has suggestions (before or
after), let me know.
[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #1160: LUCENE-9125: Improve Automaton.step() with binary search
bruno-roustant commented on a change in pull request #1160: LUCENE-9125: Improve Automaton.step() with binary search
URL: https://github.com/apache/lucene-solr/pull/1160#discussion_r365452385

## File path: lucene/core/src/java/org/apache/lucene/util/automaton/Automaton.java
## @@ -658,22 +658,84 @@ public String toDot() {
   public int step(int state, int label) {
     assert state >= 0;
     assert label >= 0;
-    int trans = states[2*state];
-    int limit = trans + 3*states[2*state+1];
-    // TODO: we could do bin search; transitions are sorted
-    while (trans < limit) {
-      int dest = transitions[trans];
-      int min = transitions[trans+1];
-      int max = transitions[trans+2];
-      if (min <= label && label <= max) {
-        return dest;
-      }
-      trans += 3;
-    }
+    int stateIndex = 2 * state;
+    int firstTransitionIndex = states[stateIndex];
+    int numTransitions = states[stateIndex + 1];
+
+    // Since transitions are sorted,
+    // binary search the transition for which label is within [minLabel, maxLabel].
+    int low = 0;
+    int high = numTransitions - 1;
+    while (low <= high) {
+      int mid = (low + high) >>> 1;
+      int transitionIndex = firstTransitionIndex + 3 * mid;
+      int minLabel = transitions[transitionIndex + 1];
+      if (minLabel > label) {
+        high = mid - 1;
+      } else {
+        int maxLabel = transitions[transitionIndex + 2];
+        if (maxLabel < label) {
+          low = mid + 1;
+        } else {
+          return transitions[transitionIndex];
+        }
+      }
+    }
     return -1;
   }

+  /**
+   * Looks for the next transition that matches the provided label, assuming determinism.
+   *
+   * This method is similar to {@link #step(int, int)} but is used more efficiently
+   * when iterating over multiple transitions from the same source state. It keeps
+   * the latest reached transition index in {@code transition.transitionUpto} so
+   * the next call to this method can continue from there instead of restarting
+   * from the first transition.
+   *
+   * @param transition The transition to start the lookup from (inclusive, using its
+   *                   {@link Transition#source} and {@link Transition#transitionUpto}).
+   *                   It is updated with the matched transition;
+   *                   or with {@link Transition#dest} = -1 if no match.
+   * @param label The codepoint to look up.
+   * @return The destination state; or -1 if no matching outgoing transition.
+   */
+  public int next(Transition transition, int label) {
+    // Copy of step() method with
+    // - binary search 'low' bound initialized to transition.transitionUpto.
+    // - param transition .dest/.min/.max/.transitionUpto set to the matching transition.
+    assert transition.source >= 0;
+    assert label >= 0;
+    int stateIndex = 2 * transition.source;
+    int firstTransitionIndex = states[stateIndex];
+    int numTransitions = states[stateIndex + 1];
+
+    // Since transitions are sorted,

Review comment:
   I'll try.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services
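The binary search in the patched step() above can be exercised in isolation. Below is a minimal, self-contained sketch (assumed names, not Lucene's actual Automaton class) that searches a flat int[] of (dest, minLabel, maxLabel) triples sorted by minLabel, mirroring the packed transition layout the diff assumes:

```java
// Standalone sketch of the binary-search transition lookup. One state's
// transitions are stored as consecutive (dest, minLabel, maxLabel) triples,
// sorted by minLabel; for a deterministic automaton the label ranges do not
// overlap, so binary search over minLabel is sufficient.
public class TransitionSearch {

  /**
   * Returns the destination of the transition whose [minLabel, maxLabel]
   * range contains {@code label}, or -1 if there is none.
   */
  public static int step(int[] transitions, int numTransitions, int label) {
    int low = 0;
    int high = numTransitions - 1;
    while (low <= high) {
      int mid = (low + high) >>> 1;
      int transitionIndex = 3 * mid;
      int minLabel = transitions[transitionIndex + 1];
      if (minLabel > label) {
        high = mid - 1;
      } else {
        int maxLabel = transitions[transitionIndex + 2];
        if (maxLabel < label) {
          low = mid + 1;          // label is above this range; search right
        } else {
          return transitions[transitionIndex]; // dest of the matching triple
        }
      }
    }
    return -1; // no outgoing transition matches label
  }
}
```

For example, with three transitions covering the ranges ['a','c'] -> 5, ['f','h'] -> 7, and ['m','z'] -> 9, looking up 'g' lands on the middle triple in one probe, while 'd' falls in a gap and returns -1. The `next(Transition, int)` variant in the diff follows the same search but starts `low` at `transition.transitionUpto` so repeated lookups from the same state resume where the last one stopped.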
[GitHub] [lucene-solr] madrob commented on issue #1157: Add RAT check using Gradle
madrob commented on issue #1157: Add RAT check using Gradle
URL: https://github.com/apache/lucene-solr/pull/1157#issuecomment-573246947

   Almost there! Precommit currently fails with some license failures. I'll need to look deeper at what exclusions we're actually using in the ant build, but I think we're super close now.
[GitHub] [lucene-solr] MarcusSorealheis commented on issue #1141: SOLR-14147 change the Security manager to default to true.
MarcusSorealheis commented on issue #1141: SOLR-14147 change the Security manager to default to true.
URL: https://github.com/apache/lucene-solr/pull/1141#issuecomment-573257518

   Should I set up a VM to test the commit on Windows? @rmuir @erikhatcher