[GitHub] [lucene-solr] dweiss commented on a change in pull request #1157: Add RAT check using Gradle
dweiss commented on a change in pull request #1157: Add RAT check using Gradle
URL: https://github.com/apache/lucene-solr/pull/1157#discussion_r365108552

## File path: gradle/validation/rat-sources.gradle

@@ -0,0 +1,167 @@
+import org.gradle.api.internal.project.IsolatedAntBuilder
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+// This applies the Apache RAT plugin to our source and test files
+
+// Largely copied from Apache Kafka
+apply plugin: RatPlugin
+// This invocation needs to go out to each project instead of being here
+rat {

Review comment:
This technically isn't an invocation; typically it configures defaults for some task.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dweiss commented on a change in pull request #1157: Add RAT check using Gradle
dweiss commented on a change in pull request #1157: Add RAT check using Gradle
URL: https://github.com/apache/lucene-solr/pull/1157#discussion_r365109858

## File path: gradle/validation/rat-sources.gradle

@@ -0,0 +1,167 @@
+import org.gradle.api.internal.project.IsolatedAntBuilder
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+// This applies the Apache RAT plugin to our source and test files
+
+// Largely copied from Apache Kafka
+apply plugin: RatPlugin
+// This invocation needs to go out to each project instead of being here
+rat {
+}
+
+class RatTask extends DefaultTask {
+  @Input
+  List includes = []
+
+  @Input
+  List excludes = []
+
+  def reportDir = project.file('build/rat')
+  def xmlReport = new File(reportDir, 'rat-report.xml')
+
+  def generateXmlReport(File reportDir) {
+    // Probably better to use the IsolatedAntBuilder if we can, but it seems to have issues with substringMatcher
+    // def antBuilder = services.get(IsolatedAntBuilder)
+
+    def ratClasspath = project.configurations.rat
+    def projectPath = project.getRootDir().getAbsolutePath()
+    ant.taskdef(resource: 'org/apache/rat/anttasks/antlib.xml', classpath: ratClasspath.asPath)
+    ant.report(format: 'xml', reportFile: xmlReport, addDefaultLicenseMatchers: true) {
+      fileset(dir: projectPath) {
+        patternset {
+          includes.each {
+            include(name: it)
+          }
+          excludes.each {
+            exclude(name: it)
+          }
+        }
+      }
+
+      // The license rules below were manually copied from lucene/common-build.xml, there is currently no mechanism to sync them
+
+      // BSD 4-clause stuff (is disallowed below)
+      substringMatcher(licenseFamilyCategory: "BSD4 ", licenseFamilyName: "Original BSD License (with advertising clause)") {
+        pattern(substring: "All advertising materials")
+      }
+
+      // BSD-like stuff
+      substringMatcher(licenseFamilyCategory: "BSD ", licenseFamilyName: "Modified BSD License") {
+        // brics automaton
+        pattern(substring: "Copyright (c) 2001-2009 Anders Moeller")
+        // snowball
+        pattern(substring: "Copyright (c) 2001, Dr Martin Porter")
+        // UMASS kstem
+        pattern(substring: "THIS SOFTWARE IS PROVIDED BY UNIVERSITY OF MASSACHUSETTS AND OTHER CONTRIBUTORS")
+        // Egothor
+        pattern(substring: "Egothor Software License version 1.00")
+        // JaSpell
+        pattern(substring: "Copyright (c) 2005 Bruno Martins")
+        // d3.js
+        pattern(substring: "THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS")
+        // highlight.js
+        pattern(substring: "THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS")
+      }
+
+      // MIT-like
+      substringMatcher(licenseFamilyCategory: "MIT ", licenseFamilyName: "Modified BSD License") {
+        // ICU license
+        pattern(substring: "Permission is hereby granted, free of charge, to any person obtaining a copy")
+      }
+
+      // Apache
+      substringMatcher(licenseFamilyCategory: "AL ", licenseFamilyName: "Apache") {
+        pattern(substring: "Licensed to the Apache Software Foundation (ASF) under")
+        // this is the old-school one under some files
+        pattern(substring: "Licensed under the Apache License, Version 2.0 (the "License")")
+      }
+
+      substringMatcher(licenseFamilyCategory: "GEN ", licenseFamilyName: "Generated") {
+        //
+        pattern(substring: "Produced by GNUPLOT")
+        //
+        pattern(substring: "This file was generated automatically by the Snowball to Java compiler")
+        //
+        pattern(substring: "ANTLR GENERATED CODE")
+      }
+
+      approvedLicense(familyName: "Apache"
[jira] [Commented] (SOLR-14165) SolrResponse serialVersionUID has changed
[ https://issues.apache.org/jira/browse/SOLR-14165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012551#comment-17012551 ]

Ishan Chattopadhyaya commented on SOLR-14165:
Thank you very much [~andywebb1975] for discovering and fixing this bug. And also for pushing us for inclusion into 8.4.1, thus ensuring we don't drop the ball.

> SolrResponse serialVersionUID has changed
>
> Key: SOLR-14165
> URL: https://issues.apache.org/jira/browse/SOLR-14165
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Affects Versions: 8.4
> Reporter: Andy Webb
> Assignee: Noble Paul
> Priority: Blocker
> Fix For: 8.4.1
> Time Spent: 40m
> Remaining Estimate: 0h
>
> SOLR-13821 changed the signature of {{org.apache.solr.client.solrj.SolrResponse}}, making serialisations of the class incompatible between versions.
>
> Original text from SOLR-13821:
> {quote}
> hi,
> We've been experimenting with doing a rolling in-place upgrade from Solr 8.3.1 to 8.4.0 on a non-production system, but have found that we get this exception for some operations, including when requesting /solr/admin/collections?action=overseerstatus on a node whose version is inconsistent with the overseer:
>
> java.io.InvalidClassException: org.apache.solr.client.solrj.SolrResponse; local class incompatible: stream classdesc serialVersionUID = -7931100103360242645, local class serialVersionUID = 2239939671435624715
>
> As far as I can see, this is due to the change to the SolrResponse class's signature in commit e3bd5a7. My experimentation has shown that if the serialVersionUID of that class is set explicitly to its previous value the exception no longer occurs.
> I'm not sure if this is a necessary or good fix, but I wanted to share this issue with you in case it's something that you think needs resolving.
> thanks,
> Andy
> {quote}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
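The InvalidClassException above arises because Java derives serialVersionUID from the class signature when it is not declared, so any signature change breaks wire compatibility. The sketch below is a minimal, self-contained illustration of the fix's mechanism (the class and field here are hypothetical stand-ins, not Solr's actual SolrResponse; only the pinned UID value is taken from the report above):

```java
import java.io.*;

public class SerialUidDemo {
    // A Serializable class that pins its stream identity explicitly.
    // Without this field the JVM computes serialVersionUID from the class
    // signature, so adding or changing members silently changes it and
    // cross-version deserialization fails with InvalidClassException.
    static class Response implements Serializable {
        // The SOLR-14165 fix declares SolrResponse's previous auto-derived
        // UID as an explicit constant like this one.
        private static final long serialVersionUID = -7931100103360242645L;
        String status = "ok";
    }

    public static void main(String[] args) throws Exception {
        // Round-trip through Java serialization; with the pinned UID this
        // keeps working even after compatible signature changes.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new Response());
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            Response r = (Response) in.readObject();
            System.out.println(r.status); // prints "ok"
        }
    }
}
```

Declaring the constant restores compatibility only if the class change was otherwise serialization-compatible, which was the case here.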
[jira] [Updated] (SOLR-14066) Deprecate DIH
[ https://issues.apache.org/jira/browse/SOLR-14066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ishan Chattopadhyaya updated SOLR-14066:

Description:
DataImportHandler has outlived its utility. DIH doesn't need to remain inside Solr anymore. Plan is to deprecate DIH in 8.5, remove from 9.0. Also, work on handing it off to volunteers in the community (so far, [~rohitcse] has volunteered to maintain it).

(was: DataImportHandler has outlived its utility. DIH doesn't need to remain inside Solr anymore. Let us deprecate DIH in 8.4 (and remove it from the Solr distro in 9x or 10x).)

> Deprecate DIH
>
> Key: SOLR-14066
> URL: https://issues.apache.org/jira/browse/SOLR-14066
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Components: contrib - DataImportHandler
> Reporter: Ishan Chattopadhyaya
> Assignee: Ishan Chattopadhyaya
> Priority: Major
> Attachments: image-2019-12-14-19-58-39-314.png
> Time Spent: 40m
> Remaining Estimate: 0h
>
> DataImportHandler has outlived its utility. DIH doesn't need to remain inside Solr anymore. Plan is to deprecate DIH in 8.5, remove from 9.0. Also, work on handing it off to volunteers in the community (so far, [~rohitcse] has volunteered to maintain it).
[GitHub] [lucene-solr] jpountz opened a new pull request #1158: LUCENE-9116: Remove long[] from `PostingsWriterBase#encodeTerm`.
jpountz opened a new pull request #1158: LUCENE-9116: Remove long[] from `PostingsWriterBase#encodeTerm`.
URL: https://github.com/apache/lucene-solr/pull/1158

All the metadata can be directly encoded in the `DataOutput`.
[jira] [Commented] (LUCENE-9116) Simplify postings API by removing long[] metadata
[ https://issues.apache.org/jira/browse/LUCENE-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012561#comment-17012561 ]

Adrien Grand commented on LUCENE-9116:
[~dsmiley] The attached pull request resurrects the FST postings format. Note that because of the other change, it now stores all outputs on final arcs. I suspect this was probably mostly the case previously already, and it doesn't seem to be a requirement for SolrTagger, which seems to use this postings format only because it needs a fast terms dictionary.

> Simplify postings API by removing long[] metadata
>
> Key: LUCENE-9116
> URL: https://issues.apache.org/jira/browse/LUCENE-9116
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Adrien Grand
> Priority: Minor
> Time Spent: 40m
> Remaining Estimate: 0h
>
> The postings API allows to store metadata about a term either in a long[] or in a byte[]. This is unnecessary as all information could be encoded in the byte[], which is what most codecs do in practice.
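The issue's premise is that long-valued term metadata can always be carried in the byte[] instead. A standard way to do that compactly is variable-length encoding, the scheme Lucene's DataOutput.writeVLong uses: 7 payload bits per byte, low-order bits first, high bit set on every byte except the last. The sketch below is a generic, standalone illustration of that scheme, not Lucene's actual PostingsWriterBase or DataOutput code:

```java
import java.io.ByteArrayOutputStream;

public class VLongDemo {
    // Encode a non-negative long as a variable-length byte sequence:
    // 7 payload bits per byte, least-significant group first, continuation
    // bit (0x80) set on all bytes except the last.
    static void writeVLong(ByteArrayOutputStream out, long i) {
        while ((i & ~0x7FL) != 0L) {
            out.write((int) ((i & 0x7FL) | 0x80L));
            i >>>= 7;
        }
        out.write((int) i);
    }

    // Decode one value starting at pos[0]; pos[0] is advanced past the
    // bytes consumed so several values can be read back-to-back.
    static long readVLong(byte[] bytes, int[] pos) {
        long value = 0;
        int shift = 0;
        byte b;
        do {
            b = bytes[pos[0]++];
            value |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return value;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Two hypothetical per-term metadata values (e.g. file pointers)
        // written into a single byte stream.
        writeVLong(out, 42L);
        writeVLong(out, 1_234_567_890L);
        byte[] encoded = out.toByteArray();
        int[] pos = {0};
        System.out.println(readVLong(encoded, pos)); // 42
        System.out.println(readVLong(encoded, pos)); // 1234567890
    }
}
```

Since small values take fewer bytes, packing metadata this way is typically no larger than a fixed long[] and removes the need for a second metadata channel in the API.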
[jira] [Commented] (LUCENE-9098) Report problematic term value when fuzzy query is too complex
[ https://issues.apache.org/jira/browse/LUCENE-9098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012567#comment-17012567 ]

Jim Ferenczi commented on LUCENE-9098:
The CI found a reproducible failure with this change:

{noformat}
ant test -Dtestcase=TestFuzzyQuery -Dtests.method=testErrorMessage -Dtests.seed=CE3DF037C6D29401 -Dtests.slow=true -Dtests.locale=fr-GN -Dtests.timezone=US/Pacific-New -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
{noformat}

[~mdrob] can you take a look?

> Report problematic term value when fuzzy query is too complex
>
> Key: LUCENE-9098
> URL: https://issues.apache.org/jira/browse/LUCENE-9098
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/search
> Reporter: Mike Drob
> Assignee: Mike Drob
> Priority: Minor
> Fix For: master (9.0)
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> This is the Lucene complement to SOLR-13190: when fuzzy query gets a term that expands to too many states, we throw an exception but don't provide insight on the problematic term. We should improve the error reporting.
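The improvement the issue asks for is simply to carry the offending term in the exception message rather than throwing a bare "too complex" error. A minimal sketch of that pattern follows; the class, method, and limit here are hypothetical stand-ins, not Lucene's actual fuzzy-query code:

```java
public class TermErrorDemo {
    // Hypothetical stand-in for the "term expands to too many automaton
    // states" check. The point, as in LUCENE-9098, is that the error
    // message names the problematic term so users can find it in a large
    // query instead of guessing.
    static void checkExpansion(String term, int states, int maxStates) {
        if (states > maxStates) {
            throw new IllegalArgumentException(
                "Term is too complex to expand (" + states + " states > max "
                + maxStates + "): \"" + term + "\"");
        }
    }

    public static void main(String[] args) {
        try {
            // Simulated over-limit expansion for an illustrative term.
            checkExpansion("supercalifragilistic~2", 15000, 10000);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The same principle applies to any limit-enforcing code path: include the input that tripped the limit, plus the observed and allowed values, in the exception text.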
[jira] [Updated] (LUCENE-9077) Gradle build
[ https://issues.apache.org/jira/browse/LUCENE-9077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss updated LUCENE-9077:

Description:
This task focuses on providing a gradle-based build equivalent for Lucene and Solr (on the master branch). See notes below on why this respin is needed. The code lives on the *gradle-master* branch. It is kept in sync with *master*.

Try running the following to see an overview of helper guides concerning typical workflow, testing and ant-migration helpers:

gradlew :help

A list of items that need to be added or require work follows. If you'd like to work on any of these, please add your name to the list. Once you have a patch/pull request let me (dweiss) know - I'll try to coordinate the merges.

* (/) Apply forbiddenAPIs
* (/) Generate hardware-aware gradle defaults for parallelism (count of workers and test JVMs).
* (/) Fail the build if the --tests filter is applied and no tests execute during the entire build (this allows for an empty set of filtered tests at single project level).
* (/) Port other settings and randomizations from common-build.xml
* (/) Configure security policy/sandboxing for tests.
* (/) test's console output on -Ptests.verbose=true
* (/) add a :helpDeps explanation of how the dependency system works (palantir plugin, lockfile) and how to retrieve structured information about current dependencies of a given module (in a tree-like output).
* (/) jar checksums, jar checksum computation and validation. This should be done without intermediate folders (directly on dependency sets).
* (/) verify min. JVM version and exact gradle version on build startup to minimize odd build side-effects
* (/) Repro-line for failed tests/runs.
* (/) add a top-level README note about building with gradle (and the required JVM).
* (/) add an equivalent of 'validate-source-patterns' (check-source-patterns.groovy) to precommit.
* add an equivalent of 'rat-sources' to precommit.
* add an equivalent of 'check-example-lucene-match-version' (solr only) to precommit.
* add an equivalent of 'documentation-lint' to precommit.

Hard-to-implement stuff already investigated:

* (/) (done) -*Printing console output of failed tests.* There doesn't seem to be any way to do this in a reasonably efficient way. There are onOutput listeners but they're slow to operate and solr tests emit *tons* of output so it's overkill.-
* (!) (LUCENE-9120) *Tests working with security-debug logs or other JVM-early log output*. Gradle's test runner works by redirecting Java's stdout/stderr so this just won't work. Perhaps we can spin up the ant-based test runner for such corner cases.

Of lesser importance:

* add rendering of javadocs (gradlew javadoc) and attach them to maven publications.
* Add test 'beasting' (rerunning the same suite multiple times). I'm afraid it'll be difficult to run it sensibly because gradle doesn't offer cwd separation for the forked test runners.
* if you diff the solr packaged distribution against the ant-created distribution there are minor differences in library versions and some JARs are excluded/moved around. I didn't try to force these as everything seems to work (tests, etc.) - perhaps these differences should be fixed in the ant build instead.
* identify and port any other "check" utilities that may be called from ant. (Mark's branch has some of this stuff already implemented)
* [EOE] identify and port various "regenerate" tasks from ant builds (javacc, precompiled automata, etc.)
* fill in POM details in gradle/defaults-maven.gradle so that they reflect the previous content better (dependencies aside).
* Add any IDE integration layers that should be added (I use IntelliJ and it imports the project out of the box, without the need for any special tuning).
* *Clean up dependencies, especially for Solr*: any \{ transitive = false } should just explicitly exclude whatever they don't need (and their dependencies currently declared explicitly should be folded). Figure out which scope to import a dependency to.
* Add Solr packaging for docs/* (see TODO in packaging/build.gradle; currently XSLT...)
* I didn't bother adding Solr dist/test-framework to packaging (who'd use it from a binary distribution?)

*{color:#ff}Note:{color}* this builds on the work done by Mark Miller and Cao Mạnh Đạt but also applies lessons learned from those two efforts:

* *Do not try to do too many things at once*. If we deviate too far from master, the branch will be hard to merge.
* *Do everything in baby steps* and add small, independent build fragments replacing the old ant infrastructure.
* *Try to engage people to run, test and contribute early*. It can't be a one-man effort. The more people understand and can contribute to the build, the healthier it will be.

was: This task focuses on providing gradle-based build equivalent for Lucene and Solr (on ma
[GitHub] [lucene-solr] dweiss commented on a change in pull request #1157: Add RAT check using Gradle
dweiss commented on a change in pull request #1157: Add RAT check using Gradle
URL: https://github.com/apache/lucene-solr/pull/1157#discussion_r365184348

## File path: gradle/validation/rat-sources.gradle

+def ratClasspath = project.configurations.rat
+def projectPath = project.getRootDir().getAbsolutePath()

Review comment:
This is wrong I think. It should be project.projectDir
[GitHub] [lucene-solr] dweiss commented on a change in pull request #1157: Add RAT check using Gradle
dweiss commented on a change in pull request #1157: Add RAT check using Gradle
URL: https://github.com/apache/lucene-solr/pull/1157#discussion_r365185129

## File path: gradle/validation/rat-sources.gradle
[GitHub] [lucene-solr] dweiss commented on a change in pull request #1157: Add RAT check using Gradle
dweiss commented on a change in pull request #1157: Add RAT check using Gradle
URL: https://github.com/apache/lucene-solr/pull/1157#discussion_r365185988

## File path: gradle/validation/rat-sources.gradle
[jira] [Updated] (LUCENE-9077) Gradle build
[ https://issues.apache.org/jira/browse/LUCENE-9077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss updated LUCENE-9077:

Description:
This task focuses on providing a gradle-based build equivalent for Lucene and Solr (on the master branch). See notes below on why this respin is needed. The code lives on the *gradle-master* branch. It is kept in sync with *master*.

Try running the following to see an overview of helper guides concerning typical workflow, testing and ant-migration helpers:

gradlew :help

A list of items that need to be added or require work follows. If you'd like to work on any of these, please add your name to the list. Once you have a patch/pull request let me (dweiss) know - I'll try to coordinate the merges.

* (/) Apply forbiddenAPIs
* (/) Generate hardware-aware gradle defaults for parallelism (count of workers and test JVMs).
* (/) Fail the build if the --tests filter is applied and no tests execute during the entire build (this allows for an empty set of filtered tests at single project level).
* (/) Port other settings and randomizations from common-build.xml
* (/) Configure security policy/sandboxing for tests.
* (/) test's console output on -Ptests.verbose=true
* (/) add a :helpDeps explanation of how the dependency system works (palantir plugin, lockfile) and how to retrieve structured information about current dependencies of a given module (in a tree-like output).
* (/) jar checksums, jar checksum computation and validation. This should be done without intermediate folders (directly on dependency sets).
* (/) verify min. JVM version and exact gradle version on build startup to minimize odd build side-effects
* (/) Repro-line for failed tests/runs.
* (/) add a top-level README note about building with gradle (and the required JVM).
* (/) add an equivalent of 'validate-source-patterns' (check-source-patterns.groovy) to precommit.
* add an equivalent of 'rat-sources' to precommit.
* (/) add an equivalent of 'check-example-lucene-match-version' (solr only) to precommit.
* add an equivalent of 'documentation-lint' to precommit.

Hard-to-implement stuff already investigated:

* (/) (done) -*Printing console output of failed tests.* There doesn't seem to be any way to do this in a reasonably efficient way. There are onOutput listeners but they're slow to operate and solr tests emit *tons* of output so it's overkill.-
* (!) (LUCENE-9120) *Tests working with security-debug logs or other JVM-early log output*. Gradle's test runner works by redirecting Java's stdout/stderr so this just won't work. Perhaps we can spin up the ant-based test runner for such corner cases.

Of lesser importance:

* add rendering of javadocs (gradlew javadoc) and attach them to maven publications.
* Add test 'beasting' (rerunning the same suite multiple times). I'm afraid it'll be difficult to run it sensibly because gradle doesn't offer cwd separation for the forked test runners.
* if you diff the solr packaged distribution against the ant-created distribution there are minor differences in library versions and some JARs are excluded/moved around. I didn't try to force these as everything seems to work (tests, etc.) - perhaps these differences should be fixed in the ant build instead.
* identify and port any other "check" utilities that may be called from ant. (Mark's branch has some of this stuff already implemented)
* [EOE] identify and port various "regenerate" tasks from ant builds (javacc, precompiled automata, etc.)
* fill in POM details in gradle/defaults-maven.gradle so that they reflect the previous content better (dependencies aside).
* Add any IDE integration layers that should be added (I use IntelliJ and it imports the project out of the box, without the need for any special tuning).
* *Clean up dependencies, especially for Solr*: any \{ transitive = false } should just explicitly exclude whatever they don't need (and their dependencies currently declared explicitly should be folded). Figure out which scope to import a dependency to.
* Add Solr packaging for docs/* (see TODO in packaging/build.gradle; currently XSLT...)
* I didn't bother adding Solr dist/test-framework to packaging (who'd use it from a binary distribution?)

*{color:#ff}Note:{color}* this builds on the work done by Mark Miller and Cao Mạnh Đạt but also applies lessons learned from those two efforts:

* *Do not try to do too many things at once*. If we deviate too far from master, the branch will be hard to merge.
* *Do everything in baby steps* and add small, independent build fragments replacing the old ant infrastructure.
* *Try to engage people to run, test and contribute early*. It can't be a one-man effort. The more people understand and can contribute to the build, the healthier it will be.

was: This task focuses on providing gradle-based build equivalent for Lucene and Solr (o
[jira] [Commented] (SOLR-14158) package manager to read keys from packagestore and not ZK
[ https://issues.apache.org/jira/browse/SOLR-14158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012773#comment-17012773 ]

ASF subversion and git services commented on SOLR-14158:

Commit 6fb085943c6e9c6f82db67c6ccfe641e64e1899e in lucene-solr's branch refs/heads/gradle-master from Ishan Chattopadhyaya
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=6fb0859 ]

SOLR-14158: Package manager to read keys from package store, not ZK

> package manager to read keys from packagestore and not ZK
> --
>
> Key: SOLR-14158
> URL: https://issues.apache.org/jira/browse/SOLR-14158
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Components: packages
> Affects Versions: 8.4
> Reporter: Noble Paul
> Assignee: Noble Paul
> Priority: Blocker
> Labels: packagemanager
> Fix For: 8.4.1
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> The security of the package system relies on securing ZK. It's much easier for users to secure the file system than to secure ZK. We provide an option to read public keys from the file store. This will:
> * Have a special directory called {{_trusted_}}. Direct writes to that directory over http are forbidden.
> * The CLI directly writes the keys to the {{/filestore/_trusted_/keys/}} directory. Other nodes are asked to fetch the public key files from that node.
> * Package artifacts will continue to be uploaded over http.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-11554) Support handling OPTIONS request for Hadoop authentication filter
[ https://issues.apache.org/jira/browse/SOLR-11554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gézapeti updated SOLR-11554:
Attachment: SOLR-11554.patch

> Support handling OPTIONS request for Hadoop authentication filter
> --
>
> Key: SOLR-11554
> URL: https://issues.apache.org/jira/browse/SOLR-11554
> Project: Solr
> Issue Type: Bug
> Affects Versions: 6.4
> Reporter: Hrishikesh Gadre
> Priority: Minor
> Attachments: SOLR-11554.patch
>
> As part of SOLR-9513 we added a Solr authentication plugin which uses the Hadoop security framework. The HTTP client interface provided by the Hadoop framework does not send the authentication information preemptively. Instead it sends an OPTIONS request first; if the server responds with a 401 error, it resends the request with the proper authentication information. This jira is to handle the OPTIONS request as part of the Solr authentication plugin for Hadoop.
[jira] [Created] (SOLR-14182) Move metric reporters config from solr.xml to ZK cluster properties
Andrzej Bialecki created SOLR-14182:
---

Summary: Move metric reporters config from solr.xml to ZK cluster properties
Key: SOLR-14182
URL: https://issues.apache.org/jira/browse/SOLR-14182
Project: Solr
Issue Type: Improvement
Security Level: Public (Default Security Level. Issues are Public)
Affects Versions: 8.4
Reporter: Andrzej Bialecki
Assignee: Andrzej Bialecki
Fix For: 8.5

Metric reporters are currently configured statically in solr.xml, which makes them difficult to change dynamically or in a containerized environment. We should move this section to ZK /cluster.properties and add a back-compat migration shim.
[jira] [Commented] (SOLR-14130) Add postlogs command line tool for indexing Solr logs
[ https://issues.apache.org/jira/browse/SOLR-14130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012875#comment-17012875 ]

ASF subversion and git services commented on SOLR-14130:

Commit d68f3e1a441b39485900e9d94e9686fb12b4ff87 in lucene-solr's branch refs/heads/master from Joel Bernstein
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=d68f3e1 ]

SOLR-14130: Improve robustness of the logs parser

> Add postlogs command line tool for indexing Solr logs
> --
>
> Key: SOLR-14130
> URL: https://issues.apache.org/jira/browse/SOLR-14130
> Project: Solr
> Issue Type: Task
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Joel Bernstein
> Assignee: Joel Bernstein
> Priority: Major
> Attachments: SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, Screen Shot 2019-12-19 at 2.04.41 PM.png, Screen Shot 2019-12-19 at 2.16.01 PM.png, Screen Shot 2019-12-19 at 2.35.41 PM.png, Screen Shot 2019-12-21 at 8.46.51 AM.png
>
> This ticket adds a simple command line tool for posting Solr logs to a Solr index. The tool works with the out-of-the-box Solr log format. Still a work in progress, but it currently indexes:
> * queries
> * updates
> * commits
> * new searchers
> * errors - including stack traces
>
> Attached are some sample visualizations using Solr Streaming Expressions and Math Expressions after the data has been loaded. The visualizations show time series, scatter plots, histograms and quantile plots, but really this is just scratching the surface of the visualizations that can be done with the Solr logs.
[jira] [Commented] (SOLR-14130) Add postlogs command line tool for indexing Solr logs
[ https://issues.apache.org/jira/browse/SOLR-14130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012881#comment-17012881 ]

ASF subversion and git services commented on SOLR-14130:

Commit 1cb085afcbc04e861b76955bfd4944141c47d1ad in lucene-solr's branch refs/heads/branch_8x from Joel Bernstein
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=1cb085a ]

SOLR-14130: Improve robustness of the logs parser

> Add postlogs command line tool for indexing Solr logs
> --
>
> Key: SOLR-14130
> URL: https://issues.apache.org/jira/browse/SOLR-14130
> Project: Solr
> Issue Type: Task
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Joel Bernstein
> Assignee: Joel Bernstein
> Priority: Major
> Attachments: SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, Screen Shot 2019-12-19 at 2.04.41 PM.png, Screen Shot 2019-12-19 at 2.16.01 PM.png, Screen Shot 2019-12-19 at 2.35.41 PM.png, Screen Shot 2019-12-21 at 8.46.51 AM.png
>
> This ticket adds a simple command line tool for posting Solr logs to a Solr index. The tool works with the out-of-the-box Solr log format. Still a work in progress, but it currently indexes:
> * queries
> * updates
> * commits
> * new searchers
> * errors - including stack traces
>
> Attached are some sample visualizations using Solr Streaming Expressions and Math Expressions after the data has been loaded. The visualizations show time series, scatter plots, histograms and quantile plots, but really this is just scratching the surface of the visualizations that can be done with the Solr logs.
[jira] [Commented] (SOLR-13892) Add postfilter support to {!join} queries
[ https://issues.apache.org/jira/browse/SOLR-13892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012909#comment-17012909 ]

ASF subversion and git services commented on SOLR-13892:

Commit 4712524860553504a0557810eebc43db54bb8ce9 in lucene-solr's branch refs/heads/jira/SOLR-13892 from Jason Gerlowski
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=4712524 ]

SOLR-13892: Add "join" postfilter implementation

> Add postfilter support to {!join} queries
> --
>
> Key: SOLR-13892
> URL: https://issues.apache.org/jira/browse/SOLR-13892
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Components: query parsers
> Affects Versions: master (9.0)
> Reporter: Jason Gerlowski
> Assignee: Jason Gerlowski
> Priority: Major
> Attachments: SOLR-13892.patch, SOLR-13892.patch
>
> The JoinQParserPlugin would be a lot more performant in many use-cases if it could operate as a post-filter, especially when doc-values for the involved fields are available.
> With this issue, I'd like to propose a post-filter implementation for the {{join}} qparser.
[jira] [Created] (LUCENE-9125) Improve Automaton.step() with binary search and introduce Automaton.next()
Bruno Roustant created LUCENE-9125:
--

Summary: Improve Automaton.step() with binary search and introduce Automaton.next()
Key: LUCENE-9125
URL: https://issues.apache.org/jira/browse/LUCENE-9125
Project: Lucene - Core
Issue Type: Improvement
Reporter: Bruno Roustant

Implement the existing TODO in Automaton.step() (look up a transition from a source state depending on a given label) to use binary search, since the transitions are sorted.

Introduce a new method Automaton.next() to optimize iteration & lookup over all the transitions of a state. This will be used in the RunAutomaton constructor and in MinimizationOperations.minimize().
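To make the described lookup concrete, here is a minimal, hypothetical sketch of binary search over a state's transitions, assuming they are sorted by minimum label and non-overlapping (as in a deterministic automaton). The class and array layout are illustrative only, not Lucene's actual Automaton code:

```java
// Simplified model: transition i accepts labels in [min[i], max[i]]
// and leads to state dest[i]. Transitions are sorted by min and do
// not overlap, so a containing interval can be found by binary search.
public class TransitionLookup {
    final int[] min, max, dest;

    TransitionLookup(int[] min, int[] max, int[] dest) {
        this.min = min; this.max = max; this.dest = dest;
    }

    /** Returns the destination state for label, or -1 if no transition matches. */
    int step(int label) {
        int lo = 0, hi = min.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (max[mid] < label) {
                lo = mid + 1;      // transition entirely below label
            } else if (min[mid] > label) {
                hi = mid - 1;      // transition entirely above label
            } else {
                return dest[mid];  // min[mid] <= label <= max[mid]
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Transitions: [a-c]->1, [f-h]->2, [m-z]->3
        TransitionLookup t = new TransitionLookup(
            new int[] {'a', 'f', 'm'},
            new int[] {'c', 'h', 'z'},
            new int[] {1, 2, 3});
        System.out.println(t.step('b')); // 1
        System.out.println(t.step('g')); // 2
        System.out.println(t.step('k')); // -1
    }
}
```

This is the O(log n) replacement for a linear scan over the transition array; the ticket's Automaton.next() additionally exploits callers that look labels up in increasing order.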
[GitHub] [lucene-solr] gerlowskija commented on issue #1159: SOLR-13892: Add top-level docValues "join" implementation
gerlowskija commented on issue #1159: SOLR-13892: Add top-level docValues "join" implementation URL: https://github.com/apache/lucene-solr/pull/1159#issuecomment-573054962 This is not ready yet. But I figured it was far enough along to let others provide in-line review if they'd like. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] gerlowskija opened a new pull request #1159: SOLR-13892: Add top-level docValues "join" implementation
gerlowskija opened a new pull request #1159: SOLR-13892: Add top-level docValues "join" implementation
URL: https://github.com/apache/lucene-solr/pull/1159

# Description

Many "join" use-cases can be made more performant by using "top-level" docValues data structures instead of the per-segment structures that are used currently. Users should have the ability to pick between top-level and per-segment, based on the particulars of their index and use case.

# Solution

This PR introduces a "top-level" implementation in the form of a "join" postfilter. Users get the "top-level" behavior by specifying `cache=false cost=101` as local params on their join. We may decide to repackage this implementation as a Two-Phase Iterator before merging, though that is still up in the air.

# Tests

Functional tests have been added in TestJoin.java. Performance tests validating the performance improvement in select use-cases can be found in the comments on SOLR-13892.

# Checklist

Please review the following and check all that apply:

- [X] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability.
- [X] I have created a Jira issue and added the issue ID to my pull request title.
- [X] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
- [X] I have developed this patch against the `master` branch.
- [X] I have run `ant precommit` and the appropriate test suite.
- [X] I have added tests for my changes.
- [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only).
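Based on the PR description, a join filter opting into the top-level postfilter implementation would look something like the following (the field names and query here are hypothetical, made up for illustration):

```text
fq={!join from=manu_id_s to=id cache=false cost=101}inStock:true
```

In Solr, `cache=false` combined with a `cost` of 100 or more is the conventional signal that a filter query should run as a PostFilter (the same mechanism used by the collapse parser), which is presumably why the PR picks `cost=101`.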
[GitHub] [lucene-solr] gerlowskija edited a comment on issue #1159: SOLR-13892: Add top-level docValues "join" implementation
gerlowskija edited a comment on issue #1159: SOLR-13892: Add top-level docValues "join" implementation
URL: https://github.com/apache/lucene-solr/pull/1159#issuecomment-573054962

This is not ready yet, but I figured it was far enough along to let others provide in-line review if they'd like. Still needed:

- review of tests
- clarify/unify ref-guide coverage on types of joins and when each should be used
- minor cleanup
[GitHub] [lucene-solr] zsgyulavari commented on issue #1144: SOLR-13756 updated restlet mvn repository url.
zsgyulavari commented on issue #1144: SOLR-13756 updated restlet mvn repository url.
URL: https://github.com/apache/lucene-solr/pull/1144#issuecomment-573055429

@joel-bernstein Cloudera is hosting a mirror of the restlet repository now, so I've added the new repo url instead of the old one, which has not been in use since 6.6/7.0. (The dependent feature was added in https://issues.apache.org/jira/browse/SOLR-1301 and removed in https://issues.apache.org/jira/browse/SOLR-9221.)
[GitHub] [lucene-solr] bruno-roustant opened a new pull request #1160: LUCENE-9125: Improve Automaton.step() with binary search
bruno-roustant opened a new pull request #1160: LUCENE-9125: Improve Automaton.step() with binary search
URL: https://github.com/apache/lucene-solr/pull/1160

and introduce Automaton.next().
[jira] [Commented] (SOLR-13756) ivy cannot download org.restlet.ext.servlet jar
[ https://issues.apache.org/jira/browse/SOLR-13756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012918#comment-17012918 ]

Zsolt Gyulavari commented on SOLR-13756:

[~jbernste] I'm happy to announce that Cloudera is hosting a mirror of the restlet repository from now on. I've added the new repo url instead of the old one, which has not been in use since 6.6/7.0. (The dependent feature was added in https://issues.apache.org/jira/browse/SOLR-1301 and removed in https://issues.apache.org/jira/browse/SOLR-9221.)

> ivy cannot download org.restlet.ext.servlet jar
> --
>
> Key: SOLR-13756
> URL: https://issues.apache.org/jira/browse/SOLR-13756
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Chongchen Chen
> Priority: Major
> Time Spent: 20m
> Remaining Estimate: 0h
>
> I checked out the project and ran `ant idea`, which tries to download jars, but https://repo1.maven.org/maven2/org/restlet/jee/org.restlet.ext.servlet/2.3.0/org.restlet.ext.servlet-2.3.0.jar now returns a 404:
> [ivy:retrieve] public: tried
> [ivy:retrieve] https://repo1.maven.org/maven2/org/restlet/jee/org.restlet.ext.servlet/2.3.0/org.restlet.ext.servlet-2.3.0.jar
> [ivy:retrieve] :: FAILED DOWNLOADS ::
> [ivy:retrieve] :: ^ see resolution messages for details ^ ::
> [ivy:retrieve] :: org.restlet.jee#org.restlet;2.3.0!org.restlet.jar
> [ivy:retrieve] :: org.restlet.jee#org.restlet.ext.servlet;2.3.0!org.restlet.ext.servlet.jar
[jira] [Commented] (LUCENE-9125) Improve Automaton.step() with binary search and introduce Automaton.next()
[ https://issues.apache.org/jira/browse/LUCENE-9125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012926#comment-17012926 ]

Bruno Roustant commented on LUCENE-9125:

I benchmarked using non-trivial automata for automaton/fuzzy queries. Making Automaton.step() use binary search reduces step() call time by 20% (and obviously it becomes O(log n)).

By introducing Automaton.next() to iterate & look up more efficiently through a state's transitions, I measured 40% fewer binary-search loop executions. This is a net gain with the same functional logic, because each time we increase the lower bound for the binary search instead of always starting from the first transition.

This will result in faster AutomatonQuery and FuzzyQuery construction.

> Improve Automaton.step() with binary search and introduce Automaton.next()
> --
>
> Key: LUCENE-9125
> URL: https://issues.apache.org/jira/browse/LUCENE-9125
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Bruno Roustant
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Implement the existing todo in Automaton.step() (lookup a transition from a source state depending on a given label) to use binary search since the transitions are sorted.
> Introduce new method Automaton.next() to optimize iteration & lookup over all the transitions of a state. This will be used in RunAutomaton constructor and in MinimizationOperations.minimize().
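The "increase the lower bound" idea in the comment above can be sketched as follows. This is a hypothetical toy model, not Lucene's Automaton.next(): when labels are looked up in non-decreasing order, each binary search can start from the transition found for the previous label rather than from index 0, shrinking the searched range over time:

```java
// Toy model: transitions sorted by min label, non-overlapping.
// next() remembers where the previous lookup landed and never
// searches below that point again, so repeated lookups over
// increasing labels execute fewer binary-search iterations overall.
public class IncreasingLookup {
    final int[] min, max, dest;
    int from = 0;  // lower bound reused across calls

    IncreasingLookup(int[] min, int[] max, int[] dest) {
        this.min = min; this.max = max; this.dest = dest;
    }

    /** Labels must be passed in non-decreasing order. Returns dest state or -1. */
    int next(int label) {
        int lo = from, hi = min.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (max[mid] < label) lo = mid + 1;
            else if (min[mid] > label) hi = mid - 1;
            else { from = mid; return dest[mid]; }  // remember as next lower bound
        }
        from = lo;  // transitions below lo are too small for any future label
        return -1;
    }

    public static void main(String[] args) {
        // Transitions: [a-c]->1, [f-h]->2, [m-z]->3
        IncreasingLookup it = new IncreasingLookup(
            new int[] {'a', 'f', 'm'},
            new int[] {'c', 'h', 'z'},
            new int[] {1, 2, 3});
        System.out.println(it.next('b')); // 1
        System.out.println(it.next('k')); // -1
        System.out.println(it.next('p')); // 3
    }
}
```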
[jira] [Updated] (SOLR-13892) Add postfilter support to {!join} queries
[ https://issues.apache.org/jira/browse/SOLR-13892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Gerlowski updated SOLR-13892:
---
Attachment: join-increasing-from-matches1.png

> Add postfilter support to {!join} queries
> --
>
> Key: SOLR-13892
> URL: https://issues.apache.org/jira/browse/SOLR-13892
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Components: query parsers
> Affects Versions: master (9.0)
> Reporter: Jason Gerlowski
> Assignee: Jason Gerlowski
> Priority: Major
> Attachments: SOLR-13892.patch, SOLR-13892.patch, join-increasing-from-matches1.png
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> The JoinQParserPlugin would be a lot more performant in many use-cases if it could operate as a post-filter, especially when doc-values for the involved fields are available.
> With this issue, I'd like to propose a post-filter implementation for the {{join}} qparser.
[jira] [Commented] (LUCENE-9004) Approximate nearest vector search
[ https://issues.apache.org/jira/browse/LUCENE-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012942#comment-17012942 ] Tomoko Uchida commented on LUCENE-9004: --- Hi, it seems that some devs are strongly interested in this issue and I privately have received feedback (and expectations). So I just wanted to share my latest WIP branch. [https://github.com/mocobeta/lucene-solr-mirror/commits/jira/LUCENE-9004-aknn-2] And an usage code snippet for that is: [https://gist.github.com/mocobeta/a5b18506ebc933c0afa7ab61d1dd2295] I introduced a brand new codecs and indexer for vector search so this no longer depends on DocValues, though it's still on pretty early stage (especially, segment merging is not yet implemented). I intend to continue to work and I'll do my best, but to be honest I am not sure if my approach is the best - or I can create a great patch that can be merged to Lucene core... I welcome that someone takes over it in some different, more sophisticated/efficient ways. My current attempt might be useful as a reference or the starting point. > Approximate nearest vector search > - > > Key: LUCENE-9004 > URL: https://issues.apache.org/jira/browse/LUCENE-9004 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Michael Sokolov >Priority: Major > Attachments: hnsw_layered_graph.png > > > "Semantic" search based on machine-learned vector "embeddings" representing > terms, queries and documents is becoming a must-have feature for a modern > search engine. SOLR-12890 is exploring various approaches to this, including > providing vector-based scoring functions. This is a spinoff issue from that. > The idea here is to explore approximate nearest-neighbor search. Researchers > have found an approach based on navigating a graph that partially encodes the > nearest neighbor relation at multiple scales can provide accuracy > 95% (as > compared to exact nearest neighbor calculations) at a reasonable cost. 
This > issue will explore implementing HNSW (hierarchical navigable small-world) > graphs for the purpose of approximate nearest vector search (often referred > to as KNN or k-nearest-neighbor search). > At a high level the way this algorithm works is this. First assume you have a > graph that has a partial encoding of the nearest neighbor relation, with some > short and some long-distance links. If this graph is built in the right way > (has the hierarchical navigable small world property), then you can > efficiently traverse it to find nearest neighbors (approximately) in log N > time where N is the number of nodes in the graph. I believe this idea was > pioneered in [1]. The great insight in that paper is that if you use the > graph search algorithm to find the K nearest neighbors of a new document > while indexing, and then link those neighbors (undirectedly, ie both ways) to > the new document, then the graph that emerges will have the desired > properties. > The implementation I propose for Lucene is as follows. We need two new data > structures to encode the vectors and the graph. We can encode vectors using a > light wrapper around {{BinaryDocValues}} (we also want to encode the vector > dimension and have efficient conversion from bytes to floats). For the graph > we can use {{SortedNumericDocValues}} where the values we encode are the > docids of the related documents. Encoding the interdocument relations using > docids directly will make it relatively fast to traverse the graph since we > won't need to look up through an id-field indirection. This choice limits us > to building a graph-per-segment since it would be impractical to maintain a > global graph for the whole index in the face of segment merges. However > graph-per-segment is very natural at search time - we can traverse each > segment's graph independently and merge results as we do today for term-based > search. > At index time, however, merging graphs is somewhat challenging.
While > indexing we build a graph incrementally, performing searches to construct > links among neighbors. When merging segments we must construct a new graph > containing elements of all the merged segments. Ideally we would somehow > preserve the work done when building the initial graphs, but at least as a > start I'd propose we construct a new graph from scratch when merging. The > process is going to be limited, at least initially, to graphs that can fit > in RAM since we require random access to the entire graph while constructing > it: In order to add links bidirectionally we must continually update existing > documents. > I think we want to express this API to users as a single joint > {{KnnGraphField}} abstraction that joins together the vectors and the graph > as a single joi
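The insert-by-search recipe described above (use the graph search to find the K nearest neighbors of each new document, then link them undirectedly) can be sketched with a toy single-layer graph. This is only an illustrative simplification of the idea, not the proposed Lucene implementation: real HNSW adds hierarchy layers and neighbor pruning, and the class/method names and fixed entry point here are made up.

```java
import java.util.*;

// Toy single-layer navigable-small-world index: each new vector is linked
// (both ways) to the k nearest neighbors found by greedy graph search.
// Hypothetical sketch only; real HNSW adds layers and neighbor pruning.
class ToyNswIndex {
    final List<float[]> vectors = new ArrayList<>();
    final List<Set<Integer>> neighbors = new ArrayList<>();
    final int k;

    ToyNswIndex(int k) { this.k = k; }

    static float dist(float[] a, float[] b) {
        float s = 0;
        for (int i = 0; i < a.length; i++) { float d = a[i] - b[i]; s += d * d; }
        return s;  // squared Euclidean distance
    }

    // Best-first traversal from an entry node; returns up to k closest doc ids.
    List<Integer> search(float[] q, int entry) {
        PriorityQueue<Integer> frontier =
            new PriorityQueue<>(Comparator.comparingDouble(i -> dist(q, vectors.get(i))));
        Set<Integer> visited = new HashSet<>();
        List<Integer> reached = new ArrayList<>();
        frontier.add(entry);
        visited.add(entry);
        while (!frontier.isEmpty() && reached.size() < 4 * k) {  // small beam
            int cur = frontier.poll();
            reached.add(cur);
            for (int nb : neighbors.get(cur)) {
                if (visited.add(nb)) frontier.add(nb);
            }
        }
        reached.sort(Comparator.comparingDouble(i -> dist(q, vectors.get(i))));
        return reached.subList(0, Math.min(k, reached.size()));
    }

    void add(float[] v) {
        int id = vectors.size();
        vectors.add(v);
        neighbors.add(new HashSet<>());
        if (id == 0) return;                   // first node: nothing to link to
        for (int nb : search(v, 0)) {          // search current graph for neighbors
            neighbors.get(id).add(nb);         // link undirectedly, i.e. both ways
            neighbors.get(nb).add(id);
        }
    }

    public static void main(String[] args) {
        ToyNswIndex idx = new ToyNswIndex(2);
        idx.add(new float[]{0, 0});
        idx.add(new float[]{1, 0});
        idx.add(new float[]{0, 1});
        idx.add(new float[]{10, 10});
        System.out.println(idx.search(new float[]{9, 9}, 0));  // nearest ids first
    }
}
```

The key property is visible even in this toy: insertion reuses the same search routine as querying, so the graph's short- and long-range links emerge from the insertion order rather than from any global construction step.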
[jira] [Comment Edited] (LUCENE-9004) Approximate nearest vector search
[ https://issues.apache.org/jira/browse/LUCENE-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012942#comment-17012942 ] Tomoko Uchida edited comment on LUCENE-9004 at 1/10/20 3:02 PM: Hi, it seems that some devs are strongly interested in this issue and I have privately received feedback (and expectations). So I just wanted to share my latest WIP branch. [https://github.com/mocobeta/lucene-solr-mirror/commits/jira/LUCENE-9004-aknn-2] And here is a usage code snippet for it: [https://gist.github.com/mocobeta/a5b18506ebc933c0afa7ab61d1dd2295] I introduced a brand-new codec and indexer for vector search, so this no longer depends on DocValues, though it's still at a pretty early stage (in particular, segment merging is not yet implemented). I intend to continue working on it and I'll do my best, but to be honest I am not sure my approach is the best, or that I can create a great patch that can be merged into Lucene core... I'd welcome someone taking it over in different, more sophisticated/efficient ways. My current attempt might be useful as a reference or a starting point. > Approximate nearest vector search > - > > Key: LUCENE-9004 > URL: https://issues.apache.org/jira/browse/LUCENE-9004 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Michael Sokolov >Priority: Major > Attachments: hnsw_layered_graph.png
[jira] [Commented] (SOLR-13892) Add postfilter support to {!join} queries
[ https://issues.apache.org/jira/browse/SOLR-13892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012945#comment-17012945 ] Jason Gerlowski commented on SOLR-13892: Revisiting this after some time away. I've moved the code to a PR here: https://github.com/apache/lucene-solr/pull/1159. I've found this confusing on some jiras, so to be explicit: *all patches on this jira predate the PR and are out of date*. The main (albeit, temporary) addition to the PR at this point is a performance test driver I threw together to show a use-case where the top-level DV approach shines performance-wise. The performance test mimics a common setup for doing document-level authorization: a "user_acls" collection has users and the groups they belong to, and a "products" collection has product records with a field representing the groups that record is visible to. The performance test stages the "user_acls" data with 100 users (user1, user2 ...user100): each belonging to an increasing number of groups. This lets us show one advantage of the top-level DV approach: it scales much better with the number of "from" matches. !join-increasing-from-matches1.png! The takeaway here isn't that the top-level approach is better or worse than existing approaches. This perf test is only one specific use case after all. But it's pretty clear that it serves some use-cases better and it's worth getting in. Next steps for this jira are: * consider Two-Phase Iterator instead of postfilter. (I'm not sure TPI makes as much sense here as it did on SOLR-13890, but still thinking through some of the lessons learned there.) * cleanup * clarify (unify?) ref-guide coverage on different joins, and when each can/should be used. > Add postfilter support to {!join} queries > - > > Key: SOLR-13892 > URL: https://issues.apache.org/jira/browse/SOLR-13892 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. 
Issues are Public) > Components: query parsers >Affects Versions: master (9.0) >Reporter: Jason Gerlowski >Assignee: Jason Gerlowski >Priority: Major > Attachments: SOLR-13892.patch, SOLR-13892.patch, > join-increasing-from-matches1.png > > Time Spent: 0.5h > Remaining Estimate: 0h > > The JoinQParserPlugin would be a lot more performant in many use-cases if it could > operate as a post-filter, especially when doc-values for the involved fields > are available. > With this issue, I'd like to propose a post-filter implementation for the > {{join}} qparser. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
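The ACL perf-test scenario above is essentially the postfilter pattern: run the cheap main query first, then apply the expensive membership check only to documents that survived it, never to the whole index. A conceptual sketch of that pattern in plain Java (this is NOT Solr's actual PostFilter/DelegatingCollector API; class names and data here are made up):

```java
import java.util.*;
import java.util.function.IntPredicate;

// Toy illustration of the postfilter pattern: the expensive per-document
// check (here, an ACL visibility test) runs only on documents the cheaper
// main query already matched.
class PostFilterSketch {
    static List<Integer> postFilter(List<Integer> mainQueryMatches, IntPredicate expensiveCheck) {
        List<Integer> out = new ArrayList<>();
        for (int doc : mainQueryMatches) {
            if (expensiveCheck.test(doc)) out.add(doc);  // runs once per match
        }
        return out;
    }

    public static void main(String[] args) {
        // Doc ids visible to the current user's groups (hypothetical data).
        Set<Integer> visible = Set.of(2, 5, 7);
        List<Integer> matches = List.of(1, 2, 3, 5, 8);
        System.out.println(postFilter(matches, visible::contains));  // prints [2, 5]
    }
}
```

This is also why the cost scales with the number of query matches rather than with the number of "from"-side ACL entries, which is the axis the perf test above varies.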
[jira] [Created] (LUCENE-9126) Javadoc linting options silently swallow documentation errors
Dawid Weiss created LUCENE-9126: --- Summary: Javadoc linting options silently swallow documentation errors Key: LUCENE-9126 URL: https://issues.apache.org/jira/browse/LUCENE-9126 Project: Lucene - Core Issue Type: Bug Reporter: Dawid Weiss I tried to compile javadocs in gradle and I couldn't do it... The output was full of errors. I eventually narrowed the problem down to lint options – how they are interpreted and parsed just doesn't make any sense to me. Try this: {code} # Examples below use plain javadoc from Java 11. cd lucene/core {code} This emulates what we have in Ant (this is roughly the options Ant emits): {code} javadoc -d build\output -encoding "UTF-8" -sourcepath src\java -subpackages org -quiet -Xdoclint:all -Xdoclint:-missing -Xdoclint:-accessibility => no errors. {code} Now rerun it with this syntax: {code} javadoc -d build\output -encoding "UTF-8" -sourcepath src\java -subpackages org -quiet -Xdoclint:all,-missing,-accessibility => 100 errors, 5 warnings {code} This time javadoc displays errors about undefined tags (unknown tag: lucene.experimental), HTML warnings (warning: empty tag), etc. 
Let's add our custom tags and an overview file: {code} javadoc -overview "src/java/overview.html" -tag "lucene.experimental:a:xxx" -tag "lucene.internal:a:xxx" -tag "lucene.spi:t:xxx" -d build\output -encoding "UTF-8" -sourcepath src\java -subpackages org -quiet -Xdoclint:all,-missing,-accessibility => 100 errors, 5 warnings => still HTML warnings {code} Let's get rid of html linting: {code} javadoc -overview "src/java/overview.html" -tag "lucene.experimental:a:xxx" -tag "lucene.internal:a:xxx" -tag "lucene.spi:t:xxx" -d build\output -encoding "UTF-8" -sourcepath src\java -subpackages org -quiet -Xdoclint:all,-missing,-accessibility,-html => 3 errors => malformed HTML syntax in overview.html: src\java\overview.html:150: error: bad use of '>' (>) {code} Finally, let's get rid of syntax linting: {code} javadoc -overview "src/java/overview.html" -tag "lucene.experimental:a:xxx" -tag "lucene.internal:a:xxx" -tag "lucene.spi:t:xxx" -d build\output -encoding "UTF-8" -sourcepath src\java -subpackages org -quiet -Xdoclint:all,-missing,-accessibility,-html,-syntax => passes {code} There are definitely bugs in our documentation -- look at the extra ">" in the overview file, for example: https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/overview.html#L150 What I can't understand is why the first syntax suppresses pretty much ALL the errors, including missing custom tag definitions. This should work, given what's written in [1]? [1] https://docs.oracle.com/en/java/javase/11/tools/javadoc.html
[jira] [Commented] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012946#comment-17012946 ] David Smiley commented on LUCENE-3069: -- FYI [~billy] the FSTOrd postings format is slated for removal in LUCENE-9116. It's a matter of maintenance, so if you or someone wishes to keep it around, some work is needed. Fortunately FSTPostingsFormat is staying. > Lucene should have an entirely memory resident term dictionary > -- > > Key: LUCENE-3069 > URL: https://issues.apache.org/jira/browse/LUCENE-3069 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index, core/search >Affects Versions: 4.0-ALPHA >Reporter: Simon Willnauer >Assignee: Han Jiang >Priority: Major > Labels: gsoc2013 > Fix For: 4.7 > > Attachments: LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, > LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, > LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, > LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, df-ttf-estimate.txt, > example.png > > > FST based TermDictionary has been a great improvement yet it still uses a > delta codec file for scanning to terms. Some environments have enough memory > available to keep the entire FST based term dict in memory. We should add a > TermDictionary implementation that encodes all needed information for each > term into the FST (custom fst.Output) and builds a FST from the entire term > not just the delta.
[jira] [Commented] (LUCENE-9116) Simplify postings API by removing long[] metadata
[ https://issues.apache.org/jira/browse/LUCENE-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012950#comment-17012950 ] David Smiley commented on LUCENE-9116: -- Indeed; the SolrTextTagger doesn't fundamentally require any particular postingsFormat but it beats so hard on the term dictionary that "FST50" performs really well. I approved your PR; changes look good. I love the simplification! I commented on LUCENE-3069, which introduced this format, to alert Watchers there about this matter. Let's be fully transparent about decisions to remove things. Let's not commit this for a week. > Simplify postings API by removing long[] metadata > - > > Key: LUCENE-9116 > URL: https://issues.apache.org/jira/browse/LUCENE-9116 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Priority: Minor > Time Spent: 40m > Remaining Estimate: 0h > > The postings API allows storing metadata about a term either in a long[] or > in a byte[]. This is unnecessary as all information could be encoded in the > byte[], which is what most codecs do in practice.
[jira] [Commented] (SOLR-13756) ivy cannot download org.restlet.ext.servlet jar
[ https://issues.apache.org/jira/browse/SOLR-13756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012967#comment-17012967 ] Joel Bernstein commented on SOLR-13756: --- I think that makes sense until the restlet artifacts make it to Maven Central, which they may never do (it's been 12 years and they haven't made it yet). [~uschindler], any thoughts on this? > ivy cannot download org.restlet.ext.servlet jar > --- > > Key: SOLR-13756 > URL: https://issues.apache.org/jira/browse/SOLR-13756 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Chongchen Chen >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > I check out the project and run `ant idea`; it tries to download jars. But > https://repo1.maven.org/maven2/org/restlet/jee/org.restlet.ext.servlet/2.3.0/org.restlet.ext.servlet-2.3.0.jar > will return 404 now. > [ivy:retrieve] public: tried > [ivy:retrieve] > https://repo1.maven.org/maven2/org/restlet/jee/org.restlet.ext.servlet/2.3.0/org.restlet.ext.servlet-2.3.0.jar > [ivy:retrieve]:: > [ivy:retrieve]:: FAILED DOWNLOADS:: > [ivy:retrieve]:: ^ see resolution messages for details ^ :: > [ivy:retrieve]:: > [ivy:retrieve]:: > org.restlet.jee#org.restlet;2.3.0!org.restlet.jar > [ivy:retrieve]:: > org.restlet.jee#org.restlet.ext.servlet;2.3.0!org.restlet.ext.servlet.jar > [ivy:retrieve]::
[GitHub] [lucene-solr] uschindler commented on issue #1144: SOLR-13756 updated restlet mvn repository url.
uschindler commented on issue #1144: SOLR-13756 updated restlet mvn repository url. URL: https://github.com/apache/lucene-solr/pull/1144#issuecomment-573084882 This is not everything. You also need to change the Pom.xml.template files. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] asfgit closed pull request #1146: SOLR-6613: TextField.analyzeMultiTerm does not throw an exception…
asfgit closed pull request #1146: SOLR-6613: TextField.analyzeMultiTerm does not throw an exception… URL: https://github.com/apache/lucene-solr/pull/1146
[jira] [Commented] (SOLR-6613) TextField.analyzeMultiTerm should not throw exception when analyzer returns no term
[ https://issues.apache.org/jira/browse/SOLR-6613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012981#comment-17012981 ] ASF subversion and git services commented on SOLR-6613: --- Commit 0b072ecedb93202a132612e72cd880fdcc51ea25 in lucene-solr's branch refs/heads/master from Bruno Roustant [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=0b072ec ] SOLR-6613: TextField.analyzeMultiTerm does not throw an exception when Analyzer returns no terms. (Bruno Roustant) Closes #1146 > TextField.analyzeMultiTerm should not throw exception when analyzer returns > no term > --- > > Key: SOLR-6613 > URL: https://issues.apache.org/jira/browse/SOLR-6613 > Project: Solr > Issue Type: Bug > Components: Schema and Analysis >Affects Versions: 4.3.1, 4.10.2, 6.0 >Reporter: Bruno Roustant >Assignee: Bruno Roustant >Priority: Major > Attachments: TestTextField.java > > Time Spent: 50m > Remaining Estimate: 0h > > In TextField.analyzeMultiTerm() > at line > try { > if (!source.incrementToken()) > throw new SolrException(); > The method should not throw an exception if there is no token, because having > no token is legitimate: all tokens may be filtered out (e.g. with a > blocking Filter such as StopFilter). > In this case it should simply return null (as it already returns null in some > cases, see the first line of the method). However, SolrQueryParserBase also needs to > be fixed to correctly handle null returned by TextField.analyzeMultiTerm(). > See the attached TestTextField for the corresponding new test class.
[GitHub] [lucene-solr] dweiss commented on a change in pull request #1160: LUCENE-9125: Improve Automaton.step() with binary search
dweiss commented on a change in pull request #1160: LUCENE-9125: Improve Automaton.step() with binary search URL: https://github.com/apache/lucene-solr/pull/1160#discussion_r365306207 ## File path: lucene/core/src/java/org/apache/lucene/util/automaton/Automaton.java ## @@ -658,22 +658,84 @@ public String toDot() { public int step(int state, int label) { assert state >= 0; assert label >= 0; -int trans = states[2*state]; -int limit = trans + 3*states[2*state+1]; -// TODO: we could do bin search; transitions are sorted -while (trans < limit) { - int dest = transitions[trans]; - int min = transitions[trans+1]; - int max = transitions[trans+2]; - if (min <= label && label <= max) { -return dest; +int stateIndex = 2 * state; +int firstTransitionIndex = states[stateIndex]; +int numTransitions = states[stateIndex + 1]; + +// Since transitions are sorted, +// binary search the transition for which label is within [minLabel, maxLabel]. +int low = 0; +int high = numTransitions - 1; +while (low <= high) { + int mid = (low + high) >>> 1; + int transitionIndex = firstTransitionIndex + 3 * mid; + int minLabel = transitions[transitionIndex + 1]; + if (minLabel > label) { +high = mid - 1; + } else { +int maxLabel = transitions[transitionIndex + 2]; +if (maxLabel < label){ + low = mid + 1; +} else { + return transitions[transitionIndex]; +} } - trans += 3; } - return -1; } + /** + * Looks for the next transition that matches the provided label, assuming determinism. + * + * This method is similar to {@link #step(int, int)} but is used more efficiently + * when iterating over multiple transitions from the same source state. It keeps + * the latest reached transition index in {@code transition.transitionUpto} so + * the next call to this method can continue from there instead of restarting + * from the first transition. + * + * @param transition The transition to start the lookup from (inclusive, using its + * {@link Transition#source} and {@link Transition#transitionUpto}). 
+ * It is updated with the matched transition; + or with {@link Transition#dest} = -1 if no match. + * @param label The codepoint to look up. + * @return The destination state; or -1 if no matching outgoing transition. + */ + public int next(Transition transition, int label) { +// Copy of step() method with +// - binary search 'low' bound initialized to transition.transitionUpto. +// - param transition .dest/.min/.max/.transitionUpto set to the matching transition. +assert transition.source >= 0; +assert label >= 0; +int stateIndex = 2 * transition.source; +int firstTransitionIndex = states[stateIndex]; +int numTransitions = states[stateIndex + 1]; + +// Since transitions are sorted, Review comment: can we extract the binary search subroutine into a separate method? It'd very likely inline anyway, and it'd result in less code.
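The extraction dweiss asks for could look roughly like the standalone sketch below. It reuses the patch's flat layout (transitions stored as [dest, minLabel, maxLabel] triplets sorted by minLabel, with `first` the offset of a state's first triplet); the helper's name and exact signature are made up for illustration.

```java
// Standalone sketch of the shared binary-search helper suggested in the review.
// Transitions are [dest, minLabel, maxLabel] triplets in a flat int[], sorted
// by minLabel. The 'low' parameter lets next() resume the search from
// transition.transitionUpto, while step() would simply pass 0.
class TransitionBinarySearch {
    /** Returns the offset of the triplet whose [min, max] contains label, or -1. */
    static int findTransition(int[] transitions, int first, int numTransitions,
                              int low, int label) {
        int high = numTransitions - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1;
            int t = first + 3 * mid;
            if (transitions[t + 1] > label) {          // minLabel > label
                high = mid - 1;
            } else if (transitions[t + 2] < label) {   // maxLabel < label
                low = mid + 1;
            } else {
                return t;                              // label within [min, max]
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // One state with three transitions: labels 0-9 -> 5, 10-19 -> 6, 20-29 -> 7.
        int[] transitions = {5, 0, 9, 6, 10, 19, 7, 20, 29};
        int t = findTransition(transitions, 0, 3, 0, 15);
        System.out.println(t >= 0 ? transitions[t] : -1);  // prints 6
    }
}
```

With this in place, step() reduces to a single call with low = 0 returning `transitions[t]` (or -1), and next() to a call with low = transition.transitionUpto followed by filling in the Transition fields from the returned offset.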
[jira] [Commented] (LUCENE-9116) Simplify postings API by removing long[] metadata
[ https://issues.apache.org/jira/browse/LUCENE-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013007#comment-17013007 ] Adrien Grand commented on LUCENE-9116: -- David, I'm worried that we might be setting a precedent here. Today we are very permissive when it comes to adding a new postings format to Lucene, like UniformSplit recently. I like it this way, it helps drive innovation, and hopefully some of the ideas of these experimental formats eventually get merged into the default codec like the pulsing optimization. I don't think we should make experimental formats harder to remove than to add, otherwise they get in the way of improving the default codec, which is wrong. I called out that I was removing these formats in a comment; I can send a notice to the dev list next time to give it more visibility. On a separate note, I think the Solr docs should be explicit that non-default codecs/formats are not supported backward-compatibility-wise. Users might otherwise be surprised to get corruption errors when upgrading to a new minor. > Simplify postings API by removing long[] metadata > - > > Key: LUCENE-9116 > URL: https://issues.apache.org/jira/browse/LUCENE-9116 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Priority: Minor > Time Spent: 40m > Remaining Estimate: 0h
[jira] [Commented] (SOLR-13756) ivy cannot download org.restlet.ext.servlet jar
[ https://issues.apache.org/jira/browse/SOLR-13756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013009#comment-17013009 ] Uwe Schindler commented on SOLR-13756: -- We do not need to change anything here, except changing the repository URLs of Cloudera. But very important: the resolve order should have Cloudera last if it also has stuff from other repos; otherwise it's not good security-wise, as I trust Maven Central more than Cloudera. But I would really like to keep the Cloudera local repo, as it contains only its own stuff. I think when restlet fixes its redirects we are fine with their server as well. We can't fix older releases. > ivy cannot download org.restlet.ext.servlet jar > --- > > Key: SOLR-13756 > URL: https://issues.apache.org/jira/browse/SOLR-13756 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Chongchen Chen >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h
[GitHub] [lucene-solr] uschindler commented on issue #1144: SOLR-13756 updated restlet mvn repository url.
uschindler commented on issue #1144: SOLR-13756 updated restlet mvn repository url. URL: https://github.com/apache/lucene-solr/pull/1144#issuecomment-573098779 Please don't change the symbolic names; only update the URLs. I am not fully happy with adding a repository that contains stuff mirrored from Maven Central. So the newly added Cloudera one should not contain anything except Cloudera stuff and restlet. Please don't add a mirror of significant parts of Maven Central!
[jira] [Commented] (SOLR-13756) ivy cannot download org.restlet.ext.servlet jar
[ https://issues.apache.org/jira/browse/SOLR-13756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013025#comment-17013025 ] Joel Bernstein commented on SOLR-13756: --- Restlet has fixed the redirects and I was able to do a maven build of [https://github.com/lucidworks/zeppelin-solr], which has dependencies on Solr. I believe the older versions of Solr can now be built as well. The restlet dependencies may become a problem again in the future. So having Cloudera hosting them as well is a good insurance policy. And agreed Cloudera should resolve last. > ivy cannot download org.restlet.ext.servlet jar > --- > > Key: SOLR-13756 > URL: https://issues.apache.org/jira/browse/SOLR-13756 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Chongchen Chen >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h
[GitHub] [lucene-solr] gerlowskija commented on a change in pull request #1151: SOLR-13890: Add "top-level" DVTQ implementation
gerlowskija commented on a change in pull request #1151: SOLR-13890: Add "top-level" DVTQ implementation URL: https://github.com/apache/lucene-solr/pull/1151#discussion_r365315479 ## File path: solr/core/src/java/org/apache/solr/search/TermsQParserPlugin.java ## @@ -88,7 +91,20 @@ Query makeFilter(String fname, BytesRef[] byteRefs) { docValuesTermsFilter {//on 4x this is FieldCacheTermsFilter but we use the 5x name any way @Override Query makeFilter(String fname, BytesRef[] byteRefs) { -return new DocValuesTermsQuery(fname, byteRefs);//constant scores +// TODO Further tune this heuristic number +return (byteRefs.length > 700) ? new TopLevelDocValuesTermsQuery(fname, byteRefs) : new DocValuesTermsQuery(fname, byteRefs); + } +}, +docValuesTermsFilterTopLevel { + @Override + Query makeFilter(String fname, BytesRef[] byteRefs) { +return new TopLevelDocValuesTermsQuery(fname, byteRefs); Review comment: I'd forgotten about this, done. FWIW I buy Joel's argument that most real-world use-cases are going to require `cache=false`. But I worry that users who want caching will be bitten by this. I guess it depends how hands-on you expect users to be with their query tuning. For hands-off users, this change is best. But for hands-on users, we've just created a "gotcha". I'll make sure to add this to the ref-guide docs at the least.
[GitHub] [lucene-solr] zsgyulavari commented on issue #1144: SOLR-13756 updated restlet mvn repository url.
zsgyulavari commented on issue #1144: SOLR-13756 updated restlet mvn repository url. URL: https://github.com/apache/lucene-solr/pull/1144#issuecomment-573104015 Thanks for the quick review. I will revert the symbolic name of the restlet repository, but for the Cloudera one it's not just a URL update but a whole different mirror, so it might be justified if it doesn't break something I'm unaware of. I'll have to check back next week about whether we can set up a new mirror just for the restlets and nothing else. The original Cloudera releases repo is no longer needed, IMHO. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-9127) index migration from 7 to 8 failing
Niranjan created LUCENE-9127: Summary: index migration from 7 to 8 failing Key: LUCENE-9127 URL: https://issues.apache.org/jira/browse/LUCENE-9127 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 8.4 Reporter: Niranjan We have been using Solr 4 for more than 16 years, and now it is time to upgrade. When we decided to upgrade, we started by migrating the index from 4 -> 5, 5 -> 6, and 6 -> 7, which worked as expected, but the 7 -> 8 step gives me errors. Exception in thread "main" org.apache.lucene.index.IndexFormatTooOldException: Format version is not supported (resource BufferedChecksumIndexInput(MMapIndexInput(path="/deploy/solrmaster/data/data//index/segments_4n"))): This index was initially created with Lucene 6.x while the current version is 8.4.0 and Lucene only supports reading the current and previous major versions.. This version of Lucene only supports indexes created with release 7.0 and later. The command I'm trying: {noformat} java -cp ./lucene-core-8.4.0.jar:./lucene-backward-codecs-8.4.0.jar org.apache.lucene.index.IndexUpgrader /deploy/solrmaster/data/data/job/index -verbose {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9127) index migration from 7 to 8 failing
[ https://issues.apache.org/jira/browse/LUCENE-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niranjan updated LUCENE-9127: - Description: we have been using solr4 for more than 16 year, now it is to time to upgrade, when we have decided to upgrade, started with migrating index from 4 -> 5, 5 -> 6, 6 -> 7, it was working as expected, but when it comes to 7 -> 8 , it give me errors. {noformat} Exception in thread "main" org.apache.lucene.index.IndexFormatTooOldException: Format version is not supported (resource BufferedChecksumIndexInput(MMapIndexInput(path="/deploy/solrmaster/data/data//index/segments_4n"))): This index was initially created with Lucene 6.x while the current version is 8.4.0 and Lucene only supports reading the current and previous major versions.. This version of Lucene only supports indexes created with release 7.0 and later.{noformat} {noformat} java -cp ./lucene-core-8.4.0.jar:./lucene-backward-codecs-8.4.0.jar org.apache.lucene.index.IndexUpgrader /deploy/solrmaster/data/data/job/index -verbose {noformat} was: we have been using solr4 for more than 16 year, now it is to time to upgrade, when we have decided to upgrade, started with migrating index from 4 -> 5, 5 -> 6, 6 -> 7, it was working as expected, but when it comes to 7 -> 8 , it give me errors. Exception in thread "main" org.apache.lucene.index.IndexFormatTooOldException: Format version is not supported (resource BufferedChecksumIndexInput(MMapIndexInput(path="/deploy/solrmaster/data/data//index/segments_4n"))): This index was initially created with Lucene 6.x while the current version is 8.4.0 and Lucene only supports reading the current and previous major versions.. This version of Lucene only supports indexes created with release 7.0 and later. 
command I'm trying {noformat} java -cp ./lucene-core-8.4.0.jar:./lucene-backward-codecs-8.4.0.jar org.apache.lucene.index.IndexUpgrader /deploy/solrmaster/data/data/job/index -verbose {noformat} > index migration from 7 to 8 failing > --- > > Key: LUCENE-9127 > URL: https://issues.apache.org/jira/browse/LUCENE-9127 > Project: Lucene - Core > Issue Type: Bug > Components: core/index >Affects Versions: 8.4 >Reporter: Niranjan >Priority: Major > > we have been using solr4 for more than 16 year, now it is to time to upgrade, > when we have decided to upgrade, started with migrating index from 4 -> 5, 5 > -> 6, 6 -> 7, it was working as expected, but when it comes to 7 -> 8 , it > give me errors. > {noformat} > Exception in thread "main" > org.apache.lucene.index.IndexFormatTooOldException: Format version is not > supported (resource > BufferedChecksumIndexInput(MMapIndexInput(path="/deploy/solrmaster/data/data//index/segments_4n"))): > This index was initially created with Lucene 6.x while the current version > is 8.4.0 and Lucene only supports reading the current and previous major > versions.. This version of Lucene only supports indexes created with release > 7.0 and later.{noformat} > {noformat} > java -cp ./lucene-core-8.4.0.jar:./lucene-backward-codecs-8.4.0.jar > org.apache.lucene.index.IndexUpgrader /deploy/solrmaster/data/data/job/index > -verbose {noformat} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] madrob commented on a change in pull request #1157: Add RAT check using Gradle
madrob commented on a change in pull request #1157: Add RAT check using Gradle URL: https://github.com/apache/lucene-solr/pull/1157#discussion_r365319087 ## File path: gradle/validation/rat-sources.gradle ## @@ -0,0 +1,167 @@ +import org.gradle.api.internal.project.IsolatedAntBuilder + +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +// This applies the Apache RAT plugin to our source and test files + +// Largely copied from Apache Kafka +apply plugin: RatPlugin +// This invocation needs to go out to each project instead of being here +rat { +} + +class RatTask extends DefaultTask { +@Input +List includes = [] + +@Input +List excludes = [] + +def reportDir = project.file('build/rat') +def xmlReport = new File(reportDir, 'rat-report.xml') + +def generateXmlReport(File reportDir) { +// Probably better to use the IsolatedAntBuilder if we can, but it seems to have issues with substringMatcher +// def antBuilder = services.get(IsolatedAntBuilder) + +def ratClasspath = project.configurations.rat +def projectPath = project.getRootDir().getAbsolutePath() +ant.taskdef(resource: 'org/apache/rat/anttasks/antlib.xml', classpath: ratClasspath.asPath) +ant.report(format: 'xml', reportFile: xmlReport, addDefaultLicenseMatchers: true) { +fileset(dir: projectPath) { +patternset { +includes.each { +include(name: it) +} +excludes.each { +exclude(name: it) +} +} +} + +// The license rules below were manually copied from lucene/common-build.xml, there is currently no mechanism to sync them + +// BSD 4-clause stuff (is disallowed below) +substringMatcher(licenseFamilyCategory: "BSD4 ", licenseFamilyName: "Original BSD License (with advertising clause)") { +pattern(substring: "All advertising materials") +} + +// BSD-like stuff +substringMatcher(licenseFamilyCategory: "BSD ", licenseFamilyName: "Modified BSD License") { +// brics automaton +pattern(substring: "Copyright (c) 2001-2009 Anders Moeller") +// snowball +pattern(substring: "Copyright (c) 2001, Dr Martin Porter") +// UMASS kstem +pattern(substring: "THIS SOFTWARE IS PROVIDED BY UNIVERSITY OF MASSACHUSETTS AND OTHER CONTRIBUTORS") +// Egothor +pattern(substring: "Egothor Software License version 1.00") +// JaSpell +pattern(substring: "Copyright (c) 2005 Bruno Martins") +// d3.js +pattern(substring: "THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS 
AND CONTRIBUTORS") +// highlight.js +pattern(substring: "THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS") +} + +// MIT-like +substringMatcher(licenseFamilyCategory: "MIT ", licenseFamilyName:"Modified BSD License") { +// ICU license +pattern(substring: "Permission is hereby granted, free of charge, to any person obtaining a copy") +} + +// Apache +substringMatcher(licenseFamilyCategory: "AL ", licenseFamilyName: "Apache") { +pattern(substring: "Licensed to the Apache Software Foundation (ASF) under") +// this is the old - school one under some files +pattern(substring: "Licensed under the Apache License, Version 2.0 (the "License")") +} + +substringMatcher(licenseFamilyCategory: "GEN ", licenseFamilyName: "Generated") { +// +pattern(substring: "Produced by GNUPLOT") +// +pattern(substring: "This file was generated automatically by the Snowball to Java compiler") +// +pattern(substring: "ANTLR GENERATED CODE") +} + +approvedLicense(familyName: "Apache"
[jira] [Commented] (LUCENE-9004) Approximate nearest vector search
[ https://issues.apache.org/jira/browse/LUCENE-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013041#comment-17013041 ] Michael Sokolov commented on LUCENE-9004: - [~tomoko] you are too modest! The [LUCENE-9004-aknn-2|https://github.com/mocobeta/lucene-solr-mirror/commits/jira/LUCENE-9004-aknn-2] is in pretty good shape! It is functionally correct now, implementing the hierarchical version of the paper cited in the LUCENE-9004 issue. Also, I believe with the patch I posted there, we now have merging working, and I think search across multiple segments falls out naturally since you implemented a Query that can be collected in the usual way across segments. I also did some comparisons with the C/C++ version in [https://github.com/nmslib/hnswlib], the reference implementation, and got similar overlap results with vanilla hyper-parameter choices, so I am pretty confident you have faithfully reproduced that algorithm. Now it has to be said that performance is not what we would want - it's quite a bit slower than the C/C++ version. I haven't had a chance to dig into the cause yet, but I suspect we could be helped by ensuring that vector computations are done using vectorized instructions. We might also be able to reduce object instantiation here and there. I think it's time to post back to a branch in the Apache git repository so we can enlist contributions from the community here to help this go forward. I'll try to get that done this weekend. > Approximate nearest vector search > - > > Key: LUCENE-9004 > URL: https://issues.apache.org/jira/browse/LUCENE-9004 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Michael Sokolov >Priority: Major > Attachments: hnsw_layered_graph.png > > > "Semantic" search based on machine-learned vector "embeddings" representing > terms, queries and documents is becoming a must-have feature for a modern > search engine. 
SOLR-12890 is exploring various approaches to this, including > providing vector-based scoring functions. This is a spinoff issue from that. > The idea here is to explore approximate nearest-neighbor search. Researchers > have found an approach based on navigating a graph that partially encodes the > nearest neighbor relation at multiple scales can provide accuracy > 95% (as > compared to exact nearest neighbor calculations) at a reasonable cost. This > issue will explore implementing HNSW (hierarchical navigable small-world) > graphs for the purpose of approximate nearest vector search (often referred > to as KNN or k-nearest-neighbor search). > At a high level the way this algorithm works is this. First assume you have a > graph that has a partial encoding of the nearest neighbor relation, with some > short and some long-distance links. If this graph is built in the right way > (has the hierarchical navigable small world property), then you can > efficiently traverse it to find nearest neighbors (approximately) in log N > time where N is the number of nodes in the graph. I believe this idea was > pioneered in [1]. The great insight in that paper is that if you use the > graph search algorithm to find the K nearest neighbors of a new document > while indexing, and then link those neighbors (undirectedly, ie both ways) to > the new document, then the graph that emerges will have the desired > properties. > The implementation I propose for Lucene is as follows. We need two new data > structures to encode the vectors and the graph. We can encode vectors using a > light wrapper around {{BinaryDocValues}} (we also want to encode the vector > dimension and have efficient conversion from bytes to floats). For the graph > we can use {{SortedNumericDocValues}} where the values we encode are the > docids of the related documents. 
Encoding the interdocument relations using > docids directly will make it relatively fast to traverse the graph since we > won't need to lookup through an id-field indirection. This choice limits us > to building a graph-per-segment since it would be impractical to maintain a > global graph for the whole index in the face of segment merges. However > graph-per-segment is a very natural at search time - we can traverse each > segments' graph independently and merge results as we do today for term-based > search. > At index time, however, merging graphs is somewhat challenging. While > indexing we build a graph incrementally, performing searches to construct > links among neighbors. When merging segments we must construct a new graph > containing elements of all the merged segments. Ideally we would somehow > preserve the work done when building the initial graphs, but at least as a > start I'd propose we construct a new graph from scratch
[GitHub] [lucene-solr] gerlowskija commented on issue #1151: SOLR-13890: Add "top-level" DVTQ implementation
gerlowskija commented on issue #1151: SOLR-13890: Add "top-level" DVTQ implementation URL: https://github.com/apache/lucene-solr/pull/1151#issuecomment-573107306 Ready for review again when you get a chance. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9100) JapaneseTokenizer produces inconsistent tokens
[ https://issues.apache.org/jira/browse/LUCENE-9100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013045#comment-17013045 ] Michael McCandless commented on LUCENE-9100: {quote}Maybe a solution here is to use the tokenizer with `discardPunctuation==false`, then stripping the punctuation tokens in a filter. {quote} +1, that sounds like a possible workaround. But it's still spooky that tokens can be formed across (deleted) punctuation ... > JapaneseTokenizer produces inconsistent tokens > -- > > Key: LUCENE-9100 > URL: https://issues.apache.org/jira/browse/LUCENE-9100 > Project: Lucene - Core > Issue Type: Bug > Components: modules/analysis >Affects Versions: 7.2 >Reporter: Elbek Kamoliddinov >Priority: Major > > We use {{JapaneseTokenizer}} on prod and seeing some inconsistent behavior. > With this text: > {{"マギアリス【単版話】 4話 (Unlimited Comics)"}} I get different results if I insert > space before `【` char. Here is the small code snippet demonstrating the case > (note we use our own dictionary and connection costs): > {code:java} > Analyzer analyzer = new Analyzer() { > @Override > protected TokenStreamComponents createComponents(String > fieldName) { > //Tokenizer tokenizer = new > JapaneseTokenizer(newAttributeFactory(), null, true, > JapaneseTokenizer.Mode.SEARCH); > Tokenizer tokenizer = new > JapaneseTokenizer(newAttributeFactory(), dictionaries.systemDictionary, > dictionaries.unknownDictionary, dictionaries.connectionCosts, null, true, > JapaneseTokenizer.Mode.SEARCH); > return new TokenStreamComponents(tokenizer, new > LowerCaseFilter(tokenizer)); > } > }; > String text1 = "マギアリス【単版話】 4話 (Unlimited Comics)"; > String text2 = "マギアリス 【単版話】 4話 (Unlimited Comics)"; //inserted space > try (TokenStream tokens = analyzer.tokenStream("field", new > StringReader(text1))) { > CharTermAttribute chars = > tokens.addAttribute(CharTermAttribute.class); > tokens.reset(); > while (tokens.incrementToken()) { > 
System.out.println(chars.toString()); > } > tokens.end(); > } catch (IOException e) { > // should never happen with a StringReader > throw new RuntimeException(e); > } {code} > Output is: > {code:java} > //text1 > マギ > アリス > 単 > 版 > 話 > 4 > 話 > unlimited > comics > //text2 > マギア > リス > 単 > 版 > 話 > 4 > 話 > unlimited > comics{code} > It looks like the tokenizer doesn't view the punctuation ({{【}} is > {{Character.START_PUNCTUATION}} type) as an indicator that there should be a > token break, and somehow the 【 punctuation char causes a difference in the > output. > If I use the {{JapaneseTokenizer}} tokenizer then this problem doesn't > manifest because it doesn't tokenize {{マギアリス}} into multiple tokens and > outputs it as is. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
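The workaround suggested above — run the tokenizer with `discardPunctuation == false` so punctuation still acts as a token break, then strip the punctuation-only tokens afterwards — can be modeled without Lucene. In real code this would be a Lucene `TokenFilter`; here plain strings stand in for the token stream, and the helper names are illustrative:

```java
// Models the suggested workaround: keep punctuation during tokenization
// so it breaks tokens, then drop the punctuation-only tokens afterwards.
// Plain strings stand in for a Lucene TokenStream here.
import java.util.List;
import java.util.stream.Collectors;

public class PunctuationStrip {
    // True if every code point in the token is a punctuation character
    static boolean isPunctuationOnly(String token) {
        return !token.isEmpty()
                && token.codePoints().allMatch(cp -> {
                    int t = Character.getType(cp);
                    return t == Character.START_PUNCTUATION   // e.g. '【'
                        || t == Character.END_PUNCTUATION     // e.g. '】'
                        || t == Character.OTHER_PUNCTUATION;
                });
    }

    static List<String> strip(List<String> tokens) {
        return tokens.stream()
                .filter(t -> !isPunctuationOnly(t))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // With punctuation kept, '【' and '】' become their own tokens
        // instead of being silently deleted mid-stream.
        List<String> tokens = List.of("マギアリス", "【", "単", "版", "話", "】");
        System.out.println(strip(tokens)); // [マギアリス, 単, 版, 話]
    }
}
```

The point of the two-step approach is that the punctuation is still visible to the tokenizer when it decides where tokens break, so the spooky cross-punctuation tokens cannot form.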
[GitHub] [lucene-solr] madrob commented on issue #1157: Add RAT check using Gradle
madrob commented on issue #1157: Add RAT check using Gradle URL: https://github.com/apache/lucene-solr/pull/1157#issuecomment-573107997 Cleaned a bunch of stuff up, included your review feedback. Thanks, @dweiss! One piece that I'm still struggling with is that `./gradlew rat` will execute on the root but doesn't delegate to `:lucene:rat` and `:solr:rat` (and in fact those targets don't even exist), and then I'd like those to delegate to rat in all the sub-modules. I tried going through our other validation tasks to look for examples and quickly got lost - can you point me to the best practices here? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1151: SOLR-13890: Add "top-level" DVTQ implementation
dsmiley commented on a change in pull request #1151: SOLR-13890: Add "top-level" DVTQ implementation URL: https://github.com/apache/lucene-solr/pull/1151#discussion_r365336343 ## File path: solr/core/src/java/org/apache/solr/search/TermsQParserPlugin.java ## @@ -88,7 +91,20 @@ Query makeFilter(String fname, BytesRef[] byteRefs) { docValuesTermsFilter {//on 4x this is FieldCacheTermsFilter but we use the 5x name any way @Override Query makeFilter(String fname, BytesRef[] byteRefs) { -return new DocValuesTermsQuery(fname, byteRefs);//constant scores +// TODO Further tune this heuristic number +return (byteRefs.length > 700) ? new TopLevelDocValuesTermsQuery(fname, byteRefs) : new DocValuesTermsQuery(fname, byteRefs); + } +}, +docValuesTermsFilterTopLevel { + @Override + Query makeFilter(String fname, BytesRef[] byteRefs) { +return new TopLevelDocValuesTermsQuery(fname, byteRefs); Review comment: This is just about a default. So-called "Hands-on users" (experts) will be able to be explicit with cache=true. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] gerlowskija commented on a change in pull request #1151: SOLR-13890: Add "top-level" DVTQ implementation
gerlowskija commented on a change in pull request #1151: SOLR-13890: Add "top-level" DVTQ implementation URL: https://github.com/apache/lucene-solr/pull/1151#discussion_r365341958 ## File path: solr/core/src/java/org/apache/solr/search/TermsQParserPlugin.java ## @@ -88,7 +91,20 @@ Query makeFilter(String fname, BytesRef[] byteRefs) { docValuesTermsFilter {//on 4x this is FieldCacheTermsFilter but we use the 5x name any way @Override Query makeFilter(String fname, BytesRef[] byteRefs) { -return new DocValuesTermsQuery(fname, byteRefs);//constant scores +// TODO Further tune this heuristic number +return (byteRefs.length > 700) ? new TopLevelDocValuesTermsQuery(fname, byteRefs) : new DocValuesTermsQuery(fname, byteRefs); + } +}, +docValuesTermsFilterTopLevel { + @Override + Query makeFilter(String fname, BytesRef[] byteRefs) { +return new TopLevelDocValuesTermsQuery(fname, byteRefs); Review comment: Sure, setting`cache=true` is trivial. It's _knowing_ that you need to be explicit for this one query in particular that's the tricky part. But sure, I'm just hand-wringing; I've already made the change. It's documented in the ref-guide, so that's as good as we're going to get. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-9127) index migration from 7 to 8 failing
[ https://issues.apache.org/jira/browse/LUCENE-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson resolved LUCENE-9127. Resolution: Information Provided Upgrading more than one major version of Lucene is explicitly not supported. Starting with Lucene 6, a marker is written into segments indicating which version of Lucene it was created with. When segments are merged, the earliest marker is preserved, so even if you rewrite all your segments with 7x, the 6x marker is preserved. Lucene will refuse to open any index that has a marker lower than X-1, so if Lucene 8x sees any segment where the earliest marker is 6x or earlier (or is missing), it'll throw an exception. You have to re-index your data from the system of record into a fresh 8x collection. > index migration from 7 to 8 failing > --- > > Key: LUCENE-9127 > URL: https://issues.apache.org/jira/browse/LUCENE-9127 > Project: Lucene - Core > Issue Type: Bug > Components: core/index >Affects Versions: 8.4 >Reporter: Niranjan >Priority: Major > > we have been using solr4 for more than 16 year, now it is to time to upgrade, > when we have decided to upgrade, started with migrating index from 4 -> 5, 5 > -> 6, 6 -> 7, it was working as expected, but when it comes to 7 -> 8 , it > give me errors. > {noformat} > Exception in thread "main" > org.apache.lucene.index.IndexFormatTooOldException: Format version is not > supported (resource > BufferedChecksumIndexInput(MMapIndexInput(path="/deploy/solrmaster/data/data//index/segments_4n"))): > This index was initially created with Lucene 6.x while the current version > is 8.4.0 and Lucene only supports reading the current and previous major > versions.. 
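The marker rule described in the resolution can be captured in a small sketch, assuming simplified integer major versions; the names here are illustrative, not Lucene's actual API:

```java
// Toy model of the index-version marker rule: every segment records the
// major version it was *created* with, merges keep the minimum marker,
// and Lucene N refuses to open an index whose minimum marker is older
// than N - 1. Illustrative names only, not Lucene's real classes.
import java.util.List;

public class VersionMarker {
    static final int CURRENT_MAJOR = 8;

    // Merging preserves the earliest creation marker, so rewriting 6.x
    // segments with 7.x does not erase the 6.x marker.
    static int mergedMarker(List<Integer> createdMajors) {
        return createdMajors.stream().min(Integer::compare).orElseThrow();
    }

    static boolean canOpen(int minCreatedMajor) {
        return minCreatedMajor >= CURRENT_MAJOR - 1;
    }

    public static void main(String[] args) {
        int marker = mergedMarker(List.of(6, 7, 7)); // one old 6.x segment
        System.out.println(marker);                  // 6
        System.out.println(canOpen(marker));         // false: must re-index
    }
}
```

This is why IndexUpgrader cannot help across two majors: upgrading rewrites segment formats but, by design, does not rewrite the creation marker, so re-indexing from the system of record is the only supported path.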
This version of Lucene only supports indexes created with release > 7.0 and later.{noformat} > {noformat} > java -cp ./lucene-core-8.4.0.jar:./lucene-backward-codecs-8.4.0.jar > org.apache.lucene.index.IndexUpgrader /deploy/solrmaster/data/data/job/index > -verbose {noformat} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1151: SOLR-13890: Add "top-level" DVTQ implementation
dsmiley commented on a change in pull request #1151: SOLR-13890: Add "top-level" DVTQ implementation URL: https://github.com/apache/lucene-solr/pull/1151#discussion_r365344397 ## File path: solr/core/src/java/org/apache/solr/search/TermsQParserPlugin.java ## @@ -92,22 +92,28 @@ Query makeFilter(String fname, BytesRef[] byteRefs) { @Override Query makeFilter(String fname, BytesRef[] byteRefs) { // TODO Further tune this heuristic number -return (byteRefs.length > 700) ? new TopLevelDocValuesTermsQuery(fname, byteRefs) : new DocValuesTermsQuery(fname, byteRefs); +return disableCacheByDefault((byteRefs.length > 700) ? new TopLevelDocValuesTermsQuery(fname, byteRefs) : new DocValuesTermsQuery(fname, byteRefs)); Review comment: right here simply call docValuesTermsFilterTopLevel.makeFilter or docValuesTermsFilterPerSegment.makeFilter. Yes, enum methods may refer to other enums in the same enum :-) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
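The refactoring suggested above — one enum constant's `makeFilter` delegating to a sibling constant's override — is legal Java and keeps the heuristic in one place. A self-contained sketch with hypothetical constant names (not the real `TermsQParserPlugin` enum):

```java
// Sketch of an enum constant delegating to sibling constants' overrides
// of the same abstract method. Names are illustrative only.
public class EnumDelegation {
    enum Method {
        PER_SEGMENT {
            @Override String makeFilter(int termCount) { return "per-segment(" + termCount + ")"; }
        },
        TOP_LEVEL {
            @Override String makeFilter(int termCount) { return "top-level(" + termCount + ")"; }
        },
        // The auto-selecting constant simply forwards to one of its siblings.
        AUTO {
            @Override String makeFilter(int termCount) {
                return termCount > 700
                        ? TOP_LEVEL.makeFilter(termCount)
                        : PER_SEGMENT.makeFilter(termCount);
            }
        };

        abstract String makeFilter(int termCount);
    }

    public static void main(String[] args) {
        System.out.println(Method.AUTO.makeFilter(10));   // per-segment(10)
        System.out.println(Method.AUTO.makeFilter(1000)); // top-level(1000)
    }
}
```

Delegating this way means the explicit `docValuesTermsFilterTopLevel` and `docValuesTermsFilterPerSegment` methods stay the single source of truth, and the auto-selecting constant only encodes the threshold.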
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1151: SOLR-13890: Add "top-level" DVTQ implementation
dsmiley commented on a change in pull request #1151: SOLR-13890: Add "top-level" DVTQ implementation URL: https://github.com/apache/lucene-solr/pull/1151#discussion_r365345185 ## File path: solr/solr-ref-guide/src/other-parsers.adoc ## @@ -1037,11 +1037,11 @@ An optional parameter used to determine which of several query implementations s + `termsFilter` the default `method`. Uses a `BooleanQuery` or a `TermInSetQuery` depending on the number of terms. Scales well with index size, but only moderately with the number of query terms. + -`docValuesTermsFilter` can only be used on fields with docValues data. Chooses between the `docValuesTermsFilterTopLevel` and `docValuesTermsFilterPerSegment` methods using the number of query terms as a rough heuristic. Users should typically use this method instead of using `docValuesTermsFilterTopLevel` or `docValuesTermsFilterPerSegment` directly, unless they've done performance testing to validate one of the methods on queries of all sizes. Depending on the implementation picked, this method may rely on expensive data structures which are lazily populated after each commit. If you commit frequently and your use-case can tolerate a static warming query, consider adding one to `solrconfig.xml` so that this work is done as a part of the commit itself and not attached directly to user requests. +`docValuesTermsFilter` can only be used on fields with docValues data. The `cache` parameter is false by default. Chooses between the `docValuesTermsFilterTopLevel` and `docValuesTermsFilterPerSegment` methods using the number of query terms as a rough heuristic. Users should typically use this method instead of using `docValuesTermsFilterTopLevel` or `docValuesTermsFilterPerSegment` directly, unless they've done performance testing to validate one of the methods on queries of all sizes. Depending on the implementation picked, this method may rely on expensive data structures which are lazily populated after each commit. 
If you commit frequently and your use-case can tolerate a static warming query, consider adding one to `solrconfig.xml` so that this work is done as a part of the commit itself and not attached directly to user requests. Review comment: Just a side-comment about our ref docs: In the ref docs I write or update, I convert them to one sentence per line. This makes the diffs easy to read! The pain of no newlines is very apparent here. Change or not as you wish. CC @ctargett This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-11369) Zookeeper credentials are showed up on the Solr Admin GUI
[ https://issues.apache.org/jira/browse/SOLR-11369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013085#comment-17013085 ] Jason Gerlowski commented on SOLR-11369: This is fixed in 7.x and above. SOLR-12976 isn't about hiding specific properties, it's about combining a few settings to make them easier to use/understand. > Zookeeper credentials are showed up on the Solr Admin GUI > - > > Key: SOLR-11369 > URL: https://issues.apache.org/jira/browse/SOLR-11369 > Project: Solr > Issue Type: Bug > Components: Admin UI, security >Reporter: Ivan Pekhov >Priority: Major > > Hello Guys, > We've been noticing this problem with Solr version 5.4.1 and it's still the > case for the version 6.6.0. The problem is that we're using SolrCloud with > secured Zookeeper and our users are granted access to Solr Admin GUI, and, at > the same time, they are not supposed to have access to Zookeeper credentials, > i.e. usernames and passwords. However, we (and some of our users) have found > out that Zookeeper credentials are displayed on at least two sections of the > Solr Admin GUI, i.e. "Dashboard" and "Java Properties". > Having taken a look at the JavaScript code that runs behind the scenes for > those pages, we can see that the sensitive parameters ( -DzkDigestPassword, > -DzkDigestReadonlyPassword, -DzkDigestReadonlyUsername, -DzkDigestUsername ) > are fetched via AJAX from the following two URL paths: > /solr/admin/info/system > /solr/admin/info/properties > Could you please consider for the future Solr releases removing the Zookeeper > parameters mentioned above from the output of these URLs and from other URLs > that contain this information in their output, if there are any besides the > ones mentioned? 
We find that it is pretty challenging (and probably > impossible) to restrict users from accessing some particular paths with the > security.json mechanism, and we think that it would be beneficial for > overall Solr security to hide Zookeeper credentials. > Thank you so much for your consideration! > Best regards, > Ivan Pekhov -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
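The concern above — ZK credentials leaking through `/solr/admin/info/system` and `/solr/admin/info/properties` — is essentially a property-redaction problem. A minimal, self-contained sketch of the denylist approach might look like the following; the class and method names here are illustrative, not Solr's actual implementation (which, per the comment above, addressed this in 7.x and above):

```java
import java.util.Map;
import java.util.TreeMap;

public class PropRedactor {
    // Illustrative denylist of the sensitive system-property names from the
    // report above; Solr's real fix uses its own configurable mechanism.
    private static final String[] SENSITIVE = {
        "zkDigestPassword", "zkDigestReadonlyPassword",
        "zkDigestUsername", "zkDigestReadonlyUsername"
    };

    /** Returns a copy of props with sensitive values replaced by a placeholder. */
    public static Map<String, String> redact(Map<String, String> props) {
        Map<String, String> out = new TreeMap<>(props);
        for (String key : SENSITIVE) {
            if (out.containsKey(key)) {
                out.put(key, "--REDACTED--");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> props = new TreeMap<>();
        props.put("zkDigestPassword", "secret");
        props.put("java.version", "11");
        System.out.println(redact(props)); // password redacted, java.version untouched
    }
}
```

The point of redacting in the handler (rather than restricting the URL paths via security.json, which the reporter found impractical) is that the endpoints stay usable while the secrets never leave the server.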
[jira] [Commented] (SOLR-11746) numeric fields need better error handling for prefix/wildcard syntax -- consider uniform support for "foo:* == foo:[* TO *]"
[ https://issues.apache.org/jira/browse/SOLR-11746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013095#comment-17013095 ] Chris M. Hostetter commented on SOLR-11746: --- {quote}I believe that is the only backwards incompatibility that was introduced by the reverted patch. {quote} That was _my_ only backcompat concern: redefining how NaN behaved in range queries (which has since been reverted). I agree, fixing {{foo:*}} in point fields and docValues-only trie fields to behave consistently is a bug fix – I wasn't trying to suggest that fixing that bug was a backcompat problem. My point about that old comment where I referred to the "current" (circa 2017) behavior of point fields is that the only one that seems to be tested in the current code/patches is the simple {{foo:*}} case being fixed to do something useful -- we should also make sure we have tests that the other nonsensical input cases give informative (and consistent) errors. > numeric fields need better error handling for prefix/wildcard syntax -- > consider uniform support for "foo:* == foo:[* TO *]" > > > Key: SOLR-11746 > URL: https://issues.apache.org/jira/browse/SOLR-11746 > Project: Solr > Issue Type: Bug >Affects Versions: 7.0 >Reporter: Chris M. Hostetter >Assignee: Houston Putman >Priority: Major > Fix For: master (9.0), 8.5 > > Attachments: SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch, > SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch > > > On the solr-user mailing list, Torsten Krah pointed out that with Trie > numeric fields, query syntax such as {{foo_d:\*}} has been functionally > equivalent to {{foo_d:\[\* TO \*]}} and asked why this was not also supported > for Point based numeric fields. 
> The fact that this type of syntax works (for {{indexed="true"}} Trie fields) > appears to have been an (untested, undocumented) fluke of Trie fields given > that they use indexed terms for the (encoded) numeric terms and inherit the > default implementation of {{FieldType.getPrefixQuery}} which produces a > prefix query against the {{""}} (empty string) term. > (Note that this syntax has apparently _*never*_ worked for Trie fields with > {{indexed="false" docValues="true"}} ) > In general, we should assess the behavior when users attempt a prefix/wildcard > syntax query against numeric fields, as currently the behavior is largely > nonsensical: prefix/wildcard syntax frequently matches no docs w/o any sort > of error, and the aforementioned {{numeric_field:*}} behaves inconsistently > between points/trie fields and between indexed/docValued trie fields.
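The {{foo:* == foo:[* TO *]}} equivalence discussed above can be sketched as a simple query-string rewrite. This is only an illustration of the intended semantics — the real fix operates on Query objects in {{FieldType.getPrefixQuery}} and related methods, not on strings:

```java
public class WildcardNormalizer {

    /**
     * Illustrative rewrite of the bare-wildcard syntax "field:*" into the
     * explicit match-all range "field:[* TO *]" from SOLR-11746. Any other
     * input is passed through unchanged; per the comment above, a real
     * implementation should instead raise an informative, consistent error
     * for nonsensical prefix/wildcard queries on numeric fields.
     */
    public static String normalize(String query) {
        if (query.matches("[\\w.]+:\\*")) {
            String field = query.substring(0, query.lastIndexOf(':'));
            return field + ":[* TO *]";
        }
        return query;
    }

    public static void main(String[] args) {
        System.out.println(normalize("foo_d:*"));        // foo_d:[* TO *]
        System.out.println(normalize("foo_d:[1 TO 2]")); // unchanged
    }
}
```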
[GitHub] [lucene-solr] gerlowskija commented on a change in pull request #1151: SOLR-13890: Add "top-level" DVTQ implementation
gerlowskija commented on a change in pull request #1151: SOLR-13890: Add "top-level" DVTQ implementation URL: https://github.com/apache/lucene-solr/pull/1151#discussion_r36535 ## File path: solr/solr-ref-guide/src/other-parsers.adoc ## @@ -1037,11 +1037,11 @@ An optional parameter used to determine which of several query implementations s + `termsFilter` the default `method`. Uses a `BooleanQuery` or a `TermInSetQuery` depending on the number of terms. Scales well with index size, but only moderately with the number of query terms. + -`docValuesTermsFilter` can only be used on fields with docValues data. Chooses between the `docValuesTermsFilterTopLevel` and `docValuesTermsFilterPerSegment` methods using the number of query terms as a rough heuristic. Users should typically use this method instead of using `docValuesTermsFilterTopLevel` or `docValuesTermsFilterPerSegment` directly, unless they've done performance testing to validate one of the methods on queries of all sizes. Depending on the implementation picked, this method may rely on expensive data structures which are lazily populated after each commit. If you commit frequently and your use-case can tolerate a static warming query, consider adding one to `solrconfig.xml` so that this work is done as a part of the commit itself and not attached directly to user requests. +`docValuesTermsFilter` can only be used on fields with docValues data. The `cache` parameter is false by default. Chooses between the `docValuesTermsFilterTopLevel` and `docValuesTermsFilterPerSegment` methods using the number of query terms as a rough heuristic. Users should typically use this method instead of using `docValuesTermsFilterTopLevel` or `docValuesTermsFilterPerSegment` directly, unless they've done performance testing to validate one of the methods on queries of all sizes. Depending on the implementation picked, this method may rely on expensive data structures which are lazily populated after each commit. 
If you commit frequently and your use-case can tolerate a static warming query, consider adding one to `solrconfig.xml` so that this work is done as a part of the commit itself and not attached directly to user requests. Review comment: I thought about this too when I saw the diff. I remember it coming up as a discussion point when the ref-guide was initially being hashed out too. I won't change my docs here, as it's probably worth keeping the file as a whole consistent in the strategy it uses. But maybe it's worth revisiting at some point.
[GitHub] [lucene-solr] gerlowskija commented on a change in pull request #1151: SOLR-13890: Add "top-level" DVTQ implementation
gerlowskija commented on a change in pull request #1151: SOLR-13890: Add "top-level" DVTQ implementation URL: https://github.com/apache/lucene-solr/pull/1151#discussion_r365358715 ## File path: solr/core/src/java/org/apache/solr/search/TermsQParserPlugin.java ## @@ -92,22 +92,28 @@ Query makeFilter(String fname, BytesRef[] byteRefs) { @Override Query makeFilter(String fname, BytesRef[] byteRefs) { // TODO Further tune this heuristic number -return (byteRefs.length > 700) ? new TopLevelDocValuesTermsQuery(fname, byteRefs) : new DocValuesTermsQuery(fname, byteRefs); +return disableCacheByDefault((byteRefs.length > 700) ? new TopLevelDocValuesTermsQuery(fname, byteRefs) : new DocValuesTermsQuery(fname, byteRefs)); Review comment: Hah. I don't know whether to think that's awesome or revolting. Done.
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1151: SOLR-13890: Add "top-level" DVTQ implementation
dsmiley commented on a change in pull request #1151: SOLR-13890: Add "top-level" DVTQ implementation URL: https://github.com/apache/lucene-solr/pull/1151#discussion_r365364487 ## File path: solr/core/src/java/org/apache/solr/search/TermsQParserPlugin.java ## @@ -92,22 +92,28 @@ Query makeFilter(String fname, BytesRef[] byteRefs) { @Override Query makeFilter(String fname, BytesRef[] byteRefs) { // TODO Further tune this heuristic number -return (byteRefs.length > 700) ? new TopLevelDocValuesTermsQuery(fname, byteRefs) : new DocValuesTermsQuery(fname, byteRefs); +return disableCacheByDefault((byteRefs.length > 700) ? new TopLevelDocValuesTermsQuery(fname, byteRefs) : new DocValuesTermsQuery(fname, byteRefs)); Review comment: LOL there is something slightly unsettling about it, I admit. Though it's not confusing or anything. Anyway, I value brevity a lot and I'll take it!
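The pattern being reviewed in the diff above — wrapping the term-count heuristic in a {{disableCacheByDefault(...)}} helper — can be sketched with stand-in types. The real code uses Lucene's {{DocValuesTermsQuery}}/{{TopLevelDocValuesTermsQuery}} and Solr's extended-query cache flag; only the 700-term threshold and the wrap-and-return shape come straight from the diff, the rest is illustrative:

```java
public class TermsFilterHeuristic {

    /** Stand-in for a Solr query; the real code returns Lucene Query subclasses. */
    static class StubQuery {
        final String impl;
        boolean cache = true; // Solr caches filter queries by default
        StubQuery(String impl) { this.impl = impl; }
    }

    /** Mirrors the helper from the PR: mark the query uncacheable, pass it through. */
    static StubQuery disableCacheByDefault(StubQuery q) {
        q.cache = false;
        return q;
    }

    /** The term-count heuristic: many terms -> the "top-level" docValues impl. */
    static StubQuery makeFilter(int termCount) {
        return disableCacheByDefault(termCount > 700
            ? new StubQuery("docValuesTermsFilterTopLevel")
            : new StubQuery("docValuesTermsFilterPerSegment"));
    }

    public static void main(String[] args) {
        StubQuery q = makeFilter(1000);
        System.out.println(q.impl + " cache=" + q.cache);
    }
}
```

The appeal of the one-liner debated above is that every branch of the ternary flows through the same cache-disabling call, so no code path can forget it.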
[jira] [Commented] (LUCENE-9116) Simplify postings API by removing long[] metadata
[ https://issues.apache.org/jira/browse/LUCENE-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013121#comment-17013121 ] David Smiley commented on LUCENE-9116: -- +1 for a notice to the dev list and users too. My point is about notice so that others might potentially volunteer or convey to us that the format is more useful than we are aware of. Ultimately we should be able to remove what we don't want to maintain, however. I was very sincere when I volunteered to port FST50, so thank you for stepping up! The Solr ref guide {{solr/solr-ref-guide/src/field-type-definitions-and-properties.adoc:112}} already has a good notice on back-compat for these formats. The tagger-handler.adoc is probably the only spot that advises setting this, so perhaps it should also have a notice. > Simplify postings API by removing long[] metadata > - > > Key: LUCENE-9116 > URL: https://issues.apache.org/jira/browse/LUCENE-9116 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Priority: Minor > Time Spent: 40m > Remaining Estimate: 0h > > The postings API allows to store metadata about a term either in a long[] or > in a byte[]. This is unnecessary as all information could be encoded in the > byte[], which is what most codecs do in practice.
[jira] [Comment Edited] (LUCENE-9116) Simplify postings API by removing long[] metadata
[ https://issues.apache.org/jira/browse/LUCENE-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013121#comment-17013121 ] David Smiley edited comment on LUCENE-9116 at 1/10/20 6:35 PM: --- +1 for a notice to the dev list and users too. My point is about notice so that others might potentially volunteer or convey to us that the format is more useful than we are aware of. Ultimately we should be able to remove what we don't want to maintain, however. I was very sincere when I volunteered to port FST50, so thank you for stepping up! The Solr ref guide {{solr/solr-ref-guide/src/field-type-definitions-and-properties.adoc:112}} already has a good notice on back-compat for these formats. The tagger-handler.adoc is probably the only spot that advices setting this, so perhaps it should also have a notice. was (Author: dsmiley): +1 for a notice to the dev list and users too. My point is about notice so that others might potentially volunteer or convey to us that the format is more useful than we are aware of. Ultimately we should be able to remove what we want to maintain, however. I was very sincere when I volunteered to port FST50, so thank you for stepping up! The Solr ref guide {{solr/solr-ref-guide/src/field-type-definitions-and-properties.adoc:112}} already has a good notice on back-compat for these formats. The tagger-handler.adoc is probably the only spot that advices setting this, so perhaps it should also have a notice. > Simplify postings API by removing long[] metadata > - > > Key: LUCENE-9116 > URL: https://issues.apache.org/jira/browse/LUCENE-9116 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Priority: Minor > Time Spent: 40m > Remaining Estimate: 0h > > The postings API allows to store metadata about a term either in a long[] or > in a byte[]. This is unnecessary as all information could be encoded in the > byte[], which is what most codecs do in practice. 
[GitHub] [lucene-solr] yonik merged pull request #1131: SOLR-14134: Add lazy and time-based evictiction of shared core concurrency metada…
yonik merged pull request #1131: SOLR-14134: Add lazy and time-based evictiction of shared core concurrency metada… URL: https://github.com/apache/lucene-solr/pull/1131
[jira] [Commented] (SOLR-14134) Clear shared core's concurrency cache
[ https://issues.apache.org/jira/browse/SOLR-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013144#comment-17013144 ] ASF subversion and git services commented on SOLR-14134: Commit 66ec4228908dcabf60d3e6069967e576325829c6 in lucene-solr's branch refs/heads/jira/SOLR-13101 from Andy Vuong [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=66ec422 ] SOLR-14134: Add lazy and time-based evictiction of shared core concurrency metada… (#1131) * Add lazy and time-based evictiction of shared core concurrency metadata from in-memory cache * Switch back to simple map hash, evict on close, and evict on register * Evict from unload * Address comments > Clear shared core's concurrency cache > - > > Key: SOLR-14134 > URL: https://issues.apache.org/jira/browse/SOLR-14134 > Project: Solr > Issue Type: Sub-task > Components: SolrCloud >Reporter: Andy Vuong >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > In shared collections, each replica's core has an associated entry in a > metadata cache we call the shared core's concurrency cache (see > SharedCoreConcurrencyController) that is used to facilitate concurrent > indexing support of a single shard and associated optimizations. > Entries are currently created on demand - i.e. when request triggered > pulls/pushes are initiated but there's no way of clearing the cache unless > the node goes down and JVM restarts. > Eviction from this cache is needed to facilitate things such as collection > name reuse. Currently if you delete a collection and then recreate, you can > create a Replica containing the same core name as a previously active > collection/replica and have a pre-existing entry in the concurrency cache > (barring restarts between this point). The net effect is at least one > indexing batch failure before the cache returns to a correct state. 
> Eviction will also support scale - say 50k collections and thousands of > entries for cores located in memory is highly ineffective especially if a > large number of collections are accessed infrequently.
[jira] [Resolved] (SOLR-13932) Review directory locking and Blob interactions
[ https://issues.apache.org/jira/browse/SOLR-13932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley resolved SOLR-13932. - Resolution: Fixed > Review directory locking and Blob interactions > -- > > Key: SOLR-13932 > URL: https://issues.apache.org/jira/browse/SOLR-13932 > Project: Solr > Issue Type: Sub-task >Reporter: Ilan Ginzburg >Priority: Major > Time Spent: 3h 10m > Remaining Estimate: 0h > > Review resolution of local index directory content vs Blob copy. > There has been wrong understanding of following line acquiring a lock on > index directory. > {{solrCore.getDirectoryFactory().get(indexDirPath, > DirectoryFactory.DirContext.DEFAULT, > solrCore.getSolrConfig().indexConfig.lockType);}} > From Yonik: > _A couple things about Directory locking the locks were only ever to > prevent more than one IndexWriter from trying to modify the same index. The > IndexWriter grabs a write lock once when it is created and does not release > it until it is closed._ > _Directories are not locked on acquisition of the Directory from the > DirectoryFactory. See the IndexWriter constructor, where the lock is > explicitly grabbed._ > Review CorePushPull#pullUpdateFromBlob, ServerSideMetadata and other classes > as relevant. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (SOLR-14134) Clear shared core's concurrency cache
[ https://issues.apache.org/jira/browse/SOLR-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley resolved SOLR-14134. - Resolution: Fixed
[GitHub] [lucene-solr] ctargett commented on a change in pull request #1151: SOLR-13890: Add "top-level" DVTQ implementation
ctargett commented on a change in pull request #1151: SOLR-13890: Add "top-level" DVTQ implementation URL: https://github.com/apache/lucene-solr/pull/1151#discussion_r365406299 ## File path: solr/solr-ref-guide/src/other-parsers.adoc ## @@ -1037,11 +1037,11 @@ An optional parameter used to determine which of several query implementations s + `termsFilter` the default `method`. Uses a `BooleanQuery` or a `TermInSetQuery` depending on the number of terms. Scales well with index size, but only moderately with the number of query terms. + -`docValuesTermsFilter` can only be used on fields with docValues data. Chooses between the `docValuesTermsFilterTopLevel` and `docValuesTermsFilterPerSegment` methods using the number of query terms as a rough heuristic. Users should typically use this method instead of using `docValuesTermsFilterTopLevel` or `docValuesTermsFilterPerSegment` directly, unless they've done performance testing to validate one of the methods on queries of all sizes. Depending on the implementation picked, this method may rely on expensive data structures which are lazily populated after each commit. If you commit frequently and your use-case can tolerate a static warming query, consider adding one to `solrconfig.xml` so that this work is done as a part of the commit itself and not attached directly to user requests. +`docValuesTermsFilter` can only be used on fields with docValues data. The `cache` parameter is false by default. Chooses between the `docValuesTermsFilterTopLevel` and `docValuesTermsFilterPerSegment` methods using the number of query terms as a rough heuristic. Users should typically use this method instead of using `docValuesTermsFilterTopLevel` or `docValuesTermsFilterPerSegment` directly, unless they've done performance testing to validate one of the methods on queries of all sizes. Depending on the implementation picked, this method may rely on expensive data structures which are lazily populated after each commit. 
If you commit frequently and your use-case can tolerate a static warming query, consider adding one to `solrconfig.xml` so that this work is done as a part of the commit itself and not attached directly to user requests. Review comment: When I think of it, I've been changing to one sentence per line when I'm editing also (but I often forget). I don't know that it really changes that much to have a whole page consistent - we should probably encourage one sentence per line, but if it's a big file and you can only do a little, I think that's fine and maybe someone else will get inspired later to do the rest.
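The "one sentence per line" convention discussed in this thread is mechanical enough to automate for simple prose. A naive, punctuation-based sketch follows; abbreviations and version numbers like "8.3" would need smarter handling before running this over real ref-guide files:

```java
public class SentencePerLine {

    /**
     * Naive reflow for diff-friendly docs: start a new line after
     * sentence-ending punctuation (., !, ?) followed by whitespace.
     * Good enough to illustrate the convention, not production-ready.
     */
    public static String reflow(String paragraph) {
        return paragraph.trim().replaceAll("([.!?])\\s+", "$1\n");
    }

    public static void main(String[] args) {
        System.out.println(reflow("This makes the diffs easy to read! Each sentence gets its own line."));
    }
}
```

Because a changed sentence then touches exactly one line, review diffs show only the edited sentence rather than a rewrapped paragraph — which is the benefit argued for above.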
[jira] [Commented] (SOLR-14173) Ref Guide Redesign
[ https://issues.apache.org/jira/browse/SOLR-14173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013180#comment-17013180 ] Jason Gerlowski commented on SOLR-14173: bq. I did however put up files at http://home.apache.org/~ctargett/RefGuideRedesign/index.html I get a 404 on that link, just a heads up in case anyone else tries. > Ref Guide Redesign > -- > > Key: SOLR-14173 > URL: https://issues.apache.org/jira/browse/SOLR-14173 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: documentation >Reporter: Cassandra Targett >Assignee: Cassandra Targett >Priority: Major > > The current design of the Ref Guide was essentially copied from a > Jekyll-based documentation theme > (https://idratherbewriting.com/documentation-theme-jekyll/), which had a > couple important benefits for that time: > * It was well-documented and since I had little experience with Jekyll and > its Liquid templates and since I was the one doing it, I wanted to make it as > easy on myself as possible > * It was designed for documentation specifically so took care of all the > things like inter-page navigation, etc. > * It helped us get from Confluence to our current system quickly > It had some drawbacks, though: > * It wasted a lot of space on the page > * The theme was built for Markdown files, so did not take advantage of the > features of the {{jekyll-asciidoc}} plugin we use (the in-page TOC being one > big example - the plugin could create it at build time, but the theme > included JS to do it as the page loads, so we use the JS) > * It had a lot of JS and overlapping CSS files. 
While it used Bootstrap it > used a customized CSS on top of it for theming that made modifications > complex (it was hard to figure out how exactly a change would behave) > * With all the stuff I'd changed in my bumbling way just to get things to > work back then, I broke a lot of the stuff Bootstrap is supposed to give us > in terms of responsiveness and making the Guide usable even on smaller screen > sizes. > After upgrading the Asciidoctor components in SOLR-12786 and stopping the PDF > (SOLR-13782), I wanted to try to set us up for a more flexible system. We > need it for things like Joel's work on the visual guide for streaming > expressions (SOLR-13105), and in order to implement other ideas we might have > on how to present information in the future. > I view this issue as a phase 1 of an overall redesign that I've already > started in a local branch. I'll explain in a comment the changes I've already > made, and will use this issue to create and push a branch where we can > discuss in more detail. > Phase 1 here will be under-the-hood CSS/JS changes + overall page layout > changes. > Phase 2 (issue TBD) will be a wholesale re-organization of all the pages of > the Guide. > Phase 3 (issue TBD) will explore moving us from Jekyll to another static site > generator that is better suited for our content format, file types, and build > conventions. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13934) Documentation on SimplePostTool for Windows users is pretty brief
[ https://issues.apache.org/jira/browse/SOLR-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013189#comment-17013189 ] Jason Gerlowski commented on SOLR-13934: Well, whether or not we want to keep the {{bin/post}} tool eventually, we have it now so we should maintain the docs for it. I'm going to merge this PR today (with a few tweaks). > Documentation on SimplePostTool for Windows users is pretty brief > - > > Key: SOLR-13934 > URL: https://issues.apache.org/jira/browse/SOLR-13934 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: SimplePostTool >Affects Versions: 8.3 >Reporter: David Eric Pugh >Priority: Minor > Fix For: master (9.0) > > Time Spent: 10m > Remaining Estimate: 0h > > SimplePostTool on windows doesn't have enough documentation, you end up > googling to get it to work. Need to provide better example. > https://lucene.apache.org/solr/guide/8_3/post-tool.html#simpleposttool -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Assigned] (SOLR-13934) Documentation on SimplePostTool for Windows users is pretty brief
[ https://issues.apache.org/jira/browse/SOLR-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gerlowski reassigned SOLR-13934: -- Assignee: Jason Gerlowski
[jira] [Commented] (SOLR-14173) Ref Guide Redesign
[ https://issues.apache.org/jira/browse/SOLR-14173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013190#comment-17013190 ] Cassandra Targett commented on SOLR-14173: -- bq. I get a 404 on that link Bah, it's http://people.apache.org/~ctargett/RefGuideRedesign/index.html
[jira] [Comment Edited] (SOLR-13934) Documentation on SimplePostTool for Windows users is pretty brief
[ https://issues.apache.org/jira/browse/SOLR-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013189#comment-17013189 ] Jason Gerlowski edited comment on SOLR-13934 at 1/10/20 8:37 PM: - Well, whether or not we want to keep the {{bin/post}} tool longer term, we have it now so we should maintain the docs for it. I'm going to merge this PR today (with a few tweaks). was (Author: gerlowskija): Well, whether or not we want to keep the {{bin/post}} tool eventually, we have it now so we should maintain the docs for it. I'm going to merge this PR today (with a few tweaks).
[jira] [Comment Edited] (SOLR-14173) Ref Guide Redesign
[ https://issues.apache.org/jira/browse/SOLR-14173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009848#comment-17009848 ]

Cassandra Targett edited comment on SOLR-14173 at 1/10/20 8:38 PM:
-------------------------------------------------------------------

My work so far is in a branch ({{jira/solr-14173}}). Github link:
https://github.com/apache/lucene-solr/tree/jira/solr-14173. It's still a WIP,
so I won't create a PR for it yet unless someone wants one.

I did, however, put up files at:
http://people.apache.org/~ctargett/RefGuideRedesign/index.html. Feel free to
take a look and let me know your thoughts on the overall look & feel and if
you find buggy behavior. *There are still bugs* - I'll list them below.

h3. What's Changed

*Updated dependencies*

Updated:
* Bootstrap 3.3.7 to 4.1.3
* JQuery 2.1.4 to 3.3.1
* AnchorJS 2.0.0 to 4.2.0

Added:
* Malihu Custom Scrollbar 3.1.5 - to make the new sidebar scroll
* PopperJS 1.14.3 - required by Bootstrap

Removed:
* {{toc.js}} - no longer used
* {{ref-guide-toc.js}} - no longer used
* TOC-related includes
* Print-related layouts, includes, and CSS
* Leftover PDF-only fonts

*New Layout*
* Sidebar nav is now fixed to the left side of the screen, and the content to
its right is designed to use as much space as the browser has available (up
to 1238px or so, I think)
* Top nav does not span the page, but stays on top of the content to give
room for the sidebar nav
* Changed the in-page TOCs to be built at the time the HTML is generated.
This makes all in-page TOCs always float to the right side of the page (IOW,
we lost the ability to choose where to put it, but IMO this is a
simplification)
* Moved the "search" bar to the sidebar nav instead of the top nav

*CSS Cleanup*
* Removed lavish-bootstrap.css and replaced it with Bootstrap's native CSS
(which will be easier to upgrade in the future)
* Re-organized all the CSS files and separated them into broad groups:
** decoration.css: buttons, forms, horizontal lines, lead paragraphs,
tabs/pills
** navs.css: all navigation elements such as the top nav, sidebar nav,
footer, in-page TOC
** ref-guide.css: all elements which impact the display of content, such as
the overall body, tables, lists, links, code samples, etc.
** search.css: all elements related to the page-title lookup
* Moved CSS elements from other files into the above files and organized
them by what they control
* Added significant comments to CSS files about what the rules are
controlling and how those elements are used (more to do here)

h3. Known Issues
* The fancy tab thing for multiple code examples in one section isn't styled
right when you click other tabs
* The top nav won't be responsive on smaller screens
* Behavior of the sidebar on smaller screens could be improved
* Still many overlapping CSS rules for elements and many unused CSS rules to
be cleaned up
* Sidebar requires too much scrolling - Phase 2 will trim this down
* Now-unused CSS/JS files haven't been deleted yet
* Search box shows results in the sidebar nav - I wasn't able to see this
until yesterday and I'm not sure how I feel about it. At any rate, I haven't
worked with it much yet and it needs more work
* Home page (index.html) needs some additional love
[jira] [Commented] (SOLR-6613) TextField.analyzeMultiTerm should not throw exception when analyzer returns no term
[ https://issues.apache.org/jira/browse/SOLR-6613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013199#comment-17013199 ]

ASF subversion and git services commented on SOLR-6613:
--------------------------------------------------------

Commit 72dea4919ebc79721167d451e7c7afa022aeee05 in lucene-solr's branch
refs/heads/branch_8x from Bruno Roustant
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=72dea49 ]

SOLR-6613: TextField.analyzeMultiTerm does not throw an exception when the
Analyzer returns no terms. (Bruno Roustant)

> TextField.analyzeMultiTerm should not throw exception when analyzer returns
> no term
> ---------------------------------------------------------------------------
>
>                 Key: SOLR-6613
>                 URL: https://issues.apache.org/jira/browse/SOLR-6613
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 4.3.1, 4.10.2, 6.0
>            Reporter: Bruno Roustant
>            Assignee: Bruno Roustant
>            Priority: Major
>         Attachments: TestTextField.java
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> In TextField.analyzeMultiTerm(), at the lines:
>
>   try {
>     if (!source.incrementToken())
>       throw new SolrException();
>
> the method should not throw an exception if there is no token, because
> having no token is legitimate: all tokens may be filtered out (e.g. with a
> blocking Filter such as StopFilter). In this case it should simply return
> null (as it already returns null in some cases; see the first line of the
> method). However, SolrQueryParserBase also needs to be fixed to correctly
> handle null returned by TextField.analyzeMultiTerm(). See the attached
> TestTextField for the corresponding new test class.
[jira] [Resolved] (SOLR-6613) TextField.analyzeMultiTerm should not throw exception when analyzer returns no term
[ https://issues.apache.org/jira/browse/SOLR-6613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bruno Roustant resolved SOLR-6613.
----------------------------------
    Fix Version/s: 8.5
       Resolution: Fixed
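The fix described in SOLR-6613 — returning null instead of throwing when analysis produces no tokens — can be illustrated with a minimal, hypothetical sketch. The names below and the `Function`-based "analyzer" are illustrative only; Solr's real `TextField.analyzeMultiTerm` operates on a Lucene Analyzer/TokenStream:

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of the fixed behavior: when analysis of a multi-term
// part yields no tokens (e.g. all tokens removed by a StopFilter), return
// null instead of throwing an exception. Callers (in Solr's case,
// SolrQueryParserBase) must then handle the null return.
public class MultiTermAnalyzing {

  /** Returns the single analyzed term, or null when analysis yields no token. */
  public static String analyzeMultiTerm(String part, Function<String, List<String>> analyzer) {
    List<String> tokens = analyzer.apply(part);
    if (tokens.isEmpty()) {
      // Before the fix, this situation threw a SolrException; having no
      // token is legitimate, so we return null instead.
      return null;
    }
    return tokens.get(0);
  }
}
```

With a toy analyzer that drops the stopword "the" and lowercases everything else, analyzing "the" yields null rather than an exception, while analyzing any other term returns its lowercased form.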
[jira] [Commented] (SOLR-13934) Documentation on SimplePostTool for Windows users is pretty brief
[ https://issues.apache.org/jira/browse/SOLR-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013214#comment-17013214 ]

Jason Gerlowski commented on SOLR-13934:
----------------------------------------

I made a few tweaks visible on the PR: in particular, I combined the
SimplePostTool and "Windows Support" sections, which had some duplication. I
also changed the asciidoc formatting to have one sentence per line, which
allows subsequent changes to show up a bit more nicely in diffs.

bq. Related maybe, should a lot of the code in the shell script be moved
into the post.jar?

I agree. If we keep the bin/post tool around longer term (or someone gets a
chance now), that logic should all move into the jar. There's no reason to
duplicate it. The {{bin/solr}} and {{bin/solr.cmd}} scripts could benefit
from this in much the same way, coincidentally. It's been on my backburner
for a long time. (It's an easy change but a hard one to verify, since there
are no tests to check bin/solr behavior.)

I'll merge the doc changes soon. If anyone has suggestions (before or
after), let me know.
[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #1160: LUCENE-9125: Improve Automaton.step() with binary search
bruno-roustant commented on a change in pull request #1160: LUCENE-9125: Improve Automaton.step() with binary search
URL: https://github.com/apache/lucene-solr/pull/1160#discussion_r365452385

## File path: lucene/core/src/java/org/apache/lucene/util/automaton/Automaton.java
## @@ -658,22 +658,84 @@ public String toDot() {
   public int step(int state, int label) {
     assert state >= 0;
     assert label >= 0;
-    int trans = states[2*state];
-    int limit = trans + 3*states[2*state+1];
-    // TODO: we could do bin search; transitions are sorted
-    while (trans < limit) {
-      int dest = transitions[trans];
-      int min = transitions[trans+1];
-      int max = transitions[trans+2];
-      if (min <= label && label <= max) {
-        return dest;
-      }
-      trans += 3;
-    }
+    int stateIndex = 2 * state;
+    int firstTransitionIndex = states[stateIndex];
+    int numTransitions = states[stateIndex + 1];
+
+    // Since transitions are sorted,
+    // binary search the transition for which label is within [minLabel, maxLabel].
+    int low = 0;
+    int high = numTransitions - 1;
+    while (low <= high) {
+      int mid = (low + high) >>> 1;
+      int transitionIndex = firstTransitionIndex + 3 * mid;
+      int minLabel = transitions[transitionIndex + 1];
+      if (minLabel > label) {
+        high = mid - 1;
+      } else {
+        int maxLabel = transitions[transitionIndex + 2];
+        if (maxLabel < label) {
+          low = mid + 1;
+        } else {
+          return transitions[transitionIndex];
+        }
+      }
+    }
     return -1;
   }

+  /**
+   * Looks for the next transition that matches the provided label, assuming determinism.
+   *
+   * This method is similar to {@link #step(int, int)} but is used more efficiently
+   * when iterating over multiple transitions from the same source state. It keeps
+   * the latest reached transition index in {@code transition.transitionUpto} so
+   * the next call to this method can continue from there instead of restarting
+   * from the first transition.
+   *
+   * @param transition The transition to start the lookup from (inclusive, using its
+   *                   {@link Transition#source} and {@link Transition#transitionUpto}).
+   *                   It is updated with the matched transition;
+   *                   or with {@link Transition#dest} = -1 if no match.
+   * @param label The codepoint to look up.
+   * @return The destination state; or -1 if no matching outgoing transition.
+   */
+  public int next(Transition transition, int label) {
+    // Copy of step() method with
+    // - binary search 'low' bound initialized to transition.transitionUpto.
+    // - param transition .dest/.min/.max/.transitionUpto set to the matching transition.
+    assert transition.source >= 0;
+    assert label >= 0;
+    int stateIndex = 2 * transition.source;
+    int firstTransitionIndex = states[stateIndex];
+    int numTransitions = states[stateIndex + 1];
+
+    // Since transitions are sorted,

Review comment:
   I'll try.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services
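The binary search in the patched step() above can be exercised in isolation. Below is a minimal, self-contained sketch (assumed names, not Lucene's actual Automaton class) that searches a flat int[] of (dest, minLabel, maxLabel) triples sorted by minLabel, mirroring the packed transition layout the diff assumes:

```java
// Standalone sketch of the binary-search transition lookup. One state's
// transitions are stored as consecutive (dest, minLabel, maxLabel) triples,
// sorted by minLabel; for a deterministic automaton the label ranges do not
// overlap, so binary search over minLabel is sufficient.
public class TransitionSearch {

  /**
   * Returns the destination of the transition whose [minLabel, maxLabel]
   * range contains {@code label}, or -1 if there is none.
   */
  public static int step(int[] transitions, int numTransitions, int label) {
    int low = 0;
    int high = numTransitions - 1;
    while (low <= high) {
      int mid = (low + high) >>> 1;
      int transitionIndex = 3 * mid;
      int minLabel = transitions[transitionIndex + 1];
      if (minLabel > label) {
        high = mid - 1;
      } else {
        int maxLabel = transitions[transitionIndex + 2];
        if (maxLabel < label) {
          low = mid + 1;          // label is above this range; search right
        } else {
          return transitions[transitionIndex]; // dest of the matching triple
        }
      }
    }
    return -1; // no outgoing transition matches label
  }
}
```

For example, with three transitions covering the ranges ['a','c'] -> 5, ['f','h'] -> 7, and ['m','z'] -> 9, looking up 'g' lands on the middle triple in one probe, while 'd' falls in a gap and returns -1. The `next(Transition, int)` variant in the diff follows the same search but starts `low` at `transition.transitionUpto` so repeated lookups from the same state resume where the last one stopped.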
[GitHub] [lucene-solr] madrob commented on issue #1157: Add RAT check using Gradle
madrob commented on issue #1157: Add RAT check using Gradle
URL: https://github.com/apache/lucene-solr/pull/1157#issuecomment-573246947

   Almost there! Precommit currently fails with some license failures. I'll need to look deeper at what exclusions we're actually using in the ant build, but I think we're super close now.
[GitHub] [lucene-solr] MarcusSorealheis commented on issue #1141: SOLR-14147 change the Security manager to default to true.
MarcusSorealheis commented on issue #1141: SOLR-14147 change the Security manager to default to true.
URL: https://github.com/apache/lucene-solr/pull/1141#issuecomment-573257518

   Should I set up a VM to test the commit on Windows? @rmuir @erikhatcher