[GitHub] [lucene-solr] dweiss merged pull request #2068: LUCENE-8982: Separate out native code to another module to allow cpp build with gradle

2020-11-16 Thread GitBox


dweiss merged pull request #2068:
URL: https://github.com/apache/lucene-solr/pull/2068


   






[jira] [Commented] (LUCENE-8982) Make NativeUnixDirectory pure java now that direct IO is possible

2020-11-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232609#comment-17232609
 ] 

ASF subversion and git services commented on LUCENE-8982:
-

Commit ebc87a8a27f3b3bd89ea7c38c8b701d94e50788d in lucene-solr's branch 
refs/heads/master from zacharymorn
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ebc87a8 ]

LUCENE-8982: Separate out native code to another module to allow cpp build with 
gradle (#2068)

* LUCENE-8982: Separate out native code to another module to allow cpp build 
with gradle

> Make NativeUnixDirectory pure java now that direct IO is possible
> -
>
> Key: LUCENE-8982
> URL: https://issues.apache.org/jira/browse/LUCENE-8982
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/misc
>Reporter: Michael McCandless
>Priority: Major
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
> {{NativeUnixDirectory}} is a {{Directory}} implementation that uses direct IO 
> to write newly merged segments.  Direct IO bypasses the kernel's buffer cache 
> and write cache, making merge writes "invisible" to the kernel, though the 
> reads for merging the N segments are still going through the kernel.
> But today, {{NativeUnixDirectory}} uses a small JNI wrapper to access the 
> {{O_DIRECT}} flag to {{open}} ... since JDK9 we can now pass that flag in 
> pure java code, so we should now fix {{NativeUnixDirectory}} to not use JNI 
> anymore.
> We should also run some more realistic benchmarks seeing if this option 
> really helps nodes that are doing concurrent indexing (merging) and searching.
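
For context, here is a minimal, hypothetical sketch (not the actual NativeUnixDirectory code) of
the pure-Java route mentioned above, using com.sun.nio.file.ExtendedOpenOption.DIRECT, which is
available in recent JDKs. Direct I/O requires block-aligned buffers, positions and write sizes.

{code:java}
import com.sun.nio.file.ExtendedOpenOption;

import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative sketch only: open a file with O_DIRECT semantics from pure Java,
// bypassing the kernel's buffer cache, which is what the JNI wrapper does today.
public class DirectWriteSketch {
  public static void main(String[] args) throws Exception {
    int blockSize = 4096; // assumed filesystem block size
    Path path = Path.of("merged-segment.tmp");
    try (FileChannel ch = FileChannel.open(path,
        StandardOpenOption.CREATE, StandardOpenOption.WRITE,
        ExtendedOpenOption.DIRECT)) {
      // Direct I/O requires the buffer address to be block-aligned.
      ByteBuffer buf = ByteBuffer.allocateDirect(blockSize * 2).alignedSlice(blockSize);
      buf.put(new byte[blockSize]); // write exactly one block of (placeholder) data
      buf.flip();
      ch.write(buf);
    }
  }
}
{code}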






[jira] [Commented] (LUCENE-8982) Make NativeUnixDirectory pure java now that direct IO is possible

2020-11-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232610#comment-17232610
 ] 

ASF subversion and git services commented on LUCENE-8982:
-

Commit ebc87a8a27f3b3bd89ea7c38c8b701d94e50788d in lucene-solr's branch 
refs/heads/master from zacharymorn
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ebc87a8 ]

LUCENE-8982: Separate out native code to another module to allow cpp build with 
gradle (#2068)

* LUCENE-8982: Separate out native code to another module to allow cpp build 
with gradle

> Make NativeUnixDirectory pure java now that direct IO is possible
> -
>
> Key: LUCENE-8982
> URL: https://issues.apache.org/jira/browse/LUCENE-8982
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/misc
>Reporter: Michael McCandless
>Priority: Major
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
> {{NativeUnixDirectory}} is a {{Directory}} implementation that uses direct IO 
> to write newly merged segments.  Direct IO bypasses the kernel's buffer cache 
> and write cache, making merge writes "invisible" to the kernel, though the 
> reads for merging the N segments are still going through the kernel.
> But today, {{NativeUnixDirectory}} uses a small JNI wrapper to access the 
> {{O_DIRECT}} flag to {{open}} ... since JDK9 we can now pass that flag in 
> pure java code, so we should now fix {{NativeUnixDirectory}} to not use JNI 
> anymore.
> We should also run some more realistic benchmarks seeing if this option 
> really helps nodes that are doing concurrent indexing (merging) and searching.






[GitHub] [lucene-solr] dweiss commented on pull request #2068: LUCENE-8982: Separate out native code to another module to allow cpp build with gradle

2020-11-16 Thread GitBox


dweiss commented on pull request #2068:
URL: https://github.com/apache/lucene-solr/pull/2068#issuecomment-727827305


   I've added a changes entry and committed it. Thanks, @zacharymorn!






[jira] [Assigned] (LUCENE-8982) Make NativeUnixDirectory pure java now that direct IO is possible

2020-11-16 Thread Dawid Weiss (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss reassigned LUCENE-8982:
---

Assignee: Dawid Weiss

> Make NativeUnixDirectory pure java now that direct IO is possible
> -
>
> Key: LUCENE-8982
> URL: https://issues.apache.org/jira/browse/LUCENE-8982
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/misc
>Reporter: Michael McCandless
>Assignee: Dawid Weiss
>Priority: Major
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> {{NativeUnixDirectory}} is a {{Directory}} implementation that uses direct IO 
> to write newly merged segments.  Direct IO bypasses the kernel's buffer cache 
> and write cache, making merge writes "invisible" to the kernel, though the 
> reads for merging the N segments are still going through the kernel.
> But today, {{NativeUnixDirectory}} uses a small JNI wrapper to access the 
> {{O_DIRECT}} flag to {{open}} ... since JDK9 we can now pass that flag in 
> pure java code, so we should now fix {{NativeUnixDirectory}} to not use JNI 
> anymore.
> We should also run some more realistic benchmarks seeing if this option 
> really helps nodes that are doing concurrent indexing (merging) and searching.






[jira] [Commented] (SOLR-14998) any Collections Handler actions should be logged at debug level

2020-11-16 Thread Nazerke Seidan (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232611#comment-17232611
 ] 

Nazerke Seidan commented on SOLR-14998:
---

PR: https://github.com/apache/lucene-solr/pull/2079

> any Collections Handler actions should be logged at debug level
> ---
>
> Key: SOLR-14998
> URL: https://issues.apache.org/jira/browse/SOLR-14998
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Nazerke Seidan
>Priority: Minor
>
> CLUSTERSTATUS is logged in CollectionsHandler at INFO level, but the cluster 
> status is already logged in HttpSolrCall at INFO. In CollectionsHandler the 
> level should be set to DEBUG to avoid a lot of noise.
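
As a purely illustrative sketch of the proposed change (class, variable and message names here are
made up, not the actual CollectionsHandler code), the INFO call would simply be demoted to DEBUG:

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Illustrative only: demote the handler-side log line to DEBUG, since HttpSolrCall
// already logs the request once at INFO.
public class LogLevelSketch {
  private static final Logger log = LoggerFactory.getLogger(LogLevelSketch.class);

  void handleClusterStatus(String action, String params) {
    if (log.isDebugEnabled()) {
      log.debug("Invoked Collection Action: {} with params {}", action, params);
    }
  }
}
{code}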






[jira] [Commented] (LUCENE-8982) Make NativeUnixDirectory pure java now that direct IO is possible

2020-11-16 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232612#comment-17232612
 ] 

Dawid Weiss commented on LUCENE-8982:
-

I've committed the extracted native library part. I still think it'd be good to 
see if we can get the same performance with just plain Java (and the new direct 
I/O flags).

> Make NativeUnixDirectory pure java now that direct IO is possible
> -
>
> Key: LUCENE-8982
> URL: https://issues.apache.org/jira/browse/LUCENE-8982
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/misc
>Reporter: Michael McCandless
>Assignee: Dawid Weiss
>Priority: Major
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> {{NativeUnixDirectory}} is a {{Directory}} implementation that uses direct IO 
> to write newly merged segments.  Direct IO bypasses the kernel's buffer cache 
> and write cache, making merge writes "invisible" to the kernel, though the 
> reads for merging the N segments are still going through the kernel.
> But today, {{NativeUnixDirectory}} uses a small JNI wrapper to access the 
> {{O_DIRECT}} flag to {{open}} ... since JDK9 we can now pass that flag in 
> pure java code, so we should now fix {{NativeUnixDirectory}} to not use JNI 
> anymore.
> We should also run some more realistic benchmarks seeing if this option 
> really helps nodes that are doing concurrent indexing (merging) and searching.






[jira] [Created] (LUCENE-9611) Remove deprecated PACKED_SINGLE_BLOCK from PackedInts

2020-11-16 Thread Ignacio Vera (Jira)
Ignacio Vera created LUCENE-9611:


 Summary: Remove deprecated PACKED_SINGLE_BLOCK from PackedInts
 Key: LUCENE-9611
 URL: https://issues.apache.org/jira/browse/LUCENE-9611
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Ignacio Vera


In LUCENE-7521, the PACKED_SINGLE_BLOCK format was deprecated. I propose to 
remove it entirely for Lucene 9.0.






[jira] [Commented] (LUCENE-9611) Remove deprecated PACKED_SINGLE_BLOCK from PackedInts

2020-11-16 Thread Ignacio Vera (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232657#comment-17232657
 ] 

Ignacio Vera commented on LUCENE-9611:
--

OK, no: the change was done in 9.0, so it cannot be removed until Lucene 10.

> Remove deprecated PACKED_SINGLE_BLOCK from PackedInts
> -
>
> Key: LUCENE-9611
> URL: https://issues.apache.org/jira/browse/LUCENE-9611
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Priority: Major
>
> In LUCENE-7521, the PACKED_SINGLE_BLOCK format was deprecated. I propose to 
> remove it entirely for Lucene 9.0.






[GitHub] [lucene-solr] noblepaul commented on a change in pull request #2065: SOLR-14977 : ContainerPlugins should be configurable

2020-11-16 Thread GitBox


noblepaul commented on a change in pull request #2065:
URL: https://github.com/apache/lucene-solr/pull/2065#discussion_r524089049



##
File path: solr/core/src/java/org/apache/solr/api/ContainerPluginsRegistry.java
##
@@ -114,6 +118,16 @@ public synchronized ApiInfo getPlugin(String name) {
 return currentPlugins.get(name);
   }
 
+  static class PluginMetaHolder {
+private final Map original;
+private final PluginMeta meta;

Review comment:
   It can be a specific sub-class of `PluginMeta` say `EndPointPluginMeta` .
   
   However, that change does not belong here








[jira] [Created] (SOLR-15002) Upgrade HttpClient to 4.5.13

2020-11-16 Thread Andras Salamon (Jira)
Andras Salamon created SOLR-15002:
-

 Summary: Upgrade HttpClient to 4.5.13
 Key: SOLR-15002
 URL: https://issues.apache.org/jira/browse/SOLR-15002
 Project: Solr
  Issue Type: Task
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Andras Salamon


Upgrade HttpClient 4.5.13 and HttpCore 4.4.13






[GitHub] [lucene-solr] asalamon74 opened a new pull request #2082: SOLR-15002: Upgrade HttpClient to 4.5.13

2020-11-16 Thread GitBox


asalamon74 opened a new pull request #2082:
URL: https://github.com/apache/lucene-solr/pull/2082


   
   
   
   # Description
   
   Upgrade HttpClient 4.5.13 and HttpCore 4.4.13
   
   # Solution
   
   Upgrade HttpClient 4.5.13 and HttpCore 4.4.13
   
   # Tests
   
   Unit tests.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [x] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `master` branch.
   - [x] I have run `./gradlew check`.
   - [ ] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).
   






[GitHub] [lucene-solr] sigram commented on a change in pull request #2065: SOLR-14977 : ContainerPlugins should be configurable

2020-11-16 Thread GitBox


sigram commented on a change in pull request #2065:
URL: https://github.com/apache/lucene-solr/pull/2065#discussion_r524211640



##
File path: solr/core/src/java/org/apache/solr/api/ContainerPluginsRegistry.java
##
@@ -114,6 +118,16 @@ public synchronized ApiInfo getPlugin(String name) {
 return currentPlugins.get(name);
   }
 
+  static class PluginMetaHolder {
+private final Map original;
+private final PluginMeta meta;

Review comment:
   Let's make it a sub-class.
   
   (I would argue that it belongs in this PR because it's a part of the 
configuration mechanism for the plugins that this PR defines. Since we are 
adding a flexible config bean it doesn't make sense to still keep the old 
primitive field. And you can always ADD fields in subclasses but removing them 
is much harder... ;) )








[jira] [Commented] (LUCENE-9611) Remove deprecated PACKED_SINGLE_BLOCK from PackedInts

2020-11-16 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232740#comment-17232740
 ] 

Adrien Grand commented on LUCENE-9611:
--

It looks like we only use it when encoding postings with 1, 2 and 4 bits per 
value. Maybe one way to move forward with this would be to drop 
PACKED_SINGLE_BLOCK from PackedInts and introduce special decoding logic in 
Lucene50PostingsFormat's ForUtil for those bit widths.
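
To make the idea concrete, here is a hedged, purely illustrative sketch (not the actual ForUtil
code) of what such special-cased decoding could look like for the single-block layout: with 2 bits
per value, 32 values sit in one 64-bit block and can be unpacked with plain shifts and masks.

{code:java}
// Illustrative only: unpack 32 values of 2 bits each from a single 64-bit block.
// Analogous loops would cover the 1- and 4-bits-per-value cases.
public final class SingleBlockDecodeSketch {
  public static void decode2(long block, int[] out) {
    for (int i = 0; i < 32; i++) {
      out[i] = (int) ((block >>> (i << 1)) & 0x3L);
    }
  }
}
{code}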

> Remove deprecated PACKED_SINGLE_BLOCK from PackedInts
> -
>
> Key: LUCENE-9611
> URL: https://issues.apache.org/jira/browse/LUCENE-9611
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Priority: Major
>
> In LUCENE-7521, the PACKED_SINGLE_BLOCK format was deprecated. I propose to 
> remove it entirely for Lucene 9.0.






[GitHub] [lucene-solr] noblepaul merged pull request #2065: SOLR-14977 : ContainerPlugins should be configurable

2020-11-16 Thread GitBox


noblepaul merged pull request #2065:
URL: https://github.com/apache/lucene-solr/pull/2065


   






[jira] [Commented] (SOLR-14977) Container plugins need a way to be configured

2020-11-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232752#comment-17232752
 ] 

ASF subversion and git services commented on SOLR-14977:


Commit 73d5e7ae77d8953cb9be35a7cbcebe3a516dd04a in lucene-solr's branch 
refs/heads/master from Noble Paul
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=73d5e7a ]

SOLR-14977 :  ContainerPlugins should be configurable (#2065)



> Container plugins need a way to be configured
> -
>
> Key: SOLR-14977
> URL: https://issues.apache.org/jira/browse/SOLR-14977
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Plugin system
>Reporter: Andrzej Bialecki
>Priority: Major
> Attachments: SOLR-14977.patch
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Container plugins are defined in {{/clusterprops.json:/plugin}} using a 
> simple {{PluginMeta}} bean. This is sufficient for implementations that don't 
> need any configuration except for the {{pathPrefix}} but insufficient for 
> anything else that needs more configuration parameters.
> An example would be a {{CollectionsRepairEventListener}} plugin proposed in 
> PR-1962, which needs parameters such as the list of collections, {{waitFor}}, 
> maximum operations allowed, etc. to properly function.
> This issue proposes to extend the {{PluginMeta}} bean to allow a {{Map}} of 
> configuration parameters.
> There is an interface that we could potentially use ({{MapInitializedPlugin}}), 
> but it works only with {{String}} values. This is not optimal because it 
> requires additional type-safety validation from the consumers. The existing 
> {{PluginInfo}} / {{PluginInfoInitialized}} interface is too complex for this 
> purpose.
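
As a rough sketch of the direction described above (field names are illustrative and may not match
the final Solr classes), the extension amounts to adding a free-form, type-preserving config map to
the existing metadata bean:

{code:java}
import java.util.Map;

// Illustrative only: a PluginMeta-like bean extended with a config map whose values
// keep their types (numbers, booleans, nested maps) instead of being String-only.
public class PluginMetaSketch {
  public String name;     // plugin name
  public String klass;    // implementing class
  public String version;  // package version
  public Map<String, Object> config; // new: arbitrary configuration parameters
}
{code}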






[jira] [Commented] (SOLR-15000) Solr based enterprise level, one-stop search center products with high performance, high reliability and high scalability

2020-11-16 Thread David Eric Pugh (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232763#comment-17232763
 ] 

David Eric Pugh commented on SOLR-15000:


I checked out the GitHub repo, and I see you've cut two releases (1.0 and 
1.0.1) and have had activity over the past month. I always look at the 
https://github.com/qlangtech/tis-solr/pulse/monthly page on GitHub when 
evaluating whether an open source project is one I want to invest in adopting!

I think your point that many users are just looking for a product is very true!

I've gone ahead and added your project to my watchlist for releases ;-). Good 
luck!

I think this ticket should be closed as Won't Fix, in favour of new, more 
targeted tickets?

> Solr based enterprise level, one-stop search center products with high 
> performance, high reliability and high scalability
> -
>
> Key: SOLR-15000
> URL: https://issues.apache.org/jira/browse/SOLR-15000
> Project: Solr
>  Issue Type: Wish
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Reporter: bai sui
>Priority: Minor
> Attachments: add-collection-step-2-expert.png, 
> add-collection-step-2.png
>
>
> h2. Summary
> I have developed an enterprise application based on Solr, named TIS. With TIS 
> you can quickly build an enterprise search service. TIS includes three 
> components:
>  - offline index building platform
>  Data is exported from a relational database (MySQL, SQL Server and so on) via 
> full table scans, and the wide table is then constructed either by a local MR 
> tool or directly by Spark.
>  - incremental real-time channel
>  Changes are sent to Kafka, real-time stream computation is carried out by 
> Flink, and the results are submitted to the search engine, so that the data in 
> the search engine and the database stay consistent in near real time.
>  - search engine
>  currently based on Solr 8
> TIS integrates these components seamlessly and gives users a one-stop, 
> out-of-the-box experience.
> h2. My question
> I want to contribute my code back to the community, but TIS focuses on 
> enterprise application search, just as Elasticsearch focuses on visual 
> analysis of time series data. Because Solr is a general search product, *I 
> don't think TIS can be merged directly into Solr. Is it possible for TIS to 
> become a new incubation project under Apache?*
> h2. TIS main features
>  - Schema and solrconfig storage is separated from ZK and kept in MySQL. 
> Version management is provided, and users can roll back to a historical 
> version of the configuration.
>   !add-collection-step-2-expert.png|width=500!
>   !add-collection-step-2.png|width=500!
>   Schema editing can be switched between a visual editing mode and an 
> advanced expert mode.
>  - Wide table rules are defined based on the selected data tables.
>  - An offline index building component is provided. Outside the collection, 
> the data is built into Lucene segment files; the segment files are then 
> copied back to the local disk where the SolrCore is located, and the new 
> index takes effect after a SolrCore reload.






[GitHub] [lucene-solr] cpoerschke commented on pull request #1571: SOLR-14560: Interleaving for Learning To Rank

2020-11-16 Thread GitBox


cpoerschke commented on pull request #1571:
URL: https://github.com/apache/lucene-solr/pull/1571#issuecomment-728089734


   > ... I assume we merge to master squashing and then cherry-pick the commit 
to some other branches? ...
   
   Yes, squash-and-merge to master branch and then cherry-pick to branch_8x 
from which branch_8_8 would be cut in due course as part of the 8.8.0 release 
process.
   
   > ... Do you think we need to target a major release? Or we could add it in 
the upcoming minors? ...
   
   Good question.
   * From the end users' perspective there are no breaking changes that would 
force targeting a 9.x major rather than an 8.8+ minor release.
   * From the code API perspective, some APIs changed, but from what I can see -- 
https://github.com/cpoerschke/lucene-solr/commits/feature/SOLR-14560-cpoerschke-2
 has associated commits -- we could provide backwards compatible deprecated 
niceties (a rough sketch follows below) to avoid breaking builds for anyone who 
had created their own transformer or plugin class referencing the changed APIs. 
Having said that, perhaps for `solr/contrib` code the expectations around not 
breaking APIs within a given 8.x series are different from `solr/core` code.
   * But yes, with or without backcompat niceties, I think we can target the 
8.8.0 minor rather than the 9.0 major release.
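
   To make the backcompat idea concrete, here is a minimal, hypothetical sketch (class and method 
names are invented, not the actual LTR classes) of such a deprecated nicety: the old signature 
stays and simply delegates to the new one.

   ```java
   // Hypothetical example of a backwards compatible "deprecated nicety": keep the old
   // method so third-party subclasses keep compiling, and delegate to the new API.
   public class RescorerExample {
     /** New API: callers pass an interleaving policy explicitly (name is illustrative). */
     public String rescore(String query, String interleavingPolicy) {
       return query + " [" + interleavingPolicy + "]";
     }

     /** Old API, kept for 8.x back-compat; delegates to the new method. */
     @Deprecated
     public String rescore(String query) {
       return rescore(query, "none");
     }
   }
   ```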
   






[jira] [Created] (SOLR-15003) SolrCloud Snapshot metadata inconsistent after core replication

2020-11-16 Thread Istvan Farkas (Jira)
Istvan Farkas created SOLR-15003:


 Summary: SolrCloud Snapshot metadata inconsistent after core 
replication
 Key: SOLR-15003
 URL: https://issues.apache.org/jira/browse/SOLR-15003
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: SolrCloud
Affects Versions: 7.4.1
Reporter: Istvan Farkas
 Attachments: snap2-before.txt, snap2-failed.png, state.json


After a replica does a full recovery, the old index directory is deleted, 
however the snapshot metadata is not updated in Zookeeper. This means that the 
affected core will have snapshots which point to a non-existing index directory.

Steps to reproduce (I used Solr 7.4.0 to test this but it likely affects newer 
versions too):

(1) Create any collection in SolrCloud with more than 1 replica per shard. 
The state.json of the testcollection2 I used: 
 [^state.json] 

(2) Start adding documents to the collection. After some documents are added, 
create a snapshot. In this example I called it snap2.

{code}
INFO (qtp577405636-12140)-c:testcollection2o.a.s.s.HttpSolrCall: [admin] 
webapp=null path=/admin/collections 
params={name=testcollection2&action=CREATESNAPSHOT&collection=testcollection2&commitName=snap2&wt=javabin&version=2}
 status=0 QTime=280 
{code}

The snapshot is created successfully for both cores; the metadata in ZooKeeper 
looks like this:
 [^snap2-before.txt] 

For core_node4 the index directory is 
/solr/testcollection2/core_node4/data/index. 


(3) Shut down one of the Solr servers.  Here I use the server hosting 
core_node4.

(4) Continue adding documents to the collection (add at least 100 documents to 
ensure that the replica that was shut down will go through full replication 
recovery on the next start).

Start the server:

{code}
INFO 
(recoveryExecutor-4-thread-1-processing-n:snapshot-test-2.example.com:8985_solr 
x:testcollection2_shard1_replica_n2 s:shard1 c:testcollection2 r:core_nod
e4)-c:testcollection2-s:shard1-r:core_node4-x:testcollection2_shard1_replica_n2-o.a.s.u.PeerSyncWithLeader:
 PeerSync: core=testcollection2_shard1_replica_n2 
url=https://snapshot-test-2.example.com:8985/solr  Received 99 versions from 
https://snapshot-test-3.example.com:8985/solr/testcollection2_shard1_replica_n1/
INFO 
(recoveryExecutor-4-thread-1-processing-n:snapshot-test-2.example.com:8985_solr 
x:testcollection2_shard1_replica_n2 s:shard1 c:testcollection2 r:core_nod
e4)-c:testcollection2-s:shard1-r:core_node4-x:testcollection2_shard1_replica_n2-o.a.s.u.PeerSync:
 PeerSync: core=testcollection2_shard1_replica_n2 
url=https://snapshot-test-2.example.com:8985/solr  Our versions are too old. 
ourHighThreshold=1683104801277083650 otherLowThreshold=1683139494002294784 
ourHighest=1683104801293860865 otherHighest=1683139494085132289
INFO 
(recoveryExecutor-4-thread-1-processing-n:snapshot-test-2.example.com:8985_solr 
x:testcollection2_shard1_replica_n2 s:shard1 c:testcollection2 r:core_nod
e4)-c:testcollection2-s:shard1-r:core_node4-x:testcollection2_shard1_replica_n2-o.a.s.u.PeerSyncWithLeader:
 PeerSync: core=testcollection2_shard1_replica_n2 
url=https://snapshot-test-2.example.com:8985/solr DONE. sync failed
INFO 
(recoveryExecutor-4-thread-1-processing-n:snapshot-test-2.example.com:8985_solr 
x:testcollection2_shard1_replica_n2 s:shard1 c:testcollection2 r:core_nod
e4)-c:testcollection2-s:shard1-r:core_node4-x:testcollection2_shard1_replica_n2-o.a.s.c.RecoveryStrategy:
 PeerSync Recovery was not successful - trying replication.
INFO 
(recoveryExecutor-4-thread-1-processing-n:snapshot-test-2.example.com:8985_solr 
x:testcollection2_shard1_replica_n2 s:shard1 c:testcollection2 r:core_nod
e4)-c:testcollection2-s:shard1-r:core_node4-x:testcollection2_shard1_replica_n2-o.a.s.c.RecoveryStrategy:
 Starting Replication Recovery.
{code}

(5) After the replication is finished, the index.properties points to the new 
directory index.20201112075340480

{code}
hdfs dfs -cat  /solr/testcollection2/core_node4/data/index.properties
#index.properties
#Thu Nov 12 07:58:52 GMT+00:00 2020
index=index.20201112075340480
{code}

And the original index directory for core_node4 has been deleted:

{code}
hdfs dfs -du -h  /solr/testcollection2/core_node4/data
0  0   /solr/testcollection2/core_node4/data/index
4.5 G  8.9 G   /solr/testcollection2/core_node4/data/index.20201112075340480
84 168 /solr/testcollection2/core_node4/data/index.properties
215430 /solr/testcollection2/core_node4/data/replication.properties
401802 /solr/testcollection2/core_node4/data/snapshot_metadata
9.2 M  18.5 M  /solr/testcollection2/core_node4/data/tlog
{code}

The snapshot metadata in ZooKeeper is exactly the same as in step (2), so 
snap2 still points to the index directory 
/solr/testcollection2/core_node4/data/index, which is empty by this time.

(6) Try de

[jira] [Commented] (LUCENE-9378) Configurable compression for BinaryDocValues

2020-11-16 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232812#comment-17232812
 ] 

Adrien Grand commented on LUCENE-9378:
--

This introduced some slowdowns on the nightly benchmarks, e.g. 
http://people.apache.org/~mikemccand/lucenebench/BrowseMonthTaxoFacets.html. It 
would be nice if the strategy for BEST_SPEED performed better on linear scans.

> Configurable compression for BinaryDocValues
> 
>
> Key: LUCENE-9378
> URL: https://issues.apache.org/jira/browse/LUCENE-9378
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Viral Gandhi
>Priority: Major
> Fix For: 8.8
>
> Attachments: hotspots-v76x.png, hotspots-v76x.png, hotspots-v76x.png, 
> hotspots-v76x.png, hotspots-v76x.png, hotspots-v77x.png, hotspots-v77x.png, 
> hotspots-v77x.png, hotspots-v77x.png, image-2020-06-12-22-17-30-339.png, 
> image-2020-06-12-22-17-53-961.png, image-2020-06-12-22-18-24-527.png, 
> image-2020-06-12-22-18-48-919.png, snapshot-v77x.nps, snapshot-v77x.nps, 
> snapshot-v77x.nps, snapshots-v76x.nps, snapshots-v76x.nps, snapshots-v76x.nps
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Lucene 8.5.1 includes a change to always [compress 
> BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This 
> caused a ~30% reduction in our red-line QPS (throughput). 
> We think users should be given some way to opt in to this compression 
> feature instead of it always being enabled, since it can have a substantial 
> query-time cost, as we saw during our upgrade. [~mikemccand] suggested one 
> possible approach: introduce a *mode* in Lucene80DocValuesFormat (COMPRESSED 
> and UNCOMPRESSED) and allow users to create a custom Codec, subclassing the 
> default Codec and picking the format they want.
> The idea is similar to Lucene50StoredFieldsFormat, which has two modes, 
> Mode.BEST_SPEED and Mode.BEST_COMPRESSION.
> Here's a related issue for adding a benchmark covering BINARY doc values 
> query-time performance: [https://github.com/mikemccand/luceneutil/issues/61]
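
As a hedged sketch of the opt-in approach described above (assuming the Mode constructor this
change introduces; codec registration/SPI details are omitted), a custom Codec could select the
faster, uncompressed doc-values mode like this:

{code:java}
import org.apache.lucene.codecs.Codec;
import org.apache.lucene.codecs.DocValuesFormat;
import org.apache.lucene.codecs.FilterCodec;
import org.apache.lucene.codecs.lucene80.Lucene80DocValuesFormat;

// Illustrative only: a codec wrapper that opts out of BinaryDocValues compression
// by picking the BEST_SPEED doc-values mode for the whole index.
public class BestSpeedDocValuesCodec extends FilterCodec {
  private final DocValuesFormat dvFormat =
      new Lucene80DocValuesFormat(Lucene80DocValuesFormat.Mode.BEST_SPEED);

  public BestSpeedDocValuesCodec() {
    super("BestSpeedDocValuesCodec", Codec.getDefault());
  }

  @Override
  public DocValuesFormat docValuesFormat() {
    return dvFormat;
  }
}
{code}

Such a codec would then be set on IndexWriterConfig via setCodec at indexing time.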






[jira] [Updated] (SOLR-14970) elevation does not work without elevate.xml config

2020-11-16 Thread Bernd Wahlen (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Wahlen updated SOLR-14970:

Summary: elevation does not work without elevate.xml config  (was: 
elevation does not workout elevate.xml config)

> elevation does not work without elevate.xml config
> --
>
> Key: SOLR-14970
> URL: https://issues.apache.org/jira/browse/SOLR-14970
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.6.3
>Reporter: Bernd Wahlen
>Priority: Minor
>
> When I remove the elevate.xml line from solrconfig.xml (plus the file itself), 
> elevation does not work and no error is logged.
> We put the ids directly in the query and do not use the default fields or ids, 
> so the xml is effectively useless, but it is still required for the elevation 
> component to work. Example query:
> {code:java}http://staging.qeep.net:8983/solr/profile_v2/elevate?q=%2Bapp_sns%3A%20qeep&sort=random_4239%20desc,%20id%20desc&elevateIds=361018,361343&forceElevation=true{code}
> {code:java}
>   
> 
> string
>   elevate.xml
> elevated
>   
>   
>   
> 
>   explicit
> 
> 
>   elevator
> 
>   
> {code}
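
For reference, a hedged SolrJ sketch of the same request as the URL above (elevateIds and
forceElevation are the documented QueryElevationComponent parameters; the handler and field names
are taken from the reporter's example):

{code:java}
import org.apache.solr.client.solrj.SolrQuery;

// Illustrative only: build the elevate request shown in the description via SolrJ.
public class ElevateQuerySketch {
  public static SolrQuery build() {
    SolrQuery q = new SolrQuery("+app_sns: qeep");
    q.setRequestHandler("/elevate");
    q.set("elevateIds", "361018,361343");
    q.set("forceElevation", true);
    q.setSort("random_4239", SolrQuery.ORDER.desc);
    q.addSort("id", SolrQuery.ORDER.desc);
    return q;
  }
}
{code}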






[jira] [Commented] (SOLR-14970) elevation does not work without elevate.xml config

2020-11-16 Thread Bernd Wahlen (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232848#comment-17232848
 ] 

Bernd Wahlen commented on SOLR-14970:
-

Solution sounds good; I'll try to understand the code...

> elevation does not work without elevate.xml config
> --
>
> Key: SOLR-14970
> URL: https://issues.apache.org/jira/browse/SOLR-14970
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.6.3
>Reporter: Bernd Wahlen
>Priority: Minor
>
> When I remove the elevate.xml line from solrconfig.xml (plus the file itself), 
> elevation does not work and no error is logged.
> We put the ids directly in the query and do not use the default fields or ids, 
> so the xml is effectively useless, but it is still required for the elevation 
> component to work. Example query:
> {code:java}http://staging.qeep.net:8983/solr/profile_v2/elevate?q=%2Bapp_sns%3A%20qeep&sort=random_4239%20desc,%20id%20desc&elevateIds=361018,361343&forceElevation=true{code}
> {code:java}
>   
> 
> string
>   elevate.xml
> elevated
>   
>   
>   
> 
>   explicit
> 
> 
>   elevator
> 
>   
> {code}






[jira] [Commented] (LUCENE-8982) Make NativeUnixDirectory pure java now that direct IO is possible

2020-11-16 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232870#comment-17232870
 ] 

Dawid Weiss commented on LUCENE-8982:
-

Yeah... like I suspected - CI does complain about missing toolchains.
https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/28653/consoleText

{code}
* What went wrong:
Execution failed for task ':lucene:misc:native:compileDebugLinuxCpp'.
> No tool chain is available to build C++ for host operating system 'Linux' 
> architecture 'x86-64':
- Tool chain 'visualCpp' (Visual Studio):
- Visual Studio is not available on this operating system.
- Tool chain 'gcc' (GNU GCC):
- Could not find C++ compiler 'g++' in system path.
- Tool chain 'clang' (Clang):
- Could not find C++ compiler 'clang++' in system path.
{code}

I think we'll switch native builds to be optionally enabled (i.e. disabled by 
default)?

> Make NativeUnixDirectory pure java now that direct IO is possible
> -
>
> Key: LUCENE-8982
> URL: https://issues.apache.org/jira/browse/LUCENE-8982
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/misc
>Reporter: Michael McCandless
>Assignee: Dawid Weiss
>Priority: Major
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> {{NativeUnixDirectory}} is a {{Directory}} implementation that uses direct IO 
> to write newly merged segments.  Direct IO bypasses the kernel's buffer cache 
> and write cache, making merge writes "invisible" to the kernel, though the 
> reads for merging the N segments are still going through the kernel.
> But today, {{NativeUnixDirectory}} uses a small JNI wrapper to access the 
> {{O_DIRECT}} flag to {{open}} ... since JDK9 we can now pass that flag in 
> pure java code, so we should now fix {{NativeUnixDirectory}} to not use JNI 
> anymore.
> We should also run some more realistic benchmarks seeing if this option 
> really helps nodes that are doing concurrent indexing (merging) and searching.






[jira] [Updated] (SOLR-15004) Unit tests for the replica placement API

2020-11-16 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-15004:

Description: This is a follow-up to SOLR-14613.

> Unit tests for the replica placement API
> 
>
> Key: SOLR-15004
> URL: https://issues.apache.org/jira/browse/SOLR-15004
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: AutoScaling
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>
> This is a follow-up to SOLR-14613.






[jira] [Created] (SOLR-15004) Unit tests for the replica placement API

2020-11-16 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-15004:
---

 Summary: Unit tests for the replica placement API
 Key: SOLR-15004
 URL: https://issues.apache.org/jira/browse/SOLR-15004
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
  Components: AutoScaling
Reporter: Andrzej Bialecki
Assignee: Andrzej Bialecki









[jira] [Updated] (SOLR-15004) Unit tests for the replica placement API

2020-11-16 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-15004:

Description: This is a follow-up to SOLR-14613. Both the APIs and the 
sample implementations need unit tests.  (was: This is a follow-up to 
SOLR-14613.)

> Unit tests for the replica placement API
> 
>
> Key: SOLR-15004
> URL: https://issues.apache.org/jira/browse/SOLR-15004
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: AutoScaling
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>
> This is a follow-up to SOLR-14613. Both the APIs and the sample 
> implementations need unit tests.






[jira] [Commented] (LUCENE-8982) Make NativeUnixDirectory pure java now that direct IO is possible

2020-11-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232874#comment-17232874
 ] 

ASF subversion and git services commented on LUCENE-8982:
-

Commit fd3ffd0d38aaeaa8629943f69dca2ff04afcfbfa in lucene-solr's branch 
refs/heads/master from Dawid Weiss
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=fd3ffd0 ]

LUCENE-8982: make native builds disabled by default (CI complains).


> Make NativeUnixDirectory pure java now that direct IO is possible
> -
>
> Key: LUCENE-8982
> URL: https://issues.apache.org/jira/browse/LUCENE-8982
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/misc
>Reporter: Michael McCandless
>Assignee: Dawid Weiss
>Priority: Major
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> {{NativeUnixDirectory}} is a {{Directory}} implementation that uses direct IO 
> to write newly merged segments.  Direct IO bypasses the kernel's buffer cache 
> and write cache, making merge writes "invisible" to the kernel, though the 
> reads for merging the N segments are still going through the kernel.
> But today, {{NativeUnixDirectory}} uses a small JNI wrapper to access the 
> {{O_DIRECT}} flag to {{open}} ... since JDK9 we can now pass that flag in 
> pure java code, so we should now fix {{NativeUnixDirectory}} to not use JNI 
> anymore.
> We should also run some more realistic benchmarks seeing if this option 
> really helps nodes that are doing concurrent indexing (merging) and searching.






[jira] [Commented] (LUCENE-8982) Make NativeUnixDirectory pure java now that direct IO is possible

2020-11-16 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232880#comment-17232880
 ] 

Uwe Schindler commented on LUCENE-8982:
---

Hi,
would it be possible to build the native code if a toolchain is available? It 
looks like Gradle checks for all the different toolchain types and then gives up.
My idea: if it finds a Windows toolchain, it builds for Windows; if it finds GCC, 
it builds for Linux; same for macOS.

My biggest problem is still: how do we handle releases of binary artifacts? Do we 
really want to make this dependent on the release manager's local system?

> Make NativeUnixDirectory pure java now that direct IO is possible
> -
>
> Key: LUCENE-8982
> URL: https://issues.apache.org/jira/browse/LUCENE-8982
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/misc
>Reporter: Michael McCandless
>Assignee: Dawid Weiss
>Priority: Major
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> {{NativeUnixDirectory}} is a {{Directory}} implementation that uses direct IO 
> to write newly merged segments.  Direct IO bypasses the kernel's buffer cache 
> and write cache, making merge writes "invisible" to the kernel, though the 
> reads for merging the N segments are still going through the kernel.
> But today, {{NativeUnixDirectory}} uses a small JNI wrapper to access the 
> {{O_DIRECT}} flag to {{open}} ... since JDK9 we can now pass that flag in 
> pure java code, so we should now fix {{NativeUnixDirectory}} to not use JNI 
> anymore.
> We should also run some more realistic benchmarks seeing if this option 
> really helps nodes that are doing concurrent indexing (merging) and searching.






[jira] [Commented] (LUCENE-8982) Make NativeUnixDirectory pure java now that direct IO is possible

2020-11-16 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232887#comment-17232887
 ] 

Dawid Weiss commented on LUCENE-8982:
-

Gradle will pick up a toolchain suitable for the platform if it finds one; the 
CI machine just doesn't have any.

bq. My biggest problem is still: How to handle releases of binary artifacts? Do 
we really want to make this dependent of the release manager's local system?

This is what I actually raised in the issue's comment too. I don't think we 
should include binaries in releases. We should strive to make it java-only (and 
if somebody really wants to, they can compile native modules locally, for a 
given platform).

> Make NativeUnixDirectory pure java now that direct IO is possible
> -
>
> Key: LUCENE-8982
> URL: https://issues.apache.org/jira/browse/LUCENE-8982
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/misc
>Reporter: Michael McCandless
>Assignee: Dawid Weiss
>Priority: Major
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> {{NativeUnixDirectory}} is a {{Directory}} implementation that uses direct IO 
> to write newly merged segments.  Direct IO bypasses the kernel's buffer cache 
> and write cache, making merge writes "invisible" to the kernel, though the 
> reads for merging the N segments are still going through the kernel.
> But today, {{NativeUnixDirectory}} uses a small JNI wrapper to access the 
> {{O_DIRECT}} flag to {{open}} ... since JDK9 we can now pass that flag in 
> pure java code, so we should now fix {{NativeUnixDirectory}} to not use JNI 
> anymore.
> We should also run some more realistic benchmarks seeing if this option 
> really helps nodes that are doing concurrent indexing (merging) and searching.






[GitHub] [lucene-solr] HoustonPutman commented on pull request #2020: SOLR-14949: Ability to customize Solr Docker build

2020-11-16 Thread GitBox


HoustonPutman commented on pull request #2020:
URL: https://github.com/apache/lucene-solr/pull/2020#issuecomment-728176570


   That's fair, we can certainly move it. I included it there because that 
seemed like the place to put help files for gradle usage. All of the files used 
in the `help.gradle` are in that directory, but I guess there's no actual 
requirement for that.






[GitHub] [lucene-solr] HoustonPutman commented on pull request #1769: SOLR-14789: Absorb the docker-solr repo.

2020-11-16 Thread GitBox


HoustonPutman commented on pull request #1769:
URL: https://github.com/apache/lucene-solr/pull/1769#issuecomment-728185520


   The extra step exists because there was no consensus around how to do 
official release images.
   
   If we want to decide that the official image should be built the same way as 
it currently is in the project (via the local build), then we can get rid of 
the sub-module and the extra step. However if we want to have the official 
image use the official release binaries, as it does in `docker-solr`, then we 
will need to keep the submodule.
   
   I would have preferred to have all of this done in one module, but the 
gradle docker plugin only supports building one image per-module. So if we want 
to build multiple images (which is necessary for supporting the two image 
types, local and release), we need two modules.
   
   I am all for not adding support for an official binary release strategy and 
consolidating into one Dockerfile. I just don't want to make that decision 
unilaterally.






[GitHub] [lucene-solr] uschindler commented on pull request #2022: LUCENE-9004: KNN vector search using NSW graphs

2020-11-16 Thread GitBox


uschindler commented on pull request #2022:
URL: https://github.com/apache/lucene-solr/pull/2022#issuecomment-728207517


   Hi,
   this broke javadocs:
   > 
/home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraph.java:43:
 error: heading used out of sequence: , compared to implicit preceding 
heading: 
* Hyperparameters
  ^
   1 error
   
   This line is wrong: 
https://github.com/apache/lucene-solr/blob/b36b4af22bb76dc42b466b818b417bcbc0deb006/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraph.java#L43
   (should be ``)






[GitHub] [lucene-solr] uschindler edited a comment on pull request #2022: LUCENE-9004: KNN vector search using NSW graphs

2020-11-16 Thread GitBox


uschindler edited a comment on pull request #2022:
URL: https://github.com/apache/lucene-solr/pull/2022#issuecomment-728207517


   Hi,
   this broke javadocs:
   ```
   > 
/home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraph.java:43:
 error: heading used out of sequence: , compared to implicit preceding 
heading: 
* Hyperparameters
  ^
   1 error
   ```
   
   This line is wrong: 
https://github.com/apache/lucene-solr/blob/b36b4af22bb76dc42b466b818b417bcbc0deb006/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraph.java#L43
   (should be ``)






[jira] [Reopened] (LUCENE-9004) Approximate nearest vector search

2020-11-16 Thread Uwe Schindler (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reopened LUCENE-9004:
---

This commit broke Javadocs:

Hi,
this broke javadocs:

{noformat}
> /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraph.java:43:
>  error: heading used out of sequence: , compared to implicit preceding 
> heading: 
 * Hyperparameters
   ^
1 error
{noformat}

This line is wrong: 
https://github.com/apache/lucene-solr/blob/b36b4af22bb76dc42b466b818b417bcbc0deb006/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraph.java#L43
 (should be )

Maybe this is not detected by Java 11, but later javac versions do complain.

> Approximate nearest vector search
> -
>
> Key: LUCENE-9004
> URL: https://issues.apache.org/jira/browse/LUCENE-9004
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Michael Sokolov
>Assignee: Michael Sokolov
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: hnsw_layered_graph.png
>
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> "Semantic" search based on machine-learned vector "embeddings" representing 
> terms, queries and documents is becoming a must-have feature for a modern 
> search engine. SOLR-12890 is exploring various approaches to this, including 
> providing vector-based scoring functions. This is a spinoff issue from that.
> The idea here is to explore approximate nearest-neighbor search. Researchers 
> have found an approach based on navigating a graph that partially encodes the 
> nearest neighbor relation at multiple scales can provide accuracy > 95% (as 
> compared to exact nearest neighbor calculations) at a reasonable cost. This 
> issue will explore implementing HNSW (hierarchical navigable small-world) 
> graphs for the purpose of approximate nearest vector search (often referred 
> to as KNN or k-nearest-neighbor search).
> At a high level the way this algorithm works is this. First assume you have a 
> graph that has a partial encoding of the nearest neighbor relation, with some 
> short and some long-distance links. If this graph is built in the right way 
> (has the hierarchical navigable small world property), then you can 
> efficiently traverse it to find nearest neighbors (approximately) in log N 
> time where N is the number of nodes in the graph. I believe this idea was 
> pioneered in  [1]. The great insight in that paper is that if you use the 
> graph search algorithm to find the K nearest neighbors of a new document 
> while indexing, and then link those neighbors (undirectedly, ie both ways) to 
> the new document, then the graph that emerges will have the desired 
> properties.
> The implementation I propose for Lucene is as follows. We need two new data 
> structures to encode the vectors and the graph. We can encode vectors using a 
> light wrapper around {{BinaryDocValues}} (we also want to encode the vector 
> dimension and have efficient conversion from bytes to floats). For the graph 
> we can use {{SortedNumericDocValues}} where the values we encode are the 
> docids of the related documents. Encoding the interdocument relations using 
> docids directly will make it relatively fast to traverse the graph since we 
> won't need to lookup through an id-field indirection. This choice limits us 
> to building a graph-per-segment since it would be impractical to maintain a 
> global graph for the whole index in the face of segment merges. However 
> graph-per-segment is a very natural at search time - we can traverse each 
> segments' graph independently and merge results as we do today for term-based 
> search.
> At index time, however, merging graphs is somewhat challenging. While 
> indexing we build a graph incrementally, performing searches to construct 
> links among neighbors. When merging segments we must construct a new graph 
> containing elements of all the merged segments. Ideally we would somehow 
> preserve the work done when building the initial graphs, but at least as a 
> start I'd propose we construct a new graph from scratch when merging. The 
> process is going to be  limited, at least initially, to graphs that can fit 
> in RAM since we require random access to the entire graph while constructing 
> it: In order to add links bidirectionally we must continually update existing 
> documents.
> I think we want to express this API to users as a single joint 
> {{KnnGraphField}} abstraction that joins together the vectors and the graph 
> as a single joint field type. Mostly it just looks like a vector-valued 
> field, but has this graph attached to it.
> I'll push a branch with my POC and would love to hear comments. It has many 
> nocommits, basic design is not really set, there is no Query implementation 

[jira] [Comment Edited] (LUCENE-9004) Approximate nearest vector search

2020-11-16 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232941#comment-17232941
 ] 

Uwe Schindler edited comment on LUCENE-9004 at 11/16/20, 5:28 PM:
--

This commit broke Javadocs:

Hi,
this broke javadocs:

{noformat}
> /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraph.java:43:
>  error: heading used out of sequence: , compared to implicit preceding 
> heading: 
 * Hyperparameters
   ^
1 error
{noformat}

This line is wrong: 
https://github.com/apache/lucene-solr/blob/b36b4af22bb76dc42b466b818b417bcbc0deb006/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraph.java#L43
 (should be )

Maybe this is not detected in Java 11, but later javac versions complain loudly 
(e.g. Java 14).


was (Author: thetaphi):
This commit broke Javadocs:

Hi,
this broke javadocs:

{noformat}
> /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraph.java:43:
>  error: heading used out of sequence: , compared to implicit preceding 
> heading: 
 * Hyperparameters
   ^
1 error
{noformat}

This line is wrong: 
https://github.com/apache/lucene-solr/blob/b36b4af22bb76dc42b466b818b417bcbc0deb006/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraph.java#L43
 (should be )

Maybe this is not detected in Java 11, but later javac version complain hardly.

> Approximate nearest vector search
> -
>
> Key: LUCENE-9004
> URL: https://issues.apache.org/jira/browse/LUCENE-9004
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Michael Sokolov
>Assignee: Michael Sokolov
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: hnsw_layered_graph.png
>
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> "Semantic" search based on machine-learned vector "embeddings" representing 
> terms, queries and documents is becoming a must-have feature for a modern 
> search engine. SOLR-12890 is exploring various approaches to this, including 
> providing vector-based scoring functions. This is a spinoff issue from that.
> The idea here is to explore approximate nearest-neighbor search. Researchers 
> have found an approach based on navigating a graph that partially encodes the 
> nearest neighbor relation at multiple scales can provide accuracy > 95% (as 
> compared to exact nearest neighbor calculations) at a reasonable cost. This 
> issue will explore implementing HNSW (hierarchical navigable small-world) 
> graphs for the purpose of approximate nearest vector search (often referred 
> to as KNN or k-nearest-neighbor search).
> At a high level the way this algorithm works is this. First assume you have a 
> graph that has a partial encoding of the nearest neighbor relation, with some 
> short and some long-distance links. If this graph is built in the right way 
> (has the hierarchical navigable small world property), then you can 
> efficiently traverse it to find nearest neighbors (approximately) in log N 
> time where N is the number of nodes in the graph. I believe this idea was 
> pioneered in  [1]. The great insight in that paper is that if you use the 
> graph search algorithm to find the K nearest neighbors of a new document 
> while indexing, and then link those neighbors (undirectedly, ie both ways) to 
> the new document, then the graph that emerges will have the desired 
> properties.
> The implementation I propose for Lucene is as follows. We need two new data 
> structures to encode the vectors and the graph. We can encode vectors using a 
> light wrapper around {{BinaryDocValues}} (we also want to encode the vector 
> dimension and have efficient conversion from bytes to floats). For the graph 
> we can use {{SortedNumericDocValues}} where the values we encode are the 
> docids of the related documents. Encoding the interdocument relations using 
> docids directly will make it relatively fast to traverse the graph since we 
> won't need a lookup through an id-field indirection. This choice limits us 
> to building a graph-per-segment since it would be impractical to maintain a 
> global graph for the whole index in the face of segment merges. However, 
> graph-per-segment is a very natural fit at search time - we can traverse each 
> segments' graph independently and merge results as we do today for term-based 
> search.
> At index time, however, merging graphs is somewhat challenging. While 
> indexing we build a graph incrementally, performing searches to construct 
> links among neighbors. When merging segments we must construct a new graph 
> containing elements of all the merged segments. Ideally we would somehow 
> preserve the work done when building the initial graphs, but at least as a 
> start I'd propose 

[jira] [Commented] (LUCENE-8982) Make NativeUnixDirectory pure java now that direct IO is possible

2020-11-16 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232957#comment-17232957
 ] 

Uwe Schindler commented on LUCENE-8982:
---

bq. gradle will pick up the toolchain suitable for the platform if it finds any 
- the CI just doesn't have it.

That's fine. My idea was: can't we build the native dependencies if the 
toolchain is there, but not fail if it isn't and just print a warning?

> Make NativeUnixDirectory pure java now that direct IO is possible
> -
>
> Key: LUCENE-8982
> URL: https://issues.apache.org/jira/browse/LUCENE-8982
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/misc
>Reporter: Michael McCandless
>Assignee: Dawid Weiss
>Priority: Major
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> {{NativeUnixDirectory}} is a {{Directory}} implementation that uses direct IO 
> to write newly merged segments.  Direct IO bypasses the kernel's buffer cache 
> and write cache, making merge writes "invisible" to the kernel, though the 
> reads for merging the N segments are still going through the kernel.
> But today, {{NativeUnixDirectory}} uses a small JNI wrapper to access the 
> {{O_DIRECT}} flag to {{open}} ... since JDK9 we can now pass that flag in 
> pure java code, so we should now fix {{NativeUnixDirectory}} to not use JNI 
> anymore.
> We should also run some more realistic benchmarks seeing if this option 
> really helps nodes that are doing concurrent indexing (merging) and searching.
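
For reference, a minimal sketch of opening a file for direct IO from pure Java, using the JDK's {{com.sun.nio.file.ExtendedOpenOption.DIRECT}} flag and a block-aligned buffer (an illustration under the stated assumptions, not the actual patch):

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

import com.sun.nio.file.ExtendedOpenOption;

class DirectIOSketch {

  // Assumes data.length is a multiple of the file store's block size; real code must pad.
  static void writeDirect(Path path, byte[] data) throws IOException {
    int blockSize = Math.toIntExact(
        Files.getFileStore(path.toAbsolutePath().getParent()).getBlockSize());
    // Direct IO requires the buffer address, file offset and length to be block aligned.
    ByteBuffer buf = ByteBuffer.allocateDirect(data.length + 2 * blockSize)
        .alignedSlice(blockSize);
    buf.put(data);
    buf.flip();
    try (FileChannel ch = FileChannel.open(path,
        StandardOpenOption.CREATE, StandardOpenOption.WRITE, ExtendedOpenOption.DIRECT)) {
      ch.write(buf);
    }
  }
}
{code}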



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8982) Make NativeUnixDirectory pure java now that direct IO is possible

2020-11-16 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232963#comment-17232963
 ] 

Uwe Schindler commented on LUCENE-8982:
---

In short: change the default to "true" if toolchain is there.

> Make NativeUnixDirectory pure java now that direct IO is possible
> -
>
> Key: LUCENE-8982
> URL: https://issues.apache.org/jira/browse/LUCENE-8982
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/misc
>Reporter: Michael McCandless
>Assignee: Dawid Weiss
>Priority: Major
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> {{NativeUnixDirectory}} is a {{Directory}} implementation that uses direct IO 
> to write newly merged segments.  Direct IO bypasses the kernel's buffer cache 
> and write cache, making merge writes "invisible" to the kernel, though the 
> reads for merging the N segments are still going through the kernel.
> But today, {{NativeUnixDirectory}} uses a small JNI wrapper to access the 
> {{O_DIRECT}} flag to {{open}} ... since JDK9 we can now pass that flag in 
> pure java code, so we should now fix {{NativeUnixDirectory}} to not use JNI 
> anymore.
> We should also run some more realistic benchmarks seeing if this option 
> really helps nodes that are doing concurrent indexing (merging) and searching.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8982) Make NativeUnixDirectory pure java now that direct IO is possible

2020-11-16 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232968#comment-17232968
 ] 

Dawid Weiss commented on LUCENE-8982:
-

You can set -Pbuild.native=true manually on those VMs where you know the tools 
are available... I don't think duplicating whatever logic gradle uses to detect 
those toolchains automatically is worth the effort, to be honest. The logic is 
probably in gradle's sources somewhere. I don't know if it can be done any more 
easily than by copying from their code.

> Make NativeUnixDirectory pure java now that direct IO is possible
> -
>
> Key: LUCENE-8982
> URL: https://issues.apache.org/jira/browse/LUCENE-8982
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/misc
>Reporter: Michael McCandless
>Assignee: Dawid Weiss
>Priority: Major
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> {{NativeUnixDirectory}} is a {{Directory}} implementation that uses direct IO 
> to write newly merged segments.  Direct IO bypasses the kernel's buffer cache 
> and write cache, making merge writes "invisible" to the kernel, though the 
> reads for merging the N segments are still going through the kernel.
> But today, {{NativeUnixDirectory}} uses a small JNI wrapper to access the 
> {{O_DIRECT}} flag to {{open}} ... since JDK9 we can now pass that flag in 
> pure java code, so we should now fix {{NativeUnixDirectory}} to not use JNI 
> anymore.
> We should also run some more realistic benchmarks seeing if this option 
> really helps nodes that are doing concurrent indexing (merging) and searching.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dsmiley opened a new pull request #2083: SOLR-15001 Docker: require init_var_solr.sh

2020-11-16 Thread GitBox


dsmiley opened a new pull request #2083:
URL: https://github.com/apache/lucene-solr/pull/2083


   https://issues.apache.org/jira/browse/SOLR-15001
   
   There are two distinct commits here.  The second is just to the 
build.gradle.  I struggled with the Docker gradle build because the 
inputs/outputs were not configured correctly, which confused Gradle's cool 
incremental build.  I also disagree with defining a verbose class file inside a 
build file when it won't be re-used by other modules -- just do it ad hoc 
instead.  Build automation is best suited to scripting languages IMO.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dsmiley commented on pull request #1769: SOLR-14789: Absorb the docker-solr repo.

2020-11-16 Thread GitBox


dsmiley commented on pull request #1769:
URL: https://github.com/apache/lucene-solr/pull/1769#issuecomment-728271138


   I'm glad you are amenable to changes, and that the complexity & Docker 
image weight I see will melt away if we only produce an image from the Solr 
assembly.  That is identical to the "official release" except for packaging -- 
plain dir vs tgz of the same.  I can appreciate there were unknowns causing you 
to add this extra baggage because it might be useful, but I prefer to follow a 
KISS/YAGNI philosophy so that we don't pay complexity debt on something not yet 
needed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] HoustonPutman commented on pull request #2083: SOLR-15001 Docker: require init_var_solr.sh

2020-11-16 Thread GitBox


HoustonPutman commented on pull request #2083:
URL: https://github.com/apache/lucene-solr/pull/2083#issuecomment-728316337


   I agree with not liking the cumbersome Test class; the reason I added it was 
to allow for easy inclusion/exclusion rules for test cases. I think this PR 
loses that functionality. You could add it back in with 
   
   ```groovy
   testsInclude = propertyOrEnvOrDefault("solr.docker.tests.include", 
"SOLR_DOCKER_TESTS_INCLUDE", "")
   testsExclude = propertyOrEnvOrDefault("solr.docker.tests.exclude", 
"SOLR_DOCKER_TESTS_EXCLUDE", "")
   ```
   
   You would just need to edit `help/docker.txt` to make sure it's up to date 
with how to use the test inputs.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] msokolov commented on pull request #2022: LUCENE-9004: KNN vector search using NSW graphs

2020-11-16 Thread GitBox


msokolov commented on pull request #2022:
URL: https://github.com/apache/lucene-solr/pull/2022#issuecomment-728317354


   Thanks Uwe, I'll push a fix soon. True enough, I am building with Java 11, 
which is not as fussy about such things. Indeed, I'm used to using arbitrary 
heading levels to achieve different presentation, but I guess that's not OK!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (SOLR-15005) RequestHandlerBase's logger name should point to the implementation class

2020-11-16 Thread David Smiley (Jira)
David Smiley created SOLR-15005:
---

 Summary: RequestHandlerBase's logger name should point to the 
implementation class
 Key: SOLR-15005
 URL: https://issues.apache.org/jira/browse/SOLR-15005
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
  Components: logging
Reporter: David Smiley


RequestHandlerBase is an abstract class that defines a private static Logger 
with a logger name of this very class.  I think it should point to the 
implementing class (getClass()).  This would require it be non-static.  It's 
used in just one spot, from a method that isn't static, so this will work.

Do we go farther and declare as protected and remove static loggers in all 
subclasses, so long as they aren't being referenced from static methods there?

See recent comments at the end of SOLR-8330
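
A minimal sketch of the proposed change, with hypothetical class names rather than the actual Solr classes:

{code:java}
import java.lang.invoke.MethodHandles;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical base/subclass pair, for illustration only.
abstract class HandlerBase {

  // Today: a static logger named after the base class, shared by all subclasses.
  private static final Logger staticLog =
      LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());

  // Proposed: a non-static logger named after the concrete implementation.
  protected final Logger log = LoggerFactory.getLogger(getClass());

  void handle(String msg) {
    staticLog.info("logged under HandlerBase: {}", msg);
    log.info("logged under {}: {}", getClass().getSimpleName(), msg);
  }
}

class SearchHandlerExample extends HandlerBase {}
{code}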



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-8330) Restrict logger visibility throughout the codebase to private so that only the file that declares it can use it

2020-11-16 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17233112#comment-17233112
 ] 

David Smiley commented on SOLR-8330:


FYI [~hossman]; you may have an opinion as well, since I see you filed SOLR-4833, 
which refers to http://slf4j.org/faq.html#declared_static. I looked at that; the 
analysis didn't consider the class hierarchy / subclassing factor.
I filed SOLR-15005 to change RequestHandlerBase's logger.  While just that one 
little change is probably not controversial, please weigh in on whether or not 
the subclasses should remove the loggers they have in lieu of the RHB one that 
will be in scope.

> Restrict logger visibility throughout the codebase to private so that only 
> the file that declares it can use it
> ---
>
> Key: SOLR-8330
> URL: https://issues.apache.org/jira/browse/SOLR-8330
> Project: Solr
>  Issue Type: Sub-task
>Affects Versions: 6.0
>Reporter: Jason Gerlowski
>Assignee: Anshum Gupta
>Priority: Major
>  Labels: logging
> Fix For: 5.4, 6.0
>
> Attachments: SOLR-8330-combined.patch, SOLR-8330-detector.patch, 
> SOLR-8330-detector.patch, SOLR-8330.patch, SOLR-8330.patch, SOLR-8330.patch, 
> SOLR-8330.patch, SOLR-8330.patch, SOLR-8330.patch, SOLR-8330.patch
>
>
> As Mike Drob pointed out in Solr-8324, many loggers in Solr are 
> unintentionally shared between classes.  Many instances of this are caused by 
> overzealous copy-paste.  This can make debugging tougher, as messages appear 
> to come from an incorrect location.
> As discussed in the comments on SOLR-8324, there also might be legitimate 
> reasons for sharing loggers between classes.  Where any ambiguity exists, 
> these instances shouldn't be touched.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dsmiley merged pull request #2079: SOLR-14998 Update log level to DEBUG for ClusterStatus in Collections…

2020-11-16 Thread GitBox


dsmiley merged pull request #2079:
URL: https://github.com/apache/lucene-solr/pull/2079


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14998) any Collections Handler actions should be logged at debug level

2020-11-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17233118#comment-17233118
 ] 

ASF subversion and git services commented on SOLR-14998:


Commit 2d583eaba7ab8eb778bebbc5557bae29ea481830 in lucene-solr's branch 
refs/heads/master from Nazerke Seidan
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=2d583ea ]

SOLR-14998: logging: info->debug in CollectionsHandler (#2079)

Because it's almost always redundant with HttpSolrCall's admin request log.
Co-authored-by: Nazerke Seidan 

> any Collections Handler actions should be logged at debug level
> ---
>
> Key: SOLR-14998
> URL: https://issues.apache.org/jira/browse/SOLR-14998
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Nazerke Seidan
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> CLUSTERSTATUS is logged in CollectionsHandler at INFO level but the cluster 
> status  is already logged in HttpSolrCall at INFO. In CollectionsHandler INFO 
> level should be set to DEBUG  to avoid a lot of noise. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14998) any Collections Handler actions should be logged at debug level

2020-11-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17233121#comment-17233121
 ] 

ASF subversion and git services commented on SOLR-14998:


Commit 4d904e523c9bc36f36403b9f3831b0563f3a1f79 in lucene-solr's branch 
refs/heads/branch_8x from Nazerke Seidan
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=4d904e5 ]

SOLR-14998: logging: info->debug in CollectionsHandler (#2079)

Because it's almost always redundant with HttpSolrCall's admin request log.
Co-authored-by: Nazerke Seidan 

(cherry picked from commit 2d583eaba7ab8eb778bebbc5557bae29ea481830)


> any Collections Handler actions should be logged at debug level
> ---
>
> Key: SOLR-14998
> URL: https://issues.apache.org/jira/browse/SOLR-14998
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Nazerke Seidan
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> CLUSTERSTATUS is logged in CollectionsHandler at INFO level but the cluster 
> status  is already logged in HttpSolrCall at INFO. In CollectionsHandler INFO 
> level should be set to DEBUG  to avoid a lot of noise. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (SOLR-14998) any Collections Handler actions should be logged at debug level

2020-11-16 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved SOLR-14998.
-
Fix Version/s: 8.8
   Resolution: Fixed

Merged.  Thanks Nazerke!

> any Collections Handler actions should be logged at debug level
> ---
>
> Key: SOLR-14998
> URL: https://issues.apache.org/jira/browse/SOLR-14998
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Nazerke Seidan
>Priority: Minor
> Fix For: 8.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> CLUSTERSTATUS is logged in CollectionsHandler at INFO level but the cluster 
> status  is already logged in HttpSolrCall at INFO. In CollectionsHandler INFO 
> level should be set to DEBUG  to avoid a lot of noise. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dsmiley commented on pull request #2083: SOLR-15001 Docker: require init_var_solr.sh

2020-11-16 Thread GitBox


dsmiley commented on pull request #2083:
URL: https://github.com/apache/lucene-solr/pull/2083#issuecomment-728356772


   Ah; thanks for pointing out the purpose of that configuration; now I see.  
I'll remove these changes from this PR and open a separate PR dedicated to 
overhauling/simplifying the Docker build.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15005) RequestHandlerBase's logger name should point to the implementation class

2020-11-16 Thread Chris M. Hostetter (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17233133#comment-17233133
 ] 

Chris M. Hostetter commented on SOLR-15005:
---

I'm not sure I really follow the rationale/suggestion being made here ... even 
after reading the recent comment in SOLR-8330.

In my personal opinion: if I'm reading a log message, I would much rather know 
the *code* that's logging the message than what particular subclass called the 
method that ran that code ... it does not make sense to me that the same block 
of code in RequestHandlerBase might use one Logger when subclassed by 
SearchHandler, and a different Logger when subclassed by UpdateRequestHandler.

In general, the suggestion that {{Foo extends Bar}} means that any code path 
through _an instance of Foo_ should use Foo's logger -- even inside a method 
implemented in Bar that Foo inherits -- seems just as weird to me as suggesting 
that if my Foo instance calls out to some static method in YakUtils, the 
YakUtils method should (somehow) also use my Foo logger.  For that matter: what 
logger should be used if an _instance_ of Foo calls a static method in Bar?  
What if a _static_ method in Foo calls a static method in Bar? ... all of these 
permutations make me very, very scared of how confusing it would be if 
_sometimes_ code in Bar used its own logger, but other times it used some other 
caller-specific logger.

Going back to the specific context of this jira: if you care _which_ handler is 
logging that message, then changing the Logger used based on the class doesn't 
really help you anyway -- there can/will be many instances of SearchHandler -- 
this is what MDC values are for, and we could (should?) certainly put the 
"name" of the handler (ie: {{/update}} vs {{/select}} vs {{/query}}) in the MDC 
context for logging if folks find that useful.  Although I would suggest that 
at a certain point, instead of putting tons of info in the MDC, it makes sense 
to keep the MDC small and mainly focus on having a UUID logged that can be used 
to correlate different log entries (I think not too long back Jason added a 
UUID that was included for distributed request tracing, but IDK if it's part of 
the MDC as well)
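
A minimal sketch of the MDC approach described above, assuming SLF4J's MDC and hypothetical names:

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

// Put the handler "name" (e.g. /select or /update) in the MDC for the duration of the
// request, so every log line it produces can carry it via a %X{handler} pattern.
class MdcHandlerNameSketch {

  private static final Logger log = LoggerFactory.getLogger(MdcHandlerNameSketch.class);

  void handleRequest(String handlerName, Runnable work) {
    MDC.put("handler", handlerName);
    try {
      log.info("handling request");
      work.run();
    } finally {
      MDC.remove("handler");
    }
  }
}
{code}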



> RequestHandlerBase's logger name should point to the implementation class
> -
>
> Key: SOLR-15005
> URL: https://issues.apache.org/jira/browse/SOLR-15005
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: logging
>Reporter: David Smiley
>Priority: Minor
>
> RequestHandlerBase is an abstract class that defines a private static Logger 
> with a logger name of this very class.  I think it should point to the 
> implementing class (getClass()).  This would require it be non-static.  It's 
> used in just one spot, from a method that isn't static, so this will work.
> Do we go farther and declare as protected and remove static loggers in all 
> subclasses, so long as they aren't being referenced from static methods there?
> See recent comments at the end of SOLR-8330



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (SOLR-15005) RequestHandlerBase's logger name should point to the implementation class

2020-11-16 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved SOLR-15005.
-
Resolution: Won't Fix

Fair enough Hoss... I enjoy reading your wisdom.
Indeed you make a good point about MDC; that should address the desire to know 
which handler is logging.

> RequestHandlerBase's logger name should point to the implementation class
> -
>
> Key: SOLR-15005
> URL: https://issues.apache.org/jira/browse/SOLR-15005
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: logging
>Reporter: David Smiley
>Priority: Minor
>
> RequestHandlerBase is an abstract class that defines a private static Logger 
> with a logger name of this very class.  I think it should point to the 
> implementing class (getClass()).  This would require it be non-static.  It's 
> used in just one spot, from a method that isn't static, so this will work.
> Do we go farther and declare as protected and remove static loggers in all 
> subclasses, so long as they aren't being referenced from static methods there?
> See recent comments at the end of SOLR-8330



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (SOLR-15006) replace the coreNodeName variables to replicaName

2020-11-16 Thread Noble Paul (Jira)
Noble Paul created SOLR-15006:
-

 Summary: replace the coreNodeName variables to replicaName
 Key: SOLR-15006
 URL: https://issues.apache.org/jira/browse/SOLR-15006
 Project: Solr
  Issue Type: Task
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Noble Paul
Assignee: Noble Paul


{{coreNodeName}} makes no sense; it's just the replica name.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-15006) replace the coreNodeName variables to replicaName

2020-11-16 Thread Noble Paul (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-15006:
--
Description: 
{{coreNodeName}} makes no sense; it's just the replica name.

This is a backward-compatible change. It won't change any attributes that are 
a part of ZK messages/commands, etc.

  was:{{coreNodeName}} makes no sense. it's just the replica name


> replace the coreNodeName variables to replicaName
> -
>
> Key: SOLR-15006
> URL: https://issues.apache.org/jira/browse/SOLR-15006
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Major
>
> {{coreNodeName}} makes no sense; it's just the replica name.
> This is a backward-compatible change. It won't change any attributes that are 
> a part of ZK messages/commands, etc.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jtibshirani commented on a change in pull request #2047: LUCENE-9592: Use doubles in VectorUtil to maintain precision.

2020-11-16 Thread GitBox


jtibshirani commented on a change in pull request #2047:
URL: https://github.com/apache/lucene-solr/pull/2047#discussion_r524826537



##
File path: lucene/core/src/java/org/apache/lucene/util/VectorUtil.java
##
@@ -25,47 +25,22 @@
   private VectorUtil() {
   }
 
-  public static float dotProduct(float[] a, float[] b) {
-float res = 0f;
-/*
- * If length of vector is larger than 8, we use unrolled dot product to 
accelerate the
- * calculation.
- */
-int i;
-for (i = 0; i < a.length % 8; i++) {
-  res += b[i] * a[i];
-}
-if (a.length < 8) {
-  return res;
-}
-float s0 = 0f;
-float s1 = 0f;
-float s2 = 0f;
-float s3 = 0f;
-float s4 = 0f;
-float s5 = 0f;
-float s6 = 0f;
-float s7 = 0f;
-for (; i + 7 < a.length; i += 8) {
-  s0 += b[i] * a[i];
-  s1 += b[i + 1] * a[i + 1];
-  s2 += b[i + 2] * a[i + 2];
-  s3 += b[i + 3] * a[i + 3];
-  s4 += b[i + 4] * a[i + 4];
-  s5 += b[i + 5] * a[i + 5];
-  s6 += b[i + 6] * a[i + 6];
-  s7 += b[i + 7] * a[i + 7];
+  public static double dotProduct(float[] a, float[] b) {

Review comment:
   Simply changing the test to use a larger epsilon sounds good to me. 
After thinking about this more, I'm not sure we want to optimize for the 
precision of these individual calculations. Many high-dimensional vectors are 
already an approximation to an original object, like a piece of text. And I've 
heard of practitioners choosing less precise representations (like bfloat16) 
for each vector element to save space, and still achieving acceptable results.
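   
   A minimal sketch of the accumulate-in-double idea under discussion (an illustration only, not the actual patch):
   
   ```java
   // Accumulate the float dot product in a double so rounding error does not build up in float.
   class DotProductSketch {
     static double dotProduct(float[] a, float[] b) {
       if (a.length != b.length) {
         throw new IllegalArgumentException("vector dimensions differ");
       }
       double res = 0;
       for (int i = 0; i < a.length; i++) {
         res += (double) a[i] * b[i];  // each product and the running sum stay in double precision
       }
       return res;
     }
   }
   ```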





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8982) Make NativeUnixDirectory pure java now that direct IO is possible

2020-11-16 Thread Zach Chen (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17233183#comment-17233183
 ] 

Zach Chen commented on LUCENE-8982:
---

Sorry for the late response, just got off work and see this. 

From the discussion it seems the assumption / reality here is that cpp 
toolchain may or may not be available in the VMs. However, since Lucene does 
have native code and scheduled build can discover any change that breaks the 
native-java integration early on (there was actually one commit before this 
that broke it), should the build in general assume cpp toolchain to be there 
in the VMs (and add them if they are missing), but still have 
-Pbuild.native=false as default to not break builds for others and have a few 
VMs with cpp toolchain intentionally left out to test for compatibility?

> Make NativeUnixDirectory pure java now that direct IO is possible
> -
>
> Key: LUCENE-8982
> URL: https://issues.apache.org/jira/browse/LUCENE-8982
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/misc
>Reporter: Michael McCandless
>Assignee: Dawid Weiss
>Priority: Major
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> {{NativeUnixDirectory}} is a {{Directory}} implementation that uses direct IO 
> to write newly merged segments.  Direct IO bypasses the kernel's buffer cache 
> and write cache, making merge writes "invisible" to the kernel, though the 
> reads for merging the N segments are still going through the kernel.
> But today, {{NativeUnixDirectory}} uses a small JNI wrapper to access the 
> {{O_DIRECT}} flag to {{open}} ... since JDK9 we can now pass that flag in 
> pure java code, so we should now fix {{NativeUnixDirectory}} to not use JNI 
> anymore.
> We should also run some more realistic benchmarks seeing if this option 
> really helps nodes that are doing concurrent indexing (merging) and searching.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-8982) Make NativeUnixDirectory pure java now that direct IO is possible

2020-11-16 Thread Zach Chen (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17233183#comment-17233183
 ] 

Zach Chen edited comment on LUCENE-8982 at 11/17/20, 1:50 AM:
--

Sorry for the late response, just got off work and see this. 

From the discussion it seems the assumption / reality here is that cpp 
toolchain may or may not be available in the VMs. However, since Lucene does 
have native code and scheduled build can discover any change that breaks the 
native-java integration early on (there was actually one commit before this 
that broke it), should the build in general assume cpp toolchain to be there 
in the VMs (and add them if they are missing) to execute the compilation and 
tests, but still have -Pbuild.native=false as default to not break builds for 
others and have a few VMs with cpp toolchain intentionally left out to test 
for compatibility?


was (Author: zacharymorn):
Sorry for the late response, just got off work and see this. 

>From the discussion it seems the assumption / reality here is that cpp 
>toolchain may or may not be available in the VMs. However, since Lucene does 
>have native code and scheduled build can discover any change that breaks the 
>native-java integration early on (there was actually one commit before this 
>that broke it), should the build in general assume cpp toolchain to be there 
>in the VMs (and add them if they are missing), but still have 
>-Pbuild.native=false as default to not break builds for others and have a few 
>VMs with cpp toolchain intentionally left out to test for compatibility?

> Make NativeUnixDirectory pure java now that direct IO is possible
> -
>
> Key: LUCENE-8982
> URL: https://issues.apache.org/jira/browse/LUCENE-8982
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/misc
>Reporter: Michael McCandless
>Assignee: Dawid Weiss
>Priority: Major
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> {{NativeUnixDirectory}} is a {{Directory}} implementation that uses direct IO 
> to write newly merged segments.  Direct IO bypasses the kernel's buffer cache 
> and write cache, making merge writes "invisible" to the kernel, though the 
> reads for merging the N segments are still going through the kernel.
> But today, {{NativeUnixDirectory}} uses a small JNI wrapper to access the 
> {{O_DIRECT}} flag to {{open}} ... since JDK9 we can now pass that flag in 
> pure java code, so we should now fix {{NativeUnixDirectory}} to not use JNI 
> anymore.
> We should also run some more realistic benchmarks seeing if this option 
> really helps nodes that are doing concurrent indexing (merging) and searching.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jtibshirani opened a new pull request #2084: LUCENE-9592: Loosen equality checks in TestVectorUtil.

2020-11-16 Thread GitBox


jtibshirani opened a new pull request #2084:
URL: https://github.com/apache/lucene-solr/pull/2084


   TestVectorUtil occasionally fails because of floating point errors. This
   change slightly increases the epsilon in equality checks -- testing shows 
that
   this will greatly decrease the chance of failure.
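   
   A hypothetical sketch of the kind of check being loosened (the epsilon value is illustrative only, not the one in the patch):
   
   ```java
   // An absolute tolerance large enough to absorb accumulated float rounding error.
   class FloatAssertSketch {
     static final double EPSILON = 1e-4;  // illustrative value only
   
     static void assertClose(double expected, double actual) {
       if (Math.abs(expected - actual) > EPSILON) {
         throw new AssertionError("expected " + expected + " but got " + actual);
       }
     }
   }
   ```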



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jtibshirani commented on a change in pull request #2047: LUCENE-9592: Use doubles in VectorUtil to maintain precision.

2020-11-16 Thread GitBox


jtibshirani commented on a change in pull request #2047:
URL: https://github.com/apache/lucene-solr/pull/2047#discussion_r524835397



##
File path: lucene/core/src/java/org/apache/lucene/util/VectorUtil.java
##
@@ -25,47 +25,22 @@
   private VectorUtil() {
   }
 
-  public static float dotProduct(float[] a, float[] b) {
-float res = 0f;
-/*
- * If length of vector is larger than 8, we use unrolled dot product to 
accelerate the
- * calculation.
- */
-int i;
-for (i = 0; i < a.length % 8; i++) {
-  res += b[i] * a[i];
-}
-if (a.length < 8) {
-  return res;
-}
-float s0 = 0f;
-float s1 = 0f;
-float s2 = 0f;
-float s3 = 0f;
-float s4 = 0f;
-float s5 = 0f;
-float s6 = 0f;
-float s7 = 0f;
-for (; i + 7 < a.length; i += 8) {
-  s0 += b[i] * a[i];
-  s1 += b[i + 1] * a[i + 1];
-  s2 += b[i + 2] * a[i + 2];
-  s3 += b[i + 3] * a[i + 3];
-  s4 += b[i + 4] * a[i + 4];
-  s5 += b[i + 5] * a[i + 5];
-  s6 += b[i + 6] * a[i + 6];
-  s7 += b[i + 7] * a[i + 7];
+  public static double dotProduct(float[] a, float[] b) {

Review comment:
   I opened https://github.com/apache/lucene-solr/pull/2047; if it looks 
okay I can close out this PR.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-8982) Make NativeUnixDirectory pure java now that direct IO is possible

2020-11-16 Thread Zach Chen (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17233183#comment-17233183
 ] 

Zach Chen edited comment on LUCENE-8982 at 11/17/20, 2:25 AM:
--

Sorry for the late response, just got off work and saw this. 

From the discussion it seems the assumption / reality here is that cpp 
toolchain may or may not be available in the VMs. However, since Lucene does 
have native code and scheduled build can discover any change that breaks the 
native-java integration early on (there was actually one commit before this 
that broke it), should the build in general requires cpp toolchain to be there 
in the VMs (and add them if they are missing) to execute the compilation and 
tests, but still have -Pbuild.native=false as default to not break builds for 
others and have a few VMs with cpp toolchain intentionally left out to test 
for compatibility?


was (Author: zacharymorn):
Sorry for the late response, just got off work and saw this. 

>From the discussion it seems the assumption / reality here is that cpp 
>toolchain may or may not be available in the VMs. However, since Lucene does 
>have native code and scheduled build can discover any change that breaks the 
>native-java integration early on (there was actually one commit before this 
>that broke it), should the build in general require cpp toolchain to be there 
>in the VMs (and add them if they are missing) to execute the compilation and 
>tests, but still have -Pbuild.native=false as default to not break builds for 
>others and have a few VMs with cpp toolchain intentionally left out to test 
>for compatibility?

> Make NativeUnixDirectory pure java now that direct IO is possible
> -
>
> Key: LUCENE-8982
> URL: https://issues.apache.org/jira/browse/LUCENE-8982
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/misc
>Reporter: Michael McCandless
>Assignee: Dawid Weiss
>Priority: Major
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> {{NativeUnixDirectory}} is a {{Directory}} implementation that uses direct IO 
> to write newly merged segments.  Direct IO bypasses the kernel's buffer cache 
> and write cache, making merge writes "invisible" to the kernel, though the 
> reads for merging the N segments are still going through the kernel.
> But today, {{NativeUnixDirectory}} uses a small JNI wrapper to access the 
> {{O_DIRECT}} flag to {{open}} ... since JDK9 we can now pass that flag in 
> pure java code, so we should now fix {{NativeUnixDirectory}} to not use JNI 
> anymore.
> We should also run some more realistic benchmarks seeing if this option 
> really helps nodes that are doing concurrent indexing (merging) and searching.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-8982) Make NativeUnixDirectory pure java now that direct IO is possible

2020-11-16 Thread Zach Chen (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17233183#comment-17233183
 ] 

Zach Chen edited comment on LUCENE-8982 at 11/17/20, 2:25 AM:
--

Sorry for the late response, just got off work and saw this. 

From the discussion it seems the assumption / reality here is that cpp 
toolchain may or may not be available in the VMs. However, since Lucene does 
have native code and scheduled build can discover any change that breaks the 
native-java integration early on (there was actually one commit before this 
that broke it), should the build in general require cpp toolchain to be there 
in the VMs (and add them if they are missing) to execute the compilation and 
tests, but still have -Pbuild.native=false as default to not break builds for 
others and have a few VMs with cpp toolchain intentionally left out to test 
for compatibility?


was (Author: zacharymorn):
Sorry for the late response, just got off work and see this. 

>From the discussion it seems the assumption / reality here is that cpp 
>toolchain may or may not be available in the VMs. However, since Lucene does 
>have native code and scheduled build can discover any change that breaks the 
>native-java integration early on (there was actually one commit before this 
>that broke it), should the build in general assume cpp toolchain to be there 
>in the VMs (and add them if they are missing) to execute the compilation and 
>tests, but still have -Pbuild.native=false as default to not break builds for 
>others and have a few VMs with cpp toolchain intentionally left out to test 
>for compatibility?

> Make NativeUnixDirectory pure java now that direct IO is possible
> -
>
> Key: LUCENE-8982
> URL: https://issues.apache.org/jira/browse/LUCENE-8982
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/misc
>Reporter: Michael McCandless
>Assignee: Dawid Weiss
>Priority: Major
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> {{NativeUnixDirectory}} is a {{Directory}} implementation that uses direct IO 
> to write newly merged segments.  Direct IO bypasses the kernel's buffer cache 
> and write cache, making merge writes "invisible" to the kernel, though the 
> reads for merging the N segments are still going through the kernel.
> But today, {{NativeUnixDirectory}} uses a small JNI wrapper to access the 
> {{O_DIRECT}} flag to {{open}} ... since JDK9 we can now pass that flag in 
> pure java code, so we should now fix {{NativeUnixDirectory}} to not use JNI 
> anymore.
> We should also run some more realistic benchmarks seeing if this option 
> really helps nodes that are doing concurrent indexing (merging) and searching.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-8982) Make NativeUnixDirectory pure java now that direct IO is possible

2020-11-16 Thread Zach Chen (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17233183#comment-17233183
 ] 

Zach Chen edited comment on LUCENE-8982 at 11/17/20, 2:32 AM:
--

Sorry for the late response, just got off work and saw this. 

From the discussion it seems the assumption / reality here is that cpp 
toolchain may or may not be available in the VMs. However, since Lucene does 
have native code and scheduled build can discover any change that breaks the 
native-java integration early on (there was actually one commit before this 
that broke it), should the CI builds in general require cpp toolchain to be 
there in the VMs (and add them if they are missing) to execute the compilation 
and tests, but in gradle still have -Pbuild.native=false as default to not 
break builds for others in development, and have a few CI VMs with cpp 
toolchain intentionally left out to test for compatibility?


was (Author: zacharymorn):
Sorry for the late response, just got off work and saw this. 

>From the discussion it seems the assumption / reality here is that cpp 
>toolchain may or may not be available in the VMs. However, since Lucene does 
>have native code and scheduled build can discover any change that breaks the 
>native-java integration early on (there was actually one commit before this 
>that broke it), should the build in general requires cpp toolchain to be there 
>in the VMs (and add them if they are missing) to execute the compilation and 
>tests, but still have -Pbuild.native=false as default to not break builds for 
>others and have a few VMs with cpp toolchain intentionally left out to test 
>for compatibility?

> Make NativeUnixDirectory pure java now that direct IO is possible
> -
>
> Key: LUCENE-8982
> URL: https://issues.apache.org/jira/browse/LUCENE-8982
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/misc
>Reporter: Michael McCandless
>Assignee: Dawid Weiss
>Priority: Major
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> {{NativeUnixDirectory}} is a {{Directory}} implementation that uses direct IO 
> to write newly merged segments.  Direct IO bypasses the kernel's buffer cache 
> and write cache, making merge writes "invisible" to the kernel, though the 
> reads for merging the N segments are still going through the kernel.
> But today, {{NativeUnixDirectory}} uses a small JNI wrapper to access the 
> {{O_DIRECT}} flag to {{open}} ... since JDK9 we can now pass that flag in 
> pure java code, so we should now fix {{NativeUnixDirectory}} to not use JNI 
> anymore.
> We should also run some more realistic benchmarks seeing if this option 
> really helps nodes that are doing concurrent indexing (merging) and searching.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (SOLR-15000) Solr based enterprise level, one-stop search center products with high performance, high reliability and high scalability

2020-11-16 Thread bai sui (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bai sui resolved SOLR-15000.

Resolution: Done

> Solr based enterprise level, one-stop search center products with high 
> performance, high reliability and high scalability
> -
>
> Key: SOLR-15000
> URL: https://issues.apache.org/jira/browse/SOLR-15000
> Project: Solr
>  Issue Type: Wish
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Reporter: bai sui
>Priority: Minor
> Attachments: add-collection-step-2-expert.png, 
> add-collection-step-2.png
>
>
> h2. Summary
> I have developed an enterprise application based on Solr, named TIS. With TIS 
> you can quickly build an enterprise search service. TIS includes three 
> components:
>  - offline index building platform
>  Data is exported from a relational database (MySQL, SQL Server and so on) by 
> full table scan, and the wide table is then built either by a local MR tool 
> or directly by Spark
>  - incremental real-time channel
>  Changes are sent to Kafka, real-time stream computation is carried out by 
> Flink, and the results are submitted to the search engine, so that the data 
> in the search engine and the database stay consistent in near real time
>  - search engine
>  currently based on Solr 8
> TIS integrates these components seamlessly and gives users a one-stop, 
> out-of-the-box experience.
> h2. My question
> I want to feed my code back to the community, but TIS focuses on enterprise 
> application search, just as Elasticsearch focuses on visual analysis of time 
> series data. Because Solr is a general search product, *I don't think TIS can 
> be merged directly into Solr. Is it possible for TIS to become a new 
> incubation project under Apache?*
> h2. TIS main Features
>  - The schema and solrconfig storage are separated from ZK and stored in 
> MySQL. Version management is provided, and users can roll back to a 
> historical version of the configuration.
>   !add-collection-step-2-expert.png|width=500!
>   !add-collection-step-2.png|width=500!
>  Schema editing can be switched between a visual editing mode and an 
> advanced expert mode
>  - Define wide table rules based on the selected data tables
>  - An offline index building component is provided: data is built into Lucene 
> segment files outside the collection, the segment files are then shipped back 
> to the local disk where the SolrCore is located, and the new index takes 
> effect after the SolrCore is reloaded



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dweiss commented on a change in pull request #2052: LUCENE-8982: Make NativeUnixDirectory pure java with FileChannel direct IO flag, and rename to DirectIODirectory

2020-11-16 Thread GitBox


dweiss commented on a change in pull request #2052:
URL: https://github.com/apache/lucene-solr/pull/2052#discussion_r524942578



##
File path: 
lucene/misc/src/java/org/apache/lucene/misc/store/DirectIODirectory.java
##
@@ -74,12 +65,12 @@
  *
  * @lucene.experimental
  */
-public class NativeUnixDirectory extends FSDirectory {

Review comment:
   I think we should make a copy of NativeUnixDirectory, modify the copy to use 
direct IO, and then perhaps benchmark how the two perform? If we replace it in 
place we won't be able to do that (unless you compile from different git 
commits). Then any removal of native code, should it follow, would be a cleaner 
patch as well.
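   
   A rough sketch of the kind of write micro-benchmark this suggests, assuming Lucene's Directory/IndexOutput API (the direct IO copy would be timed the same way once it exists; its exact constructor may differ):
   
   ```java
   import java.nio.file.Path;
   import java.nio.file.Paths;
   
   import org.apache.lucene.store.Directory;
   import org.apache.lucene.store.FSDirectory;
   import org.apache.lucene.store.IOContext;
   import org.apache.lucene.store.IndexOutput;
   
   public class WriteThroughputSketch {
   
     // Write `mb` megabytes through the given Directory and return elapsed nanoseconds.
     static long timeWrite(Directory dir, int mb) throws Exception {
       byte[] block = new byte[1 << 20];
       long start = System.nanoTime();
       try (IndexOutput out = dir.createOutput("bench.bin", IOContext.DEFAULT)) {
         for (int i = 0; i < mb; i++) {
           out.writeBytes(block, 0, block.length);
         }
       }
       long elapsed = System.nanoTime() - start;
       dir.deleteFile("bench.bin");
       return elapsed;
     }
   
     public static void main(String[] args) throws Exception {
       Path path = Paths.get(args[0]);
       try (Directory fs = FSDirectory.open(path)) {
         System.out.println("FSDirectory: " + timeWrite(fs, 1024) / 1_000_000 + " ms");
       }
       // Repeat with the direct IO Directory copy for the comparison described above.
     }
   }
   ```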





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org