date:20210623

[jira] [Created] (LUCENE-10014) docvalue writeBlock gcd encode improve

2021-06-23 Thread weizijun (Jira)

weizijun created LUCENE-10014:
-

 Summary: docvalue writeBlock gcd encode improve
 Key: LUCENE-10014
 URL: https://issues.apache.org/jira/browse/LUCENE-10014
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/codecs
Reporter: weizijun


Lucene90DocValuesConsumer.writeBlock calculates bitsPerValue as:
{code:java}
final int bitsPerValue = DirectWriter.unsignedBitsRequired(max - min);
{code}
It could take the gcd into account here instead:
{code:java}
final int bitsPerValue = DirectWriter.unsignedBitsRequired((max - min) / gcd);
{code}
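
To illustrate why dividing by the gcd can save bits, here is a minimal standalone sketch. It is not the attached patch and does not use Lucene: `GcdBitsSketch`, `bitsRequired` and `blockGcd` are hypothetical names, `bitsRequired` is only a rough stand-in for `DirectWriter.unsignedBitsRequired` (the real method may round up to a supported width), and the gcd is computed over the block's deltas here purely for demonstration. It assumes the deltas are non-negative and do not overflow.

```java
final class GcdBitsSketch {

  // Euclid's algorithm; gcd(0, x) == x, so 0 works as the starting value.
  static long gcd(long a, long b) {
    while (b != 0) {
      long t = a % b;
      a = b;
      b = t;
    }
    return a;
  }

  // Hypothetical stand-in for DirectWriter.unsignedBitsRequired:
  // the minimum number of bits that can hold maxValue (at least 1).
  static int bitsRequired(long maxValue) {
    return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
  }

  // GCD of all deltas from the block minimum (0 if all values are equal).
  static long blockGcd(long[] values, long min) {
    long g = 0;
    for (long v : values) {
      g = gcd(g, v - min);
    }
    return g;
  }

  public static void main(String[] args) {
    long[] values = {1000, 3000, 7000, 9000};
    long min = 1000;
    long max = 9000;
    long g = blockGcd(values, min); // deltas 0, 2000, 6000, 8000

    // Sizing by the raw range ignores that every delta is a multiple of g.
    System.out.println("without gcd: " + bitsRequired(max - min) + " bits");
    // Sizing by the range divided by g matches the values actually stored.
    System.out.println("with gcd: " + bitsRequired((max - min) / g) + " bits");
  }
}
```

With these sample values the range 8000 needs 13 bits, while 8000 / 2000 = 4 needs only 3. A block where all values are equal yields a gcd of 0 and would have to be handled separately before dividing.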



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Updated] (LUCENE-10014) docvalue writeBlock gcd encode improve

2021-06-23 Thread weizijun (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weizijun updated LUCENE-10014:
--
Status: Patch Available  (was: Open)

> docvalue writeBlock gcd encode improve
> --
>
> Key: LUCENE-10014
> URL: https://issues.apache.org/jira/browse/LUCENE-10014
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Reporter: weizijun
>Priority: Major
>
> Lucene90DocValuesConsumer.writeBlock calculate bitsPerValue  as:
> {code:java}
> final int bitsPerValue = DirectWriter.unsignedBitsRequired(max - min);
> {code}
>  it can use gcd in this place as:
> {code:java}
> (max - min) / gcd
> {code}




[jira] [Updated] (LUCENE-10014) docvalue writeBlock gcd encode improve

2021-06-23 Thread weizijun (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weizijun updated LUCENE-10014:
--
Attachment: LUCENE-10014.patch


[jira] [Updated] (LUCENE-10014) docvalue writeBlock gcd encode improve

2021-06-23 Thread weizijun (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weizijun updated LUCENE-10014:
--
Status: Patch Available  (was: Open)


[jira] [Updated] (LUCENE-10014) docvalue writeBlock gcd encode improve

2021-06-23 Thread weizijun (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weizijun updated LUCENE-10014:
--
Status: Open  (was: Patch Available)


[GitHub] [lucene] mikemccand merged pull request #193: For stability of DisjunctionIntervalsSource.toString(), sort subSources

2021-06-23 Thread GitBox



mikemccand merged pull request #193:
URL: https://github.com/apache/lucene/pull/193


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] mikemccand commented on pull request #177: Initial rewrite of MMapDirectory for JDK-17 preview (incubating) Panama APIs (>= JDK-17-ea-b25)

2021-06-23 Thread GitBox



mikemccand commented on pull request #177:
URL: https://github.com/apache/lucene/pull/177#issuecomment-866784477


   > > > The problem with luceneutil is also that it respawns a JVM multiple
   > > > times.
   > >
   > > Hmm, we added multiple JVMs long ago precisely because HotSpot was so
   > > unpredictable. I.e. we had clear examples where HotSpot would paint
   > > itself into a corner, compiling e.g. `readVInt` poorly and never
   > > re-compiling it, or something, such that no matter how long the
   > > benchmark ran, it would never reach as good performance as if you
   > > simply restarted the whole JVM and rolled the dice again. But maybe
   > > this situation has been improved and these were somehow early HotSpot
   > > bugs/issues and we could really remove multiple JVMs without harming
   > > how accurately we can extract the mean/variance performance of all our
   > > benchmark tasks?
   >
   > This is also not reality: Would you restart your Elasticsearch server
   > from time to time because you think there might be a broken `readVInt()`
   > optimization?

   Yeah, that is true!  But perhaps it shouldn't be the case :)  Maybe
   Elasticsearch/OpenSearch/Solr should spawn a JVM a few times until they
   get a "good" `readVInt` compilation!  The noisy mis-compilation had such a
   sizable impact (back then, hopefully not anymore?).

   If we only ran benchmarks in nightly runs so that we could see that
   noise/variance with time, maybe we could do just one JVM.

   But when a developer is trying to test an exciting optimization, in the
   privacy of their `git clone`, it really sucks to have HotSpot noise
   completely drown out any small gains your optimization might show!

   Benchmarking is hard :)

   > Here are the Berlin Buzzwords slides about this:
   > https://2021.berlinbuzzwords.de/sites/berlinbuzzwords.de/files/2021-06/The%20future%20of%20Lucene%27s%20MMapDirectory.pdf

   Oooh, thanks for sharing!  The talk looks AWESOME!  I will watch the
   recording when it's out :)  You should share these slides on Twitter too?



[GitHub] [lucene] mikemccand commented on pull request #177: Initial rewrite of MMapDirectory for JDK-17 preview (incubating) Panama APIs (>= JDK-17-ea-b25)

2021-06-23 Thread GitBox

mikemccand commented on pull request #177:
URL: https://github.com/apache/lucene/pull/177#issuecomment-866787723

> the JMH will do this too. I forget the defaults, but uses multiple jvm
> iterations and iterations within each jvm and warmup iterations. But it has
> smarts around the JIT compiler and can dump profiled assembly for its
> microbenchmarks. I never have noise issues with it.

Excellent!

> The current big "integration test" (lucene util) is useful for some
> things: e.g. something has to tell us there is pollution from too many java
> abstractions going megamorphic and so on :)

It really is more of an integration test, yeah. It runs many different
kinds of queries/tasks, concurrently across multiple threads, trying to
exercise Lucene roughly in a way that OpenSearch/Elasticsearch/Solr might.
Though, it does not do concurrent indexing with searching in a single JVM, at
least not with the default benchmarks. Really, distributed search engines
should not do that -- they should rather use [Lucene's near-real-time segment
replication](https://blog.mikemccandless.com/2017/09/lucenes-near-real-time-segment-index.html#:~:text=Lucene's%20near%2Dreal%2Dtime%20segment%20index%20replication,-%5BTL%3BDR%3A&text=Lucene%20has%20a%20unique%20write,files%20will%20never%20again%20change.),
which is more efficient if you have deep replicas, and also enables strong
physical isolation of indexing and searching JVMs which have very different
resource requirements! OK ``!

> But I think it would be improved by providing some more diagnostics
> (LogCompilation or whatever, maybe JIT stats in the JFR output). Let it be a
> "canary" to find little ways to improve.

+1. I wonder if we could tap into those in real-time and get a sense of
when the JVM really is roughly "warmed up", instead of the static "discard
first N samples for each task" that we do now. Or maybe to detect
mis-compilation of `readVInt`!
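
For readers unfamiliar with the static approach mentioned above, here is a minimal sketch of "discard the first N samples, then summarize the rest". It is not luceneutil's actual code; `WarmupDiscardSketch`, its methods and the sample timings are all hypothetical and only illustrate the idea.

```java
import java.util.Arrays;

final class WarmupDiscardSketch {

  // Drop the first `warmup` samples, which are assumed to be JIT warm-up noise.
  static double[] discardWarmup(double[] samples, int warmup) {
    int from = Math.min(Math.max(warmup, 0), samples.length);
    return Arrays.copyOfRange(samples, from, samples.length);
  }

  static double mean(double[] xs) {
    double sum = 0;
    for (double x : xs) {
      sum += x;
    }
    return xs.length == 0 ? 0 : sum / xs.length;
  }

  // Sample variance (divides by n - 1); 0 when fewer than two samples remain.
  static double variance(double[] xs) {
    if (xs.length < 2) {
      return 0;
    }
    double m = mean(xs);
    double sumSq = 0;
    for (double x : xs) {
      sumSq += (x - m) * (x - m);
    }
    return sumSq / (xs.length - 1);
  }

  public static void main(String[] args) {
    // Made-up per-iteration timings in milliseconds for one task.
    double[] timings = {90, 40, 12, 10, 11, 9};
    double[] kept = discardWarmup(timings, 3);
    System.out.println("mean=" + mean(kept) + " variance=" + variance(kept));
  }
}
```

The fixed cutoff is the weak point: if the JVM is not yet warmed up after N samples, or warms up much sooner, the summary is either polluted or wastes data, which is the motivation for detecting warm-up dynamically.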

> But we have nothing setup to do simple noise-free microbenchmarks over
> some specific code, e.g. like "unit tests" running different query types. And
> for those you don't want crazy JFR and logging and stuff as it is so targeted,
> you can just dump the hot assembly code instead. For now if you want to do
> this, you are writing one-off stuff yourself.

Yeah, maybe consing up a quick JMH benchmark for such cases is a perfectly
fine solution for us developers?


[GitHub] [lucene] mikemccand commented on pull request #193: For stability of DisjunctionIntervalsSource.toString(), sort subSources

2021-06-23 Thread GitBox



mikemccand commented on pull request #193:
URL: https://github.com/apache/lucene/pull/193#issuecomment-866789270


   Thanks @magibney -- I pushed this fix and backported to 8.x too.



[GitHub] [lucene] magibney commented on pull request #193: For stability of DisjunctionIntervalsSource.toString(), sort subSources

2021-06-23 Thread GitBox



magibney commented on pull request #193:
URL: https://github.com/apache/lucene/pull/193#issuecomment-866806262


   Thanks @mikemccand !



[GitHub] [lucene] jpountz merged pull request #186: LUCENE-9613: Encode ordinals like numerics.

2021-06-23 Thread GitBox



jpountz merged pull request #186:
URL: https://github.com/apache/lucene/pull/186


   


