wjp719 commented on PR #12782:
URL: https://github.com/apache/lucene/pull/12782#issuecomment-1959222579
> Have you got specific errors? Could you give a more detailed message? Thanks!
I have no errors; I didn't realize the new format was used. Thanks.
easyice commented on PR #12782:
URL: https://github.com/apache/lucene/pull/12782#issuecomment-1958968291
> @easyice Hi, I suspect that the data encoded with group-varint is different from the old encoding, so is this way compatible with the old index format data?
This
wjp719 commented on PR #12782:
URL: https://github.com/apache/lucene/pull/12782#issuecomment-1958873159
@easyice Hi, I suspect that the data encoded with group-varint is different from the old encoding, so is this way compatible with the old index format data?
jpountz commented on PR #12782:
URL: https://github.com/apache/lucene/pull/12782#issuecomment-1822506995
I opened a PR to feed some of this data into the micro benchmark to make it
more realistic: https://github.com/apache/lucene/pull/12833.
easyice commented on PR #12782:
URL: https://github.com/apache/lucene/pull/12782#issuecomment-1822504456
It's very important as a reference! Thanks a lot!
jpountz commented on PR #12782:
URL: https://github.com/apache/lucene/pull/12782#issuecomment-1822453957
For reference, I computed the most frequent `flag` values on wikibigall,
which are the values that might be worth optimizing for:
- 0x55 (4 2-byte ints): 29.6%
- 0xaa (5 3-bytes
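For readers following the flag values: each 2-bit field of the flag byte stores one int's byte length minus one, so 0x55 (0b01010101) marks four 2-byte ints. Below is a minimal sketch of that mapping; whether the first int uses the low or high bits of the flag is an assumption here (0x55 decodes the same either way), and the class and method names are illustrative, not the PR's code.
```java
// Sketch: derive the 4 per-int byte lengths from a group-varint flag byte.
// Assumes the i-th int's (length - 1) sits in bits [2*i, 2*i+1] of the flag.
final class FlagSketch {
  static int[] lengthsFromFlag(int flag) {
    int[] lengths = new int[4];
    for (int i = 0; i < 4; i++) {
      lengths[i] = ((flag >>> (2 * i)) & 0x03) + 1;
    }
    return lengths;
  }
  // lengthsFromFlag(0x55) -> {2, 2, 2, 2}: four 2-byte ints, the most common case above.
}
```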
jpountz commented on PR #12782:
URL: https://github.com/apache/lucene/pull/12782#issuecomment-1822300926
Also the
[size](http://people.apache.org/~mikemccand/lucenebench/indexing.html#FixedIndexSize)
increase is hardly noticeable.
jpountz commented on PR #12782:
URL: https://github.com/apache/lucene/pull/12782#issuecomment-1822293603
There seems to be a speedup on [prefix
queries](http://people.apache.org/~mikemccand/lucenebench/Prefix3.html) in
nightly benchmarks, I'll add an annotation.
jpountz merged PR #12782:
URL: https://github.com/apache/lucene/pull/12782
easyice commented on code in PR #12782:
URL: https://github.com/apache/lucene/pull/12782#discussion_r1398894042
##
lucene/core/src/java/org/apache/lucene/codecs/lucene99/GroupVIntWriter.java:
##
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
easyice commented on code in PR #12782:
URL: https://github.com/apache/lucene/pull/12782#discussion_r1398890215
##
lucene/core/src/test/org/apache/lucene/codecs/lucene99/TestGroupVInt.java:
##
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or
jpountz commented on code in PR #12782:
URL: https://github.com/apache/lucene/pull/12782#discussion_r1398826440
##
lucene/core/src/java/org/apache/lucene/codecs/lucene99/GroupVIntWriter.java:
##
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
jpountz commented on code in PR #12782:
URL: https://github.com/apache/lucene/pull/12782#discussion_r1398811234
##
lucene/core/src/test/org/apache/lucene/codecs/lucene99/TestGroupVInt.java:
##
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or
easyice commented on PR #12782:
URL: https://github.com/apache/lucene/pull/12782#issuecomment-1818157148
I ran some rounds of wikimediumall (sometimes there is noise); it looks like a bit of a speedup.
`.doc` files were 0.4% larger overall (5.45 GB to 5.47 GB)
Round 1
```
easyice commented on code in PR #12782:
URL: https://github.com/apache/lucene/pull/12782#discussion_r1398242374
##
lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/GroupVIntBenchmark.java:
##
@@ -0,0 +1,150 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)
easyice commented on PR #12782:
URL: https://github.com/apache/lucene/pull/12782#issuecomment-1817536690
Wow, what an incredible speedup! I would not have expected bulk decoding with direct reads to be so much faster than reading from an array. Thank you for your time, and I'm sorry I didn't try t
jpountz commented on code in PR #12782:
URL: https://github.com/apache/lucene/pull/12782#discussion_r1391047570
##
lucene/core/src/java/org/apache/lucene/codecs/lucene99/GroupVIntWriter.java:
##
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
jpountz commented on PR #12782:
URL: https://github.com/apache/lucene/pull/12782#issuecomment-1817146436
Thanks @easyice. I took some time to look into the benchmark and improve a
few things, hopefully you don't mind. Here is the output of the benchmark on my
machine now:
```
Benc
easyice commented on PR #12782:
URL: https://github.com/apache/lucene/pull/12782#issuecomment-1810036730
Thank you @jpountz, I pushed the benchmark code and added a new comparison between `ByteArrayDataInput` and `ByteBufferIndexInput`. For `readVInt`, the `ByteBufferIndexInput` is a bit
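A minimal JMH-style sketch of the kind of `readVInt` comparison described above, assuming a heap `byte[]` read through `ByteArrayDataInput` versus an mmapped `IndexInput` (which is a `ByteBufferIndexInput` or a memory-segment-backed input depending on the JDK and Lucene version). Class, field, and method names are illustrative, not the PR's `GroupVIntBenchmark`:
```java
import java.io.IOException;
import java.nio.file.Files;
import org.apache.lucene.store.ByteArrayDataInput;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexInput;
import org.apache.lucene.store.IndexOutput;
import org.apache.lucene.store.MMapDirectory;
import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
public class ReadVIntComparison {
  private static final int COUNT = 64;

  private byte[] bytes;
  private Directory dir;
  private IndexInput mmapInput;

  @Setup(Level.Trial)
  public void setup() throws IOException {
    // Write COUNT vints to a file, then open it via MMapDirectory and also copy it to a heap array.
    dir = new MMapDirectory(Files.createTempDirectory("vint-bench"));
    try (IndexOutput out = dir.createOutput("vints.bin", IOContext.DEFAULT)) {
      for (int i = 0; i < COUNT; i++) {
        out.writeVInt(i * 31 + 7); // arbitrary test values
      }
    }
    mmapInput = dir.openInput("vints.bin", IOContext.DEFAULT);
    bytes = new byte[(int) mmapInput.length()];
    mmapInput.readBytes(bytes, 0, bytes.length);
  }

  @TearDown(Level.Trial)
  public void tearDown() throws IOException {
    mmapInput.close();
    dir.close();
  }

  @Benchmark
  public long readVIntFromByteArray() throws IOException {
    ByteArrayDataInput in = new ByteArrayDataInput(bytes);
    long sum = 0;
    for (int i = 0; i < COUNT; i++) {
      sum += in.readVInt();
    }
    return sum;
  }

  @Benchmark
  public long readVIntFromMMapInput() throws IOException {
    mmapInput.seek(0);
    long sum = 0;
    for (int i = 0; i < COUNT; i++) {
      sum += mmapInput.readVInt();
    }
    return sum;
  }
}
```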
jpountz commented on PR #12782:
URL: https://github.com/apache/lucene/pull/12782#issuecomment-1808314262
At least in theory, group varint could be made faster than vints even with single-byte integers, because a single check on `flag == 0` would tell us that all 4 integers have a single byte
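A hedged sketch of the fast path this comment describes, assuming a `DataInput` source, little-endian bytes within each value, and the low-to-high 2-bit flag layout used in the earlier sketch; names and layout are illustrative, not the PR's actual reader:
```java
import java.io.IOException;
import org.apache.lucene.store.DataInput;

// Sketch: read one group of 4 ints, with a dedicated fast path for flag == 0,
// i.e. all four values fit in a single byte and no per-int length decoding is needed.
final class GroupReadSketch {
  static void readGroup(DataInput in, long[] dst, int offset) throws IOException {
    int flag = in.readByte() & 0xFF;
    if (flag == 0) {
      dst[offset] = in.readByte() & 0xFF;
      dst[offset + 1] = in.readByte() & 0xFF;
      dst[offset + 2] = in.readByte() & 0xFF;
      dst[offset + 3] = in.readByte() & 0xFF;
      return;
    }
    // General path: extract each int's byte length from its 2-bit field, then read it.
    for (int i = 0; i < 4; i++) {
      int numBytes = ((flag >>> (2 * i)) & 0x03) + 1;
      long value = 0;
      for (int b = 0; b < numBytes; b++) {
        value |= (in.readByte() & 0xFFL) << (8 * b);
      }
      dst[offset + i] = value;
    }
  }
}
```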
jpountz commented on PR #12782:
URL: https://github.com/apache/lucene/pull/12782#issuecomment-1808043672
Could you check in your benchmark under `lucene/benchmark-jmh` so that we
could play with it?
easyice commented on PR #12782:
URL: https://github.com/apache/lucene/pull/12782#issuecomment-1806834098
@jpountz You are right, recomputing the lengths is faster than the table lookup. Here is the benchmark when reading the ints, where each value takes 4 bytes:
```
GroupVInt.readGroup
easyice commented on PR #12782:
URL: https://github.com/apache/lucene/pull/12782#issuecomment-1805059427
@jpountz @rmuir Thanks for your suggestions, they are very helpful to me! I
will run the benchmark for recomputing length vs table lookup.
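For readers following the comparison: a table lookup precomputes the four lengths for all 256 possible flag bytes, while recomputing extracts them from the flag with shifts and masks on the fly. A hedged sketch of both approaches, with illustrative names rather than the benchmark's actual code; the result reported above (recomputing wins) is consistent with the shift/mask staying in registers while the table adds a dependent load.
```java
// Sketch: two ways to obtain a group's per-int byte lengths from its flag byte.
final class LengthStrategies {

  // (a) Table lookup: 256 precomputed rows of 4 lengths each.
  static final byte[][] LENGTHS = new byte[256][4];

  static {
    for (int flag = 0; flag < 256; flag++) {
      for (int i = 0; i < 4; i++) {
        LENGTHS[flag][i] = (byte) (((flag >>> (2 * i)) & 0x03) + 1);
      }
    }
  }

  // (b) Recompute: a shift and a mask per int, no dependent memory load.
  static int length(int flag, int i) {
    return ((flag >>> (2 * i)) & 0x03) + 1;
  }
}
```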
easyice commented on code in PR #12782:
URL: https://github.com/apache/lucene/pull/12782#discussion_r1388845597
##
lucene/core/src/java/org/apache/lucene/codecs/lucene99/GroupVintReader.java:
##
@@ -0,0 +1,176 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
easyice commented on code in PR #12782:
URL: https://github.com/apache/lucene/pull/12782#discussion_r1388843109
##
lucene/core/src/java/org/apache/lucene/codecs/lucene99/GroupVintWriter.java:
##
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
rmuir commented on code in PR #12782:
URL: https://github.com/apache/lucene/pull/12782#discussion_r1388595300
##
lucene/core/src/java/org/apache/lucene/codecs/lucene99/GroupVintReader.java:
##
@@ -0,0 +1,176 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one o
jpountz commented on code in PR #12782:
URL: https://github.com/apache/lucene/pull/12782#discussion_r1388273135
##
lucene/core/src/java/org/apache/lucene/codecs/lucene99/GroupVintWriter.java:
##
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
easyice opened a new pull request, #12782:
URL: https://github.com/apache/lucene/pull/12782
As discussed in issue https://github.com/apache/lucene/issues/12717, the read performance of group-varint is 14-30% faster than vint; the `Mode` values 16-248 are the numbers of ints that will be read.
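For readers unfamiliar with the encoding being added: group-varint packs ints in groups of four, writing one flag byte (each int's byte length, 2 bits per int) followed by only the significant bytes of each int, so the decoder never has to test a continuation bit on every byte the way vint does. A minimal write-side sketch under assumed layout choices (low-to-high flag fields, little-endian value bytes); the PR's `GroupVIntWriter` is the authoritative implementation:
```java
import java.io.IOException;
import org.apache.lucene.store.DataOutput;

// Sketch: encode one group of 4 unsigned ints as a flag byte plus their
// significant bytes. The exact bit/byte layout here is an assumption for
// illustration, not necessarily the format the PR commits to.
final class GroupVIntWriteSketch {

  static void writeGroup(DataOutput out, long[] values, int offset) throws IOException {
    int flag = 0;
    int[] numBytes = new int[4];
    for (int i = 0; i < 4; i++) {
      long v = values[offset + i];
      int bits = 64 - Long.numberOfLeadingZeros(v | 1); // at least 1 significant bit
      numBytes[i] = (bits + 7) / 8;                     // 1..4 bytes for a uint32 value
      flag |= (numBytes[i] - 1) << (2 * i);             // 2-bit length field per int
    }
    out.writeByte((byte) flag);
    for (int i = 0; i < 4; i++) {
      long v = values[offset + i];
      for (int b = 0; b < numBytes[i]; b++) {
        out.writeByte((byte) (v >>> (8 * b)));          // little-endian significant bytes
      }
    }
  }
}
```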