tyronecai commented on PR #15779:
URL: https://github.com/apache/lucene/pull/15779#issuecomment-3991539832
> > Do you have the benchy source you ran -- I'll test on `beast3`.
>
> Woops -- I see you already posted the code fragment for the benchy in your
> op -- I'll try to test on `beast3` in between nightly benchy runs.
>
> And thank you for posting benchy source up front -- it's great to share
> exactly what/how you ran along with any results.

I suddenly realized that rehashing is essentially just rebuilding `ids` from
the existing terms.
So I no longer need the previous `ids` at all: I can read the terms
sequentially from `bytesStart` + `pool` and place each one at the appropriate
position in the new `ids`.
Am I right?
This way I only need 2X the memory, instead of 3X, or 4X after adding
`int hashcodes[]`.
The code is roughly as follows; I still need to confirm and test it further.
```
private void rehash(final int newSize, boolean hashOnData) {
  final int newMask = newSize - 1;
  final int newHighMask = ~newMask;
  bytesUsed.addAndGet(Integer.BYTES * (long) (newSize - ids.length));
  ids = new int[newSize];
  Arrays.fill(ids, -1);
  // rebuild ids from terms in pool
  for (int id = 0; id < count; id++) {
    final int hashcode;
    int code;
    if (hashOnData) {
      hashcode = code = pool.hash(bytesStart[id]);
    } else {
      code = bytesStart[id];
      hashcode = 0;
    }
    int hashPos = code & newMask;
    assert hashPos >= 0;
    // Conflict; use linear probe to find an open slot
    // (see LUCENE-5604):
    while (ids[hashPos] != -1) {
      code++;
      hashPos = code & newMask;
    }
    ids[hashPos] = id | (hashcode & newHighMask);
  }
  hashMask = newMask;
  highMask = newHighMask;
  hashSize = newSize;
  hashHalfSize = newSize / 2;
}
```
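
To sanity-check the idea outside of Lucene, here is a minimal self-contained sketch. It is not the `BytesRefHash` code: the class `RehashSketch`, the `terms` array (a stand-in for `bytesStart` + `pool`) and the use of `Arrays.hashCode` as the hash function are all assumptions made for illustration. It only shows that a table rebuilt by walking ids `0..count-1`, without ever reading the old `ids`, still resolves every term to its id.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Toy stand-in for the hash table; none of these names are Lucene's API. */
final class RehashSketch {
  final byte[][] terms; // hypothetical stand-in for bytesStart + pool: terms[id] is the term
  final int count;      // number of terms; ids are 0..count-1
  int[] ids;            // slot -> id | (hashcode & highMask), or -1 if the slot is empty
  int hashMask;
  int highMask;

  RehashSketch(String... words) {
    count = words.length;
    terms = new byte[count][];
    for (int id = 0; id < count; id++) {
      terms[id] = words[id].getBytes(StandardCharsets.UTF_8);
    }
    // Smallest power of two that keeps the table at most half full.
    int size = 2;
    while (size < 2 * count) {
      size <<= 1;
    }
    rehash(size);
  }

  /** Rebuilds ids by walking terms in id order; the previous ids array is never read. */
  void rehash(int newSize) {
    final int newMask = newSize - 1;
    final int newHighMask = ~newMask;
    final int[] newIds = new int[newSize];
    Arrays.fill(newIds, -1);
    for (int id = 0; id < count; id++) {
      final int hashcode = Arrays.hashCode(terms[id]);
      int code = hashcode;
      int hashPos = code & newMask;
      while (newIds[hashPos] != -1) { // conflict: linear probe to the next open slot
        code++;
        hashPos = code & newMask;
      }
      newIds[hashPos] = id | (hashcode & newHighMask);
    }
    ids = newIds;
    hashMask = newMask;
    highMask = newHighMask;
  }

  /** Returns the id of the given term, or -1 if it is absent. */
  int find(String word) {
    final byte[] bytes = word.getBytes(StandardCharsets.UTF_8);
    final int hashcode = Arrays.hashCode(bytes);
    int code = hashcode;
    int hashPos = code & hashMask;
    int e;
    while ((e = ids[hashPos]) != -1) {
      // Compare the stored high hash bits first, then the actual bytes.
      if ((e & highMask) == (hashcode & highMask)
          && Arrays.equals(terms[e & hashMask], bytes)) {
        return e & hashMask;
      }
      code++;
      hashPos = code & hashMask;
    }
    return -1;
  }
}
```

The sketch relies on the table staying at most half full, so an id never collides with the `-1` empty marker and probing always ends at an open slot. Whether the same holds on every real call path still needs to be checked against the actual class.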