thecoop commented on PR #11847:
URL: https://github.com/apache/lucene/pull/11847#issuecomment-1307057810
Unfortunately that doesn't seem to have much of an effect - same number
after a GC, with the option turned on or off
thecoop commented on PR #11847:
URL: https://github.com/apache/lucene/pull/11847#issuecomment-1307070854
Unfortunately that doesn't seem to have much of an impact, from what I can
see here.
@rmuir Would you be against having a string cache specifically in the
relevant methods in Fiel
scampi commented on issue #11702:
URL: https://github.com/apache/lucene/issues/11702#issuecomment-1307096355
I was involved in a [previous
issue](https://issues.apache.org/jira/browse/LUCENE-10449) that is related to
this one. The problem was a drop in performance when scanning
`SortedSetD
rmuir commented on PR #11847:
URL: https://github.com/apache/lucene/pull/11847#issuecomment-1307115506
yes, because it would translate into a leak for many other use-cases/applications.
rmuir commented on PR #11906:
URL: https://github.com/apache/lucene/pull/11906#issuecomment-1307150684
I bumped the RAM and restarted the test. But it is really broken that I can flush out all the docs with a 512MB heap, but need many, many gigabytes to merge them together. And it's only 16 m
thecoop commented on PR #11847:
URL: https://github.com/apache/lucene/pull/11847#issuecomment-1307164489
To be clear, are you referring to the extra memory used by the deduplication hashmap for the duration of the deserialisation, which will then be eligible for GC after the method returns?
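An editor's sketch of the kind of short-lived deduplication map being discussed here (an assumption about the approach, not the code in #11847): the map hands back one canonical instance per distinct string, and the map itself becomes garbage as soon as the deserialisation pass returns.
```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: returns a canonical String instance for duplicates.
// The map lives only for the duration of one deserialisation pass, so the
// extra memory it uses is eligible for GC once that pass completes.
final class StringDeduplicator {
  private final Map<String, String> seen = new HashMap<>();

  String dedup(String s) {
    String existing = seen.putIfAbsent(s, s);
    return existing != null ? existing : s;
  }
}
```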
benwtrent commented on PR #11905:
URL: https://github.com/apache/lucene/pull/11905#issuecomment-1307208338
> We have to start building up tests for these cases because this seems like
deja vu as far as int overflows in this area.
I am right there with ya @rmuir. 100% feels like "whack
rmuir commented on PR #11852:
URL: https://github.com/apache/lucene/pull/11852#issuecomment-1307217205
> I'm late to the party. Do we really want to have/maintain a web application under Lucene? An HTTP server would not be sufficient to develop a stateful web app; you need to write an app
rmuir commented on PR #11852:
URL: https://github.com/apache/lucene/pull/11852#issuecomment-1307222420
> Re: JS frameworks - I recognize my position is from Ludd, and it might be
untenable. If it gets out of hand we can always add something like jQuery, but
we can never remove, so let's sta
dweiss commented on PR #11905:
URL: https://github.com/apache/lucene/pull/11905#issuecomment-1307230679
There's a whole bunch of automated checks you could go through, selectively,
and try to enable them for the future. This includes IntLongMath, which is
currently off.
https://gith
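For readers unfamiliar with the check dweiss mentions: Error Prone's IntLongMath flags int arithmetic whose result is stored into a long, because the multiplication has already wrapped before the widening happens. A hedged illustration (editor's addition, not Lucene code) of roughly the pattern such a check is designed to catch:
```java
// Illustration only: the kind of pattern an IntLongMath-style check reports.
class IntLongMathExample {
  long totalBytes(int numDocs, int bytesPerDoc) {
    // Flagged: the multiplication happens in int and may wrap before widening to long.
    long wrong = numDocs * bytesPerDoc;
    // Safe: widening one operand first makes the multiplication happen in long.
    long right = (long) numDocs * bytesPerDoc;
    return right;
  }
}
```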
benwtrent opened a new pull request, #11907:
URL: https://github.com/apache/lucene/pull/11907
This commit fixes a latent casting bug where int multiplication could roll over into negative values.
`new byte[Math.toIntExact(numSplits * config.bytesPerDim)];`
`toIntExact` does nothin
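A minimal sketch (editor's addition, not necessarily the exact change merged in #11907) of why `toIntExact` cannot save the original expression, and what the standard remedy looks like. The constants below are hypothetical stand-ins for `numSplits` and `config.bytesPerDim`:
```java
class SplitBufferSizing {
  // Hypothetical stand-ins for the two int values multiplied in the original line.
  static final int NUM_SPLITS = 100_000;
  static final int BYTES_PER_DIM = 30_000;

  static int brokenSize() {
    // The product wraps in int arithmetic *before* Math.toIntExact sees it, so
    // toIntExact cannot detect the overflow (here it passes through a negative
    // size; for other operand values it can just as easily be a wrong positive one).
    return Math.toIntExact(NUM_SPLITS * BYTES_PER_DIM);
  }

  static int fixedSize() {
    // Widening one operand to long makes the multiplication happen in long;
    // toIntExact then throws ArithmeticException instead of silently wrapping.
    return Math.toIntExact((long) NUM_SPLITS * BYTES_PER_DIM);
  }
}
```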
benwtrent commented on PR #11907:
URL: https://github.com/apache/lucene/pull/11907#issuecomment-1307246646
@iverase you might be interested in this.
iverase commented on PR #11907:
URL: https://github.com/apache/lucene/pull/11907#issuecomment-1307282609
Actually, I think there are more occurrences of this multiplication without a check; could we add one? For example:
https://github.com/apache/lucene/blob/3210a42f0958e395930d2259e155a7149fb
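One way to add the kind of check iverase is asking about, sketched with the JDK's own overflow-checking helper (editor's addition; whether the actual fix uses this or an explicit long cast is up to the PR):
```java
class CheckedSizing {
  // Math.multiplyExact throws ArithmeticException on overflow instead of
  // silently wrapping, so a bad size fails loudly at the call site.
  static int checkedBytes(int count, int bytesPerValue) {
    return Math.multiplyExact(count, bytesPerValue);
  }
}
```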
rmuir commented on PR #11905:
URL: https://github.com/apache/lucene/pull/11905#issuecomment-1307289709
> Yeah, we can probably trigger this overflow by using 16268815 byte vectors
of few dimensions. Something as small as 2 dimensions could work.
> One issue with HNSW is that completel
iverase merged PR #11907:
URL: https://github.com/apache/lucene/pull/11907
rmuir commented on PR #11906:
URL: https://github.com/apache/lucene/pull/11906#issuecomment-1307432615
I looked into why the test is taking an eternity to run: the super slow merge at the end is spending all its time clearing bitsets! Looks like the wrong data structure...
```
java.la
```
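An editor's illustration of the cost model behind the "wrong data structure" remark (assumptions: a dense fixed-width bitset is backed by a `long[]`, and clearing it touches every word regardless of how few bits are set):
```java
import java.util.Arrays;

class ClearCostSketch {
  // Dense clear: cost is proportional to the bitset's capacity,
  // even if only a handful of bits were ever set.
  static void clearDense(long[] words) {
    Arrays.fill(words, 0L);
  }

  // Targeted clear: cost is proportional to the number of bits actually set,
  // which is what a sparser structure effectively gives you.
  static void clearOnlySetBits(long[] words, int[] setBits) {
    for (int bit : setBits) {
      words[bit >>> 6] &= ~(1L << bit);
    }
  }
}
```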
jpountz commented on issue #11676:
URL: https://github.com/apache/lucene/issues/11676#issuecomment-1307486547
I wonder if the complexity introduced by the nanotime trick is worth the
benefits, but I'm happy to discuss it over a PR. In my opinion only exceeding
the configured allowed timeout
rmuir commented on issue #11676:
URL: https://github.com/apache/lucene/issues/11676#issuecomment-1307533642
It is worth it. Nobody wants to debug test failures that happen because NTP skewed the clock.
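A sketch of the monotonic-clock approach being argued for (hypothetical class, not Lucene's actual `QueryTimeout` implementation): `System.nanoTime()` is monotonic, so an NTP adjustment of the wall clock cannot make the deadline fire early or late.
```java
// Hypothetical deadline based on the monotonic clock.
final class NanoDeadline {
  private final long deadlineNanos;

  NanoDeadline(long timeoutMillis) {
    this.deadlineNanos = System.nanoTime() + timeoutMillis * 1_000_000L;
  }

  boolean shouldExit() {
    // Comparing via subtraction is the overflow-safe way to compare nanoTime values.
    return System.nanoTime() - deadlineNanos >= 0;
  }
}
```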
gsmiller commented on code in PR #11881:
URL: https://github.com/apache/lucene/pull/11881#discussion_r1016914939
##
lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysScorer.java:
##
@@ -166,89 +160,158 @@ public int score(LeafCollector collector, Bits
acceptDocs, int m
jpountz commented on issue #11676:
URL: https://github.com/apache/lucene/issues/11676#issuecomment-1307598183
Sorry for the confusion; I was thinking of not relying on any timing info **at all** besides the one that is already encapsulated by the `QueryTimeout` object. Just relying on the f
jpountz commented on code in PR #11900:
URL: https://github.com/apache/lucene/pull/11900#discussion_r1016950141
##
lucene/codecs/src/java/org/apache/lucene/codecs/bloom/FuzzySet.java:
##
@@ -46,7 +46,9 @@ public class FuzzySet implements Accountable {
public static final in
gsmiller merged PR #11881:
URL: https://github.com/apache/lucene/pull/11881
rmuir commented on PR #11905:
URL: https://github.com/apache/lucene/pull/11905#issuecomment-1307727467
> * In Lucene 9.2+, the bug appears when there are `16268814`
(Integer.MAX_VALUE/(M * 2 + 1)) or more vectors in a single segment.
If this is correct, we should just be able to create
benwtrent commented on PR #11905:
URL: https://github.com/apache/lucene/pull/11905#issuecomment-1307744298
@rmuir Thinking outside the box! I will try that. It would definitely cause the graph offset calculation to be completely blown out of proportion, which is the cause of this overflow.
rmuir commented on PR #11905:
URL: https://github.com/apache/lucene/pull/11905#issuecomment-1307756832
Yes, if such a test works it may at least prevent similar regressions. Another possible idea is to give every vector a value of 0, then zip up the index; it should be ~16MB of zeros w
rmuir commented on PR #11905:
URL: https://github.com/apache/lucene/pull/11905#issuecomment-1307760820
@jdconrad helped with some math that may explain why previous tests didn't fail:
```
jshell> int M = 16;
M ==> 16
jshell> long v1 = (1 + (M*2)) * 4 * 16268814;
v1 ==> 2147
```
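A worked version of the same arithmetic (editor's addition; the 20M count is only illustrative, borrowed from the follow-up comment below): evaluated in int the product wraps negative, evaluated in long it does not.
```java
public class HnswOffsetOverflow {
  public static void main(String[] args) {
    int M = 16;
    int numVectors = 20_000_000; // illustrative count, as in the 20M-doc test below

    // int arithmetic: 33 * 4 * 20_000_000 wraps past Integer.MAX_VALUE.
    int asInt = (1 + (M * 2)) * 4 * numVectors;
    // long arithmetic: widening before the final multiply keeps the true value.
    long asLong = (1 + (M * 2)) * 4L * numVectors;

    System.out.println(asInt);  // -1654967296
    System.out.println(asLong); // 2640000000
  }
}
```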
rmuir commented on PR #11905:
URL: https://github.com/apache/lucene/pull/11905#issuecomment-1307821765
With the 20M docs it still didn't fail. I have the index saved so I can play around; maybe checkindex doesn't trigger what is needed here (e.g. advance vs next).
It is a little crazy
benwtrent commented on PR #11905:
URL: https://github.com/apache/lucene/pull/11905#issuecomment-1307833883
> It is a little crazy that this index has a 2.5GB .vex file that, if I run zip, deflates 98% down to 75MB. Very wasteful.
Agreed :). Once this stuff is solved, I hope to further i
jmazanec15 commented on issue #11354:
URL: https://github.com/apache/lucene/issues/11354#issuecomment-1307862709
Hi @mayya-sharipova @jtibshirani @msokolov
I figured out the issue with the recall in the previous tests - I was not using the copy of the vectors when recomputing the dis
sebbASF opened a new pull request, #71:
URL: https://github.com/apache/lucene-site/pull/71
GitHub repo currently says "Apache Lucene and Solr web site"
jdconrad commented on PR #11906:
URL: https://github.com/apache/lucene/pull/11906#issuecomment-1308006136
Just as confirmation, I'm seeing `FixedBitSet.clear` taking up a lot of time as well when running this test.
```
"Lucene Merge Thread #0" #18 daemon prio=5 os_prio=0 cpu=347309.
```
uschindler merged PR #71:
URL: https://github.com/apache/lucene-site/pull/71
uschindler merged PR #70:
URL: https://github.com/apache/lucene-site/pull/70
sebbASF opened a new issue, #72:
URL: https://github.com/apache/lucene-site/issues/72
It would be helpful to have a link to this issue tracker from the website.
Perhaps under 'Editing this site'?
rmuir commented on PR #11906:
URL: https://github.com/apache/lucene/pull/11906#issuecomment-1308119525
Current test still doesn't fail. checkIndex just calls nextDoc() on low-level vectors, but we may need to invoke skipping to find the issue. That's my theory, at least.
One thing miss
donnerpeter merged PR #11893:
URL: https://github.com/apache/lucene/pull/11893