Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-12-07 Thread via GitHub
mikemccand merged PR #12624: URL: https://github.com/apache/lucene/pull/12624 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-12-07 Thread via GitHub
mikemccand commented on PR #12624: URL: https://github.com/apache/lucene/pull/12624#issuecomment-1845435406 Thanks for persisting @dungba88 -- this was a crazy long and tricky exercise. I'm so excited Lucene can finally build arbitrarily large FSTs with bounded heap usage. I'll merg

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-12-07 Thread via GitHub
mikemccand commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1419045318 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -419,7 +418,8 @@ public FST(FSTMetadata metadata, DataInput in, FSTStore fstStore) throws IOEx

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-12-07 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1418919257 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -419,7 +418,8 @@ public FST(FSTMetadata metadata, DataInput in, FSTStore fstStore) throws IOEx

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-12-07 Thread via GitHub
mikemccand commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1418892433 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -419,7 +418,8 @@ public FST(FSTMetadata metadata, DataInput in, FSTStore fstStore) throws IOEx

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-12-07 Thread via GitHub
dungba88 commented on PR #12624: URL: https://github.com/apache/lucene/pull/12624#issuecomment-1845249074 Thanks everyone! I addressed the comments with a simpler implementation. +1 to the FST microbenchmarking

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-12-07 Thread via GitHub
mikemccand commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1418746164 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -419,6 +417,8 @@ public FST(FSTMetadata metadata, DataInput in, FSTStore fstStore) throws IOEx

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-12-07 Thread via GitHub
uschindler commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1418764876 ## lucene/core/src/java/org/apache/lucene/util/fst/ReadWriteDataOutput.java: ## @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-12-07 Thread via GitHub
uschindler commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1418761384 ## lucene/core/src/java/org/apache/lucene/util/fst/ReadWriteDataOutput.java: ## @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-12-07 Thread via GitHub
mikemccand commented on PR #12624: URL: https://github.com/apache/lucene/pull/12624#issuecomment-1845106275 Since we are struggling to best measure FST performance impact of these changes, I opened a spinoff [issue to create a dedicated FST microbenchmark](https://github.com/apache/lucene/p

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-12-07 Thread via GitHub
mikemccand commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1418742557 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -218,13 +279,19 @@ public Builder allowFixedLengthArcs(boolean allowFixedLengthArcs) {

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-12-07 Thread via GitHub
mikemccand commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1418738243 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -120,22 +125,44 @@ public class FSTCompiler { final float directAddressingMaxOversizin

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-12-07 Thread via GitHub
mikemccand commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1418734995 ## lucene/core/src/java/org/apache/lucene/util/fst/ReadWriteDataOutput.java: ## @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-12-07 Thread via GitHub
mikemccand commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1418728841 ## lucene/core/src/java/org/apache/lucene/util/fst/ReadWriteDataOutput.java: ## @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-12-06 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1416980241 ## lucene/core/src/java/org/apache/lucene/util/fst/ReadWriteDataOutput.java: ## @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-12-06 Thread via GitHub
dweiss commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1416950035 ## lucene/core/src/java/org/apache/lucene/util/fst/ReadWriteDataOutput.java: ## @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or m

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-12-05 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1416403950 ## lucene/core/src/java/org/apache/lucene/util/fst/ReadWriteDataOutput.java: ## @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-12-05 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1416379828 ## lucene/core/src/java/org/apache/lucene/util/fst/ReadWriteDataOutput.java: ## @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-12-05 Thread via GitHub
dweiss commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1416181846 ## lucene/core/src/java/org/apache/lucene/util/fst/ReadWriteDataOutput.java: ## @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or m

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-12-05 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1415876855 ## lucene/core/src/test/org/apache/lucene/util/fst/Test2BFSTOffHeap.java: ## @@ -0,0 +1,341 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or m

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-12-05 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1415871453 ## lucene/test-framework/src/java/org/apache/lucene/tests/util/fst/FSTTester.java: ## @@ -316,6 +313,15 @@ public FST doTest() throws IOException { return fst;

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-12-05 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1415869383 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -120,22 +125,44 @@ public class FSTCompiler { final float directAddressingMaxOversizingF

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-12-05 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1415666711 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -218,13 +279,19 @@ public Builder allowFixedLengthArcs(boolean allowFixedLengthArcs) {

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-12-05 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1415641634 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -500,6 +502,12 @@ public FSTMetadata getMetadata() { return metadata; } + /** + * Save

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-12-05 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1415627354 ## lucene/core/src/java/org/apache/lucene/util/fst/ReadWriteDataOutput.java: ## @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-12-05 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1415625009 ## lucene/core/src/test/org/apache/lucene/util/fst/TestFSTDataOutputWriter.java: ## @@ -0,0 +1,230 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-12-05 Thread via GitHub
mikemccand commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1415505550 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -153,6 +180,40 @@ private FSTCompiler( } } + // Get the respective FSTReader o

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-12-05 Thread via GitHub
mikemccand commented on PR #12624: URL: https://github.com/apache/lucene/pull/12624#issuecomment-1840646081 > > I tested just how much slower the ByteBuffer based store is than the FST's BytesStore: > > I assume this is before the last iteration that does the freeze, is that right? W

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-12-04 Thread via GitHub
dungba88 commented on PR #12624: URL: https://github.com/apache/lucene/pull/12624#issuecomment-1839997544 > I tested just how much slower the ByteBuffer based store is than the FST's BytesStore: I assume this is before the last iteration that does the freeze, is that right? What do

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-29 Thread via GitHub
dungba88 commented on PR #12624: URL: https://github.com/apache/lucene/pull/12624#issuecomment-1832050665 I re-ran Test2BFST with the new change, and it looks much better ``` 1> TEST: now verify [fst size=4621076364; nodeCount=2252341486; arcCount=2264078585] 1> 0...: took 0

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-28 Thread via GitHub
mikemccand commented on PR #12624: URL: https://github.com/apache/lucene/pull/12624#issuecomment-1829633268 > More than two orders-of-magnitude (base 10) slower! I wonder: are there other places in Lucene that might fall prey to this performance trap (calling `toDataInput` frequently

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-28 Thread via GitHub
mikemccand commented on PR #12624: URL: https://github.com/apache/lucene/pull/12624#issuecomment-1829613228 Thanks @dungba88 -- I will catch up with the latest iterations soon. I tested just how much slower the `ByteBuffer` based store is than the FST's `BytesStore`: 9.x: ```

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-28 Thread via GitHub
mikemccand commented on PR #12624: URL: https://github.com/apache/lucene/pull/12624#issuecomment-1829601799 > Tested Test2BFST with -Dtests.seed=D193E7FD4B9E68C4 Duh, I forgot to fix the seed! And the test is indeed random in the inputs it compiles. Sorry for the false alarm :)

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-28 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1407526003 ## lucene/core/src/java/org/apache/lucene/util/fst/ReadWriteDataOutput.java: ## @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-27 Thread via GitHub
dungba88 commented on PR #12624: URL: https://github.com/apache/lucene/pull/12624#issuecomment-1829144978 Tested Test2BFST with `-Dtests.seed=D193E7FD4B9E68C4` **mainline** ``` 110: 432584968 RAM bytes used; 432367203 FST bytes; 211082699 nodes; took 248 seconds ```

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-27 Thread via GitHub
dungba88 commented on PR #12624: URL: https://github.com/apache/lucene/pull/12624#issuecomment-1828936176 I checked some of the usage in the analysis module. SynonymGraphFilter caches the `BytesReader` in its constructor, and I think TokenFilters are cached per field by default? But lots of other

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-27 Thread via GitHub
dungba88 commented on PR #12624: URL: https://github.com/apache/lucene/pull/12624#issuecomment-1828839806 Ah I think since we removed the finish(), getting the reverse bytes reader is expectedly slower. We have to copy the bytes to a readonly buffer every time. If this is a problem maybe le

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-27 Thread via GitHub
mikemccand commented on PR #12624: URL: https://github.com/apache/lucene/pull/12624#issuecomment-1828597325 Hmm, also oddly -- why do the number of nodes differ between `main` and 9.x? This PR should not have altered how many nodes are created as a function of FST inputs right? Or maybe h

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-27 Thread via GitHub
mikemccand commented on PR #12624: URL: https://github.com/apache/lucene/pull/12624#issuecomment-1828590265 Hmm, also the `FSTCompiler.ramBytesUsed()` seems to no longer return the growing FST size: ``` 1> 310: 560 bytes; 594876500 nodes 1> 320: 560 bytes; 614066389

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-27 Thread via GitHub
mikemccand commented on PR #12624: URL: https://github.com/apache/lucene/pull/12624#issuecomment-1828548480 Hmm, I'm running `Test2BFSTs` on this patch and noticed it seems to take much longer during the `TEST: now verify` step where it confirms the built FST accepts all the inputs it j

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-23 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1402058230 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -435,6 +433,13 @@ public FST(FSTMetadata metadata, DataInput in, Outputs outputs, FSTStore f

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-22 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1402058230 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -435,6 +433,13 @@ public FST(FSTMetadata metadata, DataInput in, Outputs outputs, FSTStore f

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-22 Thread via GitHub
mikemccand commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1402022525 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -435,6 +433,13 @@ public FST(FSTMetadata metadata, DataInput in, Outputs outputs, FSTStore f

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-22 Thread via GitHub
mikemccand commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1402017454 ## lucene/core/src/java/org/apache/lucene/util/fst/ByteBuffersFSTReader.java: ## @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-21 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1400663661 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -827,22 +910,24 @@ void setEmptyOutput(T v) { } void finish(long newStartNode) { -

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-21 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1400633005 ## lucene/core/src/java/org/apache/lucene/util/fst/GrowableByteArrayDataOutput.java: ## @@ -0,0 +1,93 @@ +/* + * Licensed to the Apache Software Foundation (ASF) unde

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-21 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1400641544 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -248,15 +305,17 @@ public Builder directAddressingMaxOversizingFactor(float factor) {

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-21 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1400687529 ## lucene/core/src/java/org/apache/lucene/util/fst/ByteBuffersFSTReader.java: ## @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one o

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-21 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1400638946 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -153,6 +176,34 @@ private FSTCompiler( } } + // Get the respective FSTReader of

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-21 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1400642484 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -277,9 +336,9 @@ public long getMappedStateCount() { return dedupHash == null ? 0 : no

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-21 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1400634116 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -435,6 +433,13 @@ public FST(FSTMetadata metadata, DataInput in, Outputs outputs, FSTStore f

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-21 Thread via GitHub
mikemccand commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1400539919 ## lucene/core/src/java/org/apache/lucene/util/fst/ByteBuffersFSTReader.java: ## @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-21 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1395333947 ## lucene/core/src/java/org/apache/lucene/util/fst/GrowableByteArrayDataOutput.java: ## @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-20 Thread via GitHub
dweiss commented on PR #12624: URL: https://github.com/apache/lucene/pull/12624#issuecomment-1820305577 Package-private is fine, I think.

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-20 Thread via GitHub
dungba88 commented on PR #12624: URL: https://github.com/apache/lucene/pull/12624#issuecomment-1820094902 > I've taken a look - I think this class should be kept package-private (or even class-private). It is a DataOutput but it serves very specific purposes (with the unusual methods to mov

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-20 Thread via GitHub
dweiss commented on PR #12624: URL: https://github.com/apache/lucene/pull/12624#issuecomment-1819661097 I've taken a look - I think this class should be kept package-private (or even class-private). It is a DataOutput but it serves very specific purposes (with the unusual methods to move th

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-19 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1395333947 ## lucene/core/src/java/org/apache/lucene/util/fst/GrowableByteArrayDataOutput.java: ## @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-16 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1395333947 ## lucene/core/src/java/org/apache/lucene/util/fst/GrowableByteArrayDataOutput.java: ## @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-16 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1392197630 ## lucene/core/src/java/org/apache/lucene/util/fst/ByteBuffersFSTReader.java: ## @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one o

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-16 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1395069004 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -337,11 +349,23 @@ public long size() { return getPosition(); } + /** Similar to

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-15 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1395069004 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -337,11 +349,23 @@ public long size() { return getPosition(); } + /** Similar to

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-15 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1393462969 ## lucene/core/src/java/org/apache/lucene/util/fst/OnHeapFSTStore.java: ## @@ -64,22 +66,13 @@ public FSTStore init(DataInput in, long numBytes) throws IOException {

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-15 Thread via GitHub
dungba88 commented on PR #12624: URL: https://github.com/apache/lucene/pull/12624#issuecomment-1813856323 This PR is getting long, so I split two PRs out of it: - https://github.com/apache/lucene/pull/12814: Simplify `BytesStore` operations (which was changed to GrowableByteArra

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-15 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1393547261 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -21,12 +21,13 @@ import java.util.List; import org.apache.lucene.store.DataInput; import

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-14 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1393461923 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -247,16 +306,14 @@ public Builder directAddressingMaxOversizingFactor(float factor) {

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-14 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1393547261 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -21,12 +21,13 @@ import java.util.List; import org.apache.lucene.store.DataInput; import

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-14 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1393462969 ## lucene/core/src/java/org/apache/lucene/util/fst/OnHeapFSTStore.java: ## @@ -64,22 +66,13 @@ public FSTStore init(DataInput in, long numBytes) throws IOException {

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-14 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1393461923 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -247,16 +306,14 @@ public Builder directAddressingMaxOversizingFactor(float factor) {

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-14 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1392710843 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -21,12 +21,13 @@ import java.util.List; import org.apache.lucene.store.DataInput; import

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-14 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1392710843 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -21,12 +21,13 @@ import java.util.List; import org.apache.lucene.store.DataInput; import

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-14 Thread via GitHub
mikemccand commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1392647780 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -21,12 +21,13 @@ import java.util.List; import org.apache.lucene.store.DataInput; impor

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-14 Thread via GitHub
mikemccand commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1392645763 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -337,11 +349,23 @@ public long size() { return getPosition(); } + /** Similar t

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-14 Thread via GitHub
mikemccand commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1392603399 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -21,12 +21,13 @@ import java.util.List; import org.apache.lucene.store.DataInput; impor

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-14 Thread via GitHub
mikemccand commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1392598634 ## lucene/core/src/java/org/apache/lucene/util/fst/ByteBuffersFSTReader.java: ## @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-14 Thread via GitHub
mikemccand commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1392595332 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -337,11 +349,23 @@ public long size() { return getPosition(); } + /** Similar t

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-14 Thread via GitHub
mikemccand commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1392585398 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -120,31 +122,54 @@ public class FSTCompiler { final float directAddressingMaxOversizin

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-14 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1391973476 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -26,7 +26,8 @@ // TODO: merge with PagedBytes, except PagedBytes doesn't // let you read w

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-14 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1392201110 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -21,12 +21,13 @@ import java.util.List; import org.apache.lucene.store.DataInput; import

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-14 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1392201110 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -21,12 +21,13 @@ import java.util.List; import org.apache.lucene.store.DataInput; import
