[GitHub] [lucene] spyk commented on pull request #380: LUCENE-10171 - Fix dictionary-based OpenNLPLemmatizerFilterFactory caching issue
spyk commented on pull request #380: URL: https://github.com/apache/lucene/pull/380#issuecomment-1061732060

> I'd be willing to take care of adding it, if you'd prefer. The CHANGES.txt file changes often and gets conflicts which, although trivial, can be kind of annoying. At this point it's possible we may have missed the boat for 9.1, so that might also be an argument for me figuring out the right place to put the CHANGES.txt entry, if you're ok with that.

Thanks @magibney, yes please, that'd be great if you could. I'd hate to take up your valuable time while I figure the whole process out :) But please let me know if you need anything from me.
[GitHub] [lucene] msokolov merged pull request #735: Fix: typo + +minScore.
msokolov merged pull request #735: URL: https://github.com/apache/lucene/pull/735
[jira] [Commented] (LUCENE-10457) LuceneTestCase.createTempDir could randomly return symbolic links
[ https://issues.apache.org/jira/browse/LUCENE-10457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502990#comment-17502990 ] Robert Muir commented on LUCENE-10457:

Sorry, a symbolic link is a whole different thing than a file. It has different semantics, platform specifics, etc. A symlink to a directory is not a directory. I'm a hard -1 to randomly doing this from LuceneTestCase. It isn't just fixing a test or two; it's the ongoing cost of having confusing tests. It is literally 3 lines of code for you to do this in your symlink-using test.

> LuceneTestCase.createTempDir could randomly return symbolic links
> Key: LUCENE-10457
> URL: https://issues.apache.org/jira/browse/LUCENE-10457
> Project: Lucene - Core
> Issue Type: Task
> Components: general/test
> Reporter: Mike Drob
> Priority: Major
>
> When we are creating temporary directories to use for other Lucene functions, we could occasionally provide symbolic links instead of direct references to directories. If the system running tests doesn't support symbolic links, then we should ignore this option.
> Providing links would be useful to test scenarios, for example, where users have a symbolic link for the "current" index directory and then rotate that over time but applications still use the same link.
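For reference, the "3 lines of code" Robert mentions could look like this inside a test method of a LuceneTestCase subclass (a minimal sketch assuming the platform supports symlinks; the `java.nio.file.Files`/`Path` imports and the names are illustrative):

```java
Path realDir = createTempDir("real-index");                  // plain temp directory from the test framework
Path link = createTempDir("link-parent").resolve("current"); // path where the link itself will live
Files.createSymbolicLink(link, realDir);                     // point the tool/Directory under test at 'link'
```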
[jira] [Commented] (LUCENE-10457) LuceneTestCase.createTempDir could randomly return symbolic links
[ https://issues.apache.org/jira/browse/LUCENE-10457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503016#comment-17503016 ] Mike Drob commented on LUCENE-10457:

> Sorry, a symbolic link is a whole different thing than a file. It has different semantics / platform specifics, etc. A symlink to a directory is not a directory.

In what ways? If you pass a symlink to a directory to {{Files.isDirectory}} it will return true. You have to specifically ask Java for {{LinkOption.NOFOLLOW_LINKS}} to get the distinction.

> I'm a hard -1 to randomly doing this from LuceneTestCase.

Like I said earlier, LuceneTestCase isn't the only place that could handle this. Would you be similarly opposed to a FilterFileSystem that does this?

> It isn't just fixing a test or two; it's the ongoing cost of having confusing tests. It is literally 3 lines of code for you to do this in your symlink-using test.

Maybe some additional context for my motivation would be helpful? We had an issue with symlinks in Solr; it wasn't a Lucene-related problem; we fixed it; everybody is happy and moved on. I started wondering what other places might be susceptible to mishandling symlinks. Instead of trying to individually search for every single place in our code that could be affected, I started looking for common areas that might yield high returns.

Switching LuceneTestCase.createTempDir to sometimes return symlinks to directories broke exactly three places in Lucene code. Those are easy to reproduce with three lines of code like you suggest. The concern is that I wouldn't have found those three places without a larger, more general change that sprayed symlinks all over the place. And without such a change we won't find similar issues in the future, which is the point of the randomized testing framework in the first place: have a robust testing framework and find issues that we don't even know exist.

For example, right now, running {{IndexSplitter}} doesn't work when {{destDir}} is a symlink to a directory. How did I know? I wasn't looking for IndexSplitter issues. Should it work? I would imagine yes. Or at the very least, it should give a better error than it currently does.

The core indexing and querying library never has to worry about symlinks, sure. But we have plenty of operational tools that interact with the filesystem as defined by users. And we don't control that.
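A quick standalone illustration of the distinction Mike describes (a sketch, not Lucene code; paths are arbitrary):

```java
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;

public class SymlinkIsDirectoryDemo {
  public static void main(String[] args) throws Exception {
    Path dir = Files.createTempDirectory("real");
    Path link = Files.createSymbolicLink(dir.resolveSibling("link-" + System.nanoTime()), dir);

    // Links are followed by default, so a symlink to a directory "is" a directory:
    System.out.println(Files.isDirectory(link));                            // true
    // Only with NOFOLLOW_LINKS does Java expose the difference:
    System.out.println(Files.isDirectory(link, LinkOption.NOFOLLOW_LINKS)); // false
    System.out.println(Files.isSymbolicLink(link));                         // true
  }
}
```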
[GitHub] [lucene] magibney merged pull request #380: LUCENE-10171 - Fix dictionary-based OpenNLPLemmatizerFilterFactory caching issue
magibney merged pull request #380: URL: https://github.com/apache/lucene/pull/380
[jira] [Commented] (LUCENE-10171) Caching issue on dictionary-based OpenNLPLemmatizerFilterFactory
[ https://issues.apache.org/jira/browse/LUCENE-10171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503082#comment-17503082 ] ASF subversion and git services commented on LUCENE-10171:

Commit 8afec33e747ec81c2301a4b099bd26b4195a556e in lucene's branch refs/heads/main from Spyros Kapnissis [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=8afec33 ]

LUCENE-10171: OpenNLPOpsFactory should directly cache DictionaryLemmatizer objects (#380)

Instead of caching dictionary strings and building multiple redundant DictionaryLemmatizer objects.

Co-authored-by: Michael Gibney

> Caching issue on dictionary-based OpenNLPLemmatizerFilterFactory
> Key: LUCENE-10171
> URL: https://issues.apache.org/jira/browse/LUCENE-10171
> Project: Lucene - Core
> Issue Type: Bug
> Components: modules/analysis
> Affects Versions: 9.0, 7.7.3, 8.10
> Reporter: Spyros Kapnissis
> Priority: Major
> Time Spent: 4h
> Remaining Estimate: 0h
>
> When providing a lemmas.txt dictionary file, OpenNLPLemmatizerFilterFactory internally caches only the string form of the dictionary, not the DictionaryLemmatizer object. This results in parsing and creating a new DictionaryLemmatizer object every time OpenNLPLemmatizerFilterFactory.create() is called.
> In our case, with a large lemmas.txt file (5MB) and the OpenNLPLemmatizerFilter used in many fields across our setup and in multiple collections (we use Solr), we had several random OOM issues and generally high server load due to GC activity. After heap dump analysis we noticed a few thousand DictionaryLemmatizer instances of around 80MB each.
> By switching the caching to the DictionaryLemmatizer instead of the String, we were able to resolve these issues. I will be attaching a PR for review; please let me know of any comments.
> Thanks!
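The gist of the fix, sketched as a standalone cache (hypothetical class and field names; the actual change lives in OpenNLPOpsFactory, see PR #380): parse the dictionary once and cache the resulting DictionaryLemmatizer, instead of caching the raw dictionary text and re-parsing it on every factory call.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.ConcurrentHashMap;
import opennlp.tools.lemmatizer.DictionaryLemmatizer;

final class LemmatizerCache {
  // Before the fix: a Map<String, String> of raw dictionary text, re-parsed on every create().
  // After the fix: the parsed object itself is cached, once per dictionary name.
  private static final ConcurrentHashMap<String, DictionaryLemmatizer> CACHE = new ConcurrentHashMap<>();

  static DictionaryLemmatizer get(String dictName, InputStream dictInput) throws IOException {
    DictionaryLemmatizer cached = CACHE.get(dictName);
    if (cached == null) {
      cached = new DictionaryLemmatizer(dictInput); // parse once; ~80MB per instance in the report
      DictionaryLemmatizer previous = CACHE.putIfAbsent(dictName, cached);
      if (previous != null) {
        cached = previous; // another thread won the race; reuse its instance
      }
    }
    return cached;
  }
}
```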
[GitHub] [lucene] magibney commented on pull request #380: LUCENE-10171 - Fix dictionary-based OpenNLPLemmatizerFilterFactory caching issue
magibney commented on pull request #380: URL: https://github.com/apache/lucene/pull/380#issuecomment-1062061030

Merged to `main` and backported to `branch_9x`; this should be available as of release 9.1. Thanks @spyk!
[jira] [Commented] (LUCENE-10171) Caching issue on dictionary-based OpenNLPLemmatizerFilterFactory
[ https://issues.apache.org/jira/browse/LUCENE-10171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503097#comment-17503097 ] ASF subversion and git services commented on LUCENE-10171:

Commit a03306724664be1a5f4047d845736d85b03ddc41 in lucene's branch refs/heads/branch_9x from Spyros Kapnissis [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=a033067 ]

LUCENE-10171: OpenNLPOpsFactory should directly cache DictionaryLemmatizer objects (#380)

Instead of caching dictionary strings and building multiple redundant DictionaryLemmatizer objects.

Co-authored-by: Michael Gibney
[jira] [Resolved] (LUCENE-10171) Caching issue on dictionary-based OpenNLPLemmatizerFilterFactory
[ https://issues.apache.org/jira/browse/LUCENE-10171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Gibney resolved LUCENE-10171.

Fix Version/s: 9.1
Lucene Fields: (was: New,Patch Available)
Assignee: Michael Gibney
Resolution: Fixed
[jira] [Updated] (LUCENE-10371) Make IndexRearranger able to arrange segments into a determined order
[ https://issues.apache.org/jira/browse/LUCENE-10371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julie Tibshirani updated LUCENE-10371:

Fix Version/s: 9.1

> Make IndexRearranger able to arrange segments into a determined order
> Key: LUCENE-10371
> URL: https://issues.apache.org/jira/browse/LUCENE-10371
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Patrick Zhai
> Priority: Minor
> Fix For: 9.1
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> Previously, when I tried to change luceneutil to let it use {{IndexRearranger}} for faster deterministic index construction, I found that even when each segment contains the same document set, the order of the segments will impact the estimated hit count (using BMW): [https://markmail.org/message/zl6zsqvbg7nwfq6w]
> At that time the discussion tended toward tolerating the small hit count difference to resolve the issue. After some time, when I discussed this issue again with [~mikemccand], we thought it might also be a good idea to add the ability to rearrange the segment order to {{IndexRearranger}}, so that we can ensure the rearranged index is truly the same each time.
[jira] [Updated] (LUCENE-10236) CombinedFieldsQuery to use fieldAndWeights.values() when constructing MultiNormsLeafSimScorer for scoring
[ https://issues.apache.org/jira/browse/LUCENE-10236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julie Tibshirani updated LUCENE-10236:

Fix Version/s: 9.1

> CombinedFieldsQuery to use fieldAndWeights.values() when constructing MultiNormsLeafSimScorer for scoring
> Key: LUCENE-10236
> URL: https://issues.apache.org/jira/browse/LUCENE-10236
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/sandbox
> Reporter: Zach Chen
> Assignee: Zach Chen
> Priority: Minor
> Fix For: 9.1
> Time Spent: 6h 50m
> Remaining Estimate: 0h
>
> This is a spin-off issue from the discussion in [https://github.com/apache/lucene/pull/418#issuecomment-967790816], for a quick fix in CombinedFieldsQuery scoring.
> Currently CombinedFieldsQuery uses a constructed [fields|https://github.com/apache/lucene/blob/3b914a4d73eea8923f823cbdb869de39213411dd/lucene/sandbox/src/java/org/apache/lucene/sandbox/search/CombinedFieldQuery.java#L420-L421] object to create a MultiNormsLeafSimScorer for scoring, but the fields object may contain duplicated field-weight pairs, as it is [built from looping over fieldTerms|https://github.com/apache/lucene/blob/3b914a4d73eea8923f823cbdb869de39213411dd/lucene/sandbox/src/java/org/apache/lucene/sandbox/search/CombinedFieldQuery.java#L404-L414], resulting in duplicated norms being added during the scoring calculation in MultiNormsLeafSimScorer.
> E.g. for a CombinedFieldsQuery with two fields and two values matching a particular doc:
> {code:java}
> CombinedFieldQuery query =
>     new CombinedFieldQuery.Builder()
>         .addField("field1", (float) 1.0)
>         .addField("field2", (float) 1.0)
>         .addTerm(new BytesRef("foo"))
>         .addTerm(new BytesRef("zoo"))
>         .build(); {code}
> I would imagine the scoring to be based on the following:
> # Sum of freqs on doc = freq(field1:foo) + freq(field2:foo) + freq(field1:zoo) + freq(field2:zoo)
> # Sum of norms on doc = norm(field1) + norm(field2)
> but the current logic would use the following for scoring:
> # Sum of freqs on doc = freq(field1:foo) + freq(field2:foo) + freq(field1:zoo) + freq(field2:zoo)
> # Sum of norms on doc = norm(field1) + norm(field2) + norm(field1) + norm(field2)
>
> In addition, this differs from how MultiNormsLeafSimScorer is constructed in CombinedFieldsQuery's explain function, which [uses fieldAndWeights.values()|https://github.com/apache/lucene/blob/3b914a4d73eea8923f823cbdb869de39213411dd/lucene/sandbox/src/java/org/apache/lucene/sandbox/search/CombinedFieldQuery.java#L387-L389] and does not contain duplicated field-weight pairs.
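A toy sketch of why the two construction paths disagree (illustrative only, not the actual CombinedFieldQuery internals): building the field list while looping over (field, term) pairs repeats each field once per term, while a map keyed by field name holds each field exactly once.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NormDupDemo {
  public static void main(String[] args) {
    String[] fields = {"field1", "field2"};
    String[] terms = {"foo", "zoo"};

    // Scoring path (before the fix): one list entry per (field, term) pair -> duplicates.
    List<String> fromFieldTerms = new ArrayList<>();
    for (String term : terms) {
      for (String field : fields) {
        fromFieldTerms.add(field);
      }
    }
    System.out.println(fromFieldTerms); // [field1, field2, field1, field2] -> norms summed twice

    // Explain path (and the fix): fieldAndWeights is keyed by field name,
    // so values() holds each field exactly once.
    Map<String, Float> fieldAndWeights = new LinkedHashMap<>();
    for (String field : fields) {
      fieldAndWeights.put(field, 1.0f);
    }
    System.out.println(fieldAndWeights.keySet()); // [field1, field2] -> each norm counted once
  }
}
```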
[jira] [Updated] (LUCENE-10339) Add require directive for jdk.management to core's module descriptor.
[ https://issues.apache.org/jira/browse/LUCENE-10339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julie Tibshirani updated LUCENE-10339:

Fix Version/s: 9.1
10.0 (main)

> Add require directive for jdk.management to core's module descriptor.
> Key: LUCENE-10339
> URL: https://issues.apache.org/jira/browse/LUCENE-10339
> Project: Lucene - Core
> Issue Type: Sub-task
> Reporter: Tomoko Uchida
> Priority: Major
> Fix For: 9.1, 10.0 (main)
> Attachments: Screenshot from 2021-12-25 18-04-55.png, Screenshot from 2021-12-25 20-11-38.png
> Time Spent: 1h
> Remaining Estimate: 0h
>
> IntelliJ IDEA warns as follows in o.a.l.u.RamUsageEstimator (see the two attached screenshots).
> This means we would need "require" or "require static" directive(s) in core's module descriptor to make RamUsageEstimator work correctly.
[GitHub] [lucene] vigyasharma commented on a change in pull request #633: [WIP] LUCENE-10216: Use MergeScheduler and MergePolicy to run addIndexes(CodecReader[]) merges.
vigyasharma commented on a change in pull request #633: URL: https://github.com/apache/lucene/pull/633#discussion_r822174425

## File path: lucene/core/src/java/org/apache/lucene/index/FieldInfos.java
## @@ -352,6 +352,14 @@ public FieldDimensions(int dimensionCount, int indexDimensionCount, int dimensio
     this.softDeletesFieldName = softDeletesFieldName;
   }
+
+    public void verifyFieldInfo(FieldInfo fi) {

Review comment: This method makes sure that incoming index fields are compatible with the destination index, e.g. vectors have the same dimensions and use the same similarity function. Changed access to package private. I retained the method name, `verifyFieldInfo()`, as it internally calls `verifySoftDeletedFieldName` and `verifySameSchema`, which have a similar naming style. I can change it if `assert*()` is more in line with the Lucene convention. In that case, I also think it should only return a condition and the asserts should be done by the caller. Let me know.
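A hedged sketch of the shape this check could take inside `FieldInfos.FieldNumbers`, based only on the description above (the helper signatures and the `nameToNumber` map are assumptions, not the PR's actual code):

```java
// Hypothetical outline (not the PR's actual code): validate an incoming
// field against the destination index's schema before any merge work starts.
void verifyFieldInfo(FieldInfo fi) {
  verifySoftDeletedFieldName(fi.name, fi.isSoftDeletesField());
  if (nameToNumber.containsKey(fi.name)) {
    // e.g. vectors must keep the same dimension and similarity function
    verifySameSchema(fi);
  }
}
```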
[GitHub] [lucene] vigyasharma commented on a change in pull request #633: [WIP] LUCENE-10216: Use MergeScheduler and MergePolicy to run addIndexes(CodecReader[]) merges.
vigyasharma commented on a change in pull request #633: URL: https://github.com/apache/lucene/pull/633#discussion_r822174617

## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
## @@ -3121,147 +3125,265 @@ private void validateMergeReader(CodecReader leaf) {
    */
   public long addIndexes(CodecReader... readers) throws IOException {
     ensureOpen();
-
-    // long so we can detect int overflow:

Review comment: I think it was a miss. Added it back, thanks!
[GitHub] [lucene] vigyasharma commented on a change in pull request #633: [WIP] LUCENE-10216: Use MergeScheduler and MergePolicy to run addIndexes(CodecReader[]) merges.
vigyasharma commented on a change in pull request #633: URL: https://github.com/apache/lucene/pull/633#discussion_r822175645

## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
## @@ -3121,147 +3125,265 @@ private void validateMergeReader(CodecReader leaf) {
    */
   public long addIndexes(CodecReader... readers) throws IOException {
     ensureOpen();
-
-    // long so we can detect int overflow:
-    long numDocs = 0;
     long seqNo;
-    try {
-      if (infoStream.isEnabled("IW")) {
-        infoStream.message("IW", "flush at addIndexes(CodecReader...)");
-      }
-      flush(false, true);
+    long numDocs = 0;
+    final int mergeTimeoutInSeconds = 600;
 
-      String mergedName = newSegmentName();
-      int numSoftDeleted = 0;
-      for (CodecReader leaf : readers) {
-        numDocs += leaf.numDocs();
+    try {
+      // Best effort up front validations
+      for (CodecReader leaf : readers) {
         validateMergeReader(leaf);
-        if (softDeletesEnabled) {
-          Bits liveDocs = leaf.getLiveDocs();
-          numSoftDeleted +=
-              PendingSoftDeletes.countSoftDeletes(
-                  DocValuesFieldExistsQuery.getDocValuesDocIdSetIterator(
-                      config.getSoftDeletesField(), leaf),
-                  liveDocs);
+        for (FieldInfo fi : leaf.getFieldInfos()) {
+          globalFieldNumberMap.verifyFieldInfo(fi);
         }
+        numDocs += leaf.numDocs();
       }
-
-      // Best-effort up front check:
       testReserveDocs(numDocs);
 
-      final IOContext context =
-          new IOContext(
-              new MergeInfo(Math.toIntExact(numDocs), -1, false, UNBOUNDED_MAX_MERGE_SEGMENTS));
+      synchronized (this) {
+        ensureOpen();
+        if (merges.areEnabled() == false) {
+          throw new UnsupportedOperationException("Merges are disabled on current writer. " +

Review comment: Changed to AlreadyClosedException.
[GitHub] [lucene] vigyasharma commented on a change in pull request #633: [WIP] LUCENE-10216: Use MergeScheduler and MergePolicy to run addIndexes(CodecReader[]) merges.
vigyasharma commented on a change in pull request #633: URL: https://github.com/apache/lucene/pull/633#discussion_r822184400

## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
## @@ -3121,147 +3125,265 @@ private void validateMergeReader(CodecReader leaf) {
    */
   public long addIndexes(CodecReader... readers) throws IOException {
     ensureOpen();
-
-    // long so we can detect int overflow:
-    long numDocs = 0;
     long seqNo;
-    try {
-      if (infoStream.isEnabled("IW")) {
-        infoStream.message("IW", "flush at addIndexes(CodecReader...)");
-      }
-      flush(false, true);
+    long numDocs = 0;
+    final int mergeTimeoutInSeconds = 600;
 
-      String mergedName = newSegmentName();
-      int numSoftDeleted = 0;
-      for (CodecReader leaf : readers) {
-        numDocs += leaf.numDocs();
+    try {
+      // Best effort up front validations
+      for (CodecReader leaf : readers) {
         validateMergeReader(leaf);
-        if (softDeletesEnabled) {
-          Bits liveDocs = leaf.getLiveDocs();
-          numSoftDeleted +=
-              PendingSoftDeletes.countSoftDeletes(
-                  DocValuesFieldExistsQuery.getDocValuesDocIdSetIterator(
-                      config.getSoftDeletesField(), leaf),
-                  liveDocs);
+        for (FieldInfo fi : leaf.getFieldInfos()) {
+          globalFieldNumberMap.verifyFieldInfo(fi);
         }
+        numDocs += leaf.numDocs();
       }
-
-      // Best-effort up front check:
       testReserveDocs(numDocs);
 
-      final IOContext context =
-          new IOContext(
-              new MergeInfo(Math.toIntExact(numDocs), -1, false, UNBOUNDED_MAX_MERGE_SEGMENTS));
+      synchronized (this) {
+        ensureOpen();
+        if (merges.areEnabled() == false) {
+          throw new UnsupportedOperationException("Merges are disabled on current writer. " +
+              "Cannot execute addIndexes(CodecReaders...) API");
+        }
+      }
+
+      MergePolicy mergePolicy = config.getMergePolicy();
+      MergePolicy.MergeSpecification spec = mergePolicy.findMerges(Arrays.asList(readers));

Review comment: We allow cascading merges to proceed in the background by calling `maybeMerge()` at the end of `addIndexes()`. For `addIndexes(CodecReader...)`, we can only add segments to IW after the readers have gone through `SegmentMerger.merge()`. Until then, we don't have any segments to add (unlike the `addIndexes(Directory...)` variant, where segments are already provided in the input). So we wait for the first round of merges to complete, by blocking the API on the merges defined in the MergeSpec. Once these merges complete, we add segments to IW's SegmentInfos and invoke a maybeMerge(), which will cascade any merges on top of these segments in the background.
[GitHub] [lucene] vigyasharma commented on a change in pull request #633: [WIP] LUCENE-10216: Use MergeScheduler and MergePolicy to run addIndexes(CodecReader[]) merges.
vigyasharma commented on a change in pull request #633: URL: https://github.com/apache/lucene/pull/633#discussion_r822185980

## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
## @@ -3121,147 +3125,265 @@ private void validateMergeReader(CodecReader leaf) {
    */
   public long addIndexes(CodecReader... readers) throws IOException {
     ensureOpen();
-
-    // long so we can detect int overflow:
-    long numDocs = 0;
     long seqNo;
-    try {
-      if (infoStream.isEnabled("IW")) {
-        infoStream.message("IW", "flush at addIndexes(CodecReader...)");
-      }
-      flush(false, true);
+    long numDocs = 0;
+    final int mergeTimeoutInSeconds = 600;
 
-      String mergedName = newSegmentName();
-      int numSoftDeleted = 0;
-      for (CodecReader leaf : readers) {
-        numDocs += leaf.numDocs();
+    try {
+      // Best effort up front validations
+      for (CodecReader leaf : readers) {
         validateMergeReader(leaf);
-        if (softDeletesEnabled) {
-          Bits liveDocs = leaf.getLiveDocs();
-          numSoftDeleted +=
-              PendingSoftDeletes.countSoftDeletes(
-                  DocValuesFieldExistsQuery.getDocValuesDocIdSetIterator(
-                      config.getSoftDeletesField(), leaf),
-                  liveDocs);
+        for (FieldInfo fi : leaf.getFieldInfos()) {
+          globalFieldNumberMap.verifyFieldInfo(fi);
         }
+        numDocs += leaf.numDocs();
      }
-
-      // Best-effort up front check:
       testReserveDocs(numDocs);
 
-      final IOContext context =
-          new IOContext(
-              new MergeInfo(Math.toIntExact(numDocs), -1, false, UNBOUNDED_MAX_MERGE_SEGMENTS));
+      synchronized (this) {
+        ensureOpen();
+        if (merges.areEnabled() == false) {
+          throw new UnsupportedOperationException("Merges are disabled on current writer. " +
+              "Cannot execute addIndexes(CodecReaders...) API");
+        }
+      }
+
+      MergePolicy mergePolicy = config.getMergePolicy();
+      MergePolicy.MergeSpecification spec = mergePolicy.findMerges(Arrays.asList(readers));
+      boolean mergesComplete = false;
+      if (spec != null && spec.merges.size() > 0) {
+        try {
+          spec.merges.forEach(addIndexesMergeSource::registerMerge);
+          mergeScheduler.merge(addIndexesMergeSource, MergeTrigger.ADD_INDEXES);
+          mergesComplete = spec.await();
+        } finally {
+          if (!mergesComplete) {
+            // nocommit -- ensure all intermediate files are deleted
+            for (MergePolicy.OneMerge merge : spec.merges) {
+              deleteNewFiles(merge.getMergeInfo().files());
+            }
+          }
+        }
+      }
 
-      // TODO: somehow we should fix this merge so it's
-      // abortable so that IW.close(false) is able to stop it
-      TrackingDirectoryWrapper trackingDir = new TrackingDirectoryWrapper(directory);
-      Codec codec = config.getCodec();
-      // We set the min version to null for now, it will be set later by SegmentMerger
-      SegmentInfo info =
-          new SegmentInfo(
-              directoryOrig,
-              Version.LATEST,
-              null,
-              mergedName,
-              -1,
-              false,
-              codec,
-              Collections.emptyMap(),
-              StringHelper.randomId(),
-              Collections.emptyMap(),
-              config.getIndexSort());
-
-      SegmentMerger merger =
-          new SegmentMerger(
-              Arrays.asList(readers), info, infoStream, trackingDir, globalFieldNumberMap, context);
+      if (mergesComplete) {
+        List<SegmentCommitInfo> infos = new ArrayList<>();
+        long totalMaxDoc = 0;
+        for (MergePolicy.OneMerge merge : spec.merges) {
+          totalMaxDoc += merge.totalMaxDoc;
+          if (merge.getMergeInfo() != null) {
+            infos.add(merge.getMergeInfo());
+          }
+        }
 
-      if (!merger.shouldMerge()) {
-        return docWriter.getNextSequenceNumber();
+        // nocommit -- add tests for this transactional behavior
+        synchronized (this) {
+          if (infos.isEmpty() == false) {
+            boolean success = false;
+            try {
+              ensureOpen();
+              // Reserve the docs, just before we update SIS:
+              reserveDocs(totalMaxDoc);
+              success = true;
+            } finally {
+              if (!success) {
+                for (SegmentCommitInfo sipc : infos) {
+                  // Safe: these files must exist
+                  deleteNewFiles(sipc.files());
+                }
+              }
+            }
+            segmentInfos.addAll(infos);
+            checkpoint();
+          }
+          seqNo = docWriter.getNextSequenceNumber();
+        }
+      } else {
+        // We should normally not reach here, as an earlier call should throw an exception.
+        throw new MergePolicy.MergeException("Could not complete merge
[GitHub] [lucene] vigyasharma commented on a change in pull request #633: [WIP] LUCENE-10216: Use MergeScheduler and MergePolicy to run addIndexes(CodecReader[]) merges.
vigyasharma commented on a change in pull request #633: URL: https://github.com/apache/lucene/pull/633#discussion_r822189054

## File path: lucene/core/src/java/org/apache/lucene/index/MergePolicy.java
## @@ -567,6 +605,21 @@ public abstract MergeSpecification findMerges(
       MergeTrigger mergeTrigger, SegmentInfos segmentInfos, MergeContext mergeContext)
       throws IOException;
 
+  /**
+   * Define {@link OneMerge} operations for a list of codec readers. This call is used to
+   * define merges for input readers in {@link IndexWriter#addIndexes(CodecReader...)}.
+   * Default implementation adds all readers to a single merge. This can be overridden in custom
+   * merge policies.
+   *
+   * @param readers set of readers to merge into the main index
+   */
+  public MergeSpecification findMerges(List<CodecReader> readers) throws IOException {
+    OneMerge merge = new OneMerge(readers, leaf -> new MergeReader(leaf, leaf.getLiveDocs()));

Review comment: Yes, this change modifies addIndexes to leverage `MergePolicy` and `MergeScheduler` without changing its end behavior. I am thinking of doing a follow-up PR with a merge policy that does a 1:1 addIndexes call, where the merger creates one segment for every provided reader. It would create a faster, more concurrent `addIndexes(CodecReader...)`, at the cost of deferring some merges to be done later in the background. Which, I believe, is similar to the behavior in `addIndexes(Directory...)`: all incoming segments are simply added to IW, and any merging happens later in the background.
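A rough sketch of the follow-up idea described above, written against the new `findMerges(List<CodecReader>)` hook this PR adds (the `OneMerge`/`MergeReader` construction is copied from the diff; treat this as the WIP API of the branch, not a released one): emit one `OneMerge` per incoming reader so each reader becomes its own segment.

```java
package org.apache.lucene.index; // MergeReader is package-private in the WIP branch

import java.io.IOException;
import java.util.List;

// Hypothetical follow-up policy: one OneMerge per reader, so
// addIndexes(CodecReader...) writes one segment per reader and defers
// real merging to the background, like the addIndexes(Directory...) variant.
public class OneToOneAddIndexesMergePolicy extends FilterMergePolicy {

  public OneToOneAddIndexesMergePolicy(MergePolicy in) {
    super(in);
  }

  @Override
  public MergeSpecification findMerges(List<CodecReader> readers) throws IOException {
    MergeSpecification spec = new MergeSpecification();
    for (CodecReader reader : readers) {
      // Same OneMerge construction as the diff's default, but over a single reader,
      // so no cross-reader merging happens up front.
      spec.add(new OneMerge(List.of(reader), leaf -> new MergeReader(leaf, leaf.getLiveDocs())));
    }
    return spec;
  }
}
```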
[GitHub] [lucene] vigyasharma commented on a change in pull request #633: [WIP] LUCENE-10216: Use MergeScheduler and MergePolicy to run addIndexes(CodecReader[]) merges.
vigyasharma commented on a change in pull request #633: URL: https://github.com/apache/lucene/pull/633#discussion_r822190688

## File path: lucene/core/src/java/org/apache/lucene/index/MergePolicy.java
## @@ -813,12 +866,24 @@ protected final boolean verbose(MergeContext mergeContext) {
   }
 
   static final class MergeReader {
+    final CodecReader codecReader;
     final SegmentReader reader;

Review comment: So far, I see `SegmentReader` being used only in `ReadersAndUpdates` [here](https://github.com/apache/lucene/blob/main/lucene%2Fcore%2Fsrc%2Fjava%2Forg%2Fapache%2Flucene%2Findex%2FReadersAndUpdates.java#L786-L793). I am not really sure if `createNewReaderWithLatestLiveDocs()` can also work with a CodecReader. Will look into this more.
[GitHub] [lucene] vigyasharma commented on a change in pull request #633: [WIP] LUCENE-10216: Use MergeScheduler and MergePolicy to run addIndexes(CodecReader[]) merges.
vigyasharma commented on a change in pull request #633: URL: https://github.com/apache/lucene/pull/633#discussion_r822195662

## File path: lucene/core/src/java/org/apache/lucene/index/NoMergePolicy.java
## @@ -39,6 +40,11 @@ public MergeSpecification findMerges(
     return null;
   }
 
+  @Override
+  public MergeSpecification findMerges(List<CodecReader> readers) throws IOException {
+    return null;

Review comment: In my current impl, addIndexes simply completes without adding anything if the API returns null. I now realize this is misleading for the user. Callers of addIndexes may not know about the configured merge policy. I should probably throw an exception if I get a null here? On similar lines, what should we do if the API call had some CodecReaders, but findMerges returns an empty spec? Should we error out? (Will add a log for this either way.) Re: letting the API return null, I see that the other variants of `findMerges` in this policy all return `null`, and all callers of findMerges seem to be doing the null checks. Personally, I prefer disallowing null return values, but I wanted to keep this in sync with the other variants.
[GitHub] [lucene] vigyasharma commented on pull request #633: [WIP] LUCENE-10216: Use MergeScheduler and MergePolicy to run addIndexes(CodecReader[]) merges.
vigyasharma commented on pull request #633: URL: https://github.com/apache/lucene/pull/633#issuecomment-1062441177

Thanks for reviewing this @mikemccand! I've addressed the ones that had minor changes, and am working on the larger ones.

> What remains to bring this out of WIP?

There are 2 existing tests that are breaking:

```java
- org.apache.lucene.index.TestAddIndexes.testAddIndicesWithSoftDeletes (:lucene:core)
- org.apache.lucene.index.TestIndexSorting.testAddIndexesWithDeletions (:lucene:core)
```

I've been looking at `testAddIndicesWithSoftDeletes`. IW does a `rollbackInternal()` on shutdown, where it is failing on this assert: `assert pendingNumDocs.get() == segmentInfos.totalMaxDoc()`. I'm guessing my code is reserving more docs than it should. Is that the right direction? Here's a longer error stack trace:

```java
java.lang.AssertionError: pendingNumDocs 7 != 5 totalMaxDoc
  at __randomizedtesting.SeedInfo.seed([63C0554849BFDEC5:4F94AA0CFCBAFADD]:0)
  at org.apache.lucene.index.IndexWriter.rollbackInternal(IndexWriter.java:2428)
  at org.apache.lucene.index.IndexWriter.shutdown(IndexWriter.java:1334)
  at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1362)
  at org.apache.lucene.util.IOUtils.close(IOUtils.java:85)
  at org.apache.lucene.util.IOUtils.close(IOUtils.java:72)
  at org.apache.lucene.index.TestAddIndexes.testAddIndicesWithSoftDeletes(TestAddIndexes.java:1521)
```

Outside of fixing existing tests, I want to add some new tests, especially around cases where one of the merges fails, causing the whole API to fail (i.e., test that transactionality was retained).
[jira] [Commented] (LUCENE-10448) MergeRateLimiter doesn't always limit instant rate.
[ https://issues.apache.org/jira/browse/LUCENE-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503238#comment-17503238 ] kkewwei commented on LUCENE-10448:

In our product, the instant write rate is 200MB/s in a few cases (the statistics are collected by the method above, so there should be no problem), which is far more than the limited rate. In that case, we should pause 10ms+, but the pause is ignored. [~jpountz] [~vigyas], I'm very much looking forward to your confirmation; if you think it is not a bug, I will close the issue.

> MergeRateLimiter doesn't always limit instant rate.
> Key: LUCENE-10448
> URL: https://issues.apache.org/jira/browse/LUCENE-10448
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/other
> Affects Versions: 8.11.1
> Reporter: kkewwei
> Priority: Major
>
> We can see the code in *MergeRateLimiter*:
> {code:java}
> private long maybePause(long bytes, long curNS) throws MergePolicy.MergeAbortedException {
>   double rate = mbPerSec;
>   double secondsToPause = (bytes / 1024. / 1024.) / rate;
>   long targetNS = lastNS + (long) (1000000000 * secondsToPause);
>   long curPauseNS = targetNS - curNS;
>   // We don't bother with thread pausing if the pause is smaller than 2 msec.
>   if (curPauseNS <= MIN_PAUSE_NS) {
>     // Set to curNS, not targetNS, to enforce the instant rate, not
>     // the "averaged over all history" rate:
>     lastNS = curNS;
>     return -1;
>   }
>   ..
> }
> {code}
> If a segment is being merged and *maybePause* is called at 7:00 (lastNS=7:00), then *maybePause* is called again at 7:05, so the value of *targetNS = lastNS + (long) (1000000000 * secondsToPause)* must be smaller than *curNS*; no matter how big bytes is, we will return -1 and skip the pause.
> I counted the total number of calls to *maybePause* (callTimes), the number of ignored pauses (ignorePauseTimes), and the ignored bytes in detail (detailBytes):
> {code:java}
> [2022-03-02T15:16:51,972][DEBUG][o.e.i.e.I.EngineMergeScheduler] [node1] [index1][21] merge segment [_4h] done: took [26.8s], [123.6 MB], [61,219 docs], [0s stopped], [24.4s throttled], [242.5 MB written], [11.2 MB/sec throttle], [callTimes=857], [ignorePauseTimes=25], [detailBytes(mb) = [0.28899956, 0.28140354, 0.28015518, 0.27990818, 0.2801447, 0.27991104, 0.27990723, 0.27990913, 0.2799101, 0.28010082, 0.2799921, 0.2799673, 0.28144264, 0.27991295, 0.27990818, 0.27993107, 0.2799387, 0.27998447, 0.28002167, 0.27992058, 0.27998066, 0.28098202, 0.28125, 0.28125, 0.28125]]
> {code}
> There are 857 calls to *maybePause*, of which 25 ignored the pause; we can see that the ignored byte counts (such as 0.28125mb) are not small.
> As long as the interval between two *maybePause* calls is relatively long, the pause that should be executed will not be executed.
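A standalone sketch of the arithmetic being described, using the numbers from the report (an 11.2 MB/sec throttle and ~0.28 MB writes): because targetNS is anchored to lastNS, a long gap since the previous call swallows the pause entirely.

```java
public class MaybePauseDemo {
  static final long MIN_PAUSE_NS = 2_000_000; // 2 msec, as in MergeRateLimiter

  // Simplified copy of the maybePause() logic quoted above (no abort handling).
  static long maybePause(long bytes, long curNS, long lastNS, double mbPerSec) {
    double secondsToPause = (bytes / 1024. / 1024.) / mbPerSec;
    long targetNS = lastNS + (long) (1_000_000_000 * secondsToPause);
    long curPauseNS = targetNS - curNS;
    return (curPauseNS <= MIN_PAUSE_NS) ? -1 : curPauseNS;
  }

  public static void main(String[] args) {
    long bytes = (long) (0.28 * 1024 * 1024); // ~0.28 MB written in one call
    double rate = 11.2;                       // MB/sec throttle
    // secondsToPause = 0.28 / 11.2 = 0.025 s, so targetNS = lastNS + 25,000,000 ns.

    // Calls 1 ms apart: a ~24 ms pause is correctly requested.
    System.out.println(maybePause(bytes, 1_000_000L, 0L, rate));     // ~24,000,000

    // Same write, but 5 seconds since the last call: targetNS is far behind
    // curNS, so the method returns -1 and no pause happens at all, even if
    // this single write's instant rate exceeded the limit.
    System.out.println(maybePause(bytes, 5_000_000_000L, 0L, rate)); // -1
  }
}
```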
[jira] [Commented] (LUCENE-10457) LuceneTestCase.createTempDir could randomly return symbolic links
[ https://issues.apache.org/jira/browse/LUCENE-10457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503290#comment-17503290 ] Robert Muir commented on LUCENE-10457:

{quote}
In what ways? If you pass a symlink to a directory to Files.isDirectory it will return true. You have to specifically ask Java for LinkOption.NOFOLLOW_LINKS to get the distinction.
{quote}

I don't care what java does or doesn't do. It is so different at the OS level, and we don't use it, so I don't want it confusing our tests. You can start by reading the relevant manpages in posix or linux to get an idea of what kinds of issues you might run into: with any program, no need to think about java. These kinds of issues always stand out: I'm literally just pasting summaries of the shit listed in {{symlink(4)}}, which is just the introduction.

* can contain relative pointers
* can point to nothing at all (dangling link)
* interact with DAC permissions in strange ways
* have zero error checking when creating them (makes dangling links easy)
* interact with file deletion in strange ways

I'm not even gonna think about lower-level OS issues such as synchronization to disk, where things always get hairy. Then on the java side, we can imagine how all this crap translates:

* stuff like "getting FileStore to a symlink" will probably never work correctly.
* stupid methods like isReadable/isWritable might be buggy.
* dangling links may cause chaos across the APIs.

Sounds like you used symlinks without considering these issues and you got the punishment you deserved :) Instead of thinking "whoah, we got bit because we made a bad design, let's rethink that", now you are trying to insert symlink tests into all of lucene's tests (including stuff like build tools). And randomly!

I'm -1 because I know symlinks are different, I know the problems of symlinks, and I have suffered through them before. There is no need for us to do this because Solr made a bad decision.
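A small standalone demonstration of two items from the list above, the lack of creation-time error checking and dangling links (a sketch; paths are arbitrary):

```java
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class DanglingLinkDemo {
  public static void main(String[] args) throws Exception {
    Path dir = Files.createTempDirectory("demo");
    Path target = dir.resolve("does-not-exist");
    Path link = dir.resolve("dangling");

    // No error checking at creation time: linking to a nonexistent target succeeds.
    Files.createSymbolicLink(link, target);

    System.out.println(Files.exists(link));         // false: exists() follows the link
    System.out.println(Files.isSymbolicLink(link)); // true: the link itself is there
    try {
      Files.readAllBytes(link);                     // following the link fails
    } catch (NoSuchFileException expected) {
      System.out.println("read failed: " + expected.getMessage());
    }
  }
}
```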
[jira] [Commented] (LUCENE-10457) LuceneTestCase.createTempDir could randomly return symbolic links
[ https://issues.apache.org/jira/browse/LUCENE-10457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503322#comment-17503322 ] Dawid Weiss commented on LUCENE-10457:

On the one hand I'm with Mike – it should really work, symlink in path or not. On the other hand, I confirm what Robert says: on Windows, links in the file system are even more complicated (try googling for reparse points and junctions)... It may be very difficult to make this randomized. Perhaps an explicit method createTempDirOrLink() would be better; it can be used explicitly from tests that cover tools and utilities (and not the core).
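A rough sketch of what such a helper might look like inside LuceneTestCase (hypothetical: neither the method nor its fallback behavior exists today; it builds on the existing createTempDir()):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: returns a symlink to a fresh temp dir when the
// platform supports it, otherwise falls back to the plain directory.
public static Path createTempDirOrLink(String prefix) {
  Path dir = createTempDir(prefix);
  try {
    Path link = createTempDir(prefix + "-linkparent").resolve("link");
    return Files.createSymbolicLink(link, dir);
  } catch (IOException | UnsupportedOperationException e) {
    // e.g. Windows without the required privilege: fall back silently.
    return dir;
  }
}
```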
[GitHub] [lucene] mogui commented on a change in pull request #679: Monitor Improvements LUCENE-10422
mogui commented on a change in pull request #679: URL: https://github.com/apache/lucene/pull/679#discussion_r822356271

## File path: lucene/monitor/src/java/org/apache/lucene/monitor/ReadonlyQueryIndex.java
## @@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.monitor;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import org.apache.lucene.search.IndexSearcher;
+import org.apache.lucene.search.Query;
+import org.apache.lucene.search.SearcherManager;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.util.IOUtils;
+
+class ReadonlyQueryIndex extends QueryIndex {
+
+  public ReadonlyQueryIndex(MonitorConfiguration configuration) throws IOException {
+    if (configuration.getDirectoryProvider() == null) {
+      throw new IllegalStateException(
+          "You must specify a Directory when configuring a Monitor as read-only.");
+    }
+    Directory directory = configuration.getDirectoryProvider().get();
+    this.queries = new HashMap<>();
+    this.manager = new SearcherManager(directory, new TermsHashBuilder(termFilters));
+    this.decomposer = configuration.getQueryDecomposer();
+    this.serializer = configuration.getQuerySerializer();
+    this.populateQueryCache(serializer, decomposer);
+  }
+
+  @Override
+  public void commit(List<MonitorQuery> updates) throws IOException {
+    throw new IllegalStateException("Monitor is readOnly cannot commit");
+  }
+
+  @Override
+  long search(final Query query, QueryCollector matcher) throws IOException {
+    QueryBuilder builder = termFilter -> query;
+    return search(builder, matcher);
+  }
+
+  @Override
+  public long search(QueryBuilder queryBuilder, QueryCollector matcher) throws IOException {
+    IndexSearcher searcher = null;
+    try {
+      searcher = manager.acquire();
+      return searchInMemory(queryBuilder, matcher, searcher, this.queries);
+    } finally {
+      if (searcher != null) {
+        manager.release(searcher);
+      }
+    }
+  }
+
+  @Override
+  public void purgeCache() throws IOException {
+    this.populateQueryCache(serializer, decomposer);
+    lastPurged = System.nanoTime();
+  }
+
+  @Override
+  void purgeCache(CachePopulator populator) throws IOException {
+    manager.maybeRefresh();

Review comment: @romseygeek did you have a chance to look at the new commits?