mikemccand commented on code in PR #12797:
URL: https://github.com/apache/lucene/pull/12797#discussion_r1400425827


##########
lucene/core/src/java/org/apache/lucene/index/CheckIndex.java:
##########
@@ -442,19 +442,19 @@ public void close() throws IOException {
     IOUtils.close(writeLock);
   }
 
-  private boolean doSlowChecks;
+  private int level;
 
   /**
-   * If true, additional slow checks are performed. This will likely 
drastically increase time it
-   * takes to run CheckIndex!
+   * Sets Level, the higher the value, the more additional checks are 
performed. This will likely
+   * drastically increase time it takes to run CheckIndex! See {@link Level}
    */
-  public void setDoSlowChecks(boolean v) {
-    doSlowChecks = v;
+  public void setLevel(int v) {
+    level = v;

Review Comment:
   Hmm could we `throw IllegalArgumentException` if the limit is too low or 
high?  I get annoyed with tools that tell me "just add another `-v` for more 
verbosity" but don't tell me when I can/should stop :)



##########
lucene/MIGRATE.md:
##########
@@ -101,6 +101,13 @@ The deprecated getter for the `Executor` that was 
optionally provided to the `In
 has been removed. Users that want to execute concurrent tasks should rely 
instead on the `TaskExecutor` 
 that the searcher holds, retrieved via `IndexSearcher#getTaskExecutor`.
 
+### CheckIndex params -slow and -fast are deprecated, replaced by -level X 
(GITHUB#11023)
+
+The `CheckIndex` former `-fast` behaviour of performing checksum checks only, 
is now the default.
+Added a new parameter: `-level X`, to set the detail level of the index check. 
The higher the value, the more checks are performed.
+Sample `-level` usage: `1` (Default) - Checksum checks only, `2` - all level 1 
checks as well as logical integrity checks, `3` - all
+level 2 checks as well as slow checks.

Review Comment:
   Are valid levels 1 .. 3 inclusive?  Does it go to 4?



##########
lucene/core/src/java/org/apache/lucene/index/CheckIndex.java:
##########
@@ -3744,7 +3718,7 @@ public static Status.TermVectorStatus testTermVectors(
                         + " but FieldInfo has storeTermVector=false");
               }
 
-              if (doSlowChecks) {
+              if (level >= Level.MIN_LEVEL_FOR_SLOW_CHECKS) {
                 Terms terms = tfv.terms(field);

Review Comment:
   Oh I see -- we are already doing the cross posting/TVs check for 
`MIN_LEVEL_FOR_SLOW_CHECKS`.  I think we can remove the above `TODO` now?  You 
did it!
   
   Edit: hmm I'm confused -- were we already cross-checking when `doSlowChecks 
= true` before?  Is this the only thing enabled by slow checks?



##########
lucene/core/src/java/org/apache/lucene/index/CheckIndex.java:
##########
@@ -4127,15 +4123,33 @@ public static Options parseOptions(String[] args) {
     int i = 0;
     while (i < args.length) {
       String arg = args[i];
-      if ("-fast".equals(arg)) {
-        opts.doChecksumsOnly = true;
+      if ("-level".equals(arg)) {
+        if (i == args.length - 1) {
+          throw new IllegalArgumentException("ERROR: missing value for -level 
option");
+        }
+        i++;
+        int level = Integer.parseInt(args[i]);
+        if (level < Level.MIN_VALUE || level > Level.MAX_VALUE) {
+          throw new IllegalArgumentException(
+              String.format(
+                  "ERROR: value for -level option is out of bounds. Please use 
a value "
+                      + "from '%d'->'%d'",
+                  Level.MIN_VALUE, Level.MAX_VALUE));
+        }
+        opts.level = level;
+      } else if ("-fast".equals(arg)) {
+        // Deprecated. Remove in Lucene 11.
+        System.err.println("-fast is deprecated, verifying file checksums only 
is now the default");

Review Comment:
   For full disambiguation, could you add `-level 0` into the message making it 
clear that is the equivalent if what `-fast` used to be and is now the default?



##########
lucene/core/src/java/org/apache/lucene/index/CheckIndex.java:
##########
@@ -4213,10 +4230,6 @@ public static Options parseOptions(String[] args) {
       throw new IllegalArgumentException("ERROR: cannot specify both -exorcise 
and -segment");
     }
 
-    if (opts.doChecksumsOnly && opts.doSlowChecks) {
-      throw new IllegalArgumentException("ERROR: cannot specify both -fast and 
-slow");

Review Comment:
   Nice -- a level is so much better!



##########
lucene/core/src/java/org/apache/lucene/index/CheckIndex.java:
##########
@@ -4127,15 +4123,33 @@ public static Options parseOptions(String[] args) {
     int i = 0;
     while (i < args.length) {
       String arg = args[i];
-      if ("-fast".equals(arg)) {
-        opts.doChecksumsOnly = true;
+      if ("-level".equals(arg)) {
+        if (i == args.length - 1) {
+          throw new IllegalArgumentException("ERROR: missing value for -level 
option");
+        }
+        i++;
+        int level = Integer.parseInt(args[i]);
+        if (level < Level.MIN_VALUE || level > Level.MAX_VALUE) {
+          throw new IllegalArgumentException(
+              String.format(
+                  "ERROR: value for -level option is out of bounds. Please use 
a value "
+                      + "from '%d'->'%d'",
+                  Level.MIN_VALUE, Level.MAX_VALUE));

Review Comment:
   Awesome, thanks for catching this entry point (command-line).  I think we 
should catch invalid API calls too?



##########
lucene/core/src/java/org/apache/lucene/index/CheckIndex.java:
##########
@@ -3661,7 +3640,7 @@ private static void checkDocValues(
    */
   public static Status.TermVectorStatus testTermVectors(CodecReader reader, 
PrintStream infoStream)
       throws IOException {
-    return testTermVectors(reader, infoStream, false, false, false);
+    return testTermVectors(reader, infoStream, false, 
Level.MIN_LEVEL_FOR_SLOW_CHECKS, false);

Review Comment:
   Hmm did we intentionally change the slowness here?  Previously it was fast 
(`doSlowChecks = false`) but now it's slow?



##########
lucene/core/src/java/org/apache/lucene/index/CheckIndex.java:
##########
@@ -2073,9 +2056,9 @@ private static Status.TermIndexStatus checkFields(
               Impacts impacts = impactsEnum.getImpacts();
               checkImpacts(impacts, doc);
               maxFreq = Integer.MAX_VALUE;
-              for (int level = 0; level < impacts.numLevels(); ++level) {
-                if (impacts.getDocIdUpTo(level) >= max) {
-                  List<Impact> perLevelImpacts = impacts.getImpacts(level);
+              for (int impactsLevel = 0; impactsLevel < impacts.numLevels(); 
++impactsLevel) {

Review Comment:
   Heh name collision!  Naming is the hardest part of our jobs...



##########
lucene/core/src/java/org/apache/lucene/index/CheckIndex.java:
##########
@@ -4191,7 +4207,8 @@ public static Options parseOptions(String[] args) {
               + "If no package is specified the "
               + FSDirectory.class.getPackage().getName()
               + " package will be used.\n"
-              + "\n"
+              + "CheckIndex only verifies file checksums as default.\n"
+              + "Use -level with value of '2' or higher if you also want to 
check segment file contents.\n\n"

Review Comment:
   Hmm should this be `1 or higher`?  0 == fast, 1 == some contents checked, 2 
== some more contents checked (hmm what's different here?), 3 == insanely slow 
(cross verify postings/TVs)?
   
   Edit OK I see -- `1` is the min level, so `1 == fast (validate checksums)`, 
`2 == checkIntegrity` (hmm not sure what this does beyond checksums), `3 == 
insanely slow (cross verify postings/TVs)` -- are there any other checks 
enabled at level 3 besides the cross verification?



##########
lucene/core/src/java/org/apache/lucene/index/CheckIndex.java:
##########
@@ -2479,15 +2462,11 @@ public static Status.TermIndexStatus 
testPostings(CodecReader reader, PrintStrea
    * @lucene.experimental
    */
   public static Status.TermIndexStatus testPostings(
-      CodecReader reader,
-      PrintStream infoStream,
-      boolean verbose,
-      boolean doSlowChecks,
-      boolean failFast)
+      CodecReader reader, PrintStream infoStream, boolean verbose, int level, 
boolean failFast)
       throws IOException {
 
-    // TODO: we should go and verify term vectors match, if
-    // doSlowChecks is on...
+    // TODO: we should go and verify term vectors match, if the Level is high 
enough to
+    // include slow checks

Review Comment:
   Maybe we need a new `MIN_LEVEL_FOR_EXCEPTIONALLY_SLOW_CHECKS`!  
Cross-verification of postings and term vectors is indeed heinously slow...
   
   But let's not do this in this PR -- this is already an awesome change -- PnP 
(progress not perfection, not plug and play).



##########
lucene/core/src/java/org/apache/lucene/index/CheckIndex.java:
##########
@@ -4113,6 +4086,29 @@ private static int doMain(String[] args) throws 
IOException, InterruptedExceptio
     }
   }
 
+  /** Class with static variables with information about CheckIndex's -level 
parameter. */
+  public static class Level {
+    private Level() {}
+
+    /** Minimum valid level. */
+    public static final int MIN_VALUE = 1;
+
+    /** Maximum valid level. */
+    public static final int MAX_VALUE = 3;

Review Comment:
   Aha, here's my answer from first comment: `4` ("beyond slow") is not legal.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to