jmckenzie-dev commented on code in PR #181:
URL: https://github.com/apache/cassandra-analytics/pull/181#discussion_r2920973250


##########
cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/bulkwriter/WriterOptions.java:
##########
@@ -132,4 +132,12 @@ public enum WriterOptions implements WriterOption
      * - a failure otherwise
      */
     JOB_TIMEOUT_SECONDS,
+    /**
+     * Option to bypass the secondary index validation check during bulk write job setup.
+     * By default, bulk writes to tables with secondary indexes are rejected.
+     * Setting this option to {@code true} allows bulk writes to proceed on tables that have secondary indexes,
+     * with the understanding that the secondary indexes will NOT be updated by the bulk write and must be
+     * rebuilt separately after the job completes.
+     */
+    SKIP_SECONDARY_INDEX_CHECK,

Review Comment:
   Would putting this in `BulkSparkConf` force it to apply session-wide? That 
is, would a user lose the ability to reason about and easily configure this 
setting on a per-table or per-operation basis, compared with exposing it 
publicly via `WriterOptions`?
   
   My intuition right now is that this should probably have been allowed by 
default all along, with a configurable guardrail to turn it off. So while I'm 
sympathetic to the idea of taking a small step from "don't allow it" to "allow 
it but make it hard to use", the risk of making this easily accessible is that 
users will bulk write to a table that then has a long-running index build 
happening in the background. Other than load on the node, and the application 
possibly reading a partial index if you don't have automation that clearly 
delineates when the bulk insert and reindex have finished before the 
application accesses the table, that doesn't represent a structural or 
correctness risk to the data.
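   
   To make the scoping question concrete, here is a minimal sketch of the two 
lookups side by side. It is an illustration only: `IndexCheckScope`, both key 
names, and the fallback order are hypothetical and are not taken from 
`WriterOptions` or `BulkSparkConf`. It assumes a per-write value overrides a 
session-wide one and that the check stays on when neither is set.

```java
import java.util.Map;

// Hypothetical sketch; none of these names exist in cassandra-analytics.
public final class IndexCheckScope
{
    // Hypothetical key names, chosen for illustration only.
    static final String WRITE_KEY = "skip_secondary_index_check";
    static final String SESSION_KEY = "spark.hypothetical.skip_secondary_index_check";

    // A per-write value wins; otherwise fall back to the session-wide value.
    // With neither set, the check stays on (bulk writes are rejected).
    static boolean skipIndexCheck(Map<String, String> writeOptions, Map<String, String> sessionConf)
    {
        String value = writeOptions.get(WRITE_KEY);
        if (value == null)
        {
            value = sessionConf.get(SESSION_KEY);
        }
        return Boolean.parseBoolean(value); // parseBoolean(null) returns false
    }

    public static void main(String[] args)
    {
        Map<String, String> session = Map.of();               // nothing set session-wide
        Map<String, String> indexedTableWrite = Map.of(WRITE_KEY, "true");
        Map<String, String> otherTableWrite = Map.of();

        // Two writes in the same session can make different choices.
        System.out.println(skipIndexCheck(indexedTableWrite, session));
        System.out.println(skipIndexCheck(otherTableWrite, session));
    }
}
```

   With a session-only setting, both writes in `main` would have to share one 
value, which is the loss of per-table control I'm asking about.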
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

