jmckenzie-dev commented on code in PR #181:
URL:
https://github.com/apache/cassandra-analytics/pull/181#discussion_r2920973250
##########
cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/bulkwriter/WriterOptions.java:
##########
@@ -132,4 +132,12 @@ public enum WriterOptions implements WriterOption
* - a failure otherwise
*/
JOB_TIMEOUT_SECONDS,
+ /**
+ * Option to bypass the secondary index validation check during bulk write job setup.
+ * By default, bulk writes to tables with secondary indexes are rejected.
+ * Setting this option to {@code true} allows bulk writes to proceed on tables that have secondary indexes,
+ * with the understanding that the secondary indexes will NOT be updated by the bulk write and must be
+ * rebuilt separately after the job completes.
+ */
+ SKIP_SECONDARY_INDEX_CHECK,
Review Comment:
Would putting this in `BulkSparkConf` force it to be session-wide? That is,
would a user lose the ability to reason about and easily configure this
setting on a per-table or per-operation basis, which exposing it publicly via
`WriterOptions` gives them?
My intuition right now is that this probably should have been enabled by
default all this time, with a configurable guardrail to turn it off. So while
I'm sympathetic to the idea of taking a small step from "don't allow it" to
"allow it but make it hard to use", the risk of making this easily accessible
is that users will bulk write to a table that then has a long-running index
build happening in the background. Other than load on the node, and the
application possibly reading a partial index if you don't have automation
that clearly delineates when the bulk insert and reindex finish before the
application accesses the table, that doesn't represent a structural or
correctness risk to the data.
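To make the scoping question concrete, here is a rough sketch of how a per-write option could take precedence over a session-wide value. It is an illustration only: the class, the `skip_secondary_index_check` key string, and both methods are hypothetical names, not the actual `BulkSparkConf` or `WriterOptions` code.

```java
import java.util.Map;

// Sketch only: every name here is hypothetical and does not mirror the real
// BulkSparkConf or WriterOptions code in cassandra-analytics.
final class IndexCheckScopeSketch
{
    // Assumed option key, named after the enum constant added in this PR.
    static final String SKIP_KEY = "skip_secondary_index_check";

    // A per-write option wins; the session-wide value is only a fallback,
    // and the check stays on when neither scope sets the key.
    static boolean skipIndexCheck(Map<String, String> sessionConf,
                                  Map<String, String> writeOptions)
    {
        String sessionValue = sessionConf.getOrDefault(SKIP_KEY, "false");
        return Boolean.parseBoolean(writeOptions.getOrDefault(SKIP_KEY, sessionValue));
    }

    // Rejects the job when the table has secondary indexes and the check is not skipped.
    static void validate(String table,
                         boolean hasSecondaryIndexes,
                         Map<String, String> sessionConf,
                         Map<String, String> writeOptions)
    {
        if (hasSecondaryIndexes && !skipIndexCheck(sessionConf, writeOptions))
        {
            throw new UnsupportedOperationException(
                "Bulk write to " + table + " rejected: table has secondary indexes");
        }
    }

    public static void main(String[] args)
    {
        Map<String, String> session = Map.of();

        // One write opts in for its own table only.
        validate("ks.indexed_a", true, session, Map.of(SKIP_KEY, "true"));

        // A second write in the same session keeps the default guardrail.
        try
        {
            validate("ks.indexed_b", true, session, Map.of());
        }
        catch (UnsupportedOperationException e)
        {
            System.out.println(e.getMessage());
        }
    }
}
```

With this shape, two writes in the same session can make different choices, which is the per-table control a session-only setting would take away.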
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]