First, I would invest the most effort in developing good test cases and a good test harness for your ETL software itself. If validation in production does encounter errors, that should be treated as a bug in your code! So be sure to always add those cases to your test harness.
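A minimal sketch of such regression tests in pytest style follows. It assumes a hypothetical `transform_row` function that maps one staging-table row to the Solr document the ETL job would post; the function and field names are illustrative only, not taken from the original setup.

```python
# Hypothetical transform under test: maps one staging-table row (a dict)
# to the Solr document the ETL job would post. Field names are illustrative.
def transform_row(row):
    addresses = row.get("addresses") or []  # treat NULL/missing as "no nested records"
    return {
        "id": str(row["person_id"]),
        "name_s": row["name"].strip(),
        "num_addresses_i": len(addresses),
    }


def test_basic_mapping():
    row = {"person_id": 42, "name": "  Ada Lovelace ", "addresses": ["a", "b"]}
    assert transform_row(row) == {
        "id": "42",
        "name_s": "Ada Lovelace",
        "num_addresses_i": 2,
    }


def test_row_without_nested_records():
    # Edge case worth pinning down: NULL nested data.
    # Any row that fails validation in production belongs in a test like this.
    row = {"person_id": 7, "name": "Bob", "addresses": None}
    assert transform_row(row)["num_addresses_i"] == 0
```

Running `pytest` on the file picks up both `test_` functions; each production validation failure then becomes one more function of this kind.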
Also, the row-level validation can and should be driven by metadata. I'm assuming you have a mapping between RDBMS table names and Solr entity types? And, for any given entity type, a table that maps Solr field names and datatypes to their RDBMS equivalents? My assumption would be that the ETL process itself uses such metadata, and the same data could be used for production data validation. My inclination would be to integrate granular (row-level) validation into the ETL job itself.

For summary validation, if you re-index from scratch every time, just run some facet queries and compare the results to the equivalent summaries of the SQL input data (assuming you are familiar with SQL "group by" and "having" clauses). If you use incremental loads, make sure you can associate the loaded data with the ETL job that loaded it (timestamp, batch ID, etc.). Then simply scope the facet queries to the batch in question and compare them to the SQL summary.

-----Original Message-----
From: marotosg [mailto:marot...@gmail.com]
Sent: Monday, March 02, 2015 6:32 AM
To: solr-user@lucene.apache.org
Subject: Validate data Indexed and versioning

Hi,

I am trying to define a way of validating whether my index has the same content as my database. I am indexing a very complex denormalized version of the database with many items and nested documents. I have an indexing service which pulls records from a staging table (created by an ETL process) and transforms them into XML, which is then posted to Solr.

Is there any general approach to check whether an indexed document matches its database row? One option I see is to create an additional service that runs against Solr and the database and validates that both hold the same data, but this is going to be very resource-intensive. I was leaning more towards having Solr report which records were indexed and their content, such as the number of nested docs of type A, B, etc.

Any suggestions would help.
Thanks,
Regards
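The metadata-driven row-level check described in the reply could look roughly like the sketch below. Assumptions: `FIELD_MAP` is a hypothetical in-memory stand-in for the field-mapping table, both sides are coerced to the mapped Python type before comparing, and only single-valued fields are handled.

```python
# Hypothetical per-entity metadata: Solr field -> (RDBMS column, Python type).
# In practice this would come from the same mapping table the ETL job reads.
FIELD_MAP = {
    "person": {
        "id": ("person_id", str),
        "name_s": ("name", str),
        "age_i": ("age", int),
    },
}


def validate_row(entity_type, db_row, solr_doc, field_map=FIELD_MAP):
    """Compare one DB row with its Solr document using the field metadata.

    Returns a list of (solr_field, db_value, solr_value) tuples for every
    mapped field whose values disagree. Assumes single-valued fields.
    """
    mismatches = []
    for solr_field, (column, py_type) in field_map[entity_type].items():
        db_value = db_row.get(column)
        solr_value = solr_doc.get(solr_field)
        if db_value is None or solr_value is None:
            # A value present on only one side is a mismatch; absent on both is fine.
            if db_value is not None or solr_value is not None:
                mismatches.append((solr_field, db_value, solr_value))
            continue
        try:
            same = py_type(db_value) == py_type(solr_value)
        except (TypeError, ValueError):
            same = False  # e.g. a non-numeric string in an int field
        if not same:
            mismatches.append((solr_field, db_value, solr_value))
    return mismatches
```

For example, `validate_row("person", {"person_id": 42, "name": "Ada", "age": 36}, {"id": "42", "name_s": "Ada", "age_i": 37})` returns `[("age_i", 36, 37)]`. Because the mapping is data rather than code, adding a field to the ETL metadata automatically adds it to the check.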
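For the summary check suggested in the reply, here is a sketch that compares Solr facet counts with a SQL `GROUP BY`, scoped to one incremental batch. Assumptions: every Solr document carries a hypothetical `batch_id_s` field, the staging table has a `batch_id` column, and `sqlite3` stands in for the real RDBMS. The Solr request parameters (`rows`, `fq`, `facet`, `facet.field`, `facet.limit`, `facet.mincount`) are standard, but actually sending the request is left out.

```python
import sqlite3


def solr_facet_params(field, batch_id, batch_field="batch_id_s"):
    """Params for a Solr /select request returning only facet counts for
    `field`, restricted to one ETL batch. `batch_field` is hypothetical:
    it assumes the ETL job stamps each document with its batch ID."""
    return {
        "q": "*:*",
        "fq": f'{batch_field}:"{batch_id}"',
        "rows": 0,             # counts only, no documents
        "facet": "true",
        "facet.field": field,
        "facet.limit": -1,     # every value, not just the default top 100
        "facet.mincount": 1,
        "wt": "json",
    }


def parse_facet_counts(solr_json, field):
    """Solr's default JSON facet format is a flat [value, count, value, count, ...] list."""
    flat = solr_json["facet_counts"]["facet_fields"][field]
    return {str(v): c for v, c in zip(flat[::2], flat[1::2])}


def sql_group_counts(conn, table, column, batch_id):
    # Identifiers cannot be bound as parameters; they come from trusted metadata here.
    # NULLs are excluded because Solr field facets do not count missing values.
    sql = (
        f"SELECT {column}, COUNT(*) FROM {table} "
        f"WHERE batch_id = ? AND {column} IS NOT NULL GROUP BY {column}"
    )
    return {str(v): c for v, c in conn.execute(sql, (batch_id,))}


def compare_counts(sql_counts, solr_counts):
    """Return {value: (sql_count, solr_count)} for every value whose counts differ."""
    keys = set(sql_counts) | set(solr_counts)
    return {
        k: (sql_counts.get(k, 0), solr_counts.get(k, 0))
        for k in sorted(keys)
        if sql_counts.get(k, 0) != solr_counts.get(k, 0)
    }
```

If the SQL side reports `{"A": 2, "B": 1}` for a batch and the facet response is `["A", 2]`, `compare_counts` returns `{"B": (1, 0)}`, i.e. one type-B nested document is missing from the index. Pairing the SQL column (say `doc_type`) with its Solr field (say `doc_type_s`) is exactly what the metadata mapping table provides; for a full re-index, drop the batch filter on both sides.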