On 12/3/2015 8:09 AM, Kelly, Frank wrote:
> Just wondering if folks have any suggestions on using Schema.xml vs. Managed
> Schema going forward.
>
> Our deployment will be
>> 3 Zk, 3 Shards, 3 replicas
>> Copies of each collection in 5 AWS regions (EBS-backed EC2 instances)
>> Planning at least 1 Billion objects indexed (currently < 100 million)
>
> I'm sure our schema.xml will have changes and fixes and just wondering which
> approach (schema.xml vs. managed)
> will be easier to deploy / maintain?

In production, you probably want a schema that cannot change. The managed schema that you find in the data-driven configuration will automatically add new fields to the schema if unknown fields are encountered in your data, which means that if a typo somehow makes it through your indexing process, you may not know about the problem until later. With a static schema, an indexing request that has an error in a field name will be rejected and you will receive an error, which is how I would want Solr to behave.

The data-driven schema is good for prototyping, but because the field definitions that get added are just a guess by Solr, I would manually edit the schema before going into production. Once in production I would want to be in complete manual control of the schema.

Thanks,
Shawn
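
To make the difference described above concrete, here is a minimal Python sketch of the two behaviors. It is a toy model, not Solr code: `index_document`, `guess_type`, `UnknownFieldError`, and the dict-based schema are all hypothetical names invented for illustration, and the type guessing is a crude stand-in for whatever Solr actually does.

```python
class UnknownFieldError(ValueError):
    """Raised when a document uses a field the schema does not define."""


def guess_type(value):
    # Crude stand-in for field-type guessing (assumption: Solr's rules differ).
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "string"


def index_document(schema, doc, mutable):
    """Check `doc` against `schema`, a dict of field name -> type name.

    mutable=False models a static schema: unknown fields are rejected.
    mutable=True models a data-driven schema: unknown fields are added
    with a guessed type. Returns the list of field names that were added.
    """
    added = []
    for name, value in doc.items():
        if name in schema:
            continue
        if not mutable:
            raise UnknownFieldError(f"unknown field '{name}'")
        schema[name] = guess_type(value)
        added.append(name)
    return added


if __name__ == "__main__":
    doc = {"id": "1", "titel": "misspelled field name"}

    static_schema = {"id": "string", "title": "string"}
    try:
        index_document(static_schema, doc, mutable=False)
    except UnknownFieldError as err:
        print("static schema rejected the document:", err)

    managed_schema = {"id": "string", "title": "string"}
    added = index_document(managed_schema, doc, mutable=True)
    print("data-driven schema silently added:", added)
```

With the static schema the misspelled `titel` field fails immediately at indexing time. With the mutable one it is accepted and becomes a permanent, guessed field, which is the situation where the problem only shows up later.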