Hi, I am returning to Solr with limited experience in Solr 5.2 and am now diving into Solr 6.1. My problem is how to specify a tailored schema.xml.
After reading several tutorials and book chapters about how to configure schema.xml, I have a basic understanding of its concepts and structure. As an exercise I created a core "cinema", into which I intend to load example/films/films.xml, using the command:

    bin/solr create -c cinema

This creates server/solr/cinema and, inside it, conf/managed-schema. The comments inside managed-schema say: 'This is the Solr schema file. This file should be named "schema.xml"' and "This example schema is the recommended starting point for users." Unfortunately, I have a hard time using managed-schema as a starting point!

The problem is that I want to understand how to configure a lightweight schema.xml tailored to a document structure that is pretty much under my control. For instance, the films.xml docs have such a simple structure that a schema.xml as simple as this should be sufficient:

    <schema name="hubert" version="1.6">
      <fields>
        <field name="id" type="string" indexed="true" stored="true" multiValued="false"/>
        <field name="directed_by" type="string" indexed="true" stored="true" multiValued="true"/>
        <field name="name" type="string" indexed="true" stored="true" multiValued="false"/>
        <field name="genre" type="string" indexed="true" stored="true" multiValued="true"/>
        <field name="initial_release_date" type="date" indexed="true" stored="true"/>
      </fields>
      <uniqueKey>id</uniqueKey>
      <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
      <fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
    </schema>

However, the managed-schema provided in example/techproducts/solr/films/conf has 480 lines instead of my 12 lines. It is full of fieldType and dynamicField specifications that never apply to this data.

Unfortunately, my schema.xml doesn't work with the rest of the conf settings generated by bin/solr create -c cinema. The problem seems to be the autogenerated solrconfig.xml.
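My guess, which I have not verified, is that the generated solrconfig.xml selects the managed schema and therefore never reads a hand-written conf/schema.xml. If that assumption holds, a declaration along the following lines in solrconfig.xml might switch the core to the classic schema.xml. Where it goes inside <config> is also my assumption:

```xml
<!-- Assumption, unverified: placed directly inside <config> in conf/solrconfig.xml,
     this asks Solr to read the hand-edited conf/schema.xml
     instead of conf/managed-schema. -->
<schemaFactory class="ClassicIndexSchemaFactory"/>
```

I have not confirmed whether this alone is enough for the cinema core.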
The autogenerated solrconfig.xml is likewise full of configuration that I probably don't want. In particular, everything about "Add unknown fields to the schema" is something I definitely don't want when I know the data to be indexed. It looks like many other heuristics and clever procedures are configured there that might be useful when you don't know your data structure. The problem is that I don't understand what is going on behind the scenes. And when you know your data, it is better to understand every setting than to trust "clever" default configurations.

In fact, my simple schema.xml works fine with a similarly simple solrconfig.xml:

    <config>
      <luceneMatchVersion>4.10.4</luceneMatchVersion>
      <requestHandler name="standard" class="solr.StandardRequestHandler" default="true"/>
      <requestHandler name="/update" class="solr.UpdateRequestHandler"/>
      <requestHandler name="/admin/" class="org.apache.solr.handler.admin.AdminHandlers"/>
      <admin>
        <defaultQuery>*:*</defaultQuery>
      </admin>
    </config>

Again, my simple solrconfig.xml contains only 9 lines, compared to 1482 lines in the autogenerated solrconfig.xml.

Yet my two simple config files (schema.xml and solrconfig.xml) are not a proper solution: they work only when solrconfig.xml is configured with

    <luceneMatchVersion>4.10.4</luceneMatchVersion>

and fail when it is configured (as in the autogenerated solrconfig.xml) with

    <luceneMatchVersion>6.1.0</luceneMatchVersion>

Bottom line: it would be great to get guidance on how to configure a minimal schema.xml and solrconfig.xml for, e.g., films.xml that work under 6.1.0. The config files generated with "bin/solr create ..." are quite the opposite. Those configs are probably useful when you want to index data with unpredictable and heterogeneous structures. But for homogeneous data with a controlled structure, it is much better to know how to define a tailored, minimal schema.xml and solrconfig.xml.

Any hints are appreciated!

Regards,
Immanuel