Hi,

I am returning to Solr with limited experience in Solr 5.2 and am now
diving into Solr 6.1. My problem is how to specify a tailored schema.xml.

After reading several tutorials and book chapters about how to configure
schema.xml, I have a basic understanding of its concepts and structure.

As an exercise, I created a core "cinema", into which I intend to load
example/films/films.xml, using the command:

bin/solr create -c cinema

This creates server/solr/cinema and, inside it, conf/managed-schema. The
comment inside managed-schema says: 'This is the Solr schema file. This
file should be named "schema.xml"' and "This example schema is the
recommended starting point for users."

Unfortunately, I am having a hard time using managed-schema as a starting
point! The problem is that I want to understand how to configure a
lightweight schema.xml that is tailored to a document structure which is
pretty much under my control. For instance, the films.xml docs have such a
simple structure that a schema.xml as simple as this should be sufficient:

<schema name="hubert" version="1.6">
    <fields>
        <field name="id" type="string" indexed="true" stored="true" multiValued="false"/>
        <field name="directed_by" type="string" indexed="true" stored="true" multiValued="true"/>
        <field name="name" type="string" indexed="true" stored="true" multiValued="false"/>
        <field name="genre" type="string" indexed="true" stored="true" multiValued="true"/>
        <field name="initial_release_date" type="date" indexed="true" stored="true"/>
    </fields>
    <uniqueKey>id</uniqueKey>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
    <fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
</schema>

However, the managed-schema provided in
example/techproducts/solr/films/conf has 480 lines instead of my 12 lines.
It is full of fieldType and dynamicField specifications that never apply to
this data.

Unfortunately, my schema.xml doesn't work with the rest of the conf settings
generated by bin/solr create -c cinema. The problem seems to be the
autogenerated solrconfig.xml. Here again, the file is full of configuration
that I probably don't want. In particular, everything about "Add unknown
fields to the schema" is something I definitely don't want when I know the
data to be indexed. It looks like many other heuristics and clever
procedures are configured here that might be useful when you don't know
your data structure. The problem is that I don't understand what is going
on behind the scenes. And when you know your data, it is better to
understand every setting than to trust "clever" default configurations.
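
For reference, the Solr Reference Guide describes a schemaFactory setting
in solrconfig.xml that decides whether Solr uses a managed schema or a
hand-edited schema.xml. A minimal sketch of the classic variant follows;
whether this alone is enough for the 6.1.0 setup described here is an
assumption and untested:

```xml
<!-- Sketch, not verified against this setup: goes directly inside <config> in solrconfig.xml. -->
<!-- ClassicIndexSchemaFactory reads conf/schema.xml and does not allow schema changes at runtime. -->
<schemaFactory class="ClassicIndexSchemaFactory"/>
```

With this factory the schema is maintained by hand only, which matches the
case of known, controlled data.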

In fact, my simple schema.xml works fine with a similarly simple
solrconfig.xml:

<config>
    <luceneMatchVersion>4.10.4</luceneMatchVersion>
    <requestHandler name="standard" class="solr.StandardRequestHandler" default="true"/>
    <requestHandler name="/update" class="solr.UpdateRequestHandler"/>
    <requestHandler name="/admin/" class="org.apache.solr.handler.admin.AdminHandlers"/>
    <admin>
        <defaultQuery>*:*</defaultQuery>
    </admin>
</config>

Again, my simple solrconfig.xml contains only 9 lines, compared to 1482
lines in the autogenerated solrconfig.xml.

Yet my two simple config files (schema.xml and solrconfig.xml) are not a
proper solution, as they work only when solrconfig.xml is configured with

    <luceneMatchVersion>4.10.4</luceneMatchVersion>

and fail when it is configured (as in the autogenerated solrconfig.xml) with

    <luceneMatchVersion>6.1.0</luceneMatchVersion>

Bottom line: it would be great to get guidance on how to configure a
minimal schema.xml and solrconfig.xml, e.g. for films.xml, that work under
6.1.0. The config files generated with "bin/solr create ..." are quite the
opposite. These configs are probably useful when you want to index data
with unpredictable and heterogeneous structures. But for homogeneous data
with controlled structures, it is much better to know how to define a
tailored, minimal schema.xml and solrconfig.xml.

Any hints are appreciated!

Regards,
Immanuel
