In thinking about making the entire Solr schema REST-API-addressable 
(SOLR-4898), I'd like to be able to add arbitrary metadata at both the top 
level of the schema and at each leaf node, and allow read/write access to that 
metadata via the REST API.

Some uses I've thought of for such a facility: 

1. The managed schema now drops XML comments from schema.xml upon conversion to 
managed-schema format, but it would be much better if these were somehow 
preserved, as well as round-trippable when retrieving the schema and its 
constituents via the REST API.

2. Some comments in the example schemas don't refer to just one or to all leaf 
nodes, but rather to a group of them. I'd like to be able to group nodes by 
adding same-named "tags" to multiple nodes, and also to have an optional 
top-level "tag description", which could then be presented alongside the 
tagged nodes in various output formats.

3. Some comments in the example schema are documentation about a feature, e.g. 
copyFields.  A top-level "documentation" annotation could take a leaf node 
element name (or maybe an XPath? probably overkill) and apply to all matching 
elements. 

4. When modifying the schema via REST API, a "last-modified" annotation could 
be automatically added.

5. There were a couple of user complaints recently when schema.xml parsing was 
tightened to disallow unknown attributes on field declarations (SOLR-4641): 
people were storing their own information there.  User-level metadata would 
support this in a round-trippable way; I'm thinking we could restrict it to 
flat, string-typed key/value pairs with no nested structure.
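To make items 4 and 5 a bit more concrete, here is a minimal Python sketch (standard-library xml.etree.ElementTree only) of how flat per-node metadata might be read, written back, and stamped with a last-modified value. The helper names (read_flat_annotation, write_flat_annotation, stamp_last_modified) are hypothetical, and the element names follow the tentative syntax in the schema.xml examples below; none of this is an existing Solr API.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def read_flat_annotation(annotation: ET.Element) -> list[tuple[str, str]]:
    """Return (key, value) pairs from a per-node <annotation>, in document order.

    Hypothetical rule for item 5: every child must be a leaf with no
    attributes and no child elements; its tag is the key, its text the value.
    A list (not a dict) keeps order and repeated keys, so it round-trips.
    """
    pairs = []
    for child in annotation:
        if len(child) > 0 or child.attrib:
            raise ValueError(f"<{child.tag}> is not a flat key/value entry")
        pairs.append((child.tag, (child.text or "").strip()))
    return pairs


def write_flat_annotation(pairs: list[tuple[str, str]]) -> ET.Element:
    """Rebuild an <annotation> element from (key, value) pairs."""
    annotation = ET.Element("annotation")
    for key, value in pairs:
        ET.SubElement(annotation, key).text = value
    return annotation


def stamp_last_modified(annotation: ET.Element, now: datetime | None = None) -> str:
    """Set (or replace) <last-modified>, as a REST write might do (item 4)."""
    if now is None:
        now = datetime.now(timezone.utc)
    stamp = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    elem = annotation.find("last-modified")
    if elem is None:
        elem = ET.SubElement(annotation, "last-modified")
    elem.text = stamp
    return stamp
```

For the <field name="text"> annotation in the examples below, read_flat_annotation would yield [("description", "catchall field"), ("visibility", "public")], and write_flat_annotation turns that back into an equivalent element (ignoring indentation).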

W3C XML Schema has a similar facility: 
<http://www.w3.org/TR/2004/REC-xmlschema-1-20041028/structures.html#element-annotation>.

Thoughts?

Some concrete examples of what I'm thinking of in schema.xml format 
(syntax/naming as yet unsettled):

<schema name="example" version="1.5">
  <annotation>
    <description element="tag" content="plain-numeric-field-types">
      Plain numeric field types store and index the text value verbatim.
    </description>
    <documentation element="copyField">
      copyField commands copy one field to another at the time a document
      is added to the index.  It's used either to index the same field differently,
      or to add multiple fields to the same field for easier/faster searching.
    </documentation>
    <last-modified>2014-03-08T12:14:02Z</last-modified>
    …
  </annotation>
…
  <fieldType name="pint" class="solr.IntField">
    <annotation>
      <tag>plain-numeric-field-types</tag>
    </annotation>
  </fieldType>
  <fieldType name="plong" class="solr.LongField">
    <annotation>
      <tag>plain-numeric-field-types</tag>
    </annotation>
  </fieldType>
  …
  <copyField source="cat" dest="text">
    <annotation>
      <todo>Should this field really be copied to the catchall text field?</todo>
    </annotation>
  </copyField>
  …
  <field name="text" type="text_general">
    <annotation>
      <description>catchall field</description>
      <visibility>public</visibility>
    </annotation>
  </field>
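A rough sketch of how a consumer (say, a documentation renderer) might resolve the top-level annotations in the example against leaf nodes: a tag description attaches to every node whose own annotation carries the matching <tag>, and a documentation entry attaches to every element with the given element name. This is Python with xml.etree.ElementTree; SAMPLE is a trimmed, well-formed copy of the example with the elisions removed, collect_top_level_docs and node_key are hypothetical names, and it assumes leaf nodes are direct children of <schema>.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the example schema (elisions removed).
SAMPLE = """<schema name="example" version="1.5">
  <annotation>
    <description element="tag" content="plain-numeric-field-types">
      Plain numeric field types store and index the text value verbatim.
    </description>
    <documentation element="copyField">
      copyField commands copy one field to another at the time a document
      is added to the index.
    </documentation>
    <last-modified>2014-03-08T12:14:02Z</last-modified>
  </annotation>
  <fieldType name="pint" class="solr.IntField">
    <annotation><tag>plain-numeric-field-types</tag></annotation>
  </fieldType>
  <fieldType name="plong" class="solr.LongField">
    <annotation><tag>plain-numeric-field-types</tag></annotation>
  </fieldType>
  <copyField source="cat" dest="text">
    <annotation><todo>Should this field really be copied?</todo></annotation>
  </copyField>
  <field name="text" type="text_general"/>
</schema>"""


def node_key(elem: ET.Element) -> str:
    """Readable identity for a leaf node (hypothetical naming scheme)."""
    if "name" in elem.attrib:
        return f"{elem.tag}:{elem.get('name')}"
    return f"{elem.tag}:{elem.get('source')}->{elem.get('dest')}"


def collect_top_level_docs(schema: ET.Element) -> dict[str, list[str]]:
    """Map each leaf node to the top-level annotation texts that apply to it."""
    result: dict[str, list[str]] = {}
    top = schema.find("annotation")
    if top is None:
        return result
    leaves = [e for e in schema if e.tag != "annotation"]
    for note in top:
        text = " ".join((note.text or "").split())  # collapse indentation
        target = note.get("element")
        if note.tag == "description" and target == "tag":
            # Tag description: every leaf whose own annotation has a matching <tag>.
            wanted = note.get("content")
            matches = [e for e in leaves
                       if any(t.text == wanted for t in e.findall("annotation/tag"))]
        elif note.tag == "documentation" and target:
            # Element documentation: every leaf with that element name.
            matches = [e for e in leaves if e.tag == target]
        else:
            continue  # e.g. <last-modified> is not attached to leaf nodes
        for e in matches:
            result.setdefault(node_key(e), []).append(text)
    return result
```

Calling collect_top_level_docs(ET.fromstring(SAMPLE)) attaches the tag description to both fieldType:pint and fieldType:plong, attaches the copyField documentation to copyField:cat->text, and leaves field:text with nothing, since it has neither a tag nor a top-level documentation entry.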
