An annotation field would be much better than the current "anything goes" 
schema-less schema.xml.

Has anyone built an XML Schema for schema.xml? I know schema.xml is extensible, 
but it would be worth a try.
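
As a rough sketch of what that could look like, here is an XML Schema fragment 
covering only the proposed leaf-level annotation element. The element names are 
taken from the examples quoted below, the xs:dateTime type for last-modified is 
my assumption, and none of this is settled syntax or an existing Solr schema. 
It does not cover the top-level description/documentation elements, which carry 
attributes.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical leaf-level annotation. Element names come from the
       examples in the quoted message; the syntax is not settled. -->
  <xs:element name="annotation">
    <xs:complexType>
      <!-- Any number of the known children, in any order.
           Each child holds a flat simple value with no nested structure. -->
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element name="tag" type="xs:string"/>
        <xs:element name="description" type="xs:string"/>
        <xs:element name="todo" type="xs:string"/>
        <xs:element name="visibility" type="xs:string"/>
        <!-- Assumption: last-modified is an ISO 8601 timestamp. -->
        <xs:element name="last-modified" type="xs:dateTime"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

A closed list of child elements like this gives up arbitrary user-defined keys, 
so a real schema would need some other way to allow them.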

wunder

On Jul 31, 2013, at 6:21 PM, Steve Rowe wrote:

> In thinking about making the entire Solr schema REST-API-addressable 
> (SOLR-4898), I'd like to be able to add arbitrary metadata at both the top 
> level of the schema and at each leaf node, and allow read/write access to 
> that metadata via the REST API.
> 
> Some uses I've thought of for such a facility: 
> 
> 1. The managed schema now drops XML comments from schema.xml upon conversion 
> to managed-schema format, but it would be much better if these were somehow 
> preserved, as well as round-trippable when retrieving the schema and its 
> constituents via the REST API.
> 
> 2. Some comments in the example schemas don't refer to just one or to all 
> leaf nodes, but rather to a group of them. I'd like to be able to group nodes 
> by adding same-named "tags" to multiple nodes, and also have a top-level 
> (optional) "tag description" - this description could then be presented with 
> tagged nodes in various output formats.
> 
> 3. Some comments in the example schema are documentation about a feature, 
> e.g. copyFields.  A top-level "documentation" annotation could take a leaf 
> node element name (or maybe an XPath? probably overkill) and apply to all 
> matching elements. 
> 
> 4. When modifying the schema via REST API, a "last-modified" annotation could 
> be automatically added.
> 
> 5. There were a couple of user complaints recently when schema.xml parsing 
> was tightened to disallow unknown attributes on field declarations 
> (SOLR-4641): people were storing their own information there.  User-level 
> metadata would support this in a round-trippable way - I'm thinking we could 
> restrict it to flat string-typed key/value pairs, with no nested structure.
> 
> W3C XML Schema has a similar facility: 
> <http://www.w3.org/TR/2004/REC-xmlschema-1-20041028/structures.html#element-annotation>.
> 
> Thoughts?
> 
> Some concrete examples of what I'm thinking of in schema.xml format 
> (syntax/naming as yet unsettled):
> 
> <schema name="example" version="1.5">
>  <annotation>
>    <description element="tag" content="plain-numeric-field-types">
>      Plain numeric field types store and index the text value verbatim.
>    </description>
>    <documentation element="copyField">
>      copyField commands copy one field to another at the time a document
>      is added to the index.  It's used either to index the same field
>      differently, or to add multiple fields to the same field for
>      easier/faster searching.
>    </documentation>
>    <last-modified>2014-03-08T12:14:02Z</last-modified>
>    …
>  </annotation>
> …
>  <fieldType name="pint" class="solr.IntField">
>    <annotation>
>      <tag>plain-numeric-field-types</tag>
>    </annotation>
>  </fieldType>
>  <fieldType name="plong" class="solr.LongField">
>    <annotation>
>      <tag>plain-numeric-field-types</tag>
>    </annotation>
>  </fieldType>
>  …
>  <copyField source="cat" dest="text">
>    <annotation>
>      <todo>Should this field really be copied to the catchall text field?</todo>
>    </annotation>
>  </copyField>
>  …
>  <field name="text" type="text_general">
>    <annotation>
>      <description>catchall field</description>
>      <visibility>public</visibility>
>    </annotation>
>  </field>
> 
> 
> 

--
Walter Underwood
wun...@wunderwood.org


