Re: Dynamic schema design: feedback requested

Chris Hostetter Wed, 06 Mar 2013 16:51:22 -0800

: As far as a user editing the file AND rest API access, I think that 
: seems fine. Yes, the user is in trouble if they break the file, but that


Ignoring for a moment what format is used to persist schema information, I 
think it's important to have a conceptual distinction between "data" that 
is managed by applications and manipulated by a REST API, and "config" 
that is managed by the user and loaded by solr on init -- or via an 
explicit "reload config" REST API.

Past experience with how users percieve(d) solr.xml has heavily reinforced 
this opinion: on one hand, it's a place users must specify some config 
information -- so people wnat to be able to keep it in version control 
with other config files.  On the other hand it's a "live" data file that 
is rewritten by solr when cores are added.  (God help you if you want do a 
rolling deploy a new version of solr.xml where you've edited some of the 
config values while simultenously clients are creating new SolrCores)

As we move forward towards having REST APIs that treat schema information 
as "data" that can be manipulated, I anticipate the same types of 
confusion, missunderstanding, and grumblings if we try to use the same 
pattern of treating the existing schema.xml (or some new schema.json) as a 
hybrid configs & data file.  "Edit it by hand if you want, the /schema/* 
REST API will too!"  ... Even assuming we don't make any of the same 
technical mistakes that have caused problems with solr.xml round tripping 
in hte past (ie: losing comments, reading new config options that we 
forget to write back out, etc...) i'm fairly certain there is still going 
to be a lot of things that will loook weird and confusing to people.

(XML may bave been designed to be both "human readable & writable" and 
"machine readable & writable", but practically speaking it's hard have a 
single XML file be "machine and human readable & writable")

I think it would make a lot of sense -- not just in terms of 
implementation but also for end user clarity -- to have some simple, 
straightforward to understand caveats about maintaining schema 
information...

1) If you want to keep schema information in an authoritative config file 
that you can manually edit, then the /schema REST API will be read only. 

2) If you wish to use the /schema REST API for read and write operations, 
then schema information will be persisted under the covers in a data store 
whose format is an implementation detail just like the index file format.

3) If you are using a schema config file and you wish to switch to using 
the /schema REST API for managing schema information, there is a 
tool/command/API you can run to so.

4) if you are using the /schema REST API for managing schema information, 
and you wish to switch to using a schema config file, there is a 
tool/command/API you can run to export the schema info if a config file 
format.


...wether of not the "under the covers in a data store" used by the REST 
API is JSON, or some binary data, or an XML file just schema.xml w/o 
whitespace/comments should be an implementation detail.  Likewise is the 
question of wether some new config file formats are added -- it shouldn't 
matter.

If it's config it's config and the user owns it.
If it's data it's data and the system owns it.

: is the risk they take if they want to manually edit it - it's no 
: different than today when you edit the file and do a Core reload or 
: something. I think we can improve some validation stuff around that, but 
: it doesn't seem like a show stopper to me.

The new risk is multiple "actors" (both the user, and Solr) editing the 
file concurrently, and info that might be lost due to Solr reading the 
file, manpulating internal state, and then writing the file back out.  

Eg: User hand edits may be lost if they happen on disk during Solr's 
internal manpulation of data.  API edits may be reflected in the internal 
state, but lost if the User writes the file directly and then does a core 
reload, etc....

: At a minimum, I think the user should be able to start with a hand 
: modified file. Many people *heavily* modify the example schema to fit 
: their use case. If you have to start doing that by making 50 rest API 
: calls, that's pretty rough. Once you get your schema nice and happy, you 
: might script out those rest calls, but initially, it's much 
: faster/easier to whack the schema into place in a text editor IMO.

I don't think there is any disagreement about that.  The ability to say 
"my schema is a config file and i own it" should always exist (remove 
it over my dead body) 

The question is what trade offs to expect/require for people who would 
rather use an API to manipulate these things -- i don't think it's 
unreasable to say "if you would like to manipulate the schema using an 
API, then you give up the ability to manipulate it as a config file on 
disk"

("if you want the /schema API to drive your car, you have to take your 
foot of hte pedals and let go of the steering wheel")

-Hoss

Re: Dynamic schema design: feedback requested

Reply via email to