Re: Solr 5: Schema.xml vs. Managed Schema - which is advisable?

Upayavira Thu, 03 Dec 2015 13:20:56 -0800

They are different beasts, but I bet on the managed schema winning in
the long run.


With the bulk Schema API, you can post a heap of fields etc. in one go,
so basically, rather than pushing the schema to ZooKeeper, you push it
to Solr.
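
As a rough illustration of what "posting a heap of fields in one go" can
look like, here is a minimal sketch that builds a single Schema API request
carrying several commands. The base URL, the collection name
"mycollection", the field names and the field types are all hypothetical
placeholders, and the request is only constructed, not sent.

```python
import json
from urllib.request import Request


def build_bulk_schema_request(base_url, collection, commands):
    """Build (but do not send) one POST to the collection's /schema endpoint."""
    url = f"{base_url.rstrip('/')}/{collection}/schema"
    body = json.dumps(commands).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical fields and types; adjust to whatever your schema defines.
commands = {
    "add-field": [
        {"name": "title", "type": "text_general", "stored": True},
        {"name": "price", "type": "tfloat", "stored": True},
    ],
    "delete-field": {"name": "obsolete_field"},
}

req = build_bulk_schema_request(
    "http://localhost:8983/solr", "mycollection", commands
)
# To apply it, pass req to urllib.request.urlopen() against a running Solr
# whose collection uses the managed schema.
```

Keeping the payload as a plain dictionary means the same set of commands
can also be stored in version control and replayed later.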

Look at Solr 5.4 when it comes out shortly. It'll change the way you
think about the schema. The managed schema has been there for ages, but
now the UI has support for it in the schema tab. Being able to really
easily create and remove fields certainly does things to my brain
because it is just so easy.

Upayavira

On Thu, Dec 3, 2015, at 08:35 PM, Erick Erickson wrote:
> It Depends (tm).
> 
> Managed Schema is way cool if you have a front end that lets you
> manipulate the schema via a browser or other program. There's really
> no other way to deal with changing the schema from a browser without
> allowing uploading xml files, which is a security problem. Trust me on
> this one ;).
> 
> For people who know the ins and outs of schema.xml, it's often easier
> just to edit the raw file and upload it to ZK (or use it locally). And
> much faster for mass edits.
> 
> So really they're different beasts. The end result is functionally the
> same: there's a schema that's read by Solr and used. The managed
> schema makes it harder for typos to sneak in and prevent collections
> from loading, at the expense of fast mass editing.
> 
> And there is some ability to change the solrconfig.xml file, see:
> https://cwiki.apache.org/confluence/display/solr/Config+API. But again
> whether you "should" use that or just manually edit solrconfig.xml is
> largely a matter of the tools available and personal taste.
> 
> 
> bq: ....will be easier to deploy / maintain
> 
> 
> Not a lot of difference here. At the end of the day, you have to
> 1> have the configs stored somewhere safely in version control (or at
> least I think you must)
> 2> change the files in the config set on Zookeeper
> 3> reload the collection.
> 
> So with manual editing, to change something you'd
> 1> get the files from VCS
> 2> edit them
> 3> push them to ZK
> 4> reload the collection (collections API) and verify it was correct
> 5> save the configs back to VCS.
> 
> With managed schema you'd
> 1> use the managed schema API to make changes
> 2> reload the collection and verify
> 3> pull the changes from Zookeeper
> 4> put them in VCS
> 
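
The reload and pull-from-ZooKeeper steps above could be scripted roughly as
follows. This is only a sketch under stated assumptions: the base URL, the
collection and config set names, the ZooKeeper address, and the path to
zkcli.sh are hypothetical placeholders. The helpers only build the URL and
the command line; nothing is executed.

```python
from urllib.parse import urlencode


def reload_url(base_url, collection):
    """URL for the Collections API RELOAD action on one collection."""
    query = urlencode({"action": "RELOAD", "name": collection, "wt": "json"})
    return f"{base_url.rstrip('/')}/admin/collections?{query}"


def downconfig_command(zkcli, zk_host, conf_name, conf_dir):
    """Argument list for zkcli's downconfig, which copies a config set
    from ZooKeeper into a local directory (e.g. a VCS working copy)."""
    return [
        zkcli,
        "-zkhost", zk_host,
        "-cmd", "downconfig",
        "-confname", conf_name,
        "-confdir", conf_dir,
    ]


# Hypothetical values for illustration only.
url = reload_url("http://localhost:8983/solr", "mycollection")
cmd = downconfig_command(
    "server/scripts/cloud-scripts/zkcli.sh",
    "localhost:2181",
    "myconf",
    "./solr-config",
)
# url can be fetched with urllib.request.urlopen(); cmd can be handed to
# subprocess.run(cmd, check=True) before committing ./solr-config to VCS.
```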
> 
> Best,
> Erick
> 
> 
> 
> On Thu, Dec 3, 2015 at 12:09 PM, Don Bosco Durai <bo...@apache.org>
> wrote:
> > In my experience, once managed-schema is created, schema.xml is ignored
> > even if present. When both are present, you will get a warning in the
> > Solr log.
> >
> > I have stopped using schema.xml. Actually, I use it once, start Solr and 
> > after it generates managed-schema, I export it and pretty much just update 
> > it going forward.
> >
> > I think the recommended way to manage fields is using API calls, but it
> > might not always be possible, e.g. if you have to keep the config in a
> > source control system. If you are doing that, make sure you update it
> > regularly, because if Solr finds a new field name, it will auto-create it
> > in the managed-schema and your saved copy will be out of date.
> >
> > Bosco
> >
> >
> >
> >
> > On 12/3/15, 11:47 AM, "Jeff Wartes" <jwar...@whitepages.com> wrote:
> >
> >>I’ve never used the managed schema, so I’m probably biased, but I’ve never
> >>seen much of a point to the Schema API.
> >>
> >>I need to make changes sometimes to solrconfig.xml, in addition to
> >>schema.xml and other config files, and there’s no API for those, so my
> >>process has been like:
> >>
> >>1. Put the entire config directory used by a collection in source control
> >>somewhere. solrconfig.xml, schema.xml, synonyms.txt, everything.
> >>2. Make changes, test, commit
> >>3. “Release” by uploading the whole config dir at a specific commit to ZK
> >>(overwriting any existing files) and issuing a collections API “reload”.
> >>
> >>
> >>This has the downside that I can upload a broken config and take down my
> >>collection, but with the whole config dir in source control,
> >>I can also easily roll back to any point by uploading an old commit.
> >>You still have to be aware of how the changes you’re making will affect
> >>your current index, but that’s unavoidable.
> >>
> >>
> >>On 12/3/15, 7:09 AM, "Kelly, Frank" <frank.ke...@here.com> wrote:
> >>
> >>>Just wondering if folks have any suggestions on using Schema.xml vs.
> >>>Managed Schema going forward.
> >>>
> >>>Our deployment will be
> >>>> 3 Zk, 3 Shards, 3 replicas
> >>>> Copies of each collection in 5 AWS regions (EBS-backed EC2 instances)
> >>>> Planning at least 1 Billion objects indexed (currently < 100 million)
> >>>
> >>>I'm sure our schema.xml will have changes and fixes and just wondering
> >>>which approach (schema.xml vs. managed)
> >>>will be easier to deploy / maintain?
> >>>
> >>>Cheers!
> >>>
> >>>-Frank
> >>>
> >>>
> >>>Frank Kelly
> >>>Principal Software Engineer
> >>>Predictive Analytics Team (SCBE/HAC/CDA)
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >
