Re: Solr 5: Schema.xml vs. Managed Schema - which is advisable?

Upayavira Fri, 04 Dec 2015 12:39:12 -0800

This is exactly right. Schemaless can be a great discovery tool, but I'd
say it is not something to rely on in production.


On Fri, Dec 4, 2015, at 08:21 PM, Davis, Daniel (NIH/NLM) [C] wrote:
> So, I actually went to a one-day Elasticsearch conference. One person
> spoke about having to re-index everything because they had their field
> mappings wrong. I've also worked on Linked Data and RDF, where the fact
> that everything is a triple is supposed to make SQL schemas unneeded.
> 
> The theme with Elasticsearch was:
>  - spend some time on your field mappings (which are a schema) up front.
>  - if you don't, you are going to waste space, experience slow search,
>  or both.
> 
> The theme with RDF was:
>  - First model your vocabulary and make sure it answers the questions you
>  want to answer.
> 
> So, we can be "schemaless", but with both Linked Data and ES, that is
> only a way to get started quickly; there are still advantages to using a
> schema.
> 
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com] 
> Sent: Friday, December 04, 2015 3:16 PM
> To: solr-user <solr-user@lucene.apache.org>
> Subject: Re: Solr 5: Schema.xml vs. Managed Schema - which is advisable?
> 
> Actually, I rather agree with your colleagues, but then I'm something of
> a curmudgeon.
> 
> More accurately, unless you _strictly_ control the input documents, you
> never know what you have in your index. I'd rather have docs fail
> indexing than be indexed with, say, typos in the field names....
> 
> FWIW,
> Erick
> 
> On Fri, Dec 4, 2015 at 6:51 AM, Rick Leir <richard.l...@canadiana.ca>
> wrote:
> > On Fri, Dec 4, 2015 at 12:59 AM, 
> > <solr-user-digest-h...@lucene.apache.org>
> > wrote:
> >
> >>
> >> >Just wondering if folks have any suggestions on using Schema.xml vs.
> >> >Managed Schema going forward.
> >> >
> >
> >
> > We are using loosely typed languages (Perl and JavaScript), and a 
> > loosely typed DB (CouchDB). This is consistent with running Solr in 
> > Schemaless mode, and doing more unit tests. When you post a doc into 
> > Solr containing a field which has not been seen before, Solr chooses 
> > the most appropriate Type. There is no Java exception and the field 
> > data is searchable. You can discover the Type by looking at the Solr 
> > console. We can probably log it too.
> >
> > The new field might be due to us intentionally adding it, though we 
> > should be methodical and systematic about adding new fields.
> >
> > Or it could be due to unexpected input to the ingest scripts (but I 
> > believe these scripts should clean their inputs).
> >
> > Or it could be due to a bug in the ingest scripts. In the spirit of 
> > TDD, the ingest scripts should have tests so we can claim they are
> > bug-free.
> >
> >
> > However, I brought up this topic with my colleagues here, and they are 
> > sure we should stick with Schema.xml. ".. some level of control and 
> > expectation of exactly what kind of data is in our search system 
> > wouldn't be helpful .." So be it.
> > Cheers -- Rick
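
For illustration only: Erick's preference above, that a document should fail rather than be indexed under a misspelled field name, can also be enforced in an ingest script before anything is sent to Solr. The sketch below is a minimal Python example under stated assumptions. The field names, the ALLOWED set, and both function names are hypothetical, and it makes no Solr calls. It only compares a document's keys against a list of fields you have declared.

```python
def find_unknown_fields(doc, allowed_fields):
    """Return the sorted field names in doc that are not declared."""
    return sorted(set(doc) - set(allowed_fields))


def check_doc(doc, allowed_fields):
    """Raise ValueError rather than let an undeclared field through."""
    unknown = find_unknown_fields(doc, allowed_fields)
    if unknown:
        raise ValueError("unknown field(s): " + ", ".join(unknown))
    return doc


# Hypothetical declared fields, plus one document with a typo in a field name.
ALLOWED = {"id", "title", "author"}
good = {"id": "1", "title": "Solr notes"}
bad = {"id": "2", "titel": "typo in the field name"}

check_doc(good, ALLOWED)  # passes silently
try:
    check_doc(bad, ALLOWED)
except ValueError as err:
    print("rejected:", err)
```

A check like this fits Rick's point that the ingest scripts should clean their inputs, whichever schema mode the Solr side runs in.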
