This is exactly right. Schemaless mode can be a great discovery tool, but I'd say it is not something to rely on in production.
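
For illustration, here is a minimal sketch of the "fail rather than index a typo" idea Erick describes in the quoted thread below. It is plain Python and does not call Solr at all. The field whitelist and the names validate_doc and partition_docs are hypothetical, made up for this example. In a real setup the allowed fields would come from your schema.xml.

```python
# Hypothetical whitelist of field names; assumed to mirror the fields
# declared in schema.xml. These names are invented for the example.
ALLOWED_FIELDS = {"id", "title", "author", "pubdate"}


class UnknownFieldError(ValueError):
    """Raised when a document carries a field the whitelist does not declare."""


def validate_doc(doc, allowed=ALLOWED_FIELDS):
    """Return doc unchanged if every field name is allowed, else raise."""
    unknown = sorted(set(doc) - set(allowed))
    if unknown:
        raise UnknownFieldError("unknown field(s): " + ", ".join(unknown))
    return doc


def partition_docs(docs, allowed=ALLOWED_FIELDS):
    """Split docs into (accepted, rejected) before anything is sent to the index.

    Each rejected entry is a (doc, reason) pair so the ingest script can log it.
    """
    accepted, rejected = [], []
    for doc in docs:
        try:
            accepted.append(validate_doc(doc, allowed))
        except UnknownFieldError as err:
            rejected.append((doc, str(err)))
    return accepted, rejected
```

With a check like this in the ingest script, a doc with a misspelled field name such as "titel" ends up in the rejected list with a reason, instead of quietly creating a new field in schemaless mode.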

On Fri, Dec 4, 2015, at 08:21 PM, Davis, Daniel (NIH/NLM) [C] wrote:
> So, I actually went to an Elastic Search one day conference. One person
> spoke about having to re-index everything because they had their field
> mappings wrong. I've also worked on Linked Data, RDF, where the fact
> that everything is a triple is supposed to make SQL schemas unneeded.
>
> The theme with Elastic Search was:
> - spend some time on your field mappings (which are a schema) up front.
> - if you don't, you are either going to be wasting space, or
>   experiencing slow search, or both.
>
> The theme with RDF was:
> - First model your vocabulary and make sure it answers the questions you
>   want to answer.
>
> So, we can be "schemaless", but with both Linked Data and ES, it is a way
> to get started quickly - there are still advantages to using a schema.
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Friday, December 04, 2015 3:16 PM
> To: solr-user <solr-user@lucene.apache.org>
> Subject: Re: Solr 5: Schema.xml vs. Managed Schema - which is advisable?
>
> Actually, I rather agree with your colleagues, but then I'm something of
> a curmudgeon.
>
> More accurately, unless you _strictly_ control the input documents, you
> never know what you have in your index. I'd rather have docs fail
> indexing than be indexed with, say, typos in the field names....
>
> FWIW,
> Erick
>
> On Fri, Dec 4, 2015 at 6:51 AM, Rick Leir <richard.l...@canadiana.ca>
> wrote:
> > On Fri, Dec 4, 2015 at 12:59 AM,
> > <solr-user-digest-h...@lucene.apache.org>
> > wrote:
> >
> >>
> >> >Just wondering if folks have any suggestions on using Schema.xml vs.
> >> >Managed Schema going forward.
> >> >
> >
> >
> > We are using loosely typed languages (Perl and Javascript), and a
> > loosely typed DB (CouchDB). This is consistent with running Solr in
> > Schemaless mode, and doing more unit tests. When you post a doc into
> > Solr containing a field which has not been seen before, Solr chooses
> > the most appropriate Type. There is no Java exception and the field
> > data is searchable. You can discover the Type by looking at the Solr
> > console. We can probably log it too.
> >
> > The new field might be due to us intentionally adding it, though we
> > should be methodical and systematic about adding new fields.
> >
> > Or it could be due to unexpected input to the ingest scripts, (but I
> > believe these scripts should clean their inputs).
> >
> > Or it could be due to a bug in the ingest scripts. In the spirit of
> > TDD, the ingest scripts should have tests so we can claim they are bug free.
> >
> >
> > However, I brought up this topic with my colleagues here, and they are
> > sure we should stick with Schema.xml. ".. some level of control and
> > expectation of exactly what kind of data is in our search system
> > wouldn't be helpful .." So be it.
> > Cheers -- Rick