Yo. That is the truth. You can get stuff indexed with an automatic schema, but if you want to make your customers happy, tune it.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jan 22, 2016, at 6:22 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>
> And, more generally, schemaless makes a series of assumptions, any
> of which may be wrong.
>
> You _must_ hand-tweak your schema to squeeze all the performance out of Solr
> that you can. If your collection isn't big enough that you need to squeeze,
> don't bother...
>
> FWIW,
> Erick
>
> On Fri, Jan 22, 2016 at 11:19 AM, Steve Rowe <sar...@gmail.com> wrote:
>> Yes, and also underflow in the case of double/float.
>>
>> --
>> Steve
>> www.lucidworks.com
>>
>>> On Jan 22, 2016, at 12:25 PM, Shyam R <shyam.reme...@gmail.com> wrote:
>>>
>>> I think schemaless mode might allocate double instead of float, and long
>>> instead of int, to guard against overflow, which increases index size. Is my
>>> assumption valid?
>>>
>>> Thanks
>>>
>>> On Thu, Jan 21, 2016 at 10:48 PM, Erick Erickson <erickerick...@gmail.com>
>>> wrote:
>>>
>>>> I guess it's all about whether schemaless really supports
>>>> 1> all the docs you index.
>>>> 2> all the use-cases for search.
>>>> 3> the assumptions it makes scale to your needs.
>>>>
>>>> If you've established rigorous tests and schemaless does all of the
>>>> above, I'm all for shortening the cycle by using schemaless.
>>>>
>>>> But if it's just being sloppy and "success" is "I managed to index 50
>>>> docs and get some results back by searching", expect to find some
>>>> "interesting" issues down the road.
>>>>
>>>> And finally, if it's "we use schemaless to quickly try things in the
>>>> UI and for the _real_ prod environment we need to be more rigorous
>>>> about the schema", well, shortening development time is A Good Thing.
>>>> Part of moving to prod could be taking the schema generated by
>>>> schemaless and tweaking it, for instance.
>>>>
>>>> Best,
>>>> Erick
>>>>
>>>> On Thu, Jan 21, 2016 at 8:54 AM, Shawn Heisey <apa...@elyograg.org> wrote:
>>>>> On 1/21/2016 2:22 AM, Prateek Jain J wrote:
>>>>>> Thanks Erick,
>>>>>>
>>>>>> Yes, I took the same approach you suggested. The issue is that some
>>>>>> developers started with the schemaless configuration, and now they have
>>>>>> started liking it and avoiding the restrictions (including the increased
>>>>>> time to deploy the application in a managed enterprise environment). I was
>>>>>> more concerned about pushing best practices around this in the team,
>>>>>> because allowing anyone to add new attributes will become an overhead in
>>>>>> terms of management, security and maintainability. Regarding your concern
>>>>>> about not storing documents on a separate disk: we are storing them in
>>>>>> Solr, but not as backup copies. One doubt still remains in my mind w.r.t.
>>>>>> auto-detection of types in Solr:
>>>>>>
>>>>>> Is there a performance benefit to using defined types (schema-based)
>>>>>> vs. undefined types while adding documents? Does SolrJ ship
>>>>>> meta-information, such as the type of each attribute, to Solr? The code
>>>>>> looks something like this:
>>>>>>
>>>>>> import org.apache.solr.common.SolrInputDocument;
>>>>>>
>>>>>> SolrInputDocument doc = new SolrInputDocument();
>>>>>> doc.addField("category", "book");      // String
>>>>>> doc.addField("id", 1234L);             // Long (the L suffix; plain 1234 would be an Integer)
>>>>>> doc.addField("name", "Trying solrj");  // String
>>>>>>
>>>>>> In my opinion, any auto-detection code will have some overhead compared
>>>>>> to the other approach; any thoughts on this?
>>>>>
>>>>> Although the true reality may be more complex, you should consider that
>>>>> everything Solr receives from SolrJ will be text -- as if you had sent
>>>>> the JSON or XML indexing format manually, which has no type information.
>>>>>
>>>>> When you are building a document with SolrInputDocument, SolrJ has no
>>>>> knowledge of the schema in Solr. It doesn't know whether the target
>>>>> field is numeric, string, date, or something else.
>>>>>
>>>>> Using different object types for input to SolrJ just gives you general
>>>>> Java benefits -- things like detecting certain programming errors at
>>>>> compile time.
>>>>>
>>>>> Thanks,
>>>>> Shawn
>>>>>
>>>>
>>>
>>> --
>>> Ph: 9845704792
>>
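
A minimal Java sketch of the type-guessing idea discussed in this thread, under stated assumptions: the class `SchemalessGuessSketch` and its `guessType` helper are hypothetical and are not Solr's implementation (in Solr itself the guessing is done by field-mutating update processors in the schemaless update chain). It treats every incoming value as text, as Shawn describes, tries the widest numeric types first (long rather than int, double rather than float) as Shyam suspected, and shows the int overflow and float underflow that those wider guesses avoid, per Steve's note.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SchemalessGuessSketch {

    // Hypothetical helper: guess a field type from the raw text a server receives.
    // Tries boolean, then long, then double, falling back to string.
    // This is an illustration of the idea only, not Solr's actual code.
    static String guessType(String raw) {
        if (raw.equalsIgnoreCase("true") || raw.equalsIgnoreCase("false")) {
            return "boolean";
        }
        try {
            Long.parseLong(raw);
            return "long";   // widest integral type, so large values cannot overflow an int
        } catch (NumberFormatException ignored) {
            // not an integer; try floating point next
        }
        try {
            Double.parseDouble(raw);
            return "double"; // widest floating type, avoids float overflow/underflow
        } catch (NumberFormatException ignored) {
            // not numeric either
        }
        return "string";
    }

    public static void main(String[] args) {
        // The Java types used on the client side are lost once values become text.
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("category", "book");
        doc.put("id", 1234L);
        doc.put("name", "Trying solrj");
        doc.put("price", 1e-50);

        for (Map.Entry<String, Object> e : doc.entrySet()) {
            String asText = String.valueOf(e.getValue());
            System.out.println(e.getKey() + " = " + asText + " -> " + guessType(asText));
        }

        // Why a guesser picks the wider types: these values do not fit the narrower ones.
        System.out.println((int) 3_000_000_000L); // int overflow wraps around
        System.out.println((float) 1e-50);        // float underflow rounds to zero
    }
}
```

Running `main` prints `id = 1234 -> long` and `price = 1.0E-50 -> double` among the field lines, then `-1294967296` and `0.0`. The trade-off Shyam raises is visible here: always choosing long and double is safe for any input, but a hand-tuned schema can use int or float where the data is known to fit, which is part of what Erick and Walter mean by tuning.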