Thanks Erick for the helpful explanations.

Thanks
Sumit
________________________________________
From: Erick Erickson [erickerick...@gmail.com]
Sent: Monday, March 23, 2015 4:58 PM
To: solr-user@lucene.apache.org
Subject: Re: Difference in indexing using config file vs client i.e SolrJ

1> Either none or lots, depending ;). You're talking "schemaless" here, I
think. Schemaless mode guesses what the field should be based on the
document and creates a field in the schema. Pre-defined schemas require
you to make that decision up front.

So in terms of what the underlying index looks like at the lower Lucene
level, it's identical whether a field is defined in schema.xml or
dynamically. From that perspective, there's no difference.

However, whether the field definitions chosen best represent the problem
you're trying to solve is another issue altogether. Schemaless simply
cannot apply the same kind of domain-specific interpretation that a human
can, not to mention construct analysis chains for the tokens that reflect
the characteristics specific to that domain.

2> There have been some anecdotal reports that schemaless mode copying
everything into a _text field hurts performance, but this is configurable.

3> Again, the underlying structure of the index at the Lucene level is the
same. What's NOT the same is whether schemaless mode makes the right
decisions. Almost invariably a human being can do better, since you're
armed with knowledge of what's important and what's not.

Here's my take: schemaless mode is a great way to get started with minimal
effort on your part. But pretty soon the problem domain requires that you
take control of the schema and hand-craft schema.xml. For some problem
spaces, schemaless may be "good enough"; you have to evaluate your corpus
and your problem space....

Best,
Erick

On Mon, Mar 23, 2015 at 4:41 PM, Purohit, Sumit <sumit.puro...@pnnl.gov> wrote:
> Hi All,
>
> I have recently started working with Solr and I have a trivial question to
> ask, as I could not find a suitable answer.
>
> A document's indexes can be defined in a config file (such as schema.xml) or
> on the fly using some Solr client such as SolrJ.
>
> 1. What is the difference between the indexes created by the two approaches?
> 2. Is there any major performance gain from using a predefined index
> instead of using SolrJ?
> 3. Does Solr persist these indexes differently, and does that have any impact
> on query efficiency?
>
> Thanks
> Sumit Purohit
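
To make Erick's point about type guessing concrete, here is a toy Python sketch of value-based field-type guessing. It is NOT Solr's actual implementation: the function names (guess_field_type, infer_schema), the type labels, and the order of checks are all assumptions made up for illustration. It only shows why a guess based on a sample value can differ from what a human would choose.

```python
from datetime import datetime


def guess_field_type(value: str) -> str:
    """Guess a type label for a raw string value (toy heuristic, not Solr's)."""
    text = value.strip()
    if text.lower() in ("true", "false"):
        return "boolean"
    try:
        int(text)
        return "long"
    except ValueError:
        pass
    try:
        float(text)
        return "double"
    except ValueError:
        pass
    try:
        # Assumed date layout for this sketch: ISO 8601 in UTC with a trailing Z.
        datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
        return "date"
    except ValueError:
        pass
    # Anything unrecognized falls back to a generic text type.
    return "text"


def infer_schema(doc: dict) -> dict:
    """Map each field name of one sample document to a guessed type label."""
    return {name: guess_field_type(str(value)) for name, value in doc.items()}


if __name__ == "__main__":
    # Hypothetical sample document; the field names are invented.
    sample = {"zip": "02134", "title": "Solr in Action", "in_stock": "true"}
    print(infer_schema(sample))
```

In this sketch a ZIP code such as "02134" is guessed to be numeric because it parses as an integer, whereas someone who knows the domain would most likely keep it as a string. That is the kind of decision a hand-crafted schema.xml lets you make up front.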