I would really avoid schemaless in _any_ situation where I know the schema ahead of time.

bq: But in my case, I am planning to use solrj (so, no spelling mistakes)

Oh, I'm quite sure there'll be some kind of mistake sometime ;) I know of at least one situation where a programming mistake in SolrJ caused over 20K unique dynamic fields to be created. Admittedly, not a spelling mistake.

But ranting aside, let's draw a clear distinction between schemaless and managed schema on the one hand and classic on the other. Both schemaless and managed schema use the same underlying mechanism to change your schema file, specifically the REST API. The difference is that _you_ need to issue the REST API commands in "managed schema" mode yourself (or script them, or whatever). "Schemaless" mode issues those REST API commands for you whenever the update processor sees a field it doesn't recognize, after guessing what kind of field it is.

Classic, of course, requires you to hand-edit a text file, upload it to SolrCloud, and reload collections for changes to take effect.

I tend to prefer classic when I know up-front exactly what my schema should be. In fact, I tend to strip everything I know I don't need out of the schema.xml file, including dynamic field definitions, copyfields and the like. Like Shawn, I want my docs to fail ASAP if they don't conform to my schema.

Managed is ideal for situations where you have some UI front-end that allows end users (or administrators) to define a schema, and you don't want them to muck around with hand-editing files.

Schemaless is very cool, but IMO not something I'd go to production with, especially at scale. It's way cool for starting out, but as the scale grows you want to squeeze all the inessential bits out of the index that you can, and schemaless doesn't have the "meta-knowledge" you have (or at least should have) about the problem space.

bq: Another thing to keep in mind is, I am pushing documents to solr from some random/unknown source and they are not getting stored on separate disc

This is pretty scary.
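
To make the worry concrete, here is a minimal sketch of the kind of guard I'd want sitting between unknown sources and the index. It's Python for brevity; the same check is a few lines in a SolrJ client. The field names in KNOWN_FIELDS and the helper conform() are hypothetical, made up for illustration, and nothing here is a Solr or SolrJ API.

```python
# Hypothetical schema: these field names are illustrative, not from any real config.
KNOWN_FIELDS = {"id", "title", "body", "source"}


class UnknownFieldError(ValueError):
    """Raised when a document carries a field the schema does not define."""


def conform(raw_doc, known_fields=KNOWN_FIELDS):
    """Return a copy of raw_doc, or fail fast if it has unexpected fields."""
    unknown = sorted(set(raw_doc) - set(known_fields))
    if unknown:
        # Reject the document before it ever reaches the indexer.
        raise UnknownFieldError("fields not in schema: " + ", ".join(unknown))
    return dict(raw_doc)
```

The point of failing in the client is the same as with a stripped-down classic schema: a bad document is rejected immediately instead of quietly adding a field to the index.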
How are you controlling what fields get indexed? You mentioned SolrJ, so I'm presuming you have a mechanism to map all the information (meta-data included) you get from those random/unknown sources into your known schema?

FWIW,
Erick

On Wed, Jan 20, 2016 at 10:03 AM, Shawn Heisey <apa...@elyograg.org> wrote:
> On 1/20/2016 10:17 AM, Prateek Jain J wrote:
>>
>> What all I could gather from various blogs is, defining schema stops
>> developers from accidently adding fields to solr. But in my case, I am
>> planning to use solrj (so, no spelling mistakes). My point is:
>>
>>
>> 1. Is there any advantage like performance or anything else while
>> reading or writing or querying, if we go schema way?
>>
>> 2. What impact it can have on maintainability of project?
>>
>> Another thing to keep in mind is, I am pushing documents to solr from some
>> random/unknown source and they are not getting stored on separate disc
>> (using solr for indexing and storing). By this what I mean is, re-indexing
>> is not an option for me. Starting schemaless might give me a quick start
>> for project but, is there a fine print that is getting missed? Any
>> inputs/experiences/pointers are welcome.
>
>
> There is no performance difference. With a managed schema, there is still a
> schema file in the config, it just has a different filename and can be
> changed remotely. Internally, I am pretty sure that the java objects are
> identical.
>
> I personally would not want to have a managed schema or run in schemaless
> mode in production. I do not want it to be possible for anybody else to
> change the config.
>
> Thanks,
> Shawn
>
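
P.S. For anyone unfamiliar with the managed-schema route, the sketch below builds the kind of Schema API request you would issue yourself to add a field; schemaless mode makes the equivalent change for you after guessing the type. Treat it as an illustration under assumptions: the helper add_field_request() is hypothetical, and the base URL, collection name, field name and the text_general field type are placeholders that depend on your install and configset. The request is only built here, not sent.

```python
import json
import urllib.request


def add_field_request(base_url, collection, name, field_type,
                      stored=True, indexed=True):
    """Build (but do not send) a Schema API request that adds one field."""
    payload = {
        "add-field": {
            "name": name,
            "type": field_type,  # must be a fieldType defined in your schema
            "stored": stored,
            "indexed": indexed,
        }
    }
    return urllib.request.Request(
        url=f"{base_url}/{collection}/schema",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values; adjust for your own cluster.
req = add_field_request("http://localhost:8983/solr", "mycoll",
                        "title", "text_general")
# To actually send it:
#   with urllib.request.urlopen(req) as resp:
#       print(resp.read().decode("utf-8"))
```

With classic schema, the same change means editing schema.xml by hand, uploading it and reloading the collection.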