Re: Managed schema vs schema.xml

Walter Underwood Tue, 07 Mar 2017 12:56:24 -0800

Maybe this is expert stuff, but we keep our schema, solrconfig, and everything 
else checked into source control.


I wrote a Python thingy to hit the cluster through the load balancer, get the 
zkHost string from status, upload the files to ZooKeeper (kazoo is a nice 
library), link the config, then do an async reload.
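
A minimal sketch of that flow, not the actual script. Assumptions: the system
info handler reports "zkHost" in SolrCloud mode, configs live under
/configs/<name>, and linking means setting "configName" on the
/collections/<collection> znode (what zkcli's linkconfig does, as far as I
know). All function names, URLs, and collection names are hypothetical.

```python
import json
import os
import urllib.parse
import urllib.request


def get_zk_host(solr_url):
    """Ask Solr (through the load balancer) for its ZooKeeper connect string."""
    # Assumption: in SolrCloud mode this handler includes a "zkHost" entry.
    with urllib.request.urlopen(solr_url + "/admin/info/system?wt=json") as resp:
        return json.load(resp)["zkHost"]


def config_files(local_dir, config_name):
    """Yield (znode_path, bytes) for every file under local_dir."""
    for root, dirs, files in os.walk(local_dir):
        dirs.sort()  # deterministic traversal order
        for name in sorted(files):
            full = os.path.join(root, name)
            rel = os.path.relpath(full, local_dir).replace(os.sep, "/")
            with open(full, "rb") as f:
                yield "/configs/{}/{}".format(config_name, rel), f.read()


def upload_config(zk, local_dir, config_name):
    """Create or overwrite each config file in ZooKeeper."""
    for path, data in config_files(local_dir, config_name):
        if zk.exists(path):
            zk.set(path, data)
        else:
            zk.create(path, data, makepath=True)


def link_config(zk, collection, config_name):
    """Point the collection at config_name, keeping any other znode properties."""
    path = "/collections/" + collection
    if zk.exists(path):
        raw, _stat = zk.get(path)
        props = json.loads(raw.decode("utf-8")) if raw else {}
        props["configName"] = config_name
        zk.set(path, json.dumps(props).encode("utf-8"))
    else:
        data = json.dumps({"configName": config_name}).encode("utf-8")
        zk.create(path, data, makepath=True)


def reload_url(solr_url, collection, request_id):
    """Build a Collections API RELOAD request that runs asynchronously."""
    query = urllib.parse.urlencode(
        {"action": "RELOAD", "name": collection, "async": request_id, "wt": "json"}
    )
    return solr_url + "/admin/collections?" + query


def deploy(solr_url, collection, config_name, local_dir, request_id):
    """Upload, link, and reload. Needs the third-party kazoo package."""
    from kazoo.client import KazooClient  # pip install kazoo

    zk = KazooClient(hosts=get_zk_host(solr_url))
    zk.start()
    try:
        upload_config(zk, local_dir, config_name)
        link_config(zk, collection, config_name)
    finally:
        zk.stop()
    with urllib.request.urlopen(reload_url(solr_url, collection, request_id)) as resp:
        return json.load(resp)
```

Usage would look something like deploy("http://lb.example.com/solr",
"products", "solrconf", "./conf", "reload-1"). The result of the async reload
can then be polled with the Collections API REQUESTSTATUS action and the same
request id.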

I’ve been thinking about time stamping the config directories so I can roll 
back to a previous config if the reload fails.
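
One way the time stamping could work, as a rough sketch: give each uploaded
config a UTC-stamped name, and on a failed reload pick the newest older name
to relink. The naming scheme and both function names are made up for
illustration.

```python
import re
from datetime import datetime, timezone

# Hypothetical naming scheme: <base>-YYYYMMDDTHHMMSSZ, e.g. solrconf-20170307T205624Z
STAMPED = re.compile(r"^(?P<base>.+)-(?P<stamp>\d{8}T\d{6}Z)$")


def stamped_name(base, now=None):
    """Return a config name with a fixed-width UTC timestamp suffix."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return "{}-{}".format(base, now.strftime("%Y%m%dT%H%M%SZ"))


def previous_config(all_names, base, current):
    """Return the newest stamped config for base that is older than current."""
    candidates = []
    for name in all_names:
        m = STAMPED.match(name)
        # Fixed-width stamps sort lexically in time order for the same base.
        if m and m.group("base") == base and name < current:
            candidates.append(name)
    return max(candidates) if candidates else None
```

The list of names could come from the children of /configs in ZooKeeper
(kazoo's get_children returns them). If the reload fails, relink the
collection to whatever previous_config returns and reload again.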

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Mar 7, 2017, at 12:47 PM, OTH <omer.t....@gmail.com> wrote:
> 
> In the reference guide, in the chapter named "The Well Configured Solr
> Instance", it says (I'm copying+pasting from the PDF version) :
> 
> Switching from Managed Schema to Manually Edited schema.xml
>> If you have started Solr with managed schema enabled and you would like to
>> switch to manually editing a schema.xml file, you should take the following
>> steps:
>> Rename the managed-schema file to schema.xml.
>> Modify solrconfig.xml to replace the schemaFactory class.
>> Remove any ManagedIndexSchemaFactory definition if it exists.
>> Add a ClassicIndexSchemaFactory definition as shown above
>> Reload the core(s).
>> Apache Solr Reference Guide 6.4 515
>> If you are using SolrCloud, you may need to modify the files via
>> ZooKeeper. The bin/solr script provides an easy way to download the files
>> from ZooKeeper and upload them back after edits. See the section ZooKeeper
>> Operations for more information.
>> IndexConfig in SolrConfig
>> The <indexConfig> section of solrconfig.xml defines low-level behavior of
>> the Lucene index writers.
>> By default, the settings are commented out in the sample solrconfig.xml
>> included with Solr, which means the defaults are used. In most cases, the
>> defaults are fine.
>> <indexConfig>
>> ...
>> </indexConfig>
>> Parameters covered in this section:
>> Writing New Segments
>> Merging Index Segments
>> Compound File Segments
>> Index Locks
>> Other Indexing Settings
>> Writing New Segments
>> ramBufferSizeMB
>> Once accumulated document updates exceed this much memory space (defined
>> in megabytes), then the pending updates are flushed. This can also create
>> new segments or trigger a merge. Using this setting is generally preferable
>> to maxBufferedDocs. If both maxBufferedDocs and ramBufferSizeMB are set in
>> solrconfig.xml, then a flush will occur when either limit is reached. The
>> default is 100Mb.
>> <ramBufferSizeMB>100</ramBufferSizeMB>
>> maxBufferedDocs
>> Sets the number of document updates to buffer in memory before they are
>> flushed as a new segment. This
>> may also trigger a merge. The default Solr configuration sets to flush by
>> RAM usage (ramBufferSizeMB).
>> <maxBufferedDocs>1000</maxBufferedDocs>
>> useCompoundFile
>> Controls whether newly written (and not yet merged) index segments should
>> use the Compound File Segment format. The default is false.
>> <useCompoundFile>false</useCompoundFile>
>> To have full control over your schema.xml file, you may also want to
>> disable schema guessing, which allows unknown fields to be added to the
>> schema during indexing. The properties that enable this feature are
>> discussed in the section Schemaless Mode.
> 
> 
> On Wed, Mar 8, 2017 at 1:32 AM, Phil Scadden <p.scad...@gns.cri.nz> wrote:
> 
>> I would second that the guide could be clearer on that. I read and reread
>> several times trying to get my head around the schema.xml/managed-schema
>> bit. I came away from a first cursory reading with the idea that
>> managed-schema was mostly for schema-less mode, and only after some stuff
>> ups and puzzling over comments in the basic-config schema file itself did I
>> go back for a more careful re-read. I am still not sure that I have got all
>> the nuances. My understanding is:
>> 
>> If you don’t want the ability to edit it via the admin UI or config API,
>> rename to schema.xml. Unclear whether you have to make changes to other
>> configs to do this. Also unclear to me whether there was any upside at all
>> to using schema.xml? Why degrade functionality? Does the capacity for
>> schema.xml only exist for backward compatibility?
>> 
>> If you want to run schema-less, you have to use managed-schema? (I
>> didn’t delve too deep into this.)
>> 
>> In the end, I used basic-config to create core and then hacked
>> managed-schema from there.
>> 
>> 
>> I would have to say the "basic-config" seems distinctly more than basic.
>> It is still a huge file. I thought perhaps I could delete every unused
>> field type, but worried there were some "system" dependencies. I.e., if you
>> want *target type wildcard queries, do you need to have
>> text_general_reverse and a copy to it? If you always explicitly set only
>> defined fields in a custom indexer, then can you dump the whole dynamic
>> fields bit?
