Hello,

I run a kerberized SolrCloud (7.4.0) environment (shipped with Cloudera
6.3.2) and have the problem that the index files for all cores are created
directly under the same ${solr.solr.home} directory instead of under each
core's own directory (${solr.core.name}), and thus they are corrupt.

Cloudera uses the HdfsDirectoryFactory by default, but this is largely
unusable for large collections with billions of documents, so we switched
to storing the indexes locally (on /data/solr) a long time ago.

I should probably mention that we have another cluster (CDH 6.3.1) that
doesn't have this problem.

It might also be a Sentry authorization problem (I'm still investigating),
but it seems to me that Solr should never use the same data folder for two
different cores, so this might also be a bug in Solr.

Any ideas of what is wrong here?

Thank you,
Razvan

PS:

Here is what it looks like on disk:

$ ls /data/solr/

index  niofs_test_shard1_replica_n1
prod_applogs_20200616_shard2_replica_n2  snapshot_metadata  tlog

[a6709018@cdcdhp10 ~]$ ls /data/solr/niofs_test_shard1_replica_n1/

core.properties

[a6709018@cdcdhp10 ~]$ ls /data/solr/niofs_test_shard1_replica_n1/

core.properties


You can see that there are two core directories (niofs_test
and prod_applogs) and that all data-related folders (index,
snapshot_metadata, tlog) sit on the same level next to them. The core
folders contain only one file (core.properties).
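
To check other nodes for the same layout, a minimal sketch along these lines could be used. It assumes the default layout, where each core directory holds a data/ subdirectory next to core.properties; the function name check_solr_home and the set of data folder names are illustrative choices for this sketch, not part of Solr.

```python
from pathlib import Path

# Folder names that normally live inside a core's data directory.
# Assumption: taken from the directory listing shown in this mail.
DATA_DIR_NAMES = {"index", "tlog", "snapshot_metadata"}


def check_solr_home(solr_home):
    """Return (stray_dirs, cores_without_data) for a Solr home directory.

    stray_dirs: data-related folders found directly under the Solr home.
    cores_without_data: core directories (those holding core.properties)
    that have no data/ subdirectory (assumes the default dataDir).
    """
    home = Path(solr_home)
    stray_dirs = []
    cores_without_data = []
    for entry in sorted(home.iterdir()):
        if not entry.is_dir():
            continue
        if (entry / "core.properties").is_file():
            # A core directory: by default its index lives under data/.
            if not (entry / "data").is_dir():
                cores_without_data.append(entry.name)
        elif entry.name in DATA_DIR_NAMES:
            # Data folder sitting at the top level instead of inside a core.
            stray_dirs.append(entry.name)
    return stray_dirs, cores_without_data
```

Calling check_solr_home("/data/solr") on an affected node should list index, snapshot_metadata and tlog as stray folders and both cores as having no data directory.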


Here is how Solr is started on one machine (I removed some
security-related properties from the listing):


/usr/lib/jvm/java-openjdk/bin/java -server -XX:NewRatio=3
-XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8
-XX:+UseConcMarkSweepGC -XX:ConcGCThreads=4 -XX:ParallelGCThreads=4
-XX:+CMSScavengeBeforeRemark -XX:PretenureSizeThreshold=64m
-XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=50
-XX:CMSMaxAbortablePrecleanTime=6000 -XX:+CMSParallelRemarkEnabled
-XX:+ParallelRefProcEnabled -XX:-OmitStackTraceInFastThrow
-Xlog:gc*:file=/var/log/solr/solr_gc.log:time,uptime:filecount=9,filesize=20000
-DzkClientTimeout=15000 -DzkHost=cdcdhp02.bigdatap.de.comdirect.com:2181,
cdcdhp20.bigdatap.de.comdirect.com:2181,
cdcdhp22.bigdatap.de.comdirect.com:2181/solr -Dsolr.log.dir=/var/log/solr
-Djetty.port=8985 -DSTOP.PORT=7985 -DSTOP.KEY=csearch
-Duser.timezone=GMT+0200
-Djetty.home=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/solr/server
-Dsolr.solr.home=/data/solr -Dsolr.data.home=
-Dsolr.install.dir=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/solr
-Dsolr.default.confdir=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/solr/server/solr/configsets/_default/conf
-Xss256k -DwaitForZk=60 -Dsolr.host=cdcdhp10.bigdatap.de.comdirect.com
-DuseCachedStatsBetweenGetMBeanInfoCalls=true
-DdisableSolrFieldCacheMBeanEntryListJmx=true
-Dlog4j.configuration=file:///var/run/cloudera-scm-agent/process/518-solr-SOLR_SERVER/log4j.properties
-Dsolr.log=/var/log/solr -Dsolr.admin.port=8984
-Dsolr.max.connector.thread=10000 -Dsolr.solr.home=/data/solr
-Dsolr.authorization.sentry.site=/var/run/cloudera-scm-agent/process/518-solr-SOLR_SERVER/sentry-conf/sentry-site.xml
-Dsolr.sentry.override.plugins=true -jar start.jar --module=https
--lib=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/solr/server/solr-webapp/webapp/WEB-INF/lib/*
--lib=/var/run/cloudera-scm-agent/process/518-solr-SOLR_SERVER/hadoop-conf
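
In the listing above, -Dsolr.data.home= is passed with an empty value and -Dsolr.solr.home appears twice. Whether either of these is related to the problem is not established here. To compare the start commands of the two clusters, a small sketch like the following could flag such entries. The function name audit_system_properties is a hypothetical helper for this sketch; it only splits the command line on whitespace and looks at -D tokens.

```python
import shlex
from collections import defaultdict


def audit_system_properties(command_line):
    """Return (empty, repeated) -D property names from a java command line.

    empty: properties passed with an empty value, e.g. -Dname=
    repeated: properties passed more than once
    """
    values = defaultdict(list)
    for token in shlex.split(command_line):
        if token.startswith("-D"):
            # Split "-Dname=value" into name and value; value may be "".
            name, _, value = token[2:].partition("=")
            values[name].append(value)
    empty = sorted(n for n, vals in values.items() if "" in vals)
    repeated = sorted(n for n, vals in values.items() if len(vals) > 1)
    return empty, repeated
```

Run against the command line of each Solr process (for example, copied from the output of ps), this reports the properties worth comparing between the working and the broken cluster.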
