Hello all, 

We're running Solr 7.3.1 on Docker and are trying to store the index data on 
Ceph Storage using HDFS + the Hadoop-AWS S3A filesystem client. Currently, we 
start 2 Solr instances and 3 ZooKeeper nodes. 

Once Solr has started, we create a test collection with 2 shards and a 
replication factor of 2. Everything works fine and the Ceph buckets are 
populated correctly. We can see files in Ceph like:

    testcollection/core_node8/data/index/_0.fdt 111 2018-08-01T14:45:18.038Z 
    testcollection/core_node8/data/index/_0.fdx 83 2018-08-01T14:45:16.604Z 
    testcollection/core_node8/data/index/_0.fnm 427 2018-08-01T14:45:22.738Z 

However, when we restart one of the containers, the recovery process apparently 
duplicates the "dataDir" configuration, and we start to see additional files 
like:

    testcollection/core_node7/s3a:/bucketname/testcollection/core_node8/data/index/_0.fdt 111 2018-08-01T14:54:39.361Z
    testcollection/core_node7/s3a:/bucketname/testcollection/core_node8/data/index/_0.fdx 83 2018-08-01T14:54:32.669Z
    testcollection/core_node7/s3a:/bucketname/testcollection/core_node8/data/index/_0.fnm 427 2018-08-01T14:54:58.761Z

Here, "s3a:/bucketname" is the "solr.hdfs.home" value configured in solr.in.sh.
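
To make the pattern easier to spot in a bucket listing, here is a minimal 
Python sketch that flags object keys containing a filesystem URI scheme 
somewhere after the first path segment. It assumes the keys are already 
available as plain strings; the function name and the regular expression are 
illustrative only and are not part of Solr or Hadoop.

```python
import re

# A URI scheme (e.g. "s3a:/" or "hdfs://") that starts right after a "/",
# i.e. embedded inside the key rather than at its beginning.
EMBEDDED_SCHEME = re.compile(r"(?<=/)[a-z][a-z0-9+.-]*:/+")


def find_nested_keys(keys):
    """Return the keys that embed a filesystem URI inside their path."""
    return [key for key in keys if EMBEDDED_SCHEME.search(key)]


# Sample keys taken from the listings above.
keys = [
    "testcollection/core_node8/data/index/_0.fdt",
    "testcollection/core_node7/s3a:/bucketname/testcollection/core_node8/data/index/_0.fdt",
]
nested = find_nested_keys(keys)
```

With the two sample keys, only the second one (under core_node7) is flagged.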

We also noticed that before the restart, the core.properties file does not have 
the "dataDir" property set. After the restart, the file in that container has 
the property set to "s3a:/bucketname/testcollection/core_node8/data".
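
The before/after comparison of core.properties can be sketched in Python as 
follows. This is a simplified reader, not a full Java properties parser: it 
assumes plain key=value lines and ignores backslash escapes and line 
continuations, so a value written with escaped characters (for example "\:") 
would be returned as-is. The sample file contents are hypothetical apart from 
the "dataDir" value quoted above.

```python
def read_properties(text):
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that have no "=" separator
            props[key.strip()] = value.strip()
    return props


def data_dir_override(text):
    """Return the explicit dataDir value, or None when it is not set."""
    return read_properties(text).get("dataDir")


# Hypothetical file contents before and after the restart.
before = "collection=testcollection\n"
after = before + "dataDir=s3a:/bucketname/testcollection/core_node8/data\n"

print(data_dir_override(before))
print(data_dir_override(after))
```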

Is this behaviour expected? The index files are duplicated again on every 
restart. What could be causing this?

Thanks for your help,

Joaquim Oliveira
