I'm wondering if the shutdown time is too short. When we shut down the cluster, could it be that it doesn't have enough time to flush? It only happens some of the time, and which node is affected seems to be random.
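One way to take flushing out of the equation (a sketch; the host and port are placeholders, and UNCLASS is the collection name from the logs below) would be to issue an explicit hard commit before stopping each node, so the tlogs are already flushed when shutdown begins:

    # Hard commit: flush pending updates to the index so the current
    # tlog can be rolled over before the node goes down. In SolrCloud a
    # commit sent to any node is distributed to the whole collection.
    curl 'http://localhost:8983/solr/UNCLASS/update?commit=true'

    # Then stop the node; bin/solr attempts a graceful stop first
    bin/solr stop -p 8983

If the errors go away once we commit before stopping, that would point at shutdown racing the tlog flush.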
-Joe

On Tue, Feb 2, 2016 at 12:49 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> Does this happen all the time, or only when bringing up Solr on some of
> the nodes? My (naive) question is whether this message,
> AlreadyBeingCreatedException, could indicate that more than one Solr
> instance is trying to access the same tlog....
>
> Best,
> Erick
>
> On Tue, Feb 2, 2016 at 9:01 AM, Scott Stults
> <sstu...@opensourceconnections.com> wrote:
> > It seems odd that the tlog files are so large. HDFS aside, is there a
> > reason why you're not committing? Also, as far as disk space goes, if
> > you dip below 50% free, you run the risk that the index segments
> > can't be merged.
> >
> > k/r,
> > Scott
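Re Scott's question about committing: a hard autoCommit would bound how much data sits in any one tlog. A minimal sketch using the Config API (the 60-second interval is only illustrative; host and collection are the same placeholders as above):

    # Flush and roll the tlog at most every 60s; openSearcher=false
    # keeps these commits from affecting search visibility
    curl 'http://localhost:8983/solr/UNCLASS/config' \
      -H 'Content-Type: application/json' \
      -d '{"set-property": {"updateHandler.autoCommit.maxTime": 60000,
                            "updateHandler.autoCommit.openSearcher": false}}'

    # And for the 50%-free rule of thumb, check per-datanode usage:
    hdfs dfsadmin -report | grep 'DFS Used%'

The same settings can go in solrconfig.xml directly; the API call just avoids editing and re-uploading the config to ZooKeeper.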
> > On Fri, Jan 29, 2016 at 12:40 AM, Joseph Obernberger
> > <joseph.obernber...@gmail.com> wrote:
> >> Hi All - we're using Apache Solr Cloud 5.2.1 with an HDFS system that
> >> is 86% full. Some of the datanodes in the HDFS cluster are closer to
> >> being full than others. We're getting messages about "Error adding
> >> log" from the index process, which I **think** is related to datanodes
> >> being full. Is that the case, even though HDFS still has room
> >> available?
> >>
> >> Upon restart of Solr Cloud, we see messages such as:
> >>
> >> ERROR - 2016-01-28 22:15:27.594; [ UNCLASS]
> >> org.apache.solr.common.SolrException; Failure to open existing log file
> >> (non fatal)
> >> hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0000000000000282501:org.apache.solr.common.SolrException:
> >> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException):
> >> failed to create file
> >> /solr5.2/UNCLASS/core_node14/data/tlog/tlog.0000000000000282501 for
> >> DFSClient_NONMAPREDUCE_1371064795_29 for client 172.16.100.218 because
> >> current leaseholder is trying to recreate file.
> >>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:3075)
> >>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2905)
> >>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:3186)
> >>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:3149)
> >>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:611)
> >>     at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.append(AuthorizationProviderProxyClientProtocol.java:124)
> >>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:416)
> >>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> >>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> >>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
> >>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
> >>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
> >>     at java.security.AccessController.doPrivileged(Native Method)
> >>     at javax.security.auth.Subject.doAs(Subject.java:415)
> >>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> >>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> >>
> >>     at org.apache.solr.update.HdfsTransactionLog.<init>(HdfsTransactionLog.java:131)
> >>     at org.apache.solr.update.HdfsUpdateLog.init(HdfsUpdateLog.java:193)
> >>     at org.apache.solr.update.UpdateHandler.<init>(UpdateHandler.java:136)
> >>     at org.apache.solr.update.UpdateHandler.<init>(UpdateHandler.java:94)
> >>     at org.apache.solr.update.DirectUpdateHandler2.<init>(DirectUpdateHandler2.java:99)
> >>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> >>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> >>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> >>     at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> >>     at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:573)
> >>     at org.apache.solr.core.SolrCore.createUpdateHandler(SolrCore.java:635)
> >>     at org.apache.solr.core.SolrCore.initUpdateHandler(SolrCore.java:928)
> >>     at org.apache.solr.core.SolrCore.<init>(SolrCore.java:786)
> >>     at org.apache.solr.core.SolrCore.<init>(SolrCore.java:658)
> >>     at org.apache.solr.core.CoreContainer.create(CoreContainer.java:637)
> >>     at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:381)
> >>     at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:375)
> >>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> >>     at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:148)
> >>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >>     at java.lang.Thread.run(Thread.java:745)
> >> Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException):
> >> failed to create file
> >> /solr5.2/UNCLASS/core_node14/data/tlog/tlog.0000000000000282501 for
> >> DFSClient_NONMAPREDUCE_1371064795_29 for client 172.16.100.218 because
> >> current leaseholder is trying to recreate file.
> >>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:3075)
> >>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2905)
> >>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:3186)
> >>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:3149)
> >>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:611)
> >>     at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.append(AuthorizationProviderProxyClientProtocol.java:124)
> >>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:416)
> >>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> >>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> >>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
> >>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
> >>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
> >>     at java.security.AccessController.doPrivileged(Native Method)
> >>     at javax.security.auth.Subject.doAs(Subject.java:415)
> >>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> >>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> >>
> >>     at org.apache.hadoop.ipc.Client.call(Client.java:1472)
> >>     at org.apache.hadoop.ipc.Client.call(Client.java:1403)
> >>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
> >>     at com.sun.proxy.$Proxy11.append(Unknown Source)
> >>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.append(ClientNamenodeProtocolTranslatorPB.java:313)
> >>     at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
> >>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >>     at java.lang.reflect.Method.invoke(Method.java:606)
> >>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
> >>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
> >>     at com.sun.proxy.$Proxy12.append(Unknown Source)
> >>     at org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1842)
> >>     at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1878)
> >>     at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1871)
> >>     at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:329)
> >>     at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:325)
> >>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> >>     at org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:325)
> >>     at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1172)
> >>     at org.apache.solr.update.HdfsTransactionLog.<init>(HdfsTransactionLog.java:94)
> >>     ... 21 more
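If a stale lease is what blocks startup, Hadoop 2.7 and later ship a debug command that forces lease recovery on a path, which could be run before bringing Solr back up (a sketch; the paths are the tlog file and directory from the error above, and whether your Hadoop version has "hdfs debug" needs checking):

    # Force lease recovery on the tlog the NameNode still considers open
    hdfs debug recoverLease \
      -path /solr5.2/UNCLASS/core_node14/data/tlog/tlog.0000000000000282501 \
      -retries 3

    # With 60+ tlogs per core, recover all of them before restarting Solr
    for f in $(hdfs dfs -ls /solr5.2/UNCLASS/core_node14/data/tlog \
               | awk '/tlog\./ {print $NF}'); do
      hdfs debug recoverLease -path "$f" -retries 3
    done

That should let Solr's own FSHDFSUtils recovery (shown below) succeed on the first attempt instead of retrying for up to 900 seconds per file.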
> >>
> >> INFO - 2016-01-28 22:15:27.639; [ UNCLASS] org.apache.solr.util.FSHDFSUtils; Recovering lease on dfs file hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0000000000000282933
> >> INFO - 2016-01-28 22:15:27.698; [ UNCLASS] org.apache.solr.util.FSHDFSUtils; recoverLease=false, attempt=0 on file=hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0000000000000282933 after 58ms
> >> INFO - 2016-01-28 22:15:31.821; [ UNCLASS] org.apache.solr.util.FSHDFSUtils; recoverLease=false, attempt=1 on file=hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0000000000000282933 after 4181ms
> >> INFO - 2016-01-28 22:16:32.971; [ UNCLASS] org.apache.solr.util.FSHDFSUtils; recoverLease=false, attempt=2 on file=hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0000000000000282933 after 65331ms
> >> INFO - 2016-01-28 22:17:34.638; [ UNCLASS] org.apache.solr.util.FSHDFSUtils; recoverLease=false, attempt=3 on file=hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0000000000000282933 after 126998ms
> >> INFO - 2016-01-28 22:18:35.764; [ UNCLASS] org.apache.solr.util.FSHDFSUtils; recoverLease=false, attempt=4 on file=hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0000000000000282933 after 188124ms
> >> INFO - 2016-01-28 22:19:37.114; [ UNCLASS] org.apache.solr.util.FSHDFSUtils; recoverLease=false, attempt=5 on file=hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0000000000000282933 after 249474ms
> >> INFO - 2016-01-28 22:20:38.629; [ UNCLASS] org.apache.solr.util.FSHDFSUtils; recoverLease=false, attempt=6 on file=hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0000000000000282933 after 310989ms
> >> INFO - 2016-01-28 22:21:39.751; [ UNCLASS] org.apache.solr.util.FSHDFSUtils; recoverLease=false, attempt=7 on file=hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0000000000000282933 after 372111ms
> >> INFO - 2016-01-28 22:22:40.854; [ UNCLASS] org.apache.solr.util.FSHDFSUtils; recoverLease=false, attempt=8 on file=hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0000000000000282933 after 433214ms
> >> INFO - 2016-01-28 22:23:41.981; [ UNCLASS] org.apache.solr.util.FSHDFSUtils; recoverLease=false, attempt=9 on file=hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0000000000000282933 after 494341ms
> >> INFO - 2016-01-28 22:24:43.088; [ UNCLASS] org.apache.solr.util.FSHDFSUtils; recoverLease=false, attempt=10 on file=hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0000000000000282933 after 555448ms
> >> INFO - 2016-01-28 22:25:44.808; [ UNCLASS] org.apache.solr.util.FSHDFSUtils; recoverLease=false, attempt=11 on file=hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0000000000000282933 after 617168ms
> >> INFO - 2016-01-28 22:26:45.934; [ UNCLASS] org.apache.solr.util.FSHDFSUtils; recoverLease=false, attempt=12 on file=hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0000000000000282933 after 678294ms
> >> INFO - 2016-01-28 22:27:47.036; [ UNCLASS] org.apache.solr.util.FSHDFSUtils; recoverLease=false, attempt=13 on file=hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0000000000000282933 after 739396ms
> >> INFO - 2016-01-28 22:28:48.504; [ UNCLASS] org.apache.solr.util.FSHDFSUtils; recoverLease=false, attempt=14 on file=hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0000000000000282933 after 800864ms
> >>
> >> Some shards in the cluster can take hours to come back up. Any ideas?
> >> It appears to wait 900 seconds for each of the tlog files, and when
> >> there are 60+ files this takes a long time: at 900 seconds per file,
> >> 60 tlogs is roughly 15 hours!
> >> Thank you!
> >
> > --
> > Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC
> > | 434.409.2780
> > http://www.opensourceconnections.com
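Following up on Erick's multiple-writers question: HDFS can report which files it still considers open for write, which is exactly the state the lease-recovery retries above are fighting (a sketch; the path is the tlog directory from the log, and output details vary by Hadoop version):

    # List files under the tlog directory that HDFS still considers
    # open for write, i.e. whose leases were never released
    hdfs fsck /solr5.2/UNCLASS/core_node14/data/tlog -files -openforwrite

If tlogs are still flagged open after every Solr node is down, the leases simply weren't released at shutdown; if they show open while only one Solr node is running, that would support the theory of a second writer on the same tlog directory.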