Perhaps it would make sense to explore a nested structure, with each key/value pair being a child document. Then you match on the child document and load the parent document. Roughly, something like the sketch below.
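(A rough, untested sketch only - the field names, ids, and collection name here are made up, and it assumes SolrJ 5.x with the standard block-join support:)

import java.util.Map;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.common.SolrInputDocument;

public class NestedVoucherSketch {
  // Index one parent message with one child document per accountId/voucher pair.
  static void index(SolrClient client, Map<String, String> vouchers) throws Exception {
    SolrInputDocument parent = new SolrInputDocument();
    parent.addField("id", "msg1");
    parent.addField("doc_type", "voucherMessage");
    for (Map.Entry<String, String> e : vouchers.entrySet()) {
      SolrInputDocument child = new SolrInputDocument();
      child.addField("id", "msg1_" + e.getKey());
      child.addField("accountId", e.getKey());
      child.addField("voucher", e.getValue());
      parent.addChildDocument(child);
    }
    client.add("message", parent);
    client.commit("message");
  }

  // Match on the child, load the parent - plus just the matching child via [child].
  static SolrQuery query(String accountId) {
    SolrQuery q = new SolrQuery(
        "{!parent which='doc_type:voucherMessage'}accountId:" + accountId);
    q.setFields("*",
        "[child parentFilter=doc_type:voucherMessage childFilter='accountId:" + accountId + "']");
    return q;
  }
}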
I don't think either the first or the second solution is a very resilient Solr architecture.

Regards,
   Alex.
----
Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 30 August 2016 at 14:13, Jeffery Yuan <yuanyun...@gmail.com> wrote:
> We store marketing messages in Solr. One type is a voucher message, which
> stores a map: the key is an accountId, the value is that user's voucher.
> We also search on some other fields.
>
> *My current approach is to add a field, accountIds, index it, and search
> against it.*
> Another field, details, is not indexed: JSON data that stores the
> accountId-to-voucher map.
>
> We then have a transformer on the Solr side that keeps only the requesting
> user's voucher and removes all other users' vouchers.
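> A simplified sketch of what the transformer does - the exact DocTransformer
> signature varies across Solr versions; this assumes Solr 5.x plus Noggit's
> JSON parser, which ships with Solr, and omits the TransformerFactory
> registration in solrconfig.xml:
>
> import java.io.IOException;
> import java.util.Map;
> import org.apache.solr.common.SolrDocument;
> import org.apache.solr.response.transform.DocTransformer;
> import org.noggit.ObjectBuilder;
>
> public class VoucherDocTransformer extends DocTransformer {
>   private final String accountId; // the requesting user's account id
>
>   public VoucherDocTransformer(String accountId) {
>     this.accountId = accountId;
>   }
>
>   @Override
>   public String getName() { return "voucher"; }
>
>   @Override
>   @SuppressWarnings("unchecked")
>   public void transform(SolrDocument doc, int docid) throws IOException {
>     Object details = doc.getFirstValue("details");
>     if (details == null) return;
>     // details is stored JSON: a map of accountId -> voucher.
>     // Keep only this user's entry; drop everyone else's.
>     Map<String, Object> all =
>         (Map<String, Object>) ObjectBuilder.fromJSON(details.toString());
>     doc.setField("details", all.get(accountId));
>   }
> }
>
> It is invoked with something like q=accountIds:account123 and
> fl=*,[voucher accountId=account123].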
> This works: with 2 GB of memory and 3 nodes on the same machine, it can
> handle at least 1 million accountIds - a 35 MB file containing 1 million
> accountIds and voucher codes.
>
> But today I am considering another approach:
> *Create a dynamic field, voucher_*; the field name is voucher_account123
> and its value is account123's voucher.*
>
> The query would be simple, and its performance would be better, as there
> is no need to load the whole details field into memory and filter out
> other users' vouchers.
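> As a sketch (same made-up names as above; this shows the indexed variant -
> for the stored-only variant the voucher_* field would just be returned via
> fl, not queried):
>
> import org.apache.solr.client.solrj.SolrClient;
> import org.apache.solr.client.solrj.SolrQuery;
> import org.apache.solr.common.SolrInputDocument;
>
> public class WideColumnSketch {
>   static void index(SolrClient client) throws Exception {
>     SolrInputDocument doc = new SolrInputDocument();
>     doc.addField("id", "msg1");
>     // One dynamic field per account - with 1 million accounts this
>     // becomes 1 million distinct voucher_* fields in the index.
>     doc.addField("voucher_account123", "SAVE20");
>     doc.addField("voucher_account124", "SAVE30");
>     client.add("message", doc);
>     client.commit("message");
>   }
>
>   static SolrQuery query(String accountId) {
>     // Existence query on the single per-account field; only that field
>     // is returned, so no server-side filtering is needed.
>     SolrQuery q = new SolrQuery("voucher_" + accountId + ":[* TO *]");
>     q.setFields("id", "voucher_" + accountId);
>     return q;
>   }
> }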
> This works for small data, but it failed when I tried to index 200,000
> vouchers (thus 200,000 dynamic fields).
>
> The document is created, but not all dynamic fields are created.
>
> Server-side error:
> WARN - 2016-08-30 06:37:49.529; [message shard1 core_node1 message_shard1_replica3] org.eclipse.jetty.servlet.ServletHandler; Error for /solr/message_shard1_replica3/update
> java.lang.OutOfMemoryError: Java heap space
>     at org.apache.lucene.util.packed.Direct16.<init>(Direct16.java:37)
>     at org.apache.lucene.util.packed.PackedInts.getMutable(PackedInts.java:993)
>     at org.apache.lucene.util.packed.PackedInts.getMutable(PackedInts.java:976)
>     at org.apache.lucene.codecs.compressing.LZ4$HashTable.reset(LZ4.java:193)
>     at org.apache.lucene.codecs.compressing.LZ4.compress(LZ4.java:217)
>     at org.apache.lucene.codecs.compressing.CompressionMode$LZ4FastCompressor.compress(CompressionMode.java:164)
>     at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(CompressingStoredFieldsWriter.java:235)
>     at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.finishDocument(CompressingStoredFieldsWriter.java:165)
>     at org.apache.lucene.index.DefaultIndexingChain.finishStoredFields(DefaultIndexingChain.java:270)
>     at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:311)
>     at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:232)
>     at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:458)
>     at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1363)
>     at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:239)
>     at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:163)
>     at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
>     at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
>     at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:955)
>     at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1110)
>     at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:706)
>     at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:104)
>     at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:101)
>     at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:179)
>     at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:135)
>     at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:241)
>     at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:121)
>     at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:206)
>     at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:126)
>     at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:186)
>     at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:111)
>     at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:58)
>     at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:98)
>
> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /configs/messagen/params.json
>
> SolrJ client:
> megaphone 2016-08-29 23:22:14 INFO 777756 [http-nio-0.0.0.0-8080-exec-6-SendThread(localhost:9983)] o.a.z.ClientCnxn - Client session timed out, have not heard from server in 6249ms for sessionid 0x156da0d9e0b0003, closing socket connection and attempting reconnect
>
> I also tried changing voucher_* to stored-only (indexed=false). With that
> I am able to import a message with 1 million accountIds and vouchers -
> that is, 1 million stored-only dynamic fields.
> But this takes too much memory: it needs 1.5 GB to serve this message.
> When I changed the heap to 512 MB, Solr could not start at all.
>
> When changed to 500m:
> ERROR - 2016-08-30 07:04:19.866; [ message_shard1_replica3] org.apache.solr.core.CoreContainer; Error creating core [message_shard1_replica3]: Java heap space
> java.lang.OutOfMemoryError: Java heap space
>     at java.util.HashMap.resize(HashMap.java:703)
>     at java.util.HashMap.putVal(HashMap.java:662)
>     at java.util.HashMap.put(HashMap.java:611)
>     at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:65)
>     at org.apache.lucene.codecs.lucene50.Lucene50FieldInfosFormat.read(Lucene50FieldInfosFormat.java:166)
>     at org.apache.lucene.index.IndexWriter.readFieldInfos(IndexWriter.java:912)
>     at org.apache.lucene.index.IndexWriter.getFieldNumberMap(IndexWriter.java:924)
>     at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:854)
>     at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:78)
>     at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:65)
>     at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:273)
>     at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:116)
>     at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1626)
>     at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1769)
>     at org.apache.solr.core.SolrCore.initSearcher(SolrCore.java:911)
>     at org.apache.solr.core.SolrCore.<init>(SolrCore.java:788)
>     at org.apache.solr.core.SolrCore.<init>(SolrCore.java:658)
>
> I know Solr's data structures are much more complex than Cassandra's, but
> is it possible to support 1 million indexable dynamic fields in Solr - at
> least for stored-only dynamic fields?
>
> This would make Solr more useful in some cases.
>
> Thanks for any input.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-Wide-Column-not-work-when-with-200k-dynamic-fields-tp4293911.html
> Sent from the Solr - User mailing list archive at Nabble.com.