Re: Data import batch mode for delta

2018-04-05 Thread Shawn Heisey
On 4/5/2018 7:31 PM, gadelkareem wrote: > Why does the deltaImportQuery use "where id='${dataimporter.id}'" instead of something like "where id IN ('${dataimporter.id}')"? Because there's only one value for that property. If the deltaQuery returns a million rows, then deltaImportQuery is going to be e

Data import batch mode for delta

2018-04-05 Thread gadelkareem
Why does the deltaImportQuery use "where id='${dataimporter.id}'" instead of something like "where id IN ('${dataimporter.id}')"? -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Solr 7.1.0 - concurrent.ExecutionException building model

2018-04-05 Thread Joe Obernberger
Thank you Joel.  I gave each node in the cluster 24g of heap and it ran, but then failed on the 50th iteration (was trying to do 1,000). This time, I have the error on the node and the exception from the client running the stream command.  The node (Doris) has 3 errors that occurred at the sam

Re: Solr 7.1.0 - concurrent.ExecutionException building model

2018-04-05 Thread Joel Bernstein
Hi Joe, Currently you will eventually run into memory problems if the training set gets too large. Under the covers on each node it is creating a matrix with a row for each document and a column for each feature. This can get large quite quickly. By choosing fewer features you can make this matri
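
To get a rough sense of how quickly such a matrix grows, here is a back-of-the-envelope sketch in Python. It assumes a dense matrix of 8-byte values with one row per document and one column per feature; the actual in-memory layout Solr uses is not stated in this thread, and the feature counts below are purely illustrative.

```python
def dense_matrix_bytes(num_docs: int, num_features: int, bytes_per_value: int = 8) -> int:
    """Estimate the size of a dense docs-by-features matrix.

    Assumption: every cell is a fixed-width value (8 bytes, like a double).
    This is an illustration, not Solr's actual memory accounting.
    """
    return num_docs * num_features * bytes_per_value


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return num_bytes / (1024 ** 3)


if __name__ == "__main__":
    # Total training-set size mentioned elsewhere in the thread; each node
    # would only hold the rows for the documents it hosts.
    num_docs = 1_200_000
    for num_features in (100, 250, 500, 1000):  # hypothetical feature counts
        size = dense_matrix_bytes(num_docs, num_features)
        print(f"{num_features:>5} features: {to_gib(size):6.2f} GiB")
```

Under these assumptions the size scales linearly with the feature count, which is why choosing fewer features shrinks the matrix proportionally.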

Re: Storing Ranking Scores And Documents In Separate Indices

2018-04-05 Thread Erick Erickson
Also, Solr has updateable docValues fields (single-valued only) that may be another alternative. Best, Erick On Thu, Apr 5, 2018 at 1:59 PM, Markus Jelsma wrote: > Hello Quynh, > > Solr has support for external file fields [1]. They are a simple key=float > based text file where key is ID, and

Re: Solr 7.1.0 - concurrent.ExecutionException building model

2018-04-05 Thread Joe Obernberger
I tried to build a large model based on about 1.2 million documents.  One of the nodes ran out of memory and killed itself. Is this much data not reasonable to use?  The nodes have 16g of heap.  Happy to increase it, but not sure if this is possible? Thank you! -Joe On 4/5/2018 10:24 AM, Jo

RE: Storing Ranking Scores And Documents In Separate Indices

2018-04-05 Thread Markus Jelsma
Hello Quynh, Solr has support for external file fields [1]. They are a simple key=float based text file where key is ID, and the float can be used for boosting/scoring documents. This is a much simpler approach than using a separate collection. These files can be reloaded every commit and are r
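
As a minimal sketch of the key=float text format described above, the following Python renders a mapping of document IDs to scores as one `key=float` line per document. The document IDs and scores are hypothetical, and where the file must live and how it must be named depend on your Solr version and field definition, so check the Reference Guide before relying on this.

```python
from typing import Mapping


def format_external_file(scores: Mapping[str, float]) -> str:
    """Render document-ID -> score pairs as key=float lines, one per line.

    Keys are sorted so the output is deterministic. IDs containing '='
    are not handled by this sketch.
    """
    lines = [f"{doc_id}={float(score)}" for doc_id, score in sorted(scores.items())]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical product IDs and ranking scores.
    scores = {"prod-2": 0.4, "prod-1": 1.75, "prod-3": 3}
    print(format_external_file(scores), end="")
```

Regenerating the whole file from the data scientists' output keeps the scores out of the indexed documents, which is the appeal of this approach over a separate collection.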

Storing Ranking Scores And Documents In Separate Indices

2018-04-05 Thread Huynh, Quynh
Hey Solr Community, We have a collection of product documents that we’d like to add fields to with ranking scores generated by our data scientists. Two options we’re considering are to either: - Have a separate index that contains all the documents from our product index, but with these

Re: Largest number of indexed documents used by Solr

2018-04-05 Thread Joe Obernberger
50 billion per day?  Wow!  How large are these documents? We have a cluster with one large collection that contains 2.4 billion documents spread across 40 machines using HDFS for the index.  We store our data inside of HBase, and in order to re-index data we pull from HBase and index with solr

Getting "zip bomb" exception while sending HTML document to solr

2018-04-05 Thread Hanjan, Harinder
Hello! I'm sending an HTML document to Solr and Tika is throwing the "Zip bomb detected!" exception back. Looks like Tika has an arbitrary limit of 100 levels of XML element nesting (https://github.com/apache/tika/blob/9130bbc1fa6d69419b2ad294917260d6b1cced08/tika-core/src/main/java/org/apache/ti

Re: Largest number of indexed documents used by Solr

2018-04-05 Thread Kelly, Frank
We have ~350M documents stored using r3.xlarge nodes with 8GB heap and about 31GB of RAM. We are using Solr 5.3.1 in a SolrCloud setup (3 collections, each with 3 shards and 3 replicas). For us, lots of RAM is not as important as CPU (as the EBS disk we run on top of is quite fast a

Re: PreAnalyzed URP and SchemaRequest API

2018-04-05 Thread David Smiley
Is this really a problem when you could easily enough create a TextField and call setTokenStream? Does your remote client have Solr-core and all its dependencies on the classpath? That's one way to do it... and presumably the direction you are going because you're asking how to work with PreAnal

Re: Copy field on dynamic fields?

2018-04-05 Thread Chris Hostetter
: Have you tried reading existing example schemas? They show various : permutations of copy fields. Hmm... as the example schemas have been simplified/consolidated/purged it seems we have lost the specific examples that are relevant to the user's question -- the only instance of a glob'ed copyF

Re: Copy field on dynamic fields?

2018-04-05 Thread Alexandre Rafalovitch
Have you tried reading existing example schemas? They show various permutations of copy fields. Regards, Alex On Thu, Apr 5, 2018, 2:54 AM jatin roy, wrote: > Any update? > > From: jatin roy > Sent: Tuesday, April 3, 2018 12:37 PM > To: solr-user@lucene.apac

Basic Security Plugin and Collection Shard Distribution

2018-04-05 Thread Chris Ulicny
Hi all, I've been periodically running into a strange permissions issue and finally have some useful information on it. We've run into the issue on v6.3.0 and v7.X clusters. Assume we have 2 hosts (1 instance on each) with 2 collections. Collection c1 has 2 shards, and collection c2 has 1 shard.

Re: [ANNOUNCE] Solr Reference Guide for Solr 7.3 released

2018-04-05 Thread Terry Steichen
OK, I guess this means this change has been included in 7.3.0. I really appreciate what all of the committers do, so please don't get this wrong. Even with this and the preceding comment, I find it difficult to clearly follow these changes.  Perhaps, as Shawn suggests, any such consolidation and/or ea

Re: [ANNOUNCE] Solr Reference Guide for Solr 7.3 released

2018-04-05 Thread Shawn Heisey
On 4/5/2018 9:05 AM, Terry Steichen wrote: I'm a bit confused because of the issue I was concerned about earlier: https://issues.apache.org/jira/browse/SOLR-11622 It was supposed to be fixed and included in (the then-future) 7.3, but I don't see it there in the listed 7.3.0 changes/bug-fixes. Am

Re: [ANNOUNCE] Solr Reference Guide for Solr 7.3 released

2018-04-05 Thread Steve Rowe
You’re missing Erick Erickson’s last comment on the issue[1]: > Fixed as part of SOLR-11701 SOLR-11701[2] is listed in CHANGES[3]. [1] https://issues.apache.org/jira/browse/SOLR-11622?focusedCommentId=16303006&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-163030

Re: [ANNOUNCE] Solr Reference Guide for Solr 7.3 released

2018-04-05 Thread Terry Steichen
I'm a bit confused because of the issue I was concerned about earlier:  https://issues.apache.org/jira/browse/SOLR-11622 It was supposed to be fixed and included in (the then-future) 7.3, but I don't see it there in the listed 7.3.0 changes/bug-fixes. Am I missing something? On 04/05/2018 10:05 A

Re: ZK CLI script giving IOException doing upconfig

2018-04-05 Thread Shawn Heisey
On 4/5/2018 7:01 AM, Doug Turnbull wrote: My upconfig was also trying to upload the data dir (I had used this as a solr home in a standalone non cloud Solr), I'm missing *conf* here Oops. :)  Uploading an entire core would be a problem! Glad you figured it out. I wonder too if there's anyth

Re: Solr 7.1.0 - concurrent.ExecutionException building model

2018-04-05 Thread Joe Obernberger
Thank you Shawn - sorry so long to respond, been playing around with this a good bit.  It is an amazing capability.  It looks like it could be related to certain nodes in the cluster not responding quickly enough.  In one case, I got the concurrent.ExecutionException, but it looks like the root

[ANNOUNCE] Solr Reference Guide for Solr 7.3 released

2018-04-05 Thread Cassandra Targett
The Lucene PMC is pleased to announce that the Solr Reference Guide for Solr 7.3 is now available. This 1,295 page PDF is the definitive guide to using Apache Solr, the search server built on Apache Lucene. The PDF Guide can be downloaded from: https://www.apache.org/dyn/closer.cgi/lucene/solr/re

Re: ZK CLI script giving IOException doing upconfig

2018-04-05 Thread Doug Turnbull
Shawn, that's the ticket... I see where I screwed up now. My upconfig was also trying to upload the data dir (I had used this as a solr home in a standalone non cloud Solr), I'm missing *conf* here -confdir solr_home/foo/ Changing to: -confdir solr_home/foo/conf works... I wonder too if there

Re: SOLR Cloud: 1500+ threads are in TIMED_WAITING status

2018-04-05 Thread Doss
Hi Emir, Just realised DBQ = Delete by Query, we are not using that, we are deleting documents using the document id / unique id. Thanks, Mohandoss.

Re: How to create my schema and add document, thank you

2018-04-05 Thread Adhyan Arizki
Raymond, 1. Please ensure your Solr instance does indeed load up the correct managed-schema file. You do not need to create the file; it should have been created automatically in the newer versions of Solr out of the box. You just need to edit it. 2. Have you reloaded your instance after you made the

How to create my schema and add document, thank you

2018-04-05 Thread Raymond Xie
I have the data ready for indexing now; it is a JSON file: {"122": "20180320-08:08:35.038", "49": "VIPER", "382": "0", "151": "1.0", "9": "653", "10071": "20180320-08:08:35.088", "15": "JPY", "56": "XSVC", "54": "1", "10202": "APMKTMAKING", "10537": "XOSE", "10217": "Y", "48": "179492540", "201": "1

Re: SOLR Cloud: 1500+ threads are in TIMED_WAITING status

2018-04-05 Thread Emir Arnautović
Hi Mohandoss, I would check to see if thread increase is correlated to DBQ since it does not play well with concurrent indexing: http://www.od-bits.com/2018/03/dbq-or-delete-by-query.html HTH, Emir -- Monitoring - Log Management - Ale

Re: some parent documents

2018-04-05 Thread Arturas Mazeika
Hi Mikhail et al, Thanks a lot for sharing the code snippet. I would not have been able to dig this Java file myself to investigate the complexity of the search query. Scanning the code I get a feeling that it is well structured and well thought out. There is a concept like advance (Parent Approxim

Re: SOLR Cloud: 1500+ threads are in TIMED_WAITING status

2018-04-05 Thread Doss
Hi Emir, We do fire delete queries but that is very very minimal. Thanks! Mohandoss

RE: ZKPropertiesWriter error DIH (SolrCloud 6.6.1)

2018-04-05 Thread msaunier
I have used this process to create the DIH: 1. Create the BLOB collection: * curl http://localhost:8983/solr/admin/collections?action=CREATE&name=.system 2. Send definition and file for DIH * curl -X POST -H 'Content-Type: application/octet-stream' --data-binary @ solr-dataimporthandler-6

Re: SOLR Cloud: 1500+ threads are in TIMED_WAITING status

2018-04-05 Thread Emir Arnautović
Hi, I’ve seen a similar jump in thread count when DBQ was used. Do you delete documents while indexing? Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > On 5 Apr 2018, at 07:56, Doss wrote: > > @wunder