Re: child doc filter

2016-11-03 Thread Mikhail Khludnev
Hello Tim, I think http://blog-archive.griddynamics.com/2013/12/grandchildren-and-siblings-with-block.html provides a few relevant examples. To summarize, it's worth using the nested query syntax {!parent... v=$foo} to nest a complex child clause. If you need to exploit filter cache, use filter(foo:b
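
A minimal sketch of the nested syntax mentioned above, assuming a hypothetical schema in which `doc_type:parent` marks parent documents and `color` and `size` are child-document fields. None of these field names come from the thread. The child clause is passed through a separate request parameter and referenced with `v=$childq`, and one child condition is wrapped in `filter(...)` so it can be served from the filter cache.

```python
from urllib.parse import urlencode

# Hypothetical field names: doc_type marks parents, color/size live on child docs.
params = {
    # Block-join parent parser; the child clause is dereferenced from $childq.
    "q": "{!parent which=$parents v=$childq}",
    "parents": "doc_type:parent",
    # filter(...) asks Solr to cache this child condition in the filter cache.
    "childq": "+color:red +filter(size:XL)",
}

# URL-encoded query string, ready to append to a /select request.
query_string = urlencode(params)
print(query_string)
```

Keeping the child clause in its own parameter avoids escaping problems when the clause itself contains spaces or quotes.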

Re: UpdateProcessor as a batch

2016-11-03 Thread Joel Bernstein
This might be useful. In this scenario you load your content into Solr for staging and perform your ETL from Solr to Solr: http://joelsolr.blogspot.com/2016/10/solr-63-batch-jobs-parallel-etl-and.html Basically Solr becomes a text processing warehouse. Joel Bernstein http://joelsolr.blogspot.com/

Re: UpdateProcessor as a batch

2016-11-03 Thread Alexandre Rafalovitch
How big a batch are we talking about? Because I believe you could accumulate the docs in the first URP in the processAdd and then do the batch lookup and actual processing of them on processCommit. They are daisy-chained, so as long as you are holding on to the chain, the rest of the URPs don't h
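
The accumulate-then-flush idea above can be modelled in a few lines. This is a toy Python model only, not Solr's Java UpdateRequestProcessor API: the class names, the `batch_lookup` callable and the dict-based documents are all hypothetical. It shows the first processor holding documents back in its add step and releasing them to the next processor in the chain only at commit time.

```python
class BufferingProcessor:
    """Toy stand-in for the first processor in a daisy chain (hypothetical)."""

    def __init__(self, next_processor, batch_lookup):
        self.next_processor = next_processor
        self.batch_lookup = batch_lookup  # callable: list of ids -> {id: extra fields}
        self.pending = []

    def process_add(self, doc):
        # Hold the document instead of passing it down the chain.
        self.pending.append(doc)

    def process_commit(self):
        # One lookup for the whole batch, then release the docs downstream.
        batch, self.pending = self.pending, []
        extra = self.batch_lookup([doc["id"] for doc in batch])
        for doc in batch:
            doc.update(extra.get(doc["id"], {}))
            self.next_processor.process_add(doc)
        self.next_processor.process_commit()


class CollectingProcessor:
    """Toy downstream processor that records what it receives."""

    def __init__(self):
        self.added = []
        self.commits = 0

    def process_add(self, doc):
        self.added.append(doc)

    def process_commit(self):
        self.commits += 1
```

Anything held in `pending` exists only in memory until the commit arrives, so a model like this loses buffered documents if the process stops before the flush.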

Re: UpdateProcessor as a batch

2016-11-03 Thread mike st. john
Maybe introduce a distributed queue such as Apache Ignite, Hazelcast or even Redis. Read from the queue in batches, do your lookup, then index the same batch. Just a thought. Mike St. John. On Nov 3, 2016 3:58 PM, "Erick Erickson" wrote: > I thought we might be talking past each other... > >
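
A minimal sketch of the read-in-batches step, using Python's in-process `queue.Queue` as a stand-in for a distributed queue. The helper name `drain_batch` is hypothetical, and a real Ignite, Hazelcast or Redis client would have its own API for this.

```python
import queue


def drain_batch(q, max_size):
    """Take up to max_size items from the queue without blocking."""
    batch = []
    while len(batch) < max_size:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break  # queue is empty; return the partial batch
    return batch
```

A consumer would call `drain_batch` in a loop, run the external lookup once per returned batch, and then index that same batch.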

Re: UpdateProcessor as a batch

2016-11-03 Thread Erick Erickson
I thought we might be talking past each other... I think you're into "roll your own" here. Anything that accumulated docs for a while, did a batch lookup on the external system, then passed on the docs runs the risk of losing docs if the server is abnormally shut down. I guess ideally you'd like

child doc filter

2016-11-03 Thread Tim Williams
I'm using the BlockJoinQuery to query child docs and return the parent. I'd like to have the equivalent of a filter that applies to child docs and I don't see a way to do that with the BlockJoin stuffs. It looks like I could modify it to accept some childFilter param and add a QueryWrapperFilter r

RE: UpdateProcessor as a batch

2016-11-03 Thread Markus Jelsma
Hi - i believe i did not explain myself well enough. Getting the data in Solr is not a problem, various sources index docs to Solr, all in fine batches as everyone should do indeed. The thing is that i need to do some preprocessing before it is indexed. Normally, UpdateProcessors are the way to

Re: Problem with Password Decryption in Data Import Handler

2016-11-03 Thread William Bell
OK it was echo -n "${encrypt_key}" > encrypt.key On Thu, Nov 3, 2016 at 12:20 PM, William Bell wrote: > I cannot get it to work either. > > Here are my steps. I took the key from the Patch in > https://issues.apache.org/jira/secure/attachment/12730862/SOLR-4392.patch. > > echo U2FsdGVkX19Gz7q
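
The fix above relies on `echo -n`, which writes the key without a trailing newline. A small sketch of the same idea in Python, assuming the key file must contain exactly the key bytes and nothing else. The helper names are hypothetical.

```python
from pathlib import Path


def write_key_file(path, key):
    """Write the key bytes with no trailing newline, like `echo -n`."""
    Path(path).write_bytes(key.encode("utf-8"))


def has_trailing_newline(path):
    """Report whether an existing key file ends with a newline byte."""
    return Path(path).read_bytes().endswith(b"\n")
```

`has_trailing_newline` is a quick way to check a key file that was produced with a plain `echo`.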

Re: Problem with Password Decryption in Data Import Handler

2016-11-03 Thread William Bell
I cannot get it to work either. Here are my steps. I took the key from the Patch in https://issues.apache.org/jira/secure/attachment/12730862/SOLR-4392.patch. echo U2FsdGVkX19Gz7q7/4jj3Wsin7801TlFbob1PBT2YEacbPEUARDiuV5zGSAwU4Sz7upXDEPIQPU48oY1fBWM6Q== > pass.enc openssl aes-128-cbc -d -a -salt

Re: High CPU Usage in export handler

2016-11-03 Thread Erick Erickson
Followup question: You say you're indexing 100 docs/second. How often are you _committing_? Either soft commit or hardcommit with openSearcher=true ? Best, Erick On Thu, Nov 3, 2016 at 11:00 AM, Ray Niu wrote: > Thanks Joel > here is the information you requested. > Are you doing heavy writes

Re: Posting files 405 http error

2016-11-03 Thread Pablo Anzorena
When I manually copy one collection to another, I copy the core.properties from the source to the destination with the name core.properties.unloaded so there is no problem. So the steps I'm doing are: 1> index to my source collection. 2> Copy the directory of the source collection, excluding the co

Re: UpdateProcessor as a batch

2016-11-03 Thread Erick Erickson
I _thought_ you'd been around long enough to know about the options I mentioned ;). Right. I'd guess you're in UpdateHandler.addDoc and there's really no batching at that level that I know of. I'm pretty sure that even indexing batches of 1,000 documents from, say, SolrJ go through this method. I

Re: High CPU Usage in export handler

2016-11-03 Thread Ray Niu
the soft commit is 15 seconds and hard commit is 10 minutes. 2016-11-03 11:11 GMT-07:00 Erick Erickson : > Followup question: You say you're indexing 100 docs/second. How often > are you _committing_? Either > soft commit > or > hardcommit with openSearcher=true > > ? > > Best, > Erick > > On Th

RE: UpdateProcessor as a batch

2016-11-03 Thread Markus Jelsma
Erick - in this case data can come from anywhere. There is one piece of code that all incoming documents, regardless of their origin, pass through: the update handler and update processors of Solr. In my case that is the most convenient point to partially modify the documents, instead of moving t

Re: UpdateProcessor as a batch

2016-11-03 Thread Erick Erickson
Markus: How are you indexing? SolrJ has a client.add(List) form, and post.jar lets you add as many documents as you want in a batch. Best, Erick On Thu, Nov 3, 2016 at 10:18 AM, Markus Jelsma wrote: > Hi - i need to process a batch of documents on update but i cannot seem to > find a point
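
For readers not using SolrJ or post.jar, the same batch idea can be sketched as a single JSON request to the update handler. This is an illustration under assumptions: the host, the core name `core1`, the `commit=true` parameter and the document fields are placeholders, and the request is only built here, not sent.

```python
import json
from urllib.request import Request

# Placeholder documents; the field names are hypothetical.
docs = [
    {"id": "doc-1", "title": "first"},
    {"id": "doc-2", "title": "second"},
]

# One POST carrying the whole batch as a JSON array.
request = Request(
    "http://localhost:8983/solr/core1/update?commit=true",
    data=json.dumps(docs).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
```

Passing `request` to `urllib.request.urlopen` would submit the batch to a running Solr instance.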

Re: High CPU Usage in export handler

2016-11-03 Thread Ray Niu
Thanks Joel, here is the information you requested. Are you doing heavy writes at the time? We write very frequently, but not very heavily; we update about 100 Solr documents per second. How many concurrent reads are happening? The concurrent reads are about 1000-2000 per minute per

Re: Posting files 405 http error

2016-11-03 Thread Erick Erickson
Wait. What were you doing originally? Just copying the entire SOLR_HOME over or something? Because one of the things each core carries along is a "core.properties" file that identifies 1> the name of the core, something like collection_shard1_replica1 2> the name of the collection the core belongs

Re: Posting files 405 http error

2016-11-03 Thread Pablo Anzorena
Thanks Shawn. Actually there is no load balancer or proxy in the middle, but even if there was, how would you explain that I can index if I create a completely new collection? I figured out how to fix it. What I'm doing is creating a new collection, then unloading it (by unloading all the shards/

Re: display searched for text in Solr 6

2016-11-03 Thread Binoy Dalal
Are you sure that the text is stored in the _text_ field? Try q=*:*&fl=_text_ If you see stuff being printed then this field does have data, else this field is empty. To check which fields have data, try using the schema browser. On Thu, Nov 3, 2016 at 10:43 PM win harrington wrote: > I inserted
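
A short sketch of building the suggested check as a full request URL, assuming a local Solr at `localhost:8983` and a core named `core1`. Both values are assumptions here and should be adjusted to the actual installation.

```python
from urllib.parse import urlencode

# Assumed location of the select handler for a core named core1.
base_url = "http://localhost:8983/solr/core1/select"

# Match every document and return only the _text_ field.
params = {"q": "*:*", "fl": "_text_", "wt": "json"}

select_url = base_url + "?" + urlencode(params)
print(select_url)
```

If the returned documents are empty objects, the field is present in the query but holds no stored value for those documents.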

UpdateProcessor as a batch

2016-11-03 Thread Markus Jelsma
Hi - i need to process a batch of documents on update but i cannot seem to find a point where i can hook in and process a list of SolrInputDocuments, not in UpdateProcessor nor in UpdateHandler. For now i let it go and implemented it on a per-document basis, it is fast, but i'd prefer batches.

Re: display searched for text in Solr 6

2016-11-03 Thread win harrington
I inserted five /opt/solr/*.txt files for testing. Four of the files contain the word 'notice'. Solr finds 4 documents, but I can't see the text. http://localhost:8983/solr/core1/select?fl=_text_&indent=on&q=notice&wt=json "response":{"numFound":4, "start":0, "docs"{{},{},{},{}} On Thursday,

Re: display searched for text in Solr 6

2016-11-03 Thread Binoy Dalal
Append the fields you want to display to the query using the fl parameter. E.g. q=something&fl=_text_ On Thu, Nov 3, 2016 at 10:28 PM win harrington wrote: > I used solr/post to insert some *.txt files into Solr 6. I can search for > words in Solr and it returns the id with the file name. > How do

display searched for text in Solr 6

2016-11-03 Thread win harrington
I used solr/post to insert some *.txt files into Solr 6. I can search for words in Solr and it returns the id with the file name. How do I display the text? managed-schema has Thank you.

Re: Problem with Password Decryption in Data Import Handler

2016-11-03 Thread Jamie Jackson
You were right, Fuad. There was a flaw in my script (inconsistent naming of the `plain_db_pwd` variable). Thanks for figuring that out. For posterity, here's the fixed script: encrypt_key=your_encryption_key plain_db

Re: Posting files 405 http error

2016-11-03 Thread Shawn Heisey
On 11/3/2016 9:10 AM, Pablo Anzorena wrote: > Thanks for the answer. > > I checked the log and it wasn't logging anything. > > The error i'm facing is way bizarre... I create a new fresh collection and > then index with no problem, but it keeps throwing this error if i copy the > collection from on

RE: CachedSqlEntityProcessor with delta-import

2016-11-03 Thread Mohan, Sowmya
Thanks. We did implement the delete by query on another core and thought of giving the delta import a try here. Looks like differential via full index and deletes using delete by id/query is the way to go. -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: T

Re: Posting files 405 http error

2016-11-03 Thread Pablo Anzorena
Thanks for the answer. I checked the log and it wasn't logging anything. The error i'm facing is way bizarre... I create a new fresh collection and then index with no problem, but it keeps throwing this error if i copy the collection from one solrcloud to the other and then index. Any clue on wh

Re: Poor Solr Cloud Query Performance against a Small Dataset

2016-11-03 Thread Dave Seltzer
Good tip Rick, I'll dig in and make sure everything is set up correctly. Thanks! -D Dave Seltzer Chief Systems Architect TVEyes (203) 254-3600 x222 On Wed, Nov 2, 2016 at 9:05 PM, Rick Leir wrote: > Here is a wild guess. Whenever I see a 5 second delay in networking, I > think DNS timeouts.

Re: Apache Solr Question

2016-11-03 Thread Erick Erickson
bq: I have encountered someone who has a collection with five billion documents in it... I know of installations many times that. Admittedly when you start getting into the 100s of billions you must plan carefully. Erick On Thu, Nov 3, 2016 at 7:44 AM, Susheel Kumar wrote: > For media like i

Re: Apache Solr Question

2016-11-03 Thread Susheel Kumar
For media like images etc., there is the LIRE Solr plugin, which can be utilised. I have used it in the past and it may meet your requirement. See http://www.lire-project.net/ Thanks, Susheel On Thu, Nov 3, 2016 at 9:57 AM, Shawn Heisey wrote: > On 11/3/2016 2:49 AM, Chien Nguyen wrote: > > Hi everyone! I'

RE: Apache Solr Question

2016-11-03 Thread Davis, Daniel (NIH/NLM) [C]
Case in point - https://collections.nlm.nih.gov/ has one index (core) for documents and another index (core) for pages within the documents. I think LOC (Library of Congress) does something similar from a presentation they gave at Lucene/DC Exchange. -Original Message- From: Doug Turnbul

Re: Apache Solr Question

2016-11-03 Thread Doug Turnbull
For general search use cases, it's generally not a good idea to index giant documents. A relevance score for an entire book is generally less meaningful than if you can break it up into chapters or sections. Those subdivisions are often much more useful to a user from a usability standpoint for und

Re: Apache Solr Question

2016-11-03 Thread Shawn Heisey
On 11/3/2016 2:49 AM, Chien Nguyen wrote: > Hi everyone! I'm a newbie in using Apache Solr. I've read some > documents about it. But i can't answer some questions. Second reply, so I'm aiming for more detail. > 1. How many documents Solr can search at a moment?? A *single* Solr index has Lucen

Re: High CPU Usage in export handler

2016-11-03 Thread Joel Bernstein
Are you doing heavy writes at the time? How many concurrent reads are happening? What version of Solr are you using? What is the field definition for the double, is it docValues? Joel Bernstein http://joelsolr.blogspot.com/ On Thu, Nov 3, 2016 at 12:56 AM, Ray Niu wrote: > Hello: >

Re: Apache Solr Question

2016-11-03 Thread Rick Leir
On November 3, 2016 4:49:07 AM EDT, Chien Nguyen wrote: >Hi everyone! >I'm a newbie in using Apache Solr. Welcome! > I've read some documents about it. >But i >can't answer some questions. >1. How many documents Solr can search at a moment?? I would like to say unlimited. But it depends on

Re: edixmax

2016-11-03 Thread Rafael Merino García
Hi, You were absolutely right, there was a *string* field defined in the qf parameter... Using mm.autoRelax parameter did the trick Thank you so much! Regards On Wed, Nov 2, 2016 at 5:15 PM, Vincenzo D'Amore wrote: > Hi Rafael, > > I suggest to check all the fields present in your qf looking for

Apache Solr Question

2016-11-03 Thread Chien Nguyen
Hi everyone! I'm a newbie in using Apache Solr. I've read some documents about it. But i can't answer some questions. 1. How many documents Solr can search at a moment?? 2. Can Solr index the media data?? 3. What's the max size of document that Solr can index??? Can you help me and explain it f