Hello Tim,
I think
http://blog-archive.griddynamics.com/2013/12/grandchildren-and-siblings-with-block.html
provides a few relevant examples. To summarize, it's worth using the nested
query syntax {!parent ... v=$foo} to nest a complex child clause. If you
need to exploit the filter cache, use filter(foo:b
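To make that first point concrete, here is a rough SolrJ sketch of the
nested-query form. Just a sketch, assuming SolrJ 6.x; the core URL and the
type/color/size field names are made up for illustration:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class BlockJoinExample {
    public static void main(String[] args) throws Exception {
        HttpSolrClient client =
            new HttpSolrClient.Builder("http://localhost:8983/solr/products").build();
        SolrQuery q = new SolrQuery();
        // {!parent} returns parents whose children match $childq;
        // v=$childq dereferences the complex child clause kept in its own param
        q.setQuery("{!parent which=type:parent v=$childq}");
        q.set("childq", "+color:red +size:XL");
        QueryResponse rsp = client.query(q);
        System.out.println("parents found: " + rsp.getResults().getNumFound());
        client.close();
    }
}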
This might be useful. In this scenario you load your content into Solr for
staging and perform your ETL from Solr to Solr:
http://joelsolr.blogspot.com/2016/10/solr-63-batch-jobs-parallel-etl-and.html
Basically Solr becomes a text processing warehouse.
Joel Bernstein
http://joelsolr.blogspot.com/
How big a batch are we talking about?
Because I believe you could accumulate the docs in the first URP in
processAdd and then do the batch lookup and the actual processing of
them on processCommit.
They are daisy-chained, so as long as you are holding on to the chain,
the rest of the URPs don't h
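Something like this minimal sketch, assuming a custom UpdateRequestProcessor;
externalBatchLookup() is a hypothetical call to the external system, and note
Erick's caveat below about buffered docs being lost on an abnormal shutdown:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.CommitUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;

public class BatchingUpdateProcessor extends UpdateRequestProcessor {
    private final List<AddUpdateCommand> buffered = new ArrayList<>();

    public BatchingUpdateProcessor(UpdateRequestProcessor next) {
        super(next);
    }

    @Override
    public void processAdd(AddUpdateCommand cmd) throws IOException {
        buffered.add(cmd); // hold on to the doc; don't pass it down the chain yet
    }

    @Override
    public void processCommit(CommitUpdateCommand cmd) throws IOException {
        List<SolrInputDocument> docs = new ArrayList<>();
        for (AddUpdateCommand add : buffered) {
            docs.add(add.getSolrInputDocument());
        }
        externalBatchLookup(docs); // one round trip for the whole batch
        for (AddUpdateCommand add : buffered) {
            super.processAdd(add); // now let the rest of the chain see each doc
        }
        buffered.clear();
        super.processCommit(cmd);
    }

    private void externalBatchLookup(List<SolrInputDocument> docs) {
        // hypothetical batch enrichment against the external system
    }
}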
Maybe introduce a distributed queue such as Apache Ignite, Hazelcast, or
even Redis. Read from the queue in batches, do your lookup, then index the
same batch.
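Roughly like this on the consumer side, assuming Redis via the Jedis client;
the queue name "docs-to-index" and the batch size of 500 are illustrative:

import java.util.List;
import redis.clients.jedis.Jedis;

public class QueueDrainer {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            while (true) {
                // grab up to 500 queued docs, then trim them off the list
                List<String> batch = jedis.lrange("docs-to-index", 0, 499);
                if (batch.isEmpty()) break;
                jedis.ltrim("docs-to-index", batch.size(), -1);
                // ... do the external lookup for the whole batch here,
                // then send the same batch to Solr in one add request ...
            }
        }
    }
}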
Just a thought.
Mike St. John.
On Nov 3, 2016 3:58 PM, "Erick Erickson" wrote:
I thought we might be talking past each other...
I think you're into "roll your own" here. Anything that
accumulates docs for a while, does a batch lookup
on the external system, then passes on the docs
runs the risk of losing docs if the server is abnormally
shut down.
I guess ideally you'd like
I'm using the BlockJoinQuery to query child docs and return the
parent. I'd like to have the equivalent of a filter that applies to
child docs, and I don't see a way to do that with the BlockJoin stuff.
It looks like I could modify it to accept some childFilter param and
add a QueryWrapperFilter r
Hi - I believe I did not explain myself well enough.
Getting the data into Solr is not a problem; various sources index docs to Solr,
all in nice batches, as everyone indeed should. The thing is that I need to
do some preprocessing before it is indexed. Normally, UpdateProcessors are the
way to
OK, it was:
echo -n "${encrypt_key}" > encrypt.key
On Thu, Nov 3, 2016 at 12:20 PM, William Bell wrote:
I cannot get it to work either.
Here are my steps. I took the key from the Patch in
https://issues.apache.org/jira/secure/attachment/12730862/SOLR-4392.patch.
echo U2FsdGVkX19Gz7q7/4jj3Wsin7801TlFbob1PBT2YEacbPEUARDiuV5zGSAwU4Sz7upXDEPIQPU48oY1fBWM6Q== > pass.enc
openssl aes-128-cbc -d -a -salt
Followup question: You say you're indexing 100 docs/second. How often
are you _committing_? Either
soft commit
or
hard commit with openSearcher=true
?
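For reference, a client can trigger either kind explicitly through SolrJ. A
sketch, assuming SolrJ 6.x with a made-up collection URL; openSearcher itself
is configured server-side in solrconfig.xml:

import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class CommitKinds {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build()) {
            // soft commit: makes new docs searchable without a durable flush
            client.commit(true, true, true); // waitFlush, waitSearcher, softCommit=true
            // hard commit: durable flush; opening a searcher depends on openSearcher
            client.commit();
        }
    }
}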
Best,
Erick
On Thu, Nov 3, 2016 at 11:00 AM, Ray Niu wrote:
When I manually copy one collection to another, I copy the core.properties
from the source to the destination with the name core.properties.unloaded,
so there is no problem.
So the steps I'm doing are:
1> index to my source collection.
2> Copy the directory of the source collection, excluding the
core.properties
I _thought_ you'd been around long enough to know about the options I
mentioned ;).
Right. I'd guess you're in UpdateHandler.addDoc, and there's really no
batching at that level that I know of. I'm pretty sure that even
indexing batches of 1,000 documents from, say, SolrJ goes through this
method.
I
The soft commit interval is 15 seconds and the hard commit interval is 10 minutes.
2016-11-03 11:11 GMT-07:00 Erick Erickson :
Erick - in this case data can come from anywhere. There is one piece of code
all incoming documents, regardless of their origin, are passed through: the
update handler and update processors of Solr.
In my case that is the most convenient point to partially modify the documents,
instead of moving t
Markus:
How are you indexing? SolrJ has a client.add(List<SolrInputDocument>)
form, and post.jar lets you add as many documents as you want in a
batch.
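For instance, a small SolrJ sketch of batched adds; the collection URL and
field names are illustrative:

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class BatchIndexer {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build()) {
            List<SolrInputDocument> batch = new ArrayList<>();
            for (int i = 0; i < 1000; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", Integer.toString(i));
                doc.addField("title_t", "document " + i);
                batch.add(doc);
            }
            client.add(batch); // one request for the whole batch
            client.commit();
        }
    }
}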
Best,
Erick
On Thu, Nov 3, 2016 at 10:18 AM, Markus Jelsma
wrote:
Thanks Joel
here is the information you requested.
Are you doing heavy writes at the time?
We write very frequently, but not very heavily; we update
about 100 Solr documents per second.
How many concurrent reads are happening?
The concurrent reads are about 1000-2000 per minute per
Wait. What were you doing originally? Just copying the entire
SOLR_HOME over or something?
Because one of the things each core carries along is a
"core.properties" file that identifies
1> the name of the core, something like collection_shard1_replica1
2> the name of the collection the core belongs to
Thanks Shawn.
Actually there is no load balancer or proxy in the middle, but even if
there were, how would you explain that I can index if I create a completely
new collection?
I figured out how to fix it. What I'm doing is creating a new collection,
then unloading it (by unloading all the shards/
Are you sure that the text is stored in the _text_ field? Try
q=*:*&fl=_text_
If you see stuff being printed, then this field does have data; otherwise it
is empty. To check which fields have data, try using the schema
browser.
On Thu, Nov 3, 2016 at 10:43 PM win harrington
wrote:
Hi - I need to process a batch of documents on update, but I cannot seem to find
a point where I can hook in and process a list of SolrInputDocuments, neither in
an UpdateProcessor nor in the UpdateHandler.
For now I let it go and implemented it on a per-document basis. It is fast, but
I'd prefer batches.
I inserted five /opt/solr/*.txt files for testing. Four of the files contain
the word 'notice'. Solr finds 4 documents, but I can't see the text.
http://localhost:8983/solr/core1/select?fl=_text_&indent=on&q=notice&wt=json
"response":{"numFound":4,"start":0,"docs":[{},{},{},{}]}
Append the fields you want to display to the query using the fl parameter.
E.g. q=something&fl=_text_
On Thu, Nov 3, 2016 at 10:28 PM win harrington
wrote:
I used solr/post to insert some *.txt files into Solr 6. I can search for words
in Solr and it returns the id with the file name.
How do I display the text?
managed-schema has
Thank you.
You were right, Fuad. There was a flaw in my script (inconsistent naming of
the `plain_db_pwd` variable).
Thanks for figuring that out.
For posterity, here's the fixed script:
encrypt_key=your_encryption_key
plain_db
On 11/3/2016 9:10 AM, Pablo Anzorena wrote:
Thanks. We did implement the delete-by-query on another core and thought of
giving the delta import a try here. Looks like a differential via full index plus
deletes using delete by id/query is the way to go.
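In SolrJ terms the delete side of that is simple; a sketch, with the collection
URL, ids, and query all illustrative:

import java.util.Arrays;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class DeltaDeletes {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build()) {
            // delete rows known to be gone from the source system
            client.deleteById(Arrays.asList("doc-17", "doc-42"));
            // or sweep anything matching a condition
            client.deleteByQuery("last_seen_dt:[* TO NOW-7DAYS]");
            client.commit();
        }
    }
}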
-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Thanks for the answer.
I checked the log and it wasn't logging anything.
The error I'm facing is way bizarre... I create a fresh new collection and
then index with no problem, but it keeps throwing this error if I copy the
collection from one SolrCloud to the other and then index.
Any clue on wh
Good tip Rick,
I'll dig in and make sure everything is set up correctly.
Thanks!
-D
Dave Seltzer
Chief Systems Architect
TVEyes
(203) 254-3600 x222
On Wed, Nov 2, 2016 at 9:05 PM, Rick Leir wrote:
> Here is a wild guess. Whenever I see a 5 second delay in networking, I
> think DNS timeouts.
bq: I have encountered someone who has a collection with five billion
documents in it...
I know of installations many times that size. Admittedly, when you start
getting into the hundreds of billions, you must plan carefully.
Erick
On Thu, Nov 3, 2016 at 7:44 AM, Susheel Kumar wrote:
For media like images etc., there is the LIRE Solr plugin, which can be utilised.
I have used it in the past and it may meet your requirement. See
http://www.lire-project.net/
Thanks,
Susheel
On Thu, Nov 3, 2016 at 9:57 AM, Shawn Heisey wrote:
Case in point - https://collections.nlm.nih.gov/ has one index (core) for
documents and another index (core) for pages within the documents.
I think LOC (Library of Congress) does something similar from a presentation
they gave at Lucene/DC Exchange.
-Original Message-
From: Doug Turnbull
For general search use cases, it's usually not a good idea to index giant
documents. A relevance score for an entire book is generally less
meaningful than scores for its chapters or sections. Those
subdivisions are often much more useful to a user from a usability
standpoint for und
On 11/3/2016 2:49 AM, Chien Nguyen wrote:
> Hi everyone! I'm a newbie in using Apache Solr. I've read some
> documents about it. But i can't answer some questions.
Second reply, so I'm aiming for more detail.
> 1. How many documents Solr can search at a moment??
A *single* Solr index has Lucene's hard limit of about 2.1 billion documents.
Are you doing heavy writes at the time?
How many concurrent reads are happening?
What version of Solr are you using?
What is the field definition for the double, is it docValues?
Joel Bernstein
http://joelsolr.blogspot.com/
On Thu, Nov 3, 2016 at 12:56 AM, Ray Niu wrote:
On November 3, 2016 4:49:07 AM EDT, Chien Nguyen wrote:
>Hi everyone!
>I'm a newbie in using Apache Solr.
Welcome!
> I've read some documents about it.
>But i can't answer some questions.
>1. How many documents Solr can search at a moment??
I would like to say unlimited. But it depends on
Hi,
You were absolutely right; there was a *string* field defined in the qf
parameter...
Using the mm.autoRelax parameter did the trick.
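For anyone finding this later, the working combination looks roughly like this
in SolrJ terms (a sketch; the field names and mm value are illustrative):

import org.apache.solr.client.solrj.SolrQuery;

public class AutoRelaxQuery {
    public static void main(String[] args) {
        SolrQuery q = new SolrQuery("some user query");
        q.set("defType", "edismax");
        q.set("qf", "title_t body_t"); // keep string fields out of qf
        q.set("mm", "2<75%");
        q.set("mm.autoRelax", "true"); // relax mm when analysis drops clauses per field
        System.out.println(q);
    }
}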
Thank you so much!
Regards
On Wed, Nov 2, 2016 at 5:15 PM, Vincenzo D'Amore wrote:
> Hi Rafael,
>
> I suggest to check all the fields present in your qf looking for
Hi everyone!
I'm a newbie in using Apache Solr. I've read some documents about it, but I
can't answer some questions.
1. How many documents can Solr search at a moment?
2. Can Solr index media data?
3. What's the max size of document that Solr can index?
Can you help me and explain it f