Hello User Group,
we run Solr on HDFS and get a lot of the following warning:
Slow ReadProcessor read fields took 15093ms (threshold=1ms); ack:
seqno: 3 reply: SUCCESS reply: SUCCESS reply: SUCCESS
downstreamAckTimeNanos: 798309 flag: 0 flag: 0 flag: 0, targets:
[DatanodeInfoWithStorage[x
Hi everyone,
I have a design problem that I'm not sure how to solve best, so I figured
I'd share it here and see what ideas others may have.
I have a DB that holds documents (over 1 million and growing). This is
known as the "Public" DB that holds documents visible to all of my end
users.
My application lets users "check-out" one or more
Susheel, Just a guess, but carrot2.org might be useful. But it might be
overkill. Cheers -- Rick
On August 30, 2017 7:40:08 AM MDT, Susheel Kumar wrote:
Hello,
I am looking for different ideas/suggestions to solve the use case I am
working on.
We have a couple of fields in the schema along with id: business_email and
personal_email. We need to return all records based on unique business and
personal emails.
The criteria for unique records is e
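A rough sketch of one option (the actual uniqueness criteria are cut off
above), assuming a combined key field populated at index time, e.g.
business_email + "|" + personal_email:

import org.apache.solr.client.solrj.SolrQuery;

public class UniqueEmailQuery {
    public static SolrQuery build() {
        SolrQuery q = new SolrQuery("*:*");
        // CollapsingQParserPlugin keeps a single document per key value.
        // email_key is a hypothetical field combining both email fields.
        q.addFilterQuery("{!collapse field=email_key}");
        return q;
    }
}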
We have some Denial of Service attacks on our web site. SOLR threads are
going crazy.
Basically someone is hitting start=15 + and rows=20. The start is crazy
large.
And then they jump around. start=15 then start=213030 etc.
Any ideas for how to stop this besides blocking these IPs?
Sometimes it is Google doing it even though these search results are set
with N
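One way to stop the deep paging itself is a cap in front of Solr. A sketch
as a servlet filter (the 10,000 limit and the parameter handling are
assumptions to adapt, not a Solr feature):

import java.io.IOException;
import javax.servlet.*;
import javax.servlet.http.HttpServletResponse;

public class DeepPagingFilter implements Filter {
    private static final int MAX_START = 10000;

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        String start = req.getParameter("start");
        if (start != null) {
            try {
                if (Integer.parseInt(start) > MAX_START) {
                    // Drop the request before it ties up a Solr thread.
                    ((HttpServletResponse) res).sendError(400, "start too large");
                    return;
                }
            } catch (NumberFormatException ignored) {
                // Non-numeric start: let Solr reject it downstream.
            }
        }
        chain.doFilter(req, res);
    }

    public void init(FilterConfig cfg) {}
    public void destroy() {}
}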
Ian:
Thanks much for the writeup! It's always good to have real-world documentation!
Best,
Erick
On Fri, Nov 7, 2014 at 8:26 AM, Shawn Heisey wrote:
Hi again, all -
Since several people were kind enough to jump in to offer advice on this
thread, I wanted to follow up in case anyone finds this useful in the
future.
*tl;dr: *Routing updates to a random Solr node (and then letting it forward
the docs to where they need to go) is very expensive, more than I
expected. Using a "smart" router that uses the cluster config to route
documents directly to their shard resul
bq: but it should be more or less a constant factor no matter how many
Solr nodes you are using, right?
Not really. You've stated that you're not driving Solr very hard in
your tests. Therefore you're waiting on I/O. Therefore your tests just
aren't going to scale linearly with the number of shard
Erick,
Just to make sure I am thinking about this right: batching will certainly
make a big difference in performance, but it should be more or less a
constant factor no matter how many Solr nodes you are using, right? Right
now in my load tests, I'm not actually that concerned about the absolute
Yes, I was inadvertently sending them to a replica. When I sent them to the
leader, the leader reported (1000 adds) and the replica reported only 1 add
per document. So, it looks like the leader forwards the batched jobs
individually to the replicas.
On Fri, Oct 31, 2014 at 3:26 PM, Erick Erickson
Internally, the docs are batched up into smaller buckets (10 as I
remember) and forwarded to the correct shard leader. I suspect that's
what you're seeing.
Erick
On Fri, Oct 31, 2014 at 12:20 PM, Peter Keegan wrote:
Regarding batch indexing:
When I send batches of 1000 docs to a standalone Solr server, the log file
reports "(1000 adds)" in LogUpdateProcessor. But when I send them to the
leader of a replicated index, the leader log file reports much smaller
numbers, usually "(12 adds)". Why do the batches appea
NP, just making sure.
I suspect you'll get lots more bang for the buck, and
results much more closely matching your expectations if
1> you batch up a bunch of docs at once rather than
sending them one at a time. That's probably the easiest
thing to try. Sending docs one at a time is something of
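For reference, batched indexing with 4.x-era SolrJ might look like this
sketch (URL, batch size, and ids are placeholders):

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BatchIndexer {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 100000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", Integer.toString(i));
            batch.add(doc);
            if (batch.size() == 1000) {   // one HTTP request per 1000 docs, not per doc
                solr.add(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            solr.add(batch);
        }
        solr.commit();
        solr.shutdown();
    }
}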
Hi Erick -
Thanks for the detailed response and apologies for my confusing
terminology. I should have said "WPS" (writes per second) instead of QPS
but I didn't want to introduce a weird new acronym since QPS is well
known. Clearly a bad decision on my part. To clarify: I am doing
*only* writes
I'm really confused:
bq: I am not issuing any queries, only writes (document inserts)
bq: It's clear that once the load test client has ~40 simulated users
bq: A cluster of 3 shards over 3 Solr nodes *should* support
a higher QPS than 2 shards over 2 Solr nodes, right
QPS is usually used to mea
Thanks for the suggestions so for, all.
1) We are not using SolrJ on the client (not using Java at all) but I am
working on writing a "smart" router so that we can always send to the
correct node. I am certainly curious to see how that changes things.
Nonetheless even with the overhead of extra r
Your indexing client, if written in SolrJ, should use CloudSolrServer
which is, in Matt's terms "leader aware". It divides up the
documents to be indexed into packets where each doc in
the packet belongs on the same shard, and then sends the packet
to the shard leader. This avoids a lot of re-
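A minimal sketch of that pattern (ZooKeeper address and collection name
are placeholders):

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class LeaderAwareIndexer {
    public static void main(String[] args) throws Exception {
        // CloudSolrServer watches the cluster state in ZooKeeper and can
        // send each document toward its own shard leader.
        CloudSolrServer solr = new CloudSolrServer("zkhost1:2181");
        solr.setDefaultCollection("collection1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "42");
        solr.add(doc);
        solr.commit();
        solr.shutdown();
    }
}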
On 10/30/2014 2:56 PM, Ian Rose wrote:
> I think this is true only for actual queries, right? I am not issuing
> any queries, only writes (document inserts). In the case of writes,
> increasing the number of shards should increase my throughput (in
> ops/sec) more or less linearly, right?
No, that
If you are issuing writes to shard non-leaders, then there is a large overhead
for the eventual redirect to the leader. I noticed a 3-5 times performance
increase by making my write client leader aware.
On Oct 30, 2014, at 2:56 PM, Ian Rose wrote:
> If you want to increase QPS, you should not be increasing numShards.
> You need to increase replicationFactor. When your numShards matches the
> number of servers, every single server will be doing part of the work
> for every query.
I think this is true only for actual queries, right? I a
On 10/30/2014 2:23 PM, Ian Rose wrote:
> My methodology is as follows.
> 1. Start up K Solr servers.
> 2. Remove all existing collections.
> 3. Create N collections, with numShards=K for each.
> 4. Start load testing. Every minute, print the number of successful
> updates and the number of faile
I'm hoping to get some ideas on where I should be
looking to debug this. Apologies in advance for the length of this email;
I'm trying to be comprehensive and provide all relevant information.
Our setup:
1 load generating client
- generates tiny, fake documents with unique IDs
- perf
Correctly spelled words are being returned as misspelled, with the
original, correctly spelled word plus a single oddball character appended
coming back as multiple suggestions...
--
Ed Smiley, Senior Software Architect, eBooks
ProQuest | 161 E Evelyn Ave|
Mountain View, CA 94041 | USA |
+1 650 475 87
it works best when the data is denormalized...
Is there any other way / idea by which I can reduce the redundancy of
creating multiple records for a particular person again and again?
ScorerDocQueue.topNextAndAdjustElsePop:120 (0ms self time, 308 ms total time)
  ScorerDocQueue.checkAdjustElsePop:135 (0ms self time, 111 ms total time)
    ScorerDocQueue.downHeap:212 (111ms self time, 111 ms total time)
---snip---
An
Subject: Any ideas on Solr 4.0 Release.
Hi,
Congratulations on the Alpha release. I am wondering, is there a ballpark
date for the final 4.0 release? Is it expected in the August or September
time frame, or is it further away? We badly need some features included in
this release. These are around grouped facet counts. We have limited use
for Solr in our cur
Subject: RE: Strange "spikes" in query response times...any ideas
where else to look?
Michael,
Thank you for responding...and for the excellent questions.
1) We have never seen this response time spike with a
user-interactive search. However,
esources (cores
or IO).
Otis
Performance Monitoring for Solr / ElasticSearch / HBase -
http://sematext.com/spm
3) Are you running multiple queries concurrently, or are you just
using a single thread in JMeter?
-Michael
Greetings all,
We are working on building up a large Solr index for over 300 million
records...and this is our first look at Solr. We are currently running
a set of unique search queries against a single server (so no
replication, no indexing going on at the same time, and no distributed
ae...@dot.wi.gov]
> Sent: Monday, August 15, 2011 14:54
> To: solr-user@lucene.apache.org
> Subject: RE: ideas for indexing large amount of pdf docs
>
> Note on i: Solr replication provides pretty good clustering support
> out-of-the-box, including replication of m
# Fragment of the name-query load test; the surrounding loop and LWP
# setup are cut off above this excerpt.
{
    print "Query: lnamesyn:$lname AND fnamesyn:$fname$fuzzy\n";
    print $response->content();    # dump the raw Solr response body
}
print "POST for $fname $lname completed, HTTP status=" .
      $response->code . "\n";
}
$elapsed = time() - $starttime;
$average
Date: Sat, 13 Aug 2011 15:34:19 -0400
Subject: Re: ideas for indexing large amount of pdf docs
Ahhh, ok, my reply was irrelevant ...
Here's a good write-up on this problem:
http://www.lucidimagination.com/content/scaling-lucene-and-solr
You could send PDF for processing using a queue solution like Amazon SQS. Kick
off Amazon instances to process the queue.
Once you've processed it with Tika to text, just send the update to Solr.
Bill Bell
Sent from mobile
On Aug 13, 2011, at 10:13 AM, Erick Erickson wrote:
dea to minimize this time as much as possible when we enter production.
Best,
Rode.
-Original Message-
From: Erick Erickson
To: solr-user@lucene.apache.org
Date: Sat, 13 Aug 2011 12:13:27 -0400
Subject: Re: ideas for indexing large amount of pdf docs
Yeah, parsing PDF files can be pretty resource-intensive, so one solution
is to offload it somewhere else. You can use the Tika libraries in SolrJ
to parse the PDFs on as many clients as you want, just transmitting the
results to Solr for indexing.
How are all these docs being submitted? Is this s
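A sketch of that offloading idea: Tika parses on the client and only the
extracted text goes to Solr (3.x-era SolrJ assumed; paths and field names
are made up):

import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;

public class PdfIndexer {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
        AutoDetectParser parser = new AutoDetectParser();

        for (String path : args) {
            BodyContentHandler text = new BodyContentHandler(-1); // no output size limit
            Metadata meta = new Metadata();
            InputStream in = new FileInputStream(path);
            try {
                parser.parse(in, text, meta, new ParseContext()); // CPU-heavy work, on the client
            } finally {
                in.close();
            }
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", path);
            doc.addField("text", text.toString()); // only extracted text is sent
            solr.add(doc);
        }
        solr.commit();
    }
}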
Hi all,
I want to ask about the best way to implement a solution for indexing a
large number of pdf documents, between 10-60 MB each, with 100 to 1000
users connected simultaneously.
I currently have 1 core of solr 3.3.0 and it works fine for a small number
of pdf docs, but I'm afraid about the mome
I think a 30% increase is acceptable. Yes, I think we'll try it.
Although our case is more like # groups ~ # documents / N, where N is a
smallish number (~1-5?). We are planning for a variety of different
index sizes, but aiming for a sweet spot around a few M docs.
-Mike
On 08/01/2011 11:
Hi Mike, how many docs and groups do you have in your index?
I think the group.sort option fits your requirements.
If I remember correctly, group.ngroups=true adds something like 30% extra time
on top of the search request with grouping,
but that was on my local test dataset (~30M docs, ~8000 groups
Thanks, Tomas. Yes we are planning to keep a "current" flag in the most
current document. But there are cases where, for a given user, the most
current document is not that one, because they only have access to some
older documents.
I took a look at http://wiki.apache.org/solr/FieldCollapsin
Hi Michael, I guess this could be solved using grouping as you said.
Documents inside a group can be sorted on a field (in your case, the version
field, see parameter group.sort), and you can show only the first one. It
will be more complex to show facets (post grouping faceting is work in
progress
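Roughly, such a query could be built like this (doc_id and version are
hypothetical field names; the filter is whatever restricts a user's
visible versions):

import org.apache.solr.client.solrj.SolrQuery;

public class LatestVisibleVersion {
    public static SolrQuery build(String userAclFilter) {
        SolrQuery q = new SolrQuery("*:*");
        q.addFilterQuery(userAclFilter);     // only versions this user can see
        q.set("group", true);
        q.set("group.field", "doc_id");      // one group per logical document
        q.set("group.sort", "version desc"); // newest accessible version first
        q.set("group.limit", 1);             // return just that one per group
        return q;
    }
}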
A customer has an interesting problem: some documents will have multiple
versions. In search results, only the most recent version of a given
document should be shown. The trick is that each user has access to a
different set of document versions, and each user should see only the
most recent v
> Sent: Wednesday, June 29, 2011 12:18 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr just 'hangs' under load test - ideas?
>
> Can you get a thread dump to see what is hanging?
>
> -Yonik
> http://www.lucidimagination.com
>
> On Wed, Jun 29, 2011 at 1
on a prototype, and had no problems (though that was Solr 1.4 at the
time). It ramped up beautifully - bottlenecks were our apps, not Solr.
What I'm benchmarking now is a descendant of that prototyping - a bit
more complex on searches and more fields in the schema, but same basic
search logic as far as SolrJ usage.
Any ideas? What e
Cuong,
I think you will need some manipulation beyond solr queries. You should
separate the results by your site criteria after retrieving them. After
that, you could cache the results on your application and randomize the
lists every time you render a page.
I don't know if solr has collapsin
Hi Alexander,
Thanks for your suggestion. I think my problem is a bit different from
yours. We don't have any sponsored words but we have to retrieve sponsored
results directly from the index. This is because a site can have 60,000
products, which makes it hard to insert/update keywords. I can live with
Cuong,
I have implemented sponsored words for a client. I don't know if my
approach can help you, but I will describe it and let you decide.
I have an index containing product entries with a field I created called
sponsored words. What I do is boost this field, so when these words are
matched in
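That field boost might be expressed with dismax along these lines (field
name and weights are made up):

import org.apache.solr.client.solrj.SolrQuery;

public class SponsoredBoost {
    public static SolrQuery build(String userQuery) {
        SolrQuery q = new SolrQuery(userQuery);
        q.set("defType", "dismax");
        // A large boost on the sponsored-words field floats matching
        // products toward the top of the results.
        q.set("qf", "name^1.0 description^0.5 sponsored_words^10.0");
        return q;
    }
}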
Hi all,
I'm trying to implement "sponsored results" in Solr search results similar
to that of Google. We index products from various sites and would like to
allow certain sites to promote their products. My approach is to query a
slave instance to get sponsored results for user queries in addition
I've been struggling with how to get various bits of structured data
into solr documents. In various projects I have tried various ideas,
but none feel great.
Take a simple example where I want a document field to be the list of
linked data with name, ID, and path. I have tried things
On Thu, 9 Aug 2007 15:23:03 -0700
"Lance Norskog" <[EMAIL PROTECTED]> wrote:
> Underlying this all, you have a sneaky network performance problem. Your
> successive posts do not reuse a TCP socket. Obvious: re-opening a new socket
> each post takes time. Not obvious: your server has sockets buildi
Subject: Any clever ideas to inject into solr? Without http?
On 8/9/07, Kevin Holmes <[EMAIL PROTECTED]> wrote:
> Python script queries the mysql DB then calls bash script
>
> Bash script performs a curl POST submit to solr
For the most up-to-date solr client for python, check out
https://issues.apache.org/jira/browse/SOLR-216
-Yonik
Is this a native feature, or do we need to get creative with scp from
one server to the other?
If it's a contention between search and indexing, separate them
via a query-slave and an index-master.
--cw
On 8/9/07, Siegfried Goeschl <[EMAIL PROTECTED]> wrote:
> +) my colleague just finished a database import service running within
> the servlet container to avoid writing out the data to the file system
> and transmitting it over HTTP.
Most people doing this read data out of the database and constr
Hi Kevin,
I'm also a newbie but some thoughts along the line ...
+) for evaluating SOLR we used a less exotic setup for data import based
on Pnuts (a JVM based scripting language) ... :-) ... but Groovy would
do as well if you feel at home with Java.
+) my colleague just finished a database i
On 8/9/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On 8/9/07, David Whalen <[EMAIL PROTECTED]> wrote:
> > Plus, I have to believe there's a faster way to get documents
> > into solr/lucene than using curl
Oh yeah, and by "curl" I assume you meant HTTP in general. You
certainly don't want to
On 8/9/07, David Whalen <[EMAIL PROTECTED]> wrote:
> Plus, I have to believe there's a faster way to get documents
> into solr/lucene than using curl
One issue with HTTP is latency. You can get around that by adding
multiple documents per request, or by using multiple threads
concurrently.
Y
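A rough sketch combining both tricks, several threads each posting many
docs per request (URL, counts, and XML layout are assumptions):

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ConcurrentPoster {
    static final String UPDATE_URL = "http://localhost:8983/solr/update";

    // One HTTP request carrying many <doc>s amortizes the per-request latency.
    static void postBatch(String xmlBatch) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(UPDATE_URL).openConnection();
        conn.setDoOutput(true);
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "text/xml; charset=utf-8");
        OutputStream out = conn.getOutputStream();
        out.write(xmlBatch.getBytes("UTF-8"));
        out.close();
        if (conn.getResponseCode() != 200) {
            throw new RuntimeException("update failed: " + conn.getResponseCode());
        }
        conn.getInputStream().close();
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4); // concurrent senders
        for (int t = 0; t < 4; t++) {
            final int base = t * 1000;
            pool.submit(new Runnable() {
                public void run() {
                    // 100 docs per request instead of one request per doc.
                    StringBuilder xml = new StringBuilder("<add>");
                    for (int i = 0; i < 100; i++) {
                        xml.append("<doc><field name=\"id\">")
                           .append(base + i).append("</field></doc>");
                    }
                    xml.append("</add>");
                    try {
                        postBatch(xml.toString());
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
        }
        pool.shutdown();
    }
}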
On Aug 9, 2007, at 11:12 AM, Kevin Holmes wrote:
2: Is there a way to inject into solr without using POST / curl /
http?
Check http://wiki.apache.org/solr/EmbeddedSolr
There are examples in Java and Cocoa using the DirectSolrConnection
class, querying and updating solr w/o a web serve
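A bare-bones sketch of that in-process pattern, assuming the 1.x-era
DirectSolrConnection API (paths are hypothetical):

import org.apache.solr.servlet.DirectSolrConnection;

public class EmbeddedLoader {
    public static void main(String[] args) throws Exception {
        // Talk to a core directly on disk; no servlet container, no HTTP.
        DirectSolrConnection solr =
            new DirectSolrConnection("/opt/solr/home", "/opt/solr/data");

        solr.request("/update", "<add><doc><field name=\"id\">1</field></doc></add>");
        solr.request("/update", "<commit/>");

        // Queries go through the same request() call and return raw XML.
        System.out.println(solr.request("/select?q=id:1", null));
    }
}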
Condensing the loader into a single executable sounds right if
you have performance problems. ;-)
You could also try adding multiple <doc>s in a single post if you
notice your problems are with tcp setup time, though if you're
doing localhost connections that should be minimal.
If you're already local
I inherited an existing (working) solr indexing script that runs like
this:
Python script queries the mysql DB then calls bash script
Bash script performs a curl POST submit to solr
We're injecting about 1000 records / minute (constantly), frequently
pushing the edge of our CPU / RAM limit
On 5/30/07, Daniel Einspanjer <[EMAIL PROTECTED]> wrote:
What I quickly found I could do without though was the HTTP overhead.
I implemented the EmbeddedSolr class found on the Solr wiki that let
me interact with the Solr engine directly. This is important since I'm
doing thousands of queries in
On 4/11/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
: Not really. The explain scores aren't normalized and I also couldn't
: find a way to get the explain data as anything other than a whitespace
: formatted text blob from Solr. Keep in mind that they need confidence
the default way Solr du
Yes, for good (hopefully)
or bad.
-Sean
Shridhar Venkatraman wrote on 5/7/2007, 12:37 AM:
Interesting..
Surrogates can also bring the searcher's subjectivity (opinion and
context) into it by the learning process ?
shridhar
Sean Timm wrote:
It may not be easy or even possible without major changes, but having
global collection statistics would allow scores to be compared across
searchers. To do this, the master indexes would need to be able to
communicate with each other.
Another approach to merging across searchers is describe
On 4/11/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
A custom Similarity class with simplified tf, idf, and queryNorm functions
might also help you get scores from the Explain method that are more
easily manageable since you'll have predictable query structures hard
coded into your application
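Such a class might look like this, against the Lucene 2.x-era API current
in this thread (everything else inherits DefaultSimilarity's behavior):

import org.apache.lucene.search.DefaultSimilarity;

public class FlatSimilarity extends DefaultSimilarity {
    // Ignore how often a term occurs in a document.
    public float tf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }

    // Ignore how rare a term is in the collection.
    public float idf(int docFreq, int numDocs) {
        return 1.0f;
    }

    // Don't rescale scores by overall query weight.
    public float queryNorm(float sumOfSquaredWeights) {
        return 1.0f;
    }
}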
I'm not certain that i understand exactly what you are describing, but
there was some discussion a while back that may be similar...
http://issues.apache.org/jira/browse/SOLR-109
...there's not a lot in the issue itself, but the linked discussion may be
fruitful for you.
if what you are describ
I really like the flexibility of naming request handlers to append general
constraints / filters.
Has anyone spun thoughts around something like a "solr.ParmSubstHandler" or any
way to pass maybe a special
ps=0:discussions; ps=1:images; ps=2:false
...
category:[0]
splay the
following scores:
title: 1.0
director: .8
year: .6
overall: 2.4
I looked at the javadocs related to the FunctionQuery class because it
looked interesting, but the actual docs were a bit light and I wasn't
able to determine if it might help me out with this need.
Does this sound unreaso