Hi,
When doing a distributed query from Solr 4.10.4, I am getting the exception below:
org.apache.solr.common.SolrException: org.apache.http.ParseException: Invalid
content type:
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:311)
org
On 12/18/2014 12:35 AM, rashi gandhi wrote:
> Also, as per our investigation, there is currently ongoing work in the Solr
> community to support this concept of distributed/global IDF. But I wanted
> to know if there is any solution possible right now to manage/control the
> score of the documents duri
Hi,
This is regarding the issue that we are facing with Solr distributed search.
In our application, we are managing multiple shards at the Solr server to
manage the load. But there is a problem with the order of results that we
are going to return to the client during the search.
For Example: Currently
Thanks Shawn, I will try to think in that way too :)
With Regards
Aman Tandon
On Fri, Jun 6, 2014 at 8:19 PM, Shawn Heisey wrote:
> On 6/6/2014 8:31 AM, Aman Tandon wrote:
> > In my organisation we also want to implement SolrCloud, but the
> problem
> > is that we are using the master-slav
On 6/6/2014 8:31 AM, Aman Tandon wrote:
> In my organisation we also want to implement SolrCloud, but the problem
> is that we are using the master-slave architecture and we do all indexing
> on the master; the master's hardware is lower-spec than the slaves'.
>
> So if we implement SolrCloud in a
Thanks Shawn.
In my organisation we also want to implement SolrCloud, but the problem
is that we are using the master-slave architecture and we do all indexing
on the master; the master's hardware is lower-spec than the slaves'.
So if we implement SolrCloud in such a fashion that the master will be the
le
On 6/6/2014 6:25 AM, Aman Tandon wrote:
> Will this *shards* parameter also work in the near future with Solr 5?
I am not aware of any plan to deprecate or remove the shards parameter.
My personal experience is with versions from 1.4.0 through 4.7.2. It
works in all of those versions. Without
Hi,
Will this *shards* parameter also work in the near future with Solr 5?
With Regards
Aman Tandon
On Thu, Jun 5, 2014 at 2:59 PM, Mahmoud Almokadem
wrote:
> Hi, you can search using this sample URL
>
>
> http://localhost:8080/solr/core1/select?q=*:*&shards=localhost:8080/solr/core1,localh
Hi, you can search using this sample URL:
http://localhost:8080/solr/core1/select?q=*:*&shards=localhost:8080/solr/core1,localhost:8080/solr/core2,localhost:8080/solr/core3
Mahmoud Almokadem
On Thu, Jun 5, 2014 at 8:13 AM, Anurag Verma wrote:
> Hi,
> Can you please help me solr distrib
Hi,
Can you please help me with Solr distributed search in multicore? I would
be very happy, as I am stuck here.
How do I implement distributed search in Java code?
--
Thanks & Regards
Anurag Verma
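Since the question above asks how to run a distributed search from Java, here is a minimal SolrJ sketch (assuming SolrJ 4.x; the host and core names are just the ones from the sample URL earlier in the thread, adjust to your own deployment):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DistributedSearchExample {
    public static void main(String[] args) throws SolrServerException {
        // Point SolrJ at the core that will act as the aggregator.
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8080/solr/core1");
        SolrQuery query = new SolrQuery("*:*");
        // The shards parameter fans this one request out to all listed cores.
        query.set("shards",
            "localhost:8080/solr/core1,localhost:8080/solr/core2,localhost:8080/solr/core3");
        QueryResponse response = solr.query(query);
        System.out.println("Total hits: " + response.getResults().getNumFound());
    }
}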
This copying is a bit overstated here because of the way that small
segments are merged into larger segments. Those larger segments are then
copied much less often than the smaller ones.
While you can wind up with lots of copying in certain extreme cases, it is
quite rare. In particular, if you
Here is an example of schema design: a PDF file of 5MB might have
maybe 50k of actual text. The Solr ExtractingRequestHandler will find
that text and only index that. If you set the field to stored=true,
the 5MB will be saved. If stored=false, the PDF is not saved. Instead,
you would store a link to
For data of this size you may want to look at something like Apache
Cassandra, which is made specifically to handle data at this kind of
scale across many machines.
You can still use Hadoop to analyse and transform the data in a
performant manner; however, it's probably best to do some research on
ote:
> >
> > > Hi,
> > >
> > > I have a basic question, let's say we're going to have a very very huge
> > set
> > > of data.
> > > In a way that for sure we will need many servers (tens or hundreds of
> > > servers).
> >
Alireza Salimi
> wrote:
>
> > Hi,
> >
> > I have a basic question, let's say we're going to have a very very huge
> set
> > of data.
> > In a way that for sure we will need many servers (tens or hundreds of
> > servers).
> > We will
to have a very very huge set
> of data.
> In a way that for sure we will need many servers (tens or hundreds of
> servers).
> We will also need failover.
> Now the question is, if we should use Hadoop or using Solr Distributed
> Search
> with shards would be enough
Hi,
I have a basic question: let's say we're going to have a very, very huge set
of data,
in a way that for sure we will need many servers (tens or hundreds of
servers).
We will also need failover.
Now the question is whether we should use Hadoop, or whether Solr Distributed
Search
with shard
Interesting info.
You should look into using Solid State Drives. I moved my search engine to
SSD and saw dramatic improvements.
The problem has been resolved. My disk subsystem was a bottleneck for quick search.
I put my indexes in RAM and I see very nice QTimes :)
Sorry for your time, guys.
On Mon, Nov 28, 2011 at 4:02 PM, Artem Lokotosh wrote:
> Hi all again. Thanks to all for your replies.
>
> On this weekend I'd made som
Hi all again. Thanks to all for your replies.
This weekend I made some interesting tests, and I would like to share them
with you.
First of all I made a speed test of my HDD:
root@LSolr:~# hdparm -t /dev/sda9
/dev/sda9:
Timing buffered disk reads: 146 MB in 3.01 seconds = 48.54 MB/se
In general terms, when your Java heap is so large, it is beneficial to
set -Xmx and -Xms to the same size (e.g., -Xms12g -Xmx12g).
On Wed, Nov 23, 2011 at 5:12 AM, Artem Lokotosh wrote:
> Hi!
>
> * Data:
> - Solr 3.4;
> - 30 shards ~ 13GB, 27-29M docs each shard.
>
> * Machine parameters (Ubuntu 10.04 LTS):
> user@Solr:~$ u
On 11/25/2011 3:13 AM, Mark Miller wrote:
When you search each shard, are you positive that you are using all of the
same parameters? Are you sure you are hitting request handlers that are
configured exactly the same and sending exactly the same queries?
In my experience, the overhead for dist
45,000,000 docs per shard approx, Tomcat, caching was tweaked in solrconfig, and
each shard was given 12GB of RAM max.
<filterCache class="solr.FastLRUCache" size="1200" initialSize="1200" autowarmCount="128"/>
true
50
200
In your case I would first check if the network throu
On Thu, Nov 24, 2011 at 12:09 PM, Artem Lokotosh wrote:
> >How big are the documents you return (how many fields, avg KB per doc,
> etc.)?
> I have the following schema in my solr configuration: <field name="field1" type="text" indexed="true" stored="false"/> <field name="field2" type="text" indexed="true" stored
>How big are the documents you return (how many fields, avg KB per doc, etc.)?
I have the following schema in my Solr configuration:
27M–30M docs and 12-15 GB for each shard, 0.5KB per doc
>Does performance get much better if you only request the top 100, or the top 10
>documents instead of the top 1000?
>> Can you merge, e.g. 3 shards together or is it much effort for your
>> team?
> Yes, we can merge. We'll try to do this and review how it works.
Merging does not help :( I've tried to merge two shards into one, and three
shards into one, but the results are similar to those of the first configuration
with 30 shar
If you request 1000 docs from each shard, then the aggregator is really
fetching 30,000 total documents, which it must then merge (re-sort the
results and take the top 1000 to return to the client). It's possible that
Solr's merging implementation needs optimizing, but it does not seem like
it could be that slow. H
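To make that merge step concrete, here is a toy sketch of what the aggregator has to do (a simplified illustration, not Solr's actual QueryComponent code; ShardDoc here is a made-up class):

import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class MergeSketch {
    /** Simplified stand-in for Solr's internal per-shard result entry. */
    public static class ShardDoc {
        final String id;
        final float score;
        ShardDoc(String id, float score) { this.id = id; this.score = score; }
    }

    /** Merge per-shard top lists into one global top-n, best score first. */
    public static List<ShardDoc> mergeTopN(List<List<ShardDoc>> perShard, int n) {
        // Min-heap on score: a cheap way to keep only the global top-n.
        PriorityQueue<ShardDoc> heap =
                new PriorityQueue<>(n, (a, b) -> Float.compare(a.score, b.score));
        for (List<ShardDoc> shard : perShard) {      // e.g. 30 shards
            for (ShardDoc doc : shard) {             // e.g. 1000 docs each
                heap.offer(doc);
                if (heap.size() > n) heap.poll();    // drop the current worst
            }
        }
        List<ShardDoc> merged = new ArrayList<>(heap);
        merged.sort((a, b) -> Float.compare(b.score, a.score));
        return merged;
    }
}

With 30 shards and rows=1000 this touches all 30,000 entries but keeps the heap at only 1000 elements, which supports the point above: the merge itself should not be the slow part.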
> If the response time from each shard shows decent figures, then the aggregator
> seems to be a bottleneck. Do you btw have a lot of concurrent users?
For now it is not a problem, but we expect from 1K to 10K concurrent users,
and maybe more.
On Wed, Nov 23, 2011 at 4:43 PM, Dmitry Kan wrote:
>
If the response time from each shard shows decent figures, then the aggregator
seems to be a bottleneck. Do you btw have a lot of concurrent users?
On Wed, Nov 23, 2011 at 4:38 PM, Artem Lokotosh wrote:
> > Is this log from the frontend SOLR (aggregator) or from a shard?
> from aggregator
>
> > Can
> Is this log from the frontend SOLR (aggregator) or from a shard?
from aggregator
> Can you merge, e.g. 3 shards together or is it much effort for your team?
Yes, we can merge. We'll try to do this and review how it works.
Thanks, Dmitry
Any other ideas?
On Wed, Nov 23, 2011 at 4:01 PM, D
Hello,
Is this log from the frontend SOLR (aggregator) or from a shard?
Can you merge, e.g. 3 shards together or is it much effort for your team?
In our setup we currently have 16 shards with ~30GB each, but we rarely
search in all of them at once.
Best,
Dmitry
On Wed, Nov 23, 2011 at 3:12 PM,
Hi!
* Data:
- Solr 3.4;
- 30 shards ~ 13GB, 27-29M docs each shard.
* Machine parameters (Ubuntu 10.04 LTS):
user@Solr:~$ uname -a
Linux Solr 2.6.32-31-server #61-Ubuntu SMP Fri Apr 8 19:44:42 UTC 2011
x86_64 GNU/Linux
user@Solr:~$ cat /proc/cpuinfo
processor : 0 - 3
vendor_id : Genui
hi
I suggest you set up an environment and test it yourself; 1M documents is really no problem at all.
2011/9/30 秦鹏凯 :
> Hi all,
>
> Now I'm doing research on Solr distributed search, and it
> is said that with more than one million documents it is reasonable to use
> distributed search.
> So I want to know, does anyone have the test
> result(S
Hi all,
Now I'm doing research on Solr distributed search, and it
is said that with more than one million documents it is reasonable to use
distributed search.
So I want to know, does anyone have test
results (such as time cost) of using a single index and distributed search
with more than one million
in a
distributed production system.
Kind Regards
Gregor
On 09/29/2011 12:14 PM, Pengkai Qin wrote:
Hi all,
Now I'm doing research on Solr distributed search, and it is said
that with more than one million documents it is reasonable to use distributed search.
So I want to know, does anyone have the test result
September 29, 2011 5:15 AM
To: solr-user@lucene.apache.org; d...@lucene.apache.org
Subject: About solr distributed search
Hi all,
Now I'm doing research on Solr distributed search, and it is said that with
more than one million documents it is reasonable to use distributed search.
So I want to know,
Hi all,
Now I'm doing research on Solr distributed search, and it is said that with
more than one million documents it is reasonable to use distributed search.
So I want to know, does anyone have test results (such as time cost) of using
a single index and distributed search with more than one million da
explicit
enum
1
10
192.168.1.6/solr/,192.168.1.7/solr/
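The fragment above appears to have lost its XML tags in the archive. For reference, putting shards into a handler's defaults might look roughly like this (a sketch only; the parameter names other than shards are assumptions, not olivier's actual file):

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="shards">192.168.1.6/solr/,192.168.1.7/solr/</str>
  </lst>
</requestHandler>

With the shards list in the defaults, clients do not need to pass it on every request.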
2011/8/19 Li Li
> could you please show me your configuration in solrconfig.xml?
>
> On Fri, Aug 19, 2011 at 5:31 PM, olivier sallou
> wrote:
> > Hi,
> > I do not use spell but I use
could you please show me your configuration in solrconfig.xml?
On Fri, Aug 19, 2011 at 5:31 PM, olivier sallou
wrote:
> Hi,
> I do not use spell but I use distributed search, using qt=spell is correct,
> should not use qt=\spell.
> For "shards", I specify it in solrconfig directly, not in url, bu
Hi,
I do not use spell, but I do use distributed search; using qt=spell is correct,
you should not use qt=\spell.
For "shards", I specify it in solrconfig directly, not in the URL, but it should
work the same.
Maybe there is an issue in your spell request handler.
2011/8/19 Li Li
> hi all,
> I follow the wiki http
hi all,
I followed the wiki http://wiki.apache.org/solr/SpellCheckComponent
but there is something wrong.
The URL given by the wiki is
http://solr:8983/solr/select?q=*:*&spellcheck=true&spellcheck.build=true&spellcheck.q=toyata&qt=spell&shards.qt=spell&shards=solr-shard1:8983/solr,solr-shar
If you look at the Solr wiki, one of the limitations of distributed
searching it mentions concerns the start parameter.
http://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations
"Makes it more inefficient to use a high "start" parameter. For example, if
you request
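As a rough worked example (my numbers, not the wiki's): with three shards and start=1000&rows=10, the aggregator must ask every shard for its top 1010 documents, so about 3,030 entries cross the network and get merged just to return 10 hits.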
On Tue, 2010-10-26 at 15:48 +0200, Ron Mayer wrote:
> And a third potential reason - it's arguably a feature instead of a bug
> for some applications. Depending on how I organize my shards, "give me
> the most relevant document from each shard for this search" seems like
> it could be useful.
You
Andrzej Bialecki wrote:
> On 2010-10-25 11:22, Toke Eskildsen wrote:
>> On Thu, 2010-07-22 at 04:21 +0200, Li Li wrote:
>>> But it shows a problem of distributed search without common idf.
>>> A doc will get different scores in different shards.
>> Bingo.
>>
>> I really don't understand why this funda
On 2010-10-25 13:37, Toke Eskildsen wrote:
> On Mon, 2010-10-25 at 11:50 +0200, Andrzej Bialecki wrote:
>> * there is an exact solution to this problem, namely to make two
>> distributed calls instead of one (first call to collect per-shard IDFs
>> for given query terms, second call to submit a que
On Mon, 2010-10-25 at 11:50 +0200, Andrzej Bialecki wrote:
> * there is an exact solution to this problem, namely to make two
> distributed calls instead of one (first call to collect per-shard IDFs
> for given query terms, second call to submit a query rewritten with the
> global IDF-s). This solu
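A rough sketch of that two-pass idea, reduced to just the IDF bookkeeping (hypothetical classes, not Solr's API; the formula is Lucene's classic IDF):

import java.util.HashMap;
import java.util.Map;

public class GlobalIdfSketch {
    /** What pass 1 would collect from one shard for the query terms. */
    public static class ShardStats {
        final Map<String, Long> docFreq; // term -> df on this shard
        final long maxDoc;               // docs on this shard
        ShardStats(Map<String, Long> docFreq, long maxDoc) {
            this.docFreq = docFreq;
            this.maxDoc = maxDoc;
        }
    }

    /** Merge per-shard stats into the global IDF used for pass 2. */
    public static Map<String, Double> globalIdf(Iterable<ShardStats> shards) {
        Map<String, Long> df = new HashMap<>();
        long numDocs = 0;
        for (ShardStats s : shards) {
            numDocs += s.maxDoc;
            for (Map.Entry<String, Long> e : s.docFreq.entrySet()) {
                df.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        Map<String, Double> idf = new HashMap<>();
        for (Map.Entry<String, Long> e : df.entrySet()) {
            // Lucene's classic formula: idf = 1 + ln(numDocs / (df + 1))
            idf.put(e.getKey(),
                    1.0 + Math.log((double) numDocs / (e.getValue() + 1)));
        }
        return idf; // pass 2 sends these along with the rewritten query
    }
}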
On 2010-10-25 11:22, Toke Eskildsen wrote:
> On Thu, 2010-07-22 at 04:21 +0200, Li Li wrote:
>> But it shows a problem of distributed search without common idf.
>> A doc will get different scores in different shards.
>
> Bingo.
>
> I really don't understand why this fundamental problem with sharding
On Thu, 2010-07-22 at 04:21 +0200, Li Li wrote:
> But it shows a problem of distributed search without common idf.
> A doc will get different scores in different shards.
Bingo.
I really don't understand why this fundamental problem with sharding
isn't mentioned more often. Every time the advice "use
use doc_X from shard_A or
>>> shard_B, since they will all have got the same scores.
>>
>> That only works if the docs are exactly the same - they may not be.
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>
>
the solr version I used is 1.4
2010/7/26 Li Li :
> where is the link of this patch?
>
> 2010/7/24 Yonik Seeley :
>> On Fri, Jul 23, 2010 at 2:23 PM, MitchK wrote:
>>> why do we not send the output of TermsComponent of every node in the
>>> cluster to a Hadoop instance?
>>> Since TermsComponent
where is the link of this patch?
2010/7/24 Yonik Seeley :
> On Fri, Jul 23, 2010 at 2:23 PM, MitchK wrote:
>> why do we not send the output of TermsComponent of every node in the
>> cluster to a Hadoop instance?
>> Since TermsComponent does the map-part of the map-reduce concept, Hadoop
>> onl
distributed IDF
(like in the mentioned JIRA issue) to normalize your results' scoring.
But the mentioned problem in this mailing-list posting has nothing to do
with that...
Regards
- Mitch
On Fri, Jul 23, 2010 at 2:40 PM, MitchK wrote:
> That only works if the docs are exactly the same - they may not be.
> Ahm, what? Why? If the uniqueID is the same, the docs *should* be the same,
> shouldn't they?
Documents aren't supposed to be duplicated across shards... so the
presence of multiple
That only works if the docs are exactly the same - they may not be.
Ahm, what? Why? If the uniqueID is the same, the docs *should* be the same,
shouldn't they?
>> elements. I guess we could rebuild the current priority queue if we
>> detect a duplicate, but that will have an obvious performance impact.
>> Any other suggestions?
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>
On Fri, Jul 23, 2010 at 2:23 PM, MitchK wrote:
> why do we not send the output of TermsComponent of every node in the
> cluster to a Hadoop instance?
> Since TermsComponent does the map-part of the map-reduce concept, Hadoop
> only needs to reduce the stuff. Maybe we do not even need Hadoop for
> http://www.lucidimagination.com
>
: As the comments suggest, it's not a bug, but just the best we can do
: for now since our priority queues don't support removal of arbitrary
FYI: I updated the DistributedSearch wiki to be more clear about this --
it previously didn't make it explicitly clear that docIds were supposed to
be uni
As the comments suggest, it's not a bug, but just the best we can do
for now since our priority queues don't support removal of arbitrary
elements. I guess we could rebuild the current priority queue if we
detect a duplicate, but that will have an obvious performance impact.
Any other suggestions?
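For illustration, the current behaviour and the rebuild-on-duplicate alternative might look like this (a sketch only, reusing the made-up ShardDoc class from the merge example earlier; not Solr's code):

import java.util.Map;
import java.util.PriorityQueue;

public class DedupSketch {
    /** Offer a doc, keeping at most one queue entry per uniqueKey. */
    static void offerDedup(PriorityQueue<MergeSketch.ShardDoc> queue,
                           Map<String, MergeSketch.ShardDoc> byId,
                           MergeSketch.ShardDoc doc) {
        MergeSketch.ShardDoc prev = byId.get(doc.id);
        if (prev == null) {
            byId.put(doc.id, doc);   // first encountered wins (current behaviour)
            queue.offer(doc);
        } else if (doc.score > prev.score) {
            // The alternative discussed above: replace the earlier entry.
            queue.remove(prev);      // linear scan: the performance impact
            byId.put(doc.id, doc);
            queue.offer(doc);
        }
    }
}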
case, Solr sees
> the doc_X first at shard_A and ignores it at shard_B. That means that the
> doc might occur at page 10 in pagination, although it *should* occur
> at page 1 or 2.
>
> Kind regards,
> - Mitch
sees
the doc_X first at shard_A and ignores it at shard_B. That means that the
doc might occur at page 10 in pagination, although it *should* occur
at page 1 or 2.
Kind regards,
- Mitch
How about sorting over the score? Would that be possible?
On Jul 21, 2010, at 12:13 AM, Li Li wrote:
> in QueryComponent.mergeIds. It will remove documents that have a
> uniqueKey duplicated by others. In the current implementation, it uses
> the first one encountered.
> String prevShard = uniqueD
> not sure, but I think you can't prevent this without custom coding or
> making a document's occurrence unique.
>
> Kind regards,
> - Mitch
que.
Kind regards,
- Mitch
, which may not be intended by the user.
>
> Kind regards,
> - Mitch
Li Li,
this is the intended behaviour, not a bug.
Otherwise you could get back the same record several times in a response,
which may not be intended by the user.
Kind regards,
- Mitch
In QueryComponent.mergeIds, it will remove documents that have a
uniqueKey duplicated by others. In the current implementation, it uses
the first one encountered:
String prevShard = uniqueDoc.put(id, srsp.getShard());
if (prevShard != null) {
  // duplicate detected
Or if you must for some reason, you can raise the limit with the following
system property:
org.mortbay.http.HttpRequest.maxFormContentSize=50
You could also do it in the servlet context, and I think there is even
a way in jetty.xml.
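For example, something along these lines in jetty.xml might work (a sketch only, not verified against a particular Jetty version; the property name is the one from the error above, and the limit value is just a placeholder):

<Call class="java.lang.System" name="setProperty">
  <Arg>org.mortbay.http.HttpRequest.maxFormContentSize</Arg>
  <Arg>1000000</Arg>
</Call>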
--
- Mark
http://www.lucidimagination.com
On Thu, Jul 2, 2009 at 12:14 AM, GiriGG wrote:
>
> Hi All,
>
> I am trying to do a distributed search and getting the below error. Please
> let me know if you know how to solve this issue.
>
> 18:20:28,202 ERROR [STDERR] Caused by:
> org.apache.solr.common.SolrException:
> *Form_too_large*
> __ja
// Assumes an initialized SolrServer instance named "solr" and a query string queryStr.
SolrQuery query = new SolrQuery();
query.setQuery(queryStr);
QueryResponse response = solr.query(query);
SolrDocumentList docs = response.getResults();
long docNum = docs.getNumFound();
Thanks for bringing closure to this, Raakhi.
- Mark
Rakhi Khatwani wrote:
Hi Mark,
I actually got this error because I was using an old version of
Java. Now the problem is solved.
Thanks anyways,
Raakhi
On Tue, Jun 9, 2009 at 11:17 AM, Rakhi Khatwani wrote:
Hi Mark,
Hi Mark,
I actually got this error because I was using an old version of
Java. Now the problem is solved.
Thanks anyways,
Raakhi
On Tue, Jun 9, 2009 at 11:17 AM, Rakhi Khatwani wrote:
> Hi Mark,
> Yeah, I would like to open a JIRA issue for it. How do I go about
> that?
>
>
Hi Mark,
Yeah, I would like to open a JIRA issue for it. How do I go about
that?
Regards,
Raakhi
On Mon, Jun 8, 2009 at 7:58 PM, Mark Miller wrote:
> That is a very odd cast exception to get. Do you want to open a JIRA issue
> for this?
>
> It looks like an odd exception because the
That is a very odd cast exception to get. Do you want to open a JIRA
issue for this?
It looks like an odd exception because the call is:
NodeList nodes = (NodeList)solrConfig.evaluate(configPath,
XPathConstants.NODESET); // cast exception if we get an ArrayList rather
than NodeList
Which
Hi,
I was executing a simple example which demonstrates DistributedSearch,
the example provided at the following link:
http://wiki.apache.org/solr/DistributedSearch
However, when I start up the server on both ports 8983 and 7574, I get
the following exception:
SEVERE: Could not start S
Yes - they are all new indexes. I can search them individually, but adding
"shards" throws a "Connection Reset" error. Is there any way I can debug
this, or any other pointers?
-vivek
On Fri, Apr 10, 2009 at 4:49 AM, Shalin Shekhar Mangar
wrote:
> On Fri, Apr 10, 2009 at 7:50 AM, vivek sar wrote:
>
>>
On Fri, Apr 10, 2009 at 7:50 AM, vivek sar wrote:
> Just an update. I changed the schema to store the unique id field, but
> I still get the connection reset exception. I did notice that if there
> is no data in the core then it returns 0 results (no exception),
> but if there is data and you
Just an update. I changed the schema to store the unique id field, but
I still get the connection reset exception. I did notice that if there
is no data in the core then it returns 0 results (no exception),
but if there is data and you search using the "shards" parameter I get the
connection reset e
I think I found the reason behind the "connection reset". Looking at the
code, it points to QueryComponent.mergeIds():
resultIds.put(shardDoc.id.toString(), shardDoc);
It looks like the doc unique id is returning null. I'm not sure how that is
possible, as it's a required field. Right now my unique id is not store
Hi,
I have another thread on multi-core distributed search, but I just
wanted to put a simple question here on distributed search to get some
response. I have a search query,
http://etsx19.co.com:8080/solr/20090409_9/select?q=usa -
which returns 10 results;
now if I add the "shards" parameter to it,