If I were you and not knowing all your details...
I would optimize indices that are static (not being modified) and
would optimize down to 1 segment.
I would do it when search traffic is low.
Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html
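For reference, a minimal SolrJ sketch of the optimize-down-to-one-segment advice above; the core URL and the 4.0-era HttpSolrServer class are assumptions, adjust to your setup:

    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class OptimizeStaticCore {
        public static void main(String[] args) throws Exception {
            // Hypothetical URL of a core whose index is static (not being modified)
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/archive");
            // waitFlush=true, waitSearcher=true, maxSegments=1:
            // merge the whole index down to a single segment
            server.optimize(true, true, 1);
            server.shutdown();
        }
    }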
What will happen if in my query I specify a greater number for rows than the
queryResultWindowSize in my solrconfig.xml?
For example, if queryResultWindowSize=100, but I need to process a batch query
from Solr with rows=1000 each time and vary the start to move on... what will
happen? If I do not turn o
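As far as I know, nothing breaks here: queryResultWindowSize only controls how many result ids are cached per query in the queryResultCache, so a rows value larger than the window is served normally and simply caches less effectively. A minimal SolrJ paging sketch for such a batch job (URL and query are placeholders; note that very deep start offsets get progressively more expensive):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrDocumentList;

    public class BatchPager {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
            final int pageSize = 1000;   // bigger than queryResultWindowSize=100; still legal
            long start = 0;
            long numFound;
            do {
                SolrQuery q = new SolrQuery("*:*");   // placeholder query
                q.setStart((int) start);
                q.setRows(pageSize);
                SolrDocumentList page = server.query(q).getResults();
                numFound = page.getNumFound();
                // ... process this page of up to 1000 docs ...
                start += pageSize;
            } while (start < numFound);
            server.shutdown();
        }
    }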
Hi Mikhail,
thank you for your answer. Maybe my sample data was not so good. The documents
always have additional data which I need to use as facets, like this:
[sample document markup stripped by the mail archive: an id of 3, option
fields A, B, ..., and associated values 200, 400, ...]
Torben
Am 05.10.2012 um 17:20 schrieb Mikhail Khludnev:
> denormalize your docs to option x value tuples, identify them by a duplicated id.
Mr. Miller said that it depends
If you create your collection with the Collections API, then replicationFactor
will only see the currently live nodes, not nodes started later.
However, collections added to solr.xml on all nodes will participate in
automatic role assignment for newly started nodes. I
Does DIH support only deleting/re-indexing docs of a certain type?
I.e., can I have one DIH config for type:vegetable and another for type:mineral,
with each only deleting/recreating the right types?
Thanks.
On Fri, Oct 5, 2012 at 1:04 PM, Walter Underwood wrote:
> Using the same unique key doesn't handle documents which disappear from one
> indexing to the next.
Well, using embedded Solr isn't necessarily indicated. I have a couple
of questions.
1> You say 30 TPS. Are you sending a single doc at a time or batching them
up? I.e. server.add(doclist) or server.add(doc)? (see the sketch below)
2> HTTP isn't actually an inefficient protocol; I think the whole idea of
using embedded
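To illustrate question 1>, a small sketch of the batched pattern, with hypothetical URL, field names, and batch size:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BatchAdd {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
            List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
            for (int i = 0; i < 1000; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "doc-" + i);      // placeholder field
                batch.add(doc);
                if (batch.size() == 100) {           // one round trip per 100 docs
                    server.add(batch);               // server.add(doclist)
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) server.add(batch); // flush the remainder
            server.commit();
        }
    }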
My first reaction is you have too much stuff on a single machine. Your
cumulative index size is 2.4 TB. Granted, it's a beefy machine, but still...
And index size isn't all that helpful, as it includes the raw stored data which
doesn't really come into play for sizing things; subtract out the
*.fdt
But look what you're asking Solr to do. 250K queries. Let's say you get 100 QPS,
which for a single box isn't bad. That's still 2,500 seconds, roughly
40 minutes.
But you still haven't told us what QPS you're seeing. Or what you need to see.
Or what kind of results you need from your queries. Perh
If you need to use solr in an embedded application, this is the recommended
approach. It allows you to work with the same interface whether or not you
have access to HTTP.
And it is not thread safe.
On Sat, Oct 6, 2012 at 1:58 AM, balaji.gandhi
wrote:
> Sushil, we are trying to call the VIP in front of the SOLR nodes to distribute
> the update load.
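For anyone landing here, a minimal EmbeddedSolrServer bootstrap sketch in the style used around Solr 3.x/4.0; the solr home path and core name are placeholders:

    import java.io.File;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
    import org.apache.solr.core.CoreContainer;

    public class EmbeddedExample {
        public static void main(String[] args) throws Exception {
            String solrHome = "/path/to/solr/home";            // placeholder path
            CoreContainer container = new CoreContainer();
            container.load(solrHome, new File(solrHome, "solr.xml"));
            SolrServer server = new EmbeddedSolrServer(container, "collection1");
            // ... same add/query/commit API as HttpSolrServer, minus the HTTP hop ...
            container.shutdown();
        }
    }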
Sushil, we are trying to call the VIP in front of the SOLR nodes to distribute
the update load.
Also is EmbeddedSolrServer thread safe?
Balaji Gandhi, Senior Software Developer, Horizontal Platform Services
Product Engineering │ Apollo Group, Inc.
1225 W. Washington St. | AZ23 | Tempe, AZ
Hi Eric,
I am in a major dilemma with my index now. I have 8 cores, each around
300 GB in size; half of the documents in them are deleted, and on top of that
each has around 100 segments as well. Do I issue an expungeDeletes and
allow the merge policy to take care of the segments, or optimize the
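For reference, expungeDeletes can be sent along with a commit from SolrJ; a hedged sketch (the core URL is hypothetical):

    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
    import org.apache.solr.client.solrj.request.UpdateRequest;

    public class ExpungeDeletes {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/core0");
            UpdateRequest req = new UpdateRequest();
            // Commit with waitFlush=true, waitSearcher=true
            req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
            // Ask the merge policy to merge away segments holding deleted docs
            req.setParam("expungeDeletes", "true");
            req.process(server);
            server.shutdown();
        }
    }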
Already started .. if you want to follow and give feedback :)
https://issues.apache.org/jira/browse/SOLR-3915
On Friday, October 5, 2012 at 7:53 PM, Kristopher Kane wrote:
> I also vote for a legend on the monitor.
Yes, I'd recommend EmbeddedSolrServer, because it doesn't require any web
server for read/write/update/delete operations.
On Sat, Oct 6, 2012 at 1:48 AM, balaji.gandhi
wrote:
> Sushil,
>
> 30 TPS = 30 transactions (updates) per second.
>
> Is the recommendation to use EmbeddedSolrServer instead of HttpSolrServer?
Hi,
I think you should store this outside of Solr, in a DB or file or
Redis (key is doc ID, value is a query=>position map) or ...
Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html
On Fri, Oct 5, 2012 at 5:13 A
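A tiny sketch of the Redis option using the Jedis client (an assumption; any Redis client works), with one hash per doc ID mapping query => position:

    import redis.clients.jedis.Jedis;

    public class RankStore {
        public static void main(String[] args) {
            Jedis jedis = new Jedis("localhost");
            // Redis hash per document: key = doc ID, field = query, value = position
            jedis.hset("doc42", "red shoes", "3");
            jedis.hset("doc42", "running shoes", "17");
            System.out.println(jedis.hget("doc42", "red shoes")); // prints 3
            jedis.close();
        }
    }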
Hi Ahmet,
thank you, it sounds great:)
I will test it in the next days and give feedback.
Best regards
Vadim
2012/10/5 Ahmet Arslan :
> Hi Vadim,
>
> I attached a zip (solr plugin) file to SOLR-1604. This is not a patch. It is
> supposed to work with solr 4.0. Some tests fail but it should work with "pol*
> tel*"~5 types of queries.
Sushil,
30 TPS = 30 transactions (updates) per second.
Is the recommendation to use EmbeddedSolrServer instead of HttpSolrServer?
Thanks,
Balaji
Balaji Gandhi, Senior Software Developer, Horizontal Platform Services
Product Engineering │ Apollo Group, Inc.
1225 W. Washington St. | AZ23 |
Balaji,
What is 30 TPS?
Toke,
You should use EmbeddedSolrServer instead.
On Fri, Oct 5, 2012 at 11:42 PM, balaji.gandhi
wrote:
> Hi Toke,
>
> Were you able to find anything on this issue? We are running at 30 TPS and
> using the default HttpSolrServer for the posts.
>
Thank you Erick for your quick response.
Yes, you are right about my problem.
and the indexes are on the same machine, and yes, I am using a single machine.
I am using EmbeddedSolrServer class of SolrJ which removes HTTP layer.
But still it takes time.
On Fri, Oct 5, 2012 at 10:19 PM, Erick Erickson wrote:
Could you please tell me more. What field do you need to update, how does it
influence the search results, how often, and why can you not afford a commit?
On Fri, Oct 5, 2012 at 11:14 PM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:
> Hi,
>
> This is not doable in Solr 3.*. There are Lucene-level patches in
> JIRA, but I'm not sure if they are in Solr 4.*
Ok ok, thanks a lot Otis. This had been bothering me for a long while. Thanks
a ton.
On Sat, Oct 6, 2012 at 1:05 AM, Otis Gospodnetic wrote:
> Looks like HttpClient jar is not in your CLASSPATH or in -cp.
>
> Otis
> --
> Search Analytics - http://sematext.com/search-analytics/index.html
> Performance
Looks like HttpClient jar is not in your CLASSPATH or in -cp.
Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html
On Fri, Oct 5, 2012 at 3:33 PM, Prithu Banerjee wrote:
> I have been using solrJ for the last two mo
Hi,
This is not doable in Solr 3.*. There are Lucene-level patches in
JIRA, but I'm not sure if they are in Solr 4.*
Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html
On Fri, Oct 5, 2012 at 3:02 PM, Thakur, Pr
Using the same unique key doesn't handle documents which disappear from one
indexing to the next.
Instead, add a field for the type of item, like type:animal, type:vegetable, or
type:mineral. Then the query used to clean up before indexing can delete all
items of that type.
wunder
On Oct 5,
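In SolrJ terms, the per-type cleanup described above might look like this sketch; the URL and type value are placeholders:

    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class CleanupByType {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
            // Delete only one type before re-indexing it; other types are untouched
            server.deleteByQuery("type:vegetable");
            server.commit();
            server.shutdown();
        }
    }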
Hi Everyone,
I am using Solr 3.6. I want to update a single field value in the index without
re-indexing. Is this possible?
I have googled and came across partial update in Solr 4.0 BETA.
Can I do this with Solr 3.6?
Thanks,
-- Pramila Thakur
DIH always gives me indigestion.
Couple of things:
See the 'clean' parameter here for full import:
http://wiki.apache.org/solr/DataImportHandler
it defaults to true. I think if you set it to "false"
_and_ assuming that your <...> is
defined, it should work OK.
The other approach would be to contro
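One way to trigger such a full-import without cleaning, sketched via SolrJ; this assumes DIH is registered at /dataimport, with params as on the wiki page above:

    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.params.ModifiableSolrParams;

    public class DihImport {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
            ModifiableSolrParams params = new ModifiableSolrParams();
            params.set("qt", "/dataimport");      // where the DIH handler is registered
            params.set("command", "full-import");
            params.set("clean", "false");         // keep existing docs instead of wiping
            params.set("commit", "true");
            server.query(params);
        }
    }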
: So extracting the attachment you will be able to track down what happens
:
: this is the query that shows the error, and below you can see the latest stack
: trace and the qt definition
Awesome -- exactly what we needed.
I've reproduced your problem, and verified that it has something to do
w
Thanks a lot for all the replies. Chris, it worked out with this mm value:
[value stripped by the mail archive]
If this version of Solr is affected by the bug you pointed out, shouldn't it
fail with this value as well?
Greetings!
On Oct 4, 2012, at 8:48 PM, Jorge Luis Betancourt Gonzalez wrote:
> Hi Chris:
>
> I'm using solr 3.6
Erick,
I did mention using the DIH to index the first two datasets; that is
where the root of my problem lies.
I do see the benefit of one index. However the question still
remains: can I use the DIH to index xml from data sets 1 and 2, every
15 minutes or so (full index) without wiping out al
because eventually you'd run out of file handles. Imagine a
long-running server with 100,000 segments. Totally
unmanageable.
I think Shawn was emphasizing that RAM requirements don't
depend on the number of segments. There are other
resources that files consume, however.
Best
Erick
On Fri, Oct 5,
Hi Toke,
Were you able to find anything on this issue? We are running at 30 TPS and
using the default HttpSolrServer for the posts.
Thanks,
Balaji
Balaji Gandhi, Senior Software Developer, Horizontal Platform Services
Product Engineering │ Apollo Group, I
>> Right now there is no specific document .. but we could perhaps add a kind
>> of legend on this screen? .. in the meanwhile, does this help?
>> http://svn.apache.org/viewvc/lucene/dev/trunk/solr/webapp/web/css/styles/cloud.css?view=markup#l259
>>
>> The css-classname used is what we get from clust
hi Shawn,
thanks for the detailed explanation.
I have one doubt: you said it doesn't matter how many segments an index has,
but then why does Solr have this merge policy which merges segments
frequently? Why can't it leave the segments as they are rather than merging
smaller ones into bigger ones?
thank
Here's a reference, much of it is at the Lucene layer, but
it might be helpful.
http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
If I'm reading this right, you want to get through 250K queries.
What kind of throughput are you seeing? What is your target
speed?
I suspect you're going to b
The very first question is "what form are your XML docs in?"
Solr does NOT index arbitrary XML, so I'm guessing
you're using DIH and some of the XML stuff there. Do note
that the XSLT support is a subset of the full capabilities.
Second, I'd recommend you just put it all in a single index, it'll be
sim
I think that's correct, but only when creating a new collection. I don't
know if the replication factor is considered after that (running more nodes
that have a core with the collection name, or manually adding nodes to the
collection), or if some nodes go down.
Also, please someone correct me if
It's working fine on the server.
The problem was on my local PC, which might have been caused by some
misconfiguration.
Thank you very much.
On Fri, Oct 5, 2012 at 11:23 AM, Sushil jain wrote:
> I am using Solr 1.4.1 and the same Solr is indexing the documents. I have
> tried re-indexing but nothi
Hi,
I am using EmbeddedSolrServer to access indexed data.
I have to query the server around 250K times, with a different query each time.
I have already created the queries. But every time, querying Solr takes time.
I am querying using threads and a loop, but it's still not so fast.
Is there any way to speed up
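A thread-pool sketch of the kind of loop described above; the server URL, pool size, and query list are placeholders, and with EmbeddedSolrServer mind the thread-safety caveats raised earlier in this thread:

    import java.util.Arrays;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class BulkQueries {
        public static void main(String[] args) throws Exception {
            // Placeholder server; could be an EmbeddedSolrServer instance instead
            final SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
            ExecutorService pool = Executors.newFixedThreadPool(8); // tune to your hardware
            for (final String qs : loadQueries()) {
                pool.submit(new Runnable() {
                    public void run() {
                        try {
                            server.query(new SolrQuery(qs)); // ... consume results here ...
                        } catch (Exception e) {
                            e.printStackTrace();
                        }
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS); // wait for the whole batch
        }

        // Hypothetical loader standing in for the 250K prepared queries
        private static List<String> loadQueries() {
            return Arrays.asList("ipod", "video");
        }
    }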
Thanks Erick.
We've added the '_version_' field and we'll see if that makes a difference
tomorrow. Also, we have downloaded the RC1 and will try that next week.
Regards,
David Q
-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 05 October 2012 15:40
To: solr-user@luc
denormalize your docs to option x value tuples, identify them by a duplicated
id:
(setid=3, option=A, value=200)
(setid=3, option=B, value=400)
(setid=3, option=B, value=400)
(setid=3, option=C, value=240)
then collapse them by the "setid" field (it can not be the uniqueKey).
On Fri, Oct 5, 2012 at 6:26 PM, Torben Honigbaum <
torben.honigb...@neuland-bfi.de> wrote:
How are you indexing? There was a problem with indexing from SolrJ
if you indexed documents in batches, i.e. server.add(doclist); that's fixed in
4.0 RC#. The work-around is to add docs singly, server.add(doc).
Second thing: Bad Things Happen if you don't have a _version_ field
in your schema.xml. Solr 4
A legend would be awesome; I'm vastly in favor of not having to go to
external docs.
A tooltip would work too.
Whichever is easier...
Best
Erick
On Fri, Oct 5, 2012 at 10:24 AM, Stefan Matheis
wrote:
> Hey Kris
>
> Right now there is no specific document .. but we could perhaps add a kind
> of legend on this screen? .. in the meanwhile, does this help?
Hi Mikhail,
I read the article and can't see how to solve my problem with FieldCollapsing.
Any other suggestions?
Torben
Am 04.10.2012 um 17:31 schrieb Mikhail Khludnev:
> it's a typical nested document problem. There are several approaches. The
> out-of-the-box solution, as far as you need facets, is
Hi,
We've been using V4.x of SOLR since last November without too much
trouble. Our MySQL database is refreshed daily and a full import is run
automatically after the refresh and generally produces around 86,000
products, obviously on unique doc_id's.
So, we upgraded to 4.0 Beta a few days ago
Hey Kris
Right now there is no specific document .. but we could perhaps add a kind of
legend on this screen? .. in the meanwhile, does this help?
http://svn.apache.org/viewvc/lucene/dev/trunk/solr/webapp/web/css/styles/cloud.css?view=markup#l259
The css-classname used is what we get from cluster
Hi Vadim,
I attached a zip (solr plugin) file to SOLR-1604. This is not a patch. It is
supposed to work with solr 4.0. Some tests fail but it should work with "pol*
tel*"~5 types of queries.
Ahmet
--- On Thu, 9/27/12, Vadim Kisselmann wrote:
> From: Vadim Kisselmann
> Subject: Re: Proximit
If you think this could be a problem for your performance, you can try two
different solutions (a small sketch of option 1 follows below):
1 - Make the call to update the db in a different thread
2 - Make an asynchronous http call to a web application that updates the db
(in this case the web app can be resident on a different machine, so t
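A minimal sketch of option 1, handing the db write to a single background thread; the DAO call is hypothetical:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class AsyncDbLogger {
        // Single background thread so db writes never block the search path
        private final ExecutorService writer = Executors.newSingleThreadExecutor();

        public void logQuery(final String q, final String fq) {
            writer.submit(new Runnable() {
                public void run() {
                    saveToDb(q, fq); // hypothetical DAO call; replace with your JDBC code
                }
            });
        }

        private void saveToDb(String q, String fq) {
            // ... INSERT INTO search_log (q, fq) VALUES (?, ?) ...
        }
    }

Option 2 follows the same pattern, with an HTTP client call to the remote web app inside run() instead of the DAO call.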
Hi,
Thank you for the reply, Davide.
By writing to the db, do you mean inserting the search queries into the db? I
was thinking that this might affect search performance.
Yes, you are right, getting stats for a particular keyword is tough. It would
suffice if I can get the q param and fq param values (when we sear
Can anyone point to a document that describes the meanings behind the
different solrcloud graph shard colors?
I have several that are orange now, with two as the active shard, and
our total index count is less than it was a day before. The logs
aren't indicating anything in particular.
Thanks,
I _think_ I have this right...
ReplicationFactor is the maximum number of extra replicas per shard.
If you don't
specify this, then as you bring up more and more nodes, the new nodes get
assigned on a round-robin basis to shards. This allows you to have heterogeneous
collections and not have _all_
I'm a little confused about what you actually expect to see. I mean, it
sounds like all you are doing is numbering N query results as positions
1..N. But that's too obvious to be useful. Maybe you could provide an
example.
Or are you talking about query refinement, where you do one query and r
Okay. A huge rows value is the no. 1 way to kill Lucene; it's absolutely not
workable. You need to rethink the logic of your component. Check Solr's
FieldCollapsing code; IIRC it makes a second search to achieve a similar goal.
Also check the PostFilter and DelegatingCollector classes; their approach can
also be han
I have only pencil scratches yet and can't share them. I can say that I've
found it quite close to the approach described at
http://www.ulakha.com/publications.html - it's called "Concept Search" there,
but as far as I understand I have a rather different implementation approach.
On Fri, Oct 5, 2012 at 2:31
On Fri, Oct 5, 2012 at 4:33 AM, Mikhail Khludnev
wrote:
> what's the value of rows param
> http://wiki.apache.org/solr/CommonQueryParameters#rows ?
Very interesting question - so, for historic reasons lost to me, we
pass in a huge (1000?) number for rows and this hits our custom
component, wh
Hi,
I'm generating a SOLR index from a DB with the below dataConfig section in my
data-config.xml file, and it's working fine.
<dataSource ... url="jdbc:sqlserver://127.0.0.1;databaseName=emp" user="user"
password="user"/>
I want to ENCRYPT
Hi all,
I want to have a field for each document which will simply store the doc's
position (rank, not its score) for each query, so for each different query
it will show the doc's new rank within the whole search result...
I have been digging through the source code (4.0 Beta) but so far couldn't find
what's the value of rows param
http://wiki.apache.org/solr/CommonQueryParameters#rows ?
On Fri, Oct 5, 2012 at 6:56 AM, Aaron Daubman wrote:
> Greetings,
>
> I've been seeing this call chain come up fairly frequently when
> debugging longer-QTime queries under Solr 3.6.1 but have not been able
>
absolutely, that's what I didn't get in your initial question. Okay, it
seems you are talking about a typical eCommerce search problem. I will speak
about it at http://www.apachecon.eu/schedule/presentation/18/ - see you.
On Fri, Oct 5, 2012 at 9:47 AM, rhl4tr wrote:
> But user query can contain any