Re: Commit (hard) at shutdown?

2016-05-23 Thread Per Steffensen
Sorry, I did not see the responses here because I found out myself. It definitely seems like a hard commit is performed when shutting down gracefully. The info I got from production was wrong. It is not necessarily obvious that you will lose data on "kill -9". The tlog ought to save you, but it

Commit (hard) at shutdown?

2016-05-18 Thread Per Steffensen
opposite) that Solrs, upon graceful shutdown, OUGHT TO do a (hard) commit, leaving tlogs empty (= nothing to replay when starting again)? Regards, Per Steffensen

Re: Securing solr index

2015-04-15 Thread Per Steffensen
That said, it might be nice with a wiki-page (or something) explaining how it can be done, including maybe concrete cases about exactly how it has been done on different installations around the world using Solr On 14/04/15 14:03, Per Steffensen wrote: Hi I might misunderstand you, but if

Re: Securing solr index

2015-04-14 Thread Per Steffensen
two admins know a part each, so that they have to both agree in order to operate as root. Be creative yourself. Regards, Per Steffensen On 13/04/15 12:13, Suresh Vanasekaran wrote: Hi, We are having the solr index maintained in a central server and multiple users might be able to access the

Re: SOLR 5.0.0 and Tomcat version ?

2015-03-27 Thread Per Steffensen
On 23/03/15 20:05, Erick Erickson wrote: you don't run a SQL engine from a servlet container, why should you run Solr that way? https://twitter.com/steff1193/status/580491034175660032 https://issues.apache.org/jira/browse/SOLR-7236?focusedCommentId=14383624&page=com.atlassian.jira.plugin.system.

Re: Solr replicas going in recovering state during heavy indexing

2015-03-27 Thread Per Steffensen
I think it is very likely that it is due to Solr-nodes losing ZK-connections (after timeout). We have experienced that a lot. One thing you want to do is to make sure your ZK-servers do not run on the same machines as your Solr-nodes - that helped us a lot. On 24/03/15 13:57, Gopal Jee wrot

Re: rough maximum cores (shards) per machine?

2015-03-25 Thread Per Steffensen
On 25/03/15 15:03, Ian Rose wrote: Per - Wow, 1 trillion documents stored is pretty impressive. One clarification: when you say that you have 2 replica per collection on each machine, what exactly does that mean? Do you mean that each collection is sharded into 50 shards, divided evenly over al

Re: rough maximum cores (shards) per machine?

2015-03-25 Thread Per Steffensen
in the high-end of #replica and #docs, I guess Regards, Per Steffensen On 24/03/15 14:02, Ian Rose wrote: Hi all - I'm sure this topic has been covered before but I was unable to find any clear references online or in the mailing list. Are there any rules of thumb for how many cores (aka

Re: How to configure Solr to use ZooKeeper ACLs in order to protect it's content

2015-03-20 Thread Per Steffensen
because they are not used anyway) * java $SOLR_ZK_CREDS_AND_ACLS -Dsolr.solr.home=$SOLR_HOME/server/solr -Dsolr.data.dir=$SOLR_HOME/server/solr/gettingstarted_shard1_replica1 -Dsolr.log=$SOLR_HOME/server/solr/logs -DzkHost=localhost:2181/solr -Djetty.port=8983 -jar start.jar Voila R

Re: Bloom filter

2014-08-04 Thread Per Steffensen
t does not already exist when we do this duplicate check (using the unique-id feature), but it just takes relatively long time to verify it, because you have to visit the index. We can get a quick "document with this id does not exist" using bloom-filter on id. Regards, Per Steffensen On
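
The quick "document with this id does not exist" check described above can be sketched as follows. This is a minimal illustration in plain Python, not Solr or Lucene code; the class name, the bit-array size and the salted-SHA-256 hashing are all assumptions chosen for brevity.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter over string ids (illustrative, not Solr/Lucene code)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, doc_id):
        # Derive num_hashes bit positions by salting a single hash function.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{doc_id}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, doc_id):
        for pos in self._positions(doc_id):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, doc_id):
        # False means "definitely not present"; True means "maybe, check the index".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(doc_id)
        )


def id_exists(doc_id, bloom, index_lookup):
    """Pay for the real index lookup only when the filter cannot rule the id out."""
    if not bloom.might_contain(doc_id):
        return False
    return index_lookup(doc_id)
```

A filter like this never gives a false negative, so a "no" can be trusted without visiting the index. A "maybe" still needs the real lookup, and how often that happens depends on the number of bits and hash functions relative to the number of ids stored.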

Re: Bloom filter

2014-07-30 Thread Per Steffensen
he bloom filter depends on how frequently you can live with false positives (where you have to actually look it up in the real index). Regards, Per Steffensen On 30/07/14 10:05, Shalin Shekhar Mangar wrote: Hi Per, There's LUCENE-5675 which has added a new postings format for IDs. Trying it

Re: Bloom filter

2014-07-30 Thread Per Steffensen
nsactionlog and the actual index (UpdateLog). We would like to use Bloom Filter to quickly tell that a document with a particular id is NOT present. Regards, Jim Regards, Per Steffensen

Re: Bloom filter

2014-07-28 Thread Per Steffensen
still very much appreciated. Regards, Per Steffensen On 28/07/14 15:42, Lukas Drbal wrote: Hi Per, link to jira - https://issues.apache.org/jira/browse/SOLR-1375 Unresolved ;-) L. On Mon, Jul 28, 2014 at 1:17 PM, Per Steffensen wrote: Hi Where can I find documentation on how to use Bloom

Bloom filter

2014-07-28 Thread Per Steffensen
Hi Where can I find documentation on how to use Bloom filters in Solr (4.4). http://wiki.apache.org/solr/BloomIndexComponent seems to be outdated - there is no BloomIndexComponent included in 4.4 code. Regards, Per Steffensen

Re: How to migrate content of a collection to a new collection

2014-07-24 Thread Per Steffensen
ust out of curiosity * Will I have the same OOM problem using the CURSOR-feature in later Solrs? * Will the "poor man's" cursor approach still be efficient if my uniqueKey was DocValued, knowing that all values for uniqueKey (the DocValue file) cannot fit in memory (OS file cache)? Reg

Re: How to migrate content of a collection to a new collection

2014-07-24 Thread Per Steffensen
On 23/07/14 17:13, Erick Erickson wrote: Per: Given that you said that the field redefinition also includes routing info Exactly. It would probably be much faster to make sure that the new collection has the same number of shards on each Solr-machine and that the routing-ranges are identi

How to migrate content of a collection to a new collection

2014-07-22 Thread Per Steffensen
easily achieved much faster than the 1-1 on collection-level. Any input is very much appreciated! Thanks Regards, Per Steffensen

Re: How does query on AND work

2014-05-27 Thread Per Steffensen
Well, the only "search" I did was ask this question on this mailing-list :-) On 26/05/14 17:05, Alexandre Rafalovitch wrote: Did not follow the whole story but " post-query-value-filter" does exist in Solr. Have you tried searching for pretty much that expression. and maybe something about cos

Re: How does query on AND work

2014-05-27 Thread Per Steffensen
lrlucene.blogspot.dk/2014/05/performance-of-and-queries-with-uneven.html. Hope you do not mind that I reference you and the link you pointed out. Thanks a lot! Regards, Per Steffensen On 23/05/14 18:13, Yonik Seeley wrote: On Fri, May 23, 2014 at 11:37 AM, Toke Eskildsen wrote:

Re: How does query on AND work

2014-05-26 Thread Per Steffensen
ing to do a facet search etc. Well, here is the full story: http://solrlucene.blogspot.dk/2014/05/performance-of-and-queries-with-uneven.html Regards, Per Steffensen On 23/05/14 17:37, Toke Eskildsen wrote: Per Steffensen [st...@designware.dk] wrote: * It IS more efficient to just use the index for the "n

Re: How does query on AND work

2014-05-23 Thread Per Steffensen
00-1000 docs and "timestamp_dlng_doc_ind_sto" hit about 3-4 billion. Regards, Per Steffensen On 19/05/14 13:33, Per Steffensen wrote: Hi Let's say I have a Solr collection (running across several servers) containing 5 billion documents. A.o. each document has

How does query on AND work

2014-05-19 Thread Per Steffensen
oc-value) to filter out the ones among the 500-1000 that do not match the timestamp-part of the query. But what does Solr/Lucene actually do? Is it Solr- or Lucene-code that makes the decision on what to do? Can you somehow "hint" the search-engine that you want one or the other method used? Solr 4.4 (and corresponding Lucene), BTW, if that makes a difference Regards, Per Steffensen
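
To make the trade-off in the question concrete, here is a simplified model of intersecting two sorted doc-id lists while driving the loop from the rare clause. It is a sketch in plain Python, not Lucene's actual code, and the thread excerpt does not confirm which strategy Solr 4.4 picks; the function names are made up for illustration.

```python
from bisect import bisect_left


def advance(postings, start, target):
    """Return the index of the first doc id >= target, searching from start."""
    return bisect_left(postings, target, start)


def intersect(rare, common):
    """Intersect two sorted doc-id lists, driving the loop from the rare one."""
    hits = []
    pos = 0
    for doc_id in rare:
        pos = advance(common, pos, doc_id)
        if pos == len(common):
            break  # the common list is exhausted
        if common[pos] == doc_id:
            hits.append(doc_id)
    return hits
```

With 500-1000 ids in the rare list and billions in the common one, the loop body runs only as many times as there are rare ids, and each step is a skip within the large list rather than a full scan of it.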

Export big extract from Solr to [My]SQL

2014-05-02 Thread Per Steffensen
Solr query, but the number of documents fulfilling it will (potentially) be huge. Regards, Per Steffensen

Re: SOLR cloud disaster recovery

2014-02-28 Thread Per Steffensen
, but I do not know, because we do not use replication. I might be able to find something for you. Which version are you using - I have some scripts that work on 4.0 and some other scripts that work for 4.4 (and maybe later). Regards, Per Steffensen On 28/02/14 16:17, Jan Van Besien wrote: Hi

Re: Fault Tolerant Technique of Solr Cloud

2014-02-24 Thread Per Steffensen
ome other language (if the reason you do not want to use CloudSolrServer, is that your client is not java). Else you need to do other clever stuff, like e.g. what Shalin suggests. Regards, Per Steffensen

Re: Fault Tolerant Technique of Solr Cloud

2014-02-19 Thread Per Steffensen
On 19/02/14 07:57, Vineet Mishra wrote: Thanks for all your response but my doubt is which *Server:Port* should the query be made as we don't know the crashed server or which server might crash in the future(as any server can go down). That is what CloudSolrServer will deal with for you. It knows

Re: Fault Tolerant Technique of Solr Cloud

2014-02-18 Thread Per Steffensen
requests. CloudSolrServer uses LBHttpSolrServer behind the scenes. If you use CloudSolrServer as a client everything should be smooth and transparent with respect to querying when servers are down. CloudSolrServer will find out where to (and not to) route your requests. Regards, Per Steffensen
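
For a non-Java client, the "route around servers that are down" behaviour can be approximated with a small round-robin wrapper. The sketch below is an assumption-laden illustration, not how CloudSolrServer or LBHttpSolrServer are implemented: it does not consult ZooKeeper, the class name is invented, and the `send` callable stands in for whatever HTTP library the client uses.

```python
import itertools


class FailoverClient:
    """Round-robin over servers, skipping ones that fail (illustrative only)."""

    def __init__(self, base_urls, send):
        # send(base_url, path) is a caller-supplied function that performs the
        # HTTP request and raises ConnectionError when the server is unreachable.
        self.base_urls = list(base_urls)
        self.send = send
        self._start = itertools.cycle(range(len(self.base_urls)))

    def request(self, path):
        first = next(self._start)
        errors = []
        for offset in range(len(self.base_urls)):
            url = self.base_urls[(first + offset) % len(self.base_urls)]
            try:
                return self.send(url, path)
            except ConnectionError as exc:
                errors.append((url, exc))  # remember the failure, try the next one
        raise ConnectionError(f"all servers failed: {errors}")
```

Each call starts at the next server in the list, so load is spread out, and a request fails only when every server has been tried.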

Re: ant eclipse hangs - branch_4x

2014-01-30 Thread Per Steffensen
org/maven2/ in our Artifactory. Well never mind - it works for me now. Thanks for the help! Regards, Per Steffensen On 1/30/14 1:11 PM, Steve Rowe wrote: Hi Per, You may be seeing the stale-Ivy-lock problem (see IVY-1388). LUCENE-4636 upgraded the bootstrapped Ivy to 2.3.0 to reduce the lik

ant eclipse hangs - branch_4x

2014-01-30 Thread Per Steffensen
thing that happened today. Any idea about what might be wrong? A solution? Help to debug? Regards Per Steffensen --- console when running "ant eclipse" - ... resolve: [echo] Building solr-example-DIH... ivy-availability-check: [echo] Building solr-

Re: Solr in non-persistent mode

2014-01-25 Thread Per Steffensen
completed in almost the same time as before, so it is not a big issue for us. Regards, Per Steffensen On 1/23/14 6:09 PM, Mark Miller wrote: Yeah, I think we removed support in the new solr.xml format. It should still work with the old format. If you have a good use case for it, I don’t know

Solr in non-persistent mode

2014-01-23 Thread Per Steffensen
mean that I have to configure it somewhere else? Thanks! Regards, Per Steffensen

Upgrading from SolrCloud 4.x to 4.y - as if you had used 4.y all along

2014-01-22 Thread Per Steffensen
If you are upgrading from SolrCloud 4.x to a later version 4.y, and basically want your end-system to seem as if it had been running 4.y (no legacy mode or anything) all along, you might find some inspiration here http://solrlucene.blogspot.dk/2014/01/upgrading-from-solrcloud-4x-to-4y-as-if.htm

Re: can't overwrite and can't delete by id

2013-11-23 Thread Per Steffensen
... Assume you are running "cloud"-mode and that the shards belong to the same collection? Any custom routing? Regards, Per Steffensen On 11/22/13 8:32 PM, Mingfeng Yang wrote: BTW: it's a 4 shards solorcloud cluster using zookeeper 3.3.5 On Fri, Nov 22, 2013 at 11:07 AM, Ming

Re: Storing/indexing speed drops quickly

2013-09-23 Thread Per Steffensen
IndexWriter.updateDocument but not with IndexWriter.addDocument? Regards, Per Steffensen On 9/12/13 10:14 AM, Per Steffensen wrote: Seems like the attachments didn't make it through to this mailing list https://dl.dropboxusercontent.com/u/25718039/doccount.png https://dl.dropboxusercontent.com/u/25718039/iowait.png

Re: Storing/indexing speed drops quickly

2013-09-13 Thread Per Steffensen
On 9/12/13 4:26 PM, Shawn Heisey wrote: On 9/12/2013 2:14 AM, Per Steffensen wrote: Starting from an empty collection. Things are fine wrt storing/indexing speed for the first two-three hours (100M docs per hour), then speed goes down dramatically, to an, for us, unacceptable level (max 10M per

Re: No or limited use of FieldCache

2013-09-12 Thread Per Steffensen
On 9/12/13 3:28 PM, Toke Eskildsen wrote: On Thu, 2013-09-12 at 14:48 +0200, Per Steffensen wrote: Actually some months back I made a PoC of a FieldCache that could expand beyond the heap. Basically imagine a FieldCache with room for "unlimited" data-arrays, that just behind the scen

Re: No or limited use of FieldCache

2013-09-12 Thread Per Steffensen
ing out of swap space"-problems. Regards, Per Steffensen On 9/12/13 12:48 PM, Erick Erickson wrote: Per: One thing I'll be curious about. From my reading of DocValues, it uses little or no heap. But it _will_ use memory from the OS if I followed Simon's slides correctly. So I

Re: Storing/indexing speed drops quickly

2013-09-12 Thread Per Steffensen
Seems like the attachments didn't make it through to this mailing list https://dl.dropboxusercontent.com/u/25718039/doccount.png https://dl.dropboxusercontent.com/u/25718039/iowait.png On 9/12/13 8:25 AM, Per Steffensen wrote: Hi SolrCloud 4.0: 6 machines, quadcore, 8GB ram, 1T disk, one Solr

Re: Storing/indexing speed drops quickly

2013-09-12 Thread Per Steffensen
Maybe the fact that we are never ever going to delete or update documents, can be used for something. If we delete we will delete entire collections. Regards, Per Steffensen On 9/12/13 8:25 AM, Per Steffensen wrote: Hi SolrCloud 4.0: 6 machines, quadcore, 8GB ram, 1T disk, one Solr-node on

Storing/indexing speed drops quickly

2013-09-11 Thread Per Steffensen
level, while still making sure that searches will perform fairly well when data-amounts become big? (guess without merging you will end up with lots and lots of "small" files, and I guess this is not good for search response-time) Regards, Per Steffensen

Re: No or limited use of FieldCache

2013-09-11 Thread Per Steffensen
Thanks, guys. Now I know a little more about DocValues and realize that they will do the job wrt FieldCache. Regards, Per Steffensen On 9/12/13 3:11 AM, Otis Gospodnetic wrote: Per, check zee Wiki, there is a page describing docvalues. We used them successfully in a solr for analytics

Re: No or limited use of FieldCache

2013-09-11 Thread Per Steffensen
point to documentation where I will be able to read that I am wrong. Thanks! Regards, Per Steffensen On 9/11/13 1:38 PM, Erick Erickson wrote: I don't know any more than Michael, but I'd _love_ some reports from the field. There are some restriction on DocValues though, I believe one

No or limited use of FieldCache

2013-09-11 Thread Per Steffensen
of disabling the FieldCache (taking the performance penalty of course) or make it behave in a nicer way where it only uses up to e.g. 80% of the memory available to the JVM? Or other suggestions? Regards, Per Steffensen

Complex group request

2013-08-30 Thread Per Steffensen
der to make it possible? You do not have to hand me the solution, but a few comments on how easy/hard it would be, and ideas on how to attack the challenge would be nice. Thanks! Regards, Per Steffensen

Group/distinct

2013-08-28 Thread Per Steffensen
certain period of time, and also, for each distinct "a", have the limited set of distinct "b"-values returned? I guess this will beg grouping/faceting on multiple fields, but can you do that? Other suggestions on how to achieve this? Regards, Per Steffensen

Re: In-memory collections?

2013-08-07 Thread Per Steffensen
On 8/7/13 9:04 AM, Shawn Heisey wrote: On 8/7/2013 12:13 AM, Per Steffensen wrote: Is there a way I can configure Solrs so that it handles its shards completely in memory? If yes, how? No writing to disk - neither transactionlog nor lucene indices. Of course I accept that data is lost if the

In-memory collections?

2013-08-06 Thread Per Steffensen
Hi Is there a way I can configure Solrs so that it handles its shards completely in memory? If yes, how? No writing to disk - neither transactionlog nor lucene indices. Of course I accept that data is lost if Solr crashes or is shut down. Regards, Per Steffensen

Re: Solr Collection's Size

2013-04-10 Thread Per Steffensen
On 4/10/13 12:17 PM, Per Steffensen wrote: "number of documents found" can be found in a field called "numFound" in the response. If you do use SolrJ you will likely have a QueryResponse qr and can just do a qr.setNumFound(). qr.getResults().getNumFound() :-) If you

Re: Solr Collection's Size

2013-04-10 Thread Per Steffensen
... data.response.numFound ... } ) Go figure how to extract it in javascript without jQuery Regards, Per Steffensen On 4/5/13 3:20 PM, Alexandre Rafalovitch wrote: I'd add rows=0, just to avoid the actual records serialization if size is all that matters. Rega
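
The two hints above (read `response.numFound`, and ask for `rows=0` so no documents are serialized) can be combined in a few lines. This is a sketch only: the helper names and the sample body are made up for illustration, and the sample is hand-written to resemble a JSON select response rather than copied from a real server.

```python
import json
from urllib.parse import urlencode


def count_url(base_url, query="*:*"):
    """Build a select URL that asks for the hit count but no documents."""
    params = {"q": query, "rows": 0, "wt": "json"}
    return f"{base_url}/select?{urlencode(params)}"


def num_found(response_body):
    """Extract response.numFound from a JSON select response."""
    return json.loads(response_body)["response"]["numFound"]


# Hand-written sample shaped like a JSON select response (not real output).
sample = (
    '{"responseHeader": {"status": 0}, '
    '"response": {"numFound": 1234, "start": 0, "docs": []}}'
)
```

Fetching `count_url(...)` with any HTTP client and passing the body to `num_found` gives the collection size without transferring a single document.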

Re: AW: AW: java.lang.OutOfMemoryError: Map failed

2013-04-02 Thread Per Steffensen
is a "real" OOM indicating no more space on java heap, but is more an exception saying that OS has no more memory (in some interpretation of that). Regards, Per Steffensen On 4/2/13 11:32 AM, Arkadi Colson wrote: It is running as root: root@solr01-dcg:~# ps aux | grep tom root

Re: Sort-field for ALL docs in FieldCache for sort queries -> OOM on lots of docs

2013-03-22 Thread Per Steffensen
On 3/21/13 10:50 PM, Shawn Heisey wrote: On 3/21/2013 4:05 AM, Per Steffensen wrote: Can anyone else elaborate? How to "activate" it? How to make sure, for sorting, that sort-field-value for all docs are not read into memory for sorting - leading to OOM when you have a lot of docs

Re: Sort-field for ALL docs in FieldCache for sort queries -> OOM on lots of docs

2013-03-21 Thread Per Steffensen
On 3/21/13 10:52 AM, Toke Eskildsen wrote: On Thu, 2013-03-21 at 09:57 +0100, Per Steffensen wrote: Thanks Toke! Can you please elaborate a little bit? How to use it? What it is supposed to do for you? Sorry, no, I only know about it on the abstract level. The release notes for Solr 4.2 says

Re: Sort-field for ALL docs in FieldCache for sort queries -> OOM on lots of docs

2013-03-21 Thread Per Steffensen
On 3/21/13 9:48 AM, Toke Eskildsen wrote: On Thu, 2013-03-21 at 09:13 +0100, Per Steffensen wrote: We have a lot of docs in Solr. Each particular Solr-node handles a lot of docs distributed among several replica. When you issue a sort query, it seems to me that, the value of the sort-field of

Sort-field for ALL docs in FieldCache for sort queries -> OOM on lots of docs

2013-03-21 Thread Per Steffensen
he penalty of more disk-IO as soon as the entire thing does not fit in memory, but I would rather accept that than accept OOM's. Regards, Per Steffensen

Re: Known memory leaks in 4.0?

2013-03-15 Thread Per Steffensen
down if you stop all searching. We have just decided to dive into it for a few days in order to understand what actually happens. Regards Bernd Regards, Per Steffensen

Known memory leaks in 4.0?

2013-03-15 Thread Per Steffensen
Hi We have a problem that seems to be due to memory leaks during search on Solr 4.0. Haven't dived into it yet, so I am certainly not sure, but just wanted to ask upfront, if 4.0 contains any known memory leaks? And if they have been fixed? Regards, Per Steffensen

Re: How to migrate SolrCloud shards to different servers?

2013-01-26 Thread Per Steffensen
! Regards, Per Steffensen On 1/26/13 6:56 AM, Mingfeng Yang wrote: Hi Mark, When I did testing with SolrCloud, I found the following. 1. I started 4 shards on the same host on port 8983, 8973, 8963, and 8953. 2. Index some data. 3. Shutdown all 4 shards. 4. Started 4 shards again, all pointing to

Re: Submit schema definition using curl via SOLR

2013-01-25 Thread Per Steffensen
On 1/24/13 11:22 PM, Fadi Mohsen wrote: Thanks Per, would the first approach involve restarting Solr? Of course ZK needs to run in order to load the config into ZK. Solr nodes do not need to run. If they do I couldn't imagine that they need to be restarted in order to take advantage of new conf

Re: Starting instances with multiple collections

2013-01-24 Thread Per Steffensen
startup is only for "playing". You ought to load configs into ZK as a separate operation from starting Solrs (and creating collections for that matter). Also see recent mail-list dialog "Submit schema definition using curl via SOLR" Regards, Per Steffensen On 1/23/13 11:12

Re: Submit schema definition using curl via SOLR

2013-01-24 Thread Per Steffensen
On 1/24/13 4:51 PM, Per Steffensen wrote: 2) or You can have a Solr node (server) load a "Solr config" into ZK during startup by adding collection.configName and bootstrap_confdir VM params - something like this java -DzkHost= -Dcollection.configName= -Dbootstrap_confdir= -jar

Re: Submit schema definition using curl via SOLR

2013-01-24 Thread Per Steffensen
va -DzkHost= -Dcollection.configName=edr_sms_conf -Dbootstrap_confdir= -jar start.jar I prefer 1) for several reasons. Regards, Per Steffensen On 1/24/13 4:02 PM, Fadi Mohsen wrote: Hi, We would like to use Solr to index statistics from any Java module in our production environment. Applica

Re: zookeeper config

2013-01-23 Thread Per Steffensen
This is supported. You just need to adjust your ZK connection-string: ":/solr,:/solr,...,:/solr" Regards, Per Steffensen On 1/24/13 7:57 AM, J Mohamed Zahoor wrote: Hi I am using Solr 4.0. I see the Solr data in zookeeper is placed on the root znode itself. This becomes a p
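
As a hedged illustration of building such a connection string: the ZooKeeper client documentation gives the form "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002/app/a", where the chroot path is a single suffix after the last host:port pair. The helper below follows that documented form; the function name and host names are invented, and the exact behaviour is worth checking against the ZooKeeper version in use.

```python
def zk_connect_string(hosts, chroot=""):
    """Join host:port pairs and append the chroot path once at the end."""
    if chroot and not chroot.startswith("/"):
        raise ValueError("chroot must start with '/'")
    return ",".join(hosts) + chroot
```

The resulting string would be what goes into `-DzkHost=...` when starting a Solr node.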

Re: Way to lock solr for incoming writes

2013-01-16 Thread Per Steffensen
Regards, Per Steffensen On 1/16/13 4:02 PM, mizayah wrote: Is there a way to lock solr for writes? I don't want to use solr integrated backup because I'm using a Ceph cluster. What I need is to have consistent data for few seconds to make backup. -- View this message in context: htt

Re: Forwarding authentication credentials in internal node-to-node requests

2013-01-12 Thread Per Steffensen
I will figure out. Essence of question was if it was there out-of-the-box. Thanks! Regards, Per Steffensen On 1/11/13 5:38 PM, Markus Jelsma wrote: Hmm, you need to set up the HttpClient in HttpShardHandlerFactory but you cannot access the HttpServletRequest from there, it is only available

Re: Forwarding authentication credentials in internal node-to-node requests

2013-01-11 Thread Per Steffensen
b-requests that he is not authorized to do. Forward of credentials is a must. So what you are saying is that I should expect to have to do some modifications to Solr in order to achieve what I want? Regards, Per Steffensen On 1/11/13 2:11 PM, Markus Jelsma wrote: Hi, If your credentials are fix

Forwarding authentication credentials in internal node-to-node requests

2013-01-11 Thread Per Steffensen
ding the credentials. Does this just work out of the box, or ... ? Regards, Per Steffensen

Re: CoreAdmin STATUS performance

2013-01-10 Thread Per Steffensen
we will always have at least 3 months of historic data, and late in a month close to 4 months of history. It does not matter that we have a little too much history, as long as we do not go below the lower limit on length of historic data. We also use the new Collection API for deletion. Regards, Per

Re: CoreAdmin STATUS performance

2013-01-10 Thread Per Steffensen
ulated on server-side from the timestamp-interval in the search-query. We handle this in a Solr SearchComponent which we place "early" in the chain of SearchComponents. Maybe you can get some inspiration by this approach, if it is also relevant for you. Regards, Per Steffensen
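
Deriving the set of collections to search from a timestamp interval might look roughly like this. The sketch assumes one collection per calendar month and a made-up naming scheme (`data_YYYY_MM`); neither detail is stated in the message, and the function is plain Python rather than a Solr SearchComponent.

```python
from datetime import date


def collections_for_interval(start, end, prefix="data"):
    """List monthly collection names covering [start, end], oldest first."""
    if end < start:
        raise ValueError("end must not be before start")
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}_{year:04d}_{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names
```

A query for 20 November 2012 through 10 January 2013 would then touch three collections instead of every collection in the cluster.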

Re: CoreAdmin STATUS performance

2013-01-10 Thread Per Steffensen
-memory ClusterState with changes. Regards, Per Steffensen On 1/9/13 4:38 PM, Shahar Davidson wrote: Hi All, I have a client app that uses SolrJ and which requires to collect the names (and just the names) of all loaded cores. I have about 380 Solr Cores on a single Solr server (net indices

Re: Solr 4 exceptions on trying to create a collection

2013-01-08 Thread Per Steffensen
JIRA about the fix for 4.1: https://issues.apache.org/jira/browse/SOLR-4140 On 1/8/13 4:01 PM, Jay Parashar wrote: Thanks Mark...I will use it with 4.1. For now, I used httpclient to call the Collections api directly (do a Get on http://127.0.0.1:8983/solr/admin/collections?action=CREATE etc). T
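
Calling the Collections API directly, as described in the quoted message, comes down to a GET on a URL with a few query parameters. The sketch below only builds that URL; the helper name and the collection name in the usage note are made up, and the set of parameters accepted varies between 4.x releases, so check the documentation for the version in use.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards,
                          replication_factor=1, max_shards_per_node=1):
    """Build the GET URL for a Collections API CREATE call."""
    params = [
        ("action", "CREATE"),
        ("name", name),
        ("numShards", num_shards),
        ("replicationFactor", replication_factor),
        ("maxShardsPerNode", max_shards_per_node),
    ]
    return f"{base_url}/admin/collections?{urlencode(params)}"
```

For example, `create_collection_url("http://127.0.0.1:8983/solr", "logs", 2)` yields a URL that any HTTP client can fetch.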

Re: How to size a SOLR Cloud

2013-01-07 Thread Per Steffensen
successfully recover when recover situations occur, and we see like 4-times indexing times compared to non-redundancy (even though a max of 2-times should be expected). Regards, Per Steffensen On 1/7/13 6:14 PM, f.fourna...@gibmedia.fr wrote: Hello, I'm new in SOLR and I've a collecti

Re: SolrCloud and Join Queries

2013-01-06 Thread Per Steffensen
collection per customer, with one shard and many replicas, A query will be handled by one shard (or replica) on one node only and scalability here is really about load balancing queries between the replicas only. i.e no distributed search. is this correct? Hassan On 05/01/13 15:47, Per Steffensen

Re: SolrCloud and Join Queries

2013-01-05 Thread Per Steffensen
n 4.0.0, but will be in 4.1) Regards, Per Steffensen On 1/5/13 11:55 AM, Hassan wrote: Thanks Per and Otis, It is much clearer now but I have a question about adding new solr nodes and collections. I have a dedicated zookeeper instance. Lets say I have uploaded my configuration to zookeep

Re: Terminology question: Core vs. Collection vs...

2013-01-04 Thread Per Steffensen
I believe Alexandre Rafalovitch got his answer already :-) To the level a clean answer exists at the moment. Regards, Per Steffensen On 1/4/13 2:54 PM, Jack Krupansky wrote: Replication makes perfect sense even if our explanations so far do not. A shard is an abstraction of a subset of the dat

Re: SolrCloud and Join Queries

2013-01-04 Thread Per Steffensen
tomer = (as long as we do not consider replication) one lucene index per customer = one data-disk-folder per customer. You should be able to do join queries inside the specific customers shard. Regards, Per Steffensen

Re: Terminology question: Core vs. Collection vs...

2013-01-03 Thread Per Steffensen
On 1/3/13 5:58 PM, Walter Underwood wrote: A "factor" is multiplied, so multiplying the leader by a replicationFactor of 1 means you have exactly one copy of that shard. I think that recycling the term "replication" within Solr was confusing, but it is a bit late to change that. wunder Yes, t

Re: Solr Collection API doesn't seem to be working

2013-01-03 Thread Per Steffensen
On 1/3/13 5:26 PM, Yonik Seeley wrote: I agree - it's pointless to have two replicas of the same shard on a single node. But I'm talking about having replicationFactor as a target, so when you start up *new* nodes they will become a replica for any shard where the number of replicas is currentl

Re: Terminology question: Core vs. Collection vs...

2013-01-03 Thread Per Steffensen
On 1/3/13 4:55 PM, Mark Miller wrote: Trying to forge our own path here seems more confusing than helpful IMO. We have enough issues with terminology right now - where we can go with the industry standard, I think we should. - Mark Fair enough. I don't think our biggest problem is whether we d

Re: Terminology question: Core vs. Collection vs...

2013-01-03 Thread Per Steffensen
On 1/3/13 4:33 PM, Mark Miller wrote: This has pretty much become the standard across other distributed systems and in the literat…err…books. Hmmm I'm not sure you are right about that. Maybe more than one distributed system calls them "Replica", but there is also a lot that don't. But if you

Re: Solr Collection API doesn't seem to be working

2013-01-03 Thread Per Steffensen
Ok, sorry. Easy to misunderstand, though. On 1/3/13 3:58 PM, Mark Miller wrote: MAX_INT is just a place holder for a high value given the context of this guy wanting to add replicas for as many machines as he adds down the line. You are taking it too literally. - Mark

Re: Terminology question: Core vs. Collection vs...

2013-01-03 Thread Per Steffensen
n with replicationFactor=1 - WTF!?!?). If we want to insist that you specify the total number of cores at least use "replicaPerShard" instead of "replicationFactor", or even better rename "Replica" to "Shard-instance" and use "instancesPerShard" in

Re: Terminology question: Core vs. Collection vs...

2013-01-03 Thread Per Steffensen
iately obvious what this is as long as it is called "Replica". A "Replica" is basically a Solr Cloud managed Core and behind every Replica/Core lives a physical Lucene index. So Replica=Core) contains/maintains Lucene index behind the scenes. The term "Replica" also

Re: Solr Collection API doesn't seem to be working

2013-01-03 Thread Per Steffensen
On 1/3/13 2:50 AM, Mark Miller wrote: Unfortunately, for 4.0, the collections API was pretty bare bones. You don't actually get back responses currently - you just pass off the create command to zk for the Overseer to pick up and execute. So you actually have to check the logs of the Overseer

Re: Solr Collection API doesn't seem to be working

2013-01-03 Thread Per Steffensen
On 1/3/13 3:05 AM, davers wrote: This is what I get from the leader overseer log: 2013-01-02 18:04:24,663 - INFO [ProcessThread:-1:PrepRequestProcessor@419] - Got user-level KeeperException when processing sessionid:0x23bfe1d4c280001 type:create cxid:0x58 zxid:0xfffe txntype:unknown

Re: Solr Collection API doesn't seem to be working

2013-01-03 Thread Per Steffensen
There are defaults for both replicationFactor and maxShardsPerNode, so none of them HAS to be provided - default is 1 in both cases. int repFactor = msgStrToInt(message, REPLICATION_FACTOR, 1); int maxShardsPerNode = msgStrToInt(message, MAX_SHARDS_PER_NODE, 1); Remember that replica
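
The default handling quoted above can be mirrored in a few lines to see what a CREATE message resolves to. This is a Python stand-in for illustration, not the Solr source: the function names are invented, and the string keys are assumed to match the request parameter names.

```python
def msg_str_to_int(message, key, default):
    """Read an integer property from a message dict, falling back to default."""
    value = message.get(key)
    return default if value is None else int(value)


def cores_needed(message):
    """Total cores for a collection: one per instance of each shard."""
    num_shards = int(message["numShards"])
    rep_factor = msg_str_to_int(message, "replicationFactor", 1)
    return num_shards * rep_factor
```

Under this reading, a request with `numShards=3` and no `replicationFactor` asks for three cores in total, and adding `replicationFactor=2` doubles that to six.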

Re: Max number of core in Solr multi-core

2013-01-02 Thread Per Steffensen
Furthermore, if you plan to index "a lot" of data per application, and you are using Solr 4.0.0+ (including Solr Cloud), you probably want to consider creating a collection per application instead of a core per application. On 1/2/13 2:38 PM, Erick Erickson wrote: This is a common approach to

Re: Solr 4.0 NRT Search

2013-01-02 Thread Per Steffensen
On 1/1/13 2:07 PM, hupadhyay wrote: I was reading a solr wiki located at http://wiki.apache.org/solr/NearRealtimeSearch It says all commitWithin are now soft commits. Can anyone explain what that means? Soft commit means that the documents indexed before the soft commit will become searcha
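
For readers who have not used `commitWithin`, here is a sketch of an update request that carries it as a request parameter, asking Solr to make the documents visible within the given number of milliseconds instead of committing explicitly. The helper name, collection name and field names are assumptions, and the endpoint that accepts JSON updates differs between Solr versions, so treat the path as illustrative.

```python
import json
from urllib.parse import urlencode


def update_request(base_url, docs, commit_within_ms):
    """Return (url, body) for a JSON update that asks for a commit within N ms."""
    url = f"{base_url}/update?{urlencode({'commitWithin': commit_within_ms})}"
    return url, json.dumps(docs)


# Example values only; the collection and fields are made up.
url, body = update_request(
    "http://localhost:8983/solr/collection1",
    [{"id": "doc-1", "title": "hello"}],
    10000,
)
```

The body would be sent as an HTTP POST with a `Content-Type: application/json` header.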

Re: Dynamic collections in SolrCloud for log indexing

2012-12-24 Thread Per Steffensen
any way to cross-search X slices across many collections, than it is to cross-search X slices under the same collection. Besides that see my answer for topic "Will SolrCloud always slice by ID hash?" a few days back. Regards, Per Steffensen On 12/24/12 1:07 AM, Erick Erickson wrote:

Re: SolrCloud: only partial results returned

2012-12-21 Thread Per Steffensen
are searchable before "configured auto-commit time-period" has passed since you indexed your last document. Regards, Per Steffensen On 12/20/12 6:37 PM, Lili wrote: Mark, yes, they have unique ids. Most the time, after the 2nd json http post, query will return complete results. I b

Re: Will SolrCloud always slice by ID hash?

2012-12-21 Thread Per Steffensen
each Solr node in the cluster. We still do not know how the system will behave when we have and cross-search many (up to 24 since we are supposed to keep data for 2 years before we can throw it away) collections with 1+ billion documents each. Regards, Per Steffensen On 12/18/12 8:20 PM, Scott

Re: Solr Cloud 4.0 Production Ready?

2012-12-21 Thread Per Steffensen
hopefully we will succeed in collaboration with the rest of the Solr community, and hopefully Solr Cloud replication will be production ready within the next half year. Regards, Per Steffensen On 12/18/12 3:28 PM, Otis Gospodnetic wrote: Hi, If you are not in a rush, I'd wait for

Re: Solrcloud and Node.js

2012-12-17 Thread Per Steffensen
Luis Cappa Banda skrev: Thanks a lot, Per. Now I understand the whole scenario. One last question: I've been searching trying to find some kind of request handler that retrieves cluster status information, but no luck. I know that there exists a JSON called clusterstate.json, but I don't know the

Re: Solrcloud and Node.js

2012-12-15 Thread Per Steffensen
Luis Cappa Banda skrev: Do you know if SolrCloud replica shards have 100% the same data as the leader ones every time? Probably when synchronizing with leaders there exists a delay, so executing queries to replicas won't be a good idea. As long as the replica is in state "active" it will be 100

Re: Solrcloud and Node.js

2012-12-15 Thread Per Steffensen
compile CloudSolrServer to javascript (I would imagine it will be hard to make it work though) Regards, Per Steffensen Luis Cappa Banda skrev: Hello! I've always used Java as the backend language to program search modules, and I know that CloudSolrServer implementation is the way to int

Re: Solrj connect to already running solr server

2012-12-14 Thread Per Steffensen
Per Steffensen skrev: Billy Newman skrev: I have deployed the solr.war to my application server. On deploy I can see the solr server and my core "general" start up. I have a timer that fires every so often to go out and 'crawl' some services and index into Solr. I

Re: Solrj connect to already running solr server

2012-12-14 Thread Per Steffensen
Billy Newman skrev: I have deployed the solr.war to my application server. On deploy I can see the solr server and my core "general" start up. I have a timer that fires every so often to go out and 'crawl' some services and index into Solr. I am using Solrj in my application and I am having tr

Re: SolrCloud breaks distributed query strings

2012-12-12 Thread Per Steffensen
It doesn't sound exactly like a problem we experienced some time ago, where long requests were mixed up during transport. Jetty was to blame. It might be Jetty that f up your request too? SOLR-4031. Are you still running 8.1.2? Regards, Per Steffensen Markus Jelsma skrev: Hi, We&#x

Re: Partial results returned

2012-12-12 Thread Per Steffensen
In general you probably want to add a parameter "distrib=true" to your search requests. adm1n wrote: I have 1 collection called index. I created it like explained here: http://wiki.apache.org/solr/SolrCloud in Example A: Simple two shard cluster section here are the start up commands: 1)java -

Re: Partial results returned

2012-12-11 Thread Per Steffensen
ormation asked for above will also help others to help you. I will try to remember though. Regards, Per Steffensen adm1n skrev: Hello, I'm running solrcloud with 2 shards. Let's assume I've 100 documents indexed in total, which are divided 55/45 by the shards... when I query, for e

Re: Versioning

2012-12-10 Thread Per Steffensen
Regards, Per Steffensen Sushil jain skrev: Hello Everyone, I am a Solr beginner. I just want to know if versioning of data is possible in Solr, if yes then please share the procedure. Thanks & Regards, Sushil Jain
