Re: 2.1billion+ document

2013-07-06 Thread Erick Erickson
uniqueKey is used to enforce that there is only a single copy of a doc. Say a doc changes and you re-index it. If there is a doc in the index already _with the same uniqueKey_, it'll be deleted and the new one will be the only one visible. Which implies that if you do implement the suggestions, be su
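(For reference, the uniqueKey Erick describes is declared in schema.xml. A minimal sketch, assuming Solr 4.x schema syntax; the field name "id" is the conventional choice, not a requirement:)

```xml
<!-- Excerpt from schema.xml: the key field must be stored, per the wiki quote below -->
<fields>
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
</fields>

<!-- Mark that field as the document's unique key -->
<uniqueKey>id</uniqueKey>
```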

Re: 2.1billion+ document

2013-07-05 Thread Gora Mohanty
On 6 July 2013 09:45, Ali, Saqib wrote:
> Thanks Jason! That was very helpful.
>
> I read on the solr wiki that:
> "Documents must have a unique key and the unique key must be stored
> (stored="true" in schema.xml)"
>
> What is this unique key? Is this just an id that we define in the schema.xml
>

Re: 2.1billion+ document

2013-07-05 Thread Ali, Saqib
Thanks Jason! That was very helpful. I read on the solr wiki that: "Documents must have a unique key and the unique key must be stored (stored="true" in schema.xml)". What is this unique key? Is this just an id that we define in the schema.xml that is unique to all documents? We have something as f

Re: 2.1billion+ document

2013-07-05 Thread Jason Hellman
Saqib: At the simplest level:
1) Source the machine
2) Install Java
3) Install a servlet container of your choice
4) Copy your Solr WAR and conf directories as desired (probably a rough mirror of your current single server)
5) Start it up and start sending data there
6) Query both by simpl
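(A rough command-level sketch of steps 2-5 above. Every path, package name, and host name here is an assumption for illustration; Jason leaves the servlet container and layout up to you, and the Jetty bundled with the Solr examples works equally well:)

```shell
# 2) Install Java (package name is distro-specific; an assumption here)
sudo apt-get install openjdk-7-jdk

# 3) Install a servlet container of your choice, e.g. Tomcat
sudo apt-get install tomcat7

# 4) Copy the Solr WAR and conf directories from the existing server
#    ("solr1" and the paths are placeholders, not real defaults)
scp solr1:/opt/solr/solr.war /var/lib/tomcat7/webapps/
scp -r solr1:/opt/solr/conf  /var/solr/conf

# 5) Start it up, then begin sending documents to the new instance
sudo service tomcat7 start
```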

Re: 2.1billion+ document

2013-07-05 Thread Ali, Saqib
Hello Otis, I was thinking more in terms of Solr DistributedSearch rather than SolrCloud. I was hoping to add another Solr instance when the time comes. This is a low-use application, but with a lot of data. Uptime and query speed are not important. However, we would like to be able to index mor
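(With plain DistributedSearch, fanning a query out across instances is done with the shards request parameter. A sketch of what querying two instances looks like; the host names and the core name "collection1" are assumptions:)

```
http://solr1:8983/solr/collection1/select
    ?q=*:*
    &shards=solr1:8983/solr/collection1,solr2:8983/solr/collection1
```

Note that with this approach the application, not Solr, is responsible for routing each document to exactly one shard at index time, which is why the uniqueKey must be unique across all shards.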

Re: 2.1billion+ document

2013-07-05 Thread Otis Gospodnetic
Hi, It's a broad question, but it starts with getting a few servers, putting Solr 4.3.1 on them (soon 4.4), setting up ZooKeeper, creating a Solr Collection (index) with N shards and M replicas, and reindexing your old data into this new cluster, which you can expand with new nodes over time. If you
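(Once ZooKeeper and the Solr nodes are running, the collection Otis describes can be created through the SolrCloud Collections API. A sketch, assuming a node on localhost:8983; the collection name and the shard/replica counts are placeholders for the N and M above:)

```
http://localhost:8983/solr/admin/collections
    ?action=CREATE
    &name=mycollection
    &numShards=4
    &replicationFactor=2
```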