Re: Solr server requirements for 100+ million documents

2014-01-24 Thread Kranti Parisa
Can you post the complete solrconfig.xml and schema.xml files so that we can review all of the settings that would impact your indexing performance? Thanks, Kranti K. Parisa http://www.linkedin.com/in/krantiparisa On Sat, Jan 25, 2014 at 12:56 AM, Susheel Kumar < susheel.ku...@thedigitalgroup.net> wr

RE: Solr server requirements for 100+ million documents

2014-01-24 Thread Susheel Kumar
Thanks, Svante. Your indexing speed using db seems to be really fast. Can you please provide some more detail on how you are indexing db records? Is it through DataImportHandler? And what database? Is that a local db? We are indexing around 70 fields (60 multivalued) but data is not populated always in

Re: Replica not consistent after update request?

2014-01-24 Thread Erick Erickson
Right. The updates are guaranteed to be on the replicas and in their transaction logs. That doesn't mean they're searchable, however. For a document to be found in a search there must be a commit, either soft, or hard with openSearcher=true. Here's a post that outlines all this. If you have di
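As a rough illustration of the commit options mentioned in the reply above, here is a minimal Python sketch that builds update URLs carrying either a soft commit or a hard commit. The base URL and collection name are placeholders, and the helper `update_url` is hypothetical; only the `softCommit`, `commit`, and `openSearcher` request parameters are taken from Solr's update handler.

```python
from urllib.parse import urlencode


def update_url(base, collection, soft=False, open_searcher=True):
    """Build a Solr /update URL that requests a commit along with the update.

    base and collection are placeholders supplied by the caller.
    """
    if soft:
        # Soft commit: makes documents searchable without flushing segments to disk.
        params = {"softCommit": "true"}
    else:
        # Hard commit: documents become searchable only if openSearcher is true.
        params = {"commit": "true", "openSearcher": str(open_searcher).lower()}
    return f"{base}/{collection}/update?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host and collection, for illustration only.
    print(update_url("http://localhost:8983/solr", "collection1", soft=True))
    print(update_url("http://localhost:8983/solr", "collection1"))
```

Either URL would still need a document payload posted to it; the sketch only shows how the commit choice is expressed.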

Re: Replica not consistent after update request?

2014-01-24 Thread Nathan Neulinger
It's 4.6.0. Pair of servers with an external 3-node zk ensemble. SOLR-4260 looks like a very promising answer. Will check it out as soon as 4.6.1 is released. May also check out the nightly builds since this is still just development/prototype usage. -- Nathan On 01/24/2014 09:45 PM, Anshum

Re: Replica not consistent after update request?

2014-01-24 Thread Nathan Neulinger
Wow, the detail in that jira issue makes my brain hurt... Great to see it's got a quick answer/fix! Thank you! -- Nathan On 01/24/2014 09:43 PM, Joel Bernstein wrote: If you're on Solr 4.6 then this is likely the issue: https://issues.apache.org/jira/browse/SOLR-4260. The issue is resolved f

Re: Replica not consistent after update request?

2014-01-24 Thread Anshum Gupta
Hi Nathan, It'd be great to have more information about your setup, Solr Version? Depending upon your version, you might want to also look at: https://issues.apache.org/jira/browse/SOLR-4260 (which is now fixed). On Fri, Jan 24, 2014 at 6:52 PM, Nathan Neulinger wrote: > How can we issue an upd

Re: Replica not consistent after update request?

2014-01-24 Thread Joel Bernstein
If you're on Solr 4.6 then this is likely the issue: https://issues.apache.org/jira/browse/SOLR-4260. The issue is resolved for Solr 4.6.1 which should be out next week. Joel Bernstein Search Engineer at Heliosearch On Fri, Jan 24, 2014 at 9:52 PM, Nathan Neulinger wrote: > How can we issue a

Replica not consistent after update request?

2014-01-24 Thread Nathan Neulinger
How can we issue an update request and be certain that all of the replicas in the SolrCloud cluster are up to date? I found this post: http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/79886 which seems to indicate that all replicas for a shard must finish/succeed before it

Re: Solr server requirements for 100+ million documents

2014-01-24 Thread Otis Gospodnetic
Hi Susheel, Like Erick said, it's impossible to give precise recommendations, but making a few assumptions and combining them with experience (+ a licked finger in the air): * 3 servers * 32 GB * 2+ CPU cores * Linux Assuming docs are not bigger than a few KB, that they are not being reindexed ov

Re: Solr server requirements for 100+ million documents

2014-01-24 Thread svante karlsson
I just indexed 100 million db docs (records) with 22 fields (4 multivalued) in 9524 sec using libcurl. 11 million took 763 seconds so the speed drops somewhat with increasing dbsize. We write 1000 docs (just an arbitrary number) in each request from two threads. If you will be using solrcloud you
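The figures quoted above can be turned into throughput numbers, and the "1000 docs per request" pattern can be sketched as a simple batching helper. This is only an illustrative Python sketch: the function names are hypothetical, and the original poster used libcurl rather than Python.

```python
from itertools import islice


def docs_per_second(docs, seconds):
    """Average indexing throughput in documents per second."""
    return docs / seconds


def batches(iterable, size=1000):
    """Yield lists of at most `size` items, e.g. one list per update request."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    # Numbers taken from the message: 11 million docs in 763 s,
    # 100 million docs in 9524 s.
    print(round(docs_per_second(11_000_000, 763)))
    print(round(docs_per_second(100_000_000, 9524)))
    # Batch sizes for a toy stream of 2500 records.
    print([len(b) for b in batches(range(2500))])
```

The first 11 million documents average roughly 14,400 docs/s while the full run averages roughly 10,500 docs/s, which matches the remark that speed drops somewhat as the index grows.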

RE: Solr server requirements for 100+ million documents

2014-01-24 Thread Susheel Kumar
Thanks, Erick, for the info. For indexing, I agree that most of the time is consumed in data acquisition, which in our case is from the database. For indexing we are currently using a manual process, i.e. the Solr dashboard Data Import, but are now looking to automate. How do you suggest automating the indexing part? Do

Re: Solr server requirements for 100+ million documents

2014-01-24 Thread Erick Erickson
Can't be done with the information you provided, and can only be guessed at even with more comprehensive information. Here's why: http://searchhub.org/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/ Also, at a guess, your indexing speed is so slow due to data acq

Complex nested structure in solr

2014-01-24 Thread Utkarsh Sengar
Hi guys, I have to load extra meta data to an existing collection. This is what I am looking for: For a UPC: Store availability by merchantId per location (which has lat/lon) My query pattern will be: Given a keyword, find all available products for a merchantId around the given lat/lon. Exampl

Re: SOLR 4.4 - Slave always replicates full index

2014-01-24 Thread Shawn Heisey
On 1/24/2014 10:36 AM, sureshrk19 wrote: I'm not committing each document, but I have the following configuration in solrconfig.xml (commit every 5 mins). 30 false Also, if you look at my master config, I do not have 'optimize'. startup commit Is there any wa

Solr server requirements for 100+ million documents

2014-01-24 Thread Susheel Kumar
Hi, Currently we are indexing 10 million documents from a database (10 db data entities) & the index size is around 8 GB on a Windows virtual box. Indexing in one shot takes 12+ hours, while indexing in parallel in separate cores & merging them together takes 4+ hours. We are looking to scale to 100+ mi

Re: SOLR 4.4 - Slave always replicates full index

2014-01-24 Thread sureshrk19
Erick, Thanks for the reply. I'm not committing each document, but I have the following configuration in solrconfig.xml (commit every 5 mins). 30 false Also, if you look at my master config, I do not have 'optimize'. startup commit Is there any way other option

Re: Distributed search with Terms Component and Solr Cloud.

2014-01-24 Thread Uwe Reh
Hi Ryan, just take a look at the thread "TermsComponent/SolrCloud". Setting your parameters as default in solrconfig.xml should help. Uwe On 13.01.2014 20:24, Ryan Fox wrote: Hello, I am running Solr 4.6.0. I am experiencing some difficulties using the terms component across multiple shar

Re: Searching and scoring with block join

2014-01-24 Thread dev
Quoting Mikhail Khludnev: nesting query parsers is shown at http://blog.griddynamics.com/2013/12/grandchildren-and-siblings-with-block.html try to start from the following: title:Test _query_:"{!parent which=is_parent:true}{!dismax qf=content_de}Test" mind about local params referencing eg
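The nested query suggested above can be assembled programmatically. The following is a minimal Python sketch that reproduces the quoted query string; the helper name `block_join_query` and its parameters are hypothetical, and the field names `title`, `is_parent`, and `content_de` are simply those that appear in the quoted example.

```python
def block_join_query(term, parent_filter="is_parent:true", child_field="content_de"):
    """Combine a parent-level title clause with a block-join clause on children.

    No escaping is performed, so `term` is assumed to contain no quotes or
    other query-syntax characters.
    """
    # {!parent ...} selects parent documents whose children match the inner
    # {!dismax ...} query over the child field.
    nested = f"{{!parent which={parent_filter}}}{{!dismax qf={child_field}}}{term}"
    return f'title:{term} _query_:"{nested}"'


if __name__ == "__main__":
    print(block_join_query("Test"))
```

The resulting string would be sent as the `q` parameter of a search request, URL-encoded as usual.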

What is the "right" way to bring a failed SolrCloud node back online?

2014-01-24 Thread Nathan Neulinger
I have an environment where new collections are being added frequently (isolated per customer), and the backup is virtually guaranteed to be missing some of them. As it stands, bringing up the restored/out-of-date instance results in those collections being stuck in 'Recovering' state, because t

Re: Loading resources from Zookeeper using SolrCloud API

2014-01-24 Thread Mark Miller
The best way is to use the ResourceLoader without relying on ResourceLoader#getConfigDir (which will fail in SolrCloud mode). For example, see openSchema, openConfig, openResource. If you use these APIs, your code will work both with those files being on the local filesystem for non-SolrCloud

Re: Loading resources from Zookeeper

2014-01-24 Thread Alan Woodward
Hi Ugo, You can load things from the conf/ directory via SolrResourceLoader, which will load either from the filesystem or from zookeeper, depending on whether or not you're running in SolrCloud mode. Alan Woodward www.flax.co.uk On 24 Jan 2014, at 16:02, Ugo Matrangolo wrote: > Hi, > > I'm

Loading resources from Zookeeper using SolrCloud API

2014-01-24 Thread Ugo Matrangolo
Hi, we have quite a large SOLR 3.6 installation and we are trying to update to 4.6.x. One of the main points in doing this is to get SolrCloud and centralized configuration using Zookeeper. Unfortunately, some custom code we have (custom indexer extending org.apache.solr.handler.dataimport.Entity

Loading resources from Zookeeper

2014-01-24 Thread Ugo Matrangolo
Hi, I'm in the process of moving our organization's search infrastructure to SOLR4/SolrCloud. One of the main points is to centralize our cores' configuration in Zookeeper in order to roll out changes without redeploying all the nodes in our cluster. Unfortunately I have some code (custom indexers extendi

Re: Solr solr.JSONResponseWriter not escaping backslash '\' characters

2014-01-24 Thread Ahmet Arslan
How about using http://lucene.apache.org/solr/4_6_0/solr-core/org/apache/solr/update/processor/RemoveBlankFieldUpdateProcessorFactory.html On Friday, January 24, 2014 5:39 PM, stevenNabble wrote: Hello, thanks to all for the help :-) we have managed to narrow down exactly what is going

Re: Solr solr.JSONResponseWriter not escaping backslash '\' characters

2014-01-24 Thread stevenNabble
Hello, thanks to all for the help :-) we have managed to narrow down exactly what is going wrong. My initial thinking that the backslashes within field values were the problem was incorrect. The source of the problem is in fact submitting a document with a blank field value. The JSON returned

Re: solrcloud shards backup/restoration

2014-01-24 Thread Greg Walters
We've managed some success restoring existing/backed up indexes into solr cloud and even building the indexes offline and dumping the lucene files into the directories that solr expects. The general steps we follow are: 1) Round up your files. It doesn't matter if you pull from a master or slave

Re: SOLR 4.4 - Slave always replicates full index

2014-01-24 Thread Erick Erickson
How are you committing? Are you committing every document? (you shouldn't). Or, sin of all sins, are you _optimizing_ frequently? That'll cause your entire index to be replicated every time. Best, Erick On Thu, Jan 23, 2014 at 3:26 PM, sureshrk19 wrote: > Hi, > > I have configured single core m