Re: Solr in non-persistent mode

2014-01-25 Thread Per Steffensen
Well, we were using it in our automatic tests to make them run faster - so that is at least a use case. But after upgrading to 4.4 with the new solr.xml style we are not running our test suite with Solrs in non-persistent mode anymore (we can't). But actually it seems like the test-suite is com

Re: Solr server requirements for 100+ million documents

2014-01-25 Thread svante karlsson
We are using a postgres server on a different host (same hardware as the test solr server). The reason we take the data from the postgres server is that it is easy to automate testing, since we use the same server to produce queries. In production we preload the solr from a csv file from a hive (hadoop

Re: Solr server requirements for 100+ million documents

2014-01-25 Thread svante karlsson
That got away a little early... The inserter is a small C++ program that uses pglib to speak to postgres and then an HTTP client library that uses libcurl under the hood. The inserter draws very little CPU and we normally use 2 writer threads that each post 1000 records at a time. It's very ineffici
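The batching scheme described above (two writer threads, 1000 records per post) can be sketched roughly as follows. This is a Python illustration, not the C++ inserter from the message: `batched`, `insert_all`, and the `post_batch` callback are hypothetical names, and `post_batch` stands in for whatever function actually sends a batch to Solr and returns the number of records it sent.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batched(records: Iterable, size: int) -> Iterator[List]:
    """Yield consecutive lists of at most `size` records."""
    it = iter(records)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def insert_all(records: Iterable,
               post_batch: Callable[[List], int],
               batch_size: int = 1000,
               writers: int = 2) -> int:
    """Send records in batches using a small pool of writer threads.

    `post_batch` is a hypothetical callback that posts one batch and
    returns how many records it sent. Returns the total number sent.
    """
    with ThreadPoolExecutor(max_workers=writers) as pool:
        return sum(pool.map(post_batch, batched(records, batch_size)))
```

Passing `len` as `post_batch` gives a dry run that only counts records, which is a cheap way to check the batching before wiring in a real HTTP call.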

Re: Solr server requirements for 100+ million documents

2014-01-25 Thread Erick Erickson
Hmmm, I'm always suspicious when I see a schema.xml with a lot of "string" types. This is tangential to your question, but I thought I'd butt in anyway. String types are totally unanalyzed. So if the input for a field is "I like Strings", the only match will be "I like Strings". "I like strings" w
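The difference between an unanalyzed "string" field and an analyzed text field can be illustrated with a small sketch. This is plain Python, not Solr code: `string_field_match`, `analyze`, and `text_field_match` are hypothetical helpers, and the "analyzer" here is assumed to do nothing more than lowercase and split on whitespace.

```python
from typing import List


def string_field_match(indexed: str, query: str) -> bool:
    """Unanalyzed field: the whole value must match exactly."""
    return indexed == query


def analyze(text: str) -> List[str]:
    """Toy analyzer (an assumption): lowercase, then split on whitespace."""
    return text.lower().split()


def text_field_match(indexed: str, query: str) -> bool:
    """Analyzed field: every query token must appear among the indexed tokens."""
    indexed_tokens = set(analyze(indexed))
    return all(token in indexed_tokens for token in analyze(query))


if __name__ == "__main__":
    value = "I like Strings"
    print(string_field_match(value, "I like Strings"))  # exact value
    print(string_field_match(value, "I like strings"))  # differs only in case
    print(text_field_match(value, "strings"))           # single lowercased token
```

With the unanalyzed comparison, a query that differs only in case does not match, while the toy analyzed comparison matches on individual lowercased tokens.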

Re: Solr server requirements for 100+ million documents

2014-01-25 Thread svante karlsson
You are of course right, but we do our own normalization (among other things "to_lower") before we insert and before search queries get entered. We do not use wildcards in searches either, so in our problem domain it works quite well. /svante 2014/1/25 Erick Erickson > Hmmm, I'm always suspic

Re: Replica not consistent after update request?

2014-01-25 Thread Nathan Neulinger
Ok, so our issue sounds like a combination of not having softCommits properly done and SOLR-4260. Thanks everyone! On 01/24/2014 11:04 PM, Erick Erickson wrote: Right. The updates are guaranteed to be on the replicas and in their transaction logs. That doesn't mean they're searcha

How to handle multiple sub second updates to same SOLR Document

2014-01-25 Thread christopher palm
I have a scenario where the same SOLR document is being updated several times within a few ms of each other due to how the source system is sending in field updates on the document. The problem I am trying to solve is that the order of these updates isn’t guaranteed once the multi threaded SOLRJ c

RE: Solr server requirements for 100+ million documents

2014-01-25 Thread Susheel Kumar
Hi Kranti, Attached are the solrconfig & schema xml for review. I ran indexing with just a few fields (5-6 fields) in schema.xml while keeping the same db config, but indexing still takes almost the same time (on average 1 million records per hour), which confirms that the bottleneck is in the data acquisit

Re: How to handle multiple sub second updates to same SOLR Document

2014-01-25 Thread Shalin Shekhar Mangar
There is no timestamp versioning as such in Solr, but there is a new document-based versioning feature which will allow you to specify your own (externally assigned) versions. See the "Document Centric Versioning Constraints" section at https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Doc
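The general idea behind externally assigned versions can be sketched as a store that keeps an update only when its version is newer than the one already held, so late-arriving older updates are dropped regardless of arrival order. This is an in-memory Python illustration of the concept under that assumption, not Solr's implementation or API; `VersionedStore` and its methods are hypothetical names, and each update is treated as a whole-document replacement for simplicity.

```python
from typing import Any, Dict, Optional, Tuple


class VersionedStore:
    """Toy document store that rejects updates carrying stale versions."""

    def __init__(self) -> None:
        # doc_id -> (version, fields)
        self._docs: Dict[str, Tuple[int, Dict[str, Any]]] = {}

    def update(self, doc_id: str, version: int, fields: Dict[str, Any]) -> bool:
        """Apply the update only if `version` is newer than the stored one.

        Returns True if the update was applied, False if it was ignored.
        """
        current = self._docs.get(doc_id)
        if current is not None and version <= current[0]:
            return False  # stale or duplicate update: keep what we have
        self._docs[doc_id] = (version, dict(fields))
        return True

    def get(self, doc_id: str) -> Optional[Dict[str, Any]]:
        """Return a copy of the stored fields, or None if the id is unknown."""
        current = self._docs.get(doc_id)
        return dict(current[1]) if current is not None else None
```

For this to help with the out-of-order problem raised in the thread, the source system would need to supply a version that increases with every change to a document, such as a sequence number.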