Re: Solr cluster doesn't recover from a ZK disconnect if collection.reload() was issued

2016-01-14 Thread Gili Nachum
Oops, that got omitted: v4.7.2. It also kept reproducing after upgrading to v4.9 (I was trying to see if it was fixed later on). On Thu, Jan 14, 2016 at 5:26 PM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote: > Which version of Solr is this on? > > On Thu, Jan 14, 2016 at 4:10

Re: Solr cluster doesn't recover from a ZK disconnect if collection.reload() was issued

2016-01-14 Thread Gili Nachum
Clarification: If we restart nodes after reloading the collection and before pausing, then recovery works fine. On Thu, Jan 14, 2016 at 12:08 PM, Gili Nachum wrote: > Hi, > > Our Solr cluster is running VMs that could freeze for more than the ZK > tick time (it's a non critic

Solr cluster doesn't recover from a ZK disconnect if collection.reload() was issued

2016-01-14 Thread Gili Nachum
Hi, Our Solr cluster is running on VMs that could freeze for more than the ZK tick time (it's a non-critical CI/CD pipeline running on an overloaded ESX). When this happens, the node's shards will be registered as down. Then, when the node is back, recovery takes place, and all shard replicas end up ac

Does soft commit re-opens searchers in disk?

2016-01-04 Thread Gili Nachum
Hello, When a new document is added, it becomes visible after a soft commit, during which it is written to a Lucene RAMDirectory (in heap). Then after a hard commit, the RAMDirectory is removed from memory and the docs are written to the index on disk. What happens if I hard commit (write to disk)

Re: Group by function in SolrCloud - when specifying exact shard with composite router (_route_ param)

2015-11-29 Thread Gili Nachum
ot quite sure if I'm reading this right, but a non cloud request with > &distrib=false might do the trick. Although you say you're not > supposed to know which shard, so I'm not sure this applies... > > On Sun, Nov 29, 2015 at 4:47 AM, Gili Nachum wrote: > > Ad

Re: Group by function in SolrCloud - when specifying exact shard with composite router (_route_ param)

2015-11-29 Thread Gili Nachum
Adding: 1. Currently, when I query I only get results from the particular shard I happened to hit (normally I'm not supposed to know which shard I hit). 2. Running Solr 4.7.2 On Sun, Nov 29, 2015 at 2:44 PM, Gili Nachum wrote: > Hi, I'm attempting result grou

Group by function in SolrCloud - when specifying exact shard with composite router (_route_ param)

2015-11-29 Thread Gili Nachum
Hi, I'm attempting result grouping with custom function in SolrCloud, by providing a _route_ , without success :~( I know that group.func isn't supported in distributed searches, but in my case *I only need the query to gather data

Re: Conditional Add/Overwrite a document

2015-10-31 Thread Gili Nachum
e/display/solr/Updating+Parts+of+Documents#UpdatingPartsofDocuments-DocumentCentricVersioningConstraints > > Cheers, > -Brendan > > On 29 October 2015 at 06:33, Gili Nachum wrote: > > > Hi, Is there a conditional Add operation in Solr? > > > > My documents have "

Conditional Add/Overwrite a document

2015-10-28 Thread Gili Nachum
Hi, Is there a conditional Add operation in Solr? My documents have a "my_int" field, and when re-adding a document with the same ID, I would like to overwrite the existing doc only if the new doc's my_int value is higher than that of the existing doc. As a naive solution, I could first read the existing d

Re: Best Indexing Approaches - To max the throughput

2015-10-06 Thread Gili Nachum
CloudSolrServer Beyond sending documents to the right leader shard, it also does this in *parallel* (for a batch), employing its own thread pool, with a connection per shard. On Tue, Oct 6, 2015 at 8:15 PM, Walter Underwood wrote: > This is at Cheg

efficient sort by title (multi word field)

2015-10-06 Thread Gili Nachum
Hi, wanted to make sure I'm implementing sort in an efficient way... I need to allow users to sort by documents' title field. A title can contain 1-20 words. Title examples: "new project meeting minutes - Oct 2015 - new chance on the horizon" or "how to create a wonderful presentation". I'm alrea

Re: How can I get a monotonically increasing field value for docs?

2015-10-04 Thread Gili Nachum
Glad I made that silly statement. I came to know cursorMark after noticing how inefficient native deep paging is in Solr, where each shard returns start+rows worth of data to the shard servicing the query. I then *wrongly* assumed that cursorMark records the returned doc # of the result set fo

Re: Reverse query?

2015-10-03 Thread Gili Nachum
Check if MLT (more like this) could fit your requirements. https://wiki.apache.org/solr/MoreLikeThis If your requirements are more specific I think your client program should tokenize the target document then construct one or more queries like: "token token2" OR "token2 token3" OR ... I'm not sur
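The client-side approach in this reply (tokenize the target document, then OR together phrases of adjacent tokens) might be sketched as follows. This is only an illustration: the function name and the simple regex tokenizer are assumptions, and a real client would want to mirror the tokenization of the field's Solr analyzer.

```python
import re


def build_phrase_pair_query(text: str) -> str:
    """Build '"t1 t2" OR "t2 t3" OR ...' from adjacent token pairs.

    The regex tokenizer below is a stand-in (an assumption); it does not
    reproduce any particular Solr analysis chain.
    """
    tokens = re.findall(r"\w+", text.lower())
    if len(tokens) < 2:
        # Zero or one token: nothing to pair, fall back to the token itself.
        return " OR ".join(f'"{t}"' for t in tokens)
    # Pair each token with its right-hand neighbour.
    pairs = zip(tokens, tokens[1:])
    return " OR ".join(f'"{a} {b}"' for a, b in pairs)
```

For long documents the number of clauses grows with the number of tokens, so the caller would probably need to cap or sample the pairs before sending the query.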

MongoDB to Solr connector - anyone done it?

2015-09-30 Thread Gili Nachum
Hi, Looking to learn from the experience of others: what works best? Looking for a production-grade solution to efficiently push data from a multi-sharded Mongo to a multi-sharded Solr, both continuously and as a one-off. Not having to write any code would be a nice bonus. What I found so f

Re: How can I get a monotonically increasing field value for docs?

2015-09-29 Thread Gili Nachum
> https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results > > > > > > : Date: Mon, 21 Sep 2015 21:32:33 +0300 > : From: Gili Nachum > : Reply-To: solr-user@lucene.apache.org > : To: solr-user@lucene.apache.org > : Subject: Re: How can I get a monotonically inc

Re: Solr java.lang.OutOfMemoryError: Java heap space

2015-09-28 Thread Gili Nachum
If you can't use CursorMark, then I suggest not using the start parameter; instead, sort asc by a unique field and restrict the query to records with a field value larger than that of the last doc you read. Then set rows to be whatever you found can fit in memory. On Mon, Sep 28, 2015 at 10:59 PM, Ajinkya
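The paging scheme suggested in this reply (no start parameter, ascending sort on a unique field, and a range restriction above the last value read) could be expressed as request parameters roughly like this. The field name `id`, the match-all query, and the helper name are assumptions for illustration; note that a string field sorts lexicographically.

```python
from typing import Dict, Optional


def next_page_params(last_id: Optional[str], rows: int,
                     unique_field: str = "id") -> Dict[str, str]:
    """Return Solr query parameters for the next page, without using 'start'.

    last_id is the unique-field value of the last document already read,
    or None for the first page. The field name is hypothetical.
    """
    params = {
        "q": "*:*",
        "sort": f"{unique_field} asc",
        "rows": str(rows),
    }
    if last_id is not None:
        # Quote the value and escape backslashes and quotes inside it.
        escaped = last_id.replace("\\", "\\\\").replace('"', '\\"')
        # '{' makes the lower bound exclusive, so the last doc is not repeated.
        params["fq"] = f'{unique_field}:{{"{escaped}" TO *]'
    return params
```

Each request then costs about the same regardless of how deep into the result set the client is, because no shard has to collect and discard the rows that were already read.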

Re: Cost of having multiple search handlers?

2015-09-28 Thread Gili Nachum
A different solution to the same need: I'm measuring response times of different collections, and of online and batch queries separately, using New Relic. I've added a servlet filter that analyses the request and makes this info available to New Relic over a request argument. The built-in New Relic Solr

Re: bulk reindexing 5.3.0 issue

2015-09-28 Thread Gili Nachum
Were all of the shard replicas in the active state (green in the admin UI) before starting? Sounds like it, otherwise you wouldn't hit the replica that is out of sync. Replicas can get out of sync, yet report being in sync, after a sequence of stops and starts without a chance to complete the sync. See if it might have hap

Re: How to know index file in OS Cache

2015-09-25 Thread Gili Nachum
Gonna try Mikhail's suggestion, but just for fun you can also empirically "test" for how much of a file is in the OS cache with: time cat <file> > /dev/null. The faster it completes, the more blocks are cached. You can take a baseline after manually purging the cache - don't recall the comma
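The same empirical check (time a full sequential read and compare it against a cold-cache baseline) can be sketched in a few lines. The function name and chunk size are assumptions; the result is only a rough signal, not a measurement of which blocks are actually cached.

```python
import time


def time_full_read(path: str, chunk_size: int = 1 << 20) -> float:
    """Return the seconds taken to read the whole file and discard the data.

    A shorter time relative to a cold-cache baseline suggests more of the
    file was served from the OS page cache.
    """
    start = time.perf_counter()
    with open(path, "rb") as f:
        # Read in fixed-size chunks until EOF, discarding the bytes.
        while f.read(chunk_size):
            pass
    return time.perf_counter() - start
```

Reading the file this way also pulls it into the cache, so a second run will usually be faster than the first.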

Re: Cloud Deployment Strategy... In the Cloud

2015-09-22 Thread Gili Nachum
Our auto setup sequence is: 1. Deploy 3 ZK nodes. 2. Deploy Solr nodes and start them, connecting to ZK. 3. Upload the collection config to ZK. 4. Call the create-collection REST API. 5. Done: SolrCloud is ready to work. We don't yet have automation for replacing or adding a node. On Sep 22, 2015 18:27, "Steve Dav
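Step 4 of this sequence, the create-collection call, goes through Solr's Collections API. A minimal sketch of building that request URL is below; the base URL, collection name, shard counts, and config name are placeholder assumptions, and sending the request is left to whatever HTTP client the automation uses.

```python
from urllib.parse import urlencode


def create_collection_url(base_url: str, name: str, num_shards: int,
                          replication_factor: int, config_name: str) -> str:
    """Build a Collections API CREATE URL (all argument values are examples)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        # Name of the config set previously uploaded to ZooKeeper (step 3).
        "collection.configName": config_name,
    }
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"
```

For example, `create_collection_url("http://localhost:8983/solr", "mycoll", 8, 2, "myconf")` yields a URL that can be issued with a plain HTTP GET once the Solr nodes have connected to ZK.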

Re: solr4.7: leader core does not elected to other active core after sorl OS shutdown, known issue?

2015-09-21 Thread Gili Nachum
Happens to us too. Solr 4.7.2 On Sep 21, 2015 20:42, "Jeff Wu" wrote: > Hi Shai, still the same question: other peer cores which they are active > did not claim to be leader after a long time. However, some of the peer > cores claimed to be leaders at earlier time when server stopping. That's >

Re: How can I get a monotonically increasing field value for docs?

2015-09-21 Thread Gili Nachum
lue to track progress over. On Sep 21, 2015 19:56, "Shawn Heisey" wrote: > On 9/21/2015 9:01 AM, Gili Nachum wrote: > > TimestampUpdateProcessorFactory takes place only on the leader shard, or > on > > each shard replica? > > if on each replica then I wo

Re: How can I get a monotonically increasing field value for docs?

2015-09-21 Thread Gili Nachum
default value. > > You could construct that on a per invocation basis, using > System.getMillis() or whatever. > > Upayavira > > On Mon, Sep 21, 2015, at 07:34 AM, Gili Nachum wrote: > > I've implemented a custom solr2solr ongoing unidirectional replication > > mec

How can I get a monotonically increasing field value for docs?

2015-09-20 Thread Gili Nachum
I've implemented a custom solr2solr ongoing unidirectional replication mechanism. A Replicator (acting as a SolrJ client) crawls documents from SolrCloud1 and writes them to SolrCloud2 in batches. The replicator crawl logic is to read documents with a time greater than or equal to the time of the last rep

Re: Does more shards in core improve performance?

2015-09-18 Thread Gili Nachum
If cpu is just 50% and adding a shard does increase indexing throughput then check for disk bottleneck. On Sep 17, 2015 18:19, "Zheng Lin Edwin Yeo" wrote: > Thank you everyone for your reply. > > > How many CPUs on that machine? How many other requests using the server? > > A) There's 8 CPU on t

Are Solr releases predictable? Every 2 months?

2015-08-02 Thread Gili Nachum
When is 5.3 coming out? When is SOLR-6273 (Cross Data Center Replication) to be released? Any way to tell?

Solr transactions naming using New Relic - anyone did it?

2015-07-07 Thread Gili Nachum
Hi, I want to have fine-grained naming of Solr calls in New Relic, so I can focus on monitoring those requests to Solr that mean end-user wait time. Has anyone already implemented such a solution? In more detai

Optimal FS block size for "small" documents in Solr?

2015-05-30 Thread Gili Nachum
Hi, What would be an optimal FS block size to use? Using Solr 4.7.2, I have a RAID-5 of SSD drives currently configured with a 128KB block size. Can I expect better indexing/query time performance with a smaller block size (say 8K)? Considering my documents are almost always smaller than 8K. I as

Re: Solr Cloud reclaiming disk space from deleted documents

2015-04-27 Thread Gili Nachum
To prevent it from re occurring you could monitor index size and once above a certain size threshold add another machine and split the shard between existing and new machine. On Apr 20, 2015 9:10 PM, "Rishi Easwaran" wrote: > So is there anything that can be done from a tuning perspective, to > r

Re: Solr Cloud reclaiming disk space from deleted documents

2015-04-19 Thread Gili Nachum
I assume you don't have much free space available on your disk. Notice that during optimization (merge into a single segment) your shard replica space usage may peak at 2x-3x of its normal size until optimization completes. Is it a problem? Not if optimization occurs over shards serially and your
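The 2x-3x peak mentioned in this reply translates into a simple free-space check before optimizing a replica. The sketch below is only arithmetic under that stated range; the function name and the default factor of 3 (the upper end of the range) are assumptions.

```python
def has_optimize_headroom(index_bytes: int, free_bytes: int,
                          peak_factor: float = 3.0) -> bool:
    """Return True if free space covers the extra usage during an optimize.

    With a peak of peak_factor times the normal index size, the extra space
    needed on top of the existing index is (peak_factor - 1) * index size.
    """
    extra_needed = index_bytes * (peak_factor - 1.0)
    return free_bytes >= extra_needed
```

If shards are optimized one at a time, the check only needs to hold for the largest replica on the disk rather than for all replicas together.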

Re: Solr Lazy startup - load-on-startup missing from web.xml?

2015-04-13 Thread Gili Nachum
Hi, it worked! The issue was originally on WAS 7, but has somehow regressed to WebSphere 8.5. Thanks. On Thu, Feb 19, 2015 at 10:13 PM, Chris Hostetter wrote: > : Hi! Solr is starting up "dormant" for me, until a client wake it up with > a > : REST request, or I open admin UI, only then the rema

Solr Lazy startup - load-on-startup missing from web.xml?

2015-02-19 Thread Gili Nachum
Hi! Solr is starting up "dormant" for me until a client wakes it up with a REST request or I open the admin UI; only then does the remaining initialization happen. Is this a known issue? I can't see any load-on-startup in the web.xml in Solr.war. Running Solr 4.7.2 over WebSphere 8.5. App loading message

Re: 43sec commit duration - blocked by index merge events?

2015-02-13 Thread Gili Nachum
ts > time to give you some ideas what's going on inside. > > HTH. > > Otis > -- > Monitoring * Alerting * Anomaly Detection * Centralized Log Management > Solr & Elasticsearch Support * http://sematext.com/ > > > On Sun, Feb 8, 2015 at 6:48 AM, Gili Nachu

43sec commit duration - blocked by index merge events?

2015-02-08 Thread Gili Nachum
Hello, During a load test I noticed a commit that took 43 seconds to complete (client hard commit). Is this to be expected? What's causing it? I have a pair of machines hosting a 128M docs collection (8 shards, replication factor=2). Could it be merges? In Lucene merges happen async of commit s

Re: clarification regarding shard splitting and composite IDs

2015-02-04 Thread Gili Nachum
routing/ > http://lucidworks.com/blog/multi-level-composite-id-routing-solrcloud/ > > and shard splitting here: > http://lucidworks.com/blog/shard-splitting-in-solrcloud/ > > > On Wed, Feb 4, 2015 at 12:59 AM, Gili Nachum wrote: > > > Hi, I'm also interested. When using

Re: clarification regarding shard splitting and composite IDs

2015-02-04 Thread Gili Nachum
Hi, I'm also interested. When using the composite ID, the _route_ information is not kept on the document itself, so to me it looks like it's not possible as the split API doesn't have a relevant parameter to spl

Re: Mixing 4.x SolrJ and Solr.war - compatible?

2015-01-01 Thread Gili Nachum
wrote: > On 12/31/2014 6:23 AM, Gili Nachum wrote: > > Can I use SolrJ v4.7 with the latest 4.x Solr.war? > > Should I switch the writer from Javabin, back to XML to ensure > > compatibility? > > > http://wiki.apache.org/solr/Solrj#SolrJ.2FSolr_cross-version_compat

Mixing 4.x SolrJ and Solr.war - compatible?

2014-12-31 Thread Gili Nachum
Can I use SolrJ v4.7 with the latest 4.x Solr.war? Should I switch the writer from Javabin, back to XML to ensure compatibility? http://wiki.apache.org/solr/Solrj#SolrJ.2FSolr_cross-version_compatibility I'm using CloudSolrServer. My client is running on Java6 so I can't go beyond 4.7.

Inconsistent doc value across two nodes - very simple test - what's the expected behavior?

2014-12-11 Thread Gili Nachum
I know Solr CAP properties are CP, but I don't see it happening over a very basic test - doing something wrong? With two Solr nodes, I index doc1 to both, stop node2, update doc1, stop node1, start node2, start node1, and I get two different versions of the doc depending on which replica I query.

Re: Too much Lucene code to refactor but I like SolrCloud

2014-12-04 Thread Gili Nachum
Hi Bill. I'm migrating a Lucene-based app to SolrCloud as well. My main motivation is horizontal scalability. My backend is complex, so the migration is not a one-time cutover but a long process. Currently I have both Lucene and SolrCloud, indexing to both and querying from either of them. The migr

Re: Move a shard from one disk to another

2014-11-29 Thread Gili Nachum
Hi, I believe symlinks should work, you could try and see. Alternatively you could either set *coreRootDirectory* in solr.xml for where core discovery starts. Or keep the same root for core discovery, and tweak only the rele

Re: Replicate a collection to a 2nd SolrCloud

2014-11-25 Thread Gili Nachum
terthoughts.com/ and @arafalov > Solr resources and newsletter: http://www.solr-start.com/ and @solrstart > Solr popularizers community: https://www.linkedin.com/groups?gid=6713853 > > > On 25 November 2014 at 15:29, Gili Nachum wrote: > > Hi, > > > > > > > &g

Replicate a collection to a 2nd SolrCloud

2014-11-25 Thread Gili Nachum
Hi, *I need to replicate a collection between SolrClouds; has anyone done it?* The replication style I need is one-directional, replicating anything that happens on my main-site SolrCloud to the DR site (master->slave). I considered and decided against synchronizing the collections' shards' Lucene index ov

Re: A bad idea to store core data directory over NAS?

2014-11-05 Thread Gili Nachum
ter Underwood > > wun...@wunderwood.org > > http://observer.wunderwood.org/ > > > > On Nov 5, 2014, at 12:27 AM, Toke Eskildsen > > wrote: > > > > > On Tue, 2014-11-04 at 22:57 +0100, Gili Nachum wrote: > > >> My data center is out of SAN or l

A bad idea to store core data directory over NAS?

2014-11-04 Thread Gili Nachum
My data center is out of SAN or local disk storage - is it a big no-no to store the Solr core data folder over NAS? That means 1. the Lucene index 2. the transaction log. The NAS mount would be accessed by a single machine. I do care about performance. If I do go with NAS, should I expect index corruption an

Chronological partitioning of data - what does Solr offer in this area?

2014-09-09 Thread Gili Nachum
Hello! *Does Solr support any sort of chronological ordering of data?* I would like to divide my data into daily, weekly, monthly, and yearly parts, for performance's sake. Has anyone done something like this over SolrCloud? More thoughts: While Indexing: I'm soft committing every 2 seconds so I would r

Are stored fields compressed by default?

2014-07-23 Thread Gili Nachum
Hi! I'm planning to use atomic-updates which means having all fields stored. Some docs might have text fields of up to 200K, I will feel better knowing that Solr automatically compresses stored fields (I know Lucene 4.x default codec does). *Are stored

ExtractingRequestHandler - extracted files caching?

2014-06-30 Thread Gili Nachum
Hello, I plan to use ExtractingRequestHandler to index binary files text plus app metadata (like literal.downloadCount and others) into a single document. I expect the app metadata to change much more often than the binary file itself. I would hate to have to extract text from the binary file when

Re: Large disjunction query practices

2014-06-10 Thread Gili Nachum
Yes, most cases there would be some other, better, way to accomplish what you're after, share your high level goal. By default, Lucene, and Solr, limit the max number of clauses to 1024, even before that your performance would go down the drain. 1024 On Tue, Jun 10, 2014 at 10:21 AM, Ahmet Ars

Recommended ZooKeeper topology in Production

2014-06-09 Thread Gili Nachum
Is there a recommended ZooKeeper topology for production Solr environments? I was planning: 3 ZK nodes, each on its own dedicated machine. Thinking that dedicated machines, separate from Solr servers, would keep ZK isolated from resource contention spikes that may occur on Solr. Also, if a Solr m

Re: Setup a Solr Cloud on a set of powerful machines

2014-06-09 Thread Gili Nachum
> the incoming document rate could be as high as 20k/second... That sounds like a lot of CPU eager indexing work, given the 128 CPU cores available, from indexing speed perspective: would you recommend having a similar number of solr cores created, or Solr does just a when several with a small numb

Can Atomic Updates help me to re-indexing w/o crawling external content?

2014-06-01 Thread Gili Nachum
Hello. I'm just starting out with my Solr deployment and believe there's a good chance I'll want to change how my fields are indexed in the near future. I wouldn't want to crawl the original content store again just to re-index. I was hoping that Atomic Updates (which keeps all fields as stored)