Re: Analyzing .pos files using luke

2016-07-11 Thread KNitin
't changed. What versions of Solr? Be sure > DocValues isn't different (that's recently become a > default, and you haven't told us _which_ versions of > Solr you're comparing). > > Best, > Erick > > On Sun, Jul 10, 2016 at 9:27 PM, KNitin wrote: > >

Analyzing .pos files using luke

2016-07-10 Thread KNitin
Hi, I am trying to diff two versions of a Solr index. Both indices have similar .doc and .pay file sizes, but their .pos files are extremely different. How do I dig deeper to understand what could be causing this difference? Is there a way to just open/analyze .pos file/compare 2 .pos files
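
A low-tech first step, before opening either index in Luke, is to total the on-disk bytes per file extension in each index directory and put the two side by side. The following is only a minimal sketch, assuming both indexes are available as local directories; the function names are hypothetical, and it measures file sizes only, it does not parse the contents of a .pos file:

```python
import os
from collections import defaultdict


def sizes_by_extension(index_dir):
    """Sum file sizes in one index directory, grouped by file extension."""
    totals = defaultdict(int)
    for name in os.listdir(index_dir):
        path = os.path.join(index_dir, name)
        if os.path.isfile(path):
            # Files without an extension (e.g. segments_N) are grouped together.
            ext = os.path.splitext(name)[1] or "(none)"
            totals[ext] += os.path.getsize(path)
    return dict(totals)


def compare_indexes(dir_a, dir_b):
    """Return {extension: (bytes_in_a, bytes_in_b)} for every extension seen."""
    a = sizes_by_extension(dir_a)
    b = sizes_by_extension(dir_b)
    return {ext: (a.get(ext, 0), b.get(ext, 0)) for ext in sorted(set(a) | set(b))}
```

Calling `compare_indexes(old_index_dir, new_index_dir)` on the two `data/index` directories shows whether the gap is confined to .pos or spread across other postings files as well.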

Solr /Lucene Payload loading

2016-03-03 Thread KNitin
Hi, I am indexing a bunch of payloads with terms in Solr. I notice during query time that the IO reads increase a lot every time I require the payload to be fetched. Does Solr load the payload from disk all the time? Is there any way to force it to be loaded into memory? Thanks, Nitin

SolrCloud shards marked as down and Does not recovery connection to zk

2016-02-18 Thread KNitin
Hi, I am indexing about 5M docs in a 4 shard and 1 replica setup. During indexing one of the shards is marked as down in ZooKeeper, but when I tail the logs all the updates are received in the shard and a hard commit at the end of the job also succeeds. (The auto commit is set to trigger every 10

Re: SolrCloud shard marked as down and "reloading" collection doesnt restore it

2016-02-11 Thread KNitin
After more debugging, I figured out that it is related to this: https://issues.apache.org/jira/browse/SOLR-3274 Is there a recommended fix (apart from running a zk ensemble?) On Thu, Feb 11, 2016 at 10:29 AM, KNitin wrote: > Hi, > > I noticed while running an indexing job (2M doc

SolrCloud shard marked as down and "reloading" collection doesnt restore it

2016-02-11 Thread KNitin
Hi, I noticed while running an indexing job (2M docs but per doc size could be 2-3 MB) that one of the shards goes down just after the commit. (Not related to OOM or high cpu/load). This marks the shard as "down" in zk and even a reload of the collection does not recover the state. There are n

Re: Monitor backup progress when location parameter is used.

2016-01-31 Thread KNitin
You can also check out https://github.com/bloomreach/solrcloud-haft for doing backup and restore of your SolrCloud collections. On Fri, Jan 15, 2016 at 12:23 AM, Gian Maria Ricci - aka Alkampfer < alkamp...@nablasoft.com> wrote: > Ok thanks, I also think that it's worth a jira, because for resto

Transaction Log rotation /retention setup

2016-01-22 Thread KNitin
Hi, I was wondering if txn logs obey any log rotation setup rules. Sometimes indexing can get pretty large and txn logs grow up to tens of gigabytes (occupying disk which eventually needs to be cleaned up) or as indexing is progressing and a commit had been made, I want to delete old txn log to save

Re: Specifying a different txn log directory

2016-01-09 Thread KNitin
> > > > Best, > > Erick > > > > On Fri, Jan 8, 2016 at 7:47 PM, KNitin wrote: > > > Hi, > > > > > > How do I specify a different directory for transaction logs? I tried > > using > > > the updatelog entry in solrconfig.xml a

Specifying a different txn log directory

2016-01-08 Thread KNitin
Hi, How do I specify a different directory for transaction logs? I tried using the updatelog entry in solrconfig.xml and reloaded the collection but that does not seem to work. Is there another setting I need to change? Thanks Nitin

Re: Field Size per document in Solr

2016-01-05 Thread KNitin
elds. > > Your best bet is to index field lengths into Solr alongside the field > values. You could use an UpdateProcessor to do this if you want to do it > in Solr. > > Upayavira > > On Tue, Jan 5, 2016, at 12:39 AM, KNitin wrote: > > Hi, > > > > I want t
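
The reply above suggests indexing field lengths alongside the field values, for example with an UpdateProcessor. The same idea can be sketched on the client side, before documents are sent to Solr. This is an illustration only: the `_length` suffix and the function name are hypothetical, the schema would need matching fields (for instance a dynamic field), and the value recorded is the UTF-8 byte length of the input, not the on-disk size in the index.

```python
def add_length_fields(doc, suffix="_length"):
    """Return a copy of doc with a companion length field for each text field.

    Lengths are UTF-8 byte counts of the raw values, not index sizes.
    """
    enriched = dict(doc)
    for name, value in doc.items():
        if isinstance(value, str):
            enriched[name + suffix] = len(value.encode("utf-8"))
        elif isinstance(value, (list, tuple)) and all(isinstance(v, str) for v in value):
            # Multi-valued field: record the total across all values.
            enriched[name + suffix] = sum(len(v.encode("utf-8")) for v in value)
    return enriched
```

With the lengths stored as ordinary numeric fields, per-field totals for a collection could then be obtained with a regular stats or facet query rather than by inspecting index files.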

Field Size per document in Solr

2016-01-04 Thread KNitin
Hi, I want to get the size of individual fields per document (or per index) in SolrCloud. Is there a way to do this using the existing Solr or Lucene API? *Use case*: I have a few dynamic fields which may or may not be populated every day depending on certain conditions. I also do faceting and some cu

Re: Max indexing threads & RamBuffered size

2015-12-07 Thread KNitin
> Best, > Erick > > On Sat, Dec 5, 2015 at 8:07 PM, KNitin > wrote: > > I have an extremely large indexing load (per doc size of 4-5 Mb with over > > 100M docs). I have auto commit settings to flush to disk (with open > > searcher as false) every 20 seconds. Even with t

Re: Max indexing threads & RamBuffered size

2015-12-05 Thread KNitin
very high indexing loads or > really long autocommit times, you'll rarely hit it anyway since this > memory is also flushed when you do any flavor of hard commit. > > Best, > Erick > > On Fri, Dec 4, 2015 at 4:55 PM, KNitin wrote: > > Hi, > > > > The ma

Max indexing threads & RamBuffered size

2015-12-04 Thread KNitin
Hi, The max indexing threads in the solrconfig.xml is set to 8 by default. Does this mean only 8 concurrent indexing threads will be allowed per collection level? Or per core level? Buffered size: This seems to be set at 64Mb. If we have a beefier machine that can take more load, can we set this t

Re: Generating Index offline and loading into solrcloud

2015-11-19 Thread KNitin
> of documents based on uniqueKey. That process is > not guaranteed by MRIT is all. > > Best, > Erick > > On Thu, Nov 19, 2015 at 12:56 PM, KNitin wrote: > > Thanks, Eric. Looks like MRIT uses Embedded solr running per > > mapper/reducer and uses that to index documents.

Re: Generating Index offline and loading into solrcloud

2015-11-19 Thread KNitin
ption will automatically merge the indexes into > the appropriate > running Solr instances. > > One caveat. This tool doesn't handle _updating_ documents. So if you > run it twice > on the same data set, you'll have two copies of every doc. It's > designed as a bul

Re: Generating Index offline and loading into solrcloud

2015-11-19 Thread KNitin
> *Sameer Maggon* > Measured Search > www.measuredsearch.com <http://measuredsearch.com/> > > On Thu, Nov 19, 2015 at 11:17 AM, KNitin wrote: > > > Hi, > > > > I was wondering if there are existing tools that will generate solr > index > > offline (in solrcloud

Generating Index offline and loading into solrcloud

2015-11-19 Thread KNitin
Hi, I was wondering if there are existing tools that will generate solr index offline (in solrcloud mode) that can be later on loaded into solrcloud, before I decide to implement my own. I found some tools that do only solr based index loading (non-zk mode). Is there one with zk mode enabled?

Re: Data Import Handler / Backup indexes

2015-11-17 Thread KNitin
AFAIK, the Data Import Handler does not offer backups. You can try using the replication handler to back up data as you wish to any custom end point. You can also try out https://github.com/bloomreach/solrcloud-haft. This helps back up Solr indices across clusters. On Tue, Nov 17, 2015 at 7:08 AM, Br

Re: Best way to backup and restore an index for a cloud setup in 4.6.1?

2015-11-17 Thread KNitin
You can use solrcloud haft : https://github.com/bloomreach/solrcloud-haft We use it in our production against 4.6.1. Nitin On Monday, May 11, 2015, Shalin Shekhar Mangar wrote: > Hi John, > > There are a few HTTP APIs for replication, one of which can let you take a > backup of the index. Rest

Re: Replication as backup in SolrCloud

2015-11-15 Thread KNitin
We built and open sourced haft precisely for such use cases. https://github.com/bloomreach/solrcloud-haft You can clone an entire cluster or selective collections between clusters. It has only been tested up to Solr 4.10. Let me know if you run into i

Re: copy data between collection

2015-11-14 Thread KNitin
Yes, that is correct. https://github.com/bloomreach/solrcloud-haft helps precisely with that. You can clone an entire cluster or selective collections between clusters. It has only been tested up to Solr 4.10. Let me know if you run into issues. Nitin On Mon, Oct 26, 2015 at 9:46 AM, Jeff Wartes wro

Re: Disabling Query result cache at runtime

2015-11-13 Thread KNitin
e}foo:bar&... work in this case? > > On Fri, Nov 13, 2015 at 9:31 PM, KNitin wrote: > > > Hi, > > > > Is there a way to make solr not cache the results when we send the > query? > > (mainly for query result cache). I need to still enable doc and filter >

Disabling Query result cache at runtime

2015-11-13 Thread KNitin
Hi, Is there a way to make Solr not cache the results when we send the query (mainly for the query result cache)? I still need to keep doc and filter caching enabled. Let me know if this is possible. Thanks, Nitin
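
The reply in this thread appears to point at the `cache=false` local parameter on the main query. A minimal sketch of building such a request follows, assuming a standard `/select` handler; the base URL and function name are hypothetical, and whether the parameter bypasses the query result cache on a given Solr version should be verified against the cache statistics in the admin UI:

```python
from urllib.parse import urlencode


def uncached_select_url(base_url, query, **params):
    """Build a /select URL whose main query carries the cache=false local param."""
    all_params = {"q": "{!cache=false}" + query, "wt": "json"}
    all_params.update(params)  # extra request parameters such as rows or fq
    return base_url.rstrip("/") + "/select?" + urlencode(all_params)
```

Because the local parameter is attached to `q` only, any `fq` clauses passed through `params` are left to use the filter cache as usual.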

Raw lucene query for a given solr query

2015-06-15 Thread KNitin
Hi, We have a few custom SolrCloud components that act as value sources inside SolrCloud for boosting items in the index. I want to get the final raw Lucene query used by Solr for querying the index (for debugging purposes). Is it possible to get that information? Kindly advise. Thanks, Nitin
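
One common way to see what the query parser produced is to send the request with `debugQuery=true` and read the `parsedquery` entries from the `debug` section of the response. The sketch below assumes a JSON response from a standard `/select` handler; the base URL and function names are hypothetical, and the parsed form shown by Solr may not reflect everything a custom component does after parsing:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def debug_query_url(base_url, query):
    """Build a /select URL that asks Solr for debug output and no documents."""
    params = {"q": query, "wt": "json", "debugQuery": "true", "rows": 0}
    return base_url.rstrip("/") + "/select?" + urlencode(params)


def parsed_query(response):
    """Pull the query-parsing entries out of a decoded Solr JSON response."""
    debug = response.get("debug", {})
    keys = ("rawquerystring", "querystring", "parsedquery", "parsedquery_toString")
    return {key: debug.get(key) for key in keys}


def fetch_parsed_query(base_url, query):
    """Run the query against a live Solr node and return its parsed form."""
    with urlopen(debug_query_url(base_url, query)) as resp:
        return parsed_query(json.load(resp))
```

`parsed_query` is kept separate from the HTTP call so it can also be applied to a response saved from the admin UI.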

Re: On the fly reloading of solr core properties

2015-04-29 Thread KNitin
Hi I would really appreciate it if any of you can share your insights with such a use case. Thanks much Nitin On Tuesday, April 28, 2015, KNitin wrote: > Hi > > In Solrcloud (4.6.1) every time a property/value is changed in > solrcore.properties file, a core/collection reload

On the fly reloading of solr core properties

2015-04-28 Thread KNitin
Hi In Solrcloud (4.6.1) every time a property/value is changed in solrcore.properties file, a core/collection reload is needed to pick up the new values. Core/Collection reloads for large collections (example 100 shards) are very expensive (performance wise) and can pose a threat to the collectio

Understanding SolrCloud Restart Behavior - 4.6 onwards

2015-01-12 Thread KNitin
Hi I am trying to understand the process/node restart flow in a SolrCloud cluster. What is the exact set of steps that occur (like core/collection recovery, zk interaction etc) when a node is getting restarted? I am looking to implement some business logic at a collection/node level when solr is r

Re: Loading an index (generated by map reduce) in SolrCloud

2014-09-23 Thread KNitin
ive option to auto-merge them. Your > > > Solr instances need to be running over HDFS > > > though. > > > > > > If you don't have Solr running over HDFS, you can > > > just copy the results for each shard "to the right place". > >

Loading an index (generated by map reduce) in SolrCloud

2014-09-17 Thread KNitin
Hello I have generated a lucene index (with 6 shards) using Map Reduce. I want to load this into a SolrCloud Cluster inside a collection. Is there any out of the box way of doing this? Any ideas are much appreciated Thanks Nitin

Re: Disabling transaction logs

2014-08-13 Thread KNitin
e's the > link: > https://issues.apache.org/jira/browse/SOLR-5473 > > A lot of work has already been done on that one and hopefully, it > should be in trunk soon. > > > On Wed, Aug 13, 2014 at 3:13 PM, KNitin wrote: > > Thanks, Mark. Yes I keep track of the overseer a

Re: Disabling transaction logs

2014-08-13 Thread KNitin
hat 30%. > Might open a JIRA with some logs. > > It can help if you restart the overseer node last. > > There are likely some improvements around this post 4.6. > > -- > Mark Miller > about.me/markrmiller > > On August 13, 2014 at 12:05:27 PM, KNitin (nitin.t...@gmai

Re: Disabling transaction logs

2014-08-13 Thread KNitin
Thank you all! Yes, I want to disable it for testing purposes. The main issue is that rolling restart of SolrCloud for 1000 collections is extremely unreliable and slow. More than 30% of the collections fail to recover. What are some good guidelines to follow while restarting a massive cluster like t

Disabling transaction logs

2014-08-07 Thread KNitin
Hello I am using Solr 4.6.1 with over 1000 collections and 8 nodes. Restarting of nodes takes a long time (especially if we have indexing running against it). I want to see if disabling transaction logs can help with a more robust restart. However I can't see any docs around disabling txn logs

Re: Cannot get shard id error - Hitting limits on creating collections

2014-04-08 Thread KNitin
Thanks, Shawn. Adding it to all clients and servers worked. On Tue, Apr 8, 2014 at 3:37 PM, KNitin wrote: > Thanks. I missed "the clients" part from doc. Will try and update the > results here > > > > > On Tue, Apr 8, 2014 at 3:27 PM, Shawn Heisey wrote: > >

Re: Cannot get shard id error - Hitting limits on creating collections

2014-04-08 Thread KNitin
Thanks. I missed "the clients" part from doc. Will try and update the results here On Tue, Apr 8, 2014 at 3:27 PM, Shawn Heisey wrote: > On 4/8/2014 4:13 PM, KNitin wrote: > >> I have already raised the jute.buffersize to 5Mb on the zookeeper server >> side but st

Re: Cannot get shard id error - Hitting limits on creating collections

2014-04-08 Thread KNitin
Thanks, Shawn. I have already raised the jute.buffersize to 5Mb on the zookeeper server side but am still hitting the same problem. Should I make any changes on the solr server side for this (client side changes?) On Tue, Apr 8, 2014 at 9:09 AM, Shawn Heisey wrote: > On 4/8/2014 9:48 AM, KNi

Cannot get shard id error - Hitting limits on creating collections

2014-04-08 Thread KNitin
Hi I am running solr cloud 4.3.1 (there is a plan to upgrade to later versions but that would take a few months). I noticed a very peculiar behavior in Solr: beyond *2496* cores I am unable to create any more collections due to this error *Could not get shard id for core.* I also n

Re: Race condition in Leader Election

2014-03-06 Thread KNitin
I am using 4.3.1. On Thu, Mar 6, 2014 at 11:48 AM, Mark Miller wrote: > Are you using an old version? > > - Mark > > http://about.me/markrmiller > > On Mar 6, 2014, at 11:50 AM, KNitin wrote: > > > Hi > > > > When restarting a node in solr

Race condition in Leader Election

2014-03-06 Thread KNitin
Hi When restarting a node in solrcloud, I run into scenarios where both the replicas for a shard get into "recovering" state and never come up, causing the error "No servers hosting this shard". To fix this, I either unload one core or restart one of the nodes again so that one of them becomes the

Re: Solr Cloud Cores, Zookeepers and Zk Data

2014-03-05 Thread KNitin
I should also mention that the "watch count" is in the order of 400-500 but the maxClientConnections is 100. Not sure if this has to do with the issue but just putting it out there On Wed, Mar 5, 2014 at 11:37 AM, KNitin wrote: > Hi > > I am trying to understand the

Solr Cloud Cores, Zookeepers and Zk Data

2014-03-05 Thread KNitin
Hi I am trying to understand the flow between zk and SolrCloud nodes during writes and restarts. *Writes*: When an indexing job runs , it looks like the leader for every shard is identified from zk and the write requests goes to the leader and then eventually data flows to replicas. Question

Re: SolrCloud Startup

2014-03-04 Thread KNitin
PM, Shawn Heisey wrote: > On 3/4/2014 3:09 PM, KNitin wrote: > >> I did the following as you suggested. I have a lib dir under /mnt/solr/ >> (this is the solr.solr.home dir) and moved all my jars in it. I do not >> have >> anySharedLib or lib references in m

Re: SolrCloud Startup

2014-03-04 Thread KNitin
. Should I specify anywhere to use /mnt/solr/lib/ as the lib path to use anywhere? - Nitin On Mon, Mar 3, 2014 at 3:06 PM, KNitin wrote: > Thanks, Shawn. Right now my solr.solr.home is not being passed from the > java runtime > > Lets say /mnt/solr/ is my solr root. I can add all

Re: SolrCloud Startup

2014-03-03 Thread KNitin
: > On 3/3/2014 3:30 PM, KNitin wrote: > >> A quick ping on this. To give more stats, I have 100's of collections on >> every node. The time it takes for one collection to boot up /loadonStartup >> is around 10-20 seconds (and sometimes even 1 minute). I do not have a

Re: SolrCloud Startup

2014-03-03 Thread KNitin
ks - Nitin On Wed, Feb 26, 2014 at 3:06 PM, KNitin wrote: > Thanks, Shawn. I will try to upgrade solr soon > > Reg firstSearcher: I think it does nothing now. I have configured to use > ExternalFileLoader but there the external file has no contents. Most of the > queries h

Re: Solr Heap, MMaps and Garbage Collection

2014-03-03 Thread KNitin
Is there a way to dump the contents of permgen and look at which classes are occupying the most memory in that? - Nitin On Mon, Mar 3, 2014 at 11:19 AM, KNitin wrote: > Regarding PermGen: Yes we have a bunch of custom jars loaded in solrcloud > (containing custom parsing, analyzers).

Re: Solr Heap, MMaps and Garbage Collection

2014-03-03 Thread KNitin
s are practically 0 for the large collection since our queries are tail by nature Thanks Nitin On Mon, Mar 3, 2014 at 5:01 AM, Michael Sokolov < msoko...@safaribooksonline.com> wrote: > On 3/3/2014 1:54 AM, KNitin wrote: > >> 3. 2.8 Gb - Perm Gen (I am guessing this is beca

Re: Solr Heap, MMaps and Garbage Collection

2014-03-02 Thread KNitin
he is very small for query results or docs. We run with 10K > or more entries for those. The filter cache size depends on your usage. We > have only a handful of different filter queries, so a tiny cache is fine. > > What is your hit rate on the caches? > > wunder > > On Ma

Solr Heap, MMaps and Garbage Collection

2014-03-02 Thread KNitin
Hi I have a very large index for a few collections and when they are being queried, I see the Old gen space close to 100% usage all the time. The system becomes extremely slow due to GC activity right after that and it gets into this cycle very often. I have given solr close to 30G of heap in a 65 G

Re: Perm Gen issues in SolrCloud

2014-02-28 Thread KNitin
> http://www.brokenbuild.com/blog/2006/08/04/java-jvm-gc-permgen > -and-memory-options/ > " > > You can see the conversation from here: > http://search-lucene.com/m/iMaR11lgj3Q1/permgen&subj=PermGen+OOM+Error > > Thanks; > Furkan KAMACI > > > 2014-02

Perm Gen issues in SolrCloud

2014-02-28 Thread KNitin
Hi I am seeing the Perm Gen usage increase as I keep adding more collections. What kind of strings get interned in Solr? (Only schema, fields, collection metadata or the data itself?) Will Permgen space (at least interned strings) increase proportional to the size of the data in the collections

Re: Tracing Solr Query Execution and Performance

2014-02-26 Thread KNitin
bugQuery > parameters on for inter-node shard queries and then add that to the > aggregated response (if debug/debugQuery was specified.) Sounds worth a > Jira. > > -- Jack Krupansky > > -Original Message- From: KNitin > Sent: Wednesday, February 26, 2014 5:25 PM

Re: SolrCloud Startup

2014-02-26 Thread KNitin
warm the first Searcher/new Searcher? Thanks Nitin On Tue, Feb 25, 2014 at 4:12 PM, Shawn Heisey wrote: > On 2/25/2014 4:30 PM, KNitin wrote: > >> Jeff : Thanks. I have tried reload before but it is not reliable (atleast >> in 4.3.1). A few cores get initialized and few

Tracing Solr Query Execution and Performance

2014-02-26 Thread KNitin
Hi there I have a few very expensive queries (at least that's what the QTime tells me) that are causing high CPU problems on a few nodes. Is there a way where I can "trace" or do an "explain" on the solr query to see where it spends more time? More like profiling on a per sub query basis? I have t
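
For a rough per-component breakdown, the `debug` section returned with `debugQuery=true` includes a `timing` block listing the time each search component spent in its prepare and process phases. The sketch below ranks components from an already decoded JSON response; the function name is hypothetical, the layout of the `timing` block is assumed to be the usual `{phase: {component: {"time": ms}}}` shape, and as the reply in this thread notes, debug output from inter-node shard requests may not be aggregated on this Solr version, so querying a single core directly may be more informative.

```python
def slowest_components(response, phase="process"):
    """Rank search components by time spent in one phase, slowest first.

    Expects a decoded Solr JSON response that was requested with debugQuery=true.
    """
    timing = response.get("debug", {}).get("timing", {})
    components = timing.get(phase, {})
    pairs = [
        (name, info["time"])
        for name, info in components.items()
        # The phase dict also holds its own scalar "time" total; skip it.
        if isinstance(info, dict) and "time" in info
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)
```

This only separates components (query, facet, highlight and so on); it does not profile individual sub-queries inside the main query.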

Re: SolrCloud Startup

2014-02-25 Thread KNitin
Erick: My autocommit is set to trigger every 30 seconds with openSearcher=false. The autocommit for soft commits are disabled On Tue, Feb 25, 2014 at 3:30 PM, KNitin wrote: > Jeff : Thanks. I have tried reload before but it is not reliable (atleast > in 4.3.1). A few cores get initializ

Re: SolrCloud Startup

2014-02-25 Thread KNitin
Jeff: Thanks. I have tried reload before but it is not reliable (at least in 4.3.1). A few cores get initialized and a few don't (show as just recovering or down) and hence had to move away from it. Is it a known issue in 4.3.1? Shawn, Otis, Erick: Yes, I have reviewed the page before and have given 1

SolrCloud Startup

2014-02-24 Thread KNitin
Hi I have a 4 node solrcloud cluster with more than 50 collections with 4 shards each. Every time I want to make a schema change, I upload configs to zookeeper and then restart all nodes. However the restart of every node is very slow and takes about 20-30 minutes per node. Is it recommended to m

Re: Solr Segments, Segment Merges,Optimize

2014-02-23 Thread KNitin
I should also mention that apart from committing, the pipeline also does a bunch of deletes for stale documents (based on a custom version field). The # of deletes can be very significant causing the % of deleted documents to be easily 40-50% of the index itself Thanks KNitin On Sun, Feb 23

Re: Solr Segments, Segment Merges,Optimize

2014-02-23 Thread KNitin
gh CPU utilization. I suppose you could > issue a commit and see what difference that made. > > I rather doubt that the # of segments is the underlying issue, but that's > nothing but a SWAG... > > Best, > Erick > > > > > On Sat, Feb 22, 2014 at 6:16 P

Re: Solr Segments, Segment Merges,Optimize

2014-02-22 Thread KNitin
is number will have the effect of more aggressively > merging segments with a greater % of deleted docs. But these are already > pretty heavily weighted for merging already... > > > Best, > Erick > > > On Sat, Feb 22, 2014 at 1:23 PM, KNitin wrote: > > > Hi >

Solr Segments, Segment Merges,Optimize

2014-02-22 Thread KNitin
Hi I have the following questions 1. I have a job that runs for 3-4 hours continuously committing data to a collection with auto commit of 30 seconds. Does it mean that every 30 seconds I would get a new solr segment ? 2. My current segment merge policy is set to 10. Will merger al

Re: Tweaking Solr Query Result Cache

2014-02-22 Thread KNitin
tisfy a few pages of results. > > If you mean by "tail queries" that there is very little repetition of > queries, then > why bother with a cache at all? If the hit ratio is going towards 0 it's > not doing > you enough good to matter. > > > FWIW, > Eri

Tweaking Solr Query Result Cache

2014-02-20 Thread KNitin
Hello I have a 4 node cluster running Solr cloud 4.3.1. I have a few large collections sharded 8 ways across all the 4 nodes (with 2 shards per node). The size of the shard for the large collections is around 600-700Mb containing around 250K+ documents. Currently the size of the query cache is

Re: CloudServer 4.2.1 and SolrCloud 4.3.1

2014-02-20 Thread KNitin
Thanks, Shawn. On Thu, Feb 20, 2014 at 11:29 AM, Shawn Heisey wrote: > On 2/20/2014 12:09 PM, KNitin wrote: > >> I have a question on CloudServer client for solrcloud. How does >> CloudServer route requests to solr? Does it use round robin internally or >> does

CloudServer 4.2.1 and SolrCloud 4.3.1

2014-02-20 Thread KNitin
Hi I have a question on CloudServer client for solrcloud. How does CloudServer route requests to solr? Does it use round robin internally or does it take into account any other parameter for the node (example how many replicas it has, etc) ? Thanks Nitin