Re: Down Replica is elected as Leader (solr v8.7.0)

2021-02-11 Thread Rahul Goswami
iii) Wait for 5-10 seconds between each subsequent node start Hope this helps. Best, Rahul On Thu, Feb 11, 2021 at 12:03 PM mmb1234 wrote: > Hello, > > On reboot of one of the solr nodes in the cluster, we often see a > collection's shards with > 1. LEADER replica in DO

Re: StandardTokenizerFactory doesn't split on underscore

2021-01-09 Thread Rahul Goswami
t on underscores if that is your use case. > > On Sat, Jan 9, 2021 at 2:58 PM Rahul Goswami > wrote: > > > Nope. The underscore is preserved right after tokenization even before it > > reaches any filters. You can choose the type "text_general" and try an &

Re: StandardTokenizerFactory doesn't split on underscore

2021-01-09 Thread Rahul Goswami
Nope. The underscore is preserved right after tokenization even before it reaches any filters. You can choose the type "text_general" and try an index time analysis through the "Analysis" page on Solr Admin UI. Thanks, Rahul On Sat, Jan 9, 2021 at 8:22 AM xiefengchan

StandardTokenizerFactory doesn't split on underscore

2021-01-07 Thread Rahul Goswami
this behavior is included in the documentation since it is similar to the behavior with periods. https://lucene.apache.org/solr/guide/6_6/tokenizers.html#Tokenizers-StandardTokenizer "Periods (dots) that are not followed by whitespace are kept as part of the token, including Internet domain names. " Thanks, Rahul

Re: Need urgent help -- High cpu on solr

2020-10-16 Thread Rahul Goswami
ation nevertheless. https://backstage.forgerock.com/knowledge/kb/article/a39551500 The hex number the author talks about in the link above is the native thread id. Best, Rahul On Wed, Oct 14, 2020 at 8:00 AM Erick Erickson wrote: > Zisis makes good points. One other thing is I’d look to >

Re: Question about solr commits

2020-10-08 Thread Rahul Goswami
updates. Is this understanding correct ? Thanks, Rahul On Wed, Oct 7, 2020 at 11:39 PM yaswanth kumar wrote: > Thank you very much both Eric and Shawn > > Sent from my iPhone > > > On Oct 7, 2020, at 10:41 PM, Shawn Heisey wrote: > > > > On 10/7/2020 4:40 PM, yaswant

Re: Solr 7.7 - Few Questions

2020-10-06 Thread Rahul Goswami
l 3. How to scale up the servers for the better performance? >> This is too open ended a question and depends on a lot of factors specific to your environment and use-case :) - Rahul On Tue, Oct 6, 2020 at 4:26 PM Manisha Rahatadkar < manisha.rahatad...@anjusoftware.com> wrote: > Hi

Re: Solr 7.7 - Few Questions

2020-10-04 Thread Rahul Goswami
Charlie, Thanks for providing an alternate approach to doing this. It would be interesting to know how one could go about organizing the docs in this case? (Nested documents?) How would join queries perform on a large index(200 million+ docs)? Thanks, Rahul On Fri, Oct 2, 2020 at 5:55 AM

Re: Solr 7.7 - Few Questions

2020-10-01 Thread Rahul Goswami
count-filter You'll need to configure it in the schema for the "index" analyzer for the data type of the field with large text. Indexing documents of the order of half a GB will definitely come to hurt your operations, if not now, later (think OOM, extremely slow atomic updates, long r

Re: ApacheCon at Home 2020 starts tomorrow!

2020-09-29 Thread Rahul Goswami
Thanks for sharing this Anshum. Day 1 had some really interesting sessions. Missed out on a couple that I would have liked to listen to. Are the recordings of these sessions available anywhere? -Rahul On Mon, Sep 28, 2020 at 7:08 PM Anshum Gupta wrote: > Hey everyone! > > ApacheCo

Re: Delete from Solr console fails

2020-09-26 Thread Rahul Goswami
that I would still expect delete by id to execute in reasonable time, so I would start by looking at what is eating up the CPU in your request. -Rahul On Sat, Sep 26, 2020 at 4:50 AM Goutham Tholpadi wrote: > Thanks Dominique! I just tried deleting a single document using its id. I > &

Re: Delete from Solr console fails

2020-09-24 Thread Rahul Goswami
Goutham, Is the field you are trying to delete by indexed=true in the schema ? If the uniqueKey is indexed=true, does delete by id work for you? ( uniqueKey:value) Also, instead of "Solr Command" if you choose the Document type as "XML" does it make any difference? Rahul On

Re: How to remove duplicate tokens from solr

2020-09-17 Thread Rahul Goswami
rect me if I am wrong!) -Rahul On Thu, Sep 17, 2020 at 2:56 PM Rajdeep Sahoo wrote: > If someone is searching with " tshirt tshirt tshirt tshirt tshirt tshirt" > we need to remove the duplicates and search with tshirt. > > > On Fri, 18 Sep, 2020, 12:19 am Alexandre Rafalo

Re: [EXTERNAL] Getting rid of Master/Slave nomenclature in Solr

2020-06-18 Thread Rahul Goswami
I agree with Phill, Noble and Ilan above. The problematic term is "slave" (not master) which I am all for changing if it causes less regression than removing BOTH master and slave. Since some people have pointed out Github changing the "master" terminology, in my personal opinion, it was not a meas

Re: Getting rid of Master/Slave nomenclature in Solr

2020-06-17 Thread Rahul Goswami
+1 on avoiding SolrCloud terminology. In the interest of keeping it obvious and simple, may I please suggest primary/secondary? On Wed, Jun 17, 2020 at 5:14 PM Atita Arora wrote: > I agree avoiding using of solr cloud terminology too. > > I may suggest going for "prime" and "clone" > (Short an

Re: when to use docvalue

2020-05-20 Thread Rahul Goswami
) stored=false and docValues=true 3) stored=true and docValues=true Thanks, Rahul On Tue, May 19, 2020 at 5:55 PM Erick Erickson wrote: > They are _absolutely_ able to be used together. Background: > > “In the bad old days”, there was no docValues. So whenever you needed > to facet/so

Re: Solr filter cache hits not reflecting

2020-04-20 Thread Rahul Goswami
Hoss, Thank you for such a succinct explanation! I was not aware of the order of lookups (queryResultCache followed by filterCache). Makes sense now. Sorry for the false alarm! Rahul On Mon, Apr 20, 2020 at 4:04 PM Chris Hostetter wrote: > : 4) A query with different fq. > :

Re: Solr filter cache hits not reflecting

2020-04-20 Thread Rahul Goswami
"item_manu:samsung manu:apple":"SortedIntDocSet{size=2,ramUsed=40 bytes}", "warmupTime":0, "maxRamMB":-1, 5) A query with the same fq again (fq=manu:samsung OR manu:apple) the numbers don't get updated for this fq hereafter for subseque

Solr filter cache hits not reflecting

2020-04-20 Thread Rahul Goswami
ted. However, if I search with the same fq again, I expect the lookup and hits count to increase, but it doesn't. This ultimately results in an incorrect hitratio. I tried this scenario on Solr 7.2.1, 7.7.2 and 8.5 and observe the same behavior on all three versions. Is this a bug or am I missing something here? Thanks, Rahul

Re: Zookeeper upgrade required with Solr upgrade?

2020-02-13 Thread Rahul Goswami
eb 13, 2020 at 9:26 AM Erick Erickson wrote: > That should be OK. There were no code changes necessary for that upgrade. > see SOLR-13363 > > > On Feb 12, 2020, at 5:34 PM, Rahul Goswami > wrote: > > > > Hello, > > We are running a SolrCloud (7.2.1) cluster an

Zookeeper upgrade required with Solr upgrade?

2020-02-12 Thread Rahul Goswami
updates requests for a 2 node SolrCloud cluster with the older (3.4.10) zookeeper and it seemed to work fine. But just want to know if there are any caveats I should be aware of. Thanks, Rahul

Performance comparison for wildcard searches

2020-02-03 Thread Rahul Goswami
Hello, I am working with Solr 7.2.1 and had a question regarding the performance of wildcard searches. q=*:* vs q=id:* vs q=id:[* TO *] Can someone please rank them in the order of performance with the underlying reason? Thanks, Rahul

Re: How expensive is core loading?

2020-01-29 Thread Rahul Goswami
l documents and the index size (to gather stats about the Solr server), is the amount of memory consumed proportional to the index size in some way? Thanks, Rahul On Wed, Jan 29, 2020 at 6:43 PM Shawn Heisey wrote: > On 1/29/2020 3:01 PM, Rahul Goswami wrote: > > 1) How expensive is c

Re: How expensive is core loading?

2020-01-29 Thread Rahul Goswami
Thanks for your response Walter. But I could not find a Java api for Luke for writing my tool. Is there one? I also tried using the LukeRequestHandler that comes with Solr, but invoking it causes the Solr core to be loaded. Rahul On Wed, Jan 29, 2020 at 5:20 PM Walter Underwood wrote: >

How expensive is core loading?

2020-01-29 Thread Rahul Goswami
production setup with above configuration? Thanks, Rahul

Solr indexing performance

2019-12-05 Thread Rahul Goswami
for better application design considerations. Thanks, Rahul

Re: [ANNOUNCE] Apache Solr 8.3.1 released

2019-12-04 Thread Rahul Goswami
s. Is it linked appropriately? Or is it some access rights issue for non-PMC members like me ? Thanks, Rahul On Wed, Dec 4, 2019 at 7:12 AM Noble Paul wrote: > Thanks ishan > > On Wed, Dec 4, 2019, 3:32 PM Ishan Chattopadhyaya < > ichattopadhy...@gmail.com> > wrote: > &g

Re: Solr 8.2 indexing issues

2019-11-21 Thread Rahul Goswami
Hi Sujatha, How did you upgrade your cluster ? Did you restart each node in the cluster one by one after upgrade (while other nodes were running on 6.6.2) or did you bring down the entire cluster and bring up one upgraded node at a time? Thanks, Rahul On Thu, Nov 14, 2019 at 7:03 AM Paras

Re: Upgrade solr from 7.2.1 to 8.2

2019-11-19 Thread Rahul Goswami
Hello, Just wanted to follow up in case my question fell through the cracks :) Would appreciate help on this. Thanks, Rahul On Fri, Nov 15, 2019 at 5:32 PM Rahul Goswami wrote: > Hello, > > We are planning to upgrade our SolrCloud cluster from 7.2.1 (hosted on > Windows server)

Upgrade solr from 7.2.1 to 8.2

2019-11-15 Thread Rahul Goswami
n that case? Thanks in advance! Regards, Rahul

Re: Custom update processor not kicking in

2019-09-19 Thread Rahul Goswami
any further custom processors other than the run update processor in standalone mode? Alternatively, is there a way I can get a handle on a complete document once it’s reconstructed from an atomic update? Thanks, Rahul On Thu, Sep 19, 2019 at 7:06 AM Erick Erickson wrote: > _Why_ is reindex

Re: Custom update processor not kicking in

2019-09-18 Thread Rahul Goswami
the processAdd() of the processor. Is this an expected behavior? Regards, Rahul On Wed, Sep 18, 2019 at 5:28 PM Erick Erickson wrote: > It Depends (tm). This is a little confused. Why do you have > distributed processor in stand-alone Solr? Stand-alone doesn't, well, > distrib

Custom update processor not kicking in

2019-09-18 Thread Rahul Goswami
don’t see any log lines from the processAdd() method. Any inputs on why the processor is getting skipped if placed after distributed processor? Thanks, Rahul

java.lang.OutOfMemoryError: Java heap space

2019-07-24 Thread Mandava, Rahul
I am using Solr version 6.6.0 and the heap size is set to 512 MB, which I believe is the default. We have almost 10 million documents in the index, and we perform frequent updates to the index (we do a soft commit on every update; the heap issue was seen with and without soft commit) and obviousl

Re: SolrCloud indexing triggers merges and timeouts

2019-07-12 Thread Rahul Goswami
y one huge document ? 2) If yes, does this flush create a segment with just one document ? 3) Heap dump analysis shows large (>350 MB) instances of DocumentsWriterPerThread. Does one instance of this class correspond to one document? Help is much appreciated. Thanks, Rahul On Fri, Jul 5, 20

Re: SolrCloud indexing triggers merges and timeouts

2019-07-04 Thread Rahul Goswami
Shawn, Erick, Thank you for the explanation. The merge scheduler params make sense now. Thanks, Rahul On Wed, Jul 3, 2019 at 11:30 AM Erick Erickson wrote: > Two more tidbits to add to Shawn’s explanation: > > There are heuristics built in to ConcurrentMergeScheduler. > From

Re: SolrCloud indexing triggers merges and timeouts

2019-07-02 Thread Rahul Goswami
iculty wrapping my head around this, and would appreciate if you could help clear it for me. Thanks, Rahul On Thu, Jun 13, 2019 at 7:33 AM Shawn Heisey wrote: > On 6/6/2019 9:00 AM, Rahul Goswami wrote: > > *OP Reply* : Total 48 GB per node... I couldn't see another software > us

Re: Configuration recommendation for SolrCloud

2019-07-01 Thread Rahul Goswami
beefy physical servers at disposal for this deployment. If we go with 4 SolrClouds then we would have 4x8=32 nodes (Solr instances) running across these 4 physical servers. Any issues that you might see with this configuration or additional considerations that I might be missing? Thanks, Rahul

Configuration recommendation for SolrCloud

2019-06-25 Thread Rahul Goswami
efficient for our use case considering moderate-heavy indexing and search load? Would also like to know the tradeoffs involved if any. Thanks in advance! Regards, Rahul

Re: SolrCloud: Configured socket timeouts not reflecting

2019-06-24 Thread Rahul Goswami
r this part is different on the master. Regards, Rahul On Thu, Jun 20, 2019 at 8:22 PM Rahul Goswami wrote: > Hi Gus, > Thanks for the response and referencing the umbrella JIRA for these kind > of issues. I see that it won't solve the problem since the builder object > wh

Re: SolrCloud: Configured socket timeouts not reflecting

2019-06-20 Thread Rahul Goswami
binary to try the patch nevertheless, but it didn't help as I anticipated. I'll update the JIRA and submit a patch. Thank you, Rahul On Thu, Jun 20, 2019 at 11:35 AM Gus Heck wrote: > Hi Rahul, > > Did you try the patch int that issue? Also food for thought: > https://is

Re: SolrCloud: Configured socket timeouts not reflecting

2019-06-18 Thread Rahul Goswami
teShardHandlerConfig().getDistributedSocketTimeout(); } I found this open JIRA on this issue: https://issues.apache.org/jira/browse/SOLR-12550?jql=text%20~%20%22distribUpdateSoTimeout%22 Should I update the JIRA with this ? Thanks, Rahul On Thu, Jun 13, 2019 at 12:00 AM Rahul Goswami wrote: > Hello, >

SolrCloud: Configured socket timeouts not reflecting

2019-06-12 Thread Rahul Goswami
, is there a JIRA for it ? Thanks, Rahul

Re: SolrCloud indexing triggers merges and timeouts

2019-06-12 Thread Rahul Goswami
/measures. Thanks, Rahul On Thu, Jun 6, 2019 at 11:00 AM Rahul Goswami wrote: > Thank you for your responses. Please find additional details about the > setup below: > > We are using Solr 7.2.1 > > > I have a solrcloud setup on Windows server with below config: > >

Re: SolrCloud indexing triggers merges and timeouts

2019-06-06 Thread Rahul Goswami
ndex.ConcurrentMergeScheduler", "maxMergeCount":2, "maxThreadCount":2}, Thanks, Rahul On Wed, Jun 5, 2019 at 4:24 PM Shawn Heisey wrote: > On 6/5/2019 9:39 AM, Rahul Goswami wrote: > > I have a solrcloud setup on Windows server with below config: > >

SolrCloud indexing triggers merges and timeouts

2019-06-05 Thread Rahul Goswami
that this is the cause, and the timeouts and recoveries are the symptoms. Is my understanding correct? If yes, what steps could I take to help the situation. I do see that the difference between "Num Docs" and "Max Docs" is about 20%. Would appreciate your help. Thanks, Rahul

Re: Graph query extremely slow

2019-06-01 Thread Rahul Goswami
, since the parameters of this fq don't change shouldn't I expect to gain any advantage out of using the filterCache? Thanks, Rahul On Wed, May 22, 2019 at 7:40 AM Toke Eskildsen wrote: > On Wed, 2019-05-15 at 21:37 -0400, Rahul Goswami wrote: > > fq={!graph from=from_field to=

Solr exception while retrieving documents

2019-05-31 Thread Mandava, Rahul
on in Solr log files. I am thinking that seeing errors in log files doesn't hurt as long as the updates and gets work fine, but I would still like to know how to prevent these errors from happening. Thanks Rahul Mandava

Re: Graph query extremely slow

2019-05-19 Thread Rahul Goswami
Hello experts, Just following up in case my previous email got lost in the big stack of queries. Would appreciate any help on optimizing a graph query. Or any pointers on the direction to investigate. Thanks, Rahul On Wed, May 15, 2019 at 9:37 PM Rahul Goswami wrote: > Hello, >

Graph query extremely slow

2019-05-15 Thread Rahul Goswami
optimizations that I could try? Thanks, Rahul

Re: Delay searches till log replay finishes

2019-03-21 Thread Rahul Goswami
'll continue to monitor this for now. Thanks, Rahul On Fri, Mar 8, 2019 at 2:14 PM Erick Erickson wrote: > (1) no, and Shawn’s comments are well taken. > > (2) bq. is the number of segments would drastically increase > > Not true. First of all, TieredMergePolicy will take care of m

Re: Delay searches till log replay finishes

2019-03-08 Thread Rahul Goswami
autoCommit interval (with openSearcher=false) is the number of segments that would drastically increase, eventually causing merges, slower searches, etc. Thanks, Rahul On Fri, Mar 8, 2019 at 12:08 PM Erick Erickson wrote: > Yes, you’ll get stale values. There’s no way I know of to change that, >

Re: Delay searches till log replay finishes

2019-03-08 Thread Rahul Goswami
1 On Thu, Mar 7, 2019 at 11:36 PM Zheng Lin Edwin Yeo wrote: > Hi, > > Do you mean that when you startup Solr, it will automatically do the search > request even before the Solr is fully started up? > > Regards, > Edwin > > > On Fri, 8 Mar 2019 at 10:13, Rahul Goswami

Delay searches till log replay finishes

2019-03-07 Thread Rahul Goswami
results, which in turn has a cascading effect on other parts of the application. Is there a setting in Solr which would prevent Solr from serving search requests before log replay has finished? Thanks, Rahul

Re: Full index replication upon service restart

2019-02-21 Thread Rahul Goswami
in Solr to know whether a replica is falling behind from the leader ? Thanks, Rahul On Mon, Feb 11, 2019 at 10:28 PM Erick Erickson wrote: > bq. To answer your question about index size on > disk, it is 3 TB on every node. As mentioned it's a 32 GB machine and I > allocated 24G

Re: Full index replication upon service restart

2019-02-11 Thread Rahul Goswami
our currentUpdates Regards, Rahul On Thu, Feb 7, 2019 at 12:59 PM Erick Erickson wrote: > bq. We have a heavy indexing load of about 10,000 documents every 150 > seconds. > Not so heavy query load. > > It's unlikely that changing numRecordsToKeep will help all that much if > y

Full index replication upon service restart

2019-02-05 Thread Rahul Goswami
47C-6673-4778-847D-2DE0FDE56C66_shard12_replica_n46] org.apache.solr.update.PeerSync PeerSync: core=DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard12_replica_n46 url= http://indexnode1:2/solr too many updates received since start - startingUpdates no longer overlaps with our currentUpdates Thanks, Rahul

Re: SPLITSHARD not working as expected

2019-01-30 Thread Rahul Goswami
created post split? Regards, Rahul On Wed, Jan 30, 2019 at 1:18 AM Rahul Goswami wrote: > Thanks for the reply Jan. I have been referring to documentation for > SPLITSHARD on 7.2.1 > <https://lucene.apache.org/solr/guide/7_2/collections-api.html#splitshard> > which > see

Re: Error using collapse parser with /export

2019-01-29 Thread Rahul Goswami
sc",fl="fileld1,field2,field3",qt="/export",q="*:*",fq="((field4:1) OR (field4:2))",fq="{!collapse field=id_field sort='field3 desc'}") The same query with "select" handler does return the collapse result fine. Looks like this m

Re: SPLITSHARD not working as expected

2019-01-29 Thread Rahul Goswami
ink you need a > screenshot here, what you describe is the default behaviour. > > -- > Jan Høydahl, search solution architect > Cominvent AS - www.cominvent.com > > > 28. jan. 2019 kl. 09:05 skrev Rahul Goswami : > > > > Hello, > > I am using Solr 7.2.1. I c

SPLITSHARD not working as expected

2019-01-28 Thread Rahul Goswami
mage.png] Thanks, Rahul

Re: Error using collapse parser with /export

2019-01-27 Thread Rahul Goswami
ve is coming from documents not present in the same shard. I'll verify this tomorrow and update the thread. Thanks, Rahul On Mon, Jan 21, 2019 at 2:26 PM Joel Bernstein wrote: > I haven't had time to look into the details of this issue but it's not > clear that these two fea

Re: Error using collapse parser with /export

2019-01-20 Thread Rahul Goswami
Hello, Following up on my query. I know this might be too specific an issue. But I just want to know that it's a legitimate bug and the supported operation is allowed with the /export handler. If someone has an idea about this and could confirm, that would be great. Thanks, Rahul On Thu, J

Error using collapse parser with /export

2019-01-17 Thread Rahul Goswami
Hello, I am using SolrCloud on Solr 7.2.1. I get the NullPointerException in the Solr logs (in ExportWriter.java) when the /stream handler is invoked with a search() streaming expression with qt="/export" containing fq="{!collapse field=id_field sort="time desc"} (among other fq's. I tried elimina

Re: Able to search with indexed=false and docvalues=true

2018-11-20 Thread Rahul Goswami
particularly functional for any industry size load anyway. Thanks, Rahul On Tue, Nov 20, 2018 at 3:37 AM Toke Eskildsen wrote: > On Mon, 2018-11-19 at 22:19 -0500, Rahul Goswami wrote: > > I am using SolrCloud 7.2.1. My understanding is that setting > > docvalues=true would optimize fac

Re: Error:Missing Required Fields for Atomic Updates

2018-11-19 Thread Rahul Goswami
What is the Router name for your collection? Is it "implicit" (You can know this from the "Overview" of your collection in the admin UI)? If yes, what is the router.field parameter the collection was created with? Rahul On Mon, Nov 19, 2018 at 11:19 PM Rajeswari Koll

Re: Error:Missing Required Fields for Atomic Updates

2018-11-19 Thread Rahul Goswami
What’s your update query? You need to provide the unique id field of the document you are updating. Rahul On Mon, Nov 19, 2018 at 10:58 PM Rajeswari Kolluri < rajeswari.koll...@oracle.com> wrote: > Hi, > > > > > > Using Solr 7.5.0. While performing atomic upd

Able to search with indexed=false and docvalues=true

2018-11-19 Thread Rahul Goswami
I am using SolrCloud 7.2.1. My understanding is that setting docvalues=true would optimize faceting, grouping and sorting; but for a field to be searchable it needs to be indexed=true. However I was dumbfounded today when I executed a successful search on a field with below configuration: However

Re: Explode kind of function in Solr

2018-09-14 Thread Rahul Singh
https://github.com/bazaarvoice/jolt On Thu, Sep 13, 2018 at 9:18 AM Joel Bernstein wrote: > Solr Streaming Expressions allow you to do this with the cartesianProduct > function: > > > http://lucene.apache.org/solr/guide/7_4/stream-decorator-reference.html#cartesianproduct > > The structure of th

Re: 20180913 - Clarification about Limitation

2018-09-13 Thread Rahul Singh
Depends on whether you are using Solr or solrcloud. Solrcloud distributes data into shards so it increases overall capacity. Rahul Singh Chief Executive Officer m 202.905.2818 Anant Corporation 1010 Wisconsin Ave NW, Suite 250 Washington, D.C. 20007 We build and manage digital business

Re: parent/child rows in solr

2018-09-13 Thread Rahul Singh
waste of space. Rahul Singh Chief Executive Officer m 202.905.2818 Anant Corporation 1010 Wisconsin Ave NW, Suite 250 Washington, D.C. 20007 We build and manage digital business technology platforms. On Sep 11, 2018, 11:23 PM -0400, John Smith , wrote: > On Tue, Sep 11, 2018 at 11:05 PM Wal

Re: Boost only first 10 records

2018-09-03 Thread Rahul Singh
” query. Rahul Singh Chief Executive Officer m 202.905.2818 Anant Corporation 1010 Wisconsin Ave NW, Suite 250 Washington, D.C. 20007 We build and manage digital business technology platforms. On Sep 3, 2018, 6:29 AM -0400, Emir Arnautović , wrote: > Hi, > The requirement is not 100% cl

Re: Metrics for a healthy Solr cluster

2018-08-17 Thread Rahul Singh
I wrote something related to this topic a while ago. https://www.google.com/amp/s/blog.anant.us/resources-for-monitoring-datastax-cassandra-spark-solr-performance/amp/ Rahul On Aug 16, 2018, 3:35 PM -0700, Jan Høydahl , wrote: > Check out the Reference Guide chapter on monitoring with o

Re: Recipe for moving to solr cloud without reindexing

2018-08-07 Thread Rahul Singh
with leader and replicas being spread around the cluster. You would be bypassing general High availability / distributed computing processes by trying to not reindex. Rahul On Aug 7, 2018, 7:06 AM -0400, Bjarke Buur Mortensen , wrote: > Hi List, > > is there a cookbook recipe for

RE: create collection from existing managed-schema

2018-07-26 Thread Rahul Chhiber
the _default configset for any collections created without explicit configset. Regards, Rahul Chhiber -Original Message- From: Chuming Chen [mailto:chumingc...@gmail.com] Sent: Thursday, July 26, 2018 11:35 PM To: solr-user@lucene.apache.org Subject: create collection from existing

Re: Silk from LucidWorks

2018-07-15 Thread Rahul Singh
Their commercial offering still has something like it. You can always try Grafana Rahul On Jul 13, 2018, 9:59 AM -0400, rgummadi , wrote: > Is SiLK from LucidWorks still an acitve project. I looked at their github and > it does not seem to be active. If so are there any alternative sol

Re: Text Similarity

2018-07-15 Thread Rahul Singh
deduplication - the join I’m pretty sure works on exact matches. Consider creating an “identity” collection where you map the different names to a unique identity key. This could then technically be joined on two datasets and then those could be joined again. Rahul On Jul 11, 2018, 4:42 PM -0400, Aroop

Regarding pdf indexing issue

2018-07-11 Thread Rahul Prasad Dwivedi
/solr/gettingstarted/select?q='* <http://localhost:8983/solr/gettingstarted/select?q='*>'* Please suggest me anything and let me know if I am missing anything Thanks, Rahul

Re: Delta import not working with Oracle in Solr

2018-07-10 Thread Rahul Singh
Agreed. DIH is not an industrial grade ETL tool.. may want to consider other options. May want to look into Kafka Connect as an alternative. It has connectors for JDBC into Kafka, and from Kafka into Solr. -- Rahul Singh rahul.si...@anant.us Anant Corporation On Jul 9, 2018, 6:14 AM -0500

RE: cmd to enable debug logs

2018-07-09 Thread Rahul Chhiber
Use -v option in the bin/solr start command. Regards, Rahul Chhiber -Original Message- From: Prateek Jain J [mailto:prateek.j.j...@ericsson.com] Sent: Monday, July 09, 2018 4:26 PM To: solr-user@lucene.apache.org Subject: cmd to enable debug logs Hi All, What's the command (fro

Re: How to know the name(url) of documents that data import handler skipped

2018-07-08 Thread Rahul Singh
Have you tried changing the log level? https://lucene.apache.org/solr/guide/7_2/configuring-logging.html -- Rahul Singh rahul.si...@anant.us Anant Corporation On Jul 8, 2018, 8:54 PM -0500, Yasufumi Mizoguchi , wrote: > Hi, > > I am trying to indexing files into Solr 7.2 using da

Resources for Monitoring Cassandra, Spark, Solr

2018-07-02 Thread Rahul Singh
is a work in progress and I'll update this with screenshots as well as with links from other contributors. -- Rahul Singh rahul.si...@anant.us Anant Corporation

Re: Drive Change for Solr Setup

2018-06-21 Thread Rahul Singh
If it’s windows it may be using a tool called NSSM to manage the solr service. Look at windows services and task scheduler and understand if solr services are being managed by windows via services or the task scheduler — or just .batch files. Rahul On Jun 20, 2018, 11:34 AM -0400, Shawn Heisey

Re: Solr Cloud 7.3.1 backups

2018-05-31 Thread Rahul Singh
are some decent distributed shared file system services that could be leveraged depending on the number of compute nodes. Shared file system is the best way to keep it consistent but it comes with its drawbacks. You can always back up locally and asynchronously sync to shared FS too. -- Rahul

Re: How to do parallel indexing on files (not on HDFS)

2018-05-24 Thread Rahul Singh
Right, that’s why you need a place to persist the task list / graph. If you use a table, you can set a “processed” / “unprocessed” value … or a queue, then it's delivered only once .. otherwise you have to check indexed date from solr, and waste a solr call. -- Rahul Singh rahul.si...@anant.us

Re: How to do parallel indexing on files (not on HDFS)

2018-05-24 Thread Rahul Singh
. http://saumitra.me/blog/tweet-search-and-analysis-with-kafka-solr-cassandra/ I don't know where this guy's code went.. but the content is there with code samples. -- On May 23, 2018, 8:37 PM -0500, Raymond Xie , wrote: > Thank you Rahul despite that's very high level. > > With

Re: How to do parallel indexing on files (not on HDFS)

2018-05-23 Thread Rahul Singh
Enumerate the file locations (map) , put them in a queue like rabbit or Kafka (Persist the map), have a bunch of threads , workers, containers, whatever pop off the queue , process the item (reduce). -- Rahul Singh rahul.si...@anant.us Anant Corporation On May 20, 2018, 7:24 AM -0400
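
A rough sketch of the enumerate / queue / workers pattern described in this message, for illustration only. It assumes an in-process queue.Queue standing in for RabbitMQ or Kafka, and index_file is a hypothetical placeholder for whatever actually parses a file and sends it to Solr; neither name comes from the thread.

```python
import queue
import threading


def index_file(path):
    # Hypothetical placeholder: real code would parse the file
    # and send the resulting document to Solr.
    return f"indexed:{path}"


def run_workers(paths, num_workers=4):
    # "Map" step: enumerate the file locations and put them on a queue.
    tasks = queue.Queue()
    for path in paths:
        tasks.put(path)

    results = []
    results_lock = threading.Lock()

    def worker():
        # Each worker pops items until the queue is empty.
        while True:
            try:
                path = tasks.get_nowait()
            except queue.Empty:
                return
            outcome = index_file(path)
            with results_lock:
                results.append(outcome)
            tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

With a persistent broker in place of the in-memory queue, the task list would survive a crash, which is the point made about persisting the map.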

Re: Multi threading indexing

2018-05-16 Thread Rahul Singh
Can try to leverage Spark to index. Or Kafka Connect with SolR. -- Rahul Singh rahul.si...@anant.us Anant Corporation On May 14, 2018, 2:03 AM -0500, Mikhail Khludnev , wrote: > A few years ago I provided server side concurrency "booster" > https://issues.apache.org/jira/browse/

Re: SolrCloud

2018-05-16 Thread Rahul Singh
Having concurrent DIH for example from the same source on different cluster nodes may cause duplicate work. But yes the ZK is what distributes the conf. -- Rahul Singh rahul.si...@anant.us Anant Corporation On May 16, 2018, 4:55 AM -0500, Jon Morisi , wrote: > Hi All, > I'm

Re: Apache SOLR Design Query

2018-05-13 Thread Rahul Singh
. 4. Unless you need highlighting, only index the actual contents, and store the rest of the fields. 5. Shared File storage is probably ok, but you may want to go with a caching layer via Nginx and serve files through it. That way you don’t hit the disk every time. -- Rahul Singh rahul.si

Re: Team please help

2018-04-29 Thread Rahul Singh
pipeline. Best, -- Rahul Singh rahul.si...@anant.us Anant Corporation On Apr 29, 2018, 6:27 AM -0700, Doug Turnbull , wrote: > Morphlines is a cloudera specific tool. I suspect moving Solr platforms > will require you to rework your indexing somewhat. You may need to step > back and think

Re: solr cell: write entire file content binary to index along with metadata

2018-04-25 Thread Rahul Singh
process can improve the overall stability of the SolR service. -- Rahul Singh rahul.si...@anant.us Anant Corporation On Apr 25, 2018, 12:49 PM -0400, Shawn Heisey , wrote: > On 4/25/2018 4:02 AM, Lee Carroll wrote: > > *We don't recommend using solr-cell for production indexing.* >

Re: DIH with huge data

2018-04-12 Thread Rahul Singh
CSV -> Spark -> SolR https://github.com/lucidworks/spark-solr/blob/master/docs/examples/csv.adoc If speed is not an issue there are other methods. Spring Batch / Spring Data might have all the tools you need to get speed without Spark. -- Rahul Singh rahul.si...@anant.us Anant Corpo

Re: DIH with huge data

2018-04-12 Thread Rahul Singh
If you want speed, Spark is the fastest easiest way. You can connect to relational tables directly and import or export to CSV / JSON and import from a distributed filesystem like S3 or HDFS. Combining a dfs with spark and a highly available SolR - you are maximizing all threads. -- Rahul

Re: DIH with huge data

2018-04-12 Thread Rahul Singh
How much data and what is the database source? Spark is probably the fastest way. -- Rahul Singh rahul.si...@anant.us Anant Corporation On Apr 12, 2018, 7:28 AM -0400, Sujay Bawaskar , wrote: > Hi, > > We are using DIH with SortedMapBackedCache but as data size increases we > nee

Re: Text in images are not extracted and indexed to content

2018-04-10 Thread Rahul Singh
May need to extract outside SolR and index pure text with an external ingestion process. You have much more control over the Tika attributes and behaviors. -- Rahul Singh rahul.si...@anant.us Anant Corporation On Apr 9, 2018, 10:23 PM -0400, Zheng Lin Edwin Yeo , wrote: > Hi, > > Cu

Re: Using Solr to build a product matcher, with learning to rank

2018-03-29 Thread Rahul Singh
Maybe overthinking this. There is a “more like this” feature that basically does this. Give that a try before digging deeper into the LTR methods. It may be good enough for rock and roll. -- Rahul Singh rahul.si...@anant.us Anant Corporation On Mar 28, 2018, 12:25 PM -0400, Xavier Schepler

RE: Solr or Elasticsearch

2018-03-22 Thread Rahul Singh
because the updates / selects are fast. Ultimately I think SolR is like an 18-wheel tractor trailer and Elastic is like U-Haul trucks, and you can chain a bunch of them up to do what SolR does. -- Rahul Singh rahul.si...@anant.us Anant Corporation On Mar 22, 2018, 9:04 AM -0500, Liu, Daphne

RE: Question liste solr

2018-03-20 Thread Rahul Singh
Parallel processing in any way will help, including Spark w/ a DFS like S3 or HDFS. Your three machines could end up being a bottleneck and you may need more nodes. On Mar 20, 2018, 2:36 AM -0500, LOPEZ-CORTES Mariano-ext , wrote: > CSV file is 5GB aprox. for 29 millions. > > As you say Christo

Re: Securying ONLY the web interface console

2018-03-19 Thread Rahul Singh
Use a proxy server that only gives access to the update / select handlers (URLs). You can do it with numerous programming languages or with a simple proxy in nginx. The whole web server running SolR is not supposed to be out in the open. You are opening yourself up to too many issues. -- Rahul
