Re: Apache SOLR Design Query

2018-05-13 Thread Rahul Singh
4. Unless you need highlighting, only index the actual contents, and store the rest of the fields. 5. Shared file storage is probably OK, but you may want to put a caching layer in front of it via Nginx and serve files through it. That way you don’t hit the disk every time. -- Rahul Singh rahul.si

Re: SolrCloud

2018-05-16 Thread Rahul Singh
Having concurrent DIH imports (for example, from the same source on different cluster nodes) may cause duplicate work. But yes, ZK is what distributes the conf. -- Rahul Singh rahul.si...@anant.us Anant Corporation On May 16, 2018, 4:55 AM -0500, Jon Morisi , wrote: > Hi All, > I'm

Re: Multi threading indexing

2018-05-16 Thread Rahul Singh
You can try leveraging Spark to index, or Kafka Connect with Solr. -- Rahul Singh rahul.si...@anant.us Anant Corporation On May 14, 2018, 2:03 AM -0500, Mikhail Khludnev , wrote: > A few years ago I provided server side concurrency "booster" > https://issues.apache.org/jira/browse/

Re: How to do parallel indexing on files (not on HDFS)

2018-05-23 Thread Rahul Singh
Enumerate the file locations (map) , put them in a queue like rabbit or Kafka (Persist the map), have a bunch of threads , workers, containers, whatever pop off the queue , process the item (reduce). -- Rahul Singh rahul.si...@anant.us Anant Corporation On May 20, 2018, 7:24 AM -0400
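
The map / queue / reduce approach above can be sketched in Python. This is a minimal, in-process illustration under stated assumptions: a standard-library queue.Queue stands in for RabbitMQ or Kafka (so nothing is persisted), and index_file is a hypothetical placeholder for whatever actually sends a file to Solr.

```python
import queue
import threading
from pathlib import Path


def enumerate_files(root):
    """'Map' step: list every regular file under root."""
    return sorted(p for p in Path(root).rglob("*") if p.is_file())


def index_file(path):
    """Hypothetical placeholder: send one file to Solr here (e.g. via an HTTP client)."""
    return path.name


def run_workers(paths, num_workers=4):
    """Put every path on a queue and let worker threads pop and process items."""
    tasks = queue.Queue()
    for p in paths:
        tasks.put(p)

    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                p = tasks.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker is done
            result = index_file(p)
            with lock:  # results list is shared between threads
                results.append(result)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Usage (the directory is hypothetical): run_workers(enumerate_files("/data/docs"), num_workers=8). A worker that hits an exception simply stops in this sketch; a real setup would want retries and a persisted task list, as the follow-up below points out.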

Re: How to do parallel indexing on files (not on HDFS)

2018-05-24 Thread Rahul Singh
http://saumitra.me/blog/tweet-search-and-analysis-with-kafka-solr-cassandra/ I don't know where this guy's code went, but the content is there with code samples. -- On May 23, 2018, 8:37 PM -0500, Raymond Xie , wrote: > Thank you Rahul despite that's very high level. > > With

Re: How to do parallel indexing on files (not on HDFS)

2018-05-24 Thread Rahul Singh
Right, that’s why you need a place to persist the task list / graph. If you use a table, you can set a “processed” / “unprocessed” value; with a queue, each item is delivered only once. Otherwise you have to check the indexed date from Solr, and waste a Solr call. -- Rahul Singh rahul.si...@anant.us
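
A persisted task list with a processed flag might look like the following sketch. It assumes SQLite via Python's standard sqlite3 module; the table, column, and function names are hypothetical. It is written for a single consumer: with several concurrent workers you would also need a way to claim a row (for example an extra "in progress" state) before processing it.

```python
import sqlite3


def init_tasks(conn, paths):
    """Create the task table and enqueue paths; re-adding a known path is a no-op."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        " path TEXT PRIMARY KEY,"
        " processed INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany("INSERT OR IGNORE INTO tasks (path) VALUES (?)", [(p,) for p in paths])
    conn.commit()


def next_unprocessed(conn, limit=100):
    """Fetch a batch of paths that have not been indexed yet."""
    rows = conn.execute(
        "SELECT path FROM tasks WHERE processed = 0 ORDER BY path LIMIT ?", (limit,)
    )
    return [row[0] for row in rows]


def mark_processed(conn, path):
    """Flag a path as done so it is not picked up again (no Solr call needed)."""
    conn.execute("UPDATE tasks SET processed = 1 WHERE path = ?", (path,))
    conn.commit()
```

Because the flag lives in the table, a restarted indexer can resume from next_unprocessed instead of asking Solr which files were already indexed.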

Re: Solr Cloud 7.3.1 backups

2018-05-31 Thread Rahul Singh
are some decent distributed shared file system services that could be leveraged, depending on the number of compute nodes. A shared file system is the best way to keep it consistent, but it comes with its drawbacks. You can always back up locally and asynchronously sync to the shared FS too. -- Rahul

Re: Drive Change for Solr Setup

2018-06-21 Thread Rahul Singh
If it’s Windows, it may be using a tool called NSSM to manage the Solr service. Look at Windows Services and Task Scheduler to find out whether the Solr services are being managed by Windows via Services, via Task Scheduler, or just by batch (.bat) files. Rahul On Jun 20, 2018, 11:34 AM -0400, Shawn Heisey

Resources for Monitoring Cassandra, Spark, Solr

2018-07-02 Thread Rahul Singh
is a work in progress and I'll update this with screenshots as well as with links from other contributors. -- Rahul Singh rahul.si...@anant.us Anant Corporation

Re: How to know the name(url) of documents that data import handler skipped

2018-07-08 Thread Rahul Singh
Have you tried changing the log level? https://lucene.apache.org/solr/guide/7_2/configuring-logging.html -- Rahul Singh rahul.si...@anant.us Anant Corporation On Jul 8, 2018, 8:54 PM -0500, Yasufumi Mizoguchi , wrote: > Hi, > > I am trying to index files into Solr 7.2 using da

RE: cmd to enable debug logs

2018-07-09 Thread Rahul Chhiber
Use the -v option with the bin/solr start command. Regards, Rahul Chhiber -Original Message- From: Prateek Jain J [mailto:prateek.j.j...@ericsson.com] Sent: Monday, July 09, 2018 4:26 PM To: solr-user@lucene.apache.org Subject: cmd to enable debug logs Hi All, What's the command (fro

Re: Delta import not working with Oracle in Solr

2018-07-10 Thread Rahul Singh
Agreed. DIH is not an industrial-grade ETL tool, so you may want to consider other options. Kafka Connect is one alternative: it has connectors for JDBC into Kafka, and from Kafka into Solr. -- Rahul Singh rahul.si...@anant.us Anant Corporation On Jul 9, 2018, 6:14 AM -0500

Re: Text Similarity

2018-07-15 Thread Rahul Singh
deduplication. I’m pretty sure the join only works on exact matches. Consider creating an “identity” collection where you map the different names to a unique identity key. Each of the two datasets could then be joined to that collection, and the results joined to each other. Rahul On Jul 11, 2018, 4:42 PM -0400, Aroop
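
The identity-key idea can be illustrated outside Solr. In Solr this would be an "identity" collection plus a join; the sketch below uses plain Python dictionaries just to show how mapping name variants to one key turns a fuzzy match into an exact one. The alias data, the normalization rule, and the function names are hypothetical.

```python
def normalize(name):
    """Hypothetical normalization: lowercase and collapse whitespace."""
    return " ".join(name.lower().split())


def build_identity_map(aliases):
    """aliases: identity key -> list of name variants. Returns normalized name -> key."""
    mapping = {}
    for key, names in aliases.items():
        for name in names:
            mapping[normalize(name)] = key
    return mapping


def join_on_identity(left, right, identity):
    """Exact join of two (name, payload) lists via their shared identity key."""
    by_key = {}
    for name, payload in right:
        key = identity.get(normalize(name))
        if key is not None:
            by_key.setdefault(key, []).append(payload)
    joined = []
    for name, payload in left:
        key = identity.get(normalize(name))
        for other in by_key.get(key, []):
            joined.append((key, payload, other))
    return joined
```

Names that are not in the identity map simply drop out of the join, which mirrors the exact-match behavior described above.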

Re: Silk from LucidWorks

2018-07-15 Thread Rahul Singh
Their commercial offering still has something like it. You can always try Grafana. Rahul On Jul 13, 2018, 9:59 AM -0400, rgummadi , wrote: > Is SiLK from LucidWorks still an active project? I looked at their GitHub and > it does not seem to be active. If so are there any alternative sol

RE: create collection from existing managed-schema

2018-07-26 Thread Rahul Chhiber
the _default configset for any collections created without explicit configset. Regards, Rahul Chhiber -Original Message- From: Chuming Chen [mailto:chumingc...@gmail.com] Sent: Thursday, July 26, 2018 11:35 PM To: solr-user@lucene.apache.org Subject: create collection from existing

Re: Recipe for moving to solr cloud without reindexing

2018-08-07 Thread Rahul Singh
with the leader and replicas spread around the cluster. By trying not to reindex, you would be bypassing the general high-availability / distributed computing processes. Rahul On Aug 7, 2018, 7:06 AM -0400, Bjarke Buur Mortensen , wrote: > Hi List, > > is there a cookbook recipe for

How expensive is core loading?

2020-01-29 Thread Rahul Goswami
production setup with above configuration? Thanks, Rahul

Re: How expensive is core loading?

2020-01-29 Thread Rahul Goswami
Thanks for your response Walter. But I could not find a Java API for Luke for writing my tool. Is there one? I also tried using the LukeRequestHandler that comes with Solr, but invoking it causes the Solr core to be loaded. Rahul On Wed, Jan 29, 2020 at 5:20 PM Walter Underwood wrote: >

Re: How expensive is core loading?

2020-01-29 Thread Rahul Goswami
l documents and the index size (to gather stats about the Solr server), is the amount of memory consumed proportional to the index size in some way? Thanks, Rahul On Wed, Jan 29, 2020 at 6:43 PM Shawn Heisey wrote: > On 1/29/2020 3:01 PM, Rahul Goswami wrote: > > 1) How expensive is c

Performance comparison for wildcard searches

2020-02-03 Thread Rahul Goswami
Hello, I am working with Solr 7.2.1 and had a question regarding the performance of wildcard searches. q=*:* vs q=id:* vs q=id:[* TO *] Can someone please rank them in the order of performance with the underlying reason? Thanks, Rahul

Zookeeper upgrade required with Solr upgrade?

2020-02-12 Thread Rahul Goswami
updates requests for a 2 node SolrCloud cluster with the older (3.4.10) zookeeper and it seemed to work fine. But just want to know if there are any caveats I should be aware of. Thanks, Rahul

Re: Zookeeper upgrade required with Solr upgrade?

2020-02-13 Thread Rahul Goswami
eb 13, 2020 at 9:26 AM Erick Erickson wrote: > That should be OK. There were no code changes necessary for that upgrade. > see SOLR-13363 > > > On Feb 12, 2020, at 5:34 PM, Rahul Goswami > wrote: > > > > Hello, > > We are running a SolrCloud (7.2.1) cluster an

Solr filter cache hits not reflecting

2020-04-20 Thread Rahul Goswami
ted. However, if I search with the same fq again, I expect the lookup and hits count to increase, but it doesn't. This ultimately results in an incorrect hitratio. I tried this scenario on Solr 7.2.1, 7.7.2 and 8.5 and observe the same behavior on all three versions. Is this a bug or am I missing something here? Thanks, Rahul

Re: Solr filter cache hits not reflecting

2020-04-20 Thread Rahul Goswami
"item_manu:samsung manu:apple":"SortedIntDocSet{size=2,ramUsed=40 bytes}", "warmupTime":0, "maxRamMB":-1, 5) A query with the same fq again (fq=manu:samsung OR manu:apple): the numbers don't get updated for this fq thereafter for subseque

Re: Solr filter cache hits not reflecting

2020-04-20 Thread Rahul Goswami
Hoss, Thank you for such a succinct explanation! I was not aware of the order of lookups (queryResultCache followed by filterCache). Makes sense now. Sorry for the false alarm! Rahul On Mon, Apr 20, 2020 at 4:04 PM Chris Hostetter wrote: > : 4) A query with different fq. > :
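
The lookup order mentioned in the reply above can be illustrated with a toy model. This is not Solr's code and ignores many details (cache sizes, eviction, autowarming, per-clause filter caching); it only assumes that the query result cache is checked first and the filter cache is consulted on a miss. Under that assumption, repeating an identical request never touches the filter cache, so its lookup and hit counters stay put.

```python
class ToySearcher:
    """Toy model of the lookup order described above; not Solr's implementation."""

    def __init__(self):
        self.query_result_cache = {}  # (q, fq) -> results
        self.filter_cache = {}        # fq -> stand-in for a DocSet
        self.filter_lookups = 0
        self.filter_hits = 0

    def search(self, q, fq):
        key = (q, fq)
        if key in self.query_result_cache:
            # Served from the query result cache: the filter cache is never
            # consulted, so its lookups/hits counters do not move.
            return self.query_result_cache[key]
        self.filter_lookups += 1
        if fq in self.filter_cache:
            self.filter_hits += 1
        else:
            self.filter_cache[fq] = f"docset({fq})"
        result = f"results({q} | {fq})"
        self.query_result_cache[key] = result
        return result

    @property
    def filter_hitratio(self):
        return self.filter_hits / self.filter_lookups if self.filter_lookups else 0.0
```

Sending the same fq with a different main query (a query-result-cache miss) is what actually exercises the filter cache in this model.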

Re: when to use docvalue

2020-05-20 Thread Rahul Goswami
) stored=false and docValues=true 3) stored=true and docValues=true Thanks, Rahul On Tue, May 19, 2020 at 5:55 PM Erick Erickson wrote: > They are _absolutely_ able to be used together. Background: > > “In the bad old days”, there was no docValues. So whenever you needed > to facet/so

Re: Getting rid of Master/Slave nomenclature in Solr

2020-06-17 Thread Rahul Goswami
+1 on avoiding SolrCloud terminology. In the interest of keeping it obvious and simple, may I please suggest primary/secondary? On Wed, Jun 17, 2020 at 5:14 PM Atita Arora wrote: > I agree with avoiding SolrCloud terminology too. > > I may suggest going for "prime" and "clone" > (Short an

Re: [EXTERNAL] Getting rid of Master/Slave nomenclature in Solr

2020-06-18 Thread Rahul Goswami
I agree with Phill, Noble and Ilan above. The problematic term is "slave" (not master) which I am all for changing if it causes less regression than removing BOTH master and slave. Since some people have pointed out Github changing the "master" terminology, in my personal opinion, it was not a meas

Re: How to remove duplicate tokens from solr

2020-09-17 Thread Rahul Goswami
rect me if I am wrong!) -Rahul On Thu, Sep 17, 2020 at 2:56 PM Rajdeep Sahoo wrote: > If someone is searching with " tshirt tshirt tshirt tshirt tshirt tshirt" > we need to remove the duplicates and search with tshirt. > > > On Fri, 18 Sep, 2020, 12:19 am Alexandre Rafalo

Re: Delete from Solr console fails

2020-09-24 Thread Rahul Goswami
Goutham, Is the field you are trying to delete by indexed=true in the schema? If the uniqueKey is indexed=true, does delete by id work for you? (uniqueKey:value) Also, instead of "Solr Command", if you choose the Document Type as "XML", does it make any difference? Rahul On
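
For reference, delete-by-id and delete-by-query can be sent as XML update messages (for example with the Document Type set to "XML" on the Documents screen). The helpers below only build those messages; the function names, id, and query values are hypothetical, and the deletes typically become visible only after a commit.

```python
import xml.etree.ElementTree as ET


def delete_by_id_xml(doc_id):
    """Build <delete><id>...</id></delete>; ElementTree escapes special characters."""
    root = ET.Element("delete")
    ET.SubElement(root, "id").text = doc_id
    return ET.tostring(root, encoding="unicode")


def delete_by_query_xml(query):
    """Build <delete><query>...</query></delete> for a Lucene/Solr query string."""
    root = ET.Element("delete")
    ET.SubElement(root, "query").text = query
    return ET.tostring(root, encoding="unicode")
```

Using an XML library rather than string concatenation means ids containing characters such as < or & are escaped correctly.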

Re: Delete from Solr console fails

2020-09-26 Thread Rahul Goswami
that I would still expect delete by id to execute in reasonable time, so I would start by looking at what is eating up the CPU in your request. -Rahul On Sat, Sep 26, 2020 at 4:50 AM Goutham Tholpadi wrote: > Thanks Dominique! I just tried deleting a single document using its id. I

Re: ApacheCon at Home 2020 starts tomorrow!

2020-09-29 Thread Rahul Goswami
Thanks for sharing this Anshum. Day 1 had some really interesting sessions. Missed out on a couple that I would have liked to listen to. Are the recordings of these sessions available anywhere? -Rahul On Mon, Sep 28, 2020 at 7:08 PM Anshum Gupta wrote: > Hey everyone! > > ApacheCo

Re: Solr 7.7 - Few Questions

2020-10-01 Thread Rahul Goswami
count-filter You'll need to configure it in the schema, in the "index" analyzer of the field type used by the field with large text. Indexing documents on the order of half a GB will definitely hurt your operations, if not now, then later (think OOM, extremely slow atomic updates, long r

Re: Solr 7.7 - Few Questions

2020-10-04 Thread Rahul Goswami
Charlie, Thanks for providing an alternate approach to doing this. It would be interesting to know how one could go about organizing the docs in this case (nested documents?). How would join queries perform on a large index (200 million+ docs)? Thanks, Rahul On Fri, Oct 2, 2020 at 5:55 AM

Re: Solr 7.7 - Few Questions

2020-10-06 Thread Rahul Goswami
l 3. How to scale up the servers for the better performance? >> This is too open ended a question and depends on a lot of factors specific to your environment and use-case :) - Rahul On Tue, Oct 6, 2020 at 4:26 PM Manisha Rahatadkar < manisha.rahatad...@anjusoftware.com> wrote: > Hi

Re: Question about solr commits

2020-10-08 Thread Rahul Goswami
updates. Is this understanding correct ? Thanks, Rahul On Wed, Oct 7, 2020 at 11:39 PM yaswanth kumar wrote: > Thank you very much both Eric and Shawn > > Sent from my iPhone > > > On Oct 7, 2020, at 10:41 PM, Shawn Heisey wrote: > > > > On 10/7/2020 4:40 PM, yaswant

Re: Need urgent help -- High cpu on solr

2020-10-16 Thread Rahul Goswami
ation nevertheless. https://backstage.forgerock.com/knowledge/kb/article/a39551500 The hex number the author talks about in the link above is the native thread id. Best, Rahul On Wed, Oct 14, 2020 at 8:00 AM Erick Erickson wrote: > Zisis makes good points. One other thing is I’d look to >
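
On Linux, a common way to act on that tip is to find the busiest threads with top -H -p followed by the Solr process id, which prints decimal thread ids, and then match them against the nid=0x... field in a jstack thread dump, which is the same native id in hex. A small Python sketch of the matching step; the function names are hypothetical and any sample dump text is made up.

```python
import re

NID_PATTERN = re.compile(r"nid=(0x[0-9a-fA-F]+)")


def tid_to_nid(tid):
    """Decimal native thread id (as printed by top -H) -> hex form used by jstack's nid=."""
    return hex(tid)


def find_thread(jstack_output, tid):
    """Return the first jstack line whose nid matches the decimal tid, else None."""
    for line in jstack_output.splitlines():
        match = NID_PATTERN.search(line)
        if match and int(match.group(1), 16) == tid:
            return line.strip()
    return None
```

Comparing integers rather than strings avoids mismatches caused by letter case or leading zeros in the hex value.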

Graph query extremely slow

2019-05-15 Thread Rahul Goswami
optimizations that I could try? Thanks, Rahul

Re: Graph query extremely slow

2019-05-19 Thread Rahul Goswami
Hello experts, Just following up in case my previous email got lost in the big stack of queries. Would appreciate any help on optimizing a graph query. Or any pointers on the direction to investigate. Thanks, Rahul On Wed, May 15, 2019 at 9:37 PM Rahul Goswami wrote: > Hello, >

Solr exception while retrieving documents

2019-05-31 Thread Mandava, Rahul
on in Solr log files. I think seeing errors in the log files doesn't hurt as long as the updates and gets work fine, but I would still like to know how to prevent these errors. Thanks Rahul Mandava

Re: Graph query extremely slow

2019-06-01 Thread Rahul Goswami
, since the parameters of this fq don't change shouldn't I expect to gain any advantage out of using the filterCache? Thanks, Rahul On Wed, May 22, 2019 at 7:40 AM Toke Eskildsen wrote: > On Wed, 2019-05-15 at 21:37 -0400, Rahul Goswami wrote: > > fq={!graph from=from_field to=

SolrCloud indexing triggers merges and timeouts

2019-06-05 Thread Rahul Goswami
that this is the cause, and the timeouts and recoveries are the symptoms. Is my understanding correct? If yes, what steps could I take to help the situation. I do see that the difference between "Num Docs" and "Max Docs" is about 20%. Would appreciate your help. Thanks, Rahul

Re: SolrCloud indexing triggers merges and timeouts

2019-06-06 Thread Rahul Goswami
ndex.ConcurrentMergeScheduler", "maxMergeCount":2, "maxThreadCount":2}, Thanks, Rahul On Wed, Jun 5, 2019 at 4:24 PM Shawn Heisey wrote: > On 6/5/2019 9:39 AM, Rahul Goswami wrote: > > I have a solrcloud setup on Windows server with below config: > >

Re: SolrCloud indexing triggers merges and timeouts

2019-06-12 Thread Rahul Goswami
/measures. Thanks, Rahul On Thu, Jun 6, 2019 at 11:00 AM Rahul Goswami wrote: > Thank you for your responses. Please find additional details about the > setup below: > > We are using Solr 7.2.1 > > > I have a solrcloud setup on Windows server with below config: > >

SolrCloud: Configured socket timeouts not reflecting

2019-06-12 Thread Rahul Goswami
, is there a JIRA for it ? Thanks, Rahul

Re: SolrCloud: Configured socket timeouts not reflecting

2019-06-18 Thread Rahul Goswami
teShardHandlerConfig().getDistributedSocketTimeout(); } I found this open JIRA on this issue: https://issues.apache.org/jira/browse/SOLR-12550?jql=text%20~%20%22distribUpdateSoTimeout%22 Should I update the JIRA with this ? Thanks, Rahul On Thu, Jun 13, 2019 at 12:00 AM Rahul Goswami wrote: > Hello, >

Re: SolrCloud: Configured socket timeouts not reflecting

2019-06-20 Thread Rahul Goswami
binary to try the patch nevertheless, but it didn't help as I anticipated. I'll update the JIRA and submit a patch. Thank you, Rahul On Thu, Jun 20, 2019 at 11:35 AM Gus Heck wrote: > Hi Rahul, > > Did you try the patch in that issue? Also food for thought: > https://is

Re: SolrCloud: Configured socket timeouts not reflecting

2019-06-24 Thread Rahul Goswami
r this part is different on the master. Regards, Rahul On Thu, Jun 20, 2019 at 8:22 PM Rahul Goswami wrote: > Hi Gus, > Thanks for the response and referencing the umbrella JIRA for these kind > of issues. I see that it won't solve the problem since the builder object > wh

Configuration recommendation for SolrCloud

2019-06-25 Thread Rahul Goswami
efficient for our use case considering moderate-heavy indexing and search load? Would also like to know the tradeoffs involved if any. Thanks in advance! Regards, Rahul

Re: Configuration recommendation for SolrCloud

2019-07-01 Thread Rahul Goswami
beefy physical servers at disposal for this deployment. If we go with 4 SolrClouds then we would have 4x8=32 nodes (Solr instances) running across these 4 physical servers. Any issues that you might see with this configuration or additional considerations that I might be missing? Thanks, Rahul

Re: SolrCloud indexing triggers merges and timeouts

2019-07-02 Thread Rahul Goswami
iculty wrapping my head around this, and would appreciate if you could help clear it for me. Thanks, Rahul On Thu, Jun 13, 2019 at 7:33 AM Shawn Heisey wrote: > On 6/6/2019 9:00 AM, Rahul Goswami wrote: > > *OP Reply* : Total 48 GB per node... I couldn't see another software > us

Re: SolrCloud indexing triggers merges and timeouts

2019-07-04 Thread Rahul Goswami
Shawn, Erick, Thank you for the explanation. The merge scheduler params make sense now. Thanks, Rahul On Wed, Jul 3, 2019 at 11:30 AM Erick Erickson wrote: > Two more tidbits to add to Shawn’s explanation: > > There are heuristics built in to ConcurrentMergeScheduler. > From

Re: SolrCloud indexing triggers merges and timeouts

2019-07-12 Thread Rahul Goswami
y one huge document? 2) If yes, does this flush create a segment with just one document? 3) Heap dump analysis shows large (>350 MB) instances of DocumentsWriterPerThread. Does one instance of this class correspond to one document? Help is much appreciated. Thanks, Rahul On Fri, Jul 5, 20

java.lang.OutOfMemoryError: Java heap space

2019-07-24 Thread Mandava, Rahul
I am using SOLR version 6.6.0 and the heap size is set to 512 MB, which I believe is the default. We have almost 10 million documents in the index, and we perform frequent updates (we soft commit on every update; the heap issue was seen with and without soft commit) to the index and obviousl

Custom update processor not kicking in

2019-09-18 Thread Rahul Goswami
don’t see any log lines from the processAdd() method. Any inputs on why the processor is getting skipped if placed after distributed processor? Thanks, Rahul

Re: Custom update processor not kicking in

2019-09-18 Thread Rahul Goswami
the processAdd() of the processor. Is this an expected behavior? Regards, Rahul On Wed, Sep 18, 2019 at 5:28 PM Erick Erickson wrote: > It Depends (tm). This is a little confused. Why do you have > distributed processor in stand-alone Solr? Stand-alone doesn't, well, > distrib

Re: Custom update processor not kicking in

2019-09-19 Thread Rahul Goswami
any further custom processors other than the run update processor in standalone mode? Alternatively, is there a way I can get a handle on a complete document once it’s reconstructed from an atomic update? Thanks, Rahul On Thu, Sep 19, 2019 at 7:06 AM Erick Erickson wrote: > _Why_ is reindex

Upgrade solr from 7.2.1 to 8.2

2019-11-15 Thread Rahul Goswami
n that case? Thanks in advance! Regards, Rahul

Re: Upgrade solr from 7.2.1 to 8.2

2019-11-19 Thread Rahul Goswami
Hello, Just wanted to follow up in case my question fell through the cracks :) Would appreciate help on this. Thanks, Rahul On Fri, Nov 15, 2019 at 5:32 PM Rahul Goswami wrote: > Hello, > > We are planning to upgrade our SolrCloud cluster from 7.2.1 (hosted on > Windows server)

Re: Solr 8.2 indexing issues

2019-11-21 Thread Rahul Goswami
Hi Sujatha, How did you upgrade your cluster ? Did you restart each node in the cluster one by one after upgrade (while other nodes were running on 6.6.2) or did you bring down the entire cluster and bring up one upgraded node at a time? Thanks, Rahul On Thu, Nov 14, 2019 at 7:03 AM Paras

Re: [ANNOUNCE] Apache Solr 8.3.1 released

2019-12-04 Thread Rahul Goswami
s. Is it linked appropriately? Or is it some access rights issue for non-PMC members like me ? Thanks, Rahul On Wed, Dec 4, 2019 at 7:12 AM Noble Paul wrote: > Thanks ishan > > On Wed, Dec 4, 2019, 3:32 PM Ishan Chattopadhyaya < > ichattopadhy...@gmail.com> > wrote: > &g

Solr indexing performance

2019-12-05 Thread Rahul Goswami
for better application design considerations. Thanks, Rahul

StandardTokenizerFactory doesn't split on underscore

2021-01-07 Thread Rahul Goswami
this behavior is included in the documentation since it is similar to the behavior with periods. https://lucene.apache.org/solr/guide/6_6/tokenizers.html#Tokenizers-StandardTokenizer "Periods (dots) that are not followed by whitespace are kept as part of the token, including Internet domain names. " Thanks, Rahul

Re: StandardTokenizerFactory doesn't split on underscore

2021-01-09 Thread Rahul Goswami
Nope. The underscore is preserved right after tokenization even before it reaches any filters. You can choose the type "text_general" and try an index time analysis through the "Analysis" page on Solr Admin UI. Thanks, Rahul On Sat, Jan 9, 2021 at 8:22 AM xiefengchan
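
The Analysis screen is backed by Solr's field analysis request handler, so the same check can be scripted over HTTP. The sketch below only builds the request URL; it assumes the implicit /analysis/field handler, and the core name and sample value are hypothetical.

```python
from urllib.parse import urlencode


def field_analysis_url(base_url, core, field_type, value):
    """Build a field-analysis request showing how field_type tokenizes value at index time."""
    params = {
        "analysis.fieldtype": field_type,
        "analysis.fieldvalue": value,
        "wt": "json",
    }
    return f"{base_url.rstrip('/')}/{core}/analysis/field?{urlencode(params)}"
```

Fetching such a URL for text_general with a value like foo_bar should show the underscore surviving the StandardTokenizer stage, matching what the Admin UI displays.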

Re: StandardTokenizerFactory doesn't split on underscore

2021-01-09 Thread Rahul Goswami
t on underscores if that is your use case. > > On Sat, Jan 9, 2021 at 2:58 PM Rahul Goswami > wrote: > > > Nope. The underscore is preserved right after tokenization even before it > > reaches any filters. You can choose the type "text_general" and try an &

Re: Down Replica is elected as Leader (solr v8.7.0)

2021-02-11 Thread Rahul Goswami
iii) Wait for 5-10 seconds between each subsequent node start Hope this helps. Best, Rahul On Thu, Feb 11, 2021 at 12:03 PM mmb1234 wrote: > Hello, > > On reboot of one of the solr nodes in the cluster, we often see a > collection's shards with > 1. LEADER replica in DO
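
Step iii above (a short pause between node starts) could be scripted as below. This is only a sketch: start_node is a hypothetical callable (for example something that runs bin/solr start on a given host), and the sleep function is injectable so the loop can be tested without actually waiting.

```python
import time


def staggered_start(nodes, start_node, delay_seconds=5.0, sleep=time.sleep):
    """Start nodes one at a time, pausing delay_seconds between consecutive starts."""
    started = []
    for i, node in enumerate(nodes):
        if i > 0:
            sleep(delay_seconds)  # give the previous node time to register before the next
        start_node(node)
        started.append(node)
    return started
```

A fixed delay is the simplest reading of the advice; polling each node's status before starting the next one would be a more robust variant.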

Regarding pdf indexing issue

2018-07-11 Thread Rahul Prasad Dwivedi
/solr/gettingstarted/select?q='* <http://localhost:8983/solr/gettingstarted/select?q='*>'* Please suggest anything, and let me know if I am missing something. Thanks, Rahul
