4. Unless you need highlighting, only index the actual contents, and store the
rest of the fields.
5. Shared file storage is probably OK, but you may want to add a caching layer
later via Nginx and serve files through it. That way you don’t hit the disk
every time.
--
Rahul Singh
rahul.si
Running concurrent DIH imports (for example, from the same source) on different
cluster nodes may cause duplicate work. But yes, ZK is what distributes the conf.
--
Rahul Singh
rahul.si...@anant.us
Anant Corporation
On May 16, 2018, 4:55 AM -0500, Jon Morisi , wrote:
> Hi All,
> I'm
You can try to leverage Spark to index, or Kafka Connect with Solr.
--
Rahul Singh
rahul.si...@anant.us
Anant Corporation
On May 14, 2018, 2:03 AM -0500, Mikhail Khludnev , wrote:
> A few years ago I provided server side concurrency "booster"
> https://issues.apache.org/jira/browse/
Enumerate the file locations (map), put them in a queue like RabbitMQ or Kafka
(persist the map), and have a bunch of threads, workers, containers, whatever,
pop items off the queue and process them (reduce).
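The map / queue / worker flow above can be sketched in a single process with Python's standard `queue` and `threading` modules. This is only an illustration under assumptions: the in-memory `queue.Queue` stands in for RabbitMQ or Kafka (it is not persistent), and `process_item` is a hypothetical placeholder for reading a file and sending it to Solr.

```python
import queue
import threading


def process_item(path):
    """Hypothetical placeholder: read the file at `path` and index it into Solr."""
    return f"indexed:{path}"


def run_pipeline(paths, num_workers=4):
    tasks = queue.Queue()
    results = []
    lock = threading.Lock()

    # "Map" step: enumerate the file locations and enqueue them.
    for p in paths:
        tasks.put(p)

    def worker():
        # Each worker pops items until the queue is drained.
        while True:
            try:
                item = tasks.get_nowait()
            except queue.Empty:
                return
            try:
                out = process_item(item)  # "reduce" step for one item
                with lock:
                    results.append(out)
            finally:
                tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With a real broker, each worker would typically be a separate process or container consuming from the broker rather than a thread sharing one in-memory queue.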
--
Rahul Singh
rahul.si...@anant.us
Anant Corporation
On May 20, 2018, 7:24 AM -0400
http://saumitra.me/blog/tweet-search-and-analysis-with-kafka-solr-cassandra/
I don’t know where this guy’s code went, but the content is there with code
samples.
--
On May 23, 2018, 8:37 PM -0500, Raymond Xie , wrote:
> Thank you Rahul, despite that being very high level.
>
> With
Right.
That’s why you need a place to persist the task list / graph. If you use a
table, you can set a “processed” / “unprocessed” value; if you use a queue, each
item is delivered only once. Otherwise you have to check the indexed date in
Solr, and waste a Solr call.
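A persisted task list with a “processed” flag could look like the following sketch using Python's built-in `sqlite3`. The table name `tasks` and its columns are hypothetical, and the sketch assumes a single consumer; several concurrent workers would additionally need an atomic “claim” step so that two workers don't pick the same row.

```python
import sqlite3


def init_tasks(conn, paths):
    """Create the task table (if needed) and register paths as unprocessed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        "path TEXT PRIMARY KEY, processed INTEGER NOT NULL DEFAULT 0)"
    )
    # INSERT OR IGNORE keeps a re-run from resetting rows already processed.
    conn.executemany(
        "INSERT OR IGNORE INTO tasks (path) VALUES (?)", [(p,) for p in paths]
    )
    conn.commit()


def next_unprocessed(conn, limit=10):
    """Return up to `limit` paths that still need indexing."""
    rows = conn.execute(
        "SELECT path FROM tasks WHERE processed = 0 ORDER BY path LIMIT ?",
        (limit,),
    ).fetchall()
    return [r[0] for r in rows]


def mark_processed(conn, path):
    """Flag a path as done so it is never handed out again."""
    conn.execute("UPDATE tasks SET processed = 1 WHERE path = ?", (path,))
    conn.commit()
```

Because the state lives in the table, a restarted indexer can resume from `next_unprocessed` without asking Solr which files were already indexed.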
--
Rahul Singh
rahul.si...@anant.us
are some decent distributed shared file system services that could be
leveraged depending on the number of compute nodes.
A shared file system is the best way to keep it consistent, but it comes with its
drawbacks. You can always back up locally and asynchronously sync to the shared
FS too.
--
Rahul
If it’s Windows, it may be using a tool called NSSM to manage the Solr service.
Look at Windows Services and Task Scheduler to understand whether the Solr
services are being managed by Windows via Services, by the Task Scheduler, or
just by .bat files.
Rahul
On Jun 20, 2018, 11:34 AM -0400, Shawn Heisey
is a work in progress and I'll update this with screenshots as well as
with links from other contributors.
--
Rahul Singh
rahul.si...@anant.us
Anant Corporation
Have you tried changing the log level?
https://lucene.apache.org/solr/guide/7_2/configuring-logging.html
--
Rahul Singh
rahul.si...@anant.us
Anant Corporation
On Jul 8, 2018, 8:54 PM -0500, Yasufumi Mizoguchi ,
wrote:
> Hi,
>
> I am trying to index files into Solr 7.2 using da
Use the -v option with the bin/solr start command.
Regards,
Rahul Chhiber
-Original Message-
From: Prateek Jain J [mailto:prateek.j.j...@ericsson.com]
Sent: Monday, July 09, 2018 4:26 PM
To: solr-user@lucene.apache.org
Subject: cmd to enable debug logs
Hi All,
What's the command (fro
Agreed. DIH is not an industrial-grade ETL tool, so you may want to consider
other options. You may want to look into Kafka Connect as an alternative. It has
connectors for JDBC into Kafka, and from Kafka into Solr.
--
Rahul Singh
rahul.si...@anant.us
Anant Corporation
On Jul 9, 2018, 6:14 AM -0500
deduplication; the join, I’m pretty sure, works
on exact matches.
Consider creating an “identity” collection where you map the different names to
a unique identity key. This could then technically be joined with the two
datasets, and those results could be joined again.
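The identity-key idea can be illustrated in plain Python: every known name variant maps to one identity key, and two datasets are then joined on that key instead of on the raw name. The names, fields, and the lowercase/trim normalization are hypothetical assumptions; in Solr the mapping would live in its own collection and be combined with the join query parser, which this sketch does not model.

```python
def build_identity_map(groups):
    """groups: identity_key -> list of name variants (hypothetical data)."""
    alias_to_id = {}
    for identity_key, names in groups.items():
        for name in names:
            # Assumed normalization so that exact matching is a bit more forgiving.
            alias_to_id[name.strip().lower()] = identity_key
    return alias_to_id


def join_on_identity(left, right, alias_to_id):
    """Join two lists of dicts (each with a 'name' field) on the identity key."""
    by_id = {}
    for doc in right:
        key = alias_to_id.get(doc["name"].strip().lower())
        if key is not None:
            by_id.setdefault(key, []).append(doc)
    joined = []
    for doc in left:
        key = alias_to_id.get(doc["name"].strip().lower())
        # Unmapped names (key is None) find no partners and are skipped.
        for match in by_id.get(key, []):
            joined.append((key, doc, match))
    return joined
```

The join itself stays an exact match; all the fuzziness is pushed into maintaining the alias-to-identity mapping.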
Rahul
On Jul 11, 2018, 4:42 PM -0400, Aroop
Their commercial offering still has something like it. You can always try
Grafana.
Rahul
On Jul 13, 2018, 9:59 AM -0400, rgummadi , wrote:
> Is SiLK from LucidWorks still an active project? I looked at their github and
> it does not seem to be active. If so are there any alternative sol
the _default configset for any collections created without
explicit configset.
Regards,
Rahul Chhiber
-Original Message-
From: Chuming Chen [mailto:chumingc...@gmail.com]
Sent: Thursday, July 26, 2018 11:35 PM
To: solr-user@lucene.apache.org
Subject: create collection from existing
with leader and replicas being spread around the cluster.
You would be bypassing the general high availability / distributed computing
processes by trying not to reindex.
Rahul
On Aug 7, 2018, 7:06 AM -0400, Bjarke Buur Mortensen ,
wrote:
> Hi List,
>
> is there a cookbook recipe for
production
setup with the above configuration?
Thanks,
Rahul
Thanks for your response, Walter. But I could not find a Java API for Luke
for writing my tool. Is there one? I also tried using the LukeRequestHandler
that comes with Solr, but invoking it causes the Solr core to be loaded.
Rahul
On Wed, Jan 29, 2020 at 5:20 PM Walter Underwood
wrote:
>
l documents and the
index size (to gather stats about the Solr server), is the amount of memory
consumed proportional to the index size in some way?
Thanks,
Rahul
On Wed, Jan 29, 2020 at 6:43 PM Shawn Heisey wrote:
> On 1/29/2020 3:01 PM, Rahul Goswami wrote:
> > 1) How expensive is c
Hello,
I am working with Solr 7.2.1 and had a question regarding the performance
of wildcard searches.
q=*:*
vs
q=id:*
vs
q=id:[* TO *]
Can someone please rank them in the order of performance with the
underlying reason?
Thanks,
Rahul
updates requests for a 2 node SolrCloud cluster with
the older (3.4.10) zookeeper and it seemed to work fine. But just want to
know if there are any caveats I should be aware of.
Thanks,
Rahul
eb 13, 2020 at 9:26 AM Erick Erickson
wrote:
> That should be OK. There were no code changes necessary for that upgrade.
> see SOLR-13363
>
> > On Feb 12, 2020, at 5:34 PM, Rahul Goswami
> wrote:
> >
> > Hello,
> > We are running a SolrCloud (7.2.1) cluster an
ted.
However, if I search with the same fq again, I expect the lookup and hits
count to increase, but it doesn't. This ultimately results in an incorrect
hitratio.
I tried this scenario on Solr 7.2.1, 7.7.2 and 8.5 and observe the same
behavior on all three versions.
Is this a bug or am I missing something here?
Thanks,
Rahul
"item_manu:samsung
manu:apple":"SortedIntDocSet{size=2,ramUsed=40 bytes}",
"warmupTime":0,
"maxRamMB":-1,
5) A query with the same fq again (fq=manu:samsung OR manu:apple): the
numbers don't get updated for this fq hereafter for subseque
Hoss,
Thank you for such a succinct explanation! I was not aware of the order of
lookups (queryResultCache followed by filterCache). Makes sense now. Sorry
for the false alarm!
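A toy model of the lookup order described above: when the exact same request (q plus fq) comes in again, it is answered from the queryResultCache, so the filterCache is never consulted and its lookups/hits counters stay put; a different q with the same fq does reach the filterCache and registers a hit. The class and cache keys are hypothetical simplifications, not Solr's actual implementation; `hitratio` is hits divided by lookups.

```python
class ToyCaches:
    """Toy model: the queryResultCache is checked before the filterCache."""

    def __init__(self):
        self.query_result_cache = {}  # (q, fq) -> result list
        self.filter_cache = {}        # fq -> matching doc ids
        self.stats = {"qrc": [0, 0], "fc": [0, 0]}  # [lookups, hits]

    def search(self, q, fq):
        key = (q, fq)
        self.stats["qrc"][0] += 1
        if key in self.query_result_cache:
            self.stats["qrc"][1] += 1
            return self.query_result_cache[key]  # filterCache never consulted
        self.stats["fc"][0] += 1
        if fq in self.filter_cache:
            self.stats["fc"][1] += 1
        else:
            # Stand-in for actually computing the filter's doc set.
            self.filter_cache[fq] = [f"doc-matching:{fq}"]
        result = list(self.filter_cache[fq])
        self.query_result_cache[key] = result
        return result


def hitratio(lookups, hits):
    return hits / lookups if lookups else 0.0
```

Repeating `search("*:*", "manu:samsung OR manu:apple")` only moves the queryResultCache counters, which matches the behavior reported earlier in this thread.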
Rahul
On Mon, Apr 20, 2020 at 4:04 PM Chris Hostetter
wrote:
> : 4) A query with different fq.
> :
) stored=false and docValues=true
3) stored=true and docValues=true
Thanks,
Rahul
On Tue, May 19, 2020 at 5:55 PM Erick Erickson
wrote:
> They are _absolutely_ able to be used together. Background:
>
> “In the bad old days”, there was no docValues. So whenever you needed
> to facet/so
+1 on avoiding SolrCloud terminology. In the interest of keeping it obvious
and simple, may I please suggest primary/secondary?
On Wed, Jun 17, 2020 at 5:14 PM Atita Arora wrote:
> I agree with avoiding SolrCloud terminology too.
>
> I may suggest going for "prime" and "clone"
> (Short an
I agree with Phill, Noble and Ilan above. The problematic term is "slave"
(not "master"), which I am all for changing if it causes less regression than
removing BOTH master and slave. Since some people have pointed out Github
changing the "master" terminology, in my personal opinion, it was not a
meas
rect me if
I am wrong!)
-Rahul
On Thu, Sep 17, 2020 at 2:56 PM Rajdeep Sahoo
wrote:
> If someone is searching with " tshirt tshirt tshirt tshirt tshirt tshirt"
> we need to remove the duplicates and search with tshirt.
>
>
> On Fri, 18 Sep, 2020, 12:19 am Alexandre Rafalo
Goutham,
Is the field you are trying to delete by indexed=true in the schema?
If the uniqueKey is indexed=true, does delete by id work for you?
(uniqueKey:value)
Also, instead of "Solr Command" if you choose the Document type as "XML"
does it make any difference?
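For reference, Solr's XML update format expresses delete-by-id as `<delete><id>…</id></delete>` and delete-by-query as `<delete><query>…</query></delete>`; with Document type "XML" in the Admin UI Documents screen, a message of this shape is what gets submitted. The helpers below just build those messages with Python's standard `xml.etree.ElementTree` (which also escapes special characters); `doc1` is a hypothetical id.

```python
import xml.etree.ElementTree as ET


def delete_by_id_xml(doc_id):
    """Build a Solr XML update message deleting one document by its uniqueKey."""
    root = ET.Element("delete")
    ET.SubElement(root, "id").text = doc_id
    return ET.tostring(root, encoding="unicode")


def delete_by_query_xml(query):
    """Build a Solr XML update message deleting every document matching `query`."""
    root = ET.Element("delete")
    ET.SubElement(root, "query").text = query
    return ET.tostring(root, encoding="unicode")
```

For example, `delete_by_id_xml("doc1")` yields `<delete><id>doc1</id></delete>`. Comparing how long the id form takes against the query form can help narrow down where the time goes.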
Rahul
On
that I would still
expect delete by id to execute in reasonable time, so I would start by
looking at what is eating up the CPU in your request.
-Rahul
On Sat, Sep 26, 2020 at 4:50 AM Goutham Tholpadi
wrote:
> Thanks Dominique! I just tried deleting a single document using its id. I
>
Thanks for sharing this Anshum. Day 1 had some really interesting sessions.
Missed out on a couple that I would have liked to listen to. Are the
recordings of these sessions available anywhere?
-Rahul
On Mon, Sep 28, 2020 at 7:08 PM Anshum Gupta wrote:
> Hey everyone!
>
> ApacheCo
count-filter
You'll need to configure it in the schema for the "index" analyzer for the
data type of the field with large text.
Indexing documents on the order of half a GB will definitely come back to hurt
your operations, if not now, later (think OOM, extremely slow atomic
updates, long r
Charlie,
Thanks for providing an alternate approach to doing this. It would be
interesting to know how one could go about organizing the docs in this
case (nested documents?). How would join queries perform on a large
index (200 million+ docs)?
Thanks,
Rahul
On Fri, Oct 2, 2020 at 5:55 AM
l
3. How to scale up the servers for better performance?
>> This is too open-ended a question and depends on a lot of factors
specific to your environment and use-case :)
- Rahul
On Tue, Oct 6, 2020 at 4:26 PM Manisha Rahatadkar <
manisha.rahatad...@anjusoftware.com> wrote:
> Hi
updates.
Is this understanding correct ?
Thanks,
Rahul
On Wed, Oct 7, 2020 at 11:39 PM yaswanth kumar
wrote:
> Thank you very much both Eric and Shawn
>
> Sent from my iPhone
>
> > On Oct 7, 2020, at 10:41 PM, Shawn Heisey wrote:
> >
> > On 10/7/2020 4:40 PM, yaswant
ation nevertheless.
https://backstage.forgerock.com/knowledge/kb/article/a39551500
The hex number the author talks about in the link above is the native
thread id.
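As a concrete illustration: in a HotSpot thread dump (for example from `jstack`), each thread line carries a field like `nid=0x1a2b`, and on Linux that hex value is the native thread id that `top -H` shows in decimal. Matching the two is just a base-16 conversion; the value `0x1a2b` is only an example.

```python
def nid_to_decimal(nid):
    """Convert a thread-dump native id such as '0x1a2b' to a decimal thread id."""
    return int(nid, 16)  # int() accepts the '0x' prefix when base is 16


def decimal_to_nid(tid):
    """Convert a decimal OS thread id (e.g. from `top -H`) to the hex nid form."""
    return hex(tid)
```

So a busy thread reported by `top -H` as 6699 would appear in the dump as `nid=0x1a2b`.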
Best,
Rahul
On Wed, Oct 14, 2020 at 8:00 AM Erick Erickson
wrote:
> Zisis makes good points. One other thing is I’d look to
>
optimizations that I
could try?
Thanks,
Rahul
Hello experts,
Just following up in case my previous email got lost in the big stack of
queries. Would appreciate any help on optimizing a graph query, or any
pointers on which direction to investigate.
Thanks,
Rahul
On Wed, May 15, 2019 at 9:37 PM Rahul Goswami wrote:
> Hello,
>
on in Solr log files. I am thinking that seeing errors in the log files
doesn't hurt as long as the updates and gets work fine, but I would still like
to know how to keep these errors from happening.
Thanks
Rahul Mandava
, since the parameters of this fq don't
change, shouldn't I expect to gain some advantage from using the
filterCache?
Thanks,
Rahul
On Wed, May 22, 2019 at 7:40 AM Toke Eskildsen wrote:
> On Wed, 2019-05-15 at 21:37 -0400, Rahul Goswami wrote:
> > fq={!graph from=from_field to=
that
this is the cause, and the timeouts and recoveries are the symptoms. Is my
understanding correct? If yes, what steps could I take to help the
situation? I do see that the difference between "Num Docs" and "Max Docs"
is about 20%.
Would appreciate your help.
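For context, "Max Docs" counts every document still in the index segments, including ones marked deleted (or replaced by updates) but not yet merged away, while "Num Docs" counts only live documents, so the gap is the deleted-document count. A small helper, shown with example numbers rather than figures from this cluster:

```python
def deleted_docs_stats(num_docs, max_docs):
    """Return (deleted document count, deleted percentage of max_docs)."""
    deleted = max_docs - num_docs
    pct = 100.0 * deleted / max_docs if max_docs else 0.0
    return deleted, pct
```

With `num_docs=800` and `max_docs=1000` this gives 200 deleted documents, i.e. the roughly 20% gap mentioned above.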
Thanks,
Rahul
ndex.ConcurrentMergeScheduler",
"maxMergeCount":2,
"maxThreadCount":2},
Thanks,
Rahul
On Wed, Jun 5, 2019 at 4:24 PM Shawn Heisey wrote:
> On 6/5/2019 9:39 AM, Rahul Goswami wrote:
> > I have a solrcloud setup on Windows server with below config:
> >
/measures.
Thanks,
Rahul
On Thu, Jun 6, 2019 at 11:00 AM Rahul Goswami wrote:
> Thank you for your responses. Please find additional details about the
> setup below:
>
> We are using Solr 7.2.1
>
> > I have a solrcloud setup on Windows server with below config:
> >
, is there a JIRA for it?
Thanks,
Rahul
teShardHandlerConfig().getDistributedSocketTimeout();
}
I found this open JIRA on this issue:
https://issues.apache.org/jira/browse/SOLR-12550?jql=text%20~%20%22distribUpdateSoTimeout%22
Should I update the JIRA with this ?
Thanks,
Rahul
On Thu, Jun 13, 2019 at 12:00 AM Rahul Goswami
wrote:
> Hello,
>
binary to try
the patch nevertheless, but it didn't help as I anticipated. I'll update
the JIRA and submit a patch.
Thank you,
Rahul
On Thu, Jun 20, 2019 at 11:35 AM Gus Heck wrote:
> Hi Rahul,
>
> Did you try the patch in that issue? Also food for thought:
> https://is
r this part is different on the master.
Regards,
Rahul
On Thu, Jun 20, 2019 at 8:22 PM Rahul Goswami wrote:
> Hi Gus,
> Thanks for the response and referencing the umbrella JIRA for these kind
> of issues. I see that it won't solve the problem since the builder object
> wh
efficient for our use case
considering moderate-heavy indexing and search load? Would also like to
know the tradeoffs involved if any. Thanks in advance!
Regards,
Rahul
beefy physical servers at disposal for
this deployment. If we go with 4 SolrClouds then we would have 4x8=32 nodes
(Solr instances) running across these 4 physical servers.
Any issues that you might see with this configuration or additional
considerations that I might be missing?
Thanks,
Rahul
iculty wrapping my head around this, and would appreciate it if you could
help clear it up for me.
Thanks,
Rahul
On Thu, Jun 13, 2019 at 7:33 AM Shawn Heisey wrote:
> On 6/6/2019 9:00 AM, Rahul Goswami wrote:
> > *OP Reply* : Total 48 GB per node... I couldn't see another software
> us
Shawn, Erick,
Thank you for the explanation. The merge scheduler params make sense now.
Thanks,
Rahul
On Wed, Jul 3, 2019 at 11:30 AM Erick Erickson
wrote:
> Two more tidbits to add to Shawn’s explanation:
>
> There are heuristics built in to ConcurrentMergeScheduler.
> From
y one huge
document?
2) If yes, does this flush create a segment with just one document ?
3) Heap dump analysis shows large (>350 MB) instances of
DocumentWritersPerThread. Does one instance of this class correspond to one
document?
Help is much appreciated.
Thanks,
Rahul
On Fri, Jul 5, 20
I am using Solr version 6.6.0 and the heap size is set to 512 MB, which I believe
is the default. We have almost 10 million documents in the index, and we
perform frequent updates (we are doing a soft commit on every update; the heap issue
was seen with and without soft commit) to the index and obviousl
don’t see any log lines from the processAdd() method.
Any inputs on why the processor is getting skipped if placed after
distributed processor?
Thanks,
Rahul
the
processAdd() of the processor. Is this expected behavior?
Regards,
Rahul
On Wed, Sep 18, 2019 at 5:28 PM Erick Erickson
wrote:
> It Depends (tm). This is a little confused. Why do you have
> distributed processor in stand-alone Solr? Stand-alone doesn't, well,
> distrib
any further custom processors other than the run update processor in
standalone mode? Alternatively, is there a way I can get a handle on a
complete document once it’s reconstructed from an atomic update?
Thanks,
Rahul
On Thu, Sep 19, 2019 at 7:06 AM Erick Erickson
wrote:
> _Why_ is reindex
n that case?
Thanks in advance!
Regards,
Rahul
Hello,
Just wanted to follow up in case my question fell through the cracks :)
Would appreciate help on this.
Thanks,
Rahul
On Fri, Nov 15, 2019 at 5:32 PM Rahul Goswami wrote:
> Hello,
>
> We are planning to upgrade our SolrCloud cluster from 7.2.1 (hosted on
> Windows server)
Hi Sujatha,
How did you upgrade your cluster? Did you restart each node in the cluster
one by one after upgrade (while other nodes were running on 6.6.2) or did
you bring down the entire cluster and bring up one upgraded node at a time?
Thanks,
Rahul
On Thu, Nov 14, 2019 at 7:03 AM Paras
s.
Is it linked appropriately? Or is it some access rights issue for non-PMC
members like me?
Thanks,
Rahul
On Wed, Dec 4, 2019 at 7:12 AM Noble Paul wrote:
> Thanks ishan
>
> On Wed, Dec 4, 2019, 3:32 PM Ishan Chattopadhyaya <
> ichattopadhy...@gmail.com>
> wrote:
>
for better
application design considerations.
Thanks,
Rahul
this behavior is included in
the documentation since it is similar to the behavior with periods.
https://lucene.apache.org/solr/guide/6_6/tokenizers.html#Tokenizers-StandardTokenizer
"Periods (dots) that are not followed by whitespace are kept as part of the
token, including Internet domain names. "
Thanks,
Rahul
Nope. The underscore is preserved right after tokenization even before it
reaches any filters. You can choose the type "text_general" and try an
index-time analysis through the "Analysis" page on the Solr Admin UI.
Thanks,
Rahul
On Sat, Jan 9, 2021 at 8:22 AM xiefengchan
t on underscores if that is your use case.
>
> On Sat, Jan 9, 2021 at 2:58 PM Rahul Goswami
> wrote:
>
iii) Wait for 5-10 seconds between each subsequent node start
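Step iii) can be sketched as a loop that starts one node, waits, and then starts the next. Steps i) and ii) aren't in this excerpt and aren't modeled. `start_node` is a hypothetical callback (for example, something that runs `bin/solr start` on that host), and `sleep` is injectable so the pacing can be checked without actually waiting.

```python
import time


def staggered_start(nodes, start_node, delay_seconds=5.0, sleep=time.sleep):
    """Start nodes one at a time, pausing `delay_seconds` between starts.

    `start_node` is a hypothetical callback that starts Solr on one node.
    """
    started = []
    for i, node in enumerate(nodes):
        if i > 0:
            sleep(delay_seconds)  # give the previous node time to come up
        start_node(node)
        started.append(node)
    return started
```

A fixed delay is the simplest version; a more careful script might instead poll each node until it reports healthy before moving on.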
Hope this helps.
Best,
Rahul
On Thu, Feb 11, 2021 at 12:03 PM mmb1234 wrote:
> Hello,
>
> On reboot of one of the solr nodes in the cluster, we often see a
> collection's shards with
> 1. LEADER replica in DO
/solr/gettingstarted/select?q='*
<http://localhost:8983/solr/gettingstarted/select?q='*>'*
Please suggest anything that could help, and let me know if I am missing something.
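Solr's standard query parser uses double quotes for phrases; single quotes are not query syntax, so `q='*'*` is unlikely to mean what was intended. Assuming the goal is to match all documents, the usual form is `q=*:*`, URL-encoded. A minimal sketch with Python's `urllib.parse.urlencode`; the host and collection come from the message above, and `select_url` is a hypothetical helper:

```python
from urllib.parse import urlencode


def select_url(base, collection, q, **params):
    """Build a Solr /select URL with the query string properly URL-encoded."""
    query = urlencode({"q": q, **params})
    return f"{base.rstrip('/')}/solr/{collection}/select?{query}"
```

For example, `select_url("http://localhost:8983", "gettingstarted", "*:*")` produces `http://localhost:8983/solr/gettingstarted/select?q=%2A%3A%2A`.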
Thanks,
Rahul