Please see inline for my thoughts
Deepak
"The greatness of a nation can be judged by the way its animals are
treated. Please consider stopping the cruelty by becoming a Vegan"
+91 73500 12833
deic...@gmail.com
Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool
"Pl
On 11/2/2018 5:00 PM, Wei wrote:
After a recent schema change, it takes almost 40 minutes to optimize the
index. The schema change is to enable docValues for all sort/facet fields,
which increased the index size from 12G to 14G. Before the change it only
took 5 minutes to do the optimization.
Hello,
After a recent schema change, it takes almost 40 minutes to optimize the
index. The schema change is to enable docValues for all sort/facet fields,
which increased the index size from 12G to 14G. Before the change it only
took 5 minutes to do the optimization.
I have tried to increase ma
On 11/2/2018 1:38 PM, Chuming Chen wrote:
I am running a Solr cloud 7.4 with 4 shards and 4 nodes (JVM "-Xms20g
-Xmx40g"), each shard has 32 million documents and 32Gbytes in size.
A 40GB heap is probably completely unnecessary for an index of that
size. Does each machine have one replica on
Hi All,
I am running a Solr cloud 7.4 with 4 shards and 4 nodes (JVM "-Xms20g
-Xmx40g"), each shard has 32 million documents and 32Gbytes in size.
For a given query (I use complexphrase query), typically, the first time it
took a couple of seconds to return the first 20 docs. However, for the
Great, thanks for the response. This is how we have it configured now, but
we just had the idea the other day that maybe it would be better
otherwise...
And thanks for the blog post! We ended up with basically the same config,
so it's good to see that validated.
Kyle
On Fri, 2 Nov 2018 at 13:
I prefer a single HDFS home since it definitely simplifies things. No need
to create folders for each node or anything like that if you add nodes to
the cluster. The replicas underneath will get their own folders. I don't
know if there are issues with autoAddReplicas or other types of failovers
if
Hi All,
Here's a question that I can't find an answer to in the documentation:
When configuring solr cloud with HDFS, is it best to:
a) provide a unique hdfs folder for each solr cloud instance
or
b) provide the same hdfs folder to all solr cloud instances.
So for example, if I have two solr
+1 Thank you, Daniel. If you have any interest in helping out on
TIKA-2749, please join the fun. :D
On Fri, Nov 2, 2018 at 12:12 PM Davis, Daniel (NIH/NLM) [C]
wrote:
>
> I think that you also have to process a PDF pretty deeply to decide if you
> want it to be OCR. I have worked on projects w
I think that you also have to process a PDF pretty deeply to decide if you want
it to be OCR. I have worked on projects where all of the PDFs are really like
faxes - images are encoded in JBIG2 black and white or similar, and there is
really one image per page, and no text. I have also worke
On 11/2/2018 3:12 AM, Vadim Ivanov wrote:
It seems to me that the issue is related to:
- restart solr node
- rebalance leader
- reload collection
- reload core (Core admin is not forbidden but seems obsolete in SolrCloud)
In SolrCloud, CoreAdmin is an expert option. Many of the things that
the Col
OCR'ing of PDFs is fiddly at the moment because of Tika, not Solr! We
have an open ticket to make it "just work", but we aren't there yet
(TIKA-2749).
You have to tell Tika how you want to process images from PDFs via the
tika-config.xml file.
You've seen this link in the links you mentioned:
ht
Hi All,
I want to index images, and PDF documents that contain images, into Solr. I
am testing this with Solr 6.3.0.
I've installed Tesseract on my computer (Mac) and verified that it
works fine to extract text from an image.
When I index an image into Solr, it has no content. However, as far as I know, I
Hi Susheel,
Yes, it appears that under certain conditions, if a follower is down when
the leader gets an update, the follower will not receive that update when it
comes back (or maybe it receives the update and it's then overwritten by its
own transaction logs, I'm not sure). Furthermore,
It seems to me that the issue is related to:
- restart solr node
- rebalance leader
- reload collection
- reload core (Core admin is not forbidden but seems obsolete in SolrCloud)
If nothing is changing in the cluster state, everything goes smoothly.
Maybe it can be reproduced with the same test as in " Solr