Hello, I tried setting both autoCommit and autoSoftCommit to -1, but I
still see the documents just seconds after indexing them.
These are the actual configurations in <solrcorename>/conf/solrconfig.xml:
<autoCommit>
<maxTime>${solr.autoCommit.maxTime:9999999}</maxTime>
<openSearcher>false</openSearcher>
</autoCommit>
<!-- softAutoCommit is like autoCommit except it causes a
'soft' commit which only ensures that changes are visible
but does not ensure that data is synced to disk. This is
faster and more near-realtime friendly than a hard commit.
-->
<autoSoftCommit>
<maxTime>${solr.autoSoftCommit.maxTime:9999999}</maxTime>
</autoSoftCommit>
But even so, after every single POST to the /update request handler, if
I search for * I see 1K more documents (I index in chunks of 1K documents).
Do you have any idea why this happens?
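One thing that may be worth ruling out here, offered as a sketch rather
than a diagnosis: in the ${solr.autoSoftCommit.maxTime:9999999} form, the
value after the colon is only a default, and a system property of the same
name passed to the JVM at startup takes precedence. Whether such a property
is set on this machine is an assumption; the thread does not say.
Hard-coding the value removes that variable:

```xml
<!-- Sketch: literal value, so a solr.autoSoftCommit.maxTime system
     property cannot override it. -1 disables automatic soft commits. -->
<autoSoftCommit>
  <maxTime>-1</maxTime>
</autoSoftCommit>
```

If documents still appear after each POST with this in place, the commit
may be coming from the client side (for example a commit=true or
commitWithin parameter on the update request) rather than from these
settings.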
On 12/12/18 17:16, Erick Erickson wrote:
The answer to your question is to set the interval to -1.
However, for <autoCommit> that's a really bad idea. Why do you think
this will help with OOM errors? _Querying_ usually is the place OOMs
are generated, especially if you do things like facet on very
high-cardinality fields and/or do _not_ have docValues enabled for
fields you facet, group, or sort on.
I have a single machine where I just index data; no concurrent querying
is happening. That's why I don't care about visibility, only about
speed and not crashing.
I'm planning to make a single hard commit at the end (roughly once every
500,000 docs) and then
copy the final index to a clone machine where all the querying happens,
to avoid the OOM presumably caused by concurrent indexing/querying.
I thought this could help lower Solr's memory requirements.
We don't facet, group, or sort. The default Solr sorting by relevance is
fine for us.
We just have big edismax queries with sub-edismax queries with different
mm values. Every sub-edismax query has many (on the order of thousands)
alternative words/phrases.
If you do disable hard commits, your TLOG sizes will grow without
bound until your entire indexing run is complete. Worse, if the TLOG
replays due to abnormal restart, it would try to re-index everything.
Hard commits with openSearcher=false are recommended.
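As a sketch of what that recommendation could look like in
solrconfig.xml: the 60000 ms interval below is an illustrative assumption,
not a value from this thread. With openSearcher=false, each hard commit
flushes the index to disk and starts a new transaction log, without making
the new documents visible to searches.

```xml
<!-- Sketch: periodic hard commit that does not open a new searcher.
     60000 ms is an assumed example interval. -->
<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:60000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
```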
Yes, I know, but I want to control when the hard commit is triggered.
It would also be nice to know when Solr finishes the hard commit, so
that I can stop sending POST requests during that time, but I haven't
seen any API for that.
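A hedged note on this point: an explicit commit can be sent to the same
/update handler as an XML update message, sketched below. With
waitSearcher="true" (the documented default) the request blocks until the
commit has completed, so the HTTP response itself can serve as the
"finished" signal. Treat that as something to verify on your Solr version
rather than a guarantee.

```xml
<!-- Sketch: POST this body to /update with Content-Type: text/xml.
     The response is returned once the commit has completed. -->
<commit waitSearcher="true"/>
```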
Thank you for your help
Danilo
Best,
Erick
On Wed, Dec 12, 2018 at 4:44 AM Danilo Tomasoni <tomas...@cosbi.eu>
wrote:
I want to disable even that.
I saw here
https://lucene.apache.org/solr/guide/6_6/updatehandlers-in-solrconfig.html
that to achieve what I want I probably just need to comment out the
autoCommit tag. Is that correct?
What do you think about disabling autoCommit/autoSoftCommit?
Can it lower the system requirements while indexing?
What about transaction logs? Can they be disabled?
When Solr crashes I always reimport from scratch, because I don't expect
that the documents accepted by Solr between the last hard commit and the
crash will be saved anywhere.
But this article
https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
says that Solr is capable of restoring documents even if they weren't
committed. Is that still correct?
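On the transaction-log question, a sketch under the assumption that this
is a standalone (non-SolrCloud) core: the log is configured by the
<updateLog> element inside <updateHandler>, and commenting that element
out turns it off. SolrCloud and real-time get rely on the update log, so
this only makes sense if neither is in use. The dir entry shown is the
form commonly found in the stock solrconfig.xml; check it against your own
file.

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Sketch: with updateLog commented out, no transaction log is written,
       so uncommitted documents cannot be replayed after a crash. -->
  <!--
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
  -->
</updateHandler>
```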
Thank you
Danilo
On 12/12/18 13:33, Mikhail Khludnev wrote:
What about autoSoftCommit ?
On Wed, Dec 12, 2018 at 3:24 PM Danilo Tomasoni <tomas...@cosbi.eu>
wrote:
Hello, I'm experiencing OOM errors while indexing a large number of
documents.
The main idea to avoid OOM is to avoid commits (just one big commit at
the end).
Is this a correct idea?
How can I disable autoCommit?
I've set
<autoCommit>
<maxTime>${solr.autoCommit.maxTime:-1}</maxTime>
<openSearcher>false</openSearcher>
</autoCommit>
in solrconfig.xml
but it's not sufficient; while indexing I still see documents.
Thank you
Danilo
--
Danilo Tomasoni
COSBI
As for the European General Data Protection Regulation 2016/679 on the
protection of natural persons with regard to the processing of personal
data, we inform you that all the data we possess are object of treatment
in the respect of the normative provided for by the cited GDPR.
It is your right to be informed on which of your data are used and how;
you may ask for their correction, cancellation or you may oppose to their
use by written request sent by recorded delivery to The Microsoft Research
– University of Trento Centre for Computational and Systems Biology Scarl,
Piazza Manifattura 1, 38068 Rovereto (TN), Italy.