Hi Derek,
Both approaches have pros and cons:
1. Full reindexing. PRO: you always have a clean index, and if something
goes wrong you simply don't switch the alias to the updated index, so
your users will not notice any issues. CON: you do a full reindex every
time, even when the amount of changes is minimal. Also, this approach is
not real-time friendly if you plan to have more frequent update cycles.
2. Deleting from the existing index. You make minimal changes. But note
that deleted docs are only flagged as deleted in the index and are
physically removed when segments are merged. This can result in skewed
statistics, and if you have replicas and sort by score, it can result in
different ordering depending on each replica's merge cycle. Running
optimize after the update is done would solve this issue.
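
To make the two options a bit more concrete, here is a rough sketch of the
requests involved. It only builds the URLs and the JSON body and sends
nothing. The base URL and the collection/alias names (products,
products_temp) are made up for illustration, so adjust them to your setup.

```python
import json
from urllib.parse import urlencode

# Assumption: a local Solr node; replace with your own host and port.
SOLR_BASE = "http://localhost:8983/solr"


def create_alias_url(alias, collection, base=SOLR_BASE):
    """Approach 1: Collections API call that points `alias` at `collection`."""
    query = urlencode(
        {"action": "CREATEALIAS", "name": alias, "collections": collection}
    )
    return f"{base}/admin/collections?{query}"


def delete_by_id_request(collection, ids, base=SOLR_BASE):
    """Approach 2: URL and JSON body that delete the given document ids."""
    url = f"{base}/{collection}/update?commit=true"
    body = json.dumps({"delete": list(ids)})
    return url, body


def optimize_url(collection, base=SOLR_BASE):
    """Update request that merges segments, dropping docs flagged as deleted."""
    return f"{base}/{collection}/update?optimize=true"
```

You would POST the delete body with Content-Type: application/json, and
call the optimize URL only after the whole update batch is done, since
optimize is expensive on a large index.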
To make the right decision, look at the size of your collection, the
number of deleted items, etc. You can even combine the approaches, e.g.
delete daily and do a full reindex once a week.
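
A combined schedule could be sketched like this. The weekday and the 20%
threshold are arbitrary assumptions, not recommendations; max_doc and
num_docs stand for the maxDoc and numDocs values Solr reports for an
index (maxDoc still counts docs that are flagged as deleted).

```python
from datetime import date

FULL_REINDEX_WEEKDAY = 6   # assumption: Sunday (Monday is 0)
MAX_DELETED_RATIO = 0.2    # assumption: reindex early above 20% deleted docs


def choose_maintenance(today, max_doc, num_docs):
    """Return "full_reindex" or "delete_by_id" for today's maintenance run."""
    deleted = max_doc - num_docs
    deleted_ratio = deleted / max_doc if max_doc else 0.0
    if today.weekday() == FULL_REINDEX_WEEKDAY:
        return "full_reindex"
    if deleted_ratio > MAX_DELETED_RATIO:
        return "full_reindex"
    return "delete_by_id"
```

For example, choose_maintenance(date(2017, 3, 23), 1000, 990) falls on a
Thursday with 1% deleted docs, so it picks the delete-by-id path.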
HTH,
Emir
On 23.03.2017 07:10, Derek Poh wrote:
Hi
I have collections of products. I index them 3-4 times daily.
Every day there are products that have expired, and I need to remove
them from these collections daily.
I can think of 2 ways to do this.
1. Using a collection alias to switch between a main and a temp collection.
- clear and index the temp collection
- create alias to temp collection.
- clear and index the main collection.
- create alias to main collection.
This way requires additional collections.
2. Get the list of expired products and generate delete-by-id queries to
the collections.
Would like to get some advice on which way I should adopt.
Derek
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/