It turns out that all our fields are stored, and restoring from the
source data is a bit of a problem. I've tried DIH/SolrEntityProcessor and
it seems to be working well, so I'll probably end up using it.
Thank you!
--
Warm regards,
Artem Karpenko
On 04.02.2013 19:58, Lance Norskog wrote:
A side problem here is text analyzers: the analyzers have changed how
they split apart text for searching, and they work as matched pairs. That
is, the query-time analyzer is built to match what the index-time
analyzer did. If you do this binary upgrade sequence, the indexed data
will not match what the new analyzers produce. It is not a major problem,
but queries may not bring back what you expect.
Also, in 4.x, the unique field has to be called 'id' and every document
needs a '_version_' field.
On 02/04/2013 09:32 AM, Upayavira wrote:
Just to add a little to the good stuff Shawn has shared here - Solr 4.1
does not support 1.4.1 indexes. If you cannot re-index (by far
recommended), then first upgrade to 3.6, then optimize your index, which
will convert it to 3.6 format. Then you will be able to use that index
in 4.1. The simple logic here is that Solr/Lucene can read the indexes
of the previous major version. Given you are two major versions behind,
you'd have to do it in two steps.
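
The "previous major version" rule above can be sketched in a few lines. This is only an illustration of the reasoning, not a Solr tool: the function name `upgrade_steps` is hypothetical, and it assumes the index format is identified by its Lucene major version (2 for the 2.9.x format written by Solr 1.4.1, 3 for Solr 3.x, 4 for Solr 4.x).

```python
def upgrade_steps(index_major, target_major):
    """Return the major versions an index must pass through, in order.

    Assumes each release can read only its own format and the format of
    the major version immediately before it.
    """
    if target_major < index_major:
        raise ValueError("downgrades are not covered by this sketch")
    # One hop per major version: rewrite the index at each stop
    # (for example by optimizing) before moving to the next one.
    return list(range(index_major + 1, target_major + 1))


# Solr 1.4.1 (Lucene 2.9.x format) to Solr 4.1
steps = upgrade_steps(2, 4)
print(steps)  # [3, 4]
```

Each entry is a stop where the index has to be opened and rewritten before the next release can read it, which is why going from 1.4.1 to 4.1 takes two steps.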
Upayavira
On Mon, Feb 4, 2013, at 03:18 PM, Shawn Heisey wrote:
On 2/4/2013 7:20 AM, Artem OXSEED wrote:
I need to upgrade our Solr installation from 1.4.1 to the latest 4.1.0
version. The question is how to deal with indexes. AFAIU there are two
things to be aware of: file format and index format (excuse me for
possible term mismatch, I'm new to Solr) - and while file format can
(and will automatically?) be updated if old index files are used by a new
Solr installation, one cannot say the same about index format. Is that true?
And if the above is true then the question is - should this "index
format" be updated at all - i.e. if we can happily live with it then
it's fine, but I guess that this decision will not bring
performance/feature improvements that were introduced since 1.4.1
version, will it?
Assuming we do need to update this "index format", how do we do it? I found
a solution on SO
(http://stackoverflow.com/questions/4528063/moving-data-from-solr-1-5-to-solr-4-0)
that includes usage of some "export to XML" feature, maybe with Luke,
some custom-made XSLT transformation and import back. Seems like a lot
to do - although it's quite understandable. However, this answer was
given in 2010, with Solr 4.0 in pre-alpha - so maybe there are
tools for this now?
Artem,
When upgrading Solr, the absolute best option is always to delete (or
move) your index directory, let the new version recreate it, and rebuild
from scratch by reindexing from your original data source. This should
always remain an option - the indexes may get corrupted by an unexpected
situation. If you have the ability to rebuild your 1.4.1 index from
your original data source, then it should be straightforward to do the
same thing on the new version.
Solr 4.1 can read version 3.x indexes, but I would not be surprised to
find that it can't read the Lucene 2.9.x format that Solr 1.4.1 uses. I
don't know how much difference there is between the 2.9.x format and the
3.x format. I'm not aware of a distinction between "file" and "index"
formats.
If a Solr version supports an older format, then it will read the
segments created in that format, but new segments will be in the new
format. Solr/Lucene index segments on disk are never changed once they
are finalized. They can be merged into new segments and then deleted,
but nothing will ever change them.
Have you stored every single field individually in Solr? If you have,
then you will be able to retrieve the data to reindex into the new
version. If you have fields that are indexed but not stored, then even
with the XML method you will be unable to obtain all the data. It is
fairly normal in a Solr schema to have fields that you can search on but
that are not stored, because stored fields make the index larger.
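
One quick way to answer the "is every field stored?" question is to scan schema.xml for fields declared with stored="false". The following is a minimal sketch, assuming a conventional schema.xml layout; the helper name `unstored_fields` and the sample schema are made up for illustration. It only catches fields that set stored="false" explicitly, so a field that inherits the setting from its field type would need a separate check.

```python
import xml.etree.ElementTree as ET


def unstored_fields(schema_xml):
    """List names of <field> and <dynamicField> entries with stored="false"."""
    root = ET.fromstring(schema_xml)
    names = []
    for tag in ("field", "dynamicField"):
        for field in root.iter(tag):
            # Only an explicit stored="false" is flagged here; a missing
            # attribute falls back to the field type, which is not inspected.
            if field.get("stored", "").lower() == "false":
                names.append(field.get("name"))
    return names


# Hypothetical schema used only to exercise the function.
SAMPLE_SCHEMA = """
<schema name="example" version="1.2">
  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
    <field name="title" type="text" indexed="true" stored="true"/>
    <field name="body" type="text" indexed="true" stored="false"/>
    <dynamicField name="*_txt" type="text" indexed="true" stored="false"/>
  </fields>
</schema>
"""

missing = unstored_fields(SAMPLE_SCHEMA)
print(missing)  # ['body', '*_txt']
```

If the list comes back empty, pulling the documents out of the old index is at least plausible; anything it reports would have to come from the original data source.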
If you have stored every single field in your index, you can also use
the SolrEntityProcessor in the dataimport handler to import from an old
Solr server to a new one.
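
As a rough illustration of the SolrEntityProcessor approach, the sketch below generates a small DIH data-config.xml that points the new server at the old one. The attribute names (`processor`, `url`, `query`, `rows`) follow the wiki page linked further down as far as I recall it; the host name, entity name, and the `build_data_config` helper are assumptions made up for the example, so check everything against the documentation for your version.

```python
import xml.etree.ElementTree as ET


def build_data_config(old_solr_url, query="*:*", rows=500):
    """Render a data-config.xml string with one SolrEntityProcessor entity."""
    root = ET.Element("dataConfig")
    document = ET.SubElement(root, "document")
    ET.SubElement(document, "entity", {
        "name": "oldSolr",                   # hypothetical entity name
        "processor": "SolrEntityProcessor",
        "url": old_solr_url,                 # base URL of the OLD Solr server
        "query": query,                      # which documents to copy
        "rows": str(rows),                   # documents fetched per request
    })
    return ET.tostring(root, encoding="unicode")


# "old-host" is a placeholder; substitute the address of the 1.4.1 server.
config = build_data_config("http://old-host:8983/solr")
print(config)
```

The resulting file would go into the new core's conf directory, with the dataimport handler registered in solrconfig.xml; a full-import command sent to that handler then copies the stored fields across.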
The critical piece of the puzzle for upgrading between incompatible
versions is that you must be storing every field in the old version
before you start. If you aren't storing a particular field, then the
data from that field is not retrievable and you must go back to the
original data source.
http://wiki.apache.org/solr/DataImportHandler#SolrEntityProcessor
Thanks,
Shawn