If you build your index in Hadoop, read this (it is about Cloudera Search,
but to my understanding it should also work with the Solr Hadoop contrib
since 4.7):
http://www.cloudera.com/content/cloudera-content/cloudera-docs/Search/latest/Cloudera-Search-User-Guide/csug_batch_index_to_solr_servers_using_g
Why not change the order to this:
3. Upgrade Solr Schema (Master); replication is disabled.
4. Start Index Rebuild (if step 3 is done).
1. Pull up Maintenance Pages.
2. Upgrade DB.
5. Upgrade UI code.
6. Index build complete? Start Replication.
7. Verify UI and Drop Maintenance Pages.
So your slaves will keep serving the old index until the rebuild is finished (see the sketch below).
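For steps 3 and 6, a minimal sketch of toggling the master's ReplicationHandler around the rebuild (host, port and handler path are assumptions; adjust them to your setup):

import java.io.InputStream;
import java.net.URL;

// Sketch only: send disablereplication/enablereplication to the master's
// ReplicationHandler; the URL below assumes a single-core setup.
public class ToggleReplication {
    static void send(String command) throws Exception {
        URL url = new URL("http://master-host:8983/solr/replication?command=" + command);
        try (InputStream in = url.openStream()) {
            while (in.read() != -1) { /* drain the status response */ }
        }
    }
    public static void main(String[] args) throws Exception {
        send("disablereplication"); // step 3: stop publishing commits to the slaves
        // ... run the schema upgrade and the full re-index here ...
        send("enablereplication");  // step 6: let the slaves pull the rebuilt index
    }
}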
Otis,
Not sure about Solr, but with Lucene it was certainly doable. I saw
fields way bigger than 400 MB indexed, sometimes having a large set
of unique terms as well (think of something like a log file with lots of
alphanumeric tokens, a couple of gigabytes in size). While indexing and
querying of such …
Alexander,
I saw the same behavior in 1.4.x with non-multivalued fields when
"updating" a document in the index (i.e. obtaining the doc from the
index, modifying some fields, and then adding the document with the same
id back). I do not know what causes this, but it looks like the
copyField logic …
… and re-indexed all content.
> >
> > Thanks,
> >
> > Ravi Kiran Bhaskar
> >
> > On Wed, May 11, 2011 at 6:00 PM, Alexander Kanarsky
> > wrote:
> >> Ravi,
> >>
> >> if you have what looks like a full replication each time even if the …
…g: 23902s, Speed: 18.67 KB/s
>
>
> Thanks,
> Ravi Kiran Bhaskar
>
> On Tue, May 10, 2011 at 4:10 AM, Alexander Kanarsky
> wrote:
> > Ravi,
> >
> > as far as I remember, this is how the replication logic works (see
> > SnapPuller class, fetchLatestIndex method):
Ravi,
as far as I remember, this is how the replication logic works (see
SnapPuller class, fetchLatestIndex method):
> 1. Does the Slave get the whole index every time during replication or
> just the delta since the last replication happened ?
It looks at the index version AND the index generation.
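For reference, a small sketch of asking the master's ReplicationHandler for those two numbers via its indexversion command (host, port and handler path are assumptions):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

// Sketch only: print the master's reported index version and generation.
public class IndexVersionCheck {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://master-host:8983/solr/replication?command=indexversion&wt=json");
        try (BufferedReader r = new BufferedReader(new InputStreamReader(url.openStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                System.out.println(line); // the response contains "indexversion" and "generation"
            }
        }
    }
}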
… for your response.
>
> You said that the old index files were still in use. That means Linux does
> not *really* delete them until Solr releases its locks on them, which happens
> while reloading?
>
>
>
> Thank you for sharing your experiences!
>
> Kind regards,
> Em
>
I see the file
-rw-rw-r-- 1 feeddo feeddo 0 Dec 15 01:19
lucene-cdaa80c0fefe1a7dfc7aab89298c614c-write.lock
was created on Dec. 15. At the end of the replication, as far as I
remember, the SnapPuller tries to open the writer to ensure the old
files are deleted, and in your case it cannot obtain the lock …
Em,
yes, you can replace the index (get the new one into a separate folder
like index.new and then rename it to the index folder) outside of
Solr, then just do the HTTP call to reload the core.
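A minimal sketch of that reload call via the CoreAdmin handler (the core name "core0" and the default admin path are assumptions):

import java.io.InputStream;
import java.net.URL;

// Sketch only: ask CoreAdmin to reload the core after the index directory swap.
public class ReloadCore {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0");
        try (InputStream in = url.openStream()) {
            while (in.read() != -1) { /* drain the status response */ }
        }
    }
}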
Note that the old index files may still be in use (they continue to serve
the queries while reloading), even …
Igor,
you can set up two different Solr cores in solr.xml and search them separately.
See the multicore example in the Solr distribution.
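A minimal sketch of querying the two cores independently (this assumes SolrJ 4.x-style HttpSolrServer and cores named "core0" and "core1"; older SolrJ uses CommonsHttpSolrServer instead):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

// Sketch only: each core gets its own client, so the two data sets are searched separately.
public class TwoCoreSearch {
    public static void main(String[] args) throws Exception {
        HttpSolrServer core0 = new HttpSolrServer("http://localhost:8983/solr/core0");
        HttpSolrServer core1 = new HttpSolrServer("http://localhost:8983/solr/core1");
        QueryResponse r0 = core0.query(new SolrQuery("*:*"));
        QueryResponse r1 = core1.query(new SolrQuery("*:*"));
        System.out.println("core0 hits: " + r0.getResults().getNumFound());
        System.out.println("core1 hits: " + r1.getResults().getNumFound());
    }
}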
-Alexander
On Fri, Jan 21, 2011 at 3:51 PM, Igor Chudov wrote:
> I would like to have two sets of data and search them separately (they are
> used for two different web …
Joan,
make sure that you are running the job on a Hadoop 0.21 cluster. (It
looks like you have compiled the apache-solr-hadoop jar against Hadoop
0.21 but are using it on a 0.20 cluster.)
-Alexander
Joan,
the current version of the patch assumes the location and names of the
schema and solrconfig files ($SOLR_HOME/conf); this is hardcoded (see
SolrRecordWriter's constructor). Multi-core configuration with
separate configuration locations via solr.xml is not supported for
now. As a workaround …
… 1
('text' field has the original Cyrillic tokens, 'text_translit' is for
transliterated ones)
-Alexander
2010/10/28 Pavel Minchenkov :
> Alexander,
>
> Thanks,
> Which variant has better performance?
>
>
> 2010/10/28 Alexander Kanarsky
>
>> Pavel
Pavel,
I think there is no single way to implement this. Some ideas that
might be helpful:
1. Consider adding additional terms while indexing. This assumes
converting the Russian text to both "translit" and "wrong keyboard"
forms and indexing the converted terms along with the original terms
(i.e. your Analyzer emits the converted tokens next to the original
ones …); see the sketch below.
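A minimal sketch of the "index both forms" idea, assuming the 'text' / 'text_translit' fields mentioned later in this thread and assuming, for brevity, that the client does the conversion instead of a custom Analyzer; the toTranslit() helper is a hypothetical placeholder:

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

// Sketch only: store the original Cyrillic text and a transliterated copy side by side,
// so queries typed in Latin letters can be matched against text_translit.
public class TranslitIndexing {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
        String original = "привет";                  // original Cyrillic text
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1");
        doc.addField("text", original);               // original tokens
        doc.addField("text_translit", toTranslit(original)); // converted form
        solr.add(doc);
        solr.commit();
    }

    // Hypothetical, deliberately naive converter; a real one would cover the full
    // alphabet (and/or the "wrong keyboard" layout mapping).
    static String toTranslit(String s) {
        return s.replace("п", "p").replace("р", "r").replace("и", "i")
                .replace("в", "v").replace("е", "e").replace("т", "t");
    }
}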
> He said some other things about a huge petabyte hosted search collection
> they have that is used by banks...
In the context of your discussion this reference sounds really, really funny... :)
-Alexander
On Wed, Sep 22, 2010 at 1:17 PM, Grant Ingersoll wrote:
>
> On Sep 22, 2010, at 2:04 PM, Smiley, David …
Set up your JVM to produce heap dumps in case of OOM and try to
analyze them with a profiler like YourKit. This could give you some
ideas about what takes memory and what could potentially be reduced.
Sometimes the cache settings can be adjusted without a significant
performance toll, etc. See what …
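For the first part, the standard HotSpot flags are (the dump path is a placeholder):

-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/dumps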