Faceting unknown fields
Hello, I'm wondering if it's possible to index and facet "unknown" fields. Let me explain: I've got a set of 1M products (from computers to freezers), and each category of product has its own attributes, so the number of attributes is pretty large (1000+). I've started to describe each attribute in my schema, but I think it will be hard to maintain. So, can I index and facet these fields without describing them in my schema? I will first try with dynamic fields, but I'm not sure it's going to work. Anyone got some ideas? Mickael.
Re: Faceting unknown fields
hi, > So, can I index and facet these fields, without describing them in my schema? > > I will first try with dynamic fields, but I'm not sure it's going to work. we do all our facet fields this way, with just a general string type for single- and multivalued dynamic fields, and faceting works... but you will still need to know the specific name of the field(s) to use in the facet.field URL parameter (i.e. it works as long as your UI knows the names!). hope that helps bec :)
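p.s. the schema snippet got stripped from my mail, but here's a sketch of what those declarations look like (field names are illustrative, not prescriptive):

    <dynamicField name="*_facet"  type="string" indexed="true" stored="true"/>
    <dynamicField name="*_facets" type="string" indexed="true" stored="true" multiValued="true"/>

any incoming field ending in _facet or _facets then gets indexed as a plain string, and you can pass it straight to facet.field without declaring it individually.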
Re: Faceting unknown fields
Thanks, I'll test your solution shortly. Mickael.
Spellcheck help
Hello, I've been trying to get rid of a bug when using the spellcheck, but so far with no success :( When searching for a word that starts with a number, for example "3dsmax", I get the results that I want, BUT the spellcheck says it is not correctly spelled AND the collation gives me "33dsmax". Further investigation shows that the spellcheck is actually only checking "dsmax", which it considers does not exist, and it suggests "3dsmax" as a better match; but since I have spellcheck.collate = true, the collation that I show is "33dsmax", with the first 3 being the one discarded by the spellchecker... Otherwise, the spellcheck works correctly for normal words... any ideas? :( My spellcheck field is fairly classic: whitespace tokenizer, with lowercase filter... Any help would be greatly appreciated :) Thanks, Marc
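P.S. For reference, here is the field type reconstructed from that description (names are mine, the actual schema didn't survive the mail):

    <fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

Nothing in this chain should split "3dsmax" into "3" + "dsmax", so I suspect the split happens before my analyzer is applied, somewhere in the spellcheck component's query conversion...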
Score boosting
Hi everyone, I have a requirement to implement, but I can't figure out how to do it. Here is the requirement: a book has several keyphrases (available for use in searching). An author can either buy a search result position for these keyphrases or simply add keyphrases related to the book. So I need to implement search whose ranking is affected by this purchased-position field. I'm not sure how to implement this. Hope someone can help me! -- Chhorn Chamnap http://chamnapchhorn.blogspot.com/
Distributed Indexing
Are there any tools for "Distributed Indexing"? The wiki refers to KattaIntegration and ZooKeeperIntegration in http://wiki.apache.org/solr/DistributedSearch, but it seems they are more concerned with error handling and replication. I need a dispatcher that routes docs to different machines by uniqueKey (such as url), so that when a doc is updated, it is sent to the machine that already contains that url. I also need the docs to be spread evenly across all the machines, so that when I do a distributed search the idfs of the different machines are similar, because the current distributed search uses only local idf.
Re: How do I get the matched terms of my query?
If you want only documents that have both values, then make your q q=content:videos+AND+content:songs. If you want the more open query, but to be able to tell which docs have videos, which have songs and which have both... then I'm not sure. Using debugQuery=on might help with your understanding, but it isn't a good runtime solution if you need that.
Re: Score boosting
Sounds like you want Payloads. I don't think you can guarantee a position, but you can boost relative to others. You can give one author/book a boost of 0 for the phrase Cooking, another a boost of 0.5, and yet another a boost of 1.0. For searches that include the phrase Cooking, the scores should reflect the boosts, and the authors that bought the higher boost value will sort higher. These discuss Payloads (it isn't a trivial task, by the way): http://www.ultramagnus.org/?p=1 http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/ or use this to see other Solr-User group discussions on the topic: http://lucene.472066.n3.nabble.com/template/NodeServlet.jtp?tpl=search-page&node=472068&query=Using+Lucene's+payload+in+Solr
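The indexing side, at least, is straightforward; a rough sketch using the delimited payload filter (field/type names are made up for the example):

    <fieldtype name="payloads" class="solr.TextField" indexed="true" stored="false">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- each token may carry a float payload after the delimiter,
             e.g. the field value "Cooking|1.0" -->
        <filter class="solr.DelimitedPayloadTokenFilterFactory"
                encoder="float" delimiter="|"/>
      </analyzer>
    </fieldtype>

The hard part is the query side: out of the box, Solr 1.4 won't use the payload in scoring, so you still need the custom similarity / query parser work described in the links above.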
Filter multivalue fields from search result
Hi, Is it possible to remove from search results the multivalued field values that don't pass the search criteria? My schema has a required id field, a name field, and multivalued town and date fields, and example docs are:

    +----+----------------------+------------+------------+
    | id | name                 | town       | date       |
    +----+----------------------+------------+------------+
    | 1  | Microsoft Excel      | London     | 2010-08-20 |
    |    |                      | Glasgow    | 2010-08-24 |
    |    |                      | Leeds      | 2010-08-28 |
    | 2  | Microsoft Word       | Aberdeen   | 2010-08-21 |
    |    |                      | Reading    | 2010-08-25 |
    |    |                      | London     | 2010-08-29 |
    | 3  | Microsoft Powerpoint | Birmingham | 2010-08-22 |
    |    |                      | Leeds      | 2010-08-26 |
    +----+----------------------+------------+------------+

so the query q=name:Microsoft town:Leeds returns docs 1 & 3. How would I remove London/Glasgow from doc 1 and Birmingham from doc 3? Or is it that I should create a separate doc for each name-event? Thanks, Alex
solr connection question
Hi solr users, I need to know how Solr manages connections when we make a request (select, update, commit). Is there any connection pooling, or an article where I can learn about its connection management? How can I log the connections to the Solr server in a file? I have set up my Solr 1.4 with Tomcat. Thanks in advance
Re: solr connection question
Hi, Solr runs as a Web application. The requests you most probably mean are just HTTP requests to the underlying container. Internally each request is processed against the Lucene index, usually a file-based one. Therefore there are no connections like in a database application, where you have a pool of connections to your remote database server. Best, Sven --On Thursday, July 8, 2010 15:46 +0300 "ZAROGKIKAS,GIORGOS" wrote: Hi solr users, I need to know how Solr manages connections when we make a request (select, update, commit). Is there any connection pooling, or an article where I can learn about its connection management? How can I log the connections to the Solr server in a file? I have set up my Solr 1.4 with Tomcat. Thanks in advance
Re: solr connection question
Jorl, OK, I'll have to change my vacation request :( Rubén Abad On Thu, Jul 8, 2010 at 2:46 PM, ZAROGKIKAS,GIORGOS < g.zarogki...@multirama.gr> wrote: > Hi solr users > > I need to know how Solr manages connections when we make a > request (select, update, commit). > Is there any connection pooling, or an article where I can learn about its connection > management? > How can I log the connections to the Solr server in a file? > > I have set up my Solr 1.4 with Tomcat > > Thanks in advance > > > >
RE: solr connection question
Yes, I mean HTTP requests. How can I log them? -Original Message- From: Sven Maurmann [mailto:sven.maurm...@kippdata.de] Sent: Thursday, July 08, 2010 3:56 PM To: solr-user@lucene.apache.org Subject: Re: solr connection question Hi, Solr runs as a Web application. The requests you most probably mean are just HTTP requests to the underlying container. Internally each request is processed against the Lucene index, usually a file-based one. Therefore there are no connections like in a database application, where you have a pool of connections to your remote database server. Best, Sven --On Thursday, July 8, 2010 15:46 +0300 "ZAROGKIKAS,GIORGOS" wrote: > Hi solr users > > I need to know how Solr manages connections when we make a > request (select, update, commit). Is there any connection pooling or an > article where I can learn about its connection management? How can I log the > connections to the Solr server in a file? > > I have set up my Solr 1.4 with Tomcat > > Thanks in advance
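P.S. One option I'm considering (an assumption on my part from the Tomcat docs, not something Solr configures for you) is Tomcat's AccessLogValve - something like this inside the <Host> element of conf/server.xml should write one line per HTTP request to a daily-rotated file:

    <Valve className="org.apache.catalina.valves.AccessLogValve"
           directory="logs" prefix="solr_access." suffix=".log"
           pattern="common"/>

Would that capture the select/update/commit requests, or is there something Solr-side I should use instead?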
Re: solr connection question
ok please don't forget it :) 2010/7/8 Ruben Abad > Jorl, OK, I'll have to change my vacation request :( > Rubén Abad > > > On Thu, Jul 8, 2010 at 2:46 PM, ZAROGKIKAS,GIORGOS < > g.zarogki...@multirama.gr> wrote: > > > Hi solr users > > > > I need to know how Solr manages connections when we make a > > request (select, update, commit). > > Is there any connection pooling, or an article where I can learn about its > connection > > management? > > How can I log the connections to the Solr server in a file? > > > > I have set up my Solr 1.4 with Tomcat > > > > Thanks in advance > > > > > > > > >
RE: Distributed Indexing
Li, as far as I know, you still have to do this part yourself. A possible way to shard is to number the shards from 0 to numShards-1, calculate hash(uniqueKey) % numShards for each document, and send the document to the resulting shard number. This mapping is consistent and spreads documents uniformly across the shards. -- Yuval -Original Message- From: Li Li [mailto:fancye...@gmail.com] Sent: Thursday, July 08, 2010 2:44 PM To: solr-user@lucene.apache.org Subject: Distributed Indexing Are there any tools for "Distributed Indexing"? The wiki refers to KattaIntegration and ZooKeeperIntegration in http://wiki.apache.org/solr/DistributedSearch, but it seems they are more concerned with error handling and replication. I need a dispatcher that routes docs to different machines by uniqueKey (such as url), so that when a doc is updated, it is sent to the machine that already contains that url. I also need the docs to be spread evenly across all the machines, so that when I do a distributed search the idfs of the different machines are similar, because the current distributed search uses only local idf.
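A minimal SolrJ sketch of that dispatch logic (class and field names are illustrative; it assumes the uniqueKey field is called "url"):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class ShardDispatcher {
        private final SolrServer[] shards; // one SolrServer per shard

        public ShardDispatcher(SolrServer[] shards) {
            this.shards = shards;
        }

        public void add(SolrInputDocument doc) throws Exception {
            String key = (String) doc.getFieldValue("url");
            // mask the sign bit so the modulo is never negative
            int n = (key.hashCode() & Integer.MAX_VALUE) % shards.length;
            shards[n].add(doc);
        }
    }

Because the same url always hashes to the same shard, a later update of a doc lands on the machine that already has it and overwrites it, rather than creating a duplicate on another shard. (String.hashCode() is stable across JVMs, but note that changing numShards reshuffles every key.)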
Determining matched tokens in original query
Hi, I'm trying to find out which tokens in a user's query matched against each result. I've been trying to use the highlight component for this; however, it doesn't quite fit the bill. I'm using edismax, with mm set to 50%, and I want to extract for each matching doc which tokens /didn't/ match (I then strip the matching tokens from the search string and run the remaining query against a different solr index). My problem is that the highlighter, naturally, applies highlighting to fields after filters have been applied. This means it's tricky to match the highlighted terms back to the original query, because things like synonyms, stemmed words & possessives may be matched. E.g. with the search string: mr banana's shop I could get a highlighted fragment like: Mister Banana's frozen banana stand Is there some other approach I could use? Thanks, Mark
Realtime + Batch indexing
Hi, Currently we are trying to achieve both realtime and batch indexing using SOLR. For batch indexing we have set up a master SOLR server which uses DIH to index the data. For real time, we post XML to the SOLR slave, adding documents to the existing SOLR index. Now my issue is that when I replicate the data present in the master to the slave, the data that was added to the slave (by posting XML) gets overwritten. I can't post the XML to the master, as replicating the whole master to the slave again and again would cause performance issues. My questions are: Is there a way to replicate just the modified data in the master (delta) to the slave environment? What is the best approach to implementing both batch and real time indexing in a master/slave environment? One more issue is that when I post some documents to the slave directly using the /update handler, some of the attributes get lost in the existing index. Any reason why this might be happening? Any help / suggestions would be of great help. Thanks, BB
DIH batch job
Hi, We are trying to import data from an ORACLE database into Solr 1.4 for free text search, and would like to provide a faceted search experience. There are also files on the network which we are indexing. We are using the DIH for indexing the data from the database, and have written a batch job for iterating over the network files and indexing them using Tika 0.7. We have a couple of questions:
1) How do we schedule a batch job using DIH? (We need fine-grained access to log any error messages and decide whether to continue or abort the job.) Is there a patch for Solr 1.5 we can take a look at? Currently we use Solr 1.4.
2) Can we upgrade the Tika libraries in Solr 1.4 to leverage the latest Tika enhancements and use the Solr Cell module?
It would be great if you could provide guidance. Thanks, Sanjeev Kakar
Re: Using hl.regex.pattern to print complete lines
To clarify, I never want a snippet, I always want a whole line returned. Is this possible? Thanks! -Pete On Jul 7, 2010, at 5:33 PM, Peter Spam wrote: > Hi, > > I have a text file broken apart by carriage returns, and I'd like to only > return entire lines. So, I'm trying to use this: > > &hl.fragmenter=regex > &hl.regex.pattern=^.*$ > > ... but I still get fragments, even if I crank up the hl.regex.slop to 3 or > so. I also tried a pattern of "\n.*\n" which seems to work better, but still > isn't right. Any ideas? > > > -Pete
Delta Import by ID
I'm still having issues. My config uses a deltaQuery keyed on CreationDate, but I really don't want to use CreationDate - I'd rather just pass in the id (as is done in the deltaImportQuery). Can I do that directly? If so, how do I specify the value for dataimporter.delta.id? (P.S. sorry for a new thread, I kept getting my mail bounced back when I did a reply, so I'm trying a new thread.)
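For context, my entity looks roughly like this (retyped, since the XML keeps getting stripped; table and column names simplified):

    <entity name="item" pk="id"
            query="select * from item"
            deltaQuery="select id from item
                        where CreationDate > '${dataimporter.last_index_time}'"
            deltaImportQuery="select * from item
                              where id = '${dataimporter.delta.id}'">

As I understand it, dataimporter.delta.id is filled in by DIH itself, once per row returned by deltaQuery - so is the right move instead to pass my own parameter on the URL (e.g. &id=123) and reference it as ${dataimporter.request.id} in a query?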
Re: Using hl.regex.pattern to print complete lines
(10/07/09 2:44), Peter Spam wrote: To clarify, I never want a snippet, I always want a whole line returned. Is this possible? Thanks! -Pete Hello Pete, Use NullFragmenter. It can be used via GapFragmenter with hl.fragsize=0. Koji -- http://www.rondhuit.com/en/
Indexing slowdowns
Since I began using the 2010-05-18 nightly, I've been experiencing indexing slowdowns which I didn't see with solr-1.4. I'm seeing indexing slow down roughly every 7m records. I'm indexing about 28m in total. These records are batched into csv files of 1m rows, which are loaded with stream.file. Solr happily chugs away at the first 7m at around 50s/million. It will then consistently take around 20 minutes to index the 7m-8m batch, after which it returns to around 50s/million until reaching the 14m-15m batch, which again takes around 20 minutes, and so on. There are essentially no differences in configuration between my 1.4 setup and the nightly. I've played around with mergeFactor and other params to no avail. I've also hooked up yourkit to jetty, but haven't seen anything obvious in the results. That said, my java foo is not so strong, so I may be missing something. Can anyone suggest where I might start looking for answers? I have a yourkit snapshot if anyone would care to see it. Thanks, Mark
Re: Using hl.regex.pattern to print complete lines
Thanks for the note, Koji. However, hl.fragsize=0 seems to return the entire document, rather than just one single line. Here's what I tried (what I previously had was commented out):

    regexv = "^.*$"
    thequery = '/solr/select?facet=true&facet.limit=10&fl=id,score,filename&tv=true&timeAllowed=3000&facet.field=filename&qt=tvrh&wt=ruby' +
               (p['fq'].empty? ? '' : ('&fq=' + p['fq'].to_s)) +
               '&q=' + CGI::escape(p['q'].to_s) +
               '&rows=' + p['rows'].to_s +
               "&hl=true&hl.snippets=1&hl.fragsize=0"
               # previously: + "&hl.regex.slop=.8&hl.fragsize=200&hl.fragmenter=regex&hl.regex.pattern=" + CGI::escape(regexv)

Thanks for your help. -Peter On Jul 8, 2010, at 3:47 PM, Koji Sekiguchi wrote: > (10/07/09 2:44), Peter Spam wrote: >> To clarify, I never want a snippet, I always want a whole line returned. Is >> this possible? Thanks! >> >> -Pete >> > Hello Pete, > > Use NullFragmenter. It can be used via GapFragmenter with > hl.fragsize=0. > > Koji > > -- > http://www.rondhuit.com/en/
Re: Indexing slowdowns
On Thu, Jul 8, 2010 at 7:44 PM, Mark Holland wrote: > > Can anyone suggest where I might start looking for answers? I have a > yourkit snapshot if anyone would care to see it. > > Doesn't sound good. I'd like to see whatever data you can provide (I worry it might be something in analysis). -- Robert Muir rcm...@gmail.com
Re: DIH batch job
There is no batch job scheduling in Solr. You will have to script this with your OS tools (probably the 'cron' program). Tika is integrated into the DataImportHandler in Solr 1.5. This gives you flexibility in indexing and is worth the extra effort. On Thu, Jul 8, 2010 at 10:48 AM, Sanjeev Kakar wrote: > Hi, > > We are trying to import data from an ORACLE database into Solr 1.4 > for free text search, and would like to provide a faceted search > experience. There are also files on the network which we are indexing. > > We are using the DIH for indexing the data from the database, and have > written a batch job for iterating over the network files and indexing > them using Tika 0.7. > > We have a couple of questions: > > 1) How do we schedule a batch job using DIH? (We need fine-grained > access to log any error messages and decide whether to continue or abort > the job.) Is there a patch for Solr 1.5 we can take a look at? Currently > we use Solr 1.4. > > 2) Can we upgrade the Tika libraries in Solr 1.4 to leverage the > latest Tika enhancements and use the Solr Cell module? > > It would be great if you could provide guidance. > > Thanks, > > Sanjeev Kakar > > -- Lance Norskog goks...@gmail.com
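P.S. A sketch of the cron route (the URL, core name and log path are assumptions for illustration):

    # crontab entry: run a DIH delta-import at 2am, appending the status response to a log
    0 2 * * * curl -s 'http://localhost:8983/solr/dataimport?command=delta-import' >> /var/log/solr-dih.log 2>&1

Your wrapper script can then poll ?command=status on the same handler and parse the response to decide whether to continue or abort the rest of the job.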
Re: Realtime + Batch indexing
You cannot add to the same index from two different Solrs. You can set up separate shards for the batch and incremental indexes and use distributed search to query both of them. On Thu, Jul 8, 2010 at 10:04 AM, bbarani wrote: > > Hi, > > Currently we are trying to achieve both realtime and batch indexing using > SOLR. > > For batch indexing we have set up a master SOLR server which uses DIH to > index the data. > > For real time, we post XML to the SOLR slave, adding documents to > the existing SOLR index. > > Now my issue is that when I replicate the data present in the master to the > slave, the data that was added to the slave (by posting XML) gets > overwritten. > > I can't post the XML to the master, as replicating the whole master to the > slave again and again would cause performance issues. My questions are: > > Is there a way to replicate just the modified data in the master (delta) to > the slave environment? > > What is the best approach to implementing both batch and real time indexing in > a master/slave environment? > > One more issue is that when I post some documents to the slave directly using > the /update handler, some of the attributes get lost in the existing > index. Any reason why this might be happening? > > Any help / suggestions would be of great help. > > Thanks, > BB > -- Lance Norskog goks...@gmail.com
Re: Filter multivalue fields from search result
Yes, denormalizing the index into separate (name,town) pairs is the common design for this problem. 2010/7/8 "Alex J. G. Burzyński" : > Hi, > > Is it possible to remove from search results the multivalued field values that > don't pass the search criteria? > > [schema and example docs snipped] > > so the query q=name:Microsoft town:Leeds returns docs 1 & 3. > > How would I remove London/Glasgow from doc 1 and Birmingham from doc 3? > > Or is it that I should create a separate doc for each name-event? > > Thanks, > Alex > -- Lance Norskog goks...@gmail.com
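P.S. Concretely, each (name, town, date) row becomes its own document; a sketch of the posted XML (the compound ids are just one way to key the denormalized docs):

    <add>
      <doc>
        <field name="id">1-1</field>
        <field name="name">Microsoft Excel</field>
        <field name="town">London</field>
        <field name="date">2010-08-20T00:00:00Z</field>
      </doc>
      <doc>
        <field name="id">1-3</field>
        <field name="name">Microsoft Excel</field>
        <field name="town">Leeds</field>
        <field name="date">2010-08-28T00:00:00Z</field>
      </doc>
    </add>

Then q=name:Microsoft AND town:Leeds matches only the Leeds events, with no stray towns in the hits; keep a shared course id field if you need to group the events back together.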
Re: Indexing slowdowns
Hmm, did the default number of background merge threads change sometime recently? I seem to recall so, but I can't find a reference to it. -Yonik http://www.lucidimagination.com
Re: Using hl.regex.pattern to print complete lines
(10/07/09 9:30), Peter Spam wrote: Thanks for the note, Koji. However, hl.fragsize=0 seems to return the entire document, rather than just one single line. Here's what I tried: [code snipped] Thanks for your help. -Peter Peter, Are you sure you are using GapFragmenter when you set fragsize to 0? I've never tried the regex fragmenter... If you can use the latest branch_3x or trunk, hl.fragListBuilder=single is available, which returns the entire field contents with the search terms highlighted. To use it, set hl.useFastVectorHighlighter to true. Koji -- http://www.rondhuit.com/en/
Re: Indexing slowdowns
On 7/8/10 8:55 PM, Yonik Seeley wrote: > Hmm, did the default number of background merge threads change > sometime recently? I seem to recall so, but I can't find a reference > to it. > > -Yonik > http://www.lucidimagination.com It did change - from a fixed 3 to between 1 and 3, depending on processor count:

    maxThreadCount = Math.max(1, Math.min(3, Runtime.getRuntime().availableProcessors()/2));

- Mark
Re: Using symlinks to alias cores
: However, the wiki recommends against using the ALIAS command in CoreAdmin in : a couple of places, and SOLR-1637 says it's been removed now anyway. correct, there were a lot of problems with how to cleanly/sanely deal with core operations on aliases -- the command may return at some future date if there is a better separation between the concept of an "authoritative" name for a core and aliases -- but in the meantime, i wouldn't recommend using it even in older versions of Solr where it (sort of) worked. : If I can't use ALIAS safely, is it okay to just symlink the most recent : core's instance (or data) directory to 'current', and bring it up in Solr as : a separate core? Will this be safe, as long as all index writing happens via : the 'current' core? i would not recommend that -- as long as you only index to one of those cores, and use "commits" to force the other instances to reload from disk, there wouldn't be any errors -- but you'll wind up duplicating all of the internal memory structures (index objects and caches). a cleaner way to deal with this would be to use something like RewriteRule -- either in your appserver (if it supports a feature like that) or in a proxy sitting in front of Solr. Frankly though: indexing code can usually be made fairly smart -- pretty much every programming language in the world makes it fairly easy to generate a string using the pattern "http://server:8983/solr/${YY-MM-DD}/update", and then you just POST to that. -Hoss
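for example, in Java that last suggestion is just a couple of lines (the date pattern and host here are stand-ins for whatever your core naming scheme actually is):

    // build the update URL for today's core, e.g. http://server:8983/solr/10-07-08/update
    String core = new java.text.SimpleDateFormat("yy-MM-dd").format(new java.util.Date());
    String updateUrl = "http://server:8983/solr/" + core + "/update";

then hand updateUrl to whatever HTTP client (or SolrJ server instance) you already use to POST documents.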
Re: Using hl.regex.pattern to print complete lines
: If you can use the latest branch_3x or trunk, hl.fragListBuilder=single : is available that is for getting entire field contents with search terms : highlighted. To use it, set hl.useFastVectorHighlighter to true. He doesn't want the entire field -- his stored field values contain multi-line strings (using newline characters) and he wants to make fragments per "line" (ie: bounded by newline characters, or the start/end of the entire field value). Peter: i haven't looked at the code, but i expect the problem is that the java regex engine isn't being used in a way that makes ^ and $ match any line boundary -- they are probably only matching the start/end of the field (and . is probably only matching non-newline characters). java regexes support embedded flags (ie: "(?xyz)your regex") so you might try that (i don't remember what the correct modifier flag is for multiline mode off the top of my head). -Hoss
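(checking afterwards: the embedded flag for multiline mode is (?m), and (?s) is the one that lets . match newlines as well -- so a pattern along the lines of:

    &hl.regex.pattern=(?m)^.*$

url-encoded as needed, would be the thing to try.)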
Re: Realtime + Batch indexing
Hi, Thanks a lot for your reply. As you suggested, the best option is to have another core started up on the same / a different port and use shards for distributed search. I had also thought of another approach, where I would write the real time data to both master and slave: it would then be available on the slave when the user is searching, and it would also be present after replication from master to slave. Do you think my suggestion would work? I would surely go for shards, but for the time being I am planning to implement the 2nd approach, as we would need to make changes to the UI code if we went with shards. Thanks, BB
Re: Realtime + Batch indexing
No, this second part will not work. Lucene creates new index files independent of when and what you index. So copying files from one indexer to another will never work: the indexes will be out of sync. You don't have to change your UI to use distributed search. You can add a new request handler that forwards requests to the other shards (with different URLs!) via a shards parameter like shard1,shard2. Now, solr/broker?q=word goes to shard1/solr?q=word and shard2/solr?q=word with no UI changes. I usually make a new core to broker these sharded queries. It makes it easier to track what I'm doing. (A sketch of such a handler is below, after the quoted message.) On Thu, Jul 8, 2010 at 7:22 PM, bbarani wrote: > > Hi, > > Thanks a lot for your reply. > > As you suggested, the best option is to have another core started up on the same > / a different port and use shards for distributed search. > > I had also thought of another approach, where I would write the real > time data to both master and slave: it would then be available on the slave when > the user is searching, and it would also be present after replication from master to > slave. > > Do you think my suggestion would work? I would surely go for shards, but for > the time being I am planning to implement the 2nd approach, as we would need to > make changes to the UI code if we went with shards. > > Thanks, > BB > -- Lance Norskog goks...@gmail.com
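Here's roughly what that broker handler could look like in solrconfig.xml (handler name, hosts and ports are placeholders for your setup, not something prescribed by Solr):

    <requestHandler name="/broker" class="solr.SearchHandler">
      <lst name="defaults">
        <!-- every query sent to /broker fans out to both shards -->
        <str name="shards">shard1:8983/solr,shard2:8983/solr</str>
      </lst>
    </requestHandler>

Note the shards values are host:port/path entries with no http:// prefix, per the DistributedSearch wiki page.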
making rotating timestamped logs from solr output
Hello, I would like to log the solr console. Although solr logs requests in a timestamped format, this only logs the requests, i.e. it does not log the number of hits for a given query, etc. Is there any easy way to do this other than reverting to methods for capturing solr output? I usually run solr on my server using the screen command: first running solr, then detaching from the console. But it would be nice to have output logging instead of request logging. best regards, c.b.
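p.s. one option I'm considering (untested, and assuming the stock start.jar setup) is piping the console output through cronolog to get rotating timestamped files:

    java -jar start.jar 2>&1 | cronolog /var/log/solr/solr.%Y-%m-%d.log

would that be a sane approach, or is there a cleaner way via the logging config?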
Re: Realtime + Batch indexing
Thanks a ton for your reply.. Your suggestions have always helped me out :) Your inputs on configuring shards via the SOLR config will help us a lot!!! One final question about replication: when I initiate replication, I thought SOLR deletes the existing index on the slave and just transfers the master index to the slave. If that's the case, there won't be any sync-up issues, right? I am asking this because every time I initiate replication, the index size of both slave and master becomes the same (even if for some reason the index size of the slave is bigger than the master's, it gets reduced to the same size as the master's after replication), so I thought that SOLR just deletes the slave index and then moves all the files from the master.. Again, thanks for your help. Thanks, BB