Well, the first results are ready. I have implemented a custom update
processor following your suggestion, using a low-level index reader and
TermDocs.
I launched scripts which add about 10,000 docs. Indexing took about 1 minute
including the commit, which is quite good for me. I don't have larger datasets so
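A minimal sketch of what such a processor could look like, assuming
Solr/Lucene 3.x APIs and a uniqueKey field named "id" (the class and
package names are invented for illustration; this is not the actual code
from the thread):

import java.io.IOException;

import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class SkipExistingUpdateProcessorFactory
    extends UpdateRequestProcessorFactory {

  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
      SolrQueryResponse rsp, UpdateRequestProcessor next) {
    return new SkipExistingProcessor(req, next);
  }

  static class SkipExistingProcessor extends UpdateRequestProcessor {
    private final SolrQueryRequest req;

    SkipExistingProcessor(SolrQueryRequest req, UpdateRequestProcessor next) {
      super(next);
      this.req = req;
    }

    @Override
    public void processAdd(AddUpdateCommand cmd) throws IOException {
      String id = (String) cmd.solrDoc.getFieldValue("id"); // assumed uniqueKey
      SolrIndexSearcher searcher = req.getSearcher();
      // Low-level existence check: seek the uniqueKey term directly, with no
      // query parsing or scoring. TermDocs skips deleted docs automatically.
      // Caveat: docs added since the last commit are not visible here.
      TermDocs termDocs = searcher.getIndexReader().termDocs(new Term("id", id));
      try {
        if (termDocs.next()) {
          return;                      // already indexed: silently skip the add
        }
      } finally {
        termDocs.close();
      }
      super.processAdd(cmd);           // not found: pass the add down the chain
    }
  }
}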
I'd guess it would be much faster, assuming that the search savings
wouldn't be swamped by the additional transmission time over the wire and
parsing the request (although SolrJ uses a binary format, so parsing the
request probably isn't all that expensive).
You could even do a hybrid approach. Pack u
I have never developed for Solr before and don't know much about its
internals, but today I tried one approach with the searcher.
In my update processor I get a searcher and search for the ID. It works, but
I need to load test it. Will index traversal be faster (less resource
consuming) than search?
Best Regards
Alexander Aristov
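For comparison, the search-based check described here might look like the
following (a sketch only, Lucene/Solr 3.x APIs; the "id" field name and
class name are assumptions):

import java.io.IOException;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TotalHitCountCollector;
import org.apache.solr.search.SolrIndexSearcher;

public class SearchBasedCheck {
  // Goes through the normal query machinery (weight/scorer creation),
  // whereas a raw TermDocs lookup just seeks one term in the dictionary.
  public static boolean exists(SolrIndexSearcher searcher, String id)
      throws IOException {
    TotalHitCountCollector collector = new TotalHitCountCollector();
    searcher.search(new TermQuery(new Term("id", id)), collector);
    return collector.getTotalHits() > 0;
  }
}

Both variants end up reading the same term dictionary, so the traversal
mainly saves the query setup overhead; load testing is the right way to see
whether the difference matters for your data.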
Hmmm, we're not communicating ...
The update processor wouldn't search in the
classic sense. It would just use lower-level
index traversal to determine if the doc (identified
by your unique key) was already in the index
and skip indexing that document if it was. No real
*searching* involved (see T
Alexander,
I have two ideas for how to implement fast dedupe externally, assuming your
PKs don't fit into a java.util.*Map:
- your crawler can use an in-process RDBMS (Derby, H2) to track dupes
(sketched below);
- if your crawler is stateless - it doesn't track PKs that have been
already crawled - you can retrieve it
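A sketch of the first idea: an in-process H2 table whose primary key
constraint does the duplicate detection (the table, class, and method names
are invented for illustration):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class SeenTracker {
  private final Connection conn;
  private final PreparedStatement insert;

  public SeenTracker(String dbPath) throws Exception {
    Class.forName("org.h2.Driver");            // H2 jar must be on the classpath
    conn = DriverManager.getConnection("jdbc:h2:" + dbPath);
    conn.createStatement().execute(
        "CREATE TABLE IF NOT EXISTS seen(pk VARCHAR(256) PRIMARY KEY)");
    insert = conn.prepareStatement("INSERT INTO seen(pk) VALUES (?)");
  }

  // Returns true the first time a PK is seen, false for a duplicate.
  public boolean firstTime(String pk) throws SQLException {
    insert.setString(1, pk);
    try {
      insert.executeUpdate();
      return true;
    } catch (SQLException duplicateKey) {      // PK violation: already crawled
      return false;
    }
  }
}

The crawler would call firstTime(pk) for each document and only send it to
Solr when the call returns true.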
Yes, I have been warned that querying the index each time before adding a
doc might be resource consuming. I will check it.
As for the overwrite parameter, I think the name is not the best then.
People outside the "business", like me, misuse it and assume what I wrote.
Overwrite should mean what it says.
Unfortunately I have a lot of duplicates, and given that searching might
suffer, I will try implementing an update processor.
But your idea is interesting and I will consider it, thanks.
Best Regards
Alexander Aristov
On 28 December 2011 19:12, Tanguy Moal wrote:
: That said, writing your own update request handler
: that detected this case isn't very difficult,
: extend UpdateRequestProcessorFactory/UpdateRequestProcessor
: and use it as a plugin.
I can't find the thread at the moment, but the general issue that has
caused people headaches with this typ
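Registering such a factory as a plugin is then a solrconfig.xml change
along these lines (the chain and class names here are examples, not from
the thread):

<updateRequestProcessorChain name="skipexisting">
  <processor class="com.example.SkipExistingUpdateProcessorFactory" />
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

The chain can be made the default (default="true") or selected per request
with the update.chain parameter (older releases used update.processor).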
Hello Alexander,
I don't know much about your requirements in terms of size and
performance, but I've had a similar use case and found a pretty simple
workaround.
If your duplicate rate is not too high, you can have the
SignatureProcessor generate a fingerprint of documents (you already did
Thanks Erick,
that sets my direction. I will write the new plugin and get back to the
dev forum with results, and then we will decide on next steps.
Best Regards
Alexander Aristov
On 28 December 2011 18:08, Erick Erickson wrote:
Well, the short answer is that nobody else has
1> had a similar requirement
AND
2> not found a suitable workaround
AND
3> implemented the change and contributed it back.
So, if you'd like to volunteer ...
Seriously. If you think this would be valuable and are
willing to work on it, hop on over
The problem with dedupe (SignatureUpdateProcessor) is that it REPLACES old
docs. I have tried it already.
Best Regards
Alexander Aristov
On 28 December 2011 13:04, Lance Norskog wrote:
The SignatureUpdateProcessor is for exactly this problem:
http://www.lucidimagination.com/search/link?url=http://wiki.apache.org/solr/Deduplication
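The configuration described on that wiki page looks roughly like this (the
field choices here are illustrative):

<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">name,features,cat</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

With overwriteDupes=true the newer document wins, which is exactly the
replacing behaviour Alexander objects to above; overwriteDupes=false only
stores the signature, so duplicates can instead be handled at query time.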
On Tue, Dec 27, 2011 at 10:42 PM, Alexander Aristov
wrote:
I get docs from external sources and the only place I keep them is the Solr
index. I have no database or other means to track indexed docs (my
personal opinion is that it would be a huge headache).
Some docs might change slightly in their original sources, but I don't need
those changes. In fact I ne
Mikhail is right as far as I know: the assumption built into Solr is that
duplicate IDs (when uniqueKey is defined) should trigger the old
document being replaced.
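That replace-on-duplicate-id behaviour is easy to demonstrate from SolrJ
(a sketch, SolrJ 3.x API; the URL and field names are just examples):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class OverwriteDemo {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

    SolrInputDocument first = new SolrInputDocument();
    first.addField("id", "42");               // "id" assumed to be the uniqueKey
    first.addField("title", "first version");
    server.add(first);

    SolrInputDocument second = new SolrInputDocument();
    second.addField("id", "42");              // same key: the old doc is replaced
    second.addField("title", "second version");
    server.add(second);

    server.commit();  // the index now holds only "second version"
  }
}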
What is your system-of-record? By that I mean, what does your SolrJ
program do to send data to Solr? Is there any way you could just
*not* send
Hi,
I am not using a database. All the needed data is in the Solr index; that's
why I want to skip excessive checks.
I will check DIH but I am not sure if it helps.
I am fluent in Java and it's not a problem for me to write a class or so,
but I want to check first whether there are any ways (workarounds) to make
On Tue, Dec 27, 2011 at 12:26 AM, Alexander Aristov <
alexander.aris...@gmail.com> wrote:
> Hi people,
>
> I urgently need your help!
>
> I have Solr 3.3 configured and running. I do incremental indexing 4 times a
> day using bulk updates. Some documents are identical to some extent and I
> wish t