Hello,
As far as I know,
http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ has some
usage in the industry.
On Fri, Jul 26, 2013 at 8:28 PM, Jack Krupansky wrote:
> Hmmm... Actually, I think there was also a solution where you could
> specify an alternate tokenizer for the synony
Otis,
You gave links to 'deep paging' when I asked about response streaming.
Let me understand. From my POV, deep paging is a special case for regular
search scenarios. We definitely need it in Solr. However, if we are talking
about data-analytics-like problems, where we need to select an "endless"
s
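(For concreteness, a minimal SolrJ 4.x sketch of the start/rows deep-paging
loop under discussion; the core URL and query are illustrative, and the
growing per-page cost is exactly the problem:)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class DeepPagingSketch {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server =
        new HttpSolrServer("http://localhost:8983/solr/collection1");
    int start = 0, rows = 1000;
    while (true) {
      SolrQuery q = new SolrQuery("*:*");
      q.setStart(start);
      q.setRows(rows);
      QueryResponse rsp = server.query(q);
      if (rsp.getResults().isEmpty()) break;
      for (SolrDocument doc : rsp.getResults()) {
        // process doc here
      }
      // Each page re-ranks the top (start + rows) hits, so the deeper
      // the page, the more work every request has to do.
      start += rows;
    }
  }
}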
Dear list,
I've written a special processor exactly for this kind of operation
https://github.com/romanchyla/montysolr/tree/master/contrib/adsabs/src/java/org/apache/solr/handler/batch
This is how we use it
http://labs.adsabs.harvard.edu/trac/ads-invenio/wiki/SearchEngineBatch
It is capable of
Mikhail,
If your solution gives lazy loading of Solr docs /and thus streaming of
huge result lists/ it should be a big YES!
Roman
On 27 Jul 2013 07:55, "Mikhail Khludnev" wrote:
> Otis,
> You gave links to 'deep paging' when I asked about response streaming.
> Let me understand. From my POV, deep p
Hi - To make this work you'll need a homepage flag and some specific hostname
analysis and function query boosting. I assume you're still using Nutch, so
detecting homepages is easy using NUTCH-1325. To actually get the
homepage flag in Solr you need to modify the indexer to ingest the Ho
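(A sketch of the boosting half, assuming the Nutch-side change ends up as a
boolean "homepage" field on each Solr document; the field name and weight
are illustrative:)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class HomepageBoostSketch {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server =
        new HttpSolrServer("http://localhost:8983/solr/collection1");
    SolrQuery q = new SolrQuery("apache");
    q.set("defType", "edismax");
    q.set("qf", "title content");
    // Additive boost: documents flagged as homepages get a flat bump.
    // Assumes a boolean "homepage" field exists in the schema.
    q.set("bf", "if(exists(homepage),100,0)");
    System.out.println(server.query(q).getResults().getNumFound());
  }
}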
Well, a full import is going to re-import everything in the database, and the
presumption is that each and every document would be replaced (because
presumably your uniqueKey is the same). So every document
will be deleted and re-added. So essentially you'll get a completely
new index every time.
In 3.6 are
What is your autocommit limit? Is it possible that your transaction
logs are simply getting too large? tlogs are truncated whenever
you do a hard commit (autocommit), with openSearcher either
true or false; it doesn't matter.
FWIW,
Erick
On Fri, Jul 26, 2013 at 12:56 AM, Tim Vaillancourt
wrote:
You can certainly have multiple Solrs pointing to the same
underlying physical index if (and only if) you absolutely
guarantee that only one Solr will write to the index at a
time.
But I'm not sure whether this is premature optimization or not. The problem
is that your multiple Solrs are eating up the same ph
Not quite sure what's happening here. It would be interesting to
see whether the requests are actually going to the right IP, by
tailing out the logs.
It _may_ be that the &distrib=false isn't honored if there is no core
on the target machine (I haven't looked at the code). To test that,
go ahead
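(One way to check from SolrJ, hitting a single core directly; the host and
core URL are illustrative:)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class DistribFalseCheck {
  public static void main(String[] args) throws Exception {
    // Point at one specific core, not the cloud-wide endpoint.
    HttpSolrServer core =
        new HttpSolrServer("http://host1:8983/solr/collection1_shard1_replica1");
    SolrQuery q = new SolrQuery("*:*");
    q.set("distrib", false); // ask only this core, no fan-out
    long hits = core.query(q).getResults().getNumFound();
    System.out.println("local hits: " + hits);
    // Then tail the logs on host1 to confirm the request landed there.
  }
}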
This has been suggested, but so far it's not been implemented
as far as I know.
I'm curious though, how many shards are you dealing with? I
wonder if it would be a better idea to try to figure out _why_
you so often have a slow shard and whether the problem could
be cured with, say, better warming
Thanks for sharing, Roman. I'll look into your code.
One more thought on your suggestion, Shawn. In fact, for the id, we need
more than "unique" and "rangeable"; we also need some sense of atomic
values. Your approach might run into risk with a text-based id field, say:
the id/key has values 'a',
On 7/27/2013 11:17 AM, Joe Zhang wrote:
> Thanks for sharing, Roman. I'll look into your code.
>
> One more thought on your suggestion, Shawn. In fact, for the id, we need
> more than "unique" and "rangeable"; we also need some sense of atomic
> values. Your approach might run into risk with a tex
I have a constantly growing index, so not updating the index isn't
practical...
Going back to the beginning of this thread: when we use the vanilla
"*:*"+pagination approach, would the ordering of documents remain stable?
the index is dynamic: update/insertion only, no deletion.
On Sat,
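(For reference, a sketch of the id-walking variant of Shawn's suggestion:
sort on the uniqueKey and filter past the last id seen, so start never
grows. It assumes ids sort cleanly, so the text-based-id caveat above still
applies; the core URL is illustrative:)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class IdWalkSketch {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server =
        new HttpSolrServer("http://localhost:8983/solr/collection1");
    String lastId = null;
    while (true) {
      SolrQuery q = new SolrQuery("*:*");
      q.setRows(1000);
      q.set("sort", "id asc");
      if (lastId != null) {
        // Exclusive lower bound: everything after the last id seen.
        // (Ids containing special characters would need escaping.)
        q.addFilterQuery("id:{" + lastId + " TO *]");
      }
      QueryResponse rsp = server.query(q);
      if (rsp.getResults().isEmpty()) break;
      for (SolrDocument doc : rsp.getResults()) {
        lastId = (String) doc.getFieldValue("id");
        // process doc here
      }
      // Documents inserted while walking show up only if their id sorts
      // after lastId, so ranges already visited stay stable.
    }
  }
}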
Roman,
Let me briefly explain the design:
a special RequestParser stores the servlet output stream into the context
https://github.com/m-khl/solr-patches/compare/streaming#L7R22
then a special component injects a PostFilter/DelegatingCollector which
writes straight into the output
https://github.com/m-kh
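(Roughly, the collector half looks like this - a sketch against the Solr
4.x PostFilter API, not the actual patch; the Writer stands in for whatever
the request parser stashed in the context:)

import java.io.IOException;
import java.io.Writer;
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.solr.search.DelegatingCollector;

public class StreamingCollector extends DelegatingCollector {
  private final Writer out;
  private int base;

  public StreamingCollector(Writer out) {
    this.out = out;
  }

  @Override
  public void setNextReader(AtomicReaderContext context) throws IOException {
    base = context.docBase;
    // Keep the downstream collector (which Solr installs) in sync.
    super.setNextReader(context);
  }

  @Override
  public void collect(int doc) throws IOException {
    // Stream the global doc id immediately; nothing is accumulated.
    out.write(Integer.toString(base + doc));
    out.write('\n');
    // Deliberately not calling super.collect(doc) in this sketch,
    // so no DocList is built downstream.
  }
}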
On 7/27/2013 11:38 AM, Joe Zhang wrote:
> I have a constantly growing index, so not updating the index isn't
> practical...
>
> Going back to the beginning of this thread: when we use the vanilla
> "*:*"+pagination approach, would the ordering of documents remain stable?
> the index is dyn
Thanks for the reply Erick,
Hard Commit - 15000ms, openSearcher=false
Soft Commit - 1000ms, openSearcher=true
15sec hard commit was sort of a guess, I could try a smaller number.
When you say "getting too large", what limit do you think it would be
hitting: a ulimit (nofiles), disk space, numbe
Okay, it’s hot off the e-presses: Solr 4.x Deep Dive, Early Access Release #4
is now available for purchase and download as an e-book for $9.99 on Lulu.com
at:
http://www.lulu.com/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-1/ebook/product-21079719.html
(That link says “1”, but
Hi Mikhail,
I can see it is lazy-loading, but I can't judge how complex it becomes
(presumably, the filter dispatching mechanism is also doing other things -
it is there not only for streaming).
Let me just explain better what I found when I dug inside solr: documents
(results of the query)
On Sat, Jul 27, 2013 at 4:17 PM, Shawn Heisey wrote:
> On 7/27/2013 11:38 AM, Joe Zhang wrote:
> > I have a constantly growing index, so not updating the index isn't
> > practical...
> >
> > Going back to the beginning of this thread: when we use the vanilla
> > "*:*"+pagination approach, woul
No hard numbers, but the general guidance is that you should set your hard
commit interval to match your expectations for how quickly nodes should come
up if they need to be restarted. Specifically, a hard commit assures that
all changes have been committed to disk and are ready for immediate ac
On Sat, Jul 27, 2013 at 4:30 PM, Roman Chyla wrote:
> Let me just explain better what I found when I dug inside solr: documents
> (results of the query) are loaded before they are passed into a writer - so
> the writers are expecting to encounter the solr documents, but these
> documents were load
Hello,
Please find below
> Let me just explain better what I found when I dug inside solr: documents
> (results of the query) are loaded before they are passed into a writer - so
> the writers are expecting to encounter the solr documents, but these
> documents were loaded by one of the componen
Tim:
15 seconds isn't unreasonable; I was mostly wondering if it was hours.
Take a look at the size of the tlogs as you're indexing, you should see them
truncate every 15 seconds or so. There'll be a varying number of tlogs kept
around, although under heavy indexing I'd only expect 1 or 2 inactiv
On Sat, Jul 27, 2013 at 5:05 PM, Mikhail Khludnev
wrote:
> anyway, even if the writer pulls docs one by one, it doesn't allow streaming a
> billion of them. Solr writes out DocList, which is really problematic even
> in deep-paging scenarios.
Which part is problematic... the creation of the DocList (
Hi Erick, thanks.
I have about 40 shards. repFactor=2.
Finding the cause of the slower shards is very interesting, and this is the
main approach we took.
Note that in every query, a different shard is the slowest. In 20%
of the queries, the slowest shard takes about 4 times longer than the average
shard
Thanks Jack/Erick,
I don't know if this is true or not, but I've read there is a tlog per
soft commit, which is then truncated by the hard commit. If this were
true, a 15sec hard-commit with a 1sec soft-commit could generate around
~15 tlogs, but I've never checked. I like Erick's scenario mor
On 7/27/2013 3:33 PM, Isaac Hebsh wrote:
> I have about 40 shards. repFactor=2.
> Finding the cause of the slower shards is very interesting, and this is the
> main approach we took.
> Note that in every query, a different shard is the slowest. In 20%
> of the queries, the slowest shard takes about 4 t
On 7/26/2013 2:03 PM, Gustav wrote:
> The problem here is that in my client's application, the query being encoded
> in iso-8859-1 is a *must*. So, this is kind of a trouble here.
> I just don't get how this encoding could work on queries in version 3.5, but
> it doesn't in 4.3.
I brought up the is
Shawn, thank you for the tips.
I know the significant cons of virtualization, but I don't want to move
this thread into a virtualization pros/cons in the Solr(Cloud) case.
I've just asked what minimal code change should be made, in order to
examine whether this is a possible solution or not
I have a company search which uses stopwords at query time. In my
stopwords list I have entries like:
HR
Club
India
Pvt.
Ltd.
So if I search for companies like HR Club I get no results. Similarly, a
search for India HR gives no results. How can I get results in queries for
the following comp
Edismax should be able to handle a query consisting of only query-time stop
words.
What does your text field type analyzer look like?
-- Jack Krupansky
-Original Message-
From: Rohit Kumar
Sent: Saturday, July 27, 2013 9:59 PM
To: solr-user@lucene.apache.org
Subject: Searching in sto
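(A minimal SolrJ reproduction of the setup in question; the core URL and
field name are illustrative:)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class StopwordQuerySketch {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server =
        new HttpSolrServer("http://localhost:8983/solr/collection1");
    // "HR Club" can be removed entirely by a query-time StopFilter; with
    // the default lucene parser that can leave an empty query and zero hits.
    SolrQuery q = new SolrQuery("HR Club");
    q.set("defType", "edismax");
    // The field type question above matters: if "company_name" keeps
    // stopwords at index time but strips them at query time, matches are
    // lost before edismax can help.
    q.set("qf", "company_name");
    System.out.println(server.query(q).getResults().getNumFound());
  }
}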
In both cases, for better performance, I'd first load just the IDs;
afterwards, during processing, I'd load each document.
As for the incremental requirement, it should not be difficult to
write a hash function which maps a non-numerical id to a value.
On Jul 27, 2013 7:03 AM, "Joe Zhang
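(A minimal sketch of such a hash: map an arbitrary string id to one of N
buckets, then fetch the index bucket by bucket; the bucket count is
illustrative:)

public class IdBucketSketch {
  static int bucketOf(String id, int numBuckets) {
    // Math.abs(Integer.MIN_VALUE) is still negative, so mask instead.
    return (id.hashCode() & 0x7fffffff) % numBuckets;
  }

  public static void main(String[] args) {
    int numBuckets = 64; // illustrative
    for (String id : new String[] {"a", "doc-42", "http://example.com/"}) {
      System.out.println(id + " -> bucket " + bucketOf(id, numBuckets));
    }
  }
}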
On Sun, Jul 28, 2013 at 1:25 AM, Yonik Seeley wrote:
>
> Which part is problematic... the creation of the DocList (the search),
>
DocList is literally a copy of TopDocs. Creating TopDocs is not a search
but a ranking.
And ranking costs log(rows+start) per hit, on top of the numFound work
that the search itself takes.
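(Spelling the cost out, assuming the usual bounded priority queue of size
start+rows over the full match set:)

% Ranking the top (start+rows) of numFound matches with a bounded
% priority queue:
T_{\mathrm{rank}} = O\bigl(\mathrm{numFound} \cdot \log(\mathrm{start} + \mathrm{rows})\bigr)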