Hello,
As far as I know,
http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ has some
usage in the industry.
On Fri, Jul 26, 2013 at 8:28 PM, Jack Krupansky wrote:
> Hmmm... Actually, I think there was also a solution where you could
> specify an alternate tokenizer for the synony
Otis,
You gave links to 'deep paging' when I asked about response streaming.
Let me understand. From my POV, deep paging is a special case for regular
search scenarios. We definitely need it in Solr. However, if we are talking
about data-analytics-like problems, where we need to select an "endless"
s
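(For concreteness, a minimal SolrJ 4.x sketch of the start/rows deep-paging
loop under discussion; the core URL and query are illustrative, and the
growing per-page cost is exactly the problem:)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class DeepPagingSketch {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server =
        new HttpSolrServer("http://localhost:8983/solr/collection1");
    int start = 0, rows = 1000;
    while (true) {
      SolrQuery q = new SolrQuery("*:*");
      q.setStart(start);
      q.setRows(rows);
      QueryResponse rsp = server.query(q);
      if (rsp.getResults().isEmpty()) break;
      for (SolrDocument doc : rsp.getResults()) {
        // process doc here
      }
      // Each page re-ranks the top (start + rows) hits, so the deeper
      // the page, the more work every request has to do.
      start += rows;
    }
  }
}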
Dear list,
I've written a special processor exactly for this kind of operation
https://github.com/romanchyla/montysolr/tree/master/contrib/adsabs/src/java/org/apache/solr/handler/batch
This is how we use it
http://labs.adsabs.harvard.edu/trac/ads-invenio/wiki/SearchEngineBatch
It is capable of
Mikhail,
If your solution gives lazy loading of Solr docs /and thus streaming of
huge result lists/ it should be a big YES!
Roman
On 27 Jul 2013 07:55, "Mikhail Khludnev" wrote:
> Otis,
> You gave links to 'deep paging' when I asked about response streaming.
> Let me understand. From my POV, deep p
Hi - To make this work you'll need a homepage flag and some specific hostname
analysis and function query boosting. I assume you're still using Nutch, so
detecting homepages is easy using NUTCH-1325. To actually get the
homepage flag in Solr you need to modify the indexer to ingest the Ho
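(A sketch of the boosting half, assuming the Nutch-side change ends up as a
boolean "homepage" field on each Solr document; the field name and weight
are illustrative:)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class HomepageBoostSketch {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server =
        new HttpSolrServer("http://localhost:8983/solr/collection1");
    SolrQuery q = new SolrQuery("apache");
    q.set("defType", "edismax");
    q.set("qf", "title content");
    // Additive boost: documents flagged as homepages get a flat bump.
    // Assumes a boolean "homepage" field exists in the schema.
    q.set("bf", "if(exists(homepage),100,0)");
    System.out.println(server.query(q).getResults().getNumFound());
  }
}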
Well, a full import is going to re-import everything in the database, and the
presumption is that each and every document would be replaced (because
presumably your uniqueKey is the same). So every document
will be deleted and re-added. So essentially you'll get a completely
new index every time.
In 3.6 are
What is your autocommit limit? Is it possible that your transaction
logs are simply getting too large? tlogs are truncated whenever
you do a hard commit (autocommit), with openSearcher either
true or false; it doesn't matter.
FWIW,
Erick
On Fri, Jul 26, 2013 at 12:56 AM, Tim Vaillancourt
wrote:
You can certainly have multiple Solrs pointing to the same
underlying physical index if (and only if) you absolutely
guarantee that only one Solr will write to the index at a
time.
But I'm not sure whether this is premature optimization or not. The problem
is that your multiple Solrs are eating up the same ph
Not quite sure what's happening here. It would be interesting to
see whether the requests are actually going to the right IP, by
tailing out the logs.
It _may_ be that the &distrib=false isn't honored if there is no core
on the target machine (I haven't looked at the code). To test that,
go ahead
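(One way to check from SolrJ, hitting a single core directly; the host and
core URL are illustrative:)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class DistribFalseCheck {
  public static void main(String[] args) throws Exception {
    // Point at one specific core, not the cloud-wide endpoint.
    HttpSolrServer core =
        new HttpSolrServer("http://host1:8983/solr/collection1_shard1_replica1");
    SolrQuery q = new SolrQuery("*:*");
    q.set("distrib", false); // ask only this core, no fan-out
    long hits = core.query(q).getResults().getNumFound();
    System.out.println("local hits: " + hits);
    // Then tail the logs on host1 to confirm the request landed there.
  }
}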
This has been suggested, but so far it's not been implemented
as far as I know.
I'm curious though, how many shards are you dealing with? I
wonder if it would be a better idea to try to figure out _why_
you so often have a slow shard and whether the problem could
be cured with, say, better warming
Thanks for sharing, Roman. I'll look into your code.
One more thought on your suggestion, Shawn. In fact, for the id, we need
more than "unique" and "rangeable"; we also need some sense of atomic
values. Your approach might run into risk with a text-based id field, say:
the id/key has values 'a',
On 7/27/2013 11:17 AM, Joe Zhang wrote:
> Thanks for sharing, Roman. I'll look into your code.
>
> One more thought on your suggestion, Shawn. In fact, for the id, we need
> more than "unique" and "rangeable"; we also need some sense of atomic
> values. Your approach might run into risk with a tex
I have a constantly growing index, so not updating the index isn't
practical...
Going back to the beginning of this thread: when we use the vanilla
"*:*"+pagination approach, would the ordering of documents remain stable?
the index is dynamic: update/insertion only, no deletion.
On Sat,
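(For reference, a sketch of the id-walking variant of Shawn's suggestion:
sort on the uniqueKey and filter past the last id seen, so start never
grows. It assumes ids sort cleanly, so the text-based-id caveat above still
applies; the core URL is illustrative:)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class IdWalkSketch {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server =
        new HttpSolrServer("http://localhost:8983/solr/collection1");
    String lastId = null;
    while (true) {
      SolrQuery q = new SolrQuery("*:*");
      q.setRows(1000);
      q.set("sort", "id asc");
      if (lastId != null) {
        // Exclusive lower bound: everything after the last id seen.
        // (Ids containing special characters would need escaping.)
        q.addFilterQuery("id:{" + lastId + " TO *]");
      }
      QueryResponse rsp = server.query(q);
      if (rsp.getResults().isEmpty()) break;
      for (SolrDocument doc : rsp.getResults()) {
        lastId = (String) doc.getFieldValue("id");
        // process doc here
      }
      // Documents inserted while walking show up only if their id sorts
      // after lastId, so ranges already visited stay stable.
    }
  }
}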
Roman,
Let me briefly explain the design:
a special RequestParser stores the servlet output stream into the context
https://github.com/m-khl/solr-patches/compare/streaming#L7R22
then a special component injects a PostFilter/DelegatingCollector which
writes straight into the output
https://github.com/m-kh
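(Roughly, the collector half looks like this - a sketch against the Solr
4.x PostFilter API, not the actual patch; the Writer stands in for whatever
the request parser stashed in the context:)

import java.io.IOException;
import java.io.Writer;
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.solr.search.DelegatingCollector;

public class StreamingCollector extends DelegatingCollector {
  private final Writer out;
  private int base;

  public StreamingCollector(Writer out) {
    this.out = out;
  }

  @Override
  public void setNextReader(AtomicReaderContext context) throws IOException {
    base = context.docBase;
    // Keep the downstream collector (which Solr installs) in sync.
    super.setNextReader(context);
  }

  @Override
  public void collect(int doc) throws IOException {
    // Stream the global doc id immediately; nothing is accumulated.
    out.write(Integer.toString(base + doc));
    out.write('\n');
    // Deliberately not calling super.collect(doc) in this sketch,
    // so no DocList is built downstream.
  }
}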
On 7/27/2013 11:38 AM, Joe Zhang wrote:
> I have a constantly growing index, so not updating the index isn't
> practical...
>
> Going back to the beginning of this thread: when we use the vanilla
> "*:*"+pagination approach, would the ordering of documents remain stable?
> the index is dyn
Thanks for the reply Erick,
Hard Commit - 15000ms, openSearcher=false
Soft Commit - 1000ms, openSearcher=true
15sec hard commit was sort of a guess, I could try a smaller number.
When you say "getting too large", what limit do you think it would be
hitting: a ulimit (nofiles), disk space, numbe
Okay, it’s hot off the e-presses: Solr 4.x Deep Dive, Early Access Release #4
is now available for purchase and download as an e-book for $9.99 on Lulu.com
at:
http://www.lulu.com/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-1/ebook/product-21079719.html
(That link says “1”, but
Hi Mikhail,
I can see it is lazy-loading, but I can't judge how complex it becomes
(presumably, the filter dispatching mechanism is also doing other things -
it is there not only for streaming).
Let me just explain better what I found when I dug inside solr: documents
(results of the query)
On Sat, Jul 27, 2013 at 4:17 PM, Shawn Heisey wrote:
> On 7/27/2013 11:38 AM, Joe Zhang wrote:
> > I have a constantly growing index, so not updating the index isn't
> > practical...
> >
> > Going back to the beginning of this thread: when we use the vanilla
> > "*:*"+pagination approach, woul
No hard numbers, but the general guidance is that you should set your hard
commit interval to match your expectations for how quickly nodes should come
up if they need to be restarted. Specifically, a hard commit assures that
all changes have been committed to disk and are ready for immediate ac
On Sat, Jul 27, 2013 at 4:30 PM, Roman Chyla wrote:
> Let me just explain better what I found when I dug inside solr: documents
> (results of the query) are loaded before they are passed into a writer - so
> the writers are expecting to encounter the solr documents, but these
> documents were load
Hello,
Please find below
> Let me just explain better what I found when I dug inside solr: documents
> (results of the query) are loaded before they are passed into a writer - so
> the writers are expecting to encounter the solr documents, but these
> documents were loaded by one of the componen
Tim:
15 seconds isn't unreasonable; I was mostly wondering if it was hours.
Take a look at the size of the tlogs as you're indexing, you should see them
truncate every 15 seconds or so. There'll be a varying number of tlogs kept
around, although under heavy indexing I'd only expect 1 or 2 inactiv
On Sat, Jul 27, 2013 at 5:05 PM, Mikhail Khludnev
wrote:
> anyway, even if the writer pulls docs one by one, it doesn't allow streaming a
> billion of them. Solr writes out DocList, which is really problematic even
> in deep-paging scenarios.
Which part is problematic... the creation of the DocList (
Hi Erick, thanks.
I have about 40 shards. repFactor=2.
Finding the cause of the slower shards is very interesting, and this is the
main approach we took.
Note that in every query, a different shard is the slowest. In 20%
of the queries, the slowest shard takes about 4 times longer than the average
shard
Thanks Jack/Erick,
I don't know if this is true or not, but I've read there is a tlog per
soft commit, which is then truncated by the hard commit. If this were
true, a 15sec hard-commit with a 1sec soft-commit could generate around
~15 tlogs, but I've never checked. I like Erick's scenario mor
On 7/27/2013 3:33 PM, Isaac Hebsh wrote:
> I have about 40 shards. repFactor=2.
> Finding the cause of the slower shards is very interesting, and this is the
> main approach we took.
> Note that in every query, a different shard is the slowest. In 20%
> of the queries, the slowest shard takes about 4 t
On 7/26/2013 2:03 PM, Gustav wrote:
> The problem here is that in my client's application, the query being encoded
> in iso-8859-1 is a *must*. So, this is kind of a trouble here.
> I just don't get how this encoding could work on queries in version 3.5, but
> it doesn't in 4.3.
I brought up the is
Shawn, thank you for the tips.
I know the significant cons of virtualization, but I don't want to move
this thread into a virtualization pros/cons in the Solr(Cloud) case.
I've just asked what minimal code change should be made, in order to
examine whether this is a possible solution or not
I have a company search which uses stopwords at query time. In my
stopwords list I have entries like:
HR
Club
India
Pvt.
Ltd.
So if I search for companies like HR Club I get no results. Similarly, a
search for India HR gives no results. How can I get results in queries for
the following comp
Edismax should be able to handle a query consisting of only query-time stop
words.
What does your text field type analyzer look like?
-- Jack Krupansky
-Original Message-
From: Rohit Kumar
Sent: Saturday, July 27, 2013 9:59 PM
To: solr-user@lucene.apache.org
Subject: Searching in sto
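(A minimal SolrJ reproduction of the setup in question; the core URL and
field name are illustrative:)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class StopwordQuerySketch {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server =
        new HttpSolrServer("http://localhost:8983/solr/collection1");
    // "HR Club" can be removed entirely by a query-time StopFilter; with
    // the default lucene parser that can leave an empty query and zero hits.
    SolrQuery q = new SolrQuery("HR Club");
    q.set("defType", "edismax");
    // The field type question above matters: if "company_name" keeps
    // stopwords at index time but strips them at query time, matches are
    // lost before edismax can help.
    q.set("qf", "company_name");
    System.out.println(server.query(q).getResults().getNumFound());
  }
}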
In both cases, for better performance, I'd first load just the IDs;
afterwards, during processing, I'd load each document.
As for the incremental requirement, it should not be difficult to
write a hash function which maps a non-numerical id to a value.
On Jul 27, 2013 7:03 AM, "Joe Zhang
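(A minimal sketch of such a hash: map an arbitrary string id to one of N
buckets, then fetch the index bucket by bucket; the bucket count is
illustrative:)

public class IdBucketSketch {
  static int bucketOf(String id, int numBuckets) {
    // Math.abs(Integer.MIN_VALUE) is still negative, so mask instead.
    return (id.hashCode() & 0x7fffffff) % numBuckets;
  }

  public static void main(String[] args) {
    int numBuckets = 64; // illustrative
    for (String id : new String[] {"a", "doc-42", "http://example.com/"}) {
      System.out.println(id + " -> bucket " + bucketOf(id, numBuckets));
    }
  }
}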
On Sun, Jul 28, 2013 at 1:25 AM, Yonik Seeley wrote:
>
> Which part is problematic... the creation of the DocList (the search),
>
DocList is literally a copy of TopDocs. Creating TopDocs is not a search
but a ranking.
And ranking costs log(rows+start) per hit, on top of the numFound work
that the search itself takes.
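(Spelling the cost out, assuming the usual bounded priority queue of size
start+rows over the full match set:)

% Ranking the top (start+rows) of numFound matches with a bounded
% priority queue:
T_{\mathrm{rank}} = O\bigl(\mathrm{numFound} \cdot \log(\mathrm{start} + \mathrm{rows})\bigr)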