Re: Query & result caching with custom functions

2013-11-24 Thread Mathias Lux
Hi Joel,

I just tested with custom equals and hashCode ... what I basically did
is create a String based on all the function values and use it for
equals() (with an instanceof check) and for hashCode().

The result was the same as before: all results are cached
unless I set the queryResultCache size to 0 in solrconfig.xml
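
For reference, a minimal standalone sketch of the idea (the class and field
names are invented for illustration; this is not the actual LIRE or Solr
ValueSource code): the object used as part of the cache key must implement
equals() and hashCode() over *every* function parameter, otherwise two calls
with different parameters look identical to the cache.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Simplified, standalone illustration -- NOT the actual LIRE/Solr code.
// A function object used as (part of) a cache key must implement equals()
// and hashCode() over every parameter, or the cache returns stale results.
class DistanceFunction {
    final String field;
    final double[] referenceVector;

    DistanceFunction(String field, double[] referenceVector) {
        this.field = field;
        this.referenceVector = referenceVector;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof DistanceFunction)) return false;
        DistanceFunction other = (DistanceFunction) o;
        return field.equals(other.field)
            && Arrays.equals(referenceVector, other.referenceVector);
    }

    @Override public int hashCode() {
        return 31 * field.hashCode() + Arrays.hashCode(referenceVector);
    }
}

public class CacheKeyDemo {
    public static void main(String[] args) {
        // Toy cache keyed by the function, like a query result cache.
        Map<DistanceFunction, String> cache =
            new HashMap<DistanceFunction, String>();
        cache.put(new DistanceFunction("img", new double[]{1, 2}), "resultA");

        // Same parameters: cache hit.
        if (cache.get(new DistanceFunction("img", new double[]{1, 2})) == null)
            throw new AssertionError("expected a hit");
        // Different parameters: must be a miss, not a stale hit.
        if (cache.get(new DistanceFunction("img", new double[]{9, 9})) != null)
            throw new AssertionError("expected a miss");
        System.out.println("ok");
    }
}
```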

cheers,
Mathias

On Thu, Oct 24, 2013 at 4:51 PM, Joel Bernstein  wrote:
> Mathias,
>
> I'd have to do a close review of the function sort code to be sure, but I
> suspect if you implement the equals() method on the ValueSource it should
> solve your caching issue. Also implement hashCode().
>
> Joel
>
>
> On Thu, Oct 24, 2013 at 10:35 AM, Shawn Heisey  wrote:
>
>> On 10/24/2013 5:35 AM, Mathias Lux wrote:
>> > I've written a custom function, which is able to provide a distance
>> > based on some DocValues to re-sort result lists. This basically works
>> > great, but we've got the problem that if I don't change the query, but
>> > the function parameters, Solr delivers a cached result without
>> > re-ordering. I turned off caching and, lo and behold, problem solved. But
>> > of course this is not an avenue I want to pursue further, as it doesn't
>> > make sense for a production system.
>> >
>> > Do you have any ideas (beyond fake query modification and turning off
>> > caching) to counteract?
>> >
>> > btw. I'm using Solr 4.4 (so if you are aware of the issue and it has
>> > been resolved in 4.5 I'll port it :) The code I'm using is at
>> > https://bitbucket.org/dermotte/liresolr
>>
>> I suspect that the queryResultCache is not paying attention to the fact
>> that parameters for your plugin have changed.  This probably means that
>> your plugin must somehow inform the "cache check" code that something
>> HAS changed.
>>
>> How you actually do this is a mystery to me because it involves parts of
>> the code that are beyond my understanding, but it MIGHT involve making
>> sure that parameters related to your code are saved as part of the entry
>> that goes into the cache.
>>
>> Thanks,
>> Shawn
>>
>>



-- 
PD Dr. Mathias Lux
Associate Professor, Klagenfurt University, Austria
http://tinyurl.com/mlux-itec


Re: Solrcloud: external fields and frequent commits

2013-11-24 Thread Erick Erickson
Long blog post on commits and the state of updates here:
http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

hdfs is perfectly fine with Solr, there's even an HdfsDirectoryFactory for
your index. It has its own
performance characteristics/tuning parameters, so there'll be something of
a learning curve.

Best
Erick


On Sat, Nov 23, 2013 at 4:14 AM, Flavio Pompermaier wrote:

> Thanks again for such a detailed description.
> In our use case we're going to save shards data on hdfs so they all have
> access to a shared location, it would be great to put such a file in one
> place in that case :)
> Do you think that using hdfs as storage is bad for performance?
> Last question: if I softCommit and I have to shut down my tomcat, will data
> be committed to disk or do I have to manually force a commit before shutting
> down?
>
> Best,
> Flavio
>
> On Sat, Nov 23, 2013 at 2:01 AM, Erick Erickson  >wrote:
>
> > about <1>. Well, at a high level you're right, of course.
> > Having the EFF stuff in a single place seems more elegant. But
> > then ugly details crop up. I.e. "one place" implies that you'd have
> > to fetch them over the network, potentially a very expensive
> > operation every time there was a commit. Is this really a good
> > tradeoff? With high network latency, this could be a performance
> > killer. But I suspect that the real reason is that nobody has found
> > a compelling use-case for this kind of thing. Until and unless
> > someone does, and is willing to make a patch, it'll be theory :).
> >
> > bq:  modifications also sent to replicas
> > with this kind of commits
> >
> > brief review:
> >
> > Update process:
> > 1> Update goes to a node.
> > 2> node forwards to all leaders
> > 3> leader forwards to replicas
> > 4> replicas respond to their leader.
> > 5> leader responds to originating node.
> > 6> originating node responds to caller.
> >
> > At this point all the replicas for your entire cluster have the
> > update. This is entirely independent of commits. Whenever a
> > commit is issued the documents currently pending on a node
> > are committed and made visible to a searcher.
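
The steps above can be sketched as a toy model (plain Java, no real SolrJ or
network calls; the class names here are invented for illustration):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the forwarding path: an update reaches the leader, the
// leader indexes it locally and forwards it to every replica, and only
// then does the caller get an acknowledgement. Commits are a separate step.
public class ForwardingDemo {
    static class Replica {
        final List<String> docs = new ArrayList<String>();
        boolean add(String doc) { docs.add(doc); return true; } // step 4: ack
    }

    static class Leader extends Replica {
        final List<Replica> replicas = new ArrayList<Replica>();
        boolean update(String doc) {
            boolean ok = add(doc);                        // leader indexes
            for (Replica r : replicas) ok &= r.add(doc);  // step 3: forward
            return ok;                                    // step 5: ack
        }
    }

    public static void main(String[] args) {
        Leader leader = new Leader();
        Replica replica = new Replica();
        leader.replicas.add(replica);

        // A successful ack means every replica holds the doc -- independent
        // of any commit; commits only control visibility later.
        if (!leader.update("doc1") || !replica.docs.contains("doc1"))
            throw new AssertionError("replica missing the update");
        System.out.println("ok");
    }
}
```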
> >
> > If one is relying on solrconfig settings, then the commit happens
> > a little bit out of synch. Let's say that the commit (hard with
> > opensearcher=true or soft) is set to 60 seconds. Each node may
> > have a different commit time, depending upon when it was started.
> > So there may be a slight difference in when documents are visible.
> > You'll probably never notice.
> >
> > If you issue commits from a client, then the commit is propagated
> > to all nodes in the cluster.
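
For illustration, a solrconfig.xml fragment of the kind of settings being
discussed (the 60-second values are examples, not recommendations):

```xml
<!-- Illustrative solrconfig.xml fragment. Each node applies these timers
     independently, so visibility can differ slightly between nodes. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>60000</maxTime>            <!-- hard commit every 60s -->
    <openSearcher>false</openSearcher>  <!-- durability only, no visibility -->
  </autoCommit>
  <autoSoftCommit>
    <maxTime>60000</maxTime>            <!-- soft commit controls visibility -->
  </autoSoftCommit>
</updateHandler>
```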
> >
> > HTH,
> > Erick
> >
> >
> > On Fri, Nov 22, 2013 at 7:23 PM, Flavio Pompermaier <
> pomperma...@okkam.it
> > >wrote:
> >
> > > On Fri, Nov 22, 2013 at 2:21 PM, Erick Erickson <
> erickerick...@gmail.com
> > > >wrote:
> > >
> > > > 1> I'm not quite sure I understand. External File Fields are keyed
> > > > by the unique id of the doc. So every shard _must_ have the
> > > > eff available for at least the documents in that shard. At first
> glance
> > > > this doesn't look simple. Perhaps a bit more explanation of what
> > > > you're using EFF for?
> > > >
> > > Thanks Erick for the reply, I use EFF for boosting results by
> popularity.
> > > So I was right, I should put popularity in every shard data dir..right?
> > But
> > > why not keeping that file in just one place (obviously the file should
> be
> > > reachable by all solrcloud nodes...) and allow external fields to be
> > > outside data dir?
> > >
> > > >
> > > > 2> Let's be sure we're talking about the same thing here. In Solr,
> > > > a "commit" is the command that makes documents visible, often
> > > > controlled by the autoCommit and autoSoftCommit settings in
> > > > solrconfig.xml. You will not be able to issue 100 commits/second.
> > > >
> > > > If you're using "commit" to mean adding a document to the index,
> > > > then 100/s should be no problem. I regularly see many times that
> > > > ingestion rate. The documents won't be visible to search until
> > > > you do a commit however.
> > > >
> > > Yeah, now it is clearer. Still a question: for my client it is not a
> > > problem to soft commit, but are the modifications also sent to replicas
> > > with this kind of commit?
> > >
> > > >
> > > > Best
> > > > Erick
> > > >
> > > >
> > > > On Fri, Nov 22, 2013 at 4:44 AM, Flavio Pompermaier <
> > > pomperma...@okkam.it
> > > > >wrote:
> > > >
> > > > > Hi to all,
> > > > > we're migrating from solr 3.x to solr 4.x to use Solrcloud and I
> have
> > > two
> > > > > big doubts:
> > > > >
> > > > > 1) External fields. When I compute such a file do I have to copy it
> > in
> > > > the
> > > > >  data directory of shards..? The external fields boosts the results
> > of
> > > > the
> > > > > query to a specific collection, for me it doesn't make sense to put
> > it
> > > in
> > > > > all shard's data dir, it should be something related t

Re: building custom cache - using lucene docids

2013-11-24 Thread Erick Erickson
bq: Do i understand you correctly that when two segments get merged, the
docids (of the original segments) remain the same?

The original segments are unchanged, segments are _never_ changed after
they're closed. But they'll be thrown away. Say you have segment1 and
segment2 that get merged into segment3. As soon as the last searcher
that is looking at segment1 and segment2 is closed, those two segments
will be deleted from your disk.

But for any given doc, the docid in segment3 will very likely be different
than it was in segment1 or 2.

I think you're reading too much into LUCENE-2897. I'm pretty sure the
segment in question is not available to you anyway before this rewrite is
done,
but freely admit I don't know much about it.

You're probably going to get into the whole PerSegment family of operations,
which is something I'm not all that familiar with so I'll leave
explanations
to others.
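
The renumbering can be illustrated with a toy model (list positions stand in
for segment-local docids; this is not Lucene's actual merge code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy illustration: merging seg1 and seg2 into seg3 drops deleted docs
// and renumbers the survivors, so ids in seg3 differ from the old ids.
public class MergeDemo {
    static List<String> merge(List<String> a, List<String> b,
                              List<String> deleted) {
        List<String> merged = new ArrayList<String>();
        for (String doc : a) if (!deleted.contains(doc)) merged.add(doc);
        for (String doc : b) if (!deleted.contains(doc)) merged.add(doc);
        return merged;
    }

    public static void main(String[] args) {
        List<String> seg1 = Arrays.asList("a", "b", "c"); // local ids 0,1,2
        List<String> seg2 = Arrays.asList("d", "e");      // local ids 0,1
        List<String> deleted = Arrays.asList("b");

        List<String> seg3 = merge(seg1, seg2, deleted);   // [a, c, d, e]
        // "c" had id 2 in seg1 but id 1 in seg3; "d" had id 0, now id 2.
        if (seg3.indexOf("c") != 1 || seg3.indexOf("d") != 2)
            throw new AssertionError();
        System.out.println(seg3);
    }
}
```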


On Sat, Nov 23, 2013 at 8:22 PM, Roman Chyla  wrote:

> Hi Erick,
> Many thanks for the info. An additional question:
>
> Do i understand you correctly that when two segments get merged, the docids
> (of the original segments) remain the same?
>
> (unless, perhaps in situation, they were merged using the last index
> segment which was opened for writing and where the docids could have
> suddenly changed in a commit just before the merge)
>
> Yes, you guessed right that I am putting my code into the custom cache - so
> it gets notified on index changes. I don't know yet how, but I think I can
> find the way to the current active, opened (last) index segment. Which is
> actively updated (as opposed to just being merged) -- so my definition of
> 'not last ones' is: where docids don't change. I'd be grateful if someone
> could spot any problem with such assumption.
>
> roman
>
>
>
>
> On Sat, Nov 23, 2013 at 7:39 PM, Erick Erickson  >wrote:
>
> > bq: But can I assume
> > that docids in other segments (other than the last one) will be
> relatively
> > stable?
> >
> > Kinda. Maybe. Maybe not. It depends on how you define "other than the
> > last one".
> >
> > The key is that the internal doc IDs may change when segments are
> > merged. And old segments get merged. Doc IDs will _never_ change
> > in a segment once it's closed (although as you note they may be
> > marked as deleted). But that segment may be written to a new segment
> > when merging and the internal ID for a given document in the new
> > segment bears no relationship to internal ID in the old segment.
> >
> > BTW, I think you only really care when opening a new searcher. There is
> > a UserCache (see solrconfig.xml) that gets notified when a new searcher
> > is being opened to give it an opportunity to refresh itself, is that
> > useful?
> >
> > As long as a searcher is open, it's guaranteed that nothing is changing.
> > Hard commits with openSearcher=false don't open new searchers, which
> > is why changes aren't visible until a softCommit or a hard commit with
> > openSearcher=true despite the fact that the segments are closed.
> >
> > FWIW,
> > Erick
> >
> > Best
> > Erick
> >
> >
> >
> > On Sat, Nov 23, 2013 at 12:40 AM, Roman Chyla 
> > wrote:
> >
> > > Hi,
> > > docids are 'ephemeral', but i'd still like to build a search cache with
> > > them (they allow for the fastest joins).
> > >
> > > i'm seeing docids keep changing with updates (especially, in the last
> > index
> > > segment) - as per
> > > https://issues.apache.org/jira/browse/LUCENE-2897
> > >
> > > That would be fine, because i could build the cache from diff (of index
> > > state) + reading the latest index segment in its entirety. But can I
> > assume
> > > that docids in other segments (other than the last one) will be
> > relatively
> > > stable? (ie. when an old doc is deleted, the docid is marked as
> removed;
> > > update doc = delete old & create a new docid)?
> > >
> > > thanks
> > >
> > > roman
> > >
> >
>


Re: building custom cache - using lucene docids

2013-11-24 Thread Jack Krupansky
We should probably talk about "internal" Lucene document IDs and "external" 
or "rebased" Lucene document IDs. The internal document IDs are always 
"per-segment" and never, ever change for that closed segment. But... the 
application would not normally see these IDs. Usually the externally visible 
Lucene document IDs have been "rebased" to add the sum total count of 
documents (both existing and deleted) of all preceding segments to the 
document IDs of a given segment, producing a "global" (across the full index 
of all segments) Lucene document ID.


So, if you have those three segments, with deleted documents in the first 
two segments, and then merge those first two segments, the 
externally-visible Lucene document IDs for the third segment will suddenly 
all be different, shifted lower by the number of deleted documents that were 
just merged away, even though nothing changed in the third segment itself.


Maybe these should be called "local" (to the segment) Lucene document IDs 
and "global" (across all segments) Lucene document IDs. Or, maybe internal 
vs. external is good enough.


In short, it is completely safe to use and save Lucene document IDs, but 
only as long as no merging of segments is performed. Even one tiny merge and 
all subsequent saved document IDs are invalidated. Be careful with your 
merge policy - normally merges are happening in the background, 
automatically.
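
Jack's "rebasing" can be sketched as a small calculation (the segment sizes
here are invented for illustration):

```java
// Sketch of the "rebased" id: a global docid is the segment-local id plus
// the doc counts (live + deleted) of all preceding segments. Shrink an
// earlier segment by merging away deletes and later global ids shift down.
public class GlobalDocidDemo {
    static int globalId(int[] segmentMaxDocs, int segment, int localId) {
        int base = 0;
        for (int i = 0; i < segment; i++) base += segmentMaxDocs[i];
        return base + localId;
    }

    public static void main(String[] args) {
        int[] before = {10, 10, 10};       // three segments, 10 docs each
        int g1 = globalId(before, 2, 3);   // 10 + 10 + 3 = 23

        // Merge segments 0 and 1, dropping 4 deleted docs between them:
        int[] after = {16, 10};
        int g2 = globalId(after, 1, 3);    // 16 + 3 = 19

        // Nothing changed inside the third segment, yet its global ids moved.
        if (g1 != 23 || g2 != 19) throw new AssertionError();
        System.out.println(g1 + " -> " + g2);
    }
}
```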


-- Jack Krupansky

-Original Message- 
From: Erick Erickson

Sent: Sunday, November 24, 2013 8:31 AM
To: solr-user@lucene.apache.org
Subject: Re: building custom cache - using lucene docids

bq: Do i understand you correctly that when two segmets get merged, the
docids
(of the original segments) remain the same?

The original segments are unchanged, segments are _never_ changed after
they're closed. But they'll be thrown away. Say you have segment1 and
segment2 that get merged into segment3. As soon as the last searcher
that is looking at segment1 and segment2 is closed, those two segments
will be deleted from your disk.

But for any given doc, the docid in segment3 will very likely be different
than it was in segment1 or 2.

I think you're reading too much into LUCENE-2897. I'm pretty sure the
segment in question is not available to you anyway before this rewrite is
done,
but freely admit I don't know much about it.

You're probably going to get into the whole PerSegment family of operations,
which is something I'm not all that familiar with so I'll leave
explanations
to others.


On Sat, Nov 23, 2013 at 8:22 PM, Roman Chyla  wrote:


Hi Erick,
Many thanks for the info. An additional question:

Do i understand you correctly that when two segmets get merged, the docids
(of the original segments) remain the same?

(unless, perhaps in situation, they were merged using the last index
segment which was opened for writing and where the docids could have
suddenly changed in a commit just before the merge)

Yes, you guessed right that I am putting my code into the custom cache - so
it gets notified on index changes. I don't know yet how, but I think I can
find the way to the current active, opened (last) index segment. Which is
actively updated (as opposed to just being merged) -- so my definition of
'not last ones' is: where docids don't change. I'd be grateful if someone
could spot any problem with such assumption.

roman




On Sat, Nov 23, 2013 at 7:39 PM, Erick Erickson wrote:

> bq: But can I assume
> that docids in other segments (other than the last one) will be
relatively
> stable?
>
> Kinda. Maybe. Maybe not. It depends on how you define "other than the
> last one".
>
> The key is that the internal doc IDs may change when segments are
> merged. And old segments get merged. Doc IDs will _never_ change
> in a segment once it's closed (although as you note they may be
> marked as deleted). But that segment may be written to a new segment
> when merging and the internal ID for a given document in the new
> segment bears no relationship to internal ID in the old segment.
>
> BTW, I think you only really care when opening a new searchers. There is
> a UserCache (see solrconfig.xml) that gets notified when a new searcher
> is being opened to give it an opportunity to refresh itself, is that
> useful?
>
> As long as a searcher is open, it's guaranteed that nothing is changing.
> Hard commits with openSearcher=false don't open new searchers, which
> is why changes aren't visible until a softCommit or a hard commit with
> openSearcher=true despite the fact that the segments are closed.
>
> FWIW,
> Erick
>
> Best
> Erick
>
>
>
> On Sat, Nov 23, 2013 at 12:40 AM, Roman Chyla 
> wrote:
>
> > Hi,
> > docids are 'ephemeral', but i'd still like to build a search cache with
> > them (they allow for the fastest joins).
> >
> > i'm seeing docids keep changing with updates (especially, in the last
> index
> > segment) - as per
> > https://issues.apache.org/jira/browse/LUCENE-2897
> >
> > That would be fine, because i co

Commit behaviour in SolrCloud

2013-11-24 Thread adfel70
Hi everyone,

I am wondering how commit operation works in SolrCloud:
Say I have 2 parallel indexing processes. What if one process sends a big
update request (an add command with a lot of docs), and the other one just
happens to send a commit command while the update request is being
processed.
Is it possible that only part of the documents will be committed?
What will happen with the other docs? Is Solr transactional, promising that
there will be no partial results?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Commit-behaviour-in-SolrCloud-tp4102879.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Commit behaviour in SolrCloud

2013-11-24 Thread Mark Miller
SolrCloud does not use commits for update acceptance promises.

The idea is, if you get a success from the update, it’s in the system, commit 
or not.

Soft Commits are used for visibility only.

Standard Hard Commits are used essentially for internal purposes and should be 
done via auto commit generally.

To your question though - it is fine to send a commit while updates are coming 
in from another source - it’s just not generally necessary to do that anyway.

- Mark

On Nov 24, 2013, at 1:01 PM, adfel70  wrote:

> Hi everyone,
> 
> I am wondering how commit operation works in SolrCloud:
> Say I have 2 parallel indexing processes. What if one process sends big
> update request (an add command with a lot of docs), and the other one just
> happens to send a commit command while the update request is being
> processed. 
> Is it possible that only part of the documents will be commited? 
> What will happen with the other docs? Is Solr transactional and promise that
> there will be no partial results?
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Commit-behaviour-in-SolrCloud-tp4102879.html
> Sent from the Solr - User mailing list archive at Nabble.com.



How To Use Multivalued Field Payload at Boosting?

2013-11-24 Thread Furkan KAMACI
I have a multivalued field and they have payloads. How can I use that
payloads at boosting? (When user searches for a keyword and if a match
happens at that multivalued field its payload will be added it to the
general score)

PS: I use Solr 4.5.1 as Cloud.


Re: Commit behaviour in SolrCloud

2013-11-24 Thread Furkan KAMACI
I suggest you to read here:
http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Thanks;
Furkan KAMACI


2013/11/24 Mark Miller 

> SolrCloud does not use commits for update acceptance promises.
>
> The idea is, if you get a success from the update, it’s in the system,
> commit or not.
>
> Soft Commits are used for visibility only.
>
> Standard Hard Commits are used essentially for internal purposes and
> should be done via auto commit generally.
>
> To your question though - it is fine to send a commit while updates are
> coming in from another source - it’s just not generally necessary to do
> that anyway.
>
> - Mark
>
> On Nov 24, 2013, at 1:01 PM, adfel70  wrote:
>
> > Hi everyone,
> >
> > I am wondering how commit operation works in SolrCloud:
> > Say I have 2 parallel indexing processes. What if one process sends big
> > update request (an add command with a lot of docs), and the other one
> just
> > happens to send a commit command while the update request is being
> > processed.
> > Is it possible that only part of the documents will be commited?
> > What will happen with the other docs? Is Solr transactional and promise
> that
> > there will be no partial results?
> >
> >
> >
> > --
> > View this message in context:
> http://lucene.472066.n3.nabble.com/Commit-behaviour-in-SolrCloud-tp4102879.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: Reverse mm(min-should-match)

2013-11-24 Thread Mikhail Khludnev
Morning Doug,

it sounds like you can encode the norm as the number of term positions in
the title (assuming it's single-valued).
When you search, SpanQuery can access the particular positions of the
matched terms, and then compare them to the number of terms decoded from
the norm. It sounds more like a hack for solving a particular problem, and
I'm in doubt about providing it as general functionality.
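
The "reverse mm" ratio Doug describes below can be sketched outside Solr as
a simple token-coverage computation (the token lists stand in for analyzed
terms; Rmm is not an existing Solr parameter):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of "reverse mm": instead of asking how much of the query matched
// the field, compute how much of the field's tokens the query covers.
public class ReverseMmDemo {
    static double fieldCoverage(List<String> fieldTokens,
                                List<String> queryTokens) {
        Set<String> query = new HashSet<String>(queryTokens);
        int matched = 0;
        for (String t : fieldTokens) if (query.contains(t)) matched++;
        return matched / (double) fieldTokens.size();
    }

    public static void main(String[] args) {
        List<String> title = Arrays.asList(
            "solr", "the", "worlds", "greatest", "search", "engine");

        // q=solr covers only 1/6 of the title, so it fails Rmm=100%.
        double partial = fieldCoverage(title, Arrays.asList("solr"));
        // The full query covers every title token, so Rmm=100% passes.
        double full = fieldCoverage(title, Arrays.asList(
            "solr", "the", "worlds", "greatest", "search", "engine"));

        if (Math.abs(partial - 1.0 / 6) > 1e-9) throw new AssertionError();
        if (full != 1.0) throw new AssertionError();
        System.out.println(partial + " " + full);
    }
}
```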


On Fri, Nov 22, 2013 at 11:54 PM, Doug Turnbull <
dturnb...@opensourceconnections.com> wrote:

> Instead of specifying a percentage or number of query terms must match
> tokens in a field, I'd like to do the opposite -- specify how much of a
> field must match a query.
>
> The problem I'm trying to solve is to boost document titles that closely
> match the query string. If a title looks something like
>
> *Title: *[solr] [the] [worlds] [greatest] [search] [engine]
>
> I want to be able to specify how much of the field must match the query
> string. This differs from normal mm. Normal mm specifies a how much of the
> query must match a field.
>
> As an example, with this title, if I use normal mm=100% and perform the
> following query:
>
> mm=100%
> q=solr
>
> This will match the title above, as 100% of [solr] matches the field
>
> What I really want to get at is a reverse mm:
>
> Rmm=100%
> q=solr
>
> The title above will not match in this case. Only 1/6 of the tokens in the
> field match the query.
>
> However an exact search would match:
>
> Rmm=100%
> q=solr the worlds greatest search engine
>
> Here 100% of the query matches the title, so I'm good.
>
> Is there any way to achieve this in Solr?
>
> --
> Doug Turnbull
> Search & Big Data Architect
> OpenSource Connections 
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


 


[ANNOUNCE] Apache Solr 4.6 released.

2013-11-24 Thread Simon Willnauer
24 November 2013, Apache Solr™ 4.6 available

The Lucene PMC is pleased to announce the release of Apache Solr 4.6

Solr is the popular, blazing fast, open source NoSQL search platform
from the Apache Lucene project. Its major features include powerful
full-text search, hit highlighting, faceted search, dynamic
clustering, database integration, rich document (e.g., Word, PDF)
handling, and geospatial search.  Solr is highly scalable, providing
fault tolerant distributed search and indexing, and powers the search
and navigation features of many of the world's largest internet sites.

Solr 4.6 is available for immediate download at:
  http://lucene.apache.org/solr/mirrors-solr-latest-redir.html

See the CHANGES.txt file included with the release for a full list of
details.

Solr 4.6 Release Highlights:

* Many improvements and enhancements for shard splitting options

* New AnalyzingInfixLookupFactory to leverage the AnalyzingInfixSuggester

* New CollapsingQParserPlugin for high performance field collapsing on high
  cardinality fields

* New SolrJ APIs for collection management

* New DocBasedVersionConstraintsProcessorFactory providing support for user
  configured doc-centric versioning rules

* New default index format: Lucene46Codec

* New EnumField type

Solr 4.6 also includes many other new features as well as numerous
optimizations and bugfixes.

Please report any feedback to the mailing lists
(http://lucene.apache.org/solr/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring network
for distributing releases.  It is possible that the mirror you are using
may not have replicated the release yet.  If that is the case, please
try another mirror.  This also goes for Maven access.

Happy Searching

Simon


Re: Can I use boosting fields with edismax ?

2013-11-24 Thread Erick Erickson
This should work. Try adding &debug=all to your URL, and examine
the output both with and without your boosting. I believe you'll see
the difference in the score calculations. From there it's a matter
of adjusting the boosts to get the results you want.


Best,
Erick


On Sat, Nov 23, 2013 at 9:17 AM, Amit Aggarwal wrote:

> Hello All ,
>
> I am using defType=edismax
> So will boosting work like this in solrconfig.xml
>
> value_search^2.0 desc_search country_search^1.5
> state_search^2.0 city_search^2.5 area_search^3.0
>
> I think it is not working ..
>
> If yes , then what should I do ?
>


Re: useColdSearcher in SolrCloud config

2013-11-24 Thread Erick Erickson
bq: For example, what if the leader could give a list of
queries/filters currently in the cache which could then be executed on
the replica?

How is this better than each replica firing off its own warming
queries for its caches etc? Each replica may well fire different
autowarm queries since there's no guarantee that their first
autowarmcount queries in the caches is the same, but that would
also be true of getting the autowarm queries from the leader.

Any firstSearcher and  newSearcher queries will be identical
anyway since solrconfig.xml is identical, so that's not a problem.
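
For reference, firstSearcher/newSearcher warming queries are configured per
node in solrconfig.xml roughly like this (the queries themselves are
placeholders):

```xml
<!-- Illustrative solrconfig.xml fragment; the queries are placeholders. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">some popular query</str></lst>
  </arr>
</listener>
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">some popular query</str><str name="sort">price asc</str></lst>
  </arr>
</listener>
```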

So I remain to be convinced that this would buy us anything, but
I've been wrong more than once :)

Best,
Erick


On Sat, Nov 23, 2013 at 1:31 PM, Shawn Heisey  wrote:

> On 11/23/2013 4:20 AM, Shalin Shekhar Mangar wrote:
> > As you said, loading caches from the caches of another server is not
> > feasible but there is some merit in warming with the queries of the
> > leader. For example, what if the leader could give a list of
> > queries/filters currently in the cache which could then be executed on
> > the replica? That'd be useful I think.
>
> That is an interesting idea.
>
> Thoughts, conjured with only surface understanding of the internals
> involved:
>
> One question is whether this would be done via the zookeeper queue or
> with direct inter-server communication.  My only worry with doing it in
> zookeeper is the potential for it to put a major load on low-end
> zookeeper machines.  We often tell people that their ZK nodes do not
> need much in the way of resources.  If the user has high-end machines,
> even if they are doing double duty as SolrCloud and Zookeeper, that
> would not really be a worry.
>
> Would we want to control the max number of forwarded keys via the
> existing autowarmCount setting, or have a new per-cache setting with a
> relatively low default?  If it's a new setting, I would recommend that
> it not be included in the example solrconfig.xml file, to discourage
> people from shooting themselves in the foot accidentally.  It should be
> well documented in the wiki and ref guide as an expert setting.
>
> Thanks,
> Shawn
>
>


Re: SolrCloud unstable

2013-11-24 Thread Lance Norskog
Yes, you should use a recent Java 7. Java 6 is end-of-life and no longer 
supported by Oracle. Also, read up on the various garbage collectors. It 
is a complex topic and there are many guides online.


In particular there is a problem in some Java 6 releases that causes a 
massive memory leak in Solr. The symptom is that memory use oscillates 
(normally) from, say 1GB to 2GB. After the bug triggers, the ceiling of 
2GB becomes the floor, and memory use oscillates from 2GB to 3GB. I'm 
not saying this is the problem you have. I'm just saying that it is
important to read up on garbage collection.
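
As a starting point for that reading, a hypothetical launch line with
standard HotSpot GC-logging flags (heap sizes and the log path are
placeholders):

```
# Example JVM options for observing GC behavior on Java 6/7 HotSpot.
java -Xms2g -Xmx2g \
     -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
     -Xloggc:/var/log/solr/gc.log \
     -jar start.jar
```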


Lance

On 11/22/2013 05:27 AM, Martin de Vries wrote:
  


We did some more monitoring and have some new information:

Before the issue happens the garbage collector's "collection count"
increases a lot. The increase seems to start about an hour before the real
problem occurs:

http://www.analyticsforapplications.com/GC.png [1]

We tried both the G1 garbage collector and the regular one; the problem
happens with both of them.

We use Java 1.6 on some servers. Will Java 1.7 be better?

Martin

On 12.11.2013 10:45, Martin de Vries wrote:

Hi,

We have:

Solr 4.5.1 - 5 servers
36 cores, 2 shards each, 2 servers per shard (every core is on 4 servers)
about 4.5 GB total data on disk per server
4GB JVM memory per server, 3GB average in use
Zookeeper 3.3.5 - 3 servers (one shared with Solr)
haproxy load balancing

Our SolrCloud is very unstable. About once a week some cores go into
recovery or down state. Many timeouts occur and we have to restart servers
to get them back to work. The failover doesn't work in many cases, because
one server has the core in down state and the other in recovering state.
Other cores work fine. When the cloud is stable I sometimes see log
messages like:

- shard update error StdNode:
http://033.downnotifier.com:8983/solr/dntest_shard2_replica1/:org.apache.solr.client.solrj.SolrServerException:
IOException occured when talking to server at:
http://033.downnotifier.com:8983/solr/dntest_shard2_replica1

- forwarding update to
http://033.downnotifier.com:8983/solr/dn_shard2_replica2/ failed -
retrying ...

- null:ClientAbortException: java.io.IOException: Broken pipe

Before the cloud problems start there are many large QTimes in the log
(sometimes over 50 seconds), but there are no other errors until the
recovery problems start.

Any clue about what can be wrong?

Kind regards,

Martin

Links:
--
[1] http://www.analyticsforapplications.com/GC.png





Cloning shards => cloning collections

2013-11-24 Thread Otis Gospodnetic
Hi,

In http://search-lucene.com/m/O1O2r14sU811 Shalin wrote:

"The splitting process is nothing but the creation of a bitset with
which a LiveDocsReader is created. These readers are then added to the
a new index via IW.addIndexes(IndexReader[] readers) method."

... which makes me wonder: couldn't the same mechanism be used to clone
shards, and thus allow us to clone/duplicate a whole collection?  A handy
feature, IMHO.

Thanks,
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


Re: Can I use boosting fields with edismax ?

2013-11-24 Thread Amit Aggarwal
Ok Erick.. I will try thanks
On 25-Nov-2013 2:46 AM, "Erick Erickson"  wrote:

> This should work. Try adding &debug=all to your URL, and examine
> the output both with and without your boosting. I believe you'll see
> the difference in the score calculations. From there it's a matter
> of adjusting the boosts to get the results you want.
>
>
> Best,
> Erick
>
>
> On Sat, Nov 23, 2013 at 9:17 AM, Amit Aggarwal  >wrote:
>
> > Hello All ,
> >
> > I am using defType=edismax
> > So will boosting will work like this in solrConfig.xml
> >
> > value_search^2.0 desc_search country_search^1.5
> > state_search^2.0 city_search^2.5 area_search^3.0
> >
> > I think it is not working ..
> >
> > If yes , then what should I do ?
> >
>


Re: building custom cache - using lucene docids

2013-11-24 Thread Mikhail Khludnev
Roman,

I don't fully understand your question. After a segment is flushed it's
never changed, hence segment-local docids are always the same. Due to a
merge a segment can go away, and its docs become new ones in another
segment. This is true for 'global' (Solr-style) docnums, which can change
after a merge happens in the middle of the segments' chain.
As for a segmented cache, I can propose you look at CachingWrapperFilter
and NoOpRegenerator as a pattern for such data structures.



On Sat, Nov 23, 2013 at 9:40 AM, Roman Chyla  wrote:

> Hi,
> docids are 'ephemeral', but I'd still like to build a search cache with
> them (they allow for the fastest joins).
>
> I'm seeing that docids keep changing with updates (especially in the last
> index segment) - as per
> https://issues.apache.org/jira/browse/LUCENE-2897
>
> That would be fine, because I could build the cache from a diff (of index
> state) + reading the latest index segment in its entirety. But can I assume
> that docids in other segments (other than the last one) will be relatively
> stable? (i.e. when an old doc is deleted, its docid is marked as removed;
> updating a doc = deleting the old one & creating a new docid)?
>
> thanks
>
> roman
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


 


Re: Commit behaviour in SolrCloud

2013-11-24 Thread adfel70
Hi Mark, thanks for the answer.

One more question though: you say that if I get a success from the update,
it's in the system, commit or not. But when exactly do I get this feedback -
is it one feedback per the whole request, or per each add inside the request?
I will give an example to clarify my question: say I have a new, empty index,
and I repeatedly send indexing requests - every request adds 500 new documents
to the index. Is it possible, at some point during this process, to query the
index and get a total of 1,030 docs? (Let's assume there were no indexing
errors from Solr.)

Thanks again.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Commit-behaviour-in-SolrCloud-tp4102879p4102996.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Commit behaviour in SolrCloud

2013-11-24 Thread Mark Miller
If you want this promise and complete control, you pretty much need to do a doc 
per request and many parallel requests for speed.

The bulk and streaming methods of adding documents do not have a good
fine-grained error reporting strategy yet. It's okay for certain use cases,
especially batch loading, and you will know when an update is rejected - it
just might not be easy to know which one in the batch / stream.

Documents that come in batches are added as they come / are processed - not in 
some atomic unit.

What controls how soon you will see documents or whether you will see them as 
they are still loading is simply when you soft commit and how many docs have 
been indexed when the soft commit happens.
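To make the 1,030-doc scenario above concrete: what a query sees is simply the number of docs that had been indexed when the last soft commit fired, so a soft commit landing mid-batch exposes a partial batch. A toy model of that rule (illustration only, not Solr code):

```python
def visible_doc_counts(events):
    """Toy model of soft-commit visibility: ("add", n) indexes n docs,
    "commit" makes everything indexed so far searchable. Returns the
    searchable doc count observed after each commit."""
    indexed = 0
    visible = []
    for event in events:
        if event == "commit":
            visible.append(indexed)
        else:                      # ("add", n)
            indexed += event[1]
    return visible

# three requests of 500 docs each; a soft commit happens to fire after
# the first two batches plus 30 docs of the third, then again at the end
counts = visible_doc_counts([("add", 500), ("add", 500), ("add", 30),
                             "commit", ("add", 470), "commit"])
print(counts)  # [1030, 1500]
```

So yes - a query between the two commits would report 1,030 docs even though no batch of that size was ever sent.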

- Mark




Re: Commit behaviour in SolrCloud

2013-11-24 Thread adfel70
Just to clarify how these two phrases come together:
1. "you will know when an update is rejected - it just might not be easy to
know which in the batch / stream"

2. "Documents that come in batches are added as they come / are processed -
not in some atomic unit."


If I send a batch of documents in one update request and some of the docs
fail - will the other docs still remain in the system?
What if a soft commit occurred after some of the docs but before all of the
docs got processed, and then some of the remaining docs fail during
processing?
I assume that the client will get an error for the whole batch (because of
the current error reporting strategy), but which docs will remain in the
system? Only those which got processed before the failure, or none of the
docs in this batch?









--
View this message in context: 
http://lucene.472066.n3.nabble.com/Commit-behaviour-in-SolrCloud-tp4102879p4102999.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Commit behaviour in SolrCloud

2013-11-24 Thread Mark Miller

On Nov 25, 2013, at 1:40 AM, adfel70  wrote:

> Just to clarify how these two phrases come together:
> 1. "you will know when an update is rejected - it just might not be easy to
> know which in the batch / stream"
> 
> 2. "Documents that come in batches are added as they come / are processed -
> not in some atomic unit."
> 
> 
> If I send a batch of documents in one update request, and some of the docs
> fail - will the other docs still remain in the system?

Yes.

> what if soft commit occurred after some of the docs but before all of the
> docs got processed, and then some of the remaining docs fail during
> processing?

soft commit is only about visibility.

> I assume that the client will get an error for the whole batch (because of
> the current error reporting strategy), but which docs will remain in the
> system? only those which got processed before the fail or non of the docs in
> this batch?

Generally, it will be those processed before the failure if you are using the
bulk add methods. It somewhat depends on the implementation - for example,
CloudSolrServer can use multiple threads to route documents, so perhaps a
couple of documents after the failure make it in as well.


- Mark
