Re: Solr 4.3.1 - SolrCloud nodes down and lost documents

2013-07-22 Thread Neil Prosser
Very true. I was impatient (I think less than three minutes impatient so
hopefully 4.4 will save me from myself) but I didn't realise it was doing
something rather than just hanging. Next time I have to restart a node I'll
just leave and go get a cup of coffee or something.

My configuration is set to auto hard-commit every 5 minutes. No auto
soft-commit time is set.

Over the course of the weekend, while left unattended the nodes have been
going up and down (I've got to solve the issue that is causing them to come
and go, but any suggestions on what is likely to be causing something like
that are welcome), at one point one of the nodes stopped taking updates.
After indexing properly for a few hours with that one shard not accepting
updates, the replica of that shard which contains all the correct documents
must have replicated from the broken node and dropped documents. Is there
any protection against this in Solr or should I be focusing on getting my
nodes to be more reliable? I've now got a situation where four of my five
shards have leaders who are marked as down and followers who are up.

I'm going to start grabbing information about the cluster state so I can
track which changes are happening and in what order. I can get hold of Solr
logs and garbage collection logs while these things are happening.

Is this all just down to my nodes being unreliable?


On 21 July 2013 13:52, Erick Erickson  wrote:

> Well, if I'm reading this right you had a node go out of circulation
> and then bounced nodes until that node became the leader. So of course
> it wouldn't have the documents (how could it?). Basically you shot
> yourself in the foot.
>
> Underlying here is why it took the machine you were re-starting so
> long to come up that you got impatient and started killing nodes.
> There has been quite a bit done to make that process better, so what
> version of Solr are you using? 4.4 is being voted on right now, so
> you might want to consider upgrading.
>
> There was, for instance, a situation where it would take 3 minutes for
> machines to start up. How impatient were you?
>
> Also, what are your hard commit parameters? All of the documents
> you're indexing will be in the transaction log between hard commits,
> and when a node comes up the leader will replay everything in the tlog
> to the new node, which might be a source of why it took so long for
> the new node to come back up. At the very least the new node you were
> bringing back online will need to do a full index replication (old
> style) to get caught up.
>
> Best
> Erick
>
> On Fri, Jul 19, 2013 at 4:02 AM, Neil Prosser 
> wrote:
> > While indexing some documents to a SolrCloud cluster (10 machines, 5
> shards
> > and 2 replicas, so one replica on each machine) one of the replicas
> stopped
> > receiving documents, while the other replica of the shard continued to
> grow.
> >
> > That was overnight so I was unable to track exactly what happened (I'm
> > going off our Graphite graphs here). This morning when I was able to look
> > at the cluster both replicas of that shard were marked as down (with one
> > marked as leader). I attempted to restart the non-leader node but it
> took a
> > long time to restart so I killed it and restarted the old leader, which
> > also took a long time. I killed that one (I'm impatient) and left the
> > non-leader node to restart, not realising it was missing approximately
> 700k
> > documents that the old leader had. Eventually it restarted and became
> > leader. I restarted the old leader and it dropped the number of documents
> > it had to match the previous non-leader.
> >
> > Is this expected behaviour when a replica with fewer documents is started
> > before the other and elected leader? Should I have been paying more
> > attention to the number of documents on the server before restarting
> nodes?
> >
> > I am still in the process of tuning the caches and warming for these
> > servers but we are putting some load through the cluster so it is
> possible
> > that the nodes are having to work quite hard when a new version of the
> core
> > is made available. Is this likely to explain why I occasionally see
> > nodes dropping out? Unfortunately in restarting the nodes I lost the GC
> > logs to see whether that was likely to be the culprit. Is this the sort
> of
> > situation where you raise the ZooKeeper timeout a bit? Currently the
> > timeout for all nodes is 15 seconds.
> >
> > Are there any known issues which might explain what's happening? I'm just
> > getting started with SolrCloud after using standard master/slave
> > replication for an index which has got too big for one machine over the
> > last few months.
> >
> > Also, is there any particular information that would be helpful to help
> > with these issues if it should happen again?
>


highlighting required in document

2013-07-22 Thread Jamshaid Ashraf
Hi,

I'm using Solr 4.3.0 and the following is the response to a hit highlighting
request:

Request: http://localhost:8080/solr/collection2/select?q=content:ps4&hl=true

Response:


 This post is regarding ps4 accuracy and qulaity
which is smooth and factastic



 This post is regarding ps4 accuracy and
qulaity which is smooth and factastic


I wanted the result like this, with the matched term emphasized:


 This post is regarding <em>ps4</em> accuracy and
qulaity which is smooth and factastic


 This post is regarding <em>ps4</em> accuracy and
qulaity which is smooth and factastic

Thanks in advance!

Regards,
Jamshaid


Re: DIH and tinyint(1) Field

2013-07-22 Thread deniz
Shalin Shekhar Mangar wrote
> Your database's JDBC driver is interpreting the tinyint(1) as a boolean.
> 
> Solr 4.4 fixes the problem that affected date fields with convertType=true. It
> should be released by the end of this week.
> 
> 
> On Mon, Jul 22, 2013 at 12:18 PM, deniz <

> denizdurmus87@

> > wrote:
> 
>> Hello,
>>
>> I have exactly the same problem as here
>>
>>
>> http://lucene.472066.n3.nabble.com/how-to-avoid-DataImportHandler-from-interpreting-quot-tinyint-1-unsigned-quot-value-as-quot-Boolean--td4035241.html#a4036967
>>
>> however for the solution there, it is ruining my date type fields...
>>
>> are there any other ways to deal with this problem?
>>
>>
>>
>> -
>> Zeki ama calismiyor... Calissa yapar...
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/DIH-and-tinyint-1-Field-tp4079392.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
> 
> 
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.


Thank you Shalin. As a quick solution, I found that adding
"&tinyInt1isBit=false" to the connection URL also works fine.



-
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-and-tinyint-1-Field-tp4079392p4079398.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: short-circuit OR operator in lucene/solr

2013-07-22 Thread Mikhail Khludnev
Short answer: no - it makes no sense there.

But after some thinking, it could potentially make some sense.
DisjunctionSumScorer holds its child scorers semi-ordered in a binary heap.
Hypothetically, such an ordering could be enforced on that heap, but a heap
might no longer work for that alignment; hence, a TreeSet could be used
instead of the heap for an experiment.
FWIW, it's a dev-list question.


On Mon, Jul 22, 2013 at 4:48 AM, Deepak Konidena wrote:

> I understand that lucene's AND (&&), OR (||) and NOT (!) operators are
> shorthands for REQUIRED, OPTIONAL and EXCLUDE respectively, which is why
> one can't treat them as boolean operators (adhering to boolean algebra).
>
> I have been trying to construct a simple OR expression, as follows
>
> q = +(field1:value1 OR field2:value2)
>
> with a match on either field1 or field2. But since OR is merely
> OPTIONAL, for documents where both field1:value1 and field2:value2 match,
> the query returns a score resulting from a match on both clauses.
>
> How do I enforce short-circuiting in this context? In other words, how to
> implement short-circuiting as in boolean algebra where an expression A || B
> || C returns true if A is true without even looking into whether B or C
> could be true.
> -Deepak
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


 


Regex in Stopword.xml

2013-07-22 Thread Scatman
Hi, 

I was looking for a way to put regular expressions in the StopWord.xml, but
it seems that the file can only contain literal words.
I'm just wondering whether such a feature is planned, or if someone has a
tip that would help me a lot :)

Best,
Scatman.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Regex-in-Stopword-xml-tp4079412.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr - Multiple Facet Exclusion for the same Field

2013-07-22 Thread Ralf Heyde

Hello,

I need different (multiple) facet exclusions for the same field. This
approach works:


http://server/core/select/?q=*:*
 &fq={!tag=b}brand:adidas
 &fq={!tag=c}color:red
 &facet.field={!ex=b}brand
 &facet.field={!ex=c}brand
 &facet.field={!ex=b,c}brand
 &facet.field=brand
 &facet=true&facet.mincount=1

then my result provides different facets for "brand".
But is there any way to know which exclusion belongs to which facet? Is
there something like "as" in SQL (e.g. facet.field={!ex=b as BrandB}brand)?

We are using Solr 3.6.

Hopefully what we are using is a feature, not a bug.

Thanks in advance.
Ralf




Re: Solr - Multiple Facet Exclusion for the same Field

2013-07-22 Thread Ralf Heyde

Just found it.
Use {!ex=c key=ckey} ...
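Applied to the query quoted below, that gives each facet its own key in the
response (the key names are arbitrary):

http://server/core/select/?q=*:*
 &fq={!tag=b}brand:adidas
 &fq={!tag=c}color:red
 &facet.field={!ex=b key=brandExB}brand
 &facet.field={!ex=c key=brandExC}brand
 &facet.field={!ex=b,c key=brandExBC}brand
 &facet=true&facet.mincount=1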

On 07/22/2013 11:35 AM, Ralf Heyde wrote:

Hello,

I need different (multiple) facet exclusions for the same field. This
approach works:


http://server/core/select/?q=*:*
 &fq={!tag=b}brand:adidas
 &fq={!tag=c}color:red
 &facet.field={!ex=b}brand
 &facet.field={!ex=c}brand
 &facet.field={!ex=b,c}brand
 &facet.field=brand
 &facet=true&facet.mincount=1

then my result provides different facets for "brand".
But is there any way to know which exclusion belongs to which facet? Is
there something like "as" in SQL (e.g. facet.field={!ex=b as BrandB}brand)?

We are using Solr 3.6.

Hopefully what we are using is a feature, not a bug.

Thanks in advance.
Ralf






Programmatic instantiation of solr container and cores with config loaded from a jar

2013-07-22 Thread Robert Krüger
Hi,

I use solr embedded in a desktop app and I want to change it to no
longer require the configuration for the container and core to be in
the filesystem but rather be distributed as part of a jar file.

Could someone kindly point me to the right docs?

So far my impression is, I need to instantiate CoreContainer with a
custom SolrResourceLoader with properties parsed via some other API
but from the javadocs alone I feel a bit lost (why does it have to
have an instance directory at all?) and googling did not give me many
results. What would be ideal would be to have something like this
(pseudocode with partly imagined names, which hopefully illustrates
what I am trying to achieve):

ContainerConfig containerConfig =
ContainerConfigParser.parse(<InputStream from Classloader>);
CoreContainer  container = new CoreContainer(containerConfig);

CoreConfig coreConfig = CoreConfigParser.parse(container, <InputStream from Classloader>);
container.register(<core name>, coreConfig);

Ideally I would like to keep XML format to reuse my current solr.xml
and solrconfig.xml but that is just a nice-to-have.

Does such a way exist and if so, what are the real API classes and calls to use?

Thank you in advance,

Robert


Re: Regex in Stopword.xml

2013-07-22 Thread Manuel Le Normand
Use the pattern replace filter factory. This will do exactly what you asked
for:
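A sketch of such a filter inside a field type's analyzer chain in schema.xml
(the pattern here is only a placeholder for your own regex):

  <filter class="solr.PatternReplaceFilterFactory"
          pattern="(your regex here)" replacement="" replace="all"/>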


http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PatternReplaceFilterFactory




On Mon, Jul 22, 2013 at 12:22 PM, Scatman  wrote:

> Hi,
>
> I was looking for a way to put regular expressions in the StopWord.xml, but
> it seems that the file can only contain literal words.
> I'm just wondering whether such a feature is planned, or if someone has a
> tip that would help me a lot :)
>
> Best,
> Scatman.
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Regex-in-Stopword-xml-tp4079412.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Programmatic instantiation of solr container and cores with config loaded from a jar

2013-07-22 Thread Alan Woodward
Hi Robert,

The upcoming 4.4 release should make this a bit easier (you can check out the 
release branch now if you like, or wait a few days for the official version).  
CoreContainer now takes a SolrResourceLoader and a ConfigSolr object as 
constructor parameters, and you can create a ConfigSolr object from a string 
representation of solr.xml using the ConfigSolr.fromString() static method.
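For example, something along these lines (a sketch against the 4.4 branch;
class and method names are as described above, but the exact signatures may
differ, and MyApp stands in for your own application class):

  import java.io.InputStream;
  import java.util.Scanner;
  import org.apache.solr.core.ConfigSolr;
  import org.apache.solr.core.CoreContainer;
  import org.apache.solr.core.SolrResourceLoader;

  // Read solr.xml from the classpath instead of the filesystem.
  SolrResourceLoader loader = new SolrResourceLoader("solr"); // instance dir
  InputStream in = MyApp.class.getResourceAsStream("/solr.xml");
  String solrXml = new Scanner(in, "UTF-8").useDelimiter("\\A").next();
  ConfigSolr config = ConfigSolr.fromString(loader, solrXml);
  CoreContainer container = new CoreContainer(loader, config);
  container.load();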

Alan Woodward
www.flax.co.uk


On 22 Jul 2013, at 11:41, Robert Krüger wrote:

> Hi,
> 
> I use solr embedded in a desktop app and I want to change it to no
> longer require the configuration for the container and core to be in
> the filesystem but rather be distributed as part of a jar file.
> 
> Could someone kindly point me to the right docs?
> 
> So far my impression is, I need to instantiate CoreContainer with a
> custom SolrResourceLoader with properties parsed via some other API
> but from the javadocs alone I feel a bit lost (why does it have to
> have an instance directory at all?) and googling did not give me many
> results. What would be ideal would be to have something like this
> (pseudocode with partly imagined names, which hopefully illustrates
> what I am trying to achieve):
> 
> ContainerConfig containerConfig =
> ContainerConfigParser.parse(<InputStream from Classloader>);
> CoreContainer  container = new CoreContainer(containerConfig);
> 
> CoreConfig coreConfig = CoreConfigParser.parse(container, <InputStream
> from Classloader>);
> container.register(<core name>, coreConfig);
> 
> Ideally I would like to keep XML format to reuse my current solr.xml
> and solrconfig.xml but that is just a nice-to-have.
> 
> Does such a way exist and if so, what are the real API classes and calls to 
> use?
> 
> Thank you in advance,
> 
> Robert



Re: Regex in Stopword.xml

2013-07-22 Thread Scatman
Thanks for the reply, but that's not the solution I'm looking for; I should
have explained myself better, because I have about a hundred regexes to put
in the config. To keep Solr easy to manage, I think the better way is to put
the regexes in a file... I know that Google's GSA does it, so I'd just hoped
that it would be the case for Solr :)

Best,
Scatman. 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Regex-in-Stopword-xml-tp4079412p4079438.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 4.3.1 - SolrCloud nodes down and lost documents

2013-07-22 Thread Erick Erickson
Wow, you really shouldn't be having nodes go up and down so
frequently, that's a big red flag. That said, SolrCloud should be
pretty robust so this is something to pursue...

But even a 5 minute hard commit can lead to a hefty transaction
log under load, you may want to reduce it substantially depending
on how fast you are sending docs to the index. I'm talking
15-30 seconds here. It's critical that openSearcher be set to false
or you'll invalidate your caches that often. All a hard commit
with openSearcher set to false does is close off the current segment
and open a new one. It does NOT open/warm new searchers etc.

The soft commits control visibility, so that's how you control
whether you can search the docs or not. Pardon me if I'm
repeating stuff you already know!
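For illustration, the corresponding solrconfig.xml settings might look like
this (the times are only examples):

  <autoCommit>
    <maxTime>15000</maxTime>            <!-- hard commit every 15s -->
    <openSearcher>false</openSearcher>  <!-- don't open a new searcher -->
  </autoCommit>
  <autoSoftCommit>
    <maxTime>60000</maxTime>            <!-- visibility: new searcher every 60s -->
  </autoSoftCommit>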

As far as your nodes coming and going, I've seen some people have
good results by upping the ZooKeeper timeout limit. So I guess
my first question is whether the nodes are actually going out of service
or whether it's just a timeout issue

Good luck!
Erick

On Mon, Jul 22, 2013 at 3:29 AM, Neil Prosser  wrote:
> Very true. I was impatient (I think less than three minutes impatient so
> hopefully 4.4 will save me from myself) but I didn't realise it was doing
> something rather than just hanging. Next time I have to restart a node I'll
> just leave and go get a cup of coffee or something.
>
> My configuration is set to auto hard-commit every 5 minutes. No auto
> soft-commit time is set.
>
> Over the course of the weekend, while left unattended the nodes have been
> going up and down (I've got to solve the issue that is causing them to come
> and go, but any suggestions on what is likely to be causing something like
> that are welcome), at one point one of the nodes stopped taking updates.
> After indexing properly for a few hours with that one shard not accepting
> updates, the replica of that shard which contains all the correct documents
> must have replicated from the broken node and dropped documents. Is there
> any protection against this in Solr or should I be focusing on getting my
> nodes to be more reliable? I've now got a situation where four of my five
> shards have leaders who are marked as down and followers who are up.
>
> I'm going to start grabbing information about the cluster state so I can
> track which changes are happening and in what order. I can get hold of Solr
> logs and garbage collection logs while these things are happening.
>
> Is this all just down to my nodes being unreliable?
>
>
> On 21 July 2013 13:52, Erick Erickson  wrote:
>
>> Well, if I'm reading this right you had a node go out of circulation
>> and then bounced nodes until that node became the leader. So of course
>> it wouldn't have the documents (how could it?). Basically you shot
>> yourself in the foot.
>>
>> Underlying here is why it took the machine you were re-starting so
>> long to come up that you got impatient and started killing nodes.
>> There has been quite a bit done to make that process better, so what
>> version of Solr are you using? 4.4 is being voted on right now, so
>> you might want to consider upgrading.
>>
>> There was, for instance, a situation where it would take 3 minutes for
>> machines to start up. How impatient were you?
>>
>> Also, what are your hard commit parameters? All of the documents
>> you're indexing will be in the transaction log between hard commits,
>> and when a node comes up the leader will replay everything in the tlog
>> to the new node, which might be a source of why it took so long for
>> the new node to come back up. At the very least the new node you were
>> bringing back online will need to do a full index replication (old
>> style) to get caught up.
>>
>> Best
>> Erick
>>
>> On Fri, Jul 19, 2013 at 4:02 AM, Neil Prosser 
>> wrote:
>> > While indexing some documents to a SolrCloud cluster (10 machines, 5
>> shards
>> > and 2 replicas, so one replica on each machine) one of the replicas
>> stopped
>> > receiving documents, while the other replica of the shard continued to
>> grow.
>> >
>> > That was overnight so I was unable to track exactly what happened (I'm
>> > going off our Graphite graphs here). This morning when I was able to look
>> > at the cluster both replicas of that shard were marked as down (with one
>> > marked as leader). I attempted to restart the non-leader node but it
>> took a
>> > long time to restart so I killed it and restarted the old leader, which
>> > also took a long time. I killed that one (I'm impatient) and left the
>> > non-leader node to restart, not realising it was missing approximately
>> 700k
>> > documents that the old leader had. Eventually it restarted and became
>> > leader. I restarted the old leader and it dropped the number of documents
>> > it had to match the previous non-leader.
>> >
>> > Is this expected behaviour when a replica with fewer documents is started
>> > before the other and elected leader? Should I have been paying more
>> > attention to t

Re: Solr 4.3.1 - SolrCloud nodes down and lost documents

2013-07-22 Thread Neil Prosser
No need to apologise. It's always good to have things like that reiterated
in case I've misunderstood along the way.

I have a feeling that it's related to garbage collection. I assume that if
the JVM heads into a stop-the-world GC Solr can't let ZooKeeper know it's
still alive and so gets marked as down. I've just taken a look at the GC
logs and can see a couple of full collections which took longer than my ZK
timeout of 15s. I'm still in the process of tuning the cache sizes and
have probably got it wrong (I'm coming from a Solr instance which runs on a
48G heap with ~40m documents and bringing it into five shards with 8G
heap). I thought I was being conservative with the cache sizes but I should
probably drop them right down and start again. The entire index is cached
by Linux so I should just need caches to help with things which eat CPU at
request time.

The indexing level is unusual because normally we wouldn't be indexing
everything sequentially, just making delta updates to the index as things
are changed in our MoR. However, it's handy to know how it reacts under the
most extreme load we could give it.

In the case that I set my hard commit time to 15-30 seconds with
openSearcher set to false, how do I control when I actually do invalidate
the caches and open a new searcher? Is this something that Solr can do
automatically, or will I need some sort of coordinator process to perform a
'proper' commit from outside Solr?

In our case the process of opening a new searcher is definitely a hefty
operation. We have a large number of boosts and filters which are used for
just about every query that is made against the index so we currently have
them warmed which can take upwards of a minute on our giant core.

Thanks for your help.


On 22 July 2013 13:00, Erick Erickson  wrote:

> Wow, you really shouldn't be having nodes go up and down so
> frequently, that's a big red flag. That said, SolrCloud should be
> pretty robust so this is something to pursue...
>
> But even a 5 minute hard commit can lead to a hefty transaction
> log under load, you may want to reduce it substantially depending
> on how fast you are sending docs to the index. I'm talking
> 15-30 seconds here. It's critical that openSearcher be set to false
> or you'll invalidate your caches that often. All a hard commit
> with openSearcher set to false does is close off the current segment
> and open a new one. It does NOT open/warm new searchers etc.
>
> The soft commits control visibility, so that's how you control
> whether you can search the docs or not. Pardon me if I'm
> repeating stuff you already know!
>
> As far as your nodes coming and going, I've seen some people have
> good results by upping the ZooKeeper timeout limit. So I guess
> my first question is whether the nodes are actually going out of service
> or whether it's just a timeout issue
>
> Good luck!
> Erick
>
> On Mon, Jul 22, 2013 at 3:29 AM, Neil Prosser 
> wrote:
> > Very true. I was impatient (I think less than three minutes impatient so
> > hopefully 4.4 will save me from myself) but I didn't realise it was doing
> > something rather than just hanging. Next time I have to restart a node
> I'll
> > just leave and go get a cup of coffee or something.
> >
> > My configuration is set to auto hard-commit every 5 minutes. No auto
> > soft-commit time is set.
> >
> > Over the course of the weekend, while left unattended the nodes have been
> > going up and down (I've got to solve the issue that is causing them to
> come
> > and go, but any suggestions on what is likely to be causing something
> like
> > that are welcome), at one point one of the nodes stopped taking updates.
> > After indexing properly for a few hours with that one shard not accepting
> > updates, the replica of that shard which contains all the correct
> documents
> > must have replicated from the broken node and dropped documents. Is there
> > any protection against this in Solr or should I be focusing on getting my
> > nodes to be more reliable? I've now got a situation where four of my five
> > shards have leaders who are marked as down and followers who are up.
> >
> > I'm going to start grabbing information about the cluster state so I can
> > track which changes are happening and in what order. I can get hold of
> Solr
> > logs and garbage collection logs while these things are happening.
> >
> > Is this all just down to my nodes being unreliable?
> >
> >
> > On 21 July 2013 13:52, Erick Erickson  wrote:
> >
> >> Well, if I'm reading this right you had a node go out of circulation
> >> and then bounced nodes until that node became the leader. So of course
> >> it wouldn't have the documents (how could it?). Basically you shot
> >> yourself in the foot.
> >>
> >> Underlying here is why it took the machine you were re-starting so
> >> long to come up that you got impatient and started killing nodes.
> >> There has been quite a bit done to make that process better, so what
> >> version of Solr are yo

Re: Solr 4.3.1 - SolrCloud nodes down and lost documents

2013-07-22 Thread Neil Prosser
Sorry, I should also mention that these leader nodes which are marked as
down can actually still be queried locally with distrib=false with no
problems. Is it possible that they've somehow got themselves out-of-sync?


On 22 July 2013 13:37, Neil Prosser  wrote:

> No need to apologise. It's always good to have things like that reiterated
> in case I've misunderstood along the way.
>
> I have a feeling that it's related to garbage collection. I assume that if
> the JVM heads into a stop-the-world GC Solr can't let ZooKeeper know it's
> still alive and so gets marked as down. I've just taken a look at the GC
> logs and can see a couple of full collections which took longer than my ZK
> timeout of 15s. I'm still in the process of tuning the cache sizes and
> have probably got it wrong (I'm coming from a Solr instance which runs on a
> 48G heap with ~40m documents and bringing it into five shards with 8G
> heap). I thought I was being conservative with the cache sizes but I should
> probably drop them right down and start again. The entire index is cached
> by Linux so I should just need caches to help with things which eat CPU at
> request time.
>
> The indexing level is unusual because normally we wouldn't be indexing
> everything sequentially, just making delta updates to the index as things
> are changed in our MoR. However, it's handy to know how it reacts under the
> most extreme load we could give it.
>
> In the case that I set my hard commit time to 15-30 seconds with
> openSearcher set to false, how do I control when I actually do invalidate
> the caches and open a new searcher? Is this something that Solr can do
> automatically, or will I need some sort of coordinator process to perform a
> 'proper' commit from outside Solr?
>
> In our case the process of opening a new searcher is definitely a hefty
> operation. We have a large number of boosts and filters which are used for
> just about every query that is made against the index so we currently have
> them warmed which can take upwards of a minute on our giant core.
>
> Thanks for your help.
>
>
> On 22 July 2013 13:00, Erick Erickson  wrote:
>
>> Wow, you really shouldn't be having nodes go up and down so
>> frequently, that's a big red flag. That said, SolrCloud should be
>> pretty robust so this is something to pursue...
>>
>> But even a 5 minute hard commit can lead to a hefty transaction
>> log under load, you may want to reduce it substantially depending
>> on how fast you are sending docs to the index. I'm talking
>> 15-30 seconds here. It's critical that openSearcher be set to false
>> or you'll invalidate your caches that often. All a hard commit
>> with openSearcher set to false does is close off the current segment
>> and open a new one. It does NOT open/warm new searchers etc.
>>
>> The soft commits control visibility, so that's how you control
>> whether you can search the docs or not. Pardon me if I'm
>> repeating stuff you already know!
>>
>> As far as your nodes coming and going, I've seen some people have
>> good results by upping the ZooKeeper timeout limit. So I guess
>> my first question is whether the nodes are actually going out of service
>> or whether it's just a timeout issue
>>
>> Good luck!
>> Erick
>>
>> On Mon, Jul 22, 2013 at 3:29 AM, Neil Prosser 
>> wrote:
>> > Very true. I was impatient (I think less than three minutes impatient so
>> > hopefully 4.4 will save me from myself) but I didn't realise it was
>> doing
>> > something rather than just hanging. Next time I have to restart a node
>> I'll
>> > just leave and go get a cup of coffee or something.
>> >
>> > My configuration is set to auto hard-commit every 5 minutes. No auto
>> > soft-commit time is set.
>> >
>> > Over the course of the weekend, while left unattended the nodes have
>> been
>> > going up and down (I've got to solve the issue that is causing them to
>> come
>> > and go, but any suggestions on what is likely to be causing something
>> like
>> > that are welcome), at one point one of the nodes stopped taking updates.
>> > After indexing properly for a few hours with that one shard not
>> accepting
>> > updates, the replica of that shard which contains all the correct
>> documents
>> > must have replicated from the broken node and dropped documents. Is
>> there
>> > any protection against this in Solr or should I be focusing on getting
>> my
>> > nodes to be more reliable? I've now got a situation where four of my
>> five
>> > shards have leaders who are marked as down and followers who are up.
>> >
>> > I'm going to start grabbing information about the cluster state so I can
>> > track which changes are happening and in what order. I can get hold of
>> Solr
>> > logs and garbage collection logs while these things are happening.
>> >
>> > Is this all just down to my nodes being unreliable?
>> >
>> >
>> > On 21 July 2013 13:52, Erick Erickson  wrote:
>> >
>> >> Well, if I'm reading this right you had a node go out of circulation
>> >> and then bou

RE: Solr 4.3.1 - SolrCloud nodes down and lost documents

2013-07-22 Thread Markus Jelsma
It is possible: https://issues.apache.org/jira/browse/SOLR-4260
I rarely see it and I cannot reliably reproduce it, but it just sometimes
happens. Nodes will not bring each other back in sync.

 
 
-Original message-
> From:Neil Prosser 
> Sent: Monday 22nd July 2013 14:41
> To: solr-user@lucene.apache.org
> Subject: Re: Solr 4.3.1 - SolrCloud nodes down and lost documents
> 
> Sorry, I should also mention that these leader nodes which are marked as
> down can actually still be queried locally with distrib=false with no
> problems. Is it possible that they've somehow got themselves out-of-sync?
> 
> 
> On 22 July 2013 13:37, Neil Prosser  wrote:
> 
> > No need to apologise. It's always good to have things like that reiterated
> > in case I've misunderstood along the way.
> >
> > I have a feeling that it's related to garbage collection. I assume that if
> > the JVM heads into a stop-the-world GC Solr can't let ZooKeeper know it's
> > still alive and so gets marked as down. I've just taken a look at the GC
> > logs and can see a couple of full collections which took longer than my ZK
> > timeout of 15s. I'm still in the process of tuning the cache sizes and
> > have probably got it wrong (I'm coming from a Solr instance which runs on a
> > 48G heap with ~40m documents and bringing it into five shards with 8G
> > heap). I thought I was being conservative with the cache sizes but I should
> > probably drop them right down and start again. The entire index is cached
> > by Linux so I should just need caches to help with things which eat CPU at
> > request time.
> >
> > The indexing level is unusual because normally we wouldn't be indexing
> > everything sequentially, just making delta updates to the index as things
> > are changed in our MoR. However, it's handy to know how it reacts under the
> > most extreme load we could give it.
> >
> > In the case that I set my hard commit time to 15-30 seconds with
> > openSearcher set to false, how do I control when I actually do invalidate
> > the caches and open a new searcher? Is this something that Solr can do
> > automatically, or will I need some sort of coordinator process to perform a
> > 'proper' commit from outside Solr?
> >
> > In our case the process of opening a new searcher is definitely a hefty
> > operation. We have a large number of boosts and filters which are used for
> > just about every query that is made against the index so we currently have
> > them warmed which can take upwards of a minute on our giant core.
> >
> > Thanks for your help.
> >
> >
> > On 22 July 2013 13:00, Erick Erickson  wrote:
> >
> >> Wow, you really shouldn't be having nodes go up and down so
> >> frequently, that's a big red flag. That said, SolrCloud should be
> >> pretty robust so this is something to pursue...
> >>
> >> But even a 5 minute hard commit can lead to a hefty transaction
> >> log under load, you may want to reduce it substantially depending
> >> on how fast you are sending docs to the index. I'm talking
> >> 15-30 seconds here. It's critical that openSearcher be set to false
> >> or you'll invalidate your caches that often. All a hard commit
> >> with openSearcher set to false does is close off the current segment
> >> and open a new one. It does NOT open/warm new searchers etc.
> >>
> >> The soft commits control visibility, so that's how you control
> >> whether you can search the docs or not. Pardon me if I'm
> >> repeating stuff you already know!
> >>
> >> As far as your nodes coming and going, I've seen some people have
> >> good results by upping the ZooKeeper timeout limit. So I guess
> >> my first question is whether the nodes are actually going out of service
> >> or whether it's just a timeout issue
> >>
> >> Good luck!
> >> Erick
> >>
> >> On Mon, Jul 22, 2013 at 3:29 AM, Neil Prosser 
> >> wrote:
> >> > Very true. I was impatient (I think less than three minutes impatient so
> >> > hopefully 4.4 will save me from myself) but I didn't realise it was
> >> doing
> >> > something rather than just hanging. Next time I have to restart a node
> >> I'll
> >> > just leave and go get a cup of coffee or something.
> >> >
> >> > My configuration is set to auto hard-commit every 5 minutes. No auto
> >> > soft-commit time is set.
> >> >
> >> > Over the course of the weekend, while left unattended the nodes have
> >> been
> >> > going up and down (I've got to solve the issue that is causing them to
> >> come
> >> > and go, but any suggestions on what is likely to be causing something
> >> like
> >> > that are welcome), at one point one of the nodes stopped taking updates.
> >> > After indexing properly for a few hours with that one shard not
> >> accepting
> >> > updates, the replica of that shard which contains all the correct
> >> documents
> >> > must have replicated from the broken node and dropped documents. Is
> >> there
> >> > any protection against this in Solr or should I be focusing on getting
> >> my
> >> > nodes to be more reliable? I've no

RE: Solr 4.3.1 - SolrCloud nodes down and lost documents

2013-07-22 Thread Markus Jelsma
You should increase your ZK timeout; this may be the issue in your case. You
may also want to try the G1 GC collector to keep stop-the-world pauses under
the ZK timeout.
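For example (heap size and pause target are illustrative, standard HotSpot
flags):

  java -Xmx8g -XX:+UseG1GC -XX:MaxGCPauseMillis=250 -jar start.jar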
 
-Original message-
> From:Neil Prosser 
> Sent: Monday 22nd July 2013 14:38
> To: solr-user@lucene.apache.org
> Subject: Re: Solr 4.3.1 - SolrCloud nodes down and lost documents
> 
> No need to apologise. It's always good to have things like that reiterated
> in case I've misunderstood along the way.
> 
> I have a feeling that it's related to garbage collection. I assume that if
> the JVM heads into a stop-the-world GC Solr can't let ZooKeeper know it's
> still alive and so gets marked as down. I've just taken a look at the GC
> logs and can see a couple of full collections which took longer than my ZK
> timeout of 15s. I'm still in the process of tuning the cache sizes and
> have probably got it wrong (I'm coming from a Solr instance which runs on a
> 48G heap with ~40m documents and bringing it into five shards with 8G
> heap). I thought I was being conservative with the cache sizes but I should
> probably drop them right down and start again. The entire index is cached
> by Linux so I should just need caches to help with things which eat CPU at
> request time.
> 
> The indexing level is unusual because normally we wouldn't be indexing
> everything sequentially, just making delta updates to the index as things
> are changed in our MoR. However, it's handy to know how it reacts under the
> most extreme load we could give it.
> 
> In the case that I set my hard commit time to 15-30 seconds with
> openSearcher set to false, how do I control when I actually do invalidate
> the caches and open a new searcher? Is this something that Solr can do
> automatically, or will I need some sort of coordinator process to perform a
> 'proper' commit from outside Solr?
> 
> In our case the process of opening a new searcher is definitely a hefty
> operation. We have a large number of boosts and filters which are used for
> just about every query that is made against the index so we currently have
> them warmed which can take upwards of a minute on our giant core.
> 
> Thanks for your help.
> 
> 
> On 22 July 2013 13:00, Erick Erickson  wrote:
> 
> > Wow, you really shouldn't be having nodes go up and down so
> > frequently, that's a big red flag. That said, SolrCloud should be
> > pretty robust so this is something to pursue...
> >
> > But even a 5 minute hard commit can lead to a hefty transaction
> > log under load, you may want to reduce it substantially depending
> > on how fast you are sending docs to the index. I'm talking
> > 15-30 seconds here. It's critical that openSearcher be set to false
> > or you'll invalidate your caches that often. All a hard commit
> > with openSearcher set to false does is close off the current segment
> > and open a new one. It does NOT open/warm new searchers etc.
> >
> > The soft commits control visibility, so that's how you control
> > whether you can search the docs or not. Pardon me if I'm
> > repeating stuff you already know!
> >
> > As far as your nodes coming and going, I've seen some people have
> > good results by upping the ZooKeeper timeout limit. So I guess
> > my first question is whether the nodes are actually going out of service
> > or whether it's just a timeout issue
> >
> > Good luck!
> > Erick
> >
> > On Mon, Jul 22, 2013 at 3:29 AM, Neil Prosser 
> > wrote:
> > > Very true. I was impatient (I think less than three minutes impatient so
> > > hopefully 4.4 will save me from myself) but I didn't realise it was doing
> > > something rather than just hanging. Next time I have to restart a node
> > I'll
> > > just leave and go get a cup of coffee or something.
> > >
> > > My configuration is set to auto hard-commit every 5 minutes. No auto
> > > soft-commit time is set.
> > >
> > > Over the course of the weekend, while left unattended the nodes have been
> > > going up and down (I've got to solve the issue that is causing them to
> > come
> > > and go, but any suggestions on what is likely to be causing something
> > like
> > > that are welcome), at one point one of the nodes stopped taking updates.
> > > After indexing properly for a few hours with that one shard not accepting
> > > updates, the replica of that shard which contains all the correct
> > documents
> > > must have replicated from the broken node and dropped documents. Is there
> > > any protection against this in Solr or should I be focusing on getting my
> > > nodes to be more reliable? I've now got a situation where four of my five
> > > shards have leaders who are marked as down and followers who are up.
> > >
> > > I'm going to start grabbing information about the cluster state so I can
> > > track which changes are happening and in what order. I can get hold of
> > Solr
> > > logs and garbage collection logs while these things are happening.
> > >
> > > Is this all just down to my nodes being unreliable?
> > >
> > >
> > > On 21 July 2013 13:

Problem instantiating a ValueSourceParser plugin in 4.3.1

2013-07-22 Thread Abeygunawardena, Niran
Hi,

I'm trying to migrate to Solr 4.3.1 from Solr 4.0.0. I have a Solr Plugin which 
extends ValueSourceParser and it works under Solr 4.0.0 but does not work under 
Solr 4.3.1. I compiled the plugin using the latest solr-4.3.1*.jars and 
lucene-4.3.1*.jars but I get the following stacktrace error when starting up a 
core referencing this plugin...seen below. Does anyone know why it might be 
giving me a ClassCastException under 4.3.1?

Thanks,
Niran

2458 [coreLoadExecutor-3-thread-2] ERROR org.apache.solr.core.CoreContainer   
Unable to create core: example_core
org.apache.solr.common.SolrException: Error Instantiating ValueSourceParser, 
com.example.HitsValueSourceParser failed to instanti
ate org.apache.solr.search.ValueSourceParser
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:821)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:618)
at 
org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:949)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984)
at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown 
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.solr.common.SolrException: Error Instantiating 
ValueSourceParser, com.example.HitsValueSourceParser failed
to instantiate org.apache.solr.search.ValueSourceParser
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:539)
at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:575)
at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2088)
at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2082)
at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2115)
at 
org.apache.solr.core.SolrCore.initValueSourceParsers(SolrCore.java:2027)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:749)
... 13 more
Caused by: java.lang.ClassCastException: class com.example.HitsValueSourceParser
at java.lang.Class.asSubclass(Unknown Source)
at 
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:448)
at 
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:396)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:518)
... 19 more
2466 [coreLoadExecutor-3-thread-2] ERROR org.apache.solr.core.CoreContainer   
null:org.apache.solr.common.SolrException: Unable to create core: example_core
at 
org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1450)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:993)
at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown 
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.solr.common.SolrException: Error Instantiating 
ValueSourceParser, com.example.HitsValueSourceParser failed
to instantiate org.apache.solr.search.ValueSourceParser
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:821)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:618)
at 
org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:949)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984)
... 10 more
Caused by: org.apache.solr.common.SolrException: Error Instantiating 
ValueSourceParser, com.example.HitsValueSourceParser failed
to instantiate org.apache.solr.search.ValueSourceParser
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:539)
at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:575)
at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2088)
at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2082)
at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2115)
at 
org.apache.solr

Re: Programmatic instantiation of solr container and cores with config loaded from a jar

2013-07-22 Thread Robert Krüger
Great, thank you!

On Jul 22, 2013 1:35 PM, "Alan Woodward"  wrote:
>
> Hi Robert,
>
> The upcoming 4.4 release should make this a bit easier (you can check out
the release branch now if you like, or wait a few days for the official
version).  CoreContainer now takes a SolrResourceLoader and a ConfigSolr
object as constructor parameters, and you can create a ConfigSolr object
from a string representation of solr.xml using the ConfigSolr.fromString()
static method.
>
> Alan Woodward
> www.flax.co.uk
>
>
> On 22 Jul 2013, at 11:41, Robert Krüger wrote:
>
> > Hi,
> >
> > I use solr embedded in a desktop app and I want to change it to no
> > longer require the configuration for the container and core to be in
> > the filesystem but rather be distributed as part of a jar file.
> >
> > Could someone kindly point me to the right docs?
> >
> > So far my impression is, I need to instantiate CoreContainer with a
> > custom SolrResourceLoader with properties parsed via some other API
> > but from the javadocs alone I feel a bit lost (why does it have to
> > have an instance directory at all?) and googling did not give me many
> > results. What would be ideal would be to have something like this
> > (pseudocode with partly imagined names, which hopefully illustrates
> > what I am trying to achieve):
> >
> > ContainerConfig containerConfig =
> > ContainerConfigParser.parse(<InputStream from Classloader>);
> > CoreContainer  container = new CoreContainer(containerConfig);
> >
> > CoreConfig coreConfig = CoreConfigParser.parse(container, <InputStream
> > from Classloader>);
> > container.register(<core name>, coreConfig);
> >
> > Ideally I would like to keep XML format to reuse my current solr.xml
> > and solrconfig.xml but that is just a nice-to-have.
> >
> > Does such a way exist and if so, what are the real API classes and
calls to use?
> >
> > Thank you in advance,
> >
> > Robert
>


how to improve (keyword) relevance?

2013-07-22 Thread eShard
Good morning,
I'm currently running Solr 4.0 final (multi core) with manifoldcf v1.3 dev
on tomcat 7.
Early on, I used copyfield to put the meta data into the text field to
simplify solr queries (i.e. I only have to query one field now.)
However, a lot of people are concerned about improving relevance.
I found a relevancy solution on page 298 of the Apache Solr 4.0 Cookbook;
however, is there a way to modify it so it only uses one field (i.e. the
text field)?

(Note well: I have multi cores and the schemas are all somewhat different;
If I can't get this to work with one field then I would have to build
complex queries for all the other cores; this would vastly over complicate
the UI. Is there another way?)
here's the requesthandler defaults in question (the <str> wrappers were
stripped by the mail archive; reconstructed from the parameter names and
values):

  <lst name="defaults">
    <!-- a first parameter whose name was stripped; its value was "true" -->
    <str name="q">_query_:"{!edismax qf=$qfQuery mm=$mmQuery pf=$pfQuery
        bq=$boostQuery v=$mainQuery}"</str>
    <str name="qfQuery">name^10 description</str>
    <str name="mmQuery">1</str>
    <str name="pfQuery">name description</str>
    <str name="boostQuery">_query_:"{!edismax qf=$boostQuerQf mm=100%
        v=$mainQuery}"^10</str>
  </lst>





--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-improve-keyword-relevance-tp4079462.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 4.3.1 - SolrCloud nodes down and lost documents

2013-07-22 Thread Shawn Heisey
On 7/22/2013 6:45 AM, Markus Jelsma wrote:
> You should increase your ZK time out, this may be the issue in your case. You 
> may also want to try the G1GC collector to keep STW under ZK time out.

When I tried G1, the occasional stop-the-world GC actually got worse.  I
tried G1 after trying CMS with no other tuning parameters.  The average
GC time went down, but when it got into a place where it had to do a
stop-the-world collection, it was worse.

Based on the GC statistics in jvisualvm and jstat, I didn't think I had
a problem.  The way I discovered that I had a problem was by looking at
my haproxy load balancer -- sometimes requests would be sent to a backup
server instead of my primary, because the ping request handler was
timing out on the LB health check.  The LB was set to time out after
five seconds.  When I went looking deeper with the GC log and some other
tools, I was seeing 8-10 second GC pauses.  G1 was showing me pauses of
12 seconds.

Now I use a heavily tuned CMS config, and there are no more LB switches
to a backup server.  I've put some of my own information about my GC
settings on my personal Solr wiki page:

http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning

I've got an 8GB heap on my systems running 3.5.0 (one copy of the index)
and a 6GB heap on those running 4.2.1 (the other copy of the index).

Summary: Just switching to the G1 collector won't solve GC pause
problems.  There's not a lot of G1 tuning information out there yet.  If
someone can come up with a good set of G1 tuning parameters, G1 might
become better than CMS.

Thanks,
Shawn



Re: Regex in Stopword.xml

2013-07-22 Thread Jack Krupansky
How did you get the impression that GSA supports regex stop words? GSA seems 
to follow the same rules as Solr.


See the doc:
http://www.google.com/support/enterprise/static/gsa/docs/admin/70/gsa_doc_set/admin_searchexp/ce_improving_search.html#1050255

As with GSA, the stop words are a simple .TXT file.

In any case, Solr and Lucene do not support "stop words" that are regular 
expressions, although a regex filter can simulate them to a limited degree.


-- Jack Krupansky

-Original Message- 
From: Scatman

Sent: Monday, July 22, 2013 7:48 AM
To: solr-user@lucene.apache.org
Subject: Re: Regex in Stopword.xml

Thanks for the reply, but that's not the solution I'm looking for; I should
have explained myself better, because I have about a hundred regexes to put
in the config. To keep Solr easy to manage, I think the better way is to put
the regexes in a file... I know that Google's GSA does it, so I'd just hoped
that it would be the case for Solr :)

Best,
Scatman.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Regex-in-Stopword-xml-tp4079412p4079438.html
Sent from the Solr - User mailing list archive at Nabble.com. 



queryResultCache should not be related to the order of the fq list

2013-07-22 Thread 黄飞鸿
 

Hello,

QueryResultCache should not be related to the order of the fq list.

The two cases below have the same meaning, but case2 can't use the
queryResultCache entry created when case1 is executed.

case1: q=*:*&fq=field1:value1&fq=field2:value2
case2: q=*:*&fq=field2:value2&fq=field1:value1

I think the queryResultCache should not depend on the order of the fq list.

I am new to posting bugs, so I can't be sure whether this is one.

I created the issue:
https://issues.apache.org/jira/browse/SOLR-5057

By the way, if the issue is valid, how can I submit my code?

Thanks.



Re: how to improve (keyword) relevance?

2013-07-22 Thread Jack Krupansky
Could you please be more specific about the relevancy problem you are trying 
to solve?


-- Jack Krupansky

-Original Message- 
From: eShard

Sent: Monday, July 22, 2013 9:57 AM
To: solr-user@lucene.apache.org
Subject: how to improve (keyword) relevance?

Good morning,
I'm currently running Solr 4.0 final (multi core) with manifoldcf v1.3 dev
on tomcat 7.
Early on, I used copyField to put the metadata into the text field to
simplify Solr queries (i.e. I only have to query one field now).
However, a lot of people are concerned about improving relevance.
I found a relevancy solution on page 298 of the Apache Solr 4.0 Cookbook;
however, is there a way to modify it so it only uses one field (i.e. the
text field)?

(Note well: I have multiple cores and the schemas are all somewhat different;
if I can't get this to work with one field then I would have to build
complex queries for all the other cores, which would vastly overcomplicate
the UI. Is there another way?)
here's the requesthandler in question:

<requestHandler name="..." class="solr.SearchHandler">
  <lst name="defaults">
    <str name="...">true</str>
    <str name="q">_query_:"{!edismax qf=$qfQuery
        mm=$mmQuery pf=$pfQuery bq=$boostQuery v=$mainQuery}"</str>
    <str name="qfQuery">name^10 description</str>
    <str name="mmQuery">1</str>
    <str name="...">name description</str>
    <str name="boostQuery">_query_:"{!edismax qf=$boostQuerQf mm=100%
        v=$mainQuery}"^10</str>
  </lst>
</requestHandler>





--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-improve-keyword-relevance-tp4079462.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: short-circuit OR operator in lucene/solr

2013-07-22 Thread Yonik Seeley
function queries to the rescue!

q={!func}def(query($a),query($b),query($c))
a=field1:value1
b=field2:value2
c=field3:value3

"def" or default function returns the value of the first argument that
matches.  It's named default because it's more commonly used like
def(popularity,50)  (return the value of the popularity field, or 50
if the doc has no value for that field).
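
Assuming the stock single-core example setup, the complete request would
look something like this (URL-encode the braces in a real client):

  http://localhost:8983/solr/select?q={!func}def(query($a),query($b),query($c))&a=field1:value1&b=field2:value2&c=field3:value3&fl=*,score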

-Yonik
http://lucidworks.com


On Sun, Jul 21, 2013 at 8:48 PM, Deepak Konidena  wrote:
> I understand that lucene's AND (&&), OR (||) and NOT (!) operators are
> shorthands for REQUIRED, OPTIONAL and EXCLUDE respectively, which is why
> one can't treat them as boolean operators (adhering to boolean algebra).
>
> I have been trying to construct a simple OR expression, as follows
>
> q = +(field1:value1 OR field2:value2)
>
> with a match on either field1 or field2. But since the OR is merely
> optional, for documents where both field1:value1 and field2:value2 match,
> the query returns a score reflecting a match on both clauses.
>
> How do I enforce short-circuiting in this context? In other words, how to
> implement short-circuiting as in boolean algebra where an expression A || B
> || C returns true if A is true without even looking into whether B or C
> could be true.
> -Deepak


adding date column to the index

2013-07-22 Thread Mysurf Mail
I have added a date field to my index.
I don't want the query to search on this field, but I want it to be returned
with each row.
So I have defined it in the schema.xml as follows:

  <field ... stored="true" required="true"/>

I added it to the select in data-config.xml and I see it selected in the
profiler.
Now, when I query all fields (using the dashboard) I don't see it.
Even when I ask for it specifically I don't see it.
What am I doing wrong?

(In the db it is (datetimeoffset(7)))


Re: Auto-sharding and numShard parameter

2013-07-22 Thread Michael Della Bitta
That would be great.

One step toward this goal is to stop treating the situation where there are
no collections or cores as an error condition. It took me a while to get
out of the mindset when bringing up a Solr install that I had to avoid that
scenario at all costs, because red text == bad.

There's no reason for the web interface to be deactivated when there are no
collections or cores, though. Imagine if mysql didn't let you connect to it
via phpmyadmin if you hadn't configured a database yet?


Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions  | g+:
plus.google.com/appinions
w: appinions.com 


On Sat, Jul 20, 2013 at 10:33 PM, Mark Miller  wrote:

> A lot has changed since those examples were written - in general, we are
> moving away from that type of collection initialization and towards using
> the Collections API. Eventually, I'd personally like SolrCloud to ship with
> no predefined collections and have users simply start it and then start
> using the Collections API - preconfigured collections will be second class
> and possibly deprecated at some point.
>
> - Mark
>
> On Jul 20, 2013, at 10:13 PM, Erick Erickson 
> wrote:
>
> > Flavio:
> >
> > One of the great things about having people continually using Solr
> > (and SolrCloud) for the first time is the opportunity to improve the
> > docs. Anyone can update/add to the docs, all it takes is a signon.
> > Unfortunately we had a bunch of spam bots a while ago, so it's now a
> > two step process
> > 1> create a login on the Solr wiki
> > 2> post a message on this list indicating that you'd like to help
> > improve the Wiki and give us your Solr login. We'll add you to the
> > list of people who can edit the wiki and you can help the community by
> > improving the documentation.
> >
> > Best
> > Erick
> >
> > On Fri, Jul 19, 2013 at 8:46 AM, Flavio Pompermaier
> >  wrote:
> >> Thank you for the reply Erick,
> >> I was facing exactly that problem... from the documentation it seems
> >> that those parameters are required to run SolrCloud,
> >> instead they are just used to initialize a sample collection..
> >> I think that in the examples on the user doc it should be better to
> >> separate those 2 concepts: one is starting the server,
> >> another one is creating/managing collections.
> >>
> >> Best,
> >> Flavio
> >>
> >>
> >> On Fri, Jul 19, 2013 at 2:13 PM, Erick Erickson <
> erickerick...@gmail.com>wrote:
> >>
> >>> First the numShards parameter is only relevant the very first time you
> >>> create your collection. It's a little confusing because in the
> SolrCloud
> >>> examples you're getting "collection1" by default. Look further down the
> >>> SolrCloud Wiki page, the section titled
> >>> "Managing Collections via the Collections API" for creating collections
> >>> with a different name.
> >>>
> >>> Either way, either when you run the bootstrap command or when you
> >>> create a new collection, that's the only time numShards counts. It's
> >>> ignored the rest of the time.
> >>>
> >>> As far as data growing, you need to either
> >>> 1> create enough shards to handle the eventual size things will be,
> >>> sometimes called "oversharding"
> >>> or
> >>> 2> use the splitShard capabilities in very recent Solrs to expand
> >>> capacity.
> >>>
> >>> Best
> >>> Erick
> >>>
> >>> On Thu, Jul 18, 2013 at 4:52 PM, Flavio Pompermaier
> >>>  wrote:
>  Hi to all,
>  Probably this question has a simple answer but I just want to be sure
> of
>  the potential drawbacks..when I run SolrCloud I run the main solr
> >>> instance
>  with the -numShard option (e.g. 2).
>  Then as data grows, shards could potentially become a huge number. If
> I
>  had to restart all nodes and I re-run the master with the
> numShard=2,
>  what will happen? It will be just ignored or Solr will try to reduce
>  shards...?
> 
>  Another question...in SolrCloud, how do I restart all the cloud at
> once?
> >>> Is
>  it possible?
> 
>  Best,
>  Flavio
> >>>
>
>


Re: Programatic instantiation of solr container and cores with config loaded from a jar

2013-07-22 Thread Alexandre Rafalovitch
Does it mean that I can easily load Solr configuration as parsed by Solr
from an external program?

Because the last time I tried (4.3.1), the list of jars required was
quite long, including the SolrJ jar due to some exception.

Regards,
   Alex

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Mon, Jul 22, 2013 at 7:32 AM, Alan Woodward  wrote:

> Hi Robert,
>
> The upcoming 4.4 release should make this a bit easier (you can check out
> the release branch now if you like, or wait a few days for the official
> version).  CoreContainer now takes a SolrResourceLoader and a ConfigSolr
> object as constructor parameters, and you can create a ConfigSolr object
> from a string representation of solr.xml using the ConfigSolr.fromString()
> static method.
>
> Alan Woodward
> www.flax.co.uk
>
>
> On 22 Jul 2013, at 11:41, Robert Krüger wrote:
>
> > Hi,
> >
> > I use solr embedded in a desktop app and I want to change it to no
> > longer require the configuration for the container and core to be in
> > the filesystem but rather be distributed as part of a jar file.
> >
> > Could someone kindly point me to the right docs?
> >
> > So far my impression is, I need to instantiate CoreContainer with a
> > custom SolrResourceLoader with properties parsed via some other API
> > but from the javadocs alone I feel a bit lost (why does it have to
> > have an instance directory at all?) and googling did not give me many
> > results. What would be ideal would be to have something like this
> > (pseudocode with partly imagined names, which hopefully illustrates
> > what I am trying to achieve):
> >
> > ContainerConfig containerConfig =
> > ContainerConfigParser.parse(<stream from Classloader>);
> > CoreContainer container = new CoreContainer(containerConfig);
> >
> > CoreConfig coreConfig = CoreConfigParser.parse(container, <stream from Classloader>);
> > container.register(<name>, coreConfig);
> >
> > Ideally I would like to keep XML format to reuse my current solr.xml
> > and solrconfig.xml but that is just a nice-to-have.
> >
> > Does such a way exist and if so, what are the real API classes and calls
> to use?
> >
> > Thank you in advance,
> >
> > Robert
>
>


Re: Problem instantiating a ValueSourceParser plugin in 4.3.1

2013-07-22 Thread Timothy Potter
I saw something similar and used an absolute path to my JAR file in
solrconfig.xml vs. a relative path and it resolved the issue for me.
Not elegant but worth trying, at least to rule that out.


Tim

On Mon, Jul 22, 2013 at 7:51 AM, Abeygunawardena, Niran
 wrote:
> Hi,
>
> I'm trying to migrate to Solr 4.3.1 from Solr 4.0.0. I have a Solr Plugin 
> which extends ValueSourceParser and it works under Solr 4.0.0 but it does not 
> work under Solr 4.3.1. I compiled the plugin using the solr-4.3.1*.jars and 
> lucene-4.3.1*.jars but I get the following stacktrace error when starting up 
> a core referencing this plugin...seen below. Does anyone know why it might be 
> giving me a ClassCastException under 4.3.1?
>
> Thanks,
> Niran
>
> 2458 [coreLoadExecutor-3-thread-2] ERROR org.apache.solr.core.CoreContainer   
> Unable to create core: example_core
> org.apache.solr.common.SolrException: Error Instantiating ValueSourceParser, 
> com.example.HitsValueSourceParser failed to instantiate
> org.apache.solr.search.ValueSourceParser
> at org.apache.solr.core.SolrCore.(SolrCore.java:821)
> at org.apache.solr.core.SolrCore.(SolrCore.java:618)
> at 
> org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:949)
> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984)
> at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
> at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
> at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
> at java.util.concurrent.FutureTask.run(Unknown Source)
> at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
> at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
> at java.util.concurrent.FutureTask.run(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown 
> Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> Caused by: org.apache.solr.common.SolrException: Error Instantiating 
> ValueSourceParser, com.example.HitsValueSourceParser failed
> to instantiate org.apache.solr.search.ValueSourceParser
> at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:539)
> at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:575)
> at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2088)
> at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2082)
> at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2115)
> at 
> org.apache.solr.core.SolrCore.initValueSourceParsers(SolrCore.java:2027)
> at org.apache.solr.core.SolrCore.(SolrCore.java:749)
> ... 13 more
> Caused by: java.lang.ClassCastException: class 
> com.example.HitsValueSourceParser
> at java.lang.Class.asSubclass(Unknown Source)
> at 
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:448)
> at 
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:396)
> at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:518)
> ... 19 more
> 2466 [coreLoadExecutor-3-thread-2] ERROR org.apache.solr.core.CoreContainer   
> null:org.apache.solr.common.SolrException: Unable to create core: example_core
> at 
> org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1450)
> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:993)
> at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
> at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
> at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
> at java.util.concurrent.FutureTask.run(Unknown Source)
> at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
> at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
> at java.util.concurrent.FutureTask.run(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown 
> Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> Caused by: org.apache.solr.common.SolrException: Error Instantiating 
> ValueSourceParser, com.example.HitsValueSourceParser failed
> to instantiate org.apache.solr.search.ValueSourceParser
> at org.apache.solr.core.SolrCore.(SolrCore.java:821)
> at org.apache.solr.core.SolrCore.(SolrCore.java:618)
> at 
> org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:949)
> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984)
> ... 10 more
> Caused by: org.apache.solr.common.SolrException: Error Instantiating 
> ValueSourceParser, com.example.HitsValueSourceParser failed
> to instantiate org.apache.solr.se

RE: Problem instantiating a ValueSourceParser plugin in 4.3.1

2013-07-22 Thread Abeygunawardena, Niran
Thanks Tim. 

I copied my jar containing the plugin to Solr's lib directory, as it wasn't 
finding my jar due to a bug in 4.3:
https://issues.apache.org/jira/browse/SOLR-4791
but the ClassCastException remains. I'll try solr 4.2 and see if the plugin 
works in that.

Cheers,
Niran

 
-Original Message-
From: Timothy Potter [mailto:thelabd...@gmail.com] 
Sent: 22 July 2013 15:39
To: solr-user@lucene.apache.org
Subject: Re: Problem instantiating a ValueSourceParser plugin in 4.3.1

I saw something similar and used an absolute path to my JAR file in 
solrconfig.xml vs. a relative path and it resolved the issue for me.
Not elegant but worth trying, at least to rule that out.


Tim

On Mon, Jul 22, 2013 at 7:51 AM, Abeygunawardena, Niran 
 wrote:
> Hi,
>
> I'm trying to migrate to Solr 4.3.1 from Solr 4.0.0. I have a Solr Plugin 
> which extends ValueSourceParser and it works under Solr 4.0.0 but it does not 
> work under Solr 4.3.1. I compiled the plugin using the solr-4.3.1*.jars and 
> lucene-4.3.1*.jars but I get the following stacktrace error when starting up 
> a core referencing this plugin...seen below. Does anyone know why it might be 
> giving me a ClassCastException under 4.3.1?
>
> Thanks,
> Niran
>
> 2458 [coreLoadExecutor-3-thread-2] ERROR org.apache.solr.core.CoreContainer   
> Unable to create core: example_core
> org.apache.solr.common.SolrException: Error Instantiating 
> ValueSourceParser, com.example.HitsValueSourceParser failed to instantiate
> org.apache.solr.search.ValueSourceParser
> at org.apache.solr.core.SolrCore.(SolrCore.java:821)
> at org.apache.solr.core.SolrCore.(SolrCore.java:618)
> at 
> org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:949)
> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984)
> at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
> at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
> at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
> at java.util.concurrent.FutureTask.run(Unknown Source)
> at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
> at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
> at java.util.concurrent.FutureTask.run(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown 
> Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source) Caused by: 
> org.apache.solr.common.SolrException: Error Instantiating 
> ValueSourceParser, com.example.HitsValueSourceParser failed to instantiate 
> org.apache.solr.search.ValueSourceParser
> at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:539)
> at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:575)
> at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2088)
> at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2082)
> at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2115)
> at 
> org.apache.solr.core.SolrCore.initValueSourceParsers(SolrCore.java:2027)
> at org.apache.solr.core.SolrCore.(SolrCore.java:749)
> ... 13 more
> Caused by: java.lang.ClassCastException: class 
> com.example.HitsValueSourceParser
> at java.lang.Class.asSubclass(Unknown Source)
> at 
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:448)
> at 
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:396)
> at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:518)
> ... 19 more
> 2466 [coreLoadExecutor-3-thread-2] ERROR org.apache.solr.core.CoreContainer   
> null:org.apache.solr.common.SolrException: Unable to create core: example_core
> at 
> org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1450)
> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:993)
> at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
> at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
> at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
> at java.util.concurrent.FutureTask.run(Unknown Source)
> at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
> at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
> at java.util.concurrent.FutureTask.run(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown 
> Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source) Caused by: 
> org.apache.solr.common.SolrException: Error Instantiating 
> ValueSourceParser, com.example.HitsValueSourceParser failed to instantiate 
> org.apache.solr.search.Val

Re: custom field type plugin

2013-07-22 Thread David Smiley (@MITRE.org)
Like Hoss said, you're going to have to solve this using
http://wiki.apache.org/solr/SpatialForTimeDurations
Using PointType is *not* going to work because your durations are
multi-valued per document.

It would be useful to create a custom field type that wraps the capability
outlined on the wiki to make it easier to use without requiring the user to
think spatially.
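
Until such a field type exists, the raw recipe looks roughly like this in
schema.xml (names and bounds illustrative):

  <fieldType name="durationRange" class="solr.SpatialRecursivePrefixTreeFieldType"
             geo="false" worldBounds="0 0 10000000000 10000000000"
             distErrPct="0" maxDistErr="1" units="degrees"/>
  <field name="regions" type="durationRange" indexed="true" stored="true"
         multiValued="true"/>

You index each start:end range as the point "start end" (e.g. "1 16090"),
and find documents whose ranges contain 10234 with a rectangle query:

  fq=regions:"Intersects(0 10234 10234 10000000000)"

i.e. start <= 10234 on the x axis and end >= 10234 on the y axis.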

You mentioned that these numeric ranges extend upwards of 10 billion or so. 
Unfortunately, the current "prefix tree" implementation under the hood for
non-geodetic spatial, the QuadTree, is unlikely to scale to numbers that
big.  I don't know where the boundary is, but I doubt 10B.  You could try
and see what happens.  I'm working (very slowly on very little spare time)
on improving the PrefixTree implementations to scale to such large numbers;
I hope something will be available this fall.

~ David Smiley


Kevin Stone wrote
> I have a particular use case that I think might require a custom field
> type, however I am having trouble getting the plugin to work.
> My use case has to do with genetics data, and we are running into several
> situations where we need to be able to query multiple regions of a
> chromosome (or gene, or other object types). All that really boils down to
> is being able to give a number, e.g. 10234, and return documents that have
> regions containing the number. So you'd have a document with a list like
> ["1:16090","400:8000","40123:43564"], and it should come back because
> 10234 falls within "1:16090". If there is a better or easier way to
> do this please speak up. I'd rather not have to use a "join" on another
> index, because 1) it's more complex to set up, and 2) we might need to
> join against something else and you can only do one join at a time.
> 
> Anyway… I tried creating a field type similar to a PointType just to see
> if I could get one working. I added the following jars to get it to
> compile:
> apache-solr-core-4.0.0,lucene-core-4.0.0,lucene-queries-4.0.0,apache-solr-solrj-4.0.0.
> I am running solr 4.0.0 on jetty, and put my jar file in a sharedLib
> folder, and specified it in my solr.xml (I have multiple cores).
> 
> After starting up solr, I got the line that it picked up the jar:
> INFO: Adding 'file:/blah/blah/lib/CustomPlugins.jar' to classloader
> 
> But I get this error about it not being able to find the
> AbstractSubTypeFieldType class.
> Here is the first bit of the trace:
> 
> SEVERE: null:java.lang.NoClassDefFoundError:
> org/apache/solr/schema/AbstractSubTypeFieldType
> at java.lang.ClassLoader.defineClass1(Native Method)
> at java.lang.ClassLoader.defineClass(ClassLoader.java:791)
> at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
> at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> ...etc…
> 
> 
> Any hints as to what I did wrong? I can provide source code, or a fuller
> stack trace, config settings, etc.
> 
> Also, I did try to unpack the solr.war, stick my jar in WEB-INF/lib, then
> repack. However, when I did that, I get a NoClassDefFoundError for my
> plugin itself.
> 
> 
> Thanks,
> Kevin
> 
> The information in this email, including attachments, may be confidential
> and is intended solely for the addressee(s). If you believe you received
> this email by mistake, please notify the sender by return email as soon as
> possible.





-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/custom-field-type-plugin-tp4079086p4079494.html
Sent from the Solr - User mailing list archive at Nabble.com.


Node down, but not out

2013-07-22 Thread jimtronic
I've run into a problem recently that's difficult to debug and search for:

I have three nodes in a cluster and this weekend one of the nodes went
partially down. It no longer responds to distributed updates and it is
marked as GONE in the Cloud view of the admin screen. That's not ideal, but
there's still two boxes up so not the end of the world.

The problem is that it is still responding to ping requests and returning
queries successfully. In my setup, I have the three servers on an haproxy
load balancer so that I can distribute requests and have clients stick to a
specific solr box. Because the bad node is still returning OK to the ping
requests and still returns results for simple queries, the load balancer
does not remove it from the group.

Is there a ping like request handler that would tell me whether the given
box I'm hitting is still "in the cloud"?

Thanks!
Jim Musil



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Node-down-but-not-out-tp4079495.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Regex in Stopword.xml

2013-07-22 Thread Scatman
I know it because I actually want to replace GSA with Solr, which is much
better in an enterprise situation :)

Thanks for the reply anyway!

Best,
Scatman.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Regex-in-Stopword-xml-tp4079412p4079491.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: short-circuit OR operator in lucene/solr

2013-07-22 Thread Roman Chyla
Deepak,

I think your goal is to gain something in speed, but most likely the
function query will be slower than the query without score computation (the
filter query) - this stems from how the query is executed, but I
may, of course, be wrong. Would you mind sharing the measurements you make?

Thanks,

  roman


On Mon, Jul 22, 2013 at 10:54 AM, Yonik Seeley  wrote:

> function queries to the rescue!
>
> q={!func}def(query($a),query($b),query($c))
> a=field1:value1
> b=field2:value2
> c=field3:value3
>
> "def" or default function returns the value of the first argument that
> matches.  It's named default because it's more commonly used like
> def(popularity,50)  (return the value of the popularity field, or 50
> if the doc has no value for that field).
>
> -Yonik
> http://lucidworks.com
>
>
> On Sun, Jul 21, 2013 at 8:48 PM, Deepak Konidena 
> wrote:
> > I understand that lucene's AND (&&), OR (||) and NOT (!) operators are
> > shorthands for REQUIRED, OPTIONAL and EXCLUDE respectively, which is why
> > one can't treat them as boolean operators (adhering to boolean algebra).
> >
> > I have been trying to construct a simple OR expression, as follows
> >
> > q = +(field1:value1 OR field2:value2)
> >
> > with a match on either field1 or field2. But since the OR is merely
> > optional, for documents where both field1:value1 and field2:value2 match,
> > the query returns a score reflecting a match on both clauses.
> >
> > How do I enforce short-circuiting in this context? In other words, how to
> > implement short-circuiting as in boolean algebra where an expression A
> || B
> > || C returns true if A is true without even looking into whether B or C
> > could be true.
> > -Deepak
>


Re: short-circuit OR operator in lucene/solr

2013-07-22 Thread Erick Erickson
Sweet!


On Mon, Jul 22, 2013 at 10:54 AM, Yonik Seeley  wrote:
> function queries to the rescue!
>
> q={!func}def(query($a),query($b),query($c))
> a=field1:value1
> b=field2:value2
> c=field3:value3
>
> "def" or default function returns the value of the first argument that
> matches.  It's named default because it's more commonly used like
> def(popularity,50)  (return the value of the popularity field, or 50
> if the doc has no value for that field).
>
> -Yonik
> http://lucidworks.com
>
>
> On Sun, Jul 21, 2013 at 8:48 PM, Deepak Konidena  wrote:
>> I understand that lucene's AND (&&), OR (||) and NOT (!) operators are
>> shorthands for REQUIRED, OPTIONAL and EXCLUDE respectively, which is why
>> one can't treat them as boolean operators (adhering to boolean algebra).
>>
>> I have been trying to construct a simple OR expression, as follows
>>
>> q = +(field1:value1 OR field2:value2)
>>
>> with a match on either field1 or field2. But since the OR is merely
>> optional, for documents where both field1:value1 and field2:value2 match,
>> the query returns a score reflecting a match on both clauses.
>>
>> How do I enforce short-circuiting in this context? In other words, how to
>> implement short-circuiting as in boolean algebra where an expression A || B
>> || C returns true if A is true without even looking into whether B or C
>> could be true.
>> -Deepak


Re: Solr 4.3.1 - SolrCloud nodes down and lost documents

2013-07-22 Thread Timothy Potter
A couple of things I've learned along the way ...

I had a similar architecture where we used fairly low numbers for
auto-commits with openSearcher=false. This keeps the tlog to a
reasonable size. You'll need something on the client side to send in
the hard commit request to open a new searcher every N docs or M
minutes.
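
In solrconfig.xml that looks roughly like this (values illustrative):

  <autoCommit>
    <maxDocs>25000</maxDocs>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>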

Be careful with raising the Zk timeout as that also determines how
quickly Zk can detect a node has crashed (afaik). In other words, it
takes the zk client timeout seconds for Zk to consider an ephemeral
znode as "gone", so I caution you against increasing this value too much.

The other thing to be aware of is this leaderVoteWait safety mechanism
... you might see log messages that look like:

2013-06-24 18:12:40,408 [coreLoadExecutor-4-thread-1] INFO
solr.cloud.ShardLeaderElectionContext  - Waiting until we see more
replicas up: total=2 found=1 timeoutin=139368

From Mark M: This is a safety mechanism - you can turn it off by
configuring leaderVoteWait to 0 in solr.xml. This is meant to protect
the case where you stop a shard or it fails and then the first node to
get started back up has stale data - you don't want it to just become
the leader. So we wait to see everyone we know about in the shard up
to 3 or 5 min by default. Then we know all the shards participate in
the leader election and the leader will end up with all updates it
should have. You can lower that wait or turn it off with 0.

NOTE: I tried setting it to 0 and my cluster went haywire, so consider
just lowering it but not making it zero ;-)
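
(For reference, in the 4.x-style solr.xml leaderVoteWait is an attribute on
the <cores> element, e.g. <cores ... leaderVoteWait="10000"> for ten
seconds -- value illustrative.)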

Max heap of 8GB seems overly large to me for 8M docs per shard esp.
since you're using MMapDirectory to cache the primary data structures
of your index in OS cache. I have run shards with 40M docs with 6GB
max heap and chose to have more aggressive cache eviction by using a
smallish LFU filter cache. This approach seems to spread the cost of
GC out over time vs. massive amounts of clean-up when a new searcher
is opened. With 8M docs, each cached filter will require about 1M of
memory, so it seems like you could run with a smaller heap. I'm not a
GC expert but found that having smaller heap and more aggressive cache
evictions reduced full GC's (and how long they run for) on my Solr
instances.
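
For reference, that smallish LFU filter cache is just a solrconfig.xml
setting along these lines (sizes illustrative):

  <filterCache class="solr.LFUCache" size="64" initialSize="64"
               autowarmCount="16"/>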

On Mon, Jul 22, 2013 at 8:09 AM, Shawn Heisey  wrote:
> On 7/22/2013 6:45 AM, Markus Jelsma wrote:
>> You should increase your ZK time out, this may be the issue in your case. 
>> You may also want to try the G1GC collector to keep STW under ZK time out.
>
> When I tried G1, the occasional stop-the-world GC actually got worse.  I
> tried G1 after trying CMS with no other tuning parameters.  The average
> GC time went down, but when it got into a place where it had to do a
> stop-the-world collection, it was worse.
>
> Based on the GC statistics in jvisualvm and jstat, I didn't think I had
> a problem.  The way I discovered that I had a problem was by looking at
> my haproxy load balancer -- sometimes requests would be sent to a backup
> server instead of my primary, because the ping request handler was
> timing out on the LB health check.  The LB was set to time out after
> five seconds.  When I went looking deeper with the GC log and some other
> tools, I was seeing 8-10 second GC pauses.  G1 was showing me pauses of
> 12 seconds.
>
> Now I use a heavily tuned CMS config, and there are no more LB switches
> to a backup server.  I've put some of my own information about my GC
> settings on my personal Solr wiki page:
>
> http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning
>
> I've got an 8GB heap on my systems running 3.5.0 (one copy of the index)
> and a 6GB heap on those running 4.2.1 (the other copy of the index).
>
> Summary: Just switching to the G1 collector won't solve GC pause
> problems.  There's not a lot of G1 tuning information out there yet.  If
> someone can come up with a good set of G1 tuning parameters, G1 might
> become better than CMS.
>
> Thanks,
> Shawn
>


RE: Problem instantiating a ValueSourceParser plugin in 4.3.1

2013-07-22 Thread Abeygunawardena, Niran
Hi,

Upgrading to Solr 4.2.1 works for my plugin but 4.3.1 does not work. I believe 
the ClassCastException which I am getting in 4.3.1 is due to this bug in 4.3.1:
https://issues.apache.org/jira/browse/SOLR-4791

Thanks,
Niran

-Original Message-
From: Abeygunawardena, Niran [mailto:niran.abeygunaward...@proquest.co.uk] 
Sent: 22 July 2013 16:01
To: solr-user@lucene.apache.org
Subject: RE: Problem instantiating a ValueSourceParser plugin in 4.3.1

Thanks Tim. 

I copied my jar containing the plugin to Solr's lib directory, as it wasn't 
finding my jar due to a bug in 4.3:
https://issues.apache.org/jira/browse/SOLR-4791
but the ClassCastException remains. I'll try solr 4.2 and see if the plugin 
works in that.

Cheers,
Niran

 
-Original Message-
From: Timothy Potter [mailto:thelabd...@gmail.com]
Sent: 22 July 2013 15:39
To: solr-user@lucene.apache.org
Subject: Re: Problem instantiating a ValueSourceParser plugin in 4.3.1

I saw something similar and used an absolute path to my JAR file in 
solrconfig.xml vs. a relative path and it resolved the issue for me.
Not elegant but worth trying, at least to rule that out.


Tim

On Mon, Jul 22, 2013 at 7:51 AM, Abeygunawardena, Niran 
 wrote:
> Hi,
>
> I'm trying to migrate to Solr 4.3.1 from Solr 4.0.0. I have a Solr Plugin 
> which extends ValueSourceParser and it works under Solr 4.0.0 but it does not 
> work under Solr 4.3.1. I compiled the plugin using the solr-4.3.1*.jars and 
> lucene-4.3.1*.jars but I get the following stacktrace error when starting up 
> a core referencing this plugin...seen below. Does anyone know why it might be 
> giving me a ClassCastException under 4.3.1?
>
> Thanks,
> Niran
>
> 2458 [coreLoadExecutor-3-thread-2] ERROR org.apache.solr.core.CoreContainer   
> Unable to create core: example_core
> org.apache.solr.common.SolrException: Error Instantiating 
> ValueSourceParser, com.example.HitsValueSourceParser failed to instantiate
> org.apache.solr.search.ValueSourceParser
> at org.apache.solr.core.SolrCore.(SolrCore.java:821)
> at org.apache.solr.core.SolrCore.(SolrCore.java:618)
> at 
> org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:949)
> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984)
> at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
> at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
> at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
> at java.util.concurrent.FutureTask.run(Unknown Source)
> at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
> at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
> at java.util.concurrent.FutureTask.run(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown 
> Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source) Caused by: 
> org.apache.solr.common.SolrException: Error Instantiating 
> ValueSourceParser, com.example.HitsValueSourceParser failed to instantiate 
> org.apache.solr.search.ValueSourceParser
> at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:539)
> at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:575)
> at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2088)
> at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2082)
> at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2115)
> at 
> org.apache.solr.core.SolrCore.initValueSourceParsers(SolrCore.java:2027)
> at org.apache.solr.core.SolrCore.(SolrCore.java:749)
> ... 13 more
> Caused by: java.lang.ClassCastException: class 
> com.example.HitsValueSourceParser
> at java.lang.Class.asSubclass(Unknown Source)
> at 
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:448)
> at 
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:396)
> at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:518)
> ... 19 more
> 2466 [coreLoadExecutor-3-thread-2] ERROR org.apache.solr.core.CoreContainer   
> null:org.apache.solr.common.SolrException: Unable to create core: example_core
> at 
> org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1450)
> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:993)
> at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
> at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
> at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
> at java.util.concurrent.FutureTask.run(Unknown Source)
> at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
> at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
> 

Re: deserializing highlighting json result

2013-07-22 Thread Jack Krupansky

Exactly why is it difficult to deserialize? Seems simple enough.
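
For example, bind the unknown GUID keys as map keys rather than property
names. A minimal sketch using Jackson in Java (the same shape works in C#
with nested Dictionary types and Json.NET):

  import java.util.List;
  import java.util.Map;
  import com.fasterxml.jackson.core.type.TypeReference;
  import com.fasterxml.jackson.databind.ObjectMapper;

  public class HighlightingDemo {
      public static void main(String[] args) throws Exception {
          String json = "{\"highlighting\":{"
              + "\"394c65f1-dfb1-4b76-9b6c-2f14c9682cc9\":"
              + "{\"PackageName\":[\"- Testing channel twenty.\"]}}}";
          // Three nested maps: section -> document id -> field -> snippets.
          Map<String, Map<String, Map<String, List<String>>>> root =
              new ObjectMapper().readValue(json,
                  new TypeReference<Map<String, Map<String, Map<String, List<String>>>>>() {});
          for (Map.Entry<String, Map<String, List<String>>> doc :
                  root.get("highlighting").entrySet()) {
              System.out.println(doc.getKey() + " -> "
                  + doc.getValue().get("PackageName"));
          }
      }
  }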

-- Jack Krupansky

-Original Message- 
From: Mysurf Mail 
Sent: Monday, July 22, 2013 11:14 AM 
To: solr-user@lucene.apache.org 
Subject: deserializing highlighting json result 


When I request a JSON result I get the following structure in the
highlighting

{"highlighting":{
  "394c65f1-dfb1-4b76-9b6c-2f14c9682cc9":{
 "PackageName":["- Testing channel twenty."]},
  "baf8434a-99a4-4046-8a4d-2f7ec09eafc8":{
 "PackageName":["- Testing channel twenty."]},
  "0a699062-cd09-4b2e-a817-330193a352c1":{
"PackageName":["- Testing channel twenty."]},
  "0b9ec891-5ef8-4085-9de2-38bfa9ea327e":{
"PackageName":["- Testing channel twenty."]}}}


It is difficult to deserialize this json because the guid is in the
attribute name.
Is that solveable (using c#)?


Re: Node down, but not out

2013-07-22 Thread Timothy Potter
Why was it down? e.g. did it OOM? If so, the recommended approach is to
kill the process on OOM rather than leaving it in the cluster in a zombie
state. I had similar issues when my nodes OOM'd, which is why I ask. That
said, you can get /clusterstate.json, which contains ZK's status of
a node, using a request like:
http://localhost:8983/solr/zookeeper?detail=true&path=%2Fclusterstate.json
Although that would require some basic JSON processing to dig into the
response to get the status of the node of interest, so you may want to
implement a custom request handler.
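
An untested sketch of such a handler against the 4.x APIs (the class name
and details are mine, so treat it only as a starting point):

  package com.example;

  import org.apache.solr.cloud.ZkController;
  import org.apache.solr.common.SolrException;
  import org.apache.solr.core.CoreContainer;
  import org.apache.solr.handler.RequestHandlerBase;
  import org.apache.solr.request.SolrQueryRequest;
  import org.apache.solr.response.SolrQueryResponse;

  public class CloudPingHandler extends RequestHandlerBase {
      @Override
      public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp)
              throws Exception {
          CoreContainer cc = req.getCore().getCoreDescriptor().getCoreContainer();
          ZkController zk = cc.getZkController(); // null when not in cloud mode
          boolean live = zk != null
              && zk.getZkStateReader().getClusterState()
                   .liveNodesContain(zk.getNodeName());
          rsp.add("live", live);
          if (!live) {
              // Non-200 response so a simple LB health check drops the node.
              throw new SolrException(SolrException.ErrorCode.SERVICE_UNAVAILABLE,
                  "node is not live in ZooKeeper");
          }
      }

      @Override
      public String getDescription() { return "cloud-aware ping"; }

      @Override
      public String getSource() { return "$URL$"; }
  }

Register it in solrconfig.xml like any other handler and point the load
balancer's health check at it instead of /admin/ping.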

On Mon, Jul 22, 2013 at 9:55 AM, jimtronic  wrote:
> I've run into a problem recently that's difficult to debug and search for:
>
> I have three nodes in a cluster and this weekend one of the nodes went
> partially down. It no longer responds to distributed updates and it is
> marked as GONE in the Cloud view of the admin screen. That's not ideal, but
> there's still two boxes up so not the end of the world.
>
> The problem is that it is still responding to ping requests and returning
> queries successfully. In my setup, I have the three servers on an haproxy
> load balancer so that I can distribute requests and have clients stick to a
> specific solr box. Because the bad node is still returning OK to the ping
> requests and still returns results for simple queries, the load balancer
> does not remove it from the group.
>
> Is there a ping like request handler that would tell me whether the given
> box I'm hitting is still "in the cloud"?
>
> Thanks!
> Jim Musil
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Node-down-but-not-out-tp4079495.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: queryResultCache should not be related to the order of the fq list

2013-07-22 Thread Chris Hostetter

: By the way, if the issue is OK, how can I post my code? 

Take a look at this wiki page for information on submitting patches...

https://wiki.apache.org/solr/HowToContribute
https://wiki.apache.org/solr/HowToContribute#Generating_a_patch

...you can attach your patch directly to the Jira issue you created...

https://wiki.apache.org/solr/HowToContribute#Contributing_your_work


-Hoss


Re: how to improve (keyword) relevance?

2013-07-22 Thread eShard
Sure, let's say the user types in test pdf;
we need the results with all the query words to be near the top of the
result set.
the query will look like this: /select?q=text%3Atest+pdf&wt=xml

How do I ensure that the top result set contains all of the query words?
How can I boost the first (or second) term when they are both in the same
field (i.e. text)?

Does this make sense?

Please bear with me; I'm still new to the solr query syntax so I don't even
know if I'm asking the right question. 

Thanks,



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-improve-keyword-relevance-tp4079462p4079502.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Node down, but not out

2013-07-22 Thread jimtronic
I'm not sure why it went down exactly -- I restarted the process and lost the
logs. (d'oh!) 

An OOM seems likely, however. Is there a setting for killing the process
when Solr encounters an OOM?

Thanks!

Jim



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Node-down-but-not-out-tp4079495p4079507.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Auto-sharding and numShard parameter

2013-07-22 Thread Mark Miller
There is a reason of course, or else it wouldn't be like that.

We addressed it recently.

https://issues.apache.org/jira/browse/SOLR-3633
https://issues.apache.org/jira/browse/SOLR-3677
https://issues.apache.org/jira/browse/SOLR-4943

- Mark

On Jul 22, 2013, at 10:57 AM, Michael Della Bitta 
 wrote:

> That would be great.
> 
> One step toward this goal is to stop treating the situation where there are
> no collections or cores as an error condition. It took me a while to get
> out of the mindset when bringing up a Solr install that I had to avoid that
> scenario at all costs, because red text == bad.
> 
> There's no reason for the web interface to be deactivated when there are no
> collections or cores, though. Imagine if mysql didn't let you connect to it
> via phpmyadmin if you hadn't configured a database yet?
> 
> 
> Michael Della Bitta
> 
> Applications Developer
> 
> o: +1 646 532 3062  | c: +1 917 477 7906
> 
> appinions inc.
> 
> “The Science of Influence Marketing”
> 
> 18 East 41st Street
> 
> New York, NY 10017
> 
> t: @appinions  | g+:
> plus.google.com/appinions
> w: appinions.com 
> 
> 
> On Sat, Jul 20, 2013 at 10:33 PM, Mark Miller  wrote:
> 
>> A lot has changed since those examples were written - in general, we are
>> moving away from that type of collection initialization and towards using
>> the Collections API. Eventually, I'd personally like SolrCloud to ship with
>> no predefined collections and have users simply start it and then start
>> using the Collections API - preconfigured collections will be second class
>> and possibly deprecated at some point.
>> 
>> - Mark
>> 
>> On Jul 20, 2013, at 10:13 PM, Erick Erickson 
>> wrote:
>> 
>>> Flavio:
>>> 
>>> One of the great things about having people continually using Solr
>>> (and SolrCloud) for the first time is the opportunity to improve the
>>> docs. Anyone can update/add to the docs, all it takes is a signon.
>>> Unfortunately we had a bunch of spam bots a while ago, so it's now a
>>> two step process
>>> 1> create a login on the Solr wiki
>>> 2> post a message on this list indicating that you'd like to help
>>> improve the Wiki and give us your Solr login. We'll add you to the
>>> list of people who can edit the wiki and you can help the community by
>>> improving the documentation.
>>> 
>>> Best
>>> Erick
>>> 
>>> On Fri, Jul 19, 2013 at 8:46 AM, Flavio Pompermaier
>>>  wrote:
 Thank you for the reply Erick,
 I was facing exactly that problem... from the documentation it seems
 that those parameters are required to run SolrCloud,
 instead they are just used to initialize a sample collection..
 I think that in the examples on the user doc it should be better to
 separate those 2 concepts: one is starting the server,
 another one is creating/managing collections.
 
 Best,
 Flavio
 
 
 On Fri, Jul 19, 2013 at 2:13 PM, Erick Erickson <
>> erickerick...@gmail.com>wrote:
 
> First the numShards parameter is only relevant the very first time you
> create your collection. It's a little confusing because in the
>> SolrCloud
> examples you're getting "collection1" by default. Look further down the
> SolrCloud Wiki page, the section titled
> "Managing Collections via the Collections API" for creating collections
> with a different name.
> 
> Either way, either when you run the bootstrap command or when you
> create a new collection, that's the only time numShards counts. It's
> ignored the rest of the time.
> 
> As far as data growing, you need to either
> 1> create enough shards to handle the eventual size things will be,
> sometimes called "oversharding"
> or
> 2> use the splitShard capabilities in very recent Solrs to expand
> capacity.
> 
> Best
> Erick
> 
> On Thu, Jul 18, 2013 at 4:52 PM, Flavio Pompermaier
>  wrote:
>> Hi to all,
>> Probably this question has a simple answer but I just want to be sure
>> of
>> the potential drawbacks..when I run SolrCloud I run the main solr
> instance
>> with the -numShard option (e.g. 2).
>> Then as data grows, shards could potentially become a huge number. If
>> I
>> had to restart all nodes and I re-run the master with the
>> numShard=2,
>> what will happen? It will be just ignored or Solr will try to reduce
>> shards...?
>> 
>> Another question...in SolrCloud, how do I restart all the cloud at
>> once?
> Is
>> it possible?
>> 
>> Best,
>> Flavio
> 
>> 
>> 



deserializing highlighting json result

2013-07-22 Thread Mysurf Mail
When I request a JSON result I get the following structure in the
highlighting

{"highlighting":{
   "394c65f1-dfb1-4b76-9b6c-2f14c9682cc9":{
  "PackageName":["- Testing channel twenty."]},
   "baf8434a-99a4-4046-8a4d-2f7ec09eafc8":{
  "PackageName":["- Testing channel twenty."]},
   "0a699062-cd09-4b2e-a817-330193a352c1":{
 "PackageName":["- Testing channel twenty."]},
   "0b9ec891-5ef8-4085-9de2-38bfa9ea327e":{
 "PackageName":["- Testing channel twenty."]}}}


It is difficult to deserialize this json because the guid is in the
attribute name.
Is that solveable (using c#)?


Re: adding date column to the index

2013-07-22 Thread Gora Mohanty
On 22 July 2013 20:01, Mysurf Mail  wrote:
>
> I have added a date field to my index.
> I don't want the query to search on this field, but I want it to be
> returned
> with each row.
> So I have defined it in the schema.xml as follows:
>
>   <field ... stored="true" required="true"/>
>
> I added it to the select in data-config.xml and I see it selected in the
> profiler.
> Now, when I query all fields (using the dashboard) I don't see it.
> Even when I ask for it specifically I don't see it.
> What am I doing wrong?
>
> (In the db it is (datetimeoffset(7)))

Did you restart your Java container, and reindex?

Regards,
Gora


Re: XInclude and Document Entity not working on schema.xml

2013-07-22 Thread Chris Hostetter
: to use "Document Entity" in schema.xml, I get this exception :
: java.lang.RuntimeException: schema fieldtype
: string(org.apache.solr.schema.StrField) invalid
: arguments:{xml:base=solrres:/commonschema_types.xml}

Elodie can you please open a bug in jira for this with your specific 
example?  please note in the Jira your comment that it works in Solr 4.2.1 
but fails in later versions (if you could test with 4.3 and the newly 
voted 4.4 that would be helpful.)

: The same error appears in this bug (fixed ?):
: https://issues.apache.org/jira/browse/SOLR-3087

That issue was specific to xinclude, not document entities, so it's 
possible the fix applied there did not affect/fix document entities -- but 
the fact that you see document entity includes of 
fieldTypes working in 4.2.1 suggests that it might be a slightly diff 
problem, otherwise i would expect to see it fail as far back as 4.0 just 
like SOLR-3087...

: I also tried to use the XML XInclude mechanism
: (http://en.wikipedia.org/wiki/XInclude) to include parts of schema.xml.
: 
: When I try to include a fieldType, I get this exception :
: org.apache.solr.common.SolrException: Unknown fieldType 'long' specified

...the issue you linked to before (SOLR-3087) included a specific test to 
ensure that fieldTypes could be included like this, and that test works -- 
so perhaps in your testing you have some other subtle bug?  what are the 
absolute paths of the various files you are trying to include in one 
another?


-Hoss


Re: Programatic instantiation of solr container and cores with config loaded from a jar

2013-07-22 Thread Alan Woodward
Hi Alex,

I'm not sure I follow - are you trying to create a ConfigSolr object from data 
read in from elsewhere, or trying to export the ConfigSolr object to another 
process?  If you're dealing with solr core java objects, you'll need the solr 
jar and all its dependencies (including solrj).

Alan Woodward
www.flax.co.uk


On 22 Jul 2013, at 15:53, Alexandre Rafalovitch wrote:

> Does it mean that I can easily load Solr configuration as parsed by Solr
> from an external program?
> 
> Because the last time I tried (4.3.1), the number of jars required was
> quite long, including SolrJ jar due to some exception.
> 
> Regards.,
>   Alex
> 
> Personal website: http://www.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
> 
> 
> On Mon, Jul 22, 2013 at 7:32 AM, Alan Woodward  wrote:
> 
>> Hi Robert,
>> 
>> The upcoming 4.4 release should make this a bit easier (you can check out
>> the release branch now if you like, or wait a few days for the official
>> version).  CoreContainer now takes a SolrResourceLoader and a ConfigSolr
>> object as constructor parameters, and you can create a ConfigSolr object
>> from a string representation of solr.xml using the ConfigSolr.fromString()
>> static method.
>> 
>> Alan Woodward
>> www.flax.co.uk
>> 
>> 
>> On 22 Jul 2013, at 11:41, Robert Krüger wrote:
>> 
>>> Hi,
>>> 
>>> I use solr embedded in a desktop app and I want to change it to no
>>> longer require the configuration for the container and core to be in
>>> the filesystem but rather be distributed as part of a jar file.
>>> 
>>> Could someone kindly point me to the right docs?
>>> 
>>> So far my impression is, I need to instantiate CoreContainer with a
>>> custom SolrResourceLoader with properties parsed via some other API
>>> but from the javadocs alone I feel a bit lost (why does it have to
>>> have an instance directory at all?) and googling did not give me many
>>> results. What would be ideal would be to have something like this
>>> (pseudocode with partly imagined names, which hopefully illustrates
>>> what I am trying to achieve):
>>> 
>>> ContainerConfig containerConfig =
>>> ContainerConfigParser.parse(<stream from Classloader>);
>>> CoreContainer container = new CoreContainer(containerConfig);
>>> 
>>> CoreConfig coreConfig = CoreConfigParser.parse(container, <stream from Classloader>);
>>> container.register(<name>, coreConfig);
>>> 
>>> Ideally I would like to keep XML format to reuse my current solr.xml
>>> and solrconfig.xml but that is just a nice-to-have.
>>> 
>>> Does such a way exist and if so, what are the real API classes and calls
>> to use?
>>> 
>>> Thank you in advance,
>>> 
>>> Robert
>> 
>> 



Re: Node down, but not out

2013-07-22 Thread Timothy Potter
There is, but I couldn't get it to work in my environment on Jetty; see:

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201306.mbox/%3CCAJt9Wnib+p_woYODtrSPhF==v8Vx==mDBd_qH=x_knbw-bn...@mail.gmail.com%3E

Let me know if you have any better luck. I had to resort to something
hacky, but I was out of time to devote to such unproductive
endeavors ;-)
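
For the record, the hook in question is the standard HotSpot flag

  -XX:OnOutOfMemoryError="kill -9 %p"

and getting it passed intact through the servlet container's startup
scripts is where it tends to go wrong.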

On Mon, Jul 22, 2013 at 10:49 AM, jimtronic  wrote:
> I'm not sure why it went down exactly -- I restarted the process and lost the
> logs. (d'oh!)
>
> An OOM seems likely, however. Is there a setting for killing the processes
> when solr encounters an OOM?
>
> Thanks!
>
> Jim
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Node-down-but-not-out-tp4079495p4079507.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: XInclude and Document Entity not working on schema.xml

2013-07-22 Thread Chris Hostetter

: Elodie can you please open a bug in jira for this with your specific 
...
: ...the issue you linked to before (SOLR-3087) included a specific test to 
: ensure that fieldTypes could be included like this, and that test works -- 
: so perhaps in your testing you have some other subtle bug?  what are the 
: absolute paths of the various files you are trying to include in one 
: another?

Hmm... actually, i had some time while i was on a conf call, so i just 
updated the test to also test entity includes, and i wasn't able to 
reproduce either of the problems you described.

can you please take a look at this test, and the configs it uses, and 
compare with how you are trying to do things...

http://svn.apache.org/r1505749

http://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/solr/core/src/test/org/apache/solr/core/TestXIncludeConfig.java?view=markup
http://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/solr/core/src/test-files/solr/collection1/conf/schema-xinclude.xml?view=markup
http://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/solr/core/src/test-files/solr/collection1/conf/schema-snippet-types.incl?view=markup
http://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/solr/core/src/test-files/solr/collection1/conf/schema-snippet-type.xml?view=markup



-Hoss


Re: how to improve (keyword) relevance?

2013-07-22 Thread Jack Krupansky
Again, you haven't indicated what the problem is. I mean, have you actually 
confirmed that a problem exists? Add debugQuery=true to your query and 
examine the "explain" section if you believe that Solr has improperly 
computed any document scores.


If you simply want to boost a term in a query, use the "^" operator, which 
applies to the preceding term. A boost of 1.0 means no change, 2.0 means 
double, and 0.5 means cut in half.


But, you don't need to boost. Relevancy is based on the data in the 
documents themselves.


BTW, q=text%3Atest+pdf does not search for "pdf" in the "text" field - 
field qualification only applies to a single term, but you can use 
parentheses: q=text%3A(test+pdf)
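
For example (using the standard lucene query parser):

  q=text:(test AND pdf)    -- both words required
  q=text:(+test +pdf)      -- the same thing in prefix form
  q=text:(test^2.0 pdf)    -- both optional, with "test" weighted double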


-- Jack Krupansky

-Original Message- 
From: eShard

Sent: Monday, July 22, 2013 12:34 PM
To: solr-user@lucene.apache.org
Subject: Re: how to improve (keyword) relevance?

Sure, let's say the user types in test pdf;
we need the results with all the query words to be near the top of the
result set.
the query will look like this: /select?q=text%3Atest+pdf&wt=xml

How do I ensure that the top resultset contains all of the query words?
How can I boost the first (or second) term when they are both the same field
(i.e. text)?

Does this make sense?

Please bear with me; I'm still new to the solr query syntax so I don't even
know if I'm asking the right question.

Thanks,



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-improve-keyword-relevance-tp4079462p4079502.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Programatic instantiation of solr container and cores with config loaded from a jar

2013-07-22 Thread Alexandre Rafalovitch
I am trying to read Solr config files from outside a running Solr
instance. It's one of the approaches for SolrLint (
https://github.com/arafalov/SolrLint ). I kind of expected to just need
core Solr classes for that, but I needed SolrJ, the Lucene analyzers jar,
and a bunch of other jars.

The goal was to avoid recreating valid/invalid parsing of config files and
just use Solr's definition.

Anyway, I don't want to hijack the thread. In the end, I think Solr's parse
mechanism is probably not the best match for me, as I explicitly want to
detect things like field definitions in the wrong place or incorrect
spellings, and the current parser just ignores those because it only pulls
out selected nodes via XPath queries.

Regards,
   Alex.



Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Mon, Jul 22, 2013 at 1:16 PM, Alan Woodward  wrote:

> Hi Alex,
>
> I'm not sure I follow - are you trying to create a ConfigSolr object from
> data read in from elsewhere, or trying to export the ConfigSolr object to
> another process?  If you're dealing with solr core java objects, you'll
> need the solr jar and all its dependencies (including solrj).
>
> Alan Woodward
> www.flax.co.uk
>
>
> On 22 Jul 2013, at 15:53, Alexandre Rafalovitch wrote:
>
> > Does it mean that I can easily load Solr configuration as parsed by Solr
> > from an external program?
> >
> > Because the last time I tried (4.3.1), the list of jars required was
> > quite long, including the SolrJ jar due to some exception.
> >
> > Regards,
> >   Alex
> >
> > Personal website: http://www.outerthoughts.com/
> > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> > - Time is the quality of nature that keeps events from happening all at
> > once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
> >
> >
> > On Mon, Jul 22, 2013 at 7:32 AM, Alan Woodward  wrote:
> >
> >> Hi Robert,
> >>
> >> The upcoming 4.4 release should make this a bit easier (you can check
> out
> >> the release branch now if you like, or wait a few days for the official
> >> version).  CoreContainer now takes a SolrResourceLoader and a ConfigSolr
> >> object as constructor parameters, and you can create a ConfigSolr object
> >> from a string representation of solr.xml using the
> ConfigSolr.fromString()
> >> static method.
> >>
> >> Alan Woodward
> >> www.flax.co.uk
> >>
> >>
> >> On 22 Jul 2013, at 11:41, Robert Krüger wrote:
> >>
> >>> Hi,
> >>>
> >>> I use solr embedded in a desktop app and I want to change it to no
> >>> longer require the configuration for the container and core to be in
> >>> the filesystem but rather be distributed as part of a jar file.
> >>>
> >>> Could someone kindly point me to the right docs?
> >>>
> >>> So far my impression is, I need to instantiate CoreContainer with a
> >>> custom SolrResourceLoader with properties parsed via some other API
> >>> but from the javadocs alone I feel a bit lost (why does it have to
> >>> have an instance directory at all?) and googling did not give me many
> >>> results. What would be ideal would be to have something like this
> >>> (pseudocode with partly imagined names, which hopefully illustrates
> >>> what I am trying to achieve):
> >>>
> >>> ContainerConfig containerConfig =
> >>> ContainerConfigParser.parse(<container config from classloader>);
> >>> CoreContainer  container = new CoreContainer(containerConfig);
> >>>
> >>> CoreConfig coreConfig = CoreConfigParser.parse(container, <core config
> >>> from classloader>);
> >>> container.register(<core name>, coreConfig);
> >>>
> >>> Ideally I would like to keep XML format to reuse my current solr.xml
> >>> and solrconfig.xml but that is just a nice-to-have.
> >>>
> >>> Does such a way exist and if so, what are the real API classes and
> calls
> >> to use?
> >>>
> >>> Thank you in advance,
> >>>
> >>> Robert
> >>
> >>
>
>
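
Putting Alan's description together, a minimal sketch of the 4.4-style
startup (exact signatures may differ slightly in the released version; the
resource name and the MyApp class are hypothetical):

    import org.apache.solr.core.ConfigSolr;
    import org.apache.solr.core.CoreContainer;
    import org.apache.solr.core.SolrResourceLoader;

    // Read solr.xml out of the application jar via the classloader.
    String solrXml = new java.util.Scanner(
        MyApp.class.getResourceAsStream("/solr.xml"), "UTF-8")
        .useDelimiter("\\A").next();

    SolrResourceLoader loader = new SolrResourceLoader("solr-home"); // an instance dir is still required
    ConfigSolr config = ConfigSolr.fromString(solrXml);              // the static factory Alan mentions
    CoreContainer container = new CoreContainer(loader, config);
    container.load();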


Re: Solr 4.3.1 - SolrCloud nodes down and lost documents

2013-07-22 Thread Lance Norskog

Are you feeding Graphite from Solr? If so, how?

On 07/19/2013 01:02 AM, Neil Prosser wrote:

That was overnight so I was unable to track exactly what happened (I'm
going off our Graphite graphs here).




Re: adding date column to the index

2013-07-22 Thread Lance Norskog
Solr/Lucene does not automatically backfill a new field for existing
documents, the way DBMS systems do when you add a column. Instead, all of a
document's field data is written at index time. To get the new field
populated, you have to reload all of your data.


This is also true for deleting fields. If you remove a field, that data 
does not go away until you re-index.
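
As a minimal SolrJ sketch of such a full reload (the server handle and the
loadAllDocsFromSource() helper are hypothetical):

    server.deleteByQuery("*:*");   // wipe the old documents
    for (SolrInputDocument doc : loadAllDocsFromSource()) {
        server.add(doc);           // re-add each document, now including the new field
    }
    server.commit();

With a DataImportHandler setup like the one described below, the equivalent
is simply running a full-import.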


On 07/22/2013 07:31 AM, Mysurf Mail wrote:

I have added a date field to my index.
I don't want the query to search on this field, but I want it to be returned
with each row.
So I have defined it in the schema.xml as follows:
   <field ... stored="true" required="true"/>



I added it to the select in data-config.xml and I see it selected in the
profiler.
Now, when I query all fields (using the dashboard) I don't see it.
Even when I ask for it specifically I don't see it.
What am I doing wrong?

(In the db it is (datetimeoffset(7)))





IllegalStateException

2013-07-22 Thread Michael Long
I'm seeing random crashes in Solr 4.0 but I don't have anything to go on 
other than "IllegalStateException". Other than checking for a corrupt 
index and out-of-memory errors, what other things should I check?



org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet default threw exception
java.lang.IllegalStateException
at 
org.apache.catalina.connector.ResponseFacade.sendError(ResponseFacade.java:407)
at 
org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:483)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:297)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)

at java.lang.Thread.run(Thread.java:662)



Re: Performance of cross join vs block join

2013-07-22 Thread Roman Chyla
Hello Mikhail,

PS: sending to solr-user as well; I've realized I was writing just to
you, sorry...

On Mon, Jul 22, 2013 at 3:07 AM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> Hello Roman,
>
> Please get me right. I have no idea what happened with that dependency.
> There are recent patches from Yonik; they should be more current, and I
> think he can help you with particular issues. From common (captain's)
> sense I propose specifying a closer version of Jetty; I don't think there
> is much reason to rely on that particular one.
>
> I'm thinking about your problem from time to time. You are right, it's
> definitely not a case for block join. I am still trying to figure out how
> to make it computationally easier. As far as I understand, you have a
> recursive many-to-many relationship and need to traverse it during the search.
>
> doc(id, author, text, references:[docid,] )
>
> I'm not sure it's possible with Lucene now, but if it is, what do you think
> about writing a DocValues stripe that contains internal Lucene docnums
> instead of external docIds? It moves a few steps from query time to index
> time, hence can gain some performance.
>

Our use case of many-to-many relations is probably a weird one and we ought
to de-normalize the values. What I do (building a citation network in
memory, using Lucene caches) is just a work-around that happens to
out-perform the index seeking, no surprise there, but at the expense of
memory. I am aware the de-normalization may be necessary, and the DocValues
approach would probably be a step toward it - the joins give great
flexibility, which is really cool, but that comes with its own price...


>
> Also, I noticed you hesitate regarding cross-segment join. You
> actually shouldn't, for the following reasons:
>  - Join is Solr code (which is a top-reader beast);
>  - it obtains and works with SolrIndexSearcher, which is a top reader...
>  - join happens at Weight without any awareness of leaf segments.
>
> https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/search/JoinQParserPlugin.java#L272
>

Thanks. I think I have not used it (I believe) because there was very little
chance it could have been fast enough. It reads terms/joins for docs
that match the query, so in that sense it is not different from
pre-computing the citation cache - but it happens for every query/request,
and so for 0.5M edges it must take some time. But I guess I should
measure it. I haven't made notes, so now I am having a hard time backtracking
:)

roman


> It seems to me cross segment join works well.
>
>
>
> On Mon, Jul 22, 2013 at 3:08 AM, Roman Chyla wrote:
>
>> ah, in case you know the solution, here ant output:
>>
>> resolve:
>> [ivy:retrieve]
>> [ivy:retrieve] :: problems summary ::
>> [ivy:retrieve]  WARNINGS
>> [ivy:retrieve] module not found:
>> org.eclipse.jetty#jetty-deploy;8.1.10.v20130312
>> [ivy:retrieve]  local: tried
>> [ivy:retrieve]  
>> /home/rchyla/.ivy2/local/org.eclipse.jetty/jetty-deploy/8.1.10.v20130312/ivys/ivy.xml
>> [ivy:retrieve]   -- artifact
>> org.eclipse.jetty#jetty-deploy;8.1.10.v20130312!jetty-deploy.jar:
>> [ivy:retrieve]  
>> /home/rchyla/.ivy2/local/org.eclipse.jetty/jetty-deploy/8.1.10.v20130312/jars/jetty-deploy.jar
>> [ivy:retrieve]  shared: tried
>> [ivy:retrieve]  
>> /home/rchyla/.ivy2/shared/org.eclipse.jetty/jetty-deploy/8.1.10.v20130312/ivys/ivy.xml
>> [ivy:retrieve]   -- artifact
>> org.eclipse.jetty#jetty-deploy;8.1.10.v20130312!jetty-deploy.jar:
>> [ivy:retrieve]  
>> /home/rchyla/.ivy2/shared/org.eclipse.jetty/jetty-deploy/8.1.10.v20130312/jars/jetty-deploy.jar
>> [ivy:retrieve]  public: tried
>> [ivy:retrieve]
>> http://repo1.maven.org/maven2/org/eclipse/jetty/jetty-deploy/8.1.10.v20130312/jetty-deploy-8.1.10.v20130312.pom
>> [ivy:retrieve]  sonatype-releases: tried
>> [ivy:retrieve]
>> http://oss.sonatype.org/content/repositories/releases/org/eclipse/jetty/jetty-deploy/8.1.10.v20130312/jetty-deploy-8.1.10.v20130312.pom
>> [ivy:retrieve]   -- artifact
>> org.eclipse.jetty#jetty-deploy;8.1.10.v20130312!jetty-deploy.jar:
>> [ivy:retrieve]
>> http://oss.sonatype.org/content/repositories/releases/org/eclipse/jetty/jetty-deploy/8.1.10.v20130312/jetty-deploy-8.1.10.v20130312.jar
>> [ivy:retrieve]  maven.restlet.org: tried
>> [ivy:retrieve]
>> http://maven.restlet.org/org/eclipse/jetty/jetty-deploy/8.1.10.v20130312/jetty-deploy-8.1.10.v20130312.pom
>> [ivy:retrieve]   -- artifact
>> org.eclipse.jetty#jetty-deploy;8.1.10.v20130312!jetty-deploy.jar:
>> [ivy:retrieve]
>> http://maven.restlet.org/org/eclipse/jetty/jetty-deploy/8.1.10.v20130312/jetty-deploy-8.1.10.v20130312.jar
>> [ivy:retrieve]  working-chinese-mirror: tried
>> [ivy:retrieve]
>> http://mirror.netcologne.de/maven2/org/eclipse/jetty/jetty-deploy/8.1.10.v20130312/jetty-deploy-8.1.10.v20130312.pom
>> [ivy:retrieve]   -- artifact
>> org.eclipse.jetty#jetty-deploy;8.1.10.v20130312!jetty-deploy.jar:
>> [ivy:retrieve]
>> http://m

how number of indexed fields effect performance

2013-07-22 Thread Suryansh Purwar
Hi,

We have a two-shard SolrCloud cluster, with each shard allocated 3 separate
machines. We do complex queries involving a number of filter queries
coupled with group queries and faceting. All of our machines are 64-bit
with 32 GB RAM. Our index size is around 10 GB, with around 800,000
documents. We have around 1000 indexed fields per document. 6 GB of memory
is allocated to Tomcat, under which Solr is running, on each of the six
machines. We have a ZooKeeper ensemble consisting of 3 ZooKeeper instances
running on 3 of the six machines, with 4 GB of memory allocated to each
instance. First Solr starts taking too much time, with "Broken pipe
exception because of timeout from client side" coming again and again; then
after some time a whole shard goes down, one machine at a time, followed by
the other machines. Is having 1000 fields indexed on each document causing
this problem? If so, what would be the ideal number of indexed fields in
such an environment?

Regards,
Suryansh


Bug with Group.Limit and Group.Main in Distributed Case

2013-07-22 Thread Monica Skidmore
We are using grouping in a distributed environment, and we have noticed a 
discrepancy:



On a single core with a group.limit > 1 and group.main=true, setting rows=10 
will return 10 documents.  A distributed setup with the same parameters will 
return 10 groups.
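
A sketch of the parameter combination under discussion (the grouping field
name "category" is hypothetical):

    SolrQuery q = new SolrQuery("*:*");
    q.set("group", true);
    q.set("group.field", "category");
    q.set("group.limit", 5);       // group.limit > 1
    q.set("group.main", true);     // flatten groups into the main result list
    q.setRows(10);                 // single core: 10 documents; distributed: 10 groups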



We plan to open a JIRA ticket and submit a fix, but there is the question of 
which way to fix it.  In the case where group.main is not set, group.limit 
applies to the number of groups in both the single-core and multi-core cases, 
so that approach would be consistent.


However, it seems to us that a user requesting the group.main results format 
will likely expect group.limit to apply to the number of documents.  A 
discussion held around an older fix a couple of years ago supports this view.  
(https://issues.apache.org/jira/browse/SOLR-2063)


Unless there is a good case for the first approach, we plan to go with the 
second; I wanted to put this out to see if we're overlooking something - or if 
this was implemented this way for some reason - feedback?

Monica Skidmore
Search Application Services Engineering Lead
CareerBuilder.com





Re: Solr 4.3.1 - SolrCloud nodes down and lost documents

2013-07-22 Thread Neil Prosser
I just have a little Python script which I run with cron (luckily that's
the granularity we have in Graphite). It reads the same JSON the admin UI
displays and dumps numeric values into Graphite.

I can open source it if you like. I just need to make sure I remove any
hacks/shortcuts that I've taken because I'm working with our cluster!


On 22 July 2013 19:26, Lance Norskog  wrote:

> Are you feeding Graphite from Solr? If so, how?
>
>
> On 07/19/2013 01:02 AM, Neil Prosser wrote:
>
>> That was overnight so I was unable to track exactly what happened (I'm
>> going off our Graphite graphs here).
>>
>
>


Re: how number of indexed fields effect performance

2013-07-22 Thread Jack Krupansky
Was all of this running fine previously and only started running slow 
recently, or is this your first measurement?


Are very simple queries (single keyword, no filters or facets or sorting or 
anything else, and returning only a few fields) working reasonably well?


-- Jack Krupansky

-Original Message- 
From: Suryansh Purwar

Sent: Monday, July 22, 2013 4:07 PM
To: solr-user@lucene.apache.org
Subject: how number of indexed fields effect performance

Hi,

We have a two-shard SolrCloud cluster, with each shard allocated 3 separate
machines. We do complex queries involving a number of filter queries
coupled with group queries and faceting. All of our machines are 64-bit
with 32 GB RAM. Our index size is around 10 GB, with around 800,000
documents. We have around 1000 indexed fields per document. 6 GB of memory
is allocated to Tomcat, under which Solr is running, on each of the six
machines. We have a ZooKeeper ensemble consisting of 3 ZooKeeper instances
running on 3 of the six machines, with 4 GB of memory allocated to each
instance. First Solr starts taking too much time, with "Broken pipe
exception because of timeout from client side" coming again and again; then
after some time a whole shard goes down, one machine at a time, followed by
the other machines. Is having 1000 fields indexed on each document causing
this problem? If so, what would be the ideal number of indexed fields in
such an environment?

Regards,
Suryansh 



/update/extract error

2013-07-22 Thread franagan
Hi all,

I'm testing SolrCloud (version 4.3.1) with 2 shards and 1 external ZooKeeper.
It's all running OK: documents are being indexed into the 2 different shards,
and select *:* gives me all documents.

Now I'm trying to add/index a new document via SolrJ using CloudSolrServer.

The code:

CloudSolrServer server = new CloudSolrServer("localhost:2181");
server.setDefaultCollection("tika");


ContentStreamUpdateRequest up = new
ContentStreamUpdateRequest("/update/extract");
up.addFile(new File("C:\\sample.pdf"), "application/octet-stream");
up.setParam("literal.id", "666");   

server.request(up);
server.commit();

When I run this with up.setParam("literal.id", "666"), an exception is thrown:

org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: ERROR:
[doc=666] unknown field 'ignored_dcterms:modified'
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServ
er.java:402)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServ
er.java:180)
at
org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.j
ava:401)
at
org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.j
ava:375)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:43
9)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExec
utor.java:895)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor
.java:918)
at java.lang.Thread.run(Thread.java:662)


My schema looks like this:
 

   
   
  
 
 

my solrConfig.xml:

  <requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
    <lst name="defaults">
      <str name="fmap.Last-Modified">last_modified</str>
      <str name="uprefix">ignored_</str>
      <lst name="date.formats">
        <str>yyyy-MM-dd</str>
      </lst>
    </lst>
  </requestHandler>
I have already checked the schema via /admin/luke: there is no dcterms:modified
field in the response, only the correct fields declared in schema.xml.

Can someone help me with this issue?

Thanks in advance. 











Re: /update/extract error

2013-07-22 Thread Jack Krupansky
You need a dynamic field pattern for "ignored_*" to ignore unmapped 
metadata.


-- Jack Krupansky

-Original Message- 
From: franagan

Sent: Monday, July 22, 2013 5:14 PM
To: solr-user@lucene.apache.org
Subject: /update/extract error

Hi all,

I'm testing SolrCloud (version 4.3.1) with 2 shards and 1 external ZooKeeper.
It's all running OK: documents are being indexed into the 2 different shards,
and select *:* gives me all documents.

Now I'm trying to add/index a new document via SolrJ using CloudSolrServer.

The code:

CloudSolrServer server = new CloudSolrServer("localhost:2181");
server.setDefaultCollection("tika");


ContentStreamUpdateRequest up = new
ContentStreamUpdateRequest("/update/extract");
up.addFile(new File("C:\\sample.pdf"), "application/octet-stream");
up.setParam("literal.id", "666");

server.request(up);
server.commit();

When I run this with up.setParam("literal.id", "666"), an exception is thrown:

org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: ERROR:
[doc=666] unknown field 'ignored_dcterms:modified'
   at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServ
er.java:402)
   at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServ
er.java:180)
   at
org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.j
ava:401)
   at
org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.j
ava:375)
   at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:43
9)
   at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExec
utor.java:895)
   at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor
.java:918)
   at java.lang.Thread.run(Thread.java:662)


My schema looks like this:

   
  
  
  
  


my solrConfig.xml:

  <requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
    <lst name="defaults">
      <str name="fmap.Last-Modified">last_modified</str>
      <str name="uprefix">ignored_</str>
      <lst name="date.formats">
        <str>yyyy-MM-dd</str>
      </lst>
    </lst>
  </requestHandler>

I have already checked the schema via /admin/luke: there is no dcterms:modified
field in the response, only the correct fields declared in schema.xml.

Can someone help me with this issue?

Thanks in advance.












Re: /update/extract error

2013-07-22 Thread franagan
I added the "ignored_*" dynamic field pattern to the schema.xml and now it's
working.

Thank you very much, Jack.







Use same spell check dictionary across different collections

2013-07-22 Thread smanad
I have 2 collections, let's say coll1 and coll2.

I configured solr.DirectSolrSpellChecker in coll1's solrconfig.xml and it
works fine.

Now, I want to configure coll2's solrconfig.xml to use the SAME spell check
dictionary index created above. (I do not want coll2 to prepare its own
dictionary index, but just to do spell checking against coll1's spell
dictionary index.)

Is it possible to do this? I tried IndexBasedSpellChecker but could not
get it working.

Any suggestions?
Thanks, 
-Manasi





spellcheck and search in a same solr request

2013-07-22 Thread smanad
Hey, 

Is there a way to do spellcheck and search (using suggestions returned from
spellcheck) in a single Solr request?

I am seeing that if my query is spelled correctly, I get results, but if it
is misspelled, I just get suggestions.

Any pointers will be very helpful.
Thanks, 
-Manasi





softCommit doesn't work - ?

2013-07-22 Thread tskom
Hi,

I use Solr 4.3.1.
I tried to index about 70 documents using softCommit as below:

SolrInputDocument doc = new SolrInputDocument();
result = fillMetaData(request, doc); // custom one
int softCommit = 1;
solrServer.add(doc, softCommit);
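
Note that in SolrJ 4.x the two-argument add() takes a commitWithin time in
milliseconds, not a soft-commit flag, so the call above asks for a commit
within 1 ms; an explicit soft commit goes through the three-argument
commit(). A sketch of the two distinct calls:

    solrServer.add(doc, 10000);           // commitWithin: ask Solr to commit within 10 seconds
    solrServer.commit(true, true, true);  // waitFlush, waitSearcher, softCommit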

The process ran very fast, but there is nothing in the index, neither after
10 sec nor after restarting the server application.
In the Solr log I got something like this:
2013-07-23 01:58:01,543 INFO 
[org.apache.solr.update.processor.LogUpdateProcessor]
(http-127.0.0.1-8090-5) [collection1] webapp=/solr path=/update
params={wt=javabin&version=2} {add=[Rep_CA_FairyCakes
(1441307014244335616)]} 0 3
2013-07-23 01:58:01,546 INFO  [org.apache.solr.update.UpdateHandler]
(http-127.0.0.1-8090-5) start rollback{}
2013-07-23 01:58:01,547 INFO  [org.apache.solr.update.DefaultSolrCoreState]
(http-127.0.0.1-8090-5) Creating new IndexWriter...
2013-07-23 01:58:01,547 INFO  [org.apache.solr.update.DefaultSolrCoreState]
(http-127.0.0.1-8090-5) Waiting until IndexWriter is unused...
core=collection1
2013-07-23 01:58:01,547 INFO  [org.apache.solr.update.DefaultSolrCoreState]
(http-127.0.0.1-8090-5) Rollback old IndexWriter... core=collection1
2013-07-23 01:58:01,617 INFO  [org.apache.solr.core.SolrCore]
(http-127.0.0.1-8090-5) SolrDeletionPolicy.onInit: commits:num=1

commit{dir=NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@C:\solr\data\index
lockFactory=org.apache.lucene.store.NativeFSLockFactory@7ed1f882;
maxCacheMB=48.0
maxMergeSizeMB=4.0),segFN=segments_ew,generation=536,filenames=[_ah_Lucene41_0.tim,
_9d.fdt, _a5.fdx, ...]}



how number of indexed fields effect performance

2013-07-22 Thread Suryansh Purwar
It was running fine initially, when we had only around 100 fields
indexed. In this case as well it runs fine at first, but after some time the
broken pipe exception starts coming, which results in the shard going down.

Regards,
Suryansh



On Tuesday, July 23, 2013, Jack Krupansky wrote:

> Was all of this running fine previously and only started running slow
> recently, or is this your first measurement?
>
> Are very simple queries (single keyword, no filters or facets or sorting
> or anything else, and returning only a few fields) working reasonably well?
>
> -- Jack Krupansky
>
> -Original Message- From: Suryansh Purwar
> Sent: Monday, July 22, 2013 4:07 PM
> To: solr-user@lucene.apache.org
> Subject: how number of indexed fields effect performance
>
> Hi,
>
> We have a two-shard SolrCloud cluster, with each shard allocated 3 separate
> machines. We do complex queries involving a number of filter queries
> coupled with group queries and faceting. All of our machines are 64-bit
> with 32 GB RAM. Our index size is around 10 GB, with around 800,000
> documents. We have around 1000 indexed fields per document. 6 GB of memory
> is allocated to Tomcat, under which Solr is running, on each of the six
> machines. We have a ZooKeeper ensemble consisting of 3 ZooKeeper instances
> running on 3 of the six machines, with 4 GB of memory allocated to each
> instance. First Solr starts taking too much time, with "Broken pipe
> exception because of timeout from client side" coming again and again; then
> after some time a whole shard goes down, one machine at a time, followed by
> the other machines. Is having 1000 fields indexed on each document causing
> this problem? If so, what would be the ideal number of indexed fields in
> such an environment?
>
> Regards,
> Suryansh
>


Question about field boost

2013-07-22 Thread Joe Zhang
Dear Solr experts:

Here is my query:

defType=dismax&q=term1+term2&qf=title^100 content

Apparently (at least I thought) my intention is to boost the title field.
While I'm getting some non-trivial results, I'm surprised that the
documents with both term1 and term2 in title (I know such docs do exist in
my repository) were not returned (or maybe ranked very low). The situation
does not change even when I use much larger boost factors.

What am I doing wrong?


Re: Question about field boost

2013-07-22 Thread Jack Krupansky
Maybe you're not doing anything wrong - other than having an artificial 
expectation of what the true relevance of your data actually is. Many 
factors go into relevance scoring. You need to look at all aspects of your 
data.


Maybe your terms don't occur in your titles the way you think they do.

Maybe you need a boost of 500 or more...

Lots of potential maybes.

Relevancy tuning is an art and craft, hardly a science.

Step one: Know your data, inside and out.

Use the debugQuery=true parameter on your queries and see how much of the 
score is dominated by your query terms in the non-title fields.


-- Jack Krupansky

-Original Message- 
From: Joe Zhang

Sent: Monday, July 22, 2013 11:06 PM
To: solr-user@lucene.apache.org
Subject: Question about field boost

Dear Solr experts:

Here is my query:

defType=dismax&q=term1+term2&qf=title^100 content

Apparently (at least I thought) my intention is to boost the title field.
While I'm getting some non-trivial results, I'm surprised that the
documents with both term1 and term2 in title (I know such docs do exist in
my repository) were not returned (or maybe ranked very low). The situation
does not change even when I use much larger boost factors.

What am I doing wrong? 



Re: how number of indexed fields effect performance

2013-07-22 Thread Jack Krupansky
After restarting Solr and doing a couple of queries to warm the caches, are 
queries already slow/failing, or does it take some time and a number of 
queries before failures start occurring?


One possibility is that you just need a lot more memory for caches for this 
amount of data. So, maybe the failures are caused by heavy garbage 
collections. So, after restarting Solr, check how much Java heap is 
available, then do some warming queries, then check the Java heap available 
again.


Add the debugQuery=true parameter to your queries and look at the timings to 
see what phases of query processing are taking the most time. Also check 
whether the reported QTime seems to match actual wall clock time; sometimes 
formatting of the results and network transfer time can dwarf actual query 
time.


How many fields are you returning on a typical query?

-- Jack Krupansky


-Original Message- 
From: Suryansh Purwar

Sent: Monday, July 22, 2013 11:06 PM
To: solr-user@lucene.apache.org ; j...@basetechnology.com
Subject: how number of indexed fields effect performance

It was running fine initially, when we had only around 100 fields
indexed. In this case as well it runs fine at first, but after some time the
broken pipe exception starts coming, which results in the shard going down.

Regards,
Suryansh



On Tuesday, July 23, 2013, Jack Krupansky wrote:


Was all of this running fine previously and only started running slow
recently, or is this your first measurement?

Are very simple queries (single keyword, no filters or facets or sorting
or anything else, and returning only a few fields) working reasonably well?


-- Jack Krupansky

-Original Message- From: Suryansh Purwar
Sent: Monday, July 22, 2013 4:07 PM
To: solr-user@lucene.apache.org
Subject: how number of indexed fields effect performance

Hi,

We have a two-shard SolrCloud cluster, with each shard allocated 3 separate
machines. We do complex queries involving a number of filter queries
coupled with group queries and faceting. All of our machines are 64-bit
with 32 GB RAM. Our index size is around 10 GB, with around 800,000
documents. We have around 1000 indexed fields per document. 6 GB of memory
is allocated to Tomcat, under which Solr is running, on each of the six
machines. We have a ZooKeeper ensemble consisting of 3 ZooKeeper instances
running on 3 of the six machines, with 4 GB of memory allocated to each
instance. First Solr starts taking too much time, with "Broken pipe
exception because of timeout from client side" coming again and again; then
after some time a whole shard goes down, one machine at a time, followed by
the other machines. Is having 1000 fields indexed on each document causing
this problem? If so, what would be the ideal number of indexed fields in
such an environment?

Regards,
Suryansh





Re: Question about field boost

2013-07-22 Thread Joe Zhang
Thanks for your hint, Jack. Here are the debug results, which I'm having a
hard time deciphering (the two terms are "china" and "snowden")...

0.26839527 = (MATCH) sum of:
  0.26839527 = (MATCH) sum of:
0.26757246 = (MATCH) max of:
  7.9147343E-4 = (MATCH) weight(content:china in 249), product of:
0.019873314 = queryWeight(content:china), product of:
  1.6649085 = idf(docFreq=46832, maxDocs=91058)
  0.01193658 = queryNorm
0.039825942 = (MATCH) fieldWeight(content:china in 249), product of:
  4.8989797 = tf(termFreq(content:china)=24)
  1.6649085 = idf(docFreq=46832, maxDocs=91058)
  0.0048828125 = fieldNorm(field=content, doc=249)
  0.26757246 = (MATCH) weight(title:china^10.0 in 249), product of:
0.5836803 = queryWeight(title:china^10.0), product of:
  10.0 = boost
  4.8898454 = idf(docFreq=1861, maxDocs=91058)
  0.01193658 = queryNorm
0.45842302 = (MATCH) fieldWeight(title:china in 249), product of:
  1.0 = tf(termFreq(title:china)=1)
  4.8898454 = idf(docFreq=1861, maxDocs=91058)
  0.09375 = fieldNorm(field=title, doc=249)
8.2282536E-4 = (MATCH) max of:
  8.2282536E-4 = (MATCH) weight(content:snowden in 249), product of:
0.03407834 = queryWeight(content:snowden), product of:
  2.8549502 = idf(docFreq=14246, maxDocs=91058)
  0.01193658 = queryNorm
0.024145111 = (MATCH) fieldWeight(content:snowden in 249), product
of:
  1.7320508 = tf(termFreq(content:snowden)=3)
  2.8549502 = idf(docFreq=14246, maxDocs=91058)
  0.0048828125 = fieldNorm(field=content, doc=249)


On Mon, Jul 22, 2013 at 9:27 PM, Jack Krupansky wrote:

> Maybe you're not doing anything wrong - other than having an artificial
> expectation of what the true relevance of your data actually is. Many
> factors go into relevance scoring. You need to look at all aspects of your
> data.
>
> Maybe your terms don't occur in your titles the way you think they do.
>
> Maybe you need a boost of 500 or more...
>
> Lots of potential maybes.
>
> Relevancy tuning is an art and craft, hardly a science.
>
> Step one: Know your data, inside and out.
>
> Use the debugQuery=true parameter on your queries and see how much of the
> score is dominated by your query terms in the non-title fields.
>
> -- Jack Krupansky
>
> -Original Message- From: Joe Zhang
> Sent: Monday, July 22, 2013 11:06 PM
> To: solr-user@lucene.apache.org
> Subject: Question about field boost
>
>
> Dear Solr experts:
>
> Here is my query:
>
> defType=dismax&q=term1+term2&qf=title^100 content
>
> Apparently (at least I thought) my intention is to boost the title field.
> While I'm getting some non-trivial results, I'm surprised that the
> documents with both term1 and term2 in title (I know such docs do exist in
> my repository) were not returned (or maybe ranked very low). The situation
> does not change even when I use much larger boost factors.
>
> What am I doing wrong?
>


Re: Question about field boost

2013-07-22 Thread Joe Zhang
Is my reading correct that the boost is only applied on "china" but not
"snowden"? How can that be?

My query is: q=china+snowden&qf=title^10 content


On Mon, Jul 22, 2013 at 9:43 PM, Joe Zhang  wrote:

> Thanks for your hint, Jack. Here are the debug results, which I'm having a
> hard time deciphering (the two terms are "china" and "snowden")...
>
> 0.26839527 = (MATCH) sum of:
>   0.26839527 = (MATCH) sum of:
> 0.26757246 = (MATCH) max of:
>   7.9147343E-4 = (MATCH) weight(content:china in 249), product of:
> 0.019873314 = queryWeight(content:china), product of:
>   1.6649085 = idf(docFreq=46832, maxDocs=91058)
>   0.01193658 = queryNorm
> 0.039825942 = (MATCH) fieldWeight(content:china in 249), product
> of:
>   4.8989797 = tf(termFreq(content:china)=24)
>   1.6649085 = idf(docFreq=46832, maxDocs=91058)
>   0.0048828125 = fieldNorm(field=content, doc=249)
>   0.26757246 = (MATCH) weight(title:china^10.0 in 249), product of:
> 0.5836803 = queryWeight(title:china^10.0), product of:
>   10.0 = boost
>   4.8898454 = idf(docFreq=1861, maxDocs=91058)
>   0.01193658 = queryNorm
> 0.45842302 = (MATCH) fieldWeight(title:china in 249), product of:
>   1.0 = tf(termFreq(title:china)=1)
>   4.8898454 = idf(docFreq=1861, maxDocs=91058)
>   0.09375 = fieldNorm(field=title, doc=249)
> 8.2282536E-4 = (MATCH) max of:
>   8.2282536E-4 = (MATCH) weight(content:snowden in 249), product of:
> 0.03407834 = queryWeight(content:snowden), product of:
>   2.8549502 = idf(docFreq=14246, maxDocs=91058)
>   0.01193658 = queryNorm
> 0.024145111 = (MATCH) fieldWeight(content:snowden in 249), product
> of:
>   1.7320508 = tf(termFreq(content:snowden)=3)
>   2.8549502 = idf(docFreq=14246, maxDocs=91058)
>   0.0048828125 = fieldNorm(field=content, doc=249)
>
>
> On Mon, Jul 22, 2013 at 9:27 PM, Jack Krupansky 
> wrote:
>
>> Maybe you're not doing anything wrong - other than having an artificial
>> expectation of what the true relevance of your data actually is. Many
>> factors go into relevance scoring. You need to look at all aspects of your
>> data.
>>
>> Maybe your terms don't occur in your titles the way you think they do.
>>
>> Maybe you need a boost of 500 or more...
>>
>> Lots of potential maybes.
>>
>> Relevancy tuning is an art and craft, hardly a science.
>>
>> Step one: Know your data, inside and out.
>>
>> Use the debugQuery=true parameter on your queries and see how much of the
>> score is dominated by your query terms in the non-title fields.
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: Joe Zhang
>> Sent: Monday, July 22, 2013 11:06 PM
>> To: solr-user@lucene.apache.org
>> Subject: Question about field boost
>>
>>
>> Dear Solr experts:
>>
>> Here is my query:
>>
>> defType=dismax&q=term1+term2&qf=title^100 content
>>
>> Apparently (at least I thought) my intention is to boost the title field.
>> While I'm getting some non-trivial results, I'm surprised that the
>> documents with both term1 and term2 in title (I know such docs do exist in
>> my repository) were not returned (or maybe ranked very low). The situation
>> does not change even when I use much larger boost factors.
>>
>> What am I doing wrong?
>>
>
>


Re: Question about field boost

2013-07-22 Thread Jack Krupansky
That means that for that document "china" occurs in the title, while
"snowden" is found in the document but not in the title.


-- Jack Krupansky

-Original Message- 
From: Joe Zhang

Sent: Tuesday, July 23, 2013 12:52 AM
To: solr-user@lucene.apache.org
Subject: Re: Question about field boost

Is my reading correct that the boost is only applied on "china" but not
"snowden"? How can that be?

My query is: q=china+snowden&qf=title^10 content


On Mon, Jul 22, 2013 at 9:43 PM, Joe Zhang  wrote:


Thanks for your hint, Jack. Here are the debug results, which I'm having a
hard time deciphering (the two terms are "china" and "snowden")...

0.26839527 = (MATCH) sum of:
  0.26839527 = (MATCH) sum of:
0.26757246 = (MATCH) max of:
  7.9147343E-4 = (MATCH) weight(content:china in 249), product of:
0.019873314 = queryWeight(content:china), product of:
  1.6649085 = idf(docFreq=46832, maxDocs=91058)
  0.01193658 = queryNorm
0.039825942 = (MATCH) fieldWeight(content:china in 249), product
of:
  4.8989797 = tf(termFreq(content:china)=24)
  1.6649085 = idf(docFreq=46832, maxDocs=91058)
  0.0048828125 = fieldNorm(field=content, doc=249)
  0.26757246 = (MATCH) weight(title:china^10.0 in 249), product of:
0.5836803 = queryWeight(title:china^10.0), product of:
  10.0 = boost
  4.8898454 = idf(docFreq=1861, maxDocs=91058)
  0.01193658 = queryNorm
0.45842302 = (MATCH) fieldWeight(title:china in 249), product of:
  1.0 = tf(termFreq(title:china)=1)
  4.8898454 = idf(docFreq=1861, maxDocs=91058)
  0.09375 = fieldNorm(field=title, doc=249)
8.2282536E-4 = (MATCH) max of:
  8.2282536E-4 = (MATCH) weight(content:snowden in 249), product of:
0.03407834 = queryWeight(content:snowden), product of:
  2.8549502 = idf(docFreq=14246, maxDocs=91058)
  0.01193658 = queryNorm
0.024145111 = (MATCH) fieldWeight(content:snowden in 249), product
of:
  1.7320508 = tf(termFreq(content:snowden)=3)
  2.8549502 = idf(docFreq=14246, maxDocs=91058)
  0.0048828125 = fieldNorm(field=content, doc=249)


On Mon, Jul 22, 2013 at 9:27 PM, Jack Krupansky 
wrote:



Maybe you're not doing anything wrong - other than having an artificial
expectation of what the true relevance of your data actually is. Many
factors go into relevance scoring. You need to look at all aspects of your
data.

Maybe your terms don't occur in your titles the way you think they do.

Maybe you need a boost of 500 or more...

Lots of potential maybes.

Relevancy tuning is an art and craft, hardly a science.

Step one: Know your data, inside and out.

Use the debugQuery=true parameter on your queries and see how much of the
score is dominated by your query terms in the non-title fields.

-- Jack Krupansky

-Original Message- From: Joe Zhang
Sent: Monday, July 22, 2013 11:06 PM
To: solr-user@lucene.apache.org
Subject: Question about field boost


Dear Solr experts:

Here is my query:

defType=dismax&q=term1+term2&qf=title^100 content

Apparently (at least I thought) my intention is to boost the title field.
While I'm getting some non-trivial results, I'm surprised that the
documents with both term1 and term2 in title (I know such docs do exist in
my repository) were not returned (or maybe ranked very low). The situation
does not change even when I use much larger boost factors.

What am I doing wrong?








Re: adding date column to the index

2013-07-22 Thread Mysurf Mail
To clarify: I did delete the data in the index and reload it (+ commit).
(As I said, I have seen it loaded in the DB profiler.)
Thanks for your comment.


On Mon, Jul 22, 2013 at 9:25 PM, Lance Norskog  wrote:

> Solr/Lucene does not automatically backfill a new field for existing
> documents, the way DBMS systems do when you add a column. Instead, all of a
> document's field data is written at index time. To get the new field
> populated, you have to reload all of your data.
>
> This is also true for deleting fields. If you remove a field, that data
> does not go away until you re-index.
>
>
> On 07/22/2013 07:31 AM, Mysurf Mail wrote:
>
>> I have added a date field to my index.
>> I don't want the query to search on this field, but I want it to be
>> returned
>> with each row.
>> So I have defined it in the schema.xml as follows:
>>    <field ... stored="true" required="true"/>
>>
>>
>>
>> I added it to the select in data-config.xml and I see it selected in the
>> profiler.
>> Now, when I query all fields (using the dashboard) I don't see it.
>> Even when I ask for it specifically I don't see it.
>> What am I doing wrong?
>>
>> (In the db it is (datetimeoffset(7)))
>>
>>
>


Re: deserializing highlighting json result

2013-07-22 Thread Mysurf Mail
The GUID appears as the attribute name itself, not as a value like

"id":"baf8434a-99a4-4046-8a4d-2f7ec09eafc8"

Trying to create an object that holds this GUID will create an attribute
named baf8434a-99a4-4046-8a4d-2f7ec09eafc8.

On Mon, Jul 22, 2013 at 6:30 PM, Jack Krupansky wrote:

> Exactly why is it difficult to deserialize? Seems simple enough.
>
> -- Jack Krupansky
>
> -Original Message-
> From: Mysurf Mail
> Sent: Monday, July 22, 2013 11:14 AM
> To: solr-user@lucene.apache.org
> Subject: deserializing highlighting json result
> When I request a json result I get the following streucture in the
> highlighting
>
> {"highlighting":{
>   "394c65f1-dfb1-4b76-9b6c-**2f14c9682cc9":{
>  "PackageName":["- Testing channel twenty."]},
>   "baf8434a-99a4-4046-8a4d-**2f7ec09eafc8":{
>  "PackageName":["- Testing channel twenty."]},
>   "0a699062-cd09-4b2e-a817-**330193a352c1":{
> "PackageName":["- Testing channel twenty."]},
>   "0b9ec891-5ef8-4085-9de2-**38bfa9ea327e":{
> "PackageName":["- Testing channel twenty."]}}}
>
>
> It is difficult to deserialize this JSON because the GUID is in the
> attribute name.
> Is that solvable (using C#)?
>