Re: Solr 5: not loading shards from symlinked directories

2016-02-05 Thread Alan Woodward
This is a known bug, see https://issues.apache.org/jira/browse/SOLR-8548.  It 
will be fixed in 5.5, or in 5.4.2 if we do another bugfix release.

Alan Woodward
www.flax.co.uk


On 5 Feb 2016, at 06:19, Norgorn wrote:

> I've tried to upgrade from Solr 4.10.3 to 5.4.1. Solr shards are placed on
> different disks and symlinks (ln -s) are created to SOLR_HOME (SOLR_HOME
> itself is set as an absolute path and works fine).
> When Solr starts, it loads only shards placed in home directory, but not
> symlinked ones.
> If I copy a shard to the home directory (the file system path remains unchanged,
> i.e. SOLR_HOME/my_shard1 for both the symlinked and the copied case), it works.
> 
> Are there any ways to overcome this issue?
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-5-not-loading-shards-from-symlinked-directories-tp4255403.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Set router.field in unit tests

2016-04-29 Thread Alan Woodward
It's almost certainly worth using SolrCloudTestBase rather than 
AbstractDistribZkTestBase as well - it normally makes the test run five or six 
times faster.

Alan Woodward
www.flax.co.uk


On 29 Apr 2016, at 17:11, Erick Erickson wrote:

> I'm pretty sure you can just create a collection after the distributed
> stuff is set up.
> 
> Take a look at:
> 
> CollectionsAPIDistributedZkTest.testNodesUsedByCreate to see how to create
> a collection in your test just by issuing a request (you can set any params
> you want there, including router.field).
> 
> Or CollectionsAPISolrJTest.testCreateAndDeleteCollection for a niftier
> builder pattern
> SolrJ way.
> 
> Best,
> Erick
> 
> On Fri, Apr 29, 2016 at 5:34 AM, GW  wrote:
>> Not exactly sure what you mean, but I think you want to change your
>> schema.xml
>> 
>> <field ... multiValued="false" />
>> 
>> to
>> 
>> <field ... required="true" multiValued="false" />
>> 
>> 
>> restart solr
>> 
>> 
>> On 29 April 2016 at 06:04, Markus Jelsma  wrote:
>> 
>>> Hi - any hints to share?
>>> 
>>> Thanks!
>>> Markus
>>> 
>>> 
>>> 
>>> -Original message-
>>>> From:Markus Jelsma 
>>>> Sent: Thursday 28th April 2016 13:30
>>>> To: solr-user 
>>>> Subject: Set router.field in unit tests
>>>> 
>>>> Hi - I'm working on a unit test that requires the cluster's router.field
>>>> to be set to a field different from id. But I can't find where to set it.
>>>> How can I set router.field with AbstractFullDistribZkTestBase?
>>>> 
>>>> Thanks!
>>>> Markus
>>>> 
>>> 



Re: getZkStateReader() returning NULL

2016-05-05 Thread Alan Woodward
You'll need to call this.server.connect() - the state reader is instantiated 
lazily.
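
Something along these lines (a sketch against the 5.x SolrJ API, reusing your
field names):

    // Sketch: connect() forces the lazy ZooKeeper machinery to initialise,
    // so getZkStateReader() no longer returns null.
    this.server = new CloudSolrClient(this.ZK_HOST);
    this.server.connect();
    ZkStateReader zkStateReader = this.server.getZkStateReader();
    boolean exists = zkStateReader.getClusterState().hasCollection("mycollection");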

Alan Woodward
www.flax.co.uk


On 5 May 2016, at 01:10, Boman wrote:

> I am attempting to check for existence of a collection prior to creating a
> new one with that name, using Solrj:
> 
>System.out.println("Checking for existence of collection...");
>ZkStateReader zkStateReader = this.server.getZkStateReader(); 
>zkStateReader.updateClusterState();
> 
> this.server was created using:
> 
>   this.server = new CloudSolrClient(this.ZK_HOST);
> 
> The call: this.server.getZkStateReader() consistently returns a NULL.
> 
> Any help would be appreciated. Thanks.
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/getZkStateReader-returning-NULL-tp4274663.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Reverse query?

2015-10-05 Thread Alan Woodward
Hi Remi,

Your use-case is more-or-less exactly what I wrote luwak for: 
https://github.com/flaxsearch/luwak.  You register your queries with a Monitor 
object, and then match documents against them.  The monitor analyzes the 
documents that are passed in and tries to filter out queries that it can detect 
won't match ahead of time, which is particularly useful if some of your queries 
are complex and expensive to run.
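
Roughly like this (a sketch based on the luwak README - field and query names
are placeholders, and the API may have moved since):

    // Sketch: register stored queries with a Monitor, then stream documents
    // through it to find which stored queries match.
    Monitor monitor = new Monitor(new LuceneQueryParser("text"),
                                  new TermFilteredPresearcher());
    monitor.update(new MonitorQuery("query1", "text:\"mad max\""));

    InputDocument doc = InputDocument.builder("doc1")
        .addField("text", documentText, new StandardAnalyzer())
        .build();
    Matches<QueryMatch> matches = monitor.match(doc, SimpleMatcher.FACTORY);
    // iterate 'matches' to see the ids of the stored queries that matched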

We've found that luwak performs better than the percolator out of the box 
(http://www.flax.co.uk/blog/2015/07/27/a-performance-comparison-of-streamed-search-implementations/),
 but depending on how many queries you have and how complex they are you may 
find that the percolator is a lot easier to set up, as it comes bundled as part 
of elasticsearch while luwak is just a Java library, and will require some 
coding to get it up and running.

Alan Woodward
www.flax.co.uk


On 3 Oct 2015, at 23:05, remi tassing wrote:

> @Jack: After reading the documentation, I think the percolator is what I'm
> after. The filtering possibility is extremely appealing as well. I'll have
> a closer look and experiment a bit.
> 
> @Erik: Yes that's right, notification is not really needed in my case
> though. It should be doable as you said…the percolator could be a good
> reference.
> 
> Thank you all guys!
> On Oct 3, 2015 6:08 PM, "Erick Erickson"  wrote:
> 
>> OK, finally the light dawns. You're doing something akin to "alerts".
>> that is, store a bunch of queries, then when a new document comes
>> in find out if any of the queries would match the doc and send
>> out alerts to each user who has entered a query like that. Your
>> situation may not be doing exactly that, but some kind of alerting
>> mechanism would work, right?
>> 
>> There are several approaches, Googling  "solr alerts" will
>> turn up several. Lucidworks, Flax and others have built some
>> tools (some commercial) for this ability.
>> 
>> One way to approach this is to store the queries "somewhere",
>> perhaps in a DB, perhaps in their own Solr collection, and write
>> a custom component that takes an incoming document and puts
>> it in a MemoryIndex, runs the queries against it and sends
>> the alerts. This requires some lower-level programming, but is
>> quite do-able.
>> 
>> Best,
>> Erick
>> 
>> On Sat, Oct 3, 2015 at 7:02 AM, Gili Nachum  wrote:
>>> Check if MLT (more like this) could fit your requirements.
>>> https://wiki.apache.org/solr/MoreLikeThis
>>> 
>>> If your requirements are more specific I think your client program should
>>> tokenize the target document then construct one or more queries like:
>>> "token token2" OR "token2 token3" OR ...
>>> 
>>> I'm not sure how you get the list of tokens; perhaps use the same API
>>> that the analysis admin page uses (haven't checked).
>>> On Oct 3, 2015 09:32, "remi tassing"  wrote:
>>> 
>>>> Hi,
>>>> 
>>>> @Erik: Yes, I'm using the admin UI, and yes I quickly noticed the
>>>> keywordTokenizer wouldn't work.
>>>> @All: sorry for not explaining properly, I'm aware of the phrase query
>> and
>>>> a little bit of the N-Gram.
>>>> 
>>>> So to simplify my problem, the documents indexed are:
>>>> id:1, content:Mad Max
>>>> id:2, content:George Miller
>>>> id:3, content:global market
>>>> id:4, content:Solr development
>>>> 
>>>> Now the query is the content of the wiki page at
>>>> https://en.wikipedia.org/wiki/Mad_Max_%28franchise%29
>>>> 
>>>> the results id:1, id:2, id:3 should be returned but not id:4. Today I'm
>>>> able to do this with something similar to grep (Aho-Corasick) but the
>> list
>>>> is growing bigger and bigger. I thought Solr/Lucene could tackle this
>> more
>>>> efficiently and also add other capabilities like filtering ...
>>>> 
>>>> Maybe there is another tool more suitable for the job?
>>>> 
>>>> Remi
>>>> 
>>>> 
>>>> On Fri, Oct 2, 2015 at 10:07 PM, Andrea Roggerone <
>>>> andrearoggerone.o...@gmail.com> wrote:
>>>> 
>>>>> Hi, the phrase query format would be:
>>>>> "Mad Max"~2
>>>>> The * has been added by the mail aggregator around the chars in Bold
>> for
>>>>> some reason. That wasn't a wildcard.
>>>>> 
>&

Re: Reverse query?

2015-10-05 Thread Alan Woodward
Hi Remi,

I'm not sure what you mean by filtering on the fly?  With the percolator, if 
you're going to do filtering at match time, you still need to have added the 
terms to filter on when you add the query.  And you can actually do the same 
sort of thing in luwak, using a FieldFilterPresearcherComponent, although it's 
not documented very well - I should add something to the readme to explain how 
it works!
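
Roughly (a sketch - "language" here is a placeholder for whatever metadata
field you want to filter on):

    // Sketch: build the presearcher with a field-filter component, so stored
    // queries are filtered by a metadata field at match time.
    Monitor monitor = new Monitor(
        new LuceneQueryParser("text"),
        new TermFilteredPresearcher(new FieldFilterPresearcherComponent("language")));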

On 5 Oct 2015, at 15:03, remi tassing wrote:

> Hi Alan,
> 
> I became aware of Luwak a few months ago and I'm planning on using it in
> the future. The only reason I couldn’t use it for my specific scenario was
> the fact that I needed the possibility to filter on the fly and not
> necessarily include filtering while building the query index. Apparently
> from the description, the percolator API in Elasticsearch supports this.
> 
> I might be wrong, so I'll have to experiment a little bit first.
> 
> Remi
> 
> On Mon, Oct 5, 2015 at 1:58 PM, Alan Woodward  wrote:
> 
>> [Alan's first reply and the rest of the thread were quoted here in full -
>> trimmed; see the previous message above.]

OverseerCollectionMessageHandler logging

2015-10-09 Thread Alan Woodward
Hi all,

The OverseerCollectionMessageHandler logs all messages that it processes at 
WARN level, which seems wrong?  Particularly as it handles OVERSEERSTATUS 
messages, which means that monitoring systems can trigger warnings all over the 
place.  Is there a specific reason for this, or should I change it to INFO?

Alan Woodward
www.flax.co.uk




Re: OverseerCollectionMessageHandler logging

2015-10-09 Thread Alan Woodward
I'll raise a Jira, thanks Shalin.

Alan Woodward
www.flax.co.uk


On 9 Oct 2015, at 16:05, Shalin Shekhar Mangar wrote:

> Yes, that should be INFO
> 
> On Fri, Oct 9, 2015 at 8:02 PM, Alan Woodward  wrote:
>> Hi all,
>> 
>> The OverseerCollectionMessageHandler logs all messages that it processes at 
>> WARN level, which seems wrong?  Particularly as it handles OVERSEERSTATUS 
>> messages, which means that monitoring systems can trigger warnings all over 
>> the place.  Is there a specific reason for this, or should I change it to 
>> INFO?
>> 
>> Alan Woodward
>> www.flax.co.uk
>> 
>> 
> 
> 
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.



Re: PayloadTermQuery deprecated

2015-10-19 Thread Alan Woodward
Hi Bill,

This looks like an oversight on my part when migrating the payload scoring 
queries - can you open a JIRA ticket to add 'includeSpanScore' as an option to 
PayloadScoreQuery?

As a workaround, you should be able to use a custom similarity that returns 1 
for all scores (see IndexSearcher.NON_SCORING_SIMILARITY for an implementation 
that returns 0, you could just clone that and change SimScorer.score())
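
Something like this (a sketch only - the Similarity method signatures below
follow the 5.3-era API from memory, so clone NON_SCORING_SIMILARITY from your
own version rather than trusting this verbatim):

    // Sketch: a Similarity that scores every hit as 1, so a PayloadScoreQuery
    // effectively contributes only its payload component.
    Similarity payloadOnly = new Similarity() {
      @Override
      public long computeNorm(FieldInvertState state) {
        return 1;
      }
      @Override
      public SimWeight computeWeight(CollectionStatistics collectionStats,
                                     TermStatistics... termStats) {
        return new SimWeight() {
          @Override public float getValueForNormalization() { return 1f; }
          @Override public void normalize(float queryNorm, float boost) {}
        };
      }
      @Override
      public SimScorer simScorer(SimWeight weight, LeafReaderContext context) {
        return new SimScorer() {
          // NON_SCORING_SIMILARITY returns 0 here; return 1 instead
          @Override public float score(int doc, float freq) { return 1f; }
          @Override public float computeSlopFactor(int distance) { return 1f; }
          @Override public float computePayloadFactor(int doc, int start, int end,
                                                      BytesRef payload) { return 1f; }
        };
      }
    };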

Alan Woodward
www.flax.co.uk


On 19 Oct 2015, at 00:39, William Bell wrote:

> Here is my first stab at it. Thoughts?
> 
> Question:
> 
> new PayloadTermQuery(new Term(nv[0].substring(1), nv[1]), new
> AveragePayloadFunction(), false)
> 
> How do I handle the "false"? (It was the boolean includeSpanScore parameter.)
> 
> 
> @Override
> public Query parse() throws SyntaxError {
>     if (qstr == null || qstr.length() == 0) return null;
>     BooleanQuery.Builder q = new BooleanQuery.Builder();
>     q.setDisableCoord(true);
>     // strip surrounding double quotes
>     if (qstr.length() > 1 && qstr.startsWith("\"") && qstr.endsWith("\"")) {
>         qstr = qstr.substring(1, qstr.length() - 1);
>     }
>     String[] nvps = StringUtils.split(qstr, " ");
>     for (int i = 0; i < nvps.length; i++) {
>         String[] nv = StringUtils.split(nvps[i], ":");
>         if (nv.length > 1) {
>             if (nv[0].startsWith("+")) {
>                 // was: new PayloadTermQuery(new Term(nv[0].substring(1), nv[1]),
>                 //          new AveragePayloadFunction(), false)
>                 SpanTermQuery sq = new SpanTermQuery(new Term(nv[0].substring(1), nv[1]));
>                 PayloadScoreQuery psq = new PayloadScoreQuery(sq, new AveragePayloadFunction());
>                 q.add(psq, Occur.MUST);
>             } else {
>                 // was: new PayloadTermQuery(new Term(nv[0], nv[1]),
>                 //          new AveragePayloadFunction(), false)
>                 SpanTermQuery sq = new SpanTermQuery(new Term(nv[0], nv[1]));
>                 PayloadScoreQuery psq = new PayloadScoreQuery(sq, new AveragePayloadFunction());
>                 q.add(psq, Occur.SHOULD);
>             }
>         }
>     }
>     return q.build();
> }
> 
> 
> On Sun, Oct 18, 2015 at 4:46 PM, William Bell  wrote:
> 
>> Wondering how to change my payload based on example:
>> 
>> https://lucidworks.com/blog/2014/06/13/end-to-end-payload-example-in-solr/
>> 
>> PayloadTermQuery and BooleanQuery are deprecated in 5.3.x
>> 
>> @Override
>> public Query parse() throws SyntaxError {
>> 
>>if (qstr == null || qstr.length() == 0) return null;
>>BooleanQuery q = new BooleanQuery();
>>if (qstr.length() > 1 && qstr.startsWith("\"") && qstr.endsWith("\"")) {
>>qstr = qstr.substring(1,qstr.length()-1);
>>}
>>String[] nvps = StringUtils.split(qstr, " ");
>>for (int i = 0; i < nvps.length; i++) {
>>String[] nv = StringUtils.split(nvps[i], ":");
>>if (nv.length > 1) {
>>  if (nv[0].startsWith("+")) {
>>q.add(new PayloadTermQuery(new Term(nv[0].substring(1), nv[1]),
>>  new AveragePayloadFunction(), false), Occur.MUST);
>>  } else {
>>q.add(new PayloadTermQuery(new Term(nv[0], nv[1]),
>>  new AveragePayloadFunction(), false), Occur.SHOULD);
>>  }
>>}
>>}
>>return q;
>> }
>> 
>> 
>> --
>> Bill Bell
>> billnb...@gmail.com
>> cell 720-256-8076
>> 
> 
> 
> 
> -- 
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076



Re: PayloadTermQuery deprecated

2015-10-19 Thread Alan Woodward
I opened https://issues.apache.org/jira/browse/LUCENE-6844

Alan Woodward
www.flax.co.uk


On 19 Oct 2015, at 08:49, Alan Woodward wrote:

> [Alan's reply and Bill's original message were quoted here in full -
> trimmed; they duplicate the previous messages in this thread.]



Re: PayloadTermQuery deprecated

2015-10-20 Thread Alan Woodward
I've added the includeSpanScore parameter back in for 5.4, so you might do 
better to wait until that's released.  Otherwise, the code looks correct to me.
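
With that change, the old call should map across along these lines (a sketch
against the LUCENE-6844 patch):

    // was: new PayloadTermQuery(term, new AveragePayloadFunction(), false)
    PayloadScoreQuery psq = new PayloadScoreQuery(sq, new AveragePayloadFunction(), false);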

Alan Woodward
www.flax.co.uk


On 19 Oct 2015, at 21:57, William Bell wrote:

> Alan,
> 
> Does this code look equivalent? And how do I change PayloadScoreQuery to
> use a custom Similarity?
> 
> PayloadScoreQuery psq = new PayloadScoreQuery(sq, new
> AveragePayloadFunction());
> 
> [same parse() implementation as posted earlier in this thread - trimmed]
> 
> 
> On Mon, Oct 19, 2015 at 1:49 AM, Alan Woodward  wrote:
> 
>> [Alan's reply of 19 Oct was quoted here in full - trimmed; see above.]

Re: NPE in CloudSolrClient via AbstractFullDistribZkTestBase

2015-10-23 Thread Alan Woodward
The NPE is from another server (hence being wrapped in a SolrServerException), 
so the original issue *should* be logged elsewhere - are there no errors 
earlier on in the log?

Alan Woodward
www.flax.co.uk


On 23 Oct 2015, at 12:44, Markus Jelsma wrote:

> Hi - anyone  here to shed some light on the issue?
> 
> Markus
> 
> 
> 
> -Original message-
>> From:Markus Jelsma 
>> Sent: Tuesday 20th October 2015 13:39
>> To: solr-user 
>> Subject: NPE in CloudSolrClient via AbstractFullDistribZkTestBase
>> 
>> Hi - we have some code inside a unit test, extending 
>> AbstractFullDistribZkTestBase. I am indexing thousands of documents as part 
>> of the test via getCommonCloudSolrClient(). Somewhere down the line it trips 
>> over a document. I've debugged and inspected the bad document but cannot find 
>> anything wrong with it. The thrown exception is beyond unhelpful:
>> 
>> org.apache.solr.client.solrj.SolrServerException: java.lang.NullPointerException
>>at 
>> __randomizedtesting.SeedInfo.seed([D78A66027B188E12:A85800974E3282A7]:0)
>>at 
>> org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:948)
>>at 
>> org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:799)
>>at 
>> org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135)
>>at 
>> org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:152)
>> 
>> These are the last lines including that document
>> 
>> SolrInputDocument(fields: [id=q9911555, type=query, 
>> compound_sid=nl_44da5ce2766326cc_52303206, 
>> compound_dig=nl_44da5ce2766326cc_1282013516, filter=44da5ce2766326cc, 
>> uid=3141070978, sid=52303206, dig=1282013516, time=2014-10-08T16:51:06Z, 
>> query=Omeprazol, qtime=46, lang=nl, hits=46, engine=fake])
>> [qtp350954577-74] INFO org.apache.solr.update.processor.LogUpdateProcessor - 
>> [collection1] webapp= path=/update params={wt=javabin&version=2} 
>> {add=[q9911555 (1515548831923568640)]} 0 1
>> [TEST-TestRelatedCompiler.testBasicRelations-seed#[14DC4C771346037F]] ERROR 
>> org.apache.solr.client.solrj.impl.CloudSolrClient - Request to collection 
>> collection1 failed due to (0) java.lang.NullPointerException, retry? 0
>> 
>> Any ideas?
>> 
>> Thanks,
>> Markus
>> 



Re: NPE in CloudSolrClient via AbstractFullDistribZkTestBase

2015-10-23 Thread Alan Woodward
It looks as though you're adding a null SolrInputDocument to your UpdateRequest 
somehow?  The bit that's throwing an NPE is iterating through the documents in 
order to route things correctly (UpdateRequest.java:204).

Alan Woodward
www.flax.co.uk


On 23 Oct 2015, at 13:53, Markus Jelsma wrote:

> Ah yes, i think i overlooked that one. Here it is:
> 
> org.apache.solr.client.solrj.SolrServerException: java.lang.NullPointerException
>at 
> __randomizedtesting.SeedInfo.seed([C5A84EC72B29125E:BA7A28521E031EEB]:0)
>at 
> org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:948)
>at 
> org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:799)
>at 
> org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135)
>at 
> org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:152)
>at 
> io.openindex.solr.TestCompilerBase.indexDocs(TestCompilerBase.java:264)
>at 
> io.openindex.solr.TestCompilerBase.indexRealLogs(TestCompilerBase.java:224)
>at 
> io.openindex.solr.related.TestRelatedCompiler.testBasicRelations(TestRelatedCompiler.java:42)
>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>at java.lang.reflect.Method.invoke(Method.java:497)
>at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1627)
>at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:836)
>at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:872)
>at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:886)
>at 
> org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:963)
>at 
> org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:938)
>at 
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
>at 
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
>at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
>at 
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
>at 
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
>at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
>at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
>at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
>at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:845)
>at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:747)
>at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:781)
>at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:792)
>at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>at 
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
>at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
>at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>at 
> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
>at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
>at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowi

Re: NPE in CloudSolrClient via AbstractFullDistribZkTestBase

2015-10-23 Thread Alan Woodward
No worries :-)  Actually it would probably be worth improving the error 
reporting here to throw the NPE when the documents are first added to the 
UpdateRequest - do you want to open a JIRA?

Alan Woodward
www.flax.co.uk


On 23 Oct 2015, at 17:00, Markus Jelsma wrote:

> Ah crap, indeed! A few items slipped through some checks that I thought were 
> correct. Sorry to have bothered the list with this nonsense, but I didn't 
> 'see' it any more :P
> 
> Thanks!
> Markus
> 
> 
> 
> -Original message-
>> From:Alan Woodward 
>> Sent: Friday 23rd October 2015 17:30
>> To: solr-user@lucene.apache.org
>> Subject: Re: NPE in CloudSolrClient via AbstractFullDistribZkTestBase
>> 
>> [Alan's reply and the full stack trace were quoted here - trimmed; they
>> duplicate the messages above.]

Re: CloudSolrClient query /admin/info/system

2015-10-27 Thread Alan Woodward
Hi Kevin,

This looks like a bug in CSC - could you raise an issue?
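
In the meantime, a possible workaround (untested sketch - it sidesteps
CloudSolrClient's routing by talking to one node directly):

    // Sketch: /admin/info/system is served per node, so query a concrete node
    // with a plain HttpSolrClient instead of going through CloudSolrClient.
    try (HttpSolrClient node = new HttpSolrClient("http://localhost:8983/solr")) {
      QueryRequest req = new QueryRequest(new ModifiableSolrParams());
      req.setPath("/admin/info/system");
      NamedList<Object> rsp = node.request(req);
      System.out.println(rsp.get("lucene"));   // holds solr-spec-version etc.
    }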

Alan Woodward
www.flax.co.uk


On 26 Oct 2015, at 22:21, Kevin Risden wrote:

> I am trying to use CloudSolrClient to query information about the Solr
> server including version information. I found /admin/info/system and it
> seems to provide the information I am looking for. However, it looks like
> CloudSolrClient cannot query /admin/info since INFO_HANDLER_PATH [1] is not
> part of the ADMIN_PATHS in CloudSolrClient.java [2]. Was this possibly
> missed as part of SOLR-4943 [3]?
> 
> Is this an issue or is there a better way to query this information?
> 
> As a side note, ZK_PATH also isn't listed in ADMIN_PATHS. I'm not sure what
> issues that could cause. Is there a reason that ADMIN_PATHS in
> CloudSolrClient would be different than the paths in CommonParams [1]?
> 
> [1]
> https://github.com/apache/lucene-solr/blob/trunk/solr/solrj/src/java/org/apache/solr/common/params/CommonParams.java#L168
> [2]
> https://github.com/apache/lucene-solr/blob/trunk/solr/solrj/src/java/org/apache/solr/client/solrj/impl/CloudSolrClient.java#L808
> [3] https://issues.apache.org/jira/browse/SOLR-4943
> 
> Kevin Risden
> Hadoop Tech Lead | Avalon Consulting, LLC <http://www.avalonconsult.com/>
> M: 732 213 8417
> LinkedIn <http://www.linkedin.com/company/avalon-consulting-llc> | Google+
> <http://www.google.com/+AvalonConsultingLLC> | Twitter
> <https://twitter.com/avalonconsult>



Re: Queries for many terms

2015-11-03 Thread Alan Woodward
TermsQuery works by pulling the postings lists for each term and OR-ing them 
together to create a bitset, which is very memory-efficient but means that you 
don't know at doc collection time which term has actually matched.

For your case you probably want to create a SpanOrQuery, and then iterate 
through the resulting Spans in a specialised Collector.  Depending on how many 
terms you want, though, you may end up requiring a lot of memory for the search.
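
A sketch of the span-based approach (5.x-era APIs; exact method signatures
vary between versions, and "body" is a placeholder field name):

    // Sketch: OR the terms together as spans, then walk the Spans per segment
    // and count matching positions for each document.
    List<SpanQuery> clauses = new ArrayList<>();
    for (String term : terms) {
      clauses.add(new SpanTermQuery(new Term("body", term)));
    }
    SpanOrQuery query = new SpanOrQuery(clauses.toArray(new SpanQuery[0]));

    SpanWeight weight = (SpanWeight) searcher.createNormalizedWeight(query, false);
    for (LeafReaderContext ctx : searcher.getIndexReader().leaves()) {
      Spans spans = weight.getSpans(ctx, SpanWeight.Postings.POSITIONS);
      if (spans == null) continue;
      while (spans.nextDoc() != Spans.NO_MORE_DOCS) {
        int hits = 0;
        while (spans.nextStartPosition() != Spans.NO_MORE_POSITIONS) {
          hits++;   // one per matching position; dedupe per term if required
        }
        // record (ctx.docBase + spans.docID(), hits) in your collector
      }
    }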

Alan Woodward
www.flax.co.uk


On 2 Nov 2015, at 17:14, Upayavira wrote:

> I have a scenario where I want to search for documents that contain many
> terms (maybe 100s or 1000s), and then know the number of terms that
> matched. I'm happy to implement this as a query object/parser.
> 
> I understand that Lucene isn't well suited to this scenario. Any
> suggestions as to how to make this more efficient? Does the TermsQuery
> work differently from the BooleanQuery regarding large numbers of terms?
> 
> Upayavira



Re: CloudSolrClient Connect To Zookeeper with ACL Protected files

2015-11-18 Thread Alan Woodward
At the moment it seems that it's only settable via System properties - see 
https://cwiki.apache.org/confluence/display/solr/ZooKeeper+Access+Control.  But 
it would be nice to do this programmatically as well, maybe worth opening a 
JIRA ticket?
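
For reference, the system-property route looks something like this (a sketch -
property names are per the ZooKeeper Access Control page, the credentials are
placeholders, and they must be set before the first ZK connection is made):

    // Sketch: credentials for an ACL-protected ZK, supplied via system properties.
    System.setProperty("zkCredentialsProvider",
        "org.apache.solr.common.cloud.VMParamsSingleSetCredentialsDigestZkCredentialsProvider");
    System.setProperty("zkDigestUsername", "admin-user");
    System.setProperty("zkDigestPassword", "CHANGEME");
    CloudSolrClient client = new CloudSolrClient(zkHost);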

Alan Woodward
www.flax.co.uk


On 17 Nov 2015, at 16:44, Kevin Lee wrote:

> Does anyone know if it is possible to set the ACL credentials in 
> CloudSolrClient needed to access a protected resource in Zookeeper?
> 
> Thanks!
> 
>> On Nov 13, 2015, at 1:20 PM, Kevin Lee  wrote:
>> 
>> Hi,
>> 
>> Is there a way to use CloudSolrClient and connect to a Zookeeper instance 
>> where ACL is enabled and resources/files like /live_nodes, etc are ACL 
>> protected?  Couldn’t find a way to set the ACL credentials.
>> 
>> Thanks,
>> Kevin
> 



Re: Solr Auto-Complete

2015-12-02 Thread Alan Woodward
Hi Salman,

It sounds as though you want to do a normal search against a special 'suggest' 
field, that's been indexed with edge ngrams.
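
Schematically, something like this in schema.xml (a sketch - tune the
tokenizer, gram sizes and source fields to your data):

    <fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="20"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
    <field name="suggest" type="text_suggest" indexed="true" stored="false"/>
    <copyField source="name" dest="suggest"/>

A query like q=suggest:bara against /select then matches "barack obama" and
returns the whole document, ids and all.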

Alan Woodward
www.flax.co.uk


On 2 Dec 2015, at 09:31, Salman Ansari wrote:

> Hi,
> 
> I am looking for auto-complete in Solr, but on top of the completions I also
> want to return the full data (not just suggestions) - i.e. get back the ids
> and other fields of the whole document. I tried
> the following 2 approaches but each had issues
> 
> 1) Used the /suggest component but that returns a very specific format
> which looks like I cannot customize. I want to return the whole document
> that has a matching field and not only the suggestion list. So for example,
> if I write "hard" it returns the results in a specific format as follows
> 
>   hard drive
> hard disk
> 
> Is there a way to get back additional fields with suggestions?
> 
> 2) Tried the normal /select handler, but that does not do auto-complete on a
> portion of the word. So, for example, if I write the query as "bara" it
> DOES NOT return "barack obama". Any suggestions how to solve this?
> 
> 
> Regards,
> Salman



UpdateLogs in HDFS

2015-12-02 Thread Alan Woodward
Hi all,

As a step in SOLR-8282, I'm trying to ensure that all of Solr's access to the 
data directory is mediated through the DirectoryFactory implementation.  Part of 
this is the creation of the UpdateLog, and I'm a bit confused by some of the 
logic currently in there.

The UpdateLog is created by the UpdateHandler, which has some logic in there to 
determine whether or not to use a standard log or an HDFSUpdateLog.  In 
particular, around line 117, we check to see if the update log directory begins 
with "hdfs:/", and if it does we then do a further check to see if the 
directory factory is an HDFSDirectoryFactory or not.

This seems to imply that Solr currently supports storing the update log in HDFS 
even if the actual indexes are on a normal file system.  Which seems odd, at 
the very least.  All our docs say to use HDFSDirectoryFactory if you want to 
store anything in HDFS, and there's nothing anywhere about storing the update 
logs separately from the indexes.  Is this a relic of past behaviour, or is it 
something that a) should be preserved by the refactoring I'm doing, and b) 
documented and tested?

Alan Woodward
www.flax.co.uk




Re: Solr 4.x to Solr 5 => org.noggit.JSONParser$ParseException

2015-02-23 Thread Alan Woodward
I think this means you've got an older version of noggit around.  You need 
version 0.6.
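
If you're on Maven, pinning it explicitly should do it (assuming the
org.noggit:noggit coordinates):

    <!-- Sketch: make sure the 0.6 artifact wins on the classpath -->
    <dependency>
      <groupId>org.noggit</groupId>
      <artifactId>noggit</artifactId>
      <version>0.6</version>
    </dependency>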

Alan Woodward
www.flax.co.uk


On 23 Feb 2015, at 13:00, Clemens Wyss DEV wrote:

> Just about to upgrade to Solr5. My UnitTests fail:
> 13:50:41.178 [main] ERROR org.apache.solr.core.CoreContainer - Error creating 
> core [1-de_CH]: null
> java.lang.ExceptionInInitializerError: null
>   at 
> org.apache.solr.core.SolrConfig.getConfigOverlay(SolrConfig.java:359) 
> ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
>   at org.apache.solr.core.SolrConfig.getOverlay(SolrConfig.java:808) 
> ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
>   at 
> org.apache.solr.core.SolrConfig.getSubstituteProperties(SolrConfig.java:798) 
> ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
>   at org.apache.solr.core.Config.<init>(Config.java:152) 
> ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
>   at org.apache.solr.core.Config.<init>(Config.java:92) 
> ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
>   at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:180) 
> ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
>   at 
> org.apache.solr.core.SolrConfig.readFromResourceLoader(SolrConfig.java:158) 
> ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
>   at 
> org.apache.solr.core.ConfigSetService.createSolrConfig(ConfigSetService.java:80)
>  ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
>   at 
> org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:61) 
> ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
>   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:511) 
> [solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
>   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:488) 
> [solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
>   at 
> ch.mysign.search.solr.EmbeddedSolrMode.prepareCore(EmbeddedSolrMode.java:51) 
> [target/:na]
> ...
>   at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)
>  [.cp/:na]
> Caused by: org.noggit.JSONParser$ParseException: Expected string: 
> char=u,position=2 BEFORE='{ u' AFTER='pdateHandler : { autoCo'
>   at org.noggit.JSONParser.err(JSONParser.java:223) ~[noggit.jar:na]
>   at org.noggit.JSONParser.nextEvent(JSONParser.java:671) ~[noggit.jar:na]
>   at org.noggit.ObjectBuilder.getObject(ObjectBuilder.java:123) 
> ~[noggit.jar:na]
>   at org.apache.solr.core.ConfigOverlay.<init>(ConfigOverlay.java:213) 
> ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
>   ... 56 common frames omitted
> 
> Look like the exception occurs in the ConfigOverlay static block, line 213:
> editable_prop_map =  (Map)new ObjectBuilder(new JSONParser(new StringReader(
>  MAPPING))).getObject();
> 
> What is happening?



Re: Native library of plugin is loaded for every core

2015-05-27 Thread Alan Woodward
Does it work if you load it via the solr home /lib directory, rather than from 
the /lib directory of each individual core?

Alan Woodward
www.flax.co.uk


On 27 May 2015, at 08:45, adfel70 wrote:

> Hi guys, need your help:
> I added a custom plugins to Solr, to support my applicative needs (one index
> handler and 2 search components), all of them access a native library using
> JNI. The native library wrapper class loads the library using the regular
> pattern:
> 
> public class YWrapper{
>   static{
>   System.loadLibrary("YJNI");
>   }
>   ...
> }
> 
> 
> Basically things are working great, but when I try to create another
> collection, an exception is being thrown:
> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error
> CREATing SolrCore 'anotherColl_shard1_replica1': Unable to create core
> [anotherColl_shard1_replica1] caused by: Native Library
> /...path_to_library/LibY.so already loaded in another classloader
> 
> I guess that this happens because every core has its own class loader. Is
> that right? Is there any way to define my plugin (my jar file) as a shared
> library, so it would only be loaded once when the process starts, and not on
> every core instantiation?
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Native-library-of-plugin-is-loaded-for-every-core-tp4207996.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Ability to load solrcore.properties from zookeeper

2015-05-28 Thread Alan Woodward
I think this is an oversight, rather than intentional (at least, I certainly 
didn't intend to write it like this!).  The problem here will be that 
CoreDescriptors are currently built entirely from core.properties files, and 
the CoreLocators that construct them don't have any access to zookeeper.

Maybe the way forward is to move properties out of CoreDescriptor and have an 
entirely separate CoreProperties object that is built and returned by the 
ConfigSetService, and that is read via the ResourceLoader.  This would fit in 
quite nicely with the changes I put up on SOLR-7570, in that you could have 
properties specified on the collection config overriding properties from the 
configset, and then local core-specific properties overriding both.

Do you want to open a JIRA bug, Steve?

Alan Woodward
www.flax.co.uk


On 28 May 2015, at 00:58, Chris Hostetter wrote:

> : I am attempting to override some properties in my solrconfig.xml file by
> : specifying properties in a solrcore.properties file which is uploaded in
> : Zookeeper's collections/conf directory, though when I go to create a new
> : collection those properties are never loaded. One work-around is to specify
> 
> yeah ... that's weird ... it looks like the solrcore.properties reading 
> logic goes out of its way to read from the conf/ dir of the core, rather 
> than using the SolrResourceLoader (which is ZK aware in cloud mode)
> 
> I don't understand if this is intentional or some kind of weird oversight.
> 
> The relevant method is CoreDescriptor.loadExtraProperties().  By all means 
> please open a bug about this -- and if you're feeling up to it, tackle a 
> patch: IIUC CoreDescriptor.loadExtraProperties is the relevant method ... 
> it would need to build up the path including the core name and get the 
> system level resource loader (CoreContainer.getResourceLoader()) to access 
> it, since the core doesn't exist yet so there is no core-level 
> ResourceLoader to use.
> 
> Hopefully some folks who are more recently familiar with the core loading 
> logic (like Alan & Erick) will see the Jira and can chime in as to whether 
> there is some fundamental reason it has to work the way it does now, or if 
> this bug can be fixed.
> 
> 
> : easy way of updating those properties cluster-wide, I did attempt to
> : specify a request parameter of 'property.properties=solrcore.properties' in
> : the collection creation request but that also fails.
> 
> yeah, looks like regardless of the filename, that method loads it the same 
> way.
> 
> 
> -Hoss
> http://www.lucidworks.com/



Re: Ability to load solrcore.properties from zookeeper

2015-05-29 Thread Alan Woodward
Yeah, you could do it like that.  But looking at it further, I think 
solrcore.properties is actually being loaded in entirely the wrong place - it 
should be done by whatever is creating the CoreDescriptor, and then passed in 
as a Properties object to the CD constructor.  At the moment, you can't refer 
to a property defined in solrcore.properties within your core.properties file.

I'll open a JIRA if Steve hasn't already done so

Alan Woodward
www.flax.co.uk


On 28 May 2015, at 17:57, Chris Hostetter wrote:

> 
> : certainly didn't intend to write it like this!).  The problem here will 
> : be that CoreDescriptors are currently built entirely from 
> : core.properties files, and the CoreLocators that construct them don't 
> : have any access to zookeeper.
> 
> But they do have access to the CoreContainer which is passed to the 
> CoreDescriptor constructor -- it has all the ZK access you'd need at the 
> time when loadExtraProperties() is called.
> 
> correct?
> 
> as fleshed out in my last email...
> 
: > patch:  IIUC CoreDescriptor.loadExtraProperties is the relevant method ... 
: > it would need to build up the path including the core name and get the 
: > system level resource loader (CoreContainer.getResourceLoader()) to access 
: > it, since the core doesn't exist yet so there is no core-level 
: > ResourceLoader to use.
> 
> 
> -Hoss
> http://www.lucidworks.com/



Re: http request to MiniSolrCloudCluster

2016-05-12 Thread Alan Woodward
Hi Rohana,

What error messages do you get from curl?  MiniSolrCloudCluster just runs 
jetty, so you ought to be able to talk to it over HTTP.
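
One thing to check: the cluster picks its jetty ports dynamically, so ask it
for the real base URL rather than hardcoding one (a sketch - 'cluster' is
your MiniSolrCloudCluster instance):

    // Sketch: derive the URL to curl from the running cluster itself.
    JettySolrRunner jetty = cluster.getJettySolrRunners().get(0);
    String baseUrl = jetty.getBaseUrl().toString();   // e.g. http://127.0.0.1:PORT/solr
    System.out.println("try: curl " + baseUrl + "/minicluster/select?q=*:*");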

Alan Woodward
www.flax.co.uk


On 12 May 2016, at 09:36, Rohana Rajapakse wrote:

> Hi,
> 
> Is it possible to make http requests (e.g. from cURL) to an active/running  
> MiniSolrCloudCluster?
> One of my existing projects use http requests to an EmbeddedSolrServer. Now I 
> am migrating to Solr-6/7 and trying to use MiniSolrCloudCluster. I have got a 
> MiniSolrCloudCluster up and running, but existing requests fail to talk to 
> my MiniSolrCloudCluster  using the url http://127.0.0.1:6028/solr/minicluster.
> Even the ping requests to this MiniSolrCloudCluster fails: 
> http://127.0.0.1:6028/solr/minicluster/admin/ping?wt=json&distrib=true&indent=true
> 
> Can someone please shed some light on this please?
> 
> Rohana
> 



Re: http request to MiniSolrCloudCluster

2016-05-12 Thread Alan Woodward
Are you sure that the cluster is running properly?  Probably worth checking its 
logs to make sure Solr has started correctly?

Alan Woodward
www.flax.co.uk


On 12 May 2016, at 12:48, Rohana Rajapakse wrote:

> Wait.
> With correct port, curl says : "curl: (52) Empty reply from server"
> 
> 
> -Original Message-----
> From: Alan Woodward [mailto:a...@flax.co.uk] 
> Sent: 12 May 2016 11:35
> To: solr-user@lucene.apache.org
> Subject: Re: http request to MiniSolrCloudCluster
> 
> Hi Rohana,
> 
> What error messages do you get from curl?  MiniSolrCloudCluster just runs 
> jetty, so you ought to be able to talk to it over HTTP.
> 
> Alan Woodward
> www.flax.co.uk
> 
> 
> On 12 May 2016, at 09:36, Rohana Rajapakse wrote:
> 
>> [Rohana's original message (and the corporate disclaimer) were quoted here
>> in full - trimmed; see the previous message above.]



Re: SolrCloud: A previous ephemeral live node still exists

2016-08-31 Thread Alan Woodward
It looks as though all four nodes are trying to register with ZK using the same 
hostname and port number - possibly they're all connecting as 'localhost'?
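
The usual fix is to make each node advertise its own address, either via
SOLR_HOST in solr.in.sh or the host setting in solr.xml - a sketch of the
relevant solr.xml section (the 172.28.128.x address is a placeholder for each
VM's private-network IP):

    <solrcloud>
      <!-- advertise the VM's private-network address, not localhost -->
      <str name="host">${host:172.28.128.4}</str>
      <int name="hostPort">${jetty.port:8983}</int>
      <str name="hostContext">${hostContext:solr}</str>
    </solrcloud>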

Alan Woodward
www.flax.co.uk


On 31 Aug 2016, at 09:34, Chris Rogers wrote:

> Just pinging this again as I sent it late last night. Would be great if 
> someone could help with this. It's got me totally stumped...
> 
> 
> Chris Rogers
> Digital Projects Manager
> Bodleian Digital Library Systems and Services
> chris.rog...@bodleian.ox.ac.uk
> 
> 
> From: Chris Rogers [chris.rog...@bodleian.ox.ac.uk]
> Sent: 30 August 2016 21:34
> To: solr-user@lucene.apache.org
> Subject: SolrCloud: A previous ephemeral live node still exists
> 
> Hi all,
> 
> I'm trying to create a SolrCloud setup with Vagrant boxes using Solr 6.2.0 
> and Zookeeper 3.4.8
> 
> I managed to get this to work perfectly with Solr 6.1.0, but I'm not able to 
> start more than one node in cloud mode with Solr 6.2.0.
> 
> I have four VMs connected on a private network of Vagrant boxes, one running 
> Zookeeper, and three Solr nodes.
> 
> The first Solr node connects to Zookeeper as expected. I can access the admin 
> and see the Cloud info there.
> 
> But when I try and connect with a second node, something goes wrong.
> 
> I get the following error in the Solr log:
> 
> 
> 2016-08-30 20:12:12.292 INFO  (main) [   ] o.e.j.u.log Logging initialized 
> @490ms
> 
> 2016-08-30 20:12:12.551 INFO  (main) [   ] o.e.j.s.Server 
> jetty-9.3.8.v20160314
> 
> 2016-08-30 20:12:12.574 INFO  (main) [   ] o.e.j.d.p.ScanningAppProvider 
> Deployment monitor [file:///home/vagrant/solr-6.2.0/server/contexts/] at 
> interval 0
> 
> 2016-08-30 20:12:12.906 INFO  (main) [   ] 
> o.e.j.w.StandardDescriptorProcessor NO JSP Support for /solr, did not find 
> org.apache.jasper.servlet.JspServlet
> 
> 2016-08-30 20:12:12.923 WARN  (main) [   ] o.e.j.s.SecurityHandler 
> ServletContext@o.e.j.w.WebAppContext@5383967b{/solr,file:///home/vagrant/solr-6.2.0/server/solr-webapp/webapp/,STARTING}{/home/vagrant/solr-6.2.0/server/solr-webapp/webapp}
>  has uncovered http methods for path: /
> 
> 2016-08-30 20:12:12.936 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter 
> SolrDispatchFilter.init(): WebAppClassLoader=1465085305@57536d79
> 
> 2016-08-30 20:12:12.962 INFO  (main) [   ] o.a.s.c.SolrResourceLoader JNDI 
> not configured for solr (NoInitialContextEx)
> 
> 2016-08-30 20:12:12.962 INFO  (main) [   ] o.a.s.c.SolrResourceLoader using 
> system property solr.solr.home: /home/vagrant/solr-6.2.0/server/solr
> 
> 2016-08-30 20:12:12.963 INFO  (main) [   ] o.a.s.c.SolrResourceLoader new 
> SolrResourceLoader for directory: '/home/vagrant/solr-6.2.0/server/solr'
> 
> 2016-08-30 20:12:12.963 INFO  (main) [   ] o.a.s.c.SolrResourceLoader JNDI 
> not configured for solr (NoInitialContextEx)
> 
> 2016-08-30 20:12:12.963 INFO  (main) [   ] o.a.s.c.SolrResourceLoader using 
> system property solr.solr.home: /home/vagrant/solr-6.2.0/server/solr
> 
> 2016-08-30 20:12:12.987 INFO  (main) [   ] o.a.s.c.c.SolrZkClient Using 
> default ZkCredentialsProvider
> 
> 2016-08-30 20:12:13.017 INFO  (main) [   ] o.a.s.c.c.ConnectionManager 
> Waiting for client to connect to ZooKeeper
> 
> 2016-08-30 20:12:13.118 INFO  (zkCallback-1-thread-1) [   ] 
> o.a.s.c.c.ConnectionManager Watcher 
> org.apache.solr.common.cloud.ConnectionManager@25b8acce 
> name:ZooKeeperConnection Watcher:172.28.128.3:2181 got event WatchedEvent 
> state:SyncConnected type:None path:null path:null type:None
> 
> 2016-08-30 20:12:13.119 INFO  (main) [   ] o.a.s.c.c.ConnectionManager Client 
> is connected to ZooKeeper
> 
> 2016-08-30 20:12:13.119 INFO  (main) [   ] o.a.s.c.c.SolrZkClient Using 
> default ZkACLProvider
> 
> 2016-08-30 20:12:13.130 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter Loading 
> solr.xml from SolrHome (not found in ZooKeeper)
> 
> 2016-08-30 20:12:13.133 INFO  (main) [   ] o.a.s.c.SolrXmlConfig Loading 
> container configuration from /home/vagrant/solr-6.2.0/server/solr/solr.xml
> 
> 2016-08-30 20:12:13.233 INFO  (main) [   ] o.a.s.c.CorePropertiesLocator 
> Config-defined core root directory: /home/vagrant/solr-6.2.0/server/solr
> 
> 2016-08-30 20:12:13.264 INFO  (main) [   ] o.a.s.c.CoreContainer New 
> CoreContainer 182259421
> 
> 2016-08-30 20:12:13.265 INFO  (main) [   ] o.a.s.c.CoreContainer Loading 
> cores into CoreContainer [instanceDir=/home/vagrant/solr-6.2.0/server/solr]
> 
> 2016-08-30 20:12:13.266 WARN  (main) [   ] o.a.s.c.CoreContainer Couldn't add 
> files from /home/vagrant/solr-6.2.0/server/solr/lib to classpath: 
> /hom

Re: Feedback on Match Query Parser (for fixing multiterm synonyms and other things)

2016-09-02 Thread Alan Woodward
This looks very useful!  It would be nice if you could also query multiple 
fields at the same time, to give more edismax-like functionality.  In fact, you 
could probably extend this slightly to almost entirely replace edismax, by 
allowing multiple fields and multiple analysis paths.

Alan Woodward
www.flax.co.uk


> On 2 Sep 2016, at 01:45, Doug Turnbull  
> wrote:
> 
> I wanted to solicit feedback on my query parser, the match query parser (
> https://github.com/o19s/match-query-parser). It's a work in progress, so
> any thoughts from the community would be welcome.
> 
> The point of this query parser is that it's not a query parser!
> 
> Instead, it's a way of selecting any analyzer to apply to the query string. I
> use it for all kinds of things, finely controlling a bigram phrase search,
> searching with stemmed vs exact variants of the query.
> 
> But it's biggest value to me is as a fix for multiterm synonyms. Because
> I'm not giving the user's query to any underlying query parser -- I'm
> always just doing analysis. So I know my selected analyzer will not be
> disrupted by whitespace-based query parsing prior to query analysis.
> 
> Those of you also in the Elasticsearch community may be familiar with the
> match query (
> https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html
> ). This is similar, except it also lets you select whether to turn the
> resulting tokens into a term query body:(sea\ biscuit likes to fish) or a
> phrase query body:"sea biscuit" likes to fish. See the examples above for
> more.
> 
> It's also similar to Solr's field query parser. However, the field query
> parser tries to turn the fully analyzed token stream into a phrase query.
> Moreover, the field query parser can only select the field's own query-time
> analyzer, while the match query parser lets you select an arbitrary
> analyzer. So match has more bells and whistles and acts as a complement to
> the field qp.
> 
> Thanks for any thoughts, feedback, or critiques
> 
> Best,
> -Doug



Re: Wrong highlighting in stripped HTML field

2016-09-08 Thread Alan Woodward
Hi, see https://issues.apache.org/jira/browse/SOLR-4686 
<https://issues.apache.org/jira/browse/SOLR-4686> - this is an ongoing point of 
contention!

Alan Woodward
www.flax.co.uk


> On 8 Sep 2016, at 09:38, Duck Geraint (ext) GBJH  
> wrote:
> 
> As far as I can tell, that is how it's currently set up (it does the same on 
> mine at least). The HTML stripper seems to exclude the preceding tag but 
> include the following tag when it generates the start and end offsets of each 
> text token. I couldn't say why, though... (it may just avoid the need to 
> backtrack).
> 
> Play around in the analysis section of the admin ui to verify this.
> 
> Geraint
> 
> 
> -Original Message-
> From: Neumann, Dennis [mailto:neum...@sub.uni-goettingen.de]
> Sent: 07 September 2016 18:16
> To: solr-user@lucene.apache.org
> Subject: AW: Wrong highlighting in stripped HTML field
> 
> Hello,
> can anyone confirm this behavior of the highlighter? Otherwise my Solr 
> installation might be misconfigured or something.
> Or does anyone know if this is a known issue? In that case I probably should 
> ask on the dev mailing list.
> 
> Thanks and cheers,
> Dennis
> 
> 
> 
> Von: Neumann, Dennis [neum...@sub.uni-goettingen.de]
> Gesendet: Montag, 5. September 2016 18:00
> An: solr-user@lucene.apache.org
> Betreff: Wrong highlighting in stripped HTML field
> 
> Hi guys
> 
> I am having a problem with the standard highlighter. I'm working with Solr 
> 5.4.1. The problem appears in my project, but it is easy to replicate:
> 
> I create a new core with the conf directory from configsets/basic_configs, so 
> everything is set to defaults. I add the following in schema.xml:
> 
> 
> required="false" multiValued="false" />
> 
>
>  
>
>
>  
>  
>
>  
>
> 
> 
> Now I add this document (in the admin interface):
> 
> {"id":"1","testfield":"bla"}
> 
> I search for: testfield:bla
> with hl=on&hl.fl=testfield
> 
> What I get is a response with an incorrectly formatted HTML snippet:
> 
> 
>  "response": {
>"numFound": 1,
>"start": 0,
>"docs": [
>  {
>"id": "1",
>"testfield": "bla",
>"_version_": 1544645963570741200
>  }
>]
>  },
>  "highlighting": {
>"1": {
>  "testfield": [
>"bla"
>  ]
>}
>  }
> 
> Is there a way to tell the highlighter to just enclose the "bla"? I.e. I 
> want to get
> 
> bla
> 
> 
> Best regards
> Dennis
> 
> 
> 
> 
> 
> Syngenta Limited, Registered in England No 2710846; Registered Office : 
> Syngenta, Jealott's Hill International Research Centre, Bracknell, Berkshire, 
> RG42 6EY, United Kingdom
> 
> This message may contain confidential information. If you are not the 
> designated recipient, please notify the sender immediately, and delete the 
> original and any copies. Any use of the message by you is prohibited.



Re: EmbeddedSolrServer and Core dataDir in Solr 6.x

2016-10-03 Thread Alan Woodward
This should work:

SolrCore solrCore 
= coreContainer.create(coreName, Paths.get(coreHome).resolve(coreName), 
Collections.emptyMap());


Alan Woodward
www.flax.co.uk


> On 3 Oct 2016, at 18:41, Bryan Bende  wrote:
> 
> Curious if anyone knows how to create an EmbeddedSolrServer in Solr 6.x,
> with a core where the dataDir is located somewhere outside of where the
> config is located.
> 
> I'd like to do this without system properties, and all through Java code.
> 
> In Solr 5.x I was able to do this with the following code:
> 
> CoreContainer coreContainer = new CoreContainer(solrHome);
> coreContainer.load();
> 
> Properties props = new Properties();
> props.setProperty("dataDir", dataDir + "/" + coreName);
> 
> CoreDescriptor descriptor = new CoreDescriptor(coreContainer, coreName,
> new File(coreHome, coreName).getAbsolutePath(), props);
> 
> SolrCore solrCore = coreContainer.create(descriptor);
> new EmbeddedSolrServer(coreContainer, coreName);
> 
> 
> The CoreContainer API changed a bit in 6.x and you can no longer pass in a
> descriptor. I've tried a couple of things with the current API, but haven't
> been able to get it working.
> 
> Any ideas are appreciated.
> 
> Thanks,
> 
> Bryan



Re: EmbeddedSolrServer and Core dataDir in Solr 6.x

2016-10-03 Thread Alan Woodward
Ah, I see what you mean.  Putting the dataDir property into the Map certainly 
ought to work - can you write a test case that shows what’s happening?
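
For reference, the 6.x equivalent of your 5.x snippet ought to be roughly this 
(an untested sketch; CoreDescriptor.CORE_DATADIR is just the string "dataDir"):

Map<String, String> props = new HashMap<>();
props.put(CoreDescriptor.CORE_DATADIR, dataDir + "/" + coreName);

SolrCore solrCore = coreContainer.create(coreName,
    Paths.get(coreHome).resolve(coreName), props);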

Alan Woodward
www.flax.co.uk


> On 3 Oct 2016, at 23:50, Bryan Bende  wrote:
> 
> Alan,
> 
> Thanks for the response. I will double-check, but I believe that is going
> to put the data directory for the core under coreHome/coreName.
> 
> What I am trying to setup (and did a poor job of explaining) is something
> like the following...
> 
> - Solr home in src/test/resources/solr
> - Core home in src/test/resources/myCore
> - dataDir for the myCore in target/myCore (or something not in the source
> tree).
> 
> This way the unit tests can use the Solr home and core config that is under
> version control, but the data from testing would be written somewhere not
> under version control.
> 
> in 5.x I was specifying the dataDir through the properties object... I
> would calculate the path to the target dir in Java code relative to the
> class file, and then pass that as dataDir to the following:
> 
> Properties props = new Properties();
> props.setProperty("dataDir", dataDir + "/" + coreName);
> 
> In 6.x it seems like Properties has been replaced with a Map<String, String>?
> I tried putting dataDir in there, but it didn't seem to do anything.
> 
> For now I have just been using RAMDirectoryFactory so that no data ever
> gets written to disk.
> 
> I'll keep trying different things, but if you have any thoughts let me know.
> 
> Thanks,
> 
> Bryan
> 
> 
> On Mon, Oct 3, 2016 at 2:07 PM, Alan Woodward  wrote:
> 
>> This should work:
>> 
>> SolrCore solrCore
>>= coreContainer.create(coreName, 
>> Paths.get(coreHome).resolve(coreName),
>> Collections.emptyMap());
>> 
>> 
>> Alan Woodward
>> www.flax.co.uk
>> 
>> 
>>> On 3 Oct 2016, at 18:41, Bryan Bende  wrote:
>>> 
>>> Curious if anyone knows how to create an EmbeddedSolrServer in Solr 6.x,
>>> with a core where the dataDir is located somewhere outside of where the
>>> config is located.
>>> 
>>> I'd like to do this without system properties, and all through Java code.
>>> 
>>> In Solr 5.x I was able to do this with the following code:
>>> 
>>> CoreContainer coreContainer = new CoreContainer(solrHome);
>>> coreContainer.load();
>>> 
>>> Properties props = new Properties();
>>> props.setProperty("dataDir", dataDir + "/" + coreName);
>>> 
>>> CoreDescriptor descriptor = new CoreDescriptor(coreContainer, coreName,
>>> new File(coreHome, coreName).getAbsolutePath(), props);
>>> 
>>> SolrCore solrCore = coreContainer.create(descriptor);
>>> new EmbeddedSolrServer(coreContainer, coreName);
>>> 
>>> 
>>> The CoreContainer API changed a bit in 6.x and you can no longer pass in
>> a
>>> descriptor. I've tried a couple of things with the current API, but
>> haven't
>>> been able to get it working.
>>> 
>>> Any ideas are appreciated.
>>> 
>>> Thanks,
>>> 
>>> Bryan
>> 
>> 



Re: bash to get doc count

2016-10-05 Thread Alan Woodward
tr -d ‘0-9’ is removing all numbers from the line, which I’m guessing is the 
opposite of what you want?
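
Deleting the complement instead should get you the count, e.g. (untested, and 
assuming the grepped line contains no digits other than the count itself):

DOC_COUNT=`wget -O- -q $SOLR_HOST'admin/cores?action=STATUS&core='$SOLR_CORE_NAME'&wt=json&indent=true' | grep numDocs | tr -cd '0-9'`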

Alan Woodward
www.flax.co.uk


> On 5 Oct 2016, at 20:17, KRIS MUSSHORN  wrote:
> 
> Will someone please tell me why this stores the text "numDocs" instead of 
> returning the number of docs in the core? 
> 
> #!/bin/bash 
> DOC_COUNT=`wget -O- -q 
> $SOLR_HOST'admin/cores?action=STATUS&core='$SOLR_CORE_NAME'&wt=json&indent=true'
>  | grep numDocs | tr -d '0-9'` 
> 
> TIA 
> 
> Kris 



Re: org.apache.lucene.index.CheckIndex throws Illegal initial capacity: -16777216

2017-06-17 Thread Alan Woodward
Solr/Lucene 6 can’t read 4.6 index files, only 5.x ones.  So you’ll need to 
upgrade from 4.6 to 5.x using the upgrade tool from the latest 5.x release, 
then from 5.x to 6 using the current upgrade tool.
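
The 5.x leg looks roughly like this (paths illustrative; -delete-prior-commits 
is needed if the index holds more than one commit point):

java -cp lucene-core-5.5.4.jar:lucene-backward-codecs-5.5.4.jar \
  org.apache.lucene.index.IndexUpgrader -delete-prior-commits ./[PATH-TO-INDEX]/data/index

and then the same again with the 6.x jars.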

Alan Woodward
www.flax.co.uk


> On 17 Jun 2017, at 10:08, Moritz Michael  wrote:
> 
> Hello,
> 
> I'm trying to upgrade a Solr 4.6 index to Solr 6.
> The upgrade fails with an error.
> 
> I tried to check the index with org.apache.lucene.index.CheckIndex using
> this command:
> java -cp lucene-core-5.5.4.jar:lucene-backward-codecs-5.5.4.jar
> -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex
> ./[PATH-TO-INDEX]/data/index
> 
> The check fails with this error:
> 
> Opening index @ ./[PATH-TO-INDEX]/data/index
>> 
>> ERROR: could not read any segments file in directory
>> java.lang.IllegalArgumentException: Illegal initial capacity: -16777216
>>at java.util.HashMap.(HashMap.java:448)
>>at java.util.HashMap.(HashMap.java:467)
>>at
>> org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:393)
>>at
>> org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:488)
>>at org.apache.lucene.index.CheckIndex.doCheck(CheckIndex.java:2407)
>>at org.apache.lucene.index.CheckIndex.doMain(CheckIndex.java:2309)
>>at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2235)
>> 
> 
> I tried this with Cygwin and the Windows 10 Ubuntu subsystem with the same
> result.
> 
> Any ideas?
> 
> Best
> Moritz



Re: Moving to Point, trouble with IntPoint.newRangeQuery()

2017-09-26 Thread Alan Woodward
The Points queries use a completely different data structure to the previous 
range queries, so you can’t just use them interchangeably, you have to reindex 
your data.  I’m guessing your ‘d1’ field here is a TrieIntField or similar?
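
To index actual points, the field needs a Point-based type in the schema, e.g. 
something like this (a sketch; solr.IntPointField assumes Solr 6.5 or later), 
followed by a full reindex:

<fieldType name="pint" class="solr.IntPointField" docValues="true"/>
<field name="d1" type="pint" indexed="true" stored="true"/>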

Alan Woodward
www.flax.co.uk


> On 26 Sep 2017, at 12:22, Markus Jelsma  wrote:
> 
> Hello,
> 
> I have a QParser impl. that transforms text input into one or more integers 
> and builds a BooleanQuery on a field with all the integers OR'ed together. It 
> used to work by transforming each integer using 
> LegacyNumericUtils.intToPrefixCoded to get a BytesRef.
> 
> I have now moved it to use IntPoint.newRangeQuery(field, integer, integer), 
> which I read (in the javadocs, I think) is the way to go, but I get no matches!
> 
>Iterator i = digests.iterator();
>while (i.hasNext()) {
>  Integer digest = i.next();
>  queryBuilder.add(IntPoint.newRangeQuery(field, digest, digest), 
> Occur.SHOULD);
>}
>return queryBuilder.build();
> 
> To be sure I didn't mess up elsewhere, I also tried building a string for 
> LuceneQParser as a cheat:
> 
>Iterator i = digests.iterator();
>while (i.hasNext()) {
>  Integer digest = i.next();
>  str.append(ClientUtils.escapeQueryChars(digest.toString()));
>  if (i.hasNext()) {
>str.append(" OR ");
>  }
>}
>QParser luceneQParser = new LuceneQParser(str.append(")").toString(), 
> localParams, params, req);
>return luceneQParser.parse();
> 
> Well, this works! This is their respective debug output:
> 
> Using the IntPoint range query:
> 
> 
> 
> 
>  {!q  f=d1}value
>  {!q  f=d1}value
>  (d1:[-1820898630 TO -1820898630])
>  d1:[-1820898630 TO -1820898630]
> 
> The LuceneQParser cheat, however, does find it!
> 
> 
>  
>1
>-1820898630
> 
> 
>  {!qd f=d1}value
>  {!qd f=d1}value
>  d1:-1820898630
> 
> There is not much difference in the output and it looks fine; with 
> LuceneQParser you can also match using a range query. So what am I doing wrong?
> 
> Many thanks!
> Markus



Re: problem executing a query using lucene directly

2016-12-22 Thread Alan Woodward
Hi, 

FieldValueQuery reports matches using docvalues, and it looks like they’re not 
enabled on that field.
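
Enabling them is a schema change plus a full reindex; for a string field it 
looks something like this (a sketch, keeping the attributes from your schema 
line):

<field name="table" type="string" indexed="true" stored="true" required="true" multiValued="false" docValues="true"/>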

Alan Woodward
www.flax.co.uk


> On 22 Dec 2016, at 16:21, Roxana Danger  
> wrote:
> 
> Hi all,
> 
> I have created an index using solr. I am trying to execute the following
> code, but I get zero results in the count.
> 
> DirectoryReader dr = DirectoryReader.open(FSDirectory.open(new
> File(indexDir).toPath()));
> IndexSearcher searcher = new IndexSearcher( dr );
> 
> System.out.println(dr.maxDoc()); // Shows 200
> Query query = new FieldValueQuery("table");
> CollectionStatistics stats = searcher.collectionStatistics("table");
> System.out.println(stats.docCount()); // Shows 200
> 
> System.out.println(searcher.count(query)); //Shows 0, should be 200
> 
> The definition of the table filed in the schema.xml is:
> 
>  required="true" multiValued="false"/>
> 
> 
> Any idea, why this could be happening? Why the search with the
> FieldValueQuery is not returning the correct result?
> 
> Thank you very much in advance.
> 



Re: problem executing a query using lucene directly

2016-12-22 Thread Alan Woodward
Solr wraps its IndexReader in an UninvertingReader, which builds doc-values 
structures in memory if required.  If you include the solr jar file on your 
classpath, you should be able to use UninvertingReader.wrap() to do something 
similar.
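
Roughly like this (a sketch: Type.SORTED assumes a single-valued string field, 
and the UninvertingReader package has moved between lucene-misc and solr-core 
across versions, so adjust the import to match yours):

Map<String, UninvertingReader.Type> mapping = new HashMap<>();
mapping.put("table", UninvertingReader.Type.SORTED);

DirectoryReader wrapped = UninvertingReader.wrap(dr, mapping);
IndexSearcher searcher = new IndexSearcher(wrapped);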

Alan Woodward
www.flax.co.uk


> On 22 Dec 2016, at 17:58, Roxana Danger  
> wrote:
> 
> Hi Alan,
> thank you very much, but I am not sure if this is the reason.
> 
> But if I use the SolrIndexSearcher, FieldValueQuery works well, using the same
> index.
> If SolrIndexSearcher enables this feature, how does it do it?
> 
> Thank you again!
> 
> 
> 
> 
> On 22 December 2016 at 17:34, Alan Woodward  wrote:
> 
>> Hi,
>> 
>> FieldValueQuery reports matches using docvalues, and it looks like they’re
>> not enabled on that field.
>> 
>> Alan Woodward
>> www.flax.co.uk
>> 
>> 
>>> On 22 Dec 2016, at 16:21, Roxana Danger 
>> wrote:
>>> 
>>> Hi all,
>>> 
>>> I have created an index using solr. I am trying to execute the following
>>> code, but I get zero results in the count.
>>> 
>>> DirectoryReader dr = DirectoryReader.open(FSDirectory.open(new
>>> File(indexDir).toPath()));
>>> IndexSearcher searcher = new IndexSearcher( dr );
>>> 
>>> System.out.println(dr.maxDoc()); // Shows 200
>>> Query query = new FieldValueQuery("table");
>>> CollectionStatistics stats = searcher.collectionStatistics("table");
>>> System.out.println(stats.docCount()); // Shows 200
>>> 
>>> System.out.println(searcher.count(query)); //Shows 0, should be 200
>>> 
>>> The definition of the table filed in the schema.xml is:
>>> 
>>> >> required="true" multiValued="false"/>
>>> 
>>> 
>>> Any idea, why this could be happening? Why the search with the
>>> FieldValueQuery is not returning the correct result?
>>> 
>>> Thank you very much in advance.
>>> 
>> 
>> 
> 
> 
> -- 
> Roxana Danger | Senior Data Scientist
> Dragon Court, 27-29 Macklin Street, London, WC2B 5LX



Re: CDCR logging is Needlessly verbose, fills up the file system fast

2017-01-03 Thread Alan Woodward
It’s org.apache.solr.core.SolrCore.Request - not an actual class.
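
So you can target that logger name directly in log4j.properties to quieten just 
these lines, e.g.:

log4j.logger.org.apache.solr.core.SolrCore.Request=WARN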

Alan Woodward
www.flax.co.uk


> On 3 Jan 2017, at 16:08, Webster Homer  wrote:
> 
> I am working on changing the log rotation, but looking at the message:
> 
> 2016-12-21 23:24:41.653 INFO  (qtp110456297-18) [c:sial-catalog-material
> s:shard1 r:core_node1 x:sial-catalog-material_shard1_replica1]
> o.a.s.c.S.Request [sial-catalog-material_shard1_replica1]  webapp=/solr
> path=/cdcr params={qt=/cdcr&action=BOOTSTRAP_STATUS&wt=javabin&version=2}
> status=0 QTime=0
> 
> I can't tell which class is generating it
> 
> On Thu, Dec 29, 2016 at 3:15 PM, Erick Erickson 
> wrote:
> 
>> Seems like a bandaid would be to insure your Solr logs rotate
>> appropriately quickly.
>> 
>> That doesn't address the CDCR loging verbosity, but it might get you by.
>> 
>> You can also change the logging at the class level by appropriately
>> editing the
>> log4j properties file. Again perhaps not the best solution but one
>> that's immediately
>> available.
>> 
>> Best,
>> Erick
>> 
>> On Thu, Dec 29, 2016 at 10:37 AM, Webster Homer 
>> wrote:
>>> The logs filled up the file system and caused CDCR to fail due to a
>>> corrupted Tlog file.
>>> 
>>> On Thu, Dec 22, 2016 at 9:10 AM, Webster Homer 
>>> wrote:
>>> 
>>>> While testing CDCR I found that it is writing tons of log messages per
>>>> second. Example:
>>>> 2016-12-21 23:24:41.652 INFO  (qtp110456297-13) [c:sial-catalog-material
>>>> s:shard1 r:core_node1 x:sial-catalog-material_shard1_replica1]
>>>> o.a.s.c.S.Request [sial-catalog-material_shard1_replica1]  webapp=/solr
>>>> path=/cdcr params={qt=/cdcr&action=BOOTSTRAP_STATUS&wt=javabin&
>> version=2}
>>>> status=0 QTime=0
>>>> 2016-12-21 23:24:41.653 INFO  (qtp110456297-18) [c:sial-catalog-material
>>>> s:shard1 r:core_node1 x:sial-catalog-material_shard1_replica1]
>>>> o.a.s.c.S.Request [sial-catalog-material_shard1_replica1]  webapp=/solr
>>>> path=/cdcr params={qt=/cdcr&action=BOOTSTRAP_STATUS&wt=javabin&
>> version=2}
>>>> status=0 QTime=0
>>>> 2016-12-21 23:24:41.655 INFO  (qtp110456297-14) [c:sial-catalog-material
>>>> s:shard1 r:core_node1 x:sial-catalog-material_shard1_replica1]
>>>> o.a.s.c.S.Request [sial-catalog-material_shard1_replica1]  webapp=/solr
>>>> path=/cdcr params={qt=/cdcr&action=BOOTSTRAP_STATUS&wt=javabin&
>> version=2}
>>>> status=0 QTime=0
>>>> 2016-12-21 23:24:41.657 INFO  (qtp110456297-17) [c:sial-catalog-material
>>>> s:shard1 r:core_node1 x:sial-catalog-material_shard1_replica1]
>>>> o.a.s.c.S.Request [sial-catalog-material_shard1_replica1]  webapp=/solr
>>>> path=/cdcr params={qt=/cdcr&action=BOOTSTRAP_STATUS&wt=javabin&
>> version=2}
>>>> status=0 QTime=0
>>>> 
>>>> 
>>>> These should be DEBUG messages and NOT INFO messages. Is there a way to
>>>> selectively turn them off?  The above is from a Target collection, it is
>>>> even worse on the Source side.
>>>> 
>>>> I'd rather not change my logging level as most INFO messages are useful.
>>>> 
>>>> This is a very poor default logging level for these messages.
>>>> 
>>> 

Re: Trouble boosting a field

2017-01-14 Thread Alan Woodward
http://splainer.io/ <http://splainer.io/> from the gents at 
OpenSourceConnections is pretty good for this sort of thing, I find…

Alan Woodward
www.flax.co.uk


> On 13 Jan 2017, at 16:35, Tom Chiverton  wrote:
> 
> Well, I've tried much larger values than 8, and it still doesn't seem to do 
> the job?
> 
> For now, assume my users are searching for exact sub strings of a real title.
> 
> Tom
> 
> 
> On 13/01/17 16:22, Walter Underwood wrote:
>> I use a boost of 8 for title with no boost on the content. Both Infoseek and 
>> Inktomi settled on the 8X boost, getting there with completely different 
>> methodologies.
>> 
>> You might not want the title to completely trump the content. That causes 
>> some odd anomalies. If someone searches for “ice age 2”, do you really want 
>> every title with “2” to come before “ice age two”? Or a search for “steve 
>> jobs” to return every article with “job” or “jobs” in the title first?
>> 
>> Also, use “edismax”, not “dismax”. Dismax was obsolete in Solr 3.x, five 
>> years ago.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>> 
>>> On Jan 13, 2017, at 7:10 AM, Tom Chiverton  wrote:
>>> 
>>> I have a few hundred documents with title and content fields.
>>> 
>>> I want a match in title to trump matches in content. If I search for 
>>> "connected vehicle", then a news article that has it only in the content 
>>> shouldn't be ranked higher than the page with it in the title; that's 
>>> essentially what I want.
>>> 
>>> I have tried dismax with qf=title^2 as well as several other variants with 
>>> the standard query parser (like q="title:"foo"^2 OR content:"foo") but 
>>> documents without the search term in the title still come out before those 
>>> with the term in the title when ordered by score.
>>> 
>>> Is there something I am missing?
>>> 
>>> From the docs, something like q=title:"connected vehicle"^2 OR 
>>> content:"connected vehicle" should have worked? Even using ^100 didn't 
>>> help.
>>> 
>>> I tried with the dismax parser using
>>> 
>>>   "q": "Connected Vehicle",
>>>   "defType": "dismax",
>>>   "indent": "true",
>>>   "qf": "title^2000 content",
>>>   "pf": "pf=title^4000 content^2",
>>>   "sort": "score desc",
>>>   "wt": "json",
>>> 
>>> but that was not better. if I remove content from pf/qf then documents seem 
>>> to rank correctly.
>>> Example query and results (content omitted) : http://pastebin.com/5EhrRJP8 
>>> <http://pastebin.com/5EhrRJP8> with managed-schema 
>>> http://pastebin.com/mdraWQWE <http://pastebin.com/mdraWQWE>
>>> 
>>> -- 
>>> Tom Chiverton
>>> Lead Developer
>>> 
>> 
> 



Re: Trouble boosting a field

2017-01-16 Thread Alan Woodward
Just accessible from your browser, so if you have a machine that’s inside your 
firewall but can see the outside world then it will work.

Alan Woodward
www.flax.co.uk


> On 16 Jan 2017, at 09:47, Tom Chiverton  wrote:
> 
> Ooh, that's handy! But it needs Solr/Elasticsearch to be publicly accessible?
> 
> 
> On 14/01/17 09:23, Alan Woodward wrote:
>> http://splainer.io/ <http://splainer.io/> from the gents at 
>> OpenSourceConnections is pretty good for this sort of thing, I find…
>> 
>> Alan Woodward
>> www.flax.co.uk
>> 
>> 
>>> On 13 Jan 2017, at 16:35, Tom Chiverton  wrote:
>>> 
>>> Well, I've tried much larger values than 8, and it still doesn't seem to do 
>>> the job?
>>> 
>>> For now, assume my users are searching for exact sub strings of a real 
>>> title.
>>> 
>>> Tom
>>> 
>>> 
>>> On 13/01/17 16:22, Walter Underwood wrote:
>>>> I use a boost of 8 for title with no boost on the content. Both Infoseek 
>>>> and Inktomi settled on the 8X boost, getting there with completely 
>>>> different methodologies.
>>>> 
>>>> You might not want the title to completely trump the content. That causes 
>>>> some odd anomalies. If someone searches for “ice age 2”, do you really 
>>>> want every title with “2” to come before “ice age two”? Or a search for 
>>>> “steve jobs” to return every article with “job” or “jobs” in the title 
>>>> first?
>>>> 
>>>> Also, use “edismax”, not “dismax”. Dismax was obsolete in Solr 3.x, five 
>>>> years ago.
>>>> 
>>>> wunder
>>>> Walter Underwood
>>>> wun...@wunderwood.org
>>>> http://observer.wunderwood.org/  (my blog)
>>>> 
>>>> 
>>>>> On Jan 13, 2017, at 7:10 AM, Tom Chiverton  wrote:
>>>>> 
>>>>> I have a few hundred documents with title and content fields.
>>>>> 
>>>>> I want a match in title to trump matches in content. If I search for 
>>>>> "connected vehicle", then a news article that has it only in the content 
>>>>> shouldn't be ranked higher than the page with it in the title; that's 
>>>>> essentially what I want.
>>>>> 
>>>>> I have tried dismax with qf=title^2 as well as several other variants 
>>>>> with the standard query parser (like q="title:"foo"^2 OR content:"foo") 
>>>>> but documents without the search term in the title still come out before 
>>>>> those with the term in the title when ordered by score.
>>>>> 
>>>>> Is there something I am missing ?
>>>>> 
>>>>> From the docs, something like q=title:"connected vehicle"^2 OR 
>>>>> content:"connected vehicle" should have worked ? Even using ^100 didn't 
>>>>> help.
>>>>> 
>>>>> I tried with the dismax parser using
>>>>> 
>>>>>   "q": "Connected Vehicle",
>>>>>   "defType": "dismax",
>>>>>   "indent": "true",
>>>>>   "qf": "title^2000 content",
>>>>>   "pf": "pf=title^4000 content^2",
>>>>>   "sort": "score desc",
>>>>>   "wt": "json",
>>>>> 
>>>>> but that was not better. if I remove content from pf/qf then documents 
>>>>> seem to rank correctly.
>>>>> Example query and results (content omitted) : 
>>>>> http://pastebin.com/5EhrRJP8 <http://pastebin.com/5EhrRJP8> with 
>>>>> managed-schema http://pastebin.com/mdraWQWE <http://pastebin.com/mdraWQWE>
>>>>> 
>>>>> -- 
>>>>> Tom Chiverton
>>>>> Lead Developer
>>>>> 
> 



Re: Query.extractTerms disappeared from 5.1.0 to 5.2.0

2017-02-01 Thread Alan Woodward
Hi, extractTerms() is now on Weight rather than on Query.
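
The usual migration goes via the Weight, along these lines (a sketch for 5.2+):

Weight weight = searcher.createNormalizedWeight(query, false); // false: scores not needed
Set<Term> terms = new HashSet<>();
weight.extractTerms(terms);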

Alan

> On 1 Feb 2017, at 17:43, Max Bridgewater  wrote:
> 
> Hi,
> 
> It seems Query.extractTerms() disappeared from 5.1.0 (
> http://lucene.apache.org/core/5_1_0/core/org/apache/lucene/search/Query.html)
> to 5.2.0 (
> http://lucene.apache.org/core/5_2_0/core/org/apache/lucene/search/Query.html
> ).
> 
> However, I cannot find any comment on it in 5.2.0 release notes. Any
> recommendation on what I should use in place of that method? I am migrating
> some legacy code from Solr 4 to Solr 6.
> 
> Thanks,
> Max.



Re: Announcing Marple, a RESTful API & GUI for inspecting Lucene indexes

2017-02-27 Thread Alan Woodward
At the moment it only works with indexes accessible via the filesystem (ie via 
java.nio.Paths.get()).  But pull requests are always welcome :)

Alan Woodward
www.flax.co.uk


> On 27 Feb 2017, at 14:56, Joe Obernberger  
> wrote:
> 
> Hi Charlie - will this work with an index stored in HDFS as written by Solr 
> Cloud?
> 
> -Joe
> 
> 
> On 2/24/2017 12:24 PM, Charlie Hull wrote:
>> Hi all,
>> 
>> Very pleased to announce the first release of Marple, an open source tool 
>> for inspecting Lucene indexes. We've blogged about it here:
>> http://www.flax.co.uk/blog/2017/02/24/release-1-0-marple-lucene-index-detective/
>>  
>> which contains links to the code and released JAR.
>> 
>> This is very much a work in progress (we started it at the Lucene hackday we 
>> ran in London last autumn) so contributions, bug reports & feature requests 
>> very welcome! We'll also be talking about it at the next London Lucene/Solr 
>> Meetup on March 23rd.
>> 
>> Best
>> 
>> Charlie
> 



Context-aware suggesters in Solr

2014-03-28 Thread Alan Woodward
Hi all,

I have a few questions about the context-aware AnalyzingInfixSuggester:
- is it possible to choose a specific field for the context at runtime (say, I 
want to limit suggestions by a field that I've already faceted on), or is it 
limited to the hardcoded CONTEXTS_FIELD_NAME?
- is the context-aware functionality exposed to Solr yet?
- how difficult would it be to add similar functionality to the other 
suggesters, if say I only wanted to do prefix matching?

Thanks,

Alan Woodward
www.flax.co.uk




Re: Context-aware suggesters in Solr

2014-03-30 Thread Alan Woodward
Thanks Areek.  So looking at the code in trunk, exposing it to Solr looks to be 
pretty straightforward - just extending DocumentDictionaryFactory to take a 
'contextField' parameter as well, and passing that on to the DocumentDictionary 
constructor.  I'll give it a go!

Thanks again.

Alan Woodward
www.flax.co.uk


On 29 Mar 2014, at 22:29, Areek Zillur wrote:

> The context field can only be set at configuration-time for the
> AnalyzingInfixSuggester (FYI: CONTEXTS_FIELD_NAME refers to the field in
> Lucene index that is internally maintained by the suggester and does not
> reflect any field in user's index). The context field can be specified and
> fed into the suggester using DocumentDictionary,
> DocumentValueSourceDictionary etc, (the support for contexts in
> FileDictionary is not there yet).
> 
> The context-aware functionality is not yet exposed to Solr.
> 
> There were attempts made to make Analyzing/FuzzySuggester to be
> context-aware (LUCENE-5350; patch might be outdated), but its still not in
> trunk (see jira discussion).
> 
> Hope that helps,
> 
> Areek
> 
> 
> On Fri, Mar 28, 2014 at 3:47 AM, Alan Woodward  wrote:
> 
>> Hi all,
>> 
>> I have a few of questions about the context-aware AnalyzingInfixSuggester:
>> - is it possible to choose a specific field for the context at runtime
>> (say, I want to limit suggestions by a field that I've already faceted on),
>> or is it limited to the hardcoded CONTEXTS_FIELD_NAME?
>> - is the context-aware functionality exposed to Solr yet?
>> - how difficult would it be to add similar functionality to the other
>> suggesters, if say I only wanted to do prefix matching?
>> 
>> Thanks,
>> 
>> Alan Woodward
>> www.flax.co.uk
>> 
>> 
>> 



Re: Using ExternalFileField on SolrCloud

2014-04-22 Thread Alan Woodward
Hi Varun,

SolrCloud only uses zk to store configuration, not data.  Each solr instance 
will have its own data folder, and you'll need to sync the ExternalFileField 
file to each one.  I don't think you can do this with anything internal to Solr 
at the moment - we've had clients use basic rsync to do it, though, and that 
works well.
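
Something along these lines from cron, for example (paths and the field name 
are illustrative; the file is named external_<fieldname> and lives in each 
core's data directory):

for host in solr-node2 solr-node3; do
  rsync -av external_myfield $host:/var/solr/mycollection/data/
done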

Alan Woodward
www.flax.co.uk


On 22 Apr 2014, at 05:57, Varun Gupta wrote:

> Hi,
> 
> I am trying to use ExternalFileField on Solr 4.6 running on SolrCloud for
> the purpose of changing the document score based on a frequently changed
> field. According to the documentation, the external file needs to be
> present in the "data" folder of the collection.
> 
> I am confused over here on where should I upload the external file on
> zookeeper so that the file will end up in the "data" folder? I can see
> "/configs/" and "/collections/" in my
> zookeeper instance. Am I right in trying to propagate the external file
> using zookeeper or should I be looking into some other way to sync the file
> to all solr instances.
> 
> --
> Thanks
> Varun Gupta



Re: DIH issues with 4.7.1

2014-04-25 Thread Alan Woodward
Hi Jonathan,

It's a known bug: https://issues.apache.org/jira/browse/SOLR-5954.  It'll be 
fixed in 4.8, which is being voted on now.

Alan Woodward
www.flax.co.uk


On 25 Apr 2014, at 18:56, Hutchins, Jonathan wrote:

> I recently upgraded from 4.6.1 to 4.7.1 and have found that the DIH
> process that we are using takes 4x as long to complete.  The only odd
> thing I notice is when I enable debug logging for the dataimporthandler
> process, it appears that in the new version each sql query is resulting in
> a new connection opened through jdbcdatasource (log:
> http://pastebin.com/JKh4gpmu).  Were there any changes that would affect
> the speed of running a full import?
> 
> Thanks!
> 
> - Jonathan Hutchins
> 
> 



Re: Search for a mask that matches the requested string

2014-04-26 Thread Alan Woodward
Hi, I'm the author of luwak.  I have a half-finished version sitting in a 
branch somewhere that pulls all the intervals-fork-specific code out of the 
library and would run with 4.6.  It would need to be integrated into Solr as 
well, but I have an upcoming project which may well do just that.  Feel free to 
ping me directly!

Alan Woodward
www.flax.co.uk


On 26 Apr 2014, at 03:29, Otis Gospodnetic wrote:

> Luwak is not based on the fork of Lucene; or rather, the fork you are seeing
> is there only because the Luwak authors needed highlighting.  If you don't
> need highlighting you can probably modify Luwak a bit to use regular
> Lucene.  The Lucene fork you are seeing there will also, eventually, be
> committed to Lucene trunk and then hopefully backported to 4.x.
> 
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
> 
> 
> On Fri, Apr 25, 2014 at 6:46 PM, Muhammad Gelbana wrote:
> 
>> Luwak is based on a fork of solr\lucene which I cannot use. I have to do
>> this using solr 4.6, whether by writing extra code or not. Thanks.
>> 
>> *-*
>> *Muhammad Gelbana*
>> http://www.linkedin.com/in/mgelbana
>> 
>> 
>> On Sat, Apr 26, 2014 at 12:13 AM, Ahmet Arslan  wrote:
>> 
>>> Hi,
>>> 
>>> You don't need to write code for this. Use luwak (I gave the link in my
>>> first e-mail) instead.
>>> 
>>> If your can't get luwak running because its too complicated etc, see a
>>> similar discussion
>>> 
>>> http://find.searchhub.org/document/9411388c7d2de701#36e50082e918b10c
>>> 
>>> where diy-percolator example pointer is given. It is an example to use
>>> memory index.
>>> 
>>> Ahmet
>>> 
>>> 
>>> 
>>> On Saturday, April 26, 2014 1:05 AM, Muhammad Gelbana <
>> m.gelb...@gmail.com>
>>> wrote:
>>> @Jack, I am ready to write custom code to implement such feature but I
>>> don't know what feature in solr should I extend ? Where should I start ?
>> I
>>> believe it should be a very simple task.
>>> 
>>> @Ahmet, how can I use the class you mentioned ? Is there a tutorial for
>> it
>>> ? I'm not sure how the code in the class's description should work, I've
>>> never extended solr before.
>>> 
>>> Thank you all.
>>> 
>>> *-*
>>> *Muhammad Gelbana*
>>> http://www.linkedin.com/in/mgelbana
>>> 
>>> 
>>> 
>>> On Fri, Apr 25, 2014 at 10:38 PM, Ahmet Arslan 
>> wrote:
>>> 
>>>> 
>>>> 
>>>> Hi,
>>>> 
>>>> Your use case is different than ad hoc retrieval. Where you have set of
>>>> documents and varying queries.
>>>> 
>>>> In your case it is the reverse, you have a query (string masks) stored
>>>> A?, and incoming documents are percolated against it.
>>>> 
>>>> out of the box Solr does not have support for this today.
>>>> 
>>>> Please see :
>>>> 
>>>> 
>>>> 
>>> 
>> http://lucene.apache.org/core/4_7_2/memory/org/apache/lucene/index/memory/MemoryIndex.html
>>>> 
>>>> By the way wildcard ? matches a single character.
>>>> 
>>>> Ahmet
>>>> 
>>>> 
>>>> On Friday, April 25, 2014 11:02 PM, Muhammad Gelbana <
>>> m.gelb...@gmail.com>
>>>> wrote:
>>>> I have no idea how this can help me. I have been using solr for a few
>>>> weeks and I'm not familiar with it yet. I'm asking for a very simple task,
>>>> a way to customize how solr matches a string; does this exist in solr?
>>>> 
>>>> *-*
>>>> *Muhammad Gelbana*
>>>> http://www.linkedin.com/in/mgelbana
>>>> 
>>>> 
>>>> 
>>>> On Thu, Apr 24, 2014 at 10:09 PM, Ahmet Arslan 
>>> wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> Please see : https://github.com/flaxsearch/luwak
>>>>> 
>>>>> Ahmet
>>>>> 
>>>>> 
>>>>> On Thursday, April 24, 2014 8:40 PM, Muhammad Gelbana <
>>>> m.gelb...@gmail.com>
>>>>> wrote:
>>>>> (Please make sure you reply to my address because I didn't subscribe
>> to
>>>>> this mailing list)
>>>>> 
>>>>> I'm using Solr 4.6
>>>>> 
>>>>> I need to store string masks in Solr. By masks, I mean strings that
>> can
>>>>> match other strings.
>>>>> 
>>>>> Then I need to search for masks that match the string I'm providing
>> in
>>> my
>>>>> query. For example, assume the following single-field document stored
>>> in
>>>>> Solr:
>>>>> 
>>>>> {
>>>>>"fieldA": "__A__"
>>>>> }
>>>>> 
>>>>> I need to be able to find this document if I query the fieldA field
>>> with
>>>> a
>>>>> string like *12A34*, as the underscore "*_*" matches a single string.
>>> The
>>>>> single string matching mechanism is my strict goal here, multiple
>>> string
>>>>> matching won't be helpful.
>>>>> 
>>>>> I hope I was clear enough. Please elaborate because I'm not versatile
>>>> with
>>>>> solr and I haven't been using it for too long.
>>>>> Thank you.
>>>>> 
>>>>> *-*
>>>>> *Muhammad Gelbana*
>>>>> http://www.linkedin.com/in/mgelbana
>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>> 



Re: Contribute QParserPlugin

2014-05-28 Thread Alan Woodward
Hi Pawel,

The easiest thing to do is to open a JIRA ticket on the Solr project, here: 
https://issues.apache.org/jira/browse/SOLR, and attach your patch.

Alan Woodward
www.flax.co.uk


On 28 May 2014, at 16:50, Pawel Rog wrote:

> Hi,
> I need QParserPlugin that will use Redis as a backend to prepare filter
> queries. There are several data structures available in Redis (hash, set,
> etc.). From some reasons I cannot fetch data from redis data structures,
> build and send big requests from application. That's why I want to build
> that filters on backend (Solr) side.
> 
> I'm wondering what do I have to do to contribute QParserPlugin into Solr
> repository. Can you suggest me a way (in a few steps) to publish it in Solr
> repository, probably as a contrib?
> 
> --
> Paweł Róg



Re: Percolator feature

2014-05-29 Thread Alan Woodward
Hi,

There's https://github.com/flaxsearch/luwak, which isn't integrated into Solr 
yet, but could be added as a SearchComponent with a bit of work.  It's running 
off a lucene fork at the moment, but I cut a 4.8 branch at Berlin Buzzwords 
which I will push to github later today.

Alan Woodward
www.flax.co.uk


On 28 May 2014, at 21:44, Jorge Luis Betancourt Gonzalez wrote:

> Is there a workaround in the Solr ecosystem to get something similar to the 
> percolator feature offered by Elasticsearch? 
> 
> Greetings!



Re: Solr Replication Issue : Incorrect Base_URL

2014-06-13 Thread Alan Woodward
Hi Pramod,

You need to set hostContext in your solr.xml.  See 
https://cwiki.apache.org/confluence/display/solr/Format+of+solr.xml
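
With new-style solr.xml that's something like the following (a sketch: I'm 
assuming the context should be raptorslrweb/solr, and you should check that a 
two-segment context works in your container; in old-style solr.xml it's the 
hostContext attribute on the <cores> element instead):

<solrcloud>
  <str name="hostContext">${hostContext:raptorslrweb/solr}</str>
</solrcloud>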

Alan Woodward
www.flax.co.uk


On 13 Jun 2014, at 00:44, pramodEbay wrote:

> Hi,
> I am deploying Solr in a larger web application. The standalone solr
> instance works fine. The path-prefix I use is raptorslrweb. A standalone
> SOLR query to my instance that works is as follows:
> 
> http://hostname:8080/raptorslrweb/solr/reviews/select?q=*%3A*&wt=json&indent=true
> 
> However, when I configure a solr cloud, I get the following error in
> RecoveryStrategy:
> "msg":"org.apache.solr.client.solrj.SolrServerException: Server at
> http://hostname:8080/solr/reviews sent back a redirect (302).",
> 
> The reason is the base_url does not seem to honor the path-prefix.
> clusterstate.json shows the following for the node:
> {"reviews":{
>"shards":{"shard1":{
>"range":null,
>"state":"active",
>"parent":null,
>"replicas":{
>  "core_node1":{
>"state":"down",
>   * "base_url":"http://hostname:8080/solr",*   
>"core":"reviews",
>"node_name":"10.98.63.98:8080_solr"},
> 
> Can someone please tell me where do I tell zookeeper or solr cloud that the
> base url should be hostname:8080/raptorslrweb/solr and not
> hostname:8080/solr.
> 
> Thanks,
> Pramod
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-Replication-Issue-Incorrect-Base-URL-tp4141537.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Simple (?) zookeeper question

2013-11-01 Thread Alan Woodward
Unknown document router errors are usually caused by using different solr and 
solrj versions - which version of solr and solrj are you using?

Alan Woodward
www.flax.co.uk


On 1 Nov 2013, at 04:19, Jack Park wrote:

> After digging deeper (slow going for a *nix newbie), I uncovered issues with
> the java installation. A step in installation of Oracle Java has it
> that you -install "java" with the path to /bin/java. That done,
> zookeeper seems to be running.
> 
> I booted three cores (on the same box) -- this is the simple one-box
> 3-node cloud test, and used the test code from the Lucidworks course
> to send over and read some documents. That failed with this:
> Unknown document router '{name=compositeId}'
> 
> Lots more research.
> Closer...
> 
> On Thu, Oct 31, 2013 at 5:44 PM, Jack Park  wrote:
>> Latest zookeeper is installed on an Ubuntu server box.
>> Java is 1.7 latest build.
>> whereis points to java just fine.
>> /etc/zookeeper is empty.
>> 
>> boot zookeeper from /bin as sudo ./zkServer.sh start
>> Console says "Started"
>> /etc/zookeeper now has a .pid file
>> In another console, ./zkServer.sh status returns:
>> "It's probably not running"
>> 
>> An interesting fact: the log4j.properties file says there should be a
>> zookeeper.log file in "."; there is no log file. When I do a text
>> search in the zookeeper source code for where it picks up the
>> log4j.properties, nothing is found.
>> 
>> Fascinating, what?  This must be a common beginner's question, not
>> well covered in web-search for my context. Does it ring any bells?
>> 
>> Many thanks.
>> Jack



Re: core swap duplicates core entries in solr.xml

2013-11-09 Thread Alan Woodward
Hi Jeremy,

Could you open a JIRA ticket for this?

Thanks,

Alan Woodward
www.flax.co.uk


On 8 Nov 2013, at 21:16, Branham, Jeremy [HR] wrote:

> When performing a core swap in SOLR 4.5.1 with persistence on, the two core 
> entries that were swapped are duplicated.
> 
> Solr.xml
> 
> <solr persistent="true">
>   <cores>
>     <core instanceDir="/data/v8p/solr/root/" name="howtopolicies" dataDir="/data/v8p/solr/howtopolicies/data"/>
>     <core instanceDir="/data/v8p/solr/root/" name="wdsc" dataDir="/data/v8p/solr/wdsc/data"/>
>     <core instanceDir="/data/v8p/solr/root/" name="other" dataDir="/data/v8p/solr/other/data"/>
>     <core instanceDir="/data/v8p/solr/root/" name="psd" dataDir="/data/v8p/solr/psd/data"/>
>     <core instanceDir="/data/v8p/solr/root/" name="nat" dataDir="/data/v8p/solr/nat/data"/>
>     <core instanceDir="/data/v8p/solr/root/" name="wdsc2" dataDir="/data/v8p/solr/wdsc2/data"/>
>     <core instanceDir="/data/v8p/solr/root/" name="kms2" dataDir="/data/v8p/solr/kms/data"/>
>     <core instanceDir="/data/v8p/solr/root/" name="howtotools" dataDir="/data/v8p/solr/howtotools/data"/>
>     <core instanceDir="/data/v8p/solr/root/" name="ewts" dataDir="/data/v8p/solr/ewts/data"/>
>     <core instanceDir="/data/v8p/solr/root/" name="wdsr" dataDir="/data/v8p/solr/wdsr/data"/>
>     <core instanceDir="/data/v8p/solr/root/" name="wdsr2" dataDir="/data/v8p/solr/wdsr2/data"/>
>     <core instanceDir="/data/v8p/solr/root/" name="ce" dataDir="/data/v8p/solr/ce/data"/>
>     <core instanceDir="/data/v8p/solr/root/" name="sp2" dataDir="/data/v8p/solr/sp2/data"/>
>     <core instanceDir="/data/v8p/solr/root/" name="terms" dataDir="/data/v8p/solr/terms/data"/>
>     <core instanceDir="/data/v8p/solr/root/" name="tools" dataDir="/data/v8p/solr/tools/data"/>
>     <core instanceDir="/data/v8p/solr/root/" name="kms" dataDir="/data/v8p/solr/kms2/data"/>
>     <core instanceDir="/data/v8p/solr/root/" name="wdsp" dataDir="/data/v8p/solr/wdsp2/data"/>
>     <core instanceDir="/data/v8p/solr/root/" name="wdsp2" dataDir="/data/v8p/solr/wdsp/data"/>
>   </cores>
> </solr>
> 
> 
> 
> Performed swap -
> 
> 
> 
>  
> schema="/data/v8p/solr/root/conf/schema.xml" 
> dataDir="/data/v8p/solr/ce/data"/>
> schema="/data/v8p/solr/root/conf/schema.xml" 
> dataDir="/data/v8p/solr/ewts/data"/>
> schema="/data/v8p/solr/root/conf/schema.xml" 
> dataDir="/data/v8p/solr/howtopolicies/data"/>
> schema="/data/v8p/solr/root/conf/schema.xml" 
> dataDir="/data/v8p/solr/howtotools/data"/>
> schema="/data/v8p/solr/root/conf/schema.xml" 
> dataDir="/data/v8p/solr/kms/data"/>
> schema="/data/v8p/solr/root/conf/schema.xml" 
> dataDir="/data/v8p/solr/kms2/data"/>
> schema="/data/v8p/solr/root/conf/schema.xml" 
> dataDir="/data/v8p/solr/nat/data"/>
> schema="/data/v8p/solr/root/conf/schema.xml" 
> dataDir="/data/v8p/solr/other/data"/>
> schema="/data/v8p/solr/root/conf/schema.xml" 
> dataDir="/data/v8p/solr/psd/data"/>
> schema="/data/v8p/solr/root/conf/schema.xml" 
> dataDir="/data/v8p/solr/sp2/data"/>
> schema="/data/v8p/solr/root/conf/schema.xml" 
> dataDir="/data/v8p/solr/terms/data"/>
> schema="/data/v8p/solr/root/conf/schema.xml" 
> dataDir="/data/v8p/solr/tools/data"/>
> schema="/data/v8p/solr/root/conf/schema.xml" 
> dataDir="/data/v8p/solr/wdsc/data"/>
> schema="/data/v8p/solr/root/conf/schema.xml" 
> dataDir="/data/v8p/solr/wdsc2/data"/>
> schema="/data/v8p/solr/root/conf/schema.xml" 
> dataDir="/data/v8p/solr/wdsp2/data"/>
> schema="/data/v8p/solr/root/conf/schema.xml" 
> dataDir="/data/v8p/solr/wdsp/data"/>
> schema="/data/v8p/solr/root/conf/schema.xml" 
> dataDir="/data/v8p/solr/wdsr/data"/>
> schema="/data/v8p/solr/root/conf/schema.xml" 
> dataDir="/data/v8p/solr/wdsr2/data"/>
> schema="/data/v8p/solr/root/conf/schema.xml" 
> dataDir="/data/v8p/solr/kms/data"/>
> schema="/data/v8p/solr/root/conf/schema.xml" 
> dataDir="/data/v8p/solr/kms2/data"/>
> 
> 
> 
> 
> 
> 
> Jeremy D. Branham
> Performance Technologist II
> Sprint University Performance Support
> Fort Worth, TX | Tel: **DOTNET
> Office: +1 (972) 405-2970 | Mobile: +1 (817) 791-1627
> http://JeremyBranham.Wordpress.com<http://jeremybranham.wordpress.com/>
> http://www.linkedin.com/in/jeremybranham
> 
> 
> 
> 



Re: SolrCoreAware

2013-11-15 Thread Alan Woodward
Hi Steven,

It's called when the handler is created, either at SolrCore construction time 
(solr startup or core reload) or the first time the handler is requested if 
it's a lazy-loading handler.  
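
So a handler that needs a live core can do something like this (a sketch; the 
class name is made up, and the other RequestHandlerBase methods are elided):

public class MyHandler extends RequestHandlerBase implements SolrCoreAware {

  private SolrCore core;

  @Override
  public void inform(SolrCore core) {
    // called after init(NamedList), once the enclosing core exists
    this.core = core;
  }

  @Override
  public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
    // safe to use this.core here
  }

  // plus getDescription() etc.
}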

Alan Woodward
www.flax.co.uk


On 15 Nov 2013, at 15:40, Steven Bower wrote:

> Under what circumstances will a handler that implements SolrCoreAware have
> its inform() method called?
> 
> thanks,
> 
> steve



Re: starting up solr automatically

2013-12-05 Thread Alan Woodward
Hi Greg,

It looks as though your script below will bootstrap a collection configuration 
every time Solr is restarted, which probably isn't what you want to do?  You 
only need to upload the config once.
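
That is, upload it once with zkcli (wherever cloud-scripts lives in your 
install) and then drop the bootstrap_conf* flags from JAVA_OPTIONS:

./zkcli.sh -zkhost <your-zk>:2181 -cmd upconfig \
  -confdir /var/lib/answers/atlascloud/solr45/solr/wa-en-collection_1/conf/ \
  -confname wa-en-collection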

Alan Woodward
www.flax.co.uk


On 4 Dec 2013, at 21:26, Greg Walters wrote:

> I almost forgot, you'll need a file to setup the environment a bit too:
> 
> **
> JAVA_HOME=/usr/java/default
> JAVA_OPTIONS="-Xmx15g \
> -Xms15g \
> -XX:+PrintGCApplicationStoppedTime \
> -XX:+PrintGCDateStamps \
> -XX:+PrintGCDetails \
> -XX:+UseConcMarkSweepGC \
> -XX:+UseParNewGC \
> -XX:+UseTLAB \
> -XX:+CMSParallelRemarkEnabled \
> -XX:+CMSScavengeBeforeRemark \
> -XX:+UseCMSInitiatingOccupancyOnly \
> -XX:CMSInitiatingOccupancyFraction=50 \
> -XX:CMSWaitDuration=30 \
> -XX:GCTimeRatio=40 \
> -Xloggc:/tmp/solr45_gc.log \
> -Dbootstrap_conf=true \
> -Dbootstrap_confdir=/var/lib/answers/atlascloud/solr45/solr/wa-en-collection_1/conf/
>  \
> -Dcollection.configName=wa-en-collection \
> -DzkHost= \
> -DnumShards= \
> -Dsolr.solr.home=/var/lib/answers/atlascloud/solr45/solr/ \
> -Dlog4j.configuration=file:///var/lib/answers/atlascloud/solr45/resources/log4j.properties
>  \
> -Djetty.port=9101 \
> $JAVA_OPTIONS"
> JETTY_HOME=/var/lib/answers/atlascloud/solr45/
> JETTY_USER=tomcat
> JETTY_LOGS=/var/lib/answers/atlascloud/solr45/logs
> **
> 
> On Dec 4, 2013, at 3:21 PM, Greg Walters  wrote:
> 
>> I found the instructions and scripts on that page to be unclear and/or not 
>> work. Here's the script I've been using for solr 4.5.1: 
>> https://gist.github.com/gregwalters/7795791 Do note that you'll have to 
>> change a couple of paths to get things working correctly.
>> 
>> Thanks,
>> Greg
>> 
>> On Dec 4, 2013, at 3:15 PM, Eric Palmer  wrote:
>> 
>>> Hey all,
>>> 
>>> I'm pretty new to solr.  I'm installing it on an amazon linux (rpm based)
>>> ec2 instance and have it running. I even have nutch feeding it pages from
>>> a crawl. I'm very happy about that.
>>> 
>>> I want solr to start on a reboot and am following the instructions at
>>> http://wiki.apache.org/solr/SolrJetty#Starting
>>> 
>>> I'm using solr 4.5.1 and when I check the jetty version I get this
>>> 
>>> java -jar start.jar --version
>>> Active Options: [default, *]
>>> Version Information on 17 entries in the classpath.
>>> Note: order presented here is how they would appear on the classpath.
>>>changes to the OPTIONS=[option,option,...] command line option will
>>> be reflected here.
>>> 0:(dir) | ${jetty.home}/resources
>>> 1: 8.1.10.v20130312 | ${jetty.home}/lib/jetty-xml-8.1.10.v20130312.jar
>>> 2:  3.0.0.v201112011016 | ${jetty.home}/lib/servlet-api-3.0.jar
>>> 3: 8.1.10.v20130312 | ${jetty.home}/lib/jetty-http-8.1.10.v20130312.jar
>>> 4: 8.1.10.v20130312 |
>>> ${jetty.home}/lib/jetty-continuation-8.1.10.v20130312.jar
>>> 5: 8.1.10.v20130312 |
>>> ${jetty.home}/lib/jetty-server-8.1.10.v20130312.jar
>>> 6: 8.1.10.v20130312 |
>>> ${jetty.home}/lib/jetty-security-8.1.10.v20130312.jar
>>> 7: 8.1.10.v20130312 |
>>> ${jetty.home}/lib/jetty-servlet-8.1.10.v20130312.jar
>>> 8: 8.1.10.v20130312 |
>>> ${jetty.home}/lib/jetty-webapp-8.1.10.v20130312.jar
>>> 9: 8.1.10.v20130312 |
>>> ${jetty.home}/lib/jetty-deploy-8.1.10.v20130312.jar
>>> 10:1.6.6 | ${jetty.home}/lib/ext/jcl-over-slf4j-1.6.6.jar
>>> 11:1.6.6 | ${jetty.home}/lib/ext/jul-to-slf4j-1.6.6.jar
>>> 12:   1.2.16 | ${jetty.home}/lib/ext/log4j-1.2.16.jar
>>> 13:1.6.6 | ${jetty.home}/lib/ext/slf4j-api-1.6.6.jar
>>> 14:1.6.6 | ${jetty.home}/lib/ext/slf4j-log4j12-1.6.6.jar
>>> 15: 8.1.10.v20130312 | ${jetty.home}/lib/jetty-util-8.1.10.v20130312.jar
>>> 16: 8.1.10.v20130312 | ${jetty.home}/lib/jetty-io-8.1.10.v20130312.jar
>>> 
>>> the instructions reference a jetty.sh script for version 6 and a different
>>> one for 7. Does the version 7 one work with jetty 8? If not, where can I get
>>> the one for version 8?
>>> 
>>> BTW - this is just the standard install of solr from the gzip file.
>>> 
>>> thanks in advance for your help.
>>> 
>>> -- 
>>> Eric Palmer
>>> U of Richmond
>> 
> 



Apache Curator integration

2013-12-14 Thread Alan Woodward
Evening all,

I discovered the Apache Curator project yesterday 
(http://curator.apache.org/index.html), which seems to make interaction with 
Zookeeper much easier.  What do people think about using it for SolrCloud?  In 
particular, the LeaderLatch, Barrier and NodeCache recipes would make the 
Overseer and OverseerCollectionProcessor a lot simpler.
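
For example, leader election with LeaderLatch collapses to roughly this (the 
latch path here is illustrative):

CuratorFramework client = CuratorFrameworkFactory.newClient(
    zkHost, new ExponentialBackoffRetry(1000, 3));
client.start();

LeaderLatch latch = new LeaderLatch(client, "/overseer_elect");
latch.start();
latch.await();  // returns once this node holds leadership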

Alan Woodward
www.flax.co.uk




Re: Apache Curator integration

2013-12-14 Thread Alan Woodward

On 14 Dec 2013, at 19:58, Mark Miller wrote:

> I’ve looked at it over the years, but honestly, for most things, I don’t 
> think switching would help much other than rewrite code that has been fairly 
> hardened with fresh code that is just likely to introduce new bugs.

Well, that's what our comprehensive and robust test suite is for :-)  I see 
what you mean, although on the flip side, using a third-party library that's 
had lots of testing outside Solr means that it may already pick up corner cases 
that we haven't run into.

> 
> Targeted use of it for specific things could make sense, but that’s the type 
> of thing I think we should look at a JIRA issue at a time.

Unfortunately I think it'd be all-or-nothing, because we'd have to replace 
SolrZkClient with CuratorFramework whenever we wanted to use a recipe.

> 
> - Mark
> 
> On Dec 14, 2013, at 2:15 PM, Alan Woodward  wrote:
> 
>> Evening all,
>> 
>> I discovered the Apache Curator project yesterday 
>> (http://curator.apache.org/index.html), which seems to make interaction with 
>> Zookeeper much easier.  What do people think about using it for SolrCloud?  
>> In particular, the LeaderLatch, Barrier and NodeCache recipes would make the 
>> Overseer and OverseerCollectionProcessor a lot simpler.
>> 
>> Alan Woodward
>> www.flax.co.uk
>> 
>> 
> 



Re: core.properties and solr.xml

2014-01-14 Thread Alan Woodward
Hi Steve,

I think this is a great idea.  Currently the implementation of CoresLocator is 
picked depending on the type of solr.xml you have (new- vs old-style), but it 
should be easy enough to extend the new-style logic to optionally look up and 
instantiate a plugin implementation.

Core loading and new core creation is all done through the CL now, so as long 
as the plugin implemented all methods, it shouldn't break the Collections API 
either.
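
Something like this ought to be all a plugin needs to implement (a rough,
untested sketch - the class name and the read-only behaviour are
hypothetical, and the interface may still shift between releases):

import java.util.List;
import org.apache.solr.core.CoreContainer;
import org.apache.solr.core.CoreDescriptor;
import org.apache.solr.core.CoresLocator;

public class StaticCoresLocator implements CoresLocator {

    private final List<CoreDescriptor> cores;

    public StaticCoresLocator(List<CoreDescriptor> cores) {
        this.cores = cores;  // e.g. built from ZK instead of core.properties files
    }

    @Override
    public List<CoreDescriptor> discover(CoreContainer cc) {
        return cores;
    }

    // a read-only locator can make the mutating operations no-ops
    @Override
    public void create(CoreContainer cc, CoreDescriptor... cds) {}
    @Override
    public void persist(CoreContainer cc, CoreDescriptor... cds) {}
    @Override
    public void delete(CoreContainer cc, CoreDescriptor... cds) {}
    @Override
    public void rename(CoreContainer cc, CoreDescriptor oldCD, CoreDescriptor newCD) {}
    @Override
    public void swap(CoreContainer cc, CoreDescriptor cd1, CoreDescriptor cd2) {}
}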

Do you want to open a JIRA?

Alan Woodward
www.flax.co.uk


On 14 Jan 2014, at 19:20, Erick Erickson wrote:

> The work done as part of "new style" solr.xml, particularly by
> romsegeek should make this a lot easier. But no, there's no formal
> support for such a thing.
> 
> There's also a desire to make ZK "the one source of truth" in Solr 5,
> although that effort is in early stages.
> 
> Which is a long way of saying that I think this would be a good thing
> to add. Currently there's no formal way to specify one though. We'd
> have to give some thought as to what abstract methods are required.
> The current "old style" and "new style" classes . There's also the
> chicken-and-egg question; how does one specify the new class? This
> seems like something that would be in a (very small) solr.xml or
> specified as a sysprop. And knowing where to load the class from could
> be "interesting".
> 
> A pluggable SolrConfig I think is a stickier wicket, it hasn't been
> broken out into nice interfaces like coreslocator has been. And it's
> used all over the place, passed in and recorded in constructors etc,
> as well as being possibly unique for each core. There's been some talk
> of sharing a single config object, and there's also talk about using
> "config sets" that might address some of those concerns, but neither
> one has gotten very far in 4x land.
> 
> FWIW,
> Erick
> 
> On Tue, Jan 14, 2014 at 1:41 PM, Steven Bower  wrote:
>> Are there any plans/tickets to allow for pluggable SolrConf and
>> CoreLocator? In my use case my solr.xml is totally static, I have a
>> separate dataDir and my core.properties are derived from a separate
>> configuration (living in ZK) but totally outside of the SolrCloud..
>> 
>> I'd like to be able to not have any instance directories and/or no solr.xml
>> or core.properties files laying around as right now I just regenerate them
>> on startup each time in my start scripts..
>> 
>> Obviously I can just hack my stuff in and clearly this could break the
>> write side of the collections API (which I don't care about for my case)...
>> but having a way to plug these would be nice..
>> 
>> steve



Re: core.properties and solr.xml

2014-01-15 Thread Alan Woodward
I think solr.xml is the correct place for it, and you can then set up 
substitution variables to allow it to be set by environment variables, etc.  
But let's discuss on the JIRA ticket.

Alan Woodward
www.flax.co.uk


On 15 Jan 2014, at 15:39, Steven Bower wrote:

> I will open up a JIRA... I'm more concerned over the core locator stuff vs
> the solr.xml.. Should the specification of the core locator go into the
> solr.xml or via some other method?
> 
> steve
> 
> 
> On Tue, Jan 14, 2014 at 5:06 PM, Alan Woodward  wrote:
> 
>> Hi Steve,
>> 
>> I think this is a great idea.  Currently the implementation of
>> CoresLocator is picked depending on the type of solr.xml you have (new- vs
>> old-style), but it should be easy enough to extend the new-style logic to
>> optionally look up and instantiate a plugin implementation.
>> 
>> Core loading and new core creation is all done through the CL now, so as
>> long as the plugin implemented all methods, it shouldn't break the
>> Collections API either.
>> 
>> Do you want to open a JIRA?
>> 
>> Alan Woodward
>> www.flax.co.uk
>> 
>> 
>> On 14 Jan 2014, at 19:20, Erick Erickson wrote:
>> 
>>> The work done as part of "new style" solr.xml, particularly by
>>> romsegeek should make this a lot easier. But no, there's no formal
>>> support for such a thing.
>>> 
>>> There's also a desire to make ZK "the one source of truth" in Solr 5,
>>> although that effort is in early stages.
>>> 
>>> Which is a long way of saying that I think this would be a good thing
>>> to add. Currently there's no formal way to specify one though. We'd
>>> have to give some thought as to what abstract methods are required.
>>> The current "old style" and "new style" classes . There's also the
>>> chicken-and-egg question; how does one specify the new class? This
>>> seems like something that would be in a (very small) solr.xml or
>>> specified as a sysprop. And knowing where to load the class from could
>>> be "interesting".
>>> 
>>> A pluggable SolrConfig I think is a stickier wicket, it hasn't been
>>> broken out into nice interfaces like coreslocator has been. And it's
>>> used all over the place, passed in and recorded in constructors etc,
>>> as well as being possibly unique for each core. There's been some talk
>>> of sharing a single config object, and there's also talk about using
>>> "config sets" that might address some of those concerns, but neither
>>> one has gotten very far in 4x land.
>>> 
>>> FWIW,
>>> Erick
>>> 
>>> On Tue, Jan 14, 2014 at 1:41 PM, Steven Bower 
>> wrote:
>>>> Are there any plans/tickets to allow for pluggable SolrConf and
>>>> CoreLocator? In my use case my solr.xml is totally static, I have a
>>>> separate dataDir and my core.properties are derived from a separate
>>>> configuration (living in ZK) but totally outside of the SolrCloud..
>>>> 
>>>> I'd like to be able to not have any instance directories and/or no
>> solr.xml
>>>> or core.properties files laying around as right now I just regenerate
>> them
>>>> on startup each time in my start scripts..
>>>> 
>>>> Obviously I can just hack my stuff in and clearly this could break the
>>>> write side of the collections API (which I don't care about for my
>> case)...
>>>> but having a way to plug these would be nice..
>>>> 
>>>> steve
>> 
>> 



Re: core.properties and solr.xml

2014-01-15 Thread Alan Woodward
This is true.  But if we slap big "warning: experimental" messages all over it, 
then users can't complain too much about backwards-compat breaks.  My intention 
when pulling all this stuff into the CoresLocator interface was to allow other 
implementations to be tested out, and other suggestions have already come up 
from time to time on the list.  It seems a shame to *not* allow this to be 
opened up for advanced users.

Alan Woodward
www.flax.co.uk


On 15 Jan 2014, at 16:24, Mark Miller wrote:

> I think these APIs are pretty new and deep to want to support them for users 
> at this point. It constrains refactoring and can complicate things down the 
> line, especially with SolrCloud. This same discussion has come up in JIRA 
> issues before. At best, I think all the recent refactoring in this area needs 
> to bake.
> 
> - Mark
> 
> On Jan 15, 2014, at 11:01 AM, Alan Woodward  wrote:
> 
>> I think solr.xml is the correct place for it, and you can then set up 
>> substitution variables to allow it to be set by environment variables, etc.  
>> But let's discuss on the JIRA ticket.
>> 
>> Alan Woodward
>> www.flax.co.uk
>> 
>> 
>> On 15 Jan 2014, at 15:39, Steven Bower wrote:
>> 
>>> I will open up a JIRA... I'm more concerned over the core locator stuff vs
>>> the solr.xml.. Should the specification of the core locator go into the
>>> solr.xml or via some other method?
>>> 
>>> steve
>>> 
>>> 
>>> On Tue, Jan 14, 2014 at 5:06 PM, Alan Woodward  wrote:
>>> 
>>>> Hi Steve,
>>>> 
>>>> I think this is a great idea.  Currently the implementation of
>>>> CoresLocator is picked depending on the type of solr.xml you have (new- vs
>>>> old-style), but it should be easy enough to extend the new-style logic to
>>>> optionally look up and instantiate a plugin implementation.
>>>> 
>>>> Core loading and new core creation is all done through the CL now, so as
>>>> long as the plugin implemented all methods, it shouldn't break the
>>>> Collections API either.
>>>> 
>>>> Do you want to open a JIRA?
>>>> 
>>>> Alan Woodward
>>>> www.flax.co.uk
>>>> 
>>>> 
>>>> On 14 Jan 2014, at 19:20, Erick Erickson wrote:
>>>> 
>>>>> The work done as part of "new style" solr.xml, particularly by
>>>>> romsegeek should make this a lot easier. But no, there's no formal
>>>>> support for such a thing.
>>>>> 
>>>>> There's also a desire to make ZK "the one source of truth" in Solr 5,
>>>>> although that effort is in early stages.
>>>>> 
>>>>> Which is a long way of saying that I think this would be a good thing
>>>>> to add. Currently there's no formal way to specify one though. We'd
>>>>> have to give some thought as to what abstract methods are required.
>>>>> The current "old style" and "new style" classes . There's also the
>>>>> chicken-and-egg question; how does one specify the new class? This
>>>>> seems like something that would be in a (very small) solr.xml or
>>>>> specified as a sysprop. And knowing where to load the class from could
>>>>> be "interesting".
>>>>> 
>>>>> A pluggable SolrConfig I think is a stickier wicket, it hasn't been
>>>>> broken out into nice interfaces like coreslocator has been. And it's
>>>>> used all over the place, passed in and recorded in constructors etc,
>>>>> as well as being possibly unique for each core. There's been some talk
>>>>> of sharing a single config object, and there's also talk about using
>>>>> "config sets" that might address some of those concerns, but neither
>>>>> one has gotten very far in 4x land.
>>>>> 
>>>>> FWIW,
>>>>> Erick
>>>>> 
>>>>> On Tue, Jan 14, 2014 at 1:41 PM, Steven Bower 
>>>> wrote:
>>>>>> Are there any plans/tickets to allow for pluggable SolrConf and
>>>>>> CoreLocator? In my use case my solr.xml is totally static, I have a
>>>>>> separate dataDir and my core.properties are derived from a separate
>>>>>> configuration (living in ZK) but totally outside of the SolrCloud..
>>>>>> 
>>>>>> I'd like to be able to not have any instance directories and/or no
>>>> solr.xml
>>>>>> or core.properties files laying around as right now I just regenerate
>>>> them
>>>>>> on startup each time in my start scripts..
>>>>>> 
>>>>>> Obviously I can just hack my stuff in and clearly this could break the
>>>>>> write side of the collections API (which I don't care about for my
>>>> case)...
>>>>>> but having a way to plug these would be nice..
>>>>>> 
>>>>>> steve
>>>> 
>>>> 
>> 
> 



Re: Loading resources from Zookeeper

2014-01-24 Thread Alan Woodward
Hi Ugo,

You can load things from the conf/ directory via SolrResourceLoader, which will 
load either from the filesystem or from zookeeper, depending on whether or not 
you're running in SolrCloud mode.
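
For example (a minimal sketch - the resource name and the way you obtain the
core are illustrative):

SolrResourceLoader loader = core.getResourceLoader();
InputStream in = loader.openResource("my-resource.txt");
try {
    // read the resource: it comes from conf/ on disk in standalone mode,
    // or from the collection's config in ZooKeeper in SolrCloud mode -
    // the calling code doesn't need to care which
} finally {
    in.close();
}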

Alan Woodward
www.flax.co.uk


On 24 Jan 2014, at 16:02, Ugo Matrangolo wrote:

> Hi,
> 
> I'm in the process of moving our organization's search infrastructure to
> SOLR4/SolrCloud. One of the main points is to centralize our cores'
> configuration in Zookeeper in order to roll out changes without redeploying
> all the nodes in our cluster.
> 
> Unfortunately I have some code (custom indexers extending
> org.apache.solr.handler.dataimport.EntityProcessorBase) that assumes it can
> load resources from the filesystem, and this is now a problem given that
> everything under solr.home/core/conf is hosted in Zookeeper.
> 
> My question is : what is the best way to load a resource from Zookeeper
> using SOLR APIs ??
> 
> Regards,
> Ugo



Re: Implementing an alerting feature

2014-01-27 Thread Alan Woodward
There's some documentation in the README on github, and the code itself has 
full javadoc (it's a pretty simple library to use).  You can also watch a 
presentation Charlie and I did in Dublin describing how luwak works and what 
we've used it for: http://www.youtube.com/watch?v=rmRCsrJp2A8

Alan Woodward
www.flax.co.uk


On 27 Jan 2014, at 13:18, Furkan KAMACI wrote:

> Hi Charlie;
> 
> Is there any written documentation that explains your library?
> 
> Thanks;
> Furkan KAMACI
> 
> 
> 2014-01-27 Charlie Hull 
> 
>> On 27/01/2014 08:50, elmerfudd wrote:
>> 
>>> I want to implement an alert service in my solr system.
>>> In the FAST ESP system the service is called Real Time Alerting.
>>> 
>>> The service I'm looking for is:
>>> - a document is fed to solr.
>>> - without the document being indexed, a set of queries is run on the document
>>> - if the document answers a query - an alert will be sent in near
>>> Real-Time.
>>> 
>> 
>> You might want to take a look at Luwak, a library we built recently for
>> running lots of stored queries in an efficient manner. We use this for
>> media monitoring applications.
>> 
>> https://github.com/flaxsearch/luwak
>> 
>> Cheers
>> 
>> Charlie
>> 
>> 
>> 
>>> 
>>> 
>>> 
>>> 
>>> --
>>> View this message in context: http://lucene.472066.n3.
>>> nabble.com/Implementing-an-alerting-feature-tp4113666.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>> 
>>> 
>> 
>> --
>> Charlie Hull
>> Flax - Open Source Enterprise Search
>> 
>> tel/fax: +44 (0)8700 118334
>> mobile:  +44 (0)7767 825828
>> web: www.flax.co.uk
>> 



Re: How to Learn Linked Configuration for SolrCloud at Zookeeper

2014-02-11 Thread Alan Woodward
For a particular collection or core?  There should be a collection.configName 
property specified for the core or collection which tells you which ZK config 
directory is being used.
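
If you want to look it up programmatically, something like this should do it
(a sketch - assumes you already have a live ZkStateReader, and the collection
name is illustrative):

String configName = zkStateReader.readConfigName("collection1");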

Alan Woodward
www.flax.co.uk


On 11 Feb 2014, at 11:49, Furkan KAMACI wrote:

> Hi;
> 
> I've written code that can upload a file to Zookeeper for SolrCloud.
> Currently I have many configurations at Zookeeper for SolrCloud. I want to
> update the synonyms.txt file, so I need to know which configuration is
> currently linked (I will update the synonyms.txt file under the appropriate
> configuration folder). How can I find that out?
> 
> Thanks;
> Furkan KAMACI



Re: Set up embedded Solr container and cores programmatically to read their configs from the classpath

2014-02-12 Thread Alan Woodward
Hi Robert,

I don't think this is possible at the moment, but I hope to get 
https://issues.apache.org/jira/browse/SOLR-4478 in for Lucene/Solr 4.7, which 
should allow you to inject your own SolrResourceLoader implementation for core 
creation (it sounds as though you want to wrap the core's loader in a 
ClasspathResourceLoader).  You could try applying that patch to your setup and 
see if that helps you out.

Alan Woodward
www.flax.co.uk


On 11 Feb 2014, at 10:41, Robert Krüger wrote:

> Hi,
> 
> I have an application with an embedded Solr instance (and I want to
> keep it embedded) and so far I have been setting up my Solr
> installation programmatically using folder paths to specify where the
> specific container or core configs are.
> 
> I have used the CoreContainer methods createAndLoad and create using
> File arguments and this works fine. However, now I want to change this
> so that all configuration files are loaded from certain locations
> using the classloader but I have not been able to get this to work.
> 
> E.g. I want to have my solr config located in the classpath at
> 
> my/base/package/solr/conf
> 
> and the core configs at
> 
> my/base/package/solr/cores/core1/conf,
> my/base/package/solr/cores/core2/conf
> 
> etc..
> 
> Is this possible at all? Looking through the source code it seems that
> specifying classpath resources in such a qualified way is not
> supported but I may be wrong.
> 
> I could get this to work for the container by supplying my own
> implementation of SolrResourceLoader that allows a base path to be
> specified for the resources to be loaded (I first thought that would
> happen already when specifying instanceDir accordingly but looking at
> the code it does not. for resources loaded through the classloader,
> instanceDir is not prepended). However then I am stuck with the
> loading of the cores' resources as the respective code (see
> org.apache.solr.core.CoreContainer#createFromLocal) instantiates a
> SolResourceLoader internally.
> 
> Thanks for any help with this (be it a clarification that it is not possible).
> 
> Robert



Re: using facet enum et fc in the same query.

2014-09-22 Thread Alan Woodward
You should be able to use f.<fieldName>.facet.method=enum
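
For example, to force the enum method on one low-cardinality field while
leaving the rest on the default fc method (field names are illustrative):

q=*:*&facet=true&facet.field=doctype&facet.field=author
    &f.doctype.facet.method=enum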

Alan Woodward
www.flax.co.uk


On 22 Sep 2014, at 16:21, jerome.dup...@bnf.fr wrote:

> Hello, 
> 
> I have a Solr index (12M docs, 45GB) with facets, and I'm trying to 
> improve facet query performance.
> 1/ I tried to use docValues on the facet fields; it didn't work well.
> 2/ I tried facet.threads=-1 in my queries, and it worked perfectly (from 
> more than 15s down to 2s for the longest queries).
> 
> 3/ I'm trying to use facet.method=enum. It's supposed to improve 
> performance for facet fields with few distinct values (type of 
> documents, things like that).
> 
> My problem is that I don't know if there is a way to specify the enum 
> method for some facets (3 to 5000 different values) and the fc method for 
> some others (up to 12M different values) in the same query?
> 
> Is it possible with something like f.MyFacetField.facet.method=enum?
> 
> Thanks in advance for the answer.
> 
> ---
> Jérôme Dupont
> Bibliothèque Nationale de France
> Département des Systèmes d'Information
> Tour T3 - Quai François Mauriac
> 75706 Paris Cedex 13
> téléphone: 33 (0)1 53 79 45 40
> e-mail: jerome.dup...@bnf.fr
> ---
> 
> 
> Participez à l'acquisition d'un Trésor national - Le manuscrit royal de 
> François I er Avant d'imprimer, pensez à l'environnement.



Re: Filter cache pollution during sharded edismax queries

2014-09-30 Thread Alan Woodward
A bit of digging shows that the extra entries in the filter cache are added when 
getting facets from a distributed search.  Once all the facets have been 
gathered, the co-ordinating node then asks the subnodes for an exact count for 
the final top-N facets, and the path for executing this goes though:
SimpleFacets.getListedTermCounts()
--> SolrIndexSearcher.numDocs()
--> SolrIndexSearcher.getPositiveDocSet()
and this last method caches results in the filter cache.

Maybe these should be using a separate cache?
    
Alan Woodward
www.flax.co.uk


On 30 Sep 2014, at 11:38, Charlie Hull wrote:

> Hi,
> 
> We've just found a very similar issue at a client installation. They have
> around 27 million documents and are faceting on fields with high
> cardinality, and are unhappy with query performance and the server hardware
> necessary to make this performance acceptable. Last night we noticed the
> filter cache had a pretty low hit rate and seemed to be filling up with
> many unexpected items (we were testing with only a *single* actual filter
> query). Diagnosing this with the showItems flag set on the Solr admin
> statistics we could see entries relating to facets, even though we were
> sure we were using the default facet.method=fc setting that should prevent
> filters being constructed. We're thus seeing similar cache pollution to Ken
> and Anca.
> 
> We're trying a different type of cache (LFUCache) now and also may try
> tweaking cache sizes to try and help, as the filter creation seems to be
> something we can't easily get round.
> 
> cheers
> 
> Charlie
> Flax
> www.flax.co.uk
> 
> On 18 October 2013 14:32, Anca Kopetz  wrote:
> 
>> Hi Ken,
>> 
>> Have you managed to find out why these entries were stored into
>> filterCache and if they have an impact on the hit ratio ?
>> We noticed the same problem, there are entries of this type :
>> item_+(+(title:western^10.0 | ... in our filterCache.
>> 
>> Thanks,
>> Anca
>> 
>> 
>> On 07/02/2013 09:01 PM, Ken Krugler wrote:
>> 
>> Hi all,
>> 
>> After upgrading from Solr 3.5 to 4.2.1, I noticed our filterCache hit
>> ratio had dropped significantly.
>> 
>> Previously it was at 95+%, but now it's < 50%.
>> 
>> I enabled recording 100 entries for debugging, and in looking at them it
>> seems that edismax (and faceting) is creating entries for me.
>> 
>> This is in a sharded setup, so it's a distributed search.
>> 
>> If I do a search for the string "bogus text" using edismax on two fields,
>> I get an entry in each of the shard's filter caches that looks like:
>> 
>> item_+(((field1:bogus | field2:bogu) (field1:text | field2:text))~2):
>> 
>> Is this expected?
>> 
>> I have a similar situation happening during faceted search, even though my
>> fields are single-value/untokenized strings, and I'm not using the enum
>> facet method.
>> 
>> But I'll get many, many entries in the filterCache for facet values, and
>> they all look like "item_::"
>> 
>> The net result of the above is that even with a very big filterCache size
>> of 2K, the hit ratio is still only 60%.
>> 
>> Thanks for any insights,
>> 
>> -- Ken
>> 
>> --
>> Ken Krugler
>> +1 530-210-6378
>> http://www.scaleunlimited.com
>> custom big data solutions & training
>> Hadoop, Cascading, Cassandra & Solr
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> Kelkoo SAS
>> Société par Actions Simplifiée
>> Au capital de € 4.168.964,30
>> Siège social : 8, rue du Sentier 75002 Paris
>> 425 093 069 RCS Paris
>> 
>> Ce message et les pièces jointes sont confidentiels et établis à
>> l'attention exclusive de leurs destinataires. Si vous n'êtes pas le
>> destinataire de ce message, merci de le détruire et d'en avertir
>> l'expéditeur.
>> 



Re: Filter cache pollution during sharded edismax queries

2014-09-30 Thread Alan Woodward

> 
>> Once all the facets have been gathered, the co-ordinating node then asks
>> the subnodes for an exact count for the final top-N facets,
> 
> 
> What's the point of refining these counts? I thought it only makes sense
> for facet.limit-ed requests. Is that correct? Can those who suffer from
> low performance just lift facet.limit to avoid that distributed hop?

Presumably yes, but if you've got a sufficiently high cardinality field then 
any gains made by missing out the hop will probably be offset by having to 
stream all the return values back again.

Alan


> -- 
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
> 
> 
> 



Re: Need to reindex when changing schema.xml?

2014-10-14 Thread Alan Woodward
You should be able to change it without re-indexing, unless you've enabled 
docValues on that field.  AFAIK docValues are the only persistent data 
structure that is different for single-valued versus multi-valued, everything 
else (UninvertedFields, etc) is built on the fly.

I don't think there's any definitive reference on what requires a re-index, but 
that would be a nice thing to add to the Reference Guide.

Alan Woodward
www.flax.co.uk


On 14 Oct 2014, at 08:30, Roger Sindreu wrote:

> Hello
> 
> I hope this question has not been asked many times. I did some research but
> I never could find clearly answered anywhere.
> 
> We have several multivalued fields on an instance with millions of documents
> which only contain a single value. I would like to change them to
> multiValued="false" to be able to use grouping and stats on those fields.
> 
> My question is: Can I do it without reindexing?
> 
> Is there any document that says when rebuilding the index is needed versus
> when it is not needed?
> 
> Thanks a lot



Re: indexing errors when storeOffsetsWithPositions="true" in solr 4.9.1

2014-11-05 Thread Alan Woodward
Hi Min,

Do you have the specific bit of text that caused this exception to be thrown?

Alan Woodward
www.flax.co.uk


On 4 Nov 2014, at 23:15, Min L wrote:

> Hi All:
> 
> I am using solr 4.9.1. and trying to use PostingsSolrHighlighter. But I got
> errors during indexing. I thought LUCENE-5111 has fixed issues with
> WordDelimitedFilter. The error is as below:
> 
> Caused by: java.lang.IllegalArgumentException: startOffset must be
> non-negative, and endOffset must be >= startOffset, and offsets must
> not go backwards startOffset=31,endOffset=44,lastStartOffset=37 for
> field 'description_texts'
>   at 
> org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:630)
>   at 
> org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:342)
>   at 
> org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:301)
>   at 
> org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:241)
>   at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:451)
>   at 
> org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1539)
>   at 
> org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:240)
>   at 
> org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:164)
> 
> 
> My schema.xml looks like below:
> 
>  indexed="true" storeOffsetsWithPositions="true"/>
> 
> 
> 
>  
> 
>
> 
>
> 
>
> 
> "stemdict_en.txt" />
> 
> "^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$2"/>
> 
>
> 
> ignoreCase="true" enablePositionIncrements="true" />
> 
> splitOnNumerics="0" catenateWords="1" />
> 
>  
> 
>  
> 
>
> 
>
> 
> ignoreCase="true" enablePositionIncrements="true" />
> 
> splitOnNumerics="0" catenateWords="1" />
> 
> "stemdict_en.txt" />
> 
>
> 
>  
> 
>
> 
> 
> Any help is appreciated.
> 
> 
> Thanks.
> 
> Min



Re: Solrcloud and remote Zookeeper ensemble

2014-11-19 Thread Alan Woodward
> SOLR_ZK_ENSEMBLE=zookeeper1:2181/solr,zookeeper2:2181/solr,zookeeper3:2181/solr

This is the incorrect part, it should be:

> SOLR_ZK_ENSEMBLE=zookeeper1:2181,zookeeper2:2181,zookeeper3:2181/solr

The chroot is only appended at the end of the connection string.  Not the way I 
would have done it, but that's how ZK works...

Alan Woodward
www.flax.co.uk


On 19 Nov 2014, at 12:54, Robert Kent wrote:

> Hi,
> 
> I'm experiencing some odd behaviour with Solrcloud and Zookeeper.  I am 
> running Solrcloud on one host and am running three Zookeepers on another 
> three hosts.  The Zookeeper part of things works correctly, I can 
> add/remove/etc nodes from Zookeeper.  I am running, or rather trying to run, 
> Solrcloud on top of Hadoop.  Again, the Hadoop side of things works 
> correctly, I can create/remove/etc dirs/files under Hadoop.
> 
> Unfortunately, the solrctl utility bundled with Solrcloud doesn't appear to 
> work correctly.  Depending on how or where I set the Zookeeper ensemble 
> details I get different results.  My Zookeeper instances are used by other 
> services, so I am trying to force the Solrcloud configuration to be created 
> under /solr - from reading the documentation this appears to be the 
> recommended approach.
> 
> I have set the Zookeeper ensemble and Hadoop configuration in 
> /etc/default/solr:
> 
> SOLR_ZK_ENSEMBLE=zookeeper1:2181/solr,zookeeper2:2181/solr,zookeeper3:2181/solr
> SOLR_HDFS_HOME=hdfs://zookeeper1:8020/solr
> SOLR_HDFS_CONFIG=/etc/hadoop/conf
> SOLR_HDFS_HOME=hdfs://3xNodeHA:8020/solr
> 
> If I do not specify any Zookeeper parameters for solrctl, it creates its 
> Zookeeper configuration under '/solr,zookeeper2:2181' and under that it 
> creates '/solr,zookeeper3:2181/solr/configs/my-data'.  This also occurs if I 
> specify --zk zookeeper1:2181/solr,zookeeper2:2181/solr,zookeeper3:2181/solr.  
> I suspect that something somewhere is not treating the SOLR_ZK_ENSEMBLE 
> variable correctly and believes it is a single connection (eg 
> zookeeper1:2181) and the path is /solr,zookeeper2:2181,zookeeper3:2181/solr.
> 
> If I run solrctl with --zk zookeeper1:2181, it creates its configuration 
> under / (eg /solr.xml /configs).
> 
> If I run solrctl with --zk zookeeper1:2181/solr, it creates the configuration 
> under /solr
> 
> 
> If I completely ignore the Zookeeper configuration Solr works correctly, but 
> as I'm using Lily I need Solr's configuration to exist under Zookeeper.
> 
> What am I missing?  How can I specify a multi-node Zookeeper ensemble and 
> have all of the configuration nodes created under /solr?  How do I point 
> Tomcat towards the Solr configuration under /solr?
> 
> If you would like more details, please look at the attachment as this 
> explains what I did at each step and the results of that step.
> 
> 
> I'm using Cloudera's packages throughout.
> 
> thanks
> 
> Rob
> 
> Registered name: In Practice Systems Ltd.
> Registered address: The Bread Factory, 1a Broughton Street, London, SW8 3QJ
> Registered Number: 1788577
> Registered in England
> Visit our Internet Web site at www.inps.co.uk
> The information in this internet email is confidential and is intended solely 
> for the addressee. Access, copying or re-use of information in it by anyone 
> else is not authorised. Any views or opinions presented are solely those of 
> the author and do not necessarily represent those of INPS or any of its 
> affiliates. If you are not the intended recipient please contact  
> is.helpd...@inps.co.uk
> 
> 



Re: leader split-brain at least once a day - need help

2015-01-07 Thread Alan Woodward
I had a similar issue, which was caused by 
https://issues.apache.org/jira/browse/SOLR-6763.  Are you getting long GC 
pauses or similar before the leader mismatches occur?

Alan Woodward
www.flax.co.uk


On 7 Jan 2015, at 10:01, Thomas Lamy wrote:

> Hi there,
> 
> we are running a 3 server cloud serving a dozen 
> single-shard/replicate-everywhere collections. The 2 biggest collections are 
> ~15M docs, and about 13GiB / 2.5GiB size. Solr is 4.10.2, ZK 3.4.5, Tomcat 
> 7.0.56, Oracle Java 1.7.0_72-b14
> 
> 10 of the 12 collections (the small ones) get filled by DIH full-import once 
> a day starting at 1am. The second biggest collection is updated using DIH 
> delta-import every 10 minutes, the biggest one gets bulk json updates with 
> commits once in 5 minutes.
> 
> On a regular basis, we have a leader information mismatch:
> org.apache.solr.update.processor.DistributedUpdateProcessor; Request says it 
> is coming from leader, but we are the leader
> or the opposite
> org.apache.solr.update.processor.DistributedUpdateProcessor; ClusterState 
> says we are the leader, but locally we don't think so
> 
> One of these pops up once a day at around 8am, sending either some cores 
> into "recovery failed" state, or all cores of at least one cloud node into 
> state "gone".
> This started out of the blue about 2 weeks ago, without changes to 
> software, data, or client behaviour.
> 
> Most of the time, we get things going again by restarting solr on the current 
> leader node, forcing a new election - can this be triggered while keeping 
> solr (and the caches) up?
> But sometimes this doesn't help, we had an incident last weekend where our 
> admins didn't restart in time, creating millions of entries in 
> /solr/overseer/queue, making zk close the connection, and leader re-elect 
> fails. I had to flush zk, and re-upload collection config to get solr up 
> again (just like in https://gist.github.com/isoboroff/424fcdf63fa760c1d1a7).
> 
> We have a much bigger cloud (7 servers, ~50GiB Data in 8 collections, 1500 
> requests/s) up and running, which does not have these problems since 
> upgrading to 4.10.2.
> 
> 
> Any hints on where to look for a solution?
> 
> Kind regards
> Thomas
> 
> -- 
> Thomas Lamy
> Cytainment AG & Co KG
> Nordkanalstrasse 52
> 20097 Hamburg
> 
> Tel.: +49 (40) 23 706-747
> Fax: +49 (40) 23 706-139
> Sitz und Registergericht Hamburg
> HRA 98121
> HRB 86068
> Ust-ID: DE213009476
> 



Re: Errors using the Embedded Solar Server

2015-01-21 Thread Alan Woodward
That certainly looks like it ought to work.  Is there log output that you could 
show us as well?

Alan Woodward
www.flax.co.uk


On 21 Jan 2015, at 16:09, Carl Roberts wrote:

> Hi,
> 
> I have downloaded the code and documentation for Solr version 4.10.3.
> 
> I am trying to follow SolrJ Wiki guide and I am running into errors.  The 
> latest error is this one:
> 
> Exception in thread "main" org.apache.solr.common.SolrException: No such 
> core: db
>at 
> org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:112)
>at 
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
>at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68)
>at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:54)
>at solr.Test.main(Test.java:39)
> 
> My code is this:
> 
> package solr;
> 
> import java.io.File;
> import java.io.IOException;
> import java.util.ArrayList;
> import java.util.Collection;
> 
> import org.apache.solr.client.solrj.SolrServerException;
> import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
> import org.apache.solr.common.SolrInputDocument;
> import org.apache.solr.core.CoreContainer;
> import org.apache.solr.core.SolrCore;
> 
> 
> public class Test {
>public static void main(String [] args){
>CoreContainer container = new 
> CoreContainer("/Users/carlroberts/dev/solr-4.10.3");
>System.out.println(container.getDefaultCoreName());
>System.out.println(container.getSolrHome());
>container.load();
>System.out.println(container.isLoaded("db"));
>System.out.println(container.getCoreInitFailures());
>Collection<SolrCore> cores = container.getCores();
>System.out.println(cores);
>EmbeddedSolrServer server = new EmbeddedSolrServer( container, "db" );
>SolrInputDocument doc1 = new SolrInputDocument();
>doc1.addField( "id", "id1", 1.0f );
>doc1.addField( "name", "doc1", 1.0f );
>doc1.addField( "price", 10 );
>SolrInputDocument doc2 = new SolrInputDocument();
>doc2.addField( "id", "id2", 1.0f );
>doc2.addField( "name", "doc2", 1.0f );
>doc2.addField( "price", 20 );
>Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
>docs.add( doc1 );
>docs.add( doc2 );
>try{
>server.add( docs );
>server.commit();
>server.deleteByQuery( "*:*" );
>}catch(IOException e){
>e.printStackTrace();
>}catch(SolrServerException e){
>e.printStackTrace();
>}
>}
> }
> 
> 
> My solr.xml file is this:
> 
> 
> 
> 
> 
> 
> 
>  
>
>  
> 
> 
> And my db/conf directory was copied from example/solr/collection/conf 
> directory and it contains the solrconfig.xml file and schema.xml file.
> 
> I have noticed that the documentation that shows how to use the 
> EmbeddedSolrServer is outdated, as it indicates I should use 
> CoreContainer.Initializer class which doesn't exist, and container.load(path, 
> file) which also doesn't exist.
> 
> At this point I have no idea why I am getting the No such core error and I 
> have googled it and there seems to be tons of threads showing this error but 
> for different reasons, and I have tried all the suggested resolutions and get 
> nowhere with this.
> 
> Can you please help?
> 
> Regards,
> 
> Joe



Re: Errors using the Embedded Solar Server

2015-01-21 Thread Alan Woodward
Ah, OK, you need to include a logging jar in your classpath - the log4j and 
slf4j-log4j jars in the solr distribution will help here.  Once you've got some 
logging set up, then you should be able to work out what's going wrong!
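
For example, put the log4j and slf4j-log4j12 jars from example/lib/ext on
your classpath, plus a minimal log4j.properties along these lines (a sketch
- adjust levels and layout to taste):

log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d [%t] %-5p %c - %m%n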

Alan Woodward
www.flax.co.uk


On 21 Jan 2015, at 16:53, Carl Roberts wrote:

> So far I have not been able to get the logging to work - here is what I get 
> in the console prior to the exception:
> 
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
> details.
> db
> /Users/carlroberts/dev/solr-4.10.3/
> false
> {}
> []
> /Users/carlroberts/dev/solr-4.10.3/
> 
> 
> On 1/21/15, 11:50 AM, Alan Woodward wrote:
>> That certainly looks like it ought to work.  Is there log output that you 
>> could show us as well?
>> 
>> Alan Woodward
>> www.flax.co.uk
>> 
>> 
>> On 21 Jan 2015, at 16:09, Carl Roberts wrote:
>> 
>>> Hi,
>>> 
>>> I have downloaded the code and documentation for Solr version 4.10.3.
>>> 
>>> I am trying to follow SolrJ Wiki guide and I am running into errors.  The 
>>> latest error is this one:
>>> 
>>> Exception in thread "main" org.apache.solr.common.SolrException: No such 
>>> core: db
>>>at 
>>> org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:112)
>>>at 
>>> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
>>>at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68)
>>>at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:54)
>>>at solr.Test.main(Test.java:39)
>>> 
>>> My code is this:
>>> 
>>> package solr;
>>> 
>>> import java.io.File;
>>> import java.io.IOException;
>>> import java.util.ArrayList;
>>> import java.util.Collection;
>>> 
>>> import org.apache.solr.client.solrj.SolrServerException;
>>> import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
>>> import org.apache.solr.common.SolrInputDocument;
>>> import org.apache.solr.core.CoreContainer;
>>> import org.apache.solr.core.SolrCore;
>>> 
>>> 
>>> public class Test {
>>>public static void main(String [] args){
>>>CoreContainer container = new 
>>> CoreContainer("/Users/carlroberts/dev/solr-4.10.3");
>>>System.out.println(container.getDefaultCoreName());
>>>System.out.println(container.getSolrHome());
>>>container.load();
>>>System.out.println(container.isLoaded("db"));
>>>System.out.println(container.getCoreInitFailures());
>>>Collection<SolrCore> cores = container.getCores();
>>>System.out.println(cores);
>>>EmbeddedSolrServer server = new EmbeddedSolrServer( container, "db" 
>>> );
>>>SolrInputDocument doc1 = new SolrInputDocument();
>>>doc1.addField( "id", "id1", 1.0f );
>>>doc1.addField( "name", "doc1", 1.0f );
>>>doc1.addField( "price", 10 );
>>>SolrInputDocument doc2 = new SolrInputDocument();
>>>doc2.addField( "id", "id2", 1.0f );
>>>doc2.addField( "name", "doc2", 1.0f );
>>>doc2.addField( "price", 20 );
>>>Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
>>>docs.add( doc1 );
>>>docs.add( doc2 );
>>>try{
>>>server.add( docs );
>>>server.commit();
>>>server.deleteByQuery( "*:*" );
>>>}catch(IOException e){
>>>e.printStackTrace();
>>>}catch(SolrServerException e){
>>>e.printStackTrace();
>>>}
>>>}
>>> }
>>> 
>>> 
>>> My solr.xml file is this:
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>>  
>>>
>>>  
>>> 
>>> 
>>> And my db/conf directory was copied from example/solr/collection/conf 
>>> directory and it contains the solrconfig.xml file and schema.xml file.
>>> 
>>> I have noticed that the documentation that shows how to use the 
>>> EmbeddedSolrServer is outdated, as it indicates I should use 
>>> CoreContainer.Initializer class which doesn't exist, and 
>>> container.load(path, file) which also doesn't exist.
>>> 
>>> At this point I have no idea why I am getting the No such core error and I 
>>> have googled it and there seems to be tons of threads showing this error 
>>> but for different reasons, and I have tried all the suggested resolutions 
>>> and get nowhere with this.
>>> 
>>> Can you please help?
>>> 
>>> Regards,
>>> 
>>> Joe
>> 
> 



Re: Errors using the Embedded Solar Server

2015-01-21 Thread Alan Woodward
Aha, I think you're being stung by 
https://issues.apache.org/jira/browse/SOLR-6643.  Which will be fixed in the 
upcoming 5.0 release, or you can patch your system with the patch attached to 
that issue.

Alan Woodward
www.flax.co.uk


On 21 Jan 2015, at 19:44, Carl Roberts wrote:

> Already did.  And the logging gets me no closer to fixing the issue. Here is 
> the logging.
> 
> [main] INFO org.apache.solr.core.SolrResourceLoader - new SolrResourceLoader 
> for directory: '/Users/carlroberts/dev/solr-4.10.3/'
> [main] INFO org.apache.solr.core.SolrResourceLoader - Adding 
> 'file:/Users/carlroberts/dev/solr-4.10.3/lib/commons-logging-1.2.jar' to 
> classloader
> [main] INFO org.apache.solr.core.SolrResourceLoader - Adding 
> 'file:/Users/carlroberts/dev/solr-4.10.3/lib/servlet-api.jar' to classloader
> [main] INFO org.apache.solr.core.SolrResourceLoader - Adding 
> 'file:/Users/carlroberts/dev/solr-4.10.3/lib/slf4j-simple-1.7.5.jar' to 
> classloader
> [main] INFO org.apache.solr.core.ConfigSolr - Loading container configuration 
> from /Users/carlroberts/dev/solr-4.10.3/solr.xml
> [main] INFO org.apache.solr.core.CoreContainer - New CoreContainer 1727098510
> [main] INFO org.apache.solr.core.CoreContainer - Loading cores into 
> CoreContainer [instanceDir=/Users/carlroberts/dev/solr-4.10.3/]
> [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory - 
> Setting socketTimeout to: 0
> [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory - 
> Setting urlScheme to: null
> [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory - 
> Setting connTimeout to: 0
> [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory - 
> Setting maxConnectionsPerHost to: 20
> [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory - 
> Setting corePoolSize to: 0
> [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory - 
> Setting maximumPoolSize to: 2147483647
> [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory - 
> Setting maxThreadIdleTime to: 5
> [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory - 
> Setting sizeOfQueue to: -1
> [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory - 
> Setting fairnessPolicy to: false
> [main] INFO org.apache.solr.update.UpdateShardHandler - Creating 
> UpdateShardHandler HTTP client with params: 
> socketTimeout=0&connTimeout=0&retry=false
> [main] INFO org.apache.solr.logging.LogWatcher - SLF4J impl is 
> org.slf4j.impl.SimpleLoggerFactory
> [main] INFO org.apache.solr.logging.LogWatcher - No LogWatcher configured
> [main] INFO org.apache.solr.core.CoreContainer - Host Name: null
> [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - 
> new SolrResourceLoader for directory: '/Users/carlroberts/dev/solr-4.10.3/db/'
> [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrConfig - Adding 
> specified lib dirs to ClassLoader
> [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - 
> Adding 
> 'file:/Users/carlroberts/dev/solr-4.10.3/contrib/extraction/lib/apache-mime4j-core-0.7.2.jar'
>  to classloader
> [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - 
> Adding 
> 'file:/Users/carlroberts/dev/solr-4.10.3/contrib/extraction/lib/apache-mime4j-dom-0.7.2.jar'
>  to classloader
> [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - 
> Adding 
> 'file:/Users/carlroberts/dev/solr-4.10.3/contrib/extraction/lib/aspectjrt-1.6.11.jar'
>  to classloader
> [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - 
> Adding 
> 'file:/Users/carlroberts/dev/solr-4.10.3/contrib/extraction/lib/bcmail-jdk15-1.45.jar'
>  to classloader
> [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - 
> Adding 
> 'file:/Users/carlroberts/dev/solr-4.10.3/contrib/extraction/lib/bcprov-jdk15-1.45.jar'
>  to classloader
> [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - 
> Adding 
> 'file:/Users/carlroberts/dev/solr-4.10.3/contrib/extraction/lib/boilerpipe-1.1.0.jar'
>  to classloader
> [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - 
> Adding 
> 'file:/Users/carlroberts/dev/solr-4.10.3/contrib/extraction/lib/commons-compress-1.7.jar'
>  to classloader
> [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - 
> Adding 
> 'file:/Users/carlroberts/dev/solr-4.10.3/contrib/extraction/lib/dom4j-1.6.1.jar'
>  to classloader
> [coreLoadExecutor-5-thread-1] INFO

[ANN] Lucene/SOLR hackday in Cambridge, UK

2013-07-03 Thread Alan Woodward
Hi all,

Flax is running a Lucene/SOLR hack day here in Cambridge on Friday, 26th July, 
with committer and LucidWorks co-founder Grant Ingersoll.  We'll provide the 
venue, some food and the internet - you provide enthusiasm and great ideas for 
hacking!

Details here: 
http://www.meetup.com/Enterprise-Search-Cambridge-UK/events/127351142/.  Places 
are limited, so please book early!

Looking forward to seeing you there,

Alan Woodward
www.flax.co.uk




Re: Simple Moving Average of Query Durations

2013-07-04 Thread Alan Woodward
I started some work on https://issues.apache.org/jira/browse/SOLR-4735, which 
may help here.  Have been pulled away onto other things, but I want to get back 
to it soon.

Alan Woodward
www.flax.co.uk


On 3 Jul 2013, at 23:54, Otis Gospodnetic wrote:

> Hi Jan,
> 
> http://search-lucene.com/?q=percentile&fc_project=Solr&fc_type=issue -
> SOLR-1792?
> 
> Otis
> --
> Performance Monitoring -- http://sematext.com/spm
> Solr & ElasticSearch Support -- http://sematext.com/
> 
> 
> 
> 
> On Wed, Jul 3, 2013 at 5:59 PM, Jan Morlock  
> wrote:
>> Hi,
>> 
>> we would like to observe the mean value of the average time per request for
>> the last N (e.g. 20) queries (a.k.a. simple moving average) of our Solr
>> server using Nagios. Does anybody know if such an observable is already
>> implemented?
>> 
>> If not, I think the perfect place for it would be the getStatistics() method
>> inside
>> solr/core/src/java/org/apache/solr/handler/RequestHandlerBase.java. Would
>> you agree?
>> 
>> Thank you very much.
>> 
>> Best regards
>> Jan
>> 
>> 
>> 
>> 
>> 
>> --
>> View this message in context: 
>> http://lucene.472066.n3.nabble.com/Simple-Moving-Average-of-Query-Durations-tp4075312.html
>> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Moving replica from node to node?

2013-07-11 Thread Alan Woodward
And CREATE and UNLOAD are almost exactly the wrong descriptors, because CREATE 
loads up a core that's already there, and UNLOAD can in fact delete it from the 
filesystem…
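
For the two-step move itself, something like this does it via the Core Admin
API (a sketch - hosts, collection and core names are illustrative):

curl 'http://to-node:8983/solr/admin/cores?action=CREATE&name=coll1_shard1_replica2&collection=coll1&shard=shard1'
curl 'http://from-node:8983/solr/admin/cores?action=UNLOAD&core=coll1_shard1_replica1&deleteIndex=true'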

Alan Woodward
www.flax.co.uk


On 11 Jul 2013, at 20:15, Mark Miller wrote:

> Yeah, though CREATE and UNLOAD end up being kind of funny descriptors.
> You'd think LOAD and UNLOAD or CREATE and DELETE or something...
> 
> 
> On Wed, Jul 10, 2013 at 11:35 PM, Otis Gospodnetic <
> otis.gospodne...@gmail.com> wrote:
> 
>> Thanks Mark.  I assume you are referring to using the Core Admin API -
>> CREATE and UNLOAD?
>> 
>> Added https://issues.apache.org/jira/browse/SOLR-5032
>> 
>> Otis
>> --
>> Solr & ElasticSearch Support -- http://sematext.com/
>> Performance Monitoring -- http://sematext.com/spm
>> 
>> 
>> 
>> On Mon, Jul 8, 2013 at 10:50 PM, Mark Miller 
>> wrote:
>>> It's simply a sugar method that no one has gotten to yet. I almost have
>> once or twice, but I always have moved onto other things before even
>> starting.
>>> 
>>> It's fairly simple to just start another replica on the TO node and then
>> delete the replica on the FROM node, so not a lot of urgency.
>>> 
>>> - Mark
>>> 
>>> On Jul 8, 2013, at 10:18 PM, Otis Gospodnetic <
>> otis.gospodne...@gmail.com> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> Solr(Cloud) currently doesn't have any facility to move a specific
>>>> replica from one node to the other.
>>>> 
>>>> How come?  Is there a technical or philosophical reason, or "just" the
>>>> "24 hours/day reason"?
>>>> 
>>>> Thanks,
>>>> Otis
>>>> --
>>>> Solr & ElasticSearch Support -- http://sematext.com/
>>>> Performance Monitoring -- http://sematext.com/spm
>>> 
>> 
> 
> 
> 
> -- 
> - Mark



Re: external file field and fl parameter

2013-07-14 Thread Alan Woodward
Hi Chris,

Try wrapping the field name in a field() function in your fl parameter list, 
like so:
fl=field(eff_field_name)
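
e.g. (a sketch - the other fields and the eff field name are illustrative):

q=*:*&fl=id,name,field(popularity_eff)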

Alan Woodward
www.flax.co.uk


On 14 Jul 2013, at 18:41, Chris Collins wrote:

> Why would I be re-indexing an external file field? The whole purpose is that 
> it's brought in at runtime and not part of the index?
> 
> C
> On Jul 14, 2013, at 10:13 AM, Shawn Heisey  wrote:
> 
>> On 7/14/2013 7:05 AM, Chris Collins wrote:
>>> Yep, I did switch on stored=true in the field type.  I was able to confirm 
>>> that there are values for the eff by two methods:
>>> 
>>> 1) changing desc to asc produced drastically different results.
>>> 
>>> 2) debugging FileFloatSource the following was getting triggered filling 
>>> the vals array:
>>> while ((doc = docsEnum.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
>>>     vals[doc] = fval;
>>> }
>>> 
>>> At least by you asking these questions I guess it should work.  I will 
>>> continue dissecting. 
>> 
>> Did you reindex when you changed the schema?  Sorting uses indexed
>> values, not stored values.  The fl parameter requires the stored values.
>> These are separate within the index, and one cannot substitute for the
>> other.  If you didn't reindex, then you won't have the stored values for
>> existing documents.
>> 
>> http://wiki.apache.org/solr/HowToReindex
>> 
>> Thanks,
>> Shawn
>> 
>> 
> 



Re: Programatic instantiation of solr container and cores with config loaded from a jar

2013-07-22 Thread Alan Woodward
Hi Robert,

The upcoming 4.4 release should make this a bit easier (you can check out the 
release branch now if you like, or wait a few days for the official version).  
CoreContainer now takes a SolrResourceLoader and a ConfigSolr object as 
constructor parameters, and you can create a ConfigSolr object from a string 
representation of solr.xml using the ConfigSolr.fromString() static method.

Alan Woodward
www.flax.co.uk


On 22 Jul 2013, at 11:41, Robert Krüger wrote:

> Hi,
> 
> I use solr embedded in a desktop app and I want to change it to no
> longer require the configuration for the container and core to be in
> the filesystem but rather be distributed as part of a jar file.
> 
> Could someone kindly point me to the right docs?
> 
> So far my impression is, I need to instantiate CoreContainer with a
> custom SolrResourceLoader with properties parsed via some other API
> but from the javadocs alone I feel a bit lost (why does it have to
> have an instance directory at all?) and googling did not give me many
> results. What would be ideal would be to have something like this
> (pseudocode with partly imagined names, which hopefully illustrates
> what I am trying to achieve):
> 
> ContainerConfig containerConfig =
> ContainerConfigParser.parse(<container config from classloader>);
> CoreContainer container = new CoreContainer(containerConfig);
> 
> CoreConfig coreConfig = CoreConfigParser.parse(container, <core config from Classloader>);
> container.register(<core name>, coreConfig);
> 
> Ideally I would like to keep XML format to reuse my current solr.xml
> and solrconfig.xml but that is just a nice-to-have.
> 
> Does such a way exist and if so, what are the real API classes and calls to 
> use?
> 
> Thank you in advance,
> 
> Robert



Re: Programatic instantiation of solr container and cores with config loaded from a jar

2013-07-22 Thread Alan Woodward
Hi Alex,

I'm not sure I follow - are you trying to create a ConfigSolr object from data 
read in from elsewhere, or trying to export the ConfigSolr object to another 
process?  If you're dealing with solr core java objects, you'll need the solr 
jar and all its dependencies (including solrj).

Alan Woodward
www.flax.co.uk


On 22 Jul 2013, at 15:53, Alexandre Rafalovitch wrote:

> Does it mean that I can easily load Solr configuration as parsed by Solr
> from an external program?
> 
> Because the last time I tried (4.3.1), the list of jars required was
> quite long, including the SolrJ jar due to some exception.
> 
> Regards.,
>   Alex
> 
> Personal website: http://www.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
> 
> 
> On Mon, Jul 22, 2013 at 7:32 AM, Alan Woodward  wrote:
> 
>> Hi Robert,
>> 
>> The upcoming 4.4 release should make this a bit easier (you can check out
>> the release branch now if you like, or wait a few days for the official
>> version).  CoreContainer now takes a SolrResourceLoader and a ConfigSolr
>> object as constructor parameters, and you can create a ConfigSolr object
>> from a string representation of solr.xml using the ConfigSolr.fromString()
>> static method.
>> 
>> Alan Woodward
>> www.flax.co.uk
>> 
>> 
>> On 22 Jul 2013, at 11:41, Robert Krüger wrote:
>> 
>>> Hi,
>>> 
>>> I use solr embedded in a desktop app and I want to change it to no
>>> longer require the configuration for the container and core to be in
>>> the filesystem but rather be distributed as part of a jar file.
>>> 
>>> Could someone kindly point me to the right docs?
>>> 
>>> So far my impression is, I need to instantiate CoreContainer with a
>>> custom SolrResourceLoader with properties parsed via some other API
>>> but from the javadocs alone I feel a bit lost (why does it have to
>>> have an instance directory at all?) and googling did not give me many
>>> results. What would be ideal would be to have something like this
>>> (pseudocode with partly imagined names, which hopefully illustrates
>>> what I am trying to achieve):
>>> 
>>> ContainerConfig containerConfig =
>>> ContainerConfigParser.parse(<container config from classloader>);
>>> CoreContainer container = new CoreContainer(containerConfig);
>>> 
>>> CoreConfig coreConfig = CoreConfigParser.parse(container, <core config from Classloader>);
>>> container.register(<core name>, coreConfig);
>>> 
>>> Ideally I would like to keep XML format to reuse my current solr.xml
>>> and solrconfig.xml but that is just a nice-to-have.
>>> 
>>> Does such a way exist and if so, what are the real API classes and calls
>> to use?
>>> 
>>> Thank you in advance,
>>> 
>>> Robert
>> 
>> 



Re: zkHost in solr.xml goes missing after SPLITSHARD using Collections API

2013-07-23 Thread Alan Woodward
Can you try upgrading to the just-released 4.4?  Solr.xml persistence had all 
kinds of bugs in 4.3, which should now be fixed.

Alan Woodward
www.flax.co.uk


On 23 Jul 2013, at 13:36, Ali, Saqib wrote:

> Hello all,
> 
> Every time I issue a SPLITSHARD using Collections API, the zkHost attribute
> in the solr.xml goes missing. I have to manually edit the solr.xml to add
> zkHost after every SPLITSHARD.
> 
> Any thoughts on what could be causing this?
> 
> Thanks.



Re: Facet at zappos.com

2013-07-26 Thread Alan Woodward
Hi,

Have a look at the wiki page for multi-select faceting: 
http://wiki.apache.org/solr/SimpleFacetParameters#Multi-Select_Faceting_and_LocalParams.
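
The short version: tag the filter query, then exclude that tag when
computing the facet, e.g. (field and values are illustrative):

q=*:*&fq={!tag=brandTag}brand:("Nike" OR "Adidas")
    &facet=true&facet.field={!ex=brandTag}brand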

Alan Woodward
www.flax.co.uk


On 26 Jul 2013, at 07:23, Ifnu bima wrote:

> Hi,
> 
> I'm currently looking at Zappos' Solr implementation on their website.
> One thing that makes me curious is how their facet filters work.
> 
> In Zappos' facet filters, some facets allow us to filter on multiple
> values, for example size and brand. The behaviour lets the user select
> multiple facet values without removing the other values in the same facet
> filter. If you compare this behaviour with, for example, the sample
> /browse handler in the Solr distribution, it is quite different, since
> that only allows selection of a single facet value per facet filter.
> 
> Can Zappos' multiple facet values be achieved using only configuration in
> solrconfig.xml, or does it need custom code when writing the Solr client?
> 
> thanks and regards
> 
> -- 
> http://ifnubima.org/indo-java-podcast/
> http://project-template.googlecode.com/
> @ifnubima
> 
> regards



Re: EmbeddedSolrServer Solr 4.4.0 bug?

2013-07-31 Thread Alan Woodward
Hi Luis,

You need to call coreContainer.load() after construction for it to load the 
cores.  Previously the CoreContainer(solrHome, configFile) constructor also 
called load(), but this was the only constructor to do that.
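
i.e., using your names from below (a minimal sketch):

CoreContainer coreContainer = new CoreContainer(solrHome);
coreContainer.load();  // explicit in 4.4 - this is what actually reads the cores
EmbeddedSolrServer localSolrServer = new EmbeddedSolrServer(coreContainer, core);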

I probably need to put something in CHANGES.txt to point this out...

Alan Woodward
www.flax.co.uk


On 31 Jul 2013, at 08:53, Luis Cappa Banda wrote:

> Hello guys,
> 
> Since I upgrade from 4.1.0 to 4.4.0 version I've noticed that
> EmbeddedSolrServer has changed a little the way of construction:
> 
> *Solr 4.1.0 style:*
> 
> CoreContainer coreContainer = new CoreContainer(solrHome, new
> File(solrHome+"/solr.xml"));
> EmbeddedSolrServer localSolrServer = new EmbeddedSolrServer(coreContainer,
> core);
> 
> *Solr 4.4.0 new style:*
> 
> CoreContainer coreContainer = new CoreContainer(solrHome);
> EmbeddedSolrServer localSolrServer = new EmbeddedSolrServer(coreContainer,
> core);
> 
> 
> However, it's not working. I've got the following solr.xml configuration
> file:
> 
> <solr ...>
>   <cores ... hostPort="${jetty.port:8983}" hostContext="${hostContext:solr}"
>  zkClientTimeout="${zkClientTimeout:15000}">
>     <core name="core" instanceDir="..."/>
>   </cores>
> </solr>
> 
> 
> And resources appear to be loaded correctly:
> 
> 2013-07-31 09:46:37,583 47889 [main] INFO  org.apache.solr.core.ConfigSolr
> - Loading container configuration from /opt/solr/solr.xml
> 
> 
> But when indexing into core with coreName 'core', it throws an Exception:
> 
> 2013-07-31 09:50:49,409 5189 [main] ERROR
> com.buguroo.solr.index.WriteIndex  - No such core: core
> 
> Either I am sleepy, which is possible, or there is some kind of bug
> here.
> 
> Best regards,
> 
> -- 
> - Luis Cappa



Re: Solr metrics in Codahale metrics and Graphite?

2013-04-07 Thread Alan Woodward
I've been thinking about how to improve this reporting, especially now that 
metrics-3 (which removes all of the funky thread issues we ran into last time I 
tried to add it to Solr) is close to release.  I think we could go about it as 
follows:

* refactor the existing JMX reporting to use metrics-3.  This would mean 
replacing the SolrCore.infoRegistry map with a MetricsRegistry, and adding a 
JmxReporter, keeping the existing config logic to determine which JMX server to 
use.  PluginInfoHandler and SolrMBeanInfoHandler translate the metrics-3 data 
back into SolrMBean format to keep the reporting backwards-compatible.  This 
seems like a lot of work for no visible benefit, but…
* we can then add the ability to define other metrics reporters in 
solrconfig.xml.  There are already reporters for Ganglia and Graphite - you 
just add then to the Solr lib/ directory, configure them in solrconfig, and 
voila - Solr can be monitored using the same devops tools you use to monitor 
everything else.
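
As a sketch, a reporter definition might look something like the following - the
element and parameter names here are purely illustrative, nothing is final:

  <metricsReporter name="graphite" class="com.codahale.metrics.graphite.GraphiteReporter">
    <str name="host">graphite.example.com</str>
    <int name="port">2003</int>
    <int name="period">60</int>
  </metricsReporter>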

Does this sound sane?

Alan Woodward
www.flax.co.uk


On 6 Apr 2013, at 20:49, Walter Underwood wrote:

> Wow, that really doesn't help at all, since these seem to only be reported in 
> the stats page. 
> 
> I don't need another non-standard app-specific set of metrics, especially one 
> that needs polling. I need metrics delivered to the common system that we use 
> for all our servers.
> 
> This is also why SPM is not useful for us, sorry Otis.
> 
> Also, there is no time period on these stats. How do you graph the 95th 
> percentile? I know there was a lot of work on these, but they seem really 
> useless to me. I'm picky about metrics, working at Netflix does that to you.
> 
> wunder
> 
> On Apr 3, 2013, at 4:01 PM, Walter Underwood wrote:
> 
>> In the Jira, but not in the docs. 
>> 
>> It would be nice to have VM stats like GC, too, so we can have common 
>> monitoring and alerting on all our services.
>> 
>> wunder
>> 
>> On Apr 3, 2013, at 3:31 PM, Otis Gospodnetic wrote:
>> 
>>> It's there! :)
>>> http://search-lucene.com/?q=percentile&fc_project=Solr&fc_type=issue
>>> 
>>> Otis
>>> --
>>> Solr & ElasticSearch Support
>>> http://sematext.com/
>>> 
>>> On Wed, Apr 3, 2013 at 6:29 PM, Walter Underwood  
>>> wrote:
>>>> That sounds great. I'll check out the bug, I didn't see anything in the 
>>>> docs about this. And if I can't find it with a search engine, it probably 
>>>> isn't there.  --wunder
>>>> 
>>>> On Apr 3, 2013, at 6:39 AM, Shawn Heisey wrote:
>>>> 
>>>>> On 3/29/2013 12:07 PM, Walter Underwood wrote:
>>>>>> What are folks using for this?
>>>>> 
>>>>> I don't know that this really answers your question, but Solr 4.1 and
>>>>> later includes a big chunk of codahale metrics internally for request
>>>>> handler statistics - see SOLR-1972.  First we tried including the jar
>>>>> and using the API, but that created thread leak problems, so the source
>>>>> code was added.
>>>>> 
>>>>> Thanks,
>>>>> Shawn
> 
> 
> 
> 



Re: /admin/stats.jsp in SolrCloud

2013-04-10 Thread Alan Woodward
It's under /admin/mbeans.
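
For example (core name assumed from the default setup):

  http://localhost:8983/solr/collection1/admin/mbeans?stats=true&cat=CACHE&wt=json

returns the same cache statistics as the web-based app, as JSON.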

Alan Woodward
www.flax.co.uk


On 10 Apr 2013, at 20:53, Tim Vaillancourt wrote:

> Hey guys,
> 
> This feels like a silly question already, here goes:
> 
> In SolrCloud it doesn't seem obvious to me where one can grab stats
> regarding caches for a given core using an http call (JSON/XML). Those
> values are available in the web-based app, but I am looking for a http call
> that would return this same data.
> 
> In 3.x this was located at /admin/stats.jsp, and I used a script to grab
> the data, but in SolrCloud I am unclear and would like to add that to the
> docs below:
> 
> http://wiki.apache.org/solr/SolrCaching#Overview
> http://wiki.apache.org/solr/SolrAdminStats
> 
> Thanks!
> 
> Tim



Re: Solr metrics in Codahale metrics and Graphite?

2013-04-25 Thread Alan Woodward
Hi Walter, Dmitry,

I opened https://issues.apache.org/jira/browse/SOLR-4735 for this, with some 
work-in-progress.  Have a look!

Alan Woodward
www.flax.co.uk


On 23 Apr 2013, at 07:40, Dmitry Kan wrote:

> Hello Walter,
> 
> Have you had a chance to get something working with graphite, codahale and
> solr?
> 
> Has anyone else tried these tools with Solr 3.x family? How much work is it
> to set things up?
> 
> We have tried zabbix in the past. Even though it required lots of up-front
> investment in configuration, it looks like a compelling option.
> In the meantime, we are looking into something more "solr-tailored" yet
> simple. Even without metrics persistence. Tried: jconsole and viewing stats
> via jmx. Main point for us now is to gather the RAM usage.
> 
> Dmitry
> 
> 
> On Tue, Apr 9, 2013 at 9:43 PM, Walter Underwood wrote:
> 
>> If it isn't obvious, I'm glad to help test a patch for this. We can run a
>> simulated production load in dev and report to our metrics server.
>> 
>> wunder
>> 
>> On Apr 8, 2013, at 1:07 PM, Walter Underwood wrote:
>> 
>>> That approach sounds great. --wunder
>>> 
>>> On Apr 7, 2013, at 9:40 AM, Alan Woodward wrote:
>>> 
>>>> I've been thinking about how to improve this reporting, especially now
>> that metrics-3 (which removes all of the funky thread issues we ran into
>> last time I tried to add it to Solr) is close to release.  I think we could
>> go about it as follows:
>>>> 
>>>> * refactor the existing JMX reporting to use metrics-3.  This would
>> mean replacing the SolrCore.infoRegistry map with a MetricsRegistry, and
>> adding a JmxReporter, keeping the existing config logic to determine which
>> JMX server to use.  PluginInfoHandler and SolrMBeanInfoHandler translate
>> the metrics-3 data back into SolrMBean format to keep the reporting
>> backwards-compatible.  This seems like a lot of work for no visible
>> benefit, but…
>>>> * we can then add the ability to define other metrics reporters in
>> solrconfig.xml.  There are already reporters for Ganglia and Graphite - you
>> just add them to the Solr lib/ directory, configure them in solrconfig, and
>> voila - Solr can be monitored using the same devops tools you use to
>> monitor everything else.
>>>> 
>>>> Does this sound sane?
>>>> 
>>>> Alan Woodward
>>>> www.flax.co.uk
>>>> 
>>>> 
>>>> On 6 Apr 2013, at 20:49, Walter Underwood wrote:
>>>> 
>>>>> Wow, that really doesn't help at all, since these seem to only be
>> reported in the stats page.
>>>>> 
>>>>> I don't need another non-standard app-specific set of metrics,
>> especially one that needs polling. I need metrics delivered to the common
>> system that we use for all our servers.
>>>>> 
>>>>> This is also why SPM is not useful for us, sorry Otis.
>>>>> 
>>>>> Also, there is no time period on these stats. How do you graph the
>> 95th percentile? I know there was a lot of work on these, but they seem
>> really useless to me. I'm picky about metrics, working at Netflix does that
>> to you.
>>>>> 
>>>>> wunder
>>>>> 
>>>>> On Apr 3, 2013, at 4:01 PM, Walter Underwood wrote:
>>>>> 
>>>>>> In the Jira, but not in the docs.
>>>>>> 
>>>>>> It would be nice to have VM stats like GC, too, so we can have common
>> monitoring and alerting on all our services.
>>>>>> 
>>>>>> wunder
>>>>>> 
>>>>>> On Apr 3, 2013, at 3:31 PM, Otis Gospodnetic wrote:
>>>>>> 
>>>>>>> It's there! :)
>>>>>>> http://search-lucene.com/?q=percentile&fc_project=Solr&fc_type=issue
>>>>>>> 
>>>>>>> Otis
>>>>>>> --
>>>>>>> Solr & ElasticSearch Support
>>>>>>> http://sematext.com/
>>>>>>> 
>>>>>>> On Wed, Apr 3, 2013 at 6:29 PM, Walter Underwood <
>> wun...@wunderwood.org> wrote:
>>>>>>>> That sounds great. I'll check out the bug, I didn't see anything in
>> the docs about this. And if I can't find it with a search engine, it
>> probably isn't there.  --wunder
>>>>>>>> 
>>>>>>>> On Apr 3, 2013, at 6:39 AM, Shawn Heisey wrote:
>>>>>>>> 
>>>>>>>>> On 3/29/2013 12:07 PM, Walter Underwood wrote:
>>>>>>>>>> What are folks using for this?
>>>>>>>>> 
>>>>>>>>> I don't know that this really answers your question, but Solr 4.1
>> and
>>>>>>>>> later includes a big chunk of codahale metrics internally for
>> request
>>>>>>>>> handler statistics - see SOLR-1972.  First we tried including the
>> jar
>>>>>>>>> and using the API, but that created thread leak problems, so the
>> source
>>>>>>>>> code was added.
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Shawn
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>>> --
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> 
>>> 
>>> 
>> 
>> --
>> Walter Underwood
>> wun...@wunderwood.org
>> 
>> 
>> 
>> 



Re: Solr metrics in Codahale metrics and Graphite?

2013-04-25 Thread Alan Woodward
This is on top of trunk at the moment, but would be backported to 4.4 if there 
was interest.

Alan Woodward
www.flax.co.uk


On 25 Apr 2013, at 10:32, Dmitry Kan wrote:

> Hi Alan,
> 
> Great! What is the solr version you are patching?
> 
> Speaking of graphite, we have set it up recently to monitor our shard farm.
> So far, since RAM usage has been the most important metric, we were fine with
> the pidstat command and a little script generating stats for carbon.
> Having some additional stats from SOLR itself would certainly be great to
> have.
> 
> Dmitry
> 
> On Thu, Apr 25, 2013 at 12:01 PM, Alan Woodward  wrote:
> 
>> Hi Walter, Dmitry,
>> 
>> I opened https://issues.apache.org/jira/browse/SOLR-4735 for this, with
>> some work-in-progress.  Have a look!
>> 
>> Alan Woodward
>> www.flax.co.uk
>> 
>> 
>> On 23 Apr 2013, at 07:40, Dmitry Kan wrote:
>> 
>>> Hello Walter,
>>> 
>>> Have you had a chance to get something working with graphite, codahale
>> and
>>> solr?
>>> 
>>> Has anyone else tried these tools with Solr 3.x family? How much work is
>> it
>>> to set things up?
>>> 
>>> We have tried zabbix in the past. Even though it required lots of up
>> front
>>> investment on configuration, it looks like a compelling option.
>>> In the meantime, we are looking into something more "solr-tailored" yet
>>> simple. Even without metrics persistence. Tried: jconsole and viewing
>> stats
>>> via jmx. Main point for us now is to gather the RAM usage.
>>> 
>>> Dmitry
>>> 
>>> 
>>> On Tue, Apr 9, 2013 at 9:43 PM, Walter Underwood >> wrote:
>>> 
>>>> If it isn't obvious, I'm glad to help test a patch for this. We can run
>> a
>>>> simulated production load in dev and report to our metrics server.
>>>> 
>>>> wunder
>>>> 
>>>> On Apr 8, 2013, at 1:07 PM, Walter Underwood wrote:
>>>> 
>>>>> That approach sounds great. --wunder
>>>>> 
>>>>> On Apr 7, 2013, at 9:40 AM, Alan Woodward wrote:
>>>>> 
>>>>>> I've been thinking about how to improve this reporting, especially now
>>>> that metrics-3 (which removes all of the funky thread issues we ran into
>>>> last time I tried to add it to Solr) is close to release.  I think we
>> could
>>>> go about it as follows:
>>>>>> 
>>>>>> * refactor the existing JMX reporting to use metrics-3.  This would
>>>> mean replacing the SolrCore.infoRegistry map with a MetricsRegistry, and
>>>> adding a JmxReporter, keeping the existing config logic to determine
>> which
>>>> JMX server to use.  PluginInfoHandler and SolrMBeanInfoHandler translate
>>>> the metrics-3 data back into SolrMBean format to keep the reporting
>>>> backwards-compatible.  This seems like a lot of work for no visible
>>>> benefit, but…
>>>>>> * we can then add the ability to define other metrics reporters in
>>>> solrconfig.xml.  There are already reporters for Ganglia and Graphite -
>> you
>>>> just add them to the Solr lib/ directory, configure them in solrconfig,
>> and
>>>> voila - Solr can be monitored using the same devops tools you use to
>>>> monitor everything else.
>>>>>> 
>>>>>> Does this sound sane?
>>>>>> 
>>>>>> Alan Woodward
>>>>>> www.flax.co.uk
>>>>>> 
>>>>>> 
>>>>>> On 6 Apr 2013, at 20:49, Walter Underwood wrote:
>>>>>> 
>>>>>>> Wow, that really doesn't help at all, since these seem to only be
>>>> reported in the stats page.
>>>>>>> 
>>>>>>> I don't need another non-standard app-specific set of metrics,
>>>> especially one that needs polling. I need metrics delivered to the
>> common
>>>> system that we use for all our servers.
>>>>>>> 
>>>>>>> This is also why SPM is not useful for us, sorry Otis.
>>>>>>> 
>>>>>>> Also, there is no time period on these stats. How do you graph the
>>>> 95th percentile? I know there was a lot of work on these, but they seem
>>>> really useless to me. I'm picky about metrics, working at Netflix does
>> that
>>>> to you.
>>>>>>> 

Re: Unsubscribing from JIRA

2013-05-01 Thread Alan Woodward
Hi MJ,

It looks like you're subscribed to the lucene dev list.  Send an email to 
dev-unsubscr...@lucene.apache.org to get yourself taken off the list.

Alan Woodward
www.flax.co.uk


On 1 May 2013, at 17:25, johnmu...@aol.com wrote:

> Hi,
> 
> 
> Can someone show me how to unsubscribe from JIRA?
> 
> 
> Years ago, I subscribed to JIRA and since then I have been receiving emails 
> from JIRA for all kinds of issues: when an issue is created, closed or 
> commented on.  Yes, I looked around and could not figure out how to 
> unsubscribe, but maybe I didn't look hard enough?
> 
> 
> Here is an example email subject line header from JIRA: "[jira] [Commented] 
> (LUCENE-3842) Analyzing Suggester"  I have the same issue from "Jenkins" (and 
> example: "[JENKINS] Lucene-Solr-Tests-4.x-Java6 - Build # 1537 - Still 
> Failing").
> 
> 
> Thanks in advance!!!
> 
> 
> -MJ



Re: java.lang.IllegalArgumentException: No enum const class org.apache.lucene.util.Version.LUCENE_43

2013-05-08 Thread Alan Woodward
Hi Roald,

On the ticket, you report the following version information:
solr-spec : 4.2.1.2013.03.26.08.26.55
solr-impl : 4.2.1 1461071 - mark - 2013-03-26 08:26:55
lucene-spec : 4.2.1
lucene-impl : 4.2.1 1461071 - mark - 2013-03-26 08:23:34

This shows that your servlet container is running 4.2.1, not 4.3.  So the 
example solrconfig.xml from 4.3 won't work here.
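
So either deploy the 4.3 war, or make the solrconfig.xml match the running
server - a one-line change, using one of the values listed in the stack trace:

  <luceneMatchVersion>LUCENE_42</luceneMatchVersion>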

Alan Woodward
www.flax.co.uk


On 8 May 2013, at 12:52, Roald wrote:

> Hi all,
> 
> I just reported this issue: http://issues.apache.org/jira/browse/SOLR-4800
> 
> java.lang.IllegalArgumentException: No enum const class
> org.apache.lucene.util.Version.LUCENE_43
> 
> solr-4.3.0/example/solr/collection1/conf/solrconfig.xml has
> LUCENE_43
> 
> Which causes:
> 
> SolrCore Initialization Failures
> 
> collection1:
> org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
> Could not load config for solrconfig.xml
> 
> From catalina.out :
> 
> SEVERE: Unable to create core: collection1
> org.apache.solr.common.SolrException: Could not load config for
> solrconfig.xml
> at
> org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:991)
> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051)
> at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634)
> at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:679)
> Caused by: org.apache.solr.common.SolrException: Invalid luceneMatchVersion
> 'LUCENE_43', valid values are: [LUCENE_30, LUCENE_31, LUCENE_32, LUCENE_33,
> LUCENE_34, LUCENE_35, LUCENE_36, LUCENE_40, LUCENE_41, LUCENE_42,
> LUCENE_CURRENT] or a string in format 'V.V'
> at org.apache.solr.core.Config.parseLuceneVersionString(Config.java:313)
> at org.apache.solr.core.Config.getLuceneVersion(Config.java:298)
> at org.apache.solr.core.SolrConfig.(SolrConfig.java:119)
> at
> org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:989)
> ... 11 more
> Caused by: java.lang.IllegalArgumentException: No enum const class
> org.apache.lucene.util.Version.LUCENE_43
> at java.lang.Enum.valueOf(Enum.java:214)
> at org.apache.lucene.util.Version.valueOf(Version.java:34)
> at org.apache.lucene.util.Version.parseLeniently(Version.java:133)
> at org.apache.solr.core.Config.parseLuceneVersionString(Config.java:311)
> ... 14 more
> May 7, 2013 9:10:00 PM org.apache.solr.common.SolrException log
> SEVERE: null:org.apache.solr.common.SolrException: Unable to create core:
> collection1
> at
> org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1672)
> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1057)
> at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634)
> at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:679)
> Caused by: org.apache.solr.common.SolrException: Could not load config for
> solrconfig.xml
> at
> org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:991)
> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051)
> ... 10 more
> Caused by: org.apache.solr.common.SolrException: Invalid luceneMatchVersion
> 'LUCENE_43', valid values are: [LUCENE_30, LUCENE_31, LUCENE_32, LUCENE_33,
> LUCENE_34, LUCENE_35, LUCENE_36, LUCENE_40, LUCENE_41, LUCENE_42,
> LUCENE_CURRENT] or a string in format 'V.V'
> at org.apache.solr.core.Config.parseLuceneVersionString(Config.java:313)
> at org.apache.solr.core.Config.getLuceneVersion(Config.java:298)
> at org.apache.solr.core.SolrConfig.(SolrConfig.java:119)
> at
> org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java

Re: Core admin action "CREATE" fails for existing core

2013-05-23 Thread Alan Woodward
I think the wiki needs to be updated to reflect this?  
http://wiki.apache.org/solr/CoreAdmin

If somebody adds me as an editor (AlanWoodward), I'll do it.
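
(For reference, the supported way to re-configure an existing core is now the
CoreAdmin RELOAD action rather than CREATEing over it, e.g. a call like
http://localhost:8983/solr/admin/cores?action=RELOAD&core=mycore - host, port
and core name are placeholders.)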

Alan Woodward
www.flax.co.uk


On 23 May 2013, at 16:43, Mark Miller wrote:

> Yes, this did change - it's actually a protection for a previous change 
> though.
> 
> There was a time when you did a core reload by just making a new core with 
> the same name and closing the old core - that is no longer really supported 
> though - the proper way to do this is to use SolrCore#reload, and that has 
> been the case for all of 4.x release if I remember right. I supported making 
> this change to force people who might still be doing what is likely quite a 
> buggy operation to switch to the correct code.
> 
> Sorry about the inconvenience.
> 
> - Mark
> 
> On May 23, 2013, at 10:45 AM, André Widhani  wrote:
> 
>> It seems to me that the behavior of the Core admin action "CREATE" has 
>> changed when going from Solr 4.1 to 4.3.
>> 
>> With 4.1, I could re-configure an existing core (changing path/name to 
>> solrconfig.xml for example). In 4.3, I get an error message:
>> 
>> SEVERE: org.apache.solr.common.SolrException: Error CREATEing SolrCore 
>> 'core-tex69b6iom1djrbzmlmg83-index2': Core with name 
>> 'core-tex69b6iom1djrbzmlmg83-index2' already exists.
>> 
>> Is this change intended?
>> 
>> André
>> 
> 



Re: Core admin action "CREATE" fails for existing core

2013-05-23 Thread Alan Woodward
Thanks!

Alan Woodward
www.flax.co.uk


On 23 May 2013, at 17:38, Steve Rowe wrote:

> Alan, I've added AlanWoodward to the Solr AdminGroup page.
> 
> On May 23, 2013, at 12:29 PM, Alan Woodward  wrote:
> 
>> I think the wiki needs to be updated to reflect this?  
>> http://wiki.apache.org/solr/CoreAdmin
>> 
>> If somebody adds me as an editor (AlanWoodward), I'll do it.
>> 
>> Alan Woodward
>> www.flax.co.uk
>> 
>> 
>> On 23 May 2013, at 16:43, Mark Miller wrote:
>> 
>>> Yes, this did change - it's actually a protection for a previous change 
>>> though.
>>> 
>>> There was a time when you did a core reload by just making a new core with 
>>> the same name and closing the old core - that is no longer really supported 
>>> though - the proper way to do this is to use SolrCore#reload, and that has 
>>> been the case for all of 4.x release if I remember right. I supported 
>>> making this change to force people who might still be doing what is likely 
>>> quite a buggy operation to switch to the correct code.
>>> 
>>> Sorry about the inconvenience.
>>> 
>>> - Mark
>>> 
>>> On May 23, 2013, at 10:45 AM, André Widhani  
>>> wrote:
>>> 
>>>> It seems to me that the behavior of the Core admin action "CREATE" has 
>>>> changed when going from Solr 4.1 to 4.3.
>>>> 
>>>> With 4.1, I could re-configure an existing core (changing path/name to 
>>>> solrconfig.xml for example). In 4.3, I get an error message:
>>>> 
>>>> SEVERE: org.apache.solr.common.SolrException: Error CREATEing SolrCore 
>>>> 'core-tex69b6iom1djrbzmlmg83-index2': Core with name 
>>>> 'core-tex69b6iom1djrbzmlmg83-index2' already exists.
>>>> 
>>>> Is this change intended?
>>>> 
>>>> André
>>>> 
>>> 
>> 
> 



Re: CPU spikes on trunk

2013-01-02 Thread Alan Woodward
Hi Markus,

How recent a check-out from trunk are you running?  I added a bunch of 
statistics recording a few months back which we had to back out over Christmas 
because it was causing memory leaks.

Alan Woodward
a...@flax.co.uk


On 2 Jan 2013, at 15:15, Markus Jelsma wrote:

> Hi,
> 
> We have two clusters running on similar machines equipped with SSDs. One runs 
> a six-month-old trunk checkout and another always has a very recent checkout. 
> Both sometimes receive a few documents to index. The old cluster 
> actually processes queries.
> 
> We've seen performance differences before: the idle new cluster is always 
> slower to respond than the old one. Top and other monitoring tools show 
> frequent CPU spikes even when nothing is going on, and CPU usage increases 
> when a proxy starts to admin/ping the nodes.
> 
> Is anyone familiar with this observation? Did i miss something?
> 
> Thanks,
> Markus



Re: Solr 4.0, slow opening searchers

2013-01-11 Thread Alan Woodward
Hi Marcel,

Are you committing data with hard commits or soft commits?  I've seen systems 
where we've inadvertently only used soft commits, which means that the entire 
transaction log has to be re-read on startup, which can take a long time.  Hard 
commits flush indexed data to disk, and make it a lot quicker to restart.
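
If you're currently only soft committing, adding a hard autoCommit in
solrconfig.xml will bound the replay time - a sketch, with illustrative values:

  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>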

Alan Woodward
a...@flax.co.uk


On 11 Jan 2013, at 13:51, Marcel Bremer wrote:

> Hi,
> 
> We're experiencing slow startup times of searchers in Solr when the index 
> contains a large number of documents.
> 
> We use Solr v4.0 with Jetty and currently have 267.657.634 documents stored, 
> spread across 9 cores. These documents contain keywords, with additional 
> statistics, which we are using for suggestions and related keywords. When we 
> (re)start Solr on one of our servers it can take up to two hours before Solr 
> has opened all of its searchers and starts accepting connections again. We 
> can't figure out why it takes so long to open those searchers. Also the CPU 
> and memory usage of Solr while opening searchers is not extremely high.
> 
> Are there any known issues or tips someone could give us to speed up opening 
> searchers?
> 
> If you need more details, please ping me.
> 
> 
> Best regards,
> 
> Marcel Bremer
> Vinden.nl BV



Re: Advanced Search Option in Solr corresponding to DtSearch options

2013-02-07 Thread Alan Woodward
Hi Soumyanayan,

We developed a parser that converts dtSearch queries to Lucene queries, with 
some Solr integration - see 
http://www.flax.co.uk/blog/2012/04/24/dtsolr-an-open-source-replacement-for-the-dtsearch-closed-source-search-engine/

At the moment it relies on an unreleased version of Lucene/Solr, because we 
needed to get some extra data from the index that wasn't available in trunk for 
our use case, but it can probably be tweaked to just use vanilla solr.  Feel 
free to contact me for more details!
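
For what it's worth, several of these options also map fairly directly onto the
stock Lucene query syntax - my rough equivalents, not dtSearch's:

  csharp w/4 language   ->  "csharp language"~4    (proximity via phrase slop)
  p%arts                ->  parts~1                (fuzzy, edit distance 1)
  not these words       ->  -sql                   (prohibited clause)
  one or more of        ->  ICollection OR ArrayList OR Hashtable

Stemming and phonic matching are normally handled at analysis time (a stemmer or
phonetic filter in the field's analyzer) rather than with per-query markers.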

Alan Woodward
www.flax.co.uk


On 6 Feb 2013, at 16:09, Soumyanayan Kar wrote:

> Hi,
> 
> 
> 
> We are replacing the search and indexing module in an application from
> DtSearch to Solr using solrnet as the .net Solr client library.
> 
> 
> 
> We are relatively new to Solr/Lucene and would need some help/direction to
> understand the more advanced search options in Solr.
> 
> 
> 
> The current application supports the following search options using
> DtSearch:
> 
> 
> 
> 1)Word(s) or phrase
> 
> 2)Exact words or phrases
> 
> 3)Not these words or phrases
> 
> 4)One or more of words("A" OR "B" OR "C")
> 
> 5)Proximity of word with n words of another word
> 
> 6)Numeric range - From - To
> 
> 7)Option
> 
> . Stemming(search* finds searching or searches)
> 
> . Synonym(search& finds seek or look)
> 
> . Fuzzy within n letters(p%arts finds paris)
> 
> . Phonic homonyms(#Smith also finds Smithe and Smythe)
> 
> 
> 
> As an example the search query that gets generated to be posted to DtSearch
> for the below use case:
> 
> 1.   Search Phrase:  generic collection
> 
> 2.   Exact Phrase: linq
> 
> 3.   Not these words: sql
> 
> 4.   One or more of these words:  ICollection or ArrayList or
> Hashtable
> 
> 5.   Proximity:   csharp within
> 4 words of language
> 
> 6.   Options:
> 
> a.  Stemming
> 
> b.  Synonym
> 
> c.   Fuzzy within 2 letters
> 
> d.  Phonic homonyms
> 
> 
> 
> Search Query: generic* collection* generic& collection& #generic #collection
> g%%eneric c%%ollection "linq"  -sql ICollection OR ArrayList OR Hashtable
> csharp w/4 language
> 
> 
> 
> We have been able to do simple searches (single-term search in file
> content) with highlights in Solr. Now we need to replace these options
> with Solr/Lucene.
> 
> 
> 
> Can anybody provide some direction on what/where we should be looking?
> 
> 
> 
> Thanks & Regards,
> 
> 
> 
> Soumya.
> 
> 
> 
> 
> 



Re: UnionDocsAndPositionsEnum class not found

2013-02-09 Thread Alan Woodward
It's in a bit of a weird position: it's not defined as an inner class, but as a 
separate top-level class within the same file as MultiPhraseQuery. 
 I vaguely remember this giving me problems on the positions branch a while 
back, but I can't remember how I got it working in the end, sorry :-(

Maybe it should be moved to a top-level class in its own file?

Alan Woodward
www.flax.co.uk


On 9 Feb 2013, at 15:51, Markus Jelsma wrote:

> Yes indeed. It makes little sense, the class is there.
> 
> -Original message-
>> From:Mark Miller 
>> Sent: Sat 09-Feb-2013 15:13
>> To: solr-user@lucene.apache.org
>> Subject: Re: UnionDocsAndPositionsEnum class not found
>> 
>> Looks odd - the supposedly missing class looks like an inner class in 
>> MultiPhraseQuery.
>> 
>> - Mark
>> 
>> On Feb 9, 2013, at 6:19 AM, Markus Jelsma  wrote:
>> 
>>> Any ideas? I've not yet found anything that remotely looks like the 
>>> root of the problem so far :)
>>> 
>>> 
>>> -Original message-
>>>> From:Markus Jelsma 
>>>> Sent: Wed 06-Feb-2013 10:23
>>>> To: solr-user@lucene.apache.org
>>>> Subject: UnionDocsAndPositionsEnum class not found
>>>> 
>>>> Hi,
>>>> 
>>>> We're getting the following trace for some Dismax queries that contain 
>>>> non-alphanumerics:
>>>> 
>>>> Feb 6, 2013 10:06:56 AM org.apache.solr.common.SolrException log
>>>> SEVERE: null:java.lang.RuntimeException: java.lang.NoClassDefFoundError: 
>>>> org/apache/lucene/search/UnionDocsAndPositionsEnum
>>>>   at 
>>>> org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:483)
>>>>   at 
>>>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:290)
>>>>   at 
>>>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
>>>>   at 
>>>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
>>>>   at 
>>>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
>>>>   at 
>>>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
>>>>   at 
>>>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
>>>>   at 
>>>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
>>>>   at 
>>>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
>>>>   at 
>>>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
>>>>   at 
>>>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
>>>>   at 
>>>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
>>>>   at 
>>>> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
>>>>   at 
>>>> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
>>>>   at 
>>>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
>>>>   at org.eclipse.jetty.server.Server.handle(Server.java:365)
>>>>   at 
>>>> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
>>>>   at 
>>>> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
>>>>   at 
>>>> org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:926)
>>>>   at 
>>>> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:988)
>>>>   at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635)
>>>>   at 
>>>> org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
>>>>   at 
>>>> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
>>>>   at 
>>>> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
>>>>   at 
>>>> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
>>>>   at 
>>>

Re: Faceting on the first part or first letter of values

2013-02-27 Thread Alan Woodward
Hi Teun,

In the past I've done this by creating a separate field that contains just the 
first letter of the original field, and faceting on that.  So say you've got a 
'colour' field with contents [red, yellow, blue], then you'd add a 
'colourPrefix' field with contents [r, y, b] and facet/filter on that.
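
A sketch of one way to populate such a field at index time - the field and type
names here are mine, not from a standard schema:

  <fieldType name="firstLetter" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="^(.).*$" replacement="$1"/>
    </analyzer>
  </fieldType>
  <field name="colourPrefix" type="firstLetter" indexed="true" stored="false" multiValued="true"/>
  <copyField source="colour" dest="colourPrefix"/>

Then facet with facet.field=colourPrefix and filter with fq=colourPrefix:r.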

Alan Woodward
www.flax.co.uk


On 27 Feb 2013, at 07:01, Teun Duynstee wrote:

> What I really miss in the SimpleFaceting component is the ability to get
> facets not of the full term, but grouped by the first letter(s). I wrote a
> Jira issue on this ( https://issues.apache.org/jira/browse/SOLR-4496). I
> also wrote a patch with a rather simplistic first try of an implementation.
> 
> Now that I've had a better look at faceting multi-valued fields and the
> inner workings of UninvertedField, I see that doing it right is harder than
> I thought, but I'd still like to give it a try.
> 
> So can anyone give me some tips on how to approach this? Should I treat the
> facet on the first letters as a completely independent field? Should I
> build the index based on the index of the complete field? By the nature of
> this kind of facet, you'll always have a fairly limited number of terms.
> 
> Thanks a lot,
> Teun



Re: Group By and Sum

2013-03-18 Thread Alan Woodward
Hi Adam,

Have a look at the stats component: http://wiki.apache.org/solr/StatsComponent. 
 In your case, I think you'd need to add an extra field for your month, and 
then run a query filtered by your date range with stats.field=NetSales, 
stats.field=TransCount, and stats.facet=month.
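
As a sketch, assuming you call the new field 'month':

  q=*:*&rows=0
  &fq=BusinessDateTime:[2012-04-01T00:00:00Z TO 2013-04-01T00:00:00Z]
  &stats=true&stats.field=NetSales&stats.field=TransCount&stats.facet=month

That returns sum (along with min/max/mean etc.) for both fields, broken down by
month.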

Make sure you use Solr 4.2 for this, by the way, as it's massively faster - 
I've found stats queries over ~500,000 documents dropping from 60 seconds to 2 
seconds with an upgrade from 4.0 to 4.2.

Alan Woodward
www.flax.co.uk


On 18 Mar 2013, at 16:48, Adam Harris wrote:

> Hello All,
> 
> Pretty stuck here and I am hoping you might be the person to help me out. I 
> am working with SOLR and JSONiq, which are totally new to me, and doing even 
> the simplest of things is just escaping me. I know SQL pretty well, however 
> this simple requirement seems to escape me. I'll jump right into it.
> 
> Here is the schema of my Core:
> 
> [schema snippet mangled in the archive - the XML tags were stripped. The
> relevant fields are BusinessDateTime (a date field), NetSales and TransCount,
> several of them marked required="true".]
> 
> I need to group by the month of BusinessDateTime and sum up NetSales and 
> TransCount for a given date range. Now if this were SQL I would just write:
> 
> 
> SELECT sum(TransCount), sum(NetSales)
> 
> FROM Core
> 
> WHERE BusinessDateTime BETWEEN '2012/04/01' AND '2013/04/01'
> 
> GROUP BY MONTH(BusinessDateTime)
> 
> But of course nothing is this simple with SOLR and/or JSONiq. I have tried 
> messing around with Facet and Group but they never seem to work the way I 
> want them to. For example here is a query I am currently playing with:
> 
> 
> ?wt=json
> 
> &indent=true
> 
> &q=*:*
> 
> &rows=0
> 
> &facet=true
> 
> &facet.date=BusinessDateTime
> 
> &facet.date.start=2012-02-01T00:00:01Z
> 
> &facet.date.end=2013-02-01T23:59:59Z
> 
> &facet.date.gap=%2B1MONTH
> 
> &group=true
> 
> &group.field=BusinessDateTime
> 
> &group.facet=true
> 
> &group.field=NetSales
> 
> Now the facet is working properly, but it is returning the count of the 
> documents, whereas I need the sum of the NetSales and TransCount fields 
> instead.
> 
> Any help or suggestions would be greatly appreciated.
> 
> Thanks,
> Adam



Re: Testing Solr4 - first impressions and problems

2012-10-15 Thread Alan Woodward
Hi Shawn,

The transaction log is only being used to support near-real-time search at the 
moment, I think, so it sounds like it's surplus to requirements for your 
use-case.  I'd just turn it off.
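
(Turning it off just means commenting out the <updateLog> element inside
<updateHandler> in solrconfig.xml.)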

Alan Woodward
www.romseysoftware.co.uk

On 15 Oct 2012, at 07:04, Shawn Heisey wrote:

> On 10/14/2012 5:45 PM, Erick Erickson wrote:
>> About your second point. Try committing more often with openSearcher
>> set to false.
>> There's a bit here:
>> http://wiki.apache.org/solr/SolrConfigXml
>> 
>> <autoCommit>
>>   <maxDocs>1</maxDocs>
>>   <maxTime>15000</maxTime>
>>   <openSearcher>false</openSearcher>
>> </autoCommit>
>> 
>> 
>> That should keep the size of the transaction log down to reasonable levels...
> 
> I have autocommit turned completely off -- both values set to zero.  The DIH 
> import from MySQL, over 12 million rows per shard, is done in one go on all 
> my build cores at once, then I swap cores.  It takes a little over three 
> hours and produces a 22GB index.  I have batchSize set to -1 so that jdbc 
> streams the records.
> 
> When I first set this up back on 1.4.1, I had some kind of severe problem 
> when autocommit was turned on.  I can no longer remember what it caused, but 
> it was a huge showstopper of some kind.
> 
> Thanks,
> Shawn
> 



Re: Solr4 - no examples of postingsFormat in schema.xml

2012-10-15 Thread Alan Woodward
The extra codecs are supplied in a separate jar file now 
(lucene-codecs-4.0.0.jar) - I guess this isn't being packaged into solr.war by 
default?  You should be able to download it here:

http://search.maven.org/remotecontent?filepath=org/apache/lucene/lucene-codecs/4.0.0/lucene-codecs-4.0.0.jar

 and drop it into the lib/ directory.

On 15 Oct 2012, at 00:49, Shawn Heisey wrote:

> On 10/14/2012 3:21 PM, Rafał Kuć wrote:
>> Hello!
>> 
>> Try adding the following to solrconfig.xml:
>> 
>> <codecFactory class="solr.SchemaCodecFactory"/>
> 
> I did this and got a little further, but still no go.  From what it's saying 
> now, I don't think it will be possible in the current state of branch_4x to 
> use anything but the default.
> 
> SEVERE: null:java.lang.IllegalArgumentException: A SPI class of type 
> org.apache.lucene.codecs.PostingsFormat with name 'Block' does not exist. You 
> need to add the corresponding JAR file supporting this SPI to your 
> classpath.The current classpath supports the following names: [Lucene40]
> 
> I saw that LUCENE-4446 was applied to branch_4x a few hours ago. I did 'svn 
> up' and rebuilt Solr.  Trying again, it appears to be using Lucene41, which I 
> believe is the Block format.  But when I tried to change the format for my 
> unique key fields to Bloom, that still didn't work.  Is this something I 
> should file an issue on?
> 
> SEVERE: null:java.lang.IllegalArgumentException: A SPI class of type 
> org.apache.lucene.codecs.PostingsFormat with name 'Bloom' does not exist. You 
> need to add
> the corresponding JAR file supporting this SPI to your classpath.The current 
> classpath supports the following names: [Lucene40, Lucene41]
> 
> Thanks,
> Shawn
> 



Re: Solr4 - no examples of postingsFormat in schema.xml

2012-10-15 Thread Alan Woodward

> 
> This should not be required, because I am building from source.  I compiled 
> Solr from lucene-solr source checked out from branch_4x.  I grepped the 
> entire tree for lucene-codec and found nothing.
> 
> It turns out that running 'ant generate-maven-artifacts' created the jar file 
> -- along with a huge number of other jars that I don't need.  It took an 
> extremely long time to run, for a jar that's a little over 300KB.
> 
> I would argue that the codecs jar should be created by compiling a dist 
> target for Solr.  Someone else should determine whether it's appropriate to 
> put it in the .war file, but I think it's important enough to make available 
> without compiling everything in the Lucene universe.

I agree - it looks as though the codecs module wasn't added to the solr build 
when it was split off.  I've created a JIRA ticket 
(https://issues.apache.org/jira/browse/SOLR-3947) and added a patch.

On the error below, I'll have to defer to someone who knows how this actually 
works...

> 
> I put this jar in my lib, and now I get a new error when I try the 
> BloomFilter postingsFormat:
> 
> SEVERE: null:java.lang.UnsupportedOperationException: Error - 
> org.apache.lucene.codecs.bloom.BloomFilteringPostingsFormat has been 
> constructed without a choice of PostingsFormat
>at 
> org.apache.lucene.codecs.bloom.BloomFilteringPostingsFormat.fieldsConsumer(BloomFilteringPostingsFormat.java:139)
>at 
> org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.addField(PerFieldPostingsFormat.java:130)
>at 
> org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:335)
>at 
> org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
>at org.apache.lucene.index.TermsHash.flush(TermsHash.java:117)
>at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
>at 
> org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:82)
>at 
> org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:483)
>at 
> org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:422)
>at 
> org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:559)
>at 
> org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2656)
>at 
> org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2792)
>at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2772)
>at 
> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:525)
>at 
> org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:87)
>at 
> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
>at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1007)
>at 
> org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
>at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
>at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>at org.apache.solr.core.SolrCore.execute(SolrCore.java:1750)
>at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
>at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
> 
> 



Re: Solr4 - no examples of postingsFormat in schema.xml

2012-10-15 Thread Alan Woodward
See discussion on https://issues.apache.org/jira/browse/SOLR-3843; this was 
apparently intentional.

That also links to the following: 
http://wiki.apache.org/solr/SolrConfigXml#codecFactory, which suggests you need 
to use solr.SchemaCodecFactory for per-field codecs - this might solve your 
postingsFormat exception.
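
i.e. something like this in solrconfig.xml:

  <codecFactory class="solr.SchemaCodecFactory"/>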

On 15 Oct 2012, at 18:41, Alan Woodward wrote:

> 
>> 
>> This should not be required, because I am building from source.  I compiled 
>> Solr from lucene-solr source checked out from branch_4x.  I grepped the 
>> entire tree for lucene-codec and found nothing.
>> 
>> It turns out that running 'ant generate-maven-artifacts' created the jar 
>> file -- along with a huge number of other jars that I don't need.  It took 
>> an extremely long time to run, for a jar that's a little over 300KB.
>> 
>> I would argue that the codecs jar should be created by compiling a dist 
>> target for Solr.  Someone else should determine whether it's appropriate to 
>> put it in the .war file, but I think it's important enough to make available 
>> without compiling everything in the Lucene universe.
> 
> I agree - it looks as though the codecs module wasn't added to the solr build 
> when it was split off.  I've created a JIRA ticket 
> (https://issues.apache.org/jira/browse/SOLR-3947) and added a patch.
> 
> On the error below, I'll have to defer to someone who knows how this actually 
> works...
> 
>> 
>> I put this jar in my lib, and now I get a new error when I try the 
>> BloomFilter postingsFormat:
>> 
>> SEVERE: null:java.lang.UnsupportedOperationException: Error - 
>> org.apache.lucene.codecs.bloom.BloomFilteringPostingsFormat has been 
>> constructed without a choice of PostingsFormat
>>   at 
>> org.apache.lucene.codecs.bloom.BloomFilteringPostingsFormat.fieldsConsumer(BloomFilteringPostingsFormat.java:139)
>>   at 
>> org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.addField(PerFieldPostingsFormat.java:130)
>>   at 
>> org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:335)
>>   at 
>> org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
>>   at org.apache.lucene.index.TermsHash.flush(TermsHash.java:117)
>>   at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
>>   at 
>> org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:82)
>>   at 
>> org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:483)
>>   at 
>> org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:422)
>>   at 
>> org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:559)
>>   at 
>> org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2656)
>>   at 
>> org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2792)
>>   at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2772)
>>   at 
>> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:525)
>>   at 
>> org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:87)
>>   at 
>> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
>>   at 
>> org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1007)
>>   at 
>> org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
>>   at 
>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
>>   at 
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1750)
>>   at 
>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
>>   at 
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
>> 
>> 
> 



Grouping based on multiple criteria

2012-10-30 Thread Alan Woodward
Hi list,

I'd like to be able to present a list of results which are grouped on a single 
field, but then show various members of each group according to several 
different criteria.  So for example, for e-commerce search, we group at the top 
level by the vendor, but then show the most expensive item, least expensive 
item, most heavily discounted item, etc.

I can't find anything that would let me do this in the current grouping code.  
I'm thinking I'd need to implement a form of TopFieldCollector that maintained 
multiple sort orders that could be used for the second pass collector, but 
there doesn't seem to be anywhere to plug that in easily.

Is there anything already out there that I'm missing, or do I have to do some 
actual work?  :-)

Thanks, Alan

[ANNOUNCE] Apache Solr 7.3.0 released

2018-04-04 Thread Alan Woodward
4th April 2018, Apache Solr™ 7.3.0 available

The Lucene PMC is pleased to announce the release of Apache Solr 7.3.0

Solr is the popular, blazing fast, open source NoSQL search platform from
the Apache Lucene project. Its major features include powerful full-text
search, hit highlighting, faceted search and analytics, rich document
parsing, geospatial search, extensive REST APIs as well as parallel SQL.
Solr is enterprise grade, secure and highly scalable, providing fault
tolerant distributed search and indexing, and powers the search and
navigation features of many of the world's largest internet sites.

This release includes the following changes since the 7.2.0 release:

- A new update request processor supports OpenNLP-based entity extraction
and language detection
- Support for automatic time-based collection creation
- Multivalued primitive fields can now be used in sorting
- A new SortableTextField allows both indexing and sorting/faceting on free
text
- Several new stream evaluators
- Improvements around leader-initiated recovery
- New autoscaling features: triggers can perform operations based on any
metric available from the Metrics API, based on a defined schedule, or in
response to a query rate over a 1-minute average. A new screen in the Admin
UI will show suggested autoscaling actions.
- Metrics can now be exported to Prometheus
- {!parent} and {!child} support filtering with exclusions via new local
parameters
- Introducing {!filters} query parser for referencing filter queries and
excluding them
- Support for running Solr with Java 10
- A new contrib/ltr NeuralNetworkModel class

Furthermore, this release includes Apache Lucene 7.3.0 which includes
several changes since the 7.2.0 release

The release is available for immediate download at:

http://www.apache.org/dyn/closer.lua/lucene/solr/7.3.0

Please read CHANGES.txt for a detailed list of changes:

https://lucene.apache.org/solr/7_3_0/changes/Changes.html

Please report any feedback to the mailing lists
(http://lucene.apache.org/solr/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring network
for distributing releases. It is possible that the mirror you are using may
not have replicated the release yet. If that is the case, please try
another mirror. This also goes for Maven access.


[ANNOUNCE] Apache Solr 8.5.0 released

2020-03-24 Thread Alan Woodward
## 24 March 2020, Apache Solr™ 8.5.0 available

The Lucene PMC is pleased to announce the release of Apache Solr 8.5.0.

Solr is the popular, blazing fast, open source NoSQL search platform from the 
Apache Lucene project. Its major features include powerful full-text search, 
hit highlighting, faceted search, dynamic clustering, database integration, 
rich document handling, and geospatial search. Solr is highly scalable, 
providing fault tolerant distributed search and indexing, and powers the search 
and navigation features of many of the world's largest internet sites.

Solr 8.5.0 is available for immediate download at:

  https://lucene.apache.org/solr/downloads.html

### Solr 8.5.0 Release Highlights:

 * A new queries property of the JSON Request API lets you declare queries in 
Query DSL format and refer to them by name.
 * A new command line tool bin/postlogs allows you to index Solr logs into a 
Solr collection. This is helpful for log analysis and troubleshooting. 
Documentation is not yet integrated into the Solr Reference Guide, but is 
available in a branch via GitHub: 
https://github.com/apache/lucene-solr/blob/visual-guide/solr/solr-ref-guide/src/logs.adoc.
 * A new stream decorator delete() is available to help solve some issues with 
traditional delete-by-query, which can be expensive in large indexes.
 * Solr now has the ability to run with a Java Security Manager enabled.

Please read CHANGES.txt for a full list of changes:

  https://lucene.apache.org/solr/8_5_0/changes/Changes.html

Solr 8.5.0 also includes improvements and bugfixes in the corresponding Apache 
Lucene release:

  https://lucene.apache.org/core/8_5_0/changes/Changes.html


