Re: LatLonPointSpatialField, sorting : sort param could not be parsed as a query, and is not a field that exists in the index

2017-11-02 Thread Clemens Wyss DEV
Sorry for "re-asking". Anybody else facing this issue (bug?), or can anybody 
provide an advice "where to look"?
Thx
Clemens

-----Original Message-----
From: Clemens Wyss DEV [mailto:clemens...@mysign.ch] 
Sent: Wednesday, 1 November 2017 11:06
To: 'solr-user@lucene.apache.org' 
Subject: LatLonPointSpatialField, sorting : sort param could not be parsed as a 
query, and is not a field that exists in the index 

Context: Solr 6.6.0

I'm switching my schemas from the deprecated solr.LatLonType to 
solr.LatLonPointSpatialField. Now my sort query (which used to work with 
solr.LatLonType):

sort=geodist(b4_location__geo_si,47.36667,8.55) asc

raises the error

"sort param could not be parsed as a query, and is not a field that exists in 
the index: geodist(b4_location__geo_si,47.36667,8.55)"

Invoking sort by 

sfield=b4_location__geo_si&pt=47.36667,8.55&sort=geodist() asc

works as expected though...

Why does "sort=geodict(fld,lat,ln)" no more work?

Thx for any hints/advice
Clemens


Re: Automatic creation of indexes

2017-11-02 Thread Jokin C
Oh, nice, this was just what I was looking for; I will follow the issue.
Thanks!

On Wed, Nov 1, 2017 at 3:03 PM, Shawn Heisey  wrote:

> On 10/31/2017 5:32 AM, Jokin Cuadrado wrote:
>
>> Hi, I'm using Solr to store time series data, log events, etc. Right now I
>> use a SolrCloud collection and clean it by deleting documents via queries,
>> but I would like to know what approaches other people are using.
>> Is there a way to create a collection when receiving a post to a
>> nonexistent index? So I could use the date as part of the index name, and
>> the cleanup process would just be to delete the old collections.
>>
>
> Solr will not automatically create indexes/collections/shards.
>
> Automatic handling of time-partitioned indexes is something that is being
> worked on by at least one Solr developer.  There is no ETA available.
>
> https://issues.apache.org/jira/browse/SOLR-11299
>
> Emir, your message did not actually include anything related to the
> presentation you mentioned.  There's no URL pointing anywhere.  If you
> included it as an attachment, that's generally something that doesn't work
> on this list -- most attachments are filtered by the list software.
>
>
> Thanks,
> Shawn
>
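For reference, the manual pattern Shawn describes comes down to creating and
dropping dated collections yourself via the Collections API; the names, shard
counts and configset below are hypothetical:

http://localhost:8983/solr/admin/collections?action=CREATE&name=logs_2017_11_02&numShards=4&replicationFactor=2&collection.configName=logs
http://localhost:8983/solr/admin/collections?action=DELETE&name=logs_2017_10_03

A nightly job issuing those two calls approximates the rolling behavior that
SOLR-11299 aims to automate.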


Re: Automatic creation of indexes

2017-11-02 Thread Jokin C
Nice presentation; the concepts in it are the reason I was searching for
this feature.

Thanks!

On Wed, Nov 1, 2017 at 5:12 PM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> >Emir, your message did not actually include anything related to the
> presentation you mentioned.
> Oops - seems I forgot to paste: https://www.youtube.com/watch?v=1gzwAgrk47c
> 
>
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 1 Nov 2017, at 15:03, Shawn Heisey  wrote:
> >
> > On 10/31/2017 5:32 AM, Jokin Cuadrado wrote:
> >> Hi, I'm using Solr to store time series data, log events, etc. Right now
> >> I use a SolrCloud collection and clean it by deleting documents via
> >> queries, but I would like to know what approaches other people are using.
> >> Is there a way to create a collection when receiving a post to a
> >> nonexistent index? So I could use the date as part of the index name, and
> >> the cleanup process would just be to delete the old collections.
> >
> > Solr will not automatically create indexes/collections/shards.
> >
> > Automatic handling of time-partitioned indexes is something that is
> being worked on by at least one Solr developer.  There is no ETA available.
> >
> > https://issues.apache.org/jira/browse/SOLR-11299
> >
> > Emir, your message did not actually include anything related to the
> presentation you mentioned.  There's no URL pointing anywhere.  If you
> included it as an attachment, that's generally something that doesn't work
> on this list -- most attachments are filtered by the list software.
> >
> >
> > Thanks,
> > Shawn
>
>


Re: SOLR-11504: Provide a config to restrict number of indexing threads

2017-11-02 Thread Emir Arnautović
Hi Nawab,

> One indexing thread in Lucene corresponds to one segment being written. I
> need fine control over the number of segments.

I didn’t check the code, but I would be surprised if that is how things work.
It can appear to work like that if each client thread is doing commits. Is
that the case?

Thanks,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 1 Nov 2017, at 18:00, Nawab Zada Asad Iqbal  wrote:
> 
> Well, the reason I want to control the number of indexing threads is to
> restrict the number of "segments" being created at one time in RAM. One
> indexing thread in Lucene corresponds to one segment being written. I need
> fine control over the number of segments. Fewer than that, and I will not be
> fully utilizing my writing capacity. On the other hand, if I have more
> threads, then I will end up with a lot more small segments, which I will
> need to flush frequently and then merge, and that will cause a different
> kind of problem.
> 
> Your suggestion would require me and other such Solr users to create a tight
> coupling between the clients and the Solr servers. My client is not SolrJ
> based. In a scenario where I am connecting and indexing to Solr remotely, I
> want more requests to be waiting on the Solr side so that they start
> writing as soon as an indexing thread is available, vs waiting on my client
> side - on the other side of the wire.
> 
> Thanks
> Nawab
> 
> On Wed, Nov 1, 2017 at 7:11 AM, Shawn Heisey  wrote:
> 
>> On 10/31/2017 4:57 PM, Nawab Zada Asad Iqbal wrote:
>> 
>>> I hit this issue https://issues.apache.org/jira/browse/SOLR-11504 while
>>> migrating to solr6 and locally working around it in Lucene code. I am
>>> thinking to fix it properly and hopefully patch back to Solr. Since,
>>> Lucene
>>> code does not want to keep any such config, I am thinking to use a
>>> counting
>>> semaphore in Solr code before calling IndexWriter.addDocument(s) or
>>> IndexWriter.updateDocument(s).
>>> 
>> 
>> There's a fairly simple way to control the number of indexing threads that
>> doesn't require ANY changes to Solr:  Don't start as many threads/processes
>> on your indexing client(s).  If you control the number of simultaneous
>> requests sent to Solr, then Solr won't start as many indexing threads.
>> That kind of control over your indexing system is something that's always
>> preferable to have.
>> 
>> Thanks,
>> Shawn
>> 



Re: Advice on Stemming in Solr

2017-11-02 Thread Emir Arnautović
Hi Edwin,
It seems it would be best if you did not apply the *ing stemming rule at all.
One idea is to trick the stemmer: replace the "ing" ending of any word with
some nonexistent character combination, e.g. ‘wqx’. You can use
solr.PatternReplaceFilterFactory to do that. You can switch it back after
stemming if you want to have the proper token in the index.

HTH,
Emir 
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/
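
A sketch of that chain (the ‘wqx’ marker is Emir's example; the tokenizer,
stemmer, and patterns are assumptions):

<analyzer>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <!-- hide the "ing" ending from the stemmer -->
  <filter class="solr.PatternReplaceFilterFactory" pattern="^(.*)ing$" replacement="$1wqx"/>
  <filter class="solr.KStemFilterFactory"/>
  <!-- restore the original ending after stemming -->
  <filter class="solr.PatternReplaceFilterFactory" pattern="^(.*)wqx$" replacement="$1ing"/>
</analyzer>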



> On 2 Nov 2017, at 03:23, Zheng Lin Edwin Yeo  wrote:
> 
> Hi Emir,
> 
> We do have quite a lot of words that should not be stemmed. Currently,
> KStemFilterFactory is stemming all the non-English words that end with
> "ing" as well. There are quite a lot of places and names which end in
> "ing", and all of these are being stemmed too, which leads to an
> inaccurate search.
> 
> Regards,
> Edwin
> 
> 
> On 1 November 2017 at 18:20, Emir Arnautović 
> wrote:
> 
>> Hi Edwin,
>> If the number of words that should not be stemmed is not high you could
>> use KeywordMarkerFilterFactory to flag those words as keywords and it
>> should prevent stemmer from changing them.
>> Depending on what you want to achieve, you might not be able to avoid
>> using a stemmer at indexing time. If you want to find documents that
>> contain only “walking” with the search term “walk”, then you have to stem
>> at index time. Cases where you use stemming at query time only are rare and
>> specific.
>> If you want to prefer exact matches over stemmed matches, you have to
>> index same content with and without stemming and boost matches on field
>> without stemming.
>> 
>> HTH,
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>> 
>> 
>> 
>>> On 1 Nov 2017, at 10:11, Zheng Lin Edwin Yeo 
>> wrote:
>>> 
>>> Hi,
>>> 
>>> We are currently using KStemFilterFactory in Solr, but we found that it
>>> is actually doing stemming on non-English words like "ximenting", which
>>> it stems to "ximent". This is not what we wanted.
>>> 
>>> Another option is to use the HunspellStemFilterFactory, but there are
>>> some English words like "running", "walking" that are not being stemmed.
>>> 
>>> Would like to check: is it advisable to use stemming at index time? Or
>>> should we not stem at index time, but at query time do a search for the
>>> stemmed words as well? For example, if the user searches for "walking",
>>> we will also search for "walk", and the actual word "walking" will have
>>> higher weightage.
>>> 
>>> I'm currently using Solr 6.5.1.
>>> 
>>> Regards,
>>> Edwin
>> 
>> 



Re: SOLR-11504: Provide a config to restrict number of indexing threads

2017-11-02 Thread Michael McCandless
Actually, it's one Lucene segment per *concurrent* indexing thread.

So if you have 10 indexing threads in Lucene at once, then 10 in-memory
segments will be created and will have to be written on refresh/commit.

Elasticsearch uses a bounded thread pool to service all indexing requests,
which I think is a healthy approach.  It shouldn't have to be the client's
job to worry about server side details like this.

Mike McCandless

http://blog.mikemccandless.com
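
A sketch of that bounded-pool idea (sizes hypothetical; this mirrors the
concept, not Elasticsearch's actual code):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class IndexingPool {
    // Exactly 8 indexing threads plus a bounded queue; when the queue is
    // full, the submitting thread runs the task itself (back-pressure).
    static final ExecutorService POOL = new ThreadPoolExecutor(
        8, 8, 0L, TimeUnit.MILLISECONDS,
        new ArrayBlockingQueue<>(200),
        new ThreadPoolExecutor.CallerRunsPolicy());
}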

On Thu, Nov 2, 2017 at 5:23 AM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Hi Nawab,
>
> > One indexing thread in Lucene corresponds to one segment being written.
> > I need fine control over the number of segments.
>
> I didn’t check the code, but I would be surprised if that is how things
> work. It can appear to work like that if each client thread is doing
> commits. Is that the case?
>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 1 Nov 2017, at 18:00, Nawab Zada Asad Iqbal  wrote:
> >
> > Well, the reason I want to control the number of indexing threads is to
> > restrict the number of "segments" being created at one time in RAM. One
> > indexing thread in Lucene corresponds to one segment being written. I
> > need fine control over the number of segments. Fewer than that, and I
> > will not be fully utilizing my writing capacity. On the other hand, if I
> > have more threads, then I will end up with a lot more small segments,
> > which I will need to flush frequently and then merge, and that will
> > cause a different kind of problem.
> >
> > Your suggestion would require me and other such Solr users to create a
> > tight coupling between the clients and the Solr servers. My client is
> > not SolrJ based. In a scenario where I am connecting and indexing to
> > Solr remotely, I want more requests to be waiting on the Solr side so
> > that they start writing as soon as an indexing thread is available, vs
> > waiting on my client side - on the other side of the wire.
> >
> > Thanks
> > Nawab
> >
> > On Wed, Nov 1, 2017 at 7:11 AM, Shawn Heisey 
> wrote:
> >
> >> On 10/31/2017 4:57 PM, Nawab Zada Asad Iqbal wrote:
> >>
> >>> I hit this issue https://issues.apache.org/jira/browse/SOLR-11504
> while
> >>> migrating to solr6 and locally working around it in Lucene code. I am
> >>> thinking to fix it properly and hopefully patch back to Solr. Since,
> >>> Lucene
> >>> code does not want to keep any such config, I am thinking to use a
> >>> counting
> >>> semaphore in Solr code before calling IndexWriter.addDocument(s) or
> >>> IndexWriter.updateDocument(s).
> >>>
> >>
> >> There's a fairly simple way to control the number of indexing threads
> that
> >> doesn't require ANY changes to Solr:  Don't start as many
> threads/processes
> >> on your indexing client(s).  If you control the number of simultaneous
> >> requests sent to Solr, then Solr won't start as many indexing threads.
> >> That kind of control over your indexing system is something that's
> always
> >> preferable to have.
> >>
> >> Thanks,
> >> Shawn
> >>
>
>


SynonymGraphFilterFactory with edismax

2017-11-02 Thread Amar Raja
Hello,

I have the following field definition:

[fieldType definition stripped by the list archive; fragments quoted later in
the thread show a synonym filter (synonyms.txt, ignoreCase="true",
expand="true"), a stop filter (lang/stopwords_en.txt), and a filter with
protected="protwords.txt"]

And the following two synonym definitions:

kids => boys,girls
metallic => rose gold,metallic

The intent being that a user searching for "kids" should get girls or boys
results, but searching for "boys" will not bring back girls results.
Similarly, searching for "metallic" should bring back results for either
"metallic" or "rose gold", but a search for "rose gold" should not bring
back "metallic".

Another property I have set is q.op=AND, i.e. "boys tops" should only return
documents where both terms exist.

The first synonym works well, producing the following dismax query:

(+(+DisjunctionMaxQuery((Synonym(web_name:boi
web_name:girl))~1.0)))/no_coord

However, for the second I get this:

(+(+DisjunctionMaxQuery(+web_name:rose +web_name:gold)
web_name:metal)~2))~1.0)))/no_coord

But for any rule where the RHS contains a multi-term synonym, it seems to
want to match all of the alternatives, so in this case only documents with
both "metallic" and "rose gold" will match.

Any ideas where I am going wrong?


Re: Upgrade path from 5.4.1

2017-11-02 Thread simon
though see SOLR-11078, which reports significant query slowdowns
after converting *Trie to *Point fields in 7.1, compared with 6.4.2

On Wed, Nov 1, 2017 at 9:06 PM, Yonik Seeley  wrote:

> On Wed, Nov 1, 2017 at 2:36 PM, Erick Erickson 
> wrote:
> > I _always_ prefer to reindex if possible. Additionally, as of Solr 7
> > all the numeric types are deprecated in favor of points-based types
> > which are faster on all fronts and use less memory.
>
> They are a good step forward in general, and faster for range queries
> (and multiple-dimensions), but looking at the design I'd guess that
> they may be slower for exact-match queries?
> Has anyone tested this?
>
> -Yonik
>


Re: SynonymGraphFilterFactory with edismax

2017-11-02 Thread Steve Rowe
Hi Amar,

What version of Solr are you using?  This looks like a bug that was fixed in 
Solr 6.6.1. [issue link stripped by the archive]

--
Steve
www.lucidworks.com

> On Nov 2, 2017, at 8:31 AM, Amar Raja  
> wrote:
> 
> Hello,
> 
> I have the following field definition:
> 
> [fieldType definition stripped by the list archive]
> 
> And the following two synonym definitions:
> 
> kids => boys,girls
> metallic => rose gold,metallic
> 
> The intent being a user searching for "kids" should get girls or boys
> results, but searching for "boys" will not bring back girls results.
> Similarly searching for "metallic" should bring back results for either
> "metallic" or "rose gold", but the search for "rose gold" should not bring
> back "metallic".
> 
> Another property I have set is q.op=AND. I.e. "boys tops" should return
> where only both terms exist.
> 
> The first synonym works well, producing the following dismax query:
> 
> (+(+DisjunctionMaxQuery((Synonym(web_name:boi
> web_name:girl))~1.0)))/no_coord
> 
> However, for the second I get this:
> 
> (+(+DisjunctionMaxQuery(+web_name:rose +web_name:gold)
> web_name:metal)~2))~1.0)))/no_coord
> 
> But for any terms where any of the terms in the RHS have multiple terms, it
> seems to want to match both synonyms, so in this case only documents with
> both "metallic" and "rose gold" will match.
> 
> Any ideas where I am going wrong?



how to ensure that one shard does not get overloaded when we use routing

2017-11-02 Thread Ketan Thanki
Hi,

I have 4 shards and 4 replicas, and I do composite document routing for my
unique field 'Id' as mentioned below.
e.g.: the tenant bits use projectId/2! as a prefix to the Id

How do I ensure that one shard does not get overloaded when we use routing?

Regards,
Ketan.
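
For context, with compositeId routing the indexed id itself carries the
prefix, e.g. (values hypothetical):

id = PROJ42/2!DOC-0001

The /2 tells Solr to take only the top 2 bits of the hash from the PROJ42
prefix, so one project's documents spread over roughly a quarter of the hash
range rather than being pinned to a single shard.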



Re: SynonymGraphFilterFactory with edismax

2017-11-02 Thread Amar Raja
Thanks Steve,

We have a smoking gun! I am on 6.5.1, and I have tested 7.1, where I don't
see the same issue.

I can't upgrade just yet; however, I have found that setting mm=1 sorts this
out in my case, giving me the following:

(+(+DisjunctionMaxQueryweb_name:metal (+web_name:rose
+web_name:gold))~1))~1.0)))/no_coord
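
(The request here being roughly
/select?defType=edismax&q=metallic&qf=web_name&q.op=AND&mm=1; the handler and
field names are taken from earlier in the thread.)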

I am still testing, however it looks positive so far.

One thing I still noticed is that single-word synonyms output "Synonym(...)"
within the debug query, but multi-word ones do not - even in 7.1.0.

Is this an issue? To be honest, I am not sure what it means; it's just
something I noticed as a difference in the parsed query.

Thanks again for your help, I thought I was going mad for a while.



On 2 November 2017 at 14:38, Steve Rowe  wrote:

> Hi Amar,
>
> What version of Solr are you using?  This looks like a bug that was fixed
> in Solr 6.6.1. [issue link stripped by the archive]
>
> --
> Steve
> www.lucidworks.com
>
> > On Nov 2, 2017, at 8:31 AM, Amar Raja  thecommercepartnership.com> wrote:
> >
> > Hello,
> >
> > I have the following field definition:
> >
> > [fieldType definition stripped by the list archive]
> >
> > And the following two synonym definitions:
> >
> > kids => boys,girls
> > metallic => rose gold,metallic
> >
> > The intent being a user searching for "kids" should get girls or boys
> > results, but searching for "boys" will not bring back girls results.
> > Similarly searching for "metallic" should bring back results for either
> > "metallic" or "rose gold", but the search for "rose gold" should not
> bring
> > back "metallic".
> >
> > Another property I have set is q.op=AND. I.e. "boys tops" should return
> > where only both terms exist.
> >
> > The first synonym works well, producing the following dismax query:
> >
> > (+(+DisjunctionMaxQuery((Synonym(web_name:boi
> > web_name:girl))~1.0)))/no_coord
> >
> > However, for the second I get this:
> >
> > (+(+DisjunctionMaxQuery(+web_name:rose +web_name:gold)
> > web_name:metal)~2))~1.0)))/no_coord
> >
> > But for any terms where any of the terms in the RHS have multiple terms,
> it
> > seems to want to match both synonyms, so in this case only documents with
> > both "metallic" and "rose gold" will match.
> >
> > Any ideas where I am going wrong?
>
>


ANNOUNCE: Solr Reference Guide for Solr 7.1 released

2017-11-02 Thread Cassandra Targett
The Lucene PMC is pleased to announce that the Solr Reference Guide
for 7.1 is now available.

This 1,077-page PDF is the definitive guide to using Apache Solr, the
search server built on Lucene.

The PDF Guide can be downloaded from:
https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/apache-solr-ref-guide-7.1.pdf.

It is also available online at https://lucene.apache.org/solr/guide/7_1.

New in this version of the Guide is documentation for the new features
released in Solr 7.1. In addition, we have reorganized the main
sections a bit, adding a new section "Deployment and Operations" where
information for operational management of Solr (such as the location
of major config files, how to go to production, running on HDFS, etc.)
now resides. We intend to add more to this section in future releases.

Regards,
Cassandra


Re: how to ensure that one shard does not get overloaded when we use routing

2017-11-02 Thread Erick Erickson
Well, you have to monitor. That's the down-side of using this type of
routing: you're effectively saying "I know enough about my usage to
predict".

What do you think you're gaining by using this? Putting all docs from
a single org on a subset of your servers reduces some part of the
parallelism you get from sharding. So unless you have a very specific
use case and some data to back it up, I wonder why you even want to try
to control it like this ;)


Best,
Erick
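
One low-tech way to do that monitoring is to compare per-core document counts
directly (host, collection, and core names hypothetical):

http://search01:8983/solr/mycoll_shard1_replica1/select?q=*:*&rows=0&distrib=false
http://search02:8983/solr/mycoll_shard2_replica1/select?q=*:*&rows=0&distrib=false

If one shard's numFound (or on-disk index size) grows much faster than the
others, the chosen routing prefix is skewing the distribution.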

On Thu, Nov 2, 2017 at 7:51 AM, Ketan Thanki  wrote:
> Hi,
>
> I have 4 shards and 4 replicas, and I do composite document routing for my
> unique field 'Id' as mentioned below.
> e.g.: the tenant bits use projectId/2! as a prefix to the Id
>
> How do I ensure that one shard does not get overloaded when we use routing?
>
> Regards,
> Ketan.
>


Re: Upgrade path from 5.4.1

2017-11-02 Thread Erick Erickson
Yonik:

Yeah, I was just parroting what had been reported; I have no data to
back it up personally. I just saw the JIRA that Simon indicated, and it
looks like the statement "which are faster on all fronts and use less
memory" is just flat wrong when it comes to looking up individual
values.

Ya learn somethin' new every day.

On Thu, Nov 2, 2017 at 6:57 AM, simon  wrote:
> though see SOLR-11078 , which is reporting significant query slowdowns
> after converting  *Trie to *Point fields in 7.1, compared with 6.4.2
>
> On Wed, Nov 1, 2017 at 9:06 PM, Yonik Seeley  wrote:
>
>> On Wed, Nov 1, 2017 at 2:36 PM, Erick Erickson 
>> wrote:
>> > I _always_ prefer to reindex if possible. Additionally, as of Solr 7
>> > all the numeric types are deprecated in favor of points-based types
>> > which are faster on all fronts and use less memory.
>>
>> They are a good step forward in general, and faster for range queries
>> (and multiple-dimensions), but looking at the design I'd guess that
>> they may be slower for exact-match queries?
>> Has anyone tested this?
>>
>> -Yonik
>>


Re: Upgrade path from 5.4.1

2017-11-02 Thread Petersen, Robert (Contr)
Thanks guys! I kind of suspected this would be the best route and I'll move 
forward with a fresh start on 7.x as soon as I can get ops to give me the 
needed machines! 😊


Best

Robi


From: Erick Erickson 
Sent: Thursday, November 2, 2017 8:17:49 AM
To: solr-user
Subject: Re: Upgrade path from 5.4.1

Yonik:

Yeah, I was just parroting what had been reported; I have no data to
back it up personally. I just saw the JIRA that Simon indicated, and it
looks like the statement "which are faster on all fronts and use less
memory" is just flat wrong when it comes to looking up individual
values.

Ya learn somethin' new every day.

On Thu, Nov 2, 2017 at 6:57 AM, simon  wrote:
> though see SOLR-11078 , which is reporting significant query slowdowns
> after converting  *Trie to *Point fields in 7.1, compared with 6.4.2
>
> On Wed, Nov 1, 2017 at 9:06 PM, Yonik Seeley  wrote:
>
>> On Wed, Nov 1, 2017 at 2:36 PM, Erick Erickson 
>> wrote:
>> > I _always_ prefer to reindex if possible. Additionally, as of Solr 7
>> > all the numeric types are deprecated in favor of points-based types
>> > which are faster on all fronts and use less memory.
>>
>> They are a good step forward in general, and faster for range queries
>> (and multiple-dimensions), but looking at the design I'd guess that
>> they may be slower for exact-match queries?
>> Has anyone tested this?
>>
>> -Yonik
>>





Anyone have any comments on current solr monitoring favorites?

2017-11-02 Thread Petersen, Robert (Contr)
OK, I'm probably going to open a can of worms here...  lol

In the old old days I used PSI Probe to monitor Solr running on Tomcat, which 
worked OK on a machine-by-machine basis.

Later I had a Grafana dashboard on top of Graphite monitoring, which was really 
nice looking but kind of complicated to set up.

Even later I successfully just dropped in a New Relic Java agent, which had Solr 
monitors and a dashboard right out of the box, but it costs money for the full 
tamale.


For basic JVM health and Solr QPS and time percentiles, does anyone have any 
favorites or other alternative suggestions?


Thanks in advance!

Robi





Re: Solr streaming questions

2017-11-02 Thread Webster Homer
This is a new project, and its requirements are not yet completely
defined. The system we are looking at building is an automated B2B system
where a customer's system calls in with queries and we return products,
skus, pricing and availability to the caller.

As it turns out, relevancy will not be an issue for this system, as the
queries are all pretty simple and users won't see or care about the
relevancy of the hits returned. We have a current system, but it fails to
scale - not from the search, but from calls to the pricing systems. We are
currently in the early stages of designing this new system. It seems
likely that we can use streaming. I'm sure we could make /select work as
well. The fact that streaming supports joins between collections is
potentially useful.

On Wed, Nov 1, 2017 at 11:16 AM, Erick Erickson 
wrote:

> Perhaps if you bothered to explain your use-case we could suggest
> alternatives.
>
> Streaming is built to handle very large result sets in a
> divide-and-conquer manner,
> thus the ability to specify worker nodes each of which handles a
> sub-set of the results.
>
> Partitioning the output streams requires a way to bucket the results
> from multiple sources
> to workers such that all the documents that fall into buckets can be
> routed to the
> same worker. There may be many sources (think shards) and many replicas.
>
> Score is unsuitable for such bucketing. You're simply trying to use
> streaming for
> a use-case it was not designed for.
>
> You have two choices here.
> > use streaming as it was intended,
> > use cursorMark for processing in batches.
>
> Best,
> Erick
>
> On Wed, Nov 1, 2017 at 8:33 AM, Webster Homer 
> wrote:
> > I know that /select supports score. However, I don't want to have to
> > page the results; I want to use streaming to stream the results of a
> > search, but I cannot sort by the relevancy of the results. This seems
> > like a MAJOR deficit of the streaming API.
> >
> > /select wants to do paging which in my case I don't want.
> >
> > This all seems fairly arbitrary to me and a questionable limitation for
> > /export, especially since /export has a search facility
> >
> > On Tue, Oct 31, 2017 at 7:46 PM, Joel Bernstein 
> wrote:
> >
> >> It is not possible to use score with the /export handler. The /export
> >> handler currently only supports sorting by fields.
> >>
> >> You can sort by score using the default /select handler.
> >>
> >> Joel Bernstein
> >> http://joelsolr.blogspot.com/
> >>
> >> On Tue, Oct 31, 2017 at 1:50 PM, Webster Homer 
> >> wrote:
> >>
> >> > I have a potential use case for solr searching via streaming
> expressions.
> >> > I am currently using solr 6.2.0, but we will soon be upgrading to the
> >> 7.1.0
> >> > version.
> >> >
> >> > I started testing out searching using streaming expressions.
> >> > 1. If I use an alias instead of a collection name it fails. I see that
> >> > there is a Jira, SOLR-7377. Is this fixed in 7.1.0?
> >> >
> >> > 2. If I try to sort the results by score, it gives me an undefined
> field
> >> > error. So it seems that streaming searches must not return values
> ordered
> >> > by relevancy?
> >> > This is a stopper for us if it has not been addressed.
> >> >
> >> > This is my query:
> >> > search(test-catalog-product-170724,defType="edismax",q="
> >> > 7732-18-5",qf="searchmv_cas_number",mm="2<-12%",fl="id_record_spec,
> >> > id_s, score",sort="score desc",qt="/export")
> >> >
> >> > This is the error:
> >> > "EXCEPTION": "java.util.concurrent.ExecutionException:
> >> > java.io.IOException:
> >> > -->
> >> > http://141.247.245.207:8983/solr/test-catalog-product-
> >> > 170724_shard2_replica1/:org.apache.solr.common.SolrException:
> >> > undefined field: \"score\"",
> >> >
> >> > I could not find a Jira for this issue. Is it not possible to retrieve
> >> the
> >> > results ordered relevancy (score desc)?
> >> >
> >> > Seems kind of limiting
> >> >
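
For the cursorMark route Erick mentions above, relevancy-ordered deep paging
looks roughly like this (field names borrowed from the thread; the sort must
include the uniqueKey as a tiebreaker, assumed here to be id):

/select?q=7732-18-5&defType=edismax&qf=searchmv_cas_number&sort=score desc,id asc&rows=200&cursorMark=*

Each response includes a nextCursorMark to send as cursorMark on the next
request; you are done when the returned nextCursorMark equals the one you
sent.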

Solr streaming innerJoin doesn't return rows

2017-11-02 Thread Webster Homer
I'm using Solr 6.2.0, and I am trying to understand how the streaming API
works.

In 6.2, simple expressions seem to behave well. I am having a problem making
the joins work: I don't see errors, but I don't see data either.

Using the Solr Admin Console for testing, this query works:
search(test-catalog-product-170724,
defType="edismax",q="T1503SIGMA",qf="id_record_spec",mm="2<-12%",fl="id_record_spec,
id_s",sort="id_record_spec asc")

As does this:
search(sial-catalog-material-171030,
defType="edismax",q="T1503SIGMA",qf="id_record_spec",fl="id_record_spec,stream_en_s_pri_name,display_cas_number,display_package_size,key_erp_material_number,display_material_qty,display_formula_weight,display_material_uom,key_brand,display_en_name",sort="id_record_spec
asc")

And this works:
innerJoin(
search(sial-catalog-material-171030,
defType="edismax",q="T1503SIGMA",qf="id_record_spec",fl="id_record_spec,stream_en_s_pri_name,display_cas_number,display_package_size,key_erp_material_number,display_material_qty,display_formula_weight,display_material_uom,key_brand,display_en_name",sort="id_record_spec
asc"),
search(test-catalog-product-170724,
defType="edismax",q="T1503SIGMA",qf="id_record_spec",mm="2<-12%",fl="id_record_spec,
id_s",sort="id_record_spec asc"),
on="id_record_spec"
)

This one, however, doesn't throw an error, but it also doesn't return anything:
innerJoin(
search(sial-catalog-material-171030, q=*:*,
fl="id_record_spec,stream_en_s_pri_name,display_cas_number,display_package_size,key_erp_material_number,display_material_qty,display_formula_weight,display_material_uom,key_brand,display_en_name",sort="id_record_spec
asc"),
search(test-catalog-product-170724,
defType="edismax",q="T1503SIGMA",qf="id_record_spec",mm="2<-12%",fl="id_record_spec,
id_s",sort="id_record_spec asc"),
on="id_record_spec"
)

Do we have to explicitly provide the same query to both searches in the
join? I see examples in the documentation that look like my last join.

I also see the same behavior with this:
hashJoin(
search(test-catalog-product-170724,
defType="edismax",q="T1503SIGMA",qf="id_record_spec",mm="2<-12%",fl="id_record_spec,
id_s",sort="id_record_spec asc"),
hashed=search(sial-catalog-material-171030, q=*:*,
fl="id_record_spec,stream_en_s_pri_name,display_cas_number,display_package_size,key_erp_material_number,display_material_qty,display_formula_weight,display_material_uom,key_brand,display_en_name",sort="id_record_spec
asc"),
on="id_record_spec"
)

no errors but no data either. There is data, so what am I doing wrong? I
suspect some user error but am at a loss to understand what it is.

Thanks



Re: Anyone have any comments on current solr monitoring favorites?

2017-11-02 Thread Walter Underwood
We use New Relic for JVM, CPU, and disk monitoring.

I tried the built-in metrics support in 6.4, but it just didn’t do what we 
want. We want rates and percentiles for each request handler. That gives us 
95th percentile for textbooks suggest or for homework search results page, etc. 
The Solr metrics didn’t do that. The Jetty metrics didn’t do that.

We built a dedicated servlet filter that goes in front of the Solr webapp and 
reports metrics. It has some special hacks to handle some weird behavior in 
SolrJ. A request to the “/srp” handler is sent as “/select?qt=/srp”, so we 
normalize that.

The metrics start with the cluster name, the hostname, and the collection. The 
rest is generated like this:

URL: GET /solr/textbooks/select?q=foo&qt=/auto
Metric: textbooks.GET./auto

URL: GET /solr/textbooks/select?q=foo
Metric: textbooks.GET./select

URL: GET /solr/questions/auto
Metric: questions.GET./auto

So a full metric for the cluster “solr-cloud” and the host “search01" would 
look like “solr-cloud.search01.solr.textbooks.GET./auto.m1_rate”.

We send all that to InfluxDB. We’ve configured a template so that each part of 
the metric name is mapped to a field, so we can write efficient queries in 
InfluxQL.
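
For reference, the InfluxDB side of that mapping can be expressed as a
Graphite-input template; the tag names below are assumptions matching the
naming described above:

[[graphite]]
  enabled = true
  templates = [
    "cluster.host.app.collection.method.handler.measurement*"
  ]

With that template, solr-cloud.search01.solr.textbooks.GET./auto.m1_rate
parses into tags cluster=solr-cloud, host=search01, app=solr,
collection=textbooks, method=GET, handler=/auto, with m1_rate as the
measurement.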

Metrics are graphed in Grafana. We have dashboards that mix Cloudwatch (for the 
load balancer) and InfluxDB.

I’m still working out the kinks in some of the more complicated queries, but 
the data is all there. I also want to expand the servlet filter to report HTTP 
response codes.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Nov 2, 2017, at 9:30 AM, Petersen, Robert (Contr) 
>  wrote:
> 
> OK I'm probably going to open a can of worms here...  lol
> 
> 
> In the old old days I used PSI probe to monitor solr running on tomcat which 
> worked ok on a machine by machine basis.
> 
> 
> Later I had a grafana dashboard on top of graphite monitoring which was 
> really nice looking but kind of complicated to set up.
> 
> 
> Even later I successfully just dropped in a newrelic java agent which had 
> solr monitors and a dashboard right out of the box, but it costs money for 
> the full tamale.
> 
> 
> For basic JVM health and Solr QPS and time percentiles, does anyone have any 
> favorites or other alternative suggestions?
> 
> 
> Thanks in advance!
> 
> Robi
> 
> 
> 



From Zero to Learning to Rank in Apache Solr

2017-11-02 Thread Michael Alcorn
Here's a tutorial I wrote that some of you all might find useful:
https://github.com/airalcorn2/Solr-LTR. Feedback is welcome.

Thanks,
Michael A. Alcorn


Re: Solr streaming innerJoin doesn't return rows

2017-11-02 Thread Joel Bernstein
The joins are MapReduce joins which require shuffling of entire result
sets. This means you need to use the /export handler to make them work.
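
For example (collections and fields hypothetical), both sides of a join need
qt="/export" and a matching sort on the join key, and /export in turn
requires docValues on every field in fl and sort:

innerJoin(
  search(collA, q="*:*", fl="id_record_spec,field_a", sort="id_record_spec asc", qt="/export"),
  search(collB, q="*:*", fl="id_record_spec,field_b", sort="id_record_spec asc", qt="/export"),
  on="id_record_spec"
)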

The joins in general are designed to be done in parallel on large clusters.
You won't be able to get good performance with large joins on a single node
or even a small cluster.

So you'll really need to think about how the joins are designed and whether
they fit your use case.

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Nov 2, 2017 at 2:25 PM, Webster Homer 
wrote:

> I'm using Solr 6.2.0. I am trying to understand how the streaming api
> works.
>
> in 6.2 simple expressions seem to behave well. I am having a problem making
> the joins work. I don't see errors, but I don't see data either.
>
> Using the Solr Admin Console for testing, this query works:
> search(test-catalog-product-170724,
> defType="edismax",q="T1503SIGMA",qf="id_record_spec",mm="2<-12%",fl="id_
> record_spec,
> id_s",sort="id_record_spec asc")
>
> As does this:
> search(sial-catalog-material-171030,
> defType="edismax",q="T1503SIGMA",qf="id_record_spec",fl="id_record_spec,
> stream_en_s_pri_name,display_cas_number,display_package_
> size,key_erp_material_number,display_material_qty,display_
> formula_weight,display_material_uom,key_brand,display_en_name",sort="id_
> record_spec
> asc")
>
> And this works:
> innerJoin(
> search(sial-catalog-material-171030,
> defType="edismax",q="T1503SIGMA",qf="id_record_spec",fl="id_record_spec,
> stream_en_s_pri_name,display_cas_number,display_package_
> size,key_erp_material_number,display_material_qty,display_
> formula_weight,display_material_uom,key_brand,display_en_name",sort="id_
> record_spec
> asc"),
> search(test-catalog-product-170724,
> defType="edismax",q="T1503SIGMA",qf="id_record_spec",mm="2<-12%",fl="id_
> record_spec,
> id_s",sort="id_record_spec asc"),
> on="id_record_spec"
> )
>
> but this doesn't throw an error, but it also doesn't return anything.
> innerJoin(
> search(sial-catalog-material-171030, q=*:*,
> fl="id_record_spec,stream_en_s_pri_name,display_cas_number,
> display_package_size,key_erp_material_number,display_
> material_qty,display_formula_weight,display_material_uom,
> key_brand,display_en_name",sort="id_record_spec
> asc"),
> search(test-catalog-product-170724,
> defType="edismax",q="T1503SIGMA",qf="id_record_spec",mm="2<-12%",fl="id_
> record_spec,
> id_s",sort="id_record_spec asc"),
> on="id_record_spec"
> )
>
> Do we have to  explicitly provide the same query to both searches in the
> join? I see examples in the documents that look like my last join.
>
> I also see the same behavior with this:
> hashJoin(
> search(test-catalog-product-170724,
> defType="edismax",q="T1503SIGMA",qf="id_record_spec",mm="2<-12%",fl="id_
> record_spec,
> id_s",sort="id_record_spec asc"),
> hashed=search(sial-catalog-material-171030, q=*:*,
> fl="id_record_spec,stream_en_s_pri_name,display_cas_number,
> display_package_size,key_erp_material_number,display_
> material_qty,display_formula_weight,display_material_uom,
> key_brand,display_en_name",sort="id_record_spec
> asc"),
> on="id_record_spec"
> )
>
> no errors but no data either. There is data, so what am I doing wrong? I
> suspect some user error but am at a loss to understand what it is.
>
> Thanks
>
>


Re: Solr streaming innerJoin doesn't return rows

2017-11-02 Thread Webster Homer
Thank you, that helps a lot. I suspect that we won't use joins, but getting
them to work at all is a plus. It does work once I add /export to both
searches, and it doesn't perform all that badly considering that I am
running it on a small SolrCloud on an underpowered developer's VM.

On Thu, Nov 2, 2017 at 1:49 PM, Joel Bernstein  wrote:

> The joins are MapReduce joins which require shuffling of entire result
> sets. This means you need to use the /export handler to make them work.
>
> The joins in general are designed to be done in parallel on large clusters.
> You won't be able to get good performance with large joins on a single node
> or even a small cluster.
>
> So you'll really need to think about how the joins are designed and whether
> they fit your use case.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Thu, Nov 2, 2017 at 2:25 PM, Webster Homer 
> wrote:
>
> > I'm using Solr 6.2.0. I am trying to understand how the streaming api
> > works.
> >
> > in 6.2 simple expressions seem to behave well. I am having a problem
> making
> > the joins work. I don't see errors, but I don't see data either.
> >
> > Using the Solr Admin Console for testing, this query works:
> > search(test-catalog-product-170724,
> > defType="edismax",q="T1503SIGMA",qf="id_record_spec",mm="2<-12%",fl="id_
> > record_spec,
> > id_s",sort="id_record_spec asc")
> >
> > As does this:
> > search(sial-catalog-material-171030,
> > defType="edismax",q="T1503SIGMA",qf="id_record_spec",fl="id_record_spec,
> > stream_en_s_pri_name,display_cas_number,display_package_
> > size,key_erp_material_number,display_material_qty,display_
> > formula_weight,display_material_uom,key_brand,display_en_name",sort="id_
> > record_spec
> > asc")
> >
> > And this works:
> > innerJoin(
> > search(sial-catalog-material-171030,
> > defType="edismax",q="T1503SIGMA",qf="id_record_spec",fl="id_record_spec,
> > stream_en_s_pri_name,display_cas_number,display_package_
> > size,key_erp_material_number,display_material_qty,display_
> > formula_weight,display_material_uom,key_brand,display_en_name",sort="id_
> > record_spec
> > asc"),
> > search(test-catalog-product-170724,
> > defType="edismax",q="T1503SIGMA",qf="id_record_spec",mm="2<-12%",fl="id_
> > record_spec,
> > id_s",sort="id_record_spec asc"),
> > on="id_record_spec"
> > )
> >
> > but this doesn't throw an error, but it also doesn't return anything.
> > innerJoin(
> > search(sial-catalog-material-171030, q=*:*,
> > fl="id_record_spec,stream_en_s_pri_name,display_cas_number,
> > display_package_size,key_erp_material_number,display_
> > material_qty,display_formula_weight,display_material_uom,
> > key_brand,display_en_name",sort="id_record_spec
> > asc"),
> > search(test-catalog-product-170724,
> > defType="edismax",q="T1503SIGMA",qf="id_record_spec",mm="2<-12%",fl="id_
> > record_spec,
> > id_s",sort="id_record_spec asc"),
> > on="id_record_spec"
> > )
> >
> > Do we have to  explicitly provide the same query to both searches in the
> > join? I see examples in the documents that look like my last join.
> >
> > I also see the same behavior with this:
> > hashJoin(
> > search(test-catalog-product-170724,
> > defType="edismax",q="T1503SIGMA",qf="id_record_spec",mm="2<-12%",fl="id_
> > record_spec,
> > id_s",sort="id_record_spec asc"),
> > hashed=search(sial-catalog-material-171030, q=*:*,
> > fl="id_record_spec,stream_en_s_pri_name,display_cas_number,
> > display_package_size,key_erp_material_number,display_
> > material_qty,display_formula_weight,display_material_uom,
> > key_brand,display_en_name",sort="id_record_spec
> > asc"),
> > on="id_record_spec"
> > )
> >
> > no errors but no data either. There is data, so what am I doing wrong? I
> > suspect some user error but am at a loss to understand what it is.
> >
> > Thanks
> >
> >
>


RE: adding documents to a secured solr server.

2017-11-02 Thread Phil Scadden
Yes, that worked.

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Thursday, 2 November 2017 6:14 p.m.
To: solr-user@lucene.apache.org
Subject: Re: adding documents to a secured solr server.

On 11/1/2017 10:04 PM, Phil Scadden wrote:
> For testing, I changed to HttpSolrClient, specifying the core on process
> and commit instead of opening it as server/core. This time it worked... sort
> of. Despite deleting the entire index with deleteByQuery and seeing that it
> was empty in the core admin, I get:
>
> possible analysis error: cannot change DocValues type from SORTED_SET to 
> NUMERIC for field "access"
>
> I tried deleting the field in the admin interface and then adding it back in 
> again in that admin interface. But, no. Still comes up with that error. I 
> know deleting the index files on disk works but I don’t have access to the 
> server. This is a frustrating problem.

Variations of this error happen when settings on a field with docValues="true" 
are changed, and the index already has documents added with the previous 
settings.

Each Lucene segment stores information about what kind of docValues are present 
for each field that has docValues, and if you change an aspect of the field 
(multivalued, field class, etc) and try to add a new document with that 
different information, Lucene will complain.  The reason that deleting all 
documents didn't work is that when you delete documents, they are only MARKED 
as deleted; the segments (and deleted docs) remain on the disk.

The only SURE way to fix it is to completely delete the index directory (or 
directories), reload the core/collection (or restart Solr), and reindex from 
scratch.  One thing you *might* be able to do if you don't have access to the 
server is delete all documents and then optimize the index, which should delete 
all segments and effectively leave you with a brand new empty index.  I'm not 
100% sure that this would take care of it, but I *think* it would.

Thanks,
Shawn
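
In update-request terms (core name hypothetical), that remote-only option is:

curl 'http://localhost:8983/solr/mycore/update?commit=true' -H 'Content-Type: text/xml' -d '<delete><query>*:*</query></delete>'
curl 'http://localhost:8983/solr/mycore/update?optimize=true'

As Shawn says, whether the optimize fully clears the old docValues metadata
is not guaranteed; deleting the index directory is the sure fix.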
Notice: This email and any attachments are confidential and may not be used, 
published or redistributed without the prior written consent of the Institute 
of Geological and Nuclear Sciences Limited (GNS Science). If received in error 
please destroy and immediately notify GNS Science. Do not copy or disclose the 
contents.


Re: Anyone have any comments on current solr monitoring favorites?

2017-11-02 Thread Emir Arnautović
Hi Robi,
Did you try Sematext’s SPM? It provides host, JVM and Solr metrics and more. We 
use it for monitoring our Solr instances and for consulting.

Disclaimer - see signature :)

Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 2 Nov 2017, at 19:35, Walter Underwood  wrote:
> 
> We use New Relic for JVM, CPU, and disk monitoring.
> 
> I tried the built-in metrics support in 6.4, but it just didn’t do what we 
> want. We want rates and percentiles for each request handler. That gives us 
> 95th percentile for textbooks suggest or for homework search results page, 
> etc. The Solr metrics didn’t do that. The Jetty metrics didn’t do that.
> 
> We built a dedicated servlet filter that goes in front of the Solr webapp and 
> reports metrics. It has some special hacks to handle some weird behavior in 
> SolrJ. A request to the “/srp” handler is sent as “/select?qt=/srp”, so we 
> normalize that.
> 
> The metrics start with the cluster name, the hostname, and the collection. 
> The rest is generated like this:
> 
> URL: GET /solr/textbooks/select?q=foo&qt=/auto
> Metric: textbooks.GET./auto
> 
> URL: GET /solr/textbooks/select?q=foo
> Metric: textbooks.GET./select
> 
> URL: GET /solr/questions/auto
> Metric: questions.GET./auto
> 
> So a full metric for the cluster “solr-cloud” and the host “search01" would 
> look like “solr-cloud.search01.solr.textbooks.GET./auto.m1_rate”.
> 
> We send all that to InfluxDB. We’ve configured a template so that each part 
> of the metric name is mapped to a field, so we can write efficient queries in 
> InfluxQL.
> 
> Metrics are graphed in Grafana. We have dashboards that mix Cloudwatch (for 
> the load balancer) and InfluxDB.
> 
> I’m still working out the kinks in some of the more complicated queries, but 
> the data is all there. I also want to expand the servlet filter to report 
> HTTP response codes.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
> 
>> On Nov 2, 2017, at 9:30 AM, Petersen, Robert (Contr) 
>>  wrote:
>> 
>> OK I'm probably going to open a can of worms here...  lol
>> 
>> 
>> In the old old days I used PSI probe to monitor solr running on tomcat which 
>> worked ok on a machine by machine basis.
>> 
>> 
>> Later I had a grafana dashboard on top of graphite monitoring which was 
>> really nice looking but kind of complicated to set up.
>> 
>> 
>> Even later I successfully just dropped in a newrelic java agent which had 
>> solr monitors and a dashboard right out of the box, but it costs money for 
>> the full tamale.
>> 
>> 
>> For basic JVM health and Solr QPS and time percentiles, does anyone have any 
>> favorites or other alternative suggestions?
>> 
>> 
>> Thanks in advance!
>> 
>> Robi
>> 
>> 
>> 
> 



Configuring HDFS Keyprovider for Solr

2017-11-02 Thread q4
I'm trying to create a Solr collection and store it in an HDFS encryption
zone, but I'm getting the errors below:

org.apache.solr.common.SolrException: Error CREATEing SolrCore
'person4_shard1_replica_n1': Unable to create core
[person4_shard1_replica_n1] Caused by: No KeyProvider is configured, cannot
access an encrypted file
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:949)
at
org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$168(CoreAdminOperation.java:91)
at
org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:384)
at
org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:389)
at
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:174)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:177)
at 
org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:745)
at
org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:726)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:507)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:378)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:322)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:534)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
at
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.solr.common.SolrException: Unable to create core
[person4_shard1_replica_n1]
at
org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:996)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:916)
... 37 more
Caused by: org.apache.solr.common.SolrException: Cannot obtain lock file:
hdfs://:/encryption_zone/person4/core_node1/data/index/write.lock
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:988)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:843)
at
org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:980)
... 38 more
Caused by: org.apache.lucene.store.LockObtainFailedException: Cannot obtain
lock file:
hdfs://:/encryption_zone/person4/core_node1/data/index/write.lock
at
org.apache.solr.store.hdfs.HdfsLockFactory.obtainLock(HdfsLockFactory.java:85)
at 
org.apache.lucene.store.BaseDirectory.obtainLock(BaseDirectory.java:45)
at
org.apache.lucene.store.FilterDirectory.obtainLock(FilterDirectory.java:104)
at
org.apache.lucene.store.FilterDirectory.obtainLock(FilterDirectory.java:104)
at org.apache.lucene.i

Trouble using Jython script as ScriptTransformer

2017-11-02 Thread Kevin Grimes
Hey all,

I’m running v6.3.0. I’ve been trying to configure a Jython ScriptTransformer in 
my data-config.xml (pulls from JdbcDataSource). But when I run the full import, 
it tries to interpret the script as JavaScript, even though I added the 
language="Jython" attribute to the <script> element. [rest of the message 
truncated by the archive]

Re: Advice on Stemming in Solr

2017-11-02 Thread Zheng Lin Edwin Yeo
Hi Emir,

We are looking to change to HunspellStemFilterFactory. This has a
dictionary file containing words and applicable flags, and an affix file
that specifies how these flags will control spell checking.
Presumably we can control it via those files in HunspellStemFilterFactory?

Regards,
Edwin
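
For reference, the factory is wired up roughly like this (the dictionary and
affix file names depend on the locale files you deploy; these are
illustrative):

<filter class="solr.HunspellStemFilterFactory" dictionary="en_GB.dic" affix="en_GB.aff" ignoreCase="true"/>

Editing entries and flags in the .dic file (and the rules in the .aff file)
is then the lever for controlling which words get stemmed and how.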


On 2 November 2017 at 17:46, Emir Arnautović 
wrote:

> Hi Edwin,
> It seems it would be best if you did not apply the *ing stemming rule at
> all. One idea is to trick the stemmer: replace the "ing" ending of any word
> with some nonexistent character combination, e.g. ‘wqx’. You can use
> solr.PatternReplaceFilterFactory to do that. You can switch it back after
> stemming if you want to have the proper token in the index.
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 2 Nov 2017, at 03:23, Zheng Lin Edwin Yeo 
> wrote:
> >
> > Hi Emir,
> >
> > We do have quite a lot of words that should not be stemmed. Currently,
> > KStemFilterFactory is stemming all the non-English words that end with
> > "ing" as well. There are quite a lot of places and names which end in
> > "ing", and all of these are being stemmed too, which leads to an
> > inaccurate search.
> >
> > Regards,
> > Edwin
> >
> >
> > On 1 November 2017 at 18:20, Emir Arnautović <
> emir.arnauto...@sematext.com>
> > wrote:
> >
> >> Hi Edwin,
> >> If the number of words that should not be stemmed is not high you could
> >> use KeywordMarkerFilterFactory to flag those words as keywords and it
> >> should prevent stemmer from changing them.
> >> Depending on what you want to achieve, you might not be able to avoid
> >> using stemmer at indexing time. If you want to find documents that
> contain
> >> only “walking” with search term “walk”, then you have to stem at index
> >> time. Cases when you use stemming on query time only are rare and
> specific.
> >> If you want to prefer exact matches over stemmed matches, you have to
> >> index same content with and without stemming and boost matches on field
> >> without stemming.
> >>
> >> HTH,
> >> Emir
> >> --
> >> Monitoring - Log Management - Alerting - Anomaly Detection
> >> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >>
> >>
> >>
> >>> On 1 Nov 2017, at 10:11, Zheng Lin Edwin Yeo 
> >> wrote:
> >>>
> >>> Hi,
> >>>
> >>> We are currently using KStemFilterFactory in Solr, but we found that
> >>> it is actually doing stemming on non-English words like "ximenting",
> >>> which it stems to "ximent". This is not what we wanted.
> >>>
> >>> Another option is to use the HunspellStemFilterFactory, but there are
> >>> some English words like "running", "walking" that are not being stemmed.
> >>>
> >>> Would like to check: is it advisable to use stemming at index time? Or
> >>> should we not stem at index time, but at query time do a search for the
> >>> stemmed words as well? For example, if the user searches for "walking",
> >>> we will also search for "walk", and the actual word "walking" will have
> >>> higher weightage.
> >>>
> >>> I'm currently using Solr 6.5.1.
> >>>
> >>> Regards,
> >>> Edwin
> >>
> >>
>
>