Re: How do I create a schema file for FIX data in Solr

2018-04-02 Thread Adhyan Arizki
Raymond,

May I suggest you take a look at the examples shipped in the Solr package?
Essentially you need to understand which fields should be searchable by the
application and which should not. The FIX data can be represented in JSON or XML.

To parse and upload the data to Solr, you can use various libraries out
there. Personally I have used SolrJ and Rsolr, and they are essentially the
same.
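
For illustration, once a message has been flattened to JSON (one field per FIX tag), it can be posted with plain curl; the collection name, field names and values below are only placeholders:

curl "http://localhost:8983/solr/yourcollection/update?commit=true" \
  -H 'Content-Type: application/json' \
  --data-binary '[{"id":"msg-1","OrderID":"ORD123","TradeDate":"20180402","MsgType":"RIO"}]'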


On Mon, 2 Apr 2018, 12:35 Raymond Xie,  wrote:

> Thank you, Shawn, Rick and other readers,
>
> To Shawn:
>
> For  *8=FIX.4.4 9=653 35=RIO* as an example, in the FIX standard: 8
> means BeginString, in this example, its value is  FIX.4.4.9, and 9 means
> body length, it is 653 for this message, 35 is RIO, meaning the message
> type is RIO, 122 stands for OrigSendingTime and has a format of
> UTCTimestamp
>
> You can refer to this page for details: https://www.onixs.biz
> /fix-dictionary/4.2/fields_by_tag.html
>
> All the values are explained as string type.
>
> All the tag numbers are from FIX standard so it doesn't change (in my case)
>
> I expect a python program might be needed to parse the message and extract
> each tag's value, index is to be made on those extracted value as long as
> their field (tag) name.
>
> With index in place, ideally and naturally user will search for any
> keyword, however, in this case, most queries would be based on tag 37
> (Order ID) and 75 (Trade Date), there is another customized tag (not in the
> standard) Order Version to be queried on.
>
> I understand the parser creation would be a manual process, as long as I
> know or have a small sample program, I will do it myself and maybe adjust
> it as per need.
>
> To Rick:
>
> You mentioned creating JSON document, my understanding is a parser would be
> needed to generate that JSON document, do you have any existing example
> code?
>
>
>
>
> Thank you guys very much.
>
>
>
>
>
>
>
>
>
> **
> *Sincerely yours,*
>
>
> *Raymond*
>
> On Sun, Apr 1, 2018 at 2:16 PM, Shawn Heisey  wrote:
>
> > On 4/1/2018 10:12 AM, Raymond Xie wrote:
> >
> >> FIX is a format standard of financial data. It contains lots of tags in
> >> number with value for the tag, like 8=asdf, where 8 is the tag and asdf
> is
> >> the tag's value. Each tag has its definition.
> >>
> >> The sample msg in FIX format was in the original question.
> >>
> >> All I need to do is to know how to paste the msg and get all tag's
> value.
> >>
> >> I found so far a parser is what I need to start with., But I am more
> >> concerning about how to create index in Solr on the extracted tag's
> value,
> >> that is the first step, the next would be to customize the dashboard for
> >> users to search with a value to find out which msg contains that value
> in
> >> which tag and present users the whole msg as proof.
> >>
> >
> > Most of Solr's functionality is provided by Lucene.  Lucene is a java API
> > that implements search functionality.  Solr bolts on some functionality
> on
> > top of Lucene, but doesn't really do anything to fundamentally change the
> > fact that you're dealing with a Lucene index.  So I'm going to mostly
> talk
> > about Lucene below.
> >
> > Lucene organizes data in a unit that we call a "document." An easy
> analogy
> > for this is that it is a lot like a row in a single database table.  It
> has
> > fields, each field has a type. Unless custom software is used, there is
> > really no support for data other than basic primitive types -- numbers
> and
> > strings.  The only complex type that I can think of that Solr supports
> out
> > of the box is geospatial coordinates, and it might even support
> > multi-dimensional coordinates, but I'm not sure.  It's not all that
> complex
> > -- the field just stores and manipulates multiple numbers instead of one.
> > The Lucene API does support a FEW things that Solr doesn't implement.  I
> > don't think those are applicable to what you're trying to do.
> >
> > Let's look at the first part of the data that you included in the first
> > message:
> >
> > 8=FIX.4.4 9=653 35=RIO
> >
> > Is "8" always a mixture of letters and numbers and periods? Is "9" always
> > a number, and is it always a WHOLE number?  Is "35" always letters?
> > Looking deeper to data that I didn't quote ... is "122" always a
> date/time
> > value?  Are the tag numbers always picked from a well-defined set, or do
> > they change?
> >
> > Assuming that the answers in the previous paragraph are found and a
> > configuration is created to deal with all of it ... how are you planning
> to
> > search it?  What kind of queries would you expect somebody to make?
> That's
> > going to have a huge influence on how you configure things.
> >
> > Writing the schema is usually where people spend the most time when
> > they're setting up Solr.
> >
> > Thanks,
> > Shawn
> >
> >
>


Re: MatchMode in Dismax parser

2018-04-02 Thread Emir Arnautović
Hi,
If you want to round up, you can use a negative percentage, which gives the share of
clauses allowed to fail: requiring 75% of matches, rounded up, can be written as mm=-25%.
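
A worked example, assuming the documented round-down behaviour for percentage values: with 5 optional clauses, mm=75% computes 5 x 0.75 = 3.75 and truncates to 3 required matches, while mm=-25% allows 5 x 0.25 = 1.25, truncated to 1 clause that may be missing, i.e. 4 required matches, which is the rounded-up result.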

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 29 Mar 2018, at 17:00, Shawn Heisey  wrote:
> 
> On 3/29/2018 1:42 AM, iamluckysharma.0...@gmail.com wrote:
>> Just a suggestion , Shouldn't we need to use Math.round instead of direct 
>> int when watch mode is in %,
>> example i have 3 boolean clauses if i go for mm=50%, currently it reduce it 
>> to ~1, instead it can be ~2,
>> 
>> another example could be when we have 5 boolean clauses and mm=75%, we get 
>> calc as 3.75 currently it took 3, so instead of 3 it should have taken 4. as 
>> Math.round()
> 
> Maybe that is what it SHOULD do, but in most languages, converting a float 
> value to an integer truncates the decimal portion, it doesn't round.  To do 
> that requires a deliberate choice in the code, and that probably doesn't 
> exist in dismax/edismax.  If your assertion is that this should have been 
> done from day one, I'd say you're right.  But that decision is now ancient 
> history.  The person who wrote the code might have had a very good reason to 
> NOT do it that way.
> 
> At this point, if the functionality were changed, it would result in an 
> upgraded Solr version behaving VERY differently than the previous version.  
> While new functionality is often added in any new minor release, changing 
> existing behavior that users rely on without a configuration option is 
> usually only done in a major version.  So for 7.x, dismax/edismax would need 
> an option to enable rounding on minimum-should-match calculations.  Sounds 
> like a great feature request to put into Jira, and patches are always welcome.
> 
> Thanks,
> Shawn
> 



custom filter class on schema.xml on solrcloud

2018-04-02 Thread void
I have used a custom filter provided by a jar in schema.xml in standalone
Solr like below



And for this, 

I have loaded the jar in solrconfig.xml like below



It works fine. But when I tried to use it in SolrCloud with an external
ZooKeeper, I got an 'IO exception' error, possibly from uploading a large
jar file to ZooKeeper.

I've also tried to put this jar in the lib folder of the Solr home but got a
'Plugin init failure' error.

After that, I've tried blob store api but the documentation says "Blob store
can only be used to dynamically load components configured in
solrconfig.xml. Components specified in schema.xml cannot be loaded from
blob store"

So, how can I use a custom filter class in schema.xml in SolrCloud mode with
an external ZooKeeper configuration?






--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: how to reset the index in solr

2018-04-02 Thread delk
If you want to delete all documents from the Solr index, use a delete-by-query with the query

*:*
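
A minimal example (the collection name is a placeholder; the commit makes the deletion visible):

curl "http://localhost:8983/solr/yourcollection/update?commit=true" \
  -H 'Content-Type: text/xml' \
  --data-binary '<delete><query>*:*</query></delete>'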




-
Development Center Toronto 
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Need help to get started on Solr, searching get nothing. Thank you very much in advance

2018-04-02 Thread Rick Leir
Raymond
There is a default search field, set by the df parameter. You would normally use
copyField to copy all searchable fields into that default field.
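
For example, via the Schema API; the catch-all destination field _text_ comes from Solr's default configset and is an assumption here:

curl http://localhost:8983/solr/films/schema -X POST -H 'Content-type:application/json' --data-binary '{
  "add-copy-field" : { "source":"*", "dest":"_text_" }
}'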
Cheers -- Rick

On April 1, 2018 11:34:07 PM EDT, Raymond Xie  wrote:
>Hi Rick,
>
>I sorted it out half:
>
>I should have specified the field in the search query, so, instead of
>http://localhost:8983/solr/films/browse?q=batman, I should use:
>http://localhost:8983/solr/films/browse?q=name:batman
>
>Sorry for this newbie mistake.
>
>But what about if I/user doesn't know or doesn't want to specify the
>search
>scope to be restricted in field "name" but anywhere in the index'ed
>documents?
>
>
>**
>*Sincerely yours,*
>
>
>*Raymond*
>
>On Sun, Apr 1, 2018 at 2:10 PM, Rick Leir  wrote:
>
>> Raymond
>> The output is not visible to me because the mailing list strips
>images.
>> Please try a different way to show the output.
>> Cheers -- Rick
>>
>> On March 29, 2018 10:17:13 PM EDT, Raymond Xie 
>> wrote:
>> > I am new to Solr, following Steve Rowe's example on
>>
>>https://github.com/apache/lucene-solr/tree/master/solr/example/films:
>> >
>> >It would be greatly appreciated if anyone can enlighten me where to
>> >start
>> >troubleshooting, thank you very much in advance.
>> >
>> >The steps I followed are:
>> >
>> >Here ya go << END_OF_SCRIPT
>> >
>> >bin/solr stop
>> >rm server/logs/*.log
>> >rm -Rf server/solr/films/
>> >bin/solr start
>> >bin/solr create -c films
>> >curl http://localhost:8983/solr/films/schema -X POST -H
>> >'Content-type:application/json' --data-binary '{
>> >"add-field" : {
>> >"name":"name",
>> >"type":"text_general",
>> >"multiValued":false,
>> >"stored":true
>> >},
>> >"add-field" : {
>> >"name":"initial_release_date",
>> >"type":"pdate",
>> >"stored":true
>> >}
>> >}'
>> >bin/post -c films example/films/films.json
>> >curl http://localhost:8983/solr/films/config/params -H
>> >'Content-type:application/json'  -d '{
>> >"update" : {
>> >  "facets": {
>> >"facet.field":"genre"
>> >}
>> >  }
>> >}'
>> >
>> ># END_OF_SCRIPT
>> >
>> >Additional fun -
>> >
>> >Add highlighting:
>> >curl http://localhost:8983/solr/films/config/params -H
>> >'Content-type:application/json'  -d '{
>> >"set" : {
>> >  "browse": {
>> >"hl":"on",
>> >"hl.fl":"name"
>> >}
>> >  }
>> >}'
>> >try http://localhost:8983/solr/films/browse?q=batman now, and you'll
>> >see "batman" highlighted in the results
>> >
>> >
>> >
>> >I got nothing in my search:
>> >
>> >
>> >
>> >
>> >**
>> >*Sincerely yours,*
>> >
>> >
>> >*Raymond*
>>
>> --
>> Sorry for being brief. Alternate email is rickleir at yahoo dot com

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: How do I create a schema file for FIX data in Solr

2018-04-02 Thread Rick Leir
Ray
Have you looked around for an existing FIX-to-Solr conduit? If FIX is a common
standard, I would expect that someone has already done some work on this and
published it on GitHub.

Even just FIX to JSON.
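
For illustration, a minimal FIX-to-JSON sketch in Python; the SOH delimiter, the tag-name map and the sample values are assumptions, and a production parser should be driven by a full FIX dictionary or a library such as QuickFIX:

import json

# Minimal tag map covering the tags discussed in this thread (FIX 4.x names assumed).
TAG_NAMES = {
    "8": "BeginString",
    "9": "BodyLength",
    "35": "MsgType",
    "37": "OrderID",
    "75": "TradeDate",
    "122": "OrigSendingTime",
}

def fix_to_doc(raw_msg, delimiter="\x01"):
    """Split a raw FIX message into tag=value pairs and return a flat dict."""
    doc = {}
    for pair in raw_msg.strip().split(delimiter):
        if "=" not in pair:
            continue
        tag, value = pair.split("=", 1)
        # Use the human-readable name when known, otherwise keep the raw tag number.
        doc[TAG_NAMES.get(tag, "tag_" + tag)] = value
    return doc

if __name__ == "__main__":
    sample = "8=FIX.4.4\x019=653\x0135=RIO\x0137=ORD123\x0175=20180402"
    print(json.dumps(fix_to_doc(sample), indent=2))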
Cheers -- Rick

On April 2, 2018 12:34:44 AM EDT, Raymond Xie  wrote:
>Thank you, Shawn, Rick and other readers,
>
>To Shawn:
>
>For  *8=FIX.4.4 9=653 35=RIO* as an example, in the FIX standard: 8
>means BeginString, in this example, its value is  FIX.4.4.9, and 9
>means
>body length, it is 653 for this message, 35 is RIO, meaning the message
>type is RIO, 122 stands for OrigSendingTime and has a format of
>UTCTimestamp
>
>You can refer to this page for details: https://www.onixs.biz
>/fix-dictionary/4.2/fields_by_tag.html
>
>All the values are explained as string type.
>
>All the tag numbers are from FIX standard so it doesn't change (in my
>case)
>
>I expect a python program might be needed to parse the message and
>extract
>each tag's value, index is to be made on those extracted value as long
>as
>their field (tag) name.
>
>With index in place, ideally and naturally user will search for any
>keyword, however, in this case, most queries would be based on tag 37
>(Order ID) and 75 (Trade Date), there is another customized tag (not in
>the
>standard) Order Version to be queried on.
>
>I understand the parser creation would be a manual process, as long as
>I
>know or have a small sample program, I will do it myself and maybe
>adjust
>it as per need.
>
>To Rick:
>
>You mentioned creating JSON document, my understanding is a parser
>would be
>needed to generate that JSON document, do you have any existing example
>code?
>
>
>
>
>Thank you guys very much.
>
>
>
>
>
>
>
>
>
>**
>*Sincerely yours,*
>
>
>*Raymond*
>
>On Sun, Apr 1, 2018 at 2:16 PM, Shawn Heisey 
>wrote:
>
>> On 4/1/2018 10:12 AM, Raymond Xie wrote:
>>
>>> FIX is a format standard of financial data. It contains lots of tags
>in
>>> number with value for the tag, like 8=asdf, where 8 is the tag and
>asdf is
>>> the tag's value. Each tag has its definition.
>>>
>>> The sample msg in FIX format was in the original question.
>>>
>>> All I need to do is to know how to paste the msg and get all tag's
>value.
>>>
>>> I found so far a parser is what I need to start with., But I am more
>>> concerning about how to create index in Solr on the extracted tag's
>value,
>>> that is the first step, the next would be to customize the dashboard
>for
>>> users to search with a value to find out which msg contains that
>value in
>>> which tag and present users the whole msg as proof.
>>>
>>
>> Most of Solr's functionality is provided by Lucene.  Lucene is a java
>API
>> that implements search functionality.  Solr bolts on some
>functionality on
>> top of Lucene, but doesn't really do anything to fundamentally change
>the
>> fact that you're dealing with a Lucene index.  So I'm going to mostly
>talk
>> about Lucene below.
>>
>> Lucene organizes data in a unit that we call a "document." An easy
>analogy
>> for this is that it is a lot like a row in a single database table. 
>It has
>> fields, each field has a type. Unless custom software is used, there
>is
>> really no support for data other than basic primitive types --
>numbers and
>> strings.  The only complex type that I can think of that Solr
>supports out
>> of the box is geospatial coordinates, and it might even support
>> multi-dimensional coordinates, but I'm not sure.  It's not all that
>complex
>> -- the field just stores and manipulates multiple numbers instead of
>one.
>> The Lucene API does support a FEW things that Solr doesn't implement.
> I
>> don't think those are applicable to what you're trying to do.
>>
>> Let's look at the first part of the data that you included in the
>first
>> message:
>>
>> 8=FIX.4.4 9=653 35=RIO
>>
>> Is "8" always a mixture of letters and numbers and periods? Is "9"
>always
>> a number, and is it always a WHOLE number?  Is "35" always letters?
>> Looking deeper to data that I didn't quote ... is "122" always a
>date/time
>> value?  Are the tag numbers always picked from a well-defined set, or
>do
>> they change?
>>
>> Assuming that the answers in the previous paragraph are found and a
>> configuration is created to deal with all of it ... how are you
>planning to
>> search it?  What kind of queries would you expect somebody to make? 
>That's
>> going to have a huge influence on how you configure things.
>>
>> Writing the schema is usually where people spend the most time when
>> they're setting up Solr.
>>
>> Thanks,
>> Shawn
>>
>>

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: How do I create a schema file for FIX data in Solr

2018-04-02 Thread Rick Leir
Google "fix to json"; there are a few interesting leads.

On April 2, 2018 12:34:44 AM EDT, Raymond Xie  wrote:
>Thank you, Shawn, Rick and other readers,
>
>To Shawn:
>
>For  *8=FIX.4.4 9=653 35=RIO* as an example, in the FIX standard: 8
>means BeginString, in this example, its value is  FIX.4.4.9, and 9
>means
>body length, it is 653 for this message, 35 is RIO, meaning the message
>type is RIO, 122 stands for OrigSendingTime and has a format of
>UTCTimestamp
>
>You can refer to this page for details: https://www.onixs.biz
>/fix-dictionary/4.2/fields_by_tag.html
>
>All the values are explained as string type.
>
>All the tag numbers are from FIX standard so it doesn't change (in my
>case)
>
>I expect a python program might be needed to parse the message and
>extract
>each tag's value, index is to be made on those extracted value as long
>as
>their field (tag) name.
>
>With index in place, ideally and naturally user will search for any
>keyword, however, in this case, most queries would be based on tag 37
>(Order ID) and 75 (Trade Date), there is another customized tag (not in
>the
>standard) Order Version to be queried on.
>
>I understand the parser creation would be a manual process, as long as
>I
>know or have a small sample program, I will do it myself and maybe
>adjust
>it as per need.
>
>To Rick:
>
>You mentioned creating JSON document, my understanding is a parser
>would be
>needed to generate that JSON document, do you have any existing example
>code?
>
>
>
>
>Thank you guys very much.
>
>
>
>
>
>
>
>
>
>**
>*Sincerely yours,*
>
>
>*Raymond*
>
>On Sun, Apr 1, 2018 at 2:16 PM, Shawn Heisey 
>wrote:
>
>> On 4/1/2018 10:12 AM, Raymond Xie wrote:
>>
>>> FIX is a format standard of financial data. It contains lots of tags
>in
>>> number with value for the tag, like 8=asdf, where 8 is the tag and
>asdf is
>>> the tag's value. Each tag has its definition.
>>>
>>> The sample msg in FIX format was in the original question.
>>>
>>> All I need to do is to know how to paste the msg and get all tag's
>value.
>>>
>>> I found so far a parser is what I need to start with., But I am more
>>> concerning about how to create index in Solr on the extracted tag's
>value,
>>> that is the first step, the next would be to customize the dashboard
>for
>>> users to search with a value to find out which msg contains that
>value in
>>> which tag and present users the whole msg as proof.
>>>
>>
>> Most of Solr's functionality is provided by Lucene.  Lucene is a java
>API
>> that implements search functionality.  Solr bolts on some
>functionality on
>> top of Lucene, but doesn't really do anything to fundamentally change
>the
>> fact that you're dealing with a Lucene index.  So I'm going to mostly
>talk
>> about Lucene below.
>>
>> Lucene organizes data in a unit that we call a "document." An easy
>analogy
>> for this is that it is a lot like a row in a single database table. 
>It has
>> fields, each field has a type. Unless custom software is used, there
>is
>> really no support for data other than basic primitive types --
>numbers and
>> strings.  The only complex type that I can think of that Solr
>supports out
>> of the box is geospatial coordinates, and it might even support
>> multi-dimensional coordinates, but I'm not sure.  It's not all that
>complex
>> -- the field just stores and manipulates multiple numbers instead of
>one.
>> The Lucene API does support a FEW things that Solr doesn't implement.
> I
>> don't think those are applicable to what you're trying to do.
>>
>> Let's look at the first part of the data that you included in the
>first
>> message:
>>
>> 8=FIX.4.4 9=653 35=RIO
>>
>> Is "8" always a mixture of letters and numbers and periods? Is "9"
>always
>> a number, and is it always a WHOLE number?  Is "35" always letters?
>> Looking deeper to data that I didn't quote ... is "122" always a
>date/time
>> value?  Are the tag numbers always picked from a well-defined set, or
>do
>> they change?
>>
>> Assuming that the answers in the previous paragraph are found and a
>> configuration is created to deal with all of it ... how are you
>planning to
>> search it?  What kind of queries would you expect somebody to make? 
>That's
>> going to have a huge influence on how you configure things.
>>
>> Writing the schema is usually where people spend the most time when
>> they're setting up Solr.
>>
>> Thanks,
>> Shawn
>>
>>

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: How do I create a schema file for FIX data in Solr

2018-04-02 Thread Raymond Xie
Thank you, Rick, for enlightening me.

I will get the FIX message parsed first and come back here later.


**
*Sincerely yours,*


*Raymond*

On Mon, Apr 2, 2018 at 9:15 AM, Rick Leir  wrote:

> Google
>fix to json,
> there are a few interesting leads.
>
> On April 2, 2018 12:34:44 AM EDT, Raymond Xie 
> wrote:
> >Thank you, Shawn, Rick and other readers,
> >
> >To Shawn:
> >
> >For  *8=FIX.4.4 9=653 35=RIO* as an example, in the FIX standard: 8
> >means BeginString, in this example, its value is  FIX.4.4.9, and 9
> >means
> >body length, it is 653 for this message, 35 is RIO, meaning the message
> >type is RIO, 122 stands for OrigSendingTime and has a format of
> >UTCTimestamp
> >
> >You can refer to this page for details: https://www.onixs.biz
> >/fix-dictionary/4.2/fields_by_tag.html
> >
> >All the values are explained as string type.
> >
> >All the tag numbers are from FIX standard so it doesn't change (in my
> >case)
> >
> >I expect a python program might be needed to parse the message and
> >extract
> >each tag's value, index is to be made on those extracted value as long
> >as
> >their field (tag) name.
> >
> >With index in place, ideally and naturally user will search for any
> >keyword, however, in this case, most queries would be based on tag 37
> >(Order ID) and 75 (Trade Date), there is another customized tag (not in
> >the
> >standard) Order Version to be queried on.
> >
> >I understand the parser creation would be a manual process, as long as
> >I
> >know or have a small sample program, I will do it myself and maybe
> >adjust
> >it as per need.
> >
> >To Rick:
> >
> >You mentioned creating JSON document, my understanding is a parser
> >would be
> >needed to generate that JSON document, do you have any existing example
> >code?
> >
> >
> >
> >
> >Thank you guys very much.
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >**
> >*Sincerely yours,*
> >
> >
> >*Raymond*
> >
> >On Sun, Apr 1, 2018 at 2:16 PM, Shawn Heisey 
> >wrote:
> >
> >> On 4/1/2018 10:12 AM, Raymond Xie wrote:
> >>
> >>> FIX is a format standard of financial data. It contains lots of tags
> >in
> >>> number with value for the tag, like 8=asdf, where 8 is the tag and
> >asdf is
> >>> the tag's value. Each tag has its definition.
> >>>
> >>> The sample msg in FIX format was in the original question.
> >>>
> >>> All I need to do is to know how to paste the msg and get all tag's
> >value.
> >>>
> >>> I found so far a parser is what I need to start with., But I am more
> >>> concerning about how to create index in Solr on the extracted tag's
> >value,
> >>> that is the first step, the next would be to customize the dashboard
> >for
> >>> users to search with a value to find out which msg contains that
> >value in
> >>> which tag and present users the whole msg as proof.
> >>>
> >>
> >> Most of Solr's functionality is provided by Lucene.  Lucene is a java
> >API
> >> that implements search functionality.  Solr bolts on some
> >functionality on
> >> top of Lucene, but doesn't really do anything to fundamentally change
> >the
> >> fact that you're dealing with a Lucene index.  So I'm going to mostly
> >talk
> >> about Lucene below.
> >>
> >> Lucene organizes data in a unit that we call a "document." An easy
> >analogy
> >> for this is that it is a lot like a row in a single database table.
> >It has
> >> fields, each field has a type. Unless custom software is used, there
> >is
> >> really no support for data other than basic primitive types --
> >numbers and
> >> strings.  The only complex type that I can think of that Solr
> >supports out
> >> of the box is geospatial coordinates, and it might even support
> >> multi-dimensional coordinates, but I'm not sure.  It's not all that
> >complex
> >> -- the field just stores and manipulates multiple numbers instead of
> >one.
> >> The Lucene API does support a FEW things that Solr doesn't implement.
> > I
> >> don't think those are applicable to what you're trying to do.
> >>
> >> Let's look at the first part of the data that you included in the
> >first
> >> message:
> >>
> >> 8=FIX.4.4 9=653 35=RIO
> >>
> >> Is "8" always a mixture of letters and numbers and periods? Is "9"
> >always
> >> a number, and is it always a WHOLE number?  Is "35" always letters?
> >> Looking deeper to data that I didn't quote ... is "122" always a
> >date/time
> >> value?  Are the tag numbers always picked from a well-defined set, or
> >do
> >> they change?
> >>
> >> Assuming that the answers in the previous paragraph are found and a
> >> configuration is created to deal with all of it ... how are you
> >planning to
> >> search it?  What kind of queries would you expect somebody to make?
> >That's
> >> going to have a huge influence on how you configure things.
> >>
> >> Writing the schema is usually where people spend the most time when
> >> they're setting up Solr.
> >>
> >> Thanks,
> >> Shawn
> >>
> >>
>
> --
> Sorry for 

Re: querying vs. highlighting: complete freedom?

2018-04-02 Thread David Smiley
Hi Arturas,

Both Erick and I had a go at improving the documentation here.  I hope it's
clearer.
https://builds.apache.org/job/Solr-reference-guide-master/javadoc/highlighting.html
The docs for hl.fl, hl.q, hl.qparser were all updated.  The meat of the
change was a new note in hl.fl including an example.  It's kinda hard to
document the problem you found but I hope the note will be somewhat
illustrative.

~ David

On Mon, Mar 26, 2018 at 3:12 AM Arturas Mazeika  wrote:

> Hi Erick,
>
> Adding a field-qualify to the hl.q parameter solved the issue. My
> excitement is steaming over the roof! What a thorough answer: the
> explanation about the behavior of solr, how it tries to interpret what I
> mean when I supply a keyword without the field-qualifier. Very impressive.
> Would you care (re)posting this answer to stackoverflow? If that is too
> much of a hassle, I'll do this in a couple of days myself on your behalf.
>
> I am impressed how well, thorough, fast and fully the question was
> answered.
>
> Steven hint pushed me into this direction further: he suggested to use the
> query part of solr to filter and sort out the relevant answers in the 1st
> step and in the 2nd step he'd highlight all the keywords using CTR+F (in
> the browser or some alternative viewer). This brought be to the next
> question:
>
> How can one match query terms with the analyze-chained documents in an
> efficient and distributed manner? My current understanding how to achieve
> this is the following:
>
> 1. Get the list of ids (contents) of the documents that match the query
> 2. Use the http://localhost:8983/solr/#/trans/analysis to re-analyze the
> document and the query
> 3. Use the matching of the substrings from the original text to last
> filter/tokenizer/analyzer in the analyze-chain to map the terms of the
> query
> 4. Emulate CTRL+F highlighting
>
> Web Interface of Solr offers quite a bit to advance towards this goal. If
> one fires this request:
>
> * analysis.fieldvalue=Albert Einstein (14 March 1879 – 18 April 1955) was a
> German-born theoretical physicist[5] who developed the theory of
> relativity, one of the two pillars of modern physics (alongside quantum
> mechanics).&
> * analysis.query=reletivity theory
>
> to one of the cores of solr, one gets the steps 1-3 done:
>
>
> http://localhost:8983/solr/trans_shard1_replica_n1/analysis/field?wt=xml&analysis.showmatch=true&analysis.fieldvalue=Albert%20Einstein%20(14%20March%201879%20%E2%80%93%2018%20April%201955)%20was%20a%20German-born%20theoretical%20physicist[5]%20who%20developed%20the%20theory%20of%20relativity,%20one%20of%20the%20two%20pillars%20of%20modern%20physics%20(alongside%20quantum%20mechanics).&analysis.query=reletivity%20theory&analysis.fieldtype=text_en
>
> Questions:
>
> 1. Is there a way to "load-balance" this? In the above url, I need to
> specify a specific core. Is it possible to generalize it, so the core that
> receives the request is not necessarily the one that processes it? Or this
> already is distributed in a sense that receiving core and processing cores
> are never the same?
>
> 2. The document was already analyze-chained. Is is possible to store this
> information so one does not need to re-analyze-chain it once more?
>
> Cheers
> Arturas
>
> On Fri, Mar 23, 2018 at 9:15 PM, Erick Erickson 
> wrote:
>
> > Arturas:
> >
> > Try to field-qualify your hl.q parameter. That looks like:
> >
> > hl.q=trans:Kundigung
> > or
> > hl.q=trans:Kündigung
> >
> > I saw the exact behavior you describe when I did _not_ specify the
> > field in the hl.q parameter, i.e.
> >
> > hl.q=Kundigung
> > or
> > hl.q=Kündigung
> >
> > didn't show all highlights.
> >
> > But when I did specify the field, it worked.
> >
> > Here's what I think is happening: Solr uses the default search
> > field when parsing an un-field-qualified query. I.e.
> >
> > q=something
> >
> > is parsed as
> >
> > q=default_search_field:something.
> >
> > The default field is controlled in solrconfig.xml with the "df"
> > parameter, you'll see entries like:
> > my_field
> >
> > Also when I changed the "df" parameter to the field I was highlighting
> > on, I didn't need to specify the field on the hl.q parameter.
> >
> > hl.q=Kundigung
> > or
> > hl.q=Kündigung
> >
> > The default  field is usually "text", which knows nothing about
> > the German-specific filters you've applied unless you changed it.
> >
> > So in the absence of a field-qualification for the hl.q parameter Solr
> > was parsing the query according to the analysis chain specifed
> > in your default field, and probably passed ü through without
> > transforming it. Since your indexing analysis chain for that field
> > folded ü to just plain u, it wasn't found or highlighted.
> >
> > On the surface, this does seem like something that should be
> > changed, I'll go ahead and ping the dev list.
> >
> > NOTE: I was trying this on Solr 7.1
> >
> > Best,
> > Erick
> >
> > On Fri, Mar 23, 2018 at 12:03 PM, Arturas Mazeika 
> > wrote:
> > 

Solr 6. 3 Can not talk to ZK Updates are disabled

2018-04-02 Thread murugesh karmegam
We noticed this issue in our Solr clusters right after the cluster is
restarted, or after the cluster has been live for some time. Based on my
research so far, I am not seeing ZooKeeper connection issues on the ZK
server side; it seems to be on the Solr (ZK client) side. The issue now
recurs fairly constantly.

Error 1 Solr:

WARN  - 2018-02-06 17:35:04.742;
org.apache.solr.common.cloud.ConnectionManager; Our previous ZooKeeper
session was expired. Attempting to reconnect to recover relationship with
ZooKeeper...
ERROR - 2018-02-06 17:35:04.743; org.apache.solr.common.SolrException;
org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are
disabled.
at
org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:1508)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:696)
at
org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:97)


Error 2:

From ingestor log:
/var/log/mwired/core-ingestors/app.log.9:2018-03-30 05:44:52,616 [-38] ERROR
org.apache.solr.client.solrj.impl.CloudSolrClient - Request to collection 
failed due to (503)
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at : Cannot talk to ZooKeeper - Updates are disabled., retry? 0
/var/log/mwired/core-ingestors/app.log.9:com.mwired.grid.commons.exception.PersistenceException:
Failed to add 11 docs to solr0 collection , cachedDocs=118; because Error
from server at : Cannot talk to ZooKeeper - Updates are disabled.


I am wondering whether there is any fix. Any input would be appreciated.

http://lucene.472066.n3.nabble.com/Cannot-talk-to-ZooKeeper-Updates-are-disabled-Solr-6-3-0-td4311582.html
http://lucene.472066.n3.nabble.com/6-6-Cannot-talk-to-ZooKeeper-Updates-are-disabled-td4352917.html
https://issues.apache.org/jira/browse/SOLR-3274

Thanks in advance.
Murux



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr 6. 3 Can not talk to ZK Updates are disabled

2018-04-02 Thread Yago Riveiro
Hi murugesh,

This error normally happens when you hit long GC pauses. Try raising the heap
memory.

The only way to recover from this is restarting the affected node.
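
For example (the value is only a placeholder; size the heap to your data and watch the GC logs):

# in solr.in.sh
SOLR_HEAP="16g"

# or on the command line
bin/solr start -cloud -m 16g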

Regards.

--

Yago Riveiro

On 2 Apr 2018 15:39 +0100, murugesh karmegam , wrote:
> We noticed this issue in our solr clusters right after when Solr cluster is
> restarted or Solr cluster is live for some time. Based on my research so
> far... I am not seeing zookeeper connection issues from zk server side. It
> seems it is solr side ( zk client) side. This issue is pretty constant now
> and then.
>
> Error 1 Solr:
>
> WARN - 2018-02-06 17:35:04.742;
> org.apache.solr.common.cloud.ConnectionManager; Our previous ZooKeeper
> session was expired. Attempting to reconnect to recover relationship with
> ZooKeeper...
> ERROR - 2018-02-06 17:35:04.743; org.apache.solr.common.SolrException;
> org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are
> disabled.
> at
> org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:1508)
> at
> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:696)
> at
> org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:97)
>
>
> Error 2:
>
> From ingestor log:
> /var/log/mwired/core-ingestors/app.log.9:2018-03-30 05:44:52,616 [-38] ERROR
> org.apache.solr.client.solrj.impl.CloudSolrClient - Request to collection
> failed due to (503)
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
> from server at : Cannot talk to ZooKeeper - Updates are disabled., retry? 0
> /var/log/mwired/core-ingestors/app.log.9:com.mwired.grid.commons.exception.PersistenceException:
> Failed to add 11 docs to solr0 collection , cachedDocs=118; because Error
> from server at : Cannot talk to ZooKeeper - Updates are disabled.
>
>
> Wondering is there any fix? Appreciate any input.
>
> http://lucene.472066.n3.nabble.com/Cannot-talk-to-ZooKeeper-Updates-are-disabled-Solr-6-3-0-td4311582.html
> http://lucene.472066.n3.nabble.com/6-6-Cannot-talk-to-ZooKeeper-Updates-are-disabled-td4352917.html
> https://issues.apache.org/jira/browse/SOLR-3274
>
> Thanks in advance.
> Murux
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: custom filter class on schema.xml on solrcloud

2018-04-02 Thread Erick Erickson
ZK as used by Solr defaults to a maximum file size of 1 MB, specifically so
you _know_ when you are pushing large files around. You can change
that by setting jute.maxbuffer; see the ZooKeeper admin guide.

But if you put the jar file in the right place, it should have been
found. I did note that you put it in a different place than you
specified in solrconfig.xml, but if it's on the classpath it should be
found.

Try starting Solr with the -v option, that'll show you where
everything is loaded from (or looked for). That may provide a clue.
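
For reference, a sketch of the jute.maxbuffer override; the 10 MB value is only an example, and the same system property must also be raised in the ZooKeeper servers' JVM options for it to take effect end to end:

# in solr.in.sh -- value is in bytes
SOLR_OPTS="$SOLR_OPTS -Djute.maxbuffer=10485760"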

Best,
Erick

On Mon, Apr 2, 2018 at 2:25 AM, void  wrote:
> I have used a custom filter provided by a jar in schema.xml in standalone
> Solr like below
>
>  stopWordDictionary="resources/yStopWords"/>
>
> And for this,
>
> I have loaded the jar in solrconfig.xml like below
>
> 
>
> It's working fine But when I've tried to use it in solrcloud with external
> zookeeper mode I've got an error 'IO exception' maybe for uploading a large
> jar file in zookeeper.
>
> I've also tried to put this jar in the lib folder of solr home but got error
> 'Plugin init failure'
>
> After that, I've tried blob store api but the documentation says "Blob store
> can only be used to dynamically load components configured in
> solrconfig.xml. Components specified in schema.xml cannot be loaded from
> blob store"
>
> So, how can I use custom filter class in schema.xml in solrcloud mode with
> external zookeeper configuration
>
>
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Solr 7.2 solr.log is missing

2018-04-02 Thread Abhi Basu
It is not located in the /server/logs/ folder.

I have these files instead:

solr-8983-console.log
solr_gc.log.0.current

I can see logs from the Solr dashboard. Where is the solr.log file going
to? A search of "solr.log" in the system did not find the file.

Is the file called something else for solrcloud mode?

log4j.properties shows this:

# Default Solr log4j config
# rootLogger log level may be programmatically overridden by
-Dsolr.log.level
solr.log=${solr.log.dir}
log4j.rootLogger=INFO, file, CONSOLE

# Console appender will be programmatically disabled when Solr is started
with option -Dsolr.log.muteconsole
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.layout=org.apache.log4j.EnhancedPatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss.SSS}
%-5p (%t) [%X{collection} %X{shard} %X{replica} %X{core}] %c{1.} %m%n

#- size rotation with log cleanup.
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.MaxFileSize=4MB
log4j.appender.file.MaxBackupIndex=9

#- File to log to and log format
log4j.appender.file.File=${solr.log}/solr.log
log4j.appender.file.layout=org.apache.log4j.EnhancedPatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss.SSS}
%-5p (%t) [%X{collection} %X{shard} %X{replica} %X{core}] %c{1.} %m%n

# Adjust logging levels that should differ from root logger
log4j.logger.org.apache.zookeeper=WARN
log4j.logger.org.apache.hadoop=WARN
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.eclipse.jetty.server.Server=INFO
log4j.logger.org.eclipse.jetty.server.ServerConnector=INFO

# set to INFO to enable infostream log messages
log4j.logger.org.apache.solr.update.LoggingInfoStream=OFF


Thanks,

Abhi

-- 
Abhi Basu


Collection out of disk space, commit problem

2018-04-02 Thread Webster Homer
Over the weekend one of our dev SolrClouds ran out of disk space. Examining
the problem, we found one collection that had two months of uncommitted tlog
files. Unfortunately the Solr logs rolled over, so I cannot see the
commit behavior during the last time data was loaded to it.

The solrconfig.xml has both autoCommit and autoSoftCommit enabled.

 <autoCommit>
   <maxTime>${solr.autoCommit.maxTime:6}</maxTime>
   <openSearcher>false</openSearcher>
 </autoCommit>

solr.autoCommit.maxTime is set to 6

 <autoSoftCommit>
   <maxTime>${solr.autoSoftCommit.maxTime:5000}</maxTime>
 </autoSoftCommit>

solr.autoSoftCommit.maxTime is set to 3000

I found tlog files dated to Feb. 27. There is an automated job that reloads
the data once a week. It looks like no commits occurred from Feb 27 onward.
Once the disk got full solr got very unhappy.

This solrcloud has 2 shards and one replica per shard.

We have a second development solrcloud which has the same collections with
identical configurations except that these collections have 2 shards and 2
replicas per shard. That one doesn't seem to have the tlog files
accumulating.

I have long suspected that autoCommit is not reliable, and this seems to
indicate that it is not.

We have several collections that share the same configuration, and have
similar ETL jobs loading them. This is the second time that this particular
collection has had this  problem.



Re: Collection out of disk space, commit problem

2018-04-02 Thread Erick Erickson
Webster:

Do you by any chance have CDCR configured? If so, insure that
buffering is disabled. Buffering was intended to be enabled
_temporarily_ during, say, a maintenance window and was conceived
before the bootstrapping capability was added to CDCR.

But I don't recall your other e-mails mentioning CDCR, so I mention this
on the off chance...
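
If CDCR is (or was) configured, buffering can be inspected and disabled through the CDCR API, for example (the collection name is a placeholder):

curl "http://localhost:8983/solr/yourcollection/cdcr?action=STATUS"
curl "http://localhost:8983/solr/yourcollection/cdcr?action=DISABLEBUFFER"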

Best,
Erick

On Mon, Apr 2, 2018 at 10:35 AM, Webster Homer  wrote:
> Over the weekend one of our Dev solrcloud ran out of disk space. Examining
> the problem we found one collection that had 2 months of uncommitted tlog
> files. Unfortuneatly the solr logs rolled over and so I cannot see the
> commit behavior during the last time data was loaded to it.
>
> The solrconfig.xml has both autoCommit and autoSoftCommit enabled.
>  
>${solr.autoCommit.maxTime:6}
>false
> 
>
> solr.autoCommit.maxTime  is set to 6
>
>  
>${solr.autoSoftCommit.maxTime:5000}
>  
> solr.autoSoftCommit.maxTime  is set to 3000
>
> I found tlog files dated to Feb. 27. There is an automated job that reloads
> the data once a week. It looks like no commits occurred from Feb 27 onward.
> Once the disk got full solr got very unhappy.
>
> This solrcloud has 2 shards and one replica per shard.
>
> We have a second development solrcloud which has the same collections with
> identical configurations except that these collections have 2 shards and 2
> replicas per shard. That one doesn't seem to have the tlog files
> accumulating.
>
> I have long suspected that autoCommit is not reliable, and this seems to
> indicate that it is not.
>
> We have several collections that share the same configuration, and have
> similar ETL jobs loading them. This is the second time that this particular
> collection has had this  problem.
>
> --
>
>
> This message and any attachment are confidential and may be privileged or
> otherwise protected from disclosure. If you are not the intended recipient,
> you must not copy this message or attachment or disclose the contents to
> any other person. If you have received this transmission in error, please
> notify the sender immediately and delete the message and any attachment
> from your system. Merck KGaA, Darmstadt, Germany and any of its
> subsidiaries do not accept liability for any omissions or errors in this
> message which may arise as a result of E-Mail-transmission or for damages
> resulting from any unauthorized changes of the content of this message and
> any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its
> subsidiaries do not guarantee that this message is free of viruses and does
> not accept liability for any damages caused by any virus transmitted
> therewith.
>
> Click http://www.emdgroup.com/disclaimer to access the German, French,
> Spanish and Portuguese versions of this disclaimer.


Re: Solr 7.2 solr.log is missing

2018-04-02 Thread Erick Erickson
Technically, Solr doesn't name the file at all, that's in your log4j
config, this line:

log4j.appender.file.File=${solr.log}/solr.log

so it's weird that you can't find it on your machine at all. How do
you _start_ Solr? In particular, do you define a system variable
"-Dsolr.log=some_path"?

And also note that there are three log4j configs, and it's easy to be
using one you don't think you are using, see SOLR-12008.

Best,
Erick

On Mon, Apr 2, 2018 at 10:02 AM, Abhi Basu <9000r...@gmail.com> wrote:
> Not located in the /server/logs/ folder.
>
> Have these files instead
>
> solr-8983-console.log
> solr_gc.log.0.current
>
> I can see logs from the Solr dashboard. Where is the solr.log file going
> to? A search of "solr.log" in the system did not find the file.
>
> Is the file called something else for solrcloud mode?
>
> log4j.properties shows this:
>
> # Default Solr log4j config
> # rootLogger log level may be programmatically overridden by
> -Dsolr.log.level
> solr.log=${solr.log.dir}
> log4j.rootLogger=INFO, file, CONSOLE
>
> # Console appender will be programmatically disabled when Solr is started
> with option -Dsolr.log.muteconsole
> log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
> log4j.appender.CONSOLE.layout=org.apache.log4j.EnhancedPatternLayout
> log4j.appender.CONSOLE.layout.ConversionPattern=%d{-MM-dd HH:mm:ss.SSS}
> %-5p (%t) [%X{collection} %X{shard} %X{replica} %X{core}] %c{1.} %m%n
>
> #- size rotation with log cleanup.
> log4j.appender.file=org.apache.log4j.RollingFileAppender
> log4j.appender.file.MaxFileSize=4MB
> log4j.appender.file.MaxBackupIndex=9
>
> #- File to log to and log format
> log4j.appender.file.File=${solr.log}/solr.log
> log4j.appender.file.layout=org.apache.log4j.EnhancedPatternLayout
> log4j.appender.file.layout.ConversionPattern=%d{-MM-dd HH:mm:ss.SSS}
> %-5p (%t) [%X{collection} %X{shard} %X{replica} %X{core}] %c{1.} %m%n
>
> # Adjust logging levels that should differ from root logger
> log4j.logger.org.apache.zookeeper=WARN
> log4j.logger.org.apache.hadoop=WARN
> log4j.logger.org.eclipse.jetty=WARN
> log4j.logger.org.eclipse.jetty.server.Server=INFO
> log4j.logger.org.eclipse.jetty.server.ServerConnector=INFO
>
> # set to INFO to enable infostream log messages
> log4j.logger.org.apache.solr.update.LoggingInfoStream=OFF
>
>
> Thanks,
>
> Abhi
>
> --
> Abhi Basu


Re: Solr 7.2 solr.log is missing

2018-04-02 Thread Abhi Basu
Wow life is complicated :)

Since I am using this to start solr, I am assuming the one in
/server/scripts/cloud-scripts is being used:
./bin/solr start -cloud -s /usr/local/bin/solr-7.2.1/server/solr/node1/solr
-p 8983 -z zk0-esohad:2181,zk1-esohad:2181,zk5-esohad:2181 -m 10g

So, I guess I need to edit that one.

Thanks,

Abhi

On Mon, Apr 2, 2018 at 1:14 PM, Erick Erickson 
wrote:

> Technically, Solr doesn't name the file at all, that's in your log4j
> config, this line:
>
> log4j.appender.file.File=${solr.log}/solr.log
>
> so it's weird that you can't find it on your machine at all. How do
> you _start_ Solr? In particular, to you define a system variable
> "-Dsolr.log=some_path"?
>
> And also note that there are three log4j configs, and it's easy to be
> using one you don't think you are using, see SOLR-12008.
>
> Best,
> Erick
>
> On Mon, Apr 2, 2018 at 10:02 AM, Abhi Basu <9000r...@gmail.com> wrote:
> > Not located in the /server/logs/ folder.
> >
> > Have these files instead
> >
> > solr-8983-console.log
> > solr_gc.log.0.current
> >
> > I can see logs from the Solr dashboard. Where is the solr.log file going
> > to? A search of "solr.log" in the system did not find the file.
> >
> > Is the file called something else for solrcloud mode?
> >
> > log4j.properties shows this:
> >
> > # Default Solr log4j config
> > # rootLogger log level may be programmatically overridden by
> > -Dsolr.log.level
> > solr.log=${solr.log.dir}
> > log4j.rootLogger=INFO, file, CONSOLE
> >
> > # Console appender will be programmatically disabled when Solr is started
> > with option -Dsolr.log.muteconsole
> > log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
> > log4j.appender.CONSOLE.layout=org.apache.log4j.EnhancedPatternLayout
> > log4j.appender.CONSOLE.layout.ConversionPattern=%d{-MM-dd
> HH:mm:ss.SSS}
> > %-5p (%t) [%X{collection} %X{shard} %X{replica} %X{core}] %c{1.} %m%n
> >
> > #- size rotation with log cleanup.
> > log4j.appender.file=org.apache.log4j.RollingFileAppender
> > log4j.appender.file.MaxFileSize=4MB
> > log4j.appender.file.MaxBackupIndex=9
> >
> > #- File to log to and log format
> > log4j.appender.file.File=${solr.log}/solr.log
> > log4j.appender.file.layout=org.apache.log4j.EnhancedPatternLayout
> > log4j.appender.file.layout.ConversionPattern=%d{-MM-dd HH:mm:ss.SSS}
> > %-5p (%t) [%X{collection} %X{shard} %X{replica} %X{core}] %c{1.} %m%n
> >
> > # Adjust logging levels that should differ from root logger
> > log4j.logger.org.apache.zookeeper=WARN
> > log4j.logger.org.apache.hadoop=WARN
> > log4j.logger.org.eclipse.jetty=WARN
> > log4j.logger.org.eclipse.jetty.server.Server=INFO
> > log4j.logger.org.eclipse.jetty.server.ServerConnector=INFO
> >
> > # set to INFO to enable infostream log messages
> > log4j.logger.org.apache.solr.update.LoggingInfoStream=OFF
> >
> >
> > Thanks,
> >
> > Abhi
> >
> > --
> > Abhi Basu
>



-- 
Abhi Basu


Re: Collection out of disk space, commit problem

2018-04-02 Thread Webster Homer
Erick,

Thanks. Normally our dev environment does not use CDCR, except when we're
doing active development on it. As it happens, the collection in question
was one we used to test CDCR, or rather its configuration was, as the
specific collection has been deleted and recreated many times. Even though
we had CDCR turned off, it seems that buffering got set to "enabled", which
seems to be the default, and it is a really bad default!

Because it's dev and we don't do CDCR there, I might not have thought to
look at that, so thank you for that.

Web

On Mon, Apr 2, 2018 at 1:10 PM, Erick Erickson 
wrote:

> Webster:
>
> Do you by any chance have CDCR configured? If so, insure that
> buffering is disabled. Buffering was intended to be enabled
> _temporarily_ during, say, a maintenance window and was conceived
> before the bootstrapping capability was added to CDCR.
>
> But I don't recall your other e-mails mention CDCR so I mention this
> on the off chance...
>
> Best,
> Erick
>
> On Mon, Apr 2, 2018 at 10:35 AM, Webster Homer 
> wrote:
> > Over the weekend one of our Dev solrcloud ran out of disk space.
> Examining
> > the problem we found one collection that had 2 months of uncommitted tlog
> > files. Unfortuneatly the solr logs rolled over and so I cannot see the
> > commit behavior during the last time data was loaded to it.
> >
> > The solrconfig.xml has both autoCommit and autoSoftCommit enabled.
> >  
> >${solr.autoCommit.maxTime:6}
> >false
> > 
> >
> > solr.autoCommit.maxTime  is set to 6
> >
> >  
> >${solr.autoSoftCommit.maxTime:5000}
> >  
> > solr.autoSoftCommit.maxTime  is set to 3000
> >
> > I found tlog files dated to Feb. 27. There is an automated job that
> reloads
> > the data once a week. It looks like no commits occurred from Feb 27
> onward.
> > Once the disk got full solr got very unhappy.
> >
> > This solrcloud has 2 shards and one replica per shard.
> >
> > We have a second development solrcloud which has the same collections
> with
> > identical configurations except that these collections have 2 shards and
> 2
> > replicas per shard. That one doesn't seem to have the tlog files
> > accumulating.
> >
> > I have long suspected that autoCommit is not reliable, and this seems to
> > indicate that it is not.
> >
> > We have several collections that share the same configuration, and have
> > similar ETL jobs loading them. This is the second time that this
> particular
> > collection has had this  problem.
> >
> > --
> >
> >
> > This message and any attachment are confidential and may be privileged or
> > otherwise protected from disclosure. If you are not the intended
> recipient,
> > you must not copy this message or attachment or disclose the contents to
> > any other person. If you have received this transmission in error, please
> > notify the sender immediately and delete the message and any attachment
> > from your system. Merck KGaA, Darmstadt, Germany and any of its
> > subsidiaries do not accept liability for any omissions or errors in this
> > message which may arise as a result of E-Mail-transmission or for damages
> > resulting from any unauthorized changes of the content of this message
> and
> > any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its
> > subsidiaries do not guarantee that this message is free of viruses and
> does
> > not accept liability for any damages caused by any virus transmitted
> > therewith.
> >
> > Click http://www.emdgroup.com/disclaimer to access the German, French,
> > Spanish and Portuguese versions of this disclaimer.
>



Re: Need help to get started on Solr, searching get nothing. Thank you very much in advance

2018-04-02 Thread Adhyan Arizki
Raymond,

You can specify the default behavior in solrconfig.xml under each handler.
For instance, for /browse you can specify that it should search the name
field, and for /query you can default to a different field.
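
As an illustrative solrconfig.xml fragment (the handler name and field are assumptions; adjust to your own handlers):

<requestHandler name="/browse" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="df">name</str>
  </lst>
</requestHandler>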

On Mon, Apr 2, 2018 at 9:04 PM, Rick Leir  wrote:

> Raymond
> There is a default field normally called df. You would normally use
> Copyfield to copy all searchable fields into the default field.
> Cheers -- Rick
>
> On April 1, 2018 11:34:07 PM EDT, Raymond Xie 
> wrote:
> >Hi Rick,
> >
> >I sorted it out half:
> >
> >I should have specified the field in the search query, so, instead of
> >http://localhost:8983/solr/films/browse?q=batman, I should use:
> >http://localhost:8983/solr/films/browse?q=name:batman
> >
> >Sorry for this newbie mistake.
> >
> >But what about if I/user doesn't know or doesn't want to specify the
> >search
> >scope to be restricted in field "name" but anywhere in the index'ed
> >documents?
> >
> >
> >**
> >*Sincerely yours,*
> >
> >
> >*Raymond*
> >
> >On Sun, Apr 1, 2018 at 2:10 PM, Rick Leir  wrote:
> >
> >> Raymond
> >> The output is not visible to me because the mailing list strips
> >images.
> >> Please try a different way to show the output.
> >> Cheers -- Rick
> >>
> >> On March 29, 2018 10:17:13 PM EDT, Raymond Xie 
> >> wrote:
> >> > I am new to Solr, following Steve Rowe's example on
> >>
> >>https://github.com/apache/lucene-solr/tree/master/solr/example/films:
> >> >
> >> >It would be greatly appreciated if anyone can enlighten me where to
> >> >start
> >> >troubleshooting, thank you very much in advance.
> >> >
> >> >The steps I followed are:
> >> >
> >> >Here ya go << END_OF_SCRIPT
> >> >
> >> >bin/solr stop
> >> >rm server/logs/*.log
> >> >rm -Rf server/solr/films/
> >> >bin/solr start
> >> >bin/solr create -c films
> >> >curl http://localhost:8983/solr/films/schema -X POST -H
> >> >'Content-type:application/json' --data-binary '{
> >> >"add-field" : {
> >> >"name":"name",
> >> >"type":"text_general",
> >> >"multiValued":false,
> >> >"stored":true
> >> >},
> >> >"add-field" : {
> >> >"name":"initial_release_date",
> >> >"type":"pdate",
> >> >"stored":true
> >> >}
> >> >}'
> >> >bin/post -c films example/films/films.json
> >> >curl http://localhost:8983/solr/films/config/params -H
> >> >'Content-type:application/json'  -d '{
> >> >"update" : {
> >> >  "facets": {
> >> >"facet.field":"genre"
> >> >}
> >> >  }
> >> >}'
> >> >
> >> ># END_OF_SCRIPT
> >> >
> >> >Additional fun -
> >> >
> >> >Add highlighting:
> >> >curl http://localhost:8983/solr/films/config/params -H
> >> >'Content-type:application/json'  -d '{
> >> >"set" : {
> >> >  "browse": {
> >> >"hl":"on",
> >> >"hl.fl":"name"
> >> >}
> >> >  }
> >> >}'
> >> >try http://localhost:8983/solr/films/browse?q=batman now, and you'll
> >> >see "batman" highlighted in the results
> >> >
> >> >
> >> >
> >> >I got nothing in my search:
> >> >
> >> >
> >> >
> >> >
> >> >**
> >> >*Sincerely yours,*
> >> >
> >> >
> >> >*Raymond*
> >>
> >> --
> >> Sorry for being brief. Alternate email is rickleir at yahoo dot com
>
> --
> Sorry for being brief. Alternate email is rickleir at yahoo dot com
>



-- 

Best regards,
Adhyan Arizki


Re: PreAnalyzed FieldType, and simultaneously importing JSON

2018-04-02 Thread David Smiley
Hello Markus,

It appears you are not familiar with PreAnalyzedUpdateProcessor?  Using
that is much more flexible -- you could have different URP chains for your
use-cases. IMO PreAnalyzedField ought to go away.  I argued for the URP
version, and thus its superiority to the FieldType, here:
https://issues.apache.org/jira/browse/SOLR-4619?focusedCommentId=13611191&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13611191
Sadly, the FieldType is the one that is documented in the ref guide, but
not the URP :-(

~ David
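For readers who want to try this, a minimal sketch of such a chain (not from the
original mail; the fieldName and parser parameters are assumptions -- check the
PreAnalyzedUpdateProcessorFactory javadocs for the exact options):

  <updateRequestProcessorChain name="pre-analyzed">
    <processor class="solr.PreAnalyzedUpdateProcessorFactory">
      <str name="fieldName">text_en</str>
      <str name="parser">json</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>

A request can then opt into this chain with update.chain=pre-analyzed, while plain
JSON loads keep using the default chain.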

On Thu, Mar 29, 2018 at 5:06 PM Markus Jelsma 
wrote:

> Hello,
>
> We want to move to PreAnalyzed FieldType to offload our very heavy
> analysis chain away from the search cluster, so we have to configure our
> fields to accept pre-analyzed tokens in production.
>
> But we use the same schema in development environments too, and that is
> where we use JSON files, or stream (export/import) data directly from
> production servers into a development environment, again via JSON. And in
> case of disaster recovery, we can import the daily exported JSON bzipped
> files back into our production servers.
>
> But this JSON loading does not work with the PreAnalyzed FieldType. So to load
> JSON we must reset all fields back to their respective language-specific
> FieldTypes on the fly. We could automate that, but it is a hassle we would like
> to avoid.
>
> Have I overlooked any configuration parameters that can help? Must we
> automate the on-the-fly schema reconfiguration and reset to PreAnalyzed
> after JSON loading is finished?
>
> Many thanks!
> Markus
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Solr 7.1.0 - concurrent.ExecutionException building model

2018-04-02 Thread Joe Obernberger
Hi All - when building machine learning models using information gain, I 
sometimes get this error when the number of iterations is high.  I'm 
using about 20k news articles in my training set (about 10k positive, 
and 10k negative), and (for this particular run) am using 500 terms and 
25,000 iterations.  I have gotten the error with a much lower number of 
iterations (1,000) as well.


The specific stream command was:
update(models, batchSize="50",
  train(MODEL1024_1522696624083,
    features(MODEL1024_1522696624083, q="*:*", featureSet="FSet_MODEL1024_1522696624083",
      field="Text", outcome="out_i", positiveLabel=1, numTerms=500),
    q="*:*", name="MODEL1024", field="Text", outcome="out_i", maxIterations="25000"))


The training data was split across 20 shards - specifically created with:
http://icarus.querymasters.com:9100/solr/admin/collections?action=CREATE&name=MODEL1024_1522696624083&numShards=20&replicationFactor=2&maxShardsPerNode=5&collection.configName=TRAINING

Any ideas?  The complete error is:

java.io.IOException: java.util.concurrent.ExecutionException: 
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
Error from server at 
http://vesta:9100/solr/MODEL1024_1522696624083_shard20_replica_n75: 
Expected mime type application/octet-stream but got text/html.
Error 404 Not Found
HTTP ERROR 404
Problem accessing /solr/MODEL1024_1522696624083_shard20_replica_n75/select. Reason:
    Not Found



    at 
org.apache.solr.client.solrj.io.stream.TextLogitStream.read(TextLogitStream.java:498)
    at 
org.apache.solr.client.solrj.io.stream.PushBackStream.read(PushBackStream.java:87)
    at 
org.apache.solr.client.solrj.io.stream.UpdateStream.read(UpdateStream.java:109)
    at 
org.apache.solr.client.solrj.io.stream.ExceptionStream.read(ExceptionStream.java:68)
    at 
org.apache.solr.handler.StreamHandler$TimerStream.read(StreamHandler.java:627)
    at 
org.apache.solr.client.solrj.io.stream.TupleStream.lambda$writeMap$0(TupleStream.java:87)
    at 
org.apache.solr.response.JSONWriter.writeIterator(JSONResponseWriter.java:523)
    at 
org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:180)
    at 
org.apache.solr.response.JSONWriter$2.put(JSONResponseWriter.java:559)
    at 
org.apache.solr.client.solrj.io.stream.TupleStream.writeMap(TupleStream.java:84)
    at 
org.apache.solr.response.JSONWriter.writeMap(JSONResponseWriter.java:547)
    at 
org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:198)
    at 
org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:209)
    at 
org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:325)
    at 
org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:120)
    at 
org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:71)
    at 
org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)
    at 
org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:806)

    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:535)
    at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382)
    at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:326)
    at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1751)
    at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
    at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
    at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
    at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
    at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
    at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
    at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
    at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
    at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
    at 
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
    at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)

    at org.eclipse.jetty.server.Server.handle(Server.java:534)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
    at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
    at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)

    at org.eclipse.jetty.io

Re: Collection out of disk space, commit problem

2018-04-02 Thread Erick Erickson
Homer:

Yeah, the buffering bits are trappy, and in fact are being removed from
CDCR going forward.

Too bad you fell into that trap; there's hope going forward, though...

Erick
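For reference, buffering can be switched off with the CDCR API -- a sketch, assuming
the default port; substitute your own host and collection name:

curl 'http://localhost:8983/solr/<collection>/cdcr?action=DISABLEBUFFER'

The STATUS action on the same endpoint reports whether buffering is currently enabled.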

On Mon, Apr 2, 2018 at 11:42 AM, Webster Homer  wrote:
> Erick,
>
> Thanks, Normally our dev environment does not use CDCR, except when we're
> doing active development on it. As it happens the collection in question,
> was one we used to test cdcr. Or rather the configuration for it was, as
> the specific collection has been deleted and created many times. Even
> though we had cdcr turned off it seems that buffers got set to "enabled"
> Which seems to be the default, and it is a really bad default!
>
> Because it's dev and we don't do cdcr there, I might not have thought to
> look at that. So thank you for that
>
> Web
>
> On Mon, Apr 2, 2018 at 1:10 PM, Erick Erickson 
> wrote:
>
>> Webster:
>>
>> Do you by any chance have CDCR configured? If so, ensure that
>> buffering is disabled. Buffering was intended to be enabled
>> _temporarily_ during, say, a maintenance window and was conceived
>> before the bootstrapping capability was added to CDCR.
>>
>> But I don't recall your other e-mails mention CDCR so I mention this
>> on the off chance...
>>
>> Best,
>> Erick
>>
>> On Mon, Apr 2, 2018 at 10:35 AM, Webster Homer 
>> wrote:
>> > Over the weekend one of our Dev solrcloud ran out of disk space.
>> Examining
>> > the problem we found one collection that had 2 months of uncommitted tlog
>> > files. Unfortunately the Solr logs rolled over and so I cannot see the
>> > commit behavior during the last time data was loaded to it.
>> >
>> > The solrconfig.xml has both autoCommit and autoSoftCommit enabled.
>> >  <autoCommit>
>> >    <maxTime>${solr.autoCommit.maxTime:6}</maxTime>
>> >    <openSearcher>false</openSearcher>
>> >  </autoCommit>
>> >
>> > solr.autoCommit.maxTime is set to 6
>> >
>> >  <autoSoftCommit>
>> >    <maxTime>${solr.autoSoftCommit.maxTime:5000}</maxTime>
>> >  </autoSoftCommit>
>> > solr.autoSoftCommit.maxTime is set to 3000
>> >
>> > I found tlog files dated to Feb. 27. There is an automated job that
>> reloads
>> > the data once a week. It looks like no commits occurred from Feb 27
>> onward.
>> > Once the disk got full, Solr got very unhappy.
>> >
>> > This solrcloud has 2 shards and one replica per shard.
>> >
>> > We have a second development solrcloud which has the same collections
>> with
>> > identical configurations except that these collections have 2 shards and
>> 2
>> > replicas per shard. That one doesn't seem to have the tlog files
>> > accumulating.
>> >
>> > I have long suspected that autoCommit is not reliable, and this seems to
>> > indicate that it is not.
>> >
>> > We have several collections that share the same configuration, and have
>> > similar ETL jobs loading them. This is the second time that this
>> particular
>> > collection has had this  problem.
>> >
>> > --
>> >
>> >
>>
>
> --
>
>


Re: Solr 6. 3 Can not talk to ZK Updates are disabled

2018-04-02 Thread murugesh karmegam
Hi Yago Riveiro , 

Thanks for the reply. We have a heap size of 64G; any more is not recommended,
right? Except for one occasion, I was not able to correlate "updates disabled" with
a GC pause.  Also, the ZK timeout is 120 seconds, so even with a long GC pause (more
than 10 seconds normally) we should recover, right?

JVM settings 

 /usr/java/latest/bin/java -server -Xms32g -Xmx64g
-DsharedLib=/opt/mw/solrlib -XX:+UseG1GC -XX:MaxGCPauseMillis=5000
-XX:ParallelGCThreads=30 -XX:ConcGCThreads=10 -Djute.maxbuffer=41943040
-XX:G1HeapWastePercent=20 -XX:InitiatingHeapOccupancyPercent=45
-XX:+UnlockExperimentalVMOptions -XX:NewSize=10g -XX:MaxNewSize=32g
-XX:SurvivorRatio=2 -XX:-ResizePLAB -XX:+AlwaysPreTouch
-XX:+ParallelRefProcEnabled -server
-Xloggc:/var/log/solr/gc-solr2018-03-27-19-16.log -verbose:gc
-XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationStoppedTime -XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=64m
-XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1
-Xloggc:/var/log/solr/solr_gc.log -XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.local.only=false
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.port=18983
-Dcom.sun.management.jmxremote.rmi.port=18983
-Djava.rmi.server.hostname=tr3slr3dn27 -DzkClientTimeout=12 -Dzkhost
.../solr -Dsolr.log.dir=/var/log/solr -Djetty.port=8983 -DSTOP.PORT=7983
-DSTOP.KEY=solrrocks -Dhost=tr3slr3dn27 -Duser.timezone=EST
-Djetty.home=/opt/solr/server -Dsolr.solr.home=/data0/solr
-Dsolr.install.dir=/opt/solr
-Dlog4j.configuration=file:/etc/solr/conf/log4j.properties -Xss256k
-Dsolr.autoSoftCommit.maxTime=30 -Dsolr.autoCommit.maxTime=60
-Dsolr.clustering.enabled=false -DsharedLib=/opt/mw/solrlib
-Dsolr.lock.type=native -XX:MetaspaceSize=1024m -XX:MaxMetaspaceSize=1024m
-XX:MinMetaspaceExpansion=16m -XX:MaxMetaspaceExpansion=32m -Xss256k
-Dsolr.log.muteconsole -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983
/var/log/solr -jar start.jar --module=http



Just to give more of an idea...
we are a 48-node cluster, with each node holding indexes (many together) of up to
900GB to 1TB, and one major index has 48 shards, with each shard at 80 -
85 GB = approx 4TB



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr 6. 3 Can not talk to ZK Updates are disabled

2018-04-02 Thread Erick Erickson
Actually, 64G is on the high side, GC pauses can kill you pretty
easily in that range.

If it's at all possible to cut that down it would be A Good Thing

Best,
Erick

On Mon, Apr 2, 2018 at 12:56 PM, murugesh karmegam  wrote:
> Hi Yago Riveiro ,
>
> Thanks for the reply. We have heap size 64G. Any more is not recommended
> right? Except one time I was not able to co relate "updates disabled" with
> GC pause.  Also zk timeout is 120 seconds even with long GC pause (more than
> 10 seconds normally) we should recover right?
>
> JVM settings
>
>  /usr/java/latest/bin/java -server -Xms32g -Xmx64g
> -DsharedLib=/opt/mw/solrlib -XX:+UseG1GC -XX:MaxGCPauseMillis=5000
> -XX:ParallelGCThreads=30 -XX:ConcGCThreads=10 -Djute.maxbuffer=41943040
> -XX:G1HeapWastePercent=20 -XX:InitiatingHeapOccupancyPercent=45
> -XX:+UnlockExperimentalVMOptions -XX:NewSize=10g -XX:MaxNewSize=32g
> -XX:SurvivorRatio=2 -XX:-ResizePLAB -XX:+AlwaysPreTouch
> -XX:+ParallelRefProcEnabled -server
> -Xloggc:/var/log/solr/gc-solr2018-03-27-19-16.log -verbose:gc
> -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps
> -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution
> -XX:+PrintGCApplicationStoppedTime -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=64m
> -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1
> -Xloggc:/var/log/solr/solr_gc.log -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M
> -Dcom.sun.management.jmxremote
> -Dcom.sun.management.jmxremote.local.only=false
> -Dcom.sun.management.jmxremote.ssl=false
> -Dcom.sun.management.jmxremote.authenticate=false
> -Dcom.sun.management.jmxremote.port=18983
> -Dcom.sun.management.jmxremote.rmi.port=18983
> -Djava.rmi.server.hostname=tr3slr3dn27 -DzkClientTimeout=12 -Dzkhost
> .../solr -Dsolr.log.dir=/var/log/solr -Djetty.port=8983 -DSTOP.PORT=7983
> -DSTOP.KEY=solrrocks -Dhost=tr3slr3dn27 -Duser.timezone=EST
> -Djetty.home=/opt/solr/server -Dsolr.solr.home=/data0/solr
> -Dsolr.install.dir=/opt/solr
> -Dlog4j.configuration=file:/etc/solr/conf/log4j.properties -Xss256k
> -Dsolr.autoSoftCommit.maxTime=30 -Dsolr.autoCommit.maxTime=60
> -Dsolr.clustering.enabled=false -DsharedLib=/opt/mw/solrlib
> -Dsolr.lock.type=native -XX:MetaspaceSize=1024m -XX:MaxMetaspaceSize=1024m
> -XX:MinMetaspaceExpansion=16m -XX:MaxMetaspaceExpansion=32m -Xss256k
> -Dsolr.log.muteconsole -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983
> /var/log/solr -jar start.jar --module=http
>
>
>
> just to give more idea...
> we are a 48 node cluster with each node having indexes (many together) up to
> 900GB to 1TB and one major index is with 48 shards with each shard is 80 -
> 85 G = approx 4TB
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Learning to Rank (LTR) with grouping

2018-04-02 Thread Roopa Rao
Hi Ilay,

I am still on Solr 6.6.0 and did not apply the grouping fix as a patch.
I implemented a temporary workaround: the web application makes 2 async requests,
the 1st with grouping and the 2nd without grouping, and merges the results (see
the sketch below).
This solution worked for my case, as we were getting grouping results for
specific tiles in the page.


Roopa
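For reference, a minimal SolrJ sketch of that two-request workaround (not Roopa's
actual code; the collection URL, query, and group field below are placeholders):

import java.util.concurrent.CompletableFuture;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TwoPassSearch {
  public static void main(String[] args) throws Exception {
    // Placeholder URL and collection name
    HttpSolrClient client = new HttpSolrClient.Builder(
        "http://localhost:8983/solr/mycollection").build();

    SolrQuery grouped = new SolrQuery("ipod");          // placeholder query
    grouped.set("group", true);
    grouped.set("group.field", "category_s");           // placeholder field

    SolrQuery ungrouped = new SolrQuery("ipod");

    // Fire both requests in parallel and wait for both to return
    CompletableFuture<QueryResponse> f1 =
        CompletableFuture.supplyAsync(() -> run(client, grouped));
    CompletableFuture<QueryResponse> f2 =
        CompletableFuture.supplyAsync(() -> run(client, ungrouped));

    QueryResponse groupedRsp = f1.get();
    QueryResponse ungroupedRsp = f2.get();

    // Merge however the page layout needs it: grouped results feed the
    // "tile" sections, ungrouped results feed the main result list.
    System.out.println(groupedRsp.getGroupResponse().getValues().size()
        + " group commands, " + ungroupedRsp.getResults().getNumFound()
        + " ungrouped hits");
    client.close();
  }

  // SolrJ's query() throws checked exceptions, so wrap it for supplyAsync()
  private static QueryResponse run(HttpSolrClient c, SolrQuery q) {
    try {
      return c.query(q);
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }
}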


On Mon, Apr 2, 2018 at 2:57 AM, ilayaraja  wrote:

> Hi Roopa & Deigo,
>
>  I am facing same issue with grouping. Currently, am on Solr 7.2.1 but
> still
> see that grouping with LTR is not working. Did you apply it as patch or the
> latest solr version has the fix already?
>
> Ilay
>
>
>
> -
> --Ilay
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Solr 6. 3 Can not talk to ZK Updates are disabled

2018-04-02 Thread murugesh karmegam
Thanks Erick for the reply. We even had a 92G heap size at one point. We were
able to run and survive with 64G for the last several months, although with some
issues, mainly this one: "Can not talk to ZK Updates are disabled". We have a
dedicated ZK quorum. When we reduced to 32G we ran into some other issues. So,
given all of that, I'm wondering: are there any options like G1 GC tuning?

We are running on 256 GB boxes. The OS cache is quite huge too.

/usr/java/latest/bin/java -server -Xms32g -Xmx64g
-DsharedLib=/opt/mw/solrlib -XX:+UseG1GC -XX:MaxGCPauseMillis=5000
-XX:ParallelGCThreads=30 -XX:ConcGCThreads=10 -Djute.maxbuffer=41943040
-XX:G1HeapWastePercent=20 -XX:InitiatingHeapOccupancyPercent=45
-XX:+UnlockExperimentalVMOptions -XX:NewSize=10g -XX:MaxNewSize=32g
-XX:SurvivorRatio=2 -XX:-ResizePLAB   



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr 6. 3 Can not talk to ZK Updates are disabled

2018-04-02 Thread Shawn Heisey
On 4/2/2018 2:43 PM, murugesh karmegam wrote:
> So given all of that wondering is there any options
> like G1 GC tuning ? 

Targeted reply.

I've put some G1 information out there for Solr.

https://wiki.apache.org/solr/ShawnHeisey

Thanks,
Shawn
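For anyone looking for a starting point: Solr's bin/solr start script reads a GC_TUNE
variable from solr.in.sh, so G1 flags can be tried without editing the script itself.
A sketch only -- these flags are illustrative, not taken from the page above, and
should be tuned for your own heap:

GC_TUNE="-XX:+UseG1GC -XX:+ParallelRefProcEnabled \
  -XX:G1HeapRegionSize=8m -XX:MaxGCPauseMillis=250"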



Re: Solr 7.1.0 - concurrent.ExecutionException building model

2018-04-02 Thread Joel Bernstein
It looks like it's accessing a replica that's down. Are the logs from
http://vesta:9100/solr/MODEL1024_1522696624083_shard20_replica_n75 reporting
any issues? When you go to that URL, is it back up and running?

Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Apr 2, 2018 at 3:55 PM, Joe Obernberger <
joseph.obernber...@gmail.com> wrote:

> Hi All - when building machine learning models using information gain, I
> sometimes get this error when the number of iterations is high.  I'm using
> about 20k news articles in my training set (about 10k positive, and 10k
> negative), and (for this particular run) am using 500 terms and 25,000
> iterations.  I have gotten the error with a much lower number of iterations
> (1,000) as well.
>
> The specific stream command was:
> update(models, batchSize="50",train(MODEL1024_1522696624083,features(
> MODEL1024_1522696624083,q="*:*",featureSet="FSet_MODEL1024_1
> 522696624083",field="Text",outcome="out_i",positiveLabel=1,
> numTerms=500),q="*:*",name="MODEL1024",field="Text",outcome=
> "out_i",maxIterations="25000"))
>
> The training data was split across 20 shards - specifically created with:
> http://icarus.querymasters.com:9100/solr/admin/collections?
> action=CREATE&name=MODEL1024_1522696624083&numShards=20&rep
> licationFactor=2&maxShardsPerNode=5&collection.configName=TRAINING
>
> Any ideas?  The complete error is:
>
> java.io.IOException: java.util.concurrent.ExecutionException:
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
> Error from server at http://vesta:9100/solr/MODEL10
> 24_1522696624083_shard20_replica_n75: Expected mime type
> application/octet-stream but got text/html. 
> 
> 
> Error 404 Not Found
> 
> HTTP ERROR 404
> Problem accessing /solr/MODEL1024_1522696624083_shard20_replica_n75/select.
> Reason:
> Not Found
> 
> 
>
> at org.apache.solr.client.solrj.io.stream.TextLogitStream.read(
> TextLogitStream.java:498)
> at org.apache.solr.client.solrj.io.stream.PushBackStream.read(P
> ushBackStream.java:87)
> at org.apache.solr.client.solrj.io.stream.UpdateStream.read(Upd
> ateStream.java:109)
> at org.apache.solr.client.solrj.io.stream.ExceptionStream.read(
> ExceptionStream.java:68)
> at org.apache.solr.handler.StreamHandler$TimerStream.read(
> StreamHandler.java:627)
> at org.apache.solr.client.solrj.io.stream.TupleStream.lambda$wr
> iteMap$0(TupleStream.java:87)
> at org.apache.solr.response.JSONWriter.writeIterator(JSONRespon
> seWriter.java:523)
> at org.apache.solr.response.TextResponseWriter.writeVal(TextRes
> ponseWriter.java:180)
> at org.apache.solr.response.JSONWriter$2.put(JSONResponseWriter
> .java:559)
> at org.apache.solr.client.solrj.io.stream.TupleStream.writeMap(
> TupleStream.java:84)
> at org.apache.solr.response.JSONWriter.writeMap(JSONResponseWri
> ter.java:547)
> at org.apache.solr.response.TextResponseWriter.writeVal(TextRes
> ponseWriter.java:198)
> at org.apache.solr.response.JSONWriter.writeNamedListAsMapWithD
> ups(JSONResponseWriter.java:209)
> at org.apache.solr.response.JSONWriter.writeNamedList(JSONRespo
> nseWriter.java:325)
> at org.apache.solr.response.JSONWriter.writeResponse(JSONRespon
> seWriter.java:120)
> at org.apache.solr.response.JSONResponseWriter.write(JSONRespon
> seWriter.java:71)
> at org.apache.solr.response.QueryResponseWriterUtil.writeQueryR
> esponse(QueryResponseWriterUtil.java:65)
> at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrC
> all.java:806)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:535)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDisp
> atchFilter.java:382)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDisp
> atchFilter.java:326)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilte
> r(ServletHandler.java:1751)
> at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHan
> dler.java:582)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(Scoped
> Handler.java:143)
> at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHa
> ndler.java:548)
> at org.eclipse.jetty.server.session.SessionHandler.doHandle(
> SessionHandler.java:226)
> at org.eclipse.jetty.server.handler.ContextHandler.doHandle(
> ContextHandler.java:1180)
> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHand
> ler.java:512)
> at org.eclipse.jetty.server.session.SessionHandler.doScope(
> SessionHandler.java:185)
> at org.eclipse.jetty.server.handler.ContextHandler.doScope(
> ContextHandler.java:1112)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(Scoped
> Handler.java:141)
> at org.eclipse.jetty.server.handler.ContextHandlerCollection.ha
> ndle(ContextHandlerCollection.java:213)
> at org.eclipse.jetty.server.handler.HandlerCollection.handle(
> HandlerCollection.java:119)
> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(Handl
> erWrapper

Re: Solr 7.1.0 - concurrent.ExecutionException building model

2018-04-02 Thread Joe Obernberger
Hi Joel - thank you for your reply.  Yes, the machine (Vesta) is up, and 
I can access it.  I don't see anything specific in the log, apart from 
the same error, but this time to a different server.  We have constant 
indexing happening on this cluster, so if one went down, the indexing 
would stop, and I've not seen that happen.


Interestingly, despite the error, the model is still built at least up 
to some number of iterations.  In other words, many iterations complete OK.


-Joe


On 4/2/2018 6:54 PM, Joel Bernstein wrote:

It looks like it accessing a replica that's down. Are the logs from
http://vesta:9100/solr/MODEL1024_1522696624083_shard20_replica_n75 reporting
any issues? When you go to that url is it back up and running?

Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Apr 2, 2018 at 3:55 PM, Joe Obernberger <
joseph.obernber...@gmail.com> wrote:


Hi All - when building machine learning models using information gain, I
sometimes get this error when the number of iterations is high.  I'm using
about 20k news articles in my training set (about 10k positive, and 10k
negative), and (for this particular run) am using 500 terms and 25,000
iterations.  I have gotten the error with a much lower number of iterations
(1,000) as well.

The specific stream command was:
update(models, batchSize="50",train(MODEL1024_1522696624083,features(
MODEL1024_1522696624083,q="*:*",featureSet="FSet_MODEL1024_1
522696624083",field="Text",outcome="out_i",positiveLabel=1,
numTerms=500),q="*:*",name="MODEL1024",field="Text",outcome=
"out_i",maxIterations="25000"))

The training data was split across 20 shards - specifically created with:
http://icarus.querymasters.com:9100/solr/admin/collections?
action=CREATE&name=MODEL1024_1522696624083&numShards=20&rep
licationFactor=2&maxShardsPerNode=5&collection.configName=TRAINING

Any ideas?  The complete error is:

java.io.IOException: java.util.concurrent.ExecutionException:
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
Error from server at http://vesta:9100/solr/MODEL10
24_1522696624083_shard20_replica_n75: Expected mime type
application/octet-stream but got text/html. 


Error 404 Not Found

HTTP ERROR 404
Problem accessing /solr/MODEL1024_1522696624083_shard20_replica_n75/select.
Reason:
Not Found



 at org.apache.solr.client.solrj.io.stream.TextLogitStream.read(
TextLogitStream.java:498)
 at org.apache.solr.client.solrj.io.stream.PushBackStream.read(P
ushBackStream.java:87)
 at org.apache.solr.client.solrj.io.stream.UpdateStream.read(Upd
ateStream.java:109)
 at org.apache.solr.client.solrj.io.stream.ExceptionStream.read(
ExceptionStream.java:68)
 at org.apache.solr.handler.StreamHandler$TimerStream.read(
StreamHandler.java:627)
 at org.apache.solr.client.solrj.io.stream.TupleStream.lambda$wr
iteMap$0(TupleStream.java:87)
 at org.apache.solr.response.JSONWriter.writeIterator(JSONRespon
seWriter.java:523)
 at org.apache.solr.response.TextResponseWriter.writeVal(TextRes
ponseWriter.java:180)
 at org.apache.solr.response.JSONWriter$2.put(JSONResponseWriter
.java:559)
 at org.apache.solr.client.solrj.io.stream.TupleStream.writeMap(
TupleStream.java:84)
 at org.apache.solr.response.JSONWriter.writeMap(JSONResponseWri
ter.java:547)
 at org.apache.solr.response.TextResponseWriter.writeVal(TextRes
ponseWriter.java:198)
 at org.apache.solr.response.JSONWriter.writeNamedListAsMapWithD
ups(JSONResponseWriter.java:209)
 at org.apache.solr.response.JSONWriter.writeNamedList(JSONRespo
nseWriter.java:325)
 at org.apache.solr.response.JSONWriter.writeResponse(JSONRespon
seWriter.java:120)
 at org.apache.solr.response.JSONResponseWriter.write(JSONRespon
seWriter.java:71)
 at org.apache.solr.response.QueryResponseWriterUtil.writeQueryR
esponse(QueryResponseWriterUtil.java:65)
 at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrC
all.java:806)
 at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:535)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDisp
atchFilter.java:382)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDisp
atchFilter.java:326)
 at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilte
r(ServletHandler.java:1751)
 at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHan
dler.java:582)
 at org.eclipse.jetty.server.handler.ScopedHandler.handle(Scoped
Handler.java:143)
 at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHa
ndler.java:548)
 at org.eclipse.jetty.server.session.SessionHandler.doHandle(
SessionHandler.java:226)
 at org.eclipse.jetty.server.handler.ContextHandler.doHandle(
ContextHandler.java:1180)
 at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHand
ler.java:512)
 at org.eclipse.jetty.server.session.SessionHandler.doScope(
SessionHandler.java:185)
 at org.eclipse.jetty.server.handler.ContextHandler.doScope(
ContextHandler.java:1112)
  

Re: Solr 7.1.0 - concurrent.ExecutionException building model

2018-04-02 Thread Shawn Heisey
On 4/2/2018 1:55 PM, Joe Obernberger wrote:
> The training data was split across 20 shards - specifically created with:
> http://icarus.querymasters.com:9100/solr/admin/collections?action=CREATE&name=MODEL1024_1522696624083&numShards=20&replicationFactor=2&maxShardsPerNode=5&collection.configName=TRAINING
>
> Any ideas?  The complete error is:

> HTTP ERROR 404
> Problem accessing
> /solr/MODEL1024_1522696624083_shard20_replica_n75/select. Reason:
>     Not Found
> 

I'll warn you in advance that I know nothing at all about the learning
to rank functionality.  I'm replying about the underlying error you're
getting, independent of what your query is trying to accomplish.

It's a 404 error, trying to access the URL mentioned above.

The error doesn't indicate exactly WHAT wasn't found.  It could either
be the core named "MODEL1024_1522696624083_shard20_replica_n75" or the
"/select" handler on that core.  That's something you need to figure
out.  It could be that the core *does* exist, but for some reason, Solr
on that machine was unable to start it.

The solr.log file on the Solr instance that returned the error (which
seems to be on the machine named vesta, answering to port 9100) may have
more detail for the error, or some additional error messages.

Normally SolrCloud is good at making sure that requests aren't sent to
resources that aren't working.  So I'm not sure why this happened.

Are there other errors or warnings in the solr.log file, either on the
instance where you sent your request, or the instance that returned the
404 error?

Thanks,
Shawn



Classifier for query intent?

2018-04-02 Thread Walter Underwood
We are experimenting with a text classifier for determining query intent. 
Anybody have a favorite (or anti-favorite) Java implementation? Speed and ease
of implementation are important.

Right now, we’re mostly looking at Weka and the Stanford Classifier.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



SolrJ SolrInputDocument#addField doesn't throw an exception

2018-04-02 Thread Leonardo Perez Pulido
Hi all,
It would be nice if org.apache.solr.common.SolrInputDocument#addField threw
an exception when the field name is 'id' and the method detects that the indexed
id is not unique, just like the post.jar tool.
I was confident that both had the same behavior, so...
Thanks.
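For context, a small SolrJ sketch of the behavior in question (the core URL and
fields are placeholders, and it assumes id is the schema's uniqueKey): addField
itself never checks uniqueness, and a document re-using an existing id simply
overwrites the earlier one at index time instead of raising an error.

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class DuplicateIdExample {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client = new HttpSolrClient.Builder(
        "http://localhost:8983/solr/mycore").build();   // placeholder core

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc-1");
    doc.addField("title_s", "first version");
    client.add(doc);                     // no exception

    SolrInputDocument dup = new SolrInputDocument();
    dup.addField("id", "doc-1");         // same id: addField does not complain
    dup.addField("title_s", "second version");
    client.add(dup);

    client.commit();                     // index now holds only "second version"
    client.close();
  }
}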


Re: Classifier for query intent?

2018-04-02 Thread Dikshant Shahi
Hello Wunder,

If you are particular about Java, Stanford and Weka are both good choices.
OpenNLP also has a document classifier.

You can even explore beyond Java (Python, for instance) and consume the intent
classifier as a REST service.

Regards,
Dikshant

On Tue 3 Apr, 2018, 4:48 AM Walter Underwood,  wrote:

> We are experimenting with a text classifier for determining query intent.
> Anybody have a favorite (or anti-favorite) Java implementation? Speed and
> ease of implementation is important.
>
> Right now, we’re mostly looking at Weka and the Stanford Classifier.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>


Re: Need help to get started on Solr, searching get nothing. Thank you very much in advance

2018-04-02 Thread Raymond Xie
Thanks Rick and Adhyan

I see there is "/browse" in solrconfig.xml:

  <requestHandler name="/browse" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
    </lst>
  </requestHandler>

and <lst name="defaults"> with one item of "df" as shown below:

  <lst name="defaults">
    <str name="df">_text_</str>
  </lst>

My understanding is that I can put whatever fields I want to enable indexing and
searching on here, alongside _text_ -- am I correct?
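For what it's worth, one common way to make every field searchable through the
default field, along the lines Rick suggests below, is a copyField rule into _text_
-- a sketch via the Schema API, assuming the _text_ field exists in the films schema:

curl http://localhost:8983/solr/films/schema -X POST -H
'Content-type:application/json' --data-binary '{
"add-copy-field" : {
"source":"*",
"dest":"_text_"
}
}'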

Thanks.




**
*Sincerely yours,*


*Raymond*

On Mon, Apr 2, 2018 at 3:24 PM, Adhyan Arizki  wrote:

> Raymond,
>
> You can specify the default behavior in solrconfig.xml under each handler.
> For instance for /browse you can specify it should look into name, and for
> /query you can default it to different field.
>
> On Mon, Apr 2, 2018 at 9:04 PM, Rick Leir  wrote:
>
> > Raymond
> > There is a default field normally called df. You would normally use
> > copyField to copy all searchable fields into the default field.
> > Cheers -- Rick
> >
> > On April 1, 2018 11:34:07 PM EDT, Raymond Xie 
> > wrote:
> > >Hi Rick,
> > >
> > >I sorted it out half:
> > >
> > >I should have specified the field in the search query, so, instead of
> > >http://localhost:8983/solr/films/browse?q=batman, I should use:
> > >http://localhost:8983/solr/films/browse?q=name:batman
> > >
> > >Sorry for this newbie mistake.
> > >
> > >But what about if I/user doesn't know or doesn't want to specify the
> > >search
> > >scope to be restricted in field "name" but anywhere in the index'ed
> > >documents?
> > >
> > >
> > >**
> > >*Sincerely yours,*
> > >
> > >
> > >*Raymond*
> > >
> > >On Sun, Apr 1, 2018 at 2:10 PM, Rick Leir  wrote:
> > >
> > >> Raymond
> > >> The output is not visible to me because the mailing list strips
> > >images.
> > >> Please try a different way to show the output.
> > >> Cheers -- Rick
> > >>
> > >> On March 29, 2018 10:17:13 PM EDT, Raymond Xie 
> > >> wrote:
> > >> > I am new to Solr, following Steve Rowe's example on
> > >>
> > >>https://github.com/apache/lucene-solr/tree/master/solr/example/films:
> > >> >
> > >> >It would be greatly appreciated if anyone can enlighten me where to
> > >> >start
> > >> >troubleshooting, thank you very much in advance.
> > >> >
> > >> >The steps I followed are:
> > >> >
> > >> >Here ya go << END_OF_SCRIPT
> > >> >
> > >> >bin/solr stop
> > >> >rm server/logs/*.log
> > >> >rm -Rf server/solr/films/
> > >> >bin/solr start
> > >> >bin/solr create -c films
> > >> >curl http://localhost:8983/solr/films/schema -X POST -H
> > >> >'Content-type:application/json' --data-binary '{
> > >> >"add-field" : {
> > >> >"name":"name",
> > >> >"type":"text_general",
> > >> >"multiValued":false,
> > >> >"stored":true
> > >> >},
> > >> >"add-field" : {
> > >> >"name":"initial_release_date",
> > >> >"type":"pdate",
> > >> >"stored":true
> > >> >}
> > >> >}'
> > >> >bin/post -c films example/films/films.json
> > >> >curl http://localhost:8983/solr/films/config/params -H
> > >> >'Content-type:application/json'  -d '{
> > >> >"update" : {
> > >> >  "facets": {
> > >> >"facet.field":"genre"
> > >> >}
> > >> >  }
> > >> >}'
> > >> >
> > >> ># END_OF_SCRIPT
> > >> >
> > >> >Additional fun -
> > >> >
> > >> >Add highlighting:
> > >> >curl http://localhost:8983/solr/films/config/params -H
> > >> >'Content-type:application/json'  -d '{
> > >> >"set" : {
> > >> >  "browse": {
> > >> >"hl":"on",
> > >> >"hl.fl":"name"
> > >> >}
> > >> >  }
> > >> >}'
> > >> >try http://localhost:8983/solr/films/browse?q=batman now, and you'll
> > >> >see "batman" highlighted in the results
> > >> >
> > >> >
> > >> >
> > >> >I got nothing in my search:
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >**
> > >> >*Sincerely yours,*
> > >> >
> > >> >
> > >> >*Raymond*
> > >>
> > >> --
> > >> Sorry for being brief. Alternate email is rickleir at yahoo dot com
> >
> > --
> > Sorry for being brief. Alternate email is rickleir at yahoo dot com
> >
>
>
>
> --
>
> Best regards,
> Adhyan Arizki
>