Re: How do I create a schema file for FIX data in Solr
Raymond, May i suggest you to take a look at the examples given in Solr package? Essentially you need to understand which field is to be searchable by the application and what not. These FIX data can be represented i JSON or XML. To parse and upload the data to Solr, you can use different libraries out there. Personally I have used SolrJ and Rsolr and they are essentially the same. On Mon, 2 Apr 2018, 12:35 Raymond Xie, wrote: > Thank you, Shawn, Rick and other readers, > > To Shawn: > > For *8=FIX.4.4 9=653 35=RIO* as an example, in the FIX standard: 8 > means BeginString, in this example, its value is FIX.4.4.9, and 9 means > body length, it is 653 for this message, 35 is RIO, meaning the message > type is RIO, 122 stands for OrigSendingTime and has a format of > UTCTimestamp > > You can refer to this page for details: https://www.onixs.biz > /fix-dictionary/4.2/fields_by_tag.html > > All the values are explained as string type. > > All the tag numbers are from FIX standard so it doesn't change (in my case) > > I expect a python program might be needed to parse the message and extract > each tag's value, index is to be made on those extracted value as long as > their field (tag) name. > > With index in place, ideally and naturally user will search for any > keyword, however, in this case, most queries would be based on tag 37 > (Order ID) and 75 (Trade Date), there is another customized tag (not in the > standard) Order Version to be queried on. > > I understand the parser creation would be a manual process, as long as I > know or have a small sample program, I will do it myself and maybe adjust > it as per need. > > To Rick: > > You mentioned creating JSON document, my understanding is a parser would be > needed to generate that JSON document, do you have any existing example > code? > > > > > Thank you guys very much. > > > > > > > > > > ** > *Sincerely yours,* > > > *Raymond* > > On Sun, Apr 1, 2018 at 2:16 PM, Shawn Heisey wrote: > > > On 4/1/2018 10:12 AM, Raymond Xie wrote: > > > >> FIX is a format standard of financial data. It contains lots of tags in > >> number with value for the tag, like 8=asdf, where 8 is the tag and asdf > is > >> the tag's value. Each tag has its definition. > >> > >> The sample msg in FIX format was in the original question. > >> > >> All I need to do is to know how to paste the msg and get all tag's > value. > >> > >> I found so far a parser is what I need to start with., But I am more > >> concerning about how to create index in Solr on the extracted tag's > value, > >> that is the first step, the next would be to customize the dashboard for > >> users to search with a value to find out which msg contains that value > in > >> which tag and present users the whole msg as proof. > >> > > > > Most of Solr's functionality is provided by Lucene. Lucene is a java API > > that implements search functionality. Solr bolts on some functionality > on > > top of Lucene, but doesn't really do anything to fundamentally change the > > fact that you're dealing with a Lucene index. So I'm going to mostly > talk > > about Lucene below. > > > > Lucene organizes data in a unit that we call a "document." An easy > analogy > > for this is that it is a lot like a row in a single database table. It > has > > fields, each field has a type. Unless custom software is used, there is > > really no support for data other than basic primitive types -- numbers > and > > strings. 
The only complex type that I can think of that Solr supports > out > > of the box is geospatial coordinates, and it might even support > > multi-dimensional coordinates, but I'm not sure. It's not all that > complex > > -- the field just stores and manipulates multiple numbers instead of one. > > The Lucene API does support a FEW things that Solr doesn't implement. I > > don't think those are applicable to what you're trying to do. > > > > Let's look at the first part of the data that you included in the first > > message: > > > > 8=FIX.4.4 9=653 35=RIO > > > > Is "8" always a mixture of letters and numbers and periods? Is "9" always > > a number, and is it always a WHOLE number? Is "35" always letters? > > Looking deeper to data that I didn't quote ... is "122" always a > date/time > > value? Are the tag numbers always picked from a well-defined set, or do > > they change? > > > > Assuming that the answers in the previous paragraph are found and a > > configuration is created to deal with all of it ... how are you planning > to > > search it? What kind of queries would you expect somebody to make? > That's > > going to have a huge influence on how you configure things. > > > > Writing the schema is usually where people spend the most time when > > they're setting up Solr. > > > > Thanks, > > Shawn > > > > >
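As a starting point for the Python parser discussed above, a minimal sketch could look like the following. It is an illustration only: the "fixmessages" collection name, the localhost URL and the *_s dynamic-field names are assumptions, every value is kept as a string (the FIX dictionary describes them all as strings), and real FIX messages separate tag=value pairs with the SOH (\x01) character rather than the spaces shown in the sample.

import json
import requests

# Field names for the tags mentioned in this thread; extend from the FIX dictionary as needed.
TAG_NAMES = {
    "8": "begin_string_s",
    "9": "body_length_s",
    "35": "msg_type_s",
    "37": "order_id_s",
    "75": "trade_date_s",
    "122": "orig_sending_time_s",
}

def parse_fix(raw: str) -> dict:
    """Turn one raw FIX message into a flat Solr document (dict)."""
    # FIX uses the SOH control character between tag=value pairs; the sample
    # in this thread uses spaces, so accept either.
    pairs = [p for p in raw.replace("\x01", " ").split() if "=" in p]
    doc = {"raw_message_s": raw}  # keep the whole message as proof for the UI
    for pair in pairs:
        tag, _, value = pair.partition("=")
        doc[TAG_NAMES.get(tag, f"tag_{tag}_s")] = value  # unknown tags -> dynamic string field
    return doc

def index_messages(messages, solr_url="http://localhost:8983/solr/fixmessages/update?commit=true"):
    docs = [parse_fix(m) for m in messages]
    resp = requests.post(solr_url, data=json.dumps(docs),
                         headers={"Content-Type": "application/json"})
    resp.raise_for_status()

if __name__ == "__main__":
    index_messages(["8=FIX.4.4\x019=653\x0135=RIO\x0137=ORD-1\x0175=20180402"])

Dedicated, properly typed schema fields can later replace the dynamic *_s fields for the tags that are queried most, such as 37 (Order ID) and 75 (Trade Date).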
Re: MatchMode in Dismax parser
Hi, In case you want to round up, you can use negative numbers and percentage of failed matches so 75% of matches rounded up can be written as -25%. HTH, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > On 29 Mar 2018, at 17:00, Shawn Heisey wrote: > > On 3/29/2018 1:42 AM, iamluckysharma.0...@gmail.com wrote: >> Just a suggestion , Shouldn't we need to use Math.round instead of direct >> int when watch mode is in %, >> example i have 3 boolean clauses if i go for mm=50%, currently it reduce it >> to ~1, instead it can be ~2, >> >> another example could be when we have 5 boolean clauses and mm=75%, we get >> calc as 3.75 currently it took 3, so instead of 3 it should have taken 4. as >> Math.round() > > Maybe that is what it SHOULD do, but in most languages, converting a float > value to an integer truncates the decimal portion, it doesn't round. To do > that requires a deliberate choice in the code, and that probably doesn't > exist in dismax/edismax. If your assertion is that this should have been > done from day one, I'd say you're right. But that decision is now ancient > history. The person who wrote the code might have had a very good reason to > NOT do it that way. > > At this point, if the functionality were changed, it would result in an > upgraded Solr version behaving VERY differently than the previous version. > While new functionality is often added in any new minor release, changing > existing behavior that users rely on without a configuration option is > usually only done in a major version. So for 7.x, dismax/edismax would need > an option to enable rounding on minimum-should-match calculations. Sounds > like a great feature request to put into Jira, and patches are always welcome. > > Thanks, > Shawn >
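To make the truncation behaviour described above concrete, here is a small sketch of the arithmetic. It mimics the documented dismax/edismax handling rather than quoting Solr's actual code, and the function name is invented for illustration.

def effective_min_match(optional_clauses: int, mm: str) -> int:
    """Approximate how dismax/edismax turns a percentage mm value into a clause count.
    Fractions are truncated, so a negative percentage ("this many percent may be
    missing") effectively rounds the required count up instead of down."""
    pct = int(mm.rstrip("%"))
    truncated = abs(optional_clauses * pct) // 100  # integer division, i.e. truncation
    return optional_clauses - truncated if pct < 0 else truncated

print(effective_min_match(3, "50%"))   # 1 -- 1.5 truncated down
print(effective_min_match(3, "-50%"))  # 2 -- i.e. 1.5 "rounded up"
print(effective_min_match(5, "75%"))   # 3 -- 3.75 truncated down
print(effective_min_match(5, "-25%"))  # 4 -- i.e. 3.75 "rounded up"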
custom filter class on schema.xml on solrcloud
I have used a custom filter provided by a jar in schema.xml in standalone Solr like below. And for this, I have loaded the jar in solrconfig.xml like below. It's working fine, but when I've tried to use it in SolrCloud with external ZooKeeper mode, I've got an 'IO exception' error, maybe from uploading a large jar file to ZooKeeper. I've also tried to put this jar in the lib folder of the Solr home, but got a 'Plugin init failure' error. After that, I've tried the blob store API, but the documentation says "Blob store can only be used to dynamically load components configured in solrconfig.xml. Components specified in schema.xml cannot be loaded from blob store". So, how can I use a custom filter class in schema.xml in SolrCloud mode with an external ZooKeeper configuration? -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: how to reset the index in solr
If you want to delete all items from the Solr index, send a delete-by-query request with the query *:* and then commit (see the sketch below). - Development Center Toronto -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
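For example, a minimal sketch of that delete-by-query in Python; the "films" collection name and localhost URL are placeholders, and commit=true makes the deletion visible immediately.

import requests

# Delete every document in the collection, then commit so searches see the change.
resp = requests.post(
    "http://localhost:8983/solr/films/update",
    params={"commit": "true"},
    json={"delete": {"query": "*:*"}},
)
resp.raise_for_status()
print(resp.json())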
Re: Need help to get started on Solr, searching get nothing. Thank you very much in advance
Raymond There is a default field normally called df. You would normally use Copyfield to copy all searchable fields into the default field. Cheers -- Rick On April 1, 2018 11:34:07 PM EDT, Raymond Xie wrote: >Hi Rick, > >I sorted it out half: > >I should have specified the field in the search query, so, instead of >http://localhost:8983/solr/films/browse?q=batman, I should use: >http://localhost:8983/solr/films/browse?q=name:batman > >Sorry for this newbie mistake. > >But what about if I/user doesn't know or doesn't want to specify the >search >scope to be restricted in field "name" but anywhere in the index'ed >documents? > > >** >*Sincerely yours,* > > >*Raymond* > >On Sun, Apr 1, 2018 at 2:10 PM, Rick Leir wrote: > >> Raymond >> The output is not visible to me because the mailing list strips >images. >> Please try a different way to show the output. >> Cheers -- Rick >> >> On March 29, 2018 10:17:13 PM EDT, Raymond Xie >> wrote: >> > I am new to Solr, following Steve Rowe's example on >> >>https://github.com/apache/lucene-solr/tree/master/solr/example/films: >> > >> >It would be greatly appreciated if anyone can enlighten me where to >> >start >> >troubleshooting, thank you very much in advance. >> > >> >The steps I followed are: >> > >> >Here ya go << END_OF_SCRIPT >> > >> >bin/solr stop >> >rm server/logs/*.log >> >rm -Rf server/solr/films/ >> >bin/solr start >> >bin/solr create -c films >> >curl http://localhost:8983/solr/films/schema -X POST -H >> >'Content-type:application/json' --data-binary '{ >> >"add-field" : { >> >"name":"name", >> >"type":"text_general", >> >"multiValued":false, >> >"stored":true >> >}, >> >"add-field" : { >> >"name":"initial_release_date", >> >"type":"pdate", >> >"stored":true >> >} >> >}' >> >bin/post -c films example/films/films.json >> >curl http://localhost:8983/solr/films/config/params -H >> >'Content-type:application/json' -d '{ >> >"update" : { >> > "facets": { >> >"facet.field":"genre" >> >} >> > } >> >}' >> > >> ># END_OF_SCRIPT >> > >> >Additional fun - >> > >> >Add highlighting: >> >curl http://localhost:8983/solr/films/config/params -H >> >'Content-type:application/json' -d '{ >> >"set" : { >> > "browse": { >> >"hl":"on", >> >"hl.fl":"name" >> >} >> > } >> >}' >> >try http://localhost:8983/solr/films/browse?q=batman now, and you'll >> >see "batman" highlighted in the results >> > >> > >> > >> >I got nothing in my search: >> > >> > >> > >> > >> >** >> >*Sincerely yours,* >> > >> > >> >*Raymond* >> >> -- >> Sorry for being brief. Alternate email is rickleir at yahoo dot com -- Sorry for being brief. Alternate email is rickleir at yahoo dot com
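A sketch of Rick's copyField suggestion against the films collection used in this thread. Assumptions: the default configset's _text_ catch-all field exists, a wildcard copyField into it is acceptable for the index size, and documents indexed before the copyField is added are re-indexed so they show up in unqualified searches.

import requests

base = "http://localhost:8983/solr/films"

# Copy every field into the _text_ catch-all so unqualified queries can hit everything.
requests.post(f"{base}/schema", json={
    "add-copy-field": {"source": "*", "dest": "_text_"}
}).raise_for_status()

# After re-indexing, an unqualified query works by pointing df at the catch-all field.
r = requests.get(f"{base}/select", params={"q": "batman", "df": "_text_"})
print(r.json()["response"]["numFound"])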
Re: How do I create a schema file for FIX data in Solr
Ray Have you looked around for an existing FIX to Solr conduit? If FIX is a common standard then I would expect that someone has done some work on this and github'd it. Even just FIX to JSON. Cheers -- Rick On April 2, 2018 12:34:44 AM EDT, Raymond Xie wrote: >Thank you, Shawn, Rick and other readers, > >To Shawn: > >For *8=FIX.4.4 9=653 35=RIO* as an example, in the FIX standard: 8 >means BeginString, in this example, its value is FIX.4.4.9, and 9 >means >body length, it is 653 for this message, 35 is RIO, meaning the message >type is RIO, 122 stands for OrigSendingTime and has a format of >UTCTimestamp > >You can refer to this page for details: https://www.onixs.biz >/fix-dictionary/4.2/fields_by_tag.html > >All the values are explained as string type. > >All the tag numbers are from FIX standard so it doesn't change (in my >case) > >I expect a python program might be needed to parse the message and >extract >each tag's value, index is to be made on those extracted value as long >as >their field (tag) name. > >With index in place, ideally and naturally user will search for any >keyword, however, in this case, most queries would be based on tag 37 >(Order ID) and 75 (Trade Date), there is another customized tag (not in >the >standard) Order Version to be queried on. > >I understand the parser creation would be a manual process, as long as >I >know or have a small sample program, I will do it myself and maybe >adjust >it as per need. > >To Rick: > >You mentioned creating JSON document, my understanding is a parser >would be >needed to generate that JSON document, do you have any existing example >code? > > > > >Thank you guys very much. > > > > > > > > > >** >*Sincerely yours,* > > >*Raymond* > >On Sun, Apr 1, 2018 at 2:16 PM, Shawn Heisey >wrote: > >> On 4/1/2018 10:12 AM, Raymond Xie wrote: >> >>> FIX is a format standard of financial data. It contains lots of tags >in >>> number with value for the tag, like 8=asdf, where 8 is the tag and >asdf is >>> the tag's value. Each tag has its definition. >>> >>> The sample msg in FIX format was in the original question. >>> >>> All I need to do is to know how to paste the msg and get all tag's >value. >>> >>> I found so far a parser is what I need to start with., But I am more >>> concerning about how to create index in Solr on the extracted tag's >value, >>> that is the first step, the next would be to customize the dashboard >for >>> users to search with a value to find out which msg contains that >value in >>> which tag and present users the whole msg as proof. >>> >> >> Most of Solr's functionality is provided by Lucene. Lucene is a java >API >> that implements search functionality. Solr bolts on some >functionality on >> top of Lucene, but doesn't really do anything to fundamentally change >the >> fact that you're dealing with a Lucene index. So I'm going to mostly >talk >> about Lucene below. >> >> Lucene organizes data in a unit that we call a "document." An easy >analogy >> for this is that it is a lot like a row in a single database table. >It has >> fields, each field has a type. Unless custom software is used, there >is >> really no support for data other than basic primitive types -- >numbers and >> strings. The only complex type that I can think of that Solr >supports out >> of the box is geospatial coordinates, and it might even support >> multi-dimensional coordinates, but I'm not sure. It's not all that >complex >> -- the field just stores and manipulates multiple numbers instead of >one. 
>> The Lucene API does support a FEW things that Solr doesn't implement. > I >> don't think those are applicable to what you're trying to do. >> >> Let's look at the first part of the data that you included in the >first >> message: >> >> 8=FIX.4.4 9=653 35=RIO >> >> Is "8" always a mixture of letters and numbers and periods? Is "9" >always >> a number, and is it always a WHOLE number? Is "35" always letters? >> Looking deeper to data that I didn't quote ... is "122" always a >date/time >> value? Are the tag numbers always picked from a well-defined set, or >do >> they change? >> >> Assuming that the answers in the previous paragraph are found and a >> configuration is created to deal with all of it ... how are you >planning to >> search it? What kind of queries would you expect somebody to make? >That's >> going to have a huge influence on how you configure things. >> >> Writing the schema is usually where people spend the most time when >> they're setting up Solr. >> >> Thanks, >> Shawn >> >> -- Sorry for being brief. Alternate email is rickleir at yahoo dot com
Re: How do I create a schema file for FIX data in Solr
Google fix to json, there are a few interesting leads. On April 2, 2018 12:34:44 AM EDT, Raymond Xie wrote: >Thank you, Shawn, Rick and other readers, > >To Shawn: > >For *8=FIX.4.4 9=653 35=RIO* as an example, in the FIX standard: 8 >means BeginString, in this example, its value is FIX.4.4.9, and 9 >means >body length, it is 653 for this message, 35 is RIO, meaning the message >type is RIO, 122 stands for OrigSendingTime and has a format of >UTCTimestamp > >You can refer to this page for details: https://www.onixs.biz >/fix-dictionary/4.2/fields_by_tag.html > >All the values are explained as string type. > >All the tag numbers are from FIX standard so it doesn't change (in my >case) > >I expect a python program might be needed to parse the message and >extract >each tag's value, index is to be made on those extracted value as long >as >their field (tag) name. > >With index in place, ideally and naturally user will search for any >keyword, however, in this case, most queries would be based on tag 37 >(Order ID) and 75 (Trade Date), there is another customized tag (not in >the >standard) Order Version to be queried on. > >I understand the parser creation would be a manual process, as long as >I >know or have a small sample program, I will do it myself and maybe >adjust >it as per need. > >To Rick: > >You mentioned creating JSON document, my understanding is a parser >would be >needed to generate that JSON document, do you have any existing example >code? > > > > >Thank you guys very much. > > > > > > > > > >** >*Sincerely yours,* > > >*Raymond* > >On Sun, Apr 1, 2018 at 2:16 PM, Shawn Heisey >wrote: > >> On 4/1/2018 10:12 AM, Raymond Xie wrote: >> >>> FIX is a format standard of financial data. It contains lots of tags >in >>> number with value for the tag, like 8=asdf, where 8 is the tag and >asdf is >>> the tag's value. Each tag has its definition. >>> >>> The sample msg in FIX format was in the original question. >>> >>> All I need to do is to know how to paste the msg and get all tag's >value. >>> >>> I found so far a parser is what I need to start with., But I am more >>> concerning about how to create index in Solr on the extracted tag's >value, >>> that is the first step, the next would be to customize the dashboard >for >>> users to search with a value to find out which msg contains that >value in >>> which tag and present users the whole msg as proof. >>> >> >> Most of Solr's functionality is provided by Lucene. Lucene is a java >API >> that implements search functionality. Solr bolts on some >functionality on >> top of Lucene, but doesn't really do anything to fundamentally change >the >> fact that you're dealing with a Lucene index. So I'm going to mostly >talk >> about Lucene below. >> >> Lucene organizes data in a unit that we call a "document." An easy >analogy >> for this is that it is a lot like a row in a single database table. >It has >> fields, each field has a type. Unless custom software is used, there >is >> really no support for data other than basic primitive types -- >numbers and >> strings. The only complex type that I can think of that Solr >supports out >> of the box is geospatial coordinates, and it might even support >> multi-dimensional coordinates, but I'm not sure. It's not all that >complex >> -- the field just stores and manipulates multiple numbers instead of >one. >> The Lucene API does support a FEW things that Solr doesn't implement. > I >> don't think those are applicable to what you're trying to do. 
>> >> Let's look at the first part of the data that you included in the >first >> message: >> >> 8=FIX.4.4 9=653 35=RIO >> >> Is "8" always a mixture of letters and numbers and periods? Is "9" >always >> a number, and is it always a WHOLE number? Is "35" always letters? >> Looking deeper to data that I didn't quote ... is "122" always a >date/time >> value? Are the tag numbers always picked from a well-defined set, or >do >> they change? >> >> Assuming that the answers in the previous paragraph are found and a >> configuration is created to deal with all of it ... how are you >planning to >> search it? What kind of queries would you expect somebody to make? >That's >> going to have a huge influence on how you configure things. >> >> Writing the schema is usually where people spend the most time when >> they're setting up Solr. >> >> Thanks, >> Shawn >> >> -- Sorry for being brief. Alternate email is rickleir at yahoo dot com
Re: How do I create a schema file for FIX data in Solr
Thank you Rick for the enlightening. I will get the FIX message parsed first and come back here later. ** *Sincerely yours,* *Raymond* On Mon, Apr 2, 2018 at 9:15 AM, Rick Leir wrote: > Google >fix to json, > there are a few interesting leads. > > On April 2, 2018 12:34:44 AM EDT, Raymond Xie > wrote: > >Thank you, Shawn, Rick and other readers, > > > >To Shawn: > > > >For *8=FIX.4.4 9=653 35=RIO* as an example, in the FIX standard: 8 > >means BeginString, in this example, its value is FIX.4.4.9, and 9 > >means > >body length, it is 653 for this message, 35 is RIO, meaning the message > >type is RIO, 122 stands for OrigSendingTime and has a format of > >UTCTimestamp > > > >You can refer to this page for details: https://www.onixs.biz > >/fix-dictionary/4.2/fields_by_tag.html > > > >All the values are explained as string type. > > > >All the tag numbers are from FIX standard so it doesn't change (in my > >case) > > > >I expect a python program might be needed to parse the message and > >extract > >each tag's value, index is to be made on those extracted value as long > >as > >their field (tag) name. > > > >With index in place, ideally and naturally user will search for any > >keyword, however, in this case, most queries would be based on tag 37 > >(Order ID) and 75 (Trade Date), there is another customized tag (not in > >the > >standard) Order Version to be queried on. > > > >I understand the parser creation would be a manual process, as long as > >I > >know or have a small sample program, I will do it myself and maybe > >adjust > >it as per need. > > > >To Rick: > > > >You mentioned creating JSON document, my understanding is a parser > >would be > >needed to generate that JSON document, do you have any existing example > >code? > > > > > > > > > >Thank you guys very much. > > > > > > > > > > > > > > > > > > > >** > >*Sincerely yours,* > > > > > >*Raymond* > > > >On Sun, Apr 1, 2018 at 2:16 PM, Shawn Heisey > >wrote: > > > >> On 4/1/2018 10:12 AM, Raymond Xie wrote: > >> > >>> FIX is a format standard of financial data. It contains lots of tags > >in > >>> number with value for the tag, like 8=asdf, where 8 is the tag and > >asdf is > >>> the tag's value. Each tag has its definition. > >>> > >>> The sample msg in FIX format was in the original question. > >>> > >>> All I need to do is to know how to paste the msg and get all tag's > >value. > >>> > >>> I found so far a parser is what I need to start with., But I am more > >>> concerning about how to create index in Solr on the extracted tag's > >value, > >>> that is the first step, the next would be to customize the dashboard > >for > >>> users to search with a value to find out which msg contains that > >value in > >>> which tag and present users the whole msg as proof. > >>> > >> > >> Most of Solr's functionality is provided by Lucene. Lucene is a java > >API > >> that implements search functionality. Solr bolts on some > >functionality on > >> top of Lucene, but doesn't really do anything to fundamentally change > >the > >> fact that you're dealing with a Lucene index. So I'm going to mostly > >talk > >> about Lucene below. > >> > >> Lucene organizes data in a unit that we call a "document." An easy > >analogy > >> for this is that it is a lot like a row in a single database table. > >It has > >> fields, each field has a type. Unless custom software is used, there > >is > >> really no support for data other than basic primitive types -- > >numbers and > >> strings. 
The only complex type that I can think of that Solr > >supports out > >> of the box is geospatial coordinates, and it might even support > >> multi-dimensional coordinates, but I'm not sure. It's not all that > >complex > >> -- the field just stores and manipulates multiple numbers instead of > >one. > >> The Lucene API does support a FEW things that Solr doesn't implement. > > I > >> don't think those are applicable to what you're trying to do. > >> > >> Let's look at the first part of the data that you included in the > >first > >> message: > >> > >> 8=FIX.4.4 9=653 35=RIO > >> > >> Is "8" always a mixture of letters and numbers and periods? Is "9" > >always > >> a number, and is it always a WHOLE number? Is "35" always letters? > >> Looking deeper to data that I didn't quote ... is "122" always a > >date/time > >> value? Are the tag numbers always picked from a well-defined set, or > >do > >> they change? > >> > >> Assuming that the answers in the previous paragraph are found and a > >> configuration is created to deal with all of it ... how are you > >planning to > >> search it? What kind of queries would you expect somebody to make? > >That's > >> going to have a huge influence on how you configure things. > >> > >> Writing the schema is usually where people spend the most time when > >> they're setting up Solr. > >> > >> Thanks, > >> Shawn > >> > >> > > -- > Sorry for
Re: querying vs. highlighting: complete freedom?
Hi Arturas, Both Erick and I had a go at improving the documentation here. I hope it's clearer. https://builds.apache.org/job/Solr-reference-guide-master/javadoc/highlighting.html The docs for hl.fl, hl.q, hl.qparser were all updated. The meat of the change was a new note in hl.fl including an example. It's kinda hard to document the problem you found but I hope the note will be somewhat illustrative. ~ David On Mon, Mar 26, 2018 at 3:12 AM Arturas Mazeika wrote: > Hi Erick, > > Adding a field-qualify to the hl.q parameter solved the issue. My > excitement is steaming over the roof! What a thorough answer: the > explanation about the behavior of solr, how it tries to interpret what I > mean when I supply a keyword without the field-qualifier. Very impressive. > Would you care (re)posting this answer to stackoverflow? If that is too > much of a hassle, I'll do this in a couple of days myself on your behalf. > > I am impressed how well, thorough, fast and fully the question was > answered. > > Steven hint pushed me into this direction further: he suggested to use the > query part of solr to filter and sort out the relevant answers in the 1st > step and in the 2nd step he'd highlight all the keywords using CTR+F (in > the browser or some alternative viewer). This brought be to the next > question: > > How can one match query terms with the analyze-chained documents in an > efficient and distributed manner? My current understanding how to achieve > this is the following: > > 1. Get the list of ids (contents) of the documents that match the query > 2. Use the http://localhost:8983/solr/#/trans/analysis to re-analyze the > document and the query > 3. Use the matching of the substrings from the original text to last > filter/tokenizer/analyzer in the analyze-chain to map the terms of the > query > 4. Emulate CTRL+F highlighting > > Web Interface of Solr offers quite a bit to advance towards this goal. If > one fires this request: > > * analysis.fieldvalue=Albert Einstein (14 March 1879 – 18 April 1955) was a > German-born theoretical physicist[5] who developed the theory of > relativity, one of the two pillars of modern physics (alongside quantum > mechanics).& > * analysis.query=reletivity theory > > to one of the cores of solr, one gets the steps 1-3 done: > > > http://localhost:8983/solr/trans_shard1_replica_n1/analysis/field?wt=xml&analysis.showmatch=true&analysis.fieldvalue=Albert%20Einstein%20(14%20March%201879%20%E2%80%93%2018%20April%201955)%20was%20a%20German-born%20theoretical%20physicist[5]%20who%20developed%20the%20theory%20of%20relativity,%20one%20of%20the%20two%20pillars%20of%20modern%20physics%20(alongside%20quantum%20mechanics).&analysis.query=reletivity%20theory&analysis.fieldtype=text_en > > Questions: > > 1. Is there a way to "load-balance" this? In the above url, I need to > specify a specific core. Is it possible to generalize it, so the core that > receives the request is not necessarily the one that processes it? Or this > already is distributed in a sense that receiving core and processing cores > are never the same? > > 2. The document was already analyze-chained. Is is possible to store this > information so one does not need to re-analyze-chain it once more? > > Cheers > Arturas > > On Fri, Mar 23, 2018 at 9:15 PM, Erick Erickson > wrote: > > > Arturas: > > > > Try to field-qualify your hl.q parameter. 
That looks like: > > > > hl.q=trans:Kundigung > > or > > hl.q=trans:Kündigung > > > > I saw the exact behavior you describe when I did _not_ specify the > > field in the hl.q parameter, i.e. > > > > hl.q=Kundigung > > or > > hl.q=Kündigung > > > > didn't show all highlights. > > > > But when I did specify the field, it worked. > > > > Here's what I think is happening: Solr uses the default search > > field when parsing an un-field-qualified query. I.e. > > > > q=something > > > > is parsed as > > > > q=default_search_field:something. > > > > The default field is controlled in solrconfig.xml with the "df" > > parameter, you'll see entries like: > > my_field > > > > Also when I changed the "df" parameter to the field I was highlighting > > on, I didn't need to specify the field on the hl.q parameter. > > > > hl.q=Kundigung > > or > > hl.q=Kündigung > > > > The default field is usually "text", which knows nothing about > > the German-specific filters you've applied unless you changed it. > > > > So in the absence of a field-qualification for the hl.q parameter Solr > > was parsing the query according to the analysis chain specifed > > in your default field, and probably passed ü through without > > transforming it. Since your indexing analysis chain for that field > > folded ü to just plain u, it wasn't found or highlighted. > > > > On the surface, this does seem like something that should be > > changed, I'll go ahead and ping the dev list. > > > > NOTE: I was trying this on Solr 7.1 > > > > Best, > > Erick > > > > On Fri, Mar 23, 2018 at 12:03 PM, Arturas Mazeika > > wrote: > >
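As a concrete illustration of Erick's advice in the quoted reply, a field-qualified highlighting request could look like the sketch below; the "trans" collection and field names follow this thread and are assumptions about the actual setup.

import requests

params = {
    "q": "trans:Kündigung",
    "hl": "on",
    "hl.fl": "trans",
    # Field-qualifying hl.q makes Solr analyze the highlight query with the
    # German analysis chain of "trans" instead of the default field's chain.
    "hl.q": "trans:Kündigung",
}
r = requests.get("http://localhost:8983/solr/trans/select", params=params)
print(r.json().get("highlighting", {}))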
Solr 6. 3 Can not talk to ZK Updates are disabled
We noticed this issue in our solr clusters right after when Solr cluster is restarted or Solr cluster is live for some time. Based on my research so far... I am not seeing zookeeper connection issues from zk server side. It seems it is solr side ( zk client) side. This issue is pretty constant now and then. Error 1 Solr: WARN - 2018-02-06 17:35:04.742; org.apache.solr.common.cloud.ConnectionManager; Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper... ERROR - 2018-02-06 17:35:04.743; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled. at org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:1508) at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:696) at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:97) Error 2: >From ingestor log: /var/log/mwired/core-ingestors/app.log.9:2018-03-30 05:44:52,616 [-38] ERROR org.apache.solr.client.solrj.impl.CloudSolrClient - Request to collection failed due to (503) org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at : Cannot talk to ZooKeeper - Updates are disabled., retry? 0 /var/log/mwired/core-ingestors/app.log.9:com.mwired.grid.commons.exception.PersistenceException: Failed to add 11 docs to solr0 collection , cachedDocs=118; because Error from server at : Cannot talk to ZooKeeper - Updates are disabled. Wondering is there any fix? Appreciate any input. http://lucene.472066.n3.nabble.com/Cannot-talk-to-ZooKeeper-Updates-are-disabled-Solr-6-3-0-td4311582.html http://lucene.472066.n3.nabble.com/6-6-Cannot-talk-to-ZooKeeper-Updates-are-disabled-td4352917.html https://issues.apache.org/jira/browse/SOLR-3274 Thanks in advance. Murux -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: Solr 6. 3 Can not talk to ZK Updates are disabled
Hi murugesh, This error happen normally when you are in long GC pauses. Try to rise the heap memory. The only way to recover from this is restarting the affected node. Regard. -- Yago Riveiro On 2 Apr 2018 15:39 +0100, murugesh karmegam , wrote: > We noticed this issue in our solr clusters right after when Solr cluster is > restarted or Solr cluster is live for some time. Based on my research so > far... I am not seeing zookeeper connection issues from zk server side. It > seems it is solr side ( zk client) side. This issue is pretty constant now > and then. > > Error 1 Solr: > > WARN - 2018-02-06 17:35:04.742; > org.apache.solr.common.cloud.ConnectionManager; Our previous ZooKeeper > session was expired. Attempting to reconnect to recover relationship with > ZooKeeper... > ERROR - 2018-02-06 17:35:04.743; org.apache.solr.common.SolrException; > org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are > disabled. > at > org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:1508) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:696) > at > org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:97) > > > Error 2: > > From ingestor log: > /var/log/mwired/core-ingestors/app.log.9:2018-03-30 05:44:52,616 [-38] ERROR > org.apache.solr.client.solrj.impl.CloudSolrClient - Request to collection > failed due to (503) > org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error > from server at : Cannot talk to ZooKeeper - Updates are disabled., retry? 0 > /var/log/mwired/core-ingestors/app.log.9:com.mwired.grid.commons.exception.PersistenceException: > Failed to add 11 docs to solr0 collection , cachedDocs=118; because Error > from server at : Cannot talk to ZooKeeper - Updates are disabled. > > > Wondering is there any fix? Appreciate any input. > > http://lucene.472066.n3.nabble.com/Cannot-talk-to-ZooKeeper-Updates-are-disabled-Solr-6-3-0-td4311582.html > http://lucene.472066.n3.nabble.com/6-6-Cannot-talk-to-ZooKeeper-Updates-are-disabled-td4352917.html > https://issues.apache.org/jira/browse/SOLR-3274 > > Thanks in advance. > Murux > > > > -- > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: custom filter class on schema.xml on solrcloud
ZK as used by Solr defaults to a max of 1M file sizes specifically so you _know_ when you are pushing large files around. You can change that with setting jute.maxbuffer, see the ZooKeeper admin guide. But if you put the jar file in the right place, it should have been found. I did note that you put it in a different place than you specified in sorlconfig.xml, but if it's in the classpath it should be found. Try starting Solr with the -v option, that'll show you where everything is loaded from (or looked for). That may provide a clue. Best, Erick On Mon, Apr 2, 2018 at 2:25 AM, void wrote: > I have used a custom filter provided by a jar in schema.xml in standalone > Solr like below > > stopWordDictionary="resources/yStopWords"/> > > And for this, > > I have loaded the jar in solrconfig.xml like below > > > > It's working fine But when I've tried to use it in solrcloud with external > zookeeper mode I've got an error 'IO exception' maybe for uploading a large > jar file in zookeeper. > > I've also tried to put this jar in the lib folder of solr home but got error > 'Plugin init failure' > > After that, I've tried blob store api but the documentation says "Blob store > can only be used to dynamically load components configured in > solrconfig.xml. Components specified in schema.xml cannot be loaded from > blob store" > > So, how can I use custom filter class in schema.xml in solrcloud mode with > external zookeeper configuration > > > > > > > -- > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Solr 7.2 solr.log is missing
Not located in the /server/logs/ folder. Have these files instead:

solr-8983-console.log
solr_gc.log.0.current

I can see logs from the Solr dashboard. Where is the solr.log file going to? A search of "solr.log" in the system did not find the file. Is the file called something else for solrcloud mode? log4j.properties shows this:

# Default Solr log4j config
# rootLogger log level may be programmatically overridden by -Dsolr.log.level
solr.log=${solr.log.dir}
log4j.rootLogger=INFO, file, CONSOLE

# Console appender will be programmatically disabled when Solr is started with option -Dsolr.log.muteconsole
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.layout=org.apache.log4j.EnhancedPatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss.SSS} %-5p (%t) [%X{collection} %X{shard} %X{replica} %X{core}] %c{1.} %m%n

#- size rotation with log cleanup.
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.MaxFileSize=4MB
log4j.appender.file.MaxBackupIndex=9

#- File to log to and log format
log4j.appender.file.File=${solr.log}/solr.log
log4j.appender.file.layout=org.apache.log4j.EnhancedPatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss.SSS} %-5p (%t) [%X{collection} %X{shard} %X{replica} %X{core}] %c{1.} %m%n

# Adjust logging levels that should differ from root logger
log4j.logger.org.apache.zookeeper=WARN
log4j.logger.org.apache.hadoop=WARN
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.eclipse.jetty.server.Server=INFO
log4j.logger.org.eclipse.jetty.server.ServerConnector=INFO

# set to INFO to enable infostream log messages
log4j.logger.org.apache.solr.update.LoggingInfoStream=OFF

Thanks,
Abhi

--
Abhi Basu
Collection out of disk space, commit problem
Over the weekend one of our Dev solrclouds ran out of disk space. Examining the problem we found one collection that had 2 months of uncommitted tlog files. Unfortunately the solr logs rolled over and so I cannot see the commit behavior during the last time data was loaded to it. The solrconfig.xml has both autoCommit and autoSoftCommit enabled. ${solr.autoCommit.maxTime:6} false solr.autoCommit.maxTime is set to 6 ${solr.autoSoftCommit.maxTime:5000} solr.autoSoftCommit.maxTime is set to 3000 I found tlog files dated to Feb. 27. There is an automated job that reloads the data once a week. It looks like no commits occurred from Feb 27 onward. Once the disk got full solr got very unhappy. This solrcloud has 2 shards and one replica per shard. We have a second development solrcloud which has the same collections with identical configurations except that these collections have 2 shards and 2 replicas per shard. That one doesn't seem to have the tlog files accumulating. I have long suspected that autoCommit is not reliable, and this seems to indicate that it is not. We have several collections that share the same configuration, and have similar ETL jobs loading them. This is the second time that this particular collection has had this problem.
Re: Collection out of disk space, commit problem
Webster: Do you by any chance have CDCR configured? If so, insure that buffering is disabled. Buffering was intended to be enabled _temporarily_ during, say, a maintenance window and was conceived before the bootstrapping capability was added to CDCR. But I don't recall your other e-mails mention CDCR so I mention this on the off chance... Best, Erick On Mon, Apr 2, 2018 at 10:35 AM, Webster Homer wrote: > Over the weekend one of our Dev solrcloud ran out of disk space. Examining > the problem we found one collection that had 2 months of uncommitted tlog > files. Unfortuneatly the solr logs rolled over and so I cannot see the > commit behavior during the last time data was loaded to it. > > The solrconfig.xml has both autoCommit and autoSoftCommit enabled. > >${solr.autoCommit.maxTime:6} >false > > > solr.autoCommit.maxTime is set to 6 > > >${solr.autoSoftCommit.maxTime:5000} > > solr.autoSoftCommit.maxTime is set to 3000 > > I found tlog files dated to Feb. 27. There is an automated job that reloads > the data once a week. It looks like no commits occurred from Feb 27 onward. > Once the disk got full solr got very unhappy. > > This solrcloud has 2 shards and one replica per shard. > > We have a second development solrcloud which has the same collections with > identical configurations except that these collections have 2 shards and 2 > replicas per shard. That one doesn't seem to have the tlog files > accumulating. > > I have long suspected that autoCommit is not reliable, and this seems to > indicate that it is not. > > We have several collections that share the same configuration, and have > similar ETL jobs loading them. This is the second time that this particular > collection has had this problem. > > -- > > > This message and any attachment are confidential and may be privileged or > otherwise protected from disclosure. If you are not the intended recipient, > you must not copy this message or attachment or disclose the contents to > any other person. If you have received this transmission in error, please > notify the sender immediately and delete the message and any attachment > from your system. Merck KGaA, Darmstadt, Germany and any of its > subsidiaries do not accept liability for any omissions or errors in this > message which may arise as a result of E-Mail-transmission or for damages > resulting from any unauthorized changes of the content of this message and > any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its > subsidiaries do not guarantee that this message is free of viruses and does > not accept liability for any damages caused by any virus transmitted > therewith. > > Click http://www.emdgroup.com/disclaimer to access the German, French, > Spanish and Portuguese versions of this disclaimer.
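For reference, CDCR buffering can be toggled through the CDCR API; a sketch of the call Erick refers to, with the collection name as a placeholder:

import requests

collection = "my_collection"  # placeholder
base = f"http://localhost:8983/solr/{collection}/cdcr"

# Disable the CDCR update-log buffer so transaction logs stop accumulating,
# then check the reported process/buffer state.
print(requests.get(base, params={"action": "DISABLEBUFFER"}).json())
print(requests.get(base, params={"action": "STATUS"}).json())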
Re: Solr 7.2 solr.log is missing
Technically, Solr doesn't name the file at all, that's in your log4j config, this line: log4j.appender.file.File=${solr.log}/solr.log so it's weird that you can't find it on your machine at all. How do you _start_ Solr? In particular, to you define a system variable "-Dsolr.log=some_path"? And also note that there are three log4j configs, and it's easy to be using one you don't think you are using, see SOLR-12008. Best, Erick On Mon, Apr 2, 2018 at 10:02 AM, Abhi Basu <9000r...@gmail.com> wrote: > Not located in the /server/logs/ folder. > > Have these files instead > > solr-8983-console.log > solr_gc.log.0.current > > I can see logs from the Solr dashboard. Where is the solr.log file going > to? A search of "solr.log" in the system did not find the file. > > Is the file called something else for solrcloud mode? > > log4j.properties shows this: > > # Default Solr log4j config > # rootLogger log level may be programmatically overridden by > -Dsolr.log.level > solr.log=${solr.log.dir} > log4j.rootLogger=INFO, file, CONSOLE > > # Console appender will be programmatically disabled when Solr is started > with option -Dsolr.log.muteconsole > log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender > log4j.appender.CONSOLE.layout=org.apache.log4j.EnhancedPatternLayout > log4j.appender.CONSOLE.layout.ConversionPattern=%d{-MM-dd HH:mm:ss.SSS} > %-5p (%t) [%X{collection} %X{shard} %X{replica} %X{core}] %c{1.} %m%n > > #- size rotation with log cleanup. > log4j.appender.file=org.apache.log4j.RollingFileAppender > log4j.appender.file.MaxFileSize=4MB > log4j.appender.file.MaxBackupIndex=9 > > #- File to log to and log format > log4j.appender.file.File=${solr.log}/solr.log > log4j.appender.file.layout=org.apache.log4j.EnhancedPatternLayout > log4j.appender.file.layout.ConversionPattern=%d{-MM-dd HH:mm:ss.SSS} > %-5p (%t) [%X{collection} %X{shard} %X{replica} %X{core}] %c{1.} %m%n > > # Adjust logging levels that should differ from root logger > log4j.logger.org.apache.zookeeper=WARN > log4j.logger.org.apache.hadoop=WARN > log4j.logger.org.eclipse.jetty=WARN > log4j.logger.org.eclipse.jetty.server.Server=INFO > log4j.logger.org.eclipse.jetty.server.ServerConnector=INFO > > # set to INFO to enable infostream log messages > log4j.logger.org.apache.solr.update.LoggingInfoStream=OFF > > > Thanks, > > Abhi > > -- > Abhi Basu
Re: Solr 7.2 solr.log is missing
Wow life is complicated :) Since I am using this to start solr, I am assuming the one in /server/scripts/cloud-scripts is being used: ./bin/solr start -cloud -s /usr/local/bin/solr-7.2.1/server/solr/node1/solr -p 8983 -z zk0-esohad:2181,zk1-esohad:2181,zk5-esohad:2181 -m 10g So, I guess I need to edit that one. Thanks, Abhi On Mon, Apr 2, 2018 at 1:14 PM, Erick Erickson wrote: > Technically, Solr doesn't name the file at all, that's in your log4j > config, this line: > > log4j.appender.file.File=${solr.log}/solr.log > > so it's weird that you can't find it on your machine at all. How do > you _start_ Solr? In particular, to you define a system variable > "-Dsolr.log=some_path"? > > And also note that there are three log4j configs, and it's easy to be > using one you don't think you are using, see SOLR-12008. > > Best, > Erick > > On Mon, Apr 2, 2018 at 10:02 AM, Abhi Basu <9000r...@gmail.com> wrote: > > Not located in the /server/logs/ folder. > > > > Have these files instead > > > > solr-8983-console.log > > solr_gc.log.0.current > > > > I can see logs from the Solr dashboard. Where is the solr.log file going > > to? A search of "solr.log" in the system did not find the file. > > > > Is the file called something else for solrcloud mode? > > > > log4j.properties shows this: > > > > # Default Solr log4j config > > # rootLogger log level may be programmatically overridden by > > -Dsolr.log.level > > solr.log=${solr.log.dir} > > log4j.rootLogger=INFO, file, CONSOLE > > > > # Console appender will be programmatically disabled when Solr is started > > with option -Dsolr.log.muteconsole > > log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender > > log4j.appender.CONSOLE.layout=org.apache.log4j.EnhancedPatternLayout > > log4j.appender.CONSOLE.layout.ConversionPattern=%d{-MM-dd > HH:mm:ss.SSS} > > %-5p (%t) [%X{collection} %X{shard} %X{replica} %X{core}] %c{1.} %m%n > > > > #- size rotation with log cleanup. > > log4j.appender.file=org.apache.log4j.RollingFileAppender > > log4j.appender.file.MaxFileSize=4MB > > log4j.appender.file.MaxBackupIndex=9 > > > > #- File to log to and log format > > log4j.appender.file.File=${solr.log}/solr.log > > log4j.appender.file.layout=org.apache.log4j.EnhancedPatternLayout > > log4j.appender.file.layout.ConversionPattern=%d{-MM-dd HH:mm:ss.SSS} > > %-5p (%t) [%X{collection} %X{shard} %X{replica} %X{core}] %c{1.} %m%n > > > > # Adjust logging levels that should differ from root logger > > log4j.logger.org.apache.zookeeper=WARN > > log4j.logger.org.apache.hadoop=WARN > > log4j.logger.org.eclipse.jetty=WARN > > log4j.logger.org.eclipse.jetty.server.Server=INFO > > log4j.logger.org.eclipse.jetty.server.ServerConnector=INFO > > > > # set to INFO to enable infostream log messages > > log4j.logger.org.apache.solr.update.LoggingInfoStream=OFF > > > > > > Thanks, > > > > Abhi > > > > -- > > Abhi Basu > -- Abhi Basu
Re: Collection out of disk space, commit problem
Erick, Thanks, Normally our dev environment does not use CDCR, except when we're doing active development on it. As it happens the collection in question, was one we used to test cdcr. Or rather the configuration for it was, as the specific collection has been deleted and created many times. Even though we had cdcr turned off it seems that buffers got set to "enabled" Which seems to be the default, and it is a really bad default! Because it's dev and we don't do cdcr there, I might not have thought to look at that. So thank you for that Web On Mon, Apr 2, 2018 at 1:10 PM, Erick Erickson wrote: > Webster: > > Do you by any chance have CDCR configured? If so, insure that > buffering is disabled. Buffering was intended to be enabled > _temporarily_ during, say, a maintenance window and was conceived > before the bootstrapping capability was added to CDCR. > > But I don't recall your other e-mails mention CDCR so I mention this > on the off chance... > > Best, > Erick > > On Mon, Apr 2, 2018 at 10:35 AM, Webster Homer > wrote: > > Over the weekend one of our Dev solrcloud ran out of disk space. > Examining > > the problem we found one collection that had 2 months of uncommitted tlog > > files. Unfortuneatly the solr logs rolled over and so I cannot see the > > commit behavior during the last time data was loaded to it. > > > > The solrconfig.xml has both autoCommit and autoSoftCommit enabled. > > > >${solr.autoCommit.maxTime:6} > >false > > > > > > solr.autoCommit.maxTime is set to 6 > > > > > >${solr.autoSoftCommit.maxTime:5000} > > > > solr.autoSoftCommit.maxTime is set to 3000 > > > > I found tlog files dated to Feb. 27. There is an automated job that > reloads > > the data once a week. It looks like no commits occurred from Feb 27 > onward. > > Once the disk got full solr got very unhappy. > > > > This solrcloud has 2 shards and one replica per shard. > > > > We have a second development solrcloud which has the same collections > with > > identical configurations except that these collections have 2 shards and > 2 > > replicas per shard. That one doesn't seem to have the tlog files > > accumulating. > > > > I have long suspected that autoCommit is not reliable, and this seems to > > indicate that it is not. > > > > We have several collections that share the same configuration, and have > > similar ETL jobs loading them. This is the second time that this > particular > > collection has had this problem. > > > > -- > > > > > > This message and any attachment are confidential and may be privileged or > > otherwise protected from disclosure. If you are not the intended > recipient, > > you must not copy this message or attachment or disclose the contents to > > any other person. If you have received this transmission in error, please > > notify the sender immediately and delete the message and any attachment > > from your system. Merck KGaA, Darmstadt, Germany and any of its > > subsidiaries do not accept liability for any omissions or errors in this > > message which may arise as a result of E-Mail-transmission or for damages > > resulting from any unauthorized changes of the content of this message > and > > any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its > > subsidiaries do not guarantee that this message is free of viruses and > does > > not accept liability for any damages caused by any virus transmitted > > therewith. > > > > Click http://www.emdgroup.com/disclaimer to access the German, French, > > Spanish and Portuguese versions of this disclaimer. 
Re: Need help to get started on Solr, searching get nothing. Thank you very much in advance
Raymond, You can specify the default behavior in solrconfig.xml under each handler. For instance for /browse you can specify it should look into name, and for /query you can default it to different field. On Mon, Apr 2, 2018 at 9:04 PM, Rick Leir wrote: > Raymond > There is a default field normally called df. You would normally use > Copyfield to copy all searchable fields into the default field. > Cheers -- Rick > > On April 1, 2018 11:34:07 PM EDT, Raymond Xie > wrote: > >Hi Rick, > > > >I sorted it out half: > > > >I should have specified the field in the search query, so, instead of > >http://localhost:8983/solr/films/browse?q=batman, I should use: > >http://localhost:8983/solr/films/browse?q=name:batman > > > >Sorry for this newbie mistake. > > > >But what about if I/user doesn't know or doesn't want to specify the > >search > >scope to be restricted in field "name" but anywhere in the index'ed > >documents? > > > > > >** > >*Sincerely yours,* > > > > > >*Raymond* > > > >On Sun, Apr 1, 2018 at 2:10 PM, Rick Leir wrote: > > > >> Raymond > >> The output is not visible to me because the mailing list strips > >images. > >> Please try a different way to show the output. > >> Cheers -- Rick > >> > >> On March 29, 2018 10:17:13 PM EDT, Raymond Xie > >> wrote: > >> > I am new to Solr, following Steve Rowe's example on > >> > >>https://github.com/apache/lucene-solr/tree/master/solr/example/films: > >> > > >> >It would be greatly appreciated if anyone can enlighten me where to > >> >start > >> >troubleshooting, thank you very much in advance. > >> > > >> >The steps I followed are: > >> > > >> >Here ya go << END_OF_SCRIPT > >> > > >> >bin/solr stop > >> >rm server/logs/*.log > >> >rm -Rf server/solr/films/ > >> >bin/solr start > >> >bin/solr create -c films > >> >curl http://localhost:8983/solr/films/schema -X POST -H > >> >'Content-type:application/json' --data-binary '{ > >> >"add-field" : { > >> >"name":"name", > >> >"type":"text_general", > >> >"multiValued":false, > >> >"stored":true > >> >}, > >> >"add-field" : { > >> >"name":"initial_release_date", > >> >"type":"pdate", > >> >"stored":true > >> >} > >> >}' > >> >bin/post -c films example/films/films.json > >> >curl http://localhost:8983/solr/films/config/params -H > >> >'Content-type:application/json' -d '{ > >> >"update" : { > >> > "facets": { > >> >"facet.field":"genre" > >> >} > >> > } > >> >}' > >> > > >> ># END_OF_SCRIPT > >> > > >> >Additional fun - > >> > > >> >Add highlighting: > >> >curl http://localhost:8983/solr/films/config/params -H > >> >'Content-type:application/json' -d '{ > >> >"set" : { > >> > "browse": { > >> >"hl":"on", > >> >"hl.fl":"name" > >> >} > >> > } > >> >}' > >> >try http://localhost:8983/solr/films/browse?q=batman now, and you'll > >> >see "batman" highlighted in the results > >> > > >> > > >> > > >> >I got nothing in my search: > >> > > >> > > >> > > >> > > >> >** > >> >*Sincerely yours,* > >> > > >> > > >> >*Raymond* > >> > >> -- > >> Sorry for being brief. Alternate email is rickleir at yahoo dot com > > -- > Sorry for being brief. Alternate email is rickleir at yahoo dot com > -- Best regards, Adhyan Arizki
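Following Adhyan's suggestion with the films example from this thread, the /browse handler's default field can be set through the same params API used earlier; a sketch, assuming the "browse" paramset exists and "name" is the field to search by default:

import requests

# Add df=name to the "browse" paramset so unqualified queries against /browse
# search the name field by default.
requests.post("http://localhost:8983/solr/films/config/params", json={
    "update": {"browse": {"df": "name"}}
}).raise_for_status()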
Re: PreAnalyzed FieldType, and simultaneously importing JSON
Hello Markus, It appears you are not familiar with PreAnalyzedUpdateProcessor? Using that is much more flexible -- you could have different URP chains for your use-cases. IMO PreAnalyzedField ought to go away. I argued for the URP version and thus it's superiority to the FieldType here: https://issues.apache.org/jira/browse/SOLR-4619?focusedCommentId=13611191&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13611191 Sadly, the FieldType is the one that is documented in the ref guide, but not the URP :-( ~ David On Thu, Mar 29, 2018 at 5:06 PM Markus Jelsma wrote: > Hello, > > We want to move to PreAnalyzed FieldType to offload our very heavy > analysis chain away from the search cluster, so we have to configure our > fields to accept pre-analyzed tokens in production. > > But we use the same schema in development environments too, and that is > where we use JSON files, or stream (export/import) data directly from > production servers into a development environment, again via JSON. And in > case of disaster recovery, we can import the daily exported JSON bzipped > files back into our production servers. > > But this JSON loading does not work with PreAnalyzed FieldType. So to load > JSON we must reset all fields back to their respective language specific > FieldTypes on-the-fly, we could automate, but it is a hassle we like to > avoid. > > Have i overlooked any configuration parameters that can help? Must we > automate the on-the-fly schema reconfiguration and reset to PreAnalyzed > after JSON loading is finished? > > Many thanks! > Markus > -- Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker LinkedIn: http://linkedin.com/in/davidwsmiley | Book: http://www.solrenterprisesearchserver.com
Solr 7.1.0 - concurrent.ExecutionException building model
Hi All - when building machine learning models using information gain, I sometimes get this error when the number of iterations is high. I'm using about 20k news articles in my training set (about 10k positive, and 10k negative), and (for this particular run) am using 500 terms and 25,000 iterations. I have gotten the error with a much lower number of iterations (1,000) as well. The specific stream command was: update(models, batchSize="50",train(MODEL1024_1522696624083,features(MODEL1024_1522696624083,q="*:*",featureSet="FSet_MODEL1024_1522696624083",field="Text",outcome="out_i",positiveLabel=1,numTerms=500),q="*:*",name="MODEL1024",field="Text",outcome="out_i",maxIterations="25000")) The training data was split across 20 shards - specifically created with: http://icarus.querymasters.com:9100/solr/admin/collections?action=CREATE&name=MODEL1024_1522696624083&numShards=20&replicationFactor=2&maxShardsPerNode=5&collection.configName=TRAINING Any ideas? The complete error is: java.io.IOException: java.util.concurrent.ExecutionException: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://vesta:9100/solr/MODEL1024_1522696624083_shard20_replica_n75: Expected mime type application/octet-stream but got text/html. Error 404 Not Found HTTP ERROR 404 Problem accessing /solr/MODEL1024_1522696624083_shard20_replica_n75/select. Reason: Not Found at org.apache.solr.client.solrj.io.stream.TextLogitStream.read(TextLogitStream.java:498) at org.apache.solr.client.solrj.io.stream.PushBackStream.read(PushBackStream.java:87) at org.apache.solr.client.solrj.io.stream.UpdateStream.read(UpdateStream.java:109) at org.apache.solr.client.solrj.io.stream.ExceptionStream.read(ExceptionStream.java:68) at org.apache.solr.handler.StreamHandler$TimerStream.read(StreamHandler.java:627) at org.apache.solr.client.solrj.io.stream.TupleStream.lambda$writeMap$0(TupleStream.java:87) at org.apache.solr.response.JSONWriter.writeIterator(JSONResponseWriter.java:523) at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:180) at org.apache.solr.response.JSONWriter$2.put(JSONResponseWriter.java:559) at org.apache.solr.client.solrj.io.stream.TupleStream.writeMap(TupleStream.java:84) at org.apache.solr.response.JSONWriter.writeMap(JSONResponseWriter.java:547) at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:198) at org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:209) at org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:325) at org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:120) at org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:71) at org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65) at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:806) at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:535) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:326) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1751) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) at org.eclipse.jetty.server.Server.handle(Server.java:534) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) at org.eclipse.jetty.io
Re: Collection out of disk space, commit problem
Homer: Yeah, the buffering bits are trappy, and in fact is being removed in CDCR going forward. Too bad you fell into that trap, there's hope going forward though... Erick On Mon, Apr 2, 2018 at 11:42 AM, Webster Homer wrote: > Erick, > > Thanks, Normally our dev environment does not use CDCR, except when we're > doing active development on it. As it happens the collection in question, > was one we used to test cdcr. Or rather the configuration for it was, as > the specific collection has been deleted and created many times. Even > though we had cdcr turned off it seems that buffers got set to "enabled" > Which seems to be the default, and it is a really bad default! > > Because it's dev and we don't do cdcr there, I might not have thought to > look at that. So thank you for that > > Web > > On Mon, Apr 2, 2018 at 1:10 PM, Erick Erickson > wrote: > >> Webster: >> >> Do you by any chance have CDCR configured? If so, insure that >> buffering is disabled. Buffering was intended to be enabled >> _temporarily_ during, say, a maintenance window and was conceived >> before the bootstrapping capability was added to CDCR. >> >> But I don't recall your other e-mails mention CDCR so I mention this >> on the off chance... >> >> Best, >> Erick >> >> On Mon, Apr 2, 2018 at 10:35 AM, Webster Homer >> wrote: >> > Over the weekend one of our Dev solrcloud ran out of disk space. >> Examining >> > the problem we found one collection that had 2 months of uncommitted tlog >> > files. Unfortuneatly the solr logs rolled over and so I cannot see the >> > commit behavior during the last time data was loaded to it. >> > >> > The solrconfig.xml has both autoCommit and autoSoftCommit enabled. >> > >> >${solr.autoCommit.maxTime:6} >> >false >> > >> > >> > solr.autoCommit.maxTime is set to 6 >> > >> > >> >${solr.autoSoftCommit.maxTime:5000} >> > >> > solr.autoSoftCommit.maxTime is set to 3000 >> > >> > I found tlog files dated to Feb. 27. There is an automated job that >> reloads >> > the data once a week. It looks like no commits occurred from Feb 27 >> onward. >> > Once the disk got full solr got very unhappy. >> > >> > This solrcloud has 2 shards and one replica per shard. >> > >> > We have a second development solrcloud which has the same collections >> with >> > identical configurations except that these collections have 2 shards and >> 2 >> > replicas per shard. That one doesn't seem to have the tlog files >> > accumulating. >> > >> > I have long suspected that autoCommit is not reliable, and this seems to >> > indicate that it is not. >> > >> > We have several collections that share the same configuration, and have >> > similar ETL jobs loading them. This is the second time that this >> particular >> > collection has had this problem. >> > >> > -- >> > >> > >> > This message and any attachment are confidential and may be privileged or >> > otherwise protected from disclosure. If you are not the intended >> recipient, >> > you must not copy this message or attachment or disclose the contents to >> > any other person. If you have received this transmission in error, please >> > notify the sender immediately and delete the message and any attachment >> > from your system. Merck KGaA, Darmstadt, Germany and any of its >> > subsidiaries do not accept liability for any omissions or errors in this >> > message which may arise as a result of E-Mail-transmission or for damages >> > resulting from any unauthorized changes of the content of this message >> and >> > any attachment thereto. 
Merck KGaA, Darmstadt, Germany and any of its >> > subsidiaries do not guarantee that this message is free of viruses and >> does >> > not accept liability for any damages caused by any virus transmitted >> > therewith. >> > >> > Click http://www.emdgroup.com/disclaimer to access the German, French, >> > Spanish and Portuguese versions of this disclaimer. >> > > -- > > > This message and any attachment are confidential and may be privileged or > otherwise protected from disclosure. If you are not the intended recipient, > you must not copy this message or attachment or disclose the contents to > any other person. If you have received this transmission in error, please > notify the sender immediately and delete the message and any attachment > from your system. Merck KGaA, Darmstadt, Germany and any of its > subsidiaries do not accept liability for any omissions or errors in this > message which may arise as a result of E-Mail-transmission or for damages > resulting from any unauthorized changes of the content of this message and > any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its > subsidiaries do not guarantee that this message is free of viruses and does > not accept liability for any damages caused by any virus transmitted > therewith. > > Click http://www.emdgroup.com/disclaimer to access the German, French, > Spanish and Portuguese versions of this disclaimer.
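For anyone who falls into the same trap: buffering can usually be checked and disabled per collection through the CDCR API, along these lines (host and collection name are placeholders; verify the actions against the CDCR documentation for your Solr version):

  curl "http://localhost:8983/solr/mycollection/cdcr?action=STATUS"
  curl "http://localhost:8983/solr/mycollection/cdcr?action=DISABLEBUFFER"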
Re: Solr 6.3 Can not talk to ZK Updates are disabled
Hi Yago Riveiro , Thanks for the reply. We have heap size 64G. Any more is not recommended right? Except one time I was not able to co relate "updates disabled" with GC pause. Also zk timeout is 120 seconds even with long GC pause (more than 10 seconds normally) we should recover right? JVM settings /usr/java/latest/bin/java -server -Xms32g -Xmx64g -DsharedLib=/opt/mw/solrlib -XX:+UseG1GC -XX:MaxGCPauseMillis=5000 -XX:ParallelGCThreads=30 -XX:ConcGCThreads=10 -Djute.maxbuffer=41943040 -XX:G1HeapWastePercent=20 -XX:InitiatingHeapOccupancyPercent=45 -XX:+UnlockExperimentalVMOptions -XX:NewSize=10g -XX:MaxNewSize=32g -XX:SurvivorRatio=2 -XX:-ResizePLAB -XX:+AlwaysPreTouch -XX:+ParallelRefProcEnabled -server -Xloggc:/var/log/solr/gc-solr2018-03-27-19-16.log -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=64m -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 -Xloggc:/var/log/solr/solr_gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.port=18983 -Dcom.sun.management.jmxremote.rmi.port=18983 -Djava.rmi.server.hostname=tr3slr3dn27 -DzkClientTimeout=12 -Dzkhost .../solr -Dsolr.log.dir=/var/log/solr -Djetty.port=8983 -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks -Dhost=tr3slr3dn27 -Duser.timezone=EST -Djetty.home=/opt/solr/server -Dsolr.solr.home=/data0/solr -Dsolr.install.dir=/opt/solr -Dlog4j.configuration=file:/etc/solr/conf/log4j.properties -Xss256k -Dsolr.autoSoftCommit.maxTime=30 -Dsolr.autoCommit.maxTime=60 -Dsolr.clustering.enabled=false -DsharedLib=/opt/mw/solrlib -Dsolr.lock.type=native -XX:MetaspaceSize=1024m -XX:MaxMetaspaceSize=1024m -XX:MinMetaspaceExpansion=16m -XX:MaxMetaspaceExpansion=32m -Xss256k -Dsolr.log.muteconsole -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /var/log/solr -jar start.jar --module=http just to give more idea... we are a 48 node cluster with each node having indexes (many together) up to 900GB to 1TB and one major index is with 48 shards with each shard is 80 - 85 G = approx 4TB -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: Solr 6.3 Can not talk to ZK Updates are disabled
Actually, 64G is on the high side, GC pauses can kill you pretty easily in that range. If it's at all possible to cut that down it would be A Good Thing Best, Erick On Mon, Apr 2, 2018 at 12:56 PM, murugesh karmegam wrote: > Hi Yago Riveiro , > > Thanks for the reply. We have heap size 64G. Any more is not recommended > right? Except one time I was not able to co relate "updates disabled" with > GC pause. Also zk timeout is 120 seconds even with long GC pause (more than > 10 seconds normally) we should recover right? > > JVM settings > > /usr/java/latest/bin/java -server -Xms32g -Xmx64g > -DsharedLib=/opt/mw/solrlib -XX:+UseG1GC -XX:MaxGCPauseMillis=5000 > -XX:ParallelGCThreads=30 -XX:ConcGCThreads=10 -Djute.maxbuffer=41943040 > -XX:G1HeapWastePercent=20 -XX:InitiatingHeapOccupancyPercent=45 > -XX:+UnlockExperimentalVMOptions -XX:NewSize=10g -XX:MaxNewSize=32g > -XX:SurvivorRatio=2 -XX:-ResizePLAB -XX:+AlwaysPreTouch > -XX:+ParallelRefProcEnabled -server > -Xloggc:/var/log/solr/gc-solr2018-03-27-19-16.log -verbose:gc > -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps > -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution > -XX:+PrintGCApplicationStoppedTime -XX:+UseGCLogFileRotation > -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=64m > -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 > -Xloggc:/var/log/solr/solr_gc.log -XX:+UseGCLogFileRotation > -XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M > -Dcom.sun.management.jmxremote > -Dcom.sun.management.jmxremote.local.only=false > -Dcom.sun.management.jmxremote.ssl=false > -Dcom.sun.management.jmxremote.authenticate=false > -Dcom.sun.management.jmxremote.port=18983 > -Dcom.sun.management.jmxremote.rmi.port=18983 > -Djava.rmi.server.hostname=tr3slr3dn27 -DzkClientTimeout=12 -Dzkhost > .../solr -Dsolr.log.dir=/var/log/solr -Djetty.port=8983 -DSTOP.PORT=7983 > -DSTOP.KEY=solrrocks -Dhost=tr3slr3dn27 -Duser.timezone=EST > -Djetty.home=/opt/solr/server -Dsolr.solr.home=/data0/solr > -Dsolr.install.dir=/opt/solr > -Dlog4j.configuration=file:/etc/solr/conf/log4j.properties -Xss256k > -Dsolr.autoSoftCommit.maxTime=30 -Dsolr.autoCommit.maxTime=60 > -Dsolr.clustering.enabled=false -DsharedLib=/opt/mw/solrlib > -Dsolr.lock.type=native -XX:MetaspaceSize=1024m -XX:MaxMetaspaceSize=1024m > -XX:MinMetaspaceExpansion=16m -XX:MaxMetaspaceExpansion=32m -Xss256k > -Dsolr.log.muteconsole -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 > /var/log/solr -jar start.jar --module=http > > > > just to give more idea... > we are a 48 node cluster with each node having indexes (many together) up to > 900GB to 1TB and one major index is with 48 shards with each shard is 80 - > 85 G = approx 4TB > > > > -- > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
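If the heap does get cut down, and Solr is started through the bin/solr scripts, the size is usually set in solr.in.sh rather than on the java command line; staying just under 32 GB also keeps compressed ordinary object pointers enabled. A sketch only, the right value depends on the workload:

  # in solr.in.sh (path varies by install, e.g. /etc/default/solr.in.sh)
  SOLR_HEAP="31g"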
Re: Learning to Rank (LTR) with grouping
Hi Ilay, I am still on Solr 6.6.0 and did not apply the grouping fix as a patch. As a temporary workaround, I send two asynchronous requests from the web application, the first with grouping and the second without grouping, and merge the results. This worked for my case because we were getting the grouped results for specific tiles on the page. Roopa On Mon, Apr 2, 2018 at 2:57 AM, ilayaraja wrote: > Hi Roopa & Deigo, > > I am facing same issue with grouping. Currently, am on Solr 7.2.1 but > still > see that grouping with LTR is not working. Did you apply it as patch or the > latest solr version has the fix already? > > Ilay > > > > - > --Ilay > -- > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html >
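A rough sketch of what such a two-request workaround can look like, expressed as plain Solr queries (collection, field, and model names are illustrative; the LTR re-rank is applied only to the ungrouped request):

  # request 1: grouped results, no LTR re-ranking
  curl -G "http://localhost:8983/solr/products/select" \
    --data-urlencode "q=running shoes" \
    --data-urlencode "group=true" \
    --data-urlencode "group.field=brand"

  # request 2: ungrouped results, re-ranked with the LTR model
  curl -G "http://localhost:8983/solr/products/select" \
    --data-urlencode "q=running shoes" \
    --data-urlencode "rq={!ltr model=myModel reRankDocs=100 efi.user_query='running shoes'}"

The application then merges the two result sets.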
Re: Solr 6.3 Can not talk to ZK Updates are disabled
Thanks, Erick, for the reply. We even ran with a 92G heap for a while. We have been able to run and survive with 64G for the last several months, although with some issues, mainly this "Can not talk to ZK Updates are disabled" issue. We have a dedicated ZK quorum. When we reduced the heap to 32G we ran into some other issues. So, given all of that, are there any options such as G1 GC tuning? We are running on 256 GB boxes. The OS cache is quite huge too. /usr/java/latest/bin/java -server -Xms32g -Xmx64g -DsharedLib=/opt/mw/solrlib -XX:+UseG1GC -XX:MaxGCPauseMillis=5000 -XX:ParallelGCThreads=30 -XX:ConcGCThreads=10 -Djute.maxbuffer=41943040 -XX:G1HeapWastePercent=20 -XX:InitiatingHeapOccupancyPercent=45 -XX:+UnlockExperimentalVMOptions -XX:NewSize=10g -XX:MaxNewSize=32g -XX:SurvivorRatio=2 -XX:-ResizePLAB -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: Solr 6.3 Can not talk to ZK Updates are disabled
On 4/2/2018 2:43 PM, murugesh karmegam wrote: > So given all of that wondering is there any options > like G1 GC tuning ? Targeted reply. I've put some G1 information out there for Solr. https://wiki.apache.org/solr/ShawnHeisey Thanks, Shawn
Re: Solr 7.1.0 - concurrent.ExecutionException building model
It looks like it accessing a replica that's down. Are the logs from http://vesta:9100/solr/MODEL1024_1522696624083_shard20_replica_n75 reporting any issues? When you go to that url is it back up and running? Joel Bernstein http://joelsolr.blogspot.com/ On Mon, Apr 2, 2018 at 3:55 PM, Joe Obernberger < joseph.obernber...@gmail.com> wrote: > Hi All - when building machine learning models using information gain, I > sometimes get this error when the number of iterations is high. I'm using > about 20k news articles in my training set (about 10k positive, and 10k > negative), and (for this particular run) am using 500 terms and 25,000 > iterations. I have gotten the error with a much lower number of iterations > (1,000) as well. > > The specific stream command was: > update(models, batchSize="50",train(MODEL1024_1522696624083,features( > MODEL1024_1522696624083,q="*:*",featureSet="FSet_MODEL1024_1 > 522696624083",field="Text",outcome="out_i",positiveLabel=1, > numTerms=500),q="*:*",name="MODEL1024",field="Text",outcome= > "out_i",maxIterations="25000")) > > The training data was split across 20 shards - specifically created with: > http://icarus.querymasters.com:9100/solr/admin/collections? > action=CREATE&name=MODEL1024_1522696624083&numShards=20&rep > licationFactor=2&maxShardsPerNode=5&collection.configName=TRAINING > > Any ideas? The complete error is: > > java.io.IOException: java.util.concurrent.ExecutionException: > org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: > Error from server at http://vesta:9100/solr/MODEL10 > 24_1522696624083_shard20_replica_n75: Expected mime type > application/octet-stream but got text/html. > > > Error 404 Not Found > > HTTP ERROR 404 > Problem accessing /solr/MODEL1024_1522696624083_shard20_replica_n75/select. 
> Reason: > Not Found > > > > at org.apache.solr.client.solrj.io.stream.TextLogitStream.read( > TextLogitStream.java:498) > at org.apache.solr.client.solrj.io.stream.PushBackStream.read(P > ushBackStream.java:87) > at org.apache.solr.client.solrj.io.stream.UpdateStream.read(Upd > ateStream.java:109) > at org.apache.solr.client.solrj.io.stream.ExceptionStream.read( > ExceptionStream.java:68) > at org.apache.solr.handler.StreamHandler$TimerStream.read( > StreamHandler.java:627) > at org.apache.solr.client.solrj.io.stream.TupleStream.lambda$wr > iteMap$0(TupleStream.java:87) > at org.apache.solr.response.JSONWriter.writeIterator(JSONRespon > seWriter.java:523) > at org.apache.solr.response.TextResponseWriter.writeVal(TextRes > ponseWriter.java:180) > at org.apache.solr.response.JSONWriter$2.put(JSONResponseWriter > .java:559) > at org.apache.solr.client.solrj.io.stream.TupleStream.writeMap( > TupleStream.java:84) > at org.apache.solr.response.JSONWriter.writeMap(JSONResponseWri > ter.java:547) > at org.apache.solr.response.TextResponseWriter.writeVal(TextRes > ponseWriter.java:198) > at org.apache.solr.response.JSONWriter.writeNamedListAsMapWithD > ups(JSONResponseWriter.java:209) > at org.apache.solr.response.JSONWriter.writeNamedList(JSONRespo > nseWriter.java:325) > at org.apache.solr.response.JSONWriter.writeResponse(JSONRespon > seWriter.java:120) > at org.apache.solr.response.JSONResponseWriter.write(JSONRespon > seWriter.java:71) > at org.apache.solr.response.QueryResponseWriterUtil.writeQueryR > esponse(QueryResponseWriterUtil.java:65) > at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrC > all.java:806) > at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:535) > at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDisp > atchFilter.java:382) > at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDisp > atchFilter.java:326) > at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilte > r(ServletHandler.java:1751) > at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHan > dler.java:582) > at org.eclipse.jetty.server.handler.ScopedHandler.handle(Scoped > Handler.java:143) > at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHa > ndler.java:548) > at org.eclipse.jetty.server.session.SessionHandler.doHandle( > SessionHandler.java:226) > at org.eclipse.jetty.server.handler.ContextHandler.doHandle( > ContextHandler.java:1180) > at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHand > ler.java:512) > at org.eclipse.jetty.server.session.SessionHandler.doScope( > SessionHandler.java:185) > at org.eclipse.jetty.server.handler.ContextHandler.doScope( > ContextHandler.java:1112) > at org.eclipse.jetty.server.handler.ScopedHandler.handle(Scoped > Handler.java:141) > at org.eclipse.jetty.server.handler.ContextHandlerCollection.ha > ndle(ContextHandlerCollection.java:213) > at org.eclipse.jetty.server.handler.HandlerCollection.handle( > HandlerCollection.java:119) > at org.eclipse.jetty.server.handler.HandlerWrapper.handle(Handl > erWrapper
Re: Solr 7.1.0 - concurrent.ExecutionException building model
Hi Joel - thank you for your reply. Yes, the machine (Vesta) is up, and I can access it. I don't see anything specific in the log, apart from the same error, but this time to a different server. We have constant indexing happening on this cluster, so if one went down, the indexing would stop, and I've not seen that happen. Interestingly, despite the error, the model is still built at least up to some number of iterations. In other words, many iterations complete OK. -Joe On 4/2/2018 6:54 PM, Joel Bernstein wrote: It looks like it accessing a replica that's down. Are the logs from http://vesta:9100/solr/MODEL1024_1522696624083_shard20_replica_n75 reporting any issues? When you go to that url is it back up and running? Joel Bernstein http://joelsolr.blogspot.com/ On Mon, Apr 2, 2018 at 3:55 PM, Joe Obernberger < joseph.obernber...@gmail.com> wrote: Hi All - when building machine learning models using information gain, I sometimes get this error when the number of iterations is high. I'm using about 20k news articles in my training set (about 10k positive, and 10k negative), and (for this particular run) am using 500 terms and 25,000 iterations. I have gotten the error with a much lower number of iterations (1,000) as well. The specific stream command was: update(models, batchSize="50",train(MODEL1024_1522696624083,features( MODEL1024_1522696624083,q="*:*",featureSet="FSet_MODEL1024_1 522696624083",field="Text",outcome="out_i",positiveLabel=1, numTerms=500),q="*:*",name="MODEL1024",field="Text",outcome= "out_i",maxIterations="25000")) The training data was split across 20 shards - specifically created with: http://icarus.querymasters.com:9100/solr/admin/collections? action=CREATE&name=MODEL1024_1522696624083&numShards=20&rep licationFactor=2&maxShardsPerNode=5&collection.configName=TRAINING Any ideas? The complete error is: java.io.IOException: java.util.concurrent.ExecutionException: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://vesta:9100/solr/MODEL10 24_1522696624083_shard20_replica_n75: Expected mime type application/octet-stream but got text/html. Error 404 Not Found HTTP ERROR 404 Problem accessing /solr/MODEL1024_1522696624083_shard20_replica_n75/select. 
Reason: Not Found at org.apache.solr.client.solrj.io.stream.TextLogitStream.read( TextLogitStream.java:498) at org.apache.solr.client.solrj.io.stream.PushBackStream.read(P ushBackStream.java:87) at org.apache.solr.client.solrj.io.stream.UpdateStream.read(Upd ateStream.java:109) at org.apache.solr.client.solrj.io.stream.ExceptionStream.read( ExceptionStream.java:68) at org.apache.solr.handler.StreamHandler$TimerStream.read( StreamHandler.java:627) at org.apache.solr.client.solrj.io.stream.TupleStream.lambda$wr iteMap$0(TupleStream.java:87) at org.apache.solr.response.JSONWriter.writeIterator(JSONRespon seWriter.java:523) at org.apache.solr.response.TextResponseWriter.writeVal(TextRes ponseWriter.java:180) at org.apache.solr.response.JSONWriter$2.put(JSONResponseWriter .java:559) at org.apache.solr.client.solrj.io.stream.TupleStream.writeMap( TupleStream.java:84) at org.apache.solr.response.JSONWriter.writeMap(JSONResponseWri ter.java:547) at org.apache.solr.response.TextResponseWriter.writeVal(TextRes ponseWriter.java:198) at org.apache.solr.response.JSONWriter.writeNamedListAsMapWithD ups(JSONResponseWriter.java:209) at org.apache.solr.response.JSONWriter.writeNamedList(JSONRespo nseWriter.java:325) at org.apache.solr.response.JSONWriter.writeResponse(JSONRespon seWriter.java:120) at org.apache.solr.response.JSONResponseWriter.write(JSONRespon seWriter.java:71) at org.apache.solr.response.QueryResponseWriterUtil.writeQueryR esponse(QueryResponseWriterUtil.java:65) at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrC all.java:806) at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:535) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDisp atchFilter.java:382) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDisp atchFilter.java:326) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilte r(ServletHandler.java:1751) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHan dler.java:582) at org.eclipse.jetty.server.handler.ScopedHandler.handle(Scoped Handler.java:143) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHa ndler.java:548) at org.eclipse.jetty.server.session.SessionHandler.doHandle( SessionHandler.java:226) at org.eclipse.jetty.server.handler.ContextHandler.doHandle( ContextHandler.java:1180) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHand ler.java:512) at org.eclipse.jetty.server.session.SessionHandler.doScope( SessionHandler.java:185) at org.eclipse.jetty.server.handler.ContextHandler.doScope( ContextHandler.java:1112)
Re: Solr 7.1.0 - concurrent.ExecutionException building model
On 4/2/2018 1:55 PM, Joe Obernberger wrote: > The training data was split across 20 shards - specifically created with: > http://icarus.querymasters.com:9100/solr/admin/collections?action=CREATE&name=MODEL1024_1522696624083&numShards=20&replicationFactor=2&maxShardsPerNode=5&collection.configName=TRAINING > > Any ideas? The complete error is: > HTTP ERROR 404 > Problem accessing > /solr/MODEL1024_1522696624083_shard20_replica_n75/select. Reason: > Not Found > I'll warn you in advance that I know nothing at all about the learning to rank functionality. I'm replying about the underlying error you're getting, independent of what your query is trying to accomplish. It's a 404 error, trying to access the URL mentioned above. The error doesn't indicate exactly WHAT wasn't found. It could either be the core named "MODEL1024_1522696624083_shard20_replica_n75" or the "/select" handler on that core. That's something you need to figure out. It could be that the core *does* exist, but for some reason, Solr on that machine was unable to start it. The solr.log file on the Solr instance that returned the error (which seems to be on the machine named vesta, answering to port 9100) may have more detail for the error, or some additional error messages. Normally SolrCloud is good at making sure that requests aren't sent to resources that aren't working. So I'm not sure why this happened. Are there other errors or warnings in the solr.log file, either on the instance where you sent your request, or the instance that returned the 404 error? Thanks, Shawn
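One quick check along the lines Shawn suggests is to ask that node's CoreAdmin API whether the core exists and has started (host and core name copied from the error above):

  curl "http://vesta:9100/solr/admin/cores?action=STATUS&core=MODEL1024_1522696624083_shard20_replica_n75"

If the core is missing from the response, or shows up under initFailures, the solr.log on that node should explain why it failed to load.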
Classifier for query intent?
We are experimenting with a text classifier for determining query intent. Anybody have a favorite (or anti-favorite) Java implementation? Speed and ease of implementation is important. Right now, we’re mostly looking at Weka and the Stanford Classifier. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)
SolrJ SolrInputDocument#addField doesn't throw an exception
Hi all, It would be nice if org.apache.solr.common.SolrInputDocument#addField threw an exception when the field name is 'id' and the id being indexed is not unique, just like the post.jar tool does. I was confident that both had the same behavior, so... Thanks.
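To illustrate why addField() cannot do that check on its own: SolrInputDocument is a purely client-side container, so uniqueness of the uniqueKey is only resolved on the server, where by default a document with an existing id simply replaces the old one. A minimal SolrJ sketch (URL, collection, and field values are illustrative):

  import org.apache.solr.client.solrj.SolrClient;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.common.SolrInputDocument;

  public class AddFieldExample {
      public static void main(String[] args) throws Exception {
          try (SolrClient client =
                   new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
              SolrInputDocument doc = new SolrInputDocument();
              doc.addField("id", "doc-1");            // uniqueKey value
              doc.addField("title", "First title");
              // addField() never contacts Solr, so it cannot know whether
              // "doc-1" already exists; if it does, this add overwrites it.
              client.add("mycollection", doc);
              client.commit("mycollection");
          }
      }
  }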
Re: Classifier for query intent?
Hello Wunder, If you are set on Java, Stanford and Weka are both good choices. OpenNLP also has a document classifier. You could even look beyond Java (Python, for instance) and consume the intent classifier as a REST service. Regards, Dikshant On Tue 3 Apr, 2018, 4:48 AM Walter Underwood, wrote: > We are experimenting with a text classifier for determining query intent. > Anybody have a favorite (or anti-favorite) Java implementation? Speed and > ease of implementation is important. > > Right now, we’re mostly looking at Weka and the Stanford Classifier. > > wunder > Walter Underwood > wun...@wunderwood.org > http://observer.wunderwood.org/ (my blog) > >
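Since OpenNLP's document categorizer was mentioned, a minimal Java sketch of using it for query intent (the model file and labels are hypothetical; a doccat model trained on your own intent-labelled queries is required, and a real tokenizer should replace the hand-split tokens):

  import java.io.FileInputStream;
  import java.io.InputStream;
  import opennlp.tools.doccat.DoccatModel;
  import opennlp.tools.doccat.DocumentCategorizerME;

  public class QueryIntentClassifier {
      public static void main(String[] args) throws Exception {
          // "intent-model.bin" is a hypothetical pre-trained doccat model
          try (InputStream in = new FileInputStream("intent-model.bin")) {
              DoccatModel model = new DoccatModel(in);
              DocumentCategorizerME categorizer = new DocumentCategorizerME(model);

              // tokens of the user query to classify
              String[] queryTokens = {"cheap", "red", "running", "shoes"};
              double[] scores = categorizer.categorize(queryTokens);
              System.out.println("Predicted intent: " + categorizer.getBestCategory(scores));
          }
      }
  }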
Re: Need help to get started on Solr, searching gets nothing. Thank you very much in advance
Thanks Rick and Adhyan I see there is "/browse" in solrconfig.xml : explicit and name="defaults" with one item of "df" as shown below: _text_ My understanding is I can put whatever fields I want to enable index and searching here in parallel with _text_, am I correct? Thanks. ** *Sincerely yours,* *Raymond* On Mon, Apr 2, 2018 at 3:24 PM, Adhyan Arizki wrote: > Raymond, > > You can specify the default behavior in solrconfig.xml under each handler. > For instance for /browse you can specify it should look into name, and for > /query you can default it to different field. > > On Mon, Apr 2, 2018 at 9:04 PM, Rick Leir wrote: > > > Raymond > > There is a default field normally called df. You would normally use > > Copyfield to copy all searchable fields into the default field. > > Cheers -- Rick > > > > On April 1, 2018 11:34:07 PM EDT, Raymond Xie > > wrote: > > >Hi Rick, > > > > > >I sorted it out half: > > > > > >I should have specified the field in the search query, so, instead of > > >http://localhost:8983/solr/films/browse?q=batman, I should use: > > >http://localhost:8983/solr/films/browse?q=name:batman > > > > > >Sorry for this newbie mistake. > > > > > >But what about if I/user doesn't know or doesn't want to specify the > > >search > > >scope to be restricted in field "name" but anywhere in the index'ed > > >documents? > > > > > > > > >** > > >*Sincerely yours,* > > > > > > > > >*Raymond* > > > > > >On Sun, Apr 1, 2018 at 2:10 PM, Rick Leir wrote: > > > > > >> Raymond > > >> The output is not visible to me because the mailing list strips > > >images. > > >> Please try a different way to show the output. > > >> Cheers -- Rick > > >> > > >> On March 29, 2018 10:17:13 PM EDT, Raymond Xie > > >> wrote: > > >> > I am new to Solr, following Steve Rowe's example on > > >> > > >>https://github.com/apache/lucene-solr/tree/master/solr/example/films: > > >> > > > >> >It would be greatly appreciated if anyone can enlighten me where to > > >> >start > > >> >troubleshooting, thank you very much in advance. 
> > >> > > > >> >The steps I followed are: > > >> > > > >> >Here ya go << END_OF_SCRIPT > > >> > > > >> >bin/solr stop > > >> >rm server/logs/*.log > > >> >rm -Rf server/solr/films/ > > >> >bin/solr start > > >> >bin/solr create -c films > > >> >curl http://localhost:8983/solr/films/schema -X POST -H > > >> >'Content-type:application/json' --data-binary '{ > > >> >"add-field" : { > > >> >"name":"name", > > >> >"type":"text_general", > > >> >"multiValued":false, > > >> >"stored":true > > >> >}, > > >> >"add-field" : { > > >> >"name":"initial_release_date", > > >> >"type":"pdate", > > >> >"stored":true > > >> >} > > >> >}' > > >> >bin/post -c films example/films/films.json > > >> >curl http://localhost:8983/solr/films/config/params -H > > >> >'Content-type:application/json' -d '{ > > >> >"update" : { > > >> > "facets": { > > >> >"facet.field":"genre" > > >> >} > > >> > } > > >> >}' > > >> > > > >> ># END_OF_SCRIPT > > >> > > > >> >Additional fun - > > >> > > > >> >Add highlighting: > > >> >curl http://localhost:8983/solr/films/config/params -H > > >> >'Content-type:application/json' -d '{ > > >> >"set" : { > > >> > "browse": { > > >> >"hl":"on", > > >> >"hl.fl":"name" > > >> >} > > >> > } > > >> >}' > > >> >try http://localhost:8983/solr/films/browse?q=batman now, and you'll > > >> >see "batman" highlighted in the results > > >> > > > >> > > > >> > > > >> >I got nothing in my search: > > >> > > > >> > > > >> > > > >> > > > >> >** > > >> >*Sincerely yours,* > > >> > > > >> > > > >> >*Raymond* > > >> > > >> -- > > >> Sorry for being brief. Alternate email is rickleir at yahoo dot com > > > > -- > > Sorry for being brief. Alternate email is rickleir at yahoo dot com > > > > > > -- > > Best regards, > Adhyan Arizki >
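Regarding the question above about making more fields searchable without a field prefix: yes, additional fields can be indexed, but the usual pattern is to copyField them into the catch-all field that df points at (here _text_). A sketch in the same Schema API style as the tutorial script, in case the configset does not already define such a copy rule (the source pattern is illustrative):

  curl http://localhost:8983/solr/films/schema -X POST -H 'Content-type:application/json' --data-binary '{
  "add-copy-field" : { "source":"*", "dest":"_text_" }
  }'

After reindexing, a query like q=batman is matched against _text_ without needing a field prefix.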