AW: Spellchecking and suggesting part numbers
Thanks James, this did help a lot. Is it possible to make DirectSolrSpellChecker try to return suggestions with the maximum number of matching leading characters? Alexander -Original Message- From: Dyer, James [mailto:james.d...@ingramcontent.com] Sent: Wednesday, September 24, 2014 4:42 PM To: solr-user@lucene.apache.org Subject: RE: Spellchecking and suggesting part numbers Alexander, You could use a higher value for spellcheck.count, maybe 20 or so, then in your application pick out the suggestions that make changes on the right side. Another option is to use DirectSolrSpellChecker (usually a better choice anyhow) and set the "minPrefix" field. This will require up to n characters on the left side to match before it will make suggestions. Taking a quick look at the code, it seems to me it won't try to correct anything in this prefix region either. So perhaps you can set this to 2-4 (default=1). See http://lucene.apache.org/core/4_10_0/suggest/org/apache/lucene/search/spell/DirectSpellChecker.html#setMinPrefix%28int%29 . James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: Lochschmied, Alexander [mailto:alexander.lochschm...@vishay.com] Sent: Wednesday, September 24, 2014 9:06 AM To: solr-user@lucene.apache.org Subject: Spellchecking and suggesting part numbers Hello Solr Users, we are trying to get suggestions for part numbers using the spellchecker. Problem scenario: ABCD1234 // This is the search term ABCE1234 // This is what we get from the spellchecker ABCD1244 // This is what we would like to get from the spellchecker Characters towards the left of our part numbers are more relevant. The setup is: solr.IndexBasedSpellChecker ./spellchecker did_you_mean_part did_you_mean_part on spellcheck_part Can we tweak the setup so that we get more relevant part numbers? Thanks, Alexander
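For reference, a minimal sketch of what James's minPrefix suggestion could look like in solrconfig.xml. The spellchecker name (did_you_mean_part) and field (spellcheck_part) are guessed from Alexander's partially stripped setup above, so treat them as assumptions:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">did_you_mean_part</str>
    <str name="field">spellcheck_part</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <!-- suggestions must share the first 3 characters of the query term exactly,
         and no corrections are attempted inside this prefix -->
    <int name="minPrefix">3</int>
  </lst>
</searchComponent>

With minPrefix set to 3, a query term like ABCD1234 would only draw suggestions that agree with it on the leading "ABC".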
RE: Ignoring Duplicates in Multivalue Field
Hi Ahmet, When I added the RunUpdateProcessorFactory, Solr didn't remove any duplicates. Any other idea? -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID] Sent: Monday, November 03, 2014 1:35 AM To: solr-user@lucene.apache.org Subject: Re: Ignoring Duplicates in Multivalue Field Hi Tomer, What happens when you add <processor class="solr.RunUpdateProcessorFactory" /> to your chain? Ahmet On Sunday, November 2, 2014 1:22 PM, Tomer Levi wrote: Hi, I’m trying to make my “update” request handler ignore multivalue duplications in updates. To make my use case clear, let’s assume my index already contains a document like: { id:”100”, “myMultValueField”: [“1”,”2”,”3”] } Later I would like to send an update like: { id:”100”, “myMultValueField”: {“add”:”2”} } How can I make the update request handler understand that “2” already exists and ignore it? I tried to add the update chain below but it didn’t work for me. myMultValueField And add it to my requestHandler: uniq-fields Tomer Levi Software Engineer Big Data Group Product & Technology Unit (T) +972 (9) 775-2693 tomer.l...@nice.com www.nice.com
Re: SOLRJ - query with ChildDocTransformerFactory crash because of the javabin parser
Hello, And to answer my own question... :D There was a little, let's say, mistake in my query. Instead of fl=[child parentFilter="cat:PARENT" childFilter="cat:CHILD"] it should be fl=id,[child parentFilter="cat:PARENT" childFilter="cat:CHILD"] Awkward, because if you put the query and the filter in the URL or in the Solr query tool, both methods work very well. But when you want to make the query from Java, with SolrJ, the library appends that parser section, and it crashes if it cannot find parent fields in the results. Instead of id, you can put whatever field name you want from the parent. Regards, Andrei -- View this message in context: http://lucene.472066.n3.nabble.com/SOLRJ-query-with-ChildDocTransformerFactory-crash-because-of-the-javabin-parser-tp4167183p4167218.html Sent from the Solr - User mailing list archive at Nabble.com.
dynamically change default update chain
Hello solr fellows, I'm working on a project that involves using two update chains. One default chain is used most of the time and another, custom one is used sporadically. The default update chain is called automatically without any action needed (well, that's why it is the default). The custom pipeline can be switched on using the update.chain HTTP parameter, like so: [code] UpdateRequest updateRequest = new UpdateRequest(); updateRequest.setCommitWithin(1); updateRequest.setParam("update.chain", "customupdatechain"); updateRequest.add(solrDoc); updateRequest.process(solrServer); [/code] Now I have a new requirement: be able to install the custom chain as the default update chain, such that any client that is sending data in will get it processed via the custom chain and not the default chain. And this should happen seamlessly to the client, i.e. no parameter change needed. Is this possible with the current state of the Solr core / collection API or some other method? -- Dmitry Kan Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info
CFP: FOSDEM 2015 - Open Source Search Dev Room
***Please forward this CFP to anyone who may be interested in participating.*** Hi, Search has evolved to be much more than simply full-text search. We now rely on “search engines” for a wide variety of functionality: search as navigation, search as analytics and backend for data visualization and sometimes, dare we say it, as a data store. The purpose of this dev room is to explore the new world of open source search engines: their enhanced functionality, new use cases, feature and architectural deep dives, and the position of search in relation to the wider set of software tools. We welcome proposals from folks working with or on open source search engines (e.g. Apache Lucene, Apache Solr, Elasticsearch, Seeks, Sphinx, etc.) or technologies that heavily depend upon search (e.g. NoSQL databases, Nutch, Apache Hadoop). We are particularly interested in presentations on search algorithms, machine learning, real-world implementation/deployment stories and explorations of the future of search. Talks should be 30-60 minutes in length, including time for Q&A. You can submit your talks to us here: https://docs.google.com/forms/d/11yLMj9ZlRD1EMU3Knp5y6eO3H5BRK7V38G0OxSfp84A/viewform Our Call for Papers will close at 23:59 CEST on Monday, December 1, 2014. We cannot guarantee we will have the opportunity to review submissions made after the deadline, so please submit early (and often)! Should you have any questions, you can contact the Dev Room organizers: opensourcesearch-devr...@lists.fosdem.org Cheers, LH on behalf of the Open Source Search Dev Room Program Committee* * Boaz Leskes, Isabel Drost-Fromm, Leslie Hawthorn, Ted Dunning, Torsten Curdt, Uwe Schindler - Uwe Schindler uschind...@apache.org Apache Lucene PMC Member / Committer Bremen, Germany http://lucene.apache.org/
Re: dynamically change default update chain
Just to get the obvious sledgehammer solution out of the way - upload a new, edited solrconfig.xml with the default changed, and reload the core. -Mike On 11/3/14 6:28 AM, Dmitry Kan wrote: Hello solr fellows, I'm working on a project that involves using two update chains. One default chain is used most of the time and another one custom is used sporadically. The default update chain is called automatically without action needed (well, that's why it is default). The custom pipeline can be switched on using update.chain http parameter, like so: [code] UpdateRequest updateRequest = new UpdateRequest(); updateRequest.setCommitWithin(1); updateRequest.setParam("update.chain", "customupdatechain"); updateRequest.add(solrDoc); updateRequest.process(solrServer); [/code] Now I have a new requirement: be able to install the custom chain as the default update chain such that any client that is sending data in will get it processed via the custom chain and not the default chain. And this should happen seamlessly to the client, i.e. no parameter change needed. Is this possible with the current state of the Solr core / collection api or some other method?
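In practical terms, Mike's sledgehammer looks roughly like this in solrconfig.xml: mark the custom chain as the default so clients no longer have to pass update.chain. The chain name is taken from Dmitry's snippet; the first processor class is only a stand-in for whatever the custom chain actually runs:

<updateRequestProcessorChain name="customupdatechain" default="true">
  <!-- placeholder for the project's custom processor -->
  <processor class="com.example.CustomUpdateProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

After uploading the edited config and reloading the core, requests without an update.chain parameter go through this chain.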
RE: FOSDEM 2015 - Open Source Search Dev Room
Hi, forgot to mention: FOSDEM 2015 takes place in Brussels on January 31th and February 1st, 2015. See also: https://fosdem.org/2015/ I hope to see you there! Uwe > -Original Message- > From: Uwe Schindler [mailto:uschind...@apache.org] > Sent: Monday, November 03, 2014 1:29 PM > To: d...@lucene.apache.org; java-u...@lucene.apache.org; solr- > u...@lucene.apache.org; gene...@lucene.apache.org > Subject: CFP: FOSDEM 2015 - Open Source Search Dev Room > > ***Please forward this CFP to anyone who may be interested in > participating.*** > > Hi, > > Search has evolved to be much more than simply full-text search. We now > rely on “search engines” for a wide variety of functionality: > search as navigation, search as analytics and backend for data visualization > and sometimes, dare we say it, as a data store. The purpose of this dev room > is to explore the new world of open source search engines: their enhanced > functionality, new use cases, feature and architectural deep dives, and the > position of search in relation to the wider set of software tools. > > We welcome proposals from folks working with or on open source search > engines (e.g. Apache Lucene, Apache Solr, Elasticsearch, Seeks, Sphinx, etc.) > or technologies that heavily depend upon search (e.g. > NoSQL databases, Nutch, Apache Hadoop). We are particularly interested in > presentations on search algorithms, machine learning, real-world > implementation/deployment stories and explorations of the future of > search. > > Talks should be 30-60 minutes in length, including time for Q&A. > > You can submit your talks to us here: > https://docs.google.com/forms/d/11yLMj9ZlRD1EMU3Knp5y6eO3H5BRK7V3 > 8G0OxSfp84A/viewform > > Our Call for Papers will close at 23:59 CEST on Monday, December 1, 2014. We > cannot guarantee we will have the opportunity to review submissions made > after the deadline, so please submit early (and often)! > > Should you have any questions, you can contact the Dev Room > organizers: opensourcesearch-devr...@lists.fosdem.org > > Cheers, > LH on behalf of the Open Source Search Dev Room Program Committee* > > * Boaz Leskes, Isabel Drost-Fromm, Leslie Hawthorn, Ted Dunning, Torsten > Curdt, Uwe Schindler > > - > Uwe Schindler > uschind...@apache.org > Apache Lucene PMC Member / Committer > Bremen, Germany > http://lucene.apache.org/ > > > > - > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org
order of updates
Hi, can anybody confirm this for me? If I add multiple documents with the same id but differing in other fields and then issue a commit (no commits before this), the last added document gets indexed, right? P.S. Using Solr 4 and default settings for optimistic locking. Matteo
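A small illustration of the scenario in the XML update format, with a hypothetical id and field: two documents with the same uniqueKey are added and a single commit is issued, and only the second document is visible afterwards.

<add>
  <doc>
    <field name="id">42</field>
    <field name="title">first version</field>
  </doc>
  <doc>
    <!-- same uniqueKey: this document overwrites the one above -->
    <field name="id">42</field>
    <field name="title">second version</field>
  </doc>
</add>
<!-- then POST <commit/> to /update -->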
Re: dynamically change default update chain
Thanks, Mike, we have discussed something similar with steffkes on IRC today, where I said: "some programmatic convenience would be great of course. But I could in principle imagine having two versions of solrconfig.xml and swapping them followed by a core reload. It just sounds a bit scary to me." But now, after pondering a bit more over it, I start to get inclined towards "fiddling with sending dummy documents with certain fields that will tell the update component to either call another update component or proceed normally" another = custom updater normally = default updater More ideas are of course welcome! Dmitry On Mon, Nov 3, 2014 at 2:41 PM, Michael Sokolov < msoko...@safaribooksonline.com> wrote: > Just to get the obvious sledgehammer solution out of the way - upload a > new, edited solrconfig.xml with the default changed, and reload the core. > > -Mike > > > > On 11/3/14 6:28 AM, Dmitry Kan wrote: > >> Hello solr fellows, >> >> I'm working on a project that involves using two update chains. One >> default >> chain is used most of the time and another one custom is used >> sporadically. >> >> The default update chain is called automatically without action needed >> (well, that's why it is default). >> >> The custom pipeline can be switched on using update.chain http parameter, >> like so: >> >> [code] >> UpdateRequest updateRequest = new UpdateRequest(); >> updateRequest.setCommitWithin(1); >> updateRequest.setParam("update.chain", "customupdatechain"); >> updateRequest.add(solrDoc); >> updateRequest.process(solrServer); >> [/code] >> >> Now I have a new requirement: be able to install the custom chain as the >> default update chain such that any client that is sending data in will get >> it processed via the custom chain and not the default chain. And this >> should happen seamlessly to the client, i.e. no parameter change needed. >> >> Is this possible with the current state of the Solr core / collection api >> or some other method? >> >> > -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info
Re: SolrCloud use of "min_rf" through SolrJ
In case anyone else runs into this, I've managed to make it work. I didn't notice in the ticket discussion that the specific feature is enabled when min_rf >=2, I was setting min_rf=1. It goes without saying that you should also have at least 2 replicas in your SolrCloud configuration. The actual code I've used to make it return "rf" is UpdateRequest req = new UpdateRequest(); req.setParam(UpdateRequest.MIN_REPFACT, "2"); req.add(doc); NamedList response = solrServer.request(req); -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-use-of-min-rf-through-SolrJ-tp4164966p4167250.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Ignoring Duplicates in Multivalue Field
The update processors are only processing the values in the "source" data, not the data that has already been indexed and stored. We probably need to file a Jira to add an "insert" field value option that merges in the new field value, skipping it if it already exists or appending it to the end of the existing list of field values for a multivalued field. You could try... a combination of both "remove" and "add", assuming that Solr applies them in the order specified, to remove any existing value and then add it to the end. See: https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents -- Jack Krupansky -Original Message- From: Tomer Levi Sent: Monday, November 3, 2014 4:19 AM To: solr-user@lucene.apache.org ; Ahmet Arslan Subject: RE: Ignoring Duplicates in Multivalue Field Hi Ahmet, When I add the RunUpdateProcessorFactory Solr didn't remove any duplications. Any other idea? -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID] Sent: Monday, November 03, 2014 1:35 AM To: solr-user@lucene.apache.org Subject: Re: Ignoring Duplicates in Multivalue Field Hi Tomer, What happens when you add class="solr.RunUpdateProcessorFactory" /> to your chain? Ahmet On Sunday, November 2, 2014 1:22 PM, Tomer Levi wrote: Hi, I’m trying to make my “update” request handler ignore multivalue duplications in updates. To make my use case clear, let’s assume my index already contains a document like: { id:”100”, “myMultValueField”: [“1”,”2”,”3”] } Later I would like to send an update like: { id:”100”,” myMultValueField” {“add”:”2”} } How can I make the update request handler understand that “2” already exist and ignore it? I tried to add update chain below but it didn’t work for me. myMultValueField And add it to my requestHandler: uniq-fields Tomer Levi Software Engineer Big Data Group Product & Technology Unit (T) +972 (9) 775-2693 tomer.l...@nice.com www.nice.com
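A sketch of Jack's remove-then-add idea in the XML atomic update format, using Tomer's id and field name. Jack's caveat carries over: whether Solr applies the two operations in the order written is exactly the open question, so consider this untested.

<add>
  <doc>
    <field name="id">100</field>
    <!-- drop the value if it is already present, then append it,
         so it should end up in the field exactly once -->
    <field name="myMultValueField" update="remove">2</field>
    <field name="myMultValueField" update="add">2</field>
  </doc>
</add>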
Re: dynamically change default update chain
An update: Another idea comes from Erick Hatcher; sharing it for the benefit of anyone who's interested in the topic: maybe you can make a custom request handler that toggles which is the default chain? On Mon, Nov 3, 2014 at 4:08 PM, Dmitry Kan wrote: > Thanks, Mike, > > we have discussed something similar with steffkes on IRC today, where I > said: "some programmatic convenience would be great of course. But I > could in principle imagine having two versions of solrconfig.xml and > swapping them followed by a core reload. It just sounds a bit scary to me. > " > > But now, after pondering a bit more over it, I start to get inclined > towards "fiddling with sending dummy documents with certain fields that > will tell the update component to either call another update component or > proceed normally" > > another = custom updater > normally = default updater > > More ideas are of course welcome! > > Dmitry > > On Mon, Nov 3, 2014 at 2:41 PM, Michael Sokolov < > msoko...@safaribooksonline.com> wrote: > >> Just to get the obvious sledgehammer solution out of the way - upload a >> new, edited solrconfig.xml with the default changed, and reload the core. >> >> -Mike >> >> >> >> On 11/3/14 6:28 AM, Dmitry Kan wrote: >> >>> Hello solr fellows, >>> >>> I'm working on a project that involves using two update chains. One >>> default >>> chain is used most of the time and another one custom is used >>> sporadically. >>> >>> The default update chain is called automatically without action needed >>> (well, that's why it is default). >>> >>> The custom pipeline can be switched on using update.chain http parameter, >>> like so: >>> >>> [code] >>> UpdateRequest updateRequest = new UpdateRequest(); >>> updateRequest.setCommitWithin(1); >>> updateRequest.setParam("update.chain", "customupdatechain"); >>> updateRequest.add(solrDoc); >>> updateRequest.process(solrServer); >>> [/code] >>> >>> Now I have a new requirement: be able to install the custom chain as the >>> default update chain such that any client that is sending data in will >>> get >>> it processed via the custom chain and not the default chain. And this >>> should happen seamlessly to the client, i.e. no parameter change needed. >>> >>> Is this possible with the current state of the Solr core / collection api >>> or some other method? >>> >>> >> > > > -- > Dmitry Kan > Luke Toolbox: http://github.com/DmitryKey/luke > Blog: http://dmitrykan.blogspot.com > Twitter: http://twitter.com/dmitrykan > SemanticAnalyzer: www.semanticanalyzer.info > > -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info
Re: order of updates
On Mon, Nov 3, 2014 at 8:53 AM, Matteo Grolla wrote: > HI, > can anybody give me a confirm? > If I add multiple document with the same id but differing on other fields and > then issue a commit (no commits before this) the last added document gets > indexed, right? Correct. > using solr 4 and default settings for optimistic locking. If you haven't seen it, I did an example of that a while back: http://heliosearch.org/solr/optimistic-concurrency/ -Yonik http://heliosearch.org - native code faceting, facet functions, sub-facets, off-heap data
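For completeness, a minimal sketch of the optimistic-concurrency mechanism Yonik links to, in the XML update format. The id, field and version value are made up; the _version_ would normally come from a previous read of the document.

<add>
  <doc>
    <field name="id">42</field>
    <!-- the update succeeds only if the document's current _version_ still matches;
         otherwise Solr answers with a 409 version conflict -->
    <field name="_version_">1484166798013104128</field>
    <field name="title">updated title</field>
  </doc>
</add>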
Re: Ignoring Duplicates in Multivalue Field
From memory, if you use UniqFieldsUpdateProcessor after DistributedUpdateProcessor, then you will be filtering on the set ["1", "2", "3", "2"]. [chain example XML stripped by the archive; the field listed was myMultValueField; see the sketch below] On 4 November 2014 01:37, Jack Krupansky wrote: > The update processors are only processing the values in the "source" data, > not the data that has already been indexed and stored. > > We probably need to file a Jira to add an "insert" field value option that > merges in the new field value, skipping it if it already exists or > appending it to the end of the existing list of field values for a > multivalued field. > > You could try... a combination of both "remove" and "add", assuming that > Solr applies them in the order specified, to remove any existing value and > then add it to the end. > > See: > https://cwiki.apache.org/confluence/display/solr/ > Updating+Parts+of+Documents > > -- Jack Krupansky > > -Original Message- From: Tomer Levi > Sent: Monday, November 3, 2014 4:19 AM > To: solr-user@lucene.apache.org ; Ahmet Arslan > Subject: RE: Ignoring Duplicates in Multivalue Field > > > Hi Ahmet, > When I add the RunUpdateProcessorFactory Solr didn't remove any > duplications. > Any other idea? > > > -Original Message- > From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID] > Sent: Monday, November 03, 2014 1:35 AM > To: solr-user@lucene.apache.org > Subject: Re: Ignoring Duplicates in Multivalue Field > > Hi Tomer, > > What happens when you add <processor class="solr.RunUpdateProcessorFactory" /> to your chain? > > Ahmet > > > > On Sunday, November 2, 2014 1:22 PM, Tomer Levi > wrote: > > > > Hi, > I’m trying to make my “update” request handler ignore multivalue > duplications in updates. > To make my use case clear, let’s assume my index already contains a > document like: > { > id:”100”, > “myMultValueField”: [“1”,”2”,”3”] > } > > Later I would like to send an update like: > { > id:”100”,” > myMultValueField” {“add”:”2”} > } > > How can I make the update request handler understand that “2” already > exist and ignore it? > I tried to add update chain below but it didn’t work for me. > > > > myMultValueField > > > > > And add it to my requestHandler: > > > uniq-fields > > > > Tomer Levi > Software Engineer > Big Data Group > Product & Technology Unit > (T) +972 (9) 775-2693 > > tomer.l...@nice.com > www.nice.com >
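A hedged reconstruction of the chain being described, placed in solrconfig.xml. The chain name (uniq-fields) and field name (myMultValueField) come from Tomer's stripped config; the fieldName selector syntax assumes Solr 4.5 or later.

<updateRequestProcessorChain name="uniq-fields">
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <!-- placed after the distributed processor so that, per the note above, it sees the
       merged value set (existing values plus the atomically added one) -->
  <processor class="solr.UniqFieldsUpdateProcessorFactory">
    <str name="fieldName">myMultValueField</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>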
Re: SOLRJ - query with ChildDocTransformerFactory crash because of the javabin parser
bq: Was a little, let's say mistake, in my query Been there, done that ;) Thanks for closing this out. Best, Erick On Mon, Nov 3, 2014 at 3:25 AM, andreic9203 wrote: > Hello, > > And to answer to my question...:D > > Was a little, let's say mistake, in my query. > Instead of > fl=[child parentFilter="cat:PARENT" childFilter="cat:CHILD"] > should be > fl=id,[child parentFilter="cat:PARENT" childFilter="cat:CHILD"] > > Awkward, because if you put the query and the filter in URL or in the solr > queries tool, both methods works very well. But when you want to make the > query from java, with Solrj, the library appends that parser section, and it > crashes if it cannot find parent fields in the results. > > Instead of the id, you could put what field name do you want from the > parent. > > Regards, > Andrei > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/SOLRJ-query-with-ChildDocTransformerFactory-crash-because-of-the-javabin-parser-tp4167183p4167218.html > Sent from the Solr - User mailing list archive at Nabble.com.
Solr slow start up (tlog is small)
Hi, I am using Solr 4.9 with Tomcat and it works fine except that the deployment of solr.war takes too long. While deploying Solr, all webapps on Tomcat stop responding, which is unacceptable. Most articles I found say that it might result from a big transaction log because of uncommitted documents, but this is not my case. At first, the Solr data was 280G and the start-up time was 30 minutes. Then I set a field to stored="false" and re-indexed the whole data set. The data size became 185G and the start-up time dropped to 17 minutes, but it is still too long. Here are some numbers I measured: 1) Solr home: 280G tlog: 500K 30 min to start up While starting up, disk read is constantly about 50MB/s (according to dstat). So it seems that Solr reads 30m * 60s * 50MB/s = 90GB of data while starting up, which is 30% of the index data size. 2) Solr home: 185G tlog: 5M 17 minutes to start up While starting up, disk read is constantly about 5MB/s (according to dstat). So it seems that Solr reads 17m * 60s * 5MB/s = 5GB of data while starting up, which is about 3% of the index data size. P.S. I committed after every 1000 documents added and ran an optimize after all documents were added. Any ideas or suggestions would be appreciated. Thanks, Po-Yu
Solr slow startup
Dear All, Sorry for the possibly newbie question as I have only recently started experimenting with Solr and SolrCloud. I am trying to import an index originally created with Lucene 2.x into Solr 4.10. What I did was: 1. upgraded the index to version 3.x with IndexUpgrader 2. upgraded the index to version 4.x with IndexUpgrader 3. created a schema for Solr and used the default solrconfig (with some path changes) 4. successfully started Solr The sizes I am speaking about are in the tens of gigabytes and the startup times are 5~10 minutes. I have read here: https://wiki.apache.org/solr/SolrPerformanceProblems that it possibly has something to do with the updateHandler, and I enabled autoCommit as suggested, however with no improvement. Such a long startup time feels odd when Lucene itself seems to load the same indexes in no time. I would very much appreciate any help with this issue. Best, Michal Krajnansky
Re: Solr slow start up (tlog is small)
Can you tell from the logs what Solr is doing during that time? Do you have any warming queries configured? Also see this: https://issues.apache.org/jira/browse/SOLR-6679 (comment out suggester related stuff if you aren't using it) -Yonik http://heliosearch.org - native code faceting, facet functions, sub-facets, off-heap data On Mon, Nov 3, 2014 at 11:03 AM, Po-Yu Chuang wrote: > Hi, > > I am using Solr 4.9 with Tomcat and it works fine except that the > deployment of solr.war is too long. While deploying Solr, all webapps on > Tomcat stop responding which is unacceptable. Most articles I found say > that it might result from big transaction log because of uncommitted > documents, but this is not my case. > > At first, the Solr data is 280G and the start up time is 30 minutes. Then I > set a field to stored="false" and re-index whole data. The data size became > 185G and the start up time reduced to 17 minutes, but it is still too long. > > Here are some numbers I measured: > > 1) > Solr home: 280G > tlog: 500K > 30 min to start up > While starting up, disk read is constantly about 50MB/s (according to > dstat). So it seems that Solr reads 30m * 60s * 50MB/s = 90GB of data while > starting up, which is 30% of index data size. > > 2) > Solr home: 185G > tlog: 5M > 17 minutes to start up > While starting up, disk read is constantly about 5MB/s (according to > dstat). So it seems that Solr reads 17m * 60s *5MB/s = 5GB of data while > starting up, which is about 3% of index data size. > > p.s. I did commit each time 1000 documents being added and did optimization > after all documents are added. > > Any ideas or suggestions would be appreciated. > > Thanks, > Po-Yu
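The "suggester related stuff" Yonik refers to is, roughly, the following block from the stock 4.10 example solrconfig.xml (names may differ in your config). If you are not using the suggester, commenting out both the component and its handler avoids the dictionary rebuild described in SOLR-6679:

<!--
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">cat</str>
    <str name="weightField">price</str>
    <str name="suggestAnalyzerFieldType">string</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
-->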
Re: Solr slow startup
One possible cause of a slow startup with the default configs: https://issues.apache.org/jira/browse/SOLR-6679 -Yonik http://heliosearch.org - native code faceting, facet functions, sub-facets, off-heap data On Mon, Nov 3, 2014 at 11:05 AM, Michal Krajňanský wrote: > Dear All, > > > Sorry for the possibly newbie question as I have only recently started > experimenting with Solr and Solrcloud. > > > I am trying to import an index originally created with Lucene 2.x so Solr > 4.10. What I did was: > > 1. upgrade index to version 3.x with IndexUpgrader > 2. upgrade index to version 4.x with IndexUpgrader > 3. created schema for Solr and used the default solrconfig (with some paths > changes) > 4. succesfully started Solr > > The sizes I am speaking about are in tens of gigabytes and the startup > times are 5~10 minutes. > > > I have read here: > https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&ved=0CCMQFjAB&url=https%3A%2F%2Fwiki.apache.org%2Fsolr%2FSolrPerformanceProblems&ei=AKNXVL7ULbGR7Abp7IDYCA&usg=AFQjCNEtw2Zma8ST3JLGL3xw6nG2G_0YuA&sig2=HmM8R1VYuVtXv8lQHsHPJQ&bvm=bv.78597519,bs.1,d.dGY&cad=rja > that it has possibly something to do with the updateHandler and enabled the > autoCommit as suggested, however with no improvement. > > Such a long startup time feels odd when Lucene itself seems to load the > same indexes in no time. > > I would very much appreciate any help with this issue. > > > Best, > > > Michal Krajnansky
Re: Solr slow startup
Hey Yonik, That (getting rid of the suggester) solved the issue! You saved me a lot of time and nerves. Best, Michal 2014-11-03 17:19 GMT+01:00 Yonik Seeley : > One possible cause of a slow startup with the default configs: > https://issues.apache.org/jira/browse/SOLR-6679 > > -Yonik > http://heliosearch.org - native code faceting, facet functions, > sub-facets, off-heap data > > > On Mon, Nov 3, 2014 at 11:05 AM, Michal Krajňanský > wrote: > > Dear All, > > > > > > Sorry for the possibly newbie question as I have only recently started > > experimenting with Solr and Solrcloud. > > > > > > I am trying to import an index originally created with Lucene 2.x so Solr > > 4.10. What I did was: > > > > 1. upgrade index to version 3.x with IndexUpgrader > > 2. upgrade index to version 4.x with IndexUpgrader > > 3. created schema for Solr and used the default solrconfig (with some > paths > > changes) > > 4. succesfully started Solr > > > > The sizes I am speaking about are in tens of gigabytes and the startup > > times are 5~10 minutes. > > > > > > I have read here: > > > https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&ved=0CCMQFjAB&url=https%3A%2F%2Fwiki.apache.org%2Fsolr%2FSolrPerformanceProblems&ei=AKNXVL7ULbGR7Abp7IDYCA&usg=AFQjCNEtw2Zma8ST3JLGL3xw6nG2G_0YuA&sig2=HmM8R1VYuVtXv8lQHsHPJQ&bvm=bv.78597519,bs.1,d.dGY&cad=rja > > that it has possibly something to do with the updateHandler and enabled > the > > autoCommit as suggested, however with no improvement. > > > > Such a long startup time feels odd when Lucene itself seems to load the > > same indexes in no time. > > > > I would very much appreciate any help with this issue. > > > > > > Best, > > > > > > Michal Krajnansky >
Re: order of updates
Thanks really a lot Yonik! Il giorno 03/nov/2014, alle ore 15:51, Yonik Seeley ha scritto: > On Mon, Nov 3, 2014 at 8:53 AM, Matteo Grolla wrote: >> HI, >>can anybody give me a confirm? >> If I add multiple document with the same id but differing on other fields >> and then issue a commit (no commits before this) the last added document >> gets indexed, right? > > Correct. > >> using solr 4 and default settings for optimistic locking. > > If you haven't seen it, I did an example of that a while back: > > http://heliosearch.org/solr/optimistic-concurrency/ > > -Yonik > http://heliosearch.org - native code faceting, facet functions, > sub-facets, off-heap data
Cannot use Phrase Queries in eDisMax and filtering
I am writing a search bar application with Solr, and I'd like it to have the following two features: phrase matching for user queries (results which match the user's phrase are boosted), and field faceting based on a 'tags' field. When I execute this query: q=steve jobs& fq=storeid:527bd613e4b0564cc755460a& sort=score desc& start=50& rows=2& fl=*,score& qt=/query& defType=edismax& pf=concept_name^15 note_text^5 file_text^2.5& pf3=1& pf2=1& ps=1& group=true& group.field=conceptid& group.limit=10& group.ngroups=true The phrase boosting feature operates correctly and boosts results which more closely match the phrase query "Steve Jobs". As an example, the concept with concept_name="Steve Jobs" has a score of ~3.96 in the results of this query. However, when I execute the query after the user has selected a facet field (the facet fields are brought up by a separate query) and execute the following query: q=steve jobs& fq=storeid:527bd613e4b0564cc755460a& fq=tag:Person& sort=score desc& start=0& rows=50& fl=*,score& qt=/query& defType=edismax& pf=concept_name^15 note_text^5 file_text^2.5& pf3=1& pf2=1& ps=1& group=true& group.field=conceptid& group.limit=10& group.ngroups=true the phrase boosting does not work, even though the facet filtering does. The concept with concept_name="Steve Jobs" has a score of ~0.2 in the results of this query. I'm not sure if this is a bug, but if it is not, can someone point me to the relevant documentation that will help me fix this issue? All queries were written using the SolrJ library. I also tried searching the string "Steve Jobs" and it returned the correct results (the concept with concept_name "Steve Jobs" was returned highest).
Re: Solr slow start up (tlog is small)
One other reason for a slow start-up can be large number of segments in the index. Which I'm guessing is not the case since you optimized? But anyway, what's the number of segments in both 280G and 185G indices? Dmitry On Mon, Nov 3, 2014 at 6:17 PM, Yonik Seeley wrote: > Can you tell from the logs what Solr is doing during that time? > Do you have any warming queries configured? > Also see this: https://issues.apache.org/jira/browse/SOLR-6679 > (comment out suggester related stuff if you aren't using it) > > -Yonik > http://heliosearch.org - native code faceting, facet functions, > sub-facets, off-heap data > > > On Mon, Nov 3, 2014 at 11:03 AM, Po-Yu Chuang > wrote: > > Hi, > > > > I am using Solr 4.9 with Tomcat and it works fine except that the > > deployment of solr.war is too long. While deploying Solr, all webapps on > > Tomcat stop responding which is unacceptable. Most articles I found say > > that it might result from big transaction log because of uncommitted > > documents, but this is not my case. > > > > At first, the Solr data is 280G and the start up time is 30 minutes. > Then I > > set a field to stored="false" and re-index whole data. The data size > became > > 185G and the start up time reduced to 17 minutes, but it is still too > long. > > > > Here are some numbers I measured: > > > > 1) > > Solr home: 280G > > tlog: 500K > > 30 min to start up > > While starting up, disk read is constantly about 50MB/s (according to > > dstat). So it seems that Solr reads 30m * 60s * 50MB/s = 90GB of data > while > > starting up, which is 30% of index data size. > > > > 2) > > Solr home: 185G > > tlog: 5M > > 17 minutes to start up > > While starting up, disk read is constantly about 5MB/s (according to > > dstat). So it seems that Solr reads 17m * 60s *5MB/s = 5GB of data while > > starting up, which is about 3% of index data size. > > > > p.s. I did commit each time 1000 documents being added and did > optimization > > after all documents are added. > > > > Any ideas or suggestions would be appreciated. > > > > Thanks, > > Po-Yu > -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info
Data colocation hint to solr index
Hi, In my company, we serve car manuals for different car manufacturers with their various makes and models. Typically, the search is always done within the context of a car manufacturer, year, make and model. Is there a way in Solr to create indexes based on these criteria? Currently, the index contains all manufacturers, makes and models. This causes the index to go over a terabyte. Hence, if we could teach Solr to co-locate all the data for a particular manufacturer, make and model, that would be an ideal thing to do. I was wondering if this is possible? Regards, Jim
Re: Data colocation hint to solr index
If you're using SolrCloud then you can use composite IDs such as manufacturer!doc-id to co-locate documents belonging to a manufacturer together and at query time, you can add _route_=manufacturer! to the request to route it to the correct node. On Mon, Nov 3, 2014 at 11:00 PM, maninder batth wrote: > Hi, > In my company, we serve car manuals for different car manufacturers with > their various makes and models. Typically, the search is always done within > context of a car manufacturer, year, make and model. Is there a way in Solr > to create indexes based on this criteria? Currently, the index contains all > manufactueres, makes and models. This causes index to go over a terabyte. > Hence, if we could teach solr to co-locate all the data for a particular > manufactuere, make and model, that would be an ideal thing to do. > I was wondering if this is possible? > > Regards, > Jim > -- Regards, Shalin Shekhar Mangar.
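A sketch of what the composite-ID scheme looks like in practice, with made-up values ("acme" as the manufacturer shard key, "manual-12345" as the document id):

<add>
  <doc>
    <!-- the part before the '!' is hashed to pick the shard, so all "acme" docs land together -->
    <field name="id">acme!manual-12345</field>
    <field name="manufacturer">acme</field>
    <field name="model">roadster</field>
  </doc>
</add>

At query time, adding _route_=acme! (with the trailing '!') to the request, for example /select?q=brake+pads&_route_=acme!, restricts the query to the shard(s) holding that manufacturer's documents.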
Re: Solr slow start up (tlog is small)
Hi Yonik, After removing the suggest component, it takes only 7 seconds to start up now!!! Thank you so much. Po-Yu On Mon, Nov 3, 2014 at 11:17 AM, Yonik Seeley wrote: > Can you tell from the logs what Solr is doing during that time? > Do you have any warming queries configured? > Also see this: https://issues.apache.org/jira/browse/SOLR-6679 > (comment out suggester related stuff if you aren't using it) > > -Yonik > http://heliosearch.org - native code faceting, facet functions, > sub-facets, off-heap data > > > On Mon, Nov 3, 2014 at 11:03 AM, Po-Yu Chuang > wrote: > > Hi, > > > > I am using Solr 4.9 with Tomcat and it works fine except that the > > deployment of solr.war is too long. While deploying Solr, all webapps on > > Tomcat stop responding which is unacceptable. Most articles I found say > > that it might result from big transaction log because of uncommitted > > documents, but this is not my case. > > > > At first, the Solr data is 280G and the start up time is 30 minutes. > Then I > > set a field to stored="false" and re-index whole data. The data size > became > > 185G and the start up time reduced to 17 minutes, but it is still too > long. > > > > Here are some numbers I measured: > > > > 1) > > Solr home: 280G > > tlog: 500K > > 30 min to start up > > While starting up, disk read is constantly about 50MB/s (according to > > dstat). So it seems that Solr reads 30m * 60s * 50MB/s = 90GB of data > while > > starting up, which is 30% of index data size. > > > > 2) > > Solr home: 185G > > tlog: 5M > > 17 minutes to start up > > While starting up, disk read is constantly about 5MB/s (according to > > dstat). So it seems that Solr reads 17m * 60s *5MB/s = 5GB of data while > > starting up, which is about 3% of index data size. > > > > p.s. I did commit each time 1000 documents being added and did > optimization > > after all documents are added. > > > > Any ideas or suggestions would be appreciated. > > > > Thanks, > > Po-Yu >
Re: Solr error : sorry, no dataimport-handler defined!
Hi Alexandre, Thanks so much for your input and examples! Ok so here's what I've done so far with no luck as of yet unfortunately. Inside of solrconfig.xml I put the following: ** As you can see, I've replaced the relative paths with absolute ones. So as of now, my solr 4 server is no longer complaining about not being able to find directories and modules. So we're off to a good start! And now I can list the 'dist' directory and in my case find the jar files I'm looking for. [root@solr1:/opt/solr/collection1/conf] #ls /opt/solr/dist/ | grep dataimporthandler *solr-dataimporthandler-4.10.1.jar* *solr-dataimporthandler-extras-4.10.1.jar* So far so good. I next tried this db-data-config file in the same directory as solrconfig.xml [root@solr1:/opt/solr/collection1/conf] #cat db-data-config.xml.bak Restarted tomcat, and with this setup I wasn't getting any errors in the browser or logs and the web interface was still working. Always a good sign! So then I went down to Core Selector -> collection1 -> data import. And it was quite frustrating, cuz I was getting the same error as before! *sorry, no dataimport-handler defined!* So then I tried the exact db-data-config.xml file from your example. Knowing full well it wouldn't actually work, as I"m using a remote mysql database instead of a local hsqldb database.But at this point, my only goal was to get the data import to show up as an option. I'd tweak the db-data-config.xml file at a later point if this in fact worked! But alas, I was still getting the same result... *sorry, no dataimport-handler defined!* G.. so annoying after all that work. Anyway, I really do appreciate your kindness and help. :) I'm enclosing my solrconfig.xml and both versions of my db-data-config.xml in hopes that we can make some progress here! Thank Tim On Sun, Nov 2, 2014 at 9:50 PM, Alexandre Rafalovitch wrote: > That tutorial seems to be somewhat dodgy. You need at least one more > step of adding DIH library in solrconfig.xml: > > https://github.com/apache/lucene-solr/blob/lucene_solr_4_10_2/solr/example/example-DIH/solr/db/conf/solrconfig.xml#L75 > (I recommend using absolute path though). > > Also, you should not need to spell the full class out. See lower down > in the same class: > > https://github.com/apache/lucene-solr/blob/lucene_solr_4_10_2/solr/example/example-DIH/solr/db/conf/solrconfig.xml#L823 > > Finally, in the config file, I don't remember document element having > a name. Again, the working example can be found in the same directory: > > https://github.com/apache/lucene-solr/blob/lucene_solr_4_10_2/solr/example/example-DIH/solr/db/conf/db-data-config.xml#L3 > > Solr ships with a bunch of examples. If you are using/download > standard distribution, you could start from those until you understand > how it all hangs together. > > Regards, >Alex. > > Personal: http://www.outerthoughts.com/ and @arafalov > Solr resources and newsletter: http://www.solr-start.com/ and @solrstart > Solr popularizers community: https://www.linkedin.com/groups?gid=6713853 > > > On 2 November 2014 21:26, Tim Dunphy wrote: > > Hi Alex, > > > > > >> I thought the "" > >> and the ending span were broken email thing but they seem to be in the > >> solrconfig.xml file as well. I would start from removing those and > >> leaving just the actual definition. > > > > > > Thanks for your response! > > > > OK so I tried your suggestion of removing those span tags like so: > > > -- GPG me!! 
gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B [attached solrconfig.xml omitted; its XML markup was stripped by the mail archive]
Re: Cannot use Phrase Queries in eDisMax and filtering
The results are different because you need to set the "start" parameter to 0 instead of 50 in the first query (after filtration), with the same rows value. -- View this message in context: http://lucene.472066.n3.nabble.com/Cannot-use-Phrase-Queries-in-eDisMax-and-filtering-tp4167302p4167329.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr error : sorry, no dataimport-handler defined!
Two problems: 1) You have (span) elements in your solrconfig.xml. They just do not belong there. The original tutorial screwed up. Your element should be on the same level as the other elements in that example. 2) You also seem to have another random piece of data configuration in the solrconfig.xml. Also in the spans, so they are being ignored. But still very very wrong. Take those out all together. You should just have 3 things tying together: 1) jars loaded in the lib statement in solrconfig.xml 2) handler definition that points at your data-config file 3) data-config file itself. If you are still having troubles, I strongly recommend getting the shipped example to work and then adding your own stuff until you get that working. Then, try to create a standalone configuration. Sometimes, this is an easier approach for the first time user. Regards, Alex. P.s. I also cover that in my Solr book. A relevant example is here: https://github.com/arafalov/solr-indexing-book/tree/master/published/dihdb Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853 On 3 November 2014 13:40, Tim Dunphy wrote: > Hi Alexandre, > > Thanks so much for your input and examples! Ok so here's what I've done so > far with no luck as of yet unfortunately. > > Inside of solrconfig.xml I put the following: > > > > > > > > > > > > > > As you can see, I've replaced the relative paths with absolute ones. So as > of now, my solr 4 server is no longer complaining about not being able to > find directories and modules. So we're off to a good start! And now I can > list the 'dist' directory and in my case find the jar files I'm looking for. > > > [root@solr1:/opt/solr/collection1/conf] #ls /opt/solr/dist/ | grep > dataimporthandler > solr-dataimporthandler-4.10.1.jar > solr-dataimporthandler-extras-4.10.1.jar > > So far so good. > > I next tried this db-data-config file in the same directory as > solrconfig.xml > > [root@solr1:/opt/solr/collection1/conf] #cat db-data-config.xml.bak > > > > > url="jdbc:mysql://web1.mydomain.com:3306/jokefire" user="admin" > password="secret" batchSize="1" /> > > > > > > > > > > /> > > > > > > > > > Restarted tomcat, and with this setup I wasn't getting any errors in the > browser or logs and the web interface was still working. Always a good sign! > > So then I went down to Core Selector -> collection1 -> data import. And it > was quite frustrating, cuz I was getting the same error as before! > > sorry, no dataimport-handler defined! > > So then I tried the exact db-data-config.xml file from your example. 
> > > url="jdbc:hsqldb:./example-DIH/hsqldb/ex" user="sa" /> > > deltaQuery="select id from item where last_modified > > '${dataimporter.last_index_time}'"> > > > query="select DESCRIPTION from FEATURE where > ITEM_ID='${item.ID}'" > deltaQuery="select ITEM_ID from FEATURE where > last_modified > '${dataimporter.last_index_time}'" > parentDeltaQuery="select ID from item where > ID=${feature.ITEM_ID}"> > > > > query="select CATEGORY_ID from item_category where > ITEM_ID='${item.ID}'" > deltaQuery="select ITEM_ID, CATEGORY_ID from > item_category where last_modified > '${dataimporter.last_index_time}'" > parentDeltaQuery="select ID from item where > ID=${item_category.ITEM_ID}"> > query="select DESCRIPTION from category where ID = > '${item_category.CATEGORY_ID}'" > deltaQuery="select ID from category where > last_modified > '${dataimporter.last_index_time}'" > parentDeltaQuery="select ITEM_ID, CATEGORY_ID from > item_category where CATEGORY_ID=${category.ID}"> > > > > > > > > Knowing full well it wouldn't actually work, as I"m using a remote mysql > database instead of a local hsqldb database.But at this point, my only goal > was to get the data import to show up as an option. I'd tweak the > db-data-config.xml file at a later point if this in fact worked! > > But alas, I was still getting the same result... > > sorry, no dataimport-handler defined! > > G.. so annoying after all that work. Anyway, I really do appreciate your > kindness and help. :) I'm enclosing my solrconfig.xml and both versions of > my db-data-config.xml in hopes that we can make some progress here! > > > Thank > > > Tim > > > > > > > > > > On Sun, Nov 2, 2014 at 9:50 PM, Alexandre Rafalovitch > wrote: >> >> That tutorial seems to be
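Pulling Alexandre's three pieces together, a minimal sketch of items 1 and 2 as they would sit directly under <config> in solrconfig.xml (item 3 is the db-data-config.xml file itself). The /opt/solr/dist path is the one from Tim's setup; the MySQL JDBC driver jar also has to be on the classpath, e.g. via its own lib directive:

<lib dir="/opt/solr/dist/" regex="solr-dataimporthandler-.*\.jar" />

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
  </lst>
</requestHandler>

With this in place (and no stray span elements around it), the Dataimport screen in the admin UI should stop reporting "sorry, no dataimport-handler defined!".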
Re: Cannot use Phrase Queries in eDisMax and filtering
That was a typo in the email; I did not actually send the query with a start param of 50. I sent it with a start param of 0, I just verified. Sorry for the mistake. On Mon, Nov 3, 2014 at 1:41 PM, Ramzi Alqrainy wrote: > The results are different, because you need to set "start" parameter 0 > instead of 50 in the first query (after filtration ) with same rows value > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Cannot-use-Phrase-Queries-in-eDisMax-and-filtering-tp4167302p4167329.html > Sent from the Solr - User mailing list archive at Nabble.com. >
Re: Consul instead of ZooKeeper anyone?
Thanks Erick, after looking further into Solr's source code, I see that it's married to the ZK libraries and it won't be possible to extend the existing code without diverging from the trunk. At the same time, I don't see any reason for the lack of abstraction in the cloud-related code of Solr and SolrJ. As far as I can see, Consul provides all that SolrCloud needs, and so if the cloud code used some more abstraction, the ZK bindings could be substituted with another library. I am willing to implement this functionality and the abstraction, but at the same time, I don't want to maintain my own branch of Solr because of this integration. Do you think it would be possible to add an abstraction layer to the Solr source code in the near future? I think Consul has all the features that SolrCloud needs and what's especially attractive about Consul is that its memory footprint is 100X smaller than ZK's. Mainly though, we are considering Consul as a main service locator for a bunch of other moving parts within Zimbra, so being able to avoid deploying ZK just for SolrCloud would save a bunch of $$ for large customers. Thanks, Greg - Original Message - From: "Erick Erickson" To: solr-user@lucene.apache.org Sent: Friday, October 31, 2014 5:15:09 PM Subject: Re: Consul instead of ZooKeeper anyone? Not that I know of, but look before you leap. I took a quick look at Consul and it really doesn't look like any kind of drop-in replacement. Also, the Zookeeper usage in SolrCloud isn't really pluggable AFAIK, so there'll be lots of places in the Solr code that need to be reworked etc., especially in the realm of collections and sharding. The Collections API will be challenging to port over I think. Not to mention SolrJ and CloudSolrServer for clients who want to interact with SolrCloud through Java. Not saying it won't work, I just suspect that getting it done would be a big job, and thereafter keeping those changes in sync with the changing SolrCloud code base would chew up a lots of time. So if I were putting my Product Manager hat on I'd ask "is the benefit worth the effort?". All that said, go for it if you've a mind to! Best, Erick On Fri, Oct 31, 2014 at 4:08 PM, Greg Solovyev wrote: > I am investigating a project to make SolrCloud run on Consul instead of > ZooKeeper. So far, my research revealed no such efforts, but I wanted to > check with this list to make sure I am not going to be reinventing the wheel. > Have anyone attempted using Consul instead of ZK to coordinate SolrCloud > nodes? > > Thanks, > Greg
Re: Cannot use Phrase Queries in eDisMax and filtering
I tried to reproduce your case on my machine with the queries below, but everything worked fine for me. I just want to ask you one question: what is the field type of the "tag" field? q=bmw& fl=score,*& wt=json& fq=city_id:59& qt=/query& defType=edismax& pf=title^15%20discription^5& pf3=1& pf2=1& ps=1& qroup=true& group.field=member_id& group.limit=10& sort=score desc& group.ngroups=true q=bmw& fl=score,*& wt=json& fq=city_id:59& qt=/query& defType=edismax& pf=title^15%20discription^5& pf3=1& pf2=1& ps=1& qroup=true& group.field=member_id& group.limit=10& group.ngroups=true& sort=score desc& fq=category_id:1777 -- View this message in context: http://lucene.472066.n3.nabble.com/Cannot-use-Phrase-Queries-in-eDisMax-and-filtering-tp4167302p4167338.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Cannot use Phrase Queries in eDisMax and filtering
It is of type string. On Mon, Nov 3, 2014 at 2:29 PM, Ramzi Alqrainy wrote: > I tried to produce your case in my machine with below queries, but > everything > worked fine with me. I just want to ask you a question what is the field > type of "tag" field ? > > q=bmw& > fl=score,*& > wt=json& > fq=city_id:59& > qt=/query& > defType=edismax& > pf=title^15%20discription^5& > pf3=1& > pf2=1& > ps=1& > qroup=true& > group.field=member_id& > group.limit=10& > sort=score desc& > group.ngroups=true > > > > > q=bmw& > fl=score,*& > wt=json& > fq=city_id:59& > qt=/query& > defType=edismax& > pf=title^15%20discription^5& > pf3=1& > pf2=1& > ps=1& > qroup=true& > group.field=member_id& > group.limit=10& > group.ngroups=true& > sort=score desc& > fq=category_id:1777 > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Cannot-use-Phrase-Queries-in-eDisMax-and-filtering-tp4167302p4167338.html > Sent from the Solr - User mailing list archive at Nabble.com. >
Re: Data colocation hint to solr index
Thank you for recommendation on composite IDs. We currently use solr 3.x. After reading on composite ids, it sounds like a feature of solr 4.x. Is something similar available in solr 3.x also? Also, we do not use solrCloud. On Mon, Nov 3, 2014 at 12:41 PM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: > If you're using SolrCloud then you can use composite IDs such as > !doc-id to co-locate documents belonging to a manufacturer together > and at query time, you can add _route_=! to the request to route it > to the correct node. > > On Mon, Nov 3, 2014 at 11:00 PM, maninder batth > wrote: > > > Hi, > > In my company, we serve car manuals for different car manufacturers with > > their various makes and models. Typically, the search is always done > within > > context of a car manufacturer, year, make and model. Is there a way in > Solr > > to create indexes based on this criteria? Currently, the index contains > all > > manufactueres, makes and models. This causes index to go over a terabyte. > > Hence, if we could teach solr to co-locate all the data for a particular > > manufactuere, make and model, that would be an ideal thing to do. > > I was wondering if this is possible? > > > > Regards, > > Jim > > > > > > -- > Regards, > Shalin Shekhar Mangar. >
Which Solr releases contain SOLR-4470 (Security for inter-solr-node requests)
I am currently working on SolrCloud and its related security configurations for securing Solr web applications using the HTTP Basic Authentication mechanism. Among the Solr nodes inside the SolrCloud clustered environment, there seem to be some inter-solr-node communication issues caused by the security configuration, namely HTTP authentication errors. Based on my research, the patch SOLR-4470 (Security for inter-solr-node requests) would be ideal for resolving these issues (please refer to: https://wiki.apache.org/solr/SolrSecurity#Security_for_inter-solr-node_requests). However, it seems to me that these security patches are additions outside the current Solr source codebase, which don't seem to be available in the recent Solr releases. If someone could point out which Solr releases, or which jars from an online repository, contain this patch, it would be appreciated very much. Jerry This e-mail is confidential. If you are not the intended recipient, you must not disclose or use the information contained in it. If you have received this e-mail in error, please tell us immediately by return e-mail and delete the document. No recipient may use the information in this e-mail in violation of any civil or criminal statute. Sentry disclaims all liability for any unauthorized uses of this e-mail or its contents. Sentry accepts no liability or responsibility for any damage caused by any virus transmitted with this e-mail.
Re: Which Solr releases contain SOLR-4470 (Security for inter-solr-node requests)
You find the answer to such questions by looking at the state of the JIRA issue https://issues.apache.org/jira/browse/SOLR-4470 Staus: Open Fix version: Trunk Which means that this feature is not included in any released Solr version (yet). -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com > 3. nov. 2014 kl. 22.39 skrev Yuan Jerry : > > I am currently working on SolrCloud and its related security configurations > for securing Solr web applications using HTTP Basic Authentication mechanism. > Among the Solr nodes inside the SolrCloud clustered env, there seem to be > existing some inter-solr-node communication issues due to the security > configurations, which are the HTTP Authentication errors. Based on my > research, the patch SOLR-4470 (Security for inter-solr-node requests) would > be ideal for resolving these issues (please refer to the address: > https://wiki.apache.org/solr/SolrSecurity#Security_for_inter-solr-node_requests). > However, it seems to me that these security patches are out-of-box additions > to the current Solr source codebase, which don't seem to be available in the > recent Solr releases. > > If someone could point out which Solr releases or the jars from some online > repositories that contain this patch, it would be appreciated very much. > > Jerry > > > This e-mail is confidential. If you are not the intended recipient, you must > not disclose or use the information contained in it. If you have received > this e-mail in error, please tell us immediately by return e-mail and delete > the document. No recipient may use the information in this e-mail in > violation of any civil or criminal statute. Sentry disclaims all liability > for any unauthorized uses of this e-mail or its contents. Sentry accepts no > liability or responsibility for any damage caused by any virus transmitted > with this e-mail.
Re: Which Solr releases contain SOLR-4470 (Security for inter-solr-node requests)
: I am currently working on SolrCloud and its related security : configurations for securing Solr web applications using HTTP Basic : Authentication mechanism. Among the Solr nodes inside the SolrCloud : clustered env, there seem to be existing some inter-solr-node : communication issues due to the security configurations, which are the : HTTP Authentication errors. Based on my research, the patch SOLR-4470

In my opinion, your best bet to "secure" Solr is to avoid any and all involvement of Basic Auth and instead use SSL with client certificates...

https://cwiki.apache.org/confluence/display/solr/Enabling+SSL

1) Already supported in Solr today - no patches needed.

2) Eliminates the complexity of needing a proxy in front of Solr to handle the client auth, so that the Solr nodes can talk to each other without auth -- and/or having Solr nodes "forward" the client auth around. Instead, each Solr node authenticates the client using the client's cert, and each node authenticates itself for the inter-node requests using its own cert.

3) Much more secure than Basic-Auth headers, which could be sniffed by a man-in-the-middle. (You could use SSL + Basic Auth - but if you are going to enable SSL anyway, why bother with Basic Auth? Just configure the client certs.)

-Hoss
http://www.lucidworks.com/
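As a rough sketch of what that setup involves (the hostname, passwords and paths below are placeholders, not values from this thread; the cwiki page above remains the authoritative guide): each node gets a keystore generated with the JDK keytool, and the JVM is pointed at it through the standard JSSE system properties so that outgoing inter-node requests authenticate with the node's certificate instead of Basic-Auth headers.

    # generate a key pair / keystore for a node (placeholder values)
    keytool -genkeypair -alias solr-ssl -keyalg RSA -keysize 2048 \
        -validity 9999 -keystore solr-ssl.keystore.jks \
        -storepass secret -keypass secret \
        -dname "CN=solr1.example.com, OU=IT, O=Example, L=City, ST=State, C=US"

    # start Solr (and any SolrJ client) with the keystore/truststore visible to
    # its internal HTTP client; the Jetty listener itself (including
    # needClientAuth) is configured separately in etc/jetty.xml
    java -Djavax.net.ssl.keyStore=/path/to/solr-ssl.keystore.jks \
         -Djavax.net.ssl.keyStorePassword=secret \
         -Djavax.net.ssl.trustStore=/path/to/solr-ssl.keystore.jks \
         -Djavax.net.ssl.trustStorePassword=secret \
         -jar start.jar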
custom sorting of search result
Hello, we need to order Solr search results according to specific rules. I will explain with an example. Let's say Solr returns 1000 results for the query "sport". These results must be divided into three buckets according to rules that come from a database. Then one doc must be taken from each bucket in turn and appended to the results until all buckets are empty. One approach was to modify/override the Solr code where it gets the results, sorts them and returns #rows elements. However, from the code in the scoreAll function in Weight.java we see that docs carry only the internal document id and nothing else. We need the unique Solr document id in order to match documents against the custom scoring rules. We also see that Lucene code hands those doc ids to the scoreAll function, and for now we do not want to modify Lucene code; we would prefer to solve this as a Solr plugin. Any ideas are welcome. Thanks. Alex.
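Until a plugin approach is worked out, the bucket rule described above can also be applied client-side after the query returns. A minimal sketch in Java (how documents get assigned to the three buckets is left out, since that part comes from the database rules):

    import java.util.ArrayList;
    import java.util.Deque;
    import java.util.List;

    public class BucketInterleaver {
        /**
         * Round-robin over the buckets: take one document from each bucket in
         * turn until every bucket is empty, preserving the order inside each
         * bucket. Works for any number of buckets, not just three.
         */
        public static <T> List<T> interleave(List<Deque<T>> buckets) {
            List<T> ordered = new ArrayList<>();
            boolean tookSomething = true;
            while (tookSomething) {
                tookSomething = false;
                for (Deque<T> bucket : buckets) {
                    T doc = bucket.pollFirst();   // null when the bucket is empty
                    if (doc != null) {
                        ordered.add(doc);
                        tookSomething = true;
                    }
                }
            }
            return ordered;
        }
    }

The trade-off of doing this on the client is that paging and facet counts still reflect Solr's own ordering, so it fits best when the full result window is fetched and re-ordered in one go.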
Re: Consul instead of ZooKeeper anyone?
bq: Do you think it would be possible to add an abstraction layer to Solr source code in near future? I strongly doubt it. As you've already noted, this is a large amount of work. Without some super-compelling advantage I just don't see the interest. bq: to avoid deploying ZK just for SolrCloud would save a bunch of $$ for large customers How so? It's free. Making this change would, IMO, require a compelling story to generate much enthusiasm. So far I haven't seen that story, and Jürgen and Walter raise valid points that haven't been addressed. I suspect you're significantly underestimating the effort to get this stable in the SolrCloud world as well. I don't really want to be such a wet blanket, but you're asking about a very significant amount of work from a bunch of people, all of whom have lots of things on their plate. So without a _very_ good reason, I think it's unlikely to generate much interest. Best, Erick On Mon, Nov 3, 2014 at 11:17 AM, Greg Solovyev wrote: > Thanks Erick, > after looking further into Solr's source code, I see that it's married to ZK > libraries and it won't be possible to extend existing code without diverting > from the trunk. At the same time, I don't see any reason for lack of > abstraction in cloud-related code of Solr and SolrJ. As far as I can see > Consul provides all that SolrCloud needs and so if cloud code was using some > more abstraction, ZK bindings could be substituted with another library. I am > willing to implement a this functionality and the abstraction, but at the > same time, I don't want to maintain my own branch of Solr because of this > integration. Do you think it would be possible to add an abstraction layer to > Solr source code in near future? > > I think Consul has all the features that SolrCloud needs and what's > especially attractive about Consul is that it's memory footprint is 100X > smaller than ZK. Mainly though, we are considering Consul as a main service > locator for a bunch of other moving parts within Zimbra, so being able to > avoid deploying ZK just for SolrCloud would save a bunch of $$ for large > customers. > > Thanks, > Greg > > - Original Message - > From: "Erick Erickson" > To: solr-user@lucene.apache.org > Sent: Friday, October 31, 2014 5:15:09 PM > Subject: Re: Consul instead of ZooKeeper anyone? > > Not that I know of, but look before you leap. I took a quick look at > Consul and it really doesn't look like any kind of drop-in replacement. > Also, the Zookeeper usage in SolrCloud isn't really pluggable > AFAIK, so there'll be lots of places in the Solr code that need to be > reworked etc., especially in the realm of collections and sharding. > > The Collections API will be challenging to port over I think. > > Not to mention SolrJ and CloudSolrServer for clients who want to interact > with SolrCloud through Java. > > Not saying it won't work, I just suspect that getting it done would be > a big job, and thereafter keeping those changes in sync with the > changing SolrCloud code base would chew up a lots of time. So if > I were putting my Product Manager hat on I'd ask "is the benefit > worth the effort?". > > All that said, go for it if you've a mind to! > > Best, > Erick > > On Fri, Oct 31, 2014 at 4:08 PM, Greg Solovyev wrote: >> I am investigating a project to make SolrCloud run on Consul instead of >> ZooKeeper. So far, my research revealed no such efforts, but I wanted to >> check with this list to make sure I am not going to be reinventing the >> wheel. 
Have anyone attempted using Consul instead of ZK to coordinate >> SolrCloud nodes? >> >> Thanks, >> Greg
Re: Data colocation hint to solr index
You have a TB-scale index and you're not using SolrCloud? Are you using master/slave or otherwise splitting up your index? Because if you're not, then please ship me some of your hardware because it must be awesome. Which is a tongue-in-cheek way of saying there must be lots of details you aren't telling us that would help us help you. Best, Erick On Mon, Nov 3, 2014 at 11:45 AM, maninder batth wrote: > Thank you for recommendation on composite IDs. We currently use solr 3.x. > After reading on composite ids, it sounds like a feature of solr 4.x. Is > something similar available in solr 3.x also? Also, we do not use solrCloud. > > On Mon, Nov 3, 2014 at 12:41 PM, Shalin Shekhar Mangar < > shalinman...@gmail.com> wrote: > >> If you're using SolrCloud then you can use composite IDs such as >> !doc-id to co-locate documents belonging to a manufacturer together >> and at query time, you can add _route_=! to the request to route it >> to the correct node. >> >> On Mon, Nov 3, 2014 at 11:00 PM, maninder batth >> wrote: >> >> > Hi, >> > In my company, we serve car manuals for different car manufacturers with >> > their various makes and models. Typically, the search is always done >> within >> > context of a car manufacturer, year, make and model. Is there a way in >> Solr >> > to create indexes based on this criteria? Currently, the index contains >> all >> > manufactueres, makes and models. This causes index to go over a terabyte. >> > Hence, if we could teach solr to co-locate all the data for a particular >> > manufactuere, make and model, that would be an ideal thing to do. >> > I was wondering if this is possible? >> > >> > Regards, >> > Jim >> > >> >> >> >> -- >> Regards, >> Shalin Shekhar Mangar. >>
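For readers who do move to 4.x/SolrCloud, the compositeId routing from the quoted reply looks roughly like this in practice (the collection name and routing key are invented for illustration): prefix every document id with the routing key at index time, then pass the same prefix in _route_ at query time so only the shard holding that key is searched.

    id: FORD-F150-2012!manual-page-4711    (all ids sharing the prefix hash to the same shard)

    http://localhost:8983/solr/manuals/select?q=brake+pads&_route_=FORD-F150-2012!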
RE: Missing Records
So I jumped back on this. I have not been using the optimize option on this new set of tests. If I run the full index on the leader I seem to get all of the items in the database minus 3 that have a missing field. Indexing completed. Added/Updated: 903,990 documents. Deleted 0 documents. (Duration: 25m 11s) Requests: 1 (0/s), Fetched: 903,993 (598/s), Skipped: 0, Processed: 903,990 Last Modified:2 minutes ago Num Docs:903990 Max Doc:903990 Heap Memory Usage:2625744 Deleted Docs:0 Version:3249 Segment Count:7 Optimized: Current: If I run it on the other node I get: Indexing completed. Added/Updated: 903,993 documents. Deleted 0 documents. (Duration: 27m 08s) Requests: 1 (0/s), Fetched: 903,993 (555/s), Skipped: 0, Processed: 903,993 (555/s) Last Modified:about a minute ago Num Docs:897791 Max Doc:897791 Heap Memory Usage:2621072 Deleted Docs:0 Version:3285 Segment Count:7 Optimized: Current: Any ideas? If there is any more info that is needed let me know. AJ -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Friday, October 31, 2014 1:44 PM To: solr-user@lucene.apache.org Subject: Re: Missing Records Sorry to say this, but I don't think the numDocs/maxDoc numbers are telling you anything. because it looks like you've optimized which purges any data associated with deleted docs, including the internal IDs which are the numDocs/maxDocs figures. So if there were deletions, we can't see any evidence of same. Siih. On Fri, Oct 31, 2014 at 9:56 AM, AJ Lemke wrote: > I have run some more tests so the numbers have changed a bit. > > Index Results done on Node 1: > Indexing completed. Added/Updated: 903,993 documents. Deleted 0 > documents. (Duration: 31m 47s) > Requests: 1 (0/s), Fetched: 903,993 (474/s), Skipped: 0, Processed: > 903,993 > > Node 1: > Last Modified: 44 minutes ago > Num Docs: 824216 > Max Doc: 824216 > Heap Memory Usage: -1 > Deleted Docs: 0 > Version: 1051 > Segment Count: 1 > Optimized: checked > Current: checked > > Node 2: > Last Modified: 44 minutes ago > Num Docs: 824216 > Max Doc: 824216 > Heap Memory Usage: -1 > Deleted Docs: 0 > Version: 1051 > Segment Count: 1 > Optimized: checked > Current: checked > > Search results are the same as the doc numbers above. > > Logs only have one instance of an error: > > ERROR - 2014-10-31 10:47:12.867; > org.apache.solr.update.StreamingSolrServers$1; error > org.apache.solr.common.SolrException: Bad Request > > > > request: > http://192.168.20.57:7574/solr/inventory_shard1_replica1/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.20.57%3A8983%2Fsolr%2Finventory_shard1_replica2%2F&wt=javabin&version=2 > at > org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:241) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > > Some info that may be of help > This is on my local vm using jetty with the embedded zookeeper. 
> Commands to start cloud: > > java -DzkRun -jar start.jar > java -Djetty.port=7574 -DzkRun -DzkHost=localhost:9983 -jar start.jar > > sh zkcli.sh -zkhost localhost:9983 -cmd upconfig -confdir > ~/development/configs/inventory/ -confname config_ inventory sh > zkcli.sh -zkhost localhost:9983 -cmd linkconfig -collection inventory > -confname config_ inventory > > curl > "http://localhost:8983/solr/admin/collections?action=CREATE&name=inventory&numShards=1&replicationFactor=2&maxShardsPerNode=4"; > curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name= > inventory " > > AJ > > > -Original Message- > From: Erick Erickson [mailto:erickerick...@gmail.com] > Sent: Friday, October 31, 2014 9:49 AM > To: solr-user@lucene.apache.org > Subject: Re: Missing Records > > OK, that is puzzling. > > bq: If there were duplicates only one of the duplicates should be removed and > I still should be able to search for the ID and find one correct? > > Correct. > > Your bad request error is puzzling, you may be on to something there. > What it looks like is that somehow some of the documents you're > sending to Solr aren't getting indexed, either being dropped through > the network or perhaps have invalid fields, field formats (i.e. a date > in the wrong format, > whatever) or some such. When you complete the run, what are the maxDoc and > numDocs numbers on one of the nodes? > > What else do you see in the logs? They're pretty big after that many adds, > but maybe you can grep for ERROR and see something interesting like stack > traces. Or even "org.apache.solr". This latter will give you some false hits, > but at least it's better than paging through a huge log file > > Personally, in this kind of situation I sometimes use SolrJ to do my indexing > rather than DIH, I find it easier to debu
Re: Solr error : sorry, no dataimport-handler defined!
Hi Alexandre, OK, some good progress was made based on this advice. Thanks! I think we're in the home stretch with the data import. Not there yet, but hopefully close.

> Two problems:
> 1) You have (span) elements in your solrconfig.xml. They just do not belong there. The original tutorial screwed up. Your element should be on the same level as the other elements in that example.
> 2) You also seem to have another random piece of data configuration in the solrconfig.xml. Also in the spans, so they are being ignored. But still very very wrong. Take those out all together.
> You should just have 3 things tying together:
> 1) jars loaded in the lib statement in solrconfig.xml
> 2) handler definition that points at your data-config file
> 3) data-config file itself.

OK, so here I'm loading the libs in solrconfig.xml. Verified the files are there:

[root@solr1:/opt/solr/collection1/conf] #ls -l /opt/solr/lib/ | grep mysql
-rw-r--r--. 1 root root 959987 Nov 3 19:17 mysql-connector-java-5.1.33-bin.jar
[root@solr1:/opt/solr/collection1/conf] #ls -l /opt/solr/dist/ | grep dataimport
-rw-r--r--. 1 tomcat tomcat 219261 Sep 24 06:07 solr-dataimporthandler-4.10.1.jar
-rw-r--r--. 1 tomcat tomcat 37443 Sep 24 06:07 solr-dataimporthandler-extras-4.10.1.jar

Added the handler entry to solrconfig.xml (without the spans), pointing it at db-data-config.xml. Then added the db-data-config.xml file in the same directory as solrconfig.xml.

Verified I could connect to the DB with the info supplied in the data config file:

[root@solr1:/opt/solr/collection1/conf] #mysql -uadmin -p -h web1.mydomain.com jokefire
Enter password:
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MySQL connection id is 8628551
Server version: 5.5.39 MySQL Community Server (GPL) by Remi
Copyright (c) 2000, 2014, Oracle, Monty Program Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MySQL [jokefire]>

Bounced Tomcat, and there it was!! I now had a web interface for the data import feature! Thank you for helping to get me this far!

However, at this stage the import, even though it says it has been started, just kind of sits there, and no records are actually imported. I took a look at the logs and found these entries:

11/3/2014, 7:21:03 PM  WARN  SimplePropertiesWriter  Unable to read: dataimport.properties
11/3/2014, 7:21:04 PM  WARN  SimplePropertiesWriter  Unable to read: dataimport.properties

It looks as if something is still failing. But I googled that error and found that the answer was to make the conf directory writable. I'll experiment with tightening up permissions, but at that point I just wanted to see if that would solve this, so I made it world writable with chmod 777. And lo and behold, an import happened!!

Last Update: 19:40:26
Indexing completed. Added/Updated: 4 documents. Deleted 0 documents. (Duration: 01s)
Requests: 1 (1/s), Fetched: 4 (4/s), Skipped: 0, Processed: 4 (4/s)
Started: 2 minutes ago

Very cool. Finally I can start to use Solr with some real data. Not much in this database yet, but that's OK, I'll add some data and have a look. Hopefully this will be the database for a live app someday, making this little exercise in Solr indexing useful! Thanks again!
Tim > If you are still having troubles, I strongly recommend getting the > shipped example to work and then adding your own stuff until you get > that working. Then, try to create a standalone configuration. > Sometimes, this is an easier approach for the first time user. > Regards, >Alex. > P.s. I also cover that in my Solr book. A relevant example is here: > https://github.com/arafalov/solr-indexing-book/tree/master/published/dihdb > Personal: http://www.outerthoughts.com/ and @arafalov > Solr resources and newsletter: http://www.solr-start.com/ and @solrstart > Solr popularizers community: https://www.linkedin.com/groups?gid=6713853 On Mon, Nov 3, 2014 at 1:50 PM, Alexandre Rafalovitch wrote: > Two problems: > 1) You have (span) elements in your solrconfig.xml. They just > do not belong there. The original tutorial screwed up. Your element > should be on the same level as the other elements in that example. > 2) You also seem to have another random piece of data configuration in > the solrconfig.xml. Also in the spans, so they are being ignored. But > still very very wrong. Take those out all together. > > You should just have 3 things tying together: > 1) jars loaded in the
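Since the XML snippets did not survive in the message above, a minimal version of the three pieces Alexandre lists might look like the sketch below. The jar paths and database URL come from the thread; the handler name, table and column names are assumptions.

    <!-- solrconfig.xml: load the DIH and JDBC driver jars -->
    <lib dir="/opt/solr/dist/" regex="solr-dataimporthandler-.*\.jar" />
    <lib dir="/opt/solr/lib/" regex="mysql-connector-java-.*\.jar" />

    <!-- solrconfig.xml: the handler definition, pointing at the data config -->
    <requestHandler name="/dataimport"
                    class="org.apache.solr.handler.dataimport.DataImportHandler">
      <lst name="defaults">
        <str name="config">db-data-config.xml</str>
      </lst>
    </requestHandler>

    <!-- db-data-config.xml: a minimal JDBC entity (table/column names invented) -->
    <dataConfig>
      <dataSource type="JdbcDataSource"
                  driver="com.mysql.jdbc.Driver"
                  url="jdbc:mysql://web1.mydomain.com/jokefire"
                  user="admin" password="..."/>
      <document>
        <entity name="joke" query="SELECT id, title, joke_text FROM jokes">
          <field column="id" name="id"/>
          <field column="title" name="title"/>
          <field column="joke_text" name="text"/>
        </entity>
      </document>
    </dataConfig>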
Faceting return value of a function query?
Hi, I'm new to Solr, and I'm having a problem with faceting. I would really appreciate it if you could help :) I have a set of documents in JSON format, which I post to my Solr core using the post.jar tool. Each document contains two fields, namely "startDate" and "endDate", both of which are of type "date". Conceptually, I would like to have a third field "timeSpan" that is automatically generated from the return value of the function query "ms(endDate, startDate)", and do a range facet on it, i.e. compute the distribution of "timeSpan" over either all of the documents or a filtered subset of them. I have tried to find ways of both directly faceting on the function's return values and automatically generating the "timeSpan" field during indexing, but without luck yet. Suggestions are greatly appreciated! Best, Yubing
Re: Faceting return value of a function query?
Wouldn't it be easiest to compute the span at index time? Then it's very straight-forward. Best, Erick On Mon, Nov 3, 2014 at 8:18 PM, Yubing (Tom) Dong 董玉冰 wrote: > Hi, > > I'm new to Solr, and I'm having a problem with faceting. I would really > appreciate it if you could help :) > > I have a set of documents in JSON format, which I could post to my Solr > core using the post.jar tool. Each document contains two fields, namely > "startDate" and "endDate", both of which are of type "date". > > Conceptually, I would like to have a third field "timeSpan" that is > automatically generated from the return value of function query > "ms(endDate, startDate)", and do range facet on it, i.e. compute the > distribution of "timeSpan", among either all of or a filtered subset of the > documents. > > I have tried to find ways of both directly faceting the function return > values and automatically generate the "timeSpan" field during indexing, but > without luck yet. > > Suggestions are greatly appreciated! > > Best, > Yubing
Re: Faceting return value of a function query?
Hi Erik, Thanks for the reply! Do you mean parse and modify the documents before sending them to Solr? Cheers, Yubing On Mon, Nov 3, 2014 at 8:48 PM, Erick Erickson wrote: > Wouldn't it be easiest to compute the span at index time? Then it's > very straight-forward. > > Best, > Erick > > On Mon, Nov 3, 2014 at 8:18 PM, Yubing (Tom) Dong 董玉冰 > wrote: > > Hi, > > > > I'm new to Solr, and I'm having a problem with faceting. I would really > > appreciate it if you could help :) > > > > I have a set of documents in JSON format, which I could post to my Solr > > core using the post.jar tool. Each document contains two fields, namely > > "startDate" and "endDate", both of which are of type "date". > > > > Conceptually, I would like to have a third field "timeSpan" that is > > automatically generated from the return value of function query > > "ms(endDate, startDate)", and do range facet on it, i.e. compute the > > distribution of "timeSpan", among either all of or a filtered subset of > the > > documents. > > > > I have tried to find ways of both directly faceting the function return > > values and automatically generate the "timeSpan" field during indexing, > but > > without luck yet. > > > > Suggestions are greatly appreciated! > > > > Best, > > Yubing >
Re: Missing log entries with log4j log rotation
On 11/1/2014 11:45 AM, Shawn Heisey wrote: > There appear to be large blocks of time missing in my solr logfiles > created with slf4j->log4j and rotated using the log4j config: > > End of solr.log.1: INFO - 2014-10-31 12:52:25.073; > Start of solr.log: INFO - 2014-11-01 02:27:27.404; > > End of solr.log.2: INFO - 2014-10-29 06:30:32.661; > Start of solr.log.1: INFO - 2014-10-30 07:01:34.241; The more I thought about this problem, the more convinced I became that the issue had to be in log4j, since log4j is responsible for writing and rotating the logs. I posted the question on the log4j mailing list, and the response basically said "If this is a bug in log4j 1.x, it's not going to get fixed. Upgrade to 2.x." We do something similar ourselves when a new major version gets minted, so I can't really complain about that. I was able to get information on what jar changes would be required for such an upgrade, but from what I can tell, log4j2 does not support a property-based configuration file, it must be XML. There are no conversion tools for version 2... the only conversion tool I found would convert log4j.properties into an XML config for version 1.x, which looks very different from a version 2 XML config. There do not appear to be any examples of a RollingFileAppender based config for log4j2. It won't be a relevant upgrade test if I can't configure the new version in the same way as the old version, and because I can't find any examples to work from, I'm going to have to experiment with the config. If we choose to upgrade the project to log4j 2.1, upgrading the logging might prove tricky for some end users. If we do the upgrade right, they would have the option of continuing to use their existing logging setup, which might be losing logs like mine. Thanks, Shawn
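For anyone attempting the same experiment, a log4j2 XML configuration mirroring Solr's stock log4j.properties (the same log pattern with a size-based RollingFileAppender) might look roughly like the sketch below; the file names, rollover size and backup count are guesses at the equivalent settings, not a tested drop-in.

    <?xml version="1.0" encoding="UTF-8"?>
    <Configuration>
      <Appenders>
        <RollingFile name="file"
                     fileName="logs/solr.log"
                     filePattern="logs/solr.log.%i">
          <PatternLayout pattern="%-5p - %d{yyyy-MM-dd HH:mm:ss.SSS}; %C; %m%n"/>
          <Policies>
            <SizeBasedTriggeringPolicy size="4 MB"/>
          </Policies>
          <DefaultRolloverStrategy max="9"/>
        </RollingFile>
      </Appenders>
      <Loggers>
        <Root level="info">
          <AppenderRef ref="file"/>
        </Root>
      </Loggers>
    </Configuration>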
Re: Faceting return value of a function query?
Yep. It's almost always easier and faster if you can pre-compute as much as possible during indexing time. It'll take longer to index of course, but the ratio of writing to the index to searching is usually hugely in favor of doing the work during indexing. Best, Erick On Mon, Nov 3, 2014 at 8:52 PM, Yubing (Tom) Dong 董玉冰 wrote: > Hi Erik, > > Thanks for the reply! Do you mean parse and modify the documents before > sending them to Solr? > > Cheers, > Yubing > > On Mon, Nov 3, 2014 at 8:48 PM, Erick Erickson > wrote: > >> Wouldn't it be easiest to compute the span at index time? Then it's >> very straight-forward. >> >> Best, >> Erick >> >> On Mon, Nov 3, 2014 at 8:18 PM, Yubing (Tom) Dong 董玉冰 >> wrote: >> > Hi, >> > >> > I'm new to Solr, and I'm having a problem with faceting. I would really >> > appreciate it if you could help :) >> > >> > I have a set of documents in JSON format, which I could post to my Solr >> > core using the post.jar tool. Each document contains two fields, namely >> > "startDate" and "endDate", both of which are of type "date". >> > >> > Conceptually, I would like to have a third field "timeSpan" that is >> > automatically generated from the return value of function query >> > "ms(endDate, startDate)", and do range facet on it, i.e. compute the >> > distribution of "timeSpan", among either all of or a filtered subset of >> the >> > documents. >> > >> > I have tried to find ways of both directly faceting the function return >> > values and automatically generate the "timeSpan" field during indexing, >> but >> > without luck yet. >> > >> > Suggestions are greatly appreciated! >> > >> > Best, >> > Yubing >>
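A minimal sketch of that index-time pre-computation, assuming the JSON documents are massaged before being handed to post.jar and that the schema gains a long-typed "timeSpan" field (the field and class names here are assumptions):

    import java.time.Duration;
    import java.time.Instant;

    public class TimeSpanCalc {
        /** Solr date fields are ISO-8601, e.g. "2014-11-03T08:15:00Z". */
        public static long timeSpanMs(String startDate, String endDate) {
            return Duration.between(Instant.parse(startDate),
                                    Instant.parse(endDate)).toMillis();
        }
    }

With the field in place, the distribution falls out of a plain range facet, e.g. facet=true&facet.range=timeSpan&facet.range.start=0&facet.range.end=31536000000&facet.range.gap=86400000 for one-day buckets over a year, and it composes with any fq you apply.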
Admin UI Schema Browser screen and ReverseStringFilterFactory
Hello, I just noticed this one and curious what's causing this (desirable I guess) behavior. I have a chain with ReverseStringFilterFactory in it (both index and query). So, the tokens are reversed from the input. But when I look at the Schema Browser screen and it loads the tokens, it seems to show an uninverted form somehow. Because when I click on the token value it searches for that value that it shows me and finds correct records. But the analysis screen shows the tokens being reversed (also correctly). So, the only explanation I can think of is that the Schema Browser (luke) is somehow uninverting the tokens for the presentation. But where is that defined and what other edge-cases are in there? tl;dr: everything works, but WHY? Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
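For concreteness, a chain like the one described might be declared along these lines (a sketch, not the actual schema from the message):

    <fieldType name="text_reversed" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ReverseStringFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ReverseStringFilterFactory"/>
      </analyzer>
    </fieldType>

With such a chain the indexed terms really are stored reversed, which is what makes the Schema Browser display worth puzzling over.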
Re: Data colocation hint to solr index
On 11/3/2014 12:45 PM, maninder batth wrote: > Thank you for recommendation on composite IDs. We currently use solr 3.x. > After reading on composite ids, it sounds like a feature of solr 4.x. Is > something similar available in solr 3.x also? Also, we do not use solrCloud. The compositeId router is part of SolrCloud, which you will only find in Solr 4.0 and newer. On 3.x, you must normally handle all shard routing outside of Solr. It might be possible to configure the dataimport handler so that its JDBC query selects only documents that belong on that shard, if you happen to be using the dataimport handler already. Thanks, Shawn
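On 3.x with manually managed shards, Shawn's dataimport idea boils down to giving each shard's data-config a WHERE clause that selects only the manufacturers that shard should hold; a sketch with invented table and column names:

    <!-- db-data-config.xml on the shard that holds, for example, FORD and GM -->
    <entity name="manual"
            query="SELECT id, manufacturer, year, make, model, body
                   FROM manuals
                   WHERE manufacturer IN ('FORD', 'GM')">
      ...
    </entity>

The application then fans each query out only to the shard(s) that can contain the requested manufacturer.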
Re: Faceting return value of a function query?
I see. Thank you! :-) Sent from my Android phone On Nov 3, 2014 9:35 PM, "Erick Erickson" wrote: > Yep. It's almost always easier and faster if you can pre-compute as > much as possible during indexing time. It'll take longer to index of > course, but the ratio of writing to the index to searching is usually > hugely in favor of doing the work during indexing. > > Best, > Erick > > On Mon, Nov 3, 2014 at 8:52 PM, Yubing (Tom) Dong 董玉冰 > wrote: > > Hi Erik, > > > > Thanks for the reply! Do you mean parse and modify the documents before > > sending them to Solr? > > > > Cheers, > > Yubing > > > > On Mon, Nov 3, 2014 at 8:48 PM, Erick Erickson > > wrote: > > > >> Wouldn't it be easiest to compute the span at index time? Then it's > >> very straight-forward. > >> > >> Best, > >> Erick > >> > >> On Mon, Nov 3, 2014 at 8:18 PM, Yubing (Tom) Dong 董玉冰 > >> wrote: > >> > Hi, > >> > > >> > I'm new to Solr, and I'm having a problem with faceting. I would > really > >> > appreciate it if you could help :) > >> > > >> > I have a set of documents in JSON format, which I could post to my > Solr > >> > core using the post.jar tool. Each document contains two fields, > namely > >> > "startDate" and "endDate", both of which are of type "date". > >> > > >> > Conceptually, I would like to have a third field "timeSpan" that is > >> > automatically generated from the return value of function query > >> > "ms(endDate, startDate)", and do range facet on it, i.e. compute the > >> > distribution of "timeSpan", among either all of or a filtered subset > of > >> the > >> > documents. > >> > > >> > I have tried to find ways of both directly faceting the function > return > >> > values and automatically generate the "timeSpan" field during > indexing, > >> but > >> > without luck yet. > >> > > >> > Suggestions are greatly appreciated! > >> > > >> > Best, > >> > Yubing > >> >