Re: Lucene/Solr Git Mirrors 5 day lag behind SVN?
I added a comment on the INFRA issue. I don't understand why it periodically "gets stuck".

Mike McCandless
http://blog.mikemccandless.com

On Fri, Oct 23, 2015 at 11:27 AM, Kevin Risden wrote:
> It looks like both the Apache Git mirror (git://git.apache.org/lucene-solr.git)
> and the GitHub mirror (https://github.com/apache/lucene-solr.git) are 5 days
> behind SVN. This seems to have happened before:
> https://issues.apache.org/jira/browse/INFRA-9182
>
> Is this a known issue?
>
> Kevin Risden
Order of actions in Update request
Looking at the code and JIRA, I see that ordering actions in a SolrJ update request is currently not supported, but I'd like to know if there is any other way to get this capability. I took a quick look at the XML loader and it appears to process actions as it sees them, so if the order was changed to order the actions as:

Add
Delete
Add

vs.

Add
Add
Delete

Would this cause any issues with the update? Would it achieve the desired result? Are there any other options for ordering actions as they were provided to the update request?

Jamie
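For reference, Solr's XML update format does accept a mixed sequence of commands in a single request body wrapped in an `<update>` element, and the XML loader processes them in document order. A rough sketch of what an ordered add/delete/add request body might look like (the field names and values here are made up for illustration):

```xml
<update>
  <!-- first version of the document -->
  <add>
    <doc>
      <field name="id">1</field>
      <field name="title">first version</field>
    </doc>
  </add>
  <!-- delete it -->
  <delete><id>1</id></delete>
  <!-- re-add a second version; because commands are processed in order,
       this document should survive the preceding delete -->
  <add>
    <doc>
      <field name="id">1</field>
      <field name="title">second version</field>
    </doc>
  </add>
</update>
```

Whether SolrJ can be made to emit the commands in this order is the open question in this thread; the XML body above only illustrates what the wire format itself allows.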
Re: Order of actions in Update request
Hi Jamie!

On Sat, Oct 24, 2015 at 7:21 AM, Jamie Johnson wrote:
> Looking at the code and jira I see that ordering actions in solrj update
> request is currently not supported but I'd like to know if there is any
> other way to get this capability. [...]
Re: Does docValues impact termfreq ?
Thanks, Jack. I did some more research and found similar results.

In our application, we are making multiple (think: 50) concurrent requests to calculate term frequency on a set of documents in "real-time". The faster the results return, the better.

Most of these requests are unique, so the cache only helps slightly.

This analysis is happening on a single Solr instance.

Other than moving to SolrCloud and splitting the processing onto multiple servers, do you have any suggestions for what might speed up termfreq at query time?

Thanks,
Aki

On Fri, Oct 23, 2015 at 7:21 PM, Jack Krupansky wrote:
> Term frequency applies only to the indexed terms of a tokenized field.
> DocValues is really just a copy of the original source text and is not
> tokenized into terms.
>
> Maybe you could explain how exactly you are using term frequency in
> function queries. More importantly, what is so "heavy" about your usage?
> Generally, moderate use of a feature is much more advisable than heavy
> usage, unless you don't care about performance.
>
> -- Jack Krupansky
>
> On Fri, Oct 23, 2015 at 8:19 AM, Aki Balogh wrote:
> > Hello,
> >
> > In our solr application, we use a Function Query (termfreq) very heavily.
> >
> > Index time and disk space are not important, but we're looking to improve
> > performance on termfreq at query time. I've been reading up on docValues.
> > Would this be a way to improve performance?
> >
> > I had read that Lucene uses Field Cache for Function Queries, so
> > performance may not be affected.
> >
> > And, any general suggestions for improving query performance on Function
> > Queries?
> >
> > Thanks,
> > Aki
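For readers following along: the usage under discussion is the `termfreq` function query, returned as a pseudo-field per matching document. A request of the kind being described might look roughly like this (the collection name `mycollection` and field name `body` are made-up placeholders):

```
http://localhost:8983/solr/mycollection/select?q=*:*&fl=id,tf:termfreq(body,'solr')&rows=50
```

Each returned document then carries a `tf` value giving the raw term frequency of `solr` in its `body` field.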
Re: Does docValues impact termfreq ?
If you mean using the term frequency function query, then I'm not sure there's a huge amount you can do to improve performance.

The term frequency is a number that is used often, so it is stored in the index pre-calculated. Perhaps, if your data is not changing, optimising your index would reduce it to one segment, and thus might ever so slightly speed the aggregation of term frequencies, but I doubt it'd make enough difference to make it worth doing.

Upayavira

On Sat, Oct 24, 2015, at 03:37 PM, Aki Balogh wrote:
> Thanks, Jack. I did some more research and found similar results.
>
> In our application, we are making multiple (think: 50) concurrent requests
> to calculate term frequency on a set of documents in "real-time". [...]
Re: Does docValues impact termfreq ?
Gotcha - that's disheartening.

One idea: when I run termfreq, I get all of the termfreqs for each document one-by-one.

Is there a way to have solr sum it up before creating the request, so I only receive one number in the response?

On Sat, Oct 24, 2015 at 11:05 AM, Upayavira wrote:
> If you mean using the term frequency function query, then I'm not sure
> there's a huge amount you can do to improve performance. [...]
Re: Does docValues impact termfreq ?
That's what a normal query does - Lucene takes all the terms used in the query and sums them up for each document in the response, producing a single number, the score, for each document. That's the way Solr is designed to be used. You still haven't elaborated on why you are trying to use Solr in a way other than it was intended.

-- Jack Krupansky

On Sat, Oct 24, 2015 at 11:13 AM, Aki Balogh wrote:
> Gotcha - that's disheartening.
>
> One idea: when I run termfreq, I get all of the termfreqs for each document
> one-by-one.
>
> Is there a way to have solr sum it up before creating the request, so I
> only receive one number in the response? [...]
Re: Does docValues impact termfreq ?
Hi Jack,

I'm just using solr to get word count across a large number of documents.

It's somewhat non-standard, because we're ignoring relevance, but it seems to work well for this use case otherwise.

My understanding then is:
1) since termfreq is pre-processed and fetched, there's no good way to speed it up (except by caching earlier calculations)
2) there's no way to have solr sum up all of the termfreqs across all documents in a search and just return one number for total termfreqs

Are these correct?

Thanks,
Aki

On Sat, Oct 24, 2015 at 11:20 AM, Jack Krupansky wrote:
> That's what a normal query does - Lucene takes all the terms used in the
> query and sums them up for each document in the response, producing a
> single number, the score, for each document. [...]
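As a stopgap, the summing can be done client-side: request `termfreq` as a pseudo-field (e.g. `fl=id,tf:termfreq(body,'solr')`) and add up the values from the JSON response. A minimal sketch in Python; the response dict here is a hand-made sample standing in for a parsed Solr JSON response, not real output:

```python
# Sketch: summing per-document termfreq values client-side.
# In a real application this dict would come from parsing the JSON
# response of a select request with fl=id,tf:termfreq(body,'solr').
sample_response = {
    "response": {
        "numFound": 3,
        "docs": [
            {"id": "1", "tf": 4},
            {"id": "2", "tf": 0},
            {"id": "3", "tf": 7},
        ],
    }
}

def total_termfreq(solr_json):
    """Add up the 'tf' pseudo-field across all returned documents."""
    return sum(doc.get("tf", 0) for doc in solr_json["response"]["docs"])

print(total_termfreq(sample_response))  # prints 11
```

This still pays the cost of shipping one number per document over the wire, which is exactly what the thread is trying to avoid, but it keeps the aggregation logic trivial.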
Re: Does docValues impact termfreq ?
If you just want word length, then do work during indexing - index a field for the word length. Then, I believe you can do faceting - e.g. with the JSON faceting API I believe you can do a sum() calculation on a field rather than the more traditional count.

Thinking aloud, there might be an easier way - index a field that is the same for all documents, and facet on it. Instead of counting the number of documents, calculate the sum() of your word count field.

I *think* that should work.

Upayavira

On Sat, Oct 24, 2015, at 04:24 PM, Aki Balogh wrote:
> Hi Jack,
>
> I'm just using solr to get word count across a large number of documents.
>
> It's somewhat non-standard, because we're ignoring relevance, but it seems
> to work well for this use case otherwise. [...]
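Upayavira's suggestion would look roughly like this with the JSON Facet API (available from Solr 5.x); `word_count` is a hypothetical per-document field populated at index time, and the request body is POSTed to the select handler:

```json
{
  "query": "*:*",
  "facet": {
    "total_word_count": "sum(word_count)"
  }
}
```

The response would then carry a single aggregated number rather than one value per document, which is what the earlier messages in this thread were asking for - with the caveat that this sums a precomputed field, not an arbitrary per-query `termfreq`.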
any clean test failing
It is getting stuck on resolve.

ant clean test

SOLR 5.3.1

[ivy:retrieve] retrieve done (5ms)
Overriding previous definition of property "ivy.version"
[ivy:retrieve] no resolved descriptor found: launching default resolve
Overriding previous definition of property "ivy.version"
[ivy:retrieve] using ivy parser to parse file:/home/solr/src/lucene_solr_5_3_1A/solr/server/ivy.xml
[ivy:retrieve] :: resolving dependencies :: org.apache.solr#example;work...@hgsolr2devmstr.healthgrades.com
[ivy:retrieve] confs: [logging]
[ivy:retrieve] validate = true
[ivy:retrieve] refresh = false
[ivy:retrieve] resolving dependencies for configuration 'logging'
[ivy:retrieve] == resolving dependencies for org.apache.solr#example;work...@hgsolr2devmstr.healthgrades.com [logging]
[ivy:retrieve] == resolving dependencies org.apache.solr#example;work...@hgsolr2devmstr.healthgrades.com->log4j#log4j;1.2.17 [logging->master]
[ivy:retrieve] default: Checking cache for: dependency: log4j#log4j;1.2.17 {logging=[master]}
[ivy:retrieve] don't use cache for log4j#log4j;1.2.17: checkModified=true
[ivy:retrieve] tried /home/solr/.ivy2/local/log4j/log4j/1.2.17/ivys/ivy.xml
[ivy:retrieve] tried /home/solr/.ivy2/local/log4j/log4j/1.2.17/jars/log4j.jar
[ivy:retrieve] local: no ivy file nor artifact found for log4j#log4j;1.2.17
[ivy:retrieve] main: Checking cache for: dependency: log4j#log4j;1.2.17 {logging=[master]}
[ivy:retrieve] main: module revision found in cache: log4j#log4j;1.2.17
[ivy:retrieve] found log4j#log4j;1.2.17 in public
[ivy:retrieve] == resolving dependencies org.apache.solr#example;work...@hgsolr2devmstr.healthgrades.com->org.slf4j#slf4j-api;1.7.7 [logging->master]
[ivy:retrieve] default: Checking cache for: dependency: org.slf4j#slf4j-api;1.7.7 {logging=[master]}
[ivy:retrieve] don't use cache for org.slf4j#slf4j-api;1.7.7: checkModified=true
[ivy:retrieve] tried /home/solr/.ivy2/local/org.slf4j/slf4j-api/1.7.7/ivys/ivy.xml
[ivy:retrieve] tried /home/solr/.ivy2/local/org.slf4j/slf4j-api/1.7.7/jars/slf4j-api.jar
[ivy:retrieve] local: no ivy file nor artifact found for org.slf4j#slf4j-api;1.7.7
[ivy:retrieve] main: Checking cache for: dependency: org.slf4j#slf4j-api;1.7.7 {logging=[master]}
[ivy:retrieve] main: module revision found in cache: org.slf4j#slf4j-api;1.7.7
[ivy:retrieve] found org.slf4j#slf4j-api;1.7.7 in public
[ivy:retrieve] == resolving dependencies org.apache.solr#example;work...@hgsolr2devmstr.healthgrades.com->org.slf4j#jcl-over-slf4j;1.7.7 [logging->master]
[ivy:retrieve] default: Checking cache for: dependency: org.slf4j#jcl-over-slf4j;1.7.7 {logging=[master]}
[ivy:retrieve] don't use cache for org.slf4j#jcl-over-slf4j;1.7.7: checkModified=true

--
Bill Bell
billnb...@gmail.com
cell 720-256-8076
Re: Does docValues impact termfreq ?
Thanks, let me think about that.

We're using termfreq to get the TF score, but we don't know which term we'll need the TF for. So we'd have to do a corpus-wide summing of termfreq for each potential term across all documents in the corpus.

It seems like it'd require some development work to compute that, and our code would be fragile. Let me think about that more.

It might make sense to just move to SolrCloud; it's the right architectural decision anyway.

On Sat, Oct 24, 2015 at 1:54 PM, Upayavira wrote:
> If you just want word length, then do work during indexing - index a
> field for the word length. Then, I believe you can do faceting - e.g.
> with the json faceting API I believe you can do a sum() calculation on a
> field rather than the more traditional count. [...]
Re: any clean test failing
OK, I deleted /home/solr/.ivy2 and it started working.

On Sat, Oct 24, 2015 at 11:57 AM, William Bell wrote:
> It is getting stuck on resolve.
>
> ant clean test
>
> SOLR 5.3.1
>
> [ivy:retrieve] retrieve done (5ms) [...]

--
Bill Bell
billnb...@gmail.com
cell 720-256-8076
Re: Order of actions in Update request
On 10/24/2015 5:21 AM, Jamie Johnson wrote:
> Looking at the code and jira I see that ordering actions in solrj update
> request is currently not supported but I'd like to know if there is any
> other way to get this capability. I took a quick look at the XML loader
> and it appears to process actions as it sees them so if the order was
> changed to order the actions as
>
> Add
> Delete
> Add
>
> Vs
>
> Add
> Add
> Delete
>
> Would this cause any issues with the update? Would it achieve the desired
> result? Are there any other options for ordering actions as they were
> provided to the update request?

If those three actions are in separate update requests using HttpSolrClient or CloudSolrClient in a single thread, I would expect them to be executed in the order you make the requests. If you're using multiple threads, then you probably cannot guarantee the order of the requests. Are you using one of those clients in a single thread and seeing something other than what I have described? If so, I think that might be a bug.

If you're using ConcurrentUpdateSolrClient, I don't think you can guarantee order. That client has multiple threads pulling the requests out of an internal queue. If some requests complete substantially faster than others, they could happen out of order. The concurrent client is a poor choice for anything but bulk inserts, and because it ignores almost every error that happens while it runs, it is often not a good choice for that either.

Thanks,
Shawn
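Jamie's observation about the XML loader can be sketched without a running Solr: commands placed in a single XML update body keep the order in which they were added, which is the order the loader reads them. This only builds a payload (the field values are made up); it does not post anything to a server.

```python
import xml.etree.ElementTree as ET

# Build one XML update body whose commands are serialized in the order
# they were added: add, delete, add. Field values here are hypothetical.
update = ET.Element("update")

add1 = ET.SubElement(update, "add")
doc1 = ET.SubElement(add1, "doc")
ET.SubElement(doc1, "field", name="id").text = "1"

delete = ET.SubElement(update, "delete")
ET.SubElement(delete, "id").text = "1"

add2 = ET.SubElement(update, "add")
doc2 = ET.SubElement(add2, "doc")
ET.SubElement(doc2, "field", name="id").text = "1"

payload = ET.tostring(update, encoding="unicode")
print(payload)
print([child.tag for child in update])  # order is preserved: add, delete, add
```

Posting this body to /update in a single request (or making three separate single-threaded requests, as described above) should preserve the add/delete/add ordering.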
Re: Does docValues impact termfreq ?
Can you explain more what you are using TF for? Because it sounds rather like scoring. You could disable field norms and IDF and scoring would be mostly TF, no? Upayavira On Sat, Oct 24, 2015, at 07:28 PM, Aki Balogh wrote: > Thanks, let me think about that. > > We're using termfreq to get the TF score, but we don't know which term > we'll need the TF for. So we'd have to do a corpuswide summing of > termfreq > for each potential term across all documents in the corpus. It seems like > it'd require some development work to compute that, and our code would be > fragile. > > Let me think about that more. > > It might make sense to just move to solrcloud, it's the right > architectural > decision anyway. > > > On Sat, Oct 24, 2015 at 1:54 PM, Upayavira wrote: > > > If you just want word length, then do work during indexing - index a > > field for the word length. Then, I believe you can do faceting - e.g. > > with the json faceting API I believe you can do a sum() calculation on a > > field rather than the more traditional count. > > > > Thinking aloud, there might be an easier way - index a field that is the > > same for all documents, and facet on it. Instead of counting the number > > of documents, calculate the sum() of your word count field. > > > > I *think* that should work. > > > > Upayavira > > > > On Sat, Oct 24, 2015, at 04:24 PM, Aki Balogh wrote: > > > Hi Jack, > > > > > > I'm just using solr to get word count across a large number of documents. > > > > > > It's somewhat non-standard, because we're ignoring relevance, but it > > > seems > > > to work well for this use case otherwise. > > > > > > My understanding then is: > > > 1) since termfreq is pre-processed and fetched, there's no good way to > > > speed it up (except by caching earlier calculations) > > > > > > 2) there's no way to have solr sum up all of the termfreqs across all > > > documents in a search and just return one number for total termfreqs > > > > > > > > > Are these correct? 
> > > > > > Thanks, > > > Aki > > > > > > > > > On Sat, Oct 24, 2015 at 11:20 AM, Jack Krupansky > > > > > > wrote: > > > > > > > That's what a normal query does - Lucene takes all the terms used in > > the > > > > query and sums them up for each document in the response, producing a > > > > single number, the score, for each document. That's the way Solr is > > > > designed to be used. You still haven't elaborated why you are trying > > to use > > > > Solr in a way other than it was intended. > > > > > > > > -- Jack Krupansky > > > > > > > > On Sat, Oct 24, 2015 at 11:13 AM, Aki Balogh > > wrote: > > > > > > > > > Gotcha - that's disheartening. > > > > > > > > > > One idea: when I run termfreq, I get all of the termfreqs for each > > > > document > > > > > one-by-one. > > > > > > > > > > Is there a way to have solr sum it up before creating the request, > > so I > > > > > only receive one number in the response? > > > > > > > > > > > > > > > On Sat, Oct 24, 2015 at 11:05 AM, Upayavira wrote: > > > > > > > > > > > If you mean using the term frequency function query, then I'm not > > sure > > > > > > there's a huge amount you can do to improve performance. > > > > > > > > > > > > The term frequency is a number that is used often, so it is stored > > in > > > > > > the index pre-calculated. Perhaps, if your data is not changing, > > > > > > optimising your index would reduce it to one segment, and thus > > might > > > > > > ever so slightly speed the aggregation of term frequencies, but I > > doubt > > > > > > it'd make enough difference to make it worth doing. > > > > > > > > > > > > Upayavira > > > > > > > > > > > > On Sat, Oct 24, 2015, at 03:37 PM, Aki Balogh wrote: > > > > > > > Thanks, Jack. I did some more research and found similar results. > > > > > > > > > > > > > > In our application, we are making multiple (think: 50) concurrent > > > > > > > requests > > > > > > > to calculate term frequency on a set of documents in > > "real-time". 
The > > > > > > > faster that results return, the better. > > > > > > > > > > > > > > Most of these requests are unique, so cache only helps slightly. > > > > > > > > > > > > > > This analysis is happening on a single solr instance. > > > > > > > > > > > > > > Other than moving to solr cloud and splitting out the processing > > onto > > > > > > > multiple servers, do you have any suggestions for what might > > speed up > > > > > > > termfreq at query time? > > > > > > > > > > > > > > Thanks, > > > > > > > Aki > > > > > > > > > > > > > > > > > > > > > On Fri, Oct 23, 2015 at 7:21 PM, Jack Krupansky > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > Term frequency applies only to the indexed terms of a tokenized > > > > > field. > > > > > > > > DocValues is really just a copy of the original source text > > and is > > > > > not > > > > > > > > tokenized into terms. > > > > > > > > > > > > > > > > Maybe you could explain how exactly you are using term > > frequency in > > > > > > > > function
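Upayavira's sum() suggestion could be expressed with the JSON Facet API roughly as follows. This is an untested sketch: it assumes a hypothetical integer field word_count_i holding each document's word count, posted to the collection's /query endpoint.

```json
{
  "query": "*:*",
  "limit": 0,
  "facet": {
    "total_words": "sum(word_count_i)"
  }
}
```

With "limit": 0 the response would carry only the aggregated total rather than per-document values, which avoids the large per-document responses discussed in this thread.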
Re: Does docValues impact termfreq ?
Yes, sorry, I am not being clear.

We are not even doing scoring, just getting the raw TF values. We're doing this in Solr because it can scale well.

But with large corpora, retrieving the word counts takes some time, in part because Solr is splitting up the word count by document and generating a large response. We then get the response and just sum it all up. I'm wondering if there's a more direct way.

On Oct 24, 2015 4:00 PM, "Upayavira" wrote: > Can you explain more what you are using TF for? Because it sounds rather > like scoring. You could disable field norms and IDF and scoring would be > mostly TF, no? > > Upayavira > > On Sat, Oct 24, 2015, at 07:28 PM, Aki Balogh wrote: > > Thanks, let me think about that. > > > > We're using termfreq to get the TF score, but we don't know which term > > we'll need the TF for. So we'd have to do a corpuswide summing of > > termfreq > > for each potential term across all documents in the corpus. It seems like > > it'd require some development work to compute that, and our code would be > > fragile. > > > > Let me think about that more. > > > > It might make sense to just move to solrcloud, it's the right > > architectural > > decision anyway. > > > > > > On Sat, Oct 24, 2015 at 1:54 PM, Upayavira wrote: > > > > > If you just want word length, then do work during indexing - index a > > > field for the word length. Then, I believe you can do faceting - e.g. > > > with the json faceting API I believe you can do a sum() calculation on > a > > > field rather than the more traditional count. > > > > > > Thinking aloud, there might be an easier way - index a field that is > the > > > same for all documents, and facet on it. Instead of counting the number > > > of documents, calculate the sum() of your word count field. > > > > > > I *think* that should work. 
> > > > > > Upayavira > > > > > > On Sat, Oct 24, 2015, at 04:24 PM, Aki Balogh wrote: > > > > Hi Jack, > > > > > > > > I'm just using solr to get word count across a large number of > documents. > > > > > > > > It's somewhat non-standard, because we're ignoring relevance, but it > > > > seems > > > > to work well for this use case otherwise. > > > > > > > > My understanding then is: > > > > 1) since termfreq is pre-processed and fetched, there's no good way > to > > > > speed it up (except by caching earlier calculations) > > > > > > > > 2) there's no way to have solr sum up all of the termfreqs across all > > > > documents in a search and just return one number for total termfreqs > > > > > > > > > > > > Are these correct? > > > > > > > > Thanks, > > > > Aki > > > > > > > > > > > > On Sat, Oct 24, 2015 at 11:20 AM, Jack Krupansky > > > > > > > > wrote: > > > > > > > > > That's what a normal query does - Lucene takes all the terms used > in > > > the > > > > > query and sums them up for each document in the response, > producing a > > > > > single number, the score, for each document. That's the way Solr is > > > > > designed to be used. You still haven't elaborated why you are > trying > > > to use > > > > > Solr in a way other than it was intended. > > > > > > > > > > -- Jack Krupansky > > > > > > > > > > On Sat, Oct 24, 2015 at 11:13 AM, Aki Balogh > > > wrote: > > > > > > > > > > > Gotcha - that's disheartening. > > > > > > > > > > > > One idea: when I run termfreq, I get all of the termfreqs for > each > > > > > document > > > > > > one-by-one. > > > > > > > > > > > > Is there a way to have solr sum it up before creating the > request, > > > so I > > > > > > only receive one number in the response? > > > > > > > > > > > > > > > > > > On Sat, Oct 24, 2015 at 11:05 AM, Upayavira > wrote: > > > > > > > > > > > > > If you mean using the term frequency function query, then I'm > not > > > sure > > > > > > > there's a huge amount you can do to improve performance. 
> > > > > > > > > > > > > > The term frequency is a number that is used often, so it is > stored > > > in > > > > > > > the index pre-calculated. Perhaps, if your data is not > changing, > > > > > > > optimising your index would reduce it to one segment, and thus > > > might > > > > > > > ever so slightly speed the aggregation of term frequencies, > but I > > > doubt > > > > > > > it'd make enough difference to make it worth doing. > > > > > > > > > > > > > > Upayavira > > > > > > > > > > > > > > On Sat, Oct 24, 2015, at 03:37 PM, Aki Balogh wrote: > > > > > > > > Thanks, Jack. I did some more research and found similar > results. > > > > > > > > > > > > > > > > In our application, we are making multiple (think: 50) > concurrent > > > > > > > > requests > > > > > > > > to calculate term frequency on a set of documents in > > > "real-time". The > > > > > > > > faster that results return, the better. > > > > > > > > > > > > > > > > Most of these requests are unique, so cache only helps > slightly. > > > > > > > > > > > > > > > > This analysis is happening on a single solr instance. > > > > > > > > > > > > > > > > Other than moving to solr cloud and splittin
Re: Does docValues impact termfreq ?
yes, but what do you want to do with the TF? What problem are you solving with it? If you are able to share that... On Sat, Oct 24, 2015, at 09:05 PM, Aki Balogh wrote: > Yes, sorry, I am not being clear. > > We are not even doing scoring, just getting the raw TF values. We're > doing > this in solr because it can scale well. > > But with large corpora, retrieving the word counts takes some time, in > part > because solr is splitting up word count by document and generating a > large > request. We then get the request and just sum it all up. I'm wondering if > there's a more direct way. > On Oct 24, 2015 4:00 PM, "Upayavira" wrote: > > > Can you explain more what you are using TF for? Because it sounds rather > > like scoring. You could disable field norms and IDF and scoring would be > > mostly TF, no? > > > > Upayavira > > > > On Sat, Oct 24, 2015, at 07:28 PM, Aki Balogh wrote: > > > Thanks, let me think about that. > > > > > > We're using termfreq to get the TF score, but we don't know which term > > > we'll need the TF for. So we'd have to do a corpuswide summing of > > > termfreq > > > for each potential term across all documents in the corpus. It seems like > > > it'd require some development work to compute that, and our code would be > > > fragile. > > > > > > Let me think about that more. > > > > > > It might make sense to just move to solrcloud, it's the right > > > architectural > > > decision anyway. > > > > > > > > > On Sat, Oct 24, 2015 at 1:54 PM, Upayavira wrote: > > > > > > > If you just want word length, then do work during indexing - index a > > > > field for the word length. Then, I believe you can do faceting - e.g. > > > > with the json faceting API I believe you can do a sum() calculation on > > a > > > > field rather than the more traditional count. > > > > > > > > Thinking aloud, there might be an easier way - index a field that is > > the > > > > same for all documents, and facet on it. 
Instead of counting the number > > > > of documents, calculate the sum() of your word count field. > > > > > > > > I *think* that should work. > > > > > > > > Upayavira > > > > > > > > On Sat, Oct 24, 2015, at 04:24 PM, Aki Balogh wrote: > > > > > Hi Jack, > > > > > > > > > > I'm just using solr to get word count across a large number of > > documents. > > > > > > > > > > It's somewhat non-standard, because we're ignoring relevance, but it > > > > > seems > > > > > to work well for this use case otherwise. > > > > > > > > > > My understanding then is: > > > > > 1) since termfreq is pre-processed and fetched, there's no good way > > to > > > > > speed it up (except by caching earlier calculations) > > > > > > > > > > 2) there's no way to have solr sum up all of the termfreqs across all > > > > > documents in a search and just return one number for total termfreqs > > > > > > > > > > > > > > > Are these correct? > > > > > > > > > > Thanks, > > > > > Aki > > > > > > > > > > > > > > > On Sat, Oct 24, 2015 at 11:20 AM, Jack Krupansky > > > > > > > > > > wrote: > > > > > > > > > > > That's what a normal query does - Lucene takes all the terms used > > in > > > > the > > > > > > query and sums them up for each document in the response, > > producing a > > > > > > single number, the score, for each document. That's the way Solr is > > > > > > designed to be used. You still haven't elaborated why you are > > trying > > > > to use > > > > > > Solr in a way other than it was intended. > > > > > > > > > > > > -- Jack Krupansky > > > > > > > > > > > > On Sat, Oct 24, 2015 at 11:13 AM, Aki Balogh > > > > wrote: > > > > > > > > > > > > > Gotcha - that's disheartening. > > > > > > > > > > > > > > One idea: when I run termfreq, I get all of the termfreqs for > > each > > > > > > document > > > > > > > one-by-one. > > > > > > > > > > > > > > Is there a way to have solr sum it up before creating the > > request, > > > > so I > > > > > > > only receive one number in the response? 
> > > > > > > > > > > > > > > > > > > > > On Sat, Oct 24, 2015 at 11:05 AM, Upayavira > > wrote: > > > > > > > > > > > > > > > If you mean using the term frequency function query, then I'm > > not > > > > sure > > > > > > > > there's a huge amount you can do to improve performance. > > > > > > > > > > > > > > > > The term frequency is a number that is used often, so it is > > stored > > > > in > > > > > > > > the index pre-calculated. Perhaps, if your data is not > > changing, > > > > > > > > optimising your index would reduce it to one segment, and thus > > > > might > > > > > > > > ever so slightly speed the aggregation of term frequencies, > > but I > > > > doubt > > > > > > > > it'd make enough difference to make it worth doing. > > > > > > > > > > > > > > > > Upayavira > > > > > > > > > > > > > > > > On Sat, Oct 24, 2015, at 03:37 PM, Aki Balogh wrote: > > > > > > > > > Thanks, Jack. I did some more research and found similar > > results. > > > > > > > > > > > > > > > > > > In our application, we are making multiple (think: 50)
Re: Does docValues impact termfreq ?
Certainly, yes. I'm just doing a word count, ie how often does a specific term come up in the corpus? On Oct 24, 2015 4:20 PM, "Upayavira" wrote: > yes, but what do you want to do with the TF? What problem are you > solving with it? If you are able to share that... > > On Sat, Oct 24, 2015, at 09:05 PM, Aki Balogh wrote: > > Yes, sorry, I am not being clear. > > > > We are not even doing scoring, just getting the raw TF values. We're > > doing > > this in solr because it can scale well. > > > > But with large corpora, retrieving the word counts takes some time, in > > part > > because solr is splitting up word count by document and generating a > > large > > request. We then get the request and just sum it all up. I'm wondering if > > there's a more direct way. > > On Oct 24, 2015 4:00 PM, "Upayavira" wrote: > > > > > Can you explain more what you are using TF for? Because it sounds > rather > > > like scoring. You could disable field norms and IDF and scoring would > be > > > mostly TF, no? > > > > > > Upayavira > > > > > > On Sat, Oct 24, 2015, at 07:28 PM, Aki Balogh wrote: > > > > Thanks, let me think about that. > > > > > > > > We're using termfreq to get the TF score, but we don't know which > term > > > > we'll need the TF for. So we'd have to do a corpuswide summing of > > > > termfreq > > > > for each potential term across all documents in the corpus. It seems > like > > > > it'd require some development work to compute that, and our code > would be > > > > fragile. > > > > > > > > Let me think about that more. > > > > > > > > It might make sense to just move to solrcloud, it's the right > > > > architectural > > > > decision anyway. > > > > > > > > > > > > On Sat, Oct 24, 2015 at 1:54 PM, Upayavira wrote: > > > > > > > > > If you just want word length, then do work during indexing - index > a > > > > > field for the word length. Then, I believe you can do faceting - > e.g. 
> > > > > with the json faceting API I believe you can do a sum() > calculation on > > > a > > > > > field rather than the more traditional count. > > > > > > > > > > Thinking aloud, there might be an easier way - index a field that > is > > > the > > > > > same for all documents, and facet on it. Instead of counting the > number > > > > > of documents, calculate the sum() of your word count field. > > > > > > > > > > I *think* that should work. > > > > > > > > > > Upayavira > > > > > > > > > > On Sat, Oct 24, 2015, at 04:24 PM, Aki Balogh wrote: > > > > > > Hi Jack, > > > > > > > > > > > > I'm just using solr to get word count across a large number of > > > documents. > > > > > > > > > > > > It's somewhat non-standard, because we're ignoring relevance, > but it > > > > > > seems > > > > > > to work well for this use case otherwise. > > > > > > > > > > > > My understanding then is: > > > > > > 1) since termfreq is pre-processed and fetched, there's no good > way > > > to > > > > > > speed it up (except by caching earlier calculations) > > > > > > > > > > > > 2) there's no way to have solr sum up all of the termfreqs > across all > > > > > > documents in a search and just return one number for total > termfreqs > > > > > > > > > > > > > > > > > > Are these correct? > > > > > > > > > > > > Thanks, > > > > > > Aki > > > > > > > > > > > > > > > > > > On Sat, Oct 24, 2015 at 11:20 AM, Jack Krupansky > > > > > > > > > > > > wrote: > > > > > > > > > > > > > That's what a normal query does - Lucene takes all the terms > used > > > in > > > > > the > > > > > > > query and sums them up for each document in the response, > > > producing a > > > > > > > single number, the score, for each document. That's the way > Solr is > > > > > > > designed to be used. You still haven't elaborated why you are > > > trying > > > > > to use > > > > > > > Solr in a way other than it was intended. 
> > > > > > > > > > > > > > -- Jack Krupansky > > > > > > > > > > > > > > On Sat, Oct 24, 2015 at 11:13 AM, Aki Balogh < > a...@marketmuse.com> > > > > > wrote: > > > > > > > > > > > > > > > Gotcha - that's disheartening. > > > > > > > > > > > > > > > > One idea: when I run termfreq, I get all of the termfreqs for > > > each > > > > > > > document > > > > > > > > one-by-one. > > > > > > > > > > > > > > > > Is there a way to have solr sum it up before creating the > > > request, > > > > > so I > > > > > > > > only receive one number in the response? > > > > > > > > > > > > > > > > > > > > > > > > On Sat, Oct 24, 2015 at 11:05 AM, Upayavira > > > wrote: > > > > > > > > > > > > > > > > > If you mean using the term frequency function query, then > I'm > > > not > > > > > sure > > > > > > > > > there's a huge amount you can do to improve performance. > > > > > > > > > > > > > > > > > > The term frequency is a number that is used often, so it is > > > stored > > > > > in > > > > > > > > > the index pre-calculated. Perhaps, if your data is not > > > changing, > > > > > > > > > optimising your index would reduce it to one segment, and > thus > > > > > might > > > >
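The client-side summing described in this thread ("we then get the response and just sum it all up") can be sketched as follows. The response dict below is a hand-written stand-in for a real Solr JSON response with fl=termfreq(body,'solr'); the field and term names are hypothetical.

```python
# Sum per-document termfreq values client-side. In a real application the
# dict would come from json.loads() on the Solr response body.
response = {
    "response": {
        "numFound": 3,
        "docs": [
            {"id": "a", "termfreq(body,'solr')": 3},
            {"id": "b", "termfreq(body,'solr')": 0},
            {"id": "c", "termfreq(body,'solr')": 5},
        ],
    }
}

key = "termfreq(body,'solr')"
total = sum(doc.get(key, 0) for doc in response["response"]["docs"])
print(total)  # 8
```

This is exactly the round trip the thread wants to avoid: every document's value crosses the wire before the sum happens, which is why a server-side aggregation (or pre-indexed count field) scales better on large corpora.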
Re: any clean test failing
I've been seeing this happen a lot lately; it seems like a series of lock files are left around under some conditions. I've also incorporated some of Mark Miller's suggestions, but perhaps one of my upgrades undid that work. I've found it much less painful to remove all the *.lck files; that way I don't have to wait a _long_ time to get all the files back. I use:

find . -name "*.lck" | xargs rm

There are other ways too.

On Sat, Oct 24, 2015 at 12:12 PM, William Bell wrote:
> OK I deleted /home/solr/.ivy2 and it started working.
>
> On Sat, Oct 24, 2015 at 11:57 AM, William Bell wrote:
>
>> It is getting stuck on resolve.
>>
>> ant clean test
>>
>> SOLR 5.3.1
>>
>> [ivy:retrieve] retrieve done (5ms)
>> Overriding previous definition of property "ivy.version"
>> [ivy:retrieve] no resolved descriptor found: launching default resolve
>> Overriding previous definition of property "ivy.version"
>> [ivy:retrieve] using ivy parser to parse file:/home/solr/src/lucene_solr_5_3_1A/solr/server/ivy.xml
>> [ivy:retrieve] :: resolving dependencies :: org.apache.solr#example;work...@hgsolr2devmstr.healthgrades.com
>> [ivy:retrieve] confs: [logging]
>> [ivy:retrieve] validate = true
>> [ivy:retrieve] refresh = false
>> [ivy:retrieve] resolving dependencies for configuration 'logging'
>> [ivy:retrieve] == resolving dependencies for org.apache.solr#example;work...@hgsolr2devmstr.healthgrades.com [logging]
>> [ivy:retrieve] == resolving dependencies org.apache.solr#example;work...@hgsolr2devmstr.healthgrades.com->log4j#log4j;1.2.17 [logging->master]
>> [ivy:retrieve] default: Checking cache for: dependency: log4j#log4j;1.2.17 {logging=[master]}
>> [ivy:retrieve] don't use cache for log4j#log4j;1.2.17: checkModified=true
>> [ivy:retrieve] tried /home/solr/.ivy2/local/log4j/log4j/1.2.17/ivys/ivy.xml
>> [ivy:retrieve] tried /home/solr/.ivy2/local/log4j/log4j/1.2.17/jars/log4j.jar
>> [ivy:retrieve] local: no ivy file nor artifact found for log4j#log4j;1.2.17
>> [ivy:retrieve] main: Checking cache for: dependency: log4j#log4j;1.2.17 {logging=[master]}
>> [ivy:retrieve] main: module revision found in cache: log4j#log4j;1.2.17
>> [ivy:retrieve] found log4j#log4j;1.2.17 in public
>> [ivy:retrieve] == resolving dependencies org.apache.solr#example;work...@hgsolr2devmstr.healthgrades.com->org.slf4j#slf4j-api;1.7.7 [logging->master]
>> [ivy:retrieve] default: Checking cache for: dependency: org.slf4j#slf4j-api;1.7.7 {logging=[master]}
>> [ivy:retrieve] don't use cache for org.slf4j#slf4j-api;1.7.7: checkModified=true
>> [ivy:retrieve] tried /home/solr/.ivy2/local/org.slf4j/slf4j-api/1.7.7/ivys/ivy.xml
>> [ivy:retrieve] tried /home/solr/.ivy2/local/org.slf4j/slf4j-api/1.7.7/jars/slf4j-api.jar
>> [ivy:retrieve] local: no ivy file nor artifact found for org.slf4j#slf4j-api;1.7.7
>> [ivy:retrieve] main: Checking cache for: dependency: org.slf4j#slf4j-api;1.7.7 {logging=[master]}
>> [ivy:retrieve] main: module revision found in cache: org.slf4j#slf4j-api;1.7.7
>> [ivy:retrieve] found org.slf4j#slf4j-api;1.7.7 in public
>> [ivy:retrieve] == resolving dependencies org.apache.solr#example;work...@hgsolr2devmstr.healthgrades.com->org.slf4j#jcl-over-slf4j;1.7.7 [logging->master]
>> [ivy:retrieve] default: Checking cache for: dependency: org.slf4j#jcl-over-slf4j;1.7.7 {logging=[master]}
>> [ivy:retrieve] don't use cache for org.slf4j#jcl-over-slf4j;1.7.7: checkModified=true
>>
>> --
>> Bill Bell
>> billnb...@gmail.com
>> cell 720-256-8076
>
> --
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076
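The find one-liner above translates directly into a small Python sketch of the same cleanup. The cache path is an assumption; point it at your own ~/.ivy2 (or checkout) before running.

```python
from pathlib import Path

# Remove stale Ivy *.lck files left behind after an interrupted resolve,
# equivalent to: find . -name "*.lck" | xargs rm
# The path below is a hypothetical example.
cache = Path("/home/solr/.ivy2")

removed = 0
if cache.is_dir():
    for lck in cache.rglob("*.lck"):
        lck.unlink()
        removed += 1
print(f"removed {removed} lock file(s)")
```

Deleting only the lock files keeps the cached artifacts, so the next resolve does not have to re-download everything the way wiping the whole .ivy2 directory does.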
RE: DIH Caching with Delta Import
Dyer, James-2 wrote:
> The DIH Cache feature does not work with delta import. Actually, much of
> DIH does not work with delta import. The workaround you describe is
> similar to the approach described here:
> https://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport ,
> which in my opinion is the best way to implement partial updates with DIH.

Not what I was hoping to hear, but at least that explains the delta-import funkiness we were experiencing. Thank you for providing the partial updates implementation link.

--
View this message in context: http://lucene.472066.n3.nabble.com/DIH-Caching-with-Delta-Import-tp4235598p4236384.html
Sent from the Solr - User mailing list archive at Nabble.com.
Using the ExtractingRequestHandler
Hi,

We are using Solr and need help using the ExtractingRequestHandler; we cannot decide what input parameters we need to specify. Kindly help.

*Salonee Rege*
USC Viterbi School of Engineering
University of Southern California
Master of Computer Science - Student
Computer Science - B.E
salon...@usc.edu *||* *619-709-6756*
Re: EdgeNGramFilterFactory for Chinese characters
> I have rich-text documents that are in both English and Chinese, and
> currently I have EdgeNGramFilterFactory enabled during indexing, as I need
> it for partial matching for English words. But this means it will also
> break up each of the Chinese characters into different tokens.

EdgeNGramFilterFactory creates sub-strings (prefixes) from each token; its behavior is independent of language. If you need to perform partial (prefix) matching for **only English words**, you can create a separate field that keeps only English words (I've never tried it, but it might be possible with PatternTokenizerFactory or other tokenizer/filter chains) and apply EdgeNGramFilterFactory to that field.

Hope it helps,
Tomoko

2015-10-23 13:04 GMT+09:00 Zheng Lin Edwin Yeo :
> Hi,
>
> Would like to check, is it good to use EdgeNGramFilterFactory for indexes
> that contains Chinese characters?
> Will it affect the accuracy of the search for Chinese words?
>
> I have rich-text documents that are in both English and Chinese, and
> currently I have EdgeNGramFilterFactory enabled during indexing, as I need
> it for partial matching for English words. But this means it will also
> break up each of the Chinese characters into different tokens.
>
> I'm using the HMMChineseTokenizerFactory for my tokenizer.
>
> Thank you.
>
> Regards,
> Edwin
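The separate-field idea could look roughly like the schema sketch below. Everything here is hypothetical (field names, gram sizes), and the step of keeping only English tokens is deliberately left out, since as noted it would need a PatternTokenizerFactory or a similar chain.

```xml
<!-- Hypothetical sketch: a dedicated prefix-match field, so
     EdgeNGramFilterFactory never touches the Chinese analysis chain.
     Query-side analysis omits the edge n-grams so the typed prefix
     matches the indexed grams directly. -->
<fieldType name="text_en_prefix" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="content_prefix" type="text_en_prefix" indexed="true" stored="false"/>
<copyField source="content" dest="content_prefix"/>
```

The existing field with HMMChineseTokenizerFactory would then stay as-is for Chinese search, and partial-match queries would target content_prefix instead.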