I added a comment on the INFRA issue.
I don't understand why it periodically "gets stuck".
Mike McCandless
http://blog.mikemccandless.com
On Fri, Oct 23, 2015 at 11:27 AM, Kevin Risden
wrote:
> It looks like both Apache Git mirror (git://git.apache.org/lucene-solr.git)
> and GitHub mirror (ht
Looking at the code and jira I see that ordering actions in solrj update
request is currently not supported but I'd like to know if there is any
other way to get this capability. I took a quick look at the XML loader
and it appears to process actions as it sees them so if the order was
changed to
Hi Jamie!
On Sat, Oct 24, 2015 at 7:21 AM, Jamie Johnson wrote:
> Looking at the code and jira I see that ordering actions in solrj update
> request is currently not supported but I'd like to know if there is any
> other way to get this capability. I took a quick look at the XML loader
> and it
Thanks, Jack. I did some more research and found similar results.
In our application, we are making multiple (think: 50) concurrent requests
to calculate term frequency on a set of documents in "real-time". The
faster that results return, the better.
Most of these requests are unique, so cache on
If you mean using the term frequency function query, then I'm not sure
there's a huge amount you can do to improve performance.
The term frequency is a number that is used often, so it is stored in
the index pre-calculated. Perhaps, if your data is not changing,
optimising your index would reduce
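For context, a minimal sketch of the termfreq function query being discussed (the field name and term here are assumptions, not from the thread):

```
q=*:*&fl=id,tf:termfreq(text,'solr')
```

Each returned document then carries a `tf` pseudo-field holding the raw term frequency, which is read from the index pre-calculated — which is why there is little left to speed up at query time.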
Gotcha - that's disheartening.
One idea: when I run termfreq, I get all of the termfreqs for each document
one-by-one.
Is there a way to have Solr sum it up before creating the response, so I
only receive one number?
On Sat, Oct 24, 2015 at 11:05 AM, Upayavira wrote:
> If you m
That's what a normal query does - Lucene takes all the terms used in the
query and sums them up for each document in the response, producing a
single number, the score, for each document. That's the way Solr is
designed to be used. You still haven't elaborated why you are trying to use
Solr in a wa
Hi Jack,
I'm just using solr to get word count across a large number of documents.
It's somewhat non-standard, because we're ignoring relevance, but it seems
to work well for this use case otherwise.
My understanding then is:
1) since termfreq is pre-processed and fetched, there's no good way to
If you just want word length, then do work during indexing - index a
field for the word length. Then, I believe you can do faceting - e.g.
with the json faceting API I believe you can do a sum() calculation on a
field rather than the more traditional count.
Thinking aloud, there might be an easier
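The JSON faceting idea above might look like this (a sketch; `len` is an assumed integer field populated at index time with each document's word count):

```json
{
  "query": "*:*",
  "facet": {
    "total_words": "sum(len)"
  }
}
```

Posted as the JSON request body to the select handler, this returns one aggregated number rather than per-document values.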
It is getting stuck on resolve.
ant clean test
SOLR 5.3.1
[ivy:retrieve] retrieve done (5ms)
Overriding previous definition of property "ivy.version"
[ivy:retrieve] no resolved descriptor found: launching default resolve
Overriding previous definition of property "ivy.version"
[ivy:retrieve]
Thanks, let me think about that.
We're using termfreq to get the TF score, but we don't know which term
we'll need the TF for. So we'd have to do a corpus-wide summing of termfreq
for each potential term across all documents in the corpus. It seems like
it'd require some development work to compute
OK I deleted /home/solr/.ivy2 and it started working.
On Sat, Oct 24, 2015 at 11:57 AM, William Bell wrote:
> It is getting stuck on resolve.
>
> ant clean test
>
> SOLR 5.3.1
>
> [ivy:retrieve] retrieve done (5ms)
>
> Overriding previous definition of property "ivy.version"
>
> [ivy:retrieve] n
On 10/24/2015 5:21 AM, Jamie Johnson wrote:
> Looking at the code and jira I see that ordering actions in solrj update
> request is currently not supported but I'd like to know if there is any
> other way to get this capability. I took a quick look at the XML loader
> and it appears to process act
Can you explain more what you are using TF for? Because it sounds rather
like scoring. You could disable field norms and IDF and scoring would be
mostly TF, no?
Upayavira
On Sat, Oct 24, 2015, at 07:28 PM, Aki Balogh wrote:
> Thanks, let me think about that.
>
> We're using termfreq to get the T
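Disabling field norms as suggested is a schema-level switch; a hedged sketch (the field and type names are assumptions):

```xml
<!-- Norms off: length normalization no longer influences the score. -->
<field name="text" type="text_general" indexed="true" stored="true"
       omitNorms="true"/>
```

Note that omitNorms alone only drops the length-normalization component; fully removing IDF from scoring would additionally require a custom Similarity.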
Yes, sorry, I am not being clear.
We are not even doing scoring, just getting the raw TF values. We're doing
this in solr because it can scale well.
But with large corpora, retrieving the word counts takes some time, in part
because solr is splitting up word count by document and generating a lar
yes, but what do you want to do with the TF? What problem are you
solving with it? If you are able to share that...
On Sat, Oct 24, 2015, at 09:05 PM, Aki Balogh wrote:
> Yes, sorry, I am not being clear.
>
> We are not even doing scoring, just getting the raw TF values. We're
> doing
> this in s
Certainly, yes. I'm just doing a word count, i.e. how often does a specific
term come up in the corpus?
On Oct 24, 2015 4:20 PM, "Upayavira" wrote:
> yes, but what do you want to do with the TF? What problem are you
> solving with it? If you are able to share that...
>
> On Sat, Oct 24, 2015, at 09
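If the goal is one corpus-wide number per term, Solr's function query set also includes totaltermfreq (alias ttf), which — assuming it is available in the Solr version in use — returns a term's total occurrence count across the whole index. A sketch with assumed field and term names:

```
q=*:*&rows=1&fl=total:ttf(text,'solr')
```

Because the value is the same for every document, requesting a single row yields the corpus-wide count in one response.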
I've been seeing this happen a lot lately, it seems like a series of lock
files are left around under some conditions. I've also incorporated some
of Mark Miller's suggestions, but perhaps one of my upgrades undid
that work.
I've found it much less painful to remove all the *.lck files, I don't
ha
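A sketch of the lock-file cleanup described above, demonstrated in a throwaway directory so the commands are safe to try (a real deployment would point find at the Solr data directory instead):

```shell
# Simulate an index directory with a stale *.lck file left behind.
tmp=$(mktemp -d)
mkdir -p "$tmp/index"
touch "$tmp/index/core.lck" "$tmp/index/segments_1"

# Delete only the *.lck files; index data files are untouched.
find "$tmp" -name '*.lck' -type f -delete

ls "$tmp/index"   # segments_1 remains, core.lck is gone
rm -rf "$tmp"
```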
Dyer, James-2 wrote
> The DIH Cache feature does not work with delta import. Actually, much of
> DIH does not work with delta import. The workaround you describe is
> similar to the approach described here:
> https://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport ,
> which in my op
Hi,
We are using Solr and need help with the ExtractingRequestHandler: we
cannot decide which input parameters we need to specify. Kindly help.
*Salonee Rege*
USC Viterbi School of Engineering
University of Southern California
Master of Computer Science - Student
Computer Science - B.E
salon..
> I have rich-text documents that are in both English and Chinese, and
> currently I have EdgeNGramFilterFactory enabled during indexing, as I need
> it for partial matching for English words. But this means it will also
> break up each of the Chinese characters into different tokens.
EdgeNGramFil
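One common way to keep edge n-grams for English while leaving Chinese intact is to route the two languages into separate fields; a hedged schema sketch (type names and analyzer choices are assumptions, not from the thread):

```xml
<!-- English field type: edge n-grams enable partial matching. -->
<fieldType name="text_en_ngram" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
</fieldType>

<!-- Chinese field type: no n-gramming, so characters are not split further. -->
<fieldType name="text_zh" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>
```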