Hello,
You are asking without giving any context. What's the size of the sets, the desired
TPS, the key length, and even the values?
It's hard to answer definitively. This is not a primary use case for Lucene, and it
adds some unnecessary overhead. However, the community has collected a few
workarounds for this kind of problem. From the
HI Shawn,
Thanks for your reply.
The memory setting of my Solr box is
12G physically memory.
4G for java (-Xmx4096m)
The index size is around 4G in Solr 4.9, I think it was over 6G in Solr 4.0.
I do think the Java heap size is one of the reasons for this slowness. I'm
doing one big commit an
Hi Erick,
As Ryan Ernst noticed, those big fields (e.g. majorTextSignalStem) are not
stored. There are a few stored fields in my schema, but they are very small
fields, basically a name or id for that document. I tried turning them off (only
storing the id field) and that didn't make any difference.
Thanks,
Hi Guys,
Just an update.
I've tried with Solr 4.10 (same code for Solr 4.9). And that has the same index
speed as 4.0. The only problem left now is that Solr 4.10 takes more memory
than 4.0 so I'm trying to figure out what is the best number for Java heap size.
I think that proves there is s
Hello all,
as the migration from FAST to Solr is a relevant topic for several of
our customers, there is one issue that does not seem to be addressed by
Lucene/Solr: document vectors FAST-style. These document vectors are
used to form metrics of similarity, i.e., they may be used as a
"semantic f
Hi,
I have upgraded from Solr 4.9 to 4.10 and the server side seems fine,
but the client is reporting the following exception:
org.apache.solr.client.solrj.SolrServerException: IOException occured
when talking to server at: solr_host.somedomain
at
org.apache.solr.client.solrj.impl.
Hi,
Something like this?:
https://cwiki.apache.org/confluence/display/solr/The+Term+Vector+Component
And just to show some impressive search functionality of the wiki: ;)
https://cwiki.apache.org/confluence/dosearchsite.action?where=solr&spaceSearch=true&queryString=document+vectors
Cheers,
Jim
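For completeness, turning on the Term Vector Component is mostly a schema switch; a minimal sketch, assuming a field named `text` (the field and type names here are illustrative, not from the thread):

```xml
<!-- schema.xml: ask Lucene to store per-document term vectors for this field -->
<field name="text" type="text_general" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```

Queries then request the vectors with the `tv` parameters against a handler that includes the TermVectorComponent, e.g. `q=id:123&tv=true&tv.tf_idf=true`.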
Sorry, I didn't give enough information, so I'm adding to it: the SolrJ
client is in our webapp and the documents are getting indexed properly
into Solr. The only problem we are seeing is that with SolrJ 4.10, once
the Solr server response comes back, it seems like the SolrJ client doesn't know
what to wit
Hello Jim,
yes, I am aware of the TermVector and MoreLikeThis stuff. I am
presently mapping docvectors to these mechanisms and create term vectors
myself from third-party text mining components.
However, it's not quite like the FAST docvectors. In particular, the
performance of MoreLikeThis quer
Why do one big commit? You could do hard commits along the way but keep
the current searcher open and not see the changes until the end.
Obviously a separate issue from memory consumption discussion, but thought
I'll add it anyway.
Regards,
Alex
On 05/09/2014 3:30 am, "Li, Ryan" wrote:
> HI Shawn,
>
>
Hi,
If I'm correct, you will get a statuscode="0" in the response if you
use XML messages for updating the Solr index.
Is there a list of other possible status codes you can receive in case
anything fails, and what these error codes mean?
THNX,
Jan.
Thanks for the comments!!
I found out the solution on how I can get the replica's state. Here's the
piece of code.
while (iter.hasNext()) {
    Slice slice = iter.next();
    for (Replica replica : slice.getReplicas()) {
        System.out.println("replica state for " + replica.getStr("c
On Fri, Sep 5, 2014 at 3:22 PM, Alexandre Rafalovitch
wrote:
> Why do one big commit? You could do hard commits along the way but keep
> searcher open and not see the changes until the end.
>
Alexandre,
I don't think it can happen here; the next search picks up the
new searcher.
Ryan,
For reference:
“Item Similarity Vector Reference
This property represents a similarity reference when searching for similar
items. This is a similarity vector representation that is returned for each
item in the query result in the docvector managed property.
The value is a string formatted ac
Jürgen,
I don't quite get it. Can you tell me more about this feature, or point to the docs?
Thanks
On Fri, Sep 5, 2014 at 11:44 AM, "Jürgen Wagner (DVT)" <
juergen.wag...@devoteam.com> wrote:
> Hello all,
> as the migration from FAST to Solr is a relevant topic for several of
> our customers, there is
Hello.
We have documents with multilingual words consisting of parts from different
languages, and search queries of the same complexity. It is a
worldwide online application, so users generate content in all
possible world languages.
For example:
言語-aware
Løgismose-alike
ຄໍາຮ້ອງສະຫມັກ-de
On Fri, Sep 5, 2014 at 9:55 AM, Mikhail Khludnev
wrote:
>> Why do one big commit? You could do hard commits along the way but keep
>> searcher open and not see the changes until the end.
>>
>
> Alexandre,
> I don't think it can happen here; the next search picks up the
> new searcher.
W
It comes down to how you personally want to value compromises between
conflicting requirements, such as relative weighting of false positives and
false negatives. Provide a few use cases that illustrate the boundary cases
that you care most about. For example field values that have snippets in o
Hi,
I was looking for a default sentence tokenizer option in Solr but
could not find one. Has anyone used one, or integrated a tokenizer
from another language (for example, Python) into Solr? Please let me know.
Thanks and regards,
Sandeep
Thanks for posting this. I was just about to send off a message of
similar content :-)
Important to add:
- In FAST ESP, you could have more than one such docvector associated
with a document, in order to reflect different metrics.
- Term weights in docvectors are document-relative, not absolute.
Sounds like a great feature to add to Solr, especially if it would facilitate
more automatic relevancy enhancement. LucidWorks Search has a feature called
"unsupervised feedback" that does that, but something like a docvector might
make it a more realistic default.
-- Jack Krupansky
Sorry for the typo: it is Solr 4.9.0 instead of "sold 4.9.0".
On Sep 5, 2014 7:48 PM, "Sandeep B A" wrote:
> Hi,
>
> I was looking out the options for sentence tokenizers default in solr but
> could not find it. Does any one used? Integrated from any other language
> tokenizers to solr. Example python e
Thank you very much for responding. I want to do exactly the opposite of
what you said: I want to sort the relevant docs in reverse chronology. If
you sort by date beforehand, then the relevancy is lost. So I want to get
the Top N relevant results and then rerank those Top N to achieve relevant
reverse
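What Ravi is describing amounts to a two-pass sort: relevance first, then date over only the top-N slice. A self-contained sketch of that logic (plain Java, not SolrJ; document data is made up for illustration):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Two-pass rerank: take the N highest-scoring hits, then order that
// slice by date descending. Docs outside the relevance top-N never
// appear, no matter how recent they are.
public class RerankSketch {
    public static final class Doc {
        public final String id;
        public final double score;
        public final long dateMillis;
        public Doc(String id, double score, long dateMillis) {
            this.id = id; this.score = score; this.dateMillis = dateMillis;
        }
    }

    public static List<Doc> rerankByDateDesc(List<Doc> hits, int n) {
        List<Doc> byScore = new ArrayList<>(hits);
        byScore.sort((a, b) -> Double.compare(b.score, a.score)); // relevance pass
        List<Doc> top = new ArrayList<>(byScore.subList(0, Math.min(n, byScore.size())));
        top.sort((a, b) -> Long.compare(b.dateMillis, a.dateMillis)); // recency pass
        return top;
    }

    public static void main(String[] args) {
        List<Doc> hits = Arrays.asList(
            new Doc("a", 9.0, 100), new Doc("b", 7.0, 300),
            new Doc("c", 5.0, 200), new Doc("d", 1.0, 999));
        // "d" is the most recent doc but is not in the relevance top-3,
        // so it is dropped before the date sort ever sees it.
        for (Doc d : rerankByDateDesc(hits, 3)) {
            System.out.println(d.id);
        }
    }
}
```

This also makes the trade-off concrete: the quality of the final ordering depends entirely on how well the top-N cutoff captures the truly relevant documents.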
Hi - You can already achieve this by boosting on the document's recency. The
result set won't be exactly ordered by date but you will get the most relevant
and recent documents on top.
Markus
-Original message-
> From:Ravi Solr mailto:ravis...@gmail.com> >
> Sent: Friday 5th September
Alexandre:
It Depends (tm), of course. It all hinges on the setting in <autoCommit>:
whether <openSearcher> is true or false.
In the former case you, well, open a new searcher. In the latter you don't.
I agree, though, this is all tangential to the memory consumption issue, since
the RAM buffer will be flushed regardless
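For the archive, the solrconfig.xml fragment being referred to looks roughly like this (the threshold values are illustrative, not from the thread):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10000</maxDocs>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```

With openSearcher=false, each hard commit flushes the RAM buffer and makes the index durable without opening a new searcher, so queries don't see the changes until a final explicit commit.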
OK, why can't you switch the clauses from Joel's suggestion?
Something like:
q=Malaysia plane crash&rq={!rerank reRankDocs=1000
reRankQuery=$myquery}&myquery=*:*&sort=date+desc
(haven't tried this yet, but you get the idea).
Best,
Erick
On Fri, Sep 5, 2014 at 9:33 AM, Markus Jelsma
wrote:
Boosting on recency is probably a better approach. A fixed re-ranking horizon
will always be a compromise, a guess at the precision of the query. It will
give poor results for queries that are more or less specific than the
assumption.
Think of the recency boost as a tie-breaker. When documents
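A common Solr idiom for this kind of recency boost is `recip(ms(NOW,mydatefield),3.16e-11,1,1)` (the field name is hypothetical). A self-contained sketch of the math behind it, where recip(x,m,a,b) = a/(m*x+b):

```java
// Recency boost as a tie-breaker: recip(ageMs, ~1/ms-per-year, 1, 1)
// gives a brand-new doc a multiplier of 1.0, a one-year-old doc about
// 0.5, and decays smoothly after that -- it nudges the ranking rather
// than imposing a hard date sort.
public class RecencyBoost {
    public static double recip(double x, double m, double a, double b) {
        return a / (m * x + b); // definition of Solr's recip() function
    }

    public static void main(String[] args) {
        double m = 3.16e-11;      // roughly 1 / (milliseconds per year)
        double yearMs = 3.156e10; // milliseconds in one year
        System.out.printf("now:     %.2f%n", recip(0, m, 1, 1));
        System.out.printf("1 year:  %.2f%n", recip(yearMs, m, 1, 1));
        System.out.printf("5 years: %.2f%n", recip(5 * yearMs, m, 1, 1));
    }
}
```

Because the multiplier never reaches zero, an old document with a much better relevance score can still outrank a new one, which is exactly the tie-breaker behavior described above.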
Great!
We have some very long queries, where students paste entire homework problems.
One of them was 1051 words. Many of them are over 100 words. This could help.
In the Jira discussion, I saw some comments about handling the most sparse
lists first. We did something like that in the Infoseek
On 9/5/2014 3:50 AM, Guido Medina wrote:
> Sorry I didn't give enough information so I'm adding to it, the SolrJ
> client is on our webapp and the documents are getting indexed properly
> into Solr, the only problem we are seeing is that with SolrJ 4.10 once
> Solr server response comes back it see
Agree with the approach Jack suggested: use the same source text in multiple
fields, one per language, and then do a dismax query. Would love to hear if
it works for you.
Thanks,
Susheel
-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Friday, September 05
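A minimal sketch of the one-field-per-language approach under discussion (field and type names here are illustrative, not from the thread):

```xml
<!-- schema.xml: analyze the same source text once per language -->
<field name="text_en" type="text_en" indexed="true" stored="false"/>
<field name="text_ja" type="text_ja" indexed="true" stored="false"/>
<copyField source="body" dest="text_en"/>
<copyField source="body" dest="text_ja"/>
```

A query would then search across the per-language fields with (e)dismax, e.g. `defType=edismax&qf=text_en text_ja`, letting whichever analyzer matched best contribute the score.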
Hi Ilia,
I don't know if it would be helpful but below I've listed some academic
papers on this issue of how best to deal with mixed language/mixed script
queries and documents. They are probably taking a more complex approach
than you will want to use, but perhaps they will help to think about
There is SmartChineseSentenceTokenizerFactory (or SentenceTokenizer), which is
being deprecated and replaced with HMMChineseTokenizer. I'm not aware of
other tokenizers, but you may want to either build your own similar to
SentenceTokenizer, or employ an external sentence detection/recognition library and built
Erick, I believe when you apply sort this way it runs the query and sorts
first, and then tries to rerank... so basically the true relevancy is already
lost because the sort takes precedence. Am I making sense?
Ravi Kiran Bhaskar
On Fri, Sep 5, 2014 at 1:23 PM, Erick Erickson
wrote:
> OK, why ca
Walter, thank you for the valuable insight. The problem I am facing is that
between the term frequencies, mm, the date boost, and stemming, the results can
become very inconsistent... Look at the following examples.
Here the chronology is all over the place because of what I mentioned above:
http://www.was
We have a core where each document is a person.
We want to boost based on sweater color, but if the person has sweaters
in their closet which are from the same manufacturer, we want to boost even
more by adding them together.
Peter Smit - Sweater: Blue = 1 : Nike, Sweater: Red = 2: Nike, Sweater:
Blu
You can probably use the FunctionQParserPlugin in conjunction with Query
ReRanking to achieve what you're trying to do.
q=foo&rq={!rerank reRankDocs=1000 reRankQuery=$qq}&qq={!func}someFunction()
What this is going to do is rerank the docs based on a function query.
Your function query will need