Re: indexing unique keys

2014-09-05 Thread Mikhail Khludnev
Hello, You are asking without giving any context. What's the size of the sets, the desired TPS, the key length, and even the values? It's hard to answer definitively. It's not a primary use case for Lucene, and it adds some unnecessary overhead. However, the community has collected a few workarounds for this kind of problem. From the

Re: Solr add document over 20 times slower after upgrade from 4.0 to 4.9

2014-09-05 Thread Li, Ryan
Hi Shawn, Thanks for your reply. The memory setting of my Solr box is 12G of physical memory, with 4G for Java (-Xmx4096m). The index size is around 4G in Solr 4.9; I think it was over 6G in Solr 4.0. I do think the RAM size of Java is one of the reasons for this slowness. I'm doing one big commit an

RE: Solr add document over 20 times slower after upgrade from 4.0 to 4.9

2014-09-05 Thread Li, Ryan
Hi Erick, As Ryan Ernst noticed, those big fields (e.g. majorTextSignalStem) are not stored. There are a few stored fields in my schema, but they are very small fields, basically the name or id for that document. I tried turning them off (storing only the id field) and that didn't make any difference. Thanks,

Re: Solr add document over 20 times slower after upgrade from 4.0 to 4.9

2014-09-05 Thread Li, Ryan
Hi Guys, Just an update. I've tried with Solr 4.10 (same code as for Solr 4.9), and that has the same indexing speed as 4.0. The only problem left now is that Solr 4.10 takes more memory than 4.0, so I'm trying to figure out the best value for the Java heap size. I think that proves there is s

FAST-like document vector data structures in Solr?

2014-09-05 Thread Jürgen Wagner (DVT)
Hello all, as the migration from FAST to Solr is a relevant topic for several of our customers, there is one issue that does not seem to be addressed by Lucene/Solr: document vectors FAST-style. These document vectors are used to form metrics of similarity, i.e., they may be used as a "semantic f

SolrJ 4.10.0 errors

2014-09-05 Thread Guido Medina
Hi, I have upgraded from Solr 4.9 to 4.10 and the server side seems fine, but the client is reporting the following exception: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: solr_host.somedomain at org.apache.solr.client.solrj.impl.

Re: FAST-like document vector data structures in Solr?

2014-09-05 Thread jim ferenczi
Hi, Something like ?: https://cwiki.apache.org/confluence/display/solr/The+Term+Vector+Component And just to show some impressive search functionality of the wiki: ;) https://cwiki.apache.org/confluence/dosearchsite.action?where=solr&spaceSearch=true&queryString=document+vectors Cheers, Jim 2014

Re: SolrJ 4.10.0 errors

2014-09-05 Thread Guido Medina
Sorry I didn't give enough information so I'm adding to it, the SolrJ client is on our webapp and the documents are getting indexed properly into Solr, the only problem we are seeing is that with SolrJ 4.10 once Solr server response comes back it seems like SolrJ client doesn't know what to wit

Re: FAST-like document vector data structures in Solr?

2014-09-05 Thread Jürgen Wagner (DVT)
Hello Jim, yes, I am aware of the TermVector and MoreLikeThis stuff. I am presently mapping docvectors to these mechanisms and creating term vectors myself from third-party text mining components. However, it's not quite like the FAST docvectors. Particularly, the performance of MoreLikeThis quer

Re: Solr add document over 20 times slower after upgrade from 4.0 to 4.9

2014-09-05 Thread Alexandre Rafalovitch
Why do one big commit? You could do hard commits along the way but keep the searcher open and not see the changes until the end. Obviously a separate issue from the memory consumption discussion, but I thought I'd add it anyway. Regards, Alex On 05/09/2014 3:30 am, "Li, Ryan" wrote: > HI Shawn, > >

statuscode list

2014-09-05 Thread Jan Verweij - Reeleez
Hi, If I'm correct you will get a statuscode="0" in the response if you use XML messages for updating the solr index. Is there a list of possible other statuscodes you can receive in case anything fails and what these errorcodes mean? THNX, Jan.

Re: Solr API for getting shard's leader/replica status

2014-09-05 Thread manohar211
Thanks for the comments!! I found out the solution on how I can get the replica's state. Here's the piece of code. while (iter.hasNext()) { Slice slice = iter.next(); for(Replica replica:slice.getReplicas()) { System.out.println("replica state for " + replica.getStr("c

Re: Solr add document over 20 times slower after upgrade from 4.0 to 4.9

2014-09-05 Thread Mikhail Khludnev
On Fri, Sep 5, 2014 at 3:22 PM, Alexandre Rafalovitch wrote: > Why do one big commit? You could do hard commits along the way but keep > searcher open and not see the changes until the end. > Alexandre, I don't think it can happen in solr-user list, next search picks up the new searcher. Ryan,

Re: FAST-like document vector data structures in Solr?

2014-09-05 Thread Jack Krupansky
For reference: “Item Similarity Vector Reference This property represents a similarity reference when searching for similar items. This is a similarity vector representation that is returned for each item in the query result in the docvector managed property. The value is a string formatted ac

Re: FAST-like document vector data structures in Solr?

2014-09-05 Thread Mikhail Khludnev
Jürgen, I can't get it. Can you tell more about this feature or point to the doc? Thanks On Fri, Sep 5, 2014 at 11:44 AM, "Jürgen Wagner (DVT)" < juergen.wag...@devoteam.com> wrote: > Hello all, > as the migration from FAST to Solr is a relevant topic for several of > our customers, there is

How to implement multilingual word components fields schema?

2014-09-05 Thread Ilia Sretenskii
Hello. We have documents with multilingual words that consist of parts in different languages, and search queries of the same complexity. It is an online application used worldwide, so users generate content in all the possible world languages. For example: 言語-aware Løgismose-alike ຄໍາຮ້ອງສະຫມັກ-de

Re: Solr add document over 20 times slower after upgrade from 4.0 to 4.9

2014-09-05 Thread Alexandre Rafalovitch
On Fri, Sep 5, 2014 at 9:55 AM, Mikhail Khludnev wrote: >> Why do one big commit? You could do hard commits along the way but keep >> searcher open and not see the changes until the end. >> > > Alexandre, > I don't think it's can happen in solr-user list, next search pickups the > new searcher. W

Re: How to implement multilingual word components fields schema?

2014-09-05 Thread Jack Krupansky
It comes down to how you personally want to value compromises between conflicting requirements, such as relative weighting of false positives and false negatives. Provide a few use cases that illustrate the boundary cases that you care most about. For example field values that have snippets in o

Is there any sentence tokenizers in sold 4.9.0?

2014-09-05 Thread Sandeep B A
Hi, I was looking for sentence tokenizers available by default in Solr but could not find any. Has anyone used one, or integrated a tokenizer from another language (for example Python) into Solr? Please let me know. Thanks and regards, Sandeep

Re: FAST-like document vector data structures in Solr?

2014-09-05 Thread Jürgen Wagner (DVT)
Thanks for posting this. I was just about to send off a message of similar content :-) Important to add: - In FAST ESP, you could have more than one such docvector associated with a document, in order to reflect different metrics. - Term weights in docvectors are document-relative, not absolute.

Re: FAST-like document vector data structures in Solr?

2014-09-05 Thread Jack Krupansky
Sounds like a great feature to add to Solr, especially if it would facilitate more automatic relevancy enhancement. LucidWorks Search has a feature called "unsupervised feedback" that does that but something like a docvector might make it a more realistic default. -- Jack Krupansky -Origin

Re: Is there any sentence tokenizers in sold 4.9.0?

2014-09-05 Thread Sandeep B A
Sorry for typo it is solr 4.9.0 instead of sold 4.9.0 On Sep 5, 2014 7:48 PM, "Sandeep B A" wrote: > Hi, > > I was looking out the options for sentence tokenizers default in solr but > could not find it. Does any one used? Integrated from any other language > tokenizers to solr. Example python e

Re: Query ReRanking question

2014-09-05 Thread Ravi Solr
Thank you very much for responding. I want to do exactly the opposite of what you said. I want to sort the relevant docs in reverse chronology. If you sort by date beforehand then the relevancy is lost. So I want to get the Top N relevant results and then rerank those Top N to achieve relevant reverse
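The requirement described in this message (take the top N results by relevance, then order only those N in reverse chronology) can be sketched outside Solr as follows. This is a plain-Python illustration of the idea, not Solr's implementation; the helper `rerank_top_n_by_date` and the document fields `score` and `date` are hypothetical names chosen for the example.

```python
from datetime import date

def rerank_top_n_by_date(docs, n):
    """Order docs by relevance, then sort only the top n newest-first."""
    by_relevance = sorted(docs, key=lambda d: d["score"], reverse=True)
    # Only the n most relevant docs are reordered by date; the tail keeps relevance order.
    head = sorted(by_relevance[:n], key=lambda d: d["date"], reverse=True)
    return head + by_relevance[n:]

# Hypothetical sample data: "d" is the newest doc but barely relevant.
docs = [
    {"id": "a", "score": 9.0, "date": date(2014, 3, 8)},
    {"id": "b", "score": 7.5, "date": date(2014, 9, 1)},
    {"id": "c", "score": 7.0, "date": date(2014, 7, 17)},
    {"id": "d", "score": 1.0, "date": date(2014, 9, 5)},
]
result_ids = [d["id"] for d in rerank_top_n_by_date(docs, 3)]
print(result_ids)
```

Because the date sort is applied only inside the top-N window, a recent but weakly relevant document such as "d" cannot jump ahead of the relevant ones, which is the difference from sorting the whole result set by date.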

RE: Query ReRanking question

2014-09-05 Thread Markus Jelsma
Hi - You can already achieve this by boosting on the document's recency. The result set won't be exactly ordered by date but you will get the most relevant and recent documents on top. Markus -Original message- > From:Ravi Solr mailto:ravis...@gmail.com> > > Sent: Friday 5th September

Re: Solr add document over 20 times slower after upgrade from 4.0 to 4.9

2014-09-05 Thread Erick Erickson
Alexandre: It Depends (tm) of course. It all hinges on the setting in <autoCommit>, whether <openSearcher> is true or false. In the former case, you, well, open a new searcher. In the latter you don't. I agree, though, this is all tangential to the memory consumption issue since the RAM buffer will be flushed regardless

Re: Query ReRanking question

2014-09-05 Thread Erick Erickson
OK, why can't you switch the clauses from Joel's suggestion? Something like: q=Malaysia plane crash&rq={!rerank reRankDocs=1000 reRankQuery=$myquery}&myquery=*:*&sort=date+desc (haven't tried this yet, but you get the idea). Best, Erick On Fri, Sep 5, 2014 at 9:33 AM, Markus Jelsma wrote:
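The request suggested in this message contains local-params syntax (`{!rerank ...}`, `$myquery`) that has to be URL-encoded when sent over HTTP. A minimal sketch of building the query string with Python's standard library is shown below; the parameter values are copied from the message, while the base URL (host, port, and the collection name `collection1`) is an assumption for illustration only. The sketch shows encoding only and does not claim the query produces the desired ordering.

```python
from urllib.parse import urlencode, parse_qs

# Parameters as written in the message above.
params = {
    "q": "Malaysia plane crash",
    "rq": "{!rerank reRankDocs=1000 reRankQuery=$myquery}",
    "myquery": "*:*",
    "sort": "date desc",
}

# urlencode escapes braces, '!', '$', '=' and turns spaces into '+'.
query_string = urlencode(params)

# Hypothetical endpoint; adjust host, port, and collection to your setup.
url = "http://localhost:8983/solr/collection1/select?" + query_string
print(url)

# Decoding the query string recovers the original parameter values.
decoded = parse_qs(query_string)
```

Letting a library do the escaping avoids hand-encoding mistakes in the `rq` parameter, where an unescaped `=` or space would otherwise split the value.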

Re: Query ReRanking question

2014-09-05 Thread Walter Underwood
Boosting on recency is probably a better approach. A fixed re-ranking horizon will always be a compromise, a guess at the precision of the query. It will give poor results for queries that are more or less specific than the assumption. Think of the recency boost as a tie-breaker. When documents
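The "recency boost as a tie-breaker" idea can be illustrated with a little arithmetic. The sketch below uses the same reciprocal shape as Solr's `recip(x,m,a,b)` function, which computes a / (m*x + b); the `boosted_score` helper, the multiplicative combination, and the `weight=0.1` value are assumptions made for this example, not settings taken from the thread.

```python
def recip(x, m, a, b):
    """Reciprocal decay with the same shape as Solr's recip: a / (m*x + b)."""
    return a / (m * x + b)

def boosted_score(relevance, age_days, weight=0.1):
    """Scale relevance by a small recency factor (hypothetical weighting)."""
    # Recency is 1.0 for a brand-new doc and about 0.5 for a one-year-old doc.
    recency = recip(age_days, 1.0 / 365.0, 1.0, 1.0)
    return relevance * (1.0 + weight * recency)

# Equal relevance: the newer document wins the tie.
newer = boosted_score(5.0, age_days=1)
older = boosted_score(5.0, age_days=300)

# Clearly higher relevance still wins, even against a brand-new document.
strong_old = boosted_score(8.0, age_days=1000)
weak_new = boosted_score(5.0, age_days=0)
print(newer > older, strong_old > weak_new)
```

With a small weight the boost only separates documents whose relevance scores are close, so there is no fixed re-ranking horizon to guess.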

Re: Edismax mm and efficiency

2014-09-05 Thread Walter Underwood
Great! We have some very long queries, where students paste entire homework problems. One of them was 1051 words. Many of them are over 100 words. This could help. In the Jira discussion, I saw some comments about handling the most sparse lists first. We did something like that in the Infoseek

Re: SolrJ 4.10.0 errors

2014-09-05 Thread Shawn Heisey
On 9/5/2014 3:50 AM, Guido Medina wrote: > Sorry I didn't give enough information so I'm adding to it, the SolrJ > client is on our webapp and the documents are getting indexed properly > into Solr, the only problem we are seeing is that with SolrJ 4.10 once > Solr server response comes back it see

RE: How to implement multilingual word components fields schema?

2014-09-05 Thread Susheel Kumar
I agree with the approach Jack suggested: use the same source text in multiple fields, one per language, and then do a dismax query. Would love to hear whether it works for you. Thanks, Susheel -Original Message- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Friday, September 05

Re: How to implement multilingual word components fields schema?

2014-09-05 Thread Tom Burton-West
Hi Ilia, I don't know if it would be helpful but below I've listed some academic papers on this issue of how best to deal with mixed language/mixed script queries and documents. They are probably taking a more complex approach than you will want to use, but perhaps they will help to think about

RE: Is there any sentence tokenizers in sold 4.9.0?

2014-09-05 Thread Susheel Kumar
There is SmartChineseSentenceTokenizerFactory or SentenceTokenizer, which is being deprecated and replaced with HMMChineseTokenizer. I'm not aware of other tokenizers, but you may either build your own similar to SentenceTokenizer or employ an external sentence detector/recognizer & built

Re: Query ReRanking question

2014-09-05 Thread Ravi Solr
Erick, I believe when you apply sort this way it runs the query and sort first and then tries to rerank...so basically it already lost the true relevancy because of sort taking precedence. Am I making sense ? Ravi Kiran Bhaskar On Fri, Sep 5, 2014 at 1:23 PM, Erick Erickson wrote: > OK, why ca

Re: Query ReRanking question

2014-09-05 Thread Ravi Solr
Walter, thank you for the valuable insight. The problem I am facing is that between the term frequencies, mm, date boost and stemming the results can become very inconsistent...Look at the following examples Here the chronology is all over the place because of what I mentioned above http://www.was

How to solve?

2014-09-05 Thread William Bell
We have a core with each document as a person. We want to boost based on the sweater color, but if the person has sweaters in their closet which are from the same manufacturer we want to boost even more by adding them together. Peter Smit - Sweater: Blue = 1 : Nike, Sweater: Red = 2: Nike, Sweater: Blu

Re: Query ReRanking question

2014-09-05 Thread Joel Bernstein
You can probably use the FunctionQParserPlugin in conjunction with Query ReRanking to achieve what you're trying to do. q=foo&rq={!rerank reRankDocs=1000 reRankQuery=$qq}&qq={!func}someFunction() What this is going to do is rerank the docs based on a function query. Your function query will need
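How reranking with a function query changes the order can be modelled in a few lines. The sketch below is a simplified model, assuming the additive behaviour described in the Solr documentation for the ReRank parser (the rerank query's score is multiplied by `reRankWeight` and added to the original score for the top `reRankDocs` documents). The `rerank` helper, the `recency` stand-in for `someFunction()`, and the sample documents are hypothetical.

```python
def rerank(docs, rerank_docs, rerank_score, rerank_weight=2.0):
    """Rescore the top rerank_docs docs as score + weight * rerank_score(doc)."""
    ranked = sorted(docs, key=lambda d: d["score"], reverse=True)
    head, tail = ranked[:rerank_docs], ranked[rerank_docs:]
    # Only the head is rescored; the tail keeps its original scores and order.
    rescored = [dict(d, score=d["score"] + rerank_weight * rerank_score(d)) for d in head]
    rescored.sort(key=lambda d: d["score"], reverse=True)
    return rescored + tail

def recency(doc):
    """Hypothetical stand-in for someFunction(): 1.0 when new, decaying with age."""
    return 1.0 / (1.0 + doc["age_days"] / 365.0)

docs = [
    {"id": "x", "score": 3.0, "age_days": 730},
    {"id": "y", "score": 2.5, "age_days": 0},
    {"id": "z", "score": 1.0, "age_days": 0},
]
reranked_ids = [d["id"] for d in rerank(docs, rerank_docs=2, rerank_score=recency)]
print(reranked_ids)
```

In this model "y" overtakes "x" because its recency contribution outweighs the small relevance gap, while "z" stays last because it falls outside the rerank window.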