Re: Kate Winslet vs Winslet Kate

2015-11-01 Thread Paul Libbrecht
I believe that very many Solr installations actually need a query expansion such as the one you describe below, with each textual field indexed in multiple forms (string, straight (whitespace/ideograms), stemmed, phonetic). Thanks to edismax, I think, you would do the following expansi
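As a rough illustration of that idea, the sketch below builds edismax request parameters that search one logical field indexed in several forms, with the stricter forms weighted higher. The field names (`name_ws`, `name_stem`, `name_phonetic`) and boost values are hypothetical; each would be a separate schema field, typically filled via copyField.

```python
from __future__ import annotations

from urllib.parse import urlencode

# Hypothetical field names and boosts: one field per analysis form of "name".
# Stricter forms get higher boosts so closer matches tend to rank first.
FIELD_FORMS: dict[str, float] = {
    "name_ws": 10.0,       # whitespace/ideogram tokenization, no stemming
    "name_stem": 5.0,      # stemmed
    "name_phonetic": 1.0,  # phonetic encoding
}


def edismax_params(user_query: str, field_forms: dict[str, float]) -> dict[str, str]:
    """Build edismax parameters that query every field form with its boost."""
    qf = " ".join(f"{field}^{boost:g}" for field, boost in field_forms.items())
    return {"defType": "edismax", "q": user_query, "qf": qf}


if __name__ == "__main__":
    params = edismax_params("kate winslet", FIELD_FORMS)
    print(params["qf"])       # name_ws^10 name_stem^5 name_phonetic^1
    print(urlencode(params))  # query string to append to a /select request
```

The plain string form is deliberately left out of `qf`: in Solr releases of this period edismax splits the query on whitespace before per-field analysis, so a string field would only match single-word values. Exact whole-value matching would likely need its own clause.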

Invalid parsing with solr edismax operators

2015-11-01 Thread Mahmoud Almokadem
Hello, I'm using Solr 4.8.1. Using edismax as the parser, we got undesirable parsed queries and results. The following are two different cases with strange behavior: Searching with these parameters "mm":"2", "df":"TotalField", "debug":"true", "indent":"true", "fl":"Title", "start":"
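For reference, an integer `mm` ("minimum should match") such as 2 means that at least that many of the query's optional clauses must match. The simplified model below shows only that intended behavior for a plain, operator-free query; it deliberately ignores explicit operators, boosts and per-field analysis, which is exactly where edismax's real parsing can diverge from this picture. The function name is hypothetical.

```python
from __future__ import annotations


def satisfies_mm(query_terms: list[str], doc_terms: set[str], mm: int) -> bool:
    """Simplified model of an integer edismax `mm` for an all-optional query.

    A document passes when at least `mm` of the distinct query terms occur in it.
    Operators (AND, OR, NOT, +, -), boosts and analysis are not modeled.
    """
    matched = sum(1 for term in set(query_terms) if term in doc_terms)
    return matched >= mm


if __name__ == "__main__":
    query = ["kate", "winslet", "titanic"]
    print(satisfies_mm(query, {"kate", "winslet"}, 2))  # True: 2 of 3 terms match
    print(satisfies_mm(query, {"kate"}, 2))             # False: only 1 term matches
```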

Re: Kate Winslet vs Winslet Kate

2015-11-01 Thread Alexandre Rafalovitch
Which is what I believe Ted Sullivan is working on and presented at the latest Lucene/Solr Revolution. His presentation does not seem to be up, but he was writing about it on: http://lucidworks.com/blog/author/tedsullivan/ Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even

Re: Kate Winslet vs Winslet Kate

2015-11-01 Thread Paul Libbrecht
Alexandre, I guess you are talking about this post: http://lucidworks.com/blog/2015/06/06/query-autofiltering-extended-language-logic-search/ I think it is very often impossible to solve properly: words such as "direction" have many meanings and would occur in different fields. In IMDB, wo
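The general idea of query autofiltering, and the ambiguity problem raised here, can be shown with a much simplified sketch: known phrases are matched greedily (longest first) against a vocabulary of field values; a phrase that maps to exactly one field becomes a filter, while ambiguous or unknown words stay as free text. This is not Ted Sullivan's implementation; the vocabulary and field names are hypothetical, and a real system would derive them from the index.

```python
from __future__ import annotations

# Hypothetical vocabulary: phrase (as a token tuple) -> fields whose values contain it.
VOCABULARY: dict[tuple[str, ...], set[str]] = {
    ("kate", "winslet"): {"actor"},
    ("titanic",): {"title"},
    ("direction",): {"title", "crew_role", "plot"},  # ambiguous: never auto-filtered
}
MAX_PHRASE = max(len(phrase) for phrase in VOCABULARY)


def autofilter(query: str) -> tuple[list[str], list[str]]:
    """Split a query into field filters and leftover free-text words."""
    tokens = query.lower().split()
    filters: list[str] = []
    free_text: list[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at position i first.
        for n in range(min(MAX_PHRASE, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            fields = VOCABULARY.get(phrase)
            if fields is not None and len(fields) == 1:
                (field,) = fields
                filters.append(f'{field}:"{" ".join(phrase)}"')
                i += n
                break
        else:
            # No unambiguous phrase starts here: keep the word as free text.
            free_text.append(tokens[i])
            i += 1
    return filters, free_text
```

With this vocabulary, `autofilter("kate winslet titanic")` returns `(['actor:"kate winslet"', 'title:"titanic"'], [])`, while `autofilter("winslet kate")` finds no known phrase and leaves both words as free text, and the ambiguous "direction" is never turned into a filter.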

Re: Kate Winslet vs Winslet Kate

2015-11-01 Thread Yangrui Guo
I debugged the query and found it had been translated into _text_:Kate AND _text_:Winslet, where _text_ is the default search field. Because my documents use a parent/child relation, it appears that if there is no exact match for Kate Winslet, Solr will return all documents containing "Kate" and "

Re: Kate Winslet vs Winslet Kate

2015-11-01 Thread Erick Erickson
If your goal is to have docs with "kate" and "winslet" in the _name_ field be scored higher, just make that explicit as name:(kate AND winslet), perhaps boosting it as name:(kate AND winslet)^10, or add it as a clause: q=kate AND winslet OR name:(kate AND winslet)^10, or even q=kate AND winslet OR name:(k
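A small helper can assemble the clause form suggested above for any list of terms and URL-encode it for an HTTP request. The function name is hypothetical, `name` is the field from the message, and the string reproduces the suggestion verbatim rather than changing how the parser groups AND/OR.

```python
from __future__ import annotations

from urllib.parse import parse_qs, urlencode


def boosted_name_query(terms: list[str], boost: int = 10) -> str:
    """Return e.g. 'kate AND winslet OR name:(kate AND winslet)^10'."""
    conjunction = " AND ".join(terms)
    return f"{conjunction} OR name:({conjunction})^{boost}"


if __name__ == "__main__":
    q = boosted_name_query(["kate", "winslet"])
    print(q)
    # Spaces, ':', parentheses and '^' must be URL-encoded in the request.
    encoded = urlencode({"q": q})
    print(encoded)
    assert parse_qs(encoded)["q"][0] == q
```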

Re: org.apache.solr.common.SolrException: Document is missing mandatory uniqueKey field: id

2015-11-01 Thread fabigol
Hi, if I understand correctly, in the following configuration:

Re: Kate Winslet vs Winslet Kate

2015-11-01 Thread Yangrui Guo
Could you tell me more about the edismax approach? I'm new to it. Thanks a lot.
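One edismax feature that fits this thread is `pf` (phrase fields): documents in which all query terms appear as a phrase in a `pf` field get an extra boost, and `ps` sets the phrase slop for that boost. The sketch below builds such parameters (the field name `name` and the boost are hypothetical) and, for the two-term case only, computes the slop a phrase needs to match. It illustrates why, with `ps=0`, the query "winslet kate" gets no phrase boost on a document containing "Kate Winslet": swapping two adjacent terms costs a slop of 2.

```python
from __future__ import annotations


def edismax_phrase_params(user_query: str, qf: str, pf: str, ps: int) -> dict[str, str]:
    """Edismax parameters: match on qf, add a phrase boost on pf with slop ps."""
    return {"defType": "edismax", "q": user_query, "qf": qf, "pf": pf, "ps": str(ps)}


def two_term_slop(doc_tokens: list[str], first: str, second: str) -> int | None:
    """Minimum slop at which the phrase "first second" matches doc_tokens.

    Two distinct terms only: the distance is |pos(second) - pos(first) - 1|,
    minimized over all occurrences. Returns None if either term is missing.
    """
    firsts = [i for i, tok in enumerate(doc_tokens) if tok == first]
    seconds = [i for i, tok in enumerate(doc_tokens) if tok == second]
    if not firsts or not seconds:
        return None
    return min(abs(s - f - 1) for f in firsts for s in seconds)


if __name__ == "__main__":
    print(edismax_phrase_params("winslet kate", qf="name", pf="name^10", ps=2))
    doc = ["kate", "winslet"]
    print(two_term_slop(doc, "kate", "winslet"))  # 0: exact phrase
    print(two_term_slop(doc, "winslet", "kate"))  # 2: reversed order
```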

logical steps to configuring file-based spell-check

2015-11-01 Thread Mark Fenbers
Greetings! I want my spell-checker to be based on a file (/usr/share/dict/linux.words should suffice). Word-break features would also be a benefit. I have previously indexed my docs for searching with minimal alterations to the baseline Solr configuration. My "docs" are user-typed text, t
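Solr's `FileBasedSpellChecker` reads its dictionary from a plain text file with one word per line (the file named by its `sourceLocation` setting). A system word list usually needs some cleanup first; the sketch below lowercases entries, drops anything that is not purely alphabetic (such as "A&P"), and removes duplicates. Those filtering rules and the function names are assumptions to adapt, for example to match how the spell-checked text is analyzed.

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterable


def clean_wordlist(lines: Iterable[str]) -> list[str]:
    """Lowercase, keep purely alphabetic entries, drop duplicates (first wins)."""
    seen: set[str] = set()
    words: list[str] = []
    for line in lines:
        word = line.strip().lower()
        if not word or not word.isalpha() or word in seen:
            continue
        seen.add(word)
        words.append(word)
    return words


def write_wordlist(src: Path, dst: Path) -> int:
    """Write a cleaned one-word-per-line file to dst; return the word count."""
    with src.open(encoding="utf-8", errors="replace") as f:
        words = clean_wordlist(f)
    dst.write_text("\n".join(words) + "\n", encoding="utf-8")
    return len(words)


if __name__ == "__main__":
    # Demo on a few sample entries; call write_wordlist() with real paths,
    # e.g. Path("/usr/share/dict/linux.words"), to produce the dictionary file.
    print(clean_wordlist(["Apple", "apple", "A&P", "", "banana"]))
```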

How to turn on logging for segment merging

2015-11-01 Thread Pushkar Raste
Is segment merging information logged at a level finer than INFO? I have my application set up with INFO-level logging, and I am indexing documents at a rate of a few hundred per minute. I am using the default merge policy parameters. However, I never see any log entries that give me information about segment merging.

Re: Kate Winslet vs Winslet Kate

2015-11-01 Thread Yangrui Guo
I've just read the post, and it addresses much of my issue. It is hard to detect and disambiguate phrases, but some existing approaches seem really promising.

Re: How to turn on logging for segment merging

2015-11-01 Thread Tomás Fernández Löbbe
You can turn on "infoStream" in solrconfig.xml: https://cwiki.apache.org/confluence/display/solr/IndexConfig+in+SolrConfig#IndexConfiginSolrConfig-OtherIndexingSettings Tomás
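The setting belongs inside `<indexConfig>` in solrconfig.xml as `<infoStream>true</infoStream>`; per the page linked above, Lucene's indexing diagnostics (merge activity included) are then written as Solr log messages. A hand edit is the usual route. The sketch below makes the same change with Python's ElementTree, assuming the standard `<config>` root; the function name is hypothetical, and ElementTree's default parser discards XML comments, so a rewritten file would lose them.

```python
import xml.etree.ElementTree as ET


def enable_info_stream(solrconfig_xml: str) -> str:
    """Return solrconfig XML with <indexConfig><infoStream>true</infoStream>.

    Creates <indexConfig> and/or <infoStream> if they are missing.
    Comments are dropped by the default parser, so treat this as a sketch.
    """
    root = ET.fromstring(solrconfig_xml)
    index_config = root.find("indexConfig")
    if index_config is None:
        index_config = ET.SubElement(root, "indexConfig")
    info_stream = index_config.find("infoStream")
    if info_stream is None:
        info_stream = ET.SubElement(index_config, "infoStream")
    info_stream.text = "true"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    before = "<config><indexConfig><infoStream>false</infoStream></indexConfig></config>"
    print(enable_info_stream(before))
    # <config><indexConfig><infoStream>true</infoStream></indexConfig></config>
```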

Re: Question on index time de-duplication

2015-11-01 Thread shamik
That's what I observed as well. Perhaps there's a way to customize SignatureUpdateProcessorFactory to support my use case. I'll look into the source code and figure out whether there's a way to do it.
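Conceptually, signature-based de-duplication computes a hash over a chosen set of fields and treats documents with equal hashes as duplicates; with `overwriteDupes` enabled, a newer document replaces the older one. The sketch below models only that idea, which may help when deciding what a customized factory would need to change. It uses MD5 over hypothetical field names and is not byte-compatible with the signature classes Solr ships.

```python
from __future__ import annotations

import hashlib


def signature(doc: dict[str, str], fields: list[str]) -> str:
    """MD5 over the selected fields, in the given order (missing fields count as '')."""
    digest = hashlib.md5()
    for field in fields:
        # NUL separators keep ("ab", "c") and ("a", "bc") from colliding.
        digest.update(field.encode("utf-8") + b"\x00")
        digest.update(str(doc.get(field, "")).encode("utf-8") + b"\x00")
    return digest.hexdigest()


def dedupe(docs: list[dict[str, str]], fields: list[str]) -> list[dict[str, str]]:
    """Keep one document per signature; a later duplicate replaces an earlier one."""
    by_signature: dict[str, dict[str, str]] = {}
    for doc in docs:
        by_signature[signature(doc, fields)] = doc
    return list(by_signature.values())


if __name__ == "__main__":
    docs = [
        {"id": "1", "title": "Intro", "body": "same text"},
        {"id": "2", "title": "Intro", "body": "same text"},
        {"id": "3", "title": "Other", "body": "same text"},
    ]
    print([d["id"] for d in dedupe(docs, ["title", "body"])])  # ['2', '3']
```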

Re: Is it possible to use JiebaTokenizer for multilingual documents?

2015-11-01 Thread Zheng Lin Edwin Yeo
Here's my configuration in schema.xml for the JiebaTokenizerFactory. Could anything in it be causing the issue with English characters? Regards, Edwin On 29 October 2015 at 17:51, Zheng Lin Edwin Yeo wrote: > I would like to check, is it p

Very high memory and CPU utilization.

2015-11-01 Thread Modassar Ather
Hi, I have a 12-shard cluster on a single server, with each shard started with 28 GB of memory. There are no replicas. The index is around 90 GB on each shard. The Solr version is 5.2.1. When I query "network se*", memory utilization goes up to 24-26 GB and the query takes 3+ minutes to e

warning

2015-11-01 Thread Midas A
Please explain the following warning: "Starting log replay tlog{file=/mnt/vol1/path/data/tlog/tlog.0060544 refcount=2} active=false starting pos=0". Is there any harm in this error?