Re: Document Cache

2016-03-18 Thread Shawn Heisey
On 3/18/2016 8:22 AM, Rallavagu wrote: > So, each soft commit would create a new searcher that would invalidate > the old cache? > > Here is the configuration for Document Cache > > initialSize="10" autowarmCount="0"/> > > true In an earlier message, you indicated you're running into OOM. I

RE: Explain score is different from score

2016-03-18 Thread G, Rajesh
I don’t use boost at index time and query time.

using solr AnalyticsQuery API vs facet API

2016-03-18 Thread sudsport s
Hi, I am planning to write a custom aggregator in Solr that will use some probabilistic data structures per shard to accumulate results; after the shard results are merged, the result will be sent to the user as an integer. I explored 2 options to do this: 1. Solr analytics API https://cwiki.apache.org/confluence/di
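Below is a plain-Java sketch of the "probabilistic structure per shard, merged, returned as an integer" idea. It assumes the aggregate is a distinct count and uses linear counting, swapped in here as one simple mergeable sketch (HyperLogLog is the more common choice at scale). It is not tied to the AnalyticsQuery or facet API; the class name and hash choice are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical mergeable distinct-count sketch (linear counting); not a Solr API.
class LinearCountingSketch {
    private final int m;        // number of bits in the bitmap
    private final BitSet bits;

    LinearCountingSketch(int m) {
        this.m = m;
        this.bits = new BitSet(m);
    }

    // FNV-1a over the UTF-8 bytes, then a MurmurHash3-style finalizer to spread the bits.
    private static long hash(String value) {
        long h = 0xcbf29ce484222325L;
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    // Per-shard step: record a value by setting one bit.
    void add(String value) {
        bits.set((int) Math.floorMod(hash(value), (long) m));
    }

    // Merge step: sketches of the same size combine with a bitwise OR.
    void merge(LinearCountingSketch other) {
        if (other.m != m) throw new IllegalArgumentException("sketch sizes differ");
        bits.or(other.bits);
    }

    // Final step: estimate distinct values as -m * ln(fraction of zero bits).
    long estimate() {
        int zeros = m - bits.cardinality();
        if (zeros == 0) throw new IllegalStateException("sketch saturated; use a larger m");
        return Math.round(-m * Math.log((double) zeros / m));
    }
}
```

Each shard would build one sketch over its local values. Only the m-bit bitmaps need to cross the network; the coordinating node ORs them and returns estimate() as the integer. A value seen on several shards is counted once, which summing per-shard counts would not achieve.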

Re: Actual (specific) RT Search?

2016-03-18 Thread Erick Erickson
bq: My guess so far is that the filter has to fetch the unique key for all documents in results, which consumes a lot of resources. Guessing here and going from memory, but... If you have some code like reader.get(doc).get("id") it'll totally barf. Problem here is that to get the id field, it has

Re: Solr 5.5.0 ClassNotFoundException solr.MockTokenizerFactory after DIH setup

2016-03-18 Thread Shawn Heisey
On 3/17/2016 10:39 AM, Victor D'agostino wrote: > I have a java.lang.ClassNotFoundException: solr.MockTokenizerFactory > after a fresh 5.5.0 setup with DIH and a collection named "db". > > The tgz file is from > http://apache.crihan.fr/dist/lucene/solr/5.5.0/solr-5.5.0.tgz > > Any idea why this cla

Re: Solr Wiki - Request to add to contributors group

2016-03-18 Thread Alessandro Benedetti
Shawn, thank you very much! So, I didn't have an account in the old wiki; can you add me as a contributor? I just created one. I will then proceed with adding the classification documentation. AlessandroBenedetti benedetti.ale...@gmail.com Cheers On Wed, Mar 16, 2016 at 1:01 AM, Shawn Heisey wrote: > O

Re: indexing pdf files using post tool

2016-03-18 Thread Binoy Dalal
Like Francisco said, use a custom update processor to map the fields the way you want and add it to your update chain. On Wed, 16 Mar 2016, 18:16 Francisco Andrés Fernández, wrote: > Vidya, I don't know if I'm understanding it very well but, I think that the > best way is to parse your text usin
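A minimal plain-Java model of the field-mapping step, assuming the processor's only job is to rename extracted field names to schema field names and drop everything else. The field names used with it (dc_title, content, title, body) are hypothetical. In Solr this logic would sit inside a custom UpdateRequestProcessor's processAdd(...), which is not shown here because it needs the Solr jars.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical field mapper; models what a custom update processor might do to extracted fields.
class FieldMapper {
    private final Map<String, String> renames; // extracted name -> schema field name

    FieldMapper(Map<String, String> renames) {
        this.renames = renames;
    }

    // Returns a new document using schema field names; fields with no mapping are dropped.
    Map<String, Object> map(Map<String, Object> extracted) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : extracted.entrySet()) {
            String target = renames.get(e.getKey());
            if (target != null) {
                out.put(target, e.getValue());
            }
        }
        return out;
    }
}
```

Dropping unmapped fields is a design choice for this sketch; keeping them under a prefix instead would be an equally reasonable variant.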

Re: HMMChineseTokenizer splits up alphanumeric characters

2016-03-18 Thread Shawn Heisey
On 3/16/2016 4:33 AM, Zheng Lin Edwin Yeo wrote: > I found that HMMChineseTokenizer will split a string that consist of > numbers and characters (alphanumeric). For example, if I have a code that > looks like "1a2b3c4d", it will be split to 1 | a | 2 | b | 3 | c | 4 | d > This has caused the search

Re: Solr5 Optimize

2016-03-18 Thread Erick Erickson
First of all, "optimize-like" does _not_ happen "every time a commit happens". What _does_ happen is the current state of the index is examined and if certain conditions are met _then_ segment merges happen. Think of these as "partial optimizes". This is under control of the TieredMergePolicy by d
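To make the "partial optimize" idea concrete, here is a deliberately simplified model of how a tiered policy decides whether to merge at all. It borrows TieredMergePolicy's default settings (segmentsPerTier=10, maxMergeAtOnce=10, floorSegmentMB=2) and the general shape of its segment budget, but the merge selection is an assumption: the real policy scores candidate merges by size skew, reclaimed deletions and a maximum merged segment size, so its choices will differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified model of a tiered merge decision; not Lucene's TieredMergePolicy code.
class TieredMergeSketch {
    static final int SEGMENTS_PER_TIER = 10;    // Lucene's default segmentsPerTier
    static final int MAX_MERGE_AT_ONCE = 10;    // Lucene's default maxMergeAtOnce
    static final double FLOOR_SEGMENT_MB = 2.0; // Lucene's default floorSegmentMB

    // Segment "budget": each tier may hold SEGMENTS_PER_TIER segments, and
    // tier sizes grow by a factor of MAX_MERGE_AT_ONCE.
    static int allowedSegmentCount(List<Double> sizesMb) {
        double total = 0, smallest = Double.MAX_VALUE;
        for (double s : sizesMb) {
            total += s;
            smallest = Math.min(smallest, s);
        }
        double levelSize = Math.max(FLOOR_SEGMENT_MB, smallest);
        double bytesLeft = total;
        double allowed = 0;
        while (true) {
            double segCountLevel = bytesLeft / levelSize;
            if (segCountLevel < SEGMENTS_PER_TIER) {
                allowed += Math.ceil(segCountLevel);
                break;
            }
            allowed += SEGMENTS_PER_TIER;
            bytesLeft -= SEGMENTS_PER_TIER * levelSize;
            levelSize *= MAX_MERGE_AT_ONCE;
        }
        return (int) allowed;
    }

    // Checked after new segments appear: returns the segments to merge, or an empty list.
    static List<Double> selectMerge(List<Double> sizesMb) {
        if (sizesMb.size() <= allowedSegmentCount(sizesMb)) return Collections.emptyList();
        // This sketch only merges full batches of MAX_MERGE_AT_ONCE segments.
        if (sizesMb.size() < MAX_MERGE_AT_ONCE) return Collections.emptyList();
        List<Double> sorted = new ArrayList<>(sizesMb);
        Collections.sort(sorted);
        return sorted.subList(0, MAX_MERGE_AT_ONCE);
    }
}
```

In this model, twelve 1 MB segments exceed their budget of 6 and the ten smallest are merged, while five 1 MB segments next to one 1000 MB segment stay well under their budget of 24 and nothing happens. That is the sense in which merges happen only "if certain conditions are met".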

Re: Making managed schema unmutable correctly?

2016-03-18 Thread Erick Erickson
Well, if using managed schema in SolrCloud, all the updates to the nodes are automatic, so it's easier from that perspective. To me, the sweet spot for managed schema is that it lends itself to some kind of front end that allows you to deal with the schema visually; one can envision widgets, pick-li

RE: Why is multiplicative boost prefered over additive?

2016-03-18 Thread jimi.hullegard
On Thursday, March 17, 2016 7:58 PM, wun...@wunderwood.org wrote: > > Think about using popularity as a boost. If one movie has a million rentals > and one has a hundred rentals, there is no additive formula that balances > that with text relevance. Even with log(popularity), it doesn't work. I

Re: Solr 5.5 error at startup - ClassNotFoundException: org.simpleframework.xml.core.Persister

2016-03-18 Thread Shawn Heisey
On 3/17/2016 2:32 PM, Shamik Bandopadhyay wrote: > [2016-03-17 20:23:34,760]ERROR > 9350[coreLoadExecutor-7-thread-1-processing-n:54.176.219.134:8983_solr] - > org.apache.solr.core.CoreContainer.create(CoreContainer.java:827) - Error > creating core [knowledge]: org/simpleframework/xml/core/Persis

Re: Why is multiplicative boost prefered over additive?

2016-03-18 Thread Walter Underwood
That works fine if you have a query that matches things with a wide range of popularities. But that is the easy case. What about the query “twilight”, which matches all the Twilight movies, all of which are popular (millions of views). Or “Lord of the Rings” which only matches movies with hundr

Re: Why is multiplicative boost prefered over additive?

2016-03-18 Thread Walter Underwood
Popularity has a very wide range. Try my example, scale 1 million and 100 into the same 1.0-0.0 range. Even with log popularity. As another poster pointed out, text relevance scores also have a wide range. In practice, I never could get additive boost to work right at Netflix at both ends of th
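One reason behind the "text relevance scores also have a wide range" point can be shown with a small worked comparison. All numbers here are made up: text scores 2.0 and 1.0, and a hypothetical boost of 1 + log10(popularity). An additive boost's effect depends on the absolute scale of the text score, which varies from query to query. A multiplicative boost keeps the same ranking when every text score is scaled by the same factor.

```java
// Made-up numbers illustrating additive vs. multiplicative boosts; not Solr's scoring code.
class BoostComparison {
    // Hypothetical popularity boost: 1 + log10(popularity).
    static double boost(long popularity) {
        return 1.0 + Math.log10(popularity);
    }

    static double additive(double textScore, long popularity) {
        return textScore + boost(popularity);
    }

    static double multiplicative(double textScore, long popularity) {
        return textScore * boost(popularity);
    }

    // Doc A: better text match, 100 rentals. Doc B: weaker text match, 1,000,000 rentals.
    // textScale stands in for the query-dependent scale of raw text scores.
    static String winner(boolean useMultiplicative, double textScale) {
        double textA = 2.0 * textScale, textB = 1.0 * textScale;
        long popA = 100, popB = 1_000_000;
        double a = useMultiplicative ? multiplicative(textA, popA) : additive(textA, popA);
        double b = useMultiplicative ? multiplicative(textB, popB) : additive(textB, popB);
        return a > b ? "A" : "B";
    }
}
```

With textScale 1.0 the additive scores are 5 vs 8 (B wins), but with textScale 10.0 they are 23 vs 17 (A wins), so the same boost flips the order depending on the query. The multiplicative scores are 6 vs 7 and 60 vs 70, so B wins in both cases.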

Re: Boosts for relevancy (shopping products)

2016-03-18 Thread Nick Vasilyev
Tie does quite a bit: without it, only the highest-scoring field that contains the term is included in the relevance score. Tie lets you include the other matching fields as well. On Mar 18, 2016 10:40 AM, "Robert Brown" wrote: > Thanks for the added input. > > I'll certainly look into the machine
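The effect of tie follows from how Lucene's DisjunctionMaxQuery (which dismax and edismax build for each term across the qf fields) combines field scores: the best field score, plus tie times the sum of the other fields' scores. A minimal sketch, assuming the per-field scores already include any qf field boosts:

```java
// Plain-Java illustration of the dismax combination rule; not Lucene's implementation.
class DisMaxTie {
    // score = max(fieldScores) + tie * (sum(fieldScores) - max(fieldScores))
    static double score(double[] fieldScores, double tie) {
        double max = 0, sum = 0;
        for (double s : fieldScores) {
            max = Math.max(max, s);
            sum += s;
        }
        return max + tie * (sum - max);
    }
}
```

For field scores {3.0, 1.0, 0.5}, tie=0 gives 3.0 (only the best field counts), tie=1.0 gives 4.5 (a plain sum), and a small value such as 0.1 gives 3.15, letting extra matching fields break ties without swamping the best one.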

Re: Solr:Skip document from indexing when it matches specific value

2016-03-18 Thread Jan Høydahl
Hi, not OOTB as far as I know, but it would take about 3 lines to create a custom one that simply aborts the chain instead of calling super.processAdd(command). -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com > On 16 Mar 2016, at 12:36, solr2020 wrote: > > Hi, > > How we can ignore
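In Solr terms, Jan's suggestion would be a subclass of UpdateRequestProcessor whose processAdd(AddUpdateCommand) returns early for matching documents instead of calling super.processAdd(cmd), registered through an UpdateRequestProcessorFactory in the update chain. Since that needs the Solr jars, the sketch below models only the control flow in plain Java; ChainProcessor, SkipMatchingProcessor and CollectingProcessor are hypothetical stand-ins, not Solr classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of an update processor chain; not Solr's UpdateRequestProcessor.
abstract class ChainProcessor {
    private final ChainProcessor next;

    ChainProcessor(ChainProcessor next) {
        this.next = next;
    }

    // Default behaviour: hand the document to the next processor (like super.processAdd).
    void processAdd(Map<String, String> doc) {
        if (next != null) next.processAdd(doc);
    }
}

// Drops documents whose field equals a blocked value by not passing them on.
class SkipMatchingProcessor extends ChainProcessor {
    private final String field;
    private final String blockedValue;

    SkipMatchingProcessor(String field, String blockedValue, ChainProcessor next) {
        super(next);
        this.field = field;
        this.blockedValue = blockedValue;
    }

    @Override
    void processAdd(Map<String, String> doc) {
        if (blockedValue.equals(doc.get(field))) return; // abort: the chain stops here
        super.processAdd(doc);
    }
}

// Terminal step standing in for the processor that actually indexes documents.
class CollectingProcessor extends ChainProcessor {
    final List<Map<String, String>> indexed = new ArrayList<>();

    CollectingProcessor() {
        super(null);
    }

    @Override
    void processAdd(Map<String, String> doc) {
        indexed.add(doc);
    }
}
```

A skipped document never reaches the end of the chain (RunUpdateProcessor in a real Solr chain), so it is never indexed, and in this model the caller sees no error.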

FW: SolrCloud App Unit Testing

2016-03-18 Thread Madhire, Naveen
Hi, I am writing a Solr application. Can anyone please let me know how to unit test the application? I see we have the MiniSolrCloudCluster class available in Solr, but I am confused about how to use it for unit testing. How should I create an embedded server for unit testing? Thanks, Naveen

RE: Making managed schema unmutable correctly?

2016-03-18 Thread Davis, Daniel (NIH/NLM) [C]
Thanks for saying. I thought as soon as I sent it that my motivation might just be to brag that I know something that long-time Solr folks like you might not. I actually know so very little, not just about how Lucene works, but how to make Solr solve concrete problems beyond the simple. I

Re: indexing pdf files using post tool

2016-03-18 Thread Jan Høydahl
Hi, you can look at the Apache Tika project or the PDFBox project to parse your files before sending them to Solr. Alternatively, if your processing is very simple, you can use the built-in Tika as you just did, and then deploy some UpdateRequestProcessors in order to modify the Tika output into whate

Re: Document Cache

2016-03-18 Thread Rallavagu
So, each soft commit would create a new searcher that would invalidate the old cache? Here is the configuration for Document Cache autowarmCount="0"/> true Thanks On 3/18/16 12:45 AM, Emir Arnautovic wrote: Hi, Your cache will be cleared on soft commits - every two minutes. It seems that i
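Emir's point can be illustrated with a toy model in which each searcher owns its own document cache. A soft commit opens a new searcher, and the new searcher's documentCache starts empty. The Solr Reference Guide notes that the documentCache cannot be autowarmed at all, because internal Lucene document IDs change when the index changes. ModelSearcher and its methods are hypothetical; this is not Solr's SolrIndexSearcher or cache implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of per-searcher caching; not Solr's SolrIndexSearcher or cache classes.
class ModelSearcher {
    final long generation;
    private final int maxSize;
    private final Map<Integer, String> docCache;
    int loads = 0; // cache misses: how often a document had to be read from the "index"

    ModelSearcher(long generation, int maxSize) {
        this.generation = generation;
        this.maxSize = maxSize;
        // An access-ordered LinkedHashMap that evicts its eldest entry is a simple LRU cache.
        this.docCache = new LinkedHashMap<Integer, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, String> eldest) {
                return size() > maxSize;
            }
        };
    }

    String doc(int docId) {
        String cached = docCache.get(docId);
        if (cached != null) return cached;
        loads++;
        String loaded = "doc-" + docId + "@gen" + generation; // stand-in for reading stored fields
        docCache.put(docId, loaded);
        return loaded;
    }

    // A soft commit opens a new searcher; nothing is carried over, so its cache starts empty.
    ModelSearcher reopen() {
        return new ModelSearcher(generation + 1, maxSize);
    }
}
```

With soft commits every two minutes, every document requested after a commit is a miss again, so a large documentCache mostly adds heap pressure rather than hits.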

Re: Solr won't start -- java.lang.ClassNotFoundException: org.eclipse.jetty.xml.XmlConfiguration

2016-03-18 Thread Paul Hoffman
On Tue, Mar 15, 2016 at 07:58:21PM -0600, Shawn Heisey wrote: > On 3/15/2016 2:56 PM, Paul Hoffman wrote: > >> It sure looks like I started Solr from my blacklight project dir. > >> > >> Any ideas? Thanks, > >> > > You may need to get some help from the blacklight project. I've got > absolutely

BYOPW in security.json

2016-03-18 Thread Oakley, Craig (NIH/NLM/NCBI) [C]
When using security.json (in Solr 5.4.1 for instance), is there a recommended method to allow users to change their own passwords? We certainly would not want to grant blanket security-edit to all users; but requiring users to divulge their intended passwords (in Email or by other means) to the
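One possible workaround, sketched under a stated assumption: users compute the salted hash of their new password locally and send only the resulting string to an admin, who places it in the credentials map of security.json. ASSUMPTION: this mirrors the credential format of Solr's Sha256AuthenticationProvider ("base64(hash) base64(salt)", with the hash being SHA-256 applied twice over salt plus password); verify against your Solr version before relying on it. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.Base64;

// ASSUMPTION: mirrors the "base64(hash) base64(salt)" credential format of Solr's
// Sha256AuthenticationProvider (salted, double SHA-256). Verify against your Solr version.
class BasicAuthHash {
    static String hash(String password, byte[] salt) {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
        digest.update(salt); // salt first
        byte[] once = digest.digest(password.getBytes(StandardCharsets.UTF_8)); // then password; digest() resets
        byte[] twice = digest.digest(once); // second SHA-256 pass
        Base64.Encoder b64 = Base64.getEncoder();
        return b64.encodeToString(twice) + " " + b64.encodeToString(salt);
    }

    // What a user would run locally; only the returned string is sent to the admin.
    static String hashWithRandomSalt(String password) {
        byte[] salt = new byte[32];
        new SecureRandom().nextBytes(salt);
        return hash(password, salt);
    }
}
```

The plaintext password never leaves the user's machine, though the hash string should still be treated as sensitive. Whether hand-editing security.json this way fits your change process is a separate question.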

Re: No live SolrServers available to handle this request

2016-03-18 Thread Anil
Hi Shawn, Thanks for your response. CDH is a Cloudera (third-party) distribution. Is there any way to get a copy of the notifications when the cluster state changes? In the logs? I assume that the exception results only from replicas being unavailable. Agree? Regards, Anil On 18 March 2016 at 18:20

stop words as blacklist

2016-03-18 Thread John Blythe
hey all, is there any out-of-the-box way to use your stop words to completely skip a document? if something has X in its description when being indexed, I just want to ignore it altogether / when something is searched with X, then go ahead and automatically return 0 results. quick context: using so

Re: [nested] how to specify a path for multiple nesting?

2016-03-18 Thread Mikhail Khludnev
Hello, Please find inline On Wed, Mar 16, 2016 at 10:10 PM, Alisa Z. wrote: > Hi all, > I have a deeply multi-level data structure (up to 6-7 levels deep) where > due to the nature of the data some nested documents can have same type > names at various levels. How to form a proper query on a n

RE: Explain score is different from score

2016-03-18 Thread G, Rajesh
Can someone help?

Re: Making managed schema unmutable correctly?

2016-03-18 Thread Jay Potharaju
Does using the Schema API mean no upconfig to ZooKeeper and no reloading of all the nodes in my SolrCloud? In which scenario should I not use the Schema API, if any? Thanks Jay On Wed, Mar 16, 2016 at 6:22 PM, Shawn Heisey wrote: > On 3/16/2016 1:14 AM, Alexandre Rafalovitch wrote: > > So, I am loo

No live SolrServers available to handle this request

2016-03-18 Thread Anil
Hi, We are using SolrCloud with ZooKeeper, and each collection has 5 shards and 2 replicas. We are seeing "org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request". I don't see any issues with the replicas. What could be the root cause of the exception? Than

Re: Solr 4.10 Suggestor

2016-03-18 Thread Alessandro Benedetti
Hi Matt, when you say "soon looking to move to a different approach (ngrams)", do you mean creating a specific core with a specific analysis for the fields of interest? Is upgrading Solr not an option in your situation? Cheers On Wed, Mar 16, 2016 at 10:05 PM, Matt Kuiper wrote: > Thanks
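Assuming "ngrams" here means front-edge n-grams, the idea is to index every prefix of each term so that whatever the user has typed so far matches a whole term. In Solr that is usually done with EdgeNGramFilterFactory (minGramSize/maxGramSize) in the index-time analyzer only. The plain-Java sketch below shows just what gets generated; it is not Lucene's filter.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of front-edge n-grams; not Lucene's EdgeNGramTokenFilter.
class EdgeNGrams {
    static List<String> of(String token, int minGram, int maxGram) {
        if (minGram < 1 || maxGram < minGram) throw new IllegalArgumentException("bad gram sizes");
        List<String> grams = new ArrayList<>();
        int upper = Math.min(maxGram, token.length());
        for (int n = minGram; n <= upper; n++) {
            grams.add(token.substring(0, n)); // prefix of length n (counted in UTF-16 chars)
        }
        return grams;
    }
}
```

For "solr" with sizes 1 to 10 this yields s, so, sol, solr, so a query for "sol" matches the indexed term without any n-gramming on the query side.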