queryResultCache's size is not increasing

2014-10-07 Thread Lee Chunki
Hi, I am running Solr 4.1.0 and trying to use queryResultCache, but the “size” value on the admin page is far smaller than the number of incoming queries. I want to know why. Settings and status are as follows: * setting - solrconfig.xml * status - for 13 hours - # of requests : 6,711,920 - # of uniq

RE: Using CachedSqlEntityProcessor with delta imports in DIH

2014-10-07 Thread stockii
Hey. Are you sending the cacheImpl in your request? Or where are you defining it? cacheImpl="${cache.impl}" If I leave this string blank, the import fails =( -- View this message in context: http://lucene.472066.n3.nabble.com/Using-CachedSqlEntityProcessor-with-delta-imports-in-DIH-tp4091620p4163106.html

Re: queryResultCache's size is not increasing

2014-10-07 Thread Yonik Seeley
It's your "full-import" every 5 minutes. A queryResultCache will be invalidated by changes to the index (i.e. a commit) and the size will drop back to 0. -Yonik http://heliosearch.org - native code faceting, facet functions, sub-facets, off-heap data On Tue, Oct 7, 2014 at 4:53 AM, Lee Chunki wr
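The behaviour described in this reply can be pictured with a toy model. This is only a sketch of the idea that a commit empties the cache; the class and method names are hypothetical, it is not Solr's implementation, and it ignores autowarming entirely.

```python
class ToyQueryResultCache:
    """Toy stand-in for a query result cache; not Solr's implementation."""

    def __init__(self):
        self._entries = {}

    @property
    def size(self):
        return len(self._entries)

    def lookup(self, query, compute):
        # Return the cached result, computing and storing it on a miss.
        if query not in self._entries:
            self._entries[query] = compute(query)
        return self._entries[query]

    def on_commit(self):
        # A commit changes the index, so every cached result is discarded.
        self._entries.clear()


cache = ToyQueryResultCache()
for q in ["solr", "lucene", "solr"]:
    cache.lookup(q, lambda text: f"results for {text}")
size_before_commit = cache.size  # two distinct queries are cached
cache.on_commit()
size_after_commit = cache.size   # the commit emptied the cache
```

With a full-import committing every 5 minutes, the size shown on the admin page would only ever reflect the queries seen since the most recent commit, not the total over 13 hours.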

Weird Problem (possible bug?) with german stemming and wildcard search

2014-10-07 Thread Thomas Michael Engelke
I have a problem with a stemmed german field. The field definition: stored="true" required="false" multiValued="false"/> ... positionIncrementGap="100" autoGeneratePhraseQueries="true"> words="stopwords.txt"/> generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateN

Advise on an architecture with lot of cores

2014-10-07 Thread Manoj Bharadwaj
Hi folks, My team inherited a SOLR setup with an architecture that has a core for every customer. We have a few different types of cores, say "A", "B", "C", and for each one of these there is a core per customer - namely "A1", "A2"..., "B1", "B2"... Overall we have over 600 cores. We don't know the

Re: Weird Problem (possible bug?) with german stemming and wildcard search

2014-10-07 Thread Alexandre Rafalovitch
On 7 October 2014 08:25, Thomas Michael Engelke wrote: > So the culprit is the asterisk at the end. As far as we can read from the > docs, an asterisk is just 0 or more characters, which means that the literal > word in front of the asterisk should match the query. Not quite: http://wiki.apache.o

Re: Advise on an architecture with lot of cores

2014-10-07 Thread Toke Eskildsen
On Tue, 2014-10-07 at 14:27 +0200, Manoj Bharadwaj wrote: > My team inherited a SOLR setup with an architecture that has a core for > every customer. We have a few different types of cores, say "A", "B", C", > and for each one of this there is a core per customer - namely "A1", > "A2"..., "B1", "B2

Re: Weird Problem (possible bug?) with german stemming and wildcard search

2014-10-07 Thread Markus Jelsma
Hi - you should not use wildcards for autocompletion; Lucene has far better tools for building very good autocompletion. Also, since a wildcard query is a multi-term query, it is not passed through your configured query-time analyzer. Some other comments: - you use a porter stemmer but you should u

Re: Advise on an architecture with lot of cores

2014-10-07 Thread Manoj Bharadwaj
Hi Toke, Thank you for your insights. > Why do you want to collapse the cores? > Most of the cores are small and a few big ones make up the bulk. Our thinking was that it would be as easy to just have one core. Monitoring becomes easy as well (we are using a monitoring tool in which there is a

Re: Advise on an architecture with lot of cores

2014-10-07 Thread Jack Krupansky
You'll have to do a proof of concept test to determine how many collections Solr/SolrCloud can handle. With a very large number of customers you may have to do sharding of the clusters themselves - limit each cluster to however many customers/collections work well (100? 250?) and then have se

Re: Advise on an architecture with lot of cores

2014-10-07 Thread youknow...@heroicefforts.net
"On the other hand, it [sic] most of the cores are idle most of the time, the 1 core/customer setup would be give better utilization of the hardware." This is an important point. I've seen performance go to hell when 10M, 100M, and 1B cloud collections were consolidated in a hardware constrained

Re: Advise on an architecture with lot of cores

2014-10-07 Thread Manoj Bharadwaj
Yes, we plan to eventually shard the clusters - that will go hand in hand with how the rest of the system gets partitioned as well (swim lanes). The other considerations for these lanes will be geo location etc (in an AWS context, zones in east coast will be used for swim lanes that cater t

Re: Advise on an architecture with lot of cores

2014-10-07 Thread Manoj Bharadwaj
Hi Toke, I don't think I answered your question properly. With the current 1 core/customer setup many cores are idle. The redesign we are working on will move most of our searches to being driven by SOLR vs database (current split is 90% database, 10% solr). With that change, all cores will see t

Re: Help with a slow filter query

2014-10-07 Thread heaven
The syntax for the frange query parser is weird: {!frange cache=false cost=200 l=2001-01-01T00:00:00Z u=2013-12-01T23:59:59Z}nominated_at_d or simply {!frange cache=false cost=200 u=2013-12-01T23:59:59Z}nominated_at_d And I don't see any docs in the Solr wiki explaining this syntax; these examples above I
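The two filter strings quoted in this message follow one pattern: local parameters inside `{!frange ...}`, then the field name. A minimal sketch of assembling them on the client side, assuming a hypothetical helper `frange_filter` and a placeholder main query `*:*` (neither comes from the thread):

```python
from urllib.parse import urlencode


def frange_filter(field, lower=None, upper=None, cache=False, cost=200):
    """Build a {!frange ...} filter string in the form quoted in the message."""
    local_params = [f"cache={'true' if cache else 'false'}", f"cost={cost}"]
    if lower is not None:
        local_params.append(f"l={lower}")  # lower bound of the range
    if upper is not None:
        local_params.append(f"u={upper}")  # upper bound of the range
    return "{!frange " + " ".join(local_params) + "}" + field


# Both bounds, as in the first example from the message.
fq_both = frange_filter(
    "nominated_at_d",
    lower="2001-01-01T00:00:00Z",
    upper="2013-12-01T23:59:59Z",
)

# Upper bound only, as in the second example.
fq_upper = frange_filter("nominated_at_d", upper="2013-12-01T23:59:59Z")

# URL-encode the filter for use as an fq request parameter.
# The q value here is a placeholder assumption.
query_string = urlencode({"q": "*:*", "fq": fq_both})
```

The braces, spaces, and colons in the filter need URL encoding when sent over HTTP, which is why the sketch passes the string through `urlencode` instead of concatenating it into the URL by hand.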

Re: dismax query does not match with additional field in qf

2014-10-07 Thread Andreas Hubold
Andreas Hubold wrote on 09/30/2014 05:14 PM: I ran into a problem with the Solr dismax query parser. ... I'd expect that an additional field in the qf parameter would not lead to fewer matches. Okay, the above example is a rather crude test but I'd like to understand it. Is this a bug in Solr?

Having an issue with pivot faceting

2014-10-07 Thread cwhi
I'm having an issue getting pivot faceting working as expected. I'm trying to filter by a specific criterion, and then first facet by one of my document attributes called item_generator, then facet those results into 2 sets each: the first set is the count of documents satisfying that facet with nu

Re: dismax query does not match with additional field in qf

2014-10-07 Thread Jack Krupansky
I think what is happening is that your last term, the naked apostrophe, is analyzing to zero terms and simply being ignored, but when you add the extra field, a string field, you now have another term in the query, and you have mm set to 100%, so that "new" term must match. It probably fails beca

SOLR query - restrict access to user documents

2014-10-07 Thread Nitin Agarwal
Hi, I have a question about SOLR queries: I am trying to restrict access to SOLR data. We are running SOLR 4.7.1, and wish to expose the query capabilities to our customers for the data that belongs to them. Specifically "/select", with default configuration is the only Request Handler that custome

Exact matches are not coming on Top for autocomplete search results.

2014-10-07 Thread Shobhit
Hi, I am trying to get the exact search term to come first in Solr search results, but Solr is not returning the exact matching records on top; instead, exact matches appear further down, after a few records. For example: I am searching for the term "New York World", and I want the New Yor

Re: dismax query does not match with additional field in qf

2014-10-07 Thread Andreas Hubold
Okay, sounds reasonable. However I didn't expect this when reading the documentation of the dismax query parser. Especially the need to escape special characters (and which ones) was not clear to me as the dismax query parser "is designed to process simple phrases (without complex syntax) ente

Re: data import handler clarifications/ pros and cons.

2014-10-07 Thread Durga Palamakula
There is built-in scheduling at http://wiki.apache.org/solr/DataImportHandler#Scheduling But as others have mentioned, cron is the simplest. On Mon, Oct 6, 2014 at 8:56 PM, Karunakar Reddy wrote: > Thanks Shawn and Gora for your suggestions. > @Gora sounds good. I am just getting clarity over

Re: dismax query does not match with additional field in qf

2014-10-07 Thread Jack Krupansky
Your query term seems particularly inappropriate for dismax - think simple keyword queries. Also, don't confuse dismax and edismax - maybe you want the latter. The former is for... simple keyword queries. I'm still not sure what your actual use case really is. In particular, are you trying t

Re: Help with a slow filter query

2014-10-07 Thread Mikhail Khludnev
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-FunctionRangeQueryParser http://lucene.apache.org/solr/4_10_0/solr-core/org/apache/solr/search/FunctionRangeQParserPlugin.html re "this syntax" https://cwiki.apache.org/confluence/display/solr/Local+Parameters+in+Queries O

Best way to index wordpress blogs in solr

2014-10-07 Thread Vishal Sharma
Hi, I am trying to find out whether there is any best practice for indexing wordpress blogs in a solr index. Can someone help with the architecture I should be setting up? Do I need to write separate scripts to crawl wordpress and then pump posts back to Solr using its API? *Vishal Shar

Re: Best way to index wordpress blogs in solr

2014-10-07 Thread Alexandre Rafalovitch
On 7 October 2014 14:08, Vishal Sharma wrote: > Hi, > > I am trying to get some help on finding out if there is any best practice > to index wordpress blogs in solr index? Can someone help with architecture > I shoudl be setting up? > > Do, I need to write separate scripts to crawl wordpress and t

Re: Exact matches are not coming on Top for autocomplete search results.

2014-10-07 Thread Ahmet Arslan
Hi, You can add an optional (phrase) clause to boost exact matches. If you are using (e)dismax, bq=typeahead:"New York World"^50 would do the trick. Ahmet On Tuesday, October 7, 2014 6:55 PM, Shobhit wrote: Hi, I am trying to get the exact search term coming at top in Solr search results, b

Re: data import handler clarifications/ pros and cons.

2014-10-07 Thread Ahmet Arslan
Hi Durga, That wiki page talks about uncommitted code, so it is not built in. Ahmet On Tuesday, October 7, 2014 7:17 PM, Durga Palamakula wrote: There is a built in scheduling @ http://wiki.apache.org/solr/DataImportHandler#Scheduling But as others have mentioned cron is the simplest. On

Re: data import handler clarifications/ pros and cons.

2014-10-07 Thread Gora Mohanty
On 8 October 2014 01:00, Ahmet Arslan wrote: > > > > Hi Durga, > > That wiki talks about an uncommitted code. So it is not built in. Maybe it is just me, but given that there are existing scheduling solutions in most operating systems, I fail to understand why people expect Solr to expand to incl

[ANNOUNCE] Luke 4.10.1 released

2014-10-07 Thread Dmitry Kan
Hello, Luke 4.10.1 has been released. Download it here: https://github.com/DmitryKey/luke/releases/tag/luke-4.10.1 The release has been tested against the solr-4.10.1 based index. Changes: https://github.com/DmitryKey/luke/issues/5 https://github.com/DmitryKey/luke/issues/6 Remember to pass th

Re: Best way to index wordpress blogs in solr

2014-10-07 Thread Vishal Sharma
Hey Alex, Thanks for the prompt response. Here is what I am trying to solve: I am showing search results from content coming from 3 different places on a single site. And, I have done that by pumping all this content to Solr server running on single flat schema by using different APIs of these pl

Re: Best way to index wordpress blogs in solr

2014-10-07 Thread Alexandre Rafalovitch
I am pretty sure Swift is not Solr. That's why I was asking whether you were starting from scratch. As to the other items, please re-read my original response. Solr has an example reading in RSS feeds, you could probably use that. Or a generic XML using DataImportHandler's mapping. Or directly fro

Re: Best way to index wordpress blogs in solr

2014-10-07 Thread Vishal Sharma
Makes sense. I'll just dive in now. Thanks so much. Vishal Sharma, TL, Grazitti Interactive T: +1 650 641 1754 E: vish...@grazitti.com www.grazitti.com

Re: SOLR query - restrict access to user documents

2014-10-07 Thread Jorge Luis Betancourt Gonzalez
I see you’re defining a default value for “rows”; this could be overridden in the request, and requesting a lot of documents from Solr can stress your server/cluster (if the client in question has that many documents, of course). If this is a fixed value and the clients can’t request more docum

Re: Best way to index wordpress blogs in solr

2014-10-07 Thread Vishal Sharma
Hey Alex, Do you have a fair comparison of Solr and Swift type, either one you have read somewhere or from your own experience of using them? I would want to use that before I start building everything from scratch in my future implementations. Vishal Sharma, TL, Grazitti Interactive T: +1 650 641 1754 E:

Re: Best way to index wordpress blogs in solr

2014-10-07 Thread Jorge Luis Betancourt Gonzalez
If you’re talking about a generic web crawl you could use something like Nutch [1]. Keep in mind that this is a full web crawler and it does a pretty good job. I’ve been using it for more than 2 years now and I’m very happy, although I don’t crawl just a couple of sites but a wider spectrum

Re: Best way to index wordpress blogs in solr

2014-10-07 Thread Alexandre Rafalovitch
I have not used Swift before. I just heard of it. Sorry. Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853 On 7 October 2014 17:

Re: SOLR query - restrict access to user documents

2014-10-07 Thread Nitin Agarwal
Thanks for the info, Jorge. I will look into invariants; good pointer. My API forces rows to a maximum of 500: if the user asks for more than 500 docs, we modify the rows param to be 500. On Tue, Oct 7, 2014 at 3:31 PM, Jorge Luis Betancourt Gonzalez < jlbetanco...@uci.cu> wrote: > I se
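The capping rule described in this message could look roughly like the sketch below. The function name `clamp_rows` and the handling of malformed values are assumptions for illustration; the thread only states the 500 limit.

```python
MAX_ROWS = 500  # the limit stated in the message


def clamp_rows(params):
    """Return a copy of the request params with rows capped at MAX_ROWS."""
    capped = dict(params)
    if "rows" in capped:
        try:
            requested = int(capped["rows"])
        except (TypeError, ValueError):
            # Assumption: treat a malformed rows value as a request for the maximum.
            requested = MAX_ROWS
        # Never pass more than MAX_ROWS (or a negative value) on to Solr.
        capped["rows"] = min(max(requested, 0), MAX_ROWS)
    return capped


too_many = clamp_rows({"q": "shoes", "rows": "10000"})
within_limit = clamp_rows({"q": "shoes", "rows": 20})
```

Capping in the API layer only protects requests that pass through that layer, which is presumably why invariants on the request handler were suggested as well.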

Re: Best way to index wordpress blogs in solr

2014-10-07 Thread Vishal Sharma
Ok, not a problem. Thanks anyway. Vishal Sharma, TL, Grazitti Interactive T: +1 650 641 1754 E: vish...@grazitti.com www.grazitti.com

Re: Best way to index wordpress blogs in solr

2014-10-07 Thread Vishal Sharma
Hey Jorge, I guess Nutch can help me. Thanks for this. I am sure I should be able to configure it to crawl only specific portions of the site. Vishal Sharma, TL, Grazitti Interactive T: +1 650 641 1754 E: vish...@grazitti.com www.grazitti.com

Re: SOLR query - restrict access to user documents

2014-10-07 Thread Ahmet Arslan
How about using an fq in the appends section of solrconfig.xml? {!term f=customerNumber v=$qq} Your query string will then be: /select?q=&qq=123 https://cwiki.apache.org/confluence/display/solr/Local+Parameters+in+Queries Ahmet On Wednesday, October 8, 2014 1:40 AM, Nitin Agarwal <2nitinagar
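For this suggestion to restrict anything, the `qq` value has to come from a trusted layer and not from the customer. A minimal sketch of that idea, assuming a hypothetical API layer in front of Solr that already knows the authenticated customer number (the function name and the sample parameters are illustrative, not from the thread):

```python
from urllib.parse import urlencode


def build_select_query(user_params, customer_number):
    """Build a /select query string whose qq is set by the server, not the client."""
    # Drop any client-supplied qq so a caller cannot pick another customer's number.
    params = {key: value for key, value in user_params.items() if key != "qq"}
    # The appended fq on the Solr side reads this value through v=$qq.
    params["qq"] = str(customer_number)
    return "/select?" + urlencode(params)


# A client tries to smuggle in its own qq; the server-side value wins.
path = build_select_query({"q": "shoes", "qq": "999"}, 123)
```

This sketch covers only the `qq` parameter; anything else the customer is allowed to send would need its own review.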

Re: Best way to index wordpress blogs in solr

2014-10-07 Thread Ahmet Arslan
Hi Vishal, If you find Nutch heavy-weight, consider using http://manifoldcf.apache.org Ahmet On Wednesday, October 8, 2014 1:54 AM, Vishal Sharma wrote: Hey Jorge, I guess Nutch can help me. Thanks for this. I am sure I should be able to configure it to crawl only specific portions of the si

Re: Exact matches are not coming on Top for autocomplete search results.

2014-10-07 Thread Erick Erickson
bq: bq=typeahead:"New York World"^50 would do the trick. I don't think so. All the examples have that phrase in them, so the boost query would apply to all. But Shobhit can do something similar with a copyField to a field that, say, was a keyword tokenizer followed by a lower case filter and put

Re: Does CloudSolrServer hit zookeeper for every request?

2014-10-07 Thread Shawn Heisey
On 10/6/2014 12:15 PM, amitha wrote: > Yeah I saw that does if I call shutdown after a request and need to make > another request I get > org.apache.solr.client.solrj.SolrServerException: > java.lang.IllegalStateException: Connection pool shut down You would only call shutdown if you're done with

Re: Exact matches are not coming on Top for autocomplete search results.

2014-10-07 Thread Ahmet Arslan
Hi Erick, Yup you are correct. Sorry, I missed that all examples have that phrase. Ahmet On Wednesday, October 8, 2014 6:35 AM, Erick Erickson wrote: bq: bq=typeahead:"New York World"^50 would do the trick. I don't think so. All the examples have that phrase in them, so the boost query woul

Re: Does CloudSolrServer hit zookeeper for every request?

2014-10-07 Thread Shalin Shekhar Mangar
CloudSolrServer doesn't hit ZooKeeper for every request. I am guessing that you are creating a new CloudSolrServer object per request. Don't do that. Create just one CloudSolrServer and re-use it for all requests. It is a thread-safe object. On Mon, Oct 6, 2014 at 8:58 PM, Jonnakuti, Vijayalatha <

eDisMax parser and special characters

2014-10-07 Thread Lanke,Aniruddha
We are using the eDisMax parser in our configuration. When we search using a query term that contains a ‘-’, we don’t get any results back. Search term: red - yellow This doesn’t return any data, but Search term: red yellow will give back the result ‘red - yellow’. How does eDisMax treat special chara

solr suggester not working with shards

2014-10-07 Thread rsi...@ambrac.nl
I am trying to use the suggest component (Solr 4.6) with multiple cores. I added a search component and a request handler in my solrconfig. That works fine for 1 core, but querying my Solr instance with the shards parameter does not work. suggestDictionary org.apache.solr.spelling.

Re: Search multiple values with wildcards

2014-10-07 Thread J'roo
Hi Jack, Ahmet, Thanks for your tips! In the end I found this the best way to do it: q=proprietaryMessage_tis:(25++23456*++32A++130202US*) All the best -- View this message in context: http://lucene.472066.n3.nabble.com/Search-multiple-values-with-wildcards-tp4161916p4163263.html Sent from