Re: newbie getting started with solr

2013-11-07 Thread Alexandre Rafalovitch
Tried my book? It should explain that. You can see the collections with examples in GitHub: https://github.com/arafalov/solr-indexing-book/tree/master/published Start from collection1. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandre

Re: Error instantiating a Custom Filter in Solr

2013-11-07 Thread Dileepa Jayakody
Hi Erick, Thanks a lot for the pointer. I looked at the LowerCaseFilterFactory class [1] and its parent abstract class AbstractAnalysisFactory API [2], and modified my custom filter factory class as below: public class ContentFilterFactory extends TokenFilterFactory { public ContentFilterFacto

SOLR keyword search with fq queries

2013-11-07 Thread atuldj.jadhav
Hi All, I need your help to find a solution to one of the issues I am facing with the keyword search. We have to provide keyword search functionality on our website, i.e. searching for a word will get you all the indexed documents where a match is found for that word. (Not specific to any particular

Re: How to set default values for int fields

2013-11-07 Thread manju16832003
Erick, Thanks for replying :-). If I were to do that, we are trying to set string value to int and Solr throws an error. Oh wait, I guess it works because Solr would automatically parse based on the data type of the field. :-). As I could see from the exception java.lang.NumberFormatException.

Re: Multi-core support for indexing multiple servers

2013-11-07 Thread manju16832003
Eric, Just a question :-), wouldn't it be easy to use DIH to pull data from multiple data sources? I do use DIH to do that comfortably. I have three data sources - MySQL - URLDataSource that returns XML from a .NET application - URLDataSource that connects to an API and returns XML Here is par

Re: Jetty 9?

2013-11-07 Thread Shawn Heisey
On 11/7/2013 6:40 PM, Bill Bell wrote: So no Jetty 9 until Solr 5? Java 7 is at rel 40 Is that our commitment to not require Java 7 until Solr 5? Most people are probably already on Java 7... Solr 4.x runs perfectly on Java 6 and has from day one. That doesn't affect you, me, or the like

Re: Jetty 9?

2013-11-07 Thread Bill Bell
So no Jetty 9 until Solr 5? Java 7 is at rel 40 Is that our commitment to not require Java 7 until Solr 5? Most people are probably already on Java 7... Bill Bell Sent from mobile > On Nov 7, 2013, at 1:29 AM, Furkan KAMACI wrote: > > Here is an issue points to that: > https://issues.ap

RE: fq efficiency

2013-11-07 Thread Scott Schneider
Digging a bit more, I think I have answered my own questions. Can someone please say if this sounds right? http://wiki.apache.org/solr/LotsOfCores looks like a pretty good solution. If I give each user his own shard, each query can be run in only one shard. The effect of the filter query wil

Re: Sharding and replicas (Solr Cloud)

2013-11-07 Thread Shawn Heisey
On 11/7/2013 4:34 PM, Software Dev wrote: I too want to be in control of everything that is created. Here is what I'm trying to do. 1) Start up a cluster of 5 Solr Instances 2) Import the configuration to Zookeeper 3) Manually create a collection via the collections api with number of shards an

Re: Sharding and replicas (Solr Cloud)

2013-11-07 Thread Software Dev
I too want to be in control of everything that is created. Here is what I'm trying to do. 1) Start up a cluster of 5 Solr Instances 2) Import the configuration to Zookeeper 3) Manually create a collection via the collections api with number of shards and replication factor Now there are some iss

Re: Sharding and replicas (Solr Cloud)

2013-11-07 Thread Shawn Heisey
On 11/7/2013 2:52 PM, Software Dev wrote: Sorry about the confusion. I meant I created my config via the ZkCLI and then I wanted to create my core via the CollectionsAPI. I *think* I have it working but was wondering why there are a crazy amount of core names under the admin "Core Selector"? Whe

Re: Error instantiating a Custom Filter in Solr

2013-11-07 Thread Erick Erickson
Well, the example you linked to is based on 3.6, and things have changed assuming you're using 4.0. It's probably that your ContentFilter isn't implementing what it needs to or it's not subclassing from the correct class for 4.0. Maybe take a look at something simple like LowerCaseFilterFactory a

Is this a reasonable way to boost?

2013-11-07 Thread Michael Tracey
I'm trying to boost results slightly on a price (not currency) field that are closer to a certain value. I want results that are not too expensive or too inexpensive to be favored. Here is what we currently are trying: bf=sub(1,abs(sub(15,price)))^0.2 where 15 is that "median" I want to boost
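A small worked sketch of the arithmetic behind that function may help when judging it. It assumes the `^0.2` suffix acts as a plain multiplier on the function value; the helper name `price_boost` is made up for illustration and is not part of Solr.

```python
def price_boost(price, target=15.0, weight=0.2):
    """Evaluate sub(1,abs(sub(target,price))) scaled by weight.

    Assumption: the ^0.2 suffix is treated as a simple multiplier.
    """
    return weight * (1.0 - abs(target - price))


# The value peaks at the target price and falls off linearly on both sides.
for price in (15, 14.5, 14, 20):
    print(price, price_boost(price))
```

Under that assumption the value reaches zero one unit away from 15 and turns negative beyond that, so prices far from the target are penalised rather than merely left unboosted.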

Re: Indexing URLs in Solr?

2013-11-07 Thread Erick Erickson
Right, the other thing to be wary of is special characters. It _might_ also have worked to escape the colon since that's a meta-character. Quoting the string should be fine too. Best, Erick On Thu, Nov 7, 2013 at 1:07 PM, Jack Park wrote: > Spoke too soon. Hacking rocks! > Finally landed on
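The two options mentioned here, escaping the meta-characters or quoting the whole value, can be sketched client-side as follows. This is only an illustration: the helper names are hypothetical, and the set of special characters is the one listed for the Lucene 4.x query parser as far as I recall it, so treat it as an assumption.

```python
# Characters treated as query syntax by the Lucene query parser (assumed list).
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\/')


def escape_term(value):
    """Backslash-escape every query meta-character in value."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in value)


def quote_term(value):
    """Wrap value in double quotes, escaping embedded quotes and backslashes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


url = "http://someotherserver.org/"
print("resourceURL:" + escape_term(url))
print("resourceURL:" + quote_term(url))
```

Quoting is usually the easier of the two to read in logs; escaping keeps the value a single term.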

Re: Question on Lots Of cores - How do I know it's working

2013-11-07 Thread Erick Erickson
Hmmm, not really, you have to kind of take it on faith I'm afraid. You can check the Solr logs and you should see messages about cores unloading, but that's not very satisfactory. Actually sounds like a JIRA. See SOLR-5430 On Thu, Nov 7, 2013 at 12:43 PM, Vinay B, wrote: > As I understand it,

Re: Inconsistent number of hits returned by two solr instances (from the same index!)

2013-11-07 Thread Roman Chyla
Thanks Michael, haven't tried that yet. Anybody has suggestions on what might be the problem there? SOLR cache? Disk&I/O? Something else..? --roman On Wed, Nov 6, 2013 at 9:41 PM, Michael Della Bitta < michael.della.bi...@appinions.com> wrote: > Wow, that's pretty weird. Have you tried tur

Re: Disjunctive Queries (OR queries) and FilterCache

2013-11-07 Thread Patanachai Tangchaisin
Hi Erick, About the size of filter cache, previously we set it to 4,000. After we faced this problem, we changed it to 10,000. Still at size of 10,000 (always full), hitratio was 0.78 and "eviction" was as high as "insertion". About 100% Cpu, yes, it was Solr using it. I profiled an app, it was

RE: fq efficiency

2013-11-07 Thread Scott Schneider
Thanks, that link is very helpful, especially the section, "Leapfrog, anyone?" This actually seems quite slow for my use case. Suppose we have 10,000 users and 1,000,000 documents. We search for "hello" for a particular user and let's assume that the fq set for the user is cached. "hello" is

Re: Sharding and replicas (Solr Cloud)

2013-11-07 Thread Software Dev
Sorry about the confusion. I meant I created my config via the ZkCLI and then I wanted to create my core via the CollectionsAPI. I *think* I have it working but was wondering why there are a crazy amount of core names under the admin "Core Selector"? When I create X amount of shards via the bootst

Re: Huge Response Time

2013-11-07 Thread Raymond Wiker
A few options: 1) Check what the response times are if you return only a small number of fields from the query (e.g, just the "id" field). If the response times improve greatly, you are probably returning some very long fields, and you may be able to drop some of these from the query result. 2

Re: Sharding and replicas (Solr Cloud)

2013-11-07 Thread Shawn Heisey
On 11/7/2013 1:58 PM, Mark wrote: If I create my collection via the ZkCLI (https://cwiki.apache.org/confluence/display/solr/Command+Line+Utilities) how do I configure the number of shards and replicas? I was not aware that you could create collections with zkcli. I did not think that was p

Sharding and replicas (Solr Cloud)

2013-11-07 Thread Mark
If I create my collection via the ZkCLI (https://cwiki.apache.org/confluence/display/solr/Command+Line+Utilities) how do I configure the number of shards and replicas? Thanks

Re: Function query matching

2013-11-07 Thread Peter Keegan
I'm trying to use a normalized score in a query as I described in a recent thread titled "Re: How to get similarity score between 0 and 1 not relative score" I'm using this query: select?qq={!edismax v='news' qf='title^2 body'}&scaledQ=scale(product(query($qq),1),0,1)&q={!func}sum(product(0.75,$s

Huge Response Time

2013-11-07 Thread vibhoreng04
I have a Solr Cloud setup with 220 million records. They are separated into 2 shards without any replica. I have not changed any caching and every setting is a default one. In one case I have to return the top 5 candidates from Solr. The response time is approximately 50 seconds, which is too high

Error instantiating a Custom Filter in Solr

2013-11-07 Thread Dileepa Jayakody
Hi All, I'm a novice in Solr and I'm continuously bumping into problems with my custom filter I'm trying to use for analyzing a fieldType during indexing as below; Below is my custom FilterFactory class; public class ContentFilterFactory extends TokenFilterFactory { publ

Re: Indexing URLs in Solr?

2013-11-07 Thread Jack Park
Spoke too soon. Hacking rocks! Finally landed on this heuristic, and it works: resourceURL:"http://someotherserver.org/" On Thu, Nov 7, 2013 at 9:52 AM, Jack Park wrote: > Figuring out a google query to gain an answer seems difficult given > the ambiguity; > > I have a field: > > > > into whic

Re: Function query matching

2013-11-07 Thread Jason Hellman
You can, of course, use a function range query: select?q=text:news&fq={!frange l=0 u=100}sum(x,y) http://lucene.apache.org/solr/4_5_1/solr-core/org/apache/solr/search/FunctionRangeQParserPlugin.html This will give you a bit more flexibility to meet your goal. On Nov 7, 2013, at 7:26 AM, Erik Hat

Re: Problem with size of segments

2013-11-07 Thread Jason Hellman
David, I find Mike McCandless’ blog article to be very informative. Give it a go and let us know if you are still seeking clarification: http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html Jason On Nov 7, 2013, at 5:09 AM, david.dav...@correo.aeat.es wrote: > Hi, >

Indexing URLs in Solr?

2013-11-07 Thread Jack Park
Figuring out a google query to gain an answer seems difficult given the ambiguity; I have a field: into which I store a URL which, when displayed as a result of a query, looks like this in the admin console: "resourceURL": "http://someotherserver.org/", The query "resourceURL:*" will find a

Question on Lots Of cores - How do I know it's workin

2013-11-07 Thread vybe3142
As I understand it, the "lots of cores" feature enables dynamic loading and unloading of cores. This is how I set up my solr.xml for a test where I created more cores than the transientCacheSize. Here is a link to the config in case it doesn't format well via this post. https://gist.github.com/ano

Question on Lots Of cores - How do I know it's working

2013-11-07 Thread Vinay B,
As I understand it, the "lots of cores" feature enables dynamic loading and unloading of cores. This is how I set up my solr.xml for a test where I created more cores than the transientCacheSize. Here is a link to the config in case it doesn't format well via this post. https://gist.github.com/ano

Re: newbie getting started with solr

2013-11-07 Thread Tom Mortimer
Hi Eric, Solr configuration can certainly be confusing at first. And for some time after. :P If you're running start.jar from the example folder (which is fine for testing, and I've known some people to use it for production systems) then the default solr home is example/solr. This contains solr

date range tree

2013-11-07 Thread Andreas Owen
I would like to make a facet on a date field with the following tree:
2013
  4.Quartal
    December
    November
    Oktober
  3.Quartal
    September
    August
    Juli
  2.Quartal
    June
    Mai
    April
  1.Quartal
    March
    February
    January
2012
  Same as above
So far I have this in solrconfig.xml:

Re: Function query matching

2013-11-07 Thread Erik Hatcher
Function queries score (all) documents, but don't filter them. All documents effectively match a function query. Erik On Nov 7, 2013, at 1:48 PM, Peter Keegan wrote: > Why does this function query return docs that don't match the embedded > query? > select?qq=text:news&q={!func}sum
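To illustrate the point, here is a minimal sketch of request parameters that keep the function query for scoring but add a filter so that only documents matching the embedded query come back. Using `fq` for this is my suggestion rather than something stated in the thread, and the helper name is hypothetical.

```python
from urllib.parse import urlencode, parse_qs


def scored_and_filtered_params(embedded_query):
    """Build parameters: the function query scores, the fq restricts matches."""
    return {
        "qq": embedded_query,
        # The function query alone would match every document.
        "q": "{!func}sum(query($qq),0)",
        # The filter query limits the result set without affecting scores.
        "fq": embedded_query,
    }


query_string = urlencode(scored_and_filtered_params("text:news"))
print("select?" + query_string)
```

The `{!frange}` approach suggested elsewhere in this thread is an alternative when the cut-off should depend on the function value itself.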

Re: eDisMax, multiple language support and stopwords

2013-11-07 Thread Tom Mortimer
Ah, thanks Markus. I think I'll just add the Boolean operators to the stopwords list in that case. Tom On 7 November 2013 12:01, Markus Jelsma wrote: > This is an ancient problem. The issue here is your mm-parameter, it gets > confused because for separate fields different amount of tokens ar

newbie getting started with solr

2013-11-07 Thread Palmer, Eric
Sorry if this is obvious (because it isn't for me). I want to build a Solr (4.5.1) + Nutch (1.7.1) environment. I'm doing this on Amazon Linux (I may put Nutch on a separate server eventually). Please let me know if my thinking is sound or off base. In the example folder are a lot of files and f

Problem with size of segments

2013-11-07 Thread david . davila
Hi, I have a very big index, about 337 G. I am using Solr 4.2. The problem we have is related to the size of segments: this is the size of the biggest ones: 324 G, 3.7 G, 3.6 G, 1.6 G, 1.6 G, 465 M ... We have LogByteSizeMergePolicy with 10 as MergeFactor in our solrconfig. Reall

Function query matching

2013-11-07 Thread Peter Keegan
Why does this function query return docs that don't match the embedded query? select?qq=text:news&q={!func}sum(query($qq),0)

Re: SolrCloud statistics

2013-11-07 Thread Furkan KAMACI
Hi; I've written a patch to get statistics from SolrCloud. However, my implementation was based on SolrJ, and after I got feedback from Shalin Shekhar I decided to write a new patch that is based on distributed search components. I can add that capability and improve my patch with that. -- Thanks; Fur

Re: Slow Indexing speed for csv files, multi-threaded indexing

2013-11-07 Thread Erick Erickson
Vikram: An experiment I've found useful: Just comment out the server.add() bit and run it. That won't index anything, but if that's also slow then your problem is acquiring the data and you know where to concentrate your efforts. I've seen this be the problem with slow indexing more often than not
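The experiment can be sketched generically as below. The thread itself refers to SolrJ's `server.add()`; this Python stand-in only shows the idea, and `read_documents` and `add_document` are hypothetical callables supplied by the caller.

```python
import time


def run_indexing(read_documents, add_document, send=True):
    """Iterate over all documents, optionally skipping the send step.

    Returns (document_count, elapsed_seconds). With send=False the elapsed
    time covers data acquisition only.
    """
    start = time.perf_counter()
    count = 0
    for doc in read_documents():
        if send:
            add_document(doc)  # stands in for the real client call
        count += 1
    return count, time.perf_counter() - start


# Example with in-memory stand-ins for the data source and the client.
sent = []
docs = lambda: iter([{"id": 1}, {"id": 2}])
print(run_indexing(docs, sent.append, send=False))
```

If the run with `send=False` takes nearly as long as the full run, the bottleneck is reading the data rather than indexing it.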

Re: SolrCloud never fully recovers after slow disks

2013-11-07 Thread Henrik Ossipoff Hansen
Hey Erick, I have tried upping the timeouts quite a bit now, and have tried upping the zkTimeout setting in Solr itself (I found a few old posts on the mailing list suggesting this). I realise this is a sort of weird situation, where we are actually trying to work around some horrible hardware

Re: Does solr supports Federated search, if not what framework

2013-11-07 Thread Erick Erickson
First, please start a new thread when changing topics, see "thread hijacking" here http://people.apache.org/~hossman/#threadhijack But do be aware that scores are NOT comparable between different queries on the _same_ corpus. A score of .75 on one query has no relation to a score of .75 on another

Re: Disjunctive Queries (OR queries) and FilterCache

2013-11-07 Thread Erick Erickson
Yeah, Solr's fq cache is pretty simple-minded, order matters. There's no good way to improve that except try to write your fq queries in the same order. It's actually quite tricky to disassemble/reassemble arbitrary queries to fix this problem. But in your case, you could write a custom query comp
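One way to follow the "same order" advice is to normalise the clause order on the client before sending the request. The sketch below is an assumption about how that might look, not something from the thread; the field name `category` and the helper name are made up.

```python
def normalized_or_filter(field, values):
    """Build an OR filter with de-duplicated, sorted clauses.

    Identical sets of values then always produce the identical fq string,
    so they can share one cache entry.
    """
    unique_sorted = sorted(set(values))
    return field + ":(" + " OR ".join(unique_sorted) + ")"


print(normalized_or_filter("category", ["books", "audio"]))
print(normalized_or_filter("category", ["audio", "books", "audio"]))
```

Both calls print the same string, which is the property that matters for cache reuse. Values containing query meta-characters would still need escaping before being passed in.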

Re: solrcloud shards backup/restoration

2013-11-07 Thread adfel70
did you solve this eventually? Aditya Sakhuja wrote > How does one recover from an index corruption ? That's what I am trying to > eventually tackle here. > > Thanks > Aditya > > On Thursday, September 19, 2013, Aditya Sakhuja wrote: > >> Hi, >> >> Sorry for the late followup on this. Let me p

Re: SolrCloud statistics

2013-11-07 Thread Erick Erickson
Your servlet container logs often have this number, or your app can easily record them, I don't know of another way to do that. The variant here is that what's actually being reported is "QTime", which is also exclusive of actually gathering up the data to put in the return packet, it's just the r

Re: SolrCloud never fully recovers after slow disks

2013-11-07 Thread Erick Erickson
Right, can you up your ZK timeouts significantly? It sounds like your ZK timeout is short enough that when your system slows down, the timeout is exceeded and it's throwing Solr into a tailspin. See zoo.cfg. Best, Erick On Tue, Nov 5, 2013 at 3:33 AM, Henrik Ossipoff Hansen < h...@entertainm

Re: Multi-core support for indexing multiple servers

2013-11-07 Thread Erick Erickson
Rob: What I think you're missing is that you are responsible for pulling the data from your separate sources and pushing it to solr via an update command. You can do this in SolrJ, PHP, or any other package that supports a Solr client. You simply address your requests (both update and query) to th

RE: eDisMax, multiple language support and stopwords

2013-11-07 Thread Markus Jelsma
This is an ancient problem. The issue here is your mm-parameter, it gets confused because for separate fields different amount of tokens are filtered/emitted so it is never going to work just like this. The easiest option is not to use the stopfilter. http://lucene.472066.n3.nabble.com/Dismax-M

Re: How to set default values for int fields

2013-11-07 Thread Erick Erickson
U, put a valid number in your default, not the empty string? Like default="5" Best, Erick On Thu, Nov 7, 2013 at 2:57 AM, manju16832003 wrote: > How do I set default value for int fields > ex > > multiValued="false" default=""/> > > While indexing lets say if I have not set the value for m

eDisMax, multiple language support and stopwords

2013-11-07 Thread Tom Mortimer
Hi all, Thanks for the help and advice I've got here so far! Another question - I want to support stopwords at search time, so that e.g. the query "oscar and wilde" is equivalent to "oscar wilde" (this is with lowercaseOperators=false). Fair enough, I have stopword "and" in the query analyser cha

Re: Solr cloud : Changing properties of already loaded collection

2013-11-07 Thread Erick Erickson
You have several things here. First, changing the number of replicas is easy, just create another node and associate it with a shard of an existing collection. See the shard= param on the solrcloud page when creating nodes. If you don't specify a shard, it'll just be assigned to one of the existin

Block join query

2013-11-07 Thread danost
Sorry about the reposts here, but I can't seem to get on the mailing list... Hi I've been trying to play around with block join queries in the Solr 4.5 release and I was wondering if anyone else has any experience doing this? Basically I'm trying to create a parent->child->grandchild structure a

Re: Help to find BaseTokenFilterFactory to write a Custom TokenFilter

2013-11-07 Thread Dileepa Jayakody
Hi All, When following the above tutorial [1], to write a custom FilterFactory, I had to extend TokenFilterFactory instead of BaseTokenFilterFactory as per the API change in new lucene-analyzer-common library. Below is my custom TokenFilterFactory class: public class ContentFilterFactory extends

SolrCloud keeps repeating exception 'SolrCoreState already closed'

2013-11-07 Thread Eric Bus
Hi, I'm having a problem with one of my shards. Since yesterday, SOLR keeps repeating the same exception over and over for this shard. The web interface for this SOLR instance is also not working (it hangs on the Loading indicator). Nov 7, 2013 9:08:12 AM org.apache.solr.update.processor.LogUpda

Re: Help to find BaseTokenFilterFactory to write a Custom TokenFilter

2013-11-07 Thread Dileepa Jayakody
Thanks Anuj, The jar containing the class can be found here : http://www.java2s.com/Code/JarDownload/lucene/lucene-analyzers-common-4.2.0.jar.zip On Thu, Nov 7, 2013 at 2:18 PM, Anuj Kumar wrote: > > http://stackoverflow.com/questions/13149627/where-did-basetokenfilterfactory-go-in-solr-4-0 > >

Re: Multi-core support for indexing multiple servers

2013-11-07 Thread manju16832003
Hi Rob, the multi-core approach is different. You could have two cores, let's say marketing-core [Has its own schema.xml and data-config.xml] magento-core [Has its own schema.xml and data-config.xml] each core has its own schema.xml and data-config.xml If you go by multi-core approach I guess you won'

Re: Help to find BaseTokenFilterFactory to write a Custom TokenFilter

2013-11-07 Thread Anuj Kumar
http://stackoverflow.com/questions/13149627/where-did-basetokenfilterfactory-go-in-solr-4-0 On Thu, Nov 7, 2013 at 1:05 PM, Dileepa Jayakody wrote: > Hi All, > > I am writing a custom TokenFilter to post a token value to Apache Stanbol > for enhancement. In this Custom TokenFilter I'm trying to

Re: Jetty 9?

2013-11-07 Thread Furkan KAMACI
Here is an issue that points to that: https://issues.apache.org/jira/browse/SOLR-4839 2013/11/7 William Bell > When are we moving Solr to Jetty 9? > > -- > Bill Bell > billnb...@gmail.com > cell 720-256-8076 >