Re: Retrieving Documents

2011-12-16 Thread Otis Gospodnetic
Hi Dan, 1) Are you looking for http://wiki.apache.org/solr/HighlightingParameters#hl.fragsize ? 2) Hundreds of words in a field should not be a problem for highlighting.  But it sounds like this long field may contain content that corresponds to N different pages in a publication and you would
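As a rough illustration of the hl.fragsize parameter linked above, the sketch below builds a highlighting request URL. The host, the /select path, the field name "content", and the fragment size of 200 are assumptions chosen for illustration; none of them come from this thread.

```python
from urllib.parse import urlencode


def highlight_url(base_url, query, field, fragsize):
    """Build a Solr query URL that asks for highlighted fragments.

    hl, hl.fl and hl.fragsize are standard highlighting parameters;
    the base URL and field name are supplied by the caller.
    """
    params = {
        "q": query,
        "hl": "true",             # turn highlighting on
        "hl.fl": field,           # field to highlight (hypothetical name below)
        "hl.fragsize": fragsize,  # approximate fragment size in characters
    }
    return base_url + "?" + urlencode(params)


# Hypothetical host, field and fragment size, for illustration only.
print(highlight_url("http://localhost:8983/solr/select", "whale", "content", 200))
```

A smaller hl.fragsize yields shorter snippets; the right value depends on how much context the application wants to show around each hit.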

Re: Looking for a good Text on Solr

2011-12-16 Thread Shiv Deepak
Hey Brendan, Hey Hector, That was very helpful. :) Thanks, Shiv Deepak On 17-Dec-2011, at 07:52 , Hector Castro wrote: > Hi Shiv, > > For me, a combination of the following has helped me learn a lot about Solr > in a short period of time: > > * Apache Solr 3 Enterprise Search Server: > ht

Re: Looking for a good Text on Solr

2011-12-16 Thread Hector Castro
Hi Shiv, For me, a combination of the following has helped me learn a lot about Solr in a short period of time: * Apache Solr 3 Enterprise Search Server: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book * Solr Wiki: http://wiki.apache.org/solr/ * Pretty much every single pos

Re: Looking for a good Text on Solr

2011-12-16 Thread Brendan Grainger
There is an update to that book for Solr 3: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book I actually bought it recently, but haven't looked at it yet. Good luck. Brendan On Dec 16, 2011, at 9:01 PM, Shiv Deepak wrote: > I am looking for a good book to read from and get a

Looking for a good Text on Solr

2011-12-16 Thread Shiv Deepak
I am looking for a good book to read from and get a better understanding of solr. On amazon, all the books on Solr have an average rating (which I suppose means no one tried them or bothered to post a review) but this one: "Solr 1.4 Enterprise Search Server by David Smiley, Eric Pugh" has a pretty dec

RE: Call RequestHandler from QueryComponent

2011-12-16 Thread Vazquez, Maria (STM)
I am very very sorry. My mail client was not working from work and it looked like it was not being delivered, that's why I tried a few times. Sorry everybody! -Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Friday, December 16, 2011 3:23 PM To: solr-user

Re: Poor performance on distributed search

2011-12-16 Thread Erick Erickson
Right, are you falling afoul of the recursive shard thing? That is, if your shards point back to itself. As far as I understand, your shards parameter in your request handler shouldn't point back to itself. But I'm guessing here. Best Erick On Fri, Dec 16, 2011 at 4:27 PM, ku3ia wrote: >>> OK

Re: how to setup to archive expired documents?

2011-12-16 Thread Chris Hostetter
: So if we use some sort of weekly or daily sharding, there needs to be : some mechanism in place to dynamically add the new shard when the : current one fills up. (Which would also ideally know where to put the : new shards on what server, etc.) Since SOLR does not implement that I : was thi

Call RequestHandler from QueryComponent

2011-12-16 Thread marita
Hi! I have a solrconfig.xml like: all 0 10 ABC score desc,rating asc CUSTOM FQ 2.2 CUSTOM FL validate CUSTOM ABC QUERY COMPONENT stats debug all 0 1 XYZ

Re: Core overhead

2011-12-16 Thread Chris Hostetter
: The list would be unreadable if everyone spammed at the bottom their : email like Otis'. It's just bad form. If you'd like to debate project policy on what is/isn't acceptable on any of the Lucene mailing lists, please start a new thread on general@lucene (the list that exists precisely for

Re: NRT or similar for Solr 3.5?

2011-12-16 Thread Steven Ou
Hey Vikram, I finally got around to getting Solr-RA installed but I'm having trouble getting the NRT to work. Could you help me out? I added these four lines immediately after in solrconfig.xml: true rankingalgorithm true rankingalgorithm Is that correct? I also read something about

Re: Call RequestHandler from QueryComponent

2011-12-16 Thread Chris Hostetter
Maria: sending the same email 4 times in less than 48 hours isn't really a good way to encourage people to help you -- it just means more total mail people have to wade through which slows them down and makes them less likely to want to help. : In ABC QUERY COMPONENT, I customize prepare() an

Re: r1201855 broke stats.facet on long fields

2011-12-16 Thread Chris Hostetter
Wow ... either i'm a huge idiot and everyone has just been really polite about it in most threads, or something about this thread in particular made me really stupid. (Luis: i'm sorry for all the things i have said so far in this email thread that were a complete waste of your time - hopefully

Call RequestHandler from QueryComponent

2011-12-16 Thread Vazquez, Maria (STM)
Hi! I have a solrconfig.xml like: all 0 10 ABC score desc,rating asc CUSTOM FQ 2.2 CUSTOM FL validate CUSTOM ABC QUERY COMPONENT stats debug all 0 1 XYZ

Retrieving Documents

2011-12-16 Thread Dan McGinn-Combs
I've been doing a fair amount of reading and experimenting with Solr lately. I find that it does a good job of indexing very structured documents. However, the application I have in mind is build around long EPUB documents. Of course, I found the Extract components useful for indexing the EPUBs. H

Re: Poor performance on distributed search

2011-12-16 Thread ku3ia
>> OK, so your speed differences are pretty much dependent upon whether you specify >> rows=2000 or rows=10, right? Why do you need 2,000 rows? Yes, big difference is 10 v. 2K records. Limit of 2K rows is set by manager and I can't decrease it. It is a minimum row count needed to process data.

Re: Core overhead

2011-12-16 Thread Ted Dunning
We still disagree. On Fri, Dec 16, 2011 at 12:29 PM, Jason Rutherglen < jason.rutherg...@gmail.com> wrote: > Ted, > > The list would be unreadable if everyone spammed at the bottom their > email like Otis'. It's just bad form. > > Jason > > On Fri, Dec 16, 2011 at 12:00 PM, Ted Dunning > wrote:

Re: Core overhead

2011-12-16 Thread Jason Rutherglen
Ted, The list would be unreadable if everyone spammed at the bottom their email like Otis'. It's just bad form. Jason On Fri, Dec 16, 2011 at 12:00 PM, Ted Dunning wrote: > Sounds like we disagree. > > On Fri, Dec 16, 2011 at 11:56 AM, Jason Rutherglen < > jason.rutherg...@gmail.com> wrote: >

Re: SolrCloud Cores

2011-12-16 Thread Mark Miller
On Fri, Dec 16, 2011 at 8:14 AM, Jamie Johnson wrote: > What is the most appropriate way to configure Solr when deploying in a > cloud environment? Should the core name on all instances be the > collection name or is it more appropriate that each shard be a > separate core, or should each solr i

Re: Possible to facet across two indices, or document types in single index?

2011-12-16 Thread Chris Hostetter
: Chris, you replied: : : > : But there is a workaround: : > : 1) Do a normal query without facets (you only need to request doc ids : > : at this point) : > : 2) Collect all the IDs of the documents returned : > : 3) Do a second query for all fields and facets, adding a filter to : > : restrict

Re: Poor performance on distributed search

2011-12-16 Thread Erick Erickson
OK, so your speed differences are pretty much dependent upon whether you specify rows=2000 or rows=10, right? Why do you need 2,000 rows? Or is the root question why there's such a difference when you specify qt=requestShards? In which case I'm curious to see that request handler definition... Be

Re: Core overhead

2011-12-16 Thread Ted Dunning
Sounds like we disagree. On Fri, Dec 16, 2011 at 11:56 AM, Jason Rutherglen < jason.rutherg...@gmail.com> wrote: > Ted, > > "...- FREE!" is stupid idiot spam. It's annoying and not suitable. > > On Fri, Dec 16, 2011 at 11:45 AM, Ted Dunning > wrote: > > I thought it was slightly clumsy, but it

Re: Core overhead

2011-12-16 Thread Jason Rutherglen
Ted, "...- FREE!" is stupid idiot spam. It's annoying and not suitable. On Fri, Dec 16, 2011 at 11:45 AM, Ted Dunning wrote: > I thought it was slightly clumsy, but it was informative.  It seemed like a > fine thing to say.  Effectively it was "I/we have developed a tool that > will help you so

Re: updates to runbot.sh script

2011-12-16 Thread Christopher Gross
Ha, sorry Hoss. Thought i hit user@nutch, gmail did the replace and I wasn't paying attention. -- Chris On Fri, Dec 16, 2011 at 2:46 PM, Chris Hostetter wrote: > > : http://wiki.apache.org/nutch/Crawl > : > : This script no longer works.  See: > > If you have a question about something on the

Re: updates to runbot.sh script

2011-12-16 Thread Chris Hostetter
: http://wiki.apache.org/nutch/Crawl : : This script no longer works. See: If you have a question about something on the nutch wiki, or included in the nutch release, i would suggest you email the nutch user list. -Hoss

Re: Core overhead

2011-12-16 Thread Ted Dunning
I thought it was slightly clumsy, but it was informative. It seemed like a fine thing to say. Effectively it was "I/we have developed a tool that will help you solve your problem". That is responsive to the OP and it is clear that it is a commercial deal. On Fri, Dec 16, 2011 at 10:02 AM, Jason

Re: Poor performance on distributed search

2011-12-16 Thread ku3ia
Hi, Erick, thanks for your reply Yeah, you are right - document cache is default, but I tried to decrease and increase values but I didn't get the desired result. I tried the tests. Here are results: >>1> try with "&rows=10" successfully started at 19:48:34 Queries interval is: 10 queries per mi

updates to runbot.sh script

2011-12-16 Thread Christopher Gross
http://wiki.apache.org/nutch/Crawl This script no longer works. See: echo "- Index (Step 5 of $steps) -" $NUTCH_HOME/bin/nutch index crawl/NEWindexes crawl/crawldb crawl/linkdb \ crawl/segments/* The "index" call doesn't exist, so what does this line get replaced with? Is there an

Re: Core overhead

2011-12-16 Thread Jason Rutherglen
Wow the shameless plugging of product (footer) has hit a new low Otis. On Fri, Dec 16, 2011 at 7:32 AM, Otis Gospodnetic wrote: > Hi Yury, > > Not sure if this was already covered in this thread, but with N smaller cores > on a single N-CPU-core box you could run N queries in parallel over small

Re: Announcement of Soldash - a dashboard for multiple Solr instances

2011-12-16 Thread Otis Gospodnetic
Nice! May be good to upload some screenshots there... Otis  Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html - Original Message - > From: Alexander Valet | edelight > To: solr-user@lucene.apache.org > Cc: > Sent: Thursday, De

Re: edismax doesn't obey 'pf' parameter

2011-12-16 Thread Erick Erickson
That was a little confusing! " there's always exactly one token at position 0." Of course. What I meant to say was there is always exactly one token in a non-tokenized field and its offset is always exactly 0. There will never be tokens at position 1. So asking to match phrases, which is based

Re: Solr Version Upgrade issue

2011-12-16 Thread Erick Erickson
Please start another thread and provide some details, there's not enough information here to say anything. You might review: http://wiki.apache.org/solr/UsingMailingLists Best Erick On Thu, Dec 15, 2011 at 11:50 PM, Pawan Darira wrote: > Thanks. I re-started from scratch & at least things have s

Re: edismax doesn't obey 'pf' parameter

2011-12-16 Thread Erick Erickson
A side note: specifying qt and defType on the same query is probably not what you intend. I'd just omit the qt bit since you're essentially passing all the info you intend explicitly... I see the same behavior when I specify a non-tokenized field in 3.5 But I don't think this is a bug since it do

Re: how to setup to archive expired documents?

2011-12-16 Thread Robert Stewart
We actually have a system that uses weekly shards but that is all .NET (Lucene.NET) and has lots of code to manage adding new indexes. We want to move to SOLR for performance and maintenance reasons. So if we use some sort of weekly or daily sharding, there needs to be some mechanism in plac

Re: Poor performance on distributed search

2011-12-16 Thread Erick Erickson
The thing that jumps out at me is "&rows=2000". If your documentCache in solrconfig.xml is still the defaults, it only holds 512. So you're running all over your disk gathering up the fields to return, especially since you also specified "fl=*,score". And if you have large fields stored, you're doin

Re: Solr AutoComplete - Address Search

2011-12-16 Thread Vijay Sampath
Just to add to it, I'm using Suggester component to implement Auto Complete http://wiki.apache.org/solr/Suggester -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-AutoComplete-Address-Search-tp3590112p3592017.html Sent from the Solr - User mailing list archive at Nabble.c

Re: how to setup to archive expired documents?

2011-12-16 Thread Otis Gospodnetic
Hi, We've done a fair number of such things over the years. :) If daily shards don't work for you, why not weekly or monthly? Have a look at Zoie's Hourglass concept/code. Some Solr alternatives are currently better suited to handle this sort of setup... Otis  Performance Monitoring SaaS fo

Re: Lock obtain timed out

2011-12-16 Thread Otis Gospodnetic
Hi, >I'm using 3.2 because I can't get velocity to run on 3.5. Maybe this is worth asking about in a separate thread or maybe you already did that. >I've changed my writeLockTimeout from 1000 to 1, and my >commitLockTimeout from 1 to 5 > >Running on a large ec2 box, which has

Re: Core overhead

2011-12-16 Thread Otis Gospodnetic
Hi Yury, Not sure if this was already covered in this thread, but with N smaller cores on a single N-CPU-core box you could run N queries in parallel over smaller indices, which may be faster than a single query going against a single big index, depending on how many concurrent query requests t

Re: Lock obtain timed out

2011-12-16 Thread Eric Tang
Hi Otis, I'm using 3.2 because I can't get velocity to run on 3.5. I've changed my writeLockTimeout from 1000 to 1, and my commitLockTimeout from 1 to 5 Running on a large ec2 box, which has 2 virtual cores. I don't know how to find out the # of concurrent indexer threads. Is that

Re: Core overhead

2011-12-16 Thread Otis Gospodnetic
Hi, I used to think this, too, but have learned this not to be entirely true.  We had a customer with a query rate of a few hundred QPS and 32 or 64 GB RAM (don't recall which any more) and a pretty large JVM heap.  Most queries were very fast, but once in a while a query would be very slow.  G

Re: Replication file become very very big

2011-12-16 Thread Otis Gospodnetic
Hi, Hm, I don't know what this could be caused by.  But if you want to get rid of it, remove that Linux server out of the load balancer pool, stop Solr, remove the index, and restart Solr.  Then force replication and put the server back in the load balancer pool. If you use SPM (see link in my

Re: Lock obtain timed out

2011-12-16 Thread Otis Gospodnetic
Hi Eric, And you are using the latest version of Solr, 3.5.0? What is the timeout in solrconfig.xml? How many CPU cores does the machine have and how many concurrent indexer threads do you have running? Otis  Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-m

Re: disable stemming on query parser.

2011-12-16 Thread Dmitry Kan
You can disable stemming in a copy field. So you need to define one field with your input data on which stemming will be done and the other field (copy field), on which stemming will not be done. Then on the client you can decide which field to search against. Dmitry On Fri, Dec 16, 2011 at 2:00
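The client-side half of this suggestion can be sketched as follows. The two field names, "text_stemmed" and "text_exact", are hypothetical: they stand for the stemmed field and its unstemmed copy field, which would have to be defined in schema.xml separately. The host and path are likewise assumptions.

```python
from urllib.parse import urlencode

# Hypothetical field names: one analyzed with a stemmer, and a copy field
# populated from it whose analyzer chain has no stemming filter.
STEMMED_FIELD = "text_stemmed"
EXACT_FIELD = "text_exact"


def search_url(base_url, term, stemmed=True):
    """Build a query URL against the stemmed or the unstemmed field."""
    field = STEMMED_FIELD if stemmed else EXACT_FIELD
    return base_url + "?" + urlencode({"q": field + ":" + term})


# Hypothetical host; urlencode escapes the colon in the query as %3A.
print(search_url("http://localhost:8983/solr/select", "running"))
print(search_url("http://localhost:8983/solr/select", "running", stemmed=False))
```

The trade-off of this approach is index size: the same text is indexed twice, once per analysis chain.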

Lock obtain timed out

2011-12-16 Thread Eric Tang
Hi, I'm doing a lot of reads and writes into a single solr server (on the magnitude of 50ish per second), and have around 300,000 documents in the index. Now every 5 minutes I get this exception: SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@./solr/da

Re: How to disable Auto Commit and Auto optimize operation after addition of few documents through dataimport handler

2011-12-16 Thread Shawn Heisey
On 12/16/2011 5:57 AM, mechravi25 wrote: I would like to know how we can disable the commit and optimize operations that are called by default after the addition of a few documents through dataimport handlers. Add this to the url you use to call the handler: &commit=false&optimize=false Thanks, Shawn
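A minimal sketch of building such a URL is shown below. Only the commit=false and optimize=false parameters come from the reply above; the host, the /solr/dataimport handler path, and the full-import command are assumptions for illustration.

```python
from urllib.parse import urlencode


def dataimport_url(base_url, command="full-import", commit=False, optimize=False):
    """Build a DataImportHandler URL with explicit commit/optimize flags."""
    params = {
        "command": command,
        "commit": str(commit).lower(),      # "false" skips the commit after import
        "optimize": str(optimize).lower(),  # "false" skips the optimize after import
    }
    return base_url + "?" + urlencode(params)


# Hypothetical host and handler path.
print(dataimport_url("http://localhost:8983/solr/dataimport"))
```

With both flags off, a commit has to be issued separately before the imported documents become visible to searches.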

SolrCloud Cores

2011-12-16 Thread Jamie Johnson
What is the most appropriate way to configure Solr when deploying in a cloud environment? Should the core name on all instances be the collection name or is it more appropriate that each shard be a separate core, or should each solr instance be a separate core (i.e. master1, master1-replica are 2

How to disable Auto Commit and Auto optimize operation after addition of few documents through dataimport handler

2011-12-16 Thread mechravi25
Hi, I would like to know how we can disable the commit and optimize operations that are called by default after the addition of a few documents through dataimport handlers. In our application, the master solr instance is used for indexing purpose and the slave solr is for user search request. Hence the replic

Re: Solr Optimization Fail

2011-12-16 Thread Rajani Maski
Oh, yes on windows, using java 1.6 and Solr 1.4.1. Ok let me try that one... Thank you so much. Regards, Rajani 2011/12/16 Tomás Fernández Löbbe > Are you on Windows? There is a JVM bug that makes Solr keep the old files, > even if they are not used anymore. The files are going to be eventu

Re: Solr Optimization Fail

2011-12-16 Thread Tomás Fernández Löbbe
Are you on Windows? There is a JVM bug that makes Solr keep the old files, even if they are not used anymore. The files are going to be eventually removed, but if you want them out of there immediately try optimizing twice, the second optimize doesn't do much but it will remove the old files. On F
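The "optimize twice" workaround could be scripted roughly as below. This is a sketch under assumptions: the update handler URL is hypothetical, and the optimize is triggered with the optimize=true request parameter on the update handler. The function that actually sends the request is passed in, so the logic can be tried without a running server.

```python
from urllib.request import urlopen


def http_get(url):
    """Send a GET request and return the HTTP status code."""
    with urlopen(url) as response:
        return response.status


def optimize_twice(update_url, send=http_get):
    """Trigger two optimizes in a row and return both results.

    Per the reply above, the second optimize does little work, but on
    Windows it is what lets the old index files be removed.
    """
    url = update_url + "?optimize=true"
    return [send(url) for _ in range(2)]


# Example call against a hypothetical local instance (commented out so the
# sketch does not need a live server):
# optimize_twice("http://localhost:8983/solr/update")
```

Each optimize can take a long time on a large index, so a real script would likely also want a generous timeout on the request.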

full-data import suddenly stopped working. Total Rows Fetched remains 0

2011-12-16 Thread PeterKerk
My full-data import stopped working all of a sudden. Afaik I have not made any changes that would cause this. The response is: 0 0 wedding-data-config.xml full-import busy A command is still running... 0:6:4.112 1 0 0 0 2011-12-16 13:12:29 This response format is experimental. It is li

Re: Solr Optimization Fail

2011-12-16 Thread Rajani Maski
These parameters are commented in my solr config.xml see the parameters attached. When i do optimize on index of size 400 mb , it reduces the size of data folder to 200 mb. But when data is huge it doubles it. Why is that so? Optimization : Actually should reduce the size of the dat

RE: Solr Optimization Fail

2011-12-16 Thread Juan Pablo Mora
Maybe you are generating a snapshot of your index attached to the optimize ??? Look for post-commit or post-optimize events in your solr-config.xml De: Rajani Maski [rajinima...@gmail.com] Enviado el: viernes, 16 de diciembre de 2011 11:11 Para: solr-user@l

disable stemming on query parser.

2011-12-16 Thread meghana
Hi All, I am using stemming in my Solr, but I don't want to apply stemming for every search request. I am thinking of disabling stemming on one specific query parser. Can I do this? Any help much appreciated. Thanks in Advance -- View this message in context: http://lucene.472066.n3.

Solr Optimization Fail

2011-12-16 Thread Rajani Maski
Hi, When we do optimize, it actually reduces the data size right? I have index of size 6gb(5 million documents). Index is already created with commits for every 1 documents. Now I was trying to do optimization with http optimize command. When i did that, data size became - 12gb. Why th