Re: Solr searching performance issues, using large documents (now 1MB documents)

2010-08-25 Thread Yonik Seeley
On Wed, Aug 25, 2010 at 2:34 PM, Peter Spam wrote: > This is a very small number of documents (7000), so I am surprised Solr is > having such a hard time with it!! > > I do facet on 3 terms. > > Subsequent "hello" searches are faster, but still well over a second.  This > is a very fast Mac Pro,

Re: Solr searching performance issues, using large documents (now 1MB documents)

2010-08-25 Thread Lance Norskog
How much disk space is used by the index? If you run the Lucene CheckIndex program, how many terms etc. does it report? When you do the first facet query, how much does the memory in use grow? Are you storing the text fields, or only indexing? Do you fetch the facets only, or do you also fetch t

Re: Solr searching performance issues, using large documents (now 1MB documents)

2010-08-25 Thread Peter Spam
This is a very small number of documents (7000), so I am surprised Solr is having such a hard time with it!! I do facet on 3 terms. Subsequent "hello" searches are faster, but still well over a second. This is a very fast Mac Pro, with 6GB of RAM. Thanks, Peter On Aug 25, 2010, at 9:52 AM,

Re: Solr searching performance issues, using large documents (now 1MB documents)

2010-08-25 Thread Yonik Seeley
On Wed, Aug 25, 2010 at 11:29 AM, Peter Spam wrote: > So, I went through all the effort to break my documents into max 1 MB chunks, > and searching for hello still takes over 40 seconds (searching across 7433 > documents): > >        8 results (41980 ms) > > What is going on???  (scroll down for

Re: Solr searching performance issues, using large documents (now 1MB documents)

2010-08-25 Thread Peter Spam
he matter, i think it should (made) to be possible to > return multiple rows in an ArrayList. > > -Original message- > From: Peter Spam > Sent: Tue 17-08-2010 00:47 > To: solr-user@lucene.apache.org; > Subject: Re: Solr searching performance issues, using large docu

RE: Re: Solr searching performance issues, using large documents

2010-08-16 Thread Markus Jelsma
to return multiple rows in an ArrayList.   -Original message- From: Peter Spam Sent: Tue 17-08-2010 00:47 To: solr-user@lucene.apache.org; Subject: Re: Solr searching performance issues, using large documents Still stuck on this - any hints on how to write the JavaScript to split

Re: Solr searching performance issues, using large documents

2010-08-16 Thread Peter Spam
Still stuck on this - any hints on how to write the JavaScript to split a document? Thanks! -Pete On Aug 5, 2010, at 8:10 PM, Lance Norskog wrote: > You may have to write your own javascript to read in the giant field > and split it up. > > On Thu, Aug 5, 2010 at 5:27 PM, Peter Spam wrote:

Re: Solr searching performance issues, using large documents

2010-08-05 Thread Lance Norskog
You may have to write your own javascript to read in the giant field and split it up. On Thu, Aug 5, 2010 at 5:27 PM, Peter Spam wrote: > I've read through the DataImportHandler page a few times, and still can't > figure out how to separate a large document into smaller documents.  Any > hints?

Re: Solr searching performance issues, using large documents

2010-08-05 Thread Peter Spam
I've read through the DataImportHandler page a few times, and still can't figure out how to separate a large document into smaller documents. Any hints? :-) Thanks! -Peter On Aug 2, 2010, at 9:01 PM, Lance Norskog wrote: > Spanning won't work- you would have to make overlapping mini-document

Re: Solr searching performance issues, using large documents

2010-08-02 Thread Lance Norskog
Spanning won't work- you would have to make overlapping mini-documents if you want to support this. I don't know how big the chunks should be- you'll have to experiment. Lance On Mon, Aug 2, 2010 at 10:01 AM, Peter Spam wrote: > What would happen if the search query phrase spanned separate docu
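
The overlapping mini-documents Lance describes could be produced along these lines. This is only a sketch: the function name `split_with_overlap` and its parameters are hypothetical, sizes are counted in characters, and the right chunk size still has to be found by experiment. With this scheme, a phrase no longer than the overlap always falls entirely inside at least one chunk.

```python
def split_with_overlap(text, chunk_size, overlap):
    """Split text into chunks of at most chunk_size characters.

    Each chunk after the first repeats the last `overlap` characters
    of the previous chunk, so short phrases are never cut in two.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far each new chunk advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
        start += step
    return chunks
```

A larger overlap protects longer phrases but increases the index size, since the overlapping text is stored and indexed twice.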

Re: Solr searching performance issues, using large documents

2010-08-02 Thread Peter Spam
What would happen if the search query phrase spanned separate document chunks? Also, what would the optimal size of chunks be? Thanks! -Peter On Aug 1, 2010, at 7:21 PM, Lance Norskog wrote: > Not that I know of. > > The DataImportHandler has the ability to create multiple documents > from o

Re: Solr searching performance issues, using large documents

2010-08-01 Thread Lance Norskog
Not that I know of. The DataImportHandler has the ability to create multiple documents from one input stream. It is possible to create a DIH file that reads large log files and splits each one into N documents, with the file name as a common field. The DIH wiki page tells you in general how to mak
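
The idea of turning one log file into N documents that share the file name as a common field can be illustrated outside the DataImportHandler. The sketch below is plain Python rather than a DIH configuration, and the field names (`id`, `file_name`, `body`) and the function `file_to_documents` are assumptions made for illustration, not names taken from the thread or from Solr.

```python
def file_to_documents(file_name, text, lines_per_doc):
    """Split one file's text into several documents of lines_per_doc lines.

    Every document carries the same file_name, so hits from different
    pieces can be grouped back to the file they came from.
    """
    if lines_per_doc <= 0:
        raise ValueError("lines_per_doc must be positive")

    lines = text.splitlines(keepends=True)  # keep newlines so text is preserved
    docs = []
    for n, i in enumerate(range(0, len(lines), lines_per_doc)):
        docs.append({
            "id": f"{file_name}#{n}",   # hypothetical unique key per piece
            "file_name": file_name,     # common field shared by all pieces
            "body": "".join(lines[i:i + lines_per_doc]),
        })
    return docs
```

Splitting on line boundaries suits log files, where a single entry rarely spans many lines.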

Re: Solr searching performance issues, using large documents

2010-08-01 Thread Peter Spam
Thanks for the pointer, Lance! Is there an example of this somewhere? -Peter On Jul 31, 2010, at 3:13 PM, Lance Norskog wrote: > Ah! You're not just highlighting, you're snippetizing. This makes it easier. > > Highlighting does not stream- it pulls the entire stored contents into > one string

Re: Solr searching performance issues, using large documents

2010-07-31 Thread Lance Norskog
Ah! You're not just highlighting, you're snippetizing. This makes it easier. Highlighting does not stream- it pulls the entire stored contents into one string and then pulls out the snippet. If you want this to be fast, you have to split up the text into small pieces and only snippetize from the

Re: Solr searching performance issues, using large documents

2010-07-31 Thread Peter Spam
However, I do need to search the entire document, or else the highlighting will sometimes be blank :-( Thanks! - Peter ps. sorry for the many responses - I'm rushing around trying to get this working. On Jul 31, 2010, at 1:11 PM, Peter Spam wrote: > Correction - it went from 17 seconds to 10

Re: Solr searching performance issues, using large documents

2010-07-31 Thread Peter Spam
Correction - it went from 17 seconds to 10 seconds - I was changing the hl.regex.maxAnalyzedChars the first time. Thanks! -Peter On Jul 31, 2010, at 1:06 PM, Peter Spam wrote: > On Jul 30, 2010, at 1:16 PM, Peter Karich wrote: > >> did you already try other values for hl.maxAnalyzedChars=21474

Re: Solr searching performance issues, using large documents

2010-07-31 Thread Peter Spam
On Jul 30, 2010, at 1:16 PM, Peter Karich wrote: > did you already try other values for hl.maxAnalyzedChars=2147483647 Yes, I tried dropping it down to 21, but it didn't have much of an impact (one search I just tried went from 17 seconds to 15.8 seconds, and this is an 8-core Mac Pro with 6GB

Re: Solr searching performance issues, using large documents

2010-07-31 Thread Peter Spam
On Jul 30, 2010, at 7:04 PM, Lance Norskog wrote: > Wait- how much text are you highlighting? You say these logfiles are X > big- how big are the actual documents you are storing? I want it to be like google - I put the entire (sometimes 60MB) doc in a field, and then just highlight 2-4 lines of

Re: Solr searching performance issues, using large documents

2010-07-30 Thread Lance Norskog
Wait- how much text are you highlighting? You say these logfiles are X big- how big are the actual documents you are storing? On Fri, Jul 30, 2010 at 1:16 PM, Peter Karich wrote: > Hi Peter :-), > > did you already try other values for > > hl.maxAnalyzedChars=2147483647 > > ? Also regular expre

Re: Solr searching performance issues, using large documents

2010-07-30 Thread Peter Karich
Hi Peter :-), did you already try other values for hl.maxAnalyzedChars=2147483647 ? Also regular expression highlighting is more expensive, I think. What does the 'fuzzy' variable mean? If you use this to query via "~someTerm" instead "someTerm" then you should try the trunk of solr which is a l
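
For anyone wanting to try other `hl.maxAnalyzedChars` values as suggested, a request URL can be assembled as in the following sketch. The parameters `q`, `hl`, `hl.fl` and `hl.maxAnalyzedChars` are standard Solr request parameters; the base URL, the field name `body`, the value 51200 and the helper `build_highlight_query` are assumptions used only for illustration.

```python
from urllib.parse import urlencode


def build_highlight_query(base_url, query, field, max_analyzed_chars):
    """Build a Solr select URL with highlighting limited to max_analyzed_chars."""
    params = {
        "q": query,
        "hl": "true",                                    # turn highlighting on
        "hl.fl": field,                                  # field to highlight
        "hl.maxAnalyzedChars": str(max_analyzed_chars),  # analysis cut-off
    }
    return base_url + "?" + urlencode(params)


# Hypothetical local Solr instance and field name.
url = build_highlight_query("http://localhost:8983/solr/select", "hello", "body", 51200)
```

Timing the same query with a few different limits shows whether the highlighter's analysis of the stored text is what dominates the response time.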

Re: Solr searching performance issues, using large documents

2010-07-30 Thread Peter Spam
I do store term vector: -Pete On Jul 30, 2010, at 7:30 AM, Li Li wrote: > highlight's time is mainly spent on getting the field which you want > to highlight and tokenize this field (if you don't store term vector). > you can check what's wrong, > > 2010/7/30 Peter Spam : >> If I don't do hi

Re: Solr searching performance issues, using large documents

2010-07-30 Thread Li Li
highlight's time is mainly spent on getting the field which you want to highlight and tokenize this field (if you don't store term vector). you can check what's wrong, 2010/7/30 Peter Spam : > If I don't do highlighting, it's really fast.  Optimize has no effect. > > -Peter > > On Jul 29, 2010, a

Re: Solr searching performance issues, using large documents

2010-07-29 Thread Peter Spam
If I don't do highlighting, it's really fast. Optimize has no effect. -Peter On Jul 29, 2010, at 11:54 AM, dc tech wrote: > Are you storing the entire log file text in SOLR? That's almost 3gb of > text that you are storing in the SOLR. Try to > 1) Is this first time performance or on repeat que

Re: Solr searching performance issues, using large documents

2010-07-29 Thread dc tech
Are you storing the entire log file text in SOLR? That's almost 3gb of text that you are storing in the SOLR. Try to 1) Is this first time performance or on repeat queries with the same fields? 2) Optimize the index and test performance again 3) index without storing the text and see what the perfor

Re: Solr searching performance issues, using large documents

2010-07-29 Thread Peter Spam
Any ideas? I've got 5000 documents with an average size of 850k each, and it sometimes takes 2 minutes for a query to come back when highlighting is turned on! Help! -Pete On Jul 21, 2010, at 2:41 PM, Peter Spam wrote: > From the mailing list archive, Koji wrote: > >> 1. Provide another fi

Re: Solr searching performance issues, using large documents

2010-07-21 Thread Peter Spam
From the mailing list archive, Koji wrote: > 1. Provide another field for highlighting and use copyField to copy plainText > to the highlighting field. and Lance wrote: http://www.mail-archive.com/solr-user@lucene.apache.org/msg35548.html > If you want to highlight field X, doing the > termO