On Wed, Aug 25, 2010 at 2:34 PM, Peter Spam wrote:
> This is a very small number of documents (7000), so I am surprised Solr is
> having such a hard time with it!!
>
> I do facet on 3 terms.
>
> Subsequent "hello" searches are faster, but still well over a second. This
> is a very fast Mac Pro,
How much disk space is used by the index?
If you run the Lucene CheckIndex program, how many terms etc. does it report?
When you do the first facet query, how much does the memory in use grow?
Are you storing the text fields, or only indexing? Do you fetch the
facets only, or do you also fetch the stored fields?
This is a very small number of documents (7000), so I am surprised Solr is
having such a hard time with it!!
I do facet on 3 terms.
Subsequent "hello" searches are faster, but still well over a second. This is
a very fast Mac Pro, with 6GB of RAM.
Thanks,
Peter
On Wed, Aug 25, 2010 at 11:29 AM, Peter Spam wrote:
> So, I went through all the effort to break my documents into max 1 MB chunks,
> and searching for hello still takes over 40 seconds (searching across 7433
> documents):
>
> 8 results (41980 ms)
>
> What is going on??? (scroll down for details)
I think it should be made possible to
return multiple rows in an ArrayList.
-Original message-
From: Peter Spam
Sent: Tue 17-08-2010 00:47
To: solr-user@lucene.apache.org;
Subject: Re: Solr searching performance issues, using large documents
Still stuck on this - any hints on how to write the JavaScript to split a
document? Thanks!
-Pete
On Aug 5, 2010, at 8:10 PM, Lance Norskog wrote:
You may have to write your own javascript to read in the giant field
and split it up.
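Something along these lines might be a starting point - completely untested,
the paths, field names (id, file_name, body) and the ~1 MB chunk size are just
placeholders, and it assumes the script function can hand back a
java.util.ArrayList of row maps so that one log file becomes several documents:

<dataConfig>
  <script><![CDATA[
    function splitIntoChunks(row) {
      var text  = row.get('plainText');     // whole file, from PlainTextEntityProcessor
      var fname = row.get('file_name');     // filled in by TemplateTransformer below
      var size  = 1048576;                  // ~1 MB per chunk - experiment with this,
                                            // and maybe overlap chunks a little so
                                            // phrases spanning a boundary still match
      var rows  = new java.util.ArrayList();
      for (var start = 0, n = 0; start < text.length(); start += size, n++) {
        var chunk = new java.util.HashMap();
        chunk.put('id', fname + '_' + n);
        chunk.put('file_name', fname);
        chunk.put('body', text.substring(start,
                          Math.min(start + size, text.length())));
        rows.add(chunk);
      }
      return rows;                          // one row map per mini-document
    }
  ]]></script>
  <document>
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="/path/to/logs" fileName=".*\.log" rootEntity="false">
      <entity name="log" processor="PlainTextEntityProcessor"
              url="${files.fileAbsolutePath}"
              transformer="TemplateTransformer,script:splitIntoChunks">
        <field column="file_name" name="file_name" template="${files.fileAbsolutePath}"/>
        <field column="body" name="body"/>
      </entity>
    </entity>
  </document>
</dataConfig>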
On Thu, Aug 5, 2010 at 5:27 PM, Peter Spam wrote:
I've read through the DataImportHandler page a few times, and still can't
figure out how to separate a large document into smaller documents. Any hints?
:-) Thanks!
-Peter
On Aug 2, 2010, at 9:01 PM, Lance Norskog wrote:
Spanning won't work- you would have to make overlapping mini-documents
if you want to support this.
I don't know how big the chunks should be- you'll have to experiment.
Lance
On Mon, Aug 2, 2010 at 10:01 AM, Peter Spam wrote:
What would happen if the search query phrase spanned separate document chunks?
Also, what would the optimal size of chunks be?
Thanks!
-Peter
On Aug 1, 2010, at 7:21 PM, Lance Norskog wrote:
Not that I know of.
The DataImportHandler has the ability to create multiple documents
from one input stream. It is possible to create a DIH file that reads
large log files and splits each one into N documents, with the file
name as a common field. The DIH wiki page tells you in general how to
make this work.
Thanks for the pointer, Lance! Is there an example of this somewhere?
-Peter
On Jul 31, 2010, at 3:13 PM, Lance Norskog wrote:
Ah! You're not just highlighting, you're snippetizing. This makes it easier.
Highlighting does not stream- it pulls the entire stored contents into
one string and then pulls out the snippet. If you want this to be
fast, you have to split up the text into small pieces and only
snippetize from the relevant pieces.
However, I do need to search the entire document, or else the highlighting will
sometimes be blank :-(
Thanks!
- Peter
ps. sorry for the many responses - I'm rushing around trying to get this
working.
On Jul 31, 2010, at 1:11 PM, Peter Spam wrote:
Correction - it went from 17 seconds to 10 seconds - I was changing the
hl.regex.maxAnalyzedChars the first time.
Thanks!
-Peter
On Jul 31, 2010, at 1:06 PM, Peter Spam wrote:
On Jul 30, 2010, at 1:16 PM, Peter Karich wrote:
> did you already try other values for hl.maxAnalyzedChars=2147483647
Yes, I tried dropping it down to 21, but it didn't have much of an impact (one
search I just tried went from 17 seconds to 15.8 seconds, and this is an 8-core
Mac Pro with 6GB of RAM).
On Jul 30, 2010, at 7:04 PM, Lance Norskog wrote:
> Wait- how much text are you highlighting? You say these logfiles are X
> big- how big are the actual documents you are storing?
I want it to be like google - I put the entire (sometimes 60MB) doc in a field,
and then just highlight 2-4 lines of it.
On Fri, Jul 30, 2010 at 1:16 PM, Peter Karich wrote:
Hi Peter :-),
did you already try other values for hl.maxAnalyzedChars=2147483647 ?
Also, regular expression highlighting is more expensive, I think.
What does the 'fuzzy' variable mean? If you use this to query via
"~someTerm" instead of "someTerm", then you should try the trunk of Solr,
which is a lot faster for fuzzy queries.
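In the meantime it might be worth comparing the standard (gap) fragmenter with
a much smaller hl.maxAnalyzedChars - an untested sketch for solrconfig.xml,
where the handler name, the field name "body" and the numbers are only
placeholders:

<requestHandler name="/loggrep" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="hl">true</str>
    <str name="hl.fl">body</str>
    <str name="hl.snippets">3</str>
    <str name="hl.fragsize">200</str>
    <!-- default fragmenter instead of the regex one -->
    <str name="hl.fragmenter">gap</str>
    <!-- much smaller than 2147483647 -->
    <str name="hl.maxAnalyzedChars">51200</str>
  </lst>
</requestHandler>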
I do store term vector:
-Pete
On Jul 30, 2010, at 7:30 AM, Li Li wrote:
Highlighting's time is mainly spent on getting the field which you want
to highlight and tokenizing that field (if you don't store term vectors).
You can check what's wrong.
2010/7/30 Peter Spam :
If I don't do highlighting, it's really fast. Optimize has no effect.
-Peter
On Jul 29, 2010, at 11:54 AM, dc tech wrote:
Are you storing the entire log file text in SOLR? That's almost 3GB of
text that you are storing in SOLR. A few things to try:
1) Is this first-time performance, or on repeat queries with the same fields?
2) Optimize the index and test performance again
3) Index without storing the text and see what the performance is
Any ideas? I've got 5000 documents with an average size of 850k each, and it
sometimes takes 2 minutes for a query to come back when highlighting is turned
on! Help!
-Pete
On Jul 21, 2010, at 2:41 PM, Peter Spam wrote:
From the mailing list archive, Koji wrote:
> 1. Provide another field for highlighting and use copyField to copy plainText
> to the highlighting field.
and Lance wrote:
http://www.mail-archive.com/solr-user@lucene.apache.org/msg35548.html
> If you want to highlight field X, doing the
> termOffsets/termPositions/termVectors on field X helps.
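If I'm reading that right, the schema change would be roughly this (untested,
and the field names are just my guesses):

<!-- big field: searched, but not used for highlighting -->
<field name="body" type="text" indexed="true" stored="true"/>

<!-- separate field just for highlighting, with term vectors/positions/offsets -->
<field name="body_hl" type="text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>

<!-- a maxChars="..." limit here could keep the highlight field smaller -->
<copyField source="body" dest="body_hl"/>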