>
> Best
> Erick
>
> On Fri, Nov 4, 2011 at 3:23 AM, Peter Spam wrote:
>> Solr 4.0 (11/1 snapshot)
>> Data: 80k files, average size 2.5MB, largest is 750MB;
>> Solr: Each document is max 256k; total docs = 800k
>> Machine: Early 2009 Mac Pro, 6GB RAM
Wow, I tried with minGramSize=1 and maxGramSize=1000 (I want someone to be able
to search on any substring, just like "grep"), and the index is multiple orders
of magnitude larger than my data!
Is there a better way to support full grep-like substring searching?
Thanks!
Pete
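[Editorial sketch, not from the thread: with minGramSize=1 and maxGramSize=1000, a
token of length L emits every substring up to length 1000 - roughly L(L+1)/2 grams -
which is why the index dwarfs the data. A bounded-gram fieldType (all names and
sizes here are assumptions) keeps growth roughly linear:]

<fieldType name="text_substring" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- each position now emits at most 3 grams (lengths 3..5)
         instead of one gram per substring length up to 1000 -->
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="5"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

[Queries up to maxGramSize characters then match a single gram; longer substrings
have to be broken into overlapping grams and combined as a phrase, which is the
usual trade-off for grep-style search.]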
Ideally, I'd like to feed Solr the metadata and the entire file at once, and
have the back-end split the file into thousands of pieces. Is this possible?
Thanks!
Pete
On Nov 1, 2011, at 5:15 PM, Peter Spam wrote:
> Wow, 50 lines is tiny! Is that how small you need to go, to get good
> performance?
Example data:
01/23/2011 05:12:34 [Test] a=1; hello_there=50; data=[1,5,30%];
I would love to be able to just "grep" the data - i.e., if I search for "ello",
it finds and returns "ello", and if I search for "hello_there=5", it would
match too.
Here's what I'm using now:
;fl=id,score&defType=dismax&bf=sub(1000,caprice_score)&group=true&group.field=FileName
>
> Results are amazing; I am able to index and search very large log files
> (a few hundred MB) with very low memory requirements. Highlighting is also
> working fine.
>
> Thanks & Regards
> You can sort by any field or FunctionQuery. See
> http://wiki.apache.org/solr/FunctionQuery
>
> On Fri, Oct 21, 2011 at 7:03 PM, Peter Spam wrote:
>
>> Is there a way to use a custom sorter, to avoid re-indexing?
>>
>>
>> Thanks!
>> Pete
>>
Sent from my iPhone
On Oct 23, 2011, at 2:01 PM, Erick Erickson wrote:
> Also be aware that by default Solr is configured to only index the
> first 10,000 tokens of a field. See maxFieldLength in solrconfig.xml.
>
> Best
> Erick
>
> On Fri, Oct 21, 2011 at 7:34 PM, Peter Spam wrote:
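[For reference, the setting Erick mentions above lives in solrconfig.xml; 10000
is the shipped default:]

<!-- solrconfig.xml: maximum number of tokens indexed per field;
     anything past this limit is silently dropped -->
<maxFieldLength>10000</maxFieldLength>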
; "sort=field1,field2,field3".
>
> Anyway, both of these options require reindexing.
>
> Regards,
>
> Tomás
>
> On Fri, Oct 21, 2011 at 4:57 PM, Peter Spam wrote:
>
>> Hi everyone,
>>
>> I have a field that has a letter in it (for example, 1A1, 2A1, 11C15, etc.).
Hi everyone,
I have a field that has a letter in it (for example, 1A1, 2A1, 11C15, etc.).
Sorting it seems to work most of the time, except for a few cases: 10A1 sorts
lower than 8A100, and 10A100 sorts lower than 10A99. Any ideas? I bet if my
data had leading zeros (i.e. 10A099), it would behave correctly.
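[One possible fix without padding the data, sketched as an assumption - it needs
the ICU contrib (analysis-extras), and the field and type names are made up: a
collated sort field with numeric collation, so runs of digits compare as numbers.]

<!-- schema.xml sketch: numeric collation makes 8A100 < 10A1
     and 10A99 < 10A100, as a human would expect -->
<fieldType name="alphanum_sort" class="solr.ICUCollationField"
           locale="en" strength="primary" numeric="true"/>
<field name="code_sort" type="alphanum_sort" indexed="true" stored="false"/>
<copyField source="code" dest="code_sort"/>

[Sorting then uses sort=code_sort asc; note this does require reindexing the
sort field.]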
> Which means that you should divide your files and use Result Grouping / Field
> Collapsing to list only one hit per original document.
>
> (xtf also would solve your problem "out of the box" but xtf does not use
> solr).
>
> Best regards
> Karsten
>
> Original message
>> Date: Thu, 20 Oct 2011 17:59:04 -0700
>> From
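[A grouping request along the lines Karsten suggests above might look like this;
the host, core, and query term are assumptions, while group.field=FileName is
taken from the query shown earlier in the thread:]

http://localhost:8983/solr/select?q=error&group=true&group.field=FileName&group.limit=1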
I have about 20k text files, some very small, but some up to 300MB, and would
like to do text searching with highlighting.
Imagine the text is the contents of your syslog.
I would like to type in some terms, such as "error" and "mail", and have Solr
return the syslog lines with those terms PLUS the line before and after
(like "grep -C1").
I'm having the same problem - the standard query returns all my documents, but
the dismax one returns 0. Any ideas?
http://server:8983/solr/select?qt=standard&indent=on&q=*
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">3592</int>
    <lst name="params">
      <str name="indent">on</str>
      <str name="qt">standard</str>
      <str name="q">*</str>
    </lst>
  </lst>
  [...]
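[A likely cause, offered as an assumption rather than from the thread: dismax
treats q=* as a literal term, not a match-all. The usual workaround is q.alt,
which is parsed with the standard syntax when q is absent:]

http://server:8983/solr/select?defType=dismax&q.alt=*:*&indent=on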
---
My schema: id, name, checksum, body, notes, date
I'd like for a user to be able to add notes to the notes field, and not have to
re-index the document (since the body field may contain 100MB of text). Some
ideas:
1) How about creating another core which only contains id, checksum, and notes?
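[A minimal sketch of idea 1), with assumed field types: a second core holding
only the mutable metadata, so editing a note never touches the 100MB body.]

<!-- schema.xml for a hypothetical "notes" core -->
<fields>
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <field name="checksum" type="string" indexed="true" stored="true"/>
  <field name="notes" type="text" indexed="true" stored="true"/>
</fields>
<uniqueKey>id</uniqueKey>

[The application would then join the two result sets on checksum client-side.]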
Thanks for the note, Shaun, but the documentation indicates that the sorting is
only in ascending order :-(
facet.sort
This param determines the ordering of the facet field constraints.
• count - sort the constraints by count (highest count first)
• index - to return the constraints sorted in their index order (lexicographic by indexed term)
> Have you seen grouping?
>
> Which is another way of asking why you want to do this, perhaps it's an
> XY problem
>
> Best
> Erick
>
> On Thu, Apr 7, 2011 at 1:13 AM, Peter Spam wrote:
>
>> Hi,
>>
>> I have documents with a field that has "1A2B3C" alphanumeric characters.
> http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
>
> - Original Message
>> From: Peter Spam
>> To: solr-user@lucene.apache.org
>> Sent: Thu, April 7, 2011 1:13:44 AM
>> Subject: Tips for getting unique results
Hi,
I have documents with a field that has "1A2B3C" alphanumeric characters. I can
query for * and sort results based on this field, however I'd like to "uniq"
these results (remove duplicates) so that I can get the 5 largest unique
values. I can't use the StatsComponent because my values have letters in them.
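[One way to get the 5 largest unique values, sketched as an assumption (the
field name "code" is made up): group on the field with one row per group and
sort descending, so each returned group is one distinct value. This needs a
single-valued, indexed field and Result Grouping (Solr 3.3+).]

http://server:8983/solr/select?q=*:*&group=true&group.field=code&group.limit=1&sort=code+desc&rows=5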
at 9:52 AM, Yonik Seeley wrote:
> On Wed, Aug 25, 2010 at 11:29 AM, Peter Spam wrote:
>> So, I went through all the effort to break my documents into max 1 MB
>> chunks, and searching for hello still takes over 40 seconds (searching
>> across 7433 documents):
>>
> On the matter, I think it should be made possible to
> return multiple rows in an ArrayList.
>
> -Original message-
> From: Peter Spam
> Sent: Tue 17-08-2010 00:47
> To: solr-user@lucene.apache.org;
> Subject: Re: Solr searching performance issues, using large documents
Still stuck on this - any hints on how to write the JavaScript to split a
document? Thanks!
-Pete
On Aug 5, 2010, at 8:10 PM, Lance Norskog wrote:
> You may have to write your own javascript to read in the giant field
> and split it up.
>
> On Thu, Aug 5, 2010 at 5:27 PM, Peter Spam wrote:
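[A minimal client-side splitter along the lines Lance suggests. Everything here
is an assumption: the file name, field names, chunk size, and the
/solr/update/json endpoint (Solr 3.1+).]

// Reads a big log file, splits it into fixed-size line chunks, and posts
// each chunk to Solr as its own document.
var fs = require('fs');
var http = require('http');

var lines = fs.readFileSync('big.log', 'utf8').split('\n');
var LINES_PER_DOC = 1000;           // tune: smaller docs highlight faster
var docs = [];
for (var i = 0; i < lines.length; i += LINES_PER_DOC) {
  docs.push({
    id: 'big.log#' + (i / LINES_PER_DOC),
    FileName: 'big.log',            // lets Result Grouping collapse chunks
    body: lines.slice(i, i + LINES_PER_DOC).join('\n')
  });
}

var req = http.request({
  host: 'localhost', port: 8983,
  path: '/solr/update/json?commit=true',
  method: 'POST',
  headers: { 'Content-Type': 'application/json' }
});
req.end(JSON.stringify(docs));

[As the next message notes, letting consecutive chunks overlap keeps a phrase
that spans a chunk boundary matchable.]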
> Make overlapping mini-documents
> if you want to support this.
>
> I don't know how big the chunks should be- you'll have to experiment.
>
> Lance
>
> On Mon, Aug 2, 2010 at 10:01 AM, Peter Spam wrote:
>> What would happen if the search query phrase spanned separate documents?
>
> On Sun, Aug 1, 2010 at 2:06 PM, Peter Spam wrote:
>> Thanks for the pointer, Lance! Is there an example of this somewhere?
>>
>>
>> -Peter
>>
>> On Jul 31, 2010, at 3:13 PM, Lance Norskog wrote:
>>
>>> Ah! You're
> 2 queries to achieve what you want, but the second query for the same
> query will be blindingly fast. Often <1ms.
>
> Good luck!
>
> Lance
>
> On Sat, Jul 31, 2010 at 1:12 PM, Peter Spam wrote:
>> However, I do need to search the entire document, or
However, I do need to search the entire document, or else the highlighting will
sometimes be blank :-(
Thanks!
- Peter
ps. sorry for the many responses - I'm rushing around trying to get this
working.
On Jul 31, 2010, at 1:11 PM, Peter Spam wrote:
> Correction - it went from 17 secon
Correction - it went from 17 seconds to 10 seconds - I was changing the
hl.regex.maxAnalyzedChars the first time.
Thanks!
-Peter
On Jul 31, 2010, at 1:06 PM, Peter Spam wrote:
> On Jul 30, 2010, at 1:16 PM, Peter Karich wrote:
>
>> did you already try other values for hl.maxA
On Jul 30, 2010, at 1:16 PM, Peter Karich wrote:
> did you already try other values for hl.maxAnalyzedChars=2147483647
Yes, I tried dropping it down to 21, but it didn't have much of an impact (one
search I just tried went from 17 seconds to 15.8 seconds, and this is an 8-core
Mac Pro with 6GB of RAM).
On Jul 30, 2010, at 7:04 PM, Lance Norskog wrote:
> Wait- how much text are you highlighting? You say these logfiles are X
> big- how big are the actual documents you are storing?
I want it to be like Google - I put the entire (sometimes 60MB) doc in a field,
and then just highlight 2-4 lines of it.
I do store term vectors:
-Pete
On Jul 30, 2010, at 7:30 AM, Li Li wrote:
> Highlighting time is mainly spent getting the field you want to
> highlight and tokenizing that field (if you don't store term vectors).
> You can check what's wrong.
>
> 2010/7/30
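[For reference, a schema sketch with term vectors stored (the field name is an
assumption), so the highlighter can reuse them instead of re-analyzing megabytes
of text:]

<!-- schema.xml: term vectors with positions and offsets let the
     highlighter skip re-tokenizing the stored text -->
<field name="body" type="text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>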
> 1) ... performance or on repeat queries with the same fields?
> 2) Optimize the index and test performance again.
> 3) Index without storing the text and see what the performance looks like.
>
>
> On 7/29/10, Peter Spam wrote:
>> Any ideas? I've got 5000 documents with an aver
Any ideas? I've got 5000 documents with an average size of 850k each, and it
sometimes takes 2 minutes for a query to come back when highlighting is turned
on! Help!
-Pete
On Jul 21, 2010, at 2:41 PM, Peter Spam wrote:
> From the mailing list archive, Koji wrote:
>
>> 1
Still not working ... any ideas?
-Pete
On Jul 14, 2010, at 11:56 AM, Peter Spam wrote:
> Any other thoughts, Chris? I've been messing with this a bit, and can't seem
> to get (?m)^.*$ to do what I want.
>
> 1) I don't care how many characters it returns, I
If I search for "foo", I get back a list of documents. Any way to get a
per-document hit count? Thanks!
-Pete
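[A hedged sketch of a per-document hit count, assuming a Solr 4.0-era build where
function queries are allowed in fl (the field name is made up):]

http://server:8983/solr/select?q=body:foo&fl=id,hits:termfreq(body,'foo')

[termfreq returns the raw frequency of "foo" in each document's body field.]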
> If the application uses only a small subset of the stored fields, then
> enabling lazy field loading can be a huge boon, especially if compressed
> fields are used.
What does this mean? How do you load a field lazily?
Thanks for your time, guys - this has started to become frustrating, since it
works so well, but is very slow!
-Pete
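[It refers to this solrconfig.xml flag: with it enabled, only the stored fields
named in fl are read eagerly, so a huge body field is not pulled off disk just
to return id and score.]

<!-- solrconfig.xml (query section) -->
<enableLazyFieldLoading>true</enableLazyFieldLoading>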
Data set: About 4,000 log files (will eventually grow to millions). Average
log file is 850k. Largest log file (so far) is about 70MB.
Problem: When I search for common terms, the query time goes from under 2-3
seconds to about 60 seconds. TermVectors etc. are enabled. When I disable
highlighting, queries are fast again.
and the line after.
3) This should be like "grep -C1"
Thanks for your time!
-Pete
On Jul 9, 2010, at 12:08 AM, Peter Spam wrote:
> Ah, this makes sense. I've changed my regex to "(?m)^.*$", and it works
> better, but I still get fragments before and after
Ah, this makes sense. I've changed my regex to "(?m)^.*$", and it works
better, but I still get fragments before and after some returns.
Thanks for the hint!
-Pete
On Jul 8, 2010, at 6:27 PM, Chris Hostetter wrote:
>
> : If you can use the latest branch_3x or trunk, hl.fragListBuilder=single
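[A request using that option might look like this, sketched as an assumption:
hl.fragListBuilder=single belongs to the FastVectorHighlighter, which needs
termVectors, termPositions, and termOffsets on the highlighted field. "single"
returns the whole field as one snippet, so nothing is fragmented.]

http://server:8983/solr/select?q=body:error&hl=true&hl.fl=body&hl.useFastVectorHighlighter=true&hl.fragListBuilder=single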
].to_s +
"&hl=true&hl.snippets=1&hl.fragsize=0"
# + "&hl.regex.slop=.8&hl.fragsize=200&hl.fragmenter=regex&hl.regex.pattern=" +
#   CGI::escape(regexv)
Thanks for your help.
-Peter
On Jul 8, 2010, at 3:47 PM, Koji Sekiguchi wrote:
> (10/07/09 2:44), Peter Spam
To clarify, I never want a snippet, I always want a whole line returned. Is
this possible? Thanks!
-Pete
On Jul 7, 2010, at 5:33 PM, Peter Spam wrote:
> Hi,
>
> I have a text file broken apart by carriage returns, and I'd like to only
> return entire lines. So, I
Hi,
I have a text file broken apart by carriage returns, and I'd like to only
return entire lines. So, I'm trying to use this:
&hl.fragmenter=regex
&hl.regex.pattern=^.*$
... but I still get fragments, even if I crank up the hl.regex.slop to 3 or so.
I also tried a pattern of
Ah, I found this:
https://issues.apache.org/jira/browse/SOLR-634
... aka "solr-ui". Is there anything else along these lines? Thanks!
-Peter
On Jun 30, 2010, at 3:59 PM, Peter Spam wrote:
> Wow, thanks Lance - it's really fast now!
>
> The last piece of th
>> http://wiki.apache.org/solr/HighlightingParameters
>> has a good list, but you've probably already seen that page....
>>
>> Best
>> Erick
>>
>> On Tue, Jun 29, 2010 at 9:11 PM, Peter Spam wrote:
>>
>>> To follow up, I've found that my queries are very fast (even with &fq=),
To follow up, I've found that my queries are very fast (even with &fq=), until
I add &hl=true. What can I do to speed up highlighting? Should I consider
injecting a line at a time, rather than the entire file as a field?
-Pete
On Jun 29, 2010, at 11:07 AM, Peter Spam wrote:
> web server is pointing at.
>
> Also, SOLR has no way of knowing you've modified your index
> with SolrJ, so it may not be automatically reopening an
> IndexReader so your recent changes may not be visible
> until you force the SOLR reader to reopen.
>
> HTH
> Erick
On Jun 28, 2010, at 2:00 PM, Ahmet Arslan wrote:
>> 1) I can get my docs in the index, but when I search, it
>> returns the entire document. I'd love to have it only
>> return the line (or two) around the search term.
>
> Solr can generate Google-like snippets as you describe.
> http://wiki.apa
Great, thanks for the pointers.
Thanks,
Peter
On Jun 28, 2010, at 2:00 PM, Ahmet Arslan wrote:
>> 1) I can get my docs in the index, but when I search, it
>> returns the entire document. I'd love to have it only
>> return the line (or two) around the search term.
>
> Solr can generate Google-
Hi everyone,
I'm looking for a way to index a bunch of (potentially large) text files. I
would love to see results like Google, so I went through a few tutorials, but
I've still got questions:
1) I can get my docs in the index, but when I search, it returns the entire
document. I'd love to have it only return the line (or two) around the
search term.