Re: Can Solr handle large text files?

2012-07-27 Thread Peter Spam
de. > > Best > Erick > > On Fri, Nov 4, 2011 at 3:23 AM, Peter Spam wrote: >> Solr 4.0 (11/1 snapshot) >> Data: 80k files, average size 2.5MB, largest is 750MB; >> Solr: Each document is max 256k; total docs = 800k >> Machine: Early 2009 Mac Pro, 6GB RAM

Re: Proper analyzer / tokenizer for syslog data?

2011-11-04 Thread Peter Spam
Wow, I tried with minGramSize=1 and maxGramSize=1000 (I want someone to be able to search on any substring, just like "grep"), and the index is multiple orders of magnitude larger than my data! There's got to be a better way to support full grep-like searching? Thanks! Pete On Nov 4, 2011, at
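To see why the index balloons, here is a rough back-of-the-envelope sketch. It assumes each log line reaches the n-gram filter as a single token (for example behind a keyword-style tokenizer), which may not match the actual analyzer chain; the function names are made up for illustration.

```python
def ngram_count(length: int, min_gram: int = 1, max_gram: int = 1000) -> int:
    """Number of character n-grams produced from one token of `length` chars."""
    top = min(max_gram, length)
    # A token of length L contains L - n + 1 substrings of size n.
    return sum(length - n + 1 for n in range(min_gram, top + 1))


def ngram_chars(length: int, min_gram: int = 1, max_gram: int = 1000) -> int:
    """Total characters across all of those n-grams (a crude proxy for index size)."""
    top = min(max_gram, length)
    return sum(n * (length - n + 1) for n in range(min_gram, top + 1))


if __name__ == "__main__":
    for length in (10, 50, 200):
        print(length, ngram_count(length), ngram_chars(length))
```

The gram count grows with the square of the token length and the character total with its cube, so long tokens with a large maxGramSize dominate the index.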

Re: Can Solr handle large text files?

2011-11-04 Thread Peter Spam
eally, I'd like to feed Solr the metadata and the entire file at once, and have the back-end split the file into thousands of pieces. Is this possible? Thanks! Pete On Nov 1, 2011, at 5:15 PM, Peter Spam wrote: > Wow, 50 lines is tiny! Is that how small you need to go, to get good

Proper analyzer / tokenizer for syslog data?

2011-11-04 Thread Peter Spam
Example data: 01/23/2011 05:12:34 [Test] a=1; hello_there=50; data=[1,5,30%]; I would love to be able to just "grep" the data - ie. if I search for "ello", it finds and returns "ello", and if I search for "hello_there=5", it would match too. Here's what I'm using now:

Re: Can Solr handle large text files?

2011-11-01 Thread Peter Spam
;fl=id,score&defType=dismax&bf=sub(1000,caprice_score)&group=true&group.field=FileName > > Results are amazing, I am able to index and search from very large log files > (few 100 MBs) with very low memory requirements. Highlighting is also working > fine. >

Re: Can Solr handle large text files?

2011-11-01 Thread Peter Spam
efType=dismax&bf=sub(1000,caprice_score)&group=true&group.field=FileName > > Results are amazing, I am able to index and search from very large log files > (few 100 MBs) with very low memory requirements. Highlighting is also working > fine. > > Thanks & Regar

Re: Sorting fields with letters?

2011-10-24 Thread Peter Spam
ny > field or FunctionQuery. See http://wiki.apache.org/solr/FunctionQuery > > On Fri, Oct 21, 2011 at 7:03 PM, Peter Spam wrote: > >> Is there a way to use a custom sorter, to avoid re-indexing? >> >> >> Thanks! >> Pete >> >> On Oct 21, 20

Re: Can Solr handle large text files?

2011-10-24 Thread Peter Spam
om my iPhone On Oct 23, 2011, at 2:01 PM, Erick Erickson wrote: > Also be aware that by default Solr is configured to only index the > first 10,000 tokens > of each field. See maxFieldLength in solrconfig.xml > > Best > Erick > > On Fri, Oct 21, 2011 at 7:34 PM, Peter Spam w

Re: Sorting fields with letters?

2011-10-21 Thread Peter Spam
; "sort=field1,field2,field3". > > Anyway, both this options require reindexing. > > Regards, > > Tomás > > On Fri, Oct 21, 2011 at 4:57 PM, Peter Spam wrote: > >> Hi everyone, >> >> I have a field that has a letter in it (for example, 1

Sorting fields with letters?

2011-10-21 Thread Peter Spam
Hi everyone, I have a field that has a letter in it (for example, 1A1, 2A1, 11C15, etc.). Sorting it seems to work most of the time, except for a few things, like 10A1 is lower than 8A100, and 10A100 is lower than 10A99. Any ideas? I bet if my data had leading zeros (ie 10A099), it would beh
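The leading-zero idea can be sketched as a small preprocessing step. This is only an illustration: `pad_numbers` and the width of 6 are assumptions, and the padded value would have to be written to a separate sort field at index time, so it still means re-indexing.

```python
import re


def pad_numbers(value: str, width: int = 6) -> str:
    """Zero-pad every run of digits so that plain string order matches numeric order."""
    return re.sub(r"\d+", lambda m: m.group().zfill(width), value)


if __name__ == "__main__":
    values = ["10A1", "8A100", "10A100", "10A99"]
    print(sorted(values))                   # plain string sort
    print(sorted(values, key=pad_numbers))  # sort on the padded form
```

The width has to be at least as large as the longest digit run in the data, otherwise the ordering breaks again for the longer values.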

Re: Can Solr handle large text files?

2011-10-21 Thread Peter Spam
> Which means that you should divide your files and use Result Grouping / Field > Collapsing to list only one hit per original document. > > (xtf also would solve your problem "out of the box" but xtf does not use > solr). > > Best regards > Karsten >
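A query along those lines could be assembled as below. This is a sketch only: the host is the placeholder used elsewhere in this archive, `FileName` is the grouping field that appears in a later reply, and `grouped_query_url` is a made-up helper name.

```python
from urllib.parse import urlencode

# Placeholder endpoint; replace with the real Solr host and core.
SOLR_SELECT = "http://server:8983/solr/select"


def grouped_query_url(terms: str, group_field: str = "FileName") -> str:
    """Build a select URL that highlights hits and groups chunks by source file."""
    params = {
        "q": terms,
        "hl": "true",                # turn highlighting on
        "group": "true",             # result grouping / field collapsing
        "group.field": group_field,  # one group per original file
    }
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    print(grouped_query_url("error mail"))
```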

Re: Can Solr handle large text files?

2011-10-21 Thread Peter Spam
ng > to list only one hit per original document. > > (xtf also would solve your problem "out of the box" but xtf does not use > solr). > > Best regards > Karsten > > Original Message >> Date: Thu, 20 Oct 2011 17:59:04 -0700 >> From

Can Solr handle large text files?

2011-10-20 Thread Peter Spam
I have about 20k text files, some very small, but some up to 300MB, and would like to do text searching with highlighting. Imagine the text is the contents of your syslog. I would like to type in some terms, such as "error" and "mail", and have Solr return the syslog lines with those terms PLUS

Re: Dismax Request handler and Solrconfig.xml

2011-05-08 Thread Peter Spam
I'm having the same problem - the standard query returns all my documents, but the dismax one returns 0. Any ideas? http://server:8983/solr/select?qt=standard&indent=on&q=* 0 3592 on standard * [...] ---

Re: How to Update Value of One Field of a Document in Index?

2011-04-26 Thread Peter Spam
My schema: id, name, checksum, body, notes, date I'd like for a user to be able to add notes to the notes field, and not have to re-index the document (since the body field may contain 100MB of text). Some ideas: 1) How about creating another core which only contains id, checksum, and notes?

Re: Tips for getting unique results?

2011-04-08 Thread Peter Spam
Thanks for the note, Shaun, but the documentation indicates that the sorting is only in ascending order :-( facet.sort This param determines the ordering of the facet field constraints. • count - sort the constraints by count (highest count first) • index - to return the constra

Re: Tips for getting unique results?

2011-04-07 Thread Peter Spam
n, have you seen grouping? > > Which is another way of asking why you want to do this, perhaps it's an > XY problem > > Best > Erick > > On Thu, Apr 7, 2011 at 1:13 AM, Peter Spam wrote: > >> Hi, >> >> I have documents with a field that ha

Re: Tips for getting unique results?

2011-04-07 Thread Peter Spam
//sematext.com/ :: Solr - Lucene - Nutch > Lucene ecosystem search :: http://search-lucene.com/ > > > > - Original Message >> From: Peter Spam >> To: solr-user@lucene.apache.org >> Sent: Thu, April 7, 2011 1:13:44 AM >> Subject: Tips for getting unique results

Tips for getting unique results?

2011-04-06 Thread Peter Spam
Hi, I have documents with a field that has "1A2B3C" alphanumeric characters. I can query for * and sort results based on this field, however I'd like to "uniq" these results (remove duplicates) so that I can get the 5 largest unique values. I can't use the StatsComponent because my values hav

Re: Solr searching performance issues, using large documents (now 1MB documents)

2010-08-25 Thread Peter Spam
at 9:52 AM, Yonik Seeley wrote: > On Wed, Aug 25, 2010 at 11:29 AM, Peter Spam wrote: >> So, I went through all the effort to break my documents into max 1 MB >> chunks, and searching for hello still takes over 40 seconds (searching >> across 7433 documents): >> >

Re: Solr searching performance issues, using large documents (now 1MB documents)

2010-08-25 Thread Peter Spam
he matter, I think it should be made possible to > return multiple rows in an ArrayList. > > -Original message- > From: Peter Spam > Sent: Tue 17-08-2010 00:47 > To: solr-user@lucene.apache.org; > Subject: Re: Solr searching performance issues, using large docu

Re: Solr searching performance issues, using large documents

2010-08-16 Thread Peter Spam
Still stuck on this - any hints on how to write the JavaScript to split a document? Thanks! -Pete On Aug 5, 2010, at 8:10 PM, Lance Norskog wrote: > You may have to write your own javascript to read in the giant field > and split it up. > > On Thu, Aug 5, 2010 at 5:27 PM, Peter

Re: Solr searching performance issues, using large documents

2010-08-05 Thread Peter Spam
e overlapping mini-documents > if you want to support this. > > I don't know how big the chunks should be- you'll have to experiment. > > Lance > > On Mon, Aug 2, 2010 at 10:01 AM, Peter Spam wrote: >> What would happen if the search query phrase spanned separate d
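One way to produce overlapping mini-documents before posting them to Solr is sketched here. The 50-line chunk and 5-line overlap are arbitrary starting points to experiment with, and the field names (`id`, `FileName`, `start_line`, `body`) are assumptions rather than a known schema.

```python
def split_into_chunks(file_name, lines, chunk_size=50, overlap=5):
    """Split a list of lines into overlapping chunks, one dict per mini-document."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    docs = []
    for start in range(0, len(lines), step):
        docs.append({
            "id": f"{file_name}#{start}",  # unique per chunk
            "FileName": file_name,         # shared by all chunks of one file
            "start_line": start,
            "body": "\n".join(lines[start:start + chunk_size]),
        })
        if start + chunk_size >= len(lines):
            break  # the last chunk already reaches the end of the file
    return docs


if __name__ == "__main__":
    sample = [f"line {i}" for i in range(120)]
    for doc in split_into_chunks("syslog.txt", sample):
        print(doc["id"], doc["body"].count("\n") + 1)
```

Because consecutive chunks share the overlap lines, a phrase that straddles a boundary still appears whole in at least one chunk, as long as it is shorter than the overlap.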

Re: Solr searching performance issues, using large documents

2010-08-02 Thread Peter Spam
ured documents. > > On Sun, Aug 1, 2010 at 2:06 PM, Peter Spam wrote: >> Thanks for the pointer, Lance! Is there an example of this somewhere? >> >> >> -Peter >> >> On Jul 31, 2010, at 3:13 PM, Lance Norskog wrote: >> >>> Ah! You're

Re: Solr searching performance issues, using large documents

2010-08-01 Thread Peter Spam
2 > queries to achieve what you want, but the second query for the same > query will be blindingly fast. Often <1ms. > > Good luck! > > Lance > > On Sat, Jul 31, 2010 at 1:12 PM, Peter Spam wrote: >> However, I do need to search the entire document, or

Re: Solr searching performance issues, using large documents

2010-07-31 Thread Peter Spam
However, I do need to search the entire document, or else the highlighting will sometimes be blank :-( Thanks! - Peter ps. sorry for the many responses - I'm rushing around trying to get this working. On Jul 31, 2010, at 1:11 PM, Peter Spam wrote: > Correction - it went from 17 secon

Re: Solr searching performance issues, using large documents

2010-07-31 Thread Peter Spam
Correction - it went from 17 seconds to 10 seconds - I was changing the hl.regex.maxAnalyzedChars the first time. Thanks! -Peter On Jul 31, 2010, at 1:06 PM, Peter Spam wrote: > On Jul 30, 2010, at 1:16 PM, Peter Karich wrote: > >> did you already try other values for hl.maxA

Re: Solr searching performance issues, using large documents

2010-07-31 Thread Peter Spam
On Jul 30, 2010, at 1:16 PM, Peter Karich wrote: > did you already try other values for hl.maxAnalyzedChars=2147483647 Yes, I tried dropping it down to 21, but it didn't have much of an impact (one search I just tried went from 17 seconds to 15.8 seconds, and this is an 8-core Mac Pro with 6GB

Re: Solr searching performance issues, using large documents

2010-07-31 Thread Peter Spam
On Jul 30, 2010, at 7:04 PM, Lance Norskog wrote: > Wait- how much text are you highlighting? You say these logfiles are X > big- how big are the actual documents you are storing? I want it to be like google - I put the entire (sometimes 60MB) doc in a field, and then just highlight 2-4 lines of

Re: Solr searching performance issues, using large documents

2010-07-30 Thread Peter Spam
I do store term vector: -Pete On Jul 30, 2010, at 7:30 AM, Li Li wrote: > Highlighting time is mainly spent on getting the field which you want > to highlight and tokenizing this field (if you don't store term vectors). > You can check what's wrong, > > 2010/7/30

Re: Solr searching performance issues, using large documents

2010-07-29 Thread Peter Spam
performance or on repeat queries with the same fields? > 2) Optimize the index and test performance again > 3) Index without storing the text and see what the performance looks like. > > > On 7/29/10, Peter Spam wrote: >> Any ideas? I've got 5000 documents with an aver

Re: Solr searching performance issues, using large documents

2010-07-29 Thread Peter Spam
Any ideas? I've got 5000 documents with an average size of 850k each, and it sometimes takes 2 minutes for a query to come back when highlighting is turned on! Help! -Pete On Jul 21, 2010, at 2:41 PM, Peter Spam wrote: > From the mailing list archive, Koji wrote: > >> 1

Re: Using hl.regex.pattern to print complete lines

2010-07-21 Thread Peter Spam
Still not working ... any ideas? -Pete On Jul 14, 2010, at 11:56 AM, Peter Spam wrote: > Any other thoughts, Chris? I've been messing with this a bit, and can't seem > to get (?m)^.*$ to do what I want. > > 1) I don't care how many characters it returns, I

Count hits per document?

2010-07-21 Thread Peter Spam
If I search for "foo", I get back a list of documents. Any way to get a per-document hit count? Thanks! -Pete

Re: Solr searching performance issues, using large documents

2010-07-21 Thread Peter Spam
stored fields, then enabling lazy field > loading can be a huge boon, especially if compressed fields are used. What does this mean? How do you load a field lazily? Thanks for your time, guys - this has started to become frustrating, since it works so well, but is very slow! -Pete On Ju

Solr searching performance issues, using large documents

2010-07-20 Thread Peter Spam
Data set: About 4,000 log files (will eventually grow to millions). Average log file is 850k. Largest log file (so far) is about 70MB. Problem: When I search for common terms, the query time goes from under 2-3 seconds to about 60 seconds. TermVectors etc are enabled. When I disable highlig

Re: Using hl.regex.pattern to print complete lines

2010-07-14 Thread Peter Spam
and the line after. 3) This should be like "grep -C1" Thanks for your time! -Pete On Jul 9, 2010, at 12:08 AM, Peter Spam wrote: > Ah, this makes sense. I've changed my regex to "(?m)^.*$", and it works > better, but I still get fragments before and after
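For reference, the "grep -C1" behaviour being asked for looks like this when done outside Solr. It is a plain-Python sketch of the desired output, not something the highlighter is known to do; `grep_context` is a made-up name.

```python
def grep_context(lines, term, context=1):
    """Return matching lines plus `context` lines before and after, in file order."""
    keep = set()
    for i, line in enumerate(lines):
        if term in line:
            first = max(0, i - context)
            last = min(len(lines), i + context + 1)
            keep.update(range(first, last))
    return [lines[i] for i in sorted(keep)]


if __name__ == "__main__":
    log = ["boot ok", "mail error: timeout", "retrying", "idle", "shutdown"]
    print(grep_context(log, "error"))
```

Overlapping context windows are merged because the line numbers are collected in a set, so a line is never returned twice.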

Re: Using hl.regex.pattern to print complete lines

2010-07-09 Thread Peter Spam
Ah, this makes sense. I've changed my regex to "(?m)^.*$", and it works better, but I still get fragments before and after some returns. Thanks for the hint! -Pete On Jul 8, 2010, at 6:27 PM, Chris Hostetter wrote: > > : If you can use the latest branch_3x or trunk, hl.fragListBuilder=single
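The difference the (?m) flag makes can be checked in isolation. This sketch uses Python's `re` module rather than the Java regex engine behind Solr; the anchor behaviour for this particular pattern should be the same in both, but treat that as an assumption.

```python
import re

text = "first line\nsecond line has hello\nthird line"

# Without (?m), ^ and $ only anchor at the start and end of the whole string,
# and "." does not cross newlines, so nothing matches on multi-line input.
whole_string = re.findall(r"^.*$", text)

# With (?m), ^ and $ anchor at every line boundary, so each line matches.
per_line = re.findall(r"(?m)^.*$", text)

if __name__ == "__main__":
    print(whole_string)
    print(per_line)
```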

Re: Using hl.regex.pattern to print complete lines

2010-07-08 Thread Peter Spam
].to_s + "&hl=true&hl.snippets=1&hl.fragsize=0" #&hl.regex.slop=.8&hl.fragsize=200&hl.fragmenter=regex&hl.regex.pattern=" + CGI::escape(regexv) Thanks for your help. -Peter On Jul 8, 2010, at 3:47 PM, Koji Sekiguchi wrote: > (10/07/09 2:44), Peter Spam

Re: Using hl.regex.pattern to print complete lines

2010-07-08 Thread Peter Spam
To clarify, I never want a snippet, I always want a whole line returned. Is this possible? Thanks! -Pete On Jul 7, 2010, at 5:33 PM, Peter Spam wrote: > Hi, > > I have a text file broken apart by carriage returns, and I'd like to only > return entire lines. So, I

Using hl.regex.pattern to print complete lines

2010-07-07 Thread Peter Spam
Hi, I have a text file broken apart by carriage returns, and I'd like to only return entire lines. So, I'm trying to use this: &hl.fragmenter=regex &hl.regex.pattern=^.*$ ... but I still get fragments, even if I crank up the hl.regex.slop to 3 or so. I also tried a pattern of

Re: Very basic questions: Faceted front-end?

2010-06-30 Thread Peter Spam
Ah, I found this: https://issues.apache.org/jira/browse/SOLR-634 ... aka "solr-ui". Is there anything else along these lines? Thanks! -Peter On Jun 30, 2010, at 3:59 PM, Peter Spam wrote: > Wow, thanks Lance - it's really fast now! > > The last piece of th

Re: Very basic questions: Faceted front-end?

2010-06-30 Thread Peter Spam
gParameters >> has a good list, but you've probably already seen that page.... >> >> Best >> Erick >> >> On Tue, Jun 29, 2010 at 9:11 PM, Peter Spam wrote: >> >>> To follow up, I've found that my queries are very fast (even with &

Re: Very basic questions: Indexing text - working, but slow!

2010-06-29 Thread Peter Spam
To follow up, I've found that my queries are very fast (even with &fq=), until I add &hl=true. What can I do to speed up highlighting? Should I consider injecting a line at a time, rather than the entire file as a field? -Pete On Jun 29, 2010, at 11:07 AM, Peter Spam wrote:

Re: Very basic questions: Indexing text - working, but slow!

2010-06-29 Thread Peter Spam
web > server is pointing at. > > Also, SOLR has no way of knowing you've modified your index > with SolrJ, so it may not be automatically reopening an > IndexReader so your recent changes may not be visible > until you force the SOLR reader to reopen. > > HTH > Er

Re: Very basic questions: Indexing text

2010-06-28 Thread Peter Spam
On Jun 28, 2010, at 2:00 PM, Ahmet Arslan wrote: >> 1) I can get my docs in the index, but when I search, it >> returns the entire document. I'd love to have it only >> return the line (or two) around the search term. > > Solr can generate Google-like snippets as you describe. > http://wiki.apa

Re: Very basic questions: Indexing text

2010-06-28 Thread Peter Spam
Great, thanks for the pointers. Thanks, Peter On Jun 28, 2010, at 2:00 PM, Ahmet Arslan wrote: >> 1) I can get my docs in the index, but when I search, it >> returns the entire document. I'd love to have it only >> return the line (or two) around the search term. > > Solr can generate Google-

Very basic questions: Indexing text

2010-06-28 Thread Peter Spam
Hi everyone, I'm looking for a way to index a bunch of (potentially large) text files. I would love to see results like Google, so I went through a few tutorials, but I've still got questions: 1) I can get my docs in the index, but when I search, it returns the entire document. I'd love to h