RE: Understanding Lucene's File Format

2010-09-17 Thread Giovanni Fernandez-Kincade
e frq/prx pointers, so that on seek we can rebase the decoding. Mike On Fri, Sep 17, 2010 at 10:02 AM, Giovanni Fernandez-Kincade wrote: >> The terms index (once loaded into RAM) has absolute longs, too. > > So in the TermInfo Index(.tii), the FreqDelta, ProxDelta, And SkipDelta &

RE: Understanding Lucene's File Format

2010-09-17 Thread Giovanni Fernandez-Kincade
s. Mike On Thu, Sep 16, 2010 at 3:53 PM, Giovanni Fernandez-Kincade wrote: > Hi, > I've been trying to understand Lucene's file format and I keep getting hung > up on one detail - how can Lucene quickly find the frequency data (or > proximity data) for a particular term?

Understanding Lucene's File Format

2010-09-16 Thread Giovanni Fernandez-Kincade
Hi, I've been trying to understand Lucene's file format and I keep getting hung up on one detail - how can Lucene quickly find the frequency data (or proximity data) for a particular term? According to the file formats page on the Lucene website

RE: FSDirectory Synchronization Issues

2010-04-27 Thread Giovanni Fernandez-Kincade
bug... Mike On Tue, Apr 27, 2010 at 2:34 PM, Giovanni Fernandez-Kincade wrote: > I was considering it, but we're already tight on memory usage. How do you > configure Solr to use it?  Is this correct? > > http://www.mail-archive.com/solr-user@lucene.apache.org/msg28574.html

RE: FSDirectory Synchronization Issues

2010-04-27 Thread Giovanni Fernandez-Kincade
ctory.class=org.apache.lucene.store.MMapDirectory -----Original Message----- From: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: Tuesday, April 27, 2010 2:28 PM To: solr-user@lucene.apache.org Subject: Re: FSDirectory Synchronization Issues Try MMapDirectory? Mike On Tue, Apr 27, 2010 at 2:09 PM, Giovann

FSDirectory Synchronization Issues

2010-04-27 Thread Giovanni Fernandez-Kincade
Hello, I'm encountering a lot of contention around SimpleFSDirectory$SimpleFSIndexInput.readInternal, pretty much identical to what this user described back in 2008: http://www.mail-archive.com/solr-user@lucene.apache.org/msg15516.html I also found this JIRA issue, where it appears that the conc

autocommiting with expungeDeletes=true

2010-04-08 Thread Giovanni Fernandez-Kincade
Is there any way to configure autocommit to expungeDeletes? Looking at the code it seems that there isn't... From org.apache.solr.update.DirectUpdateHandler2: public synchronized void run() { long started = System.currentTimeMillis(); try { CommitUpdateCommand command =

RE: PDFBox/Tika Performance Issues

2010-03-23 Thread Giovanni Fernandez-Kincade
s, Chris On 3/23/10 7:59 AM, "Giovanni Fernandez-Kincade" wrote: Sorry for the late reply - been out of town for a couple of days. From my solrconfig: ignored_ text -----Original Message----- From: Grant Ingersoll [mailto:gsi...@gmail.com] On Behalf

RE: PDFBox/Tika Performance Issues

2010-03-23 Thread Giovanni Fernandez-Kincade
ser@lucene.apache.org Subject: Re: PDFBox/Tika Performance Issues What's your configuration look like for the ExtractReqHandler? On Mar 19, 2010, at 2:42 PM, Giovanni Fernandez-Kincade wrote: > Yeah I've been trying that - I keep getting this error when indexing a PDF > with a trunk-b

RE: PDFBox/Tika Performance Issues

2010-03-19 Thread Giovanni Fernandez-Kincade
al Message- From: Grant Ingersoll [mailto:gsi...@gmail.com] On Behalf Of Grant Ingersoll Sent: Friday, March 19, 2010 1:46 PM To: solr-user@lucene.apache.org Subject: Re: PDFBox/Tika Performance Issues Can you try trunk? On Mar 19, 2010, at 1:12 PM, Giovanni Fernandez-Kincade wrote: > Solr

RE: PDFBox/Tika Performance Issues

2010-03-19 Thread Giovanni Fernandez-Kincade
Time:Wed Mar 17 17:05:19 EDT 2010 -----Original Message----- From: Grant Ingersoll [mailto:gsi...@gmail.com] On Behalf Of Grant Ingersoll Sent: Friday, March 19, 2010 1:02 PM To: solr-user@lucene.apache.org Subject: Re: PDFBox/Tika Performance Issues On Mar 16, 2010, at 6:55 PM, Giovanni Fernandez

RE: PDFBox/Tika Performance Issues

2010-03-19 Thread Giovanni Fernandez-Kincade
10 8:06 AM, "Giovanni Fernandez-Kincade" wrote: Hmm. Unfortunately that didn't work. Same problem - Solr doesn't report an error, but the data doesn't get extracted. Using the same PDF with my previous /Lib contents works fine. Any other ideas? These are the jar files

stream.url Contention

2010-03-18 Thread Giovanni Fernandez-Kincade
I recently switched from posting a file (PDFs in this case) to the Extract handler, to using the Stream.URL parameter. I've noticed a huge amount of contention around opening URL connections: http-8080-Processor36 [BLOCKED] CPU time: 0:47 sun.net.www.protocol.file.Handler.openConnection(URL) jav

RE: PDFBox/Tika Performance Issues

2010-03-17 Thread Giovanni Fernandez-Kincade
as to do with the lib deps. Try what I mentioned above and let's go from there. Cheers, Chris > -----Original Message----- > From: Giovanni Fernandez-Kincade [mailto:gfernandez-kinc...@capitaliq.com] > Sent: Tuesday, March 16, 2010 5:41 PM > To: solr-user@lucene.

RE: PDFBox/Tika Performance Issues

2010-03-16 Thread Giovanni Fernandez-Kincade
there were no errors logged as a result, but the PDF data does not appear to have been extracted (the field I used for map.content had an empty-string as a value). What's the right approach to perform this patch? -----Original Message----- From: Giovanni Fernandez-Kincade [mailto:

RE: PDFBox/Tika Performance Issues

2010-03-16 Thread Giovanni Fernandez-Kincade
efully next few weeks). Cheers, Chris [1] http://issues.apache.org/jira/browse/TIKA-380 [2] http://www.mail-archive.com/tika-u...@lucene.apache.org/msg00302.html On 3/16/10 2:31 PM, "Giovanni Fernandez-Kincade" wrote: Originally 16 (the number of CPUs on the machine), but even with 5

RE: PDFBox/Tika Performance Issues

2010-03-16 Thread Giovanni Fernandez-Kincade
ar 16, 2010, at 4:37 PM, Giovanni Fernandez-Kincade wrote: > I've been trying to bulk index about 11 million PDFs, and while profiling our > Solr instance, I noticed that all of the threads that are processing indexing > requests are constantly blocking each other during this c

PDFBox/Tika Performance Issues

2010-03-16 Thread Giovanni Fernandez-Kincade
I've been trying to bulk index about 11 million PDFs, and while profiling our Solr instance, I noticed that all of the threads that are processing indexing requests are constantly blocking each other during this call: http-8080-Processor39 [BLOCKED] CPU time: 9:35 java.util.Collections$Synchroni

Master Read Timeout

2010-01-25 Thread Giovanni Fernandez-Kincade
I have a slave that is pulling multiple cores from one master, and I'm very frequently seeing cases where the slave is getting timeouts when fetching from the master: 2010-01-25 11:00:22,819 [pool-3-thread-1] ERROR org.apache.solr.handler.SnapPuller - Master at: http://shredder:8080/solr/Filin

Cores + Replication Config

2010-01-11 Thread Giovanni Fernandez-Kincade
If you want to share one config amidst master & slaves, using Solr 1.4 replication, is there a way to specify whether a core is Master or Slave when using the CREATE Core command? Thanks, Gio.

RE: checkindex

2010-01-08 Thread Giovanni Fernandez-Kincade
p lucene-core-2.9-dev.jar org.apache.lucene.index.CheckIndex -fix /path/to/solr/data/index/ hope that helps, -Ian On 1/8/10 2:09 PM, Giovanni Fernandez-Kincade wrote: > > I've seen many mentions of the Lucene CheckIndex tool, but where can I > find it? Is there any documentation on how to use

checkindex

2010-01-08 Thread Giovanni Fernandez-Kincade
I've seen many mentions of the Lucene CheckIndex tool, but where can I find it? Is there any documentation on how to use it? I noticed Luke has it built-in, but I can't get Luke to open my index with the "Don't open IndexReader(when opening corrupted index)" option checked. Opening even an index

RE: replication --> missing field data file

2010-01-07 Thread Giovanni Fernandez-Kincade
up is just to take periodic backups not necessary for the Replicationhandler to work On Thu, Jan 7, 2010 at 2:37 AM, Giovanni Fernandez-Kincade wrote: > How can you tell when the backup is done? > > -----Original Message----- > From: noble.p...@gmail.com [mailto:noble.p...@gmail.c

RE: replication --> missing field data file

2010-01-06 Thread Giovanni Fernandez-Kincade
in the name "index" others will be stored as index On Wed, Jan 6, 2010 at 10:31 PM, Giovanni Fernandez-Kincade wrote: > How can you differentiate between the backup and the normal index files? > > -----Original Message----- > From: noble.p...@gmail.com [mailto:noble.p..

RE: replication --> missing field data file

2010-01-06 Thread Giovanni Fernandez-Kincade
eld data file On Wed, Jan 6, 2010 at 9:49 PM, Giovanni Fernandez-Kincade wrote: > I set up replication between 2 cores on one master and 2 cores on one slave. > Before doing this the master was working without issues, and I stopped all > indexing on the master. > > Now that repl

replication --> missing field data file

2010-01-06 Thread Giovanni Fernandez-Kincade
I set up replication between 2 cores on one master and 2 cores on one slave. Before doing this the master was working without issues, and I stopped all indexing on the master. Now that replication has synced the index files, an .FDT field is suddenly missing on both the master and the slave. Pr

Solr Replication Questions

2010-01-05 Thread Giovanni Fernandez-Kincade
http://wiki.apache.org/solr/SolrReplication I've been looking over this replication wiki and I'm still unclear on two points about Solr Replication: 1. If there have been small changes to the index on the master, does the slave copy the entire contents of the index files that were affecte

RE: Solr Cell - PDFs plus literal metadata - GET or POST ?

2010-01-05 Thread Giovanni Fernandez-Kincade
Really? Doesn't it have to be delimited differently, if both the file contents and the document metadata will be part of the POST data? How does Solr Cell tell the difference between the literals and the start of the file? I've tried this before and haven't had any luck with it. -Original

RE:Delete, commit, optimize doesn't reduce index file size

2009-12-30 Thread Giovanni Fernandez-Kincade
Is there another way to make this happen without making further changes to the index? Maybe a bounce of the servlet server? On Tue, Dec 29, 2009 at 1:23 PM, markwaddle wrote: I have an index that used to have ~38M docs at 17.2GB. I deleted all but 13K docs using a delete by query, commit and t

RE: Unable to delete from index

2009-12-28 Thread Giovanni Fernandez-Kincade
config.xml Ankit -----Original Message----- From: Giovanni Fernandez-Kincade [mailto:gfernandez-kinc...@capitaliq.com] Sent: Monday, December 28, 2009 5:46 PM To: solr-user@lucene.apache.org Subject: RE: Unable to delete from index Sorry - hit reply too early. I edited my config as you suggested

RE: Unable to delete from index

2009-12-28 Thread Giovanni Fernandez-Kincade
Sorry - hit reply too early. I edited my config as you suggested, rebooted Tomcat, and I can still find the doc through the Solr Admin interface even though I can't find it in Luke. -----Original Message----- From: Giovanni Fernandez-Kincade [mailto:gfernandez-kinc...@capitaliq.com]

RE: Unable to delete from index

2009-12-28 Thread Giovanni Fernandez-Kincade
My HTTP caching is currently configured for Open Time So that shouldn't be the problem, right? -----Original Message----- From: AHMET ARSLAN [mailto:iori...@yahoo.com] Sent: Monday, December 28, 2009 5:31 PM To: solr-user@lucene.apache.org Subject: RE: Unable to delete from index > I opened

RE: Unable to delete from index

2009-12-28 Thread Giovanni Fernandez-Kincade
onday, December 28, 2009 4:54 PM To: 'solr-user@lucene.apache.org' Subject: RE: Unable to delete from index Are you deleting from correct index.[Meaning verify - Solr home] Also inspect thru luke to check the contents Ankit -----Original Message----- From: Giovanni Fe

Unable to delete from index

2009-12-28 Thread Giovanni Fernandez-Kincade
I'm having trouble performing deletes on a Solr 1.4 index. Whether I perform the deletes by query or by id, the document in question doesn't seem to get removed from the index. Even after a commit. I thought the problem might be the fact that I wasn't committing with expungeDeletes=true, but I'

Concurrent Merge Scheduler & MaxThread Count

2009-12-03 Thread Giovanni Fernandez-Kincade
I'm having trouble getting Solr to use more than one thread during index optimizations. I have the following in my solrconfig.xml: 6 I had the same problem some time ago, but upgrading to Solr 1.4 solved the problem. Now it's happening again, with Solr 1.4. No matter what I

RE: *:* Returning no results

2009-11-30 Thread Giovanni Fernandez-Kincade
: Monday, November 30, 2009 4:02 PM To: solr-user@lucene.apache.org Cc: Giovanni Fernandez-Kincade Subject: Re: *:* Returning no results Add debugQuery=on to give you clues. ~ David Smiley Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/ On Nov 30, 2009, at 3:54 PM, Giovanni

*:* Returning no results

2009-11-30 Thread Giovanni Fernandez-Kincade
Hi, I created a brand new core (on Solr 1.4), added a few documents and then searched for *:*, but got no results. Strangely enough, if I search for a specific document I know is in the index, like say "versionId:3", I get the expected result. Any ideas on why that might be? Thanks, Gio.

RE: Index Splitter

2009-11-25 Thread Giovanni Fernandez-Kincade
You can't really use this if you have an optimized index, right? -----Original Message----- From: Koji Sekiguchi [mailto:k...@r.email.ne.jp] Sent: Tuesday, November 24, 2009 6:57 PM To: solr-user@lucene.apache.org Subject: Re: Index Splitter Giovanni Fernandez-Kincade wrote: > Hi, >

Index Splitter

2009-11-24 Thread Giovanni Fernandez-Kincade
Hi, I've heard about a tool that can be used to split Lucene indexes, for cases where you want to break up a large index into shards. Do you know where I can find it? Any observations/recommendations about its use? This seems promising but I'm not sure if there is anything more mature out there

RE: Too Many Boolean Clauses

2009-10-22 Thread Giovanni Fernandez-Kincade
ginal Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Thursday, October 22, 2009 6:31 PM To: solr-user@lucene.apache.org Subject: Re: Too Many Boolean Clauses Giovanni Fernandez-Kincade wrote: > Hi, > I'm trying to perform a search against an integer field

Too Many Boolean Clauses

2009-10-22 Thread Giovanni Fernandez-Kincade
Hi, I'm trying to perform a search against an integer field with a ton of OR statements for each of the unique values that I want to search for. I pasted an example at the bottom of this email. Solr fires back the following error: org.apache.lucene.queryParser.ParseException: Cannot parse .. ': t

RE: Lucene Merge Threads

2009-10-14 Thread Giovanni Fernandez-Kincade
In case anyone is having the same problem, I finally got this working, using the nightly build link that Yonik sent around: http://people.apache.org/builds/lucene/solr/nightly/ Thanks, Gio. -----Original Message----- From: Giovanni Fernandez-Kincade Sent: Wednesday, October 14, 2009 2:10 PM To

RE: Lucene Merge Threads

2009-10-14 Thread Giovanni Fernandez-Kincade
-Kincade Sent: Tuesday, October 13, 2009 7:59 PM To: Giovanni Fernandez-Kincade; 'solr-user@lucene.apache.org'; 'noble.p...@gmail.com' Subject: RE: Lucene Merge Threads I'm still getting the error after getting the latest from trunk and building it. This is what I ad

RE: Lucene Merge Threads

2009-10-13 Thread Giovanni Fernandez-Kincade
ang.Class.$$YJP$$forName0(Native Method) at java.lang.Class.forName0(Unknown Source) at java.lang.Class.forName(Unknown Source) at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:294) ... 28 more -----Original Message----- From: Giovanni Fernande

RE: Lucene Merge Threads

2009-10-13 Thread Giovanni Fernandez-Kincade
Will do. Thanks! -----Original Message----- From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] Sent: Tuesday, October 13, 2009 11:48 AM To: solr-user@lucene.apache.org Subject: Re: Lucene Merge Threads On Tue, Oct 13, 2009 at 8:19 PM, Giovanni Fernandez-Kincade < gfernandez-k

RE: Lucene Merge Threads

2009-10-13 Thread Giovanni Fernandez-Kincade
added recently On Tue, Oct 13, 2009 at 8:08 AM, Giovanni Fernandez-Kincade wrote: > This didn't end up working. I got the following error when I tried to commit: > > Oct 12, 2009 8:36:42 PM org.apache.solr.common.SolrException log > SEVERE: org.apache.solr.common.SolrExceptio

RE: Lucene Merge Threads

2009-10-12 Thread Giovanni Fernandez-Kincade
tart. We need to update the wiki? On Mon, Oct 12, 2009 at 4:05 PM, Giovanni Fernandez-Kincade wrote: > Hi, > I'm attempting to optimize a pretty large index, and even though the optimize > request timed out, I watched it using a profiler and saw that the optimize > thread contin

RE: Lucene Merge Threads

2009-10-12 Thread Giovanni Fernandez-Kincade
: 1 Yes you can stop the process mid-merge. The partially merged files will be deleted on restart. We need to update the wiki? On Mon, Oct 12, 2009 at 4:05 PM, Giovanni Fernandez-Kincade wrote: > Hi, > I'm attempting to optimize a pretty large index, and even though the optimize

Lucene Merge Threads

2009-10-12 Thread Giovanni Fernandez-Kincade
Hi, I'm attempting to optimize a pretty large index, and even though the optimize request timed out, I watched it using a profiler and saw that the optimize thread continued executing. Eventually it completed, but in the background I still see a thread performing a merge: Lucene Merge Thread #0

RE: IndexWriter InfoStream in solrconfig not working

2009-10-07 Thread Giovanni Fernandez-Kincade
I had the same problem. I'd be very interested to know how to get this working... -Gio. -----Original Message----- From: Burton-West, Tom [mailto:tburt...@umich.edu] Sent: Wednesday, October 07, 2009 12:13 PM To: solr-user@lucene.apache.org Subject: IndexWriter InfoStream in solrconfig not work

RE: Solr Timeouts

2009-10-06 Thread Giovanni Fernandez-Kincade
n NFS mount. Might want to search on that. > > That, of course, doesn't have anything to do with commits showing up > unexpectedly in stack traces, per your original email. > > -Todd > > -----Original Message----- > From: Giovanni Fernandez-Kincade [mailto:gfernandez-k

RE: Solr Timeouts

2009-10-06 Thread Giovanni Fernandez-Kincade
specific thread was blocked for an hour? If so, I'd echo Lance... this is a local disk right? -Yonik http://www.lucidimagination.com On Mon, Oct 5, 2009 at 2:11 PM, Giovanni Fernandez-Kincade wrote: > I just grabbed another stack trace for a thread that has been similarly > blocking

RE: Solr Timeouts

2009-10-06 Thread Giovanni Fernandez-Kincade
sharing and this often does not work well. Lance On 10/6/09, Giovanni Fernandez-Kincade wrote: > Is it possible that deletions are triggering these commits? Some of the > documents that I'm making indexing requests for already exist in the index, > so they would result in dele

RE: Solr Timeouts

2009-10-06 Thread Giovanni Fernandez-Kincade
false 100 This is happening like every 30-40 minutes and it's really hampering the indexing progress... -----Original Message----- From: Giovanni Fernandez-Kincade [mailto:gfernandez-kinc...@capitaliq.com] Sent: Monday, October 05, 2009 2:11 PM To: solr-user@lu

RE: Solr Timeouts

2009-10-05 Thread Giovanni Fernandez-Kincade
nation.com On Mon, Oct 5, 2009 at 1:04 PM, Giovanni Fernandez-Kincade wrote: > I'm fairly certain that all of the indexing jobs are calling SOLR with > commit=false. They all construct the indexing URLs using a CLR function I > wrote, which takes in a Commit parameter, whic

RE: Solr Timeouts

2009-10-05 Thread Giovanni Fernandez-Kincade
that causes it to commit. I'll try and verify today unless someone else beats me to it. -Yonik http://www.lucidimagination.com On Mon, Oct 5, 2009 at 1:04 PM, Giovanni Fernandez-Kincade wrote: > I'm fairly certain that all of the indexing jobs are calling SOLR with > commit=false. T

RE: Solr Timeouts

2009-10-05 Thread Giovanni Fernandez-Kincade
could cause a commit - like commitWithin or something. -Yonik http://www.lucidimagination.com On Mon, Oct 5, 2009 at 12:44 PM, Giovanni Fernandez-Kincade wrote: > Is there somewhere other than solrConfig.xml that the autoCommit feature is > enabled? I've looked through that file and found autocommit to

RE: Solr Timeouts

2009-10-05 Thread Giovanni Fernandez-Kincade
ocessor, SolrParams, boolean) org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(SolrQueryRequest, SolrQueryResponse) I think Yonik gave you additional information for how to make it faster. -Todd -----Original Message----- From: Giovanni Fernandez-Kincade [mailto:gfernandez-kinc...@capitaliq.com] Sent: Monday, Octob

RE: Solr Timeouts

2009-10-05 Thread Giovanni Fernandez-Kincade
he rate of commits isn't slowed. -Todd From: Giovanni Fernandez-Kincade [mailto:gfernandez-kinc...@capitaliq.com] Sent: Monday, October 05, 2009 9:04 AM To: solr-user@lucene.apache.org Subject: Solr Timeouts Hi, I'm attempting to index approximately 6 mil