Re: "Auto commit error" and java.io.FileNotFoundException

2008-08-16 Thread Grant Ingersoll

What version of Java do you have on Linux?

Also, is this easily reproducible?  How many threads are you adding  
documents with?  What is your Auto Commit setting?


Can you try Lucene's CheckIndex tool on it and report what it says?

On Aug 15, 2008, at 1:35 PM, Chris Harris wrote:


I have an index (different from the ones mentioned yesterday) that was
working fine with 3M docs or so, but when I added a bunch more docs,
bringing it closer to 4M docs, the index seemed to get corrupted. In
particular, now when I start Solr up, or when my indexing process
tries to add a document, I get a complaint about missing index files.

The error on startup looks like this:


 2008-08-15T10:18:54
 1218820734592
 92
 org.apache.solr.core.MultiCore
 SEVERE
 org.apache.solr.common.SolrException
 log
 10
 java.lang.RuntimeException: java.io.FileNotFoundException:
/ssd/solr-/solr/exhibitcore/data/index/_p7.fdt (No such file or directory)
	at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:733)
	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:387)
	at org.apache.solr.core.MultiCore.create(MultiCore.java:255)
	at org.apache.solr.core.MultiCore.load(MultiCore.java:139)
	at org.apache.solr.servlet.SolrDispatchFilter.initMultiCore(SolrDispatchFilter.java:147)
	at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:75)
	at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
	at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
	at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
	at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
	at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
	at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
	at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
	at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
	at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
	at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
	at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
	at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
	at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
	at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
	at org.mortbay.jetty.Server.doStart(Server.java:210)
	at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
	at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:616)
	at org.mortbay.start.Main.invokeMain(Main.java:183)
	at org.mortbay.start.Main.start(Main.java:497)
	at org.mortbay.start.Main.main(Main.java:115)
Caused by: java.io.FileNotFoundException:
/ssd/solr-/solr/exhibitcore/data/index/_p7.fdt (No such file or directory)
	at java.io.RandomAccessFile.open(Native Method)
	at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
	at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:506)
	at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:536)
	at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:445)
	at org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:75)
	at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:308)
	at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:262)
	at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:197)
	at org.apache.lucene.index.MultiSegmentReader.<init>(MultiSegmentReader.java:55)
	at org.apache.lucene.index.DirectoryIndexReader$1.doBody(DirectoryIndexReader.java:75)
	at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:636)
	at org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:63)
	at org.apache.lucene.index.IndexReader.open(IndexReader.java:209)
	at org.apache.lucene.index.IndexReader.open(IndexReader.java:173)
	at org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:93)
	at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:724)
	... 29 more



And the error on doc add looks like this:


 2008-08-15T09:51:30
 1218819090142
 6571937
 org.apache.solr.core.SolrCore
 SEVERE
 org.apache.solr.common.SolrException
 log
 14
 java.io.FileNotFoundException:
/ssd/solr-/solr/

Re: Solr Cache

2008-08-16 Thread Yonik Seeley
On Sat, Aug 16, 2008 at 12:04 AM, Tim Christensen <[EMAIL PROTECTED]> wrote:
> We have two servers, with the same index load balanced. The indexes are
> updated at the same time every day. Occasionally, a search on one server
> will return different results from the other server, even though the data
> used to create the index is exactly the same.
>
> Is this possibly due to caching?

No, it should not be possible... caches never keep stale entries, so
the result one gets from a cache will always be the result as if there
were no cache.

> Does the cache reset automatically after
> the commit?

Yes.  Caches are per-reader/searcher instance.

> The problem usually resolves itself - by all appearances, randomly, but I
> assume something I don't know is going on such as a new searcher starting up
> for example at some point in the day. All cache settings are the solrconfig
> defaults.

That could be the case... a client calls commit on one server and not the other?
Also make sure any time-based autocommit is disabled.
You should be able to easily search the logs for differences in when
new searchers are opened/registered.

-Yonik


Localisation, faceting

2008-08-16 Thread Pierre Auslaender

Hello,

I have a couple of questions:

1/ Is it possible to localise query operator names without writing code? 
For instance, I'd like to issue queries with French operator names, e.g. 
ET (instead of AND), OU (instead of OR), etc.


2/ Is it possible for Solr to generate, in the XML response, the URLs or 
complete queries for each facet in a faceted search?


Here's an example. Say my first query is :
http://localhost:8080/solr/select?q=bac&facet=true&facet.field=kind&facet.limit=-1

The "kind" field has three values: material, immaterial, time. I get 
back something like this:


   
   
   
   
   1024
   27633
   389
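(The XML markup of this response was stripped in archiving. In Solr's standard XML response format the facet block would look roughly like the sketch below; the pairing of each count with a facet value is inferred from the order given above and may not match the original.)

```xml
<lst name="facet_counts">
  <lst name="facet_fields">
    <lst name="kind">
      <int name="material">1024</int>
      <int name="immaterial">27633</int>
      <int name="time">389</int>
    </lst>
  </lst>
</lst>
```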
   
   
   

If I want to drill down into one facet, say into "material", I have to 
"manually" rebuild a query like this:

http://localhost:8080/solr/select?q=bac&facet=true&facet.field=kind&facet.limit=-1&fq=kind:"material";

It's not too difficult, but surely Solr could add this URL or query 
string under the "material" element. Is this possible? Or do I have to 
XSLT the result myself?
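In the meantime, the drill-down URL can be assembled client-side. Here is a minimal sketch in Java; the class and method names are my own invention, and only the q/facet parameters come from the example above:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class FacetDrilldown {
    // Builds a drill-down URL by appending an fq clause that restricts
    // results to the chosen value of the faceted field.
    static String drilldownUrl(String baseQueryUrl, String field, String value)
            throws UnsupportedEncodingException {
        String fq = field + ":\"" + value + "\"";
        return baseQueryUrl + "&fq=" + URLEncoder.encode(fq, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        String base = "http://localhost:8080/solr/select?q=bac&facet=true"
                + "&facet.field=kind&facet.limit=-1";
        System.out.println(drilldownUrl(base, "kind", "material"));
    }
}
```

URL-encoding the whole fq clause avoids the quoting problems that come from pasting `kind:"material"` into a URL by hand.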


Thanks,

Pierre Auslaender


Re: "Auto commit error" and java.io.FileNotFoundException

2008-08-16 Thread Chris Harris
On Sat, Aug 16, 2008 at 4:33 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> What version of Java do you have on Linux?

The Java version on *Linux* (where I'm seeing the trouble):

java version "1.6.0"
OpenJDK Runtime Environment (build 1.6.0-b09)
OpenJDK 64-Bit Server VM (build 1.6.0-b09, mixed mode)

I'm pretty sure this is the latest one from the Ubuntu repository.

Maybe I should try the official Sun HotSpot build instead. I'm not
finding any complaints about OpenJDK on the Lucene list, though.

The Java version on *Windows* (where I created the initial compound
format index) is an official Sun build:

java version "1.6.0_06"
Java(TM) SE Runtime Environment (build 1.6.0_06-b02)
Java HotSpot(TM) Client VM (build 10.0-b22, mixed mode, sharing)

> Also, is this easily reproducible?  How many threads are you adding
> documents with?  What is your Auto Commit setting?

I think it takes 12-24hr to get the index to screw up, so while I did
reproduce it once, I haven't yet tried again. Intuition says that if I
repeat the same procedure the same problem would arise. Of course,
what would be nice is if I could figure out how to reproduce it more
quickly, with a smaller index, and a simpler schema.

I'm adding documents with 5-10 threads. Since I'm using the rich
document update handler
(https://issues.apache.org/jira/browse/SOLR-284), there's going to be
PDF and HTML conversion going on within Solr alongside the normal
analysis and indexing.

Autocommit is:


  10
  180  
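(The surrounding markup was stripped in archiving. In solrconfig.xml this setting is normally written as below; the tag names here are the standard Solr autoCommit ones, and whether 10 and 180 are the complete values or were themselves truncated is not recoverable from the archive.)

```xml
<autoCommit>
  <maxDocs>10</maxDocs>   <!-- commit after this many queued documents -->
  <maxTime>180</maxTime>  <!-- or after this many milliseconds -->
</autoCommit>
```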


> Can you try Lucene's CheckIndex tool on it and report what it says?

Working on that now. It should take some time, though, due to the index size.

>
> On Aug 15, 2008, at 1:35 PM, Chris Harris wrote:
>
>> I have an index (different from the ones mentioned yesterday) that was
>> working fine with 3M docs or so, but when I added a bunch more docs,
>> bringing it closer to 4M docs, the index seemed to get corrupted. In
>> particular, now when I start Solr up, or when my indexing process
>> tries to add a document, I get a complaint about missing index files.
>>
>> The error on startup looks like this:
>>
>> 
>>  2008-08-15T10:18:54
>>  1218820734592
>>  92
>>  org.apache.solr.core.MultiCore
>>  SEVERE
>>  org.apache.solr.common.SolrException
>>  log
>>  10
>>  java.lang.RuntimeException: java.io.FileNotFoundException:
>> /ssd/solr-/solr/exhibitcore/data/index/_p7.fdt (No such file or
>> directory)
>>at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:733)
>>at org.apache.solr.core.SolrCore.<init>(SolrCore.java:387)
>>at org.apache.solr.core.MultiCore.create(MultiCore.java:255)
>>at org.apache.solr.core.MultiCore.load(MultiCore.java:139)
>>at
>> org.apache.solr.servlet.SolrDispatchFilter.initMultiCore(SolrDispatchFilter.java:147)
>>at
>> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:75)
>>at
>> org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
>>at
>> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
>>at
>> org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
>>at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
>>at
>> org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
>>at
>> org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
>>at
>> org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
>>at
>> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
>>at
>> org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
>>at
>> org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
>>at
>> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
>>at
>> org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
>>at
>> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
>>at
>> org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
>>at org.mortbay.jetty.Server.doStart(Server.java:210)
>>at
>> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
>>at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
>>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>at java.lang.reflect.Method.invoke(Method.java:616)
>>at org.mortbay.start.Main.invokeMain(Main.java:183)
>>at org.mortbay.start.Main.start(Main.java:497)
>>at org.mortbay.start.Main.main(Main.java:115)
>> Caused by: java.io.FileNotFoundException:
>> /ssd

Using Shingles to Increase Phrase Search Performance

2008-08-16 Thread Chris Harris
Mike Klaas suggested last month that I might be able to improve phrase
search performance by indexing word bigrams, aka bigram shingles. I've
been playing with this, and the initial results are very promising. (I
may post some performance data later.) I wanted to describe my
technique, which I'm not sure is what Mike had in mind, and see if
anyone has any feedback on it. Let me know if it would be better to
address this to the Lucene list.

[Note: These experiments are completely separate from the index
corruption case I described very recently.]

Here is an excerpt from my schema.xml:
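(The XML of this excerpt was stripped in archiving. Based on the description that follows, the field type presumably looked roughly like the sketch below; the field-type name, tokenizer choice, and attribute spellings are reconstructed, not the original.)

```xml
<fieldType name="shingled_text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ShingleFilterFactory" outputUnigrams="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- outputUnigramIfNoNgram requires the custom ShingleFilter change
         described later in this message -->
    <filter class="solr.ShingleFilterFactory" outputUnigrams="false"
            outputUnigramIfNoNgram="true"/>
  </analyzer>
</fieldType>
```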


  



  
  



  


For indexing, I've used the stock ShingleFilterFactory with the
outputUnigrams option, which tokenizes as follows:

[Exhibit A]
"please divide this sentence into shingles" ->
  "please", "please divide"
  "divide", "divide this"
  "this", "this sentence"
  "sentence", "sentence into"
  "into", "into shingles"
  "shingles"

(Tokens on the same line have no position increment between them.)

Now for querying:

I first tried using the exact same Exhibit A analyzer for queries, but
this definitely did not help phrase search performance. (The reason
makes sense if you delve into the Lucene source, though I don't know
how to give a super-brief explanation.) So then I tried
outputUnigrams=false with the stock ShingleFilterFactory, thereby
tokenizing my queries as follows:

[Exhibit B]
"please divide this sentence into shingles" ->
  "please divide"
  "divide this"
  "this sentence"
  "sentence into"
  "into shingles"

And when I did this, things got really zippy. The only problem was
that it broke queries that were *not* phrase searches. That's because
in this setup a single-word query (e.g. "please") will get tokenized
into zero tokens, since a single word isn't long enough to be a
bigram.

So finally I modified the Lucene ShingleFilter class to add an
"outputUnigramIfNoNgram" option. Basically, if you set that option
and also set outputUnigrams=false, then the filter will tokenize just
as in Exhibit B, except that if the query is only one word long, it
will return a corresponding single token, rather than zero tokens. In
other words,

[Exhibit C]
"please" ->
  "please"

Things were still zippy. And, so far, I think I have seriously
improved my phrase search performance without ruining anything.
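The three tokenization behaviors can be summarized with a small, self-contained sketch (plain Java, not the actual Lucene ShingleFilter code; the flag names simply mirror the options discussed above):

```java
import java.util.ArrayList;
import java.util.List;

public class ShingleSketch {
    // Produces word-bigram shingles from a token list, mimicking the three
    // configurations above: Exhibit A (outputUnigrams=true), Exhibit B
    // (outputUnigrams=false), and Exhibit C (outputUnigrams=false plus the
    // proposed outputUnigramIfNoNgram=true fallback for one-word input).
    static List<String> shingle(List<String> tokens,
                                boolean outputUnigrams,
                                boolean outputUnigramIfNoNgram) {
        List<String> out = new ArrayList<String>();
        for (int i = 0; i < tokens.size(); i++) {
            if (outputUnigrams) {
                out.add(tokens.get(i));               // the unigram itself
            }
            if (i + 1 < tokens.size()) {
                out.add(tokens.get(i) + " " + tokens.get(i + 1)); // the bigram
            }
        }
        // Fallback: a single-word query would otherwise yield zero tokens.
        if (out.isEmpty() && outputUnigramIfNoNgram && !tokens.isEmpty()) {
            out.add(tokens.get(0));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = java.util.Arrays.asList(
            "please", "divide", "this", "sentence", "into", "shingles");
        System.out.println(shingle(words, true, false));   // Exhibit A
        System.out.println(shingle(words, false, false));  // Exhibit B
        System.out.println(shingle(java.util.Arrays.asList("please"), false, true)); // Exhibit C
    }
}
```

(This ignores position increments, stored offsets, and the rest of the TokenStream machinery; it is only meant to make the three exhibits concrete.)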

Are there any obvious drawbacks to this approach? I admit I haven't
thought through exactly how this would affect relevancy scoring. I'm
also not sure if the new Lucene ShingleMatrixFilter can be made to do
this more trivially than the standard ShingleFilter. (I don't really
understand the former yet.)

Cheers,
Chris


Re: "Auto commit error" and java.io.FileNotFoundException

2008-08-16 Thread Chris Harris
On Sat, Aug 16, 2008 at 4:33 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> Can you try Lucene's CheckIndex tool on it and report what it says?
>
> On Aug 15, 2008, at 1:35 PM, Chris Harris wrote:
>
>> I have an index (different from the ones mentioned yesterday) that was
>> working fine with 3M docs or so, but when I added a bunch more docs,
>> bringing it closer to 4M docs, the index seemed to get corrupted. In
>> particular, now when I start Solr up, or when my indexing process
>> tries to add a document, I get a complaint about missing index files.
>>
>> The error on startup looks like this:
>>
>> [...]

So I've run the Lucene CheckIndex tool twice.
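(For anyone wanting to repeat this, the tool is invoked roughly as follows; the jar and index paths are placeholders. Per the tool's own note, -ea enables Lucene's assertions, and -fix, per the warning in the output below, removes references to broken segments rather than recovering them.)

```shell
# read-only check; -ea makes the testing more thorough
java -ea:org.apache.lucene... -cp lucene-core.jar \
    org.apache.lucene.index.CheckIndex /path/to/index

# -fix rewrites the segments file to drop unreadable segments;
# back up the index first, since documents in those segments are lost
java -cp lucene-core.jar org.apache.lucene.index.CheckIndex /path/to/index -fix
```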

First I ran it on my Windows machine, on the original compound format
index, which I then moved over to Linux and added to. It checked out
ok:


  NOTE: testing will be more thorough if you run java with
'-ea:org.apache.lucene', so assertions are enabled

  Opening index @ E:\solr-\solr\exhibitcore\data\index

  Segments file=segments_2 numSegments=1
version=FORMAT_SHARED_DOC_STORE [Lucene 2.3]
1 of 1: name=_0 docCount=2829254
  compound=true
  numFiles=1
  size (MB)=30,423.298
  no deletions
  test: open reader.OK
  test: fields, norms...OK [21 fields]
  test: terms, freq, prox...OK [20208545 terms; 1092415125
terms/docs pairs; 6041972577 tokens]
  test: stored fields...OK [82628579 total field count; avg
29.205 fields per doc]
  test: term vectorsOK [0 total vector count; avg 0
term/freq vector fields per doc]

  No problems were detected with this index.


Then I ran it on Linux, against the index in its problematic state.
(Note: This is the end of my message. Everything that follows is
output from the tool):


  Opening index @
/home/guy/corrupt-solr--from-ssd/solr/exhibitcore/data/index/

  Segments file=segments_10tu numSegments=48
version=FORMAT_SHARED_DOC_STORE [Lucene 2.3]
1 of 48: name=_0 docCount=2829254
  compound=true
  numFiles=1
  size (MB)=30,423.298
  no deletions
  test: open reader.OK
  test: fields, norms...OK [21 fields]
  test: terms, freq, prox...OK [20208545 terms; 1092415125
terms/docs pairs; 6041972577 tokens]
  test: stored fields...OK [82628579 total field count; avg
29.205 fields per doc]
  test: term vectorsOK [0 total vector count; avg 0
term/freq vector fields per doc]

2 of 48: name=_uz docCount=1567952
  compound=false
  numFiles=8
  size (MB)=17,238.136
  no deletions
  test: open reader.OK
  test: fields, norms...OK [21 fields]
  test: terms, freq, prox...OK [11529959 terms; 613302128
terms/docs pairs; 3503689207 tokens]
  test: stored fields...OK [45879263 total field count; avg
29.261 fields per doc]
  test: term vectorsOK [0 total vector count; avg 0
term/freq vector fields per doc]

3 of 48: name=_v7 docCount=24507
  compound=false
  numFiles=0
  size (MB)=0
  no deletions
  test: open reader.FAILED
  WARNING: would remove reference to this segment (-fix was not
specified); full exception:
  java.io.FileNotFoundException:
/home/guy/corrupt-solr--from-ssd/solr/exhibitcore/data/index/_v7.fnm
(No such file or directory)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:539)
at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:569)
at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:478)
at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:473)
at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:57)
at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:300)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:264)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:199)
at org.apache.lucene.index.CheckIndex.check(CheckIndex.java:178)
at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:433)

4 of 48: name=_v6 docCount=1
  compound=false
  numFiles=0
  size (MB)=0
  docStoreOffset=6095
  docStoreSegment=_v4
  docStoreIsCompoundFile=false
  no deletions
  test: open reader.FAILED
  WARNING: would remove reference to this segment (-fix was not
specified); full exception:
  java.io.FileNotFoundException:
/home/guy/corrupt-solr--from-ssd/solr/exhibitcore/data/index/_v6.fnm
(No such file or directory)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:539)
at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:569)
at org.apache

Re: "Auto commit error" and java.io.FileNotFoundException

2008-08-16 Thread Otis Gospodnetic
I'd ignore Otis' message from 2005.  I haven't followed the thread carefully, 
but it looks like a bug deep in the guts of Lucene.


Otis --
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Chris Harris <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Friday, August 15, 2008 4:30:07 PM
> Subject: Re: "Auto commit error" and java.io.FileNotFoundException
> 
> I've done some more sniffing on the Lucene list, and noticed that Otis
> made the following comment about a FileNotFoundException problem in
> late 2005:
> 
> Are you using Windows and a compound index format (look at your index
> dir - does it have .cfs file(s))?
> 
> This may be a bad combination, judging from people who reported this
> problem so far.
> 
> (http://www.nabble.com/fnm-file-disappear-td1531775.html#a1531775)
> 
> Again, a CFS index was indeed involved in my case, but my experience
> comes almost three years after Otis' message...
> 
> On Fri, Aug 15, 2008 at 10:35 AM, Chris Harris wrote:
> >
> > The following may or may not be relevant: I built the base 3M-ish doc
> > index on a Windows machine, and it's a compound (.cfs) format index.
> > (I actually created it not with Solr, but by using the index merging
> > tool that comes with Lucene in order to merge three different
> > non-compound format indexes that I'd previously made with Solr into a
> > single index.) Before I started adding documents, I moved the index to
> > a Linux machine running a newer version of Solr/Lucene than was on the
> > Windows machine. The stuff described above all happened on Linux.
> >
> > Any thoughts?
> >
> > Thanks a bunch,
> > Chris
> >



Re: "Auto commit error" and java.io.FileNotFoundException

2008-08-16 Thread Otis Gospodnetic
How are you adding documents?  One at a time?  Multiple at a time?  From a 
single thread or multiple threads?
Have you tried building the latest and greatest Lucene from trunk and using 
that with Solr on the Linux box?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Chris Harris <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Friday, August 15, 2008 1:35:20 PM
> Subject: "Auto commit error" and java.io.FileNotFoundException
> 
> I have an index (different from the ones mentioned yesterday) that was
> working fine with 3M docs or so, but when I added a bunch more docs,
> bringing it closer to 4M docs, the index seemed to get corrupted. In
> particular, now when I start Solr up, or when my indexing process
> tries to add a document, I get a complaint about missing index files.
> 
> The error on startup looks like this:
> 
> 
>   2008-08-15T10:18:54
>   1218820734592
>   92
>   org.apache.solr.core.MultiCore
>   SEVERE
>   org.apache.solr.common.SolrException
>   log
>   10
>   java.lang.RuntimeException: java.io.FileNotFoundException:
> /ssd/solr-/solr/exhibitcore/data/index/_p7.fdt (No such file or
> directory)
> at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:733)
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:387)
> at org.apache.solr.core.MultiCore.create(MultiCore.java:255)
> at org.apache.solr.core.MultiCore.load(MultiCore.java:139)
> at org.apache.solr.servlet.SolrDispatchFilter.initMultiCore(SolrDispatchFilter.java:147)
> at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:75)
> at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
> at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
> at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
> at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
> at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
> at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
> at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
> at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
> at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
> at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
> at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
> at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
> at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
> at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
> at org.mortbay.jetty.Server.doStart(Server.java:210)
> at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
> at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:616)
> at org.mortbay.start.Main.invokeMain(Main.java:183)
> at org.mortbay.start.Main.start(Main.java:497)
> at org.mortbay.start.Main.main(Main.java:115)
> Caused by: java.io.FileNotFoundException:
> /ssd/solr-/solr/exhibitcore/data/index/_p7.fdt (No such file or
> directory)
> at java.io.RandomAccessFile.open(Native Method)
> at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
> at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:506)
> at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:536)
> at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:445)
> at org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:75)
> at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:308)
> at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:262)
> at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:197)
> at org.apache.lucene.index.MultiSegmentReader.<init>(MultiSegmentReader.java:55)
> at org.apache.lucene.index.DirectoryIndexReader$1.doBody(DirectoryIndexReader.java:75)
> at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:636)
> at org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:63)
> at org.apache.lucene.index.IndexReader.open(IndexReader.java:209)
> at org.apache.lucene.index.IndexReader.open(IndexReader.java:173)
> at org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:93)
> at org.apache.solr.core.SolrCore.getS

Re: "Auto commit error" and java.io.FileNotFoundException

2008-08-16 Thread Walter Underwood
I hate to blame the JDK, but we tried 1.6 for our production
webapp and it was crashing too often. Unless you need 1.6,
you might try 1.5. --wunder

On 8/16/08 1:54 PM, "Chris Harris" <[EMAIL PROTECTED]> wrote:

> On Sat, Aug 16, 2008 at 4:33 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
>> What version of Java do you have on Linux?
> 
> The Java version on *Linux* (where I'm seeing the trouble):
> 
> java version "1.6.0"
> OpenJDK Runtime Environment (build 1.6.0-b09)
> OpenJDK 64-Bit Server VM (build 1.6.0-b09, mixed mode)
> 
> I'm pretty sure this is the latest one from the Ubuntu repository.
> 
> Maybe I should try the official Sun HotSpot build instead. I'm not
> finding any complaints about OpenJDK on the Lucene list, though.
> 
> The Java version on *Windows* (where I created the initial compound
> format index) is an official Sun build:
> 
> java version "1.6.0_06"
> Java(TM) SE Runtime Environment (build 1.6.0_06-b02)
> Java HotSpot(TM) Client VM (build 10.0-b22, mixed mode, sharing)
> 
>> Also, is this easily reproducible?  How many threads are you adding
>> documents with?  What is your Auto Commit setting?
> 
> I think it takes 12-24hr to get the index to screw up, so while I did
> reproduce it once, I haven't yet tried again. Intuition says that if I
> repeat the same procedure the same problem would arise. Of course,
> what would be nice is if I could figure out how to reproduce it more
> quickly, with a smaller index, and a simpler schema.
> 
> I'm adding documents with 5-10 threads. Since I'm using the rich
> document update handler
> (https://issues.apache.org/jira/browse/SOLR-284), there's going to be
> PDF and HTML conversion going on within Solr alongside the normal
> analysis and indexing.
> 
> Autocommit is:
> 
> 
>   10
>   180  
> 
> 
>> Can you try Lucene's CheckIndex tool on it and report what it says?
> 
> Working on that now. It should take some time, though, due to the index size.
> 
>> 
>> On Aug 15, 2008, at 1:35 PM, Chris Harris wrote:
>> 
>>> I have an index (different from the ones mentioned yesterday) that was
>>> working fine with 3M docs or so, but when I added a bunch more docs,
>>> bringing it closer to 4M docs, the index seemed to get corrupted. In
>>> particular, now when I start Solr up, or when my indexing process
>>> tries to add a document, I get a complaint about missing index files.
>>> 
>>> The error on startup looks like this:
>>> 
>>> 
>>>  2008-08-15T10:18:54
>>>  1218820734592
>>>  92
>>>  org.apache.solr.core.MultiCore
>>>  SEVERE
>>>  org.apache.solr.common.SolrException
>>>  log
>>>  10
>>>  java.lang.RuntimeException: java.io.FileNotFoundException:
>>> /ssd/solr-/solr/exhibitcore/data/index/_p7.fdt (No such file or
>>> directory)
>>>at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:733)
>>>at org.apache.solr.core.SolrCore.<init>(SolrCore.java:387)
>>>at org.apache.solr.core.MultiCore.create(MultiCore.java:255)
>>>at org.apache.solr.core.MultiCore.load(MultiCore.java:139)
>>>at org.apache.solr.servlet.SolrDispatchFilter.initMultiCore(SolrDispatchFilter.java:147)
>>>at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:75)
>>>at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
>>>at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
>>>at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
>>>at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
>>>at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
>>>at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
>>>at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
>>>at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
>>>at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
>>>at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
>>>at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
>>>at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
>>>at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
>>>at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
>>>at org.mortbay.jetty.Server.doStart(Server.java:210)
>>>at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
>>>at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
>>>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57>

Help with word frequency / tag clouds

2008-08-16 Thread Gene Campbell
Hello Solrites,

I'm somewhat new to Solr and Lucene.  I would like to build a tag
cloud based on a filtered set of words from documents.  I have a
master list of approved tags.  So, what I need from each document is
the list of words and their frequencies, filtered so that only words
that appear in the master list are counted.  Then I should be able to
build a tag cloud UI (in HTML/CSS).

Is this something I have to build?  If so, I'm guessing I would need
to do it during indexing, but how?  Perhaps I need an Analyzer or
Tokenizer that can give me counts of words and then let me filter and
store them in a DB, or back in the index.
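As a rough illustration of the counting step being described (plain Java, independent of Solr; the class name is made up and the tokenization is a simple lowercase non-word split rather than a real Analyzer):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class TagCounts {
    // Counts how often each approved tag occurs in a document's text,
    // ignoring every word that is not in the master tag list.
    static Map<String, Integer> approvedCounts(String text, Set<String> approved) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String word : text.toLowerCase().split("\\W+")) {
            if (approved.contains(word)) {
                Integer n = counts.get(word);
                counts.put(word, n == null ? 1 : n + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Set<String> approved = new HashSet<String>(
            java.util.Arrays.asList("solr", "lucene"));
        System.out.println(approvedCounts("Solr uses Lucene; Solr rocks", approved));
    }
}
```

The resulting counts can then feed font sizes in the tag cloud markup.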

Can anyone offer some advice?

thanks
gene