Re: core.SolrCore - java.io.FileNotFoundException

2013-01-14 Thread Jun Wang
2):C8294, _1oyx(4.0.0.2):C2031, _1oyw(4.0.0.2):C259,
_1oz3(4.0.0.2):C8375, _1oz1(4.0.0.2):C2836, _1oz5(4.0.0.2):C8231,
_1oyy(4.0.0.2):C29, _1oz4(4.0.0.2):C2988, _1oz8(4.0.0.2):C1,
_1ozb(4.0.0.2):C1] packetCount=4599
1491308 IW 0 [Mon Jan 14 09:21:37 PST 2013; http-0.0.0.0-8080-2]: hit
exception updating document
.....

It seems that Lucene used a segment that has already been deleted.



2012/10/15 Jun Wang 

> Hi, Erick
> Thanks for your advice. My mergeFactor is set to 10, so it should be impossible
> to have so many segments, especially since some of the .fdx and .fdt files are
> just empty. And sometimes indexing works fine and ends with 200+ files in the data
> dir. My deployment has two cores and two shards for every core, using autocommit;
> DIH is used to pull data from the DB, and the merge policy is TieredMergePolicy.
> There is nothing customized.
>
> I am wondering how empty .fdx files could be generated; maybe some config
> in indexConfig is wrong. My final index is about 20G, with 40m+ docs.
> here is part of my solrconfig.xml
> -
> 32
> 100
>
> 10
>
> 
>   
>15000
>false
>  
> 
> -
>
> PS: I found another kind of log, but I am not sure whether it is the cause or
> the consequence. I am planning to turn on debug logging tomorrow to gather more
> information.
>
>
> 2012-10-14 10:13:19,854 ERROR update.CommitTracker - auto commit
> error...:java.io.FileNotFoundException: _cwj.fdt
> at
> org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:266)
> at
> org.apache.lucene.store.NRTCachingDirectory.fileLength(NRTCachingDirectory.java:177)
> at
> org.apache.lucene.index.SegmentInfo.sizeInBytes(SegmentInfo.java:103)
> at
> org.apache.lucene.index.IndexWriter.prepareFlushedSegment(IndexWriter.java:2126)
> at
> org.apache.lucene.index.DocumentsWriter.publishFlushedSegment(DocumentsWriter.java:495)
> at
> org.apache.lucene.index.DocumentsWriter.finishFlush(DocumentsWriter.java:474)
> at
> org.apache.lucene.index.DocumentsWriterFlushQueue$SegmentFlushTicket.publish(DocumentsWriterFlushQueue.java:201)
> at
> org.apache.lucene.index.DocumentsWriterFlushQueue.innerPurge(DocumentsWriterFlushQueue.java:119)
> at
> org.apache.lucene.index.DocumentsWriterFlushQueue.tryPurge(DocumentsWriterFlushQueue.java:148)
> at
> org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:435)
> at
> org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:551)
> at
> org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2657)
> at
> org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2793)
> at
> org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2773)
> at
> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:531)
> at org.apache.solr.update.CommitTracker.run(CommitTracker.java:214)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
>
>
>
>
>
>
>
>
> 2012/10/15 Erick Erickson 
>
>> I have no idea how you managed to get so many files in
>> your index directory, but that's definitely weird. How it
>> relates to your "file not found", I'm not quite sure, but it
>> could be something as simple as you've run out of file
>> handles.
>>
>> So you could try upping the number of
>> file handles as a _temporary_ fix just to see if that's
>> the problem. See your op-system's manuals for
>> how.
>>
>> If it does work, then I'd run an optimize
>> down to one segment and remove all the segment
>> files _other_ than that one segment. NOTE: this
>> means things like .fdt, .fdx, .tii files etc. NOT things
>> like segments.gen and segments_1. Make a
>> backup of course before you try this.
>>
>> But I think that's secondary. To generate this many
>> files 

Re: Shard update error when using DIH

2013-01-22 Thread Jun Wang
You should look at the log of solr-shard-4; it seems that some error occurred in
that shard.
-- 
from Jun Wang


deletedPkQuery not working in Solr 3.3

2012-09-05 Thread jun Wang
I have a data-config.xml with 2 entities, like


...


and


...


The entity delta_build is for the delta import; the query is

?command=full-import&entity=delta_build&clean=false

and I want to use deletedPkQuery to delete documents from the index. So I have added
these to the entity "delta_build":

deltaQuery="select -1 as ID from dual"

deltaImportQuery="select * from product where a.id='${dataimporter.delta.ID}' "

deletedPKQuery="select product_id as ID from modified_product where
gmt_create > to_date('${dataimporter.last_index_time}','yyyy-mm-dd
hh24:mi:ss') and modification = 'deleted'"

deltaQuery and deltaImportQuery are simply there so that the delta import does not
import any records, because delta import has already been implemented via full import;
I just want to use delta-import for deleting from the index.

But when I hit query

?command=delta-import

deltaQuery and deltaImportQuery can be found in the log, but deletedPKQuery is not.
Is there anything wrong in the config file?
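
For comparison, here is a minimal sketch of what such an entity declaration usually
looks like in data-config.xml; the table and column names are placeholders taken from
the queries above, not the actual config. One thing worth checking is the attribute
casing: the documented DIH attribute name is deletedPkQuery (lower-case k), so it may
be worth verifying whether the deletedPKQuery spelling above is picked up at all.

<entity name="delta_build" pk="ID"
        query="select * from product"
        deltaQuery="select -1 as ID from dual"
        deltaImportQuery="select * from product where id='${dataimporter.delta.ID}'"
        deletedPkQuery="select product_id as ID from modified_product
                        where gmt_create > to_date('${dataimporter.last_index_time}','yyyy-mm-dd hh24:mi:ss')
                          and modification = 'deleted'">
  <!-- field mappings omitted -->
</entity>

With a declaration like this, ?command=delta-import should show the deletedPkQuery
being run alongside deltaQuery and deltaImportQuery in the log.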

-- 
from Jun Wang


Re: Solrcloud dataimport failed at first time after restart

2012-10-09 Thread jun Wang
I have found the reason: I am using a JBoss JNDI datasource, and the Oracle driver
was placed in WEB-INF/lib. This is a very common mistake; the driver should be placed
in %JBOSS_HOME%\server\default\lib.
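
For reference, a minimal sketch of the matching dataSource declaration in
data-config.xml when going through a container-managed JNDI pool, using JdbcDataSource's
jndiName attribute (worth double-checking it is available in your Solr version); the JNDI
name below is a placeholder. The point is that with JNDI the container opens the
connections, so the JDBC driver jar has to live on the container's classpath (e.g. the
JBoss server lib directory), not inside the webapp's WEB-INF/lib.

<dataConfig>
  <!-- container-managed connection pool looked up via JNDI;
       the driver jar must be visible to JBoss itself, not only to the webapp -->
  <dataSource type="JdbcDataSource" jndiName="java:comp/env/jdbc/solrDS" readOnly="true"/>
  <document>
    <!-- entities omitted -->
  </document>
</dataConfig>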

2012/10/10 jun Wang 

> Hi, all
> I found that dataimport fails the first time after a restart; the
> log is below. It seems like a bug.
>
> 2012-10-09 20:00:08,848 ERROR dataimport.DataImporter - Full Import
> failed:java.lang.RuntimeException: java.lang.RuntimeException:
> org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
> execute query: select a.id, a.subject, a.keywords, a.category_id,
> to_number((a.gmt_modified - to_date('1970-01-01','-mm-dd'))*24*60*60)
> as gmt_modified,a.member_seq,b.standard_attr_desc,
> b.custom_attr_desc, decode(a.product_min_price, null, 0,
> a.product_min_price)/100 as min_price, sign(a.ws_offline_date - sysdate) +
> 1 as is_offlinefrom ws_product_draft a,
> ws_product_attribute_draft bwhere a.id =
> b.product_id(+) Processing Document # 1
> at
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:273)
> at
> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:382)
> at
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:448)
> at
> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:429)
> Caused by: java.lang.RuntimeException:
> org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
> execute query: select a.id, a.subject, a.keywords, a.category_id,
> to_number((a.gmt_modified - to_date('1970-01-01','-mm-dd'))*24*60*60)
> as gmt_modified,a.member_seq,b.standard_attr_desc,
> b.custom_attr_desc, decode(a.product_min_price, null, 0,
> a.product_min_price)/100 as min_price, sign(a.ws_offline_date - sysdate) +
> 1 as is_offlinefrom ws_product_draft a,
> ws_product_attribute_draft bwhere a.id =
> b.product_id(+) Processing Document # 1
> at
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:413)
> at
> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:326)
> at
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:234)
> ... 3 more
> Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException:
> Unable to execute query: select a.id, a.subject, a.keywords,
> a.category_id, to_number((a.gmt_modified -
> to_date('1970-01-01','-mm-dd'))*24*60*60) as gmt_modified,a.member_seq,
>b.standard_attr_desc, b.custom_attr_desc,
> decode(a.product_min_price, null, 0, a.product_min_price)/100 as min_price,
> sign(a.ws_offline_date - sysdate) + 1 as is_offline
>  from ws_product_draft a, ws_product_attribute_draft b
>where a.id = b.product_id(+) Processing Document # 1
> at
> org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)
> at
> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.(JdbcDataSource.java:252)
> at
> org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:209)
> at
> org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:38)
> at
> org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
> at
> org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
> at
> org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
> at
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:472)
> at
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:411)
> ... 5 more
> Caused by: java.lang.ClassNotFoundException: Unable to load null or
> org.apache.solr.handler.dataimport.null
> at
> org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:899)
> at
> org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:159)
> at
> org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:127)
> at
> org.apache.solr.handler.dataimport.JdbcDataSource.getConnection(JdbcDataSource.java:362)
> at
> org.apache.solr.handler.dataimport.JdbcDataSource.access$200(JdbcDataSource.java:38)
> at
> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.(JdbcDataSource.java:239)
> 

Re: segment number during optimize of index

2012-10-10 Thread jun Wang
I have another question: does the number of segments affect the speed of
index updates?

2012/10/10 jame vaalet 

> Guys,
> thanks for all the inputs, I was continuing my research to learn more about
> segments in Lucene. Below are my conclusions; please correct me if I am wrong.
>
>    1. Segments are independent sub-indexes in separate files. While indexing,
>    it is better to create a new segment, as that doesn't have to modify an
>    existing file; whereas while searching, the fewer segments the better, since
>    you open roughly x physical files (not exactly x, but a value proportional
>    to x) if you have got x segments in the index.
>    2. Since Lucene uses memory mapping, for each file/segment in the index a
>    new mmap is created and mapped to the physical file on disk. Can
>    someone explain or correct this in detail? I am sure there are many
>    people wondering how mmap works while you merge or optimize index
>    segments.
>
>
>
> On 6 October 2012 07:41, Otis Gospodnetic wrote:
>
> > If I were you and not knowing all your details...
> >
> > I would optimize indices that are static (not being modified) and
> > would optimize down to 1 segment.
> > I would do it when search traffic is low.
> >
> > Otis
> > --
> > Search Analytics - http://sematext.com/search-analytics/index.html
> > Performance Monitoring - http://sematext.com/spm/index.html
> >
> >
> > On Fri, Oct 5, 2012 at 4:27 PM, jame vaalet 
> wrote:
> > > Hi Erick,
> > > I am in a major dilemma with my index now. I have got 8 cores, each around
> > > 300 GB in size, and half of each one is deleted documents; on top of that,
> > > each has got around 100 segments as well. Do I issue an expungeDeletes and
> > > allow the merge policy to take care of the segments, or optimize them into
> > > a single segment? Search performance is not at par compared to usual Solr
> > > speed.
> > > If I have to optimize, what segment number should I choose? My RAM size is
> > > around 120 GB and the JVM heap is around 45 GB (oldGen being 30 GB). Please
> > > advise!
> > >
> > > thanks.
> > >
> > >
> > > On 6 October 2012 00:00, Erick Erickson 
> wrote:
> > >
> > >> because eventually you'd run out of file handles. Imagine a
> > >> long-running server with 100,000 segments. Totally
> > >> unmanageable.
> > >>
> > >> I think Shawn was emphasizing that RAM requirements don't
> > >> depend on the number of segments. There are other
> > >> resources that files consume, however.
> > >>
> > >> Best
> > >> Erick
> > >>
> > >> On Fri, Oct 5, 2012 at 1:08 PM, jame vaalet 
> > wrote:
> > >> > hi Shawn,
> > >> > thanks for the detailed explanation.
> > >> > I have got one doubt: you said it doesn't matter how many segments the
> > >> > index has, but then why does Solr have this merge policy which merges
> > >> > segments frequently? Why can't it leave the segments as they are rather
> > >> > than merging smaller ones into bigger ones?
> > >> >
> > >> > thanks
> > >> > .
> > >> >
> > >> > On 5 October 2012 05:46, Shawn Heisey  wrote:
> > >> >
> > >> >> On 10/4/2012 3:22 PM, jame vaalet wrote:
> > >> >>
> > >> >>> so imagine I have merged the 150 GB index into a single segment; this
> > >> >>> would make a single segment of 150 GB in memory. When new docs are
> > >> >>> indexed it wouldn't alter this 150 GB index unless I update or delete
> > >> >>> the older docs, right? Will a 150 GB single segment have problems with
> > >> >>> memory swapping at the OS level?
> > >> >>>
> > >> >>
> > >> >> Supplement to my previous reply:  the real memory mentioned in the last
> > >> >> paragraph does not include the memory that the OS uses to cache disk
> > >> >> access.  If more memory is needed and all the free memory is being used
> > >> >> by the disk cache, the OS will throw away part of the disk cache (a
> > >> >> near-instantaneous operation that should never involve disk I/O) and
> > >> >> give that memory to the application that requests it.
> > >> >>
> > >> >> Here's a very good breakdown of how memory gets used with MMapDirectory
> > >> >> in Solr.  It's applicable to any program that uses memory mapping, not
> > >> >> just Solr:
> > >> >>
> > >> >> http://java.dzone.com/articles/use-lucene%E2%80%99s-mmapdirectory
> > >> >>
> > >> >> Thanks,
> > >> >> Shawn
> > >> >>
> > >> >>
> > >> >
> > >> >
> > >> > --
> > >> >
> > >> > -JAME
> > >>
> > >
> > >
> > >
> > > --
> > >
> > > -JAME
> >
>
>
>
> --
>
> -JAME
>
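
For reference, the "optimize down to one segment" step discussed above is issued as an
ordinary update message; a minimal sketch, assuming the XML update syntax posted to a
core's /update handler:

<!-- POST to http://host:port/solr/<core>/update -->
<optimize maxSegments="1" waitSearcher="true"/>

The same request can be expressed as URL parameters (optimize=true&maxSegments=1), and
the lighter-weight alternative mentioned above is a commit with expungeDeletes="true".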



-- 
from Jun Wang


Is there any way to specify a config name for a core in solr.xml?

2012-10-11 Thread jun Wang
Hi, all

I have two collections, and two machines. So, my deployment is like
|machine a  |machine b  |
|core a1 | core a2 | core b1 | core b2|

core a1 is for collection 1 shard1, core a2 is for collection 1 shard2;
the config for collection 1 is config 1.
core b1 is for collection 2 shard1, core b2 is for collection 2 shard2;
the config for collection 2 is config 2.

Is there any way to specify the core config in solr.xml so that two shards start up
on every machine with the correct config name?
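
In the legacy solr.xml format used by Solr 4.x, each <core> element can carry its own
instanceDir and, if needed, explicit config and schema file names, so two cores on the
same machine can use different configs. A sketch for machine a, with all names as
placeholders; if this is running in SolrCloud mode, the config set is normally bound to
the collection in ZooKeeper (collection.configName) rather than per core:

<solr persistent="true">
  <cores adminPath="/admin/cores">
    <!-- collection 1, shard 1, using config 1 -->
    <core name="core_a1" instanceDir="collection1_shard1" collection="collection1"
          shard="shard1" config="solrconfig1.xml" schema="schema1.xml"/>
    <!-- collection 2, shard 1, using config 2 -->
    <core name="core_b1" instanceDir="collection2_shard1" collection="collection2"
          shard="shard1" config="solrconfig2.xml" schema="schema2.xml"/>
  </cores>
</solr>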

-- 
from Jun Wang


Re: core.SolrCore - java.io.FileNotFoundException

2012-10-14 Thread Jun Wang
PS: I have found that there are lots of segment files in the index directory, and most
of them are empty, like the listing below. The total file number is 35314 in the index
directory.
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3n.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3o.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3o.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3p.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3p.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3q.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3q.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3r.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3r.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3s.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3s.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3t.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3t.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3u.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3u.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3v.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3v.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3w.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3w.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3x.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3x.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3y.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3y.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3z.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3z.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k40.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k40.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k41.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k41.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k42.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k42.fdx




2012/10/15 Jun Wang 

> I have encountered a FileNotFoundException occasionally when
> indexing; it does not occur every time. Does anyone have a clue? Here is
> the stack trace:
>
> 2012-10-14 11:37:28,105 ERROR core.SolrCore -
> java.io.FileNotFoundException:
> /home/admin/run/deploy/solr/core_p_shard2/data/index/_cwo.fnm (No such file
> or directory)
> at java.io.RandomAccessFile.open(Native Method)
> at java.io.RandomAccessFile.(RandomAccessFile.java:216)
> at
> org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:218)
> at
> org.apache.lucene.store.NRTCachingDirectory.openInput(NRTCachingDirectory.java:232)
> at
> org.apache.lucene.codecs.lucene40.Lucene40FieldInfosReader.read(Lucene40FieldInfosReader.java:47)
> at
> org.apache.lucene.index.SegmentCoreReaders.(SegmentCoreReaders.java:101)
> at
> org.apache.lucene.index.SegmentReader.(SegmentReader.java:55)
> at
> org.apache.lucene.index.ReadersAndLiveDocs.getReader(ReadersAndLiveDocs.java:120)
> at
> org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:267)
> at
> org.apache.lucene.index.IndexWriter.applyAllDeletes(IndexWriter.java:2928)
> at
> org.apache.lucene.index.DocumentsWriter.applyAllDeletes(DocumentsWriter.java:180)
> at
> org.apache.lucene.index.DocumentsWriter.postUpdate(DocumentsWriter.java:310)
> at
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:386)
> at
> org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1430)
> at
> org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:210)
> at
> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
> at
> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
> at
> org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:432)
> at
> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:315)
> at
> org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:230)
> at
> org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:157)
> at
> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
> at
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1656)
> at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:454)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:275)

Re: core.SolrCore - java.io.FileNotFoundException

2012-10-15 Thread Jun Wang
Hi, Erick
Thanks for your advice. My mergeFactor is set to 10, so it should be impossible
to have so many segments, especially since some of the .fdx and .fdt files are
just empty. And sometimes indexing works fine and ends with 200+ files in the data
dir. My deployment has two cores and two shards for every core, using autocommit;
DIH is used to pull data from the DB, and the merge policy is TieredMergePolicy.
There is nothing customized.

I am wondering how empty .fdx files could be generated; maybe some config
in indexConfig is wrong. My final index is about 20G, with 40m+ docs.
here is part of my solrconfig.xml
-
32
100

10


  
   15000
   false
 

-
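
The XML tags in the excerpt above were stripped by the list archive, so only the values
survive. Assuming the usual Solr 4.x layout (this mapping is a guess, not the original
file), the pieces that can be identified read roughly like:

<indexConfig>
  <!-- the bare "32" and "100" above are probably buffer limits such as
       ramBufferSizeMB / maxBufferedDocs, but their tags are lost -->
  <mergeFactor>10</mergeFactor>  <!-- "mergeFactor is set to 10" -->
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy"/>
</indexConfig>

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>15000</maxTime>            <!-- presumably the "15000" above -->
    <openSearcher>false</openSearcher>  <!-- presumably the "false" above -->
  </autoCommit>
</updateHandler>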

PS: I found another kind of log, but I am not sure whether it is the cause or
the consequence. I am planning to turn on debug logging tomorrow to gather more
information.


2012-10-14 10:13:19,854 ERROR update.CommitTracker - auto commit
error...:java.io.FileNotFoundException: _cwj.fdt
at
org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:266)
at
org.apache.lucene.store.NRTCachingDirectory.fileLength(NRTCachingDirectory.java:177)
at
org.apache.lucene.index.SegmentInfo.sizeInBytes(SegmentInfo.java:103)
at
org.apache.lucene.index.IndexWriter.prepareFlushedSegment(IndexWriter.java:2126)
at
org.apache.lucene.index.DocumentsWriter.publishFlushedSegment(DocumentsWriter.java:495)
at
org.apache.lucene.index.DocumentsWriter.finishFlush(DocumentsWriter.java:474)
at
org.apache.lucene.index.DocumentsWriterFlushQueue$SegmentFlushTicket.publish(DocumentsWriterFlushQueue.java:201)
at
org.apache.lucene.index.DocumentsWriterFlushQueue.innerPurge(DocumentsWriterFlushQueue.java:119)
at
org.apache.lucene.index.DocumentsWriterFlushQueue.tryPurge(DocumentsWriterFlushQueue.java:148)
at
org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:435)
at
org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:551)
at
org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2657)
at
org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2793)
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2773)
at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:531)
at org.apache.solr.update.CommitTracker.run(CommitTracker.java:214)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)








2012/10/15 Erick Erickson 

> I have no idea how you managed to get so many files in
> your index directory, but that's definitely weird. How it
> relates to your "file not found", I'm not quite sure, but it
> could be something as simple as you've run out of file
> handles.
>
> So you could try upping the number of
> file handles as a _temporary_ fix just to see if that's
> the problem. See your op-system's manuals for
> how.
>
> If it does work, then I'd run an optimize
> down to one segment and remove all the segment
> files _other_ than that one segment. NOTE: this
> means things like .fdt, .fdx, .tii files etc. NOT things
> like segments.gen and segments_1. Make a
> backup of course before you try this.
>
> But I think that's secondary. To generate this many
> files I suspect you've started a lot of indexing
> jobs that you then abort (hard kill?). To get this
> many files I'd guess it's something programmatic,
> but that's a guess.
>
> How are you committing? Autocommit? From a SolrJ
> (or equivalent) program? Have you implemented any
> custom merge policies?
>
> But to your immediate problem. You can try running
> CheckIndex (here's a tutorial from 2.9, but I think
> it's still good):
> http://java.dzone.com/news/lucene-and-solrs-checkindex
>
> If that doesn't help (and you can run it in diagnostic mode,
> without the --fix flag to see what it _would_ do) then I'm
> afraid you'll probably have to re-index.
>
> And you've got to get to the root of why you have so
> many segment files. That number is just crazy
>
> Best
> Erick
>
> On Sun, Oct 14, 2

What does _version_ field used for?

2012-10-17 Thread Jun Wang
I am moving to Solr 4.0 from the beta version. An exception was thrown:

Caused by: org.apache.solr.common.SolrException: _version_field must exist
in schema, using indexed="true" stored="true" and multiValued="false"
(_version_ does not exist)
at
org.apache.solr.update.VersionInfo.getAndCheckVersionField(VersionInfo.java:57)
at org.apache.solr.core.SolrCore.(SolrCore.java:606)
... 26 more
2

It seems that a field like the one sketched below needs to exist in schema.xml. I am
wondering what it is used for?
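
For reference, a sketch of the stock Solr 4.0 pieces this error is asking for: the
_version_ field in schema.xml plus the update log in solrconfig.xml that relies on it.
This is taken from the example configs, so treat the exact type and path as assumptions:

<!-- schema.xml -->
<field name="_version_" type="long" indexed="true" stored="true"/>

<!-- solrconfig.xml, inside <updateHandler> -->
<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>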
-- 
from Jun Wang


Re: What does _version_ field used for?

2012-10-17 Thread Jun Wang
Does that mean we just need to add this field, and there is no more work to do?

2012/10/17 Rafał Kuć 

> Hello!
>
> It is used internally by Solr, for example by features like partial
> update functionality and update log.
>
> --
> Regards,
>  Rafał Kuć
>  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch
>
> > I ma moving to solr4.0 from beta version. There is a exception was
> thrown,
>
> > Caused by: org.apache.solr.common.SolrException: _version_field must
> exist
> > in schema, using indexed="true" stored="true" and multiValued="false"
> > (_version_ does not exist)
> > at
> >
> org.apache.solr.update.VersionInfo.getAndCheckVersionField(VersionInfo.java:57)
> > at org.apache.solr.core.SolrCore.(SolrCore.java:606)
> > ... 26 more
> > 2
>
> > It's seem that there need a field like
> >  
> > in schema.xml. I am wonder what does this used for?
>
>


-- 
from Jun Wang


Re: What does _version_ field used for?

2012-10-17 Thread Jun Wang
Ok, I got it, thanks

2012/10/17 Alexandre Rafalovitch 

> Yes, just make sure you have it in the schema. Solr handles the rest.
>
> Regards,
>Alex.
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all
> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)
>
>
> On Wed, Oct 17, 2012 at 12:57 PM, Jun Wang  wrote:
> > Is that said we just need to add this filed, and there is no more work?
> >
> > 2012/10/17 Rafał Kuć 
> >
> >> Hello!
> >>
> >> It is used internally by Solr, for example by features like partial
> >> update functionality and update log.
> >>
> >> --
> >> Regards,
> >>  Rafał Kuć
> >>  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
> ElasticSearch
> >>
> >> > I ma moving to solr4.0 from beta version. There is a exception was
> >> thrown,
> >>
> >> > Caused by: org.apache.solr.common.SolrException: _version_field must
> >> exist
> >> > in schema, using indexed="true" stored="true" and multiValued="false"
> >> > (_version_ does not exist)
> >> > at
> >> >
> >>
> org.apache.solr.update.VersionInfo.getAndCheckVersionField(VersionInfo.java:57)
> >> > at org.apache.solr.core.SolrCore.(SolrCore.java:606)
> >> > ... 26 more
> >> > 2
> >>
> >> > It's seem that there need a field like
> >> >  
> >> > in schema.xml. I am wonder what does this used for?
> >>
> >>
> >
> >
> > --
> > from Jun Wang
>



-- 
from Jun Wang


Re: Solr 4.0 segment flush times have a big difference between two machines

2012-10-19 Thread Jun Wang
I have found that segment flushing is controlled by
DocumentsWriterFlushControl, and indexing is implemented by
DocumentsWriterPerThread. DocumentsWriterFlushControl has information about the
number of docs and the size of the RAM buffer, but this seems to be shared by
all DocumentsWriterPerThread instances. Is the RAM limit the sum of all the
buffers of the DocumentsWriterPerThread instances?
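
As far as I can tell from the Lucene 4.0 flush policy, ramBufferSizeMB is a single
budget for the whole IndexWriter: it is checked against the total RAM held by all
DocumentsWriterPerThread instances together, so the point at which any one thread
flushes depends on how many threads are indexing at the same time. The relevant
solrconfig.xml knob looks like this (the value is a placeholder, since the tags were
stripped from the config quoted below):

<indexConfig>
  <!-- total indexing RAM budget shared across all DWPTs of this writer -->
  <ramBufferSizeMB>64</ramBufferSizeMB>
</indexConfig>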

2012/10/19 Jun Wang 

> Hi
>
> I have 2 machines for a collection, and I am using DIH to import data. DIH
> is triggered via a URL request on one machine, let's call it A, and A will
> forward some of the documents to machine B. Recently I have found that segment
> flushes happen more often on machine B. Here is part of INFOSTREAM.txt.
>
> Machine A:
> 
> DWPT 0 [Thu Oct 18 20:06:20 PDT 2012; Thread-39]: flush postings as
> segment _4r3 numDocs=71616
> DWPT 0 [Thu Oct 18 20:06:21 PDT 2012; Thread-39]: new segment has 0
> deleted docs
> DWPT 0 [Thu Oct 18 20:06:21 PDT 2012; Thread-39]: new segment has no
> vectors; no norms; no docValues; prox; freqs
> DWPT 0 [Thu Oct 18 20:06:21 PDT 2012; Thread-39]:
> flushedFiles=[_4r3_Lucene40_0.prx, _4r3.fdt, _4r3.fdx, _4r3.fnm,
> _4r3_Lucene40_0.tip, _4r3_Lucene40_0.tim, _4r3_Lucene40_0.frq]
> DWPT 0 [Thu Oct 18 20:06:21 PDT 2012; Thread-39]: flushed codec=Lucene40
> D
>
> Machine B
> --
> DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]: flush postings
> as segment _zi0 numDocs=4302
> DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]: new segment
> has 0 deleted docs
> DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]: new segment
> has no vectors; no norms; no docValues; prox; freqs
> DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]:
> flushedFiles=[_zi0_Lucene40_0.prx, _zi0.fdx, _zi0_Lucene40_0.tim, _zi0.fdt,
> _zi0.fnm, _zi0_Lucene40_0.frq, _zi0_Lucene40_0.tip]
> DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]: flushed
> codec=Lucene40
> D
>
> I have found that a flush occurred when the number of docs in RAM reached
> 7000~9000 on machine A, but the number on machine B is very different,
> almost always 4000.  It seems that every doc in the buffer uses more RAM on
> machine B than on machine A, which results in more flushes. Does anyone know why
> this happens?
>
> My conf is here.
>
> 6410
>
>
>
>
> --
> from Jun Wang
>
>
>


-- 
from Jun Wang