Re: core.SolrCore - java.io.FileNotFoundException
2):C8294, _1oyx(4.0.0.2):C2031, _1oyw(4.0.0.2):C259, _1oz3(4.0.0.2):C8375, _1oz1(4.0.0.2):C2836, _1oz5(4.0.0.2):C8231, _1oyy(4.0.0.2):C29, _1oz4(4.0.0.2):C2988, _1oz8(4.0.0.2):C1, _1ozb(4.0.0.2):C1] packetCount=4599
1491308 IW 0 [Mon Jan 14 09:21:37 PST 2013; http-0.0.0.0-8080-2]: hit exception updating document .....

It seems Lucene used a segment that had already been deleted.

2012/10/15 Jun Wang

> Hi, Erick
> Thanks for your advice. My mergeFactor is set to 10, so it should be
> impossible to have so many segments, especially since some .fdx and .fdt
> files are just empty. And sometimes indexing works fine, ending with 200+
> files in the data dir. My deployment has two cores and two shards for every
> core, using autocommit; DIH is used to pull data from the DB, and the merge
> policy is TieredMergePolicy. Nothing is customized.
>
> I am wondering how an empty .fdx file could be generated; maybe some config
> in indexConfig is wrong. My final index is about 20G, with 40m+ docs.
> Here is part of my solrconfig.xml:
>
> 32
> 100
> 10
> 15000
> false
>
> PS, I found another kind of log entry, but I am not sure whether it is the
> cause or the consequence. I am planning to enable debug logging to gather
> more information tomorrow.
>
> 2012-10-14 10:13:19,854 ERROR update.CommitTracker - auto commit
> error...:java.io.FileNotFoundException: _cwj.fdt
>     at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:266)
>     at org.apache.lucene.store.NRTCachingDirectory.fileLength(NRTCachingDirectory.java:177)
>     at org.apache.lucene.index.SegmentInfo.sizeInBytes(SegmentInfo.java:103)
>     at org.apache.lucene.index.IndexWriter.prepareFlushedSegment(IndexWriter.java:2126)
>     at org.apache.lucene.index.DocumentsWriter.publishFlushedSegment(DocumentsWriter.java:495)
>     at org.apache.lucene.index.DocumentsWriter.finishFlush(DocumentsWriter.java:474)
>     at org.apache.lucene.index.DocumentsWriterFlushQueue$SegmentFlushTicket.publish(DocumentsWriterFlushQueue.java:201)
>     at org.apache.lucene.index.DocumentsWriterFlushQueue.innerPurge(DocumentsWriterFlushQueue.java:119)
>     at org.apache.lucene.index.DocumentsWriterFlushQueue.tryPurge(DocumentsWriterFlushQueue.java:148)
>     at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:435)
>     at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:551)
>     at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2657)
>     at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2793)
>     at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2773)
>     at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:531)
>     at org.apache.solr.update.CommitTracker.run(CommitTracker.java:214)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
>
> 2012/10/15 Erick Erickson
>
>> I have no idea how you managed to get so many files in
>> your index directory, but that's definitely weird. How it
>> relates to your "file not found", I'm not quite sure, but it
>> could be something as simple as you've run out of file
>> handles.
>>
>> So you could try upping the number of
>> file handles as a _temporary_ fix just to see if that's
>> the problem. See your op-system's manuals for
>> how.
>>
>> If it does work, then I'd run an optimize
>> down to one segment and remove all the segment
>> files _other_ than that one segment. NOTE: this
>> means things like .fdt, .fdx, .tii files etc. NOT things
>> like segments.gen and segments_1. Make a
>> backup of course before you try this.
>>
>> But I think that's secondary. To generate this many
>> files
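The bare values in the solrconfig.xml excerpt quoted above, combined with the settings this thread mentions (mergeFactor 10, autocommit), map onto a fairly standard Solr 4.0 configuration block. The element names below are an assumption for illustration, not the poster's verbatim config; only the numbers appear in the original message:

    <indexConfig>
      <ramBufferSizeMB>32</ramBufferSizeMB>    <!-- assumption: 32 is the RAM buffer -->
      <maxBufferedDocs>100</maxBufferedDocs>   <!-- assumption: 100 is a doc-count trigger -->
      <mergeFactor>10</mergeFactor>            <!-- stated explicitly in the thread -->
    </indexConfig>

    <updateHandler class="solr.DirectUpdateHandler2">
      <autoCommit>
        <maxTime>15000</maxTime>               <!-- assumption: could equally have been maxDocs -->
        <openSearcher>false</openSearcher>
      </autoCommit>
    </updateHandler>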
Re: Shard update error when using DIH
You should look at the log of solr-shard-4; it seems that some error occurred in that shard. -- from Jun Wang
deletedPkQuery not working in Solr 3.3
I have a data-config.xml with 2 entities, like ... and ... The entity delta_build is for delta import; the query is ?command=full-import&entity=delta_build&clean=false, and I want to use deletedPkQuery to delete from the index. So I have added the following to entity "delta_build":

deltaQuery="select -1 as ID from dual"
deltaImportQuery="select * from product where a.id='${dataimporter.delta.ID}' "
deletedPKQuery="select product_id as ID from modified_product where gmt_create > to_date('${dataimporter.last_index_time}','-mm-dd hh24:mi:ss') and modification = 'deleted'"

deltaQuery and deltaImportQuery are written simply to avoid delta-importing any records, because delta import has already been implemented via full import; I just want to use delta for deleting from the index. But when I hit ?command=delta-import, deltaQuery and deltaImportQuery can be found in the log, but deletedPKQuery cannot. Is there anything wrong in the config file? -- from Jun Wang
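For comparison, here is a minimal sketch of such an entity as DIH usually expects it; the table and column names are the poster's, and the date format mask is my guess at the truncated original. Note that DIH attribute names are case-sensitive, and the documented spelling is deletedPkQuery with a lowercase "k", which may be exactly why the deletedPKQuery above never shows up in the log:

    <entity name="delta_build" pk="ID"
            query="select * from product"
            deltaQuery="select -1 as ID from dual"
            deltaImportQuery="select * from product where id='${dataimporter.delta.ID}'"
            deletedPkQuery="select product_id as ID from modified_product
                            where gmt_create &gt; to_date('${dataimporter.last_index_time}','yyyy-mm-dd hh24:mi:ss')
                              and modification = 'deleted'"/>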
Re: Solrcloud dataimport failed at first time after restart
I have found the reason. I am using a JBoss JNDI datasource, and the Oracle driver was placed in WEB-INF/lib; this is a very common error, and the driver should instead be placed in %JBOSS_HOME%\server\default\lib.

2012/10/10 jun Wang

> Hi, all
> I found that dataimport fails the first time after a restart; the log is
> here. It seems like a bug.
>
> 2012-10-09 20:00:08,848 ERROR dataimport.DataImporter - Full Import
> failed:java.lang.RuntimeException: java.lang.RuntimeException:
> org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
> execute query: select a.id, a.subject, a.keywords, a.category_id,
> to_number((a.gmt_modified - to_date('1970-01-01','-mm-dd'))*24*60*60)
> as gmt_modified, a.member_seq, b.standard_attr_desc, b.custom_attr_desc,
> decode(a.product_min_price, null, 0, a.product_min_price)/100 as min_price,
> sign(a.ws_offline_date - sysdate) + 1 as is_offline
> from ws_product_draft a, ws_product_attribute_draft b
> where a.id = b.product_id(+) Processing Document # 1
>     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:273)
>     at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:382)
>     at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:448)
>     at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:429)
> Caused by: java.lang.RuntimeException:
> org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
> execute query: select a.id, a.subject, a.keywords, a.category_id,
> to_number((a.gmt_modified - to_date('1970-01-01','-mm-dd'))*24*60*60)
> as gmt_modified, a.member_seq, b.standard_attr_desc, b.custom_attr_desc,
> decode(a.product_min_price, null, 0, a.product_min_price)/100 as min_price,
> sign(a.ws_offline_date - sysdate) + 1 as is_offline
> from ws_product_draft a, ws_product_attribute_draft b
> where a.id = b.product_id(+) Processing Document # 1
>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:413)
>     at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:326)
>     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:234)
>     ... 3 more
> Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException:
> Unable to execute query: select a.id, a.subject, a.keywords, a.category_id,
> to_number((a.gmt_modified - to_date('1970-01-01','-mm-dd'))*24*60*60)
> as gmt_modified, a.member_seq, b.standard_attr_desc, b.custom_attr_desc,
> decode(a.product_min_price, null, 0, a.product_min_price)/100 as min_price,
> sign(a.ws_offline_date - sysdate) + 1 as is_offline
> from ws_product_draft a, ws_product_attribute_draft b
> where a.id = b.product_id(+) Processing Document # 1
>     at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)
>     at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:252)
>     at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:209)
>     at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:38)
>     at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
>     at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
>     at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:472)
>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:411)
>     ... 5 more
> Caused by: java.lang.ClassNotFoundException: Unable to load null or
> org.apache.solr.handler.dataimport.null
>     at org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:899)
>     at org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:159)
>     at org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:127)
>     at org.apache.solr.handler.dataimport.JdbcDataSource.getConnection(JdbcDataSource.java:362)
>     at org.apache.solr.handler.dataimport.JdbcDataSource.access$200(JdbcDataSource.java:38)
>     at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:239)
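For anyone hitting the same error, a working setup would look roughly like this; the datasource name and connection details are hypothetical, but the file locations follow the standard JBoss 4/5 layout (driver jar in server/default/lib, not in the webapp's WEB-INF/lib):

    <!-- %JBOSS_HOME%/server/default/deploy/oracle-ds.xml -->
    <datasources>
      <local-tx-datasource>
        <jndi-name>jdbc/solrDS</jndi-name>
        <connection-url>jdbc:oracle:thin:@dbhost:1521:ORCL</connection-url>
        <driver-class>oracle.jdbc.OracleDriver</driver-class>
        <user-name>solr</user-name>
        <password>secret</password>
      </local-tx-datasource>
    </datasources>

    <!-- data-config.xml: let DIH look the datasource up via JNDI -->
    <dataSource type="JdbcDataSource" jndiName="java:/jdbc/solrDS"/>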
Re: segment number during optimize of index
I have another question: does the number of segments affect the speed of index updates?

2012/10/10 jame vaalet

> Guys,
> thanks for all the inputs, I was continuing my research to know more about
> segments in Lucene. Below are my conclusions, please correct me if I am wrong.
>
>    1. Segments are independent sub-indexes in separate files; while indexing
>    it is better to create a new segment, as it doesn't have to modify an
>    existing file, whereas while searching, the fewer the segments the better,
>    since you open x (not exactly x, but a value proportional to x) physical
>    files to search if you have got x segments in the index.
>    2. Since Lucene uses memory mapping, for each file/segment in the index a
>    new m-mapped region is created and mapped to the physical file on disk.
>    Can someone explain or correct this in detail? I am sure there are lots
>    of people wondering how m-map works while you merge or optimize index
>    segments.
>
> On 6 October 2012 07:41, Otis Gospodnetic wrote:
>
> > If I were you and not knowing all your details...
> >
> > I would optimize indices that are static (not being modified) and
> > would optimize down to 1 segment.
> > I would do it when search traffic is low.
> >
> > Otis
> > --
> > Search Analytics - http://sematext.com/search-analytics/index.html
> > Performance Monitoring - http://sematext.com/spm/index.html
> >
> > On Fri, Oct 5, 2012 at 4:27 PM, jame vaalet wrote:
> > > Hi Eric,
> > > I am in a major dilemma with my index now. I have got 8 cores, each
> > > around 300 GB in size, half of each being deleted documents, and above
> > > that each has got around 100 segments as well. Do I issue an
> > > expungeDeletes and allow the merge policy to take care of the segments,
> > > or optimize them into a single segment? Search performance is not at par
> > > compared to usual Solr speed.
> > > If I have to optimize, what segment number should I choose? My RAM size
> > > is around 120 GB and the JVM heap is around 45 GB (oldGen being 30 GB).
> > > Please advise!
> > >
> > > thanks.
> > >
> > > On 6 October 2012 00:00, Erick Erickson wrote:
> > >
> > >> because eventually you'd run out of file handles. Imagine a
> > >> long-running server with 100,000 segments. Totally
> > >> unmanageable.
> > >>
> > >> I think Shawn was emphasizing that RAM requirements don't
> > >> depend on the number of segments. There are other
> > >> resources that files consume, however.
> > >>
> > >> Best
> > >> Erick
> > >>
> > >> On Fri, Oct 5, 2012 at 1:08 PM, jame vaalet wrote:
> > >> > hi Shawn,
> > >> > thanks for the detailed explanation.
> > >> > I have got one doubt: you said it doesn't matter how many segments the
> > >> > index has, but then why does Solr have this merge policy which merges
> > >> > segments frequently? Why can't it leave the segments as they are,
> > >> > rather than merging smaller ones into bigger ones?
> > >> >
> > >> > thanks
> > >> >
> > >> > On 5 October 2012 05:46, Shawn Heisey wrote:
> > >> >
> > >> >> On 10/4/2012 3:22 PM, jame vaalet wrote:
> > >> >>
> > >> >>> so imagine I have merged the 150 GB index into a single segment;
> > >> >>> this would make a single segment of 150 GB in memory. When new docs
> > >> >>> are indexed it wouldn't alter this 150 GB index unless I update or
> > >> >>> delete the older docs, right? Will a 150 GB single segment have
> > >> >>> problems with memory swapping at the OS level?
> > >> >>>
> > >> >>
> > >> >> Supplement to my previous reply: the real memory mentioned in the
> > >> >> last paragraph does not include the memory that the OS uses to cache
> > >> >> disk access. If more memory is needed and all the free memory is
> > >> >> being used by the disk cache, the OS will throw away part of the disk
> > >> >> cache (a near-instantaneous operation that should never involve disk
> > >> >> I/O) and give that memory to the application that requests it.
> > >> >>
> > >> >> Here's a very good breakdown of how memory gets used with
> > >> >> MMapDirectory in Solr. It's applicable to any program that uses
> > >> >> memory mapping, not just Solr:
> > >> >>
> > >> >> http://java.dzone.com/articles/use-lucene%E2%80%99s-mmapdirectory
> > >> >>
> > >> >> Thanks,
> > >> >> Shawn
> > >> >>
> > >> >
> > >> > --
> > >> > -JAME
> > >>
> > >
> > > --
> > > -JAME
> >
> > --
> > -JAME
>
-- from Jun Wang
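As a footnote to Otis's suggestion: optimizing down to a fixed segment count can be done from SolrJ as well as via an optimize URL. A minimal sketch against the SolrJ 4.0 API; the core URL and the segment count of 1 are placeholders:

    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class OptimizeDemo {
      public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/core1");
        // waitFlush=true, waitSearcher=true, maxSegments=1:
        // merge the whole index down to a single segment.
        server.optimize(true, true, 1);
        server.shutdown();
      }
    }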
Is there any way to specify the config name for a core in solr.xml?
Hi, all
I have two collections and two machines, so my deployment looks like this:

| machine a | machine b |
| core a1   | core a2   |
| core b1   | core b2   |

Core a1 is collection 1 shard1 and core a2 is collection 1 shard2; the config for collection 1 is config 1. Core b1 is collection 2 shard1 and core b2 is collection 2 shard2; the config for collection 2 is config 2. Is there any way to specify the core config in solr.xml so that the two shards on each machine start up with the correct config name? -- from Jun Wang
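In case it helps the discussion, here is roughly what I would try in the old-style solr.xml for machine a; the core and config names come from the layout above, and the collection.configName property is my assumption about how a config set gets linked to a collection the first time it is created in ZooKeeper (machine b would declare core a2 and core b2 with shard2):

    <solr persistent="true">
      <cores adminPath="/admin/cores">
        <core name="core_a1" instanceDir="core_a1" collection="collection1" shard="shard1">
          <property name="collection.configName" value="config1"/>
        </core>
        <core name="core_b1" instanceDir="core_b1" collection="collection2" shard="shard1">
          <property name="collection.configName" value="config2"/>
        </core>
      </cores>
    </solr>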
Re: core.SolrCore - java.io.FileNotFoundException
PS, I have found that there are lots of segment files in the index directory, and most of them are empty, like the ones below. The total file count in the index directory is 35314.

-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3n.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3o.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3o.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3p.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3p.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3q.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3q.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3r.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3r.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3s.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3s.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3t.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3t.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3u.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3u.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3v.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3v.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3w.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3w.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3x.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3x.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3y.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3y.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3z.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k3z.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k40.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k40.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k41.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k41.fdx
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k42.fdt
-rw-rw-r-- 1 admin systems 0 Oct 14 11:37 _k42.fdx

2012/10/15 Jun Wang

> I have encountered a FileNotFoundException occasionally when indexing; it
> does not occur every time. Does anyone have a clue? Here is the traceback:
>
> 2012-10-14 11:37:28,105 ERROR core.SolrCore -
> java.io.FileNotFoundException:
> /home/admin/run/deploy/solr/core_p_shard2/data/index/_cwo.fnm (No such file
> or directory)
>     at java.io.RandomAccessFile.open(Native Method)
>     at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216)
>     at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:218)
>     at org.apache.lucene.store.NRTCachingDirectory.openInput(NRTCachingDirectory.java:232)
>     at org.apache.lucene.codecs.lucene40.Lucene40FieldInfosReader.read(Lucene40FieldInfosReader.java:47)
>     at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:101)
>     at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:55)
>     at org.apache.lucene.index.ReadersAndLiveDocs.getReader(ReadersAndLiveDocs.java:120)
>     at org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:267)
>     at org.apache.lucene.index.IndexWriter.applyAllDeletes(IndexWriter.java:2928)
>     at org.apache.lucene.index.DocumentsWriter.applyAllDeletes(DocumentsWriter.java:180)
>     at org.apache.lucene.index.DocumentsWriter.postUpdate(DocumentsWriter.java:310)
>     at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:386)
>     at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1430)
>     at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:210)
>     at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
>     at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
>     at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:432)
>     at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:315)
>     at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:230)
>     at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:157)
>     at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
>     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1656)
>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:454)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:275)
Re: core.SolrCore - java.io.FileNotFoundException
Hi, Erick
Thanks for your advice. My mergeFactor is set to 10, so it should be impossible to have so many segments, especially since some .fdx and .fdt files are just empty. And sometimes indexing works fine, ending with 200+ files in the data dir. My deployment has two cores and two shards for every core, using autocommit; DIH is used to pull data from the DB, and the merge policy is TieredMergePolicy. Nothing is customized.

I am wondering how an empty .fdx file could be generated; maybe some config in indexConfig is wrong. My final index is about 20G, with 40m+ docs. Here is part of my solrconfig.xml:

32
100
10
15000
false

PS, I found another kind of log entry, but I am not sure whether it is the cause or the consequence. I am planning to enable debug logging to gather more information tomorrow.

2012-10-14 10:13:19,854 ERROR update.CommitTracker - auto commit
error...:java.io.FileNotFoundException: _cwj.fdt
    at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:266)
    at org.apache.lucene.store.NRTCachingDirectory.fileLength(NRTCachingDirectory.java:177)
    at org.apache.lucene.index.SegmentInfo.sizeInBytes(SegmentInfo.java:103)
    at org.apache.lucene.index.IndexWriter.prepareFlushedSegment(IndexWriter.java:2126)
    at org.apache.lucene.index.DocumentsWriter.publishFlushedSegment(DocumentsWriter.java:495)
    at org.apache.lucene.index.DocumentsWriter.finishFlush(DocumentsWriter.java:474)
    at org.apache.lucene.index.DocumentsWriterFlushQueue$SegmentFlushTicket.publish(DocumentsWriterFlushQueue.java:201)
    at org.apache.lucene.index.DocumentsWriterFlushQueue.innerPurge(DocumentsWriterFlushQueue.java:119)
    at org.apache.lucene.index.DocumentsWriterFlushQueue.tryPurge(DocumentsWriterFlushQueue.java:148)
    at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:435)
    at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:551)
    at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2657)
    at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2793)
    at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2773)
    at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:531)
    at org.apache.solr.update.CommitTracker.run(CommitTracker.java:214)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

2012/10/15 Erick Erickson

> I have no idea how you managed to get so many files in
> your index directory, but that's definitely weird. How it
> relates to your "file not found", I'm not quite sure, but it
> could be something as simple as you've run out of file
> handles.
>
> So you could try upping the number of
> file handles as a _temporary_ fix just to see if that's
> the problem. See your op-system's manuals for
> how.
>
> If it does work, then I'd run an optimize
> down to one segment and remove all the segment
> files _other_ than that one segment. NOTE: this
> means things like .fdt, .fdx, .tii files etc. NOT things
> like segments.gen and segments_1. Make a
> backup of course before you try this.
>
> But I think that's secondary. To generate this many
> files I suspect you've started a lot of indexing
> jobs that you then abort (hard kill?). To get this
> many files I'd guess it's something programmatic,
> but that's a guess.
>
> How are you committing? Autocommit? From a SolrJ
> (or equivalent) program? Have you implemented any
> custom merge policies?
>
> But to your immediate problem. You can try running
> CheckIndex (here's a tutorial from 2.9, but I think
> it's still good):
> http://java.dzone.com/news/lucene-and-solrs-checkindex
>
> If that doesn't help (and you can run it in diagnostic mode,
> without the --fix flag, to see what it _would_ do) then I'm
> afraid you'll probably have to re-index.
>
> And you've got to get to the root of why you have so
> many segment files. That number is just crazy.
>
> Best
> Erick
>
> On Sun, Oct 14, 2
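For reference, CheckIndex can also be driven programmatically instead of from the command line. A minimal sketch against the Lucene 4.0 API, using the index path from the traceback above as an example; the read-only pass is safe, while the fix mode drops unreadable segments and loses their documents, so back up first:

    import java.io.File;
    import org.apache.lucene.index.CheckIndex;
    import org.apache.lucene.store.FSDirectory;

    public class CheckIndexDemo {
      public static void main(String[] args) throws Exception {
        FSDirectory dir = FSDirectory.open(
            new File("/home/admin/run/deploy/solr/core_p_shard2/data/index"));
        CheckIndex checker = new CheckIndex(dir);
        checker.setInfoStream(System.out);
        CheckIndex.Status status = checker.checkIndex(); // diagnostic, read-only
        System.out.println("index is clean: " + status.clean);
        // CheckIndex also ships a main() with a -fix flag that removes broken
        // segments; that is the destructive equivalent of this check.
        dir.close();
      }
    }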
What is the _version_ field used for?
I am moving to Solr 4.0 from the beta version. An exception was thrown:

Caused by: org.apache.solr.common.SolrException: _version_field must exist in schema, using indexed="true" stored="true" and multiValued="false" (_version_ does not exist)
    at org.apache.solr.update.VersionInfo.getAndCheckVersionField(VersionInfo.java:57)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:606)
    ... 26 more

It seems that such a _version_ field needs to exist in schema.xml. I wonder what it is used for? -- from Jun Wang
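For what it's worth, the field the error asks for is declared in the stock Solr 4.0 example schema roughly like this; the type name assumes the usual long field type is defined in the same schema:

    <field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>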
Re: What is the _version_ field used for?
Is that to say we just need to add this field, and there is no more work?

2012/10/17 Rafał Kuć

> Hello!
>
> It is used internally by Solr, for example by features like the partial
> update functionality and the update log.
>
> --
> Regards,
> Rafał Kuć
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch
>
> > I am moving to Solr 4.0 from the beta version. An exception was thrown:
> >
> > Caused by: org.apache.solr.common.SolrException: _version_field must exist
> > in schema, using indexed="true" stored="true" and multiValued="false"
> > (_version_ does not exist)
> >     at org.apache.solr.update.VersionInfo.getAndCheckVersionField(VersionInfo.java:57)
> >     at org.apache.solr.core.SolrCore.<init>(SolrCore.java:606)
> >     ... 26 more
> >
> > It seems that such a _version_ field needs to exist in schema.xml. I
> > wonder what it is used for?
-- from Jun Wang
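To illustrate the partial-update feature Rafał mentions, here is a minimal SolrJ sketch; the core URL and the price field are hypothetical. Sending a map such as {"set": value} updates just that one field, and Solr reconstructs the rest of the document from stored fields, which is what _version_ and the update log make possible:

    import java.util.Collections;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class AtomicUpdateDemo {
      public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "42");
        // "set" replaces the value of one field; other stored fields are kept.
        doc.addField("price", Collections.singletonMap("set", 9.99));
        server.add(doc);
        server.commit();
        server.shutdown();
      }
    }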
Re: What is the _version_ field used for?
OK, I got it, thanks.

2012/10/17 Alexandre Rafalovitch

> Yes, just make sure you have it in the schema. Solr handles the rest.
>
> Regards,
>    Alex.
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all
> at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
>
> On Wed, Oct 17, 2012 at 12:57 PM, Jun Wang wrote:
> > Is that to say we just need to add this field, and there is no more work?
> >
> > 2012/10/17 Rafał Kuć
> >
> >> Hello!
> >>
> >> It is used internally by Solr, for example by features like the partial
> >> update functionality and the update log.
> >>
> >> --
> >> Regards,
> >> Rafał Kuć
> >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch
> >>
> >> > I am moving to Solr 4.0 from the beta version. An exception was thrown:
> >> >
> >> > Caused by: org.apache.solr.common.SolrException: _version_field must
> >> > exist in schema, using indexed="true" stored="true" and
> >> > multiValued="false" (_version_ does not exist)
> >> >     at org.apache.solr.update.VersionInfo.getAndCheckVersionField(VersionInfo.java:57)
> >> >     at org.apache.solr.core.SolrCore.<init>(SolrCore.java:606)
> >> >     ... 26 more
> >> >
> >> > It seems that such a _version_ field needs to exist in schema.xml. I
> >> > wonder what it is used for?
> >
> > --
> > from Jun Wang
-- from Jun Wang
Re: Solr 4.0 segment flush times show a big difference between two machines
I have found that segment flushing is controlled by DocumentsWriterFlushControl, and indexing is implemented by DocumentsWriterPerThread. DocumentsWriterFlushControl has information about the number of docs and the size of the RAM buffer, but this seems to be shared by all DocumentsWriterPerThread instances. Is the RAM limit the sum of all the DocumentsWriterPerThread buffers?

2012/10/19 Jun Wang

> Hi
>
> I have 2 machines for a collection, and it uses DIH to import data. DIH is
> triggered via a URL request on one machine, let's call it A, and A will
> forward some of the index to machine B. Recently I have found that segment
> flushes happen more often on machine B. Here is part of INFOSTREAM.txt.
>
> Machine A:
> 
> DWPT 0 [Thu Oct 18 20:06:20 PDT 2012; Thread-39]: flush postings as segment _4r3 numDocs=71616
> DWPT 0 [Thu Oct 18 20:06:21 PDT 2012; Thread-39]: new segment has 0 deleted docs
> DWPT 0 [Thu Oct 18 20:06:21 PDT 2012; Thread-39]: new segment has no vectors; no norms; no docValues; prox; freqs
> DWPT 0 [Thu Oct 18 20:06:21 PDT 2012; Thread-39]: flushedFiles=[_4r3_Lucene40_0.prx, _4r3.fdt, _4r3.fdx, _4r3.fnm, _4r3_Lucene40_0.tip, _4r3_Lucene40_0.tim, _4r3_Lucene40_0.frq]
> DWPT 0 [Thu Oct 18 20:06:21 PDT 2012; Thread-39]: flushed codec=Lucene40
>
> Machine B:
> --
> DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]: flush postings as segment _zi0 numDocs=4302
> DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]: new segment has 0 deleted docs
> DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]: new segment has no vectors; no norms; no docValues; prox; freqs
> DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]: flushedFiles=[_zi0_Lucene40_0.prx, _zi0.fdx, _zi0_Lucene40_0.tim, _zi0.fdt, _zi0.fnm, _zi0_Lucene40_0.frq, _zi0_Lucene40_0.tip]
> DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]: flushed codec=Lucene40
>
> I have found that a flush occurred when the number of docs in RAM reached
> 7000~9000 on machine A, but the number on machine B is very different,
> almost always 4000. It seems that every doc in the buffer uses more RAM on
> machine B than on machine A, which results in more flushes. Does anyone
> know why this happens?
>
> My conf is here.
>
> 6410
>
> --
> from Jun Wang
-- from Jun Wang
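On the question above: at the Lucene level the RAM budget is set once on the IndexWriter and, as far as I can tell, the flush policy accounts for it globally across all DWPTs, picking the largest thread state to flush when the total crosses the limit. A minimal sketch of where that knob lives in the 4.0 API; the index path and the 64 MB value are illustrative:

    import java.io.File;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class FlushConfigDemo {
      public static void main(String[] args) throws Exception {
        IndexWriterConfig cfg = new IndexWriterConfig(Version.LUCENE_40,
            new StandardAnalyzer(Version.LUCENE_40));
        // One global budget shared by all DocumentsWriterPerThread buffers:
        // the default FlushByRamOrCountsPolicy flushes the largest DWPT when
        // the summed RAM use of all thread buffers crosses this value.
        cfg.setRAMBufferSizeMB(64.0);
        // Disable the doc-count trigger so only RAM use drives flushes.
        cfg.setMaxBufferedDocs(IndexWriterConfig.DISABLE_AUTO_FLUSH);
        IndexWriter writer = new IndexWriter(FSDirectory.open(new File("/tmp/idx")), cfg);
        writer.close();
      }
    }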