Re: nutch and solr
this is the problem! Because in my root there is a url! I write you my step-by-step configuration of nutch (I use cygwin because I work on windows):

*1. Extract the Nutch package*

*2. Configure Solr*

*a. Copy the provided Nutch schema from directory apache-nutch-1.0/conf to directory apache-solr-1.3.0/example/solr/conf (override the existing file).*

We want to allow Solr to create the snippets for search results, so we need to store the content in addition to indexing it:

*b. Change schema.xml so that the stored attribute of field "content" is true.*

We want to be able to tweak the relevancy of queries easily, so we'll create a new dismax request handler configuration for our use case:

*d. Open apache-solr-1.3.0/example/solr/conf/solrconfig.xml and paste the following fragment into it:*

<requestHandler name="/nutch" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">content^0.5 anchor^1.0 title^1.2</str>
    <str name="pf">content^0.5 anchor^1.5 title^1.2 site^1.5</str>
    <str name="fl">url</str>
    <str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>
    <int name="ps">100</int>
    <str name="q.alt">*:*</str>
    <str name="hl.fl">title url content</str>
    <str name="f.title.hl.fragsize">0</str>
    <str name="f.title.hl.alternateField">title</str>
    <str name="f.url.hl.fragsize">0</str>
    <str name="f.url.hl.alternateField">url</str>
    <str name="f.content.hl.fragmenter">regex</str>
  </lst>
</requestHandler>

*3. Start Solr*

cd apache-solr-1.3.0/example
java -jar start.jar

*4. Configure Nutch*

*a. Open nutch-site.xml in directory apache-nutch-1.0/conf, replace its contents with the following (we specify our crawler name, active plugins and limit maximum url count for a single host per run to 100):*

<property>
  <name>http.agent.name</name>
  <value>nutch-solr-integration</value>
</property>
<property>
  <name>generate.max.per.host</name>
  <value>100</value>
</property>
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-html|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>

*b. Open regex-urlfilter.txt in directory apache-nutch-1.0/conf, replace its content with the following:*

-^(https|telnet|file|ftp|mailto):

# skip some suffixes
-\.(swf|SWF|doc|DOC|mp3|MP3|WMV|wmv|txt|TXT|rtf|RTF|avi|AVI|m3u|M3U|flv|FLV|WAV|wav|mp4|MP4|avi|AVI|rss|RSS|xml|XML|pdf|PDF|js|JS|gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP)$

# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]

# allow urls in foofactory.fi domain
+^http:*//([a-z0-9\-A-Z]*\.)*google.it/*

# deny anything else
-.

*5. Create a seed list (the initial urls to fetch)*

mkdir urls (creates a folder 'urls')
echo "http://www.google.it/"; > urls/seed.txt

*6. Inject seed url(s) to nutch crawldb (execute in nutch directory)*

bin/nutch inject crawl/crawldb urls

AND HERE I GET THE ERROR MESSAGE about the empty path. Why, in your opinion?

thank you
alessio

On 24 February 2012 17:51, tamanjit.bin...@yahoo.co.in <tamanjit.bin...@yahoo.co.in> wrote:

> The empty path message is because nutch is unable to find a url in the url
> location that you provide.
>
> Kindly ensure there is a url there.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/nutch-and-solr-tp3765166p3773089.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Re: SIREn integration with SOLR
Hi Chitra,

You can download the distribution using the details given here: http://siren.sindice.com/download.html
The license has been changed to AGPL 3.0.
Source code is available here: https://github.com/rdelbru/SIREn/

- Anuj

On Wed, Feb 22, 2012 at 3:45 PM, chitra wrote:
> Hi,
>
> We would like to implement semantic search in our websites. We
> already have a full-text search service using SOLR. We heard that the SIREn
> plug-in for SOLR would allow us to index & query semi-structured
> data.
>
> Could any one of you provide me with more details about SIREn, its integration
> with SOLR, and how to use it with PHP?
>
> Thanks in advance...
>
> Regards
> Chitra
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/SIREn-integration-with-SOLR-tp3766056p3766056.html
> Sent from the Solr - User mailing list archive at Nabble.com.
upgrading Solr - org.apache.lucene.search.Filter and acceptDocs
I'm trying to upgrade an application I have from an old snapshot of Solr to the latest stable trunk and see that the signature of Filter.getDocIdSet has changed; specifically, there is another parameter named acceptDocs. The API says the following:

acceptDocs - Bits that represent the allowable docs to match (typically deleted docs but possibly filtering other documents)

but I'm not sure what specifically this means to my filter. How should this be used when trying to upgrade a filter?
Re: upgrading Solr - org.apache.lucene.search.Filter and acceptDocs
On Sat, Feb 25, 2012 at 3:16 PM, Jamie Johnson wrote:
> I'm trying to upgrade an application I have from an old snapshot of
> Solr to the latest stable trunk and see that the signature of
> Filter.getDocIdSet has changed; specifically, there is another parameter named
> acceptDocs. The API says the following:
>
> acceptDocs - Bits that represent the allowable docs to match
> (typically deleted docs but possibly filtering other documents)
>
> but I'm not sure what specifically this means to my filter. How
> should this be used when trying to upgrade a filter?

If a document doesn't match acceptDocs, it should not be returned by the filter.
Lucene is basically asking "what documents match your filter AND match acceptDocs".

-Yonik
lucenerevolution.com - Lucene/Solr Open Source Search Conference.
Boston May 7-10
Re: upgrading Solr - org.apache.lucene.search.Filter and acceptDocs
Just to confirm, a document that doesn't match acceptDocs should not be returned, right? I basically return a filtered doc id set and do the following:

return new FilteredDocIdSet(startingFilter.getDocIdSet(readerCtx, acceptDocs)) {
  @Override
  public boolean match(int doc) {
    // do custom stuff
  }
};

Does the FilteredDocIdSet give me only the ones that match, or is there something additional I need to do in addition to my custom match logic here? I.e. just do if(!acceptDocs.get(doc)) return false; at the top?

On Sat, Feb 25, 2012 at 3:23 PM, Yonik Seeley wrote:
> On Sat, Feb 25, 2012 at 3:16 PM, Jamie Johnson wrote:
>> I'm trying to upgrade an application I have from an old snapshot of
>> Solr to the latest stable trunk and see that the signature of
>> Filter.getDocIdSet has changed; specifically, there is another parameter named
>> acceptDocs. The API says the following:
>>
>> acceptDocs - Bits that represent the allowable docs to match
>> (typically deleted docs but possibly filtering other documents)
>>
>> but I'm not sure what specifically this means to my filter. How
>> should this be used when trying to upgrade a filter?
>
> If a document doesn't match acceptDocs, it should not be returned by the filter.
> Lucene is basically asking "what documents match your filter AND match
> acceptDocs"
>
> -Yonik
> lucenerevolution.com - Lucene/Solr Open Source Search Conference.
> Boston May 7-10
Solr 4.0 Question
I just got done reading http://www.searchworkings.org/blog/-/blogs/uwe-says%3A-is-your-reader-atomic and was specifically interested in the following line "Unfortunately, Apache Solr still uses this horrible code in a lot of places, leaving us with a major piece of work undone. Major parts of Solr’s facetting and filter caching need to be rewritten to work per atomic segment! For those implementing plugins or other components for Solr, SolrIndexSearcher exposes a “atomic view” of its underlying reader via SolrIndexSearcher.getAtomicReader()." Can someone give more details around this? Is there a JIRA to address this in Solr? I'm assuming that this is not something new, just something that can be improved?
Re: upgrading Solr - org.apache.lucene.search.Filter and acceptDocs
On Sat, Feb 25, 2012 at 3:37 PM, Jamie Johnson wrote: > I.e. just do if(!acceptDocs.get(doc)) return false; at > the top? Yep, that should do it. -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
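For reference, a minimal sketch of the pattern discussed in this thread, written against the trunk API of the time. The class and helper names (CustomFilter, matchesCustomLogic) are made up, and startingFilter stands in for whatever the existing filter already delegates to; this is an illustration, not code from the thread.

import java.io.IOException;

import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.FilteredDocIdSet;
import org.apache.lucene.util.Bits;

public class CustomFilter extends Filter {
  private final Filter startingFilter;

  public CustomFilter(Filter startingFilter) {
    this.startingFilter = startingFilter;
  }

  @Override
  public DocIdSet getDocIdSet(AtomicReaderContext readerCtx, final Bits acceptDocs)
      throws IOException {
    // Pass acceptDocs through so the wrapped filter can already skip
    // deleted (or otherwise excluded) documents.
    DocIdSet inner = startingFilter.getDocIdSet(readerCtx, acceptDocs);
    if (inner == null) {
      return null; // nothing matches in this segment
    }
    return new FilteredDocIdSet(inner) {
      @Override
      public boolean match(int doc) {
        // Reject anything outside acceptDocs first, then apply the custom logic.
        if (acceptDocs != null && !acceptDocs.get(doc)) {
          return false;
        }
        return matchesCustomLogic(doc);
      }
    };
  }

  // Placeholder for the application-specific matching ("do custom stuff").
  private boolean matchesCustomLogic(int doc) {
    return true;
  }
}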
Re: Solr 4.0 Question
On Sat, Feb 25, 2012 at 3:39 PM, Jamie Johnson wrote:
> "Unfortunately, Apache Solr still uses this horrible code in a lot of
> places, leaving us with a major piece of work undone. Major parts of
> Solr’s facetting and filter caching need to be rewritten to work per
> atomic segment! For those implementing plugins or other components for
> Solr, SolrIndexSearcher exposes a “atomic view” of its underlying
> reader via SolrIndexSearcher.getAtomicReader()."

Some of this is just a misunderstanding, and some of it is a difference of opinion.

Solr uses a top-level FieldCache entry for certain types of faceting, but it's optional. Solr can also use per-segment FieldCache entries when faceting. The reason we haven't removed the top-level FieldCache faceting is that it's faster unless you are doing near-realtime (NRT) search (due to the cost of merging terms across segments). Top level fieldcache entries are also more memory efficient for Strings as string values are not repeated across each segment.

The right approach depends on the specific use-case, and Solr will continue to strive to have faceting algorithms optimized for both NRT and non-NRT.

-Yonik
lucenerevolution.com - Lucene/Solr Open Source Search Conference.
Boston May 7-10
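As a rough illustration of picking between the two approaches from the client side: Solr exposes the choice through the facet.method parameter, which SolrJ can set directly. This is only a sketch; the collection and the facet field "category" are hypothetical.

import org.apache.solr.client.solrj.SolrQuery;

public class FacetMethodExample {
  public static void main(String[] args) {
    SolrQuery q = new SolrQuery("*:*");
    q.setFacet(true);
    q.addFacetField("category");

    // "fc"  = top-level FieldCache faceting (faster on a static index,
    //         more memory-efficient for string values).
    // "fcs" = per-segment FieldCache faceting (pays a term-merging cost at
    //         query time but avoids rebuilding a top-level cache after
    //         every near-realtime reopen).
    q.set("facet.method", "fcs");

    System.out.println(q); // prints the encoded request parameters
  }
}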
RE: Problem with SolrCloud + Zookeeper + DataImportHandler
Hi,

As you've asked: https://issues.apache.org/jira/browse/SOLR-3165

If you have any questions or need more details I can debug this problem more.

Agnieszka

> -----Original Message-----
> From: Mark Miller [mailto:markrmil...@gmail.com]
> Sent: Friday, February 24, 2012 10:11 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Problem with SolrCloud + Zookeeper + DataImportHandler
>
> The key piece is "ZkSolrResourceLoader does not support getConfigDir()"
>
> Apparently DIH is doing something that requires getting the local
> config dir path - but this is on ZK in SolrCloud mode, not the local
> filesystem.
>
> Could you make a JIRA issue for this? I could look into a work around
> depending on why DIH needs to do this.
>
> - Mark
>
> On Feb 20, 2012, at 7:28 AM, Agnieszka Kukałowicz wrote:
>
> > Hi All,
> >
> > I've recently downloaded latest solr trunk to configure solrcloud with
> > zookeeper using standard configuration from wiki:
> > http://wiki.apache.org/solr/SolrCloud.
> >
> > The problem occurred when I tried to configure DataImportHandler in
> > solrconfig.xml:
> >
> > <requestHandler name="/dataimport"
> >     class="org.apache.solr.handler.dataimport.DataImportHandler">
> >   <lst name="defaults">
> >     <str name="config">db-data-config.xml</str>
> >   </lst>
> > </requestHandler>
> >
> > After starting solr with zookeeper I've got errors:
> >
> > Feb 20, 2012 11:30:12 AM org.apache.solr.common.SolrException log
> > SEVERE: null:org.apache.solr.common.SolrException
> >    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:606)
> >    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:490)
> >    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:705)
> >    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:442)
> >    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:313)
> >    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:262)
> >    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:98)
> >    at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
> >    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
> >    at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:713)
> >    at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
> >    at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1282)
> >    at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:518)
> >    at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:499)
> >    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
> >    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
> >    at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
> >    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
> >    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
> >    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
> >    at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
> >    at org.mortbay.jetty.Server.doStart(Server.java:224)
> >    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
> >    at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:985)
> >    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >    at java.lang.reflect.Method.invoke(Method.java:597)
> >    at org.mortbay.start.Main.invokeMain(Main.java:194)
> >    at org.mortbay.start.Main.start(Main.java:534)
> >    at org.mortbay.start.Main.start(Main.java:441)
> >    at org.mortbay.start.Main.main(Main.java:119)
> > Caused by: org.apache.solr.common.SolrException: FATAL: Could not create
> > importer. DataImporter config invalid
> >    at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:120)
> >    at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:542)
> >    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:601)
> >    ... 31 more
> > Caused by: org.apache.solr.common.cloud.ZooKeeperException:
> > ZkSolrResourceLoader does not support getConfigDir() - likely, w
> >    at org.apache.solr.cloud.ZkSolrResourceLoader.getConfigDir(ZkSolrResourceLoader.java:99)
> >    at org.apache.solr.handler.dataimport.SimplePropertiesWriter.init(SimplePrope
Re: lucene operators interfering in edismax
Please backport to 3x.

On Mon, Feb 20, 2012 at 2:22 PM, Yonik Seeley wrote:
> This should be fixed in trunk by LUCENE-2566
>
> QueryParser: Unary operators +,-,! will not be treated as operators if
> they are followed by whitespace.
>
> -Yonik
> lucidimagination.com
>
> On Mon, Feb 20, 2012 at 2:09 PM, jmlucjav wrote:
>> Hi,
>>
>> I am using edismax with end user entered strings. One search was not finding
>> what appeared to be the best match. The search was:
>>
>> Sage Creek Organics - Enchanted
>>
>> If I remove the -, the doc I want is found with the best score. Turns out (I
>> think) the - is the culprit, as the best match has 'enchanted' and this makes
>> it 'NOT enchanted'.
>>
>> Is my analysis correct? I tried looking at the debug output but saw no NOT
>> entries there...
>>
>> If so, is there a standard way (any filter) to remove lucene operators from
>> user entered queries? I thought this must be something usual.
>>
>> thanks
>> javi
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/lucene-operators-interfearing-in-edismax-tp3761577p3761577.html
>> Sent from the Solr - User mailing list archive at Nabble.com.

--
Bill Bell
billnb...@gmail.com
cell 720-256-8076
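A possible client-side stopgap for 3.x, sketched here with SolrJ's ClientUtils helper (the example string comes from the thread; the regex variant only drops bare +, - or ! tokens that stand alone between words):

import org.apache.solr.client.solrj.util.ClientUtils;

public class QueryCleanupExample {
  public static void main(String[] args) {
    String raw = "Sage Creek Organics - Enchanted";

    // Option 1: escape every Lucene special character so "-" is a literal hyphen.
    String escaped = ClientUtils.escapeQueryChars(raw);

    // Option 2: only remove bare +, - or ! operators surrounded by whitespace.
    String stripped = raw.replaceAll("\\s[+\\-!]+\\s", " ");

    System.out.println(escaped);   // Sage\ Creek\ Organics\ \-\ Enchanted
    System.out.println(stripped);  // Sage Creek Organics Enchanted
  }
}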
Re: TikaLanguageIdentifierUpdateProcessorFactory(since Solr3.5.0) to be used in Solr3.3.0?
Well, you can give it a try, I don't know if anyone's done that before. And you're on your own, I haven't a clue what the results would be... Sorry I can't be more help here... Erick On Thu, Feb 23, 2012 at 10:44 PM, bing wrote: > Hi, all, > > I am using > org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory > (since Solr3.5.0) to do language detection, and it's cool. > > An issue: if I deploy Solr3.3.0, is it possible to import that factory in > Solr3.5.0 to be used in Solr3.3.0? > > Why I stick on Solr3.3.0 is because I am working on Dspace (discovery) to > call solr, and for now the highest version that Solr can be upgraded to is > 3.3.0. > > I would hope to do this while keep Dspace + Solr at the most. Say, import > that factory into Solr3.3.0, is it possible? Does any one happen to know > certain way to solve this? > > Best Regards, > Bing > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/TikaLanguageIdentifierUpdateProcessorFactory-since-Solr3-5-0-to-be-used-in-Solr3-3-0-tp3771620p3771620.html > Sent from the Solr - User mailing list archive at Nabble.com.
Re: Indexing taking so much time to complete.
You have to tell us a lot more about what you're trying to do. I can import 32G in about 20 minutes, so obviously you're doing something different than I am... Perhaps you might review: http://wiki.apache.org/solr/UsingMailingLists Best Erick On Sat, Feb 25, 2012 at 12:00 AM, Suneel wrote: > Hi All, > > I am using Apache solr 3.1 and trying to caching 50 gb records but it is > taking more then 20 hours this is very painful to update records. > > 1. Is there any way to reduce caching time or this time is ok for 50 gb > records ?. > > 2. What is the delta-import, this will be helpful for me cache only updated > record not rather then caching all records ?. > > > > Please help me in above mentioned question. > > > Thanks & Regards, > > - > Suneel Pandey > Sr. Software Developer > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Indexing-taking-so-much-time-to-complete-tp3774464p3774464.html > Sent from the Solr - User mailing list archive at Nabble.com.
Re: TermsComponent show only terms that matched query?
Jay:

I've seen this question go 'round before, but don't remember a satisfactory solution. Are you talking on a per-document basis here? If so, I vaguely remember it being possible to do something with highlighting, just counting the tags returned after highlighting.

Best
Erick

On Fri, Feb 24, 2012 at 3:31 PM, Jay Hill wrote:
> I have a situation where I want to show the term counts as is done in the
> TermsComponent, but *only* for terms that are *matched* in a query, so I
> get something returned like this (pseudo code):
>
> q=title:(golf swing)
>
> title: golf legends show how to improve your golf swing on the golf course
> ...other fields
>
> golf (3)
> swing (1)
>
> rather than getting back all of the terms in the doc.
>
> Thanks,
> -Jay
Re: TermsComponent show only terms that matched query?
I think you have to walk the term positions and offsets, look in the stored field, and find the terms that matched. Which is exactly what highlighting does. And this will only find the actual terms in the text, no synonyms. So if you search for Sempranillo and find Sempranillo in some wines and Tempranillo in others, you have to know yourself that they are synonyms. On Sat, Feb 25, 2012 at 2:54 PM, Erick Erickson wrote: > Jay: > > I've seen the this question go 'round before, but don't remember > a satisfactory solution. Are you talking on a per-document basis > here? If so, I vaguely remember it being possible to do something > with highlighting, just counting the tags returned after highlighting. > > Best > Erick > > On Fri, Feb 24, 2012 at 3:31 PM, Jay Hill wrote: >> I have a situation where I want to show the term counts as is done in the >> TermsComponent, but *only* for terms that are *matched* in a query, so I >> get something returned like this (pseudo code): >> >> q=title:(golf swing) >> >> >> title: golf legends show how to improve your golf swing on the golf course >> ...other fields >> >> >> >> golf (3) >> swing (1) >> >> >> rather than getting back all of the terms in the doc. >> >> Thanks, >> -Jay -- Lance Norskog goks...@gmail.com
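A rough sketch of that highlighting-based workaround, using the 3.x SolrJ API and assuming a stored "title" field and the default <em> highlight markup; the Solr URL and field names are placeholders, and as noted above this only counts the literal terms in the text, not synonyms:

import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class MatchedTermCounter {
  public static void main(String[] args) throws Exception {
    SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

    SolrQuery q = new SolrQuery("title:(golf swing)");
    q.setHighlight(true);
    q.set("hl.fl", "title");
    q.set("hl.fragsize", "0");   // return the whole stored field as one fragment
    q.set("hl.snippets", "5");

    QueryResponse rsp = solr.query(q);
    Pattern em = Pattern.compile("<em>(.*?)</em>");  // default highlight markup

    for (Map.Entry<String, Map<String, List<String>>> perDoc : rsp.getHighlighting().entrySet()) {
      List<String> snippets = perDoc.getValue().get("title");
      if (snippets == null) continue;
      for (String snippet : snippets) {
        Matcher m = em.matcher(snippet);
        while (m.find()) {
          // Each group(1) is one matched term occurrence for this document;
          // tally them per term to get counts like golf (3), swing (1).
          System.out.println(perDoc.getKey() + " matched: " + m.group(1));
        }
      }
    }
  }
}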
RE: Indexing taking so much time to complete.
What's your secret? OK, that question is not the kind recommended in the UsingMailingLists suggestions, so I will write again soon with a description of my data and what I am trying to do, and ask more specific questions. And I don't mean to hijack the thread, but I am in the same boat as the poster. I just started working with Solr less than two months ago, and after beginning with a completely naïve approach to indexing database contents with DataImportHandler and then making small adjustments to improve performance as I learned about them, I have gotten some smaller datasets to import in a reasonable amount of time, but the 60GB data set that I will need to index for the project I am working on would take over three days to import using the configuration that I have now. Obviously you're doing something different than I am... What things would you say have made the biggest improvement in indexing performance with the 32GB data set that you mentioned? How long do you think it would take to index that same data set if you used Solr more or less out of the box with no attempts to improve its performance? Thanks, Mike -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Saturday, February 25, 2012 2:51 PM To: solr-user@lucene.apache.org Subject: Re: Indexing taking so much time to complete. You have to tell us a lot more about what you're trying to do. I can import 32G in about 20 minutes, so obviously you're doing something different than I am... Perhaps you might review: http://wiki.apache.org/solr/UsingMailingLists Best Erick On Sat, Feb 25, 2012 at 12:00 AM, Suneel wrote: > Hi All, > > I am using Apache solr 3.1 and trying to caching 50 gb records but it > is taking more then 20 hours this is very painful to update records. > > 1. Is there any way to reduce caching time or this time is ok for 50 > gb records ?. > > 2. What is the delta-import, this will be helpful for me cache only > updated record not rather then caching all records ?. > > > > Please help me in above mentioned question. > > > Thanks & Regards, > > - > Suneel Pandey > Sr. Software Developer > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Indexing-taking-so-much-time-to-com > plete-tp3774464p3774464.html Sent from the Solr - User mailing list > archive at Nabble.com.
Re: Indexing taking so much time to complete.
Right. My situation is simple, I have a 32G dump of Wikipedia data in a big XML file. I can parse it and dump it into a (local) Solr instance at 5-7K records/second. But it's stupid-simple, just a few fields and no database involved. Much of the 32G is XML. But that serves to illustrate that the size of the data to be imported isn't much information to go on...

bq: 60GB data set that I will need to index for the project I am working on would take over three days to import using the configuration that I have now.

OK, first thing I'd do is figure out what's taking the time. Consider switching to SolrJ for your indexing process, it can make debugging things much easier. Here's a blog post:
http://www.lucidimagination.com/blog/2012/02/14/indexing-with-solrj/

When you start getting to 60G of data to import, you might want finer control over what you're doing, better error reporting, etc. as well as being better able to pinpoint where your problems are. And, you can do things like just spin through the data-retrieval part to answer the first question you need to answer, "what's taking the time?" Is it fetching the data? Sending it to Solr? Do you have Tika in here somewhere? Network latency?

If you set up the SolrJ process, you can just selectively remove steps in the process to determine what the bottleneck is and go from there.

Hope that helps
Erick

On Sat, Feb 25, 2012 at 8:55 PM, Mike O'Leary wrote:
> What's your secret?
>
> OK, that question is not the kind recommended in the UsingMailingLists
> suggestions, so I will write again soon with a description of my data and
> what I am trying to do, and ask more specific questions. And I don't mean to
> hijack the thread, but I am in the same boat as the poster.
>
> I just started working with Solr less than two months ago, and after
> beginning with a completely naïve approach to indexing database contents with
> DataImportHandler and then making small adjustments to improve performance as
> I learned about them, I have gotten some smaller datasets to import in a
> reasonable amount of time, but the 60GB data set that I will need to index
> for the project I am working on would take over three days to import using
> the configuration that I have now. Obviously you're doing something different
> than I am...
>
> What things would you say have made the biggest improvement in indexing
> performance with the 32GB data set that you mentioned? How long do you think
> it would take to index that same data set if you used Solr more or less out
> of the box with no attempts to improve its performance?
> Thanks,
> Mike
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Saturday, February 25, 2012 2:51 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Indexing taking so much time to complete.
>
> You have to tell us a lot more about what you're trying to do. I can import
> 32G in about 20 minutes, so obviously you're doing something different than I
> am...
>
> Perhaps you might review:
> http://wiki.apache.org/solr/UsingMailingLists
>
> Best
> Erick
>
> On Sat, Feb 25, 2012 at 12:00 AM, Suneel wrote:
>> Hi All,
>>
>> I am using Apache solr 3.1 and trying to caching 50 gb records but it
>> is taking more then 20 hours this is very painful to update records.
>>
>> 1. Is there any way to reduce caching time or this time is ok for 50
>> gb records ?.
>>
>> 2. What is the delta-import, this will be helpful for me cache only
>> updated record not rather then caching all records ?.
>>
>> Please help me in above mentioned question.
>>
>> Thanks & Regards,
>>
>> -
>> Suneel Pandey
>> Sr. Software Developer
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Indexing-taking-so-much-time-to-complete-tp3774464p3774464.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
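To make the suggestion concrete, here is a bare-bones sketch of a SolrJ indexing loop using the 3.x API; the Solr URL, field names, and fetchNextRow() data source are placeholders, and the point is simply that each stage can be stubbed out or timed independently to find the bottleneck:

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class SolrJIndexer {
  public static void main(String[] args) throws Exception {
    // Hypothetical URL; the queue size (1000) and thread count (4) are just
    // starting points to tune.
    SolrServer solr = new StreamingUpdateSolrServer("http://localhost:8983/solr", 1000, 4);

    List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
    Object[] row;
    while ((row = fetchNextRow()) != null) {   // stub for the real data retrieval
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", row[0]);
      doc.addField("text", row[1]);
      batch.add(doc);

      if (batch.size() >= 1000) {
        solr.add(batch);   // comment this line out to time the retrieval side alone
        batch.clear();
      }
    }
    if (!batch.isEmpty()) {
      solr.add(batch);
    }
    solr.commit();
  }

  // Placeholder for the real data source (database cursor, XML parser, ...);
  // returns null when there is nothing left to index.
  private static Object[] fetchNextRow() {
    return null;
  }
}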
Re: Solr Transaction Log Question
On Sat, Feb 25, 2012 at 11:30 PM, Jamie Johnson wrote: > How large will the transaction log grow, and how long should it be kept > around? We keep around enough logs to satisfy a minimum of 100 updates lookback. Unneeded log files are deleted automatically. When a hard commit is done, we create a new log file (since we know the normal index files have been sync'd and hence we no longer need the update log for durability). -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10