Re: SolrIndexWriter holding reference to deleted file?
I haven't been able to get a profiler at the server yet, but I thought I might show how my code works, because it's quite different from the example in the link you provided...

public synchronized ResultItem[] search(String query) throws CorruptIndexException, IOException {
    SolrIndexSearcher searcher = new SolrIndexSearcher(solrCore.getSchema(), "MySearcher", solrCore.getIndexDir(), true);
    Hits hits = search(searcher, query);
    for (int i = 0; i < hits.length(); i++) {
        parse(hits.doc(i)); // add to result-array
    }
    searcher.close();
    // return result-array
}

private Hits search(SolrIndexSearcher searcher, String pQuery) {
    try {
        SolrQueryParser parser = new SolrQueryParser(solrCore.getSchema(), "text"); // default search field is called "text"
        Query query = parser.parse(pQuery);
        return searcher.search(query);
    }
    // catch exceptions
}

This is the code that does the searching. The searcher is passed as a parameter to the search method because it needs to stay open while I'm parsing the documents in the hits. I know I should move the searcher.close() call into a finally-block, and will do that in any case, but I doubt it will solve the problem because I've never had any exceptions in this code. Might the problem be that I'm not using SolrQueryRequest objects?

Best regards,

Yonik Seeley wrote:
>
> This is probably related to "using Solr/Lucene embeddedly".
> See the warning at the top of http://wiki.apache.org/solr/EmbeddedSolr
>
> It does sound like your SolrIndexSearcher objects aren't being closed.
> Solr (via SolrCore) doesn't rely on garbage collection to close the
> searchers (since gc unfortunately can't be triggered by running low on
> file descriptors). SolrIndexSearcher objects are reference counted and
> closed when no longer in use. This means that SolrQueryRequest
> objects must always be closed or the refcount will be off.
>
> Not sure where you could start except perhaps trying to verify the
> number of live SolrIndexSearcher objects.
>
> -Yonik
>
> On Dec 20, 2007 8:20 AM, amamare <[EMAIL PROTECTED]> wrote:
>>
>> I have an application consisting of three web applications running on
>> JBoss 1.4.2 on a Linux Red Hat server. I'm using Solr/Lucene embeddedly
>> to create and maintain a frequently updated index. Once updated, the
>> index is copied to another directory used for searching. Old index
>> files in the search directory are then deleted. The streams used to
>> copy the files are closed in finally-blocks. After a few days an
>> IOException occurs because of "too many open files". When I run the
>> Linux command
>>
>> ls -l /proc/26788/fd/
>>
>> where 26788 is JBoss' process id, it gives me a seemingly
>> ever-increasing list of deleted files (1 per update, since I optimize
>> on every update and use compound file format), marked with 'deleted' in
>> parentheses. They are all located in the search directory. From what I
>> understand this means that something still holds a reference to each
>> file, and that the file will be permanently deleted once that something
>> loses its reference to it.
>>
>> Only SolrIndexSearcher objects are in direct contact with these files
>> in the search application. The searchers are local objects in
>> search-methods, and are closed after every search operation. In theory,
>> the garbage collector should collect these objects later (though while
>> profiling other applications I've noticed that it often doesn't garbage
>> collect until the allocated memory starts running out).
>>
>> The other objects in contact with the files are the FileOutputStreams
>> used to copy them, but as stated above, these are closed in
>> finally-blocks and thus should hold no reference to the files.
>>
>> I need to get rid of the "too many open files" problem. I suspect that
>> it is related to the almost-deleted files in the proc dir, but I know
>> too little of Linux to be sure. Does the problem ring a bell to anyone,
>> or do you have any ideas as to how I can get rid of the problem?
>>
>> All help is greatly appreciated.
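For what it's worth, a minimal sketch of the pattern Yonik describes, assuming the Solr 1.2-era embedded API (RefCounted lives in org.apache.solr.util): borrow the searcher from SolrCore instead of constructing a new SolrIndexSearcher per search, and release it in a finally-block so the reference count stays balanced:

RefCounted<SolrIndexSearcher> ref = solrCore.getSearcher(); // increments the refcount
try {
    SolrIndexSearcher searcher = ref.get();
    Hits hits = search(searcher, query); // same private helper as above
    for (int i = 0; i < hits.length(); i++) {
        parse(hits.doc(i)); // add to result-array
    }
} finally {
    ref.decref(); // SolrCore closes the searcher once the last reference is released
}

Releasing the searcher reference is also what closing a SolrQueryRequest does under the hood, which is why Yonik notes that those objects must always be closed.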
Re: big perf-difference between solr-server vs. SOlrJ req.process(solrserver)
Hi Otis,

After some thought (I must have been sleeping or something) it seems that it is indeed possible to remove the 2000 product-variant fields from the index and store them in an external store. I was doubting this option before because I mistakenly thought that I would still need the 2000 stored fields in place to hold the product-variant keys for accessing the database. However, I have some way of identifying the product-variants client-side once Solr returns the products.

This does mean that the external datastore must have 1 row per product-variant. With an upper range of about 200,000 products and up to 2000 product-variants per product, this gives a maximum of 400,000,000 product-variant records in the external datastore. I really don't have a clue about possible performance given these numbers, but it sounds rather large to me, although it may sound like peanuts to you ;-). The query would be to return 10 rows based on 10 product-variant ids. Any rough guesstimates whether this sounds doable? I guess I'm just going to find out.

Thanks for helping me think out of the box!
Geert-Jan

2008/1/2, Otis Gospodnetic <[EMAIL PROTECTED]>:
>
> Maybe I'm not following your situation 100%, but it sounded like pulling
> the values of purely stored fields is the slow part. *Perhaps* using a
> non-Lucene data store just for the saved fields would be faster.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
> From: Geert-Jan Brits <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Monday, December 31, 2007 8:49:43 AM
> Subject: Re: big perf-difference between solr-server vs. SOlrJ req.process(solrserver)
>
> Hi Otis,
>
> I don't really see how this would minimize my number of fields.
> At the moment I have 1 price field (stored / indexed) and 1 multivalued
> field (stored) per product-variant. I have about 2000 product-variants.
>
> I could indeed replace each multivalued field by a single-valued field
> with an id pointing to an external store, where I get the needed fields.
> However, this would not change the number of fields in my index
> (correct?) and thus wouldn't matter for the big scanning time I'm
> seeing. Moreover, it wouldn't matter for the query time either, I guess.
>
> Thanks,
> Geert-Jan
>
> 2007/12/29, Otis Gospodnetic <[EMAIL PROTECTED]>:
> >
> > Hi Geert-Jan,
> >
> > Have you considered storing this data in an external data store and
> > not the Lucene index? In other words, use the Lucene index only to
> > index the content you need to search. Then, when you search this
> > index, just pull out the single stored fields, the unique ID for each
> > of the top N hits, and use those IDs to pull the actual content for
> > display purposes from the external store. This external store could
> > be an RDBMS, an ODBMS, a BDB, etc. I've worked with very large
> > indices where we successfully used BDBs for this purpose.
> >
> > Otis
> >
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >
> > ----- Original Message ----
> > From: Geert-Jan Brits <[EMAIL PROTECTED]>
> > To: solr-user@lucene.apache.org
> > Sent: Thursday, December 27, 2007 11:44:13 AM
> > Subject: Re: big perf-difference between solr-server vs. SOlrJ req.process(solrserver)
> >
> > Yeah, that makes sense.
> > So, all in all, could scanning all the fields and loading the 10
> > fields add up to cost about the same or even more than performing the
> > initial query? (Just making sure.)
> >
> > I am wondering if the following change to the schema would help in
> > this case:
> >
> > Current setup:
> > It's possible to have up to 2000 product-variants.
> > Each product-variant has:
> > - 1 price field (stored / indexed)
> > - 1 multivalued field which contains product-variant characteristics
> >   (stored / not indexed)
> >
> > This adds up to the 4000 fields described. Moreover, there are some
> > fields on the product level, but these would contribute just a tiny
> > bit to the overall scanning / loading costs (about 50 -stored and
> > indexed- fields in total).
> >
> > Possible new setup (only the changes):
> > - index but do not store the price field.
> > - store the price as just another one of the product-variant
> >   characteristics in the multivalued product-variant field.
> >
> > As a result this would bring the maximum number of stored fields back
> > to about 2050 from 4050, thereby roughly halving scanning / loading
> > costs while leaving the current querying costs intact.
> > Indexing costs would increase a bit.
> >
> > Would you expect the same performance gain?
> >
> > Thanks,
> > Geert-Jan
> >
> > 2007/12/27, Yonik Seeley <[EMAIL PROTECTED]>:
> > >
> > > On Dec 27, 2007 11:01 AM, Britske <[EMAIL PROTECTED]> wrote:
> > > > after inspecting solrconfig.xml I see that I already have enabled
> > > > lazy field loading by:
> > > > <enableLazyFieldLoading>true</enableLazyFieldLoading>
> > > > (I guess it was enabled by default)
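For concreteness, the proposed new setup would look roughly like this in schema.xml (a sketch only: the field names are made up for illustration, and "sfloat" is the Solr 1.2-era sortable float type):

<field name="variant_price_0001" type="sfloat" indexed="true" stored="false"/>
<field name="variant_chars_0001" type="string" indexed="false" stored="true" multiValued="true"/>

with the price repeated as one more value of the stored multivalued field, so price queries still hit the indexed field while document loading only ever touches the ~2000 multivalued fields.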
Field collapsing
Being able to collapse multiple documents into one result with Solr is a big deal for us here. Has anyone been able to get field collapsing (http://issues.apache.org/jira/browse/SOLR-236) to patch to a recent checkout of Solr? I've been unsuccessful so far in trying to modify the latest patch to work. Thanks. Doug
Re: correct escapes in csv-Update files
CSV doesn't use backslash escaping.
http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm

"This is text with a ""quoted"" string"

-Yonik

On Jan 2, 2008 8:21 AM, Michael Lackhoff <[EMAIL PROTECTED]> wrote:
> I use UpdateCSV to feed my data into SOLR and it works very well. The
> only thing I don't understand is how to properly escape the encapsulator
> and the backslash.
> An example with the default encapsulator ("):
> "This is a text with a \"quote\""
> "This gives one \ backslash"
> "This gives two backslashes before the \\\"quote\""
> "This gives an error \\"quote\""
>
> So what if I want only one backslash before the quote, e.g. the
> unescaped data looks like this:
> Text with \"funny characters
> (a real backslash before a real quote, not an escaped quote)
>
> I know this isn't common, and perhaps it would be possible to find an
> encapsulator that is very, very unlikely to be found in the data,
> but you can never be sure.
> So is there a way to correctly escape or otherwise encode all possible
> combinations of special characters?
>
> -Michael
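Applied to Michael's case — a literal backslash followed by a literal quote — the encoding (with the default " encapsulator) leaves the backslash alone and only doubles the quote:

"Text with \""funny characters"

Everything between the outer encapsulators is taken verbatim, except that "" decodes to a single ".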
Re: Leading WildCard in Query
On Dec 12, 2007 6:51 AM, Michael Kimsal <[EMAIL PROTECTED]> wrote:
> Please vote for SOLR-218. I'm not aware of any other way to accomplish
> the leading wildcard functionality that would be convenient. SOLR-218 is
> not asking that it be enabled by default, only that it be functionality
> that is exposed to SOLR admins via config.xml.

I'm actually still in favor of it being enabled by default. There are a lot
of ways to make really slow queries, and it's not Solr's job to protect
against these IMO (that's the job of the app that uses Solr). Preventing a
leading wildcard simply reduces functionality.

-Yonik
Re: Backup of a Solr index
Charlie Jackson wrote:
> Solr indexes are file-based, so there's no need to "dump" the index to a
> file.

But doesn't one first have to shut down the Solr server before copying the
index folder?

> In terms of how to create backups and move those backups to other
> servers, check out this page:
> http://wiki.apache.org/solr/CollectionDistribution

It mentions a script "abc", but I cannot find it in my Solr distribution
(nightly build)? And can one run those scripts on Windows XP?
RE: Backup of a Solr index
> But doesn't one first have to shut down the Solr server before copying
> the index folder?

If you want to copy the hard files from the data/index directory, yes,
you'll probably want to shut down the server first. You may be able to get
away with leaving the server up but stopping any index/commit operations,
but I could be wrong.

> It mentions a script "abc", but I cannot find it in my Solr distribution
> (nightly build)?

All of the collection distribution scripts can be found in src/scripts in
the nightly build if they aren't in the bin directory of the example solr
directory.

> And can one run those scripts on Windows XP?

No, unfortunately the collection distribution scripts won't work on Windows
because they use Unix filesystem trickery to operate.

-----Original Message-----
From: Jörg Kiegeland [mailto:[EMAIL PROTECTED]]
Sent: Thursday, January 03, 2008 11:00 AM
To: solr-user@lucene.apache.org
Subject: Re: Backup of a Solr index

Charlie Jackson wrote:
> Solr indexes are file-based, so there's no need to "dump" the index to a
> file.

But doesn't one first have to shut down the Solr server before copying the
index folder?

> In terms of how to create backups and move those backups to other
> servers, check out this page:
> http://wiki.apache.org/solr/CollectionDistribution

It mentions a script "abc", but I cannot find it in my Solr distribution
(nightly build)? And can one run those scripts on Windows XP?
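For Windows, a rough substitute is to copy the index directory file-by-file. This is only a sketch (the paths are hypothetical), and unlike the Unix scripts it takes no hard-link snapshot, so it must only run while no indexing or commits are happening:

import java.io.*;
import java.nio.channels.FileChannel;

public class IndexBackup {
    public static void main(String[] args) throws IOException {
        File src = new File("solr/data/index");                  // hypothetical index dir
        File dst = new File("backup/index-" + System.currentTimeMillis());
        dst.mkdirs();
        for (File f : src.listFiles()) {
            FileInputStream in = new FileInputStream(f);
            FileOutputStream out = new FileOutputStream(new File(dst, f.getName()));
            try {
                FileChannel c = in.getChannel();
                c.transferTo(0, c.size(), out.getChannel());     // bulk copy of one index file
            } finally {
                in.close();
                out.close();
            }
        }
    }
}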
Re: Field collapsing
Hi Doug,

Is the problem in applying the patch or getting it to work once it is
applied?

-Grant

On Jan 3, 2008, at 8:52 AM, Doug Steigerwald wrote:

> Being able to collapse multiple documents into one result with Solr is a
> big deal for us here. Has anyone been able to get field collapsing
> (http://issues.apache.org/jira/browse/SOLR-236) to patch to a recent
> checkout of Solr? I've been unsuccessful so far in trying to modify the
> latest patch to work. Thanks.
>
> Doug
Re: Field collapsing
I think the last patch is pre-QueryComponent infrastructure; it needs to be
transformed into a QueryComponent to work. I don't think anyone has tackled
that yet...

ryan

Doug Steigerwald wrote:
> Modifying the patch so it applies. StandardRequestHandler and
> DisMaxRequestHandler were changed a lot in mid-November, and I've been
> having a hard time figuring out where the changes should be reapplied.
>
> Doug
>
> Grant Ingersoll wrote:
>> Hi Doug,
>>
>> Is the problem in applying the patch or getting it to work once it is
>> applied?
>>
>> -Grant
>>
>> On Jan 3, 2008, at 8:52 AM, Doug Steigerwald wrote:
>>
>>> Being able to collapse multiple documents into one result with Solr is
>>> a big deal for us here. Has anyone been able to get field collapsing
>>> (http://issues.apache.org/jira/browse/SOLR-236) to patch to a recent
>>> checkout of Solr? I've been unsuccessful so far in trying to modify
>>> the latest patch to work. Thanks.
>>>
>>> Doug
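For anyone picking this up, the shape it would need to take is roughly the following. A sketch only: the class name and parameter names are hypothetical, and the actual collapsing logic from the SOLR-236 patch still has to be ported into process():

import java.io.IOException;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

public class CollapseComponent extends SearchComponent {
    public void prepare(ResponseBuilder rb) throws IOException {
        // read e.g. collapse.field / collapse.max from rb.req.getParams()
    }

    public void process(ResponseBuilder rb) throws IOException {
        // run the search, then fold hits that share a collapse-field
        // value into one result before the response is written
    }

    // SolrInfoMBean boilerplate
    public String getDescription() { return "field collapsing"; }
    public String getSource()      { return "$URL$"; }
    public String getSourceId()    { return "$Id$"; }
    public String getVersion()     { return "1.0"; }
}

The component would then be registered in solrconfig.xml and added to the handler's component list.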
Re: Performance stats for indeces with over 10MM documents
I had exactly the same thought. That query is not an information retrieval
(text search) query. It is data retrieval and would work great on a
relational database.

wunder

On 1/2/08 9:53 PM, "John Stewart" <[EMAIL PROTECTED]> wrote:

> Alex,
>
> Not to be a pain, but the response I had when looking at the query was:
> why not do this in a SQL database, which is designed precisely to process
> this sort of request at speed? I've noticed that people sometimes try to
> get Solr to act as a generalized information store -- I'm not sure that's
> what you're doing, but be aware of this pitfall.
>
> jds
>
> On Jan 3, 2008 12:52 AM, Alex Benjamen <[EMAIL PROTECTED]> wrote:
>> Mike,
>>
>> Thanks for the input, it's really valuable. Several forum users have
>> suggested using fq to separate the caching of filters, and I can
>> immediately see how this would help. I'm changing the code right now
>> and going to run some benchmarks; hopefully I'll see a big gain just
>> from that.
>>
>>> - use range queries when querying contiguous disjunctions (age:[28 TO
>>> 33] rather than what you have above).
>>
>> I actually started with the above, using an int type field, and it
>> somehow seemed slower than using explicit values, but I will certainly
>> try again.
>>
>>> - convert the expensive, heap-based age filter disjunction into a
>>> bitset created directly from the term enum
>>
>> Can you please elaborate a little more? Are you advising to use
>> fq=age:[28 TO 33], or should that simply be part of the regular query?
>> Also, what is the best "type" to use when defining age? I'm currently
>> using "text" -- should I use "int" instead? I didn't see any difference
>> with using the type "int".
>>
>> One of the issues is that the age ranges are not "pre-defined" -- they
>> can be any combination: 22-23, 22-85, 45-49, etc. I realize that
>> pre-defining age ranges would drastically improve performance, but then
>> we're greatly reducing the value of this type of search.
>>
>> Thanks,
>> Alex
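Concretely, with the filters split out and the age disjunction collapsed into a range (field names taken from Alex's example URL elsewhere in the thread), a request would look like:

q=<the free-text part>&fq=gender:f&fq=friends:y&fq=country:us&fq=photos:y&fq=age:[22 TO 23]

Each fq clause is cached independently in the filterCache, so the common gender/friends/country/photos filters get computed once and reused no matter which age range the next query asks for.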
Re: Field collapsing
Modifying the patch so it applies. StandardRequestHandler and
DisMaxRequestHandler were changed a lot in mid-November, and I've been
having a hard time figuring out where the changes should be reapplied.

Doug

Grant Ingersoll wrote:
> Hi Doug,
>
> Is the problem in applying the patch or getting it to work once it is
> applied?
>
> -Grant
>
> On Jan 3, 2008, at 8:52 AM, Doug Steigerwald wrote:
>
>> Being able to collapse multiple documents into one result with Solr is
>> a big deal for us here. Has anyone been able to get field collapsing
>> (http://issues.apache.org/jira/browse/SOLR-236) to patch to a recent
>> checkout of Solr? I've been unsuccessful so far in trying to modify the
>> latest patch to work. Thanks.
>>
>> Doug
RE: Performance stats for indeces with over 10MM documents
We currently use a relational system, and it doesn't perform. Also, even
though a lot of our queries are structured, we do combine them with text
search; for instance, there could be an additional clause which is a free
text search for a favorite TV show.

> I had exactly the same thought. That query is not an information
> retrieval (text search) query. It is data retrieval and would work great
> on a relational database.
>
> wunder
Re: Mixing adds, deletes and commit in the same message
On 3-Jan-08, at 11:38 AM, Leonardo Santagada wrote:

> I tried to put some adds and deletes in the same request to Solr but it
> didn't work. Have I done something wrong, or is this really not
> supported?

It isn't supported.

-Mike
Re: Solr RPS is painfully low
: fq=gender:f&fq=( friends:y )&fq= country:us&fq= age:(18 || 19 || 20 ||
: 21)&fq=photos:y

that would be my suggestion based on what i'm guessing your typical use
cases are ... but it's really hard to infer patterns from only a single
example URL.

the queryResultCache isn't nearly as interesting in cases like this as the
filterCache is ... your filterCache doesn't even need to be very big to
give you huge wins for the type of use cases i'm guessing you have.

-Hoss
RE: Solr RPS is painfully low
: I'm only requesting 20 rows, and I'm not specifically sorting by any
: field. Does solr automatically induce sort by default, and if so, how do
: I disable it?

default sorting is by score, which is cheap ... walter's question was
mainly to verify that you are not sorting, since explicit sorting is
expensive (we have to make guesses as to what might be causing you
problems in the absence of seeing your configs or full URLs)

-Hoss
Mixing adds, deletes and commit in the same message
I tried to put some adds and deletes in the same request to Solr but it
didn't work. Have I done something wrong, or is this really not supported?

This is one example (the XML tags were stripped by the list archive; it was
a single update message containing both <add><doc>...</doc></add> and
<delete>...</delete> commands):

document one9a10b11c12d Test Document

Thanks in advance
[]'s
--
Leonardo Santagada
Re: Mixing adds, deletes and commit in the same message
You can commit after an update command by adding a request parameter:

/update?commit=true
POST: your xml
...

ryan

Mike Klaas wrote:
> On 3-Jan-08, at 11:38 AM, Leonardo Santagada wrote:
>
>> I tried to put some adds and deletes in the same request to Solr but it
>> didn't work. Have I done something wrong, or is this really not
>> supported?
>
> It isn't supported.
>
> -Mike
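So the adds and the deletes go in separate POSTs, with the commit folded into the last one via the URL. A sketch (the field and id values are made up):

POST /update
<add>
  <doc>
    <field name="id">9</field>
    <field name="title">document one</field>
  </doc>
</add>

POST /update?commit=true
<delete><id>12</id></delete>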
Re: Field collapsing
I finally took more than 30 minutes to try and apply the patch and got it
to (mostly) work. Will try to submit it tomorrow for review if there's
interest.

Doug

Ryan McKinley wrote:
> I think the last patch is pre-QueryComponent infrastructure; it needs to
> be transformed into a QueryComponent to work. I don't think anyone has
> tackled that yet...
>
> ryan
>
> Doug Steigerwald wrote:
>> Modifying the patch so it applies. StandardRequestHandler and
>> DisMaxRequestHandler were changed a lot in mid-November, and I've been
>> having a hard time figuring out where the changes should be reapplied.
>>
>> Doug
>>
>> Grant Ingersoll wrote:
>>> Hi Doug,
>>>
>>> Is the problem in applying the patch or getting it to work once it is
>>> applied?
>>>
>>> -Grant
>>>
>>> On Jan 3, 2008, at 8:52 AM, Doug Steigerwald wrote:
>>>
>>>> Being able to collapse multiple documents into one result with Solr
>>>> is a big deal for us here. Has anyone been able to get field
>>>> collapsing (http://issues.apache.org/jira/browse/SOLR-236) to patch
>>>> to a recent checkout of Solr? I've been unsuccessful so far in trying
>>>> to modify the latest patch to work. Thanks.
>>>>
>>>> Doug
Re: Field collapsing
Excellent! Yes, there is interest.

Doug Steigerwald wrote:
> I finally took more than 30 minutes to try and apply the patch and got
> it to (mostly) work. Will try to submit it tomorrow for review if
> there's interest.
>
> Doug
>
> Ryan McKinley wrote:
>> I think the last patch is pre-QueryComponent infrastructure; it needs
>> to be transformed into a QueryComponent to work. I don't think anyone
>> has tackled that yet...
>>
>> ryan
>>
>> Doug Steigerwald wrote:
>>> Modifying the patch so it applies. StandardRequestHandler and
>>> DisMaxRequestHandler were changed a lot in mid-November, and I've been
>>> having a hard time figuring out where the changes should be reapplied.
>>>
>>> Doug
>>>
>>> Grant Ingersoll wrote:
>>>> Hi Doug,
>>>>
>>>> Is the problem in applying the patch or getting it to work once it is
>>>> applied?
>>>>
>>>> -Grant
>>>>
>>>> On Jan 3, 2008, at 8:52 AM, Doug Steigerwald wrote:
>>>>
>>>>> Being able to collapse multiple documents into one result with Solr
>>>>> is a big deal for us here. Has anyone been able to get field
>>>>> collapsing (http://issues.apache.org/jira/browse/SOLR-236) to patch
>>>>> to a recent checkout of Solr? I've been unsuccessful so far in
>>>>> trying to modify the latest patch to work. Thanks.
>>>>>
>>>>> Doug
Configure solr on tomcat with different indexes
Hello,

I have configured Solr with Tomcat for multiple webapps. This configuration
uses a common index, so now I want to configure Solr on Tomcat with a
different index per webapp. Please let me know how this is possible.

--
Thanks,
Laxmilal Menaria
http://www.chambal.com/
http://www.minalyzer.com/
http://www.bucketexplorer.com/
Re: Configure solr on tomcat with different indexes
http://wiki.apache.org/solr/SolrTomcat#head-024d7e11209030f1dbcac9974e55106abae837ac

Using different values for solr home should give you a new index for each.

ryan

Laxmilal Menaria wrote:
> Hello,
>
> I have configured Solr with Tomcat for multiple webapps. This
> configuration uses a common index, so now I want to configure Solr on
> Tomcat with a different index per webapp. Please let me know how this is
> possible.
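From that wiki page, the mechanism is one JNDI context fragment per webapp, e.g. conf/Catalina/localhost/solr1.xml (all paths here are illustrative):

<Context docBase="/opt/solr/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="/opt/solr/home1" override="true"/>
</Context>

with each solr home containing its own conf/ and data/ directories, so each webapp gets a separate index.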
Re: Configure solr on tomcat with different indexes
I have tried with solr1.xml and added a solr/home entry in it, but after
that it's not showing any results, because it searches by default in
Tomcat\solr\data\index.

LM

On 1/4/08, Ryan McKinley <[EMAIL PROTECTED]> wrote:
>
> http://wiki.apache.org/solr/SolrTomcat#head-024d7e11209030f1dbcac9974e55106abae837ac
>
> Using different values for solr home should give you a new index for
> each.
>
> ryan
>
> Laxmilal Menaria wrote:
>> Hello,
>>
>> I have configured Solr with Tomcat for multiple webapps. This
>> configuration uses a common index, so now I want to configure Solr on
>> Tomcat with a different index per webapp. Please let me know how this
>> is possible.

--
Thanks,
Laxmilal Menaria
http://www.chambal.com/
http://www.minalyzer.com/
http://www.bucketexplorer.com/
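If the new home is picked up but the index still lands under Tomcat's working directory, one thing to check (a guess, since the configs aren't shown) is the <dataDir> element in that home's conf/solrconfig.xml: the default data directory is resolved relative to the JVM's current working directory, which would explain the Tomcat\solr\data\index path. Setting it explicitly, e.g.

<dataDir>/opt/solr/home1/data</dataDir>

(path illustrative), pins each instance's index to its own home.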