Re: Tagging
On Feb 22, 2007, at 11:30 PM, Gmail Account wrote:
> I use Solr for searching and facets and love it; the performance is awesome. However, I am about to add tagging to my application and I'm having a hard time deciding if I should just keep my tags in a database for now until a better Solr solution is worked out... Does anyone know what technology some of the larger sites use for tagging? A database (MySQL, SQL Server) with denormalized cache tables everywhere, something similar to Solr/Lucene, or something else?

Simpy, Otis Gospodnetic's creation, uses Lucene. I suspect most of the others use a relational database and lots and lots of caching... especially since the others use tags but not full-text search. Simpy is special!

Erik
index browsing with solr
Hello everybody,

I'm new to this mailing list, so excuse me if my question has already been debated here (I've searched on the web and found nothing about it). I've used Solr for two weeks now, and so far it's a really neat solution. I've replaced my previous index-searcher app with Solr in my current project, but I can't find a way to substitute the browseIndex(field, startterm, numberoftermsreturned) function I've used so far. It's a very useful method and I'm sure it can be accomplished with Solr, but I can't figure out how. Lucene does it through the terms method of the IndexReader class, I think:

  abstract TermEnum terms(Term t) : Returns an enumeration of all terms after a given term.

Does an implementation of this method exist in Solr? If not, is it difficult to develop new instructions for Solr? Where must I start to do so?

Thanks!
Pierre-Yves Landron
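For reference, a minimal sketch of what a browseIndex(field, startTerm, n) helper amounts to against the raw Lucene 2.x API of that era -- the class and helper names here are made up; IndexReader.terms(Term) and TermEnum are Lucene's:

  import java.util.ArrayList;
  import java.util.List;
  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.index.TermEnum;

  // Hypothetical helper: enumerate up to n terms of one field,
  // starting at startTerm, using IndexReader.terms(Term).
  public class TermBrowser {
      public static List browse(IndexReader reader, String field,
                                String startTerm, int n) throws Exception {
          List terms = new ArrayList();  // raw types, as in 2007-era code
          TermEnum te = reader.terms(new Term(field, startTerm));
          try {
              int count = 0;
              do {
                  Term t = te.term();
                  // terms() positions at the first term >= startTerm, but the
                  // enumeration runs across fields, so stop at the field boundary.
                  if (t == null || !t.field().equals(field)) break;
                  terms.add(t.text());
                  count++;
              } while (count < n && te.next());
          } finally {
              te.close();
          }
          return terms;
      }
  }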
Re: index browsing with solr
> Does an implementation of this method exist in Solr?

I don't think so.

> If not, is it difficult to develop new instructions for Solr? Where must I start to do so?

It will be easy to add. Take a look at a simple SolrRequestHandler: http://svn.apache.org/repos/asf/lucene/solr/trunk/src/java/org/apache/solr/handler/IndexInfoRequestHandler.java This gets the IndexReader and writes out some stuff.
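A custom handler of that sort would then be registered in solrconfig.xml like any other request handler; the name and class below are hypothetical:

  <requestHandler name="/terms" class="org.example.TermBrowseRequestHandler" />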
Problem indexing
Hi,

This is related to SOLR-81. Things bomb when I try indexing with my probably misconfigured schema.xml:
- see it at http://www.krumpir.com/schema.xml
- I added a few new fieldTypes, fields, and copyFields - just the diff: http://www.krumpir.com/schema.xml-diff.txt

I've created a dictionary.xml file with the following content: application

I tried posting that, like this:

  $ java -jar post.jar http://localhost:8983/solr/update dictionary.xml

That bombed with this:

  SimplePostTool: WARNING: Unexpected response from Solr: java.lang.NullPointerException
    at org.apache.solr.schema.FieldType.storedToIndexed(FieldType.java:248)
    at org.apache.solr.update.UpdateHandler.getIndexedId(UpdateHandler.java:134)
    at org.apache.solr.update.DirectUpdateHandler2.overwriteBoth(DirectUpdateHandler2.java:380)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:236)

Does this look familiar to anyone? Does anything in that schema.xml look fishy or plain wrong?

Thanks,
Otis
Re: Problem indexing
On 2/23/07, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> Hi, This is related to SOLR-81. Things bomb when I try indexing with my probably misconfigured schema.xml:
> - see it at http://www.krumpir.com/schema.xml
> - I added a few new fieldTypes, fields, and copyFields - just the diff: http://www.krumpir.com/schema.xml-diff.txt
> I've created a dictionary.xml file with the following content: application
> I tried posting that, like this:
> $ java -jar post.jar http://localhost:8983/solr/update dictionary.xml
> That bombed with this:
> SimplePostTool: WARNING: Unexpected response from Solr: java.lang.NullPointerException
>   at org.apache.solr.schema.FieldType.storedToIndexed(FieldType.java:248)
>   at org.apache.solr.update.UpdateHandler.getIndexedId(UpdateHandler.java:134)
>   at org.apache.solr.update.DirectUpdateHandler2.overwriteBoth(DirectUpdateHandler2.java:380)
>   at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:236)
> Does this look familiar to anyone? Does anything in that schema.xml look fishy or plain wrong?

I think the document you are adding is missing the uniqueKeyField.

-Yonik
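For anyone hitting the same NPE: when the schema declares a <uniqueKey>, every posted document must include that field. A minimal well-formed add document -- the field names here are illustrative, not Otis's actual schema:

  <add>
    <doc>
      <field name="id">1</field>
      <field name="word">application</field>
    </doc>
  </add>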
Re: Problem indexing
Oh, look at that, adding the uniqueKey field (value 1) to the document took care of the bombing, nice!

Thanks,
Otis

> I tried posting that, like this:
> $ java -jar post.jar http://localhost:8983/solr/update dictionary.xml
>
> That bombed with this:
> SimplePostTool: WARNING: Unexpected response from Solr: java.lang.NullPointerException
>   at org.apache.solr.schema.FieldType.storedToIndexed(FieldType.java:248)
>   at org.apache.solr.update.UpdateHandler.getIndexedId(UpdateHandler.java:134)
>   at org.apache.solr.update.DirectUpdateHandler2.overwriteBoth(DirectUpdateHandler2.java:380)
>   at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:236)
>
> Does this look familiar to anyone? Does anything in that schema.xml look fishy or plain wrong?

I think the document you are adding is missing the uniqueKeyField

-Yonik
Re: Problem indexing
It is a bug, though. That should send an error message, not a stack trace. --wunder

On 2/23/07 10:39 AM, "Otis Gospodnetic" <[EMAIL PROTECTED]> wrote:
> Oh, look at that, adding the uniqueKey field (value 1) to the document took care of the bombing, nice!
>
> Thanks,
> Otis
>
>> I tried posting that, like this:
>> $ java -jar post.jar http://localhost:8983/solr/update dictionary.xml
>>
>> That bombed with this:
>> SimplePostTool: WARNING: Unexpected response from Solr: java.lang.NullPointerException
>>   at org.apache.solr.schema.FieldType.storedToIndexed(FieldType.java:248)
>>   at org.apache.solr.update.UpdateHandler.getIndexedId(UpdateHandler.java:134)
>>   at org.apache.solr.update.DirectUpdateHandler2.overwriteBoth(DirectUpdateHandler2.java:380)
>>   at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:236)
>>
>> Does this look familiar to anyone? Does anything in that schema.xml look fishy or plain wrong?
>
> I think the document you are adding is missing the uniqueKeyField
>
> -Yonik
Re: Problem indexing
: It is a bug, though. That should send an error message, not a
: stack trace. --wunder

I opened SOLR-172 to track getting a better exception than an NPE in this case, but that *is* an error message being returned to the client; the message just happens to be a stack trace ... SOLR-141 should hopefully make the way errors get reported more uniform (if/when I/we ever get around to tackling it).

-Hoss
Re: index browsing with solr
On 2/23/07, Pierre-Yves LANDRON <[EMAIL PROTECTED]> wrote:
> I've used Solr for two weeks now, and so far it's a really neat solution. I've replaced my previous index-searcher app with Solr in my current project, but I can't find a way to substitute the browseIndex(field, startterm, numberoftermsreturned) function I've used so far. It's a very useful method and I'm sure it can be accomplished with Solr, but I can't figure out how. Lucene does it through the terms method of the IndexReader class, I think:
> abstract TermEnum terms(Term t) : Returns an enumeration of all terms after a given term.
> Does an implementation of this method exist in Solr?

You can get this functionality from the current faceting implementation; the downside is that it will be slower.

-Yonik
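To sketch Yonik's suggestion concretely: on a single-valued field, a range query restricted to terms at or after the start term, plus a facet on the same field in index order, approximates browseIndex. The parameter names are as I recall the simple faceting of that era, and myfield/application are placeholders -- verify against your version:

  http://localhost:8983/solr/select?q=myfield:[application TO *]&rows=0&facet=true&facet.field=myfield&facet.limit=20&facet.zeros=false&facet.sort=false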
WordNet Ontologies
Hi: Does Solr support ontologies somehow? Has it been tried? Any tips on how I should go about doing so? Thanks. Regards
lots of inserts very fast, out of heap or file descs
I'm trying to add lots of documents at once (hundreds of thousands) in a loop. I don't need these docs to appear as results until I'm done, though.

For a simple test, I call the post.sh script in a loop with the same moderately sized xml file. This adds a 20K doc and then commits. Repeat hundreds of thousands of times.

This works fine for a while, but eventually (only 10K docs in or so) the Solr instance starts taking longer and longer to respond to my adds (I print out the curl time; near the end an add takes 10s), and the web server (Resin 3.0) eventually dies with "out of heap space" (my max heap is 1GB on a 4GB machine). I also see the "(Too many open files in system)" stack trace coming from Lucene's SegmentReader during this test. My fs.file-max was 361990, which I bumped up to 2m, but I don't know how/why Solr/Lucene would open that many.

My question is about best practices for this sort of "bulk add." Since insert time is not a concern, I have some leeway. Should I commit after every add? Should I optimize every so many commits? Is there some reaper on a thread or timer that I should let breathe?
Re: lots of inserts very fast, out of heap or file descs
On 2/23/07, Brian Whitman <[EMAIL PROTECTED]> wrote: I'm trying to add lots of documents at once (hundreds of thousands) in a loop. I don't need these docs to appear as results until I'm done, though. For a simple test, I call the post.sh script in a loop with the same moderately sized xml file. This adds a 20K doc and then commits. Repeat hundreds of thousands of times. Try not committing so often (perhaps until you are done). Don't use post.sh, or modify it to remove the commit. -Yonik
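A sketch of that modification, assuming the stock post.sh loops curl over its arguments and commits after each run (treat the stock script's exact contents as an assumption):

  #!/bin/sh
  # Post many files, commit once at the end instead of per file.
  URL=http://localhost:8983/solr/update
  for f in "$@"; do
    echo "Posting $f"
    curl $URL --data-binary @"$f" -H 'Content-type:text/xml; charset=utf-8'
    echo
  done
  # single commit after the whole batch
  curl $URL --data-binary '<commit/>' -H 'Content-type:text/xml; charset=utf-8'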
Re: lots of inserts very fast, out of heap or file descs
On 2/23/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On 2/23/07, Brian Whitman <[EMAIL PROTECTED]> wrote:
> > I'm trying to add lots of documents at once (hundreds of thousands) in a loop. I don't need these docs to appear as results until I'm done, though.
> > For a simple test, I call the post.sh script in a loop with the same moderately sized xml file. This adds a 20K doc and then commits. Repeat hundreds of thousands of times.
> Try not committing so often (perhaps until you are done). Don't use post.sh, or modify it to remove the commit.

Part of the problem might be repeatedly inserting the same doc over and over again -- that is an odd pattern of deletes, which might be triggering a bad performance case on the Lucene or Solr side (I'm assuming the doc has a unique key).

-Mike
Re: lots of inserts very fast, out of heap or file descs
> Try not committing so often (perhaps until you are done). Don't use post.sh, or modify it to remove the commit.

OK, I modified it not to commit after each add, and I also realized I had SOLR-126 (autocommit) on, which I disabled. Is there a rule of thumb on when to commit / optimize?

> Part of the problem might be repeatedly inserting the same doc over and over again -- that is an odd pattern of deletes, which might be triggering a bad performance case on the lucene or solr side (I'm assuming the doc has a unique key).

Could be, but the same issue occurs on 400K unique docs. I made the post.sh test case to see what exactly the issue is. I've discovered something wonky with commits and open files...

If either autoCommit is on, or I commit after every add, the number of open file descriptors from Lucene goes way up and does not come back down. I just ran

  # lsof | grep solr | wc -l

after adding 1000 docs and got 125265 open fdescs. If I stop adding docs this count does not go down -- it does not go down until I restart Solr. This would be the cause of my too-many-open-files problem. Turning off autocommit / not committing after every add keeps this count steady at 100-200. The files are all of this type:

  java 32254 bwhitman 3654r REG 8,1 12 15417767 /home/bwhitman/solr/working/data/index/_86u.nrm (deleted)
  java 32254 bwhitman 3655r REG 8,1 42024 15417813 /home/bwhitman/solr/working/data/index/_86t.fdt (deleted)
  java 32254 bwhitman 3656r REG 8,1 16 15417814 /home/bwhitman/solr/working/data/index/_86t.fdx (deleted)
  java 32254 bwhitman 3657r REG 8,1 27420 15417817 /home/bwhitman/solr/working/data/index/_86t.tis (deleted)
  java 32254 bwhitman 3658r REG 8,1 368 15417818 /home/bwhitman/solr/working/data/index/_86t.tii (deleted)
  java 32254 bwhitman 3659r REG 8,1 7652 15417815 /home/bwhitman/solr/working/data/index/_86t.frq (deleted)
  java 32254 bwhitman 3660r REG 8,1 24860 15417816 /home/bwhitman/solr/working/data/index/_86t.prx (deleted)
  java 32254 bwhitman 3661r REG 8,1 20 15417819 /home/bwhitman/solr/working/data/index/_86t.nrm (deleted)
  java 32254 bwhitman 3662r REG 8,1 20 15417819 /home/bwhitman/solr/working/data/index/_86t.nrm (deleted)
  java 32254 bwhitman 3663r REG 8,1 20 15417819 /home/bwhitman/solr/working/data/index/_86t.nrm (deleted)
  java 32254 bwhitman 3664r REG 8,1 20 15417819 /home/bwhitman/solr/working/data/index/_86t.nrm (deleted)
  java 32254 bwhitman 3665r REG 8,1 20 15417819 /home/bwhitman/solr/working/data/index/_86t.nrm (deleted)
  java 32254 bwhitman 3666r REG 8,1 20 15417819 /home/bwhitman/solr/working/data/index/_86t.nrm (deleted)
  java 32254 bwhitman 3667r REG 8,1 20 15417819 /home/bwhitman/solr/working/data/index/_86t.nrm (deleted)
  java 32254 bwhitman 3668r REG 8,1 20 15417819 /home/bwhitman/solr/working/data/index/_86t.nrm (deleted)
  java 32254 bwhitman 3669r REG 8,1 21012 15417669 /home/bwhitman/solr/working/data/index/_85y.fdt
  java 32254 bwhitman 3670r REG 8,1 8 15417670 /home/bwhitman/solr/working/data/index/_85y.fdx
  java 32254 bwhitman 3671r REG 8,1 46736 15417673 /home/bwhitman/solr/working/data/index/_85y.tis
  java 32254 bwhitman 3672r REG 8,1 503 15417674 /home/bwhitman/solr/working/data/index/_85y.tii
  java 32254 bwhitman 3673r REG 8,1 43936224 15417671 /home/bwhitman/solr/working/data/index/_85y.frq
  java 32254 bwhitman 3674r REG 8,1 12430 15417672 /home/bwhitman/solr/working/data/index/_85y.prx
  java 32254 bwhitman 3675r REG 8,1 80004 15417675 /home/bwhitman/solr/working/data/index/_85y.nrm
  java 32254 bwhitman 3676r REG 8,1 80004 15417675 /home/bwhitman/solr/working/data/index/_85y.nrm
  java 32254 bwhitman 3677r REG 8,1 80004 15417675 /home/bwhitman/solr/working/data/index/_85y.nrm
  java 32254 bwhitman 3678r REG 8,1 80004 15417675 /home/bwhitman/solr/working/data/index/_85y.nrm
  java 32254 bwhitman 3679r REG 8,1 80004 15417675 /home/bwhitman/solr/working/data/index/_85y.nrm
  java 32254 bwhitman 3680r REG 8,1 80004 15417675 /home/bwhitman/solr/working/data/index/_85y.nrm
  java 32254 bwhitman 3681r REG 8,1 80004 15417675 /home/bwhitman/solr/working/data/index/_85y.nrm
  java 32254 bwhitman 3682r REG 8,1 80004 15417675 /home/bwhitman/solr/working/d
Re: WordNet Ontologies
On Feb 23, 2007, at 5:33 PM, rubdabadub wrote:
> Does Solr support ontologies somehow? Has it been tried? Any tips on how I should go about doing so?

What are you wanting to do, exactly?

Erik
Re: lots of inserts very fast, out of heap or file descs
On 2/23/07, Brian Whitman <[EMAIL PROTECTED]> wrote:
> OK, I modified it not to commit after each add, and I also realized I had SOLR-126 (autocommit) on, which I disabled. Is there a rule of thumb on when to commit / optimize?

There is a map entry (UniqueKey->Integer) per document added/deleted, and that's really the only in-memory state that is kept. So you should be good with at least >100K docs.

> If either autoCommit is on, or I commit after every add, the number of open file descriptors from Lucene goes way up and does not come back down.

Do you have any warming configured? Too many searchers trying to initialize FieldCache entries can blow out the memory, causing most of the CPU to be consumed by the garbage collector.

> I just ran # lsof | grep solr | wc -l after adding 1000 docs and got 125265 open fdescs. If I stop adding docs this count does not go down

Is there detectable activity going on (like CPU usage)? Does the admin page list all of these open searchers (check the statistics page under "CORE")?

> -- it does not go down until I restart Solr. This would be the cause of my too-many-open-files problem. Turning off autocommit / not committing after every add keeps this count steady at 100-200. The files are all of this type: [...] Bug or feature?

If the searchers holding these index files open are still working, then this is a problem, but not exactly a bug. If not, you may have hit a new bug in searcher synchronization. A workaround is to limit the number of warming searchers (see maxWarmingSearchers in solrconfig.xml).

-Yonik
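For reference, the knob Yonik mentions is a top-level element in solrconfig.xml; a minimal sketch, with the value 2 as illustration only:

  <!-- Cap concurrent warming searchers; commits that would exceed
       the cap fail instead of stacking up open index readers. -->
  <maxWarmingSearchers>2</maxWarmingSearchers>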
Re: lots of inserts very fast, out of heap or file descs
On Feb 23, 2007, at 8:31 PM, Yonik Seeley wrote:
>> -- it does not go down until I restart Solr. This would be the cause of my too-many-open-files problem. Turning off autocommit / not committing after every add keeps this count steady at 100-200. The files are all of this type: [...] Bug or feature?
> If the searchers holding these index files open are still working, then this is a problem, but not exactly a bug. If not, you may have hit a new bug in searcher synchronization.

It doesn't look like it. I hope I'm not getting a reputation on here for "discovering" bugs that seem to be my own fault; you'd all laugh if you knew how much time I wasted before posting about it this evening...

But I just narrowed this down to a bad line in my solrconfig.xml. The one I was using said this for some reason: class="solr.XmlUpdateRequestHandler" ... Changing my line to the trunk line fixed the fdesc problem. The confounding thing to me is that the Solr install worked fine otherwise. I don't know why removing the /xml path would leave a ton of files open while everything else works OK.

If you want to reproduce it:
1) Download trunk/nightly
2) Change line 347 of example/solr/conf/solrconfig.xml (the XmlUpdateRequestHandler mapping)
3) java -jar start.jar
4) Run post.sh a bunch of times on the same xml file (in a shell script or whatever)
5) After a few seconds/minutes jetty will crash with "too many open files"

Now, to see if this also caused my heap overflow problems. Thanks Mike and Yonik...
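For comparison, the stock mapping in the trunk example solrconfig.xml of that era looked like this (quoted from memory -- verify against your checkout):

  <requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />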
Re: lots of inserts very fast, out of heap or file descs
It sounds like we may have a very bad bug in the XmlUpdateRequestHandler.

To clarify for people who may not know: the long-standing "/update" URL has historically been handled by a custom servlet; recently some of that code was refactored into a RequestHandler, along with a new dispatcher for RequestHandlers that works based on path mapping -- the goal being to allow more customizable update processing and to start accepting updates in a variety of input formats ... if XmlUpdateRequestHandler is mapped to the name "/update" it intercepts requests to the legacy update servlet, and should have functioned exactly the same way.

Based on Brian's email, it sounds like it didn't work in *exactly* the same way, because it caused some file-descriptor leaks (and possibly some memory leaks).

Hopefully Ryan will be a rock star and spot the problem immediately -- but I'll try to look into it later this weekend.

: Date: Fri, 23 Feb 2007 22:33:10 -0500
: From: Brian Whitman <[EMAIL PROTECTED]>
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Re: lots of inserts very fast, out of heap or file descs
:
: It doesn't look like it. I hope I'm not getting a reputation on here
: for "discovering" bugs that seem to be my own fault; you'd all laugh
: if you knew how much time I wasted before posting about it this
: evening...
:
: But I just narrowed this down to a bad line in my solrconfig.xml.
:
: The one I was using said this for some reason:
: class="solr.XmlUpdateRequestHandler" ...
:
: If you want to reproduce it:
: 1) Download trunk/nightly
: 2) Change line 347 of example/solr/conf/solrconfig.xml (the
: XmlUpdateRequestHandler mapping)
: 3) java -jar start.jar
: 4) Run post.sh a bunch of times on the same xml file (in a shell
: script or whatever)
: 5) After a few seconds/minutes jetty will crash with "too many open
: files"
:
: Now, to see if this also caused my heap overflow problems. Thanks
: Mike and Yonik...

-Hoss