Re: Solr 4.3 Startup with Multiple Cores Hangs on "Registering Core"
Hello,

I still have this issue with Solr 4.4; removing the firstSearcher queries did make the problem go away. Note that I'm using Tomcat 7, and that if I launch an EmbeddedSolrServer from my own Java application, pointing at the same Solr configuration, the server starts fully with no hang.

What is the XML tag syntax to set spellcheck=false for the firstSearcher listener discussed above? (My untested guess is sketched after the quoted message at the bottom.)

Cheers,

/jonatan

--- HANG with Tomcat 7 (firstSearcher queries on) ---

<...>
2409 [coreLoadExecutor-3-thread-3] INFO org.apache.solr.handler.component.SpellCheckComponent – No queryConverter defined, using default converter
2409 [coreLoadExecutor-3-thread-3] INFO org.apache.solr.handler.component.QueryElevationComponent – Loading QueryElevation from: /var/lib/myapp/conf/elevate.xml
2415 [coreLoadExecutor-3-thread-3] INFO org.apache.solr.handler.ReplicationHandler – Commits will be reserved for 1
2415 [searcherExecutor-16-thread-1] INFO org.apache.solr.core.SolrCore – QuerySenderListener sending requests to Searcher@5c43ecf0 main{StandardDirectoryReader(segments_3:23 _9(4.4):C57862)}
2417 [searcherExecutor-16-thread-1] INFO org.apache.solr.core.SolrCore – [foo-20130912] webapp=null path=null params={event=firstSearcher&q=static+firstSearcher+warming+in+solrconfig.xml&distrib=false} hits=0 status=0 QTime=1
2417 [searcherExecutor-16-thread-1] INFO org.apache.solr.core.SolrCore – QuerySenderListener done.
2417 [searcherExecutor-16-thread-1] INFO org.apache.solr.handler.component.SpellCheckComponent – Loading spell index for spellchecker: default
2417 [searcherExecutor-16-thread-1] INFO org.apache.solr.handler.component.SpellCheckComponent – Loading spell index for spellchecker: wordbreak
2418 [searcherExecutor-16-thread-1] INFO org.apache.solr.core.SolrCore – [foo-20130912] Registered new searcher Searcher@5c43ecf0 main{StandardDirectoryReader(segments_3:23 _9(4.4):C57862)}
2420 [coreLoadExecutor-3-thread-3] INFO org.apache.solr.core.CoreContainer – registering core: foo-20130912

--- NO HANG EmbeddedSolrServer (firstSearcher queries on) ---

<...>
1797 [coreLoadExecutor-3-thread-1] INFO org.apache.solr.handler.component.SpellCheckComponent – No queryConverter defined, using default converter
1797 [coreLoadExecutor-3-thread-1] INFO org.apache.solr.handler.component.QueryElevationComponent – Loading QueryElevation from: /var/lib/myapp/conf/elevate.xml
1800 [coreLoadExecutor-3-thread-1] INFO org.apache.solr.handler.ReplicationHandler – Commits will be reserved for 1
1801 [searcherExecutor-15-thread-1] INFO org.apache.solr.core.SolrCore – QuerySenderListener sending requests to Searcher@27b104d7 main{StandardDirectoryReader(segments_3:23 _9(4.4):C57862)}
1801 [searcherExecutor-15-thread-1] INFO org.apache.solr.core.SolrCore – QuerySenderListener done.
1801 [searcherExecutor-15-thread-1] INFO org.apache.solr.handler.component.SpellCheckComponent – Loading spell index for spellchecker: default
1801 [coreLoadExecutor-3-thread-1] INFO org.apache.solr.core.CoreContainer – registering core: foo-20130912
1801 [searcherExecutor-15-thread-1] INFO org.apache.solr.handler.component.SpellCheckComponent – Loading spell index for spellchecker: wordbreak
1801 [searcherExecutor-15-thread-1] INFO org.apache.solr.core.SolrCore – [foo-20130912] Registered new searcher Searcher@27b104d7 main{StandardDirectoryReader(segments_3:23 _9(4.4):C57862)}

On Fri, Sep 6, 2013 at 4:29 PM, Austin Rasmussen wrote:
> : Do all of your cores have "newSearcher" event listeners configured or just
> : 2 (I'm trying to figure out if it's a timing fluke that these two are
> : stalled, or if it's something special about the configs)
>
> All of my cores have both the "newSearcher" and "firstSearcher" event
> listeners configured. (The firstSearcher actually doesn't have any queries
> configured against it, so it probably should just be removed altogether.)
>
> : Can you try removing the newSearcher listeners to confirm that that does
> : in fact make the problem go away?
>
> Removing the "newSearcher" listeners does not make the problem go away;
> however, removing the "firstSearcher" listener (even if the "newSearcher"
> listener is still configured) does make the problem go away.
>
> : With the newSearcher listeners in place, can you try setting
> : "spellcheck=false" as a query param on the newSearcher listeners you have
> : configured and see if that works around the problem?
>
> Adding the "spellcheck=false" param to the "firstSearcher" listener does
> appear to work around the problem.
>
> : Assuming it's just 2 cores using these listeners: can you reproduce this
> : problem with a simpler setup where only one of the affected cores is in use?
>
> Since it's not just these two cores, I'm not sure how to produce much of a
> simpler setup. I did attempt to limit how many cores are loaded in the
> solr.xml, and found that if I cut it down to 56, it was able to load
> successfully (without any of the above config changed).
>
> If I cut i
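My untested guess at the spellcheck=false syntax, based on this thread: add it as an extra param on each firstSearcher warming query in solrconfig.xml. The query string here is just the stock example one from the log above; adjust to your own queries:

    <listener event="firstSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst>
          <str name="q">static firstSearcher warming in solrconfig.xml</str>
          <str name="spellcheck">false</str>
        </lst>
      </arr>
    </listener>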
Solr 4.3 and SLF4j
Hi,

I've read on http://wiki.apache.org/solr/SolrLogging that Solr no longer ships with the logging jars bundled into the WAR file. For simplicity in package management I'm trying to stay with stock Ubuntu 12.04 packages for everything other than Solr (e.g. Tomcat 7), so now I'm trying to figure out what I need to install to meet the Solr logging requirements, using Ubuntu packages if at all possible. Initially I thought having 'libslf4j-java' would be enough, but that still gives me this Tomcat 7 error at startup:

May 06, 2013 1:28:00 PM org.apache.catalina.core.StandardContext filterStart
SEVERE: Exception starting filter SolrRequestFilter
org.apache.solr.common.SolrException: Could not find necessary SLF4j logging jars. If using Jetty, the SLF4j logging jars need to go in the jetty lib/ext directory. For other containers, the corresponding directory should be used. For more information, see: http://wiki.apache.org/solr/SolrLogging
        at org.apache.solr.servlet.SolrDispatchFilter.<init>(SolrDispatchFilter.java:105)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
        at java.lang.Class.newInstance0(Class.java:374)
        at java.lang.Class.newInstance(Class.java:327)
        at org.apache.catalina.core.DefaultInstanceManager.newInstance(DefaultInstanceManager.java:125)
        at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:256)
        at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:382)
        at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:103)
        at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4638)
        at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5294)
        at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
        at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:895)
        at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:871)
        at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:615)
        at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:649)
        at org.apache.catalina.startup.HostConfig$DeployDescriptor.run(HostConfig.java:1581)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
        at org.apache.solr.servlet.SolrDispatchFilter.<init>(SolrDispatchFilter.java:103)
        ... 24 more
Caused by: java.lang.ClassNotFoundException: org.slf4j.LoggerFactory
        at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1701)
        at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1546)
        ... 25 more

Is anybody testing 4.3 on Tomcat at the moment? Any help related to the Tomcat configuration etc. would be appreciated.

Cheers,

/jonatan
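In case it helps anyone hitting the same thing: my next attempt will be to skip the Ubuntu packages and copy the logging jars that ship inside the Solr distribution itself (example/lib/ext) into Tomcat's common classloader directory. A rough sketch; the Tomcat paths are my guess for the stock Ubuntu tomcat7 layout, and the jar versions are whatever ships with the Solr release:

    # slf4j-api, slf4j-log4j12, jcl-over-slf4j, jul-to-slf4j, log4j
    sudo cp solr-4.3.0/example/lib/ext/*.jar /usr/share/tomcat7/lib/
    # log4j also needs a config file on the classpath
    sudo cp solr-4.3.0/example/resources/log4j.properties /usr/share/tomcat7/lib/
    sudo service tomcat7 restart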
Re: Updating documents
On Wed, Jul 11, 2012 at 10:57 AM, Vinicius Carvalho wrote:
> Hi there.
>
> I was checking the FAQ and found that Solr does not support field updates,
> right? So I assume that in order to update a document, one should first
> retrieve it by its id, change the required field, and then update the doc
> again. But then I wonder about fields that are indexed and not stored:
> since the new document that is sent to the index does not have their
> values, would this mean we will lose them?
>
> BTW any chance we see field-level updates in 4.0 like Elasticsearch has?

I'm actually also looking at this new feature in 4.0-ALPHA:

http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/

I was wondering where the new XML tags were documented to support these
"set", "add to multi-value", etc. (my untested reading of the syntax is
sketched after the quote below).

-- jonatan

>
> Regards
>
> --
> The intuitive mind is a sacred gift and the
> rational mind is a faithful servant. We have
> created a society that honors the servant and
> has forgotten the gift.
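From piecing the blog post together, the XML form seems to hang an update attribute off the field tag. This is my untested reading; the field names here are made up for the example:

    <add>
      <doc>
        <field name="id">12345</field>
        <!-- replace the value of a single-valued field -->
        <field name="price" update="set">99</field>
        <!-- append one value to a multi-valued field -->
        <field name="tags" update="add">new_tag</field>
        <!-- increment a numeric field -->
        <field name="popularity" update="inc">1</field>
      </doc>
    </add>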
Re: Updating documents
Erick,

On Thu, Jul 12, 2012 at 11:05 AM, Erick Erickson wrote:
> Vinicius:
>
> No, fetching the document from the index, changing selected values and
> re-indexing probably won't work at all. The problem is that you only get
> _stored_ values back from Solr. So unless you've specified 'stored="true"'
> for all your fields, you can't use the doc fetched from Solr to update a
> field.
>
> The partial document update that Jonatan references also requires
> that all the fields be stored.

If my only fields with stored="false" are copyField targets (i.e. I don't
need their content to rebuild the document), are they going to be re-copied
by the partial document update?

-- jonatan

>
> Your best bet is to go back to your system-of-record for the data
> and re-index the whole document.
>
> Best
> Erick
>
> On Wed, Jul 11, 2012 at 11:30 AM, Jonatan Fournier wrote:
>> On Wed, Jul 11, 2012 at 10:57 AM, Vinicius Carvalho wrote:
>>> Hi there.
>>>
>>> I was checking the FAQ and found that Solr does not support field
>>> updates, right? ...
Re: Updating documents
Yonik,

On Thu, Jul 12, 2012 at 12:52 PM, Yonik Seeley wrote:
> On Thu, Jul 12, 2012 at 12:38 PM, Jonatan Fournier wrote:
>> On Thu, Jul 12, 2012 at 11:05 AM, Erick Erickson
>>> The partial document update that Jonatan references also requires
>>> that all the fields be stored.
>>
>> If my only fields with stored="false" are copyField targets (i.e. I don't
>> need their content to rebuild the document), are they going to be
>> re-copied by the partial document update?
>
> Correct - your setup should be fine. Only original source fields (non
> copyField targets) should have stored=true

Another question I had related to partial update...

$ ./post.sh foo.json
{"responseHeader":{"status":409,"QTime":0},"error":{"msg":"Document not found for update. id=foo","code":409}}

Is there a flag for "if the document does not exist, create it for me"? The
thing is that I don't know in advance whether the document already exists
(of course I could query first, but I have millions of entries to process:
some are new, some are updates, I don't know...).

My naive approach was to put two documents in the same request: one with
only "set" commands keyed on the unique id, and a second one with all the
"add" commands (for the multi-valued fields). A sketch of what I mean is
below. It would do the following:

1. Whether the document (with this id) exists or not, use the "set"
   commands to update/create it.
2. Second pass: I now know it exists (with the above id), so append the
   values to the multi-valued fields (none of those fields are touched by
   the initial "set" pass).

My rationale is that if the document exists, I reset some fields and then
append to the multi-valued fields (those multi-valued fields record
historical updates).

The reason I created two documents is that Solr doesn't seem happy if I
mix "set" and "add" in the same document :)

-- jonatan

>
> -Yonik
> http://lucidimagination.com
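Here is roughly what my two-document request looks like, posted as one JSON array to /update/json (the field names are made up for the example; mv_f is the multi-valued one):

    [
      {"id": "foo", "status": {"set": "active"}},
      {"id": "foo", "mv_f": {"add": "cat3"}}
    ]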
Re: Updating documents
On Fri, Jul 13, 2012 at 12:57 AM, Yonik Seeley wrote:
> On Thu, Jul 12, 2012 at 3:20 PM, Jonatan Fournier wrote:
>> Is there a flag for: if document does not exist, create it for me?
>
> Not currently, but it certainly makes sense.
> The implementation should be easy. The most difficult part is figuring
> out the best syntax to specify this.
>
> Another idea: we could possibly switch to create-if-not-exist by
> default, and use the existing optimistic concurrency mechanism to
> specify that the document should exist.
>
> So specify _version_=1 if the document should exist and _version_=0
> (the default) if you don't care.

Yes, that would be neat!

One more question related to partial document update. So far I'm able to
append to multi-valued fields and set new values on regular/multi-valued
fields. One thing I didn't find is the "remove" command: what is its JSON
syntax?

Thanks,

-- jonatan

>
> -Yonik
> http://lucidimagination.com
Re: Updating documents
On Fri, Jul 13, 2012 at 1:43 PM, Yonik Seeley wrote:
> On Fri, Jul 13, 2012 at 1:41 PM, Jonatan Fournier wrote:
>> On Fri, Jul 13, 2012 at 12:57 AM, Yonik Seeley wrote:
>>> On Thu, Jul 12, 2012 at 3:20 PM, Jonatan Fournier wrote:
>>>> Is there a flag for: if document does not exist, create it for me?
>>>
>>> Not currently, but it certainly makes sense.
>>> The implementation should be easy. The most difficult part is figuring
>>> out the best syntax to specify this.
>>>
>>> Another idea: we could possibly switch to create-if-not-exist by
>>> default, and use the existing optimistic concurrency mechanism to
>>> specify that the document should exist.
>>>
>>> So specify _version_=1 if the document should exist and _version_=0
>>> (the default) if you don't care.
>>
>> Yes, that would be neat!
>
> I've just committed this change.

Super, thanks! I assume it will end up in the 4.0 release?

>
>> One more question related to partial document update. So far I'm able
>> to append to multi-valued fields and set new values on
>> regular/multi-valued fields. One thing I didn't find is the "remove"
>> command: what is its JSON syntax?
>
> Set it to the JSON value of null.
>
> -Yonik
> http://lucidimagination.com
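So, if I got that right, removing a field from an existing document would look something like this (untested; id and field name from my earlier examples):

    {
      "add": {
        "doc": {
          "id": "12345",
          "mv_f": {"set": null}
        }
      }
    }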
Re: Updating documents
On Thu, Jul 12, 2012 at 3:20 PM, Jonatan Fournier wrote:
> Yonik,
>
> On Thu, Jul 12, 2012 at 12:52 PM, Yonik Seeley wrote:
>> On Thu, Jul 12, 2012 at 12:38 PM, Jonatan Fournier wrote:
>>> If my only fields with stored="false" are copyField targets (i.e. I
>>> don't need their content to rebuild the document), are they going to
>>> be re-copied by the partial document update?
>>
>> Correct - your setup should be fine. Only original source fields (non
>> copyField targets) should have stored=true
>
> Another question I had related to partial update...
>
> $ ./post.sh foo.json
> {"responseHeader":{"status":409,"QTime":0},"error":{"msg":"Document
> not found for update. id=foo","code":409}}
>
> Is there a flag for "if the document does not exist, create it for me"?
> ...

Probably a silly mistake on my side, but I can't seem to get the
"append/add" JSON syntax right for multi-valued fields...

On the initial creation of the document this works great:

    ...
    "mv_f": "cat1",
    "mv_f": "cat2",
    ...

But later on, when I want to append cat3 to the field by doing this:

    "mv_f": {"add": "cat3"},

I end up with something like this in the index:

    "mv_f": ["{add=cat3}"],

Obviously something is wrong with my syntax ;)

-- jonatan

>
> The reason I created two documents is that Solr doesn't seem happy if I
> mix "set" and "add" in the same document :)
>
>> -Yonik
>> http://lucidimagination.com
Importing data to Solr
Hello,

I was wondering if there are other ways to import data into Solr than
posting XML/JSON/CSV to the server URL (e.g. building the index locally).
Is the DataImportHandler only for databases?

My data is in an enormous text file that is parsed in Python; I can get
clean JSON/XML out of it if I want. The thing is that it drills down to
about 300 million "documents", so I don't want to execute 300 million HTTP
POSTs in a for loop; even with relaxed soft commits etc. it would take
weeks or months to populate the index.

I need to do this only once, on an offline server, and never add data to
the index again (i.e. it becomes a read-only instance). Is there a
temporary index configuration I could use to populate the server with
optimal add speed, and then switch the settings back to ones optimized for
a read-only instance? (A sketch of what I have in mind is below.)

Thanks!

-- jonatan
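What I have in mind for the load phase, before flipping back to read-only settings: no auto commits, a larger indexing RAM buffer, and a single explicit hard commit (and maybe an optimize) at the very end. Roughly, in solrconfig.xml; the number is a guess to tune, not a recommendation:

    <indexConfig>
      <!-- bigger RAM buffer = fewer, larger segment flushes during the load -->
      <ramBufferSizeMB>512</ramBufferSizeMB>
    </indexConfig>

    <updateHandler class="solr.DirectUpdateHandler2">
      <!-- no <autoCommit> block while loading; one explicit hard commit at
           the end makes everything durable and visible -->
    </updateHandler>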
Updating document with the Solr Java API
Hi,

What is the Java syntax to create an update document?

I was using this JSON to update/reset some fields of document 12345 (it
contains other fields; I'm only updating these):

    {
      "add": {
        "doc": {
          "id": "12345",
          "foo": {"set": null},
          "bar": {"set": "baz"}
        }
      }
    }

Now I'm trying to find the equivalent in Java (embedded server). I'm doing
this:

    SolrInputDocument solrDoc = new SolrInputDocument();
    solrDoc.addField( "id", "12345" );
    solrDoc.setField( "foo", null );
    solrDoc.setField( "bar", "baz" );
    server.add( solrDoc );

But instead of updating like with JSON, it overwrites the whole document
in the index. Am I missing something?

I also tried:

    SolrInputDocument solrDoc = new SolrInputDocument();
    solrDoc.setField( "id", "12345" );
    solrDoc.setField( "foo", null );
    solrDoc.setField( "bar", "baz" );
    server.add( solrDoc );

But it does the same thing. Interesting fact: when using setField() and
the id doesn't exist, it will still create the document, which wasn't the
case with JSON before the change Yonik committed, which I discussed with
him previously on this list (I'm still using 4.0.0-ALPHA, not trunk).

Should we expect the same behavior from the Java API and the HTTP
JSON/XML/CSV interface?

Cheers,

-- jonatan
Re: Updating document with the Solr Java API
On Tue, Jul 31, 2012 at 10:16 AM, Jonatan Fournier wrote:
> Hi,
>
> What is the Java syntax to create an update document?
>
> I was using this JSON to update/reset some fields of document 12345:
>
>     {
>       "add": {
>         "doc": {
>           "id": "12345",
>           "foo": {"set": null},
>           "bar": {"set": "baz"}
>         }
>       }
>     }
>
> Now I'm trying to find the equivalent in Java (embedded server). I'm
> doing this:
>
>     SolrInputDocument solrDoc = new SolrInputDocument();
>     solrDoc.addField( "id", "12345" );
>     solrDoc.setField( "foo", null );
>     solrDoc.setField( "bar", "baz" );
>     server.add( solrDoc );
>
> But instead of updating like with JSON, it overwrites the whole
> document in the index. Am I missing something?

Sorry, I just realized that setField/addField only manipulate the
SolrInputDocument itself; they don't set internal flags telling the
indexer to treat the SolrInputDocument differently based on whether set
or add was called... :) A sketch of what I'll try next is below.

>
> I also tried:
>
>     SolrInputDocument solrDoc = new SolrInputDocument();
>     solrDoc.setField( "id", "12345" );
>     solrDoc.setField( "foo", null );
>     solrDoc.setField( "bar", "baz" );
>     server.add( solrDoc );
>
> But it does the same thing. Interesting fact: when using setField() and
> the id doesn't exist, it will still create the document, which wasn't
> the case with JSON before the change Yonik committed (I'm still using
> 4.0.0-ALPHA, not trunk).
>
> Should we expect the same behavior from the Java API and the HTTP
> JSON/XML/CSV interface?
>
> Cheers,
>
> -- jonatan
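From what I can tell, the update commands are expressed in SolrJ by passing a Map as the field value, where the map key is the command ("set", "add", ...). My untested understanding, using my earlier example:

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.solr.common.SolrInputDocument;

    SolrInputDocument solrDoc = new SolrInputDocument();
    solrDoc.addField("id", "12345");

    // "set" with a null value should remove the field
    Map<String, Object> clearFoo = new HashMap<String, Object>();
    clearFoo.put("set", null);
    solrDoc.addField("foo", clearFoo);

    // "set" replaces the field value
    Map<String, Object> setBar = new HashMap<String, Object>();
    setBar.put("set", "baz");
    solrDoc.addField("bar", setBar);

    server.add(solrDoc);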
Index not loading
Hi,

I'm using Solr 4.0.0-ALPHA and the EmbeddedSolrServer.

Within my SolrJ application, the documents are added to the server using
the commitWithin parameter (in my case 60s). After 1 day my 125 million
documents are all added to the server and I can see 89G of index data
files. I stop my SolrJ application and load my Solr instance in Tomcat.

From the Solr admin panel for my core (collection1) I see this info:

    Last Modified:
    Num Docs: 0
    Max Doc: 0
    Version: 1
    Segment Count: 0
    Optimized: (green check)
    Current: (green check)
    Master:
      Version: 0
      Gen: 1
      Size: 88.14 GB

From the general Core Admin panel I see:

    lastModified:
    version: 1
    numDocs: 0
    maxDoc: 0
    optimized: (red circle)
    current: (green check)
    hasDeletions: (red circle)

If I query my index for *:* I get 0 results. If I trigger an optimize it
wipes ALL my data inside the index and resets it to empty. I played around
with my embedded server initially using autoCommit/softCommit and it was
working fine. Now that I've switched to commitWithin on the document add,
it always does this! I'm never able to load my index within Tomcat/Solr.

Any idea?

Cheers,

/jonathan
Re: Index not loading
Hi Erick,

On Tue, Aug 14, 2012 at 10:25 AM, Erick Erickson wrote:
> This is quite odd, it really sounds like you're not
> actually committing. So, some questions.
>
> 1> What happens if you search before you shut
> down your tomcat? Do you see docs then? If so,
> somehow you're doing soft commits and never
> doing a hard commit.

No, I'm not seeing any documents if I search for anything. As mentioned
above, Num and Max docs are 0.

As I mentioned below, my index files are not deleted when I start/restart
Tomcat, only when I send a commit/optimize command from within Tomcat.

One thing I noticed that was different in the log output from the embedded
server: when I use the solrconfig.xml autoCommit, after the delay I see
some stdout messages about committing to the index. But when relying on
commitWithin, I never see the Solr server output pause for a moment while
committing; I only see all my add-document stdout messages. Should the
behavior be the same? Or do the commit messages pass by so fast that I
don't see them?

It must be trying to do some kind of commit/merge, because when I was
monitoring the memory I could see periodic memory increases (when I assumed
it was merging), then memory decreased until the next cycle...

>
> 2> What happens if, as the last statement in your SolrJ
> program, you do a commit()?

Let me try that and come back to you. For now, here are the commands I was
using in the test scenarios:

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", someId);
    ...
    server.add(doc);  // with autoCommit <maxTime>60000</maxTime> (or the
                      // soft-commit variant) enabled in solrconfig.xml

Both of those scenarios work: when I shut down my embedded server and
restart Tomcat, all my data is indexed/committed.

or

    server.add(doc, 60000);  // no autoCommit enabled; trying to rely on
                             // the commitWithin param

>
> 3> While you're indexing, what do you see in your index
> directory? You should see multiple segments being
> created, and possibly merged, so the number of
> files should go up and down. If you only have a single
> set of files, you're somehow not doing a commit.

Yes, I do see a bunch of files being created/merged; at the end I had about
89G in many, many files.

Another thing I was playing with when trying to use commitWithin was
changing <useCompoundFile>true</useCompoundFile> and
<mergeFactor>10</mergeFactor> to reduce the number of files created. Could
that impact things?

>
> 4> Is there something really silly going on, like your
> restart scripts delete the index directory? Or you're
> using a VM that restores a blank image?

No VM, no scripts, no replication.

>
> 5> When you do restart, are there any files at all
> in your index directory?

When I restart Tomcat I do see all the same 89G of files that were created
using the embedded server; they only vanish when I force a commit or
optimize. Then it's as if my data directory didn't exist: the 2 initial
segment files are created and all the rest are deleted.

>
> I really suspect you've got some configuration problem
> here

Maybe, but other than playing with the compound-file thingy I don't have
any fancy config changes.

Cheers,

/jonathan

>
> Best
> Erick
>
> On Mon, Aug 13, 2012 at 9:11 AM, Jonatan Fournier wrote:
>> Hi,
>>
>> I'm using Solr 4.0.0-ALPHA and the EmbeddedSolrServer.
>>
>> Within my SolrJ application, the documents are added to the server
>> using the commitWithin parameter (in my case 60s). After 1 day my 125
>> million documents are all added to the server and I can see 89G of
>> index data files. I stop my SolrJ application and load my Solr
>> instance in Tomcat.
>> ...
Re: Index not loading
On Tue, Aug 14, 2012 at 11:14 AM, Jack Krupansky wrote:
> If you send a dummy document using a curl command, without the commit
> option, does it auto-commit and become visible in 1 minute?

Sending a JSON document using curl:

    {
      "add": {
        "commitWithin": 60000,
        "overwrite": false,
        "doc": {
          "id": "1",
          "type": "foo"
        }
      }
    }

This worked fine. But if I use EmbeddedSolrServer.add(doc, commitWithin),
the document doesn't show up in the search results.

From this article:

http://www.cominvent.com/2011/09/09/discover-commitwithin-in-solr/

I see there are multiple ways to specify this commitWithin option.
https://issues.apache.org/jira/browse/SOLR-2742 introduced it in the .add()
methods of SolrServer; could it be broken only there?

I will go try this syntax:

    UpdateRequest req = new UpdateRequest();
    req.add(mySolrInputDocument);
    req.setCommitWithin(10000);
    req.process(server);

Cheers,

/jonathan

>
> -- Jack Krupansky
>
> -----Original Message----- From: Jonatan Fournier
> Sent: Tuesday, August 14, 2012 11:03 AM
> To: solr-user@lucene.apache.org; erickerick...@gmail.com
> Subject: Re: Index not loading
>
> Hi Erick,
>
> On Tue, Aug 14, 2012 at 10:25 AM, Erick Erickson wrote:
>> This is quite odd, it really sounds like you're not
>> actually committing. So, some questions.
>> ...
Re: Index not loading
<...>si, _q.fnm, _p_nrm.cfs, _8_Lucene40_0.tip, _j_nrm.cfs, _q_Lucene40_0.prx, _g.si, _l.fnm, _p.fnm, _k.fdt, _k.fdx, <...> _5.fdx, _0.fdx, _8.fdx, _i.fnm, _0.fdt, segments_4, _8.fdt]
commit{dir=/mnt/data/solr/couids/data/index,segFN=segments_5,generation=5,filenames=[_5_nrm.cfe, _v.fdx, _s.fdt, _l.si, _w_Lucene40_0.prx, _k.fnm, _0_Lucene40_0.prx, _r_nrm.cfs, _m.si, _8.si, <...> _0.fdx, _8.fdx, segments_5, _0.fdt, _8.fdt]
Aug 14, 2012 1:02:59 PM org.apache.solr.core.SolrDeletionPolicy updateCommits
INFO: newest commit = 5
Aug 14, 2012 1:02:59 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush

I don't think my config is wrong: the dummy commitWithin JSON update via
curl works, and my autoCommit setup has always worked... What else could be
wrong, other than the SolrServer add in SolrJ?

Cheers,

/jonathan

On Tue, Aug 14, 2012 at 12:30 PM, Jonatan Fournier wrote:
> On Tue, Aug 14, 2012 at 11:14 AM, Jack Krupansky wrote:
>> If you send a dummy document using a curl command, without the commit
>> option, does it auto-commit and become visible in 1 minute?
>
> Sending a JSON document using curl:
>
>     {
>       "add": {
>         "commitWithin": 60000,
>         "overwrite": false,
>         "doc": {
>           "id": "1",
>           "type": "foo"
>         }
>       }
>     }
>
> This worked fine. But if I use EmbeddedSolrServer.add(doc, commitWithin),
> the document doesn't show up in the search results.
>
> I see there are multiple ways to specify this commitWithin option:
> https://issues.apache.org/jira/browse/SOLR-2742 introduced it in the
> .add() methods of SolrServer; could it be broken only there?
>
> I will go try this syntax:
>
>     UpdateRequest req = new UpdateRequest();
>     req.add(mySolrInputDocument);
>     req.setCommitWithin(10000);
>     req.process(server);
>
> Cheers,
>
> /jonathan
> ...
Re: Index not loading
On Tue, Aug 14, 2012 at 10:25 AM, Erick Erickson wrote:
> This is quite odd, it really sounds like you're not
> actually committing. So, some questions.
>
> 1> What happens if you search before you shut
> down your tomcat? Do you see docs then? If so,
> somehow you're doing soft commits and never
> doing a hard commit.
>
> 2> What happens if, as the last statement in your SolrJ
> program, you do a commit()?

When using commitWithin: if I introduce server.commit() within the data
load process, the data does get committed (I didn't reproduce with my 89G
of data...). If I shut down my EmbeddedSolrServer, restart it, and send a
commit, all the data gets wiped out too, just like on Tomcat. So I guess
there's state loss somewhere.

Cheers,

/jonathan

>
> 3> While you're indexing, what do you see in your index
> directory? You should see multiple segments being
> created, and possibly merged, so the number of
> files should go up and down. If you only have a single
> set of files, you're somehow not doing a commit.
>
> 4> Is there something really silly going on, like your
> restart scripts delete the index directory? Or you're
> using a VM that restores a blank image?
>
> 5> When you do restart, are there any files at all
> in your index directory?
>
> I really suspect you've got some configuration problem
> here
>
> Best
> Erick
>
> On Mon, Aug 13, 2012 at 9:11 AM, Jonatan Fournier wrote:
>> Hi,
>>
>> I'm using Solr 4.0.0-ALPHA and the EmbeddedSolrServer.
>>
>> Within my SolrJ application, the documents are added to the server
>> using the commitWithin parameter (in my case 60s). After 1 day my 125
>> million documents are all added to the server and I can see 89G of
>> index data files. I stop my SolrJ application and load my Solr
>> instance in Tomcat.
>> ...
Re: Index not loading
On Tue, Aug 14, 2012 at 5:37 PM, Jonatan Fournier wrote:
> On Tue, Aug 14, 2012 at 10:25 AM, Erick Erickson wrote:
>> This is quite odd, it really sounds like you're not
>> actually committing. So, some questions.
>>
>> 1> What happens if you search before you shut
>> down your tomcat? Do you see docs then? If so,
>> somehow you're doing soft commits and never
>> doing a hard commit.

Yeah, I just realized the behavior is the same as softCommit. Is that the
default for commitWithin? For now I'm working around it as sketched below.

Cheers,

/jonathan

>>
>> 2> What happens if, as the last statement in your SolrJ
>> program, you do a commit()?
>
> When using commitWithin: if I introduce server.commit() within the data
> load process, the data does get committed (I didn't reproduce with my
> 89G of data...). If I shut down my EmbeddedSolrServer, restart it, and
> send a commit, all the data gets wiped out too, just like on Tomcat. So
> I guess there's state loss somewhere.
>
> Cheers,
>
> /jonathan
>
>>
>> 3> While you're indexing, what do you see in your index
>> directory? ...
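The workaround, for anyone hitting this: keep commitWithin for visibility during the load, but finish with an explicit hard commit. A sketch; the loop body is simplified from my actual loader:

    // periodic visibility during the load via commitWithin (60s)
    for (SolrInputDocument doc : docs) {
        server.add(doc, 60000);
    }
    // explicit hard commit so the index survives a server restart
    server.commit();
    server.shutdown();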
Duplicate in copyField
Hi,

I have something strange happening (4.0-BETA). I have a title field
declared single-valued, stored, with omitNorms="true", and a copyField
whose dest is that title field. Note that I do NOT have multiValued set on
the title field, but I still end up with multiple values in it (see the
schema sketch below for what I mean):

    {
      "responseHeader":{
        "status":0,
        "QTime":371,
        "params":{
          "indent":"true",
          "wt":"json",
          "q":"domain:dyslexia-test.com"}},
      "response":{"numFound":1,"start":0,"maxScore":13.414578,"docs":[
        {
          "id":"9f13185f8134ff75cb1c6106ac5db63f",
          "foo":"bar",
          "title":["bar",
            "bar"],
          ...
        }

I made two operations on that document. First I created it, populating some
of its fields; in a second pass, I queried the document by "id", added
values to the un-populated fields, and sent the document back.

Why is there more than one value for title? At worst, shouldn't the second
operation overwrite the original value?

Cheers,

/jonathan
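For reference, my expectation was that a field can only hold several values when it is declared multiValued, something like the following (the field names and type here are examples, not my actual schema), which is why I don't get how a single-valued title ended up with two values:

    <field name="title" type="text_general" indexed="true" stored="true"
           multiValued="true" omitNorms="true"/>
    <copyField source="name" dest="title"/>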
Re: Duplicate in copyField
I didn't realize that copyField targets are implemented via multiValued; I
thought they were flat fields. What I was trying to do was to have one
common field between two different schemas, so that my GUI could use both
index sources for listing by title... I guess I will populate this field
manually from my data importer script.

Cheers,

/jonathan

On Tue, Sep 18, 2012 at 1:35 PM, Jonatan Fournier wrote:
> Hi,
>
> I have something strange happening (4.0-BETA). I have a title field
> declared single-valued, stored, with omitNorms="true", and a copyField
> whose dest is that title field. Note that I do NOT have multiValued set
> on the title field, but I still end up with multiple values in it:
>
>     {
>       "responseHeader":{
>         "status":0,
>         "QTime":371,
>         "params":{
>           "indent":"true",
>           "wt":"json",
>           "q":"domain:dyslexia-test.com"}},
>       "response":{"numFound":1,"start":0,"maxScore":13.414578,"docs":[
>         {
>           "id":"9f13185f8134ff75cb1c6106ac5db63f",
>           "foo":"bar",
>           "title":["bar",
>             "bar"],
>           ...
>         }
>
> I made two operations on that document.
>
> First I created it, populating some of its fields; in a second pass, I
> queried the document by "id", added values to the un-populated fields,
> and sent the document back.
>
> Why is there more than one value for title? At worst, shouldn't the
> second operation overwrite the original value?
>
> Cheers,
>
> /jonathan