Re: Solr 4.3 Startup with Multiple Cores Hangs on "Registering Core"

2013-10-18 Thread Jonatan Fournier
Hello,

I still have this issue with Solr 4.4; removing the firstSearcher queries did
make the problem go away.

Note that I'm using Tomcat 7, and that if I use my own Java application
launching an EmbeddedSolrServer pointing to the same Solr configuration,
the server starts fully with no hang.

What is the XML tag syntax for the spellcheck=false firstSearcher setting
discussed above?
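
(For reference, a sketch of what such a listener might look like in
solrconfig.xml; the query string matches the warming query in the log below,
the spellcheck parameter is the workaround from the earlier thread, and the
exact layout is an assumption:)

<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">static firstSearcher warming in solrconfig.xml</str>
      <str name="spellcheck">false</str>
    </lst>
  </arr>
</listener>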

Cheers,

/jonatan

--- HANG with Tomcat 7 (firstSearcher queries on) ---
<...>
2409 [coreLoadExecutor-3-thread-3] INFO
 org.apache.solr.handler.component.SpellCheckComponent  – No queryConverter
defined, using default converter
2409 [coreLoadExecutor-3-thread-3] INFO
 org.apache.solr.handler.component.QueryElevationComponent  – Loading
QueryElevation from: /var/lib/myapp/conf/elevate.xml
2415 [coreLoadExecutor-3-thread-3] INFO
 org.apache.solr.handler.ReplicationHandler  – Commits will be reserved for
 1
2415 [searcherExecutor-16-thread-1] INFO  org.apache.solr.core.SolrCore  –
QuerySenderListener sending requests to
Searcher@5c43ecf0main{StandardDirectoryReader(segments_3:23
_9(4.4):C57862)}
2417 [searcherExecutor-16-thread-1] INFO  org.apache.solr.core.SolrCore  –
[foo-20130912] webapp=null path=null
params={event=firstSearcher&q=static+firstSearcher+warming+in+solrconfig.xml&distrib=false}
hits=0 status=0 QTime=1
2417 [searcherExecutor-16-thread-1] INFO  org.apache.solr.core.SolrCore  –
QuerySenderListener done.
2417 [searcherExecutor-16-thread-1] INFO
 org.apache.solr.handler.component.SpellCheckComponent  – Loading spell
index for spellchecker: default
2417 [searcherExecutor-16-thread-1] INFO
 org.apache.solr.handler.component.SpellCheckComponent  – Loading spell
index for spellchecker: wordbreak
2418 [searcherExecutor-16-thread-1] INFO  org.apache.solr.core.SolrCore  –
[foo-20130912] Registered new searcher
Searcher@5c43ecf0main{StandardDirectoryReader(segments_3:23
_9(4.4):C57862)}
2420 [coreLoadExecutor-3-thread-3] INFO  org.apache.solr.core.CoreContainer
 – registering core: foo-20130912

--- NO HANG EmbeddedSolrServer (firstSearcher queries on) ---
<...>
1797 [coreLoadExecutor-3-thread-1] INFO
 org.apache.solr.handler.component.SpellCheckComponent  – No queryConverter
defined, using default converter
1797 [coreLoadExecutor-3-thread-1] INFO
 org.apache.solr.handler.component.QueryElevationComponent  – Loading
QueryElevation from: /var/lib/myapp/conf/elevate.xml
1800 [coreLoadExecutor-3-thread-1] INFO
 org.apache.solr.handler.ReplicationHandler  – Commits will be reserved for
 1
1801 [searcherExecutor-15-thread-1] INFO  org.apache.solr.core.SolrCore  –
QuerySenderListener sending requests to
Searcher@27b104d7main{StandardDirectoryReader(segments_3:23
_9(4.4):C57862)}
1801 [searcherExecutor-15-thread-1] INFO  org.apache.solr.core.SolrCore  –
QuerySenderListener done.
1801 [searcherExecutor-15-thread-1] INFO
 org.apache.solr.handler.component.SpellCheckComponent  – Loading spell
index for spellchecker: default
1801 [coreLoadExecutor-3-thread-1] INFO  org.apache.solr.core.CoreContainer
 – registering core: foo-20130912
1801 [searcherExecutor-15-thread-1] INFO
 org.apache.solr.handler.component.SpellCheckComponent  – Loading spell
index for spellchecker: wordbreak
1801 [searcherExecutor-15-thread-1] INFO  org.apache.solr.core.SolrCore  –
[foo-20130912] Registered new searcher
Searcher@27b104d7main{StandardDirectoryReader(segments_3:23
_9(4.4):C57862)}


On Fri, Sep 6, 2013 at 4:29 PM, Austin Rasmussen wrote:

> : Do all of your cores have "newSearcher" event listeners configured or just
> : 2 (I'm trying to figure out if it's a timing fluke that these two are
> stalled, or if it's something special about the configs)
>
> All of my cores have both the "newSearcher" and "firstSearcher" event
> listeners configured. (The firstSearcher actually doesn't have any queries
> configured against it, so it probably should just be removed altogether)
>
> : Can you try removing the newSearcher listeners to confirm that that does
> in fact make the problem go away?
>
> Removing the "newSearcher" listeners does not make the problem go away;
> however, removing the "firstSearcher" listener (even if the "newSearcher"
> listener is still configured) does make the problem go away.
>
> : With the newSearcher listeners in place, can you try setting
> : "spellcheck=false" as a query param on the newSearcher listeners you have
> : configured and see if that works around the problem?
>
> Adding the "spellcheck=false" param to the "firstSearcher" listener does
> appear to work around the problem.
>
> : Assuming it's just 2 cores using these listeners: can you reproduce this
> : problem with a simpler setup where only one of the affected cores is in use?
>
> Since it's not just these two cores, I'm not sure how to produce much of a
> simpler setup.  I did attempt to limit how many cores are loaded in the
> solr.xml, and found that if I cut it down to 56, it was able to load
> successfully (without any of the above config changes).
>
> If I cut i

Solr 4.3 and SLF4j

2013-05-06 Thread Jonatan Fournier
Hi,

I've read from http://wiki.apache.org/solr/SolrLogging that Solr no longer
ships with Logging jars bundled into the WAR file.

For simplicity in package management, other than Solr itself, I'm trying to
stay with stock packages from Ubuntu 12.04 (e.g. Tomcat 7 etc.).

Now I'm trying to find out what I need to install to meet the Solr
logging requirements, using Ubuntu packages if at all possible.

Initially I thought having 'libslf4j-java' would be enough, but that still
gives me this Tomcat 7 error at startup:

May 06, 2013 1:28:00 PM org.apache.catalina.core.StandardContext filterStart
SEVERE: Exception starting filter SolrRequestFilter
org.apache.solr.common.SolrException: Could not find necessary SLF4j
logging jars. If using Jetty, the SLF4j logging jars need to go in the
jetty lib/ext directory. For other containers, the corresponding directory
should be used. For more information, see:
http://wiki.apache.org/solr/SolrLogging
at
org.apache.solr.servlet.SolrDispatchFilter.(SolrDispatchFilter.java:105)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
at java.lang.Class.newInstance0(Class.java:374)
at java.lang.Class.newInstance(Class.java:327)
at
org.apache.catalina.core.DefaultInstanceManager.newInstance(DefaultInstanceManager.java:125)
at
org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:256)
at
org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:382)
at
org.apache.catalina.core.ApplicationFilterConfig.(ApplicationFilterConfig.java:103)
at
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4638)
at
org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5294)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at
org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:895)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:871)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:615)
at
org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:649)
at
org.apache.catalina.startup.HostConfig$DeployDescriptor.run(HostConfig.java:1581)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
at
org.apache.solr.servlet.SolrDispatchFilter.(SolrDispatchFilter.java:103)
... 24 more
Caused by: java.lang.ClassNotFoundException: org.slf4j.LoggerFactory
at
org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1701)
at
org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1546)
... 25 more

Is anybody testing 4.3 on Tomcat at the moment? Any help related to Tomcat
configuration etc. would be appreciated.
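
(For reference, a sketch of the usual workaround, assuming the logging jars
shipped in the Solr 4.3 download are used instead of the Ubuntu packages;
paths and file locations are illustrative for a stock Tomcat 7 install:)

# copy the SLF4J/Log4j jars that ship with Solr into Tomcat's shared lib
cp solr-4.3.0/example/lib/ext/*.jar /usr/share/tomcat7/lib/
# and provide a log4j.properties alongside them
cp solr-4.3.0/example/resources/log4j.properties /usr/share/tomcat7/lib/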

Cheers,

/jonatan


Re: Updating documents

2012-07-11 Thread Jonatan Fournier
On Wed, Jul 11, 2012 at 10:57 AM, Vinicius Carvalho
 wrote:
> Hi there.
>
> I was checking the FAQ and found that Solr does not support field updates,
> right? So I assume that in order to update a document, one should first
> retrieve it by its id, then change the required field and update the doc
> again. But then I wonder about fields that are indexed and not stored:
> since the new document that is sent to the index does not have their values,
> would this mean we will lose them?
>
> BTW, any chance we'll see field-level updates in 4.0 like Elasticsearch has?

I'm actually also looking at this new feature in 4.0-ALPHA:

http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/

I was wondering where the new XML tags for "set", "add to multi-value", etc.
are documented.

--
jonatan

>
> Regards
>
> --
> The intuitive mind is a sacred gift and the
> rational mind is a faithful servant. We have
> created a society that honors the servant and
> has forgotten the gift.


Re: Updating documents

2012-07-12 Thread Jonatan Fournier
Erick,

On Thu, Jul 12, 2012 at 11:05 AM, Erick Erickson
 wrote:
> Vinicius:
>
> No, fetching the document from the index, changing selected values and
> re-indexing probably
> won't work at all. The problem is that you only get _stored_ values
> back from Solr. So unless
> you've specified 'stored="true" ' for all your fields, you can't use
> the doc fetched from Solr to
> update a field.
>
> The partial documents update that Jonatan references also requires
> that all the fields be stored.

If my only fields with stored="false" are copyField targets (i.e. I don't need
their content to rebuild the document), are they going to be re-copied
with the partial document update?

--
jonatan

>
> Your best bet is to go back to your system-of-record for the data
> and re-index the whole document.
>
> Best
> Erick
>
> On Wed, Jul 11, 2012 at 11:30 AM, Jonatan Fournier
>  wrote:
>> On Wed, Jul 11, 2012 at 10:57 AM, Vinicius Carvalho
>>  wrote:
>>> Hi there.
>>>
>>> I was checking the faq and found that solr does not support field updates
>>> right. So I assume that in order to update a document, one should first
>>> retrieve it by its Id and then change the required field and update the doc
>>> again. But then I wonder about fields that are indexed and not stored,
>>> since the new document that is sent to the index does not have the values,
>>> would this mean we will loose them?
>>>
>>> BTW any chances we see field level updates on 4.0 like elastic search has?
>>
>> I'm actually also looking a this new feature in 4.0-ALPHA:
>>
>> http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/
>>
>> I was wondering where the new xml tags where documented to support
>> these "set", "add to multi-value" etc.
>>
>> --
>> jonatan
>>
>>>
>>> Regards
>>>
>>> --
>>> The intuitive mind is a sacred gift and the
>>> rational mind is a faithful servant. We have
>>> created a society that honors the servant and
>>> has forgotten the gift.


Re: Updating documents

2012-07-12 Thread Jonatan Fournier
Yonik,

On Thu, Jul 12, 2012 at 12:52 PM, Yonik Seeley
 wrote:
> On Thu, Jul 12, 2012 at 12:38 PM, Jonatan Fournier
>  wrote:
>> On Thu, Jul 12, 2012 at 11:05 AM, Erick Erickson
>>> The partial documents update that Jonatan references also requires
>>> that all the fields be stored.
>>
>> If my only fields with stored="false" are copyField (e.g. I don't need
>> their content to rebuild the document), are they gonna be re-copied
>> with the partial document update?
>
> Correct - your setup should be fine.  Only original source fields (non
> copyField targets) should have stored=true

Another question I had related to partial update...

$ ./post.sh foo.json
{"responseHeader":{"status":409,"QTime":0},"error":{"msg":"Document
not found for update.  id=foo","code":409}}

Is there a flag for: if the document does not exist, create it for me? The
thing is that I don't know in advance if the document already exists
(of course I could query first... but I have millions of entries to
process; each might exist already, might be an update, I don't know...)

My naive approach was to have two documents in the same request: one
with only "set" operations using the unique ID, and then a second one with
all the "add" operations (for the multiValued fields).

So it would do the following:

1. Whether the document (with that id) exists or not, I don't care: use the
"set" commands to update/create it.
2. 2nd pass: I know you exist (with the above id), please add all those
values to the multiValued fields (none of those fields are in the initial
updates).

My rationale is that if the document exists, I reset some fields and
then append to the multiValued fields (those multiValued fields express
historical updates).

The reason I created 2 documents is that Solr doesn't seem happy if I
mix set and add in the same document :)
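
(A sketch of the two-document request in JSON, with made-up field names: the
first document resets fields with "set", the second appends to a multiValued
field with "add":)

[
  {"id": "foo", "status": {"set": "active"}},
  {"id": "foo", "history": {"add": "2012-07-12 first update"}}
]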

--
jonatan

>
> -Yonik
> http://lucidimagination.com


Re: Updating documents

2012-07-13 Thread Jonatan Fournier
On Fri, Jul 13, 2012 at 12:57 AM, Yonik Seeley
 wrote:
> On Thu, Jul 12, 2012 at 3:20 PM, Jonatan Fournier
>  wrote:
>> Is there a flag for: if document does not exist, create it for me?
>
> Not currently, but it certainly makes sense.
> The implementation should be easy. The most difficult part is figuring
> out the best syntax to specify this.
>
> Another idea: we could possibly switch to create-if-not-exist by
> default, and use the existing optimistic concurrency mechanism to
> specify that the document should exist.
>
> So specify _version_=1 if the document should exist and _version_=0
> (the default) if you don't care.

Yes that would be neat!

One more question related to partial document update. So far I'm able
to append to multiValued fields and set new values on regular/multiValued
fields. One thing I didn't find is the "remove" command; what is its
JSON syntax?

Thanks,

--
jonatan

>
> -Yonik
> http://lucidimagination.com


Re: Updating documents

2012-07-13 Thread Jonatan Fournier
On Fri, Jul 13, 2012 at 1:43 PM, Yonik Seeley
 wrote:
> On Fri, Jul 13, 2012 at 1:41 PM, Jonatan Fournier
>  wrote:
>> On Fri, Jul 13, 2012 at 12:57 AM, Yonik Seeley
>>  wrote:
>>> On Thu, Jul 12, 2012 at 3:20 PM, Jonatan Fournier
>>>  wrote:
>>>> Is there a flag for: if document does not exist, create it for me?
>>>
>>> Not currently, but it certainly makes sense.
>>> The implementation should be easy. The most difficult part is figuring
>>> out the best syntax to specify this.
>>>
>>> Another idea: we could possibly switch to create-if-not-exist by
>>> default, and use the existing optimistic concurrency mechanism to
>>> specify that the document should exist.
>>>
>>> So specify _version_=1 if the document should exist and _version_=0
>>> (the default) if you don't care.
>>
>> Yes that would be neat!
>
> I've just committed this change.

Super thanks! I assume it will end up in the 4.0 release?

>
>> One more question related to partial document update. So far I'm able
>> to append to multivalue fields, set new value to regular/multivalue
>> fields. One thing I didn't find is the "remove" command, what is its
>> JSON syntax?
>
> Set it to the JSON value of null.
>
> -Yonik
> http://lucidimagination.com


Re: Updating documents

2012-07-13 Thread Jonatan Fournier
On Thu, Jul 12, 2012 at 3:20 PM, Jonatan Fournier
 wrote:
> Yonik,
>
> On Thu, Jul 12, 2012 at 12:52 PM, Yonik Seeley
>  wrote:
>> On Thu, Jul 12, 2012 at 12:38 PM, Jonatan Fournier
>>  wrote:
>>> On Thu, Jul 12, 2012 at 11:05 AM, Erick Erickson
>>>> The partial documents update that Jonatan references also requires
>>>> that all the fields be stored.
>>>
>>> If my only fields with stored="false" are copyField (e.g. I don't need
>>> their content to rebuild the document), are they gonna be re-copied
>>> with the partial document update?
>>
>> Correct - your setup should be fine.  Only original source fields (non
>> copyField targets) should have stored=true
>
> Another question I had related to partial update...
>
> $ ./post.sh foo.json
> {"responseHeader":{"status":409,"QTime":0},"error":{"msg":"Document
> not found for update.  id=foo","code":409}}
>
> Is there a flag for: if document does not exist, create it for me? The
> thing is that I don't know in advance if the document already exist
> (of course I could query first.. but I have millions of entry to
> process, might exist, might be an update I don't know...)
>
> My naive approach was to have in the same request two documents, one
> with only "set" using the unique ID, and then in the second one all
> the "add" (concerning multivalue field).
>
> So it would do the following:
>
> 1. Document (with id) exist or not don't care, use the following "set"
> command to update/create
> 2. 2nd pass, I know you exist (with above id), please add all those to
> the multivalue fields (none of those fields are in the initial
> updates)
>
> My rationale is that if the document exists, reset some fields, and
> then append the multivalue fields (those multivalue fields express
> historical updates)

Probably a silly mistake on my side, but I don't seem to get the
"append/add" JSON syntax right for multiValued fields...

On the document's initial creation it works great with:

...
"mv_f":"cat1",
"mv_f":"cat2",
...

But later on when I want to "append" cat3 to the field by doing this:

"mv_f":{"add":"cat3"},
...

I end up with something like this in the index:

"mv_f":["{add=cat3}"],

Obviously something is wrong with my syntax ;)
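
(For comparison, a complete append request typically carries the unique id
alongside the atomic-update map, wrapped the same way as the curl example
further down this digest; the id value is made up:)

{
  "add": {
    "doc": {
      "id": "some-id",
      "mv_f": {"add": "cat3"}
    }
  }
}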

--
jonatan

>
> The reason I created 2 documents is that Solr doesn't seem happy if I
> mix set and add in the same document :)
>
> --
> jonatan
>
>>
>> -Yonik
>> http://lucidimagination.com


Importing data to Solr

2012-07-19 Thread Jonatan Fournier
Hello,

I was wondering if there are other ways to import data into Solr than
posting XML/JSON/CSV to the server URL (e.g. building the
index locally). Is the DataImportHandler only for databases?

My data is in an enormous text file that is parsed in Python; I can get
clean JSON/XML out of it if I want, but the thing is that it boils
down to about 300 million "documents", so I don't want to execute 300
million HTTP POSTs in a for loop; even with relaxed soft commits etc.
it would take weeks or months to populate the index.

I need to do that only once on an offline server and never add data
back to the index (i.e. it becomes a read-only instance).

Is there a temporary index configuration I could use to populate the server
with optimal add speed, and then switch back to settings optimized for a
read-only instance?
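
(To make the question concrete, a minimal sketch of the kind of offline bulk
load being considered, using the EmbeddedSolrServer pattern from the SolrJ
wiki; the Solr home path, core name and batch size are assumptions, not a
tested recipe:)

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.core.CoreContainer;

public class BulkLoader {
    public static void main(String[] args) throws Exception {
        // build the index locally against an offline Solr home (illustrative path)
        System.setProperty("solr.solr.home", "/var/lib/myapp/solr");
        CoreContainer.Initializer initializer = new CoreContainer.Initializer();
        CoreContainer container = initializer.initialize();
        EmbeddedSolrServer server = new EmbeddedSolrServer(container, "collection1");

        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (long i = 0; i < 300000000L; i++) {  // stand-in for the real parser loop
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", Long.toString(i));
            batch.add(doc);
            if (batch.size() == 10000) {         // batch adds instead of one POST per doc
                server.add(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            server.add(batch);
        }
        server.commit();      // a single hard commit at the end of the load
        server.shutdown();
    }
}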

Thanks!

--
jonatan


Updating document with the Solr Java API

2012-07-31 Thread Jonatan Fournier
Hi,

What is the Java syntax to create an update document?

I was using this in JSON to update/reset some fields of document 12345
(it contains other fields; I'm only updating these):

{
  "add" : {
"doc" : {
  "id":"12345",
  "foo":{"set":null},
  "bar":{"set":"baz"}
}
  }
}

Now I'm trying to find the equivalent in Java (embedded server); I'm doing this:

SolrInputDocument solrDoc = new SolrInputDocument();
solrDoc.addField( "id", "12345" );
solrDoc.setField( "foo", null );
solrDoc.setField( "bar", "baz" );
server.add( solrDoc );

But instead of updating as with JSON, it overwrites the whole
document in the index. Am I missing something?

I also tried:

SolrInputDocument solrDoc = new SolrInputDocument();
solrDoc.setField( "id", "12345" );
solrDoc.setField( "foo", null );
solrDoc.setField( "bar", "baz" );
server.add( solrDoc );

But it does the same thing. Interesting fact: when using setField()
and the id doesn't exist, it will still create the document, which
wasn't the case with JSON before Yonik added the change (I'm still using
4.0.0-ALPHA, not trunk) I discussed with him previously on this
list.

Should we be expecting the same behavior from the API and the HTTP
JSON/XML/CSV interface?

Cheers,

--jonatan


Re: Updating document with the Solr Java API

2012-07-31 Thread Jonatan Fournier
On Tue, Jul 31, 2012 at 10:16 AM, Jonatan Fournier
 wrote:
> Hi,
>
> What is the Java syntax to create an update document?
>
> I was using this in JSON to update/reset some fields of document 12345
> (it contains other fields, only updating those):
>
> {
>   "add" : {
> "doc" : {
>   "id":"12345",
>   "foo":{"set":null},
>   "bar":{"set":"baz"}
> }
>   }
> }
>
> Now I'm trying to find the equivalent in Java (Embedded Server), I'm doing 
> this:
>
> SolrInputDocument solrDoc = new SolrInputDocument();
> solrDoc.addField( "id", "12345" );
> solrDoc.setField( "foo", null );
> solrDoc.setField( "bar", "baz" );
> server.add( solrDoc );
>
> But instead of updating like with JSON, it overwrites the whole
> document in the index. Something I'm missing?

Sorry, I just realized that setField/addField only apply to the
SolrInputDocument you manipulate; they don't set internal flags for the
indexer to treat the SolrInputDocument differently based on whether set or
add was called... :)
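
(For the record, a sketch of how the same partial update can be expressed
with SolrJ: each field value is a Map whose key is the operation ("set",
"add"). This assumes atomic updates are enabled on the server side, i.e. an
updateLog is configured and the relevant fields are stored:)

import java.util.HashMap;
import java.util.Map;

import org.apache.solr.common.SolrInputDocument;

...

SolrInputDocument solrDoc = new SolrInputDocument();
solrDoc.addField("id", "12345");

Map<String, Object> clearFoo = new HashMap<String, Object>();
clearFoo.put("set", null);                // equivalent to "foo":{"set":null}
solrDoc.addField("foo", clearFoo);

Map<String, Object> setBar = new HashMap<String, Object>();
setBar.put("set", "baz");                 // equivalent to "bar":{"set":"baz"}
solrDoc.addField("bar", setBar);

server.add(solrDoc);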

>
> I also tried:
>
> SolrInputDocument solrDoc = new SolrInputDocument();
> solrDoc.setField( "id", "12345" );
> solrDoc.setField( "foo", null );
> solrDoc.setField( "bar", "baz" );
> server.add( solrDoc );
>
> But it does the same thing. Interesting fact, when using setField()
> and the id doesn't exist it will still create the document, which
> wasn't the case with JSON before Yunik added a change (I'm still using
> 4.0.0-ALPHA, not trunk) I've discussed with him previously on this
> list.
>
> Should we be expecting the same behavior from the API and the http
> JSON/XML/CSV interface?
>
> Cheers,
>
> --jonatan


Index not loading

2012-08-13 Thread Jonatan Fournier
Hi,

I'm using Solr 4.0.0-ALPHA and the EmbeddedSolrServer.

Within my SolrJ application, the documents are added to the server
using the commitWithin parameter (in my case 60s). After 1 day my 125
million documents are all added to the server and I can see 89G of
index data files. I stop my SolrJ application and reload my Solr
instance in Tomcat.

From the Solr admin panel related to my Core (collection1) I see this info:


Last Modified:
Num Docs:0
Max Doc:0
Version:1
Segment Count:0
Optimized: (green check)
Current:  (green check)
Master: 
Version: 0
Gen: 1
Size: 88.14 GB


From the general Core Admin panel I see:

lastModified:
version:1
numDocs:0
maxDoc:0
optimized: (red circle)
current: (green check)
hasDeletions: (red circle)

If I query my index for *:* I get 0 results. If I trigger optimize it
wipes ALL my data inside the index and resets it to empty. I initially
played around with my EmbeddedServer using autoCommit/softCommit and it
was working fine. Now that I've switched to commitWithin on the document
add, it always does that! I'm never able to reload my index within
Tomcat/Solr.

Any idea?

Cheers,

/jonathan


Re: Index not loading

2012-08-14 Thread Jonatan Fournier
Hi Erick,

On Tue, Aug 14, 2012 at 10:25 AM, Erick Erickson
 wrote:
> This is quite odd, it really sounds like you're not
> actually committing. So, some questions.
>
> 1> What happens if you search before you shut
> down your tomcat? Do you see docs then? If so,
> somehow you're doing soft commits and never
> doing a hard commit.

No, I'm not seeing any documents if I search for anything. As
mentioned above, Num and Max docs are 0.

As I mention below, my index files are not deleted when I
start/restart Tomcat, but only when I send a commit/optimize command
from within Tomcat.

One thing I noticed that was different in the log output from the
embedded server: when I use the solrconfig.xml autoCommit, after the delay
I see some stdout messages about committing to the
index. But when relying on commitWithin, I never see the Solr
server output pause for a moment while committing; I only see all my
add-document stdout messages. Should the behavior be the same? Or do the
commit messages pass by so fast that I don't see them?

It must be doing some kind of commit/merge, because when I was
monitoring the memory I could see periodic memory increases (when I
assume it was merging), then memory decreased until the next cycle...

>
> 2> What happens if, as the last statement in your SolrJ
> program you do a commit()?

Let me try that and come back to you; for now, here are the commands I
was using in the test scenarios:

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", someId);
...
server.add(doc); // In this case I have either autoCommit or autoSoftCommit
                 // (maxTime 60000) enabled in the solrconfig.xml

// Both scenarios work: in those 2 cases, when I shut down my
// embedded server and restart Tomcat, I have all my data indexed/committed

or

server.add(doc, 60000); // In this case I don't have autoCommit enabled and
                        // try to rely on the commitWithin param.


>
> 3> While you're indexing, what do you see in your index
> directory? You should see multiple segments being
> created, and possibly merged so the number of
> files should go up and down. If you only have a single
> set of files, you're somehow not doing a commit.

No, I do see a bunch of files being created/merged; at the end I had
about 89G in many, many files.

Another thing I was playing with when trying to use commitWithin
was changing <useCompoundFile> (true) and <mergeFactor> (10) in the
solrconfig.xml to reduce the number of files created.
Could that impact things?

>
> 4> Is there something really silly going on like your
> restart scripts delete the index directory? Or you're
> using a VM that restores a blank image?

No VM, no scripts, no replication.

>
> 5> When you do restart, are there any files at all
> in your index directory?

When I restart Tomcat I do see all the same 89G of files that were created
using the embedded server; they only vanish when I force a commit or
optimize, and then it's as if my data directory didn't exist: the 2
initial segment files are created and all the rest are deleted.

>
> I really suspect you've got some configuration problem
> here

Maybe, but other than playing with the compound file thingy I don't
have any fancy config changes.

Cheers,

/jonathan

>
> Best
> Erick
>
>
>
> On Mon, Aug 13, 2012 at 9:11 AM, Jonatan Fournier
>  wrote:
>> Hi,
>>
>> I'm using Solr 4.0.0-ALPHA and the EmbeddedSolrServer.
>>
>> Within my SolrJ application, the documents are added to the server
>> using the commitWithin parameter (in my case 60s). After 1 day my 125
>> millions document are all added to the server and I can see 89G of
>> index data files. I stop my SolrJ application and reload my Solr
>> instance in Tomcat.
>>
>> From the Solr admin panel related to my Core (collection1) I see this info:
>>
>>
>> Last Modified:
>> Num Docs:0
>> Max Doc:0
>> Version:1
>> Segment Count:0
>> Optimized: (green check)
>> Current:  (green check)
>> Master:
>> Version: 0
>> Gen: 1
>> Size: 88.14 GB
>>
>>
>> From the general Core Admin panel I see:
>>
>> lastModified:
>> version:1
>> numDocs:0
>> maxDoc:0
>> optimized: (red circle)
>> current: (green check)
>> hasDeletions: (red circle)
>>
>> If I query my index for *:* I get 0 result. If I trigger optimize it
>> wipes ALL my data inside the index and reset to empty. I've played
>> around my EmbeddedServer initially using autoCommit/softCommit and it
>> was working fine. Now that I've switched to commitWithin the document
>> add query, it always do that! I'm never able to reload my index within
>> Tomcat/Solr.
>>
>> Any idea?
>>
>> Cheers,
>>
>> /jonathan


Re: Index not loading

2012-08-14 Thread Jonatan Fournier
On Tue, Aug 14, 2012 at 11:14 AM, Jack Krupansky
 wrote:
> If you send a dummy document using a curl command, without the commit
> option, does it auto-commit and become visible in 1 minute?

Sending a JSON document using curl:

{
  "add": {
"commitWithin": 6,
"overwrite": false,
"doc": {
  "id" : "1",
  "type" : "foo"
}
  }
}

This worked fine. But if I use EmbeddedSolrServer.add(doc, commitWithin),
it doesn't show up in the search results.

From this article:
http://www.cominvent.com/2011/09/09/discover-commitwithin-in-solr/

I see there are multiple ways to specify this commitWithin option:

https://issues.apache.org/jira/browse/SOLR-2742 introduced it in the
.add() methods of SolrServer; could it be broken only there?

I will go try this syntax:

UpdateRequest req = new UpdateRequest();
req.add(mySolrInputDocument);
req.setCommitWithin(1);
req.process(server);

Cheers,

/jonathan

>
> -- Jack Krupansky
>
> -Original Message- From: Jonatan Fournier
> Sent: Tuesday, August 14, 2012 11:03 AM
> To: solr-user@lucene.apache.org ; erickerick...@gmail.com
> Subject: Re: Index not loading
>
>
> Hi Erick,
>
> On Tue, Aug 14, 2012 at 10:25 AM, Erick Erickson
>  wrote:
>>
>> This is quite odd, it really sounds like you're not
>> actually committing. So, some questions.
>>
>> 1> What happens if you search before you shut
>> down your tomcat? Do you see docs then? If so,
>> somehow you're doing soft commits and never
>> doing a hard commit.
>
>
> No I'm not seeing any documents if I do search for anything. Like
> mentioned above, Num and Max docs are 0.
>
> Like I mentioned below, my index files are not deleted when I
> start/restart tomcat, but when within tomcat I send a commit/optimize
> command.
>
> On thing I noticed that was different in the log output from the
> embedded server was that when I use the solrconfig.xml autoCommit,
> after the delay I see some stdout message about commiting to the
> index. But when relying on the commitWithin, I never see the solr
> server output freeze for a moment while commiting, I only see all my
> add document stdout message. Should the behavior be the same? Or the
> commit messages pass by so fast I don't see them?
>
> It must be trying to do some kind of commit/merge, because when I was
> monitoring the memory I could see periodic memory increase (when I
> assumed it was merging) then memory decreased until the next delay...
>
>>
>> 2> What happens if, as the last statement in your SolrJ
>> program you do a commit()?
>
>
> Let me try that and come back to you, for now here's the commands I
> was using in the 3 test scenarios:
>
> SolrInputDocument doc = new SolrInputDocument();
> solrDoc.addField("id", someId);
> ...
> server.add(doc); // In the case I have either autoCommit
> 6 enabled in the solrconfig.xml or
> 
> // Both scenarios works, in those 2 cases when I shutdown my
> embeddedserver and restart tomcat I have all my data indexed/commited
>
> or
>
> server.add(doc, 6) // In the case I don't have autoCommit enabled,
> try to rely on commitWithin param.
>
>
>>
>> 3> While you're indexing, what do you see in your index
>> directory? You should see multiple segments being
>> created, and possibly merged so the number of
>> files should go up and down. If you only have a single
>> set of files, you're somehow not doing a commit.
>
>
> No I do see a bunch of files being created/merged, at the end I had a
> bout 89G in many many files.
>
> Another thing I was playing around when trying to use the commitWithin
> is to change the true and
> 10 to reduce the number of files created.
> Could it impact things?
>
>>
>> 4> Is there something really silly going on like your
>> restart scripts delete the index directory? Or you're
>> using a VM that restores a blank image?
>
>
> No VM, no scripts, no replication.
>
>>
>> 5> When you do restart, are there any files at all
>> in your index directory?
>
>
> When I restart tomcat I do see all the same 89G files that was created
> using the embedded server, they only vanish when I force a commit or
> optimize, then it's like if my data directory didn't exist and the 2
> initial segment files are being created and all the rest deleted.
>
>>
>> I really suspect you've got some configuration problem
>> here
>
>
> Maybe, but other than playing with the compound file thingy I don't
> have

Re: Index not loading

2012-08-14 Thread Jonatan Fournier
si, _q.fnm,
_p_nrm.cfs, _8_Lucene40_0.tip, _j_nrm.cfs, _q_Lucene40_0.prx, _g.si,
_l.fnm, _p.fnm, _k.fdt, _k.fdx, _h_nrm.cfe, _s.fnm, _a.fdt,
_9_Lucene40_0.prx, _a.fdx, _l_Lucene40_0.frq, _g.fnm, _6_nrm.cfs,
_p_Lucene40_0.tim, _h_nrm.cfs, _p_Lucene40_0.tip, _0.si, _5.fnm,
_9_Lucene40_0.tim, _j_Lucene40_0.prx, _6_nrm.cfe, _0_nrm.cfs, _s.fdx,
_j.fnm, _0_nrm.cfe, _5.fdx, _0.fdx, _8.fdx, _i.fnm, _0.fdt,
segments_4, _8.fdt]

commit{dir=/mnt/data/solr/couids/data/index,segFN=segments_5,generation=5,filenames=[_5_nrm.cfe,
_v.fdx, _s.fdt, _l.si, _w_Lucene40_0.prx, _k.fnm, _0_Lucene40_0.prx,
_r_nrm.cfs, _m.si, _8.si, _8_Lucene40_0.frq, _a_Lucene40_0.frq,
_v.fnm, _w.fnm, _r_nrm.cfe, _0_Lucene40_0.tim, _w.fdt, _s.si, _w.fdx,
_t_Lucene40_0.tim, _9.fdt, _t_Lucene40_0.tip, _9.fdx,
_u_Lucene40_0.frq, _9_Lucene40_0.frq, _0_Lucene40_0.tip, _5_nrm.cfs,
_l.fdx, _l_Lucene40_0.prx, _l.fdt, _6.fdt, _t.fdt, _a.fnm, _j.fdx,
_k_Lucene40_0.tim, _w.si, _m_Lucene40_0.frq, _k_Lucene40_0.tip,
_r_Lucene40_0.frq, _j.fdt, _6.fdx, _a_Lucene40_0.tim, _u.fdx,
_t_Lucene40_0.prx, _a_Lucene40_0.tip, _v_Lucene40_0.frq,
_m_Lucene40_0.tip, _m_Lucene40_0.tim, _k_Lucene40_0.frq, _r.fdt,
_r.fnm, _u.fnm, _5_Lucene40_0.tim, _0_Lucene40_0.frq,
_5_Lucene40_0.tip, _r.fdx, _r_Lucene40_0.tim, _r_Lucene40_0.tip,
_m_Lucene40_0.prx, _j.si, _v.si, _9.fnm, _p.si, _j_Lucene40_0.tip,
_v_Lucene40_0.prx, _p_Lucene40_0.prx, _j_Lucene40_0.tim,
_v_Lucene40_0.tip, _s_Lucene40_0.prx, _m.fdt, _v_Lucene40_0.tim,
_m.fdx, _6.si, _6.fnm, _5_Lucene40_0.prx, _8_nrm.cfe,
_8_Lucene40_0.tim, _p.fdx, _5.fdt, _l_nrm.cfe, _6_Lucene40_0.tim,
_p.fdt, _6_Lucene40_0.tip, _u_Lucene40_0.tip, _t_Lucene40_0.frq,
_s_Lucene40_0.frq, _u_Lucene40_0.tim, _l_Lucene40_0.tim, _l_nrm.cfs,
_l_Lucene40_0.tip, _9_nrm.cfs, _k_Lucene40_0.prx, _9_Lucene40_0.tip,
_9.si, _j_Lucene40_0.frq, _m.fnm, _k.si, _s_nrm.cfe, _m_nrm.cfe,
_p_Lucene40_0.frq, _5_Lucene40_0.frq, _a_nrm.cfe, _k_nrm.cfe, _0.fnm,
_j_nrm.cfe, _a_Lucene40_0.prx, _9_nrm.cfe, _8_Lucene40_0.prx,
_s_Lucene40_0.tip, _s_Lucene40_0.tim, _a.si, _a_nrm.cfs,
_r_Lucene40_0.prx, _s_nrm.cfs, _6_Lucene40_0.frq, _p_nrm.cfe,
_8_nrm.cfs, _5.si, _k_nrm.cfs, _8.fnm, _m_nrm.cfs, _u.si, _u.fdt,
_6_Lucene40_0.prx, _r.si, _p_nrm.cfs, _8_Lucene40_0.tip, _j_nrm.cfs,
_l.fnm, _t.fnm, _p.fnm, _k.fdt, _w_Lucene40_0.tip, _k.fdx, _s.fnm,
_a.fdt, _w_Lucene40_0.tim, _t_nrm.cfs, _9_Lucene40_0.prx, _v_nrm.cfs,
_a.fdx, _l_Lucene40_0.frq, _t.si, _6_nrm.cfs, _u_nrm.cfs,
_p_Lucene40_0.tim, _p_Lucene40_0.tip, _w_nrm.cfe, _0.si,
_w_Lucene40_0.frq, _u_Lucene40_0.prx, _5.fnm, _9_Lucene40_0.tim,
_j_Lucene40_0.prx, _v.fdt, _u_nrm.cfe, _6_nrm.cfe, _w_nrm.cfs,
_0_nrm.cfs, _s.fdx, _j.fnm, _0_nrm.cfe, _t.fdx, _5.fdx, _v_nrm.cfe,
_t_nrm.cfe, _0.fdx, _8.fdx, segments_5, _0.fdt, _8.fdt]
Aug 14, 2012 1:02:59 PM org.apache.solr.core.SolrDeletionPolicy updateCommits
INFO: newest commit = 5
Aug 14, 2012 1:02:59 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush

I don't think my config is wrong, since the dummy commitWithin
JSON update works and my autoCommit always works... What
else could be wrong, other than the SolrServer in SolrJ?

Cheers,

/jonathan

On Tue, Aug 14, 2012 at 12:30 PM, Jonatan Fournier
 wrote:
> On Tue, Aug 14, 2012 at 11:14 AM, Jack Krupansky
>  wrote:
>> If you send a dummy document using a curl command, without the commit
>> option, does it auto-commit and become visible in 1 minute?
>
> Sending a JSON document using curl:
>
> {
>   "add": {
> "commitWithin": 6,
> "overwrite": false,
> "doc": {
>   "id" : "1",
>   "type" : "foo"
> }
>   }
> }
>
> This worked fine. But If use the EmbeddedServer.add(doc, commitWithin)
> it doesn't show up in the search result.
>
> From this article:
> http://www.cominvent.com/2011/09/09/discover-commitwithin-in-solr/
>
> I see there's is multiple ways to specify this commitWithin options:
>
> https://issues.apache.org/jira/browse/SOLR-2742 introduced it to the
> .add() methods for SolrServer, could it be broken only there?
>
> I will go try this syntax:
>
> UpdateRequest req = new UpdateRequest();
> req.add(mySolrInputDocument);
> req.setCommitWithin(1);
> req.process(server);
>
> Cheers,
>
> /jonathan
>
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: Jonatan Fournier
>> Sent: Tuesday, August 14, 2012 11:03 AM
>> To: solr-user@lucene.apache.org ; erickerick...@gmail.com
>> Subject: Re: Index not loading
>>
>>
>> Hi Erick,
>>
>> On Tue, Aug 14, 2012 at 10:25 AM, Erick Erickson
>>  wrote:
>>>
>>> This is quite odd, it really sounds like you're not
>>> actually committing. So, some questions.
>>>

Re: Index not loading

2012-08-14 Thread Jonatan Fournier
On Tue, Aug 14, 2012 at 10:25 AM, Erick Erickson
 wrote:
> This is quite odd, it really sounds like you're not
> actually committing. So, some questions.
>
> 1> What happens if you search before you shut
> down your tomcat? Do you see docs then? If so,
> somehow you're doing soft commits and never
> doing a hard commit.
>
> 2> What happens if, as the last statement in your SolrJ
> program you do a commit()?

When using commitWithin, if I introduce server.commit() within the
data load process, the data gets committed (I didn't reproduce with my
89G of data...). If I shut down my EmbeddedServer, restart it and
send a commit, all data gets wiped out too, just like on Tomcat. So I guess
there's state loss somewhere.
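
(A sketch of the workaround just described: an explicit hard commit before
the embedded server goes away, so the buffered commitWithin adds are not
lost; the shutdown call is illustrative:)

// ... bulk add loop using server.add(doc, 60000) ...
server.commit();     // explicit hard commit before shutting down
server.shutdown();   // only now release the embedded core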

Cheers,

/jonathan

>
> 3> While you're indexing, what do you see in your index
> directory? You should see multiple segments being
> created, and possibly merged so the number of
> files should go up and down. If you only have a single
> set of files, you're somehow not doing a commit.
>
> 4> Is there something really silly going on like your
> restart scripts delete the index directory? Or you're
> using a VM that restores a blank image?
>
> 5> When you do restart, are there any files at all
> in your index directory?
>
> I really suspect you've got some configuration problem
> here
>
> Best
> Erick
>
>
>
> On Mon, Aug 13, 2012 at 9:11 AM, Jonatan Fournier
>  wrote:
>> Hi,
>>
>> I'm using Solr 4.0.0-ALPHA and the EmbeddedSolrServer.
>>
>> Within my SolrJ application, the documents are added to the server
>> using the commitWithin parameter (in my case 60s). After 1 day my 125
>> millions document are all added to the server and I can see 89G of
>> index data files. I stop my SolrJ application and reload my Solr
>> instance in Tomcat.
>>
>> From the Solr admin panel related to my Core (collection1) I see this info:
>>
>>
>> Last Modified:
>> Num Docs:0
>> Max Doc:0
>> Version:1
>> Segment Count:0
>> Optimized: (green check)
>> Current:  (green check)
>> Master:
>> Version: 0
>> Gen: 1
>> Size: 88.14 GB
>>
>>
>> From the general Core Admin panel I see:
>>
>> lastModified:
>> version:1
>> numDocs:0
>> maxDoc:0
>> optimized: (red circle)
>> current: (green check)
>> hasDeletions: (red circle)
>>
>> If I query my index for *:* I get 0 result. If I trigger optimize it
>> wipes ALL my data inside the index and reset to empty. I've played
>> around my EmbeddedServer initially using autoCommit/softCommit and it
>> was working fine. Now that I've switched to commitWithin the document
>> add query, it always do that! I'm never able to reload my index within
>> Tomcat/Solr.
>>
>> Any idea?
>>
>> Cheers,
>>
>> /jonathan


Re: Index not loading

2012-08-15 Thread Jonatan Fournier
On Tue, Aug 14, 2012 at 5:37 PM, Jonatan Fournier
 wrote:
> On Tue, Aug 14, 2012 at 10:25 AM, Erick Erickson
>  wrote:
>> This is quite odd, it really sounds like you're not
>> actually committing. So, some questions.
>>
>> 1> What happens if you search before you shut
>> down your tomcat? Do you see docs then? If so,
>> somehow you're doing soft commits and never
>> doing a hard commit.

Yeah, I just realized the behavior is the same as with softCommit; is that the
default for commitWithin?
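
(For reference, the two commit flavours as they are typically configured in
solrconfig.xml; values are illustrative. Only a hard commit makes the data
durable across a restart, which would explain the behaviour above:)

<autoCommit>
  <maxTime>60000</maxTime>            <!-- hard commit: flushes segments to disk -->
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>1000</maxTime>             <!-- soft commit: visibility only, not durable -->
</autoSoftCommit>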

Cheers,

/jonathan

>>
>> 2> What happens if, as the last statement in your SolrJ
>> program you do a commit()?
>
> When using commitWithin, if I introduce server.commit() within the
> data load process the data gets commited ( I didn't reproduce with my
> 89G of data...), if I shutdown my EmbeddedServer and restart it and
> send a commit, like on Tomcat, all data gets wiped out too. So I guess
> that there's state loss somewhere.
>
> Cheers,
>
> /jonathan
>
>>
>> 3> While you're indexing, what do you see in your index
>> directory? You should see multiple segments being
>> created, and possibly merged so the number of
>> files should go up and down. If you only have a single
>> set of files, you're somehow not doing a commit.
>>
>> 4> Is there something really silly going on like your
>> restart scripts delete the index directory? Or you're
>> using a VM that restores a blank image?
>>
>> 5> When you do restart, are there any files at all
>> in your index directory?
>>
>> I really suspect you've got some configuration problem
>> here
>>
>> Best
>> Erick
>>
>>
>>
>> On Mon, Aug 13, 2012 at 9:11 AM, Jonatan Fournier
>>  wrote:
>>> Hi,
>>>
>>> I'm using Solr 4.0.0-ALPHA and the EmbeddedSolrServer.
>>>
>>> Within my SolrJ application, the documents are added to the server
>>> using the commitWithin parameter (in my case 60s). After 1 day my 125
>>> millions document are all added to the server and I can see 89G of
>>> index data files. I stop my SolrJ application and reload my Solr
>>> instance in Tomcat.
>>>
>>> From the Solr admin panel related to my Core (collection1) I see this info:
>>>
>>>
>>> Last Modified:
>>> Num Docs:0
>>> Max Doc:0
>>> Version:1
>>> Segment Count:0
>>> Optimized: (green check)
>>> Current:  (green check)
>>> Master:
>>> Version: 0
>>> Gen: 1
>>> Size: 88.14 GB
>>>
>>>
>>> From the general Core Admin panel I see:
>>>
>>> lastModified:
>>> version:1
>>> numDocs:0
>>> maxDoc:0
>>> optimized: (red circle)
>>> current: (green check)
>>> hasDeletions: (red circle)
>>>
>>> If I query my index for *:* I get 0 result. If I trigger optimize it
>>> wipes ALL my data inside the index and reset to empty. I've played
>>> around my EmbeddedServer initially using autoCommit/softCommit and it
>>> was working fine. Now that I've switched to commitWithin the document
>>> add query, it always do that! I'm never able to reload my index within
>>> Tomcat/Solr.
>>>
>>> Any idea?
>>>
>>> Cheers,
>>>
>>> /jonathan


Duplicate in copyField

2012-09-18 Thread Jonatan Fournier
Hi,

I have something strange happening (4.0-BETA). I have a title field:



And a copyField:



Note that I don't have multiValued set for the title field, but I do
end up with multiple values in the field:

{
  "responseHeader":{
"status":0,
"QTime":371,
"params":{
  "indent":"true",
  "wt":"json",
  "q":"domain:dyslexia-test.com"}},
  "response":{"numFound":1,"start":0,"maxScore":13.414578,"docs":[
  {
"id":"9f13185f8134ff75cb1c6106ac5db63f",
"foo":"bar",
"title":["bar",
  "bar"],
...
}

I made two operations on that document.

First I created it by populating some of its fields; in a second
pass, I queried the document via its "id", added other values to the
un-populated fields and sent the document back.

Why is there more than one value for title? At worst, shouldn't the 2nd
operation overwrite the original value?

Cheers,

/jonathan


Re: Duplicate in copyField

2012-09-18 Thread Jonatan Fournier
I didn't realize that copyField targets are implemented via multiValued
fields; I thought they were flat fields.

What I was trying to do was to have one common field between two
different schemas, so that my GUI could use both index sources for
listing by title...

I guess I will populate this field manually from my data importer script.
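
(A sketch of the kind of schema.xml definitions involved, since the original
XML was stripped by the archive; the field names and types are assumptions.
Because the copyField runs again on every add, the target receives the copied
value on top of any value supplied directly, which is where the duplicate
comes from:)

<field name="foo"   type="string" indexed="true" stored="true"/>
<field name="title" type="string" indexed="true" stored="true" omitNorms="true"/>

<copyField source="foo" dest="title"/>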

Cheers,

/jonathan

On Tue, Sep 18, 2012 at 1:35 PM, Jonatan Fournier
 wrote:
> Hi,
>
> I have something strange happening (4.0-BETA), I have a title field:
>
>  omitNorms="true"/>
>
> And a copyField:
>
> 
>
> Note that I don't have multivalue set for the title field, but I do
> end up with multiple value in my field:
>
> {
>   "responseHeader":{
> "status":0,
> "QTime":371,
> "params":{
>   "indent":"true",
>   "wt":"json",
>   "q":"domain:dyslexia-test.com"}},
>   "response":{"numFound":1,"start":0,"maxScore":13.414578,"docs":[
>   {
> "id":"9f13185f8134ff75cb1c6106ac5db63f",
> "foo":"bar",
> "title":["bar",
>   "bar"],
> ...
> }
>
> I made two operations on that document.
>
> First I created it by populating some of its fields, and in a second
> pass, I queried the document via "id" add other values to the
> un-populated fields and send the document back.
>
> Why is there more than one value for title? At worst should the 2nd
> operation overwrites the original value?
>
> Cheers,
>
> /jonathan