Unique id
Hi, Is the uniqueKey in schema.xml really required? Reason is, I am indexing two tables and I have id as unique key in schema.xml but id field is not there in one of the tables and indexing fails. Do I really require this unique field for Solr to index it better or can I do away with this? Thanks, Rahgu
RE: Unique id
Ok got it. I am indexing two tables differently. I am using SolrJ to index with the @Field annotation. I make two queries initially, fetch the data from the two tables, and index them separately. But what if the ids in the two tables are the same? That means documents with the same id will be deleted when doing an update. How does this work? Please explain. Thanks.

-----Original Message-----
From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, November 19, 2008 3:49 PM
To: solr-user@lucene.apache.org
Subject: Re: Unique id
[quoted reply snipped; Aleksander's message appears in full below]
Re: Error in indexing timestamp format.
Hi Noble, Thank you very much. That removed the error at server startup. But I don't think the data is getting indexed when running the dataimport: I am unable to display the date field values when searching. This is my complete config. In the schema.xml I have: Do I need some other configuration? Thanks in advance, con

Noble Paul നോബിള്‍ नोब्ळ् wrote:
> sorry I meant wrong dest field name
>
> On Wed, Nov 19, 2008 at 12:41 PM, con <[EMAIL PROTECTED]> wrote:
>> Hi Noble,
>> I have cross checked. This is my copyField in schema.xml:
>> I am still getting that error.
>> thanks
>> con
>>
>> Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>> your copyField has the wrong source field name. The field name is not
>>> "date", it is 'CREATED_DATE'.
>>>
>>> On Wed, Nov 19, 2008 at 11:49 AM, con <[EMAIL PROTECTED]> wrote:
>>>> Hi Shalin,
>>>> Please find the log data:
>>>> 10:18:30,819 ERROR [STDERR] 19 Nov, 2008 10:18:30 AM org.apache.solr.servlet.SolrDispatchFilter init INFO: SolrDispatchFilter.init()
>>>> [... INFO startup and classloader log lines snipped ...]
>>>> 10:18:31,403 ERROR [STDERR] 19 Nov, 2008 10:18:31 AM org.apache.solr.util.plugin.AbstractPluginLoader load INFO: created string: org.apache.solr.schema.StrField
>>>> [message truncated in the archive]
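The schema and data-config snippets in this thread were stripped by the list archive. A minimal sketch of the usual shape of such a setup, assuming a JDBC column named CREATED_DATE and a date format of "yyyy-MM-dd HH:mm:ss" (the entity name, query, and format here are illustrative, not taken from the thread):

    <!-- schema.xml: a Solr date field for the column -->
    <field name="CREATED_DATE" type="date" indexed="true" stored="true"/>

    <!-- data-config.xml: DateFormatTransformer parses the raw JDBC value
         into Solr's internal date representation -->
    <entity name="feedback" transformer="DateFormatTransformer"
            query="select ID, CREATED_DATE from FEEDBACK">
        <field column="CREATED_DATE" dateTimeFormat="yyyy-MM-dd HH:mm:ss"/>
    </entity>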
Re: Unique id
Ok, but how do you map your table structure to the index? As far as I can understand, the two tables have different structures, so why/how do you map two different data structures onto a single index? Are the two tables connected in some way? If so, you could make your index structure reflect the union of both tables and just make one insertion into the index per entry of the two tables. Maybe you could post the table structure so that I can get a better understanding of your use-case...
- Aleks

On Wed, 19 Nov 2008 11:25:56 +0100, Raghunandan Rao <[EMAIL PROTECTED]> wrote:
[quoted text snipped; see Raghu's message above]
Re: Use SOLR like the "MySQL LIKE"
On Tue, 18 Nov 2008 14:26:02 +0100 "Aleksander M. Stensby" <[EMAIL PROTECTED]> wrote:
> Well, then I suggest you index the field in two different ways if you want
> both possible ways of searching. One, where you treat the entire name as
> one token (in lowercase) (then you can search for avera* and match on, for
> instance, "average joe" etc.) And then another field where you tokenize on
> whitespace, for instance, if you want/need that possibility as well. Look at
> the solr copy fields and try it out, it works like a charm :)

You should also make extensive use of analysis.jsp to see how data in your field (1) is tokenized, filtered and indexed, and how your search terms are tokenized, filtered and matched against (1).
Hint 1: check all the checkboxes ;)
Hint 2: you don't need to reindex all your data, just enter test data in the form and give it a go. You will of course have to tweak schema.xml and restart your service when you do this.
good luck,
B
_
{Beto|Norberto|Numard} Meijome
"Intellectual: 'Someone who has been educated beyond his/her intelligence'" Arthur C. Clarke, from "3001, The Final Odyssey", Sources.
I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
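A minimal sketch of the two-field setup described above, wired together with copyField; all field and type names here are made up for illustration:

    <!-- whole name as a single lowercased token: avera* matches "average joe" -->
    <fieldType name="lowercase_whole" class="solr.TextField">
        <analyzer>
            <tokenizer class="solr.KeywordTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
    </fieldType>

    <field name="name" type="string" indexed="true" stored="true"/>
    <field name="name_exact" type="lowercase_whole" indexed="true" stored="false"/>
    <field name="name_tokens" type="text_ws" indexed="true" stored="false"/>
    <copyField source="name" dest="name_exact"/>
    <copyField source="name" dest="name_tokens"/>

(text_ws is the whitespace-tokenized type from the example schema.)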
Re: Unique id
Yes it is. You need a unique id because the add method works as an "add or update" method. When adding a document whose ID is already found in the index, the old document will be deleted and the new one will be added. Are you indexing two tables into the same index? Or does one entry in the index consist of data from both tables? How are these linked together without an ID?
- Aleksander

On Wed, 19 Nov 2008 10:42:00 +0100, Raghunandan Rao <[EMAIL PROTECTED]> wrote:
[quoted question snipped; see the first message in this thread]
--
Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no
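For reference, the uniqueKey being discussed is a single declaration in schema.xml pointing at one of the defined fields, along the lines of:

    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <uniqueKey>id</uniqueKey>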
Upgrade from 1.2 to 1.3 gives 3x slowdown
Hello, I have a CSV file with 6M records which took 22 min to index with Solr 1.2. I then stopped Tomcat, replaced the Solr stuff inside webapps with version 1.3, wiped my index and restarted Tomcat. Indexing the exact same content now takes 69 min. My machine has 2GB of RAM and Tomcat is running with $JAVA_OPTS -Xmx512M -Xms512M. Are there any tweaks I can use to get the original index time back? I read through the release notes and was expecting a speed-up. I saw the bit about increasing ramBufferSizeMB and set it to 64MB; it had no effect.
--
===============
Fergus McMenemie    Email: [EMAIL PROTECTED]
Techmore Ltd        Phone: (UK) 07721 376021
Unix/Mac/Intranets  Analyst Programmer
===============
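For anyone wanting to reproduce the tweak: ramBufferSizeMB (and mergeFactor, the other common indexing-speed knob) sit in the <indexDefaults> section of solrconfig.xml. A sketch, using the 64MB value from this thread and a common mergeFactor default:

    <indexDefaults>
        <ramBufferSizeMB>64</ramBufferSizeMB>
        <mergeFactor>10</mergeFactor>
    </indexDefaults>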
Re: Unique id
Technically, no, a uniqueKey field is NOT required. I've yet to run into a situation where it made sense not to use one, though. As for indexing database tables: if one of your tables doesn't have a primary key, does it have an aggregate unique "key" of some sort? Do you plan on updating the rows in that table and reindexing them? Seems like some kind of unique key would make sense for updating documents. But yeah, a more detailed description of your table structure and searching needs would be helpful.
Erik

On Nov 19, 2008, at 5:18 AM, Aleksander M. Stensby wrote:
[quoted text snipped; see the earlier messages in this thread]
RE: Unique id
I am not indexing to the same index. I have two methods which add docs by calling server.addBeans(list) twice (2 different lists obtained from the DB). Now I call server.query("some query1") and obtain a result. Then from this I create a query based on the first result and call server.query("some query2");

-----Original Message-----
From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, November 19, 2008 4:22 PM
To: solr-user@lucene.apache.org
Subject: Re: Unique id
[quoted reply snipped; see Aleksander's message above]
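One common workaround for colliding keys, offered as a sketch rather than anything from this thread: namespace the id with the table name when building the SolrJ beans, so documents originating from the two tables can never overwrite each other on update. The class and field names below are hypothetical:

    import org.apache.solr.client.solrj.beans.Field;

    public class TableOneDoc {
        @Field
        private String id;        // e.g. "tableone-42", not the raw DB key

        @Field("source_table")
        private String table;     // also handy for filtering at query time

        public TableOneDoc(long dbId) {
            this.id = "tableone-" + dbId;  // prefix guarantees cross-table uniqueness
            this.table = "tableone";
        }
    }

Beans for the second table would use a different prefix, and server.addBeans(list) can then be called for both lists safely.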
DataImportHandler: Javascript transformer for splitting field-values
Hi everyone, I'm currently working with the nightly build of Solr (solr-2008-11-17) and trying to figure out how to transform a row-object with JavaScript to include multiple values (in a single multivalued field). When I try something like this as a transformer:

    function splitTerms(row) {
        // each term should be duplicated into count field-values
        // dummy code to show the idea
        row.put('terms', ['term', 'term', 'term']);
        return row;
    }

[...] The DataImportHandler debugger returns: sun.org.mozilla.javascript.internal.NativeArray:[EMAIL PROTECTED] What it *should* return: term term term So, what am I doing wrong? My transformer will be invoked multiple times from a MySQL query and in turn has to insert multiple values into the same field during each invocation. It should do something similar to the RegexTransformer (field splitBy)... is that possible? Right now I have to use a workaround that does the term duplication on the database side, which is kinda ugly if a term has to be duplicated a lot. Greetings, Steffen
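A likely fix, assuming the script transformer bridges into Java in the usual Rhino way (a sketch, not a confirmed answer from the thread): hand the row a java.util.ArrayList instead of a native JavaScript array, since DataImportHandler flattens Java collections, not script objects, into multivalued fields:

    function splitTerms(row) {
        // build a Java list rather than a JS array so DIH sees a real collection
        var terms = new java.util.ArrayList();
        terms.add('term');
        terms.add('term');
        terms.add('term');
        row.put('terms', terms);
        return row;
    }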
Question about autocommit
Hello, I would like some details on the autocommit mechanism. I tried to search the wiki, but found only the standard maxDoc/time settings. I have set the autocommit parameters in solrconfig.xml to 8000 docs and 30 millis. Indexing at around 200 docs per second (from multiple processes, using the CommonsHttpSolrServer class), I would have expected autocommits to occur around every 40 seconds; however, the jvm log shows the following - sometimes more than two calls per second:

$ tail -f jvm-default.log | grep "commit"
[16:18:15.862] {pool-2-thread-1} start commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:16.788] {pool-2-thread-1} end_commit_flush
[16:18:21.721] {pool-2-thread-1} start commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:22.073] {pool-2-thread-1} end_commit_flush
[16:18:36.047] {pool-2-thread-1} start commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:36.468] {pool-2-thread-1} end_commit_flush
[16:18:36.886] {pool-2-thread-1} start commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:37.017] {pool-2-thread-1} end_commit_flush
[16:18:37.867] {pool-2-thread-1} start commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:38.448] {pool-2-thread-1} end_commit_flush
[16:18:44.375] {pool-2-thread-1} start commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:47.016] {pool-2-thread-1} end_commit_flush
[16:18:47.154] {pool-2-thread-1} start commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:47.287] {pool-2-thread-1} end_commit_flush
[16:18:50.399] {pool-2-thread-1} start commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:51.283] {pool-2-thread-1} end_commit_flush
[16:19:13.782] {pool-2-thread-1} start commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:14.664] {pool-2-thread-1} end_commit_flush
[16:19:15.081] {pool-2-thread-1} start commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:15.215] {pool-2-thread-1} end_commit_flush
[16:19:15.357] {pool-2-thread-1} start commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:15.955] {pool-2-thread-1} end_commit_flush
[16:19:16.421] {pool-2-thread-1} start commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:19.791] {pool-2-thread-1} end_commit_flush
[16:19:50.594] {pool-2-thread-1} start commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:52.098] {pool-2-thread-1} end_commit_flush
[16:19:52.236] {pool-2-thread-1} start commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:52.368] {pool-2-thread-1} end_commit_flush
[16:19:52.917] {pool-2-thread-1} start commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:53.479] {pool-2-thread-1} end_commit_flush
[16:19:54.920] {pool-2-thread-1} start commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:55.079] {pool-2-thread-1} end_commit_flush

Additionally, in the solr admin page, the update handler reports as many autocommits as commits - so I assume it is not some commit(); line lost in my code. I actually get the feeling that the commits are triggered more and more often - with not-so-nice influence on indexing speed over time. Restarting resin seems to get the commit rate back to the original level. Optimizing has no effect. Is there some other parameter influencing autocommit? Thank you very much. Nickolai
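For reference, autocommit is configured inside <updateHandler> in solrconfig.xml, and maxTime is expressed in milliseconds - so a value of 30 would commit almost continuously, while 30000 waits 30 seconds. A configuration matching the intent described above would be:

    <updateHandler class="solr.DirectUpdateHandler2">
        <autoCommit>
            <maxDocs>8000</maxDocs>
            <maxTime>30000</maxTime>
        </autoCommit>
    </updateHandler>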
Re: Question about autocommit
Interesting... could go along with the earlier guy's post about slow indexing...

Nickolai Toupikov wrote:
[quoted question snipped; see Nickolai's message above]
Re: Question about autocommit
Could also go with the thread safety issues with pending and the deadlock that was reported the other day. All could pretty easily be related. Do we have a JIRA issue on it yet? Suppose I'll look...

Mark Miller wrote:
[quoted messages snipped; see the previous messages in this thread]
RE: Question about autocommit
Could ramBufferSizeMB trigger the commit in this case?

-----Original Message-----
From: Nickolai Toupikov [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, November 19, 2008 8:36 AM
To: solr-user@lucene.apache.org
Subject: Question about autocommit
[quoted question snipped; see Nickolai's message above]
Re: Question about autocommit
They are separate commits. ramBufferSizeMB controls when the underlying Lucene IndexWriter flushes RAM to disk (this isn't like the IndexWriter committing or closing). The Solr autocommit controls when Solr asks IndexWriter to commit what it's done so far.

Nguyen, Joe wrote:
[quoted messages snipped; see the earlier messages in this thread]
RE: Question about autocommit
As far as I know, commit could be triggered by:

Manually
1. invoking the commit() method
Automatically
2. maxDoc
3. maxTime

Since the document size is arbitrary and some documents could be huge, could commit also be triggered by memory buffer size?

-----Original Message-----
From: Mark Miller [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, November 19, 2008 9:09 AM
To: solr-user@lucene.apache.org
Subject: Re: Question about autocommit
[quoted messages snipped; see Mark's message above]
Re: Question about autocommit
The documents have an average size of about a kilobyte, I would say. Bigger ones can pop up, but not nearly often enough to trigger memory commits every couple of seconds. I don't have the exact figures, but I would expect the memory buffer limit to be far beyond the 8000-document one in most cases. Actually, I first started indexing with a 2000-document limit - a commit expected every ten seconds or so. In a couple of hours the speed of indexing choked down from over 200 to under 100 documents per second - and all the same I had several autocommits a second. So I restarted with a limit of 8000, with the results I mentioned in the previous email.

Nguyen, Joe wrote:
[quoted messages snipped; see the earlier messages in this thread]
RE: Question about autocommit
First it was fast, but after a couple of hours it slowed down... Could mergeFactor affect the indexing speed, since Solr would take time to merge multiple segments into a single one?
http://wiki.apache.org/solr/SolrPerformanceFactors#head-224d9a793c7c57d8662d5351f955ddf8c0a3ebcd

-----Original Message-----
From: Nickolai Toupikov [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, November 19, 2008 9:51 AM
To: solr-user@lucene.apache.org
Subject: Re: Question about autocommit
[quoted messages snipped; see the earlier messages in this thread]
Re: Question about autocommit
I don't know. After reading my last email, I realized I did not say explicitly that by 'restarting' I merely meant 'restarting Resin'. I did not restart indexing from scratch. And - if I understand correctly - if the merge factor were the culprit, restarting the servlet container would have had no effect.

Nguyen, Joe wrote:
[quoted messages snipped; see the earlier messages in this thread]
Multi word Synonym
I am trying to figure out how the synonym filter processes multi-word inputs. I have checked the analyzer in the GUI with some confusing results. The indexed field has "The North Face" as a value. The synonym file has:

morthface, morth face, noethface, noeth face, norhtface, norht face, nortface, nort face, northfac, north fac, northfac3e, north fac3e, northface, north face, northfae, north fae, northfaqce, north faqce, northfave, north fave, northhace, north hace, nothface, noth face, thenorhface, the norh face, thenorth, the north, thenorthandface, the north and face, thenortheface, the northe face, thenorthfac, the north fac, thenorthface, thenorthfacee, the north facee, thenothface, the noth face, thenotrhface, the notrh face, thenrothface, the nroth face, tnf => The North Face

I have the field type using the WhitespaceTokenizer before the synonyms are run. My confusion is that when the term "morth fac" is run, somehow the system knows to map it to the correct term even though the term is not present in the file. How is this happening? Is the synonym process tokenizing as well? The datatype schema is as follows:
-Jeff
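The schema snippet was stripped by the archive. A field type matching the description (whitespace tokenizer ahead of the synonym filter) would typically look like the sketch below; the type name and expand/ignoreCase settings are illustrative. Note that the entries in synonyms.txt are themselves tokenized when the file is loaded, which is how multi-word rules produce multi-token mappings:

    <fieldType name="text_syn" class="solr.TextField">
        <analyzer type="index">
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                    ignoreCase="true" expand="false"/>
        </analyzer>
    </fieldType>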
No search result behavior (a la Amazon)
It appears to me that Amazon is using a 100% minimum match policy. If there are no matches, they break down the original search terms and give suggestion search results. example: http://www.amazon.com/s/ref=nb_ss_gw?url=search-alias%3Daps&field-keywords=ipod+nano+4th+generation+8gb+blue+calcium&x=0&y=0 Can Solr natively achieve something similar? If not, can you suggest a way to achieve this? A custom RequestHandler? Thanks!
RE: No search result behavior (a la Amazon)
Have a look at DisMaxRequestHandler and play with mm (minimum terms that should match):
http://wiki.apache.org/solr/DisMaxRequestHandler?highlight=%28CategorySolrRequestHandler%29%7C%28%28CategorySolrRequestHandler%29%29#head-6c5fe41d68f3910ed544311435393f5727408e61

-----Original Message-----
From: Caligula [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, November 19, 2008 11:11 AM
To: solr-user@lucene.apache.org
Subject: No search result behavior (a la Amazon)
[quoted question snipped; see the message above]
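A sketch of what that looks like on a request, with illustrative host, field, and query values; mm=100% (URL-encoded as 100%25) requires every term to match, and lowering it lets partial matches through instead of returning zero results:

    http://localhost:8983/solr/select?qt=dismax&qf=name+description&mm=100%25&q=ipod+nano+8gb+blue+calcium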
Solr schema Lucene's StandardAnalyser equivalent?
Hello, I am looking for the Solr schema equivalent to Lucene's StandardAnalyzer. Is it the Solr schema type:
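The fieldType the question refers to was stripped by the archive. The closest schema equivalent to Lucene's StandardAnalyzer (standard tokenization, standard filter, lowercasing, stopwords) is usually written like this; the type name is illustrative:

    <fieldType name="text_std" class="solr.TextField">
        <analyzer>
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.StandardFilterFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
        </analyzer>
    </fieldType>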
RE: No search result behavior (a la Amazon)
I understand how to do the "100% mm" part. It's the behavior when there are no matches that I'm asking about :)

Nguyen, Joe-2 wrote:
[quoted reply snipped; see Joe's message above]
filtering on blank OR specific range
hi all :) I'm having difficulty filtering my documents when a field is either blank or set to a specific value. I would have thought this would work fq=-Type:[* TO *] OR Type:blue which I would expect to find all documents where either Type is undefined or Type is "blue". my actual result set is zero. using a similar filter fq=-Type:[* TO *] OR OtherThing:cat does what I would expect (documents with undefined type or documents with cats), so it feels like solr is getting confused with the range negation and ORing, but only when the field is the same. adding various parentheses makes no difference. I know this is kind of nebulous sounding, but I was hoping someone would look at this and go "you're doing it wrong. your filter should be..." the field is defined as if it matters. tia --Geoff
RE: filtering on blank OR specific range
Try: Type:blue OR -Type:[* TO *] You can't have a negative clause at the beginning. Yes, Lucene should barf about this. -Original Message- From: Geoffrey Young [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 19, 2008 12:17 PM To: solr-user@lucene.apache.org Subject: filtering on blank OR specific range hi all :) I'm having difficultly filtering my documents when a field is either blank or set to a specific value. I would have thought this would work fq=-Type:[* TO *] OR Type:blue which I would expect to find all document where either Type is undefined or Type is "blue". my actual result set is zero. using a similar filter fq=-Type:[* TO *] OR OtherThing:cat does what I would expect (documents with undefined type or documents with cats), so it feels like solr is getting confused with the range negation and ORing, but only when the field is the same. adding various parentheses makes no difference. I know this is kind of nebulous sounding, but I was hoping someone would look at this and go "you're doing it wrong. your filter should be..." the field is defined as if it matters. tia --Geoff
Logging in Solr.
I kind of remember hearing that Solr was using SLF4J for the logging, but I haven't been able to find any information about it. And in that case, where do you set it to redirect to your log4j server, for example? Regards Erik
Re: filtering on blank OR specific range
Lance Norskog wrote: > Try: Type:blue OR -Type:[* TO *] > > You can't have a negative clause at the beginning. Yes, Lucene should barf > about this. I did try that, before and again now, and still no luck. Anything else? --Geoff
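One thing that usually resolves this: Lucene cannot evaluate a purely negative clause nested inside an OR, since there is nothing to subtract it from. Rewriting the negative half as a subtraction from the full document set generally works, e.g. fq=Type:blue OR (*:* -Type:[* TO *])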
Re: Logging in Solr.
the trunk (solr-1.4-dev) is now using SLF4J If you are using the packaged .war, the behavior should be identical to 1.3 -- that is, it uses the java.util.logging implementation. However, if you are using solr.jar, you select what logging framework you actually want to use by including that connector in your classpath. For example, to use log4j you add slf4j-log4j12-1.5.5.jar to your classpath, and then everything will behave as though it were configured using log4j. See: http://www.slf4j.org/ for more info ryan On Nov 19, 2008, at 4:21 PM, Erik Holstad wrote: I kind if remember hearing that Solr was using SLF4J for the logging, but I haven't been able to find any information about it. And in that case where do you set it to redirect to you log4j server for example? Regards Erik
RE: No search result behavior (a la Amazon)
It seemed like its first search required matching all terms. If it could not find a match then, like you mentioned, it broke the query down into multiple smaller term sets and ran a search to get the total hits for each smaller term set, sorted the results by total hits, and displayed a summary page. Searching for "A B C" would be:
1. q = +A +B +C (match all terms)
2. q = +A +B -C (match A and B but not C)
3. q = +A -B +C
4. q = -A +B +C
-Original Message- From: Caligula [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 19, 2008 11:52 AM To: solr-user@lucene.apache.org Subject: RE: No search result behavior (a la Amazon) I understand how to do the "100% mm" part. It's the behavior when there are no matches that i'm asking about :) Nguyen, Joe-2 wrote: > > Have a look at DisMaxRequestHandler and play with mm (miminum terms > should match) > > http://wiki.apache.org/solr/DisMaxRequestHandler?highlight=%28CategorySolrRequestHandler%29%7C%28%28CategorySolrRequestHandler%29%29#head-6c5fe41d68f3910ed544311435393f5727408e61 > > > -Original Message- > From: Caligula [mailto:[EMAIL PROTECTED] > Sent: Wednesday, November 19, 2008 11:11 AM > To: solr-user@lucene.apache.org > Subject: No search result behavior (a la Amazon) > > > It appears to me that Amazon is using a 100% minimum match policy. If > there are no matches, they break down the original search terms and > give suggestion search results. > > example: > > http://www.amazon.com/s/ref=nb_ss_gw?url=search-alias%3Daps&field-keywords=ipod+nano+4th+generation+8gb+blue+calcium&x=0&y=0 > > > Can Solr natively achieve something similar? If not, can you suggest > a way to achieve this? A custom RequestHandler? > > > Thanks!
Re: Solr schema Lucene's StandardAnalyser equivalent?
Glen: $ ff \*Standard\*java | grep analysis ./src/java/org/apache/solr/analysis/HTMLStripStandardTokenizerFactory.java ./src/java/org/apache/solr/analysis/StandardFilterFactory.java ./src/java/org/apache/solr/analysis/StandardTokenizerFactory.java Does that do it? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch From: Glen Newton <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Wednesday, November 19, 2008 2:49:26 PM Subject: Solr schema Lucene's StandardAnalyser equivalent? Hello, I am looking for the Solr schema equivalent to Lucene's StandardAnalyser. Is it the Solr schema type:
Searchable/indexable newsgroups
Does anybody know of a good way to index newsgroups using SOLR? Basically would like to build a searchable list of newsgroup content. Any help would be greatly appreciated. -John
Re: Solr schema Lucene's StandardAnalyser equivalent?
Thanks. I've decided to use: which appears to be close to what is found at http://lucene.apache.org/java/2_3_1/api/index.html "Filters StandardTokenizer with StandardFilter, LowerCaseFilter and StopFilter, using a list of English stop words." -Glen 2008/11/19 Otis Gospodnetic <[EMAIL PROTECTED]>: > Glen: > > $ ff \*Standard\*java | grep analysis > ./src/java/org/apache/solr/analysis/HTMLStripStandardTokenizerFactory.java > ./src/java/org/apache/solr/analysis/StandardFilterFactory.java > ./src/java/org/apache/solr/analysis/StandardTokenizerFactory.java > > > Does that do it? > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > > > From: Glen Newton <[EMAIL PROTECTED]> > To: solr-user@lucene.apache.org > Sent: Wednesday, November 19, 2008 2:49:26 PM > Subject: Solr schema Lucene's StandardAnalyser equivalent? > > Hello, > > I am looking for the Solr schema equivalent to Lucene's StandardAnalyser. > > Is it the Solr schema type: > > Is there some way of directly invoking Lucene's StandardAnalyser? > > Thanks, > Glen
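Since the schema snippet above did not survive the archive, a rough equivalent field type would be the following (the type name and stop-word file are assumed; how close it gets to StandardAnalyzer's English stop list depends on what is in stopwords.txt):

<fieldType name="text_std" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
</fieldType>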
RE: Searchable/indexable newsgroups
Can Nutch crawl newsgroups? Anyone? -Todd Feak -Original Message- From: John Martyniak [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 19, 2008 3:06 PM To: solr-user@lucene.apache.org Subject: Searchable/indexable newsgroups Does anybody know of a good way to index newsgroups using SOLR? Basically would like to build a searchable list of newsgroup content. Any help would be greatly appreciated. -John
Solr schema 1.3 -> 1.4-dev (changes?)
Hi, I wanted to try the TermVectorComponent w/ my current schema setup, and I did a build off trunk, but it's giving me something like ... org.apache.solr.common.SolrException: ERROR:unknown field 'DOCTYPE' even though it is declared in schema.xml (lowercase). Before I grep-replace the entire file, would that be my issue? Thanks. - Jon
Re: Solr schema 1.3 -> 1.4-dev (changes?)
schema fields should be case sensitive... so DOCTYPE != doctype. Is the behavior different for you in 1.3 with the same file/schema? On Nov 19, 2008, at 6:26 PM, Jon Baer wrote: Hi, I wanted to try the TermVectorComponent w/ current schema setup and I did a build off trunk but it's giving me something like ... org.apache.solr.common.SolrException: ERROR:unknown field 'DOCTYPE' Even though it is declared in schema.xml (lowercase), before I grep replace the entire file would that be my issue? Thanks. - Jon
Re: Solr schema Lucene's StandardAnalyser equivalent?
Note that you can use a standard Lucene Analyzer subclass too. The example schema shows how with this commented out: Erik On Nov 19, 2008, at 6:24 PM, Glen Newton wrote: Thanks. I've decided to use: positionIncrementGap="100" > which appears to be close to what is found at http://lucene.apache.org/java/2_3_1/api/index.html "Filters StandardTokenizer with StandardFilter, LowerCaseFilter and StopFilter, using a list of English stop words." -Glen 2008/11/19 Otis Gospodnetic <[EMAIL PROTECTED]>: Glen: $ ff \*Standard\*java | grep analysis ./src/java/org/apache/solr/analysis/HTMLStripStandardTokenizerFactory.java ./src/java/org/apache/solr/analysis/StandardFilterFactory.java ./src/java/org/apache/solr/analysis/StandardTokenizerFactory.java Does that do it? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch From: Glen Newton <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Wednesday, November 19, 2008 2:49:26 PM Subject: Solr schema Lucene's StandardAnalyser equivalent? Hello, I am looking for the Solr schema equivalent to Lucene's StandardAnalyser. Is it the Solr schema type:
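The commented-out pattern being referred to points a TextField directly at a Lucene Analyzer class; for this thread it would be something like this (the type name is assumed):

<fieldType name="text_lucene" class="solr.TextField">
  <!-- delegate analysis directly to a Lucene Analyzer subclass -->
  <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
</fieldType>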
Re: Newbe! Trying to run solr-1.3.0 under tomcat. Please help
Check procedure:
1: rm -r $tomcat/webapps/*
2: rm -r $solr/data (your index data directory)
3: check the XML (any XML you modified)
4: start Tomcat
I had the same error, but I forgot how I fixed it, so you can use my check procedure; I think it will help you. I use Tomcat + Solr on Win2003, FreeBSD, and Mac OS X 10.5.5, and they all work well. -- regards j.L
Re: posting error in solr
First, make sure the XML is UTF-8 and the field values are UTF-8. Second, you should post the XML as UTF-8. My advice: use UTF-8 for all encodings. That makes my Solr work well, and I use Chinese. -- regards j.L
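As a concrete example of posting as UTF-8 with curl (the URL and file name here are the stock example values, not anything from this thread):

curl 'http://localhost:8983/solr/update' -H 'Content-type: text/xml; charset=utf-8' --data-binary @docs.xml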
Tomcat undeploy/shutdown exception
In analyzing a client's Solr logs, from Tomcat, I came across the exception below. Anyone encountered issues with Tomcat shutdowns or undeploys of Solr contexts? I'm not sure if this is an anomaly due to some wonky Tomcat handling, or if this is some kind of bug in Solr. I haven't actually duplicated the issue myself though. Thanks, Erik

Oct 29, 2008 10:14:31 AM org.apache.catalina.startup.HostConfig undeployApps
WARNING: Error while removing context [/search]
java.lang.NullPointerException
at org.apache.solr.servlet.SolrDispatchFilter.destroy(SolrDispatchFilter.java:123)
at org.apache.catalina.core.ApplicationFilterConfig.release(ApplicationFilterConfig.java:253)
at org.apache.catalina.core.StandardContext.filterStop(StandardContext.java:3670)
at org.apache.catalina.core.StandardContext.stop(StandardContext.java:4354)
at org.apache.catalina.core.ContainerBase.removeChild(ContainerBase.java:893)
at org.apache.catalina.startup.HostConfig.undeployApps(HostConfig.java:1191)
at org.apache.catalina.startup.HostConfig.stop(HostConfig.java:1162)
at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:313)
at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:120)
at org.apache.catalina.core.ContainerBase.stop(ContainerBase.java:1055)
at org.apache.catalina.core.ContainerBase.stop(ContainerBase.java:1067)
at org.apache.catalina.core.StandardEngine.stop(StandardEngine.java:448)
at org.apache.catalina.core.StandardService.stop(StandardService.java:510)
at org.apache.catalina.core.StandardServer.stop(StandardServer.java:734)
at org.apache.catalina.startup.Catalina.stop(Catalina.java:602)
at org.apache.catalina.startup.Catalina.start(Catalina.java:577)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:295)
at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:433)
Re: Solr schema 1.3 -> 1.4-dev (changes?)
Sorry I should have mentioned this is from using the DataImportHandler ... it seems case insensitive ... ie my columns are UPPERCASE and schema field names are lowercase and it works fine in 1.3 but not in 1.4 ... it seems strict. Going to resolve all the field names to uppercase to see if that resolves the problem. Thanks. - Jon On Nov 19, 2008, at 6:44 PM, Ryan McKinley wrote: schema fields should be case sensitive... so DOCTYPE != doctype is the behavior different for you in 1.3 with the same file/schema? On Nov 19, 2008, at 6:26 PM, Jon Baer wrote: Hi, I wanted to try the TermVectorComponent w/ current schema setup and I did a build off trunk but it's giving me something like ... org.apache.solr.common.SolrException: ERROR:unknown field 'DOCTYPE' Even though it is declared in schema.xml (lowercase), before I grep replace the entire file would that be my issue? Thanks. - Jon
Re: Solr schema 1.3 -> 1.4-dev (changes?)
Hi Jon, that is probably not the expected behavior. Only 'explicit' fields must be case-sensitive. Could you tell me the use case, or can you paste the data-config? --Noble On Thu, Nov 20, 2008 at 8:55 AM, Jon Baer <[EMAIL PROTECTED]> wrote: > Sorry I should have mentioned this is from using the DataImportHandler ... > it seems case insensitive ... ie my columns are UPPERCASE and schema field > names are lowercase and it works fine in 1.3 but not in 1.4 ... it seems > strict. Going to resolve all the field names to uppercase to see if that > resolves the problem. Thanks. > > - Jon > > On Nov 19, 2008, at 6:44 PM, Ryan McKinley wrote: > >> schema fields should be case sensitive... so DOCTYPE != doctype >> >> is the behavior different for you in 1.3 with the same file/schema? >> >> >> On Nov 19, 2008, at 6:26 PM, Jon Baer wrote: >> >>> Hi, >>> >>> I wanted to try the TermVectorComponent w/ current schema setup and I did >>> a build off trunk but it's giving me something like ... >>> >>> org.apache.solr.common.SolrException: ERROR:unknown field 'DOCTYPE' >>> >>> Even though it is declared in schema.xml (lowercase), before I grep >>> replace the entire file would that be my issue? >>> >>> Thanks. >>> >>> - Jon >> > > -- --Noble Paul
Re: DataImportHandler: Javascript transformer for splitting field-values
unfortunately native JS objects are not handled by the ScriptTransformer yet. But what you can do in the script is create a new java.util.ArrayList() and add each item to that. Something like:

function splitTerms(row) {
  var jsarr = ['term', 'term', 'term'];
  // copy the JS array into a java.util.ArrayList, which DIH understands
  var arr = new java.util.ArrayList();
  for (var i = 0; i < jsarr.length; i++) {
    arr.add(jsarr[i]);
  }
  row.put('terms', arr);
  return row;
}

On Wed, Nov 19, 2008 at 9:03 PM, Steffen <[EMAIL PROTECTED]> wrote: > Hi everyone, > I'm currently working with the nightly build of Solr (solr-2008-11-17) > and trying to figure out how to transform a row-object with Javascript > to include multiple values (in a single multivalued field). When I try > something like this as a transformer: > function splitTerms(row) { >//each term should be duplicated into count > field-values >//dummy-code to show the idea >row.put('terms',['term','term','term']); >return row; > } > [...] > query="SELECT term,count FROM termtable WHERE id=${parent.id}" /> > > The DataImportHandler debugger returns: > > >sun.org.mozilla.javascript.internal.NativeArray:[EMAIL PROTECTED] > > > What it *should* return: > > term > term > term > > > So, what am I doing wrong? My transformer will be invoked multiple > times from a MySQL-Query and in turn has to insert multiple values to > the same field during each invocation. It should do something similar > to the RegexTransformer (field splitBy)... is that possible? Right now > I have to use a workaround that includes the term-duplication on the > database sides, which is kinda ugly if a term has to be duplicated a > lot. > Greetings, > Steffen -- --Noble Paul
Re: Solr schema 1.3 -> 1.4-dev (changes?)
Schema: DIH: The column is uppercase ... isn't there some automagic happening now where DIH will introspect the fields @ load time? - Jon On Nov 19, 2008, at 11:11 PM, Noble Paul നോബിള് नोब्ळ् wrote: Hi John, it could probably not the expected behavior? only 'explicit' fields must be case-sensitive. Could you tell me the usecase or can you paste the data-config? --Noble On Thu, Nov 20, 2008 at 8:55 AM, Jon Baer <[EMAIL PROTECTED]> wrote: Sorry I should have mentioned this is from using the DataImportHandler ... it seems case insensitive ... ie my columns are UPPERCASE and schema field names are lowercase and it works fine in 1.3 but not in 1.4 ... it seems strict. Going to resolve all the field names to uppercase to see if that resolves the problem. Thanks. - Jon On Nov 19, 2008, at 6:44 PM, Ryan McKinley wrote: schema fields should be case sensitive... so DOCTYPE != doctype is the behavior different for you in 1.3 with the same file/schema? On Nov 19, 2008, at 6:26 PM, Jon Baer wrote: Hi, I wanted to try the TermVectorComponent w/ current schema setup and I did a build off trunk but it's giving me something like ... org.apache.solr.common.SolrException: ERROR:unknown field 'DOCTYPE' Even though it is declared in schema.xml (lowercase), before I grep replace the entire file would that be my issue? Thanks. - Jon -- --Noble Paul
Re: Solr schema 1.3 -> 1.4-dev (changes?)
So originally you had the field declaration as follows, right? We did some refactoring to minimize the object creation for case-insensitive comparisons. I guess it should be rectified soon. Thanks for bringing it to our notice. --Noble On Thu, Nov 20, 2008 at 10:05 AM, Jon Baer <[EMAIL PROTECTED]> wrote: > Schema: > > > DIH: > > > The column is uppercase ... isn't there some automagic happening now where > DIH will introspect the fields @ load time? > > - Jon > > On Nov 19, 2008, at 11:11 PM, Noble Paul നോബിള് नोब्ळ् wrote: > >> Hi John, >> it could probably not the expected behavior? >> >> only 'explicit' fields must be case-sensitive. >> >> Could you tell me the usecase or can you paste the data-config? >> >> --Noble >> >> >> >> >> >> On Thu, Nov 20, 2008 at 8:55 AM, Jon Baer <[EMAIL PROTECTED]> wrote: >>> >>> Sorry I should have mentioned this is from using the DataImportHandler >>> ... >>> it seems case insensitive ... ie my columns are UPPERCASE and schema >>> field >>> names are lowercase and it works fine in 1.3 but not in 1.4 ... it seems >>> strict. Going to resolve all the field names to uppercase to see if that >>> resolves the problem. Thanks. >>> >>> - Jon >>> >>> On Nov 19, 2008, at 6:44 PM, Ryan McKinley wrote: >>> schema fields should be case sensitive... so DOCTYPE != doctype is the behavior different for you in 1.3 with the same file/schema? On Nov 19, 2008, at 6:26 PM, Jon Baer wrote: > Hi, > > I wanted to try the TermVectorComponent w/ current schema setup and I > did > a build off trunk but it's giving me something like ... > > org.apache.solr.common.SolrException: ERROR:unknown field 'DOCTYPE' > > Even though it is declared in schema.xml (lowercase), before I grep > replace the entire file would that be my issue? > > Thanks. > > - Jon >>> >>> >> >> >> >> -- >> --Noble Paul > > -- --Noble Paul
Re: Solr schema 1.3 -> 1.4-dev (changes?)
Correct ... it is the unfortunate side effect of having some legacy tables in uppercase :-\ I thought the explicit declaration of field name attribute was ok. - Jon On Nov 19, 2008, at 11:53 PM, Noble Paul നോബിള് नोब्ळ् wrote: So originally you had the field declaration as follows . right? we did some refactoring to minimize the object creation for case-insensitive comparisons. I guess it should be rectified soon. Thanks for bringing it to our notice. --Noble On Thu, Nov 20, 2008 at 10:05 AM, Jon Baer <[EMAIL PROTECTED]> wrote: Schema: DIH: The column is uppercase ... isn't there some automagic happening now where DIH will introspect the fields @ load time? - Jon On Nov 19, 2008, at 11:11 PM, Noble Paul നോബിള് नोब्ळ् wrote: Hi John, it could probably not the expected behavior? only 'explicit' fields must be case-sensitive. Could you tell me the usecase or can you paste the data-config? --Noble On Thu, Nov 20, 2008 at 8:55 AM, Jon Baer <[EMAIL PROTECTED]> wrote: Sorry I should have mentioned this is from using the DataImportHandler ... it seems case insensitive ... ie my columns are UPPERCASE and schema field names are lowercase and it works fine in 1.3 but not in 1.4 ... it seems strict. Going to resolve all the field names to uppercase to see if that resolves the problem. Thanks. - Jon On Nov 19, 2008, at 6:44 PM, Ryan McKinley wrote: schema fields should be case sensitive... so DOCTYPE != doctype is the behavior different for you in 1.3 with the same file/ schema? On Nov 19, 2008, at 6:26 PM, Jon Baer wrote: Hi, I wanted to try the TermVectorComponent w/ current schema setup and I did a build off trunk but it's giving me something like ... org.apache.solr.common.SolrException: ERROR:unknown field 'DOCTYPE' Even though it is declared in schema.xml (lowercase), before I grep replace the entire file would that be my issue? Thanks. - Jon -- --Noble Paul -- --Noble Paul
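For anyone following along, the pattern under discussion is an explicit column-to-field mapping in data-config.xml; a minimal sketch with an assumed query (the column names come from this thread) of the mapping that the case-sensitivity refactoring broke:

<entity name="players" transformer="TemplateTransformer"
        query="select PLAYERID, DOCTYPE from PLAYERS">
  <!-- build the document id from the uppercase column -->
  <field column="id" template="PLAYER-${players.PLAYERID}"/>
  <!-- map the uppercase DB column onto the lowercase schema field -->
  <field column="DOCTYPE" name="doctype"/>
</entity>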
Field collapsing (SOLR-236) and Solr 1.3.0 release version
Hi, A requirement has come up in a project where we're going to need to group by a field in the result set. I looked into the SOLR-236 patch, and it seems there are a couple of versions out now that are supposed to work against the Solr 1.3.0 release. This is a production site; it really can't be running anything that's going to crash or take up too many resources. I wanted to check with the list and see if anyone is using this patch with the Solr 1.3.0 release and whether it is stable enough / performs well enough for serious usage. We have an index of 3M+ documents, and a grouped result set would be about 50-75% of the total size of the ungrouped results. Thanks for any information or pointers. -- Steve Weiss Stylesight
Re: Error in indexing timestamp format.
Hi Noble Thanks for your update. Sorry, that was a typo; I had put the same name for both source and dest. Actually I failed to remove it at some stage of trial and error. I removed the copyField as it is not fully necessary at this stage. My scenario is like this: I have various date fields in my database, with a format like 22-10-08 03:57:11.63700 PM. I want to index and search these dates. Even after making the above updates I am still not able to index or search these values. Expecting your reply. Thanks in advance con Noble Paul നോബിള് नोब्ळ् wrote: > > could you explain me what is the purpose of this line? > > > I mean what are you trying to achive? > Where did you get the documentation for copyField. may be I need to check > it out > > On Wed, Nov 19, 2008 at 3:29 PM, con <[EMAIL PROTECTED]> wrote: >> >> >> Hi Nobble, >> Thank you very much >> That removed the error while server startup. >> >> But I don't think the data is getting indexed upon running the >> dataimport. I >> am unable to display the date field values on searching. >> This is my complete configs: >> >> > transformer="TemplateTransformer,DateFormatTransformer" pk="EMP_ID" >> query="select EMP_ID, CREATED_DATE, CUST_ID FROM EMP, CUST where >> EMP.EMP_ID >> = CUST.EMP_ID" > >> >> >> >> > dateTimeFormat="dd-MM-yy HH:mm:ss.S a" /> >> >> >> In the schema.xml I have: >> >> >> >> >> >> Do I need some other configurations. >> >> Thanks in advance >> con >> >> >> >> >> >> >> Noble Paul നോബിള് नोब्ळ् wrote: >>> >>> sorry I meant wrong dest field name >>> >>> On Wed, Nov 19, 2008 at 12:41 PM, con <[EMAIL PROTECTED]> wrote: Hi Nobble I have cross checked. This is my copy field of schema.xml I am still getting that error. thanks con Noble Paul നോബിള് नोब्ळ् wrote: > > yoour copyField has the wrong source field name . Field name is not > "date" it is 'CREATED_DATE' > > On Wed, Nov 19, 2008 at 11:49 AM, con <[EMAIL PROTECTED]> wrote: >> >> Hi Shalin >> Please find the log data.
>> [...]
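For reference, the shape of the configuration being discussed is a DIH entity running DateFormatTransformer over the raw column; a minimal sketch follows (the query is a placeholder based on the one quoted above). One thing worth checking: with an 'a' (AM/PM) marker in the pattern, SimpleDateFormat expects hh (1-12) for the hour rather than HH (0-23), so a value like 03:57:11.63700 PM may fail to parse with HH:

<entity name="emp" transformer="DateFormatTransformer"
        query="select EMP_ID, CREATED_DATE, CUST_ID from EMP">
  <!-- dateTimeFormat must match the raw column text exactly;
       hh (not HH) pairs with the AM/PM marker -->
  <field column="CREATED_DATE" dateTimeFormat="dd-MM-yy hh:mm:ss.S a"/>
</entity>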
RE: Unique id
Basically, I am working on two views. The first one has an ID column. The second view has no unique ID column. What to do in such situations? There are 3 other columns out of which I can make a composite key. I have to index these two views now. -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 19, 2008 5:24 PM To: solr-user@lucene.apache.org Subject: Re: Unique id Technically, no, a uniqueKey field is NOT required. I've yet to run into a situation where it made sense not to use one though. As for indexing database tables - if one of your tables doesn't have a primary key, does it have an aggregate unique "key" of some sort? Do you plan on updating the rows in that table and reindexing them? Seems like some kind of unique key would make sense for updating documents. But yeah, a more detailed description of your table structure and searching needs would be helpful. Erik On Nov 19, 2008, at 5:18 AM, Aleksander M. Stensby wrote: > Yes it is. You need a unique id because the add method works as and > "add or update" method. When adding a document whose ID is already > found in the index, the old document will be deleted and the new > will be added. Are you indexing two tables into the same index? Or > does one entry in the index consist of data from both tables? How > are these linked together without an ID? > > - Aleksander > > On Wed, 19 Nov 2008 10:42:00 +0100, Raghunandan Rao <[EMAIL PROTECTED] > > wrote: > >> Hi, >> >> Is the uniqueKey in schema.xml really required? >> >> >> Reason is, I am indexing two tables and I have id as unique key in >> schema.xml but id field is not there in one of the tables and >> indexing >> fails. Do I really require this unique field for Solr to index it >> better >> or can I do away with this? >> >> >> Thanks, >> >> Rahgu >> > > > > -- > Aleksander M. Stensby > Senior software developer > Integrasco A/S > www.integrasco.no
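One common way out (sketched here with invented view and column names): synthesize the key in the query itself so every document still arrives with a single unique id, for example select 'view2-' || colA || '-' || colB || '-' || colC as id, ... from view2. A per-view prefix such as 'view2-' also keeps ids from the two views from colliding in the same index.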
Re: Tomcat undeploy/shutdown exception
Eric, which Solr version is that stack trace from? On Thu, Nov 20, 2008 at 7:57 AM, Erik Hatcher <[EMAIL PROTECTED]>wrote: > In analyzing a clients Solr logs, from Tomcat, I came across the exception > below. Anyone encountered issues with Tomcat shutdowns or undeploys of Solr > contexts? I'm not sure if this is an anomaly due to some wonky Tomcat > handling, or if this is some kind of bug in Solr. I haven't actually > duplicated the issue myself though. > > Thanks, >Erik > > > Oct 29, 2008 10:14:31 AM org.apache.catalina.startup.HostConfig > undeployApps > WARNING: Error while removing context [/search] > java.lang.NullPointerException >at > org.apache.solr.servlet.SolrDispatchFilter.destroy(SolrDispatchFilter.java:123) >at > org.apache.catalina.core.ApplicationFilterConfig.release(ApplicationFilterConfig.java:253) >at > org.apache.catalina.core.StandardContext.filterStop(StandardContext.java:3670) >at > org.apache.catalina.core.StandardContext.stop(StandardContext.java:4354) >at > org.apache.catalina.core.ContainerBase.removeChild(ContainerBase.java:893) >at > org.apache.catalina.startup.HostConfig.undeployApps(HostConfig.java:1191) >at org.apache.catalina.startup.HostConfig.stop(HostConfig.java:1162) >at > org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:313) >at > org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:120) >at > org.apache.catalina.core.ContainerBase.stop(ContainerBase.java:1055) >at > org.apache.catalina.core.ContainerBase.stop(ContainerBase.java:1067) >at > org.apache.catalina.core.StandardEngine.stop(StandardEngine.java:448) >at > org.apache.catalina.core.StandardService.stop(StandardService.java:510) >at > org.apache.catalina.core.StandardServer.stop(StandardServer.java:734) >at org.apache.catalina.startup.Catalina.stop(Catalina.java:602) >at org.apache.catalina.startup.Catalina.start(Catalina.java:577) >at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) >at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) >at java.lang.reflect.Method.invoke(Method.java:597) >at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:295) >at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:433) > > -- Regards, Shalin Shekhar Mangar.
Re: Solr schema 1.3 -> 1.4-dev (changes?)
Jon, I just committed a fix for this issue at https://issues.apache.org/jira/browse/SOLR-873 Can you please use trunk and see if it solved your problem? On Thu, Nov 20, 2008 at 10:32 AM, Jon Baer <[EMAIL PROTECTED]> wrote: > Correct ... it is the unfortunate side effect of having some legacy tables > in uppercase :-\ I thought the explicit declaration of field name attribute > was ok. > > - Jon > > > On Nov 19, 2008, at 11:53 PM, Noble Paul നോബിള് नोब्ळ् wrote: > > So originally you had the field declaration as follows . right? >> >> >> we did some refactoring to minimize the object creation for >> case-insensitive comparisons. >> >> I guess it should be rectified soon. >> >> Thanks for bringing it to our notice. >> --Noble >> >> >> >> >> >> On Thu, Nov 20, 2008 at 10:05 AM, Jon Baer <[EMAIL PROTECTED]> wrote: >> >>> Schema: >>> >>> >>> DIH: >>> >> template="PLAYER-${players.PLAYERID}"/> >>> >>> The column is uppercase ... isn't there some automagic happening now >>> where >>> DIH will introspect the fields @ load time? >>> >>> - Jon >>> >>> On Nov 19, 2008, at 11:11 PM, Noble Paul നോബിള് नोब्ळ् wrote: >>> >>> Hi John, it could probably not the expected behavior? only 'explicit' fields must be case-sensitive. Could you tell me the usecase or can you paste the data-config? --Noble On Thu, Nov 20, 2008 at 8:55 AM, Jon Baer <[EMAIL PROTECTED]> wrote: > > Sorry I should have mentioned this is from using the DataImportHandler > ... > it seems case insensitive ... ie my columns are UPPERCASE and schema > field > names are lowercase and it works fine in 1.3 but not in 1.4 ... it > seems > strict. Going to resolve all the field names to uppercase to see if > that > resolves the problem. Thanks. > > - Jon > > On Nov 19, 2008, at 6:44 PM, Ryan McKinley wrote: > > schema fields should be case sensitive... so DOCTYPE != doctype >> >> is the behavior different for you in 1.3 with the same file/schema? >> >> >> On Nov 19, 2008, at 6:26 PM, Jon Baer wrote: >> >> Hi, >>> >>> I wanted to try the TermVectorComponent w/ current schema setup and I >>> did >>> a build off trunk but it's giving me something like ... >>> >>> org.apache.solr.common.SolrException: ERROR:unknown field 'DOCTYPE' >>> >>> Even though it is declared in schema.xml (lowercase), before I grep >>> replace the entire file would that be my issue? >>> >>> Thanks. >>> >>> - Jon >>> >> >> > > -- --Noble Paul >>> >>> >>> >> >> >> -- >> --Noble Paul >> > > -- Regards, Shalin Shekhar Mangar.