Re: DataImport
If I copy the solr.war into the tomcat/webapps directory and restart Tomcat, the archive extracts itself and I get a solr directory. Why do I then need to set example-solr-home/solr, which is not under /webapps, as the home directory? Quoting Shalin Shekhar Mangar: No, the steps are as follows: 1. Download the example-solr-home.jar from the DataImportHandler wiki page. 2. Extract it. You'll find a folder named "example-solr-home" and a solr.war file after extraction. 3. Copy the solr.war to tomcat_home/webapps. You don't need any other Solr instance; this war is self-sufficient. 4. Set the example-solr-home/solr folder as the Solr home folder. For instructions on how to do that, see http://wiki.apache.org/solr/SolrTomcat From the port number of the URL you are trying, it seems that you're using the Jetty supplied with Solr instead of Tomcat. 2008/6/9 Mihails Agafonovs: > I've placed the solr.war under the tomcat directory and restarted tomcat to deploy the solr.war. But still... there is no .jar, no folder named "example-data-config", and hitting http://localhost:8983/solr/dataimport doesn't work. > Do I need the original Solr instance to use this .war with? > Quoting Shalin Shekhar Mangar: 1. Correct, there is no jar. You can use the solr.war file. If you really need a jar, you'll need to use the SOLR-469.patch at http://issues.apache.org/jira/browse/SOLR-469 and build Solr from source after applying that patch. > 2. The jar contains a folder named "example-solr-home". Please check again. > Please let me know if you run into any problems. > 2008/6/9 Mihails Agafonovs: > > Looked through the tutorial on data import, section "Full Import Example". > > 1) Where is this dataimport.jar? There is no such file in the extracted example-solr-home.jar. > > 2) "Use the solr folder inside example-data-config folder as your solr home." What does this mean? Anyway, there is no folder example-data-config. > Ar cieņu, Mihails > -- Regards, Shalin Shekhar Mangar. Ar cieņu, Mihails
Problem with adding an XML file
Hi, I am making my first tests with Solr. My problem is that I have two XML files: 1) the example document from the Solr samples (TWINX2048-3200PRO, CORSAIR XMS 2GB (2 x 1GB) 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) Dual Channel Kit System Memory - Retail, Corsair Microsystems Inc., electronics/memory, CAS latency 2, 2-3-3-6 timing, 2.75v, unbuffered, heat-spreader, 185, 5, true) and 2) a document I created myself (85f4fdf9-e596-4974-a5b9-57778e38067b, 143885, 28.10.2005 13:06:15, Rechnung 2005-025235, Rechnungsduplikate 2002 330T.doc, KIS Bonow, 25906, Hofma GmbH, Mandant); the XML tags of both files were lost in transmission. Now I want to add these files to Solr. I started Solr on Windows in the example directory with java -jar start.jar, and I get the following error message:

  C:\test\output>java -jar post.jar *.xml
  SimplePostTool: version 1.2
  SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8, other encodings are not currently supported
  SimplePostTool: POSTing files to http://localhost:8983/solr/update..
  SimplePostTool: POSTing file 1.xml
  SimplePostTool: POSTing file 2.xml
  SimplePostTool: FATAL: Connection error (is Solr running at http://localhost:8983/solr/update ?): java.io.IOException: Server returned HTTP response code: 400 for URL: http://localhost:8983/solr/update

Regards, Thomas Lauer
RE: DataImport
Mihails, put the solr.war into the webapps directory and restart Tomcat, then watch the console and you'll see messages saying solr.war is getting deployed. Use a recent nightly build, as that has the dataimport-related patch included. Regards, Kishore.
RE: DataImport
I've already done that, but I cannot access Solr via the web, and the Apache log says something is wrong with the Solr home directory: "Couldn't start SOLR. Check solr/home property." Ar cieņu, Mihails
RE: DataImport
There are two parts to the Solr application. Solr.war is only the web app; then there is the example Solr home, which you need to download and unpack into a different folder. There are three ways to make that folder visible to the Solr webapp. Refer to these pages: http://wiki.apache.org/solr/SolrInstall and http://wiki.apache.org/solr/SolrTomcat (mainly the "Configuring Solr Home with JNDI" section). Hope it helps. Regards, Kishore.
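[Editor's note] For reference, the JNDI approach described on the SolrTomcat wiki page boils down to dropping a context fragment into Tomcat's conf/Catalina/localhost/ directory. The paths below are illustrative placeholders, not taken from this thread:

  <!-- tomcat_home/conf/Catalina/localhost/solr.xml -->
  <Context docBase="/some/path/example-solr-home/solr.war" debug="0" crossContext="true">
    <Environment name="solr/home" type="java.lang.String"
                 value="/some/path/example-solr-home/solr" override="true"/>
  </Context>

With this variant the war stays outside webapps/ and Tomcat deploys it from docBase, so the webapps-copy step is not needed.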
Re: DataImport
Hi Mihails, The solr home is a directory which contains the conf/ and data/ folders. The conf folder contains solrconfig.xml, schema.xml and other such configuration files. The data/ folder contains the index files. Other than adding the war file to Tomcat, you also need to designate a certain folder as the solr home, so that Solr knows from where to load its configuration. By default, Solr searches for a folder named "solr" under the current working directory to use as home. There are other ways of configuring it, as given in the Solr wiki; a typical layout is sketched below. Hope that helps. 2008/6/11 Mihails Agafonovs: > I've already done that, but I cannot access Solr via the web, and the Apache log says something is wrong with the Solr home directory: > "Couldn't start SOLR. Check solr/home property." > Ar cieņu, Mihails -- Regards, Shalin Shekhar Mangar.
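[Editor's note] To make the layout concrete, a minimal solr home looks roughly like this (file names beyond the two main config files vary by setup):

  example-solr-home/
    solr/                <- point solr home here (-Dsolr.solr.home or JNDI solr/home)
      conf/
        solrconfig.xml
        schema.xml
        stopwords.txt, synonyms.txt, protwords.txt, ...
      data/
        index/           <- Lucene index files, created by Solr on first run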
Re: DataImport
I'm stuck... I now have /tomcat5.5/webapps/solr (the exploded solr.war) and /tomcat5.5/webapps/solr/solr-example/. I've run

  export JAVA_OPTS="$JAVA_OPTS-Dsolr.solr.home=/usr/share/tomcat5.5/webapps/solr/example/solr/

to make /example/solr/ the home directory. What am I doing wrong? Ar cieņu, Mihails
Re: DataImportHandler : How to mix XPathEntityProcessor and TemplateTransformer
Thanks a million for your time and help. It indeed works smoothly now. I also, by the way, had to apply the "patch" attached to the following message: http://www.nabble.com/Re%3A-How-to-describe-2-entities-in-dataConfig-for-the-DataImporter--p17577610.html in order to have the TemplateTransformer not throw NullPointerExceptions :) Cheers! -- Nicolas Pastorino

On Jun 10, 2008, at 18:05, Noble Paul നോബിള് नोब्ळ् wrote: It is a bug, nice catch. There needs to be a null check in that method. Can you just try replacing the method with the following?

  private Node getMatchingChild(XMLStreamReader parser) {
    if (childNodes == null)
      return null;
    String localName = parser.getLocalName();
    for (Node n : childNodes) {
      if (n.name.equals(localName)) {
        if (n.attribAndValues == null)
          return n;
        if (checkForAttributes(parser, n.attribAndValues))
          return n;
      }
    }
    return null;
  }

I tried with that code and it is working. We shall add it in the next patch. --Noble

On Tue, Jun 10, 2008 at 9:11 PM, Nicolas Pastorino wrote: I just forgot to mention the error related to the description below. I get the following when running a full-import (sorry for the noise):

  SEVERE: Full Import failed
  java.lang.RuntimeException: java.lang.NullPointerException
        at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:85)
        at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:207)
        at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:161)
        at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:144)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:280)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:302)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:173)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:134)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:323)
        at org.apache.solr.handler.dataimport.DataImporter.rumCmd(DataImporter.java:374)
        at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:179)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:125)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:965)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:272)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
        at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
        at org.mortbay.jetty.Server.handle(Server.java:285)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
        at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
        at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
        at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
  Caused by: java.lang.NullPointerException
        at org.apache.solr.handler.dataimport.XPathRecordReader$Node.getMatchingChild(XPathRecordReader.java:198)
        at org.apache.solr.handler.dataimport.XPathRecordReader$Node.parse(XPathRecordReader.java:171)
        at org.apache.solr.handler.dataimport.XPathRecordReader$Node.parse(XPathRecordReader.java:174)
        at org.apache.solr.handler.dataimport.XPathRecordReader$Node.parse(XPathRecordReader.java:174)
        at org.apache.solr.handler.dataimport.XPathRecordReader$Node.access$000(XPathRecordReader.java:89)
        at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java
Re: DataImportHandler : How to mix XPathEntityProcessor and TemplateTransformer
We are cutting a patch which incorporates all the recent bug fixes, so that you guys do not have to apply patches over patches. --Noble

On Wed, Jun 11, 2008 at 3:49 PM, Nicolas Pastorino wrote: > Thanks a million for your time and help. It indeed works smoothly now. > I also, by the way, had to apply the "patch" attached to the following message: > http://www.nabble.com/Re%3A-How-to-describe-2-entities-in-dataConfig-for-the-DataImporter--p17577610.html > in order to have the TemplateTransformer not throw NullPointerExceptions :) > Cheers! > -- Nicolas Pastorino
Re: Problem with adding an XML file
On Jun 11, 2008, at 3:46 AM, Thomas Lauer wrote: > now I want to add the files to Solr. I started Solr on Windows in the example directory with java -jar start.jar, and I get the following error message: > SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8, other encodings are not currently supported

This is your issue right here. You have to save that second file in UTF-8.

> SimplePostTool: FATAL: Connection error (is Solr running at http://localhost:8983/solr/update ?): java.io.IOException: Server returned HTTP response code: 400 for URL: http://localhost:8983/solr/update

-- Grant Ingersoll http://www.lucidimagination.com Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ
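[Editor's note] For reference, a minimal well-formed add document looks like the sketch below. The field names here are illustrative (your schema.xml defines the real ones), and the file must be saved with UTF-8 encoding. Also note that Solr answers 400 for any document it cannot process (undefined fields, missing required fields, malformed XML), and the server log states the precise reason:

  <?xml version="1.0" encoding="UTF-8"?>
  <add>
    <doc>
      <field name="id">2005-025235</field>
      <!-- umlauts/accents are fine as long as the file bytes really are UTF-8 -->
      <field name="name">Rechnungsduplikate 2002 330T.doc</field>
    </doc>
  </add>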
Re: searching only within allowed documents
Solr allows you to specify filters in separate parameters that are applied to the main query, but cached separately. q=the user query&fq=folder:f13&fq=folder:f24 I've been wanting more explanation around this for a while, so maybe now is a good time to ask :) The "cached separately" verbiage here is the same as in the wiki, but I don't really understand what it means. More precisely, I'm wondering what the real performance, caching, etc. differences are between q=fielda:foo+fieldb:bar&mm=100% and q=fielda:foo&fq=fieldb:bar. My situation is similar to the original poster's in that the set of documents matching fielda is very large and common (say, theaters across the world) while fieldb would narrow it considerably (one by country, then one by zipcode, etc.). Thanks, --Geoff
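[Editor's note] For context on "cached separately": each fq clause is looked up in Solr's filter cache as its own entry (a bitset of matching documents), independent of the query result cache, so a repeated fq=fieldb:bar is answered from cache even as q varies, and fq clauses do not influence scoring. The filter cache is configured in solrconfig.xml along these lines (the sizes here are illustrative, not a recommendation):

  <filterCache
    class="solr.LRUCache"
    size="512"
    initialSize="512"
    autowarmCount="256"/>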
Re: DataImport
Ok, let's start again from scratch with a clean Tomcat installation.

1. Download example-solr-home.jar from the wiki and extract it to a local folder, for example to /home/<user>/
2. You will now see a folder called example-solr-home where you extracted the jar file in the above step
3. Copy /home/<user>/example-solr-home/solr.war to /tomcat5.5/webapps/solr.war
4. export JAVA_OPTS="-Dsolr.solr.home=/home/<user>/example-solr-home/solr"
5. Start Tomcat from the same shell after exporting the above variable

Verify that Tomcat starts without showing any exceptions in the logs. Now you will be able to run the examples given in the DataImportHandler wiki; a sample session is sketched below. 2008/6/11 Mihails Agafonovs: > I'm stuck... I now have /tomcat5.5/webapps/solr (the exploded solr.war) and /tomcat5.5/webapps/solr/solr-example/. > What am I doing wrong?
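[Editor's note] Putting the steps together as a shell session; /home/<user> and the tomcat5.5 paths are placeholders from the steps above, and the startup script location varies by distribution. One pitfall visible earlier in this thread: if you append to an existing $JAVA_OPTS, there must be a space before -Dsolr.solr.home, otherwise (when $JAVA_OPTS is non-empty) the flag is glued onto the previous option and the property is never set:

  cd /home/<user>
  jar xf example-solr-home.jar                  # unpack; unzip works too
  cp example-solr-home/solr.war /tomcat5.5/webapps/solr.war
  export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/home/<user>/example-solr-home/solr"
  /tomcat5.5/bin/startup.sh                     # then check the logs for exceptions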
Re: DataImport
"Exception in Lucene Index Updater". Anyway, for some reasons I'm able to start Solr only using its own Jetty. Everything else works fine on my Tomcat, except Solr. Quoting Shalin Shekhar Mangar : Ok, let's start again from scratch with a clean Tomcat installation. 1. Download example-solr-home.jar from the wiki and extract it to a local folder for example to /home// 2. You will now see a folder called example-solr-home where you extracted the jar file in the above step 3. Copy /home//example-solr-home/solr.war to /tomcat5.5/webapps/solr.war 4. export JAVA_OPTS="-Dsolr.solr.home=/home//example-solr-home/solr" 5. start tomcat from the same shell after exporting the above variable Verify that tomcat starts without showing any exceptions in the logs. Now you will be able to run the examples given in the DataImportHandler wiki. 2008/6/11 Mihails Agafonovs : > I'm stuck... > > I now have /tomcat5.5/webapps/solr (exploded solr.war), > /tomcat5.5/webapps/solr/solr-example/. > I've ran > > export > > JAVA_OPTS="$JAVA_OPTS-Dsolr.solr.home=/usr/share/tomcat5.5/webapps/solr/example/solr/ > to make /example/solr/ as a home directory. > > What am I doing wrong? > > Quoting Shalin Shekhar Mangar : Hi Mihails, > The solr home is a directory which contains the conf/ and data/ > folders. The > conf folder contains solrconfig.xml, schema.xml and other such > configuration > files. The data/ folder contains the index files. > Other than adding the war file to tomcat, you also need to designate > a > certain folder as solr home, so that solr knows from where to load > it's > configuration. By default, solr searches for a folder named "solr" > under the > current working directory (pwd) to use as home. There are other ways > of > configuring it as given in solr wiki. Hope that helpes. > 2008/6/11 Mihails Agafonovs : > > I've already done that, but cannot access solr via web, and apache > log > > says something wrong with solr home directory. > > - > > Couldn't start SOLR. Check solr/home property. > > - > > Quoting "Chakraborty, Kishore K." : Mihails, > > Put the solr.war into the webapps directory and restart tomcat, > then > > follow up the console and you'll see messages saying solr.war is > > getting deployed. > > Use a recent nightly build as that has the dataimport related > patch > > included. > > Regards > > Kishore. > > -Original Message- > > From: Mihails Agafonovs [mailto:[EMAIL PROTECTED] > > Sent: Wednesday, June 11, 2008 1:13 PM > > To: solr-user@lucene.apache.org > > Subject: Re: DataImport > > If I've copied the solr.war under tomcat/webapps directory, after > > restarting it the archive extracts itself and I get solr > directory. > > Why do I need to set example-solr-home/solr, which is not in the > > /webapps directory, as home directory? > > Quoting Shalin Shekhar Mangar : No, the steps are as follows: > > 1. Download the example-solr-home.jar from the DataImportHandler > > wiki page > > 2. Extract it. You'll find a folder named "example-solr-home" and > a > > solr.war > > file after extraction > > 3. Copy the solr.war to tomcat_home/webapps. You don't need any > > other solr > > instance. This war is self-sufficient. > > 4. You need to set the example-solr-home/solr folder as the solr > > home > > folder. For instructions on how to do that, look at > > http://wiki.apache.org/solr/SolrTomcat > > From the port number of the URL you are trying, it seems that > you're > > using > > the Jetty supplied with Solr instead of Tomcat. 
> > 2008/6/9 Mihails Agafonovs : > > > I've placed the solr.war under the tomcat directory, restarted > > tomcat > > > to deploy the solr.war. But still... there is no .jar, no > folder > > named > > > "example-data-config", and hitting > > > http://localhost:8983/solr/dataimport doesn't work. > > > Do I need the original Solr instance to use this .war with? > > > Quoting Shalin Shekhar Mangar : 1. Correct, there is no jar. > You > > can > > > use the solr.war file. If you really > > > need a jar, you'll need to use the SOLR-469.patch at > > > http://issues.apache.org/jira/browse/SOLR-469 and build solr > from > > > source > > > after applying that patch. > > > 2. The jar contains a folder named "example-solr-home". Please > > check > > > again. > > > Please let me know if you run into any problems. > > > 2008/6/9 Mihails Agafonovs : > > > > Looked through the tutorial on data import, section "Full > > Import > > > > Example". > > > > 1) Where is this dataimport.jar? There is no such file in > the > > > > extracted example-solr-home.jar. > > > > 2) "Use the solr folder inside example-data-config folder as > > your > > > > solr home." What does this mean? Anyway, there is no folder > > > > example-data-conf
range query highlighting
Hi, I'm using solr built from trunk and highlighting for range queries doesn't work. If I search for "2008" everything works as expected but if I search for "[2000 TO 2008]" nothing gets highlighted. The field I'm searching on is a TextField and I've confirmed that the query and index analyzers are working as expected. I didn't find anything in the issue tracker about this. Any ideas? TIA, Stefan Oestreicher -- Dr. Maté GmbH Stefan Oestreicher / Entwicklung [EMAIL PROTECTED] http://www.netdoktor.at Tel Buero: + 43 1 405 55 75 24 Fax Buero: + 43 1 405 55 75 55 Alser Str. 4 1090 Wien Altes AKH Hof 1 1.6.6
RE: [jira] Updated: (SOLR-469) Data Import RequestHandler
Shalin, thanks for consolidating the patch. Any idea when the DB import request handler will be part of the nightly build? Thanks again ** julio

-----Original Message----- From: Shalin Shekhar Mangar (JIRA) Sent: Wednesday, June 11, 2008 8:43 AM Subject: [jira] Updated: (SOLR-469) Data Import RequestHandler

[ https://issues.apache.org/jira/browse/SOLR-469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated SOLR-469: Attachment: SOLR-469.patch

A new patch file (SOLR-469.patch) consisting of some important bug fixes and minor enhancements. The changes and the corresponding classes are given below.

Changes:
* Set fetch size to Integer.MIN_VALUE if batchSize in configuration is -1, as per Patrick's suggestion -- JdbcDataSource
* Transformers can add a boost to a document by adding a key/value pair row.put("$docBoost", 2.0f) from any entity (see the sketch after this message) -- DocBuilder, SolrWriter and DataImportHandler
* Fixes for infinite loop in SqlEntityProcessor when the delta query fails for some reason and NullPointerException is thrown in EntityProcessorBase -- EntityProcessorBase
* Fix for NullPointerException in TemplateTransformer and corresponding test -- TemplateTransformer, TestTemplateTransformer
* Enhancement for specifying table.column syntax for the pk attribute in entity, as per the issue reported by Chris Moser and Olivier Poitrey -- SqlEntityProcessor, TestSqlEntityProcessor2
* Fix for NullPointerException in XPathRecordReader when an attribute specified through xpath is null -- XPathRecordReader, TestXPathRecordReader
* Enhancement to the DataSource interface to provide a close method -- DataSource, FileDataSource, HttpDataSource, MockDataSource
* The Context interface has a new method getDataSource(String entityName) for getting a new DataSource instance for the given entity -- Context, ContextImpl, DataImporter, DocBuilder
* FileListEntityProcessor implements olderThan and newerThan filtering parameters -- FileListEntityProcessor, TestFileListEntityProcessor
* Debug mode can be disabled from solrconfig.xml by enableDebug=false -- DataImporter, DataImportHandler
* Running statistics are exposed on the Solr statistics page in addition to cumulative statistics -- DataImportHandler, DocBuilder

> Data Import RequestHandler
> --------------------------
> Key: SOLR-469
> URL: https://issues.apache.org/jira/browse/SOLR-469
> Project: Solr
> Issue Type: New Feature
> Components: update
> Affects Versions: 1.3
> Reporter: Noble Paul
> Assignee: Grant Ingersoll
> Fix For: 1.3
> Attachments: SOLR-469-contrib.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch
>
> We need a RequestHandler which can import data from a DB or other data sources into the Solr index. Think of it as an advanced form of the SqlUpload plugin (SOLR-103).
> The way it works is as follows.
> * Provide a configuration file (xml) to the handler which takes in the necessary SQL queries and mappings to a Solr schema
>   - It also takes in a properties file for the data source configuration
> * Given the configuration it can also generate the solr schema.xml
> * It is registered as a RequestHandler which can take two commands, do-full-import and do-delta-import
>   - do-full-import dumps all the data from the database into the index (based on the SQL query in the configuration)
>   - do-delta-import dumps all the data that has changed since the last import (we assume a modified-timestamp column in tables)
> * It provides an admin page
>   - where we can schedule it to be run automatically at regular intervals
>   - it shows the status of the handler (idle, full-import, delta-import)

-- This message is automatically generated by JIRA. You can reply to this email to add a comment to the issue online.
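[Editor's note] As a concrete illustration of the $docBoost hook mentioned in the changes above, a custom DIH transformer might look like the sketch below. The class name and the boost value are made up for the example; the Transformer base class and transformRow signature are the ones DataImportHandler ships with:

  import java.util.Map;

  import org.apache.solr.handler.dataimport.Context;
  import org.apache.solr.handler.dataimport.Transformer;

  // referenced from data-config.xml via transformer="BoostTransformer"
  // on the entity it should apply to
  public class BoostTransformer extends Transformer {
    @Override
    public Object transformRow(Map<String, Object> row, Context context) {
      // "$docBoost" is the special key DocBuilder reads as an
      // index-time boost for the whole document being built
      row.put("$docBoost", 2.0f);
      return row;
    }
  }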
Re: [jira] Updated: (SOLR-469) Data Import RequestHandler
Hi Julio, That was fast! I just uploaded a patch :) Actually, it is waiting on SOLR-563 (http://issues.apache.org/jira/browse/SOLR-563), which deals with modifying the build scripts to create a contrib project area in Solr. I'm planning to work on that this week. Once that is done, it would be up to a committer to add it to the trunk.

On Wed, Jun 11, 2008 at 9:24 PM, Julio Castillo wrote: > Shalin, thanks for consolidating the patch. > Any idea when the DB import request handler will be part of the nightly build? > Thanks again > ** julio

-- Regards, Shalin Shekhar Mangar.
Re: range query highlighting
It's a known deficiency... ConstantScoreRangeQuery and ConstantScorePrefixQuery, which Solr uses, rewrite to a ConstantScoreQuery and don't expose the terms they match, so the highlighter never sees them. Performance-wise, exposing them seems like a bad idea if the number of terms matched is large (especially when used in a MultiSearcher, or later in global idf for distributed search). -Yonik

On Wed, Jun 11, 2008 at 11:09 AM, Stefan Oestreicher wrote: > I'm using solr built from trunk and highlighting for range queries doesn't work. > If I search for "2008" everything works as expected but if I search for "[2000 TO 2008]" nothing gets highlighted.
CSV output
Hi, Does SOLR have .csv output? I can find references to .csv input, but not output. Thank you, Marshall
Re: CSV output
Hi Marshall, I don't think there is a CSV Writer, but here are some pointers for writing one:

  $ ff \*Writer\*java | grep -v Test | grep request
  ./src/java/org/apache/solr/request/PHPResponseWriter.java
  ./src/java/org/apache/solr/request/XSLTResponseWriter.java
  ./src/java/org/apache/solr/request/JSONResponseWriter.java
  ./src/java/org/apache/solr/request/PythonResponseWriter.java
  ./src/java/org/apache/solr/request/RawResponseWriter.java
  ./src/java/org/apache/solr/request/QueryResponseWriter.java
  ./src/java/org/apache/solr/request/PHPSerializedResponseWriter.java
  ./src/java/org/apache/solr/request/BinaryResponseWriter.java
  ./src/java/org/apache/solr/request/RubyResponseWriter.java
  ./src/java/org/apache/solr/request/TextResponseWriter.java
  ./src/java/org/apache/solr/request/XMLWriter.java
  ./src/java/org/apache/solr/request/BinaryQueryResponseWriter.java
  ./src/java/org/apache/solr/request/XMLResponseWriter.java

Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----- From: Marshall Weir, Sent: Wednesday, June 11, 2008 12:52:50 PM, Subject: CSV output > Does SOLR have .csv output? I can find references to .csv input, but not output. > Thank you, Marshall
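[Editor's note] Following Otis's pointers, a bare-bones CSV writer could implement the QueryResponseWriter interface roughly as sketched below. This is a sketch under stated assumptions, not an official implementation: it assumes the result DocList sits under the "response" key (as the standard query handlers put it there), emits only a hypothetical hardcoded field list, does no CSV quoting/escaping, and uses package names matching the 1.2/1.3-era trunk (adjust to your checkout):

  import java.io.IOException;
  import java.io.Writer;

  import org.apache.lucene.document.Document;
  import org.apache.solr.common.util.NamedList;
  import org.apache.solr.request.QueryResponseWriter;
  import org.apache.solr.request.SolrQueryRequest;
  import org.apache.solr.request.SolrQueryResponse;
  import org.apache.solr.search.DocIterator;
  import org.apache.solr.search.DocList;

  public class CSVResponseWriter implements QueryResponseWriter {
    // hypothetical hardcoded column list; a real writer would
    // read it from a request parameter such as fl
    private static final String[] FIELDS = { "id", "name" };

    public void init(NamedList args) { /* no configuration needed */ }

    public String getContentType(SolrQueryRequest request, SolrQueryResponse response) {
      return "text/plain; charset=UTF-8";
    }

    public void write(Writer writer, SolrQueryRequest request, SolrQueryResponse response)
        throws IOException {
      DocList docs = (DocList) response.getValues().get("response");
      DocIterator it = docs.iterator();
      while (it.hasNext()) {
        // fetch the stored document for this internal Lucene doc id
        Document doc = request.getSearcher().doc(it.nextDoc());
        for (int i = 0; i < FIELDS.length; i++) {
          if (i > 0) writer.write(',');
          String value = doc.get(FIELDS[i]);      // stored field value, or null
          writer.write(value == null ? "" : value); // NOTE: no CSV escaping here
        }
        writer.write('\n');
      }
    }
  }

It would be registered in solrconfig.xml with <queryResponseWriter name="csv" class="CSVResponseWriter"/> and selected per request with wt=csv. A library such as OpenCSV (recommended in the next message) would handle the quoting and escaping this sketch omits.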
Re: CSV output
I recommend using the OpenCSV package. Works fine, Apache 2.0 license. http://opencsv.sourceforge.net/ wunder

On 6/11/08 10:00 AM, "Otis Gospodnetic" wrote: > Hi Marshall, > I don't think there is a CSV Writer, but here are some pointers for writing one (list in the previous message).
Question about fieldNorm
Hi,

I've just changed the stemming algorithm slightly and am running a few tests against the old stemmer versus the new stemmer. I did a query for 'hanger', and using the old stemmer I get the following scoring for a document with the title "Converter Hanger Assembly Replacement":

6.4242806 = (MATCH) sum of:
  2.5697122 = (MATCH) max of:
    0.2439919 = (MATCH) weight(markup_t:hanger in 3454), product of:
      0.1963516 = queryWeight(markup_t:hanger), product of:
        6.5593724 = idf(docFreq=6375, numDocs=1655591)
        0.02993451 = queryNorm
      1.2426275 = (MATCH) fieldWeight(markup_t:hanger in 3454), product of:
        1.7320508 = tf(termFreq(markup_t:hanger)=3)
        6.5593724 = idf(docFreq=6375, numDocs=1655591)
        0.109375 = fieldNorm(field=markup_t, doc=3454)
    2.5697122 = (MATCH) weight(title_t:hanger^2.0 in 3454), product of:
      0.5547002 = queryWeight(title_t:hanger^2.0), product of:
        2.0 = boost
        9.265229 = idf(docFreq=425, numDocs=1655591)
        0.02993451 = queryNorm
      4.6326146 = (MATCH) fieldWeight(title_t:hanger in 3454), product of:
        1.0 = tf(termFreq(title_t:hanger)=1)
        9.265229 = idf(docFreq=425, numDocs=1655591)
        0.5 = fieldNorm(field=title_t, doc=3454)
  3.8545685 = (MATCH) max of:
    0.12199595 = (MATCH) weight(markup_t:hanger^0.5 in 3454), product of:
      0.0981758 = queryWeight(markup_t:hanger^0.5), product of:
        0.5 = boost
        6.5593724 = idf(docFreq=6375, numDocs=1655591)
        0.02993451 = queryNorm
      1.2426275 = (MATCH) fieldWeight(markup_t:hanger in 3454), product of:
        1.7320508 = tf(termFreq(markup_t:hanger)=3)
        6.5593724 = idf(docFreq=6375, numDocs=1655591)
        0.109375 = fieldNorm(field=markup_t, doc=3454)
    3.8545685 = (MATCH) weight(title_t:hanger^3.0 in 3454), product of:
      0.8320503 = queryWeight(title_t:hanger^3.0), product of:
        3.0 = boost
        9.265229 = idf(docFreq=425, numDocs=1655591)
        0.02993451 = queryNorm
      4.6326146 = (MATCH) fieldWeight(title_t:hanger in 3454), product of:
        1.0 = tf(termFreq(title_t:hanger)=1)
        9.265229 = idf(docFreq=425, numDocs=1655591)
        0.5 = fieldNorm(field=title_t, doc=3454)

Using the new stemmer I get:

5.621245 = (MATCH) sum of:
  2.248498 = (MATCH) max of:
    0.24399184 = (MATCH) weight(markup_t:hanger in 3454), product of:
      0.19635157 = queryWeight(markup_t:hanger), product of:
        6.559371 = idf(docFreq=6375, numDocs=1655589)
        0.029934512 = queryNorm
      1.2426274 = (MATCH) fieldWeight(markup_t:hanger in 3454), product of:
        1.7320508 = tf(termFreq(markup_t:hanger)=3)
        6.559371 = idf(docFreq=6375, numDocs=1655589)
        0.109375 = fieldNorm(field=markup_t, doc=3454)
    2.248498 = (MATCH) weight(title_t:hanger^2.0 in 3454), product of:
      0.5547002 = queryWeight(title_t:hanger^2.0), product of:
        2.0 = boost
        9.265228 = idf(docFreq=425, numDocs=1655589)
        0.029934512 = queryNorm
      4.0535374 = (MATCH) fieldWeight(title_t:hanger in 3454), product of:
        1.0 = tf(termFreq(title_t:hanger)=1)
        9.265228 = idf(docFreq=425, numDocs=1655589)
        0.4375 = fieldNorm(field=title_t, doc=3454)
  3.372747 = (MATCH) max of:
    0.12199592 = (MATCH) weight(markup_t:hanger^0.5 in 3454), product of:
      0.09817579 = queryWeight(markup_t:hanger^0.5), product of:
        0.5 = boost
        6.559371 = idf(docFreq=6375, numDocs=1655589)
        0.029934512 = queryNorm
      1.2426274 = (MATCH) fieldWeight(markup_t:hanger in 3454), product of:
        1.7320508 = tf(termFreq(markup_t:hanger)=3)
        6.559371 = idf(docFreq=6375, numDocs=1655589)
        0.109375 = fieldNorm(field=markup_t, doc=3454)
    3.372747 = (MATCH) weight(title_t:hanger^3.0 in 3454), product of:
      0.83205026 = queryWeight(title_t:hanger^3.0), product of:
        3.0 = boost
        9.265228 = idf(docFreq=425, numDocs=1655589)
        0.029934512 = queryNorm
      4.0535374 = (MATCH) fieldWeight(title_t:hanger in 3454), product of:
        1.0 = tf(termFreq(title_t:hanger)=1)
        9.265228 = idf(docFreq=425, numDocs=1655589)
        0.4375 = fieldNorm(field=title_t, doc=3454)

The thing that is perplexing is that the fieldNorm for the title_t field is different in the two explanations: with the old stemmer it is 0.5 = fieldNorm(field=title_t, doc=3454), while with the new stemmer it is 0.4375 = fieldNorm(field=title_t, doc=3454). I ran the title through both stemmers and get the same number of tokens produced. I do no index-time boosting on the title_t field, and I am using DefaultSimilarity in both instances, so I figured the calculated fieldNorm would be:

  field boost * lengthNorm = 1 * 1/sqrt(4) = 0.5

I wouldn't have thought that changing the stemmer would have any impact on the fieldNorm in this case. Any insight? Please kick me over to the Lucene list if you feel this isn't appropriate.
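As a sanity check on that arithmetic, the lengthNorm can be queried directly from DefaultSimilarity. A minimal sketch (the field name is only a label here; the token counts are the ones discussed in the thread):

    import org.apache.lucene.search.DefaultSimilarity;

    public class LengthNormDemo {
        public static void main(String[] args) {
            DefaultSimilarity sim = new DefaultSimilarity();
            // "Converter Hanger Assembly Replacement" -> 4 tokens, no field boost
            System.out.println(sim.lengthNorm("title_t", 4)); // 0.5 = 1/sqrt(4)
            // one extra token at index time lowers the norm:
            System.out.println(sim.lengthNorm("title_t", 5)); // 0.4472136 = 1/sqrt(5)
        }
    }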
Searching for words with accented characters.
We are using Solr as the search engine for our public access library catalog. In testing I did a search for a French movie that I know is in the catalog, named "Kirikou et la sorcière", and nothing was returned. If I search for just the word "Kirikou" several results are returned, and the problem becomes apparent: the records contain "Kirikou et la sorcie?re", where the accent is a Unicode combining character following the "e".

After some research into Unicode normalization, I found and installed a Unicode normalization filter that is set to convert letters followed by combining codes into the precomposed form. I also installed a solr.ISOLatin1AccentFilterFactory that will then convert these precomposed forms into the Latin equivalent without any accent. The following is the fieldType definition taken from the schema.xml file:

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="schema.UnicodeNormalizationFilterFactory"/>
        <filter class="solr.ISOLatin1AccentFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
                generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="schema.UnicodeNormalizationFilterFactory"/>
        <filter class="solr.ISOLatin1AccentFilterFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
                generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

So it seems like this should work. However, again searching for "Kirikou et la sorcière" or "sorcière" or "sorcie?re" or just "sorciere" doesn't return the document in question.

I've tried looking at the results from solr/admin/analysis.jsp, entering text from the record as the Field value (Index) and "sorciere" as the Field value (Query), and I get the following results, which seem to indicate that there should be a match between the stemmed entry "sorcier" in the record and the stemmed word "sorcier" from the query. So clearly I am either doing something wrong or misinterpreting the analyzers, but I am at a loss as to how to figure out what is wrong. Any suggestions?

org.apache.solr.analysis.WhitespaceTokenizerFactory {}

Kirikou | et | la | sorcie?re | France | 3 | Cinema | / | RTBF | (Te?le?vision | belge). | Grand | Prix | du | festival | d'Annecy | 1999 | France | French | VHS | VIDEO | .VHS10969 | 1 | vide?ocassette | (1h10 | min.) | (VHS) | Ocelot, | Michel

schema.UnicodeNormalizationFilterFactory {}

Kirikou | et | la | sorcière | France | 3 | Cinema | / | RTBF | (Télévision | belge). | Grand | Prix | du | festival | d'Annecy | 1999 | France | French | VHS | VIDEO | .VHS10969 | 1 | vidéocassette | (1h10 | min.) | (VHS) | Ocelot, | Michel

org.apache.solr.analysis.ISOLatin1AccentFilterFactory {}

Kirikou | et | la | sorciere | France | 3 | Cinema | / | RTBF | (Television | belge). | Grand | Prix | du | festival | d'Annecy | 1999 | France | French | VHS | VIDEO | .VHS10969 | 1 | videocassette | (1h10 | min.) | (VHS) | Ocelot, | Michel
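The combining-character behaviour described above is easy to reproduce outside Solr. A small sketch (requires Java 6 for java.text.Normalizer; the filter chain in the schema performs the equivalent steps):

    import java.text.Normalizer;

    public class AccentNormalizeDemo {
        public static void main(String[] args) {
            String decomposed = "sorcie\u0300re"; // 'e' + U+0300 combining grave accent
            // step 1: NFC composes the pair into the single precomposed char U+00E8
            String precomposed = Normalizer.normalize(decomposed, Normalizer.Form.NFC);
            System.out.println(precomposed);                                      // sorcière
            System.out.println(decomposed.length() + " -> " + precomposed.length()); // 9 -> 8
            // step 2: ISOLatin1AccentFilter then strips the accent entirely:
            // sorcière -> sorciere
        }
    }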
Re: CSV output
When I was asked for something similar, I quickly cobbled together a stylesheet (I'm no XSL expert, so it's probably pretty bad). It's invoked like this:

http://localhost:8982/solr/select?q=testing&fl=id,title_t,score&wt=xslt&tr=csv.xsl&rows=10

YMMV, but feel free to use it if it helps; I've attached it.

Brendan

On Jun 11, 2008, at 1:05 PM, Walter Underwood wrote:

I recommend using the OpenCSV package. Works fine, Apache 2.0 license.

http://opencsv.sourceforge.net/

wunder

On 6/11/08 10:00 AM, "Otis Gospodnetic" <[EMAIL PROTECTED]> wrote:

Hi Marshall,

I don't think there is a CSV Writer, but here are some pointers for writing one:

$ ff \*Writer\*java | grep -v Test | grep request
./src/java/org/apache/solr/request/PHPResponseWriter.java
./src/java/org/apache/solr/request/XSLTResponseWriter.java
./src/java/org/apache/solr/request/JSONResponseWriter.java
./src/java/org/apache/solr/request/PythonResponseWriter.java
./src/java/org/apache/solr/request/RawResponseWriter.java
./src/java/org/apache/solr/request/QueryResponseWriter.java
./src/java/org/apache/solr/request/PHPSerializedResponseWriter.java
./src/java/org/apache/solr/request/BinaryResponseWriter.java
./src/java/org/apache/solr/request/RubyResponseWriter.java
./src/java/org/apache/solr/request/TextResponseWriter.java
./src/java/org/apache/solr/request/XMLWriter.java
./src/java/org/apache/solr/request/BinaryQueryResponseWriter.java
./src/java/org/apache/solr/request/XMLResponseWriter.java

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: Marshall Weir <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Wednesday, June 11, 2008 12:52:50 PM
Subject: CSV output

Hi,

Does SOLR have .csv output? I can find references to .csv input, but not output.

Thank you,
Marshall
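If a first-class writer is preferred over the XSLT route, a bare-bones sketch against the Solr 1.3-era QueryResponseWriter interface could look like the following (the field names are invented for illustration, and quoting/escaping of values, which is where OpenCSV would come in, is omitted):

    import java.io.IOException;
    import java.io.Writer;
    import org.apache.lucene.document.Document;
    import org.apache.solr.common.util.NamedList;
    import org.apache.solr.request.QueryResponseWriter;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.request.SolrQueryResponse;
    import org.apache.solr.search.DocIterator;
    import org.apache.solr.search.DocList;

    public class CSVResponseWriter implements QueryResponseWriter {
        public void init(NamedList args) {}

        public String getContentType(SolrQueryRequest req, SolrQueryResponse rsp) {
            return "text/plain";
        }

        public void write(Writer out, SolrQueryRequest req, SolrQueryResponse rsp)
                throws IOException {
            // the standard request handler puts the hits in the "response" entry
            DocList docs = (DocList) rsp.getValues().get("response");
            DocIterator it = docs.iterator();
            while (it.hasNext()) {
                Document doc = req.getSearcher().doc(it.nextDoc());
                out.write(doc.get("id"));      // illustrative field
                out.write(",");
                out.write(doc.get("title_t")); // illustrative field
                out.write("\n");
            }
        }
    }

Registered in solrconfig.xml with <queryResponseWriter name="csv" class="CSVResponseWriter"/>, it would then be selected with wt=csv.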
Re: Question about fieldNorm
That is strange... did you re-index or change the index? If so, you might want to verify that docid=3454 still corresponds to the same document you queried earlier. -Yonik On Wed, Jun 11, 2008 at 1:09 PM, Brendan Grainger <[EMAIL PROTECTED]> wrote: > I've just changed the stemming algorithm slightly and am running a few tests > against the old stemmer versus the new stemmer. I did a query for 'hanger' > and using the old stemmer I get the following scoring for a document with > the title: Converter Hanger Assembly Replacement > > 6.4242806 = (MATCH) sum of: > 2.5697122 = (MATCH) max of: >0.2439919 = (MATCH) weight(markup_t:hanger in 3454), product of: > 0.1963516 = queryWeight(markup_t:hanger), product of: >6.5593724 = idf(docFreq=6375, numDocs=1655591) >0.02993451 = queryNorm > 1.2426275 = (MATCH) fieldWeight(markup_t:hanger in 3454), product of: >1.7320508 = tf(termFreq(markup_t:hanger)=3) >6.5593724 = idf(docFreq=6375, numDocs=1655591) >0.109375 = fieldNorm(field=markup_t, doc=3454) >2.5697122 = (MATCH) weight(title_t:hanger^2.0 in 3454), product of: > 0.5547002 = queryWeight(title_t:hanger^2.0), product of: >2.0 = boost >9.265229 = idf(docFreq=425, numDocs=1655591) >0.02993451 = queryNorm > 4.6326146 = (MATCH) fieldWeight(title_t:hanger in 3454), product of: >1.0 = tf(termFreq(title_t:hanger)=1) >9.265229 = idf(docFreq=425, numDocs=1655591) >0.5 = fieldNorm(field=title_t, doc=3454) > 3.8545685 = (MATCH) max of: >0.12199595 = (MATCH) weight(markup_t:hanger^0.5 in 3454), product of: > 0.0981758 = queryWeight(markup_t:hanger^0.5), product of: >0.5 = boost >6.5593724 = idf(docFreq=6375, numDocs=1655591) >0.02993451 = queryNorm > 1.2426275 = (MATCH) fieldWeight(markup_t:hanger in 3454), product of: >1.7320508 = tf(termFreq(markup_t:hanger)=3) >6.5593724 = idf(docFreq=6375, numDocs=1655591) >0.109375 = fieldNorm(field=markup_t, doc=3454) >3.8545685 = (MATCH) weight(title_t:hanger^3.0 in 3454), product of: > 0.8320503 = queryWeight(title_t:hanger^3.0), product of: >3.0 = boost >9.265229 = idf(docFreq=425, numDocs=1655591) >0.02993451 = queryNorm > 4.6326146 = (MATCH) fieldWeight(title_t:hanger in 3454), product of: >1.0 = tf(termFreq(title_t:hanger)=1) >9.265229 = idf(docFreq=425, numDocs=1655591) >0.5 = fieldNorm(field=title_t, doc=3454) > > Using the new stemmer I get: > > 5.621245 = (MATCH) sum of: > 2.248498 = (MATCH) max of: >0.24399184 = (MATCH) weight(markup_t:hanger in 3454), product of: > 0.19635157 = queryWeight(markup_t:hanger), product of: >6.559371 = idf(docFreq=6375, numDocs=1655589) >0.029934512 = queryNorm > 1.2426274 = (MATCH) fieldWeight(markup_t:hanger in 3454), product of: >1.7320508 = tf(termFreq(markup_t:hanger)=3) >6.559371 = idf(docFreq=6375, numDocs=1655589) >0.109375 = fieldNorm(field=markup_t, doc=3454) >2.248498 = (MATCH) weight(title_t:hanger^2.0 in 3454), product of: > 0.5547002 = queryWeight(title_t:hanger^2.0), product of: >2.0 = boost >9.265228 = idf(docFreq=425, numDocs=1655589) >0.029934512 = queryNorm > 4.0535374 = (MATCH) fieldWeight(title_t:hanger in 3454), product of: >1.0 = tf(termFreq(title_t:hanger)=1) >9.265228 = idf(docFreq=425, numDocs=1655589) >0.4375 = fieldNorm(field=title_t, doc=3454) > 3.372747 = (MATCH) max of: >0.12199592 = (MATCH) weight(markup_t:hanger^0.5 in 3454), product of: > 0.09817579 = queryWeight(markup_t:hanger^0.5), product of: >0.5 = boost >6.559371 = idf(docFreq=6375, numDocs=1655589) >0.029934512 = queryNorm > 1.2426274 = (MATCH) fieldWeight(markup_t:hanger in 3454), product of: >1.7320508 = tf(termFreq(markup_t:hanger)=3) 
>6.559371 = idf(docFreq=6375, numDocs=1655589) >0.109375 = fieldNorm(field=markup_t, doc=3454) >3.372747 = (MATCH) weight(title_t:hanger^3.0 in 3454), product of: > 0.83205026 = queryWeight(title_t:hanger^3.0), product of: >3.0 = boost >9.265228 = idf(docFreq=425, numDocs=1655589) >0.029934512 = queryNorm > 4.0535374 = (MATCH) fieldWeight(title_t:hanger in 3454), product of: >1.0 = tf(termFreq(title_t:hanger)=1) >9.265228 = idf(docFreq=425, numDocs=1655589) >0.4375 = fieldNorm(field=title_t, doc=3454) > > The thing that is perplexing is that the fieldNorm for the title_t field is > different in each of the explanations, ie: the fieldNorm using the old > stemmer is: 0.5 = fieldNorm(field=title_t, doc=3454). For the new stemmer > 0.4375 = fieldNorm(field=title_t, doc=3454). I ran the title through both > stemmers and get the same number of tokens produced. I do no index time > boosting on the title_t field. I am using DefaultSimilarity in both > i
Re: Question about fieldNorm
Hi Yonik,

Yes, I did rebuild the index, and they are the same document (just verified). The only thing that changed was the stemmer, but that makes no sense to me. Also, if the equation for the fieldNorm is:

  fieldBoost * lengthNorm = fieldBoost * 1/sqrt(numTermsForField)

then that would mean numTermsForField would be 5.22 when the norm is 0.4375. Am I correct about how this is calculated?

Thanks again
Brendan

On Jun 11, 2008, at 1:37 PM, Yonik Seeley wrote:

> That is strange... did you re-index or change the index? If so, you
> might want to verify that docid=3454 still corresponds to the same
> document you queried earlier.
>
> -Yonik
Re: Ignore fields in XML response
Sure, use the fl parameter to specify the fields that you want (comma-separated).

On Wed, Jun 11, 2008 at 11:31 PM, Yves Zoundi <[EMAIL PROTECTED]> wrote:
> Hi guys,
>
> Is it possible to remove some fields from the XML response? I have a
> field which can contain a huge amount of data, and I would like it to be
> ignored in the XML response. Can this be achieved without writing a
> custom XMLResponseWriter?
>
> Thanks

--
Regards,
Shalin Shekhar Mangar.
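For example (the field names here are made up), a request such as

    http://localhost:8983/solr/select?q=solr&fl=id,name,score

returns only the id and name fields, plus the score, for each hit.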
Re: Ignore fields in XML response
Yves - you can control which fields are returned from a search using the fl (field list) parameter. &fl=* provides all fields except score. &fl=id,title,score provides only those selected fields, etc.

Erik

On Jun 11, 2008, at 2:01 PM, Yves Zoundi wrote:

Hi guys,

Is it possible to remove some fields from the XML response? I have a field which can contain a huge amount of data, and I would like it to be ignored in the XML response. Can this be achieved without writing a custom XMLResponseWriter?

Thanks
Re: Question about fieldNorm
Hi Yonik,

I just realized that the stemmer does make a difference, because of synonyms. So on indexing with the new stemmer, "converter hanger assembly replacement" gets expanded to "converter hanger assembly assemble replacement", so there are 5 terms, which get a length norm of 0.4472136 instead of 0.5. Still unsure how it ends up as 0.4375 for the fieldNorm, though, unless I have a boost of 0.9783 somewhere.

Brendan

On Jun 11, 2008, at 1:37 PM, Yonik Seeley wrote:

> That is strange... did you re-index or change the index? If so, you
> might want to verify that docid=3454 still corresponds to the same
> document you queried earlier.
>
> -Yonik
RE: Ignore fields in XML response
Thank you guys!

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]]
Sent: June 11, 2008 14:07
To: solr-user@lucene.apache.org
Subject: Re: Ignore fields in XML response

Yves - you can control which fields are returned from a search using the fl (field list) parameter. &fl=* provides all fields except score. &fl=id,title,score provides only those selected fields, etc.

Erik

On Jun 11, 2008, at 2:01 PM, Yves Zoundi wrote:

> Hi guys,
>
> Is it possible to remove some fields from the XML response? I have a
> field which can contain a huge amount of data, and I would like it to
> be ignored in the XML response. Can this be achieved without writing
> a custom XMLResponseWriter?
>
> Thanks
Re: Question about fieldNorm
Field norms have limited precision (it's encoded as an 8 bit float) so you are probably seeing rounding. -Yonik On Wed, Jun 11, 2008 at 2:13 PM, Brendan Grainger <[EMAIL PROTECTED]> wrote: > Hi Yonik, > > I just realized that the stemmer does make a difference because of synonyms. > So on indexing using the new stemmer "converter hanger assembly replacement" > gets expanded to: "converter hanger assembly assemble replacement" so there > are 5 terms which gets a length norm of 0.4472136 instead of 0.5. Still > unsure how it gets 0.4375 though as the result for the field norm though > unless I have a boost of 0.9783 somewhere there. > > Brendan > > > On Jun 11, 2008, at 1:37 PM, Yonik Seeley wrote: > >> That is strange... did you re-index or change the index? If so, you >> might want to verify that docid=3454 still corresponds to the same >> document you queried earlier. >> >> -Yonik >> >> >> On Wed, Jun 11, 2008 at 1:09 PM, Brendan Grainger >> <[EMAIL PROTECTED]> wrote: >>> >>> I've just changed the stemming algorithm slightly and am running a few >>> tests >>> against the old stemmer versus the new stemmer. I did a query for >>> 'hanger' >>> and using the old stemmer I get the following scoring for a document with >>> the title: Converter Hanger Assembly Replacement >>> >>> 6.4242806 = (MATCH) sum of: >>> 2.5697122 = (MATCH) max of: >>> 0.2439919 = (MATCH) weight(markup_t:hanger in 3454), product of: >>>0.1963516 = queryWeight(markup_t:hanger), product of: >>> 6.5593724 = idf(docFreq=6375, numDocs=1655591) >>> 0.02993451 = queryNorm >>>1.2426275 = (MATCH) fieldWeight(markup_t:hanger in 3454), product of: >>> 1.7320508 = tf(termFreq(markup_t:hanger)=3) >>> 6.5593724 = idf(docFreq=6375, numDocs=1655591) >>> 0.109375 = fieldNorm(field=markup_t, doc=3454) >>> 2.5697122 = (MATCH) weight(title_t:hanger^2.0 in 3454), product of: >>>0.5547002 = queryWeight(title_t:hanger^2.0), product of: >>> 2.0 = boost >>> 9.265229 = idf(docFreq=425, numDocs=1655591) >>> 0.02993451 = queryNorm >>>4.6326146 = (MATCH) fieldWeight(title_t:hanger in 3454), product of: >>> 1.0 = tf(termFreq(title_t:hanger)=1) >>> 9.265229 = idf(docFreq=425, numDocs=1655591) >>> 0.5 = fieldNorm(field=title_t, doc=3454) >>> 3.8545685 = (MATCH) max of: >>> 0.12199595 = (MATCH) weight(markup_t:hanger^0.5 in 3454), product of: >>>0.0981758 = queryWeight(markup_t:hanger^0.5), product of: >>> 0.5 = boost >>> 6.5593724 = idf(docFreq=6375, numDocs=1655591) >>> 0.02993451 = queryNorm >>>1.2426275 = (MATCH) fieldWeight(markup_t:hanger in 3454), product of: >>> 1.7320508 = tf(termFreq(markup_t:hanger)=3) >>> 6.5593724 = idf(docFreq=6375, numDocs=1655591) >>> 0.109375 = fieldNorm(field=markup_t, doc=3454) >>> 3.8545685 = (MATCH) weight(title_t:hanger^3.0 in 3454), product of: >>>0.8320503 = queryWeight(title_t:hanger^3.0), product of: >>> 3.0 = boost >>> 9.265229 = idf(docFreq=425, numDocs=1655591) >>> 0.02993451 = queryNorm >>>4.6326146 = (MATCH) fieldWeight(title_t:hanger in 3454), product of: >>> 1.0 = tf(termFreq(title_t:hanger)=1) >>> 9.265229 = idf(docFreq=425, numDocs=1655591) >>> 0.5 = fieldNorm(field=title_t, doc=3454) >>> >>> Using the new stemmer I get: >>> >>> 5.621245 = (MATCH) sum of: >>> 2.248498 = (MATCH) max of: >>> 0.24399184 = (MATCH) weight(markup_t:hanger in 3454), product of: >>>0.19635157 = queryWeight(markup_t:hanger), product of: >>> 6.559371 = idf(docFreq=6375, numDocs=1655589) >>> 0.029934512 = queryNorm >>>1.2426274 = (MATCH) fieldWeight(markup_t:hanger in 3454), product of: >>> 1.7320508 = 
tf(termFreq(markup_t:hanger)=3) >>> 6.559371 = idf(docFreq=6375, numDocs=1655589) >>> 0.109375 = fieldNorm(field=markup_t, doc=3454) >>> 2.248498 = (MATCH) weight(title_t:hanger^2.0 in 3454), product of: >>>0.5547002 = queryWeight(title_t:hanger^2.0), product of: >>> 2.0 = boost >>> 9.265228 = idf(docFreq=425, numDocs=1655589) >>> 0.029934512 = queryNorm >>>4.0535374 = (MATCH) fieldWeight(title_t:hanger in 3454), product of: >>> 1.0 = tf(termFreq(title_t:hanger)=1) >>> 9.265228 = idf(docFreq=425, numDocs=1655589) >>> 0.4375 = fieldNorm(field=title_t, doc=3454) >>> 3.372747 = (MATCH) max of: >>> 0.12199592 = (MATCH) weight(markup_t:hanger^0.5 in 3454), product of: >>>0.09817579 = queryWeight(markup_t:hanger^0.5), product of: >>> 0.5 = boost >>> 6.559371 = idf(docFreq=6375, numDocs=1655589) >>> 0.029934512 = queryNorm >>>1.2426274 = (MATCH) fieldWeight(markup_t:hanger in 3454), product of: >>> 1.7320508 = tf(termFreq(markup_t:hanger)=3) >>> 6.559371 = idf(docFreq=6375, numDocs=1655589) >>> 0.109375 = fieldNorm(field=markup_t, doc=3454) >>> 3.372747 = (MATCH) weight(title_t:hanger^3.0 in 3454), product of: >>>0.83205026 = queryWeight(title_t:hanger^3.0), product of: >>> 3.0 = boost
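To see the rounding concretely: the norm is stored as a single byte with a 3-bit mantissa, so 1/sqrt(5) is not representable and decodes to exactly the 0.4375 reported above. A small sketch against the Lucene 2.3-era Similarity API:

    import org.apache.lucene.search.Similarity;

    public class NormRoundingDemo {
        public static void main(String[] args) {
            float lengthNorm = (float) (1.0 / Math.sqrt(5)); // 5 terms -> 0.4472136
            byte encoded = Similarity.encodeNorm(lengthNorm); // the byte stored in the index
            float decoded = Similarity.decodeNorm(encoded);   // what explain() reports back
            System.out.println(lengthNorm + " -> " + decoded); // 0.4472136 -> 0.4375
        }
    }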
Re: Question about fieldNorm
Thanks so much, that explains it.

Brendan

On Jun 11, 2008, at 4:00 PM, Yonik Seeley wrote:

> Field norms have limited precision (it's encoded as an 8 bit float)
> so you are probably seeing rounding.
>
> -Yonik
Re: Searching for words with accented characters.
Hi Robert,

Did you rebuild the index after changing your config? The index-time analyzer is only applied when a document is indexed; changing it has no effect on already-indexed documents.

Tom

Robert Haschart wrote:
> We are using Solr as the search engine for our public access library
> catalog. In testing I did a search for a French movie that I know is in
> the catalog named "Kirikou et la sorcière" and nothing was returned.
> If I search for just the word "Kirikou" several results are returned,
> and the problem becomes apparent. The records contain "Kirikou et la
> sorcie?re" where the accent is a Unicode combining character following
> the "e".
synonym token types and ranking
Hi,

I've noticed that currently the SynonymFilter replaces the original token with the configured tokens list (which includes the original matched token) and each one of these tokens is of type "word". Wouldn't it make more sense to only mark the original token as type "word" and the other tokens as "synonym" types? In addition, once payloads are integrated with Solr, it would be nice if it were possible to configure a payload for synonyms. One of the requirements we're currently facing in our project is that matches on synonyms should weigh less than exact matches.

cheers,
Uri
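Until something like that lands in SynonymFilter itself, a workaround sketch is possible in a custom TokenFilter (written against the pre-attribute TokenStream API of that era; it relies on the fact that tokens injected at the same position, such as expanded synonyms, carry a position increment of 0):

    import java.io.IOException;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;

    public class SynonymTypeFilter extends TokenFilter {
        public SynonymTypeFilter(TokenStream input) {
            super(input);
        }

        public Token next() throws IOException {
            Token t = input.next();
            // a zero position increment means the token is stacked on the
            // previous one, i.e. it was injected by the synonym expansion
            if (t != null && t.getPositionIncrement() == 0) {
                t.setType("synonym");
            }
            return t;
        }
    }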
Strategy for presenting fresh data
Hi,

The product I'm working on requires new documents to be searchable very quickly (inside 60 seconds is my goal). The corpus is also going to grow very large, although it is perfectly partitionable by user.

The approach I tried first was to have write-only masters and read-only slaves, with data being replicated from one to another postCommit and postOptimise.

This allowed new documents to be visible inside 5 minutes or so (until the indexes got so large that re-opening IndexSearchers took forever, that is...), but still not good enough.

Now, I am considering cutting out the commit / replicate / re-open cycle by augmenting Solr with a RAMDirectory per core.

Your thoughts on the following approach would be much appreciated:

Searches would be forked to both the RAMDirectory and FSDirectory, while writes would go to the RAMDirectory only. The RAMDirectory would be flushed back to the FSDirectory regularly, using IndexWriter.addIndexes (or addIndexesNoOptimize).

Effectively, I'd be creating a searchable queue in front of a regularly committed and optimised conventional index.

As this seems to be a useful pattern (and is mentioned tangentially in Lucene in Action), is there already support for this in Lucene?

Thanks,
James
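For reference, the flush step described above might look roughly like this against the Lucene 2.3 API (paths and analyzer are placeholders, and coordination with live searchers is left out):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.store.RAMDirectory;

    public class RamQueueFlush {
        // merge the in-memory "queue" index into the on-disk index;
        // the caller would then swap in a fresh, empty RAMDirectory
        public static void flush(RAMDirectory ram, String indexPath) throws Exception {
            Directory fs = FSDirectory.getDirectory(indexPath);
            IndexWriter writer = new IndexWriter(fs, new StandardAnalyzer(), false);
            writer.addIndexesNoOptimize(new Directory[] { ram });
            writer.close();
        }
    }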
Re: synonym token types and ranking
Hi Uri,

Yes, I think that would make sense (word vs. synonym token types). Custom boosting/weighting of the original token vs. synonym token(s) also makes sense. Is this something you can provide a patch for?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
> From: Uri Boness <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Wednesday, June 11, 2008 8:56:02 PM
> Subject: synonym token types and ranking
>
> Hi,
>
> I've noticed that currently the SynonymFilter replaces the original
> token with the configured tokens list (which includes the original
> matched token) and each one of these tokens is of type "word". Wouldn't
> it make more sense to only mark the original token as type "word" and
> the other tokens as "synonym" types? In addition, once payloads are
> integrated with Solr, it would be nice if it were possible to
> configure a payload for synonyms. One of the requirements we're
> currently facing in our project is that matches on synonyms should weigh
> less than exact matches.
>
> cheers,
> Uri
Re: Strategy for presenting fresh data
Hi James,

Yes, this makes sense. I've recommended doing the same to others before. It would be good to have this be a part of Solr. There is one person (named Jason) working on adding more real-time search support to both Lucene and Solr.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
> From: James Brady <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Wednesday, June 11, 2008 11:24:38 PM
> Subject: Strategy for presenting fresh data
>
> Hi,
> The product I'm working on requires new documents to be searchable
> very quickly (inside 60 seconds is my goal). The corpus is also going
> to grow very large, although it is perfectly partitionable by user.
>
> The approach I tried first was to have write-only masters and read-only
> slaves with data being replicated from one to another postCommit
> and postOptimise.
>
> This allowed new documents to be visible inside 5 minutes or so (until
> the indexes got so large that re-opening IndexSearchers took forever,
> that is...), but still not good enough.
>
> Now, I am considering cutting out the commit / replicate / re-open
> cycle by augmenting Solr with a RAMDirectory per core.
>
> Your thoughts on the following approach would be much appreciated:
>
> Searches would be forked to both the RAMDirectory and FSDirectory,
> while writes would go to the RAMDirectory only. The RAMDirectory would
> be flushed back to the FSDirectory regularly, using
> IndexWriter.addIndexes (or addIndexesNoOptimize).
>
> Effectively, I'd be creating a searchable queue in front of a
> regularly committed and optimised conventional index.
>
> As this seems to be a useful pattern (and is mentioned tangentially in
> Lucene in Action), is there already support for this in Lucene?
>
> Thanks,
> James
Re: searching only within allowed documents
It depends on your query. The second query is better if you know that the fieldb:bar filter will be reused often, since it will be cached separately from the query. The first query occupies one cache entry while the second one occupies two cache entries, one in the queryCache and one in the filterCache. Therefore, if you're not going to reuse fieldb:bar, the second query is better.

On Wed, Jun 11, 2008 at 10:53 PM, Geoffrey Young <[EMAIL PROTECTED]> wrote:
>
>> Solr allows you to specify filters in separate parameters that are
>> applied to the main query, but cached separately.
>>
>> q=the user query&fq=folder:f13&fq=folder:f24
>
> I've been wanting more explanation around this for a while, so maybe now is
> a good time to ask :)
>
> the "cached separately" verbiage here is the same as in the twiki, but I
> don't really understand what it means. more precisely, I'm wondering what
> the real performance, caching, etc differences are between
>
> q=fielda:foo+fieldb:bar&mm=100%
>
> and
>
> q=fielda:foo&fq=fieldb:bar
>
> my situation is similar to the original poster's in that documents matching
> fielda is very large and common (say theaters across the world) while fieldb
> would narrow it considerably (one by country, then one by zipcode, etc).
>
> thanks
>
> --Geoff

--
Regards,

Cuong Hoang
Re: searching only within allowed documents
Just to correct myself: in the last sentence, the first query is better if fieldb:bar isn't reused often.

On Thu, Jun 12, 2008 at 2:02 PM, climbingrose <[EMAIL PROTECTED]> wrote:
> It depends on your query. The second query is better if you know that
> the fieldb:bar filter will be reused often, since it will be cached
> separately from the query. The first query occupies one cache entry while
> the second one occupies two cache entries, one in the queryCache and one
> in the filterCache. Therefore, if you're not going to reuse fieldb:bar,
> the second query is better.
>
> On Wed, Jun 11, 2008 at 10:53 PM, Geoffrey Young <
> [EMAIL PROTECTED]> wrote:
>
>>> Solr allows you to specify filters in separate parameters that are
>>> applied to the main query, but cached separately.
>>>
>>> q=the user query&fq=folder:f13&fq=folder:f24
>>
>> I've been wanting more explanation around this for a while, so maybe now
>> is a good time to ask :)
>>
>> the "cached separately" verbiage here is the same as in the twiki, but I
>> don't really understand what it means. more precisely, I'm wondering what
>> the real performance, caching, etc differences are between
>>
>> q=fielda:foo+fieldb:bar&mm=100%
>>
>> and
>>
>> q=fielda:foo&fq=fieldb:bar
>>
>> my situation is similar to the original poster's in that documents
>> matching fielda is very large and common (say theaters across the world)
>> while fieldb would narrow it considerably (one by country, then one by
>> zipcode, etc).
>>
>> thanks
>>
>> --Geoff
>
> --
> Regards,
>
> Cuong Hoang

--
Regards,

Cuong Hoang
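For reference, the two caches in question are sized independently in solrconfig.xml (the element names are the real ones; the sizes below are only illustrative and should be tuned to your reuse patterns):

    <filterCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="256"/>
    <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="256"/>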
Re: Strategy for presenting fresh data
Hi,

I am new to Solr/Lucene. I have only the default core, and I am working on creating multiple cores. Can you help me with this?

With regards,
Rohit Arora

--- On Thu, 6/12/08, James Brady <[EMAIL PROTECTED]> wrote:

From: James Brady <[EMAIL PROTECTED]>
Subject: Strategy for presenting fresh data
To: solr-user@lucene.apache.org
Date: Thursday, June 12, 2008, 8:54 AM

Hi,
The product I'm working on requires new documents to be searchable very quickly (inside 60 seconds is my goal). The corpus is also going to grow very large, although it is perfectly partitionable by user.

The approach I tried first was to have write-only masters and read-only slaves with data being replicated from one to another postCommit and postOptimise.

This allowed new documents to be visible inside 5 minutes or so (until the indexes got so large that re-opening IndexSearchers took forever, that is...), but still not good enough.

Now, I am considering cutting out the commit / replicate / re-open cycle by augmenting Solr with a RAMDirectory per core.

Your thoughts on the following approach would be much appreciated:

Searches would be forked to both the RAMDirectory and FSDirectory, while writes would go to the RAMDirectory only. The RAMDirectory would be flushed back to the FSDirectory regularly, using IndexWriter.addIndexes (or addIndexesNoOptimize).

Effectively, I'd be creating a searchable queue in front of a regularly committed and optimised conventional index.

As this seems to be a useful pattern (and is mentioned tangentially in Lucene in Action), is there already support for this in Lucene?

Thanks,
James
DataImportHandler questions ..
Hi,

I'm playing with the Solr Data Import Handler, and everything looks great so far!

Hopefully we will be able to replace our homegrown ODBC indexing service [using camping+ferret] with Solr!

The wiki page mentions "scheduling full imports and delta imports", but I couldn't find any further details. Is scheduling supported by the current handler, or do I need to use an external trigger?

Also, any idea when the DataImportHandler [SOLR-469] might become part of the nightlies? I read somewhere that it might happen RSN.

Thanks,

Neville
Re: DataImportHandler questions ..
On Thu, Jun 12, 2008 at 11:01 AM, Neville Burnell <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I'm playing with the Solr Data Import Handler, and everything looks
> great so far!

Thanks!

> Hopefully we will be able to replace our homegrown ODBC indexing service
> [using camping+ferret] with Solr!
>
> The wiki page mentions "scheduling full imports and delta imports", but I
> couldn't find any further details. Is scheduling supported by the
> current handler, or do I need to use an external trigger?

It is planned, but we believe that scheduling may come in as a separate service which we can piggyback on; there are more components which may need it. For the time being, a 'wget' from a cron job is the only option.

> Also, any idea when the DataImportHandler [SOLR-469] might become part
> of the nightlies? I read somewhere that it might happen RSN.

(fingers crossed) It is one of the planned features for the 1.3 release. Let us see when it actually happens :)

> Thanks,
>
> Neville

--
--Noble Paul
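To make the cron suggestion concrete, an entry along these lines would do (illustrative; the URL path assumes the handler is registered at /dataimport, and full-import/delta-import are the commands documented on the wiki):

    # run a delta-import every 30 minutes
    */30 * * * * wget -q -O /dev/null "http://localhost:8983/solr/dataimport?command=delta-import"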