SolrJ API for multi core?
Hi, Is $subject available?? Or do I need to make HTTP GET calls? -- Regards, Tharindu
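For reference, per-core operations in SolrJ can be done by pointing a client instance at the core's URL, and core administration goes through CoreAdminRequest. A minimal sketch (host, port and core name are placeholders; class names assume Solr 1.4-era SolrJ):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class MultiCoreSketch {
  public static void main(String[] args) throws Exception {
    // Per-core searches/updates: point the client at the core's URL
    SolrServer core1 = new CommonsHttpSolrServer("http://localhost:8983/solr/core1");
    core1.ping();

    // Core admin operations go through the CoreAdmin handler at the Solr root
    SolrServer admin = new CommonsHttpSolrServer("http://localhost:8983/solr");
    CoreAdminRequest.getStatus("core1", admin);   // STATUS
    CoreAdminRequest.reloadCore("core1", admin);  // RELOAD
  }
}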
Re: JVM GC troubles
Hi, I don't run totally OOM (no OOM exceptions in the log) but I constantly garbage collect. While not collecting, SOLR master handles the updates pretty well. Every insert is unique, so I don't have any deletes or optimizes and all queries are handled by the single slave instance. Is there a way to reduce the objects held in the old gen space? It looks like the JVM is trying to hold as many objects as possible in the cache, to provide fast queries, which are not needed in my situation. Regarding the JBoss ... well as I said, it's the minimalistic version of it and we use it due to the automation process within our department. In my test-env I tried it with a plain Tomcat 6.x but without any improvements, so the JBoss overhead is minimal to nothing. The JVM parameters I wrote are the ones I am struggling with at the moment. I was hoping someone would come up with a hint regarding the solrconfig.xml itself. PS: if anyone is questioning the implemented architecture (master -> slave, configs, schema, etc.) ... it's our architect's fault and I have to operate it ;-) 2010/10/15 Otis Gospodnetic > Hello, > > I hope you are not running JBoss just to run Solr - there are simpler > containers > out there, e.g., Jetty. > Do you OOM? > Do things look better if you replicate less often (e.g. every 5 minutes > instead > of every 60 seconds)? > Do all/some of those -X__ JVM params actually help? > > Otis > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch > Lucene ecosystem search :: http://search-lucene.com/ > > > > - Original Message > > From: accid > > To: solr-user@lucene.apache.org > > Sent: Thu, October 14, 2010 1:25:34 PM > > Subject: Re: JVM GC troubles > > > > I forgot a few important details: > > > > solr version = 1.4.1 > > current index size = 50gb > > growth ~600mb / day > > jboss runs with web settings (same as minimal) > > 2010/10/14 > > > > > Hi, > > > > > > as I am new here, I want to say hello and thanks in advance for your > help. > > > > > > > > > HW Setup: > > > > > > 1x SOLR Master - Sun Microsystems SUN FIRE X4450 - 4 x 2,93ghz, 64gb > ram > > > 1x SOLR Slave - Sun Microsystems SUN FIRE X4450 - 4 x 2,93ghz, 64gb > ram > > > > > > SW Setup: > > > > > > Solaris 10 Generic_142901-03 > > > jboss 5.1.0 > > > JDK 1.6 update 18 > > > > > > > > > # Specify the exact Java VM executable to use. > > > # > > > JAVA="/opt/appsrv/java6/bin/amd64/java" > > > > > > # > > > # Specify options to pass to the Java VM. > > > # > > > JAVA_OPTS="-server -Xms6144m -Xmx6144m -Xmn3072m > -XX:ThreadStackSize=1024 > > > -XX:MaxPermSize=512m -Dorg.jboss.resolver.warning=true > > > -Dsun.rmi.dgc.client.gcInterval=360 > > > -Dsun.rmi.dgc.server.gcInterval=360 > -Dnetworkaddress.cache.ttl=1800 > > > -XX:+UseConcMarkSweepGC" > > > > > > > > > SOLR Setup: > > > > > > #) the master has to deal an avg.
update rate of 50 updates/s and > peaks of > > > 400 updates/s > > > > > > #) the slave replicates every 60s using the built in solr replication > > > method (NOT rsync) > > > > > > #) the slave querys are ~20/sec > > > > > > > > > #) schema.xml > > > > > > > > > > > required="true"/> > > > > > required="true"/> > > > > > required="true"/> > > > > > required="true"/> > > > > > required="true"/> > > > > > required="true"/> > > > > > > > > > > > > > > > > > > > > > > > > > > multiValued="true"/> > > > > > multiValued="true"/> > > > > > multiValued="true"/> > > > > > multiValued="true"/> > > > > > multiValued="true"/> > > > > > multiValued="true"/> > > > > > > > > required="true"/> > > > > > default="NOW" multiValued="false"/> > > > > > > > > > #) The solarconfig.xml is attached > > > > > > > > > > > > Both, master & slave suffer from serious performance impacts during > garbage > > > collects > > > > > > > > > I obviously have an GC problem, because ~30min after startup, the Old > space > > > is full and not beeing freed up. > > > > > > Below you find a JMX copy&paste of the Heap AFTER a garbage collect!! > As > > > you can see, even the Eden Space can only free up to 700mb total, > which > > > gives very little time to relax. The system does GC's 90% of the time. > > > > > > > > > > > > > > > Total Memory Pools: 5 > > > > > >Pool: Code Cache (Non-heap memory) > > > > > >Peak Usage : init:4194304, used:7679360, committed:7798784, > > > max:50331648 > > > Current Usage : init:4194304, used:7677312, committed:7798784, > > > max:50331648 > > > > > > > > > |-| committed:7.44Mb > > > > > > > +-+ > > > |/| | max:48Mb > > > > > > > +-+ > > > |-| used:7.32Mb > > > > > > > > >Pool: Par Eden Space (Heap memory) > > > > > >Peak Usage : init:2577006592, used:2577006592, > committed:2577006592, > > > max:2577006592 > > >
How do you programmatically create new cores?
Hi everyone, I'm a newbie at this and I can't figure out how to do it even after going through http://wiki.apache.org/solr/CoreAdmin. Any sample code would help a lot. Thanks in advance. -- Regards, Tharindu
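For reference, a minimal SolrJ sketch of creating a core through the CoreAdmin handler (the URL, core name and instanceDir are placeholders; the instance directory with its conf/ is assumed to already exist on the server):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class CreateCoreSketch {
  public static void main(String[] args) throws Exception {
    // CoreAdmin requests go to the Solr root, not to an individual core
    SolrServer admin = new CommonsHttpSolrServer("http://localhost:8983/solr");

    // Equivalent of .../admin/cores?action=CREATE&name=newcore&instanceDir=...
    CoreAdminRequest.createCore("newcore", "/path/to/solr/newcore", admin);
  }
}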
Re: SOLRJ - Searching text in all fields of a Bean
Ahmet, I got it working to an extent. Now: SolrQuery query = new SolrQuery(); query.setQueryType("dismax"); query.setQuery( "kitten"); query.setParam("qf", "title"); QueryResponse rsp = server.query( query ); List beans = rsp.getBeans(SOLRTitle.class); System.out.println(beans.size()); Iterator it = beans.iterator(); while(it.hasNext()) { SOLRTitle solrTitle = (SOLRTitle)it.next(); System.out.println(solrTitle.id); System.out.println(solrTitle.title); } *This code is able to find the record and prints the ID, but fails to print the title.* Whereas: SolrQuery query = new SolrQuery(); query.setQuery( "title:kitten" ); QueryResponse rsp = server.query( query ); SolrDocumentList docs = rsp.getResults(); Iterator iter = rsp.getResults().iterator(); while (iter.hasNext()) { SolrDocument resultDoc = iter.next(); String title = (String) resultDoc.getFieldValue("title"); String id = (String) resultDoc.getFieldValue("id"); //id is the uniqueKey field System.out.println(id); System.out.println(title); } * This query succeeds!* What am I doing wrong in the dismax params? The title field is being returned as null. Regards, Subhash Bhushan. On Fri, Oct 8, 2010 at 2:05 PM, Ahmet Arslan wrote: > > I have two fields in the bean class, id and title. > > After adding the bean to SOLR, I want to search for, say > > "kitten", in all > > defined fields in the bean, like this -- query.setQuery( > > "kitten"); -- > > But I get results only when I affix the bean field name > > before the search > > text like this -- query.setQuery( "title:kitten"); -- > > > > Same case even when I use SolrInputDocument, and add these > > fields. > > > > Can we search text in all fields of a bean, without having > > to specify a > > field? > > With dismax, you can query several fields using different boosts. > http://wiki.apache.org/solr/DisMaxQParserPlugin > > > > >
problem on running fullimport
Hi, I am using the full import option with the data-config file as mentioned below. On running the full-import option I am getting the error mentioned below. I had already included the dataimport.properties file in my conf file. Help me to get the issue resolved. - 0 334 - - data-config.xml full-import debug - - - select studentName from test1 - org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select studentName from test1 Processing Document # 1 at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72) at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.(JdbcDataSource.java:253) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39) at org.apache.solr.handler.dataimport.DebugLogger$2.getData(DebugLogger.java:184) at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58) at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:71) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:357) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389) at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:203) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139) at org.mortbay.jetty.Server.handle(Server.java:285) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226) at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442) Caused by: java.sql.SQLException: Illegal value for setFetchSize(). 
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1075) at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:989) at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:984) at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:929) at com.mysql.jdbc.StatementImpl.setFetchSize(StatementImpl.java:2496) at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.(JdbcDataSource.java:242) ... 33 more 0:0:0.50 idle Configuration Re-loaded sucessfully - 0:0:0.299 1 0 0 0 2010-10-15 16:42:21 Indexing failed. Rolled back all changes. 2010-10-15 16:42:21 - This response format is experimental. It is likely to change in the future. -- Regards Swapnil Dubey
Re: problem on running fullimport
On Fri, Oct 15, 2010 at 7:42 AM, swapnil dubey wrote: > Hi, > > I am using the full import option with the data-config file as mentioned > below > > >url="jdbc:mysql:///xxx" user="xxx" password="xx" /> > > > > > > > > > on running the full-import option I am getting the error mentioned below.I > had already included the dataimport.properties file in my conf file.help me > to get the issue resolved > > > - > > 0 > 334 > > - > > - > > data-config.xml > > > full-import > debug > > - > > - > > - > > select studentName from test1 > - > > org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to > execute query: select studentName from test1 Processing Document # 1 > ... > > -- > Regards > Swapnil Dubey > Swapnil, Everything looks fine, except that in your entity definition you forgot to define which datasource you wish to use. So if you add the 'dataSource="JdbcDataSource"' that should get rid of your exception. As a reminder, the DataImportHandler wiki ( http://wiki.apache.org/solr/DataImportHandler) on Apache's website is very helpful with learning how to use the DIH properly. It has helped me with having a printed copy beside me for easy and quick reference. - Ken
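For reference, the suggested change might look roughly like the following (the dataSource name, entity name and field mapping here are illustrative, not the original poster's exact config):

<dataConfig>
  <dataSource name="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql:///xxx" user="xxx" password="xx"/>
  <document>
    <!-- reference the data source by name from the entity -->
    <entity name="student" dataSource="JdbcDataSource"
            query="select studentName from test1">
      <field column="studentName" name="studentName"/>
    </entity>
  </document>
</dataConfig>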
Re: SOLRJ - Searching text in all fields of a Bean
Hi Savvas, Thanks!! Was able to search using directive. I was using the default example schema packaged with solr. I added the following directive for title field and reindexed data: ** Regards, Subhash Bhushan. On Fri, Oct 8, 2010 at 2:09 PM, Savvas-Andreas Moysidis < savvas.andreas.moysi...@googlemail.com> wrote: > Hello, > > What does your schema look like? Have you defined a "catch all" field and > copy every value from all your other fields in it with a > directive? > > Cheers, > -- Savvas > > > On 8 October 2010 08:30, Subhash Bhushan wrote: > >> Hi, >> >> I have two fields in the bean class, id and title. >> After adding the bean to SOLR, I want to search for, say "kitten", in all >> defined fields in the bean, like this -- query.setQuery( "kitten"); -- >> But I get results only when I affix the bean field name before the search >> text like this -- query.setQuery( "title:kitten"); -- >> >> Same case even when I use SolrInputDocument, and add these fields. >> >> Can we search text in all fields of a bean, without having to specify a >> field? >> If we can, what am I missing in my code? >> >> *Code:* >> Bean: >> --- >> public class SOLRTitle { >> @Field >> public String id = ""; >> @Field >> public String title = ""; >> } >> --- >> Indexing function: >> --- >> >> private static void uploadData() { >> >> try { >> ... // Get Titles >>List solrTitles = new >> ArrayList(); >> Iterator it = titles.iterator(); >> while(it.hasNext()) { >> Title title = (Title) it.next(); >> SOLRTitle solrTitle = new SOLRTitle(); >> solrTitle.id = title.getID().toString(); >> solrTitle.title = title.getTitle(); >> solrTitles.add(solrTitle); >> } >> server.addBeans(solrTitles); >> server.commit(); >> } catch (SolrServerException e) { >> e.printStackTrace(); >> } catch (IOException e) { >> e.printStackTrace(); >> } >> } >> --- >> Querying function: >> --- >> >> private static void queryData() { >> >> try { >> SolrQuery query = new SolrQuery(); >> query.setQuery( "kitten"); >> >>QueryResponse rsp = server.query( query ); >>List beans = rsp.getBeans(SOLRTitle.class); >>System.out.println(beans.size()); >>Iterator it = beans.iterator(); >>while(it.hasNext()) { >> SOLRTitle solrTitle = (SOLRTitle)it.next(); >> System.out.println(solrTitle.id); >> System.out.println(solrTitle.title); >>} >> } catch (SolrServerException e) { >> e.printStackTrace(); >> } >> } >> -- >> >> Subhash Bhushan. >> > >
Re: Quick question on indexing an existing index
Why don't you simply index the source content which you used to build index2 into index1, i.e. have your "tool" index to both? You won't save anything on trying to extract that content from an existing index. But of course, you COULD write yourself a tool which extracts all stored fields for all documents in index2, transform it into docs which fit in index1 and then insert them. But how will you support deletes etc? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 14. okt. 2010, at 17.06, bbarani wrote: > > Hi, > > I have a very simple question about indexing an existing index. > > We have 2 index, index 1 is being maintained by us (it indexes the data from > a database) and we have an index 2 which is maintaing by a tool.. > > Both the schemas are totally different but we are interested to re-index the > index present in index2 in to index1 such that we will be having just one > single index (index 1 ) which will contain the data present in both index. > > We want to re-index the index present in index 2 using the schema presnt for > index 1. Also we are interested in customizing the data (something like > selecting columns / fields from DB using DB import handler). > > Thanks, > BB > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Quick-question-on-indexing-an-existing-index-tp1701663p1701663.html > Sent from the Solr - User mailing list archive at Nabble.com.
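For reference, a rough SolrJ sketch of the "extract stored fields and re-insert" tool described above (core URLs and field names are placeholders; only fields that are stored in index2 can be copied, and paging through a large index this way can be slow):

import java.util.List;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public static void copyIndex() throws Exception {
  SolrServer index2 = new CommonsHttpSolrServer("http://localhost:8983/solr/index2");
  SolrServer index1 = new CommonsHttpSolrServer("http://localhost:8983/solr/index1");

  int rows = 500;
  for (int start = 0; ; start += rows) {
    SolrQuery q = new SolrQuery("*:*");
    q.setStart(start);
    q.setRows(rows);
    List<SolrDocument> page = index2.query(q).getResults();
    if (page.isEmpty()) break;

    for (SolrDocument src : page) {
      SolrInputDocument dest = new SolrInputDocument();
      // map / rename stored fields here so they fit index1's schema
      dest.addField("id", src.getFieldValue("id"));
      dest.addField("title", src.getFieldValue("title"));
      index1.add(dest);
    }
  }
  index1.commit();
}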
Term is duplicated when updating a document
Hi, we are updating our documents (that represent products in our shop) when a dealer modifies them, by calling SolrServer.add(SolrInputDocument) with the updated document. My understanding is that there is no other way of updating an existing document. However we also use a term query to autocomplete the search field for the user, but each time a document is updated (added) the term count is incremented. So after starting with a new index the count is e.g. 1, then the document (that contains that term) is updated, and the count is 2, the next update will set this to 3 and so on. Once the index is optimized (by calling SolrServer.optimize()) the count is correct again. Am I missing something or is this a bug in Solr/Lucene? Thanks in advance Thomas
Exception being thrown indexing a specific pdf document using Solr Cell
I've got an existing Spring Solr SolrJ application that indexes a mixture of documents. It seems to have been working fine now for a couple of weeks but today I've just started getting an exception when processing a certain pdf file. The exception is : ERROR: org.apache.solr.core.SolrCore - org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.pdfpar...@4683c2 at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:211) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:139) at uk.co.sjp.intranet.service.SolrServiceImpl.loadDocuments(SolrServiceImpl.java:308) at uk.co.sjp.intranet.SearchController.loadDocuments(SearchController.java:297) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.springframework.web.bind.annotation.support.HandlerMethodInvoker.doInvokeMethod(HandlerMethodInvoker.java:710) at org.springframework.web.bind.annotation.support.HandlerMethodInvoker.invokeHandlerMethod(HandlerMethodInvoker.java:167) at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.invokeHandlerMethod(AnnotationMethodHandlerAdapter.java:414) at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.handle(AnnotationMethodHandlerAdapter.java:402) at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:771) at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:716) at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:647) at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:552) at javax.servlet.http.HttpServlet.service(HttpServlet.java:617) at javax.servlet.http.HttpServlet.service(HttpServlet.java:717) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:630) at org.apache.catalina.core.ApplicationDispatcher.processRequest(ApplicationDispatcher.java:436) at org.apache.catalina.core.ApplicationDispatcher.doForward(ApplicationDispatcher.java:374) at org.apache.catalina.core.ApplicationDispatcher.forward(ApplicationDispatcher.java:302) at org.tuckey.web.filters.urlrewrite.NormalRewrittenUrl.doRewrite(NormalRewrittenUrl.java:195) at org.tuckey.web.filters.urlrewrite.RuleChain.handleRewrite(RuleChain.java:159) at org.tuckey.web.filters.urlrewrite.RuleChain.doRules(RuleChain.java:141) at org.tuckey.web.filters.urlrewrite.UrlRewriter.processRequest(UrlRewriter.java:90) at org.tuckey.web.filters.urlrewrite.UrlRewriteFilter.doFilter(UrlRewriteFilter.java:417) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) at java.lang.Thread.run(Thread.java:619) Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.pdfpar...@4683c2 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:121) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:105) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLo
Re: Term is duplicated when updating a document
Which fields are modified when the document is updated/replaced. Are there any differences in the content of the fields that you are using for the AutoSuggest. Have you changed you schema.xml file recently? If you have, then there may have been changes in the way these fields are analyzed and broken down to terms. This may be a bug if you did not change the field or the schema file but the terms count is changing. On Fri, Oct 15, 2010 at 9:14 AM, Thomas Kellerer wrote: > Hi, > > we are updating our documents (that represent products in our shop) when a > dealer modifies them, by calling > SolrServer.add(SolrInputDocument) with the updated document. > > My understanding is, that there is no other way of updating an existing > document. > > > However we also use a term query to autocomplete the search field for the > user, but each time adocument is updated (added) the term count is > incremented. So after starting with a new index the count is e.g. 1, then > the document (that contains that term) is updated, and the count is 2, the > next update will set this to 3 and so on. > > One the index is optimized (by calling SolServer.optimize()) the count is > correct again. > > Am I missing something or is this a bug in Solr/Lucene? > > Thanks in advance > Thomas > > -- °O° "Good Enough" is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
Possible to sort by explicit docid order?
Hi, In an online bookstore project I'm working on, most frontend widgets are search driven. Most often they query with some filters and a sort order, such as availabledate desc or simply by score. However, to allow editorial control, some widgets will display a fixed list of books, defined as an ordered list of ISBN numbers inserted by the editor. Based on this we do a Solr search to fetch the data to display: &fq=isbn:(9788200011699 OR 9788200012658 OR ...) It is important to return the results in the same order as the explicitly given list of ISBNs. But I cannot see a way to do that, not even with sort by function. So currently we re-order the result list in the frontend. Would it make sense with an "explicit" sort order, perhaps implemented as a function? &sort=fieldvaluelist(isbn,1000,1,0,$isbnorder) desc, price asc&isbnorder=9788200011699,9788200012658,9788200013839,9788200014140 The function would be defined as fieldvaluelist([,...]) The output of the example above would be: For document with ISBN=9788200011699: 1000 For document with ISBN=9788200012658: 999 For document with ISBN=9788200013839: 998 For document with ISBN not in the list: 0 (fallback - in which case the second sort order would kick in) -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com
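For reference, the client-side re-ordering currently used as a workaround might look like this in SolrJ (the field name "isbn" is an assumption):

import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

public class IsbnOrderSketch {
  // Sort the result list so it follows the editor-supplied ISBN list;
  // documents whose ISBN is not in the list fall to the end.
  public static void sortByIsbnList(SolrDocumentList results, final List<String> isbnOrder) {
    Collections.sort(results, new Comparator<SolrDocument>() {
      public int compare(SolrDocument a, SolrDocument b) {
        int ia = isbnOrder.indexOf(String.valueOf(a.getFieldValue("isbn")));
        int ib = isbnOrder.indexOf(String.valueOf(b.getFieldValue("isbn")));
        if (ia < 0) ia = Integer.MAX_VALUE;
        if (ib < 0) ib = Integer.MAX_VALUE;
        return ia < ib ? -1 : (ia == ib ? 0 : 1);
      }
    });
  }
}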
Re: Term is duplicated when updating a document
Thanks for the answer. Which fields are modified when the document is updated/replaced. Only one field was changed, but it was not the one where the auto-suggest term is coming from. Are there any differences in the content of the fields that you are using for the AutoSuggest. No Have you changed you schema.xml file recently? If you have, then there may have been changes in the way these fields are analyzed and broken down to terms. No, I did a complete index rebuild to rule out things like that. Then after startup, did a search, then updated the document and did a search again. Regards Thomas This may be a bug if you did not change the field or the schema file but the terms count is changing. On Fri, Oct 15, 2010 at 9:14 AM, Thomas Kellerer wrote: Hi, we are updating our documents (that represent products in our shop) when a dealer modifies them, by calling SolrServer.add(SolrInputDocument) with the updated document. My understanding is, that there is no other way of updating an existing document. However we also use a term query to autocomplete the search field for the user, but each time adocument is updated (added) the term count is incremented. So after starting with a new index the count is e.g. 1, then the document (that contains that term) is updated, and the count is 2, the next update will set this to 3 and so on. One the index is optimized (by calling SolServer.optimize()) the count is correct again. Am I missing something or is this a bug in Solr/Lucene? Thanks in advance Thomas
Re: searching while importing
On Thu, Oct 14, 2010 at 4:08 AM, Shawn Heisey wrote: > If you are using the DataImportHandler, you will not be able to search new > data until the full-import or delta-import is complete and the update is > committed. When I do a full reindex, it takes about 5 hours, and until it > is finished, I cannot search it. > > I have not tried to issue a manual commit in the middle of an import to see > whether that makes data inserted up to that point searchable, but I would > not expect that to work. [...] Just as a data point, we have done this, and yes it is possible to do a commit in the middle of an import, and have the documents that have already been indexed be available for search. Regards, Gora
filter query from external list of Solr unique IDs
At the Lucene Revolution conference I asked about efficiently building a filter query from an external list of Solr unique ids. Some use cases I can think of are: 1) personal sub-collections (in our case a user can create a small subset of our 6.5 million doc collection and then run filter queries against it) 2) tagging documents 3) access control lists 4) anything that needs complex relational joins 5) a sort of alternative to incremental field updating (i.e. update in an external database or kv store) 6) Grant's clustering cluster points and similar apps. Grant pointed to SOLR 1715, but when I looked on JIRA, there doesn't seem to be any work on it yet. Hoss mentioned a couple of ideas: 1) sub-classing query parser 2) Having the app query a database and somehow passing something to Solr or lucene for the filter query Can Hoss or someone else point me to more detailed information on what might be involved in the two ideas listed above? Is somehow keeping an up-to-date map of unique Solr ids to internal Lucene ids needed to implement this or is that a separate issue? Tom Burton-West http://www.hathitrust.org/blogs/large-scale-search
RE: filter query from external list of Solr unique IDs
Definitely interested in this. The naive obvious approach would be just putting all the ID's in the query. Like fq=(id:1 OR id:2 OR). Or making it another clause in the 'q'. Can you outline what's wrong with this approach, to make it more clear what's needed in a solution? From: Burton-West, Tom [tburt...@umich.edu] Sent: Friday, October 15, 2010 11:49 AM To: solr-user@lucene.apache.org Subject: filter query from external list of Solr unique IDs At the Lucene Revolution conference I asked about efficiently building a filter query from an external list of Solr unique ids. Some use cases I can think of are: 1) personal sub-collections (in our case a user can create a small subset of our 6.5 million doc collection and then run filter queries against it) 2) tagging documents 3) access control lists 4) anything that needs complex relational joins 5) a sort of alternative to incremental field updating (i.e. update in an external database or kv store) 6) Grant's clustering cluster points and similar apps. Grant pointed to SOLR 1715, but when I looked on JIRA, there doesn't seem to be any work on it yet. Hoss mentioned a couple of ideas: 1) sub-classing query parser 2) Having the app query a database and somehow passing something to Solr or lucene for the filter query Can Hoss or someone else point me to more detailed information on what might be involved in the two ideas listed above? Is somehow keeping an up-to-date map of unique Solr ids to internal Lucene ids needed to implement this or is that a separate issue? Tom Burton-West http://www.hathitrust.org/blogs/large-scale-search
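For reference, the naive approach amounts to something like this in SolrJ (the field name "id" is assumed); as the replies below note, it eventually runs into the maxBooleanClauses limit and gets slow and memory-hungry once the list grows into the thousands:

import java.util.List;
import org.apache.solr.client.solrj.SolrQuery;

public class IdListFilterSketch {
  // Turn an external list of unique ids into one big OR'ed filter query.
  public static SolrQuery withIdFilter(SolrQuery query, List<String> ids) {
    StringBuilder fq = new StringBuilder("id:(");
    for (int i = 0; i < ids.size(); i++) {
      if (i > 0) fq.append(" OR ");
      fq.append(ids.get(i));
    }
    fq.append(")");
    return query.addFilterQuery(fq.toString());
  }
}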
Re: filter query from external list of Solr unique IDs
On Fri, Oct 15, 2010 at 11:49 AM, Burton-West, Tom wrote: > At the Lucene Revolution conference I asked about efficiently building a > filter query from an external list of Solr unique ids. Yeah, I've thought about a special query parser and query to deal with this (relatively) efficiently, both from a query perspective and a memory perspective. Should be pretty quick to throw together: - comma separated list of terms (unique ids are a special case of this) - in the query, store as a single byte array for efficiency - sort the ids if they aren't already sorted - do lookups with a term enumerator and skip weighting or anything else like that - configurable caching... may, or may not want to cache this big query That's only part of the stuff you mention, but seems like it would be useful to a number of people. -Yonik http://www.lucidimagination.com
Re: Sorting on arbitary 'custom' fields
On Mon, Oct 11, 2010 at 07:17:43PM +0100, me said: > It was just an idea though and I was hoping that there would be a > simpler more orthodox way of doing it. In the end, for anyone who cares, we used dynamic fields. There are a lot of them but we haven't seen performance impacted that badly so far.
Re: weighted facets
Hi, answering my own question(s). Result grouping could be the solution as I explained here: https://issues.apache.org/jira/browse/SOLR-385 > http://www.cs.cmu.edu/~ddash/papers/facets-cikm.pdf (the file is dated to Aug > 2008) yonik implemented this here: https://issues.apache.org/jira/browse/SOLR-153 So, really cool: he's the inventor/first-thinker of their 'bitset tree' ! :-) http://search.lucidimagination.com/search/document/6ccbec5e602687ae/facet_optimizing#6ccbec5e602687ae Regards, Peter. > Hi, > > I need a feature which is well explained from Mr Goll at this site ** > > So, it then would be nice to do sth. like: > > facet.stats=sum(fieldX)&facet.stats.sort=fieldX > > And the output (sorted against the sum-output) can look sth. like this: > > > > 767 > 892 > > Is there something similar or was this answered from Hoss at the lucene > revolution? If not I'll open a JIRA issue ... > > > BTW: is the work from > http://www.cs.cmu.edu/~ddash/papers/facets-cikm.pdf contributed back to > solr? > > > Regards, > Peter. > > > > PS: Related issue: > https://issues.apache.org/jira/browse/SOLR-680 > https://issues.apache.org/jira/secure/attachment/12400054/SOLR-680.patch > > > > ** > http://lucene.crowdvine.com/posts/14137409 > > Quoting his question in case the site goes offline: > > Hi Chris, > > Usually a facet search returns the document count for the > unique values in the facet field. Is there a way to > return a weighted facet count based on a user-defined function (sum, > product, etc.) of another field? > > Here is a sum example. Assume we have the following > 4 documents with 3 fields > > ID facet_field weight_field > 1 solr 0.4 > 2 lucene 0.3 > 3 lucene 0.1 > 4 lucene 0.2 > > Is there a way to return > > solr 0.4 > lucene 0.6 > > instead of > > solr 1 > lucene 3 > > Given the facet_field contains multiple values > > ID facet_field weight_field > 1 solr lucene 0.2 > 2 lucene 0.3 > 3 solr lucene 0.1 > 4 lucene 0.2 > > Is there a way to return > > solr 0.3 > lucene 0.8 > > instead of > > solr 2 > lucene 4 > > Thanks, > Johannes > > -- http://jetwick.com twitter search prototype
Re: Term is duplicated when updating a document
This is actually known behavior. The problem is that when you update a document, the new version is added and the original is only marked as deleted. However, the terms aren't touched; both the original and the new document's terms are counted. It'd be hard, very hard, to remove the terms from the inverted index efficiently. But when you optimize, all the deleted documents (and their associated terms) are physically removed from the files, thus your term counts change. HTH Erick On Fri, Oct 15, 2010 at 10:05 AM, Thomas Kellerer wrote: > Thanks for the answer. > > > Which fields are modified when the document is updated/replaced. >> > > Only one field was changed, but it was not the one where the auto-suggest > term is coming from. > > > Are there any differences in the content of the fields that you are using >> for the AutoSuggest. >> > No > > > Have you changed you schema.xml file recently? If you have, then there may >> have been changes in the way these fields are analyzed and broken down to >> terms. >> > > No, I did a complete index rebuild to rule out things like that. > Then after startup, did a search, then updated the document and did a > search again. > > Regards > Thomas > > > >> This may be a bug if you did not change the field or the schema file but >> the >> terms count is changing. >> >> On Fri, Oct 15, 2010 at 9:14 AM, Thomas Kellerer >> wrote: >> >> Hi, >>> >>> we are updating our documents (that represent products in our shop) when >>> a >>> dealer modifies them, by calling >>> SolrServer.add(SolrInputDocument) with the updated document. >>> >>> My understanding is, that there is no other way of updating an existing >>> document. >>> >>> >>> However we also use a term query to autocomplete the search field for the >>> user, but each time adocument is updated (added) the term count is >>> incremented. So after starting with a new index the count is e.g. 1, then >>> the document (that contains that term) is updated, and the count is 2, >>> the >>> next update will set this to 3 and so on. >>> >>> One the index is optimized (by calling SolServer.optimize()) the count is >>> correct again. >>> >>> Am I missing something or is this a bug in Solr/Lucene? >>> >>> Thanks in advance >>> Thomas >>> >>> >>> >> >> > >
RE: filter query from external list of Solr unique IDs
The main problem I've encountered with the "lots of OR clauses" approach is that you eventually hit the limit on Boolean clauses and the whole query fails. You can keep raising the limit through the Solr configuration, but there's still a ceiling eventually. - Demian > -Original Message- > From: Jonathan Rochkind [mailto:rochk...@jhu.edu] > Sent: Friday, October 15, 2010 1:07 PM > To: solr-user@lucene.apache.org > Subject: RE: filter query from external list of Solr unique IDs > > Definitely interested in this. > > The naive obvious approach would be just putting all the ID's in the > query. Like fq=(id:1 OR id:2 OR). Or making it another clause in > the 'q'. > > Can you outline what's wrong with this approach, to make it more clear > what's needed in a solution? > > From: Burton-West, Tom [tburt...@umich.edu] > Sent: Friday, October 15, 2010 11:49 AM > To: solr-user@lucene.apache.org > Subject: filter query from external list of Solr unique IDs > > At the Lucene Revolution conference I asked about efficiently building > a filter query from an external list of Solr unique ids. > > Some use cases I can think of are: > 1) personal sub-collections (in our case a user can create a small > subset of our 6.5 million doc collection and then run filter queries > against it) > 2) tagging documents > 3) access control lists > 4) anything that needs complex relational joins > 5) a sort of alternative to incremental field updating (i.e. > update in an external database or kv store) > 6) Grant's clustering cluster points and similar apps. > > Grant pointed to SOLR 1715, but when I looked on JIRA, there doesn't > seem to be any work on it yet. > > Hoss mentioned a couple of ideas: > 1) sub-classing query parser > 2) Having the app query a database and somehow passing > something to Solr or lucene for the filter query > > Can Hoss or someone else point me to more detailed information on what > might be involved in the two ideas listed above? > > Is somehow keeping an up-to-date map of unique Solr ids to internal > Lucene ids needed to implement this or is that a separate issue? > > > Tom Burton-West > http://www.hathitrust.org/blogs/large-scale-search > > >
RE: filter query from external list of Solr unique IDs
Hi Jonathan, The advantages of the obvious approach you outline are that it is simple, it fits into the existing Solr model, and it doesn't require any customization or modification to Solr/Lucene java code. Unfortunately, it does not scale well. We originally tried just what you suggest for our implementation of Collection Builder. For a user's personal collection we had a table that maps the collection id to the unique Solr ids. Then when they wanted to search their collection, we just took their search and added a filter query with fq=(id:1 OR id:2 OR ...). I seem to remember running into a limit on the number of OR clauses allowed. Even if you can set that limit larger, there are a number of efficiency issues. We ended up constructing a separate Solr index where we have a multi-valued collection number field. Unfortunately, until incremental field updating gets implemented, this means that every time someone adds a document to a collection, the entire document (including 700KB of OCR) needs to be re-indexed just to update the collection number field. This approach has allowed us to scale up to a total of something under 100,000 documents, but we don't think we can scale it much beyond that for various reasons. I was actually thinking of some kind of custom Lucene/Solr component that would for example take a query parameter such as &lookitUp=123 and the component might do a JDBC query against a database or kv store and return results in some form that would be efficient for Solr/Lucene to process. (Of course this assumes that a JDBC query would be more efficient than just sending a long list of ids to Solr). The other part of the equation is mapping the unique Solr ids to internal Lucene ids in order to implement a filter query. I was wondering if something like the unique id to Lucene id mapper in zoie might be useful or if that is too specific to zoie. This may be totally off-base, since I haven't looked at the zoie code at all yet. In our particular use case, we might be able to build some kind of in-memory map after we optimize an index and before we mount it in production. In our workflow, we update the index and optimize it before we release it and once it is released to production there is no indexing/merging taking place on the production index (so the internal Lucene ids don't change). Tom -Original Message- From: Jonathan Rochkind [mailto:rochk...@jhu.edu] Sent: Friday, October 15, 2010 1:07 PM To: solr-user@lucene.apache.org Subject: RE: filter query from external list of Solr unique IDs Definitely interested in this. The naive obvious approach would be just putting all the ID's in the query. Like fq=(id:1 OR id:2 OR). Or making it another clause in the 'q'. Can you outline what's wrong with this approach, to make it more clear what's needed in a solution?
facet.field :java.lang.NullPointerException
Faceting blows up when the field has no data. And this seems to be random. Sometimes it will work even with no data, other times not. Sometimes the error goes away if the field is set to multiValued=true (even though it's one value every time), other times it doesn't. In all cases setting facet.method to enum takes care of the problem. If this param is not set, the default leads to null pointer exception. 09:18:52,218 SEVERE [SolrCore] Exception during facet.field of xyz:java.lang.NullPointerException at java.lang.System.arraycopy(Native Method) at org.apache.lucene.util.PagedBytes.copy(PagedBytes.java:247) at org.apache.solr.request.TermIndex$1.setTerm(UnInvertedField.java:1164) at org.apache.solr.request.NumberedTermsEnum.(UnInvertedField.java:960) at org.apache.solr.request.TermIndex$1.(UnInvertedField.java:1151) at org.apache.solr.request.TermIndex.getEnumerator(UnInvertedField.java:1151) at org.apache.solr.request.UnInvertedField.uninvert(UnInvertedField.java:204) at org.apache.solr.request.UnInvertedField.(UnInvertedField.java:188) at org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:911) at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:298) at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:354) at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:190) at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:72) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:210) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240) at
Re: facet.field :java.lang.NullPointerException
This is https://issues.apache.org/jira/browse/SOLR-2142 I'll look into it soon. -Yonik http://www.lucidimagination.com On Fri, Oct 15, 2010 at 3:12 PM, Pradeep Singh wrote: > Faceting blows up when the field has no data. And this seems to be random. > Sometimes it will work even with no data, other times not. Sometimes the > error goes away if the field is set to multiValued=true (even though it's > one value every time), other times it doesn't. In all cases setting > facet.method to enum takes care of the problem. If this param is not set, > the default leads to null pointer exception. > > > 09:18:52,218 SEVERE [SolrCore] Exception during facet.field of > xyz:java.lang.NullPointerException > > at java.lang.System.arraycopy(Native Method) > > at org.apache.lucene.util.PagedBytes.copy(PagedBytes.java:247) > > at > org.apache.solr.request.TermIndex$1.setTerm(UnInvertedField.java:1164) > > at > org.apache.solr.request.NumberedTermsEnum.(UnInvertedField.java:960) > > at > org.apache.solr.request.TermIndex$1.(UnInvertedField.java:1151) > > at > org.apache.solr.request.TermIndex.getEnumerator(UnInvertedField.java:1151) > > at > org.apache.solr.request.UnInvertedField.uninvert(UnInvertedField.java:204) > > at > org.apache.solr.request.UnInvertedField.(UnInvertedField.java:188) > > at > org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:911) > > at > org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:298) > > at > org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:354) > > at > org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:190) > > at > org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:72) > > at > org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:210) > > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) > > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323) > > at > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337) > > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240) > at >
Re: Synchronizing Solr with a PostgreDB
Thanks for the quick response! =o) We will go with that approach. On Thu, Oct 14, 2010 at 7:19 PM, Allistair Crossley wrote: > i would not cross-reference solr results with your database to merge unless > you want to spank your database. nor would i load solr with all your data. > what i have found is that the search results page is generally a small subset > of data relating to the fuller document/result. therefore i store only the > data required to present the search results wholly from solr. the user can > choose to click into a specific result which then uses just the database to > present it. > > use data import handler - define an xml config to import as many entities > into your document as you need and map columns to fields in schema.xml. use > the Wiki page on DIH - it's all there, as well as example config in the solr > distro. > > allistair > > On Oct 14, 2010, at 6:13 PM, Juan Manuel Alvarez wrote: > >> Hello everyone! I am new to Solr and Lucene and I would like to ask >> you a couple of questions. >> >> I am working on an existing system that has the data saved in a >> Postgre DB and now I am trying to integrate Solr to use full-text >> search and faceted search, but I am having a couple of doubts about >> it. >> >> 1) I see two ways of storing the data and make the search: >> - Duplicate all the DB data in Solr, so complete results are returned >> from a search query, or... >> - Put in Solr just the data that I need to search and, after finding >> the elements with a Solr query, use the result to make a more specific >> query to the DB. >> >> Which is the way this is normally done? >> >> 2) How do I synchronize Solr and Postgre? Do I have to use the >> DataImportHandler or when I do the INSERT command into Postgre, I have >> to execute a command into Solr? >> >> Thanks for your time! >> >> Cheers! >> Juan M. > >
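For reference, a minimal DataImportHandler data-config for PostgreSQL might look like the sketch below (apart from the driver class, all names here are placeholders; the PostgreSQL JDBC jar must be on Solr's classpath):

<dataConfig>
  <dataSource driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost:5432/mydb"
              user="solr" password="secret"/>
  <document>
    <entity name="product"
            query="SELECT id, name, description FROM products">
      <field column="id"          name="id"/>
      <field column="name"        name="name"/>
      <field column="description" name="description"/>
    </entity>
  </document>
</dataConfig>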
Re: SOLRJ - Searching text in all fields of a Bean
You can replace query.setQueryType("dismax") with query.set("defType", "dismax"); Also don't forget to request title field with fl parameter. query.addField("title");
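Putting both suggestions together, the earlier snippet would become something like this (server and the SOLRTitle bean as in the previous messages):

SolrQuery query = new SolrQuery();
query.setQuery("kitten");
query.set("defType", "dismax");   // instead of query.setQueryType("dismax")
query.setParam("qf", "title");
query.addField("id");
query.addField("title");          // request the title field explicitly
QueryResponse rsp = server.query(query);
List<SOLRTitle> beans = rsp.getBeans(SOLRTitle.class);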
Re: Solr with example Jetty and score problem
: Thanks. But do you have any suggest or work-around to deal with it? Posted in SOLR-2140 ... the key is to make sure Solr knows "score" is not multiValued -Hoss
Re: ant build problem
: i updated my solr trunk to revision 1004527. when i go for compiling : the trunk with ant i get so many warnings, but the build is successful. Most of these warnings are legitimate; the problems have always been there, but recently the Lucene build file was updated to warn about them by default. This one though... : [javac] warning: [path] bad path element : "/usr/share/ant/lib/hamcrest-core.jar": no such file or directory ...that's something specific to your setup. something in your system's ant configs thinks that jar should be there. : After the compiling i thought to check with the ant test and performed but : it is failed.. failing tests are also a possibility ... there are several tests in the code base right now that fail sporadically (especially because of recent changes to the build system designed to get tests that *might* fail based on locale to fail more often) and people are working on them -- w/o full details about what failures you got though, we can't say if they are known issues. -Hoss
Re: having problem about Solr Date Field.
: So, regarding DST, do you put everything in GMT, and make adjustments : for in the 'seach for/between' data/time values before the query for : both DST and TZ? The client adding docs is the only one that knows what TZ it's in when it formats the docs to add them, and the client issuing the query is the only one that knows what TZ it's in when it formats the query string to execute the query. In both cases the client must use the UTC TZ when formatting the date strings so that Solr can deal with it correctly. -Hoss
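For reference, one way a client can produce the UTC string Solr's date fields expect (plain java.text, no Solr dependency):

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public static String toSolrDate(Date d) {
  // Format any local Date in UTC, in Solr's canonical date syntax
  SimpleDateFormat utc = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
  utc.setTimeZone(TimeZone.getTimeZone("UTC"));
  return utc.format(d);   // e.g. 2010-10-15T18:23:05Z
}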
Re: Question related to phrase search in lucene/solr?
: I have question is it possible to perform a phrase search with wild cards in : solr/lucene as if i have two queries both have exactly same results one is : +Contents:"change market" : : and other is : +Contents:"chnage* market" : : but i think the second should match "chages market" as well but it does not : matches it. Any help would be appreciated In my experience, 90% of the times people ask about using wildcards in a phrase query what they really want is simple stemming of the terms -- the one example you've cited is an example of this. If your "Contents" field uses an analyzer that does stemming then "change market" and "changes market" would both match. -Hoss
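For reference, a stemmed field type in schema.xml looks roughly like this (the type name and exact filter chain are illustrative):

<fieldType name="text_stemmed" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

<field name="Contents" type="text_stemmed" indexed="true" stored="true"/>

With stemming applied at both index and query time, "change market" and "changes market" reduce to the same terms, so the phrase matches without any wildcards.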
Re: Disable (or prohibit) per-field overrides
: Anyone knows useful method to disable or prohibit the per-field override : features for the search components? If not, where to start to make it : configurable via solrconfig and attempt to come up with a working patch? If your goal is to prevent *clients* from specifying these (while you're still allowed to use them in your defaults) then the simplest solution is probably something external to Solr -- along the lines of mod_rewrite. Internally... that would be tough. You could probably write a SearchComponent (configured to run "first") that does it fairly easily -- just wrap the SolrParams in an impl that returns null anytime a component asks for a param name that starts with "f." (and excludes those param names when asked for a list of the param names). It could probably be generalized to support arbitrary rules in a way that might be handy for other folks, but it would still just be wrapping all of the params, so it would prevent you from using them in your config as well. Ultimately i think a general solution would need to be in RequestHandlerBase ... where it wraps the request params using the defaults and invariants ... you'd want the custom exclusion rules to apply only to the request params from the client. -Hoss
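For reference, a rough, untested sketch of that first-component idea (the class and component names are hypothetical; it would still need to be registered in solrconfig.xml and listed as a first-component for the handler):

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

// Hides any per-field (f.*) override arriving with the request by wrapping
// the request's SolrParams before the other components see them.
public class StripPerFieldParamsComponent extends SearchComponent {

  public void prepare(ResponseBuilder rb) {
    final SolrParams original = rb.req.getParams();
    rb.req.setParams(new SolrParams() {
      public String get(String param) {
        return param.startsWith("f.") ? null : original.get(param);
      }
      public String[] getParams(String param) {
        return param.startsWith("f.") ? null : original.getParams(param);
      }
      public Iterator<String> getParameterNamesIterator() {
        List<String> names = new ArrayList<String>();
        Iterator<String> it = original.getParameterNamesIterator();
        while (it.hasNext()) {
          String name = it.next();
          if (!name.startsWith("f.")) names.add(name);
        }
        return names.iterator();
      }
    });
  }

  public void process(ResponseBuilder rb) { /* nothing to do at process time */ }

  public String getDescription() { return "Strips per-field (f.*) overrides from request params"; }
  public String getSource()      { return "$Source$"; }
  public String getSourceId()    { return "$Id$"; }
  public String getVersion()     { return "1.0"; }
}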
RE: filter query from external list of Solr unique IDs
Thanks Yonik, Is this something you might have time to throw together, or an outline of what needs to be thrown together? Is this something that should be asked on the developer's list or discussed in SOLR 1715 or does it make the most sense to keep the discussion in this thread? Tom -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Friday, October 15, 2010 1:19 PM To: solr-user@lucene.apache.org Subject: Re: filter query from external list of Solr unique IDs On Fri, Oct 15, 2010 at 11:49 AM, Burton-West, Tom wrote: > At the Lucene Revolution conference I asked about efficiently building a > filter query from an external list of Solr unique ids. Yeah, I've thought about a special query parser and query to deal with this (relatively) efficiently, both from a query perspective and a memory perspective. Should be pretty quick to throw together: - comma separated list of terms (unique ids are a special case of this) - in the query, store as a single byte array for efficiency - sort the ids if they aren't already sorted - do lookups with a term enumerator and skip weighting or anything else like that - configurable caching... may, or may not want to cache this big query That's only part of the stuff you mention, but seems like it would be useful to a number of people. -Yonik http://www.lucidimagination.com
SOLR DateTime and SortableLongField field type problems
Hello all, I am using SOLR-1.4.1 with the DataImportHandler, and I am trying to follow the advice from http://www.mail-archive.com/solr-user@lucene.apache.org/msg11887.html about converting date fields to SortableLong fields for better memory efficiency. However, whenever I try to do this using the DateFormater, I get exceptions when indexing for every row that tries to create my sortable fields. In my schema.xml, I have the following definitions for the fieldType and dynamicField: In my dih.xml, I have the following definitions: The fields in question are in the formats: 2001-12-04T00:00:00Z 2001-12-04T19:38:01Z The exception that I am receiving is: Oct 15, 2010 6:23:24 PM org.apache.solr.handler.dataimport.DateFormatTransformer transformRow WARNING: Could not parse a Date field java.text.ParseException: Unparseable date: "Wed Nov 28 21:39:05 EST 2007" at java.text.DateFormat.parse(DateFormat.java:337) at org.apache.solr.handler.dataimport.DateFormatTransformer.process(DateFormatTransformer.java:89) at org.apache.solr.handler.dataimport.DateFormatTransformer.transformRow(DateFormatTransformer.java:69) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.applyTransformer(EntityProcessorWrapper.java:195) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:241) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:357) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:383) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370) I know that it has to be the SortableLong fields, because if I remove just those two lines from my dih.xml, everything imports as I expect it to. Am I doing something wrong? Mis-using the SortableLong and/or DateTransformer? Is this not supported in my version of SOLR? I'm not very experienced with Java, so digging into the code would be a lost cause for me right now. I was hoping that somebody here might be able to help point me in the right/correct direction. It should be noted that the modified_date and df_date_published fields index just fine (so long as I do it as I've defined above). Thank you, - Ken It looked like something resembling white marble, which was probably what it was: something resembling white marble. -- Douglas Adams, "The Hitchhikers Guide to the Galaxy"
Re: Synchronizing Solr with a PostgreDB
We're doing what was recommended. Nice to hear we're on the right path. Yeah Postgres! Yeah Solr/Lucene! Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. --- On Fri, 10/15/10, Juan Manuel Alvarez wrote: > From: Juan Manuel Alvarez > Subject: Re: Synchronizing Solr with a PostgreDB > To: solr-user@lucene.apache.org > Date: Friday, October 15, 2010, 1:04 PM > Thanks for the quick response! =o) > We will go with that approach. > > On Thu, Oct 14, 2010 at 7:19 PM, Allistair Crossley > wrote: > > i would not cross-reference solr results with your > database to merge unless you want to spank your database. > nor would i load solr with all your data. what i have found > is that the search results page is generally a small subset > of data relating to the fuller document/result. therefore i > store only the data required to present the search results > wholly from solr. the user can choose to click into a > specific result which then uses just the database to present > it. > > > > use data import handler - define an xml config to > import as many entities into your document as you need and > map columns to fields in schema.xml. use the Wiki page on > DIH - it's all there, as well as example config in the solr > distro. > > > > allistair > > > > On Oct 14, 2010, at 6:13 PM, Juan Manuel Alvarez > wrote: > > > >> Hello everyone! I am new to Solr and Lucene and I > would like to ask > >> you a couple of questions. > >> > >> I am working on an existing system that has the > data saved in a > >> Postgre DB and now I am trying to integrate Solr > to use full-text > >> search and faceted search, but I am having a > couple of doubts about > >> it. > >> > >> 1) I see two ways of storing the data and make the > search: > >> - Duplicate all the DB data in Solr, so complete > results are returned > >> from a search query, or... > >> - Put in Solr just the data that I need to search > and, after finding > >> the elements with a Solr query, use the result to > make a more specific > >> query to the DB. > >> > >> Which is the way this is normally done? > >> > >> 2) How do I synchronize Solr and Postgre? Do I > have to use the > >> DataImportHandler or when I do the INSERT command > into Postgre, I have > >> to execute a command into Solr? > >> > >> Thanks for your time! > >> > >> Cheers! > >> Juan M. > > > > >
Re: "Virtual field", Statistics
Please add a JIRA issue requesting this. A bunch of things are not supported for functions: returning as a field value, for example. On Thu, Oct 14, 2010 at 8:31 AM, Tanguy Moal wrote: > Dear solr-user folks, > > I would like to use the stats module to perform very basic statistics > (mean, min and max) which is actually working just fine. > > Nethertheless I found a little limitation that bothers me a tiny bit : > how to perform the exact same statistics, but on the result of a > function query rather than a field. > > Example : > schema : > - string : id > - float : width > - float : height > - float : depth > - string : color > - float : price > > What I'd like to do is something like : > select?price:[45.5 TO > 99.99]&stats=on&stats.facet=color&stats.field={volume=product(product(width, > height), depth)} > I would expect to obtain : > > > > > ... > ... > ... > ... > ... > ... > ... > ... > > > > ... > ... > ... > ... > ... > ... > ... > ... > > > ... > ... > ... > ... > ... > ... > ... > ... > > > > > > > > Of course computing the volume can be performed before indexing data, > but defining virtual fields on the fly given an arbitrary function is > powerful and I am comfortable with the idea that many others would > appreciate. Especially for BI needs and so on... :-D > Is there a way to do it easily that I would have not been able to > find, or is it actually impossible ? > > Thank you very much in advance for your help. > > -- > Tanguy > -- Lance Norskog goks...@gmail.com
Re: filter query from external list of Solr unique IDs
: Hoss mentioned a couple of ideas: : 1) sub-classing query parser : 2) Having the app query a database and somehow passing something : to Solr or lucene for the filter query The approach i was referring to is something one of my coworkers did a while back (if he's still lurking on the list, maybe he'll speak up). He implemented a custom "SqlFilterQuery" class that was constructed from a JDBC URL and a SQL statement. The SqlQuery class rewrote to itself (so it was a primitive query class) and returned a Scorer that would: 1) execute the SQL query (which should return a sorted list of uniqueKey field values) and retrieve a JDBC iterator (cursor?) over the results. 2) fetch a TermEnum from Lucene for the uniqueKey field 3) use the JDBC Iterator to skip ahead on the TermEnum and for each uniqueKey to get the underlying lucene docid, and record it in a DocSet As i recall, my coworker was using this in a custom RequestHandler, where he was then forcibly putting that DocSet in the filterCache so that it would be there on future requests, and it would be regenerated by autoWarming (the advantage of implementing this logic using the Query interface) but it could also be done with a custom cache if you don't want these to contend for space in the filterCache. My point about the query parser was that instead of needing to use a custom RequestHandler (or even a custom SearchComponent) to generate this DocSet for filtering, you could probably do it using a QParserPlugin -- that way you could use a regular "fq" param to generate the filter. You could even generalize the hell out of it so the SQL itself could be specified at request time... q=solr&fq={!sql}SELECT ID FROM USER_MAP WHERE USER=1234 ORDER BY ID ASC -Hoss
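For reference, a stripped-down, hypothetical sketch of the QParserPlugin route, with a comma-separated id list standing in for the SQL lookup described above (it would be registered in solrconfig.xml with something like <queryParser name="ids" class="IdListQParserPlugin"/> and used as fq={!ids f=id}1,17,42). Being a plain BooleanQuery it still hits the maxBooleanClauses ceiling; the TermEnum/DocSet approach described above is what avoids that:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;

public class IdListQParserPlugin extends QParserPlugin {

  public void init(NamedList args) {}

  public QParser createParser(final String qstr, final SolrParams localParams,
                              final SolrParams params, final SolrQueryRequest req) {
    return new QParser(qstr, localParams, params, req) {
      public Query parse() {
        // which field holds the unique key; defaults to "id"
        String field = localParams == null ? "id" : localParams.get("f", "id");
        BooleanQuery bq = new BooleanQuery(true); // coord off, pure filter
        for (String id : qstr.split(",")) {
          bq.add(new TermQuery(new Term(field, id.trim())), BooleanClause.Occur.SHOULD);
        }
        return bq;
      }
    };
  }
}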