Re: Is there a multi-shard optimize message?
On Wed, Jul 29, 2009 at 2:48 AM, Phillip Farber wrote:
> Normally to optimize an index you POST to /solr/update. Is there any way
> to POST an optimize message to one instance and have it propagate to all
> shards, sort of like the select?
>
> /solr-shard-1/select?q=dog...&shards=shard-1,shard-2

No, you'll need to send optimize to each host separately.

--
Regards,
Shalin Shekhar Mangar.
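Since there is no fan-out for optimize, the loop has to live in the client. A minimal SolrJ sketch of calling each shard in turn (the shard URLs are hypothetical):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class OptimizeAllShards {
        public static void main(String[] args) throws Exception {
            // Hypothetical shard URLs; replace with your own hosts.
            String[] shards = {
                "http://host1:8983/solr-shard-1",
                "http://host2:8983/solr-shard-2"
            };
            for (String url : shards) {
                SolrServer server = new CommonsHttpSolrServer(url);
                server.optimize(); // issues an optimize against this shard's update handler
            }
        }
    }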
referring/alias other Solr documents
Hi all:
Is there a mechanism in Solr that will allow documents to refer to each other? In other words, if a search for "abc" matches on document 1, I should be able to return document 2 even though document 2 does not have any fields matching "abc". Here is the scenario with some more details:

Solr version: 1.3

Scenario:
1) Solr Document 1 with, say, some field title="abc", and Solr Document 2 with its own data.
2) User searches for "abc" and gets Document 1, as it matches on the title field.

Expected results: when the user searches for "abc" he should also get Document 2 along with Document 1.

I understand one way of doing this is to make sure Document 2 has all the contents of Document 1. But this introduces an issue of keeping the two documents (and hence their Solr index) in sync with each other.

I think I am looking for a mechanism like this: Document 1 refers => Document 2, Document 3. Hence whenever Document 1 is part of the search results, Document 2 and Document 3 will also be returned as search results.

I may be totally off on this expectation, but I am trying to solve a "contains" problem where, let's say, a book (represented as Document 1 in Solr) "contains" chapters (represented by Documents 2, 3, 4...) in Solr.

I hope this is not too confusing ;)

TIA
~Ravi Gidwani
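One workaround, pending a real reference mechanism, is to do the fan-out on the client: index a parent id on each chapter and issue a second query for the referenced documents. A hedged SolrJ sketch (the field names id and parent_id are hypothetical):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.SolrDocumentList;

    public class ContainsSearch {
        public static void main(String[] args) throws Exception {
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
            // First query: find the books that match the user's terms.
            SolrDocumentList books = server.query(new SolrQuery("title:abc")).getResults();
            for (SolrDocument book : books) {
                // Second query: pull in the chapters that refer back to this book.
                String q = "parent_id:" + book.getFieldValue("id");
                SolrDocumentList chapters = server.query(new SolrQuery(q)).getResults();
                System.out.println(book.getFieldValue("id") + " has "
                        + chapters.getNumFound() + " chapters");
            }
        }
    }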
Boosting ('bq') on multi-valued fields
Hey,
I have a field defined as such:

  <field name="site_id" type="string" indexed="true" stored="false" multiValued="true" />

with the string type defined as:

  <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

When I try using some query-time boost parameters with bq on values of this field, it seems to behave strangely for documents that actually have multiple values: if I do a boost for a particular value ("site_id:5^1.1"), it seems like all the cases where this field is populated with multiple values (i.e. a document with field value "5|6") do not get boosted at all. I verified this using debugQuery & explainOther=doc_id:.

Is this a known issue/bug? Any workarounds? (I'm using a nightly Solr build from a few months back.)

Thanks,
-Chak
Re: update some index documents after indexing process is done with DIH
On Tue, Jul 28, 2009 at 5:17 PM, Marc Sturlese wrote:
>
> That really sounds like the best way to reach my goal. How could I invoke a
> listener from the newSearcher? Would it be something like:
>
> <listener event="newSearcher" class="MyCustomListener">
>   <arr name="queries">
>     <lst> <str name="q">solr</str> <str name="start">0</str> <str name="rows">10</str> </lst>
>     <lst> <str name="q">rocks</str> <str name="start">0</str> <str name="rows">10</str> </lst>
>     <lst> <str name="q">static newSearcher warming query from solrconfig.xml</str> </lst>
>   </arr>
> </listener>
>
> And MyCustomListener would be the class that opens the reader:
>
> RefCounted<SolrIndexSearcher> searchHolder = null;
> try {
>   searchHolder = dataImporter.getCore().getSearcher();
>   IndexReader reader = searchHolder.get().getReader();
>
>   // Here I iterate over the reader doing document modifications
>
> } catch (Exception ex) {
>   LOG.info("error");
> } finally {
>   if (searchHolder != null) searchHolder.decref();
> }

you may not be able to access the DIH API from a newSearcher event.
But the API would give you the searcher directly as a method parameter.

> Finally, to access documents and add fields to some of them, I have
> thought of using the SolrDocument classes. Can you please point me to
> where something similar is done in the Solr source (I mean creation of
> SolrDocuments and conversion of them to proper Lucene documents)?
>
> Does this way of reaching the goal make sense?
>
> Thanks in advance
>
> Noble Paul നോബിള്‍ नोब्ळ्-2 wrote:
>>
>> when a core is reloaded the event fired is firstSearcher. newSearcher
>> is fired when a commit happens
>>
>> On Tue, Jul 28, 2009 at 4:19 PM, Marc Sturlese wrote:
>>>
>>> Ok, but if I handle it in a newSearcher listener it will be executed
>>> every time I reload a core, won't it? The thing is that I want to use
>>> an IndexReader to load into a HashMap some doc fields of the index and,
>>> depending on the values of some field docs, modify other docs. It's
>>> very memory consuming (I have tested it with a simple Lucene script).
>>> That's why I wanted to do it just after the indexing process.
>>>
>>> My ideal case would be to do it in the commit function of
>>> DirectUpdateHandler2.java just before
>>> writer.optimize(cmd.maxOptimizeSegments) is executed. But I don't want
>>> to mess with that code... so I'm trying to find out how to do it as a
>>> plugin rather than a hack, as far as possible.
>>>
>>> Thanks in advance
>>>
>>> Noble Paul നോബിള്‍ नोब्ळ्-2 wrote:
>>>>
>>>> It is best handled as a 'newSearcher' listener in solrconfig.xml.
>>>> onImportEnd is invoked before committing
>>>>
>>>> On Tue, Jul 28, 2009 at 3:13 PM, Marc Sturlese wrote:
>>>>>
>>>>> Hey there,
>>>>> I would like to be able to do something like: after the indexing
>>>>> process is done with DIH, I would like to open an IndexReader,
>>>>> iterate over all docs, modify some of them depending on others and
>>>>> delete some others. I can easily do this coding directly with Lucene,
>>>>> but I would like to know if there's a way to do it with Solr using
>>>>> the SolrDocument or SolrInputDocument classes.
>>>>> I have thought of using SolrJ or the DIH listener onImportEnd, but
>>>>> I'm not sure if I can get an IndexReader in there.
>>>>> Any advice?
>>>>> Thanks in advance

--
-
Noble Paul | Principal Engineer| AOL | http://aol.com
Re: FieldCollapsing: Two response elements returned?
I've applied the latest collapse-field-related patch (patch-3) and it doesn't work.
Does anyone know how I can get only the collapsed response?

29-jul-2009 11:05:21 org.apache.solr.common.SolrException log
GRAVE: java.lang.ClassCastException: org.apache.solr.handler.component.CollapseComponent cannot be cast to org.apache.solr.request.SolrRequestHandler
        at org.apache.solr.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java:150)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:539)
        at org.apache.solr.core.CoreContainer.create(CoreContainer.java:381)
        at org.apache.solr.core.CoreContainer.load(CoreContainer.java:241)
        at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:115)
        at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
        at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
        at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
        at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:108)
        at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3800)
        at org.apache.catalina.core.StandardContext.start(StandardContext.java:4450)
        at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
        at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771)
        at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:526)
        at org.apache.catalina.startup.HostConfig.deployDirectory(HostConfig.java:987)
        at org.apache.catalina.startup.HostConfig.deployDirectories(HostConfig.java:909)
        at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:495)
        at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1206)
        at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:314)
        at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
        at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
        at org.apache.catalina.core.StandardHost.start(StandardHost.java:722)
        at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045)
        at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443)
        at org.apache.catalina.core.StandardService.start(StandardService.java:516)
        at org.apache.catalina.core.StandardServer.start(StandardServer.java:710)
        at org.apache.catalina.startup.Catalina.start(Catalina.java:583)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:288)
        at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:413)

2009/7/28 Marc Sturlese:
>
> That's probably because you are using both the CollapseComponent and the
> QueryComponent. I think the 2 or 3 last patches allow full replacement of
> the QueryComponent. You should just replace:
>
>   class="org.apache.solr.handler.component.QueryComponent" />
>
> with:
>
>   class="org.apache.solr.handler.component.CollapseComponent" />
>
> This will sort out your problem and make response times faster.
>
> Jay Hill wrote:
>>
>> I'm doing some testing with field collapsing, and early results look
>> good. One thing seems odd to me however. I would expect to get back one
>> block of results, but I get two - the first one contains the collapsed
>> results, the second one contains the full non-collapsed results:
>>
>> ...
>> ...
>>
>> This seems somewhat confusing. Is this intended or is this a bug?
>>
>> Thanks,
>> -Jay

--
Lici
solr/home in web.xml relative to web server home
Hi all,

the environment variable (env-entry) in web.xml to configure solr/home is relative to the web server's working directory. I find this unusual, as all the servlet paths are relative to the web application's directory (the webapp context, that is). So at first I specified solr/home relative to the web app dir as well.

I think it makes deployment in an unknown environment, or in different environments using a simple war, more complex than it needs to be. If a webapp-relative path inside the war file could be used, the configuration of Solr (and cores) could be included in the war file completely, with no outside dependency - except, of course, for the data directory, if that is to go some place else.

(In my case, I want to deliver the Solr web application including a custom entity processor, which is why I want to include the Solr war as part of my release cycle. It is easier to deliver that to the system administration than to provide them with partial packages they have to install into an already installed war, imho.)

Am I the only one who has run into this?

Thanks for any input!
Chantal

--
Chantal Ackermann
Re: highlighting performance
Just an FYI, Lucene 2.9 has FastVectorHighlighter:
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/all/org/apache/lucene/search/vectorhighlight/package-summary.html

Features:
* fast for large docs
* supports N-gram fields
* supports phrase-unit highlighting with slops
* needs Java 1.5
* highlighted fields need to be TermVector.WITH_POSITIONS_OFFSETS
* takes query boost into account to score fragments
* supports colored highlight tags
* pluggable FragListBuilder
* pluggable FragmentsBuilder

Unfortunately, Solr hasn't incorporated it yet:
https://issues.apache.org/jira/browse/SOLR-1268

Koji

ravi.gidwani wrote:

Hey Matt: I have been facing the same issue. I have a text field that I highlight along with other fields (maybe 10 other fields). But if I enable highlighting on this text field, which contains a large number of characters/words (> 100,000 characters), highlighting suffers in performance. Queries return in about 15-20 seconds with this field enabled in highlighting, as compared to less than a second WITHOUT it. I did try termVector=true, but I did not see any performance gain either. Just wondering if you were able to solve your issue OR tweak the performance in any other way. BTW, I use Solr 1.3.

~Ravi

Quoted from: http://www.nabble.com/highlighting-performance-tp23567323p23713406.html

goodieboy wrote:

Thanks Otis. I added termVector="true" for those fields, but there isn't a noticeable difference. So, just to be a little more clear, the dynamic fields I'm adding... there might be hundreds. Do you see this as a problem?

Thanks,
Matt

On Fri, May 15, 2009 at 7:48 PM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:

Matt,

I believe indexing those fields that you will use for highlighting with term vectors enabled will make things faster (and your index a bit bigger).

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: Matt Mitchell
To: solr-user@lucene.apache.org
Sent: Friday, May 15, 2009 5:08:23 PM
Subject: highlighting performance

Hi,

I'm experimenting with highlighting and am noticing a big drop in performance with my setup. I have documents that use quite a few dynamic fields (20-30). The fields are multiValued stored/indexed text fields, each with a few paragraphs worth of text. My hl.fl param is set to *_t

What kinds of things can I tweak to make this faster? Is it because I'm highlighting so many different fields?

Thanks,
Matt
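For reference, TermVector.WITH_POSITIONS_OFFSETS corresponds to these schema.xml attributes on the highlighted field - a sketch only, the field name text_t and its type are hypothetical:

    <field name="text_t" type="text" indexed="true" stored="true"
           termVectors="true" termPositions="true" termOffsets="true"/>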
Re: debugQuery=true issue
Hi,

Thanks for your response. I'm still developing, so the schema is still in flux - I guess that explains it. Oh, and regarding the NPE: I updated my checkout and recompiled and now it's gone, so I guess somewhere between revision 787997 and 798482 it was already fixed.

Regards,
gwk

Robert Petersen wrote:

I had something similar happen where optimize fixed an odd sorting/scoring problem. As I understand it, the optimize will clear out index 'lint' from old schemas/documents and thus could affect result scores, since all the term vectors and similar structures are refreshed, etc.
Re: HTTP Status 500 - java.lang.RuntimeException: Can't find resource 'solrconfig.xml'
As Solr said in the log, Solr couldn't find solrconfig.xml in the classpath or in solr.solr.home, cwd. My guess is that the relative path you set for solr.solr.home was incorrect. Why don't you try:

solr.solr.home=/home/huenzhao/search/tomcat6/bin/solr

instead of:

solr.solr.home=home/huenzhao/search/tomcat6/bin/solr

Koji

huenzhao wrote:

Hi all,

I used ubuntu 8.10 as the Solr server OS, and set solr.solr.home=home/huenzhao/search/tomcat6/bin/solr. When I run tomcat (the tomcat and the Solr that I used running on Windows XP have no problem), I get an error like:

HTTP Status 500 - Severe errors in solr configuration. Check your log files for more detailed information on what may be wrong. If you want solr to continue after configuration errors, change:
<abortOnConfigurationError>false</abortOnConfigurationError>
in null
- java.lang.RuntimeException: Can't find resource 'solrconfig.xml' in classpath or 'home/huenzhao/search/tomcat6/bin/solr/conf/', cwd=/home/huenzhao/search/tomcat6/bin
        at org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:194)
        at org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.java:162)
        at org.apache.solr.core.Config.<init>(Config.java:100)
        at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:113)
        at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:70)
        at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117)
        at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
        at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
        at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
        at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:108)
        at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3696)
        at ……

Does anybody know what to do?

enzhao...@gmail.com
Re: solr/home in web.xml relative to web server home
On Wed, Jul 29, 2009 at 2:42 PM, Chantal Ackermann < chantal.ackerm...@btelligent.de> wrote: > Hi all, > > the environment variable (env-entry) in web.xml to configure the solr/home > is relative to the web server's working directory. I find this unusual as > all the servlet paths are relative to the web applications directory (webapp > context, that is). So, I specified solr/home relative to the web app dir, as > well, at first. > > I think it makes deployment in an unknown environment, or in different > environments using a simple war more complex than it needed to be. If a > webapp relative path inside the war file could be used, the configuration of > solr (and cores) could be included in the war file completely with no > outside dependency - except, of course, of the data directory if that is to > go some place else. > (In my case, I want to deliver the solr web application including a custom > entity processor, so that is why I want to include the solr war as part of > my release cycle. It is easier to deliver that to the system administration > than to provide them with partial packages they have to install into an > already installed war, imho.) > You don't need to create a custom war for that. You can package the EntityProcessor into a separate jar and add it to solr_home/lib directory. -- Regards, Shalin Shekhar Mangar.
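For reference, a skeleton of such a custom processor, packaged in its own jar and dropped into solr_home/lib. This is only a sketch of the DIH contract - nextRow() returns one row per call and null when the entity is exhausted; the class and field names are hypothetical:

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.solr.handler.dataimport.EntityProcessorBase;

    public class MyEntityProcessor extends EntityProcessorBase {
        private boolean done = false;

        @Override
        public Map<String, Object> nextRow() {
            if (done) {
                return null; // null tells DIH this entity has no more rows
            }
            done = true;
            Map<String, Object> row = new HashMap<String, Object>();
            row.put("title", "example value"); // column name -> field value
            return row;
        }
    }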
Relevant results with DisMaxRequestHandler
Hello,

I noticed several strange behaviors on queries. I would like to share an example with you, so maybe you can explain to me what is going wrong. Using the following query:

http://localhost:8983/solr/others/select/?debugQuery=true&q=anna%20lewis&rows=20&start=0&fl=*&qt=dismax

I get back around 100 results. Here are the first two:

Person:151 Victoria Davisson
Person:37 Anna Lewis

And the related debug output:

57.998047 = (MATCH) sum of:
  0.048290744 = (MATCH) sum of:
    0.024546575 = (MATCH) max plus 0.01 times others of:
      0.024546575 = (MATCH) weight(text:anna^0.5 in 64288), product of:
        0.027395602 = queryWeight(text:anna^0.5), product of:
          0.5 = boost
          5.734427 = idf(docFreq=564, numDocs=30400)
          0.009554783 = queryNorm
        0.8960042 = (MATCH) fieldWeight(text:anna in 64288), product of:
          1.0 = tf(termFreq(text:anna)=1)
          5.734427 = idf(docFreq=564, numDocs=30400)
          0.15625 = fieldNorm(field=text, doc=64288)
    0.02374417 = (MATCH) max plus 0.01 times others of:
      0.02374417 = (MATCH) weight(text:lewi^0.5 in 64288), product of:
        0.026944114 = queryWeight(text:lewi^0.5), product of:
          0.5 = boost
          5.6399217 = idf(docFreq=620, numDocs=30400)
          0.009554783 = queryNorm
        0.88123775 = (MATCH) fieldWeight(text:lewi in 64288), product of:
          1.0 = tf(termFreq(text:lewi)=1)
          5.6399217 = idf(docFreq=620, numDocs=30400)
          0.15625 = fieldNorm(field=text, doc=64288)
  57.949757 = (MATCH) FunctionQuery(ord(name_s)), product of:
    1213.0 = ord(name_s)=1213
    5.0 = boost
    0.009554783 = queryNorm

5.006892 = (MATCH) sum of:
  0.038405567 = (MATCH) sum of:
    0.021955125 = (MATCH) max plus 0.01 times others of:
      0.021955125 = (MATCH) weight(text:anna^0.5 in 62632), product of:
        0.027395602 = queryWeight(text:anna^0.5), product of:
          0.5 = boost
          5.734427 = idf(docFreq=564, numDocs=30400)
          0.009554783 = queryNorm
        0.80141056 = (MATCH) fieldWeight(text:anna in 62632), product of:
          2.236068 = tf(termFreq(text:anna)=5)
          5.734427 = idf(docFreq=564, numDocs=30400)
          0.0625 = fieldNorm(field=text, doc=62632)
    0.016450444 = (MATCH) max plus 0.01 times others of:
      0.016450444 = (MATCH) weight(text:lewi^0.5 in 62632), product of:
        0.026944114 = queryWeight(text:lewi^0.5), product of:
          0.5 = boost
          5.6399217 = idf(docFreq=620, numDocs=30400)
          0.009554783 = queryNorm
        0.61053944 = (MATCH) fieldWeight(text:lewi in 62632), product of:
          1.7320508 = tf(termFreq(text:lewi)=3)
          5.6399217 = idf(docFreq=620, numDocs=30400)
          0.0625 = fieldNorm(field=text, doc=62632)
  4.968487 = (MATCH) FunctionQuery(ord(name_s)), product of:
    104.0 = ord(name_s)=104
    5.0 = boost
    0.009554783 = queryNorm

I'm using a simple boost function:

<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">text^0.5 name_s^5.0</str>
    <str name="pf">name_s^5.0</str>
    <str name="bf">ord(name_s)^5.0</str>
  </lst>
</requestHandler>

Can anyone explain to me why the first result is on top (the query is 'anna lewis') with a huge weight and nothing related (it seems the weight comes from the name_s field...)?

A second, general question: is it possible to boost a field if the query matches the content of the field exactly?

Thank you!
Vincent
Re: facet.prefix question
Licinio Fernández Maurelo wrote:

> I'm trying to do some filtering on the count list retrieved by Solr when
> doing a faceting query. I'm wondering how I can use facet.prefix to get
> something like this:
>
> Query:
> facet.field=foo&facet.prefix=A OR B
>
> Response:
> 12560
> 5440
> 2357
> ...
>
> How can I achieve this behaviour?
>
> Best Regards

You cannot set a query for the facet.prefix parameter. facet.prefix should be a prefix *string* of terms in the index, and you can set only one at a time. So I think you need to send two requests to get what you want:

...&facet.field=foo&facet.prefix=A
...&facet.field=foo&facet.prefix=B

Koji
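A sketch of that two-request approach in SolrJ, merging the counts client-side (the field name foo is from the example above):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.FacetField;

    public class PrefixFacets {
        public static void main(String[] args) throws Exception {
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
            for (String prefix : new String[] {"A", "B"}) {
                SolrQuery q = new SolrQuery("*:*");
                q.setFacet(true);
                q.addFacetField("foo");
                q.set("facet.prefix", prefix); // one prefix per request
                FacetField ff = server.query(q).getFacetField("foo");
                for (FacetField.Count c : ff.getValues()) {
                    System.out.println(c.getName() + ": " + c.getCount());
                }
            }
        }
    }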
Question about formatting the results returned from Solr
Hi all,

Not sure how good my title is, but here is a (hopefully) better explanation of what I mean. I am indexing a set of articles from a DB. Each article has an author. The author is saved in the DB as an author ID, which is a number. There is another table in the DB with more relevant information about the author. Basically it has columns like: id, firstname, lastname, email, userid.

I set up the DIH so that it returns the userid, and it works fine:

jdoe
msmith

Would it be possible to return all of the information about the author (first name, ...) as a subset of the results above? Here is what I mean:

John Doe
j...@doe.com
...

Something similar to that at least... Not sure how descriptive I was, but any pointers would be highly appreciated.

Cheers
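With DIH you can flatten the author columns into each article document using a nested entity; Solr 1.3 has no nested results, so the extra values come back as flat sibling fields rather than a true subset. A hedged data-config.xml sketch - the table and column names are assumptions:

    <entity name="article" query="SELECT id, title, author_id FROM article">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
      <entity name="author"
              query="SELECT firstname, lastname, email FROM author WHERE id='${article.author_id}'">
        <field column="firstname" name="author_firstname"/>
        <field column="lastname"  name="author_lastname"/>
        <field column="email"     name="author_email"/>
      </entity>
    </entity>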
Getting Tika to work in Solr 1.4 nightly
I am working with a Solr 1.4 nightly and am running it on a Windows machine. Solr is running using the example folder that was installed from the zip file. The only alteration that I have made to this default installation is to add a simple Word document to the exampledocs folder.

I am trying to get Tika to work in Solr. When I run tika-0.3.jar directed at a Word document, it outputs to the screen in XML format. I am not able to get Solr to run Tika and index the information in the sample Word document.

I have looked at the following resources: the Solr mailing list archive (although I could have missed something here); the documentation and Getting Started pages on the Apache Tika website; I even found an article called "Content Extraction with Tika" at this website:
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Content-Extraction-Tika
This article talks about using curl. Is curl necessary, or does Solr have something already configured to do the same as curl?

I have modified the solrconfig.xml file to include the request handler for the ExtractingRequestHandler. I used the modification that was commented out in the solrconfig.xml file. Here it is for reference:

<requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="ext.map.Last-Modified">last_modified</str>
    <bool name="ext.ignore.und.fl">true</bool>
  </lst>
</requestHandler>

Is there some modification to this code that I need to make?

Can someone please direct me to a source that can help me get this to work.

Kevin Miller
Re: FieldCollapsing: Two response elements returned?
My last mail is wrong. Sorry.

On 29 July 2009 11:10, Licinio Fernández Maurelo wrote:
> I've applied the latest collapse-field-related patch (patch-3) and it
> doesn't work.

--
Lici
Re: Relevant results with DisMaxRequestHandler
On Jul 29, 2009, at 6:55 AM, Vincent Pérès wrote:

> Can anyone explain to me why the first result is on top (the query is
> 'anna lewis') with a huge weight and nothing related (it seems the
> weight comes from the name_s field...)?

The ord function perhaps isn't doing what you want. It returns the term position, and thus it appears "Anna Lewis" is the 104th name_s value in your index lexicographically. And of course "Victoria Davisson" is much further down, at the 1213th position. Maybe you want rord instead? But probably not...

> A second, general question: is it possible to boost a field if the query
> matches the content of the field exactly?

You can set dismax's qs (query slop) factor, which will boost documents where the user's terms are closer together (within the number of term positions specified).

	Erik
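On the second question, dismax's pf (phrase fields) parameter is the usual way to reward documents whose field contains the whole query as a phrase. A sketch of setting it per request from SolrJ - the handler name, core path, and boost values are assumptions:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class PhraseBoost {
        public static void main(String[] args) throws Exception {
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr/others");
            SolrQuery q = new SolrQuery("anna lewis");
            q.set("qt", "dismax");       // route to the dismax handler
            q.set("pf", "name_s^10.0");  // reward docs whose name_s contains the whole phrase
            q.set("qs", "2");            // query slop: terms may sit up to two positions apart
            System.out.println(server.query(q).getResults().getNumFound() + " hits");
        }
    }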
RE: Boosting ('bq') on multi-valued fields
> Hey,
> I have a field defined as such:
>
>   <field name="site_id" type="string" indexed="true" stored="false"
>          multiValued="true" />
>
> with the string type defined as:
>
>   <fieldType name="string" class="solr.StrField" sortMissingLast="true"
>              omitNorms="true"/>
>
> When I try using some query-time boost parameters with bq on values of
> this field it seems to behave strangely in case of documents actually
> having multiple values: if I do a boost for a particular value
> ("site_id:5^1.1") it seems like all the cases where this field is
> actually populated with multiple ones (i.e. a document with field value
> "5|6") do not get boosted at all. I verified this using debugQuery &
> explainOther=doc_id:.
> Is this a known issue/bug? Any workarounds? (I'm using a nightly Solr
> build from a few months back.)

There is no tokenization on 'string' fields, so a query for "5" does not match a doc with a value of "5|6" for this field. You could try using field type 'text' for this and see what you get. You may need to customize it to use the StandardAnalyzer or WordDelimiterFilterFactory to get the right behavior. Using the analysis tool in the Solr admin UI to experiment will probably be helpful.

-Ken
Re: update some index documents after indexing process is done with DIH
From the newSearcher(..) method of a custom event listener which extends AbstractSolrEventListener I can access the SolrIndexSearcher and all the core properties, but I can't get a SolrIndexWriter. Do you know how I can get a SolrIndexWriter from there? That way I would be able to modify the documents (I need to modify them depending on the values of other documents; that's why I can't do it with a DIH delta-import).

Thanks in advance

Noble Paul നോബിള്‍ नोब्ळ्-2 wrote:
>
> you may not be able to access the DIH API from a newSearcher event.
> But the API would give you the searcher directly as a method parameter.
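A minimal sketch of such a listener. It only reads: as discussed in this thread, there is no supported way to grab a SolrIndexWriter here, so any modifications would have to be re-submitted through the normal update path (e.g. via SolrJ) rather than written in place. Class name is hypothetical:

    import org.apache.lucene.index.IndexReader;
    import org.apache.solr.common.util.NamedList;
    import org.apache.solr.core.SolrEventListener;
    import org.apache.solr.search.SolrIndexSearcher;

    public class MyCustomListener implements SolrEventListener {
        public void init(NamedList args) {}

        public void postCommit() {}

        public void newSearcher(SolrIndexSearcher newSearcher,
                                SolrIndexSearcher currentSearcher) {
            IndexReader reader = newSearcher.getReader();
            // Read-only pass over the index; deletions/updates would be
            // collected here and sent back as ordinary add/delete requests.
            for (int i = 0; i < reader.maxDoc(); i++) {
                if (!reader.isDeleted(i)) {
                    // inspect reader.document(i) and decide what to re-submit
                }
            }
        }
    }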
RE: search suggest
To do a proper search suggest feature you have to index all the queries your system gets and search it with wildcards for matches on what the user has typed so far for each user keystroke in the search box... Usually with some timer logic to wait for a small hesitation in their typing. -Original Message- From: Jack Bates [mailto:ms...@freezone.co.uk] Sent: Tuesday, July 28, 2009 10:54 AM To: solr-user@lucene.apache.org Subject: search suggest how can i use solr to make search suggestions? i'm thinking google-style suggestions, which suggests more refined queries - vs. freebase-style suggestions, which suggests top hits. i've been looking at the query params, http://wiki.apache.org/solr/StandardRequestHandler - and searching for "solr suggest" - but haven't figured out how to get search suggestions from solr
Wildcard and boosting
Hey now! I do index-time boosting for my fields and just discovered that when searching with a trailing wildcard the boosting is ignored. Will my boosting work with a wildcard if I do it at query time? And if so, is there a big performance difference? Is there some other method I can use to preserve my boosting? I do not need highlighting.

Thanks,
Jon Helgi
RE: refering/alias other Solr documents
Hi Ravi,

This may help: http://wiki.apache.org/solr/HierarchicalFaceting

Steve

> -----Original Message-----
> From: ravi.gidwani [mailto:ravi.gidw...@gmail.com]
> Sent: Wednesday, July 29, 2009 3:24 AM
> To: solr-user@lucene.apache.org
> Subject: referring/alias other Solr documents
>
> I am trying to solve a "contains" problem where, let's say, a book
> (represented as Document 1 in Solr) "contains" chapters (represented by
> Documents 2, 3, 4...) in Solr.
Re: Getting Tika to work in Solr 1.4 nightly
Hi Kevin,

The parameter names have changed in the latest Solr 1.4 builds... please see
http://wiki.apache.org/solr/ExtractingRequestHandler

-Yonik
http://www.lucidimagination.com

On Wed, Jul 29, 2009 at 10:17 AM, Kevin Miller wrote:
> I am trying to get Tika to work in Solr. When I run tika-0.3.jar
> directed at a Word document, it outputs to the screen in XML format. I
> am not able to get Solr to run Tika and index the information in the
> sample Word document.
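As an alternative to curl, SolrJ can post the file to /update/extract. A sketch against a current 1.4 nightly - the file path and id value are hypothetical, and the literal.id parameter name follows the wiki page above:

    import java.io.File;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

    public class ExtractPost {
        public static void main(String[] args) throws Exception {
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
            ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
            req.addFile(new File("exampledocs/sample.doc")); // the Word document to index
            req.setParam("literal.id", "doc1");              // supply the unique key yourself
            server.request(req);
            server.commit();
        }
    }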
Multi select faceting
Hi,

We're using Lucid Imagination's LucidWorks Solr 1.3 and we have a requirement to implement multi-select faceting, where the facet values show up as checkboxes and, despite checked options, all of the options continue to show counts. The best example I found is the search on Lucid Imagination's site: http://www.lucidimagination.com/search/

It appears the Solr 1.4 release has support for doing this with filter tagging (http://wiki.apache.org/solr/SimpleFacetParameters#head-f277d409b221b407d9c5430f552bf40ee6185c4c), but I was wondering if there is another way to accomplish this in 1.3?

Mike
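For reference, the 1.4 filter tagging the question mentions looks like this from SolrJ - a sketch only; the field name site and the tag name st are hypothetical:

    import org.apache.solr.client.solrj.SolrQuery;

    public class MultiSelectFacet {
        public static SolrQuery build() {
            SolrQuery q = new SolrQuery("*:*");
            q.setFacet(true);
            q.addFilterQuery("{!tag=st}site:news"); // the user's checked box
            q.add("facet.field", "{!ex=st}site");   // counts ignore that tagged filter
            return q;
        }
    }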
query and analyzers
Hi,

What analyzer, tokenizer, or filter factory would I need to use to get wildcard matching to match where:

Value: XYZ123
Query: XYZ1*

I have been messing with solr.WordDelimiterFilterFactory, splitOnNumerics and preserveOriginal, in both the analyzer and the query. I also noticed it is different when I use quotes in the query - phrase search. Unfortunately, I'm missing something, as I can't get it to work.

Tim
Re: query and analyzers
> What analyzer, tokenizer, filter factory would I need to
> use to get wildcard matching to match where:
> Value:
> XYZ123
> Query:
> XYZ1*

StandardAnalyzer or WhitespaceAnalyzer.

> I have been messing with solr.WordDelimiterFilterFactory
> splitOnNumerics and preserveOriginal in both the analyzer
> and the query. I also noticed it is different when I
> use quotes in the query - phrase search.
> Unfortunately, I'm missing something as I can't get it to
> work.

But I think your problem is not the analyzer. I guess there is a lowercase filter in your analyzer, and wildcard queries are not analyzed. Try querying xyz1*
Re: query in solr lucene
You may index your data using a delimiter, like $my-field-content$. While searching, perform a phrase query with the leading and trailing "$" appended to the query string.

Cheers
Avlesh

On Wed, Jul 29, 2009 at 12:04 PM, Sushan Rungta wrote:
> I tried using AND, but it even provided me doc 3, which was not required.
>
> Hence my problem still persists...
>
> regards,
> Sushan
>
> At 06:59 AM 7/29/2009, Avlesh Singh wrote:
>>> No, a phrase query would match docs 2 and 3. Sushan only wants doc 2
>>> as I read it.
>>
>> Sorry, my bad. I did not read properly before replying.
>>
>> Cheers
>> Avlesh
>>
>> On Wed, Jul 29, 2009 at 3:23 AM, Erick Erickson wrote:
>>> No, a phrase query would match docs 2 and 3. Sushan only wants doc 2
>>> as I read it.
>>>
>>> You might have some joy with KeywordAnalyzer, which does not break the
>>> incoming stream up into tokens. You have to be careful, though, because
>>> it also won't fold the case, so 'Hello' would not match 'hello'.
>>>
>>> Best
>>> Erick
>>>
>>> On Tue, Jul 28, 2009 at 11:11 AM, Avlesh Singh wrote:
>>>> You should perform a PhraseQuery on the required field.
>>>> Meaning, http://your-solr-host:port:/your-core-path/select?q=fieldName:"Hello
>>>> how are you sushan" would work for you.
>>>>
>>>> Cheers
>>>> Avlesh
>>>>
>>>> 2009/7/28 Gérard Dupont
>>>>> Hi Sushan,
>>>>>
>>>>> I'm not an expert in Solr, just a beginner, but it appears to me that
>>>>> you may have the default 'OR' combination of keywords, so that would
>>>>> explain this behavior. Try to modify the configuration for an 'AND'
>>>>> combination.
>>>>>
>>>>> cheers
>>>>>
>>>>> On Tue, Jul 28, 2009 at 16:49, Sushan Rungta wrote:
>>>>>> I am extremely sorry for responding late, as I was ill for the past
>>>>>> few days.
>>>>>>
>>>>>> My problem is explained below with an example:
>>>>>>
>>>>>> I am having three documents with the following content:
>>>>>>
>>>>>> 1. Hello how are you
>>>>>> 2. Hello how are you sushan
>>>>>> 3. Hello how are you sushan. I am fine.
>>>>>>
>>>>>> When I search for the query "Hello how are you sushan", I should
>>>>>> only get document 2 in my result.
>>>>>>
>>>>>> I hope this will give you all a better insight into my problem.
>>>>>>
>>>>>> regards,
>>>>>> Sushan Rungta
>>>>>
>>>>> --
>>>>> Gérard Dupont
>>>>> Information Processing Control and Cognition (IPCC) - EADS DS
>>>>> http://weblab-project.org
>>>>>
>>>>> Document & Learning team - LITIS Laboratory
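A sketch of the delimiter trick with SolrJ. It assumes a field whose analyzer keeps the $ tokens (e.g. a whitespace-tokenized field); the field name title_exact is hypothetical:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class ExactFieldMatch {
        public static void main(String[] args) throws Exception {
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

            // Index time: wrap the whole field value in sentinel tokens.
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "2");
            doc.addField("title_exact", "$ Hello how are you sushan $");
            server.add(doc);
            server.commit();

            // Query time: a phrase query with the same sentinels only matches
            // documents whose field is exactly the query, nothing more.
            SolrQuery q = new SolrQuery("title_exact:\"$ Hello how are you sushan $\"");
            System.out.println(server.query(q).getResults().getNumFound());
        }
    }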
Re: search suggest
Autosuggest is something that would be very useful to build into Solr, as many search projects require it.

I'd recommend indexing relevant terms/phrases into a ternary search tree, which is compact and performant. Using a wildcard query will likely not be as fast as a ternary tree, and I'm not sure how phrases would be handled?

http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/analysis/compound/hyphenation/TernaryTree.html

It would be good to separate out the TernaryTree from analysis/compound into Lucene core, or into its own contrib.

Also see http://issues.apache.org/jira/browse/LUCENE-625 which improves relevancy using click-through rates.

I'll open an issue in Solr to get this one going.

On Wed, Jul 29, 2009 at 9:12 AM, Robert Petersen wrote:
> To do a proper search suggest feature you have to index all the queries
> your system gets and search it with wildcards for matches on what the
> user has typed so far for each user keystroke in the search box...
> Usually with some timer logic to wait for a small hesitation in their
> typing.
>
> -----Original Message-----
> From: Jack Bates [mailto:ms...@freezone.co.uk]
> Sent: Tuesday, July 28, 2009 10:54 AM
> To: solr-user@lucene.apache.org
> Subject: search suggest
>
> how can i use solr to make search suggestions? i'm thinking google-style
> suggestions, which suggests more refined queries - vs. freebase-style
> suggestions, which suggests top hits.
>
> i've been looking at the query params,
> http://wiki.apache.org/solr/StandardRequestHandler
>
> - and searching for "solr suggest" - but haven't figured out how to get
> search suggestions from solr
Re: search suggest
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/analysis/compound/hyphenation/TernaryTree.html

On Wed, Jul 29, 2009 at 12:08 PM, Jason Rutherglen wrote:
> I'd recommend indexing relevant terms/phrases into a ternary search
> tree, which is compact and performant.
Visualizing Semantic Journal Space (large scale) using full-text
I thought the Lucene and Solr communities would find this interesting: my collaborators and I have used LuSql, Lucene and Semantic Vectors to visualize a large-scale semantic journal space (kind of like 'Maps of Science') of a large (5.7 million articles) journal article collection, using only the full text (no metadata).

For more info & a howto:
http://zzzoot.blogspot.com/2009/07/project-torngat-building-large-scale.html

Glen Newton
RE: query and analyzers
This was the definition I was last working with (I've been playing with setting the various parameters):

<fieldType name="text_ws" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" splitOnNumerics="1" preserveOriginal="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" splitOnNumerics="1" preserveOriginal="1"/>
  </analyzer>
</fieldType>

-----Original Message-----
From: AHMET ARSLAN [mailto:iori...@yahoo.com]
Sent: Wednesday, July 29, 2009 11:55 AM
To: solr-user@lucene.apache.org
Subject: Re: query and analyzers

> But I think your problem is not the analyzer. I guess there is a
> lowercase filter in your analyzer, and wildcard queries are not
> analyzed. Try querying xyz1*
RE: query and analyzers
In order to match (query) XYZ1* to (document) XYZ123 you do not need the WordDelimiterFilterFactory. You need a tokenizer that recognizes XYZ123 as one token, and WhitespaceTokenizer is one of them.

As I see from the fieldType named text_ws, you want to use the WhitespaceTokenizerFactory, and there is no LowercaseFilter in it, so there is no problem there. Just remove the WordDelimiterFilterFactory (both query and index) and it should work.

Ahmet
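A self-contained Lucene check of that claim - whitespace tokenization keeps XYZ123 as a single term, which the un-analyzed prefix query XYZ1* then matches (2.4-era API; field name is hypothetical):

    import org.apache.lucene.analysis.WhitespaceAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.PrefixQuery;
    import org.apache.lucene.store.RAMDirectory;

    public class WildcardCheck {
        public static void main(String[] args) throws Exception {
            RAMDirectory dir = new RAMDirectory();
            IndexWriter w = new IndexWriter(dir, new WhitespaceAnalyzer(), true,
                    IndexWriter.MaxFieldLength.UNLIMITED);
            Document doc = new Document();
            // Whitespace tokenization leaves XYZ123 as one term.
            doc.add(new Field("part", "XYZ123", Field.Store.YES, Field.Index.ANALYZED));
            w.addDocument(doc);
            w.close();

            IndexSearcher s = new IndexSearcher(dir);
            // Prefix terms are not analyzed, so case must match the indexed term.
            int hits = s.search(new PrefixQuery(new Term("part", "XYZ1")), null, 10).totalHits;
            System.out.println(hits); // prints 1
        }
    }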
RE: query and analyzers
That did it, thanks! I thought that was how it should work, but I guess somehow I got out of sync at one point, which led me to dive deeper into it than I needed to.

-----Original Message-----
From: AHMET ARSLAN [mailto:iori...@yahoo.com]
Sent: Wednesday, July 29, 2009 12:52 PM
To: solr-user@lucene.apache.org
Subject: RE: query and analyzers

> Just remove the WordDelimiterFilterFactory (both query and index) and
> it should work.
Re: search suggest
Also watch out that you have a good stopwords list, otherwise the suggestions won't be helpful for the user.

Jack Bates wrote:

> how can i use solr to make search suggestions? i'm thinking google-style
> suggestions, which suggests more refined queries - vs. freebase-style
> suggestions, which suggests top hits.

--
manuel aldana
ald...@gmx.de
software-engineering blog: http://www.aldana-online.de
RE: search suggest
Simple-minded autosuggest can just not tokenize the phrases at all, so the wildcards complete whatever the user has typed so far, including spaces. Upon encountering a space, though, autosuggest should wait to make more suggestions until the user has typed at least a couple of letters of the next word. That is the way I did it last time, using a different search engine.

It'd sure be kewl if this became a core feature of solr! I like the idea of the tree approach, sounds much faster. The root is the least letters to start suggestions and the leaves are the full phrases?

-----Original Message-----
From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com]
Sent: Wednesday, July 29, 2009 12:09 PM
To: solr-user@lucene.apache.org
Subject: Re: search suggest

> I'd recommend indexing relevant terms/phrases into a ternary search
> tree, which is compact and performant. Using a wildcard query will
> likely not be as fast as a ternary tree, and I'm not sure how phrases
> would be handled?
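Until something like that lands in Solr, the untokenized-phrase completion described above can be prototyped in a few lines with a sorted set - a ternary tree is more compact, but the prefix-range lookup idea is the same:

    import java.util.SortedSet;
    import java.util.TreeSet;

    public class PrefixSuggester {
        private final TreeSet<String> phrases = new TreeSet<String>();

        public void add(String phrase) {
            phrases.add(phrase.toLowerCase());
        }

        /** Every stored phrase that starts with the (lowercased) prefix. */
        public SortedSet<String> suggest(String prefix) {
            prefix = prefix.toLowerCase();
            // '\uffff' sorts after any other character, bounding the prefix range.
            return phrases.subSet(prefix, prefix + '\uffff');
        }

        public static void main(String[] args) {
            PrefixSuggester s = new PrefixSuggester();
            s.add("solr faceting");
            s.add("solr facet prefix");
            s.add("lucene highlighter");
            System.out.println(s.suggest("solr fac")); // [solr facet prefix, solr faceting]
        }
    }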
Re: Indexing TIKA extracted text. Are there some issues?
Sure. The java command I use with TIKA to extract text from a URL is: java -jar tika-0.3-standalone.jar -t $url I have also attached the screenshots of the web page, post documents produced in the two different ways (Perl & Tika) for that web page, and the screenshots of the search result for a string contained in that web page. The index in each case contains just this one URL. To keep everything else identical, I used the same instance for creating the index in each case. First I posted the Tika document, checked for the results, emptied the index, posted the Perl document, and checked the results. Debug query for Tika: +DisjunctionMaxQuery((urltext:é«éå ¬å¸å±ç°äºæµ·éçä¼è´¨å¤åªä½å 容è½^2.0 | title:é«éå ¬å¸å±ç°äºæµ·éçä¼è´¨å¤åªä½å 容è½^2.0 | content_china:"é«é éå ¬ å ¬å¸ å¸å± å±ç° ç°äº äºæµ· æµ·é éç çä¼ ä¼è´¨ è´¨å¤ å¤åª åªä½ ä½å å 容 容è½")~0.01) () Debug query for Perl: +DisjunctionMaxQuery((urltext:é«éå ¬å¸å±ç°äºæµ·éçä¼è´¨å¤åªä½å 容è½^2.0 | title:é«éå ¬å¸å±ç°äºæµ·éçä¼è´¨å¤åªä½å 容è½^2.0 | content_china:"é«é éå ¬ å ¬å¸ å¸å± å±ç° ç°äº äºæµ· æµ·é éç çä¼ ä¼è´¨ è´¨å¤ å¤åª åªä½ ä½å å 容 容è½")~0.01) () The screenshots http://www.nabble.com/file/p24728917/Tika%2BIssue.docx Tika+Issue.docx Perl extracted doc http://www.nabble.com/file/p24728917/china.perl.xml china.perl.xml Tika extracted doc http://www.nabble.com/file/p24728917/china.tika.xml china.tika.xml Grant Ingersoll-6 wrote: > > Hmm, looks very much like an encoding problem. Can you post a sample > showing it, along with the commands you invoked? > > Thanks, > Grant > > On Jul 28, 2009, at 6:14 PM, ashokc wrote: > >> >> I am finding that the search results based on indexing Tika >> extracted text >> are very different from results based on indexing the text extracted >> via >> other means. This shows up for example with a chinese web site that >> I am >> trying to index. >> >> I created the documents (for posting to SOLR) in two ways. The >> source text >> of the web pages are full of html entities like 〹 and some >> english >> characters mixed in. >> >> (a) Simple text extraction from the page source by a Perl script. The >> resulting content field looks like >> >> Who We Are >> 公司历史 >> 您的成功案例 >> 领导团队 业务部门 >> Innovation >> 创 etc... >> >> I posted these documents to a SOLR instance >> >> (b) Used Tika (command line). The resulting content field looks like >> >> Who We Are Ã¥ ŒÂ¸à >> ¥ÂŽÂ†Ã¥Â² >> 您的æˆÂ功æ¡ >> ˆä¾‹ 领导团队 >> 业务部门  Innovation à >> ¥Â >> etc... >> >> I posted these documents to a different instance >> >> When I search the first instance for a string (that I copied & >> pasted from >> the web site) I find a number of hits, including the page from which I >> copied the string from. But when I do the same on the instance with >> Tika >> extracted text - I get nothing. >> >> Has anyone seen this? I believe it may have to do with encoding. In >> both >> cases the posted documents were utf-8 compiant. >> >> Thanks for your insights. >> >> - ashok >> >> -- >> View this message in context: >> http://www.nabble.com/Indexing-TIKA-extracted-text.-Are-there-some-issues--tp24708854p24708854.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> > > -- > Grant Ingersoll > http://www.lucidimagination.com/ > > Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) > using Solr/Lucene: > http://www.lucidimagination.com/search > > > -- View this message in context: http://www.nabble.com/Indexing-TIKA-extracted-text.-Are-there-some-issues--tp24708854p24728917.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: search suggest
Here's a good article on Ternary Trees: http://www.ddj.com/windows/184410528 I looked at the one in Lucene; I don't understand why the find method only returns a char/int? On Wed, Jul 29, 2009 at 2:33 PM, Robert Petersen wrote: > [...]
Re: Indexing TIKA extracted text. Are there some issues?
It appears there is an encoding problem: in the screenshot I can see the title is mangled, and if I open up the URL in IE or Firefox, both browsers think it is iso-8859-1. I think this is why (from the W3C validator): Character Encoding mismatch! The character encoding specified in the HTTP header (iso-8859-1) is different from the value in the <meta> element (utf-8). I will use the value from the HTTP header (iso-8859-1) for this validation. On Wed, Jul 29, 2009 at 6:02 PM, ashokc wrote: > [...] -- Robert Muir rcm...@gmail.com
Re: Indexing TIKA extracted text. Are there some issues?
Could very well be... I will rectify it and try again. Thanks - ashok Robert Muir wrote: > [...] -- View this message in context: http://www.nabble.com/Indexing-TIKA-extracted-text.-Are-there-some-issues--tp24708854p24729595.html Sent from the Solr - User mailing list archive at Nabble.com.
deleteById always returning OK
Is it expected behaviour that "deleteById" will always return OK as a status, regardless of whether the id was matched? I have a unit test:

// set up the test data
engine.index(12345, s1, d1);
engine.index(54321, s2, d2);
engine.index(23453, s3, d3);
// ...

@Test
public void testRemove() throws Exception {
    assertEquals(engine.size(), 3);
    assertTrue(engine.remove(12345));
    assertEquals(engine.size(), 2);
    // XXX, it returns true
    assertFalse(engine.remove(23523352));

"Engine" is my wrapper around Solr. The remove method looks like this:

private static final int RESPONSE_STATUS_OK = 0;
private SolrServer server;

public boolean remove(final Integer titleInstanceId) throws IOException {
    try {
        server.deleteById(String.valueOf(titleInstanceId));
        final UpdateResponse updateResponse = server.commit(true, true);
        // XXX It's always OK
        return (updateResponse.getStatus() == RESPONSE_STATUS_OK);

Any ideas what's going wrong? Is there a different way to test for the id not having been there, other than an additional search? Thanks Reuben
Re: search suggest
I created an issue and have added some notes: https://issues.apache.org/jira/browse/SOLR-1316 On Wed, Jul 29, 2009 at 3:15 PM, Jason Rutherglen wrote: > [...]
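For reference, a from-scratch sketch of the ternary search tree idea (not the Lucene contrib class): insert() stores one character per node, and suggest() first walks down to the node for the last letter of the prefix, then does an in-order traversal of the subtree below it to collect complete phrases in sorted order.

import java.util.ArrayList;
import java.util.List;

public class TernarySuggestTree {
    private static class Node {
        char c;
        Node lo, eq, hi;
        boolean wordEnd; // true if the path from the root to here spells a complete phrase
        Node(char c) { this.c = c; }
    }

    private Node root;

    public void insert(String s) {
        if (s == null || s.length() == 0) return;
        root = insert(root, s, 0);
    }

    private Node insert(Node n, String s, int i) {
        char c = s.charAt(i);
        if (n == null) n = new Node(c);
        if (c < n.c)      n.lo = insert(n.lo, s, i);
        else if (c > n.c) n.hi = insert(n.hi, s, i);
        else if (i < s.length() - 1) n.eq = insert(n.eq, s, i + 1);
        else n.wordEnd = true;
        return n;
    }

    /** Return up to max phrases that start with the given prefix, in sorted order. */
    public List<String> suggest(String prefix, int max) {
        List<String> out = new ArrayList<String>();
        if (prefix == null || prefix.length() == 0) return out;
        Node n = find(root, prefix, 0);
        if (n == null) return out;
        if (n.wordEnd) out.add(prefix); // the prefix itself may be a stored phrase
        collect(n.eq, new StringBuilder(prefix), out, max);
        return out;
    }

    private Node find(Node n, String s, int i) {
        if (n == null) return null;
        char c = s.charAt(i);
        if (c < n.c) return find(n.lo, s, i);
        if (c > n.c) return find(n.hi, s, i);
        if (i == s.length() - 1) return n;
        return find(n.eq, s, i + 1);
    }

    private void collect(Node n, StringBuilder prefix, List<String> out, int max) {
        if (n == null || out.size() >= max) return;
        collect(n.lo, prefix, out, max);
        prefix.append(n.c);
        if (n.wordEnd && out.size() < max) out.add(prefix.toString());
        collect(n.eq, prefix, out, max);
        prefix.deleteCharAt(prefix.length() - 1);
        collect(n.hi, prefix, out, max);
    }
}

To the earlier question: the root holds a first character of the stored phrases and the leaves complete them; a lookup touches only the nodes along the prefix plus the subtree of completions, which is the intuition behind preferring it to a wildcard query.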
Re: THIS WEEK: PNW Hadoop, HBase / Apache Cloud Stack Users' Meeting, Wed Jul 29th, Seattle
Don't forget this is tonight! Excited to see everyone there. On Tue, Jul 28, 2009 at 11:25 AM, Bradford Stephens wrote: > Hey everyone, > > SLIGHT change of plans. > > A few people have asked me to move to a place with Air Conditioning, > since the temperature's in the 90's this week. So, here we go: > > Big Time Brewing Company > 4133 University Way NE > Seattle, WA 98105 > > Call me at 904-415-3009 if you have any questions. > > > On Mon, Jul 27, 2009 at 12:16 PM, Bradford > Stephens wrote: >> Hello again! >> >> Yes, I know some of us are still recovering from OSCON. It's time for >> another delicious meetup to chat about Hadoop, HBase, Solr, Lucene, >> and more! >> >> UW is quite a pain for us to access until August, so we're changing >> the venue to one pretty close: >> >> Piccolo's Pizza >> 5301 Roosevelt Way NE >> (between 53rd St & 55th St) >> >> 6:45pm - 8:30 (or when we get bored)! >> >> As usual, people are more than welcome to give talks, whether they're >> long-format or lightning. I'd also really like to start thinking about >> hackathons, perhaps we could have one next month? >> >> I'll be talking about HBase .20 and the possibility of low-latency >> HBase Analytics. I'd be very excited to hear what people are up to! >> >> Contact me if there's any questions: 904-415-3009 >> >> Cheers, >> Bradford >> >> -- >> http://www.roadtofailure.com -- The Fringes of Scalability, Social >> Media, and Computer Science >> > > > > -- > http://www.roadtofailure.com -- The Fringes of Scalability, Social > Media, and Computer Science > -- http://www.roadtofailure.com -- The Fringes of Scalability, Social Media, and Computer Science
Re: Wildcard and boosting
I just updated to a nightly build (I was using 1.2) and this does not seem to be an issue anymore. 2009/7/29 Jón Helgi Jónsson : > Hey now! > > I do index time boosting for my fields and just discovered that when > searching with a trailing wild card the boosting is ignored. > > Will my boosting work with a wild card if I do it at query time? And > if so is there a lot of performance difference? > > Some other method I can use to preserve my boosting? I do not need > hightlighting. > > Thanks, > Jon Helgi >
Re: deleteById always returning OK
Reuben Firmin wrote: > Is it expected behaviour that "deleteById" will always return OK as a status, regardless of whether the id was matched? It is expected behaviour: Solr always returns 0 unless an error occurs while processing a request (query, update, ...), so you don't need to check the status; you'll get an exception if something goes wrong, and otherwise the request succeeded. And you cannot know whether the id was matched. The only thing you can try is to send a query "q=id:value&rows=0" and check the numFound in the response before sending deleteById. Koji > [...]
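A sketch of the probe-then-delete approach Koji describes, assuming SolrJ and a uniqueKey field named "id"; note the race if another client touches the index between the query and the delete.

import java.io.IOException;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;

public class DeletingEngine {
    private final SolrServer server;

    public DeletingEngine(SolrServer server) {
        this.server = server;
    }

    public boolean remove(int id) throws SolrServerException, IOException {
        String idValue = String.valueOf(id);
        SolrQuery probe = new SolrQuery("id:" + idValue);
        probe.setRows(0); // we only need numFound, not the documents themselves
        long matches = server.query(probe).getResults().getNumFound();
        if (matches == 0) {
            return false; // nothing matched, so nothing to delete
        }
        server.deleteById(idValue);
        server.commit();
        return true;
    }
}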
RE: Boosting ('bq') on multi-valued fields
Hey Ken, Thanks for your reply. When I wrote '5|6' I meant that this is a multiValued field with two values, '5' and '6', rather than the literal string '5|6' (under any tokenizer). Does your reply still hold? That is, are multiValued fields dependent on the notion of tokenization to such a degree that I can't use the str type with them meaningfully? If so, it seems weird to me that I should be able to define a str multiValued field to begin with... -Chak Ensdorf Ken wrote: > >> [...] > There is no tokenization on 'string' fields, so a query for "5" does not match a doc with a value of "5|6" for this field. You could try using field type 'text' for this and see what you get. You may need to customize it to use the StandardAnalyzer or WordDelimiterFilterFactory to get the right behavior. Using the analysis tool in the solr admin UI to experiment will probably be helpful. > > -Ken -- View this message in context: http://www.nabble.com/Boosting-%28%27bq%27%29-on-multi-valued-fields-tp24713905p24730981.html Sent from the Solr - User mailing list archive at Nabble.com.
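To make the distinction concrete, a small SolrJ sketch with made-up field names: a multiValued field gets one addField() call per value, so the index holds the two separate string terms "5" and "6" rather than the single term "5|6", and a bq like site_id:5^1.1 can then match on a string field.

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.common.SolrInputDocument;

public class MultiValuedExample {
    public static void index(SolrServer server) throws Exception {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("doc_id", "42");
        doc.addField("site_id", "5"); // first value of the multiValued field
        doc.addField("site_id", "6"); // second value; Solr keeps both
        server.add(doc);
        server.commit();
    }
}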
Re: Is there a multi-shard optimize message?
: > Normally to optimize an index you POST to /solr/update. Is : > there any way to POST an optimize message to one instance and have it : > propagate to all shards sort of like the select? : > : > /solr-shard-1/select?q=dog... shards=shard-1,shard2 : No, you'll need to send optimize to each host separately. And for the record: it would be relatively straightforward to implement something like this (just like distributed search) ... but it has very little value. Clients doing "indexing" operations have to send add/delete commands directly to the individual shards, so they have to send the commit/optimize commands directly to them as well. If/when someone writes a distributed indexing handler, making it support distributed optimize/commit will be fairly trivial. -Hoss
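Until such a handler exists, the per-shard loop is simple; a SolrJ sketch with placeholder shard URLs:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class OptimizeAllShards {
    public static void main(String[] args) throws Exception {
        // hypothetical shard URLs; in practice read them from your own config
        String[] shardUrls = {
            "http://host1:8983/solr-shard-1",
            "http://host2:8983/solr-shard-2"
        };
        for (String url : shardUrls) {
            SolrServer shard = new CommonsHttpSolrServer(url);
            shard.optimize(); // blocks until this shard's optimize finishes
        }
    }
}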
Re: update some index documents after indexing process is done with DIH
If you make your EventListener implement SolrCoreAware, you can get hold of the core in inform(). Use that to get hold of the SolrIndexWriter. On Wed, Jul 29, 2009 at 9:20 PM, Marc Sturlese wrote: > From the newSearcher(..) of a CustomEventListener which extends AbstractSolrEventListener I can access the SolrIndexSearcher and all core properties, but can't get a SolrIndexWriter. Do you know how I can get a SolrIndexWriter from there? This way I would be able to modify the documents (I need to modify them depending on values of other documents; that's why I can't do it with DIH delta-import). > Thanks in advance > > Noble Paul നോബിള് नोब्ळ्-2 wrote: >> [...] -- - Noble Paul | Principal Engineer| AOL | http://aol.com
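A skeleton of what this describes, hedged against the Solr 1.4-era interfaces (the exact SolrEventListener method set varies between versions), with the listener assumed to be registered in solrconfig.xml: Solr calls inform() with the core at startup, and the core is then available inside the event callbacks.

import org.apache.solr.common.util.NamedList;
import org.apache.solr.core.SolrCore;
import org.apache.solr.core.SolrEventListener;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.util.RefCounted;
import org.apache.solr.util.plugin.SolrCoreAware;

public class PostIndexListener implements SolrEventListener, SolrCoreAware {
    private SolrCore core;

    public void inform(SolrCore core) {
        this.core = core; // called once at startup; keep the core for later use
    }

    public void init(NamedList args) {}

    public void postCommit() {
        RefCounted<SolrIndexSearcher> holder = core.getSearcher();
        try {
            SolrIndexSearcher searcher = holder.get();
            // read documents via searcher.getReader() and decide what to modify;
            // writes would then be routed through core.getUpdateHandler()
        } finally {
            holder.decref(); // always release the reference-counted searcher
        }
    }

    public void newSearcher(SolrIndexSearcher newSearcher, SolrIndexSearcher currentSearcher) {}
}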
Re: issue inquiry: unterminated index lock after optimize update command
: I'm using solr build 2009-06-16_08-06-14, in multicore configuration. : When I issue the update command "optimize" to a core, the index files : are locked and never released. Calling the coreAdmin unload method on : the core unloads the core but does not unlock the underlying index files. : The core has no other alias, the data path is not referenced by any : other core when a full status is requested. The end result is that : optimized cores that have been unloaded cannot be deleted until jetty is : restarted. ... : I have searched jira but did not find anything relevant. Is this a bug : that should be reported, or is this an intended behavior? ...this is certainly not intended behavior. You shouldn't need to restart the server (or even reload the core) to unlock the index ... it should be unlocked automatically when the optimize completes. Are you sure there wasn't any sort of serious error in the logs? Like an OutOfMemory perhaps? If you can reproduce this consistently, a detailed bug report showing your exact config files, describing your OS and filesystem, and describing exactly what steps you take to trigger this problem would certainly be appreciated. -Hoss
Re: DocList Pagination
: Hi, I am try to get the next DocList "page" in my custom search component. : Could I get a code example of this? You just increase the "offset" value you pass to SolrIndexSearcher.getDocList by whatever your page size is. (If you use the newer QueryCommand versions, you just call setOffset with the same value.) -Hoss
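A sketch of that inside a custom component, assuming the Solr 1.4-era QueryCommand API:

import java.io.IOException;
import org.apache.lucene.search.Query;
import org.apache.solr.search.DocList;
import org.apache.solr.search.SolrIndexSearcher;

public class DocListPager {
    /** Fetch page number `page` (0-based), with `pageSize` docs per page. */
    public DocList fetchPage(SolrIndexSearcher searcher, Query q,
                             int page, int pageSize) throws IOException {
        SolrIndexSearcher.QueryCommand cmd = new SolrIndexSearcher.QueryCommand();
        cmd.setQuery(q);
        cmd.setOffset(page * pageSize); // the only thing that changes between pages
        cmd.setLen(pageSize);
        SolrIndexSearcher.QueryResult result = new SolrIndexSearcher.QueryResult();
        searcher.search(result, cmd);
        return result.getDocList();
    }
}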
Re: solr indexing on same set of records with different value of unique field...not working...
I'm not really understanding how you could get the situation you describe ... which suggests that one (or both) of us doesn't understand exactly what happened. If you can post the actual schema.xml file you used and an example of the input you indexed, perhaps we can spot the discrepancy. FWIW: using a timestamp as a uniqueKey doesn't make much sense ... 1) if you have heavy parallelization, two docs indexed at the exact same time might overwrite each other. 2) you have no way of ever replacing an existing doc (unless you roll the clock back), in which case there's no advantage to using a uniqueKey -- so you might as well leave it out of your schema (which makes indexing slightly faster). : I need to run around 10 million records to index, by solr. : It has nearly 2 lakh records, so i made a program to loop it till 10 million. : Here, i specified 20 fields in schema.xml file. The unique field i set was : the currentTimeStamp field. : So, when i run the loader program (which loads xml data into solr) it creates : a currentTimestamp value...and loads into solr. : : For this situation, : i stopped the loader program after 100 records were indexed into solr. : Then again, i run the loader program for the SAME 100 records to be indexed, : and solr results in 100, rather than 200. : : Because i set the currentTimeStamp field as the uniqueField, I expect the result : to be 200 if i run the same 100 records again... : : Any suggestions please... -Hoss
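For reference, the overwrite semantics of a real uniqueKey are easy to see in a SolrJ sketch (the schema is assumed to declare a field named "id" as the uniqueKey):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.common.SolrInputDocument;

public class UniqueKeyOverwrite {
    public static void run(SolrServer server) throws Exception {
        SolrInputDocument first = new SolrInputDocument();
        first.addField("id", "doc-1");
        first.addField("title", "original");
        server.add(first);

        SolrInputDocument second = new SolrInputDocument();
        second.addField("id", "doc-1"); // same uniqueKey: replaces the first doc
        second.addField("title", "replacement");
        server.add(second);

        server.commit();
        // a *:* query now finds one document, not two
    }
}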
Re: update some index documents after indexing process is done with DIH
This thread all sounds really kludgy ... among other things, the newSearcher listener is going to need to somehow keep track of when it was called as a result of a "real" commit vs. when it was called as the result of a commit it itself triggered to make changes. Wouldn't an easier place to implement this logic be in an UpdateProcessor? You'll still need the "double commit" (once so you can see the main changes, and once so the rest of the world can see your modifications), but you can execute them both directly in your processCommit(CommitUpdateCommand) method (so you don't have to worry about being able to tell them apart). : Date: Thu, 30 Jul 2009 10:14:16 +0530 : From: Noble Paul നോബിള് नोब्ळ् : Reply-To: solr-user@lucene.apache.org, noble.p...@gmail.com : To: solr-user@lucene.apache.org : Subject: Re: update some index documents after indexing process is done with : DIH : : [...]
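A rough skeleton of that UpdateProcessor idea; the modification logic itself and the factory that registers the processor chain in solrconfig.xml are elided:

import java.io.IOException;
import org.apache.solr.update.CommitUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;

public class PostCommitRewriteProcessor extends UpdateRequestProcessor {
    public PostCommitRewriteProcessor(UpdateRequestProcessor next) {
        super(next);
    }

    @Override
    public void processCommit(CommitUpdateCommand cmd) throws IOException {
        super.processCommit(cmd); // first commit: the main changes become visible
        // ... open a searcher here, decide which documents to modify,
        // and send the modified documents back through the update handler ...
        super.processCommit(cmd); // second commit: the modifications become visible
    }
}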