Re: Type converters for DocumentObjectBinder
Hi Paul,

it's working for Query, but not for Updating (Add Bean). The getter method is returning a Calendar (GregorianCalendar instance).

On the indexer side, a toString() or something equivalent is done and an error is thrown:

Caused by: java.text.ParseException: Unparseable date: "java.util.GregorianCalendar:java.util.GregorianCalendar[time=1258100168327,areFieldsSet=true,areAllFieldsSet=true,lenient=true,zone=sun.util.calendar.ZoneInfo[id="Europe/Berlin",offset=360,dstSavings=360,useDaylight=true,transitions=143,lastRule=java.util.SimpleTimeZone[id=Europe/Berlin,offset=360,dstSavings=360,useDaylight=true,startYear=0,startMode=2,startMonth=2,startDay=-1,startDayOfWeek=1,startTime=360,startTimeMode=2,endMode=2,endMonth=9,endDay=-1,endDayOfWeek=1,endTime=360,endTimeMode=2],firstDayOfWeek=2,minimalDaysInFirstWeek=4,ERA=1,YEAR=2009,MONTH=10,WEEK_OF_YEAR=46,WEEK_OF_MONTH=2,DAY_OF_MONTH=13,DAY_OF_YEAR=317,DAY_OF_WEEK=6,DAY_OF_WEEK_IN_MONTH=2,AM_PM=0,HOUR=9,HOUR_OF_DAY=9,MINUTE=16,SECOND=8,MILLISECOND=327,ZONE_OFFSET=360,DST_OFFSET=0]"

public Calendar getValidFrom() {
    return validFrom;
}

public void setValidFrom(Calendar validFrom) {
    this.validFrom = validFrom;
}

@Field
public void setValidFrom(String validFrom) {
    Calendar cal = Calendar.getInstance();
    try {
        cal.setTime(dateFormat.parse(validFrom));
    } catch (ParseException e) {
        e.printStackTrace();
    }
    this.validFrom = cal;
}

Noble Paul നോബിള് नोब्ळ्-2 wrote:
>
> create a setter method for the field which takes a String and apply
> the annotation there
>
> example
>
> private Calendar validFrom;
>
> @Field
> public void setValidFrom(String s){
>     //convert to Calendar object and set the field
> }
>
> On Fri, Nov 13, 2009 at 12:24 PM, paulhyo wrote:
>>
>> Hi,
>>
>> I would like to know if there is a way to add type converters when using
>> getBeans. I need conversion when Updating (Calendar -> String) and when
>> Searching (String -> Calendar)
>>
>> The Bean class defines:
>> @Field
>> private Calendar validFrom;
>>
>> but the received type within Query Response is a String (2009-11-13)...
>> >> Actually I get this error : >> >> java.lang.RuntimeException: Exception while setting value : 2009-09-16 on >> private java.util.Calendar >> ch.mycompany.access.solr.impl.SoNatPersonImpl.validFrom >> at >> org.apache.solr.client.solrj.beans.DocumentObjectBinder$DocField.set(DocumentObjectBinder.java:360) >> at >> org.apache.solr.client.solrj.beans.DocumentObjectBinder$DocField.inject(DocumentObjectBinder.java:342) >> at >> org.apache.solr.client.solrj.beans.DocumentObjectBinder.getBeans(DocumentObjectBinder.java:55) >> at >> org.apache.solr.client.solrj.response.QueryResponse.getBeans(QueryResponse.java:324) >> at >> ch.mycompany.access.solr.impl.result.NatPersonPartnerResultBuilder.buildBeanListResult(NatPersonPartnerResultBuilder.java:38) >> at >> ch.mycompany.access.solr.impl.SoQueryManagerImpl.searchNatPersons(SoQueryManagerImpl.java:41) >> at >> ch.mycompany.access.solr.impl.SolrQueryManagerTest.testQueryFamilyNameRigg(SolrQueryManagerTest.java:36) >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >> at >> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) >> at >> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) >> at java.lang.reflect.Method.invoke(Method.java:597) >> at junit.framework.TestCase.runTest(TestCase.java:164) >> at junit.framework.TestCase.runBare(TestCase.java:130) >> at junit.framework.TestResult$1.protect(TestResult.java:106) >> at junit.framework.TestResult.runProtected(TestResult.java:124) >> at junit.framework.TestResult.run(TestResult.java:109) >> at junit.framework.TestCase.run(TestCase.java:120) >> at junit.framework.TestSuite.runTest(TestSuite.java:230) >> at junit.framework.TestSuite.run(TestSuite.java:225) >> at >> org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130) >> at >> org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38) >> at >> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467) >> at >> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683) >> at >> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390) >> at >> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197) >> Caused by: java.lang.IllegalArgumentException: Can not set >> java.util.Calendar field >> ch.mycompany.access.solr.impl.SoNatPersonImpl.validFrom to >> java.lang.String >> at >> sun.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(UnsafeFiel
Re: Type converters for DocumentObjectBinder
You must have a corresponding getter which returns String.

public String getValidFrom() {
    String s = null; //convert calendar to string
    return s;
}

On Fri, Nov 13, 2009 at 2:01 PM, paulhyo wrote:
>
> Hi Paul,
>
> it's working for Query, but not for Updating (Add Bean). The getter method
> is returning a Calendar (GregorianCalendar instance)
>
> On the indexer side, a toString() or something equivalent is done and an
> error is thrown
>
> Caused by: java.text.ParseException: Unparseable date:
> "java.util.GregorianCalendar:java.util.GregorianCalendar[time=1258100168327,areFieldsSet=true,areAllFieldsSet=true,lenient=true,zone=sun.util.calendar.ZoneInfo[id="Europe/Berlin",offset=360,dstSavings=360,useDaylight=true,transitions=143,lastRule=java.util.SimpleTimeZone[id=Europe/Berlin,offset=360,dstSavings=360,useDaylight=true,startYear=0,startMode=2,startMonth=2,startDay=-1,startDayOfWeek=1,startTime=360,startTimeMode=2,endMode=2,endMonth=9,endDay=-1,endDayOfWeek=1,endTime=360,endTimeMode=2],firstDayOfWeek=2,minimalDaysInFirstWeek=4,ERA=1,YEAR=2009,MONTH=10,WEEK_OF_YEAR=46,WEEK_OF_MONTH=2,DAY_OF_MONTH=13,DAY_OF_YEAR=317,DAY_OF_WEEK=6,DAY_OF_WEEK_IN_MONTH=2,AM_PM=0,HOUR=9,HOUR_OF_DAY=9,MINUTE=16,SECOND=8,MILLISECOND=327,ZONE_OFFSET=360,DST_OFFSET=0]"
>
> public Calendar getValidFrom() {
>     return validFrom;
> }
>
> public void setValidFrom(Calendar validFrom) {
>     this.validFrom = validFrom;
> }
>
> @Field
> public void setValidFrom(String validFrom) {
>     Calendar cal = Calendar.getInstance();
>     try {
>         cal.setTime(dateFormat.parse(validFrom));
>     } catch (ParseException e) {
>         e.printStackTrace();
>     }
>     this.validFrom = cal;
> }
>
> Noble Paul നോബിള് नोब्ळ्-2 wrote:
>>
>> create a setter method for the field which takes a String and apply
>> the annotation there
>>
>> example
>>
>> private Calendar validFrom;
>>
>> @Field
>> public void setValidFrom(String s){
>>     //convert to Calendar object and set the field
>> }
>>
>> On Fri, Nov 13, 2009 at 12:24 PM, paulhyo wrote:
>>>
>>> Hi,
>>>
>>> I would like to know if there is a way to add type converters when using
>>> getBeans. I need conversion when Updating (Calendar -> String) and when
>>> Searching (String -> Calendar)
>>>
>>> The Bean class defines:
>>> @Field
>>> private Calendar validFrom;
>>>
>>> but the received type within Query Response is a String (2009-11-13)...
>>> >>> Actually I get this error : >>> >>> java.lang.RuntimeException: Exception while setting value : 2009-09-16 on >>> private java.util.Calendar >>> ch.mycompany.access.solr.impl.SoNatPersonImpl.validFrom >>> at >>> org.apache.solr.client.solrj.beans.DocumentObjectBinder$DocField.set(DocumentObjectBinder.java:360) >>> at >>> org.apache.solr.client.solrj.beans.DocumentObjectBinder$DocField.inject(DocumentObjectBinder.java:342) >>> at >>> org.apache.solr.client.solrj.beans.DocumentObjectBinder.getBeans(DocumentObjectBinder.java:55) >>> at >>> org.apache.solr.client.solrj.response.QueryResponse.getBeans(QueryResponse.java:324) >>> at >>> ch.mycompany.access.solr.impl.result.NatPersonPartnerResultBuilder.buildBeanListResult(NatPersonPartnerResultBuilder.java:38) >>> at >>> ch.mycompany.access.solr.impl.SoQueryManagerImpl.searchNatPersons(SoQueryManagerImpl.java:41) >>> at >>> ch.mycompany.access.solr.impl.SolrQueryManagerTest.testQueryFamilyNameRigg(SolrQueryManagerTest.java:36) >>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >>> at >>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) >>> at >>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) >>> at java.lang.reflect.Method.invoke(Method.java:597) >>> at junit.framework.TestCase.runTest(TestCase.java:164) >>> at junit.framework.TestCase.runBare(TestCase.java:130) >>> at junit.framework.TestResult$1.protect(TestResult.java:106) >>> at junit.framework.TestResult.runProtected(TestResult.java:124) >>> at junit.framework.TestResult.run(TestResult.java:109) >>> at junit.framework.TestCase.run(TestCase.java:120) >>> at junit.framework.TestSuite.runTest(TestSuite.java:230) >>> at junit.framework.TestSuite.run(TestSuite.java:225) >>> at >>> org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130) >>> at >>> org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38) >>> at >>> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467) >>> at >>> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683) >>> at >>> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390) >>> at >>> org.eclipse
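Putting the two suggestions in this thread together — the @Field annotation on a String setter, plus a matching getter that returns String for the add/update path — a bean might end up looking roughly like the sketch below. The field name, date pattern and the extra Calendar-typed accessors are illustrative assumptions, not code from the thread:

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;

import org.apache.solr.client.solrj.beans.Field;

public class PersonBean {

    private Calendar validFrom;

    // SimpleDateFormat is not thread-safe; one instance per bean keeps the sketch simple.
    private final SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd");

    // Used by DocumentObjectBinder when reading a query response (String -> Calendar).
    @Field
    public void setValidFrom(String value) {
        Calendar cal = Calendar.getInstance();
        try {
            cal.setTime(dateFormat.parse(value));
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + value, e);
        }
        this.validFrom = cal;
    }

    // The matching getter used when adding the bean (Calendar -> String);
    // per the advice above it must return String, not Calendar.
    public String getValidFrom() {
        return validFrom == null ? null : dateFormat.format(validFrom.getTime());
    }

    // Calendar-typed accessors for the rest of the application, under different
    // names so they don't collide with the String pair the binder sees.
    public Calendar getValidFromAsCalendar() {
        return validFrom;
    }

    public void setValidFromAsCalendar(Calendar validFrom) {
        this.validFrom = validFrom;
    }
}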
highlighting issue lst.name is a leaf node
Hello list,

I'm new to Solr, but from what I've been experimenting with, it's awesome. I have a small issue regarding the highlighting feature.

It finds stuff (as I see from the query analyzer), but the highlight list looks something like this:

(the files were added using ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract"); and I set the "literal.id" to the filename)

My solrconfig.xml requesthandler looks like: explicit true 3 30 * true 0.5 [-\w ,/\n\"']{20,200} true

The schema.xml is untouched and downloaded yesterday from the latest stable build.

At first, I thought it had something to do with the extraction of the PDF, but I tried the demo xml docs also and got the same result.

I'm new to this, so please help.

Thank you,

Chuck
Re: Stop solr without losing documents
Michael wrote:

> I've got a process external to Solr that is constantly feeding it new
> documents, retrying if Solr is not responding. What's the right way to stop
> Solr (running in Tomcat) so no documents are lost? Currently I'm committing
> all cores and then running catalina's stop script, but between my commit and
> the stop, more documents can come in that would need *another* commit...
> Lots of people must have had this problem already, so I know the answer is
> simple; I just can't find it! Thanks. Michael

I don't know if this is the best solution, or even if it's applicable to your situation, but we do incremental updates from a database based on a timestamp (from a simple separate sql table filled by triggers, so deletes are measured correctly as well). We store this timestamp in Solr as well.

Our index script first does a simple Solr request to get the newest timestamp, and basically selects the documents to update with a "SELECT * FROM document_updates WHERE timestamp >= X" where X is the timestamp returned from Solr. (We use >= for the hopefully extremely rare case where two updates happen at the same time, the index script runs at that same moment, and it only retrieves one of the updates; this will cause some documents to be updated multiple times, but as document updates are idempotent this is no real problem.)

Regards,

gwk
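For what it's worth, the first step of that approach — asking Solr for the newest stored timestamp — can be done with a one-row sorted query. A minimal SolrJ sketch, assuming a stored field literally named timestamp and a local Solr instance; both are placeholders, not gwk's actual setup:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocumentList;

public class NewestTimestamp {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // Ask for the single newest document by the stored timestamp field.
        SolrQuery query = new SolrQuery("*:*");
        query.setSortField("timestamp", SolrQuery.ORDER.desc);
        query.setRows(1);
        query.setFields("timestamp");

        QueryResponse response = server.query(query);
        SolrDocumentList docs = response.getResults();
        Object newest = docs.isEmpty() ? null : docs.get(0).getFieldValue("timestamp");

        // This value then becomes the bind parameter of the incremental SQL:
        // SELECT * FROM document_updates WHERE timestamp >= ?
        System.out.println("Newest indexed timestamp: " + newest);
    }
}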
Re: Arguments for Solr implementation at public web site
Some extra for the pros list:

- Full control over which content is searchable and which is not.
- Possibility to make pages searchable almost instantly after publication
- Control over when the site is indexed

Friendly,
Jan-Eirik

On Fri, Nov 13, 2009 at 10:52 AM, Lukáš Vlček wrote:
> Hi,
>
> I am looking for good arguments to justify implementation a search for sites
> which are available on the public internet. There are many sites in "powered
> by Solr" section which are indexed by Google and other search engines but
> still they decided to invest resources into building and maintenance of
> their own search functionality and not to go with [user_query site:
> my_site.com] google search. Why?
>
> By no mean I am saying it makes not sense to implement Solr! But I want to
> put together list of reasons and possibly with examples. Your help would be
> much appreciated!
>
> Let's narrow the scope of this discussion to the following:
> - the search should cover several community sites running open source CMSs,
> JIRAs, Bugillas ... and the like
> - all documents use open formats (no need to parse Word or Excel)
> (maybe something close to what LucidImagination does for mailing lists of
> Lucene and Solr)
>
> My initial kick off list would be:
>
> pros:
> - considering we understand the content (we understand the domain scope) we
> can fine tune the search engine to provide more accurate results
> - Solr can give us facets
> - we have user search logs (valuable for analysis)
> - implementing Solr is a fun
>
> cons:
> - requires resources (but the cost is relatively low depending on the query
> traffic, index size and frequency of updates)
>
> Regards,
> Lukas
>
> http://blog.lukas-vlcek.com/

--
Jan Eirik B. Nævdal
Solutions Engineer | +47 982 65 347
Iterate AS | www.iterate.no
The Lean Software Development Consultancy
Re: Arguments for Solr implementation at public web site
Next to the faceting engine: - MoreLikeThis - Highlighting - Spellchecker But also more flexible querying using the DisMax handler which is clearly superior. Solr can also be used to store data which can be retrieved in an instant! We have used this technique in a site and it is obviously much faster than multiple large and complex SQL statements. On Fri, 2009-11-13 at 10:52 +0100, Lukáš Vlček wrote: > pros: > - considering we understand the content (we understand the domain scope) we > can fine tune the search engine to provide more accurate results > - Solr can give us facets > - we have user search logs (valuable for analysis) > - implementing Solr is a fun > > cons: > - requires resources (but the cost is relatively low depending on the query > traffic, index size and frequency of updates) > > Regards, > Lukas > > http://blog.lukas-vlcek.com/
Re: highlighting issue lst.name is a leaf node
I found the solution. If somebody will run into the same problem, here is how I solved it. - while uploading the document: req.setParam("uprefix", "attr_"); req.setParam("fmap.content", "attr_content"); req.setParam("overwrite", "true"); req.setParam("commit", "true"); - in the query: http://localhost:8983/solr/select?q=attr_content:%22Django%22&rows=4 - edit the solrconfig.xml in the requesthandler params id,title so that you won't get the whole text content inside the response. Regards, Chuck On Fri, Nov 13, 2009 at 11:21 AM, Chuck Mysak wrote: > Hello list, > > I'm new to solr but from what I'm experimenting, it's awesome. > I have a small issue regarding the highlighting feature. > > It finds stuff (as I see from the query analyzer), but the highlight list > looks something like this: > > > > > > > (the files were added using ContentStreamUpdateRequest req = new > ContentStreamUpdateRequest("/update/extract"); and I set the "literal.id" > to the filename) > > My solrconfig.xml requesthandler looks like: > >default="true"> > > >explicit > >true >3 >30 > > >* >true >0.5 >[-\w ,/\n\"']{20,200} >true > > > > The schema.xml is untouched and downloaded yesterday from the latest stable > build. > > At first, I thought it had something to do with the extraction of the pdf, > but I tried the demo xml docs also and got the same result. > > I'm new to this, so please help. > > Thank you, > > Chuck > > > > > >
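Pulled together, the upload side of that fix might look like the SolrJ snippet below. The server URL and file name are placeholders; the parameters are the ones listed above:

import java.io.File;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class ExtractUpload {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
        req.addFile(new File("manual.pdf"));

        // Same parameters as described above.
        req.setParam("literal.id", "manual.pdf");   // document id = file name
        req.setParam("uprefix", "attr_");           // unknown fields become attr_*
        req.setParam("fmap.content", "attr_content");
        req.setParam("overwrite", "true");
        req.setParam("commit", "true");

        server.request(req);
    }
}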
Re: Arguments for Solr implementation at public web site
Jan-Eirik B. Nævdal schrieb:

> Some extra for the pros list:
> - Full control over which content is searchable and which is not.
> - Possibility to make pages searchable almost instantly after publication
> - Control over when the site is indexed

+1, especially the last point. You can also add a robots.txt and prohibit spidering of the site to reduce traffic; Google won't index any highly dynamic content, then.

> Friendly
> Jan-Eirik
>
> On Fri, Nov 13, 2009 at 10:52 AM, Lukáš Vlček wrote:
>> Hi,
>>
>> I am looking for good arguments to justify implementation a search for sites
>> which are available on the public internet. There are many sites in "powered
>> by Solr" section which are indexed by Google and other search engines but
>> still they decided to invest resources into building and maintenance of
>> their own search functionality and not to go with [user_query site:
>> my_site.com] google search. Why?
>>
>> By no mean I am saying it makes not sense to implement Solr! But I want to
>> put together list of reasons and possibly with examples. Your help would be
>> much appreciated!
>>
>> Let's narrow the scope of this discussion to the following:
>> - the search should cover several community sites running open source CMSs,
>> JIRAs, Bugillas ... and the like
>> - all documents use open formats (no need to parse Word or Excel)
>> (maybe something close to what LucidImagination does for mailing lists of
>> Lucene and Solr)
>>
>> My initial kick off list would be:
>>
>> pros:
>> - considering we understand the content (we understand the domain scope) we
>> can fine tune the search engine to provide more accurate results
>> - Solr can give us facets
>> - we have user search logs (valuable for analysis)
>> - implementing Solr is a fun
>>
>> cons:
>> - requires resources (but the cost is relatively low depending on the query
>> traffic, index size and frequency of updates)
>>
>> Regards,
>> Lukas
>>
>> http://blog.lukas-vlcek.com/
>
> --
> Jan Eirik B. Nævdal
> Solutions Engineer | +47 982 65 347
> Iterate AS | www.iterate.no
> The Lean Software Development Consultancy
Re: Arguments for Solr implementation at public web site
Lukáš Vlček wrote: > > I am looking for good arguments to justify implementation a search for > sites > which are available on the public internet. There are many sites in > "powered > by Solr" section which are indexed by Google and other search engines but > still they decided to invest resources into building and maintenance of > their own search functionality and not to go with [user_query site: > my_site.com] google search. Why? > You're assuming that Solr is just used in these cases to index discrete web pages which Google etc. would be able to access via following navigational links. I would imagine that in a lot of cases, Solr is used to index database entities which are used to build [parts of] pages dynamically, and which might be viewable in different forms in various different pages. Plus, with stored fields, you have the option of actually driving a website off Solr instead of directly off a database, which might make sense from a speed perspective in some cases. And further, going back to page-only indexing -- you have no guarantee when Google will decide to recrawl your site, so there may be a delay before changes show up in their index. With an in-house search engine you can reindex as often as you like. Andrew. -- View this message in context: http://old.nabble.com/Arguments-for-Solr-implementation-at-public-web-site-tp26333987p26334734.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Arguments for Solr implementation at public web site
Hi, thanks for inputs so far... however, let's put it this way: When you need to search for something Lucene or Solr related, which one do you use: - generic Google - go to a particular mail list web site and search from here (if there is any search form at all) - go to LucidImagination.com and use its search capability Regards, Lukas On Fri, Nov 13, 2009 at 11:50 AM, Andrew Clegg wrote: > > > Lukáš Vlček wrote: > > > > I am looking for good arguments to justify implementation a search for > > sites > > which are available on the public internet. There are many sites in > > "powered > > by Solr" section which are indexed by Google and other search engines but > > still they decided to invest resources into building and maintenance of > > their own search functionality and not to go with [user_query site: > > my_site.com] google search. Why? > > > > You're assuming that Solr is just used in these cases to index discrete web > pages which Google etc. would be able to access via following navigational > links. > > I would imagine that in a lot of cases, Solr is used to index database > entities which are used to build [parts of] pages dynamically, and which > might be viewable in different forms in various different pages. > > Plus, with stored fields, you have the option of actually driving a website > off Solr instead of directly off a database, which might make sense from a > speed perspective in some cases. > > And further, going back to page-only indexing -- you have no guarantee when > Google will decide to recrawl your site, so there may be a delay before > changes show up in their index. With an in-house search engine you can > reindex as often as you like. > > Andrew. > > -- > View this message in context: > http://old.nabble.com/Arguments-for-Solr-implementation-at-public-web-site-tp26333987p26334734.html > Sent from the Solr - User mailing list archive at Nabble.com. > >
Data import problem with child entity from different database
Morning all,

I'm having problems with joining a child entity from one database to a parent from another...

My entity definitions look like this (names changed for brevity):

c is getting indexed fine (it's stored, I can see field 'c' in the search results) but child.d isn't. I know the child table has data for the corresponding parent rows, and I've even watched the SQL queries against the child table appearing in Oracle's sqldeveloper as the DataImportHandler runs. But no content for child.d gets into the index.

My schema contains a definition for a field called d like so:

(keywords_ids is a conservatively-analyzed text type which has worked fine in other contexts.)

Two things occur to me.

1. db1 is PostgreSQL and db2 is Oracle, although the d field in both tables is just a char(4), nothing fancy. Could something weird with character encodings be happening?

2. d isn't a primary key in either parent or child, but this shouldn't matter, should it?

Additional data points -- I also tried using the CachedSqlEntityProcessor to do in-memory table caching of child, but it didn't work then either. I got a lot of error messages like this:

No value available for the cache key : d in the entity : child

If anyone knows whether this is a known limitation (if so I can work round it), or an unexpected case (if so I'll file a bug report), please shout. I'm using 1.4.

Yet again, many thanks :-)

Andrew.

--
View this message in context: http://old.nabble.com/Data-import-problem-with-child-entity-from-different-database-tp26334948p26334948.html
Sent from the Solr - User mailing list archive at Nabble.com.
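The entity definitions above were stripped by the list archive. As a rough sketch of the shape being described — two JDBC data sources, a parent entity on one and a child entity on the other — a data-config.xml generally looks like the following; every name, driver and URL here is a placeholder, not the actual configuration from this post:

<dataConfig>
  <!-- Two JDBC data sources, referenced by name from the entities below. -->
  <dataSource name="db1" driver="org.postgresql.Driver"
              url="jdbc:postgresql://host1/parentdb" user="..." password="..."/>
  <dataSource name="db2" driver="oracle.jdbc.OracleDriver"
              url="jdbc:oracle:thin:@host2:1521:childdb" user="..." password="..."/>

  <document>
    <!-- Parent rows come from the first database. -->
    <entity name="parent" dataSource="db1" query="SELECT id, c, d FROM parent">
      <field column="id" name="id"/>
      <field column="c" name="c"/>
      <!-- Child rows come from the second database, joined on the shared d column. -->
      <entity name="child" dataSource="db2"
              query="SELECT d FROM child WHERE d = '${parent.d}'">
        <field column="d" name="d"/>
      </entity>
    </entity>
  </document>
</dataConfig>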
Re: Arguments for Solr implementation at public web site
Lukáš Vlček wrote: > > When you need to search for something Lucene or Solr related, which one do > you use: > - generic Google > - go to a particular mail list web site and search from here (if there is > any search form at all) > Both of these (Nabble in the second case) in case any recent posts have appeared which Google hasn't picked up. Andrew. -- View this message in context: http://old.nabble.com/Arguments-for-Solr-implementation-at-public-web-site-tp26333987p26334980.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Arguments for Solr implementation at public web site
For this list I usually end up @ http://solr.markmail.org (which I believe also uses Lucene under the hood) Google is such a black box ... Pros: + 1 Open Source (enough said :-) There also seems to always be the notion that "crawling" leads itself to produce the best results but that is rarely the case. And unless you are a "special" type of site Google will not overlay your results w/ some type of context in the search (ie news or sports, etc). What I think really needs to happen is Solr (and is a bit missing @ the moment) is there needs to be a common interface to "reindexing" another index (if that makes sense) ... something akin or like OpenSearch (http://www.opensearch.org/Community/OpenSearch_software) For example what I would like to do is have my site, have my search index, and connect Google to indexing just to my search index (and not crawl the site) ... the only current option for something like that are sitemaps which I think Solr (templates) should have a contrib project for (but you would have to generate these offline for sure). - Jon On Nov 13, 2009, at 6:00 AM, Lukáš Vlček wrote: > Hi, > > thanks for inputs so far... however, let's put it this way: > > When you need to search for something Lucene or Solr related, which one do > you use: > - generic Google > - go to a particular mail list web site and search from here (if there is > any search form at all) > - go to LucidImagination.com and use its search capability > > Regards, > Lukas > > > On Fri, Nov 13, 2009 at 11:50 AM, Andrew Clegg wrote: > >> >> >> Lukáš Vlček wrote: >>> >>> I am looking for good arguments to justify implementation a search for >>> sites >>> which are available on the public internet. There are many sites in >>> "powered >>> by Solr" section which are indexed by Google and other search engines but >>> still they decided to invest resources into building and maintenance of >>> their own search functionality and not to go with [user_query site: >>> my_site.com] google search. Why? >>> >> >> You're assuming that Solr is just used in these cases to index discrete web >> pages which Google etc. would be able to access via following navigational >> links. >> >> I would imagine that in a lot of cases, Solr is used to index database >> entities which are used to build [parts of] pages dynamically, and which >> might be viewable in different forms in various different pages. >> >> Plus, with stored fields, you have the option of actually driving a website >> off Solr instead of directly off a database, which might make sense from a >> speed perspective in some cases. >> >> And further, going back to page-only indexing -- you have no guarantee when >> Google will decide to recrawl your site, so there may be a delay before >> changes show up in their index. With an in-house search engine you can >> reindex as often as you like. >> >> Andrew. >> >> -- >> View this message in context: >> http://old.nabble.com/Arguments-for-Solr-implementation-at-public-web-site-tp26333987p26334734.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >>
Re: Selection of terms for MoreLikeThis
Any ideas on this? Is it worth sending a bug report? Those links are live, by the way, in case anyone wants to verify that MLT is returning suggestions with very low tf.idf. Cheers, Andrew. Andrew Clegg wrote: > > Hi, > > If I run a MoreLikeThis query like the following: > > http://www.cathdb.info/solr/mlt?q=id:3.40.50.720&rows=0&mlt.interestingTerms=list&mlt.match.include=false&mlt.fl=keywords&mlt.mintf=1&mlt.mindf=1 > > one of the hits in the results is "and" (I don't do any stopword removal > on this field). > > However if I look inside that document with the TermVectorComponent: > > http://www.cathdb.info/solr/select/?q=id:3.40.50.720&tv=true&tv.all=true&tv.fl=keywords > > I see that "and" has a measly tf.idf of 7.46E-4. But there are other terms > with *much* higher tf.idf scores, e.g.: > > > 1 > 10 > 0.1 > > > that *don't* appear in the MoreLikeThis list. (I tried adding > &mlt.maxwl=999 to the end of the MLT query but it makes no difference.) > > What's going on? Surely something with tf.idf = 0.1 is a far better > candidate for a MoreLikeThis query than something with tf.idf = 1.46E-4? > Or does MoreLikeThis do some other heuristic magic to select good > candidates, and sometimes get it wrong? > > BTW the keywords field is indexed, stored, multi-valued and term-vectored. > > Thanks, > > Andrew. > > -- > :: http://biotext.org.uk/ :: > > -- View this message in context: http://old.nabble.com/Selection-of-terms-for-MoreLikeThis-tp26286005p26335061.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Data import problem with child entity from different database
no obvious issues. you may post your entire data-config.xml do w/o CachedSqlEntityProcessor first and then apply that later On Fri, Nov 13, 2009 at 4:38 PM, Andrew Clegg wrote: > > Morning all, > > I'm having problems with joining child a child entity from one database to a > parent from another... > > My entity definitions look like this (names changed for brevity): > > > > > > > > c is getting indexed fine (it's stored, I can see field 'c' in the search > results) but child.d isn't. I know the child table has data for the > corresponding parent rows, and I've even watched the SQL queries against the > child table appearing in Oracle's sqldeveloper as the DataImportHandler > runs. But no content for child.d gets into the index. > > My schema contains a definition for a field called d like so: > > multiValued="true" termVectors="true" /> > > (keywords_ids is a conservatively-analyzed text type which has worked fine > in other contexts.) > > Two things occur to me. > > 1. db1 is PostgreSQL and db2 is Oracle, although the d field in both tables > is just a char(4), nothing fancy. Could something weird with character > encodings be happening? > > 2. d isn't a primary key in either parent or child, but this shouldn't > matter should it? > > Additional data points -- I also tried using the CachedSqlEntityProcessor to > do in-memory table caching of child, but it didn't work then either. I got a > lot of error messages like this: > > No value available for the cache key : d in the entity : child > > If anyone knows whether this is a known limitation (if so I can work round > it), or an unexpected case (if so I'll file a bug report), please shout. I'm > using 1.4. > > Yet again, many thanks :-) > > Andrew. > > -- > View this message in context: > http://old.nabble.com/Data-import-problem-with-child-entity-from-different-database-tp26334948p26334948.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- - Noble Paul | Principal Engineer| AOL | http://aol.com
exclude some fields from copying dynamic fields | schema.xml
Hi,

we are using the following entry in schema.xml to make a copy of one type of dynamic field to another:

Is it possible to exclude some fields from copying? We are using Solr 1.3.

~Vikrant

--
View this message in context: http://old.nabble.com/exclude-some-fields-from-copying-dynamic-fields-%7C-schema.xml-tp26335109p26335109.html
Sent from the Solr - User mailing list archive at Nabble.com.
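The schema.xml entry itself was stripped by the archive; a dynamic-field copy of the kind being described usually looks like the first line below (field names are illustrative). As far as I know, copyField has no exclude syntax in Solr 1.3, so the usual workaround is to narrow the source wildcard or to list the wanted fields explicitly:

<!-- Copy every dynamic *_s field into a single catch-all search field. -->
<copyField source="*_s" dest="text"/>

<!-- There is no exclude attribute; to leave some fields out, either narrow
     the wildcard (e.g. use a different suffix for fields you don't want
     copied) or replace the wildcard with explicit per-field copies: -->
<copyField source="title_s" dest="text"/>
<copyField source="body_s"  dest="text"/>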
Re: Data import problem with child entity from different database
Noble Paul നോബിള് नोब्ळ्-2 wrote: > > no obvious issues. > you may post your entire data-config.xml > Here it is, exactly as last attempt but with usernames etc. removed. Ignore the comments and the unused FileDataSource... http://old.nabble.com/file/p26335171/dataimport.temp.xml dataimport.temp.xml Noble Paul നോബിള് नोब्ळ्-2 wrote: > > do w/o CachedSqlEntityProcessor first and then apply that later > Yep, that was just a bit of a wild stab in the dark to see if it made any difference. Thanks, Andrew. -- View this message in context: http://old.nabble.com/Data-import-problem-with-child-entity-from-different-database-tp26334948p26335171.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 1.3 query and index perf tank during optimize
I think we sorely need a Directory impl that down-prioritizes IO performed by merging. It would be wonderful if from Java we could simply set a per-thread "IO priority", but, it'll be a looong time until that's possible. So I think for now we should make a Directory impl that emulates such behavior, eg Lucene could state the "context" (merge, flush, search, nrt-reopen, etc.) whenever it opens an IndexInput / IndexOutput, and then the Directory could hack in pausing the merge IO whenever search/nrt-reopen IO is active. Mike On Thu, Nov 12, 2009 at 7:18 PM, Mark Miller wrote: > Jerome L Quinn wrote: >> Hi, everyone, this is a problem I've had for quite a while, >> and have basically avoided optimizing because of it. However, >> eventually we will get to the point where we must delete as >> well as add docs continuously. >> >> I have a Solr 1.3 index with ~4M docs at around 90G. This is a single >> instance running inside tomcat 6, so no replication. Merge factor is the >> default 10. ramBufferSizeMB is 32. maxWarmingSearchers=4. >> autoCommit is set at 3 sec. >> >> We continually push new data into the index, at somewhere between 1-10 docs >> every 10 sec or so. Solr is running on a quad-core 3.0GHz server. >> under IBM java 1.6. The index is sitting on a local 15K scsi disk. >> There's nothing >> else of substance running on the box. >> >> Optimizing the index takes about 65 min. >> >> As long as I'm not optimizing, search and indexing times are satisfactory. >> >> When I start the optimize, I see massive problems with timeouts pushing new >> docs >> into the index, and search times balloon. A typical search while >> optimizing takes >> about 1 min instead of a few seconds. >> >> Can anyone offer me help with fixing the problem? >> >> Thanks, >> Jerry Quinn >> > Ah, the pains of optimization. Its kind of just how it is. One solution > is to use two boxes and replication - optimize on the master, and then > queries only hit the slave. Out of reach for some though, and adds many > complications. > > Another kind of option is to use the partial optimize feature: > > > > Using this, you can optimize down to n segments and take a shorter hit > each time. > > Also, if optimizing is so painful, you might lower the merge factor > amortize that pain better. Thats another way to slowly get there - if > you lower the merge factor, as merging takes place, the new merge factor > will be respected, and semgents will merge down. A merge factor of 2 > (the lowest) will make it so you only ever have 2 segments. Sometimes > that works reasonably well - you could try 3-6 or something as well. > Then when you do your partial optimizes (and eventually a full optimize > perhaps), you want have so far to go. > > -- > - Mark > > http://www.lucidimagination.com > > > >
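The example element in Mark's quoted reply was stripped by the archive. The partial optimize he describes is expressed, in Solr 1.4, with the maxSegments attribute on the optimize update command; a hedged illustration (the value 10 is arbitrary):

<!-- Sent to the update handler: merge down to at most 10 segments
     instead of fully optimizing to a single segment. -->
<optimize maxSegments="10"/>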
Re: javabin in .NET?
Nope. It has to be manually ported. Not so much because of the language itself but because of differences in the libraries. 2009/11/13 Noble Paul നോബിള് नोब्ळ् > Is there any tool to directly port java to .Net? then we can etxract > out the client part of the javabin code and convert it. > > On Thu, Nov 12, 2009 at 9:56 PM, Erik Hatcher > wrote: > > Has anyone looked into using the javabin response format from .NET > (instead > > of SolrJ)? > > > > It's mainly a curiosity. > > > > How much better could performance/bandwidth/throughput be? How difficult > > would it be to implement some .NET code (C#, I'd guess being the best > > choice) to handle this response format? > > > > Thanks, > >Erik > > > > > > > > -- > - > Noble Paul | Principal Engineer| AOL | http://aol.com >
Re: Selection of terms for MoreLikeThis
Hi Andrew, no idea, I'm afraid - but could you sent the output of interestingTerms=details? This at least would show what MoreLikeThis uses, in comparison to the TermVectorComponent you've already pasted. Chantal Andrew Clegg schrieb: Any ideas on this? Is it worth sending a bug report? Those links are live, by the way, in case anyone wants to verify that MLT is returning suggestions with very low tf.idf. Cheers, Andrew. Andrew Clegg wrote: Hi, If I run a MoreLikeThis query like the following: http://www.cathdb.info/solr/mlt?q=id:3.40.50.720&rows=0&mlt.interestingTerms=list&mlt.match.include=false&mlt.fl=keywords&mlt.mintf=1&mlt.mindf=1 one of the hits in the results is "and" (I don't do any stopword removal on this field). However if I look inside that document with the TermVectorComponent: http://www.cathdb.info/solr/select/?q=id:3.40.50.720&tv=true&tv.all=true&tv.fl=keywords I see that "and" has a measly tf.idf of 7.46E-4. But there are other terms with *much* higher tf.idf scores, e.g.: 1 10 0.1 that *don't* appear in the MoreLikeThis list. (I tried adding &mlt.maxwl=999 to the end of the MLT query but it makes no difference.) What's going on? Surely something with tf.idf = 0.1 is a far better candidate for a MoreLikeThis query than something with tf.idf = 1.46E-4? Or does MoreLikeThis do some other heuristic magic to select good candidates, and sometimes get it wrong? BTW the keywords field is indexed, stored, multi-valued and term-vectored. Thanks, Andrew. -- :: http://biotext.org.uk/ :: -- View this message in context: http://old.nabble.com/Selection-of-terms-for-MoreLikeThis-tp26286005p26335061.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 1.3 query and index perf tank during optimize
Another thing to try, is reducing the maxThreadCount for ConcurrentMergeScheduler. It defaults to 3, which I think is too high -- we should change this default to 1 (I'll open a Lucene issue). Mike On Thu, Nov 12, 2009 at 6:30 PM, Jerome L Quinn wrote: > > Hi, everyone, this is a problem I've had for quite a while, > and have basically avoided optimizing because of it. However, > eventually we will get to the point where we must delete as > well as add docs continuously. > > I have a Solr 1.3 index with ~4M docs at around 90G. This is a single > instance running inside tomcat 6, so no replication. Merge factor is the > default 10. ramBufferSizeMB is 32. maxWarmingSearchers=4. > autoCommit is set at 3 sec. > > We continually push new data into the index, at somewhere between 1-10 docs > every 10 sec or so. Solr is running on a quad-core 3.0GHz server. > under IBM java 1.6. The index is sitting on a local 15K scsi disk. > There's nothing > else of substance running on the box. > > Optimizing the index takes about 65 min. > > As long as I'm not optimizing, search and indexing times are satisfactory. > > When I start the optimize, I see massive problems with timeouts pushing new > docs > into the index, and search times balloon. A typical search while > optimizing takes > about 1 min instead of a few seconds. > > Can anyone offer me help with fixing the problem? > > Thanks, > Jerry Quinn
Re: Solr 1.3 query and index perf tank during optimize
On Fri, Nov 13, 2009 at 6:27 AM, Michael McCandless wrote: > I think we sorely need a Directory impl that down-prioritizes IO > performed by merging. Presumably this "prioritizing Directory impl" could wrap/decorate any existing Directory. Mike
Re: javabin in .NET?
The javabin format does not have many dependencies. it may have 3-4 classes an that is it. On Fri, Nov 13, 2009 at 6:05 PM, Mauricio Scheffer wrote: > Nope. It has to be manually ported. Not so much because of the language > itself but because of differences in the libraries. > > > 2009/11/13 Noble Paul നോബിള് नोब्ळ् > >> Is there any tool to directly port java to .Net? then we can etxract >> out the client part of the javabin code and convert it. >> >> On Thu, Nov 12, 2009 at 9:56 PM, Erik Hatcher >> wrote: >> > Has anyone looked into using the javabin response format from .NET >> (instead >> > of SolrJ)? >> > >> > It's mainly a curiosity. >> > >> > How much better could performance/bandwidth/throughput be? How difficult >> > would it be to implement some .NET code (C#, I'd guess being the best >> > choice) to handle this response format? >> > >> > Thanks, >> > Erik >> > >> > >> >> >> >> -- >> - >> Noble Paul | Principal Engineer| AOL | http://aol.com >> > -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Selection of terms for MoreLikeThis
Chantal Ackermann wrote: > > no idea, I'm afraid - but could you sent the output of > interestingTerms=details? > This at least would show what MoreLikeThis uses, in comparison to the > TermVectorComponent you've already pasted. > I can, but I'm afraid they're not very illuminating! http://www.cathdb.info/solr/mlt?q=id:3.40.50.720&rows=0&mlt.interestingTerms=details&mlt.match.include=false&mlt.fl=keywords&mlt.mintf=1&mlt.mindf=1 0 59 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 Cheers, Andrew. -- View this message in context: http://old.nabble.com/Selection-of-terms-for-MoreLikeThis-tp26286005p26336558.html Sent from the Solr - User mailing list archive at Nabble.com.
non english languages
Hello all,

is there support for indexing non-English language content in Solr? I'm interested in Bulgarian, Hungarian, Romanian and Russian.

Best regards,

Chuck
Re: Solr 1.3 query and index perf tank during optimize
On Fri, Nov 13, 2009 at 6:27 AM, Michael McCandless wrote: > I think we sorely need a Directory impl that down-prioritizes IO > performed by merging. It's unclear if this case is caused by IO contention, or the OS cache of the hot parts of the index being lost by that extra IO activity. Of course the latter would lead to the former, but without that OS disk cache, the searches may be too slow even w/o the extra IO. -Yonik http://www.lucidimagination.com
Re: non english languages
the included snowball filters support hungarian, romanian, and russian. On Fri, Nov 13, 2009 at 9:03 AM, Chuck Mysak wrote: > Hello all, > > is there support for non-english language content indexing in Solr? > I'm interested in Bulgarian, Hungarian, Romanian and Russian. > > Best regards, > > Chuck > -- Robert Muir rcm...@gmail.com
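A minimal schema.xml field type wired to one of those stemmers might look like the sketch below; the type name is arbitrary, and the language attribute can be swapped for "Hungarian" or "Romanian". Bulgarian is not covered by the bundled Snowball stemmers, so it would need a different approach:

<fieldType name="text_ru" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- language can also be "Hungarian" or "Romanian" -->
    <filter class="solr.SnowballPorterFilterFactory" language="Russian"/>
  </analyzer>
</fieldType>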
Re: Selection of terms for MoreLikeThis
Hi Andrew, your URL does not include the parameter mlt.boost. Setting that to "true" made a noticeable difference for my queries. If not, there is also the parameter mlt.minwl "minimum word length below which words will be ignored." All your other terms seem longer than 3, so it would help in this case? But seems a bit like work around. Cheers, Chantal Andrew Clegg schrieb: Chantal Ackermann wrote: no idea, I'm afraid - but could you sent the output of interestingTerms=details? This at least would show what MoreLikeThis uses, in comparison to the TermVectorComponent you've already pasted. I can, but I'm afraid they're not very illuminating! http://www.cathdb.info/solr/mlt?q=id:3.40.50.720&rows=0&mlt.interestingTerms=details&mlt.match.include=false&mlt.fl=keywords&mlt.mintf=1&mlt.mindf=1 0 59 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 Cheers, Andrew. -- View this message in context: http://old.nabble.com/Selection-of-terms-for-MoreLikeThis-tp26286005p26336558.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: javabin in .NET?
I meant the standard IO libraries. They are different enough that the code has to be manually ported. There were some automated tools back when Microsoft introduced .Net, but IIRC they never really worked. Anyway it's not a big deal, it should be a straightforward job. Testing it thoroughly cross-platform is another thing though. 2009/11/13 Noble Paul നോബിള് नोब्ळ् > The javabin format does not have many dependencies. it may have 3-4 > classes an that is it. > > On Fri, Nov 13, 2009 at 6:05 PM, Mauricio Scheffer > wrote: > > Nope. It has to be manually ported. Not so much because of the language > > itself but because of differences in the libraries. > > > > > > 2009/11/13 Noble Paul നോബിള് नोब्ळ् > > > >> Is there any tool to directly port java to .Net? then we can etxract > >> out the client part of the javabin code and convert it. > >> > >> On Thu, Nov 12, 2009 at 9:56 PM, Erik Hatcher > >> wrote: > >> > Has anyone looked into using the javabin response format from .NET > >> (instead > >> > of SolrJ)? > >> > > >> > It's mainly a curiosity. > >> > > >> > How much better could performance/bandwidth/throughput be? How > difficult > >> > would it be to implement some .NET code (C#, I'd guess being the best > >> > choice) to handle this response format? > >> > > >> > Thanks, > >> >Erik > >> > > >> > > >> > >> > >> > >> -- > >> - > >> Noble Paul | Principal Engineer| AOL | http://aol.com > >> > > > > > > -- > - > Noble Paul | Principal Engineer| AOL | http://aol.com >
Reseting doc boosts
Hi,

I'm trying to figure out if there is an easy way to basically "reset" all of the doc boosts which you have made (for analytical purposes)... for example, if I run an index, gather a report, boost docs based on the report, and then reset the boosts at the time of the next index...

From just knowing how Lucene works, it would seem that I would really need to reindex, since it's an attribute on the doc itself which would have to be modified, but there is no easy way to query for docs which have been boosted either.

Any insight? Thanks.

- Jon
Re: Selection of terms for MoreLikeThis
Chantal Ackermann wrote: > > your URL does not include the parameter mlt.boost. Setting that to > "true" made a noticeable difference for my queries. > Hmm, I'm really not sure if this is doing the right thing either. When I add it I get: 1.0 0.60737264 0.27599618 0.2476748 0.24487767 0.23969446 0.1990452 0.18447271 0.13297324 0.1233415 0.11993817 0.11789705 0.117194556 0.11164951 0.10744005 0.09943076 0.097062066 0.09287166 0.0877542 0.0864609 0.08362857 0.07988805 0.079598725 0.07747293 0.075560644 "and" scores far more highly than much more discriminative words like "chloroplast" and "glyoxylate", both of which have *much* higher tf.idf scores than "and" according to the TermVectorComponent: 8 1887 0.0042395336512983575 7 0.0063006300630063005 45 60316 7.460706943431262E-4 In fact an order of magnitude higher. Chantal Ackermann wrote: > > If not, there is also the parameter > mlt.minwl > "minimum word length below which words will be ignored." > > All your other terms seem longer than 3, so it would help in this case? > But seems a bit like work around. > Yeah, I could do that, or add a stopword list to that field. But there are some other common terms in the list like "protein" or "enzyme" that are long and not really stopwords, but have a similarly low tf.idf to "and": 43 189541 2.2686384476181933E-4 15 16712 8.975586404978459E-4 Plus, of course, I'm curious to know exactly how MLT is identifying those terms as important, and if it's a bug or my fault... Thanks for your help though! Do any of the Solr devs have an idea of the mechanism at work here? Andrew. -- View this message in context: http://old.nabble.com/Selection-of-terms-for-MoreLikeThis-tp26286005p26337677.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Question about the message "Indexing failed. Rolled back all changes."
I'm getting the same thing. The process runs, seemingly successfully, and I can even go to other SOLR pages pointing to the same server and pull queries against the index with these just-added entires. But the response to the original import says "failed" and "rollback" both through the XML response and also in the logs. Why is the process reporting failure and saying it did not commit/rolled back, when it actually succeeded in importing and indexing? If it rolled back, as the logs say, I would expect to not be able to pull those rows out with new queries against the index. Avlesh Singh wrote: > >> >> But even after I successfully index data using >> http://host:port/solr-example/dataimport?command=full-import&commit=true&clean=true, >> do solr search which returns meaningful results >> > I am not sure what "meaningful" means. The full-import command starts an > asynchronous process to start re-indexing. The response that you get in > return to the above mentioned URL, (always) indicates that a full-import > has > been started. It does NOT know about anything that might go wrong with the > process itself. > > and then visit http://host:port/solr-example/dataimport?command=status, I >> can see thefollowing result ... >> > The status URL is the one which tells you what is going on with the > process. > The message - "Indexing failed. Rolled back all changes" can come because > of > multiple reasons - missing database drivers, incorrect sql queries, > runtime > errors in custom transformers etc. > > Start the full-import once more. Keep a watch on the Solr server log. If > you > can figure out what's going wrong, great; otherwise, copy-paste the > exception stack-trace from the log file for specific answers. > > Cheers > Avlesh > > On Tue, Nov 10, 2009 at 1:32 PM, Bertie Shen > wrote: > >> No. I did not check the logs. >> >> But even after I successfully index data using >> http://host:port >> /solr-example/dataimport?command=full-import&commit=true&clean=true, >> do solr search which returns meaningful results, and then visit >> http://host:port/solr-example/dataimport?command=status, I can see the >> following result >> >> >> - >> >> 0 >> 1 >> >> - >> >> - >> >> data-config.xml >> >> >> status >> idle >> >> - >> >> 0:2:11.426 >> 584 >> 1538 >> 0 >> 2009-11-09 23:54:41 >> *Indexing failed. Rolled back all changes.* >> 2009-11-09 23:54:42 >> 2009-11-09 23:54:42 >> 2009-11-09 23:54:42 >> >> - >> >> This response format is experimental. It is likely to change in the >> future. >> >> >> >> On Mon, Nov 9, 2009 at 7:39 AM, Shalin Shekhar Mangar < >> shalinman...@gmail.com> wrote: >> >> > On Sat, Nov 7, 2009 at 1:10 PM, Bertie Shen >> wrote: >> > >> > > >> > > When I use >> > > http://localhost:8180/solr/admin/dataimport.jsp?handler=/dataimport >> to >> > > debug >> > > the indexing config file, I always see the status message on the >> right >> > part >> > > Indexing failed. Rolled back all changes., even >> the >> > > indexing process looks to be successful. I am not sure whether you >> guys >> > > have >> > > seen the same phenomenon or not. BTW, I usually check the checkbox >> Clean >> > > and sometimes check Commit box, and then click Debug Now button. >> > > >> > > >> > Do you see any exceptions in the logs? >> > >> > -- >> > Regards, >> > Shalin Shekhar Mangar. >> > >> > > -- View this message in context: http://old.nabble.com/Question-about-the-message-%22Indexing-failed.-Rolled-back-all--changes.%22-tp26242714p26338287.html Sent from the Solr - User mailing list archive at Nabble.com.
scanning folders recursively / Tika
Hello.

I am working with Tika 0.5 and want to scan a folder system of about 10GB. Is there a comfortable way to scan folders recursively with an existing class, or do I have to write it myself?

Any tips for best practice?

Greetings, Peter
Re: scanning folders recursively / Tika
Have one thread recursing depth first down the directories & adding to a queue (fixed size). Have many threads reading off of the queue and doing the work. -glen http://zzzoot.blogspot.com/ 2009/11/13 Peter Gabriel : > Hello. > > I am on work with Tika 0.5 and want to scan a folder system about 10GB. > Is there a comfortable way to scan folders recursively with an existing class > or have i to write it myself? > > Any tips for best practise? > > Greetings, Peter > -- > Jetzt kostenlos herunterladen: Internet Explorer 8 und Mozilla Firefox 3.5 - > sicherer, schneller und einfacher! http://portal.gmx.net/de/go/atbrowser > -- -
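A bare-bones sketch of that producer/consumer layout — one thread walking directories depth first into a bounded queue, a fixed pool of workers pulling files off it. The parseWithTika method is a stub, since the actual Tika calls weren't part of this thread:

import java.io.File;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class FolderScanner {

    // Poison pill used to tell workers there is nothing left to process.
    private static final File DONE = new File("");

    public static void main(String[] args) throws Exception {
        File root = new File(args[0]);
        int workers = 4;

        // Fixed-size queue so the producer blocks instead of loading 10GB of paths at once.
        final BlockingQueue<File> queue = new ArrayBlockingQueue<File>(100);
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        // Consumers: pull files off the queue until they see the poison pill.
        for (int i = 0; i < workers; i++) {
            pool.execute(new Runnable() {
                public void run() {
                    try {
                        File file;
                        while ((file = queue.take()) != DONE) {
                            parseWithTika(file); // stub -- extract text, send to Solr, etc.
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
        }

        // Producer: depth-first walk, blocking whenever the queue is full.
        enqueue(root, queue);
        for (int i = 0; i < workers; i++) {
            queue.put(DONE);
        }
        pool.shutdown();
    }

    private static void enqueue(File dir, BlockingQueue<File> queue) throws InterruptedException {
        File[] entries = dir.listFiles();
        if (entries == null) return;
        for (File entry : entries) {
            if (entry.isDirectory()) {
                enqueue(entry, queue);
            } else {
                queue.put(entry);
            }
        }
    }

    private static void parseWithTika(File file) {
        // Placeholder: call Tika's AutoDetectParser (or similar) here.
        System.out.println("Would parse: " + file);
    }
}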
Re: Stop solr without losing documents
On Fri, Nov 13, 2009 at 4:32 AM, gwk wrote: > I don't know if this is the best solution, or even if it's applicable to > your situation but we do incremental updates from a database based on a > timestamp, (from a simple seperate sql table filled by triggers so deletes Thanks, gwk! This doesn't exactly meet our needs, but helped us get to a solution. In short, we are manually committing in our outside updater process (instead of letting Solr autocommit), and marking which documents have been updated before a successful commit. Now stopping solr is as easy as kill -9. Michael
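A rough sketch of that pattern with SolrJ: add a batch, commit explicitly, and only record the batch as flushed once the commit has returned, so an abrupt stop at worst means the batch is resent on the next run. The class and the markAsFlushed bookkeeping are placeholders, not Michael's actual updater:

import java.util.List;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class Updater {

    private final SolrServer server;

    public Updater(String url) throws Exception {
        this.server = new CommonsHttpSolrServer(url);
    }

    // Push a batch, commit explicitly, and only then record the batch as done.
    // If Solr is killed before the commit returns, the batch is simply resent
    // on the next run, so nothing is lost.
    public void pushBatch(List<SolrInputDocument> batch) throws Exception {
        server.add(batch);
        server.commit();
        markAsFlushed(batch); // placeholder for the caller's own bookkeeping
    }

    private void markAsFlushed(List<SolrInputDocument> batch) {
        // e.g. update a "last pushed id/timestamp" record in the source database
    }
}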
how to search against multiple attributes in the index
I want to build an AND search query against field1 AND field2, etc. Both these fields are stored in an index. I am migrating Lucene code to Solr. Following is my existing Lucene code:

BooleanQuery currentSearchingQuery = new BooleanQuery();

currentSearchingQuery.add(titleDescQuery, Occur.MUST);
highlighter = new Highlighter(new QueryScorer(titleDescQuery));

TermQuery searchTechGroupQuery = new TermQuery(new Term("techGroup", searchForm.getTechGroup()));
currentSearchingQuery.add(searchTechGroupQuery, Occur.MUST);

TermQuery searchProgramQuery = new TermQuery(new Term("techProgram", searchForm.getTechProgram()));
currentSearchingQuery.add(searchProgramQuery, Occur.MUST);

What's the equivalent Solr code for the above Lucene code? Any samples would be appreciated.

Thanks,

--
View this message in context: http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339025.html
Sent from the Solr - User mailing list archive at Nabble.com.
The status of Local/Geo/Spatial/Distance Solr
Hey, I am interested in using LocalSolr to go Local/Geo/Spatial/Distance search. But the wiki of LocalSolr(http://wiki.apache.org/solr/LocalSolr) points to pretty old documentation. Is there a better document I refer to for the setting up of LocalSolr and some performance analysis? Just sync-ed Solr codebase and found LocalSolr is still NOT in the contrib package. Do we have a plan to incorporate it? I download a LocalSolr lib localsolr-1.5.jar from http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/ and notice that the namespace is com.pjaol.search. blah blah, while LocalLucene package is in Lucene codebase and the package name is org.apache.lucene.spatial blah blah. But localsolr-1.5.jar from from http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/ does not work with lucene-spatial-3.0-dev.jar I build from Lucene codebase directly. After I restart tomcat, I could not load solr admin page. The error is as follows. It looks solr is still looking for old named classes. Thanks. HTTP Status 500 - Severe errors in solr configuration. Check your log files for more detailed information on what may be wrong. If you want solr to continue after configuration errors, change: false in null - java.lang.NoClassDefFoundError: com/pjaol/search/geo/utils/DistanceFilter at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:247) at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:357) at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413) at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:435) at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1498) at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1492) at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1525) at org.apache.solr.core.SolrCore.loadSearchComponents(SolrCore.java:833) at org.apache.solr.core.SolrCore.(SolrCore.java:551) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137) at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83) at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:221) at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:302) at org.apache.catalina.core.ApplicationFilterConfig.(ApplicationFilterConfig.java:78) at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3635) at org.apache.catalina.core.StandardContext.start(StandardContext.java:4222) at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:760) at org.apache.catalina.core.ContainerBase.access$0(ContainerBase.java:744) at org.apache.catalina.core.ContainerBase$PrivilegedAddChild.run(ContainerBase.java:144) at java.security.AccessController.doPrivileged(Native Method) at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:738) at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:544) at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:626) at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:553) at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:488) at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1138) at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:311) at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:120) at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1022) at 
org.apache.catalina.core.StandardHost.start(StandardHost.java:736) at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1014) at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443) at org.apache.catalina.core.StandardService.start(StandardService.java:448) at org.apache.catalina.core.StandardServer.start(StandardServer.java:700) at org.apache.catalina.startup.Catalina.start(Catalina.java:552) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:295) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:177) Caused by: java.lang.ClassNotFoundException: com.pjaol.search.geo.utils.DistanceFilter at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1362) at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappCl
Re: how to search against multiple attributes in the index
Dive in - http://wiki.apache.org/solr/Solrj Cheers Avlesh On Fri, Nov 13, 2009 at 9:39 PM, javaxmlsoapdev wrote: > > I want to build AND search query against field1 AND field2 etc. Both these > fields are stored in an index. I am migrating lucene code to Solr. > Following > is my existing lucene code > > BooleanQuery currentSearchingQuery = new BooleanQuery(); > > currentSearchingQuery.add(titleDescQuery,Occur.MUST); > highlighter = new Highlighter( new QueryScorer(titleDescQuery)); > > TermQuery searchTechGroupQyery = new TermQuery(new Term > ("techGroup",searchForm.getTechGroup())); >currentSearchingQuery.add(searchTechGroupQyery, Occur.MUST); > TermQuery searchProgramQyery = new TermQuery(new > Term("techProgram",searchForm.getTechProgram())); >currentSearchingQuery.add(searchProgramQyery, Occur.MUST); > } > > What's the equivalent Solr code for above Luce code. Any samples would be > appreciated. > > Thanks, > -- > View this message in context: > http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339025.html > Sent from the Solr - User mailing list archive at Nabble.com. > >
Re: The status of Local/Geo/Spatial/Distance Solr
It looks like solr+spatial will get some attention in 1.5, check: https://issues.apache.org/jira/browse/SOLR-1561 Depending on your needs, that may be enough. More robust/scaleable solutions will hopefully work their way into 1.5 (any help is always appreciated!) On Nov 13, 2009, at 11:12 AM, Bertie Shen wrote: Hey, I am interested in using LocalSolr to go Local/Geo/Spatial/Distance search. But the wiki of LocalSolr(http://wiki.apache.org/solr/LocalSolr ) points to pretty old documentation. Is there a better document I refer to for the setting up of LocalSolr and some performance analysis? Just sync-ed Solr codebase and found LocalSolr is still NOT in the contrib package. Do we have a plan to incorporate it? I download a LocalSolr lib localsolr-1.5.jar from http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/ and notice that the namespace is com.pjaol.search. blah blah, while LocalLucene package is in Lucene codebase and the package name is org.apache.lucene.spatial blah blah. But localsolr-1.5.jar from from http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/ does not work with lucene-spatial-3.0-dev.jar I build from Lucene codebase directly. After I restart tomcat, I could not load solr admin page. The error is as follows. It looks solr is still looking for old named classes. Thanks. HTTP Status 500 - Severe errors in solr configuration. Check your log files for more detailed information on what may be wrong. If you want solr to continue after configuration errors, change: false in null - java.lang.NoClassDefFoundError: com/pjaol/search/geo/utils/DistanceFilter at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:247) at org .apache .solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:357) at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413) at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:435) at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1498) at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1492) at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1525) at org.apache.solr.core.SolrCore.loadSearchComponents(SolrCore.java: 833) at org.apache.solr.core.SolrCore.(SolrCore.java:551) at org.apache.solr.core.CoreContainer $Initializer.initialize(CoreContainer.java:137) at org .apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java: 83) at org .apache .catalina .core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java: 221) at org .apache .catalina .core .ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java: 302) at org .apache .catalina .core.ApplicationFilterConfig.(ApplicationFilterConfig.java:78) at org .apache .catalina.core.StandardContext.filterStart(StandardContext.java:3635) at org.apache.catalina.core.StandardContext.start(StandardContext.java: 4222) at org .apache .catalina.core.ContainerBase.addChildInternal(ContainerBase.java:760) at org.apache.catalina.core.ContainerBase.access $0(ContainerBase.java:744) at org.apache.catalina.core.ContainerBase $PrivilegedAddChild.run(ContainerBase.java:144) at java.security.AccessController.doPrivileged(Native Method) at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java: 738) at org.apache.catalina.core.StandardHost.addChild(StandardHost.java: 544) at org .apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java: 626) at org .apache .catalina.startup.HostConfig.deployDescriptors(HostConfig.java:553) at 
org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java: 488) at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1138) at org .apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java: 311) at org .apache .catalina .util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:120) at org.apache.catalina.core.ContainerBase.start(ContainerBase.java: 1022) at org.apache.catalina.core.StandardHost.start(StandardHost.java:736) at org.apache.catalina.core.ContainerBase.start(ContainerBase.java: 1014) at org.apache.catalina.core.StandardEngine.start(StandardEngine.java: 443) at org.apache.catalina.core.StandardService.start(StandardService.java: 448) at org.apache.catalina.core.StandardServer.start(StandardServer.java: 700) at org.apache.catalina.startup.Catalina.start(Catalina.java:552) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun .reflect .NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun .reflect .DelegatingMethodAccessorImpl .invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:295) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun .reflect .NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.j
Re: The status of Local/Geo/Spatial/Distance Solr
Also: https://issues.apache.org/jira/browse/SOLR-1302 On Nov 13, 2009, at 11:12 AM, Bertie Shen wrote: Hey, I am interested in using LocalSolr to go Local/Geo/Spatial/Distance search. But the wiki of LocalSolr(http://wiki.apache.org/solr/LocalSolr ) points to pretty old documentation. Is there a better document I refer to for the setting up of LocalSolr and some performance analysis? Just sync-ed Solr codebase and found LocalSolr is still NOT in the contrib package. Do we have a plan to incorporate it? I download a LocalSolr lib localsolr-1.5.jar from http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/ and notice that the namespace is com.pjaol.search. blah blah, while LocalLucene package is in Lucene codebase and the package name is org.apache.lucene.spatial blah blah. But localsolr-1.5.jar from from http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/ does not work with lucene-spatial-3.0-dev.jar I build from Lucene codebase directly. After I restart tomcat, I could not load solr admin page. The error is as follows. It looks solr is still looking for old named classes. Thanks. HTTP Status 500 - Severe errors in solr configuration. Check your log files for more detailed information on what may be wrong. If you want solr to continue after configuration errors, change: false in null - java.lang.NoClassDefFoundError: com/pjaol/search/geo/utils/DistanceFilter at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:247) at org .apache .solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:357) at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413) at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:435) at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1498) at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1492) at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1525) at org.apache.solr.core.SolrCore.loadSearchComponents(SolrCore.java: 833) at org.apache.solr.core.SolrCore.(SolrCore.java:551) at org.apache.solr.core.CoreContainer $Initializer.initialize(CoreContainer.java:137) at org .apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java: 83) at org .apache .catalina .core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java: 221) at org .apache .catalina .core .ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java: 302) at org .apache .catalina .core.ApplicationFilterConfig.(ApplicationFilterConfig.java:78) at org .apache .catalina.core.StandardContext.filterStart(StandardContext.java:3635) at org.apache.catalina.core.StandardContext.start(StandardContext.java: 4222) at org .apache .catalina.core.ContainerBase.addChildInternal(ContainerBase.java:760) at org.apache.catalina.core.ContainerBase.access $0(ContainerBase.java:744) at org.apache.catalina.core.ContainerBase $PrivilegedAddChild.run(ContainerBase.java:144) at java.security.AccessController.doPrivileged(Native Method) at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java: 738) at org.apache.catalina.core.StandardHost.addChild(StandardHost.java: 544) at org .apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java: 626) at org .apache .catalina.startup.HostConfig.deployDescriptors(HostConfig.java:553) at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java: 488) at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1138) at org .apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java: 311) at org .apache .catalina 
.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:120) at org.apache.catalina.core.ContainerBase.start(ContainerBase.java: 1022) at org.apache.catalina.core.StandardHost.start(StandardHost.java:736) at org.apache.catalina.core.ContainerBase.start(ContainerBase.java: 1014) at org.apache.catalina.core.StandardEngine.start(StandardEngine.java: 443) at org.apache.catalina.core.StandardService.start(StandardService.java: 448) at org.apache.catalina.core.StandardServer.start(StandardServer.java: 700) at org.apache.catalina.startup.Catalina.start(Catalina.java:552) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun .reflect .NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun .reflect .DelegatingMethodAccessorImpl .invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:295) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun .reflect .NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun .reflect .DelegatingMethodAccessorImpl .invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org .apache.commons.daemon.support.DaemonLoader.start(Da
Re: how to search against multiple attributes in the index
I already did dive in before. I am using solrj API and SolrQuery object to build query. but its not clear/written how to build booleanQuery ANDing bunch of different attributes in the index. Any samples please? Avlesh Singh wrote: > > Dive in - http://wiki.apache.org/solr/Solrj > > Cheers > Avlesh > > On Fri, Nov 13, 2009 at 9:39 PM, javaxmlsoapdev wrote: > >> >> I want to build AND search query against field1 AND field2 etc. Both >> these >> fields are stored in an index. I am migrating lucene code to Solr. >> Following >> is my existing lucene code >> >> BooleanQuery currentSearchingQuery = new BooleanQuery(); >> >> currentSearchingQuery.add(titleDescQuery,Occur.MUST); >> highlighter = new Highlighter( new QueryScorer(titleDescQuery)); >> >> TermQuery searchTechGroupQyery = new TermQuery(new Term >> ("techGroup",searchForm.getTechGroup())); >>currentSearchingQuery.add(searchTechGroupQyery, Occur.MUST); >> TermQuery searchProgramQyery = new TermQuery(new >> Term("techProgram",searchForm.getTechProgram())); >>currentSearchingQuery.add(searchProgramQyery, Occur.MUST); >> } >> >> What's the equivalent Solr code for above Luce code. Any samples would be >> appreciated. >> >> Thanks, >> -- >> View this message in context: >> http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339025.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > -- View this message in context: http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339402.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: The status of Local/Geo/Spatial/Distance Solr
Heya.. could it be a problem with your solr config files? I seem to recall a change from the docs as they were to get this working.. I have... lat lng 4 25 localsolr facet mlt highlight debug That tie up with your config/ I'd bascially interpreted the current packaging as... What used to be locallucene has deffo merged into lucene-spatial in this build, no more locallucene. However, you still need to build localsolr for now... My solr jars are: commons-beanutils-1.8.0.jar commons-logging-1.1.1.jar localsolr-1.5.2-rc1.jar lucene-misc-2.9.1-ki-rc3.jar serializer-2.7.1.jar stax-1.2.0.jar xml-apis-1.3.04.jar commons-codec-1.4.jar commons-pool-1.5.3.jar log4j-1.2.13.jar lucene-queries-2.9.1-ki-rc3.jar slf4j-api-1.5.5.jarstax-api-1.0.jar xpp3-1.1.3.4.O.jar commons-dbcp-1.2.2.jargeoapi-nogenerics-2.1M2.jar lucene-analyzers-2.9.1-ki-rc3.jarlucene-snowball-2.9.1-ki-rc3.jar slf4j-log4j12-1.5.5.jarstax-utils-20040917.jar commons-fileupload-1.2.1.jar geronimo-stax-api_1.0_spec-1.0.1.jar lucene-core-2.9.1-ki-rc3.jar lucene-spatial-2.9.1-ki-rc3.jar solr-commons-csv-1.4.0-ki-rc1.jar woodstox-wstx-asl-3.2.7.jar commons-httpclient-3.1.jargt2-referencing-2.3.1.jar lucene-highlighter-2.9.1-ki-rc3.jar lucene-spellchecker-2.9.1-ki-rc3.jar solr-core-1.4.0-ki-rc1.jar xalan-2.7.1.jar commons-io-1.3.2.jar jsr108-0.01.jar lucene-memory-2.9.1-ki-rc3.jar org.codehaus.woodstox-wstx-asl-3.2.7.jar solr-solrj-1.4.0-ki-rc1.jar xercesImpl-2.9.1.jar Sorry for dumping the info at you... hope it helps tho Ian. 2009/11/13 Bertie Shen : > Hey, > > I am interested in using LocalSolr to go Local/Geo/Spatial/Distance > search. But the wiki of LocalSolr(http://wiki.apache.org/solr/LocalSolr) > points to pretty old documentation. Is there a better document I refer to > for the setting up of LocalSolr and some performance analysis? > > Just sync-ed Solr codebase and found LocalSolr is still NOT in the > contrib package. Do we have a plan to incorporate it? I download a LocalSolr > lib localsolr-1.5.jar from > http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/ and notice > that the namespace is com.pjaol.search. blah blah, while LocalLucene package > is in Lucene codebase and the package name is org.apache.lucene.spatial blah > blah. > > But localsolr-1.5.jar from from > http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/ does not > work with lucene-spatial-3.0-dev.jar I build from Lucene codebase directly. > After I restart tomcat, I could not load solr admin page. The error is as > follows. It looks solr is still looking for > old named classes. > > Thanks. > > HTTP Status 500 - Severe errors in solr configuration. Check your log files > for more detailed information on what may be wrong. 
If you want solr to > continue after configuration errors, change: > false in null > - > java.lang.NoClassDefFoundError: > com/pjaol/search/geo/utils/DistanceFilter at java.lang.Class.forName0(Native > Method) at java.lang.Class.forName(Class.java:247) at > org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:357) > at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413) at > org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:435) at > org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1498) at > org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1492) at > org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1525) at > org.apache.solr.core.SolrCore.loadSearchComponents(SolrCore.java:833) at > org.apache.solr.core.SolrCore.(SolrCore.java:551) at > org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137) > at > org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83) > at > org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:221) > at > org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:302) > at > org.apache.catalina.core.ApplicationFilterConfig.(ApplicationFilterConfig.java:78) > at > org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3635) > at org.apache.catalina.core.StandardContext.start(StandardContext.java:4222) > at > org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:760) > at org.apache.catalina.core.ContainerBase.access$0(ContainerBase.java:744) > at > org.apache.catalina.core.ContainerBase$PrivilegedAddChild.run(ContainerBase.java:144) > at java.security.AccessController.doPrivileged(Native Method) at > org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:738) at > org.apache.catalina.core.StandardHost.addChild(StandardHost.java:544) at > org.ap
Obtaining list of dynamic fields available in the index
Hi there! How can we retrieve the complete list of dynamic fields which are currently available in the index? Thank you in advance! -- Eugene N Dzhurinsky
Re: how to search against multiple attributes in the index
For a starting point, this might be a good read - http://www.lucidimagination.com/search/document/f4d91628ced293bf/lucene_query_to_solr_query Cheers Avlesh On Fri, Nov 13, 2009 at 10:02 PM, javaxmlsoapdev wrote: > > I already did dive in before. I am using solrj API and SolrQuery object to > build query. but its not clear/written how to build booleanQuery ANDing > bunch of different attributes in the index. Any samples please? > > Avlesh Singh wrote: > > > > Dive in - http://wiki.apache.org/solr/Solrj > > > > Cheers > > Avlesh > > > > On Fri, Nov 13, 2009 at 9:39 PM, javaxmlsoapdev > wrote: > > > >> > >> I want to build AND search query against field1 AND field2 etc. Both > >> these > >> fields are stored in an index. I am migrating lucene code to Solr. > >> Following > >> is my existing lucene code > >> > >> BooleanQuery currentSearchingQuery = new BooleanQuery(); > >> > >> currentSearchingQuery.add(titleDescQuery,Occur.MUST); > >> highlighter = new Highlighter( new QueryScorer(titleDescQuery)); > >> > >> TermQuery searchTechGroupQyery = new TermQuery(new Term > >> ("techGroup",searchForm.getTechGroup())); > >>currentSearchingQuery.add(searchTechGroupQyery, Occur.MUST); > >> TermQuery searchProgramQyery = new TermQuery(new > >> Term("techProgram",searchForm.getTechProgram())); > >>currentSearchingQuery.add(searchProgramQyery, Occur.MUST); > >> } > >> > >> What's the equivalent Solr code for above Luce code. Any samples would > be > >> appreciated. > >> > >> Thanks, > >> -- > >> View this message in context: > >> > http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339025.html > >> Sent from the Solr - User mailing list archive at Nabble.com. > >> > >> > > > > > > -- > View this message in context: > http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339402.html > Sent from the Solr - User mailing list archive at Nabble.com. > >
Re: Return doc if one or more query keywords occur multiple times
Anyone?

 Original Message 
> Date: Thu, 12 Nov 2009 13:29:20 +0100
> From: gistol...@gmx.de
> To: solr-user@lucene.apache.org
> Subject: Return doc if one or more query keywords occur multiple times
> Hello,
>
> I am using the Dismax request handler for queries:
>
> ...select?q=foo bar foo2 bar2&qt=dismax&mm=2...
>
> With the parameter "mm=2" I configure that at least 2 of the optional clauses
> must match, regardless of how many clauses there are.
>
> But now I want to change this to the following:
>
> List all documents that have at least 2 of the optional clauses OR that
> have at least one of the query terms (e.g. foo) more than once.
>
> Is this possible?
> Thanks,
> Gisto
Re: Obtaining list of dynamic fields available in the index
Luke Handler? - http://wiki.apache.org/solr/LukeRequestHandler /admin/luke?numTerms=0 Cheers Avlesh On Fri, Nov 13, 2009 at 10:05 PM, Eugene Dzhurinsky wrote: > Hi there! > > How can we retrieve the complete list of dynamic fields, which are > currently > available in index? > > Thank you in advance! > -- > Eugene N Dzhurinsky >
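If you are on SolrJ rather than raw HTTP, the same information should be reachable through its LukeRequest/LukeResponse classes. A rough, untested sketch (the accessor names are from memory, so treat them as assumptions):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.request.LukeRequest;
    import org.apache.solr.client.solrj.response.LukeResponse;

    public class ListIndexFields {
      public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        LukeRequest luke = new LukeRequest();  // hits /admin/luke
        luke.setNumTerms(0);                   // same as numTerms=0 in the URL above
        LukeResponse rsp = luke.process(server);
        // lists every concrete field present in the index, including the ones
        // created through dynamicField patterns
        for (String fieldName : rsp.getFieldInfo().keySet()) {
          System.out.println(fieldName);
        }
      }
    }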
Re: how to search against multiple attributes in the index
I think I found the answer. needed to read more API documentation :-) you can do it using solrQuery.setFilterQueries() and build AND queries of multiple parameters. Avlesh Singh wrote: > > For a starting point, this might be a good read - > http://www.lucidimagination.com/search/document/f4d91628ced293bf/lucene_query_to_solr_query > > Cheers > Avlesh > > On Fri, Nov 13, 2009 at 10:02 PM, javaxmlsoapdev > wrote: > >> >> I already did dive in before. I am using solrj API and SolrQuery object >> to >> build query. but its not clear/written how to build booleanQuery ANDing >> bunch of different attributes in the index. Any samples please? >> >> Avlesh Singh wrote: >> > >> > Dive in - http://wiki.apache.org/solr/Solrj >> > >> > Cheers >> > Avlesh >> > >> > On Fri, Nov 13, 2009 at 9:39 PM, javaxmlsoapdev >> wrote: >> > >> >> >> >> I want to build AND search query against field1 AND field2 etc. Both >> >> these >> >> fields are stored in an index. I am migrating lucene code to Solr. >> >> Following >> >> is my existing lucene code >> >> >> >> BooleanQuery currentSearchingQuery = new BooleanQuery(); >> >> >> >> currentSearchingQuery.add(titleDescQuery,Occur.MUST); >> >> highlighter = new Highlighter( new QueryScorer(titleDescQuery)); >> >> >> >> TermQuery searchTechGroupQyery = new TermQuery(new Term >> >> ("techGroup",searchForm.getTechGroup())); >> >>currentSearchingQuery.add(searchTechGroupQyery, Occur.MUST); >> >> TermQuery searchProgramQyery = new TermQuery(new >> >> Term("techProgram",searchForm.getTechProgram())); >> >>currentSearchingQuery.add(searchProgramQyery, Occur.MUST); >> >> } >> >> >> >> What's the equivalent Solr code for above Luce code. Any samples would >> be >> >> appreciated. >> >> >> >> Thanks, >> >> -- >> >> View this message in context: >> >> >> http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339025.html >> >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> >> >> >> > >> > >> >> -- >> View this message in context: >> http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339402.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > -- View this message in context: http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339903.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: how to search against multiple attributes in the index
> > you can do it using > solrQuery.setFilterQueries() and build AND queries of multiple parameters. > Nope. You would need to read more - http://wiki.apache.org/solr/FilterQueryGuidance For your impatience, here's a quick starter - #and between two fields solrQuery.setQuery("+field1:foo +field2:bar"); #or between two fields solrQuery.setQuery("field1:foo field2:bar"); Cheers Avlesh On Fri, Nov 13, 2009 at 10:35 PM, javaxmlsoapdev wrote: > > I think I found the answer. needed to read more API documentation :-) > > you can do it using > solrQuery.setFilterQueries() and build AND queries of multiple parameters. > > > Avlesh Singh wrote: > > > > For a starting point, this might be a good read - > > > http://www.lucidimagination.com/search/document/f4d91628ced293bf/lucene_query_to_solr_query > > > > Cheers > > Avlesh > > > > On Fri, Nov 13, 2009 at 10:02 PM, javaxmlsoapdev > > wrote: > > > >> > >> I already did dive in before. I am using solrj API and SolrQuery object > >> to > >> build query. but its not clear/written how to build booleanQuery ANDing > >> bunch of different attributes in the index. Any samples please? > >> > >> Avlesh Singh wrote: > >> > > >> > Dive in - http://wiki.apache.org/solr/Solrj > >> > > >> > Cheers > >> > Avlesh > >> > > >> > On Fri, Nov 13, 2009 at 9:39 PM, javaxmlsoapdev > >> wrote: > >> > > >> >> > >> >> I want to build AND search query against field1 AND field2 etc. Both > >> >> these > >> >> fields are stored in an index. I am migrating lucene code to Solr. > >> >> Following > >> >> is my existing lucene code > >> >> > >> >> BooleanQuery currentSearchingQuery = new BooleanQuery(); > >> >> > >> >> currentSearchingQuery.add(titleDescQuery,Occur.MUST); > >> >> highlighter = new Highlighter( new QueryScorer(titleDescQuery)); > >> >> > >> >> TermQuery searchTechGroupQyery = new TermQuery(new Term > >> >> ("techGroup",searchForm.getTechGroup())); > >> >>currentSearchingQuery.add(searchTechGroupQyery, Occur.MUST); > >> >> TermQuery searchProgramQyery = new TermQuery(new > >> >> Term("techProgram",searchForm.getTechProgram())); > >> >>currentSearchingQuery.add(searchProgramQyery, Occur.MUST); > >> >> } > >> >> > >> >> What's the equivalent Solr code for above Luce code. Any samples > would > >> be > >> >> appreciated. > >> >> > >> >> Thanks, > >> >> -- > >> >> View this message in context: > >> >> > >> > http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339025.html > >> >> Sent from the Solr - User mailing list archive at Nabble.com. > >> >> > >> >> > >> > > >> > > >> > >> -- > >> View this message in context: > >> > http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339402.html > >> Sent from the Solr - User mailing list archive at Nabble.com. > >> > >> > > > > > > -- > View this message in context: > http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339903.html > Sent from the Solr - User mailing list archive at Nabble.com. > >
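To tie this back to the original Lucene snippet, a minimal SolrJ sketch of the same two-clause AND might look like the following (untested; the server URL and the example field values are placeholders, the field names are the ones from the Lucene code):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class AndQueryExample {
      public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        // each "+" marks a required clause, i.e. Occur.MUST in Lucene terms
        SolrQuery query = new SolrQuery("+techGroup:electronics +techProgram:radar");
        QueryResponse response = server.query(query);
        System.out.println("hits: " + response.getResults().getNumFound());
      }
    }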
Re: Resetting doc boosts
AFAIK there is no way to "reset" the doc boost. You would need to re-index. Moreover, there is no way to "search by boost". Cheers Avlesh On Fri, Nov 13, 2009 at 8:17 PM, Jon Baer wrote: > Hi, > > Im trying to figure out if there is an easy way to basically "reset" all of > any doc boosts which you have made (for analytical purposes) ... for example > if I run an index, gather report, doc boost on the report, and reset the > boosts @ time of next index ... > > It would seem to be from just knowing how Lucene works that I would really > need to reindex since its a attrib on the doc itself which would have to be > modified, but there is no easy way to query for docs which have been boosted > either. Any insight? > > Thanks. > > - Jon
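To illustrate the "re-index" part: re-adding a document without any boost calls puts it back at the default boost of 1.0. A sketch (untested; the field names and URL are placeholders):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class ResetBoostByReindex {
      public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-123");        // uniqueKey of the previously boosted document
        doc.addField("title", "some title");  // every field has to be re-supplied on re-add
        // no setDocumentBoost()/field-boost calls, so the boost falls back to the default 1.0
        server.add(doc);
        server.commit();
      }
    }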
Re: The status of Local/Geo/Spatial/Distance Solr
Hi Ian and Ryan, Thanks for the reply. Ian, I checked your pasted config, I am using the same one except the values of 4 25. Basically I use the set up specified at http://www.gissearch.com/localsolr. But there are still the same error I pasted in previous email. Ryan, I just checked out the lib lucene-spatial-2.9.1.jar Grant checked in today. Previously I built lucene-spatial-3.0-dev.jar from Lucene java code base directly. There is still no luck after the lib replacement. I do not think other lib matters in this case. On Fri, Nov 13, 2009 at 8:34 AM, Ian Ibbotson wrote: > Heya.. could it be a problem with your solr config files? I seem to > recall a change from the docs as they were to get this working.. I > have... > > > class="com.pjaol.search.solr.update.LocalUpdateProcessorFactory"> >lat >lng >4 >25 > > > > > > class="com.pjaol.search.solr.component.LocalSolrQueryComponent" /> > class="org.apache.solr.handler.component.SearchHandler"> > > localsolr > facet > mlt > highlight > debug > > > > That tie up with your config/ I'd bascially interpreted the current > packaging as... What used to be locallucene has deffo merged into > lucene-spatial in this build, no more locallucene. However, you still > need to build localsolr for now... > > My solr jars are: > > commons-beanutils-1.8.0.jar commons-logging-1.1.1.jar > localsolr-1.5.2-rc1.jar lucene-misc-2.9.1-ki-rc3.jar >serializer-2.7.1.jar stax-1.2.0.jar > xml-apis-1.3.04.jar > commons-codec-1.4.jar commons-pool-1.5.3.jar > log4j-1.2.13.jar lucene-queries-2.9.1-ki-rc3.jar >slf4j-api-1.5.5.jarstax-api-1.0.jar > xpp3-1.1.3.4.O.jar > commons-dbcp-1.2.2.jargeoapi-nogenerics-2.1M2.jar > lucene-analyzers-2.9.1-ki-rc3.jarlucene-snowball-2.9.1-ki-rc3.jar >slf4j-log4j12-1.5.5.jarstax-utils-20040917.jar > commons-fileupload-1.2.1.jar geronimo-stax-api_1.0_spec-1.0.1.jar > lucene-core-2.9.1-ki-rc3.jar lucene-spatial-2.9.1-ki-rc3.jar >solr-commons-csv-1.4.0-ki-rc1.jar woodstox-wstx-asl-3.2.7.jar > commons-httpclient-3.1.jargt2-referencing-2.3.1.jar > lucene-highlighter-2.9.1-ki-rc3.jar > lucene-spellchecker-2.9.1-ki-rc3.jar solr-core-1.4.0-ki-rc1.jar > xalan-2.7.1.jar > commons-io-1.3.2.jar jsr108-0.01.jar > lucene-memory-2.9.1-ki-rc3.jar > org.codehaus.woodstox-wstx-asl-3.2.7.jar solr-solrj-1.4.0-ki-rc1.jar > xercesImpl-2.9.1.jar > > Sorry for dumping the info at you... hope it helps tho > > Ian. > > 2009/11/13 Bertie Shen : > > Hey, > > > > I am interested in using LocalSolr to go Local/Geo/Spatial/Distance > > search. But the wiki of LocalSolr(http://wiki.apache.org/solr/LocalSolr) > > points to pretty old documentation. Is there a better document I refer to > > for the setting up of LocalSolr and some performance analysis? > > > > Just sync-ed Solr codebase and found LocalSolr is still NOT in the > > contrib package. Do we have a plan to incorporate it? I download a > LocalSolr > > lib localsolr-1.5.jar from > > http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/ and > notice > > that the namespace is com.pjaol.search. blah blah, while LocalLucene > package > > is in Lucene codebase and the package name is org.apache.lucene.spatial > blah > > blah. > > > > But localsolr-1.5.jar from from > > http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/ does > not > > work with lucene-spatial-3.0-dev.jar I build from Lucene codebase > directly. > > After I restart tomcat, I could not load solr admin page. The error is as > > follows. It looks solr is still looking for > > old named classes. > > > > Thanks. 
> > > > HTTP Status 500 - Severe errors in solr configuration. Check your log > files > > for more detailed information on what may be wrong. If you want solr to > > continue after configuration errors, change: > > false in null > > - > > java.lang.NoClassDefFoundError: > > com/pjaol/search/geo/utils/DistanceFilter at > java.lang.Class.forName0(Native > > Method) at java.lang.Class.forName(Class.java:247) at > > > org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:357) > > at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413) at > > org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:435) at > > org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1498) at > > org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1492) at > > org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1525) at > > org.apache.solr.core.SolrCore.loadSearchComponents(SolrCore.java:833) at > > org.apache.solr.core.SolrCore.(SolrCore.java:551) at > > > org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137) > > at > > > org.apache.solr.servlet.SolrDispatch
Re: Question about the message "Indexing failed. Rolled back all changes."
The process initially completes with: 2009-11-13 09:40:46 Indexing completed. Added/Updated: 20 documents. Deleted 0 documents. ...but then it fails with: 2009-11-13 09:40:46 Indexing failed. Rolled back all changes. 2009-11-13 09:41:10 2009-11-13 09:41:10 2009-11-13 09:41:10 I think it may have something to do with this, which I found by using the DataImport.jsp: (Thread.java:636) Caused by: java.sql.SQLException: Illegal value for setFetchSize(). at com.mysql.jdbc.Statement.setFetchSize(Statement.java:1864) at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.(JdbcDataSource.java:242) ... 28 more -- View this message in context: http://old.nabble.com/Question-about-the-message-%22Indexing-failed.-Rolled-back-all--changes.%22-tp26242714p26340360.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: how to search against multiple attributes in the index
great. thanks. that was helpful Avlesh Singh wrote: > >> >> you can do it using >> solrQuery.setFilterQueries() and build AND queries of multiple >> parameters. >> > Nope. You would need to read more - > http://wiki.apache.org/solr/FilterQueryGuidance > > For your impatience, here's a quick starter - > > #and between two fields > solrQuery.setQuery("+field1:foo +field2:bar"); > > #or between two fields > solrQuery.setQuery("field1:foo field2:bar"); > > Cheers > Avlesh > > On Fri, Nov 13, 2009 at 10:35 PM, javaxmlsoapdev > wrote: > >> >> I think I found the answer. needed to read more API documentation :-) >> >> you can do it using >> solrQuery.setFilterQueries() and build AND queries of multiple >> parameters. >> >> >> Avlesh Singh wrote: >> > >> > For a starting point, this might be a good read - >> > >> http://www.lucidimagination.com/search/document/f4d91628ced293bf/lucene_query_to_solr_query >> > >> > Cheers >> > Avlesh >> > >> > On Fri, Nov 13, 2009 at 10:02 PM, javaxmlsoapdev >> > wrote: >> > >> >> >> >> I already did dive in before. I am using solrj API and SolrQuery >> object >> >> to >> >> build query. but its not clear/written how to build booleanQuery >> ANDing >> >> bunch of different attributes in the index. Any samples please? >> >> >> >> Avlesh Singh wrote: >> >> > >> >> > Dive in - http://wiki.apache.org/solr/Solrj >> >> > >> >> > Cheers >> >> > Avlesh >> >> > >> >> > On Fri, Nov 13, 2009 at 9:39 PM, javaxmlsoapdev >> >> wrote: >> >> > >> >> >> >> >> >> I want to build AND search query against field1 AND field2 etc. >> Both >> >> >> these >> >> >> fields are stored in an index. I am migrating lucene code to Solr. >> >> >> Following >> >> >> is my existing lucene code >> >> >> >> >> >> BooleanQuery currentSearchingQuery = new BooleanQuery(); >> >> >> >> >> >> currentSearchingQuery.add(titleDescQuery,Occur.MUST); >> >> >> highlighter = new Highlighter( new QueryScorer(titleDescQuery)); >> >> >> >> >> >> TermQuery searchTechGroupQyery = new TermQuery(new Term >> >> >> ("techGroup",searchForm.getTechGroup())); >> >> >>currentSearchingQuery.add(searchTechGroupQyery, Occur.MUST); >> >> >> TermQuery searchProgramQyery = new TermQuery(new >> >> >> Term("techProgram",searchForm.getTechProgram())); >> >> >>currentSearchingQuery.add(searchProgramQyery, Occur.MUST); >> >> >> } >> >> >> >> >> >> What's the equivalent Solr code for above Luce code. Any samples >> would >> >> be >> >> >> appreciated. >> >> >> >> >> >> Thanks, >> >> >> -- >> >> >> View this message in context: >> >> >> >> >> >> http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339025.html >> >> >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> >> >> >> >> >> >> > >> >> > >> >> >> >> -- >> >> View this message in context: >> >> >> http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339402.html >> >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> >> >> >> > >> > >> >> -- >> View this message in context: >> http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339903.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > -- View this message in context: http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26340776.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: scanning folders recursively / Tika
Peter - if you want, download the code from Lucene in Action 1 or 2, it has index traversal and indexing. 2nd edition uses Tika. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message > From: Peter Gabriel > To: solr-user@lucene.apache.org > Sent: Fri, November 13, 2009 10:26:48 AM > Subject: scanning folders recursively / Tika > > Hello. > > I am on work with Tika 0.5 and want to scan a folder system about 10GB. > Is there a comfortable way to scan folders recursively with an existing class > or > have i to write it myself? > > Any tips for best practise? > > Greetings, Peter > -- > Jetzt kostenlos herunterladen: Internet Explorer 8 und Mozilla Firefox 3.5 - > sicherer, schneller und einfacher! http://portal.gmx.net/de/go/atbrowser
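In case it helps in the meantime, the directory traversal itself is only a few lines of Java. The sketch below just collects files recursively; the Tika parsing and Solr indexing calls are left as a comment since they depend on the exact Tika 0.5 API in use:

    import java.io.File;
    import java.util.ArrayList;
    import java.util.List;

    public class FolderScanner {

      // recursively collect all regular files under the given directory
      public static void collect(File dir, List<File> out) {
        File[] entries = dir.listFiles();
        if (entries == null) {
          return;  // not a directory, or an I/O error
        }
        for (File entry : entries) {
          if (entry.isDirectory()) {
            collect(entry, out);
          } else {
            out.add(entry);
          }
        }
      }

      public static void main(String[] args) {
        List<File> files = new ArrayList<File>();
        collect(new File(args[0]), files);
        for (File f : files) {
          // hand each file to Tika here (e.g. AutoDetectParser) and send the
          // extracted text plus metadata to Solr
          System.out.println(f.getAbsolutePath());
        }
      }
    }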
Re: Customizing Field Score (Multivalued Field)
On Thu, Nov 12, 2009 at 3:00 PM, Stephen Duncan Jr wrote: > On Thu, Nov 12, 2009 at 2:54 PM, Chris Hostetter > wrote: > >> >> oh man, so you were parsing the Stored field values of every matching doc >> at query time? ouch. >> >> Assuming i'm understanding your goal, the conventional way to solve this >> type of problem is "payloads" ... you'll find lots of discussion on it in >> the various Lucene mailing lists, and if you look online Michael Busch has >> various slides that talk about using them. they let you say things >> like "in this document, at this postion of field 'x' the word 'microsoft' >> is worth 37.4, but at this other position (or in this other document) >> 'microsoft' is only worth 17.2" >> >> The simplest way to use them in Solr (as i understand it) is to use >> soemthing like the DelimitedPayloadTokenFilterFactory when indexing, and >> then write yourself >> a simple little custom QParser that generates a BoostingTermQuery on your >> field. >> >> should be a lot simpler to implement then the Query you are describing, >> and much faster. >> >> >> -Hoss >> >> > Thanks. I finally got around to looking at this again today and was looking > at a similar path, so I appreciate the confirmation. > > > -- > Stephen Duncan Jr > www.stephenduncanjr.com > For posterity, here's the rest of what I discovered trying to implement this: You'll need to write a PayloadSimilarity as described here: http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/(here's my updated version due to deprecation of the method mentioned in that article): @Override public float scorePayload( int docId, String fieldName, int start, int end, byte[] payload, int offset, int length) { // can ignore length here, because we know it is encoded as 4 bytes return PayloadHelper.decodeFloat(payload, offset); } You'll need to register that similarity in your Solr schema.xml (was hard to figure out, as I didn't realize that the similarity has to be applied globally to the writer/search used generally, even though I only care about payloads on one field, so I wasted time trying to figure out how to plug in the similarity in my query parser). You'll want to use the "payloads" type or something based on it that's in the example schema.xml. The latest and greatest query type to use is PayloadTermQuery. I use it in my custom query parser class, overriding getFieldQuery, checking for my field name, and then: return new PayloadTermQuery(new Term(field, queryText), new AveragePayloadFunction()); Due to the global nature of the Similarity, I guess you'd have to modify it to look at the field name and base behavior on that if you wanted different kinds of payloads on different fields in one schema. Also, whereas in my original implementation, I controlled the score completely, and therefore if I set a score of 0.8, the doc came back as score of 0.8, in this technique the payload is just used as a boost/addition to the score, so my scores came out higher than before. Since they're still in the same relative order, that still satisfied my needs, but did require updating my test cases. -- Stephen Duncan Jr www.stephenduncanjr.com
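For anyone following along later, a stripped-down query parser plugin along the lines Hoss described might look roughly like this. It is a sketch only: the field name "payloadField" is a placeholder, the plugin still has to be registered in solrconfig.xml, and error handling is omitted:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.payloads.AveragePayloadFunction;
    import org.apache.lucene.search.payloads.PayloadTermQuery;
    import org.apache.solr.common.params.SolrParams;
    import org.apache.solr.common.util.NamedList;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.search.QParser;
    import org.apache.solr.search.QParserPlugin;

    public class PayloadTermQParserPlugin extends QParserPlugin {

      public void init(NamedList args) {}

      public QParser createParser(String qstr, SolrParams localParams,
                                  SolrParams params, SolrQueryRequest req) {
        return new QParser(qstr, localParams, params, req) {
          public Query parse() {
            // a single term against the payload-carrying field; the payload values
            // are folded into the score via scorePayload() in the custom Similarity
            return new PayloadTermQuery(new Term("payloadField", getString()),
                                        new AveragePayloadFunction());
          }
        };
      }
    }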
Making search results more stable as index is updated
If documents are being added to and removed from an index (and commits are being issued) while a user is searching, then the experience of paging through search results using the obvious Solr mechanism (&start=100&rows=10) may be disorienting for the user. For one example, by the time the user clicks "next page" for the first time, a document that they saw on page 1 may have been pushed onto page 2. (This may be especially pronounced if docs are being sorted by date.) I'm wondering what the best options are for presenting a more stable set of search results to users in such cases. The obvious candidates to me are:

#1: Cache results in the user session of the web tier. (In particular, maybe just cache the uniqueKey of each matching document.)
Pro: Simple
Con: May require capping the # of search results in order to make the initial query (which now has a Solr rows param >> web pageSize) fast enough. For example, maybe it's only practical to cache the first 500 records.

#2: Create some kind of per-user results cache in Solr. (One simple implementation idea: You could make your Solr search handler take a userid parameter, and cache each user's last search in a special per-user results cache. You then also provide an API that says, "give me records n through m of userid #1334's last search". For your subsequent queries, you consult the latter API rather than redoing your search. Because Lucene docids are unstable across commits and such, I think this means caching the uniqueKey of each matching document. This in turn means looking up the uniqueKey of each matching document at search time. It also means you can't use the existing Solr caches, but need to make a new one.)
Pro: Maybe faster than #1?? (Saves on data transfer between Solr and the web tier, at least during the initial query.)
Con: More complicated than #1.

#3: Use filter queries to attempt to make your subsequent queries (for page 2, page 3, etc.) return results consistent with your original query. (One idea is to give each document a docAddedTimestamp field, which would have precision down to the millisecond or something. On your initial query, you could note the current time, T. Then for the subsequent queries you add a filter query for docAddedTimestamp<=T. Hopefully with a trie date field this would be fast. This should hopefully keep any docs newly added after T from showing up in the user's search results as they page through them. However, it won't necessarily protect you from docs that were *reindexed* (i.e. re-adding a doc with the same uniqueKey as an existing doc) or docs that were deleted.)
Pro: Doesn't require a new cache, and no cap on the # of search results.
Con: Maybe doesn't provide total stability.

Any feedback on these options? Are there other ideas to consider? Thanks, Chris
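To make option #3 a bit more concrete, the follow-up page requests could look something like this in SolrJ (a sketch; "server" is assumed to be a SolrJ SolrServer, docAddedTimestamp is the hypothetical field from the description above, and the cutoff value is whatever time was captured on the first request):

    // captured when the user ran the first page, in Solr's date syntax
    String cutoff = "2009-11-13T18:30:00Z";

    SolrQuery query = new SolrQuery("the user's original query");
    query.addFilterQuery("docAddedTimestamp:[* TO " + cutoff + "]");  // hide docs added after T
    query.setStart(20);  // page 3 with a page size of 10
    query.setRows(10);
    QueryResponse rsp = server.query(query);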
Re: having solr generate and execute other related queries automatically
tpunder wrote:
> Maybe I misunderstand what you are trying to do (or the facet.query feature). If I did an initial query on my data-set that left me with the following questions:
> ...
> http://localhost:8983/solr/select/?q=*%3A*&start=0&rows=0&facet=on&facet.query=brand_id:1&facet.query=brand_id:2&facet.query=+%2Bbrand_id:5+%2Bcategory_id:4051
> ...

Thanks for the reply Tim. I can't provide you with an example as I don't have anything prototyped yet; I am still trying to work things through in my head. The 20+ queries would allow us to suggest other possibilities to users in a facet-like way (but not returning the exact same info as facets).

With the technique you mention I would have to specify the list of query params for each facet.query. That would work for relatively simple queries. Unfortunately, the queries I was looking at doing would be fairly long (say hundreds of AND/OR statements). That said, I don't think Solr would be able to handle the query size I would end up with (at least not efficiently), because the resulting query would consist of thousands of AND/OR statements (isn't there a limit of sorts in Solr?).

I think that my best bet would be to extend SearchComponent and perform the additional query generation and execution in the extension. That approach should also allow me to have access to the facet values that the base query would generate (which would allow me to generate and execute the other queries). thx again.

-- View this message in context: http://old.nabble.com/having-solr-generate-and-execute-other-related-queries-automatically-tp26327032p26343409.html Sent from the Solr - User mailing list archive at Nabble.com.
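A bare-bones sketch of that "extend SearchComponent" idea: it runs a handful of extra queries server-side and attaches their hit counts to the response. The component name, the hard-coded query strings (borrowed from Tim's brand_id examples), and how you would really derive them from the main query and its facets are all placeholders, and the code is untested:

    import java.io.IOException;

    import org.apache.lucene.search.Query;
    import org.apache.solr.common.util.NamedList;
    import org.apache.solr.handler.component.ResponseBuilder;
    import org.apache.solr.handler.component.SearchComponent;
    import org.apache.solr.search.DocSet;
    import org.apache.solr.search.QParser;

    public class RelatedQueriesComponent extends SearchComponent {

      public void prepare(ResponseBuilder rb) throws IOException {}

      public void process(ResponseBuilder rb) throws IOException {
        NamedList<Integer> related = new NamedList<Integer>();
        String[] extraQueries = { "brand_id:1", "brand_id:2" };  // placeholders
        for (String q : extraQueries) {
          try {
            Query parsed = QParser.getParser(q, null, rb.req).getQuery();
            DocSet matches = rb.req.getSearcher().getDocSet(parsed);
            related.add(q, matches.size());
          } catch (Exception e) {
            throw new IOException(e.toString());
          }
        }
        rb.rsp.add("relatedCounts", related);
      }

      public String getDescription() { return "related queries sketch"; }
      public String getSource() { return ""; }
      public String getSourceId() { return ""; }
      public String getVersion() { return "1.0"; }
    }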
RE: Multicore solr.xml schemaName parameter not being recognized
: On the CoreAdmin wiki page.

thanks

FWIW: The only time the string "schemaName" appears on the CoreAdmin wiki page is when it mentions that "solr.core.schemaName" is a property that is available to cores by default. The documentation for <core> specifically says...

>> The <core> tag accepts the following attributes: ...
>> * schema - The schema file name for a given core. The default is ...

So the documentation is correct.

-Hoss
Re: Solr 1.3 query and index perf tank during optimize
Mark Miller wrote on 11/12/2009 07:18:03 PM:
> Ah, the pains of optimization. Its kind of just how it is. One solution
> is to use two boxes and replication - optimize on the master, and then
> queries only hit the slave. Out of reach for some though, and adds many
> complications.

Yes, in my use case 2 boxes isn't a great option.

> Another kind of option is to use the partial optimize feature:
>
> <optimize maxSegments="2"/>
>
> Using this, you can optimize down to n segments and take a shorter hit
> each time.

Is this a 1.4 feature? I'm planning to migrate to 1.4, but it'll take a while since I have to port custom code forward, including a query parser.

> Also, if optimizing is so painful, you might lower the merge factor
> amortize that pain better. Thats another way to slowly get there - if
> you lower the merge factor, as merging takes place, the new merge factor
> will be respected, and semgents will merge down. A merge factor of 2
> (the lowest) will make it so you only ever have 2 segments. Sometimes
> that works reasonably well - you could try 3-6 or something as well.
> Then when you do your partial optimizes (and eventually a full optimize
> perhaps), you want have so far to go.

So this will slow down indexing but speed up optimize somewhat? Unfortunately right now I lose docs I'm indexing, as well as slowing searching to a crawl. Ugh. I've got plenty of CPU horsepower.

This is where having the ability to optimize on another filesystem would be useful. Would it perhaps make sense to set up a master/slave on the same machine? Then I suppose I can have an index being optimized that might not clobber the search. Would newly indexed items still be dropped on the floor?

Thanks, Jerry
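On the partial optimize point: the underlying update message is just the optimize command with a maxSegments attribute (as far as I know, that attribute arrived with 1.4). One way to send it from SolrJ is to post the raw XML; a sketch, with the caveat that I have not checked which SolrJ convenience methods expose maxSegments in this version:

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.request.DirectXmlRequest;

    public class PartialOptimize {
      public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        // merge down to at most 5 segments instead of a full single-segment optimize
        DirectXmlRequest optimize = new DirectXmlRequest("/update",
            "<optimize maxSegments=\"5\" waitFlush=\"true\" waitSearcher=\"true\"/>");
        optimize.process(server);
      }
    }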
Re: Stop solr without losing documents
: which documents have been updated before a successful commit. Now
: stopping solr is as easy as kill -9.

please don't kill -9 ... it's grossly overkill, and doesn't give your servlet container a fair chance to clean things up. A lot of work has been done to make Lucene indexes robust to hard terminations of the JVM (or physical machine) but there's no reason to go out of your way to try and stab it in the heart when you could just shut it down cleanly.

that's not to say your approach isn't a good one -- if you only have one client sending updates/commits then having it keep track of what was indexed prior to the last successful commit is a viable way to deal with what happens if solr stops responding (either because you shut it down, or because it crashed for some other reason).

Alternately, you could take advantage of the "enabled" feature from your client (just have it test the enabled url every N updates or so) and when it sees that you have disabled the port it can send one last commit and then stop sending updates until it sees the enabled URL work again -- as soon as you see the updates stop, you can safely shut down the port.

-Hoss
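A sketch of the client-side half of that "enabled" approach, assuming SolrJ: "server" is a SolrServer, "batches" is however the client chunks its input documents, and server.ping() (which hits the ping/healthcheck URL) is used as the enabled check:

    for (List<SolrInputDocument> batch : batches) {
      try {
        server.ping();     // starts failing once the admin has flipped the core to "disabled"
      } catch (Exception e) {
        server.commit();   // one last commit for everything sent so far
        break;             // stop sending updates; Solr can now be shut down cleanly
      }
      server.add(batch);
    }
    server.commit();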
Re: Solr 1.3 query and index perf tank during optimize
ysee...@gmail.com wrote on 11/13/2009 09:06:29 AM: > > On Fri, Nov 13, 2009 at 6:27 AM, Michael McCandless > wrote: > > I think we sorely need a Directory impl that down-prioritizes IO > > performed by merging. > > It's unclear if this case is caused by IO contention, or the OS cache > of the hot parts of the index being lost by that extra IO activity. > Of course the latter would lead to the former, but without that OS > disk cache, the searches may be too slow even w/o the extra IO. Is there a way to configure things so that search and new data indexing get cached under the control of solr/lucene? Then we'd be less reliant on the OS behavior. Alternatively if there are OS params I can tweak (RHEL/Centos 5) to solve the problem, that's an option for me. Would you know if 1.4 is better behaved than 1.3? Thanks, Jerry
Re: Solr 1.3 query and index perf tank during optimize
ysee...@gmail.com wrote on 11/13/2009 09:06:29 AM: > On Fri, Nov 13, 2009 at 6:27 AM, Michael McCandless > wrote: > > I think we sorely need a Directory impl that down-prioritizes IO > > performed by merging. > > It's unclear if this case is caused by IO contention, or the OS cache > of the hot parts of the index being lost by that extra IO activity. > Of course the latter would lead to the former, but without that OS > disk cache, the searches may be too slow even w/o the extra IO. On linux there's the ionice command to try to throttle processes. Would it be possible and make sense to have a separate process for optimizing that had ionice set it to idle? Can the index be shared this way? Thanks, Jerry
Re: NPE when trying to view a specific document via Luke
: I'm seeing this stack trace when I try to view a specific document, e.g. : /admin/luke?id=1 but luke appears to be working correctly when I just FWIW: I was able to reproduce this using the example setup (i picked a doc id at random) suspecting it was a bug in docFreq when using multiple segments, i tried optimizing and still got an NPE, but then my entire computer crashed (unrelated) before i could look any deeper. I have to go out now, but i'll try to dig into this more when i get back ... given where it happens in the code, it seems like a potentially serious lucene bug (either that: or LukeRequestHandler is doing something it really shouldn't be, but i can't imagine how it could trigger an NPE that deep in the lucene code) : view /admin/luke. Does this look familiar to anyone? Our sysadmin just : upgraded us to the 1.4 release, I'm not sure if this occurred before : that. : : Thanks, : Jake : : 1. java.lang.NullPointerException : 2. at org.apache.lucene.index.TermBuffer.set(TermBuffer.java:95) : 3. at org.apache.lucene.index.SegmentTermEnum.scanTo(SegmentTermEnum.java:158) : 4. at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:232) : 5. at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:179) : 6. at org.apache.lucene.index.SegmentReader.docFreq(SegmentReader.java:975) : 7. at org.apache.lucene.index.DirectoryReader.docFreq(DirectoryReader.java:627) : 8. at org.apache.solr.search.SolrIndexReader.docFreq(SolrIndexReader.java:308) : 9. at org.apache.solr.handler.admin.LukeRequestHandler.getDocumentFieldsInfo(LukeRequestHandler.java:248) : 10.at org.apache.solr.handler.admin.LukeRequestHandler.handleRequestBody(LukeRequestHandler.java:124) : 11.at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) : 12.at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) : 13.at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) : 14.at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) : 15.at com.caucho.server.dispatch.FilterFilterChain.doFilter(FilterFilterChain.java:76) : 16.at com.caucho.server.cache.CacheFilterChain.doFilter(CacheFilterChain.java:158) : 17.at com.caucho.server.webapp.WebAppFilterChain.doFilter(WebAppFilterChain.java:178) : 18.at com.caucho.server.dispatch.ServletInvocation.service(ServletInvocation.java:241) : 19.at com.caucho.server.hmux.HmuxRequest.handleRequest(HmuxRequest.java:435) : 20.at com.caucho.server.port.TcpConnection.run(TcpConnection.java:586) : 21.at com.caucho.util.ThreadPool$Item.runTasks(ThreadPool.java:690) : 22.at com.caucho.util.ThreadPool$Item.run(ThreadPool.java:612) : 23.at java.lang.Thread.run(Thread.java:619) : 24. : 25. Date: Fri, 13 Nov 2009 02:19:54 GMT : 26. Server: Apache/2.2.3 (Red Hat) : 27. Cache-Control: no-cache, no-store : 28. Pragma: no-cache : 29. Expires: Sat, 01 Jan 2000 01:00:00 GMT : 30. Content-Type: text/html; charset=UTF-8 : 31. Vary: Accept-Encoding,User-Agent : 32. Content-Encoding: gzip : 33. Content-Length: 1066 : 34. Connection: close : 35. : -Hoss
Re: Request assistance with distributed search multi shard/core setup and configuration
DS requires a bunch of shard names in the url. That's all. Note that a ds does not use the data of the solr you call. You can create an entry point for your distributed search by adding a new element in solrconfig.xml. You would add the shard list parameter to the "defaults" list. Do not have it call the same requesthandler path- you'll get an infinite loop. On Tue, Nov 10, 2009 at 6:44 PM, Otis Gospodnetic wrote: > Hm, I don't follow. You don't need to create a custom (request) handler to > make use of Solr's distributed search. > > Otis > -- > Sematext is hiring -- http://sematext.com/about/jobs.html?mls > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR > > > > - Original Message >> From: "Turner, Robbin J" >> To: "solr-user@lucene.apache.org" >> Sent: Tue, November 10, 2009 6:41:32 PM >> Subject: RE: Request assistance with distributed search multi shard/core >> setup and configuration >> >> Thanks, I had already read through this url. I guess my request was is >> there a >> way to setup something that is already part of solr itself to pass the >> URL[shard...] then having create a custom handler. >> >> thanks >> >> -Original Message- >> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] >> Sent: Tuesday, November 10, 2009 6:09 PM >> To: solr-user@lucene.apache.org >> Subject: Re: Request assistance with distributed search multi shard/core >> setup >> and configuration >> >> Right, that's http://wiki.apache.org/solr/DistributedSearch >> >> Otis >> -- >> Sematext is hiring -- http://sematext.com/about/jobs.html?mls >> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR >> >> >> >> - Original Message >> > From: "Turner, Robbin J" >> > To: "solr-user@lucene.apache.org" >> > Sent: Tue, November 10, 2009 6:05:19 PM >> > Subject: RE: Request assistance with distributed search multi >> > shard/core setup and configuration >> > >> > I've already done the single Solr, that's why my request. I read on >> > some site that there is a way to setup the configuration so I can send >> > a query to one solr instance and have it pass it on or distribute it across >> all the instances? >> > >> > Btw, thanks for the quick reply. >> > RJ >> > >> > -Original Message- >> > From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] >> > Sent: Tuesday, November 10, 2009 6:02 PM >> > To: solr-user@lucene.apache.org >> > Subject: Re: Request assistance with distributed search multi >> > shard/core setup and configuration >> > >> > RJ, >> > >> > You may want to take a simpler step - single Solr core (no solr.xml >> > needed) per machine. Then distributed search really only requires >> > that you specify shard URLs in the URL of the search requests. In >> > practice/production you rarely benefit from distributed search against >> > multiple cores on the same server anyway. >> > >> > Otis >> > -- >> > Sematext is hiring -- http://sematext.com/about/jobs.html?mls >> > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR >> > >> > >> > >> > >> > >> > From: "Turner, Robbin J" >> > To: "solr-user@lucene.apache.org" >> > Sent: Tue, November 10, 2009 5:58:52 PM >> > Subject: Request assistance with distributed search multi shard/core >> > setup and configuration >> > >> > I've been looking through all the documentation. I've set up a single >> > solr instance, and one multicore instance. If someone would be >> > willing to share some configuration examples and/or advise for setting >> > up solr for distributing the search, I would really appreciate it. 
>> > I've read that there is a way to do it, but most of the current >> > documentation doesn't provide enough example on what to do with >> > solr.xml, and the solrconfig.xml. Also, I'm using tomcat 6 for the servlet >> container. I deployed the solr 1.4.0 released yesterday. >> > >> > Thanks >> > RJ > > -- Lance Norskog goks...@gmail.com
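For illustration, a minimal sketch of such an entry point in solrconfig.xml for Solr 1.4; the handler name and the shard host:port/core values here are placeholders, not taken from this thread:

  <requestHandler name="/distrib" class="solr.SearchHandler">
    <lst name="defaults">
      <!-- illustrative shard list: every core that should be searched.
           Each entry points at an ordinary core (queried via its default
           /select handler), not back at this /distrib handler, which avoids
           the infinite-loop problem mentioned above. -->
      <str name="shards">host1:8983/solr/core1,host2:8983/solr/core2</str>
    </lst>
  </requestHandler>

A query sent to http://host1:8983/solr/distrib?q=foo is then fanned out to the listed shards and the results are merged.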
Re: NPE when trying to view a specific document via Luke
On Fri, Nov 13, 2009 at 5:41 PM, Chris Hostetter wrote: > : I'm seeing this stack trace when I try to view a specific document, e.g. > : /admin/luke?id=1 but luke appears to be working correctly when I just > > FWIW: I was able to reproduce this using the example setup (i picked a > doc id at random) suspecting it was a bug in docFreq Probably just a null being passed in the text part of the term. I bet Luke expects all field values to be strings, but some are binary. -Yonik http://www.lucidimagination.com
Fwd: Lucene MMAP Usage with Solr
Folks, I am trying to get Lucene MMAP to work in solr. I am assuming that when I configure MMAP the entire index will be loaded into RAM. Is that the right assumption ? I have tried the following ways for using MMAP: Option 1. Using the solr config below for MMAP configuration -Dorg.apache.lucene.FSDirectory.class=org.apache.lucene.store.MMapDirectory With this config, when I start solr with a 30G index, I expected that the RAM usage should go up, but it did not. Option 2. By Code Change I made the following code change : Changed org.apache.solr.core.StandardDirectoryFactory to use MMAP instead of FSDirectory. Code snippet pasted below. Could you help me to understand if these are the right way to use MMAP? Thanks much /ST. Code SNippet for Option 2: package org.apache.solr.core; /** * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * *http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ import java.io.File; import java.io.IOException; import org.apache.lucene.store.Directory; import org.apache.lucene.store.MMapDirectory; /** * Directory provider which mimics original Solr FSDirectory based behavior. * */ public class StandardDirectoryFactory extends DirectoryFactory { public Directory open(String path) throws IOException { return MMapDirectory.open(new File(path)); } }
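If the code-change route is taken, Solr 1.4 also lets solrconfig.xml point at a custom factory, so StandardDirectoryFactory itself does not have to be edited. A hedged sketch, assuming the DirectoryFactory API shown above (the package and class name are made up for illustration):

  package com.example.solr;  // hypothetical package

  import java.io.File;
  import java.io.IOException;
  import org.apache.lucene.store.Directory;
  import org.apache.lucene.store.MMapDirectory;
  import org.apache.solr.core.DirectoryFactory;

  /** Opens every index directory with MMapDirectory instead of FSDirectory. */
  public class MMapDirectoryFactory extends DirectoryFactory {
    public Directory open(String path) throws IOException {
      return MMapDirectory.open(new File(path));
    }
  }

and in solrconfig.xml:

  <directoryFactory name="DirectoryFactory" class="com.example.solr.MMapDirectoryFactory"/>

Note that memory-mapping only maps the index files into the process address space; it does not eagerly copy a 30G index into the Java heap, so heap usage is not expected to jump at startup.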
Re: any docs on solr.EdgeNGramFilterFactory?
Thanks for the link - there doesn't seem a be a fix version specified, so I guess this will not officially ship with lucene 2.9? -Peter On Wed, Nov 11, 2009 at 10:36 PM, Robert Muir wrote: > Peter, here is a project that does this: > http://issues.apache.org/jira/browse/LUCENE-1488 > > >> That's kind of interesting - in general can I build a custom tokenizer >> from existing tokenizers that treats different parts of the input >> differently based on the utf-8 range of the characters? E.g. use a >> porter stemmer for stretches of Latin text and n-gram or something >> else for CJK? >> >> -Peter >> >> On Tue, Nov 10, 2009 at 9:21 PM, Otis Gospodnetic >> wrote: >> > Yes, that's the n-gram one. I believe the existing CJK one in Lucene is >> really just an n-gram tokenizer, so no different than the normal n-gram >> tokenizer. >> > >> > Otis >> > -- >> > Sematext is hiring -- http://sematext.com/about/jobs.html?mls >> > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR >> > >> > >> > >> > - Original Message >> >> From: Peter Wolanin >> >> To: solr-user@lucene.apache.org >> >> Sent: Tue, November 10, 2009 7:34:37 PM >> >> Subject: Re: any docs on solr.EdgeNGramFilterFactory? >> >> >> >> So, this is the normal N-gram one? NGramTokenizerFactory >> >> >> >> Digging deeper - there are actualy CJK and Chinese tokenizers in the >> >> Solr codebase: >> >> >> >> >> http://lucene.apache.org/solr/api/org/apache/solr/analysis/CJKTokenizerFactory.html >> >> >> http://lucene.apache.org/solr/api/org/apache/solr/analysis/ChineseTokenizerFactory.html >> >> >> >> The CJK one uses the lucene CJKTokenizer >> >> >> http://lucene.apache.org/java/2_9_1/api/contrib-analyzers/org/apache/lucene/analysis/cjk/CJKTokenizer.html >> >> >> >> and there seems to be another one even that no one has wrapped into >> Solr: >> >> >> http://lucene.apache.org/java/2_9_1/api/contrib-smartcn/org/apache/lucene/analysis/cn/smart/package-summary.html >> >> >> >> So seems like the existing options are a little better than I thought, >> >> though it would be nice to have some docs on properly configuring >> >> these. >> >> >> >> -Peter >> >> >> >> On Tue, Nov 10, 2009 at 6:05 PM, Otis Gospodnetic >> >> wrote: >> >> > Peter, >> >> > >> >> > For CJK and n-grams, I think you don't want the *Edge* n-grams, but >> just >> >> n-grams. >> >> > Before you take the n-gram route, you may want to look at the smart >> Chinese >> >> analyzer in Lucene contrib (I think it works only for Simplified >> Chinese) and >> >> Sen (on java.net). I also spotted a Korean analyzer in the wild a few >> months >> >> back. >> >> > >> >> > Otis >> >> > -- >> >> > Sematext is hiring -- http://sematext.com/about/jobs.html?mls >> >> > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR >> >> > >> >> > >> >> > >> >> > - Original Message >> >> >> From: Peter Wolanin >> >> >> To: solr-user@lucene.apache.org >> >> >> Sent: Tue, November 10, 2009 4:06:52 PM >> >> >> Subject: any docs on solr.EdgeNGramFilterFactory? >> >> >> >> >> >> This fairly recent blog post: >> >> >> >> >> >> >> >> >> http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ >> >> >> >> >> >> describes the use of the solr.EdgeNGramFilterFactory as the tokenizer >> >> >> for the index. I don't see any mention of that tokenizer on the Solr >> >> >> wiki - is it just waiting to be added, or is there any other >> >> >> documentation in addition to the blog post? 
In particular, there was >> >> >> a thread last year about using an N-gram tokenizer to enable >> >> >> reasonable (if not ideal) searching of CJK text, so I'd be curious to >> >> >> know how people are configuring their schema (with this tokenizer?) >> >> >> for that use case. >> >> >> >> >> >> Thanks, >> >> >> >> >> >> Peter >> >> >> >> >> >> -- >> >> >> Peter M. Wolanin, Ph.D. >> >> >> Momentum Specialist, Acquia. Inc. >> >> >> peter.wola...@acquia.com >> >> > >> >> > >> >> >> >> >> >> >> >> -- >> >> Peter M. Wolanin, Ph.D. >> >> Momentum Specialist, Acquia. Inc. >> >> peter.wola...@acquia.com >> > >> > >> >> >> >> -- >> Peter M. Wolanin, Ph.D. >> Momentum Specialist, Acquia. Inc. >> peter.wola...@acquia.com >> > > > > > -- > Robert Muir > rcm...@gmail.com > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
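For reference, a hedged schema.xml sketch of the two approaches discussed in this thread — bigram-style CJK tokenization via the bundled CJKTokenizerFactory, and index-time edge n-grams for prefix matching; the field type names and gram sizes are illustrative, not taken from any official documentation:

  <!-- CJK text tokenized into overlapping bigrams -->
  <fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.CJKTokenizerFactory"/>
    </analyzer>
  </fieldType>

  <!-- edge n-grams at index time only, e.g. for auto-suggest -->
  <fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" side="front"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>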
Re: Reseting doc boosts
I'm not sure this is what you are looking for, but there is a FieldNormModifier tool in Lucene. Koji -- http://www.rondhuit.com/en/ Avlesh Singh wrote: AFAIK there is no way to "reset" the doc boost. You would need to re-index. Moreover, there is no way to "search by boost". Cheers Avlesh On Fri, Nov 13, 2009 at 8:17 PM, Jon Baer wrote: Hi, I'm trying to figure out if there is an easy way to basically "reset" all of any doc boosts which you have made (for analytical purposes) ... for example if I run an index, gather report, doc boost on the report, and reset the boosts @ time of next index ... It would seem to be from just knowing how Lucene works that I would really need to reindex since it's an attribute on the doc itself which would have to be modified, but there is no easy way to query for docs which have been boosted either. Any insight? Thanks. - Jon
Re: any docs on solr.EdgeNGramFilterFactory?
ah, thanks, i'll tentatively set one in the future, but definitely not 2.9.x more just to show you the idea, you can do different things depending on different runs of writing systems in text. but it doesnt solve everything: you only know its Latin script, not english, so you can't safely automatically do anything like stemming. say your content is only chinese, english: the analyzer won't know your latin script text is english, versus say, french from the unicode, so it won't stem it. but that analyzer will lowercase it. it won't know if your ideographs are chinese or japanese, but it will use n-gram tokenization, you get the drift. in that impl, it puts the script code in the flags so downstream you could do something like stemming if you happen to know more than is evident from the unicode. On Fri, Nov 13, 2009 at 6:23 PM, Peter Wolanin wrote: > Thanks for the link - there doesn't seem a be a fix version specified, > so I guess this will not officially ship with lucene 2.9? > > -Peter > > On Wed, Nov 11, 2009 at 10:36 PM, Robert Muir wrote: > > Peter, here is a project that does this: > > http://issues.apache.org/jira/browse/LUCENE-1488 > > > > > >> That's kind of interesting - in general can I build a custom tokenizer > >> from existing tokenizers that treats different parts of the input > >> differently based on the utf-8 range of the characters? E.g. use a > >> porter stemmer for stretches of Latin text and n-gram or something > >> else for CJK? > >> > >> -Peter > >> > >> On Tue, Nov 10, 2009 at 9:21 PM, Otis Gospodnetic > >> wrote: > >> > Yes, that's the n-gram one. I believe the existing CJK one in Lucene > is > >> really just an n-gram tokenizer, so no different than the normal n-gram > >> tokenizer. > >> > > >> > Otis > >> > -- > >> > Sematext is hiring -- http://sematext.com/about/jobs.html?mls > >> > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR > >> > > >> > > >> > > >> > - Original Message > >> >> From: Peter Wolanin > >> >> To: solr-user@lucene.apache.org > >> >> Sent: Tue, November 10, 2009 7:34:37 PM > >> >> Subject: Re: any docs on solr.EdgeNGramFilterFactory? > >> >> > >> >> So, this is the normal N-gram one? NGramTokenizerFactory > >> >> > >> >> Digging deeper - there are actualy CJK and Chinese tokenizers in the > >> >> Solr codebase: > >> >> > >> >> > >> > http://lucene.apache.org/solr/api/org/apache/solr/analysis/CJKTokenizerFactory.html > >> >> > >> > http://lucene.apache.org/solr/api/org/apache/solr/analysis/ChineseTokenizerFactory.html > >> >> > >> >> The CJK one uses the lucene CJKTokenizer > >> >> > >> > http://lucene.apache.org/java/2_9_1/api/contrib-analyzers/org/apache/lucene/analysis/cjk/CJKTokenizer.html > >> >> > >> >> and there seems to be another one even that no one has wrapped into > >> Solr: > >> >> > >> > http://lucene.apache.org/java/2_9_1/api/contrib-smartcn/org/apache/lucene/analysis/cn/smart/package-summary.html > >> >> > >> >> So seems like the existing options are a little better than I > thought, > >> >> though it would be nice to have some docs on properly configuring > >> >> these. > >> >> > >> >> -Peter > >> >> > >> >> On Tue, Nov 10, 2009 at 6:05 PM, Otis Gospodnetic > >> >> wrote: > >> >> > Peter, > >> >> > > >> >> > For CJK and n-grams, I think you don't want the *Edge* n-grams, but > >> just > >> >> n-grams. > >> >> > Before you take the n-gram route, you may want to look at the smart > >> Chinese > >> >> analyzer in Lucene contrib (I think it works only for Simplified > >> Chinese) and > >> >> Sen (on java.net). 
I also spotted a Korean analyzer in the wild a > few > >> months > >> >> back. > >> >> > > >> >> > Otis > >> >> > -- > >> >> > Sematext is hiring -- http://sematext.com/about/jobs.html?mls > >> >> > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR > >> >> > > >> >> > > >> >> > > >> >> > - Original Message > >> >> >> From: Peter Wolanin > >> >> >> To: solr-user@lucene.apache.org > >> >> >> Sent: Tue, November 10, 2009 4:06:52 PM > >> >> >> Subject: any docs on solr.EdgeNGramFilterFactory? > >> >> >> > >> >> >> This fairly recent blog post: > >> >> >> > >> >> >> > >> >> > >> > http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ > >> >> >> > >> >> >> describes the use of the solr.EdgeNGramFilterFactory as the > tokenizer > >> >> >> for the index. I don't see any mention of that tokenizer on the > Solr > >> >> >> wiki - is it just waiting to be added, or is there any other > >> >> >> documentation in addition to the blog post? In particular, there > was > >> >> >> a thread last year about using an N-gram tokenizer to enable > >> >> >> reasonable (if not ideal) searching of CJK text, so I'd be curious > to > >> >> >> know how people are configuring their schema (with this > tokenizer?) > >> >> >> for that use case. > >> >> >> > >> >> >> Thanks, > >> >> >> > >> >> >> Peter > >> >> >> > >> >> >> -- > >> >> >> Peter M. Wolanin, Ph.D. > >> >> >>
Re: NPE when trying to view a specific document via Luke
: > FWIW: I was able to reproduce this using the example setup (i picked a : > doc id at random) suspecting it was a bug in docFreq : : Probably just a null being passed in the text part of the term. : I bet Luke expects all field values to be strings, but some are binary. I'm not sure i follow you ... i think you're saying that naive assumptions in the LukeRequestHandler could result in it asking for the docFreq of a term that has a null string value because some field types are binary, except that... 1) 1.3 didn't have this problem 2) LukeRequestHandler.getDocumentFieldsInfo didn't change from 1.3 to 1.4 I tried to reproduce this in 1.4 using an index/configs created with 1.3, but i got a *different* NPE when loading this url... http://localhost:8983/solr/admin/luke?id=SP2514N SEVERE: java.lang.NullPointerException at org.apache.solr.util.NumberUtils.SortableStr2int(NumberUtils.java:127) at org.apache.solr.util.NumberUtils.SortableStr2float(NumberUtils.java:83) at org.apache.solr.util.NumberUtils.SortableStr2floatStr(NumberUtils.java:89) at org.apache.solr.schema.SortableFloatField.indexedToReadable(SortableFloatField.java:62) at org.apache.solr.schema.SortableFloatField.toExternal(SortableFloatField.java:53) at org.apache.solr.handler.admin.LukeRequestHandler.getDocumentFieldsInfo(LukeRequestHandler.java:245) ...all three of these stack traces seem to suggest that some impl of Fieldable.stringValue in 2.9 is returning null in cases where it returned *something* else in the 2.4-dev jar used by Solr 1.3. That seems like it could have other impacts besides LukeRequestHandler. -Hoss
Re: NPE when trying to view a specific document via Luke
: I tried to reproduce this in 1.4 using an index/configs created with 1.3, : but i got a *different* NPE when loading this url... I should have tried a simpler test ... i get NPE's just trying to execute a simple search for *:* when i try to use the example index built in 1.3 (with the 1.3 configs) in 1.4. same (apparent) cause: code is attempting to deref a string returned by Fieldable.stringValue() which is null... java.lang.NullPointerException at org.apache.solr.schema.SortableIntField.write(SortableIntField.java:72) at org.apache.solr.schema.SchemaField.write(SchemaField.java:108) at org.apache.solr.request.XMLWriter.writeDoc(XMLWriter.java:311) at org.apache.solr.request.XMLWriter$3.writeDocs(XMLWriter.java:483) at org.apache.solr.request.XMLWriter.writeDocuments(XMLWriter.java:420) at org.apache.solr.request.XMLWriter.writeDocList(XMLWriter.java:457) at org.apache.solr.request.XMLWriter.writeVal(XMLWriter.java:520) at org.apache.solr.request.XMLWriter.writeResponse(XMLWriter.java:130) at org.apache.solr.request.XMLResponseWriter.write(XMLResponseWriter.java:34) at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:325) This really does smell like something in Lucene changed behavior drastically. I've been looking at diffs from java/tr...@691741 and java/tags/lucene_2_9_1 but nothing jumps out at me that would explain this. If nothing else, i'm opening a solr issue... -Hoss
StreamingUpdateSolrServer commit?
When does StreamingUpdateSolrServer commit? I know there's a threshold and thread pool as params but I don't see a commit timeout. Do I have to manage this myself?
Re: exclude some fields from copying dynamic fields | schema.xml
There is no direct way. Let's say you have a "nocopy_s" and you do not want a copy "nocopy_str_s". This might work: declare "nocopy_str_s" as a field and make it not indexed and not stored. I don't know if this will work. It requires two overrides to work: 1) that declaring a field name that matches a wildcard will override the default wildcard rule, and 2) that "stored=false indexed=false" works. On Fri, Nov 13, 2009 at 3:23 AM, Vicky_Dev wrote: > > Hi, > we are using the following entry in schema.xml to make a copy of one type of > dynamic field to another : > > > Is it possible to exclude some fields from copying. > > We are using Solr1.3 > > ~Vikrant > > -- > View this message in context: > http://old.nabble.com/exclude-some-fields-from-copying-dynamic-fields-%7C-schema.xml-tp26335109p26335109.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- Lance Norskog goks...@gmail.com
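A hedged schema.xml sketch of that idea, using the nocopy_str_s example above; the dynamic-field patterns are illustrative (the original copyField rule was not shown), and whether the explicit declaration really shadows the wildcard is exactly the untested part:

  <!-- illustrative dynamic fields plus the copy rule -->
  <dynamicField name="*_s"     type="string" indexed="true" stored="true"/>
  <dynamicField name="*_str_s" type="string" indexed="true" stored="true"/>
  <copyField source="*_s" dest="*_str_s"/>

  <!-- explicit field intended to override the wildcard and swallow the copy -->
  <field name="nocopy_str_s" type="string" indexed="false" stored="false"/>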
Re: Reseting doc boosts
This looks exactly like what I was needing ... this looks like it would be a great tool / addition to Solr web interface but it looks like it only takes (Directory d, Similarity s) (vs. subset collection of documents) ... Either way great find, thanks for your help ... - Jon On Nov 13, 2009, at 6:40 PM, Koji Sekiguchi wrote: > I'm not sure this is what you are looking for, > but there is FieldNormModifier tool in Lucene. > > Koji > > -- > > http://www.rondhuit.com/en/ > > > Avlesh Singh wrote: >> AFAIK there is no way to "reset" the doc boost. You would need to re-index. >> Moreover, there is no way to "search by boost". >> >> Cheers >> Avlesh >> >> On Fri, Nov 13, 2009 at 8:17 PM, Jon Baer wrote: >> >> >>> Hi, >>> >>> Im trying to figure out if there is an easy way to basically "reset" all of >>> any doc boosts which you have made (for analytical purposes) ... for example >>> if I run an index, gather report, doc boost on the report, and reset the >>> boosts @ time of next index ... >>> >>> It would seem to be from just knowing how Lucene works that I would really >>> need to reindex since its a attrib on the doc itself which would have to be >>> modified, but there is no easy way to query for docs which have been boosted >>> either. Any insight? >>> >>> Thanks. >>> >>> - Jon >>> >> >> >
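For reference, FieldNormModifier ships in Lucene's contrib misc jar and is driven from the command line against a closed index; roughly like the sketch below (the jar names and exact argument list may differ between Lucene versions, so treat this as an outline rather than the definitive invocation):

  java -cp lucene-core-2.9.1.jar:lucene-misc-2.9.1.jar \
       org.apache.lucene.misc.FieldNormModifier \
       /path/to/index org.apache.lucene.search.DefaultSimilarity myField otherField

Recomputing the norms of the listed fields from the given Similarity effectively discards any per-document boosts that were folded into those norms, which is what "resetting" boosts amounts to at the Lucene level.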
Re: Making search results more stable as index is updated
This is one case where permanent caches are interesting. Another case is highlighting: in some cases highlighting takes a lot of work, and this work is not cached. It might be a cleaner architecture to have session-maintaining code in a separate front-end app, and leave Solr session-free. On Fri, Nov 13, 2009 at 12:48 PM, Chris Harris wrote: > If documents are being added to and removed from an index (and commits > are being issued) while a user is searching, then the experience of > paging through search results using the obvious solr mechanism > (&start=100&Rows=10) may be disorienting for the user. For one > example, by the time the user clicks "next page" for the first time, a > document that they saw on page 1 may have been pushed onto page 2. > (This may be especially pronounced if docs are being sorted by date.) > > I'm wondering what are the best options available for presenting a > more stable set of search results to users in such cases. The obvious > candidates to me are: > > #1: Cache results in the user session of the web tier. (In particular, > maybe just cache the uniqueKey of each maching document.) > > Pro: Simple > Con: May require capping the # of search results in order to make > the initial query (which now has Solr numRows param >> web pageSize) > fast enough. For example, maybe it's only practical to cache the first > 500 records. > > #2: Create some kind of per-user results cache in Solr. (One simple > implementation idea: You could make your Solr search handler take a > userid parameter, and cache each user's last search in a special > per-user results cache. You then also provide an API that says, "give > me records n through m of userid #1334's last search". For your > subsequent queries, you consult the latter API rather than redoing > your search. Because Lucene docids are unstable across commits and > such, I think this means caching the uniqueKey of each maching > document. This in turn means looking up the uniqueKey of each maching > document at search time. It also means you can't use the existing Solr > caches, but need to make a new one.) > > Pro: Maybe faster than #1?? (Saves on data transfer between Solr and > web tier, at least during the initial query.) > Con: More complicated than #1. > > #3: Use filter queries to attempt to make your subsequent queries (for > page 2, page 3, etc.) return results consistent with your original > query. (One idea is to give each document a docAddedTimestamp field, > which would have precision down to the millisecond or something. On > your initial query, you could note the current time, T. Then for the > subsequent queries you add a filter query for docAddedTimestamp<=T. > Hopefully with a trie date field this would be fast. This should > hopefully keep any docs newly added after T from showing up in the > user's search results as they page through them. However, it won't > necessarily protect you from docs that were *reindexed* (i.e. re-add a > doc with the same uniqueKey as an existing doc) or docs that were > deleted.) > > Pro: Doesn't require a new cache, and no cap on # of search results > Con: Maybe doesn't provide total stability. > > Any feedback on these options? Are there other ideas to consider? > > Thanks, > Chris > -- Lance Norskog goks...@gmail.com
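As a concrete illustration of option #3, assuming a trie date field named docAddedTimestamp as described above and a cutoff T captured when the first page is requested (values simplified and URL-encoding of the range query omitted for readability):

  page 1 (capture T, e.g. 2009-11-13T20:48:00Z):
    /select?q=foo&fq=docAddedTimestamp:[* TO 2009-11-13T20:48:00Z]&start=0&rows=10

  page 2 and later reuse the same fq, so documents added after T never enter the result set:
    /select?q=foo&fq=docAddedTimestamp:[* TO 2009-11-13T20:48:00Z]&start=10&rows=10

Because the fq string is identical across pages, the filter is also served from Solr's filterCache after the first request.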
Re: StreamingUpdateSolrServer commit?
Unless I slept through it, you still need to explicitly commit, even with SUSS. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message > From: "erikea...@yahoo.com" > To: "solr-user@lucene.apache.org" > Sent: Fri, November 13, 2009 9:43:53 PM > Subject: StreamingUpdateSolrServer commit? > > > When does StreamingUpdateSolrServer commit? > > I know there's a threshhold and thread pool as params but I don't see a > commit > timeout. Do I have to manage this myself?
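A small SolrJ sketch of that pattern; the URL, queue size and thread count are illustrative:

  import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class BulkLoader {
    public static void main(String[] args) throws Exception {
      // queue size 100, 4 background threads streaming updates to Solr
      StreamingUpdateSolrServer server =
          new StreamingUpdateSolrServer("http://localhost:8983/solr", 100, 4);

      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "example-1");
      server.add(doc);   // buffered and sent by the worker threads

      // SUSS never commits on its own: either commit explicitly like this,
      // or rely on autoCommit configured in solrconfig.xml
      server.commit();
    }
  }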
Re: Fwd: Lucene MMAP Usage with Solr
I thought that was the way to use it (but I've never had to use it myself) and that it means memory through the roof, yes. If you look at the Solr Admin statistics page, does it show you which Directory you are using? For example, on 1 Solr instance I'm looking at I see: readerDir : org.apache.lucene.store.NIOFSDirectory@/mnt/ Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message > From: ST ST > To: solr-user@lucene.apache.org > Sent: Fri, November 13, 2009 6:03:57 PM > Subject: Fwd: Lucene MMAP Usage with Solr > > Folks, > > I am trying to get Lucene MMAP to work in solr. > > I am assuming that when I configure MMAP the entire index will be loaded > into RAM. > Is that the right assumption ? > > I have tried the following ways for using MMAP: > > Option 1. Using the solr config below for MMAP configuration > > -Dorg.apache.lucene.FSDirectory.class=org.apache.lucene.store.MMapDirectory > >With this config, when I start solr with a 30G index, I expected that the > RAM usage should go up, but it did not. > > Option 2. By Code Change > I made the following code change : > >Changed org.apache.solr.core.StandardDirectoryFactory to use MMAP instead > of FSDirectory. >Code snippet pasted below. > > > Could you help me to understand if these are the right way to use MMAP? > > Thanks much > /ST. > > Code SNippet for Option 2: > > package org.apache.solr.core; > /** > * Licensed to the Apache Software Foundation (ASF) under one or more > * contributor license agreements. See the NOTICE file distributed with > * this work for additional information regarding copyright ownership. > * The ASF licenses this file to You under the Apache License, Version 2.0 > * (the "License"); you may not use this file except in compliance with > * the License. You may obtain a copy of the License at > * > *http://www.apache.org/licenses/LICENSE-2.0 > * > * Unless required by applicable law or agreed to in writing, software > * distributed under the License is distributed on an "AS IS" BASIS, > * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. > * See the License for the specific language governing permissions and > * limitations under the License. > */ > > import java.io.File; > import java.io.IOException; > > import org.apache.lucene.store.Directory; > import org.apache.lucene.store.MMapDirectory; > > /** > * Directory provider which mimics original Solr FSDirectory based behavior. > * > */ > public class StandardDirectoryFactory extends DirectoryFactory { > > public Directory open(String path) throws IOException { > return MMapDirectory.open(new File(path)); > } > }
Re: Stop solr without losing documents
So I think the question is really: "If I stop the servlet container, does Solr issue a commit in the shutdown hook in order to ensure all buffered docs are persisted to disk before the JVM exits". I don't have the Solr source handy, but if I did, I'd look for "Shutdown", "Hook" and "finalize" in the code. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message > From: Chris Hostetter > To: solr-user@lucene.apache.org > Sent: Fri, November 13, 2009 4:09:00 PM > Subject: Re: Stop solr without losing documents > > > : which documents have been updated before a successful commit. Now > : stopping solr is as easy as kill -9. > > please don't kill -9 ... it's grossly overkill, and doesn't give your > servlet container a fair chance to cleanthings up. A lot of work has been > done to make Lucene indexes robust to hard terminations of the JVM (or > physical machine) but there's no reason to go out of your way to try and > stab it in the heart when you could just shut it down cleanly. > > that's not to say your appraoch isn't a good one -- if you only have one > client sending updates/commits then having it keep track of what was > indexed prior to the lasts successful commit is a viable way to dela with > what happens if solr stops responding (either because you shut it down, or > because it crashed for some other reason). > > Alternately, you could take advantage of the "enabled" feature from your > client (just have it test the enabled url ever N updates or so) and when > it sees that you have disabled the port it can send one last commit and > then stop sending updates until it sees the enabled URL work againg -- as > soon as you see the updates stop, you can safely shutdown hte port. > > > -Hoss
changes to highlighting config or syntax in 1.4?
I'm testing out the final release of Solr 1.4 as compared to the build I have been using from around June. I'm using the dismax handler for searches. I'm finding that highlighting is completely broken as compared to previously. Much more text is returned than it should be for each string in <lst name="highlighting">, but the search words are never highlighted in that response. Setting usePhraseHighlighter=false makes no difference. Any pointers appreciated. -Peter -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: Solr 1.3 query and index perf tank during optimize
Let's take a step back. Why do you need to optimize? You said: "As long as I'm not optimizing, search and indexing times are satisfactory." :) You don't need to optimize just because you are continuously adding and deleting documents. On the contrary! Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message > From: Jerome L Quinn > To: solr-user@lucene.apache.org > Sent: Thu, November 12, 2009 6:30:42 PM > Subject: Solr 1.3 query and index perf tank during optimize > > > Hi, everyone, this is a problem I've had for quite a while, > and have basically avoided optimizing because of it. However, > eventually we will get to the point where we must delete as > well as add docs continuously. > > I have a Solr 1.3 index with ~4M docs at around 90G. This is a single > instance running inside tomcat 6, so no replication. Merge factor is the > default 10. ramBufferSizeMB is 32. maxWarmingSearchers=4. > autoCommit is set at 3 sec. > > We continually push new data into the index, at somewhere between 1-10 docs > every 10 sec or so. Solr is running on a quad-core 3.0GHz server. > under IBM java 1.6. The index is sitting on a local 15K scsi disk. > There's nothing > else of substance running on the box. > > Optimizing the index takes about 65 min. > > As long as I'm not optimizing, search and indexing times are satisfactory. > > When I start the optimize, I see massive problems with timeouts pushing new > docs > into the index, and search times balloon. A typical search while > optimizing takes > about 1 min instead of a few seconds. > > Can anyone offer me help with fixing the problem? > > Thanks, > Jerry Quinn
Re: Solr 1.3 query and index perf tank during optimize
The 'maxSegments' feature is new with 1.4. I'm not sure that it will cause any less disk I/O during optimize. The 'mergeFactor=2' idea is not what you think: in this case the index is always "mostly optimized", so you never need to run optimize. Indexing is always slower, because you amortize the optimize time into little continuous chunks during indexing. You never stop indexing. You should not lose documents. On Fri, Nov 13, 2009 at 1:07 PM, Jerome L Quinn wrote: > > Mark Miller wrote on 11/12/2009 07:18:03 PM: >> Ah, the pains of optimization. Its kind of just how it is. One solution >> is to use two boxes and replication - optimize on the master, and then >> queries only hit the slave. Out of reach for some though, and adds many >> complications. > > Yes, in my use case 2 boxes isn't a great option. > > >> Another kind of option is to use the partial optimize feature: >> >> >> >> Using this, you can optimize down to n segments and take a shorter hit >> each time. > > Is this a 1.4 feature? I'm planning to migrate to 1.4, but it'll take a > while since > I have to port custom code forward, including a query parser. > > >> Also, if optimizing is so painful, you might lower the merge factor >> amortize that pain better. Thats another way to slowly get there - if >> you lower the merge factor, as merging takes place, the new merge factor >> will be respected, and semgents will merge down. A merge factor of 2 >> (the lowest) will make it so you only ever have 2 segments. Sometimes >> that works reasonably well - you could try 3-6 or something as well. >> Then when you do your partial optimizes (and eventually a full optimize >> perhaps), you want have so far to go. > > So this will slow down indexing but speed up optimize somewhat? > Unfortunately > right now I lose docs I'm indexing, as well slowing searching to a crawl. > Ugh. > > I've got plenty of CPU horsepower. This is where having the ability to > optimize > on another filesystem would be useful. > > Would it perhaps make sense to set up a master/slave on the same machine? > Then > I suppose I can have an index being optimized that might not clobber the > search. > Would new indexed items still be dropped on the floor? > > Thanks, > Jerry -- Lance Norskog goks...@gmail.com
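For reference, the partial optimize is just the ordinary optimize command with a maxSegments limit; a sketch assuming Solr 1.4 (target segment count illustrative):

  XML update message:
    <optimize maxSegments="4" waitFlush="true" waitSearcher="true"/>

  SolrJ, if your client version exposes the three-argument form:
    server.optimize(true, true, 4);   // waitFlush, waitSearcher, maxSegments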
Re: Fwd: Lucene MMAP Usage with Solr
http://publib.boulder.ibm.com/infocenter/iseries/v5r3/index.jsp?topic=/apis/mmap.htm Normally file I/O in a program means that the data is copied between the system I/O disk cache and the program's memory. Memory-mapping means that the program address space points to the disk I/O cache directly, there is no copying. In other words the program and the OS share the same memory. The OS tends to stream the entire file in but this is not required. Memory-mapping may be faster and may not, depending on your index sizes, memory access patterns, etc. On Fri, Nov 13, 2009 at 7:49 PM, Otis Gospodnetic wrote: > I thought that was the way to use it (but I've never had to use it myself) > and that it means memory through the roof, yes. > If you look at the Solr Admin statistics page, does it show you which > Directory you are using? > > For example, on 1 Solr instance I'm looking at I see: > > readerDir : org.apache.lucene.store.NIOFSDirectory@/mnt/ > > > Otis > -- > Sematext is hiring -- http://sematext.com/about/jobs.html?mls > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR > > > > - Original Message >> From: ST ST >> To: solr-user@lucene.apache.org >> Sent: Fri, November 13, 2009 6:03:57 PM >> Subject: Fwd: Lucene MMAP Usage with Solr >> >> Folks, >> >> I am trying to get Lucene MMAP to work in solr. >> >> I am assuming that when I configure MMAP the entire index will be loaded >> into RAM. >> Is that the right assumption ? >> >> I have tried the following ways for using MMAP: >> >> Option 1. Using the solr config below for MMAP configuration >> >> -Dorg.apache.lucene.FSDirectory.class=org.apache.lucene.store.MMapDirectory >> >> With this config, when I start solr with a 30G index, I expected that the >> RAM usage should go up, but it did not. >> >> Option 2. By Code Change >> I made the following code change : >> >> Changed org.apache.solr.core.StandardDirectoryFactory to use MMAP instead >> of FSDirectory. >> Code snippet pasted below. >> >> >> Could you help me to understand if these are the right way to use MMAP? >> >> Thanks much >> /ST. >> >> Code SNippet for Option 2: >> >> package org.apache.solr.core; >> /** >> * Licensed to the Apache Software Foundation (ASF) under one or more >> * contributor license agreements. See the NOTICE file distributed with >> * this work for additional information regarding copyright ownership. >> * The ASF licenses this file to You under the Apache License, Version 2.0 >> * (the "License"); you may not use this file except in compliance with >> * the License. You may obtain a copy of the License at >> * >> * http://www.apache.org/licenses/LICENSE-2.0 >> * >> * Unless required by applicable law or agreed to in writing, software >> * distributed under the License is distributed on an "AS IS" BASIS, >> * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. >> * See the License for the specific language governing permissions and >> * limitations under the License. >> */ >> >> import java.io.File; >> import java.io.IOException; >> >> import org.apache.lucene.store.Directory; >> import org.apache.lucene.store.MMapDirectory; >> >> /** >> * Directory provider which mimics original Solr FSDirectory based behavior. >> * >> */ >> public class StandardDirectoryFactory extends DirectoryFactory { >> >> public Directory open(String path) throws IOException { >> return MMapDirectory.open(new File(path)); >> } >> } > > -- Lance Norskog goks...@gmail.com
Re: changes to highlighting config or syntax in 1.4?
Apparently one of my conf files was broken - odd that I didn't see any exceptions. Anyhow - excuse my haste, I don't see the problem now. -Peter On Fri, Nov 13, 2009 at 11:06 PM, Peter Wolanin wrote: > I'm testing out the final release of Solr 1.4 as compared to the build > I have been using from around June. > > I'm using the dismax handler for searches. I'm finding that > highlighting is completely broken as compared to previously. Much > more text is returned than it should be for each string in <lst name="highlighting">, but the search words are never highlighted in > that response. Setting usePhraseHighlighter=false makes no > difference. > > Any pointers appreciated. > > -Peter > > -- > Peter M. Wolanin, Ph.D. > Momentum Specialist, Acquia. Inc. > peter.wola...@acquia.com > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: Reseting doc boosts
Yeah I ended up creating a "boosted" field for at least debugging, but might patch / extend / create my own FieldNormModifier using just that criterion + doing the reset. - Jon On Nov 13, 2009, at 12:21 PM, Avlesh Singh wrote: > AFAIK there is no way to "reset" the doc boost. You would need to re-index. > Moreover, there is no way to "search by boost". > > Cheers > Avlesh > > On Fri, Nov 13, 2009 at 8:17 PM, Jon Baer wrote: > >> Hi, >> >> I'm trying to figure out if there is an easy way to basically "reset" all of >> any doc boosts which you have made (for analytical purposes) ... for example >> if I run an index, gather report, doc boost on the report, and reset the >> boosts @ time of next index ... >> >> It would seem to be from just knowing how Lucene works that I would really >> need to reindex since it's an attribute on the doc itself which would have to be >> modified, but there is no easy way to query for docs which have been boosted >> either. Any insight? >> >> Thanks. >> >> - Jon
Re: Data import problem with child entity from different database
I am unable to get the file http://old.nabble.com/file/p26335171/dataimport.temp.xml On Fri, Nov 13, 2009 at 4:57 PM, Andrew Clegg wrote: > > > > Noble Paul നോബിള് नोब्ळ्-2 wrote: >> >> no obvious issues. >> you may post your entire data-config.xml >> > > Here it is, exactly as last attempt but with usernames etc. removed. > > Ignore the comments and the unused FileDataSource... > > http://old.nabble.com/file/p26335171/dataimport.temp.xml dataimport.temp.xml > > > Noble Paul നോബിള് नोब्ळ्-2 wrote: >> >> do w/o CachedSqlEntityProcessor first and then apply that later >> > > Yep, that was just a bit of a wild stab in the dark to see if it made any > difference. > > Thanks, > > Andrew. > > -- > View this message in context: > http://old.nabble.com/Data-import-problem-with-child-entity-from-different-database-tp26334948p26335171.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Stop solr without losing documents
I would go with polling Solr to find what is not yet there. In production, it is better to assume that things will break, and have backstop janitors that fix them. And then test those janitors regularly. On Fri, Nov 13, 2009 at 8:02 PM, Otis Gospodnetic wrote: > So I think the question is really: > "If I stop the servlet container, does Solr issue a commit in the shutdown > hook in order to ensure all buffered docs are persisted to disk before the > JVM exits". > > I don't have the Solr source handy, but if I did, I'd look for "Shutdown", > "Hook" and "finalize" in the code. > > Otis > -- > Sematext is hiring -- http://sematext.com/about/jobs.html?mls > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR > > > > - Original Message >> From: Chris Hostetter >> To: solr-user@lucene.apache.org >> Sent: Fri, November 13, 2009 4:09:00 PM >> Subject: Re: Stop solr without losing documents >> >> >> : which documents have been updated before a successful commit. Now >> : stopping solr is as easy as kill -9. >> >> please don't kill -9 ... it's grossly overkill, and doesn't give your >> servlet container a fair chance to cleanthings up. A lot of work has been >> done to make Lucene indexes robust to hard terminations of the JVM (or >> physical machine) but there's no reason to go out of your way to try and >> stab it in the heart when you could just shut it down cleanly. >> >> that's not to say your appraoch isn't a good one -- if you only have one >> client sending updates/commits then having it keep track of what was >> indexed prior to the lasts successful commit is a viable way to dela with >> what happens if solr stops responding (either because you shut it down, or >> because it crashed for some other reason). >> >> Alternately, you could take advantage of the "enabled" feature from your >> client (just have it test the enabled url ever N updates or so) and when >> it sees that you have disabled the port it can send one last commit and >> then stop sending updates until it sees the enabled URL work againg -- as >> soon as you see the updates stop, you can safely shutdown hte port. >> >> >> -Hoss > > -- Lance Norskog goks...@gmail.com
Re: javabin in .NET?
OK. Is there anyone trying it out? where is this code ? I can try to help .. On Fri, Nov 13, 2009 at 8:10 PM, Mauricio Scheffer wrote: > I meant the standard IO libraries. They are different enough that the code > has to be manually ported. There were some automated tools back when > Microsoft introduced .Net, but IIRC they never really worked. > > Anyway it's not a big deal, it should be a straightforward job. Testing it > thoroughly cross-platform is another thing though. > > 2009/11/13 Noble Paul നോബിള് नोब्ळ् > >> The javabin format does not have many dependencies. it may have 3-4 >> classes an that is it. >> >> On Fri, Nov 13, 2009 at 6:05 PM, Mauricio Scheffer >> wrote: >> > Nope. It has to be manually ported. Not so much because of the language >> > itself but because of differences in the libraries. >> > >> > >> > 2009/11/13 Noble Paul നോബിള് नोब्ळ् >> > >> >> Is there any tool to directly port java to .Net? then we can etxract >> >> out the client part of the javabin code and convert it. >> >> >> >> On Thu, Nov 12, 2009 at 9:56 PM, Erik Hatcher >> >> wrote: >> >> > Has anyone looked into using the javabin response format from .NET >> >> (instead >> >> > of SolrJ)? >> >> > >> >> > It's mainly a curiosity. >> >> > >> >> > How much better could performance/bandwidth/throughput be? How >> difficult >> >> > would it be to implement some .NET code (C#, I'd guess being the best >> >> > choice) to handle this response format? >> >> > >> >> > Thanks, >> >> > Erik >> >> > >> >> > >> >> >> >> >> >> >> >> -- >> >> - >> >> Noble Paul | Principal Engineer| AOL | http://aol.com >> >> >> > >> >> >> >> -- >> - >> Noble Paul | Principal Engineer| AOL | http://aol.com >> > -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Data import problem with child entity from different database
2009/11/13 Noble Paul നോബിള് नोब्ळ् : > am unable to get the file > http://old.nabble.com/file/p26335171/dataimport.temp.xml > > On Fri, Nov 13, 2009 at 4:57 PM, Andrew Clegg wrote: >> >> >> >> Noble Paul നോബിള് नोब्ळ्-2 wrote: >>> >>> no obvious issues. >>> you may post your entire data-config.xml >>> >> >> Here it is, exactly as last attempt but with usernames etc. removed. >> >> Ignore the comments and the unused FileDataSource... >> >> http://old.nabble.com/file/p26335171/dataimport.temp.xml dataimport.temp.xml >> >> >> Noble Paul നോബിള് नोब्ळ्-2 wrote: >>> >>> do w/o CachedSqlEntityProcessor first and then apply that later >>> >> >> Yep, that was just a bit of a wild stab in the dark to see if it made any >> difference. >> >> Thanks, >> >> Andrew. >> >> -- >> View this message in context: >> http://old.nabble.com/Data-import-problem-with-child-entity-from-different-database-tp26334948p26335171.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > > > -- > - > Noble Paul | Principal Engineer| AOL | http://aol.com > -- Lance Norskog goks...@gmail.com