Query boosting
Hi, I want to boost results through the query. I have 4 fields in our schema. If I search *deepak* then the results should come in this order - all *UPDBY* having deepak, then all *To* having deepak, then all *CC* having deepak, then all *BCC* having deepak. I am using the Standard request handler. Please help me with this. -- DEEPAK AGRAWAL +91-9379433455 GOOD LUCK.
Re: labeling facets and highlighting question
There's a ! missing in there, try {!key=label}. Regards, gwk On 2/18/2010 5:01 AM, adeelmahmood wrote: okay so if I don't want to do any excludes then I am assuming I should just put in {key=label}field .. I tried that and it doesn't work .. it says undefined field {key=label}field Lance Norskog-2 wrote: Here's the problem: the wiki page is confusing: http://wiki.apache.org/solr/SimpleFacetParameters#Tagging_and_excluding_Filters The line: q=mainquery&fq=status:public&fq={!tag=dt}doctype:pdf&facet=on&facet.field={!ex=dt}doctype is standalone, but the later line: facet.field={!ex=dt key=mylabel}doctype means 'change the long query from {!ex=dt}docType to {!ex=dt key=mylabel}docType'. 'tag=dt' creates a tag (name) for a filter query, and 'ex=dt' means 'exclude this filter query'. On Wed, Feb 17, 2010 at 4:30 PM, adeelmahmood wrote: simple question: I want to give a label to my facet queries instead of the name of the facet field .. I found documentation on the Solr site saying that I can do that by specifying the key local param .. syntax something like facet.field={!ex=dt%20key='By%20Owner'}owner I am just not sure what the ex=dt part does .. if I take it out .. it throws an error so it seems it's important, but what for ??? also I tried turning on highlighting and I can see that it adds the highlighting items list in the xml at the end .. but it only points out the ids of all the matching results .. it doesn't actually show the text data that it's making a match with // so I am getting something like this back ... instead of the actual text that's being matched .. isn't it supposed to do that and wrap the search terms in an em tag .. how come it's not doing that in my case; here is my schema -- View this message in context: http://old.nabble.com/labeling-facets-and-highlighting-question-tp27632747p27632747.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com
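[To make the local-params combination concrete, here is a hedged example pulling the pieces quoted above together; the field and tag names come straight from the wiki snippet, so adjust them to your own schema. The first line labels a facet field without any filter exclusion, the second tags a filter query and both excludes and relabels it:

  facet.field={!key=mylabel}doctype
  q=mainquery&fq=status:public&fq={!tag=dt}doctype:pdf&facet=on&facet.field={!ex=dt key=mylabel}doctype]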
Re: Upgrading Tika in Solr
Just a word of caution: I've been bitten by this bug, which affects Tika 0.6: https://issues.apache.org/jira/browse/PDFBOX-541 It causes the parser to go into an infinite loop, which isn't exactly great for server stability. Tika 0.4 is not affected in the same way - as far as I remember, the parser just fails on such PDF files. According to the Tika folks, PDFBox and Tika releases need to be synchronized, so it might be wise to hold off upgrading until the next Tika version has been released that contains the fixed PDFBox. Best regards - Christian On Wednesday 17 February 2010 11:40:50 am Liam O'Boyle wrote: > I just copied in the newer .jars and got rid of the old ones and > everything seemed to work smoothly enough. > > Liam > > On Tue, 2010-02-16 at 13:11 -0500, Grant Ingersoll wrote: > > I've got a task open to upgrade to 0.6. Will try to get to it this week. > > Upgrading is usually pretty trivial. > > > > On Feb 14, 2010, at 12:37 AM, Liam O'Boyle wrote: > > > Afternoon, > > > > > > I've got a large collections of documents which I'm attempting to add > > > to a Solr index using Tika via the ExtractingRequestHandler, but there > > > are a large number that it has problems with (PDFs, PPTX and XLS > > > documents mainly). > > > > > > I've tried them with the most recent stand alone version of Tika and it > > > handles most of the failing documents correctly. I tried using a > > > recent nightly build of Solr, but the same problems seem to occur. > > > > > > Are there instructions somewhere on installing a more recent Tika build > > > into Solr? > > > > > > Thanks, > > > Liam > > > > -- > > Grant Ingersoll > > http://www.lucidimagination.com/ > > > > Search the Lucene ecosystem using Solr/Lucene: > > http://www.lucidimagination.com/search > -- Christian Vogler, Ph.D. Institute for Language and Speech Processing, Athens, Greece
Re: Query boosting
Try using the dismax handler: http://wiki.apache.org/solr/DisMaxRequestHandler This would be a very good read for you. You would use the bq (boost query) parameter, and it should look something similar to.. &bq=UPDBY:deepak^5.0+TO:deepak^4.0+CC:deepak^3.0+BCC:deepak^2.0 Paul On Thu, Feb 18, 2010 at 12:28 AM, deepak agrawal wrote: > Hi, > > i want to boost the result through query. > i have 4 fields in our schema. > > > > > > > If i search *deepak* then result should come in that order - > > > All *UPDBY* having deepak then > All *To* having deepak then > All *CC* having deepak > All *BCC* having deepak > > I am using Standard request handler. Please help me on this. > -- > DEEPAK AGRAWAL > +91-9379433455 > GOOD LUCK. >
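[Putting that together, a hedged sketch of a complete dismax request for this case, assuming the four fields are indexed and searchable; defType=dismax can be replaced by a dedicated dismax request handler if you prefer:

  http://localhost:8983/solr/select?q=deepak&defType=dismax&qf=UPDBY+TO+CC+BCC&bq=UPDBY:deepak^5.0+TO:deepak^4.0+CC:deepak^3.0+BCC:deepak^2.0

An alternative that achieves the same ordering is to put the per-field boosts directly on qf instead of using bq, e.g. qf=UPDBY^5.0+TO^4.0+CC^3.0+BCC^2.0]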
Re: getting unexpected statscomponent values
solr-user wrote: Hossman, what do you mean by including a "TestCase"? Will create issue in Jira asap; I will include the URL, schema and some code to generate sample data. I think those are good for a TestCase. Koji -- http://www.rondhuit.com/en/
java.io.IOException: read past EOF after Solr 1.4.0
Using release-1.4.0 or trunk branch Solr and indexing example data and search 0 boosted word: http://localhost:8983/solr/select/?q=usb^0.0 I got the following exception: java.io.IOException: read past EOF at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:163) at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:157) at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38) at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:70) at org.apache.lucene.store.IndexInput.readLong(IndexInput.java:93) at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:210) at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:948) at org.apache.lucene.index.DirectoryReader.document(DirectoryReader.java:506) at org.apache.lucene.index.IndexReader.document(IndexReader.java:947) at org.apache.solr.search.SolrIndexReader.document(SolrIndexReader.java:444) at org.apache.solr.search.SolrIndexSearcher.doc(SolrIndexSearcher.java:427) at org.apache.solr.util.SolrPluginUtils.optimizePreFetchDocs(SolrPluginUtils.java:267) at org.apache.solr.handler.component.QueryComponent.doPrefetch(QueryComponent.java:278) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:185) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1313) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:341) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:244) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139) at org.mortbay.jetty.Server.handle(Server.java:285) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226) at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442) This cannot be reproduced with release-1.3.0... Koji -- http://www.rondhuit.com/en/
some scores to 0 using omitNorns=false
Hi, We did some tests with omitNorms=false. We have seen that on the last page of results we have some scores set to 0.0. These scores set to 0 are problematic for our sorters. Could it be some kind of bug? Regards, Raimon Bosch. -- View this message in context: http://old.nabble.com/some-scores-to-0-using-omitNorns%3Dfalse-tp27637436p27637436.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: java.io.IOException: read past EOF after Solr 1.4.0
2010/2/18 Koji Sekiguchi : > Using release-1.4.0 or trunk branch Solr and indexing > example data and search 0 boosted word: > > http://localhost:8983/solr/select/?q=usb^0.0 Confirmed - looks like Solr is requesting an incorrect docid. I'm looking into it. -Yonik http://www.lucidimagination.com
score computation for dismax handler
Hi, when a query is made across multiple fields in the dismax handler using the qf parameter, I have observed with debugQuery enabled that the resultant score is the max of the scores across the individual fields. But I want the resultant score to be the sum of the scores across fields (like the standard handler). Can anyone tell me how this can be achieved?
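[The thread as archived carries no reply, so treat this as a hedged pointer rather than the poster's solution: dismax exposes a tie parameter, the tie-breaker multiplier on the underlying DisjunctionMaxQuery. Each document's score is the maximum per-field score plus tie times the sum of the other field scores, so tie=1.0 effectively sums the scores across fields, e.g.:

  q=deepak&defType=dismax&qf=title+body&tie=1.0

where title and body stand in for whatever fields are listed in qf.]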
Re: Realtime search and facets with very frequent commits
Hi Janne, I *think* Ocean Realtime Search has been superseded by Lucene NRT search. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search :: http://search-hadoop.com/ - Original Message > From: Janne Majaranta > To: solr-user@lucene.apache.org > Sent: Thu, February 18, 2010 2:12:37 AM > Subject: Re: Realtime search and facets with very frequent commits > > Hi, > > Yes, I did play with mergeFactor. > I didn't play with mergePolicy. > > Wouldn't that affect indexing speed and possibly memory usage ? > I don't have any problems with indexing speed ( 1000 - 2000 docs / sec via > the standard HTTP API ). > > My problem is that I need very warm caches to get fast faceting, and the > autowarming of the caches takes too long compared to the frequency of > commits I'm having. > So a commit every minute means less than a minute time to warm the caches. > > To give you a idea of what kind of queries needs to be autowarmed in my app, > the logevents indexed as documents have timestamps with different > granularity used for faceting. > For example, to get count of logevents for every hour using faceting there's > a timestamp field with the format mmddhh ( for example: 2010021808 > meaning 2010-02-18 8am). > One use case is to get hourly counts over the whole index. A non-cached > query counting the hourly counts over the 40M documents index takes a > while.. > And to my understanding autowarming means something like that this kind of > query would be basically re-executed against a cold cache. Probably not > exactly how it works, but it "feels" like it would. > > Moving the commits to a smaller index while using sharding to have a > transparent view to the index from the client app seems to solve my problem. > > I'm not sure if the (upcoming?) NRT features would keep the caches more > persistent, probably not in a environment where docs get frequent updates / > deletes. > > Also, I'm closely following the Ocean Realtime Search project AND it's SOLR > integration. It sounds like it has the "dream features" to enable realtime > updates to the index. > > -Janne > > > 2010/2/18 Jan Høydahl / Cominvent > > > Hi, > > > > Have you tried playing with mergeFactor or even mergePolicy? > > > > -- > > Jan Høydahl - search architect > > Cominvent AS - www.cominvent.com > > > > On 16. feb. 2010, at 08.26, Janne Majaranta wrote: > > > > > Hey Dipti, > > > > > > Basically query optimizations + setting cache sizes to a very high level. > > > Other than that, the config is about the same as the out-of-the-box > > config > > > that comes with the Solr download. > > > > > > I haven't found a magic switch to get very fast query responses + facet > > > counts with the frequency of commits I'm having using one single SOLR > > > instance. > > > Adding some TOP queries for a certain type of user to static warming > > queries > > > just moved the time of autowarming the caches to the time it took to warm > > > the caches with static queries. > > > I've been staging a setup where there's a small solr instance receiving > > all > > > the updates and a large instance which doesn't receive the live feed of > > > updates. > > > The small index will be merged with the large index periodically (once a > > > week or once a month). > > > The two instances are seen by the client app as one instance using the > > > sharding features of SOLR. > > > The instances are running on the same server inside their own JVM / > > jetty. 
> > > > > > In this setup the caches are very HOT for the large index and queries are > > > extremely fast, and the small index is small enough to get extremely fast > > > queries without having to warm up the caches too much. > > > > > > Basically I'm able to have a commit frequency of 10 seconds in a 40M docs > > > index while counting TOP5 facets over 14 fields in 200ms. > > > In reality the commit frequency of 10 seconds comes from the fact that > > the > > > updates are going into a 1M - 2M documents index, and the fast facet > > counts > > > from the fact that the 38M documents index has hot caches and doesn't > > > receive any updates. > > > > > > Also, not running updates to the large index means that the SOLR instance > > > reading the large index uses about half the memory it used before when > > > running the updates to the large index. At least it does so on Win2k3. > > > > > > -Janne > > > > > > > > > 2010/2/15 dipti khullar > > > > > >> Hey Janne > > >> > > >> Can you please let me know what other optimizations are you talking > > about > > >> here. Because in our application we are committing in about 5 mins but > > >> still > > >> the response time is very low and at times there are some connection > > time > > >> outs also. > > >> > > >> Just wanted to confirm if you have done some major configuration changes > > >> which have proved beneficial. > > >> > > >> Thanks > > >> Dipti > > >> > > >> > > > >
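[For readers trying to replicate the small-index/large-index split described above, the "one transparent view" is just Solr's standard distributed search: the client queries one instance and lists both instances in the shards parameter. A hedged sketch, where the hosts, ports and facet field are placeholders:

  http://localhost:8983/solr/select?q=*:*&shards=localhost:8983/solr,localhost:7574/solr&facet=true&facet.field=loghour]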
Re: parsing strings into phrase queries
i gave it a rough shot Lance, if there's a better way to explain it, please edit On Wed, Feb 17, 2010 at 10:23 PM, Lance Norskog wrote: > That would be great. After reading this and the PositionFilter class I > still don't know how to use it. > > On Wed, Feb 17, 2010 at 12:38 PM, Robert Muir wrote: > > i think we can improve the docs/wiki to show this example use case, i > > noticed the wiki explanation for this filter gives a more complex > shingles > > example, which is interesting, but this seems to be a common problem and > > maybe we should add this use case. > > > > On Wed, Feb 17, 2010 at 1:54 PM, Chris Hostetter > > wrote: > > > >> > >> : take a look at PositionFilter > >> > >> Right, there was another thread recently where almost the exact same > issue > >> was discussed... > >> > >> http://old.nabble.com/Re%3A-Tokenizer-question-p27120836.html > >> > >> ..except that i was ignorant of the existence of PositionFilter when i > >> wrote that message. > >> > >> > >> > >> -Hoss > >> > >> > > > > > > -- > > Robert Muir > > rcm...@gmail.com > > > > > > -- > Lance Norskog > goks...@gmail.com > -- Robert Muir rcm...@gmail.com
Re: What is largest reasonable setting for ramBufferSizeMB?
Hi Tom, 32MB is very low, 320MB is medium, and I think you could go higher, just pick whichever garbage collector is good for throughput. I know Java 1.6 update 18 also has some Hotspot and maybe also GC fixes, so I'd use that. Finally, this sounds like a good use case for reindexing with Hadoop! Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search :: http://search-hadoop.com/ - Original Message > From: "Burton-West, Tom" > To: "solr-user@lucene.apache.org" > Sent: Wed, February 17, 2010 5:16:26 PM > Subject: What is largest reasonable setting for ramBufferSizeMB? > > Hello all, > > At some point we will need to re-build an index that totals about 2 > terrabytes > in size (split over 10 shards). At our current indexing speed we estimate > that > this will take about 3 weeks. We would like to reduce that time. It appears > that our main bottleneck is disk I/O. > We currently have ramBufferSizeMB set to 32 and our merge factor is 10. If > we > increase ramBufferSizeMB to 320, we avoid a merge and the 9 disk writes and > reads to merge 9+1 32MB segments into a 320MB segment. > > Assuming we allocate enough memory to the JVM, would it make sense to > increase > ramBufferSize to 3200MB? What are people's experiences with very large > ramBufferSizeMB sizes? > > Tom Burton-West > University of Michigan Library > www.hathitrust.org
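[For reference, ramBufferSizeMB is set in the index sections of solrconfig.xml; a sketch of the relevant block, using the "medium" value discussed above (pick the number to fit your heap):

  <indexDefaults>
    <ramBufferSizeMB>320</ramBufferSizeMB>
    <mergeFactor>10</mergeFactor>
  </indexDefaults>]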
Re: optimize is taking too much time
Hi, you should not optimize the index after each insert of a document; instead you should optimize it after inserting a good number of documents, because an optimize will merge all segments into one according to the settings of the Lucene index. thanks, Jagdish On Fri, Feb 12, 2010 at 4:01 PM, mklprasad wrote: > > hi > in my solr u have 1,42,45,223 records having some 50GB . > Now when iam loading a new record and when its trying optimize the docs its > taking 2 much memory and time > > > can any body please tell do we have any property in solr to get rid of > this. > > Thanks in advance > > -- > View this message in context: > http://old.nabble.com/optimize-is-taking-too-much-time-tp27561570p27561570.html > Sent from the Solr - User mailing list archive at Nabble.com. > >
Re: getting unexpected statscomponent values
SOLR makes heavy use of JUnit for testing. The real advantage of a JUnit testcase being attached is that it can then be permanently incorporated into the SOLR builds. If you're unfamiliar with JUnit, then providing the raw data that illustrates the bug allows people who work on SOLR to save a bunch of time trying to reproduce the problem. It also insures that they are addressing what you're seeing ... It's especially helpful if you can take a bit of time to pare away all the unnecessary stuff in your example files and/or comment what you think the important bits are. HTH Erick On Wed, Feb 17, 2010 at 5:46 PM, solr-user wrote: > > > hossman wrote: > > > > > > That does look really weird, and definitely seems like a bug. > > > > Can you open an issue in Jira? ... ideally with a TestCase (even if it's > > not a JUnit test case, just having some sample docs that can be indexed > > against the example schema and a URL showing the problem would be > helpful) > > > > > > Hossman, what do you mean by including a "TestCase"? > > Will create issue in Jira asap; I will include the URL, schema and some > code > to generate sample data > -- > View this message in context: > http://old.nabble.com/getting-unexpected-statscomponent-values-tp27599248p27631633.html > Sent from the Solr - User mailing list archive at Nabble.com. > >
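[For anyone unfamiliar with what such a test case looks like, here is a minimal sketch against Solr's JUnit test harness; the class name, field names and asserted values are illustrative only, and the exact base-class API may differ between Solr versions:

  import org.apache.solr.util.AbstractSolrTestCase;

  public class StatsComponentBugTest extends AbstractSolrTestCase {
    public String getSchemaFile()     { return "schema.xml"; }
    public String getSolrConfigFile() { return "solrconfig.xml"; }

    public void testStatsOnSampleDocs() {
      // index a couple of documents that reproduce the problem
      assertU(adoc("id", "1", "price", "10.0"));
      assertU(adoc("id", "2", "price", "20.0"));
      assertU(commit());
      // assert something about the response; here, simply that both docs match
      assertQ(req("q", "*:*", "stats", "true", "stats.field", "price"),
              "//result[@numFound='2']");
    }
  }]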
Re: some scores to 0 using omitNorns=false
I was gonna ask a question about this but you seem like you might have the answer for me .. what exactly does the omitNorms option do (or what is it expected to do) .. also could you please help me understand what the termVectors and multiValued options do ?? Thanks for your help Raimon Bosch wrote: > > > Hi, > > We did some tests with omitNorms=false. We have seen that in the last > result's page we have some scores set to 0.0. This scores setted to 0 are > problematic to our sorters. > > It could be some kind of bug? > > Regrads, > Raimon Bosch. > -- View this message in context: http://old.nabble.com/some-scores-to-0-using-omitNorns%3Dfalse-tp27637436p27637819.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Realtime search and facets with very frequent commits
Hi Otis, Ok, now I'm confused ;) There seems to be a bit activity though when looking at the "last updated" timestamps in the google code project wiki: http://code.google.com/p/oceansearch/w/list The Tag Index feature sounds very interesting. -Janne 2010/2/18 Otis Gospodnetic > Hi Janne, > > I *think* Ocean Realtime Search has been superseded by Lucene NRT search. > > Otis > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch > Hadoop ecosystem search :: http://search-hadoop.com/ > > > > - Original Message > > From: Janne Majaranta > > To: solr-user@lucene.apache.org > > Sent: Thu, February 18, 2010 2:12:37 AM > > Subject: Re: Realtime search and facets with very frequent commits > > > > Hi, > > > > Yes, I did play with mergeFactor. > > I didn't play with mergePolicy. > > > > Wouldn't that affect indexing speed and possibly memory usage ? > > I don't have any problems with indexing speed ( 1000 - 2000 docs / sec > via > > the standard HTTP API ). > > > > My problem is that I need very warm caches to get fast faceting, and the > > autowarming of the caches takes too long compared to the frequency of > > commits I'm having. > > So a commit every minute means less than a minute time to warm the > caches. > > > > To give you a idea of what kind of queries needs to be autowarmed in my > app, > > the logevents indexed as documents have timestamps with different > > granularity used for faceting. > > For example, to get count of logevents for every hour using faceting > there's > > a timestamp field with the format mmddhh ( for example: 2010021808 > > meaning 2010-02-18 8am). > > One use case is to get hourly counts over the whole index. A non-cached > > query counting the hourly counts over the 40M documents index takes a > > while.. > > And to my understanding autowarming means something like that this kind > of > > query would be basically re-executed against a cold cache. Probably not > > exactly how it works, but it "feels" like it would. > > > > Moving the commits to a smaller index while using sharding to have a > > transparent view to the index from the client app seems to solve my > problem. > > > > I'm not sure if the (upcoming?) NRT features would keep the caches more > > persistent, probably not in a environment where docs get frequent updates > / > > deletes. > > > > Also, I'm closely following the Ocean Realtime Search project AND it's > SOLR > > integration. It sounds like it has the "dream features" to enable > realtime > > updates to the index. > > > > -Janne > > > > > > 2010/2/18 Jan Høydahl / Cominvent > > > > > Hi, > > > > > > Have you tried playing with mergeFactor or even mergePolicy? > > > > > > -- > > > Jan Høydahl - search architect > > > Cominvent AS - www.cominvent.com > > > > > > On 16. feb. 2010, at 08.26, Janne Majaranta wrote: > > > > > > > Hey Dipti, > > > > > > > > Basically query optimizations + setting cache sizes to a very high > level. > > > > Other than that, the config is about the same as the out-of-the-box > > > config > > > > that comes with the Solr download. > > > > > > > > I haven't found a magic switch to get very fast query responses + > facet > > > > counts with the frequency of commits I'm having using one single SOLR > > > > instance. > > > > Adding some TOP queries for a certain type of user to static warming > > > queries > > > > just moved the time of autowarming the caches to the time it took to > warm > > > > the caches with static queries. 
> > > > I've been staging a setup where there's a small solr instance > receiving > > > all > > > > the updates and a large instance which doesn't receive the live feed > of > > > > updates. > > > > The small index will be merged with the large index periodically > (once a > > > > week or once a month). > > > > The two instances are seen by the client app as one instance using > the > > > > sharding features of SOLR. > > > > The instances are running on the same server inside their own JVM / > > > jetty. > > > > > > > > In this setup the caches are very HOT for the large index and queries > are > > > > extremely fast, and the small index is small enough to get extremely > fast > > > > queries without having to warm up the caches too much. > > > > > > > > Basically I'm able to have a commit frequency of 10 seconds in a 40M > docs > > > > index while counting TOP5 facets over 14 fields in 200ms. > > > > In reality the commit frequency of 10 seconds comes from the fact > that > > > the > > > > updates are going into a 1M - 2M documents index, and the fast facet > > > counts > > > > from the fact that the 38M documents index has hot caches and doesn't > > > > receive any updates. > > > > > > > > Also, not running updates to the large index means that the SOLR > instance > > > > reading the large index uses about half the memory it used before > when > > > > running the updates to the large index. At least it does so on > Win2k3. > > > > > > > > -Janne > > > > > > > > > > > > 2010/2/15 dipti
Re: some scores to 0 using omitNorns=false
I am not an expert in the Lucene scoring formula, but omitNorms=false makes the scoring formula a little bit more complex, taking into account boosting for fields and documents. If I'm not wrong (if I am, please correct me) I think that with omitNorms=false the score takes into account queryNorm(q) and norm(t,d) from the formula:

  score(q,d) = coord(q,d) · queryNorm(q) · ∑ ( tf(t in d) · idf(t)² · t.getBoost() · norm(t,d) )

so the formula will be more complex. See http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html and http://old.nabble.com/scores-are-the-same-for-many-diferent-documents-td27623039.html#a27623039

The multiValued option is used to create fields with multiple values. We use it in one of our indexes, modifying the schema.xml and adding a new multiValued field, 's_similar_name'. This field is populated by a specific UpdateRequestProcessorFactory (written by us) from a comma-separated field called 's_similar_names':

  public void processAdd(AddUpdateCommand cmd) throws IOException {
    SolrInputDocument doc = cmd.getSolrInputDocument();
    String v = (String) doc.getFieldValue("s_similar_names");
    if (v != null) {
      String s_similar_names[] = v.split(",");
      for (String s_similar_name : s_similar_names) {
        if (!s_similar_name.equals(""))
          doc.addField("s_similar_name", s_similar_name);
      }
    }
    // pass it up the chain
    super.processAdd(cmd);
  }

The processor factory is registered in an update request processor chain (we called ours 'mychain') in solrconfig.xml, and that chain is hooked into the XmlUpdateRequestHandler in solrconfig.xml.

termVectors is used to save more info about the terms of a document in the index and save computational time in functions like MoreLikeThis. See http://wiki.apache.org/solr/TermVectorComponent. We don't use it.

adeelmahmood wrote: > > I was gonna ask a question about this but you seem like you might have the > answer for me .. wat exactly is the omitNorms field do (or is expected to > do) .. also if you could please help me understand what termVectors and > multiValued options do ?? > Thanks for ur help > > > Raimon Bosch wrote: >> >> >> Hi, >> >> We did some tests with omitNorms=false. We have seen that in the last >> result's page we have some scores set to 0.0. This scores setted to 0 are >> problematic to our sorters. >> >> It could be some kind of bug? >> >> Regrads, >> Raimon Bosch. >> > > -- View this message in context: http://old.nabble.com/some-scores-to-0-using-omitNorns%3Dfalse-tp27637436p27637827.html Sent from the Solr - User mailing list archive at Nabble.com.
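[The schema and solrconfig XML snippets referenced in the message above were stripped by the list archive; what follows is a hedged reconstruction of what a Solr 1.4 setup along these lines typically looks like. The custom factory class name is a placeholder, and the assumption that the chain is selected via the update.processor default applies to Solr 1.4:

  <!-- schema.xml: the multiValued target field -->
  <field name="s_similar_name" type="string" indexed="true" stored="true" multiValued="true"/>

  <!-- solrconfig.xml: register the custom processor in a chain -->
  <updateRequestProcessorChain name="mychain">
    <processor class="com.example.SimilarNamesProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
    <processor class="solr.LogUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

  <!-- solrconfig.xml: make the XML update handler use the chain -->
  <requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
    <lst name="defaults">
      <str name="update.processor">mychain</str>
    </lst>
  </requestHandler>]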
including 'the' dismax query kills results
I've noticed some peculiar behavior with the dismax SearchHandler. In my case I'm making the search "The British Open", and am getting 0 results. When I change it to "British Open" I get many hits. I looked at the query analyzer and it should be broken down into "british" and "open" tokens ('the' is a stopword). I imagine it is doing an 'and' type search, and by setting the 'mm' parameter to 1 I once again get results for 'the british open'. I would like mm to be 100%, however, and just not care about stopwords. Is there a way to do this? Thanks, -Kal
Re: optimize is taking too much time
Hi, You can also make use of the autocommit feature of Solr. You have two possibilities: either based on the max number of uncommitted docs or based on time. See the autoCommit settings in the updateHandler section of your solrconfig.xml (example below). Once you're done with adding, run a final optimize/commit. Regards, P.N.Raju, From: Jagdish Vasani To: solr-user@lucene.apache.org Sent: Thu, February 18, 2010 3:12:15 PM Subject: Re: optimize is taking too much time Hi, you should not optimize the index after each insert of a document; instead you should optimize it after inserting a good number of documents, because an optimize will merge all segments into one according to the settings of the Lucene index. thanks, Jagdish On Fri, Feb 12, 2010 at 4:01 PM, mklprasad wrote: > > hi > in my solr u have 1,42,45,223 records having some 50GB . > Now when iam loading a new record and when its trying optimize the docs its > taking 2 much memory and time > > > can any body please tell do we have any property in solr to get rid of > this. > > Thanks in advance > > -- > View this message in context: > http://old.nabble.com/optimize-is-taking-too-much-time-tp27561570p27561570.html > Sent from the Solr - User mailing list archive at Nabble.com. > >
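[The autocommit example in the message above appears to have been stripped by the archive; a typical Solr 1.4 autoCommit block inside updateHandler looks roughly like this, with thresholds that are illustrative rather than a recommendation:

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxDocs>10000</maxDocs>   <!-- commit after this many uncommitted docs -->
      <maxTime>60000</maxTime>   <!-- or after this many milliseconds -->
    </autoCommit>
  </updateHandler>]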
Re: dataimporthandler and expungeDeletes=false
I found the error. The uniqueKey definition in schema.xml was not set to the primary key field/column returned by the deletedPkQuery. Jorg On Wed, Feb 17, 2010 at 11:38 AM, Jorg Heymans wrote: > Looking closer at the documentation, it appears that expungeDeletes in fact > has nothing to do with 'removing deleted documents from the index' as I > thought before: > > > http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22commit.22_and_.22optimize.22 > > > expungeDeletes = "true" | "false" — default is false — merge segments with > deletes away. > > Is this correct ? > > FWIW I worked around the issue by adding a removed flag to my data and > sending <delete> and <commit> commands after delta import but it would have > been so much nicer to be able to do this all from DIH. > > Has anybody been able to get deletedPkQuery to work for deleting documents > during delta import ? > > Jorg > > On Tue, Feb 16, 2010 at 3:57 PM, Jorg Heymans wrote: > >> Hi, >> >> Can anybody tell me if [1] still applies as of version trunk 03/02/2010 ? >> I am removing documents from my index using deletedPkQuery and a >> deltaimport. I can tell from the logs that the removal seems to be working: >> >> 16-Feb-2010 15:32:54 org.apache.solr.handler.dataimport.DocBuilder >> collectDelta >> INFO: Completed parentDeltaQuery for Entity: attachment >> 16-Feb-2010 15:32:54 org.apache.solr.handler.dataimport.DocBuilder >> deleteAll >> INFO: Deleting stale documents >> 16-Feb-2010 15:32:54 org.apache.solr.handler.dataimport.SolrWriter >> deleteDoc >> INFO: Deleting document: 33053 >> 16-Feb-2010 15:32:54 org.apache.solr.core.SolrDeletionPolicy onInit >> INFO: SolrDeletionPolicy.onInit: commits:num=1 >> >> >> commit{dir=D:\lib\apache-solr-1.5-dev\example\solr\project\data\index,segFN=segments_1y,version=1265210107838,generation=70,filenames=[_2v.prx, >> _2v.fnm, _2v.tis, _2v.fdt, _2v.frq, segments_1y, _2v.fdx, _2v.tii] >> 16-Feb-2010 15:32:54 org.apache.solr.core.SolrDeletionPolicy updateCommits >> INFO: newest commit = 1265210107838 >> 16-Feb-2010 15:32:54 org.apache.solr.handler.dataimport.DocBuilder doDelta >> INFO: Delta Import completed successfully >> 16-Feb-2010 15:32:54 org.apache.solr.handler.dataimport.DocBuilder finish >> INFO: Import completed successfully >> 16-Feb-2010 15:32:54 org.apache.solr.update.DirectUpdateHandler2 commit >> INFO: start >> commit(optimize=true,waitFlush=false,waitSearcher=true,expungeDeletes=false) >> 16-Feb-2010 15:32:54 org.apache.solr.search.SolrIndexSearcher >> INFO: Opening searc...@182c2d9 main >> 16-Feb-2010 15:32:54 org.apache.solr.update.DirectUpdateHandler2 commit >> INFO: end_commit_flush >> >> However when i search the index the removed data is still present, >> presumably because the DirectUpdateHandler2 does not automatically do >> expungeDeletes ? Can i configure this somewhere in solrconfig.xml (SOLR-1275 >> was not very clear exactly what needs to be done to activate this behaviour) >> ? >> >> Thanks >> Jorg >> >> [1] http://marc.info/?l=solr-user&m=125962049425151&w=2 >> > >
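[For readers hitting the same thing, a hedged sketch of how deletedPkQuery is typically wired up in a DIH data-config.xml; the table and column names here are invented for illustration, and the key point is that the column returned by deletedPkQuery must map onto the field declared as the schema's uniqueKey:

  <entity name="attachment" pk="id"
          query="select id, name from attachment"
          deltaQuery="select id from attachment where last_modified &gt; '${dataimporter.last_index_time}'"
          deletedPkQuery="select id from attachment_deleted where deleted_at &gt; '${dataimporter.last_index_time}'">
    <field column="id" name="id"/>
    <field column="name" name="name"/>
  </entity>]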
Re: Realtime search and facets with very frequent commits
Janne, I don't think there's any activity happening there. SOLR-1606 is the tracking issue for moving to per segment facets and docsets. I haven't had an immediate commercial need to implement those. Jason On Thu, Feb 18, 2010 at 7:04 AM, Janne Majaranta wrote: > Hi Otis, > > Ok, now I'm confused ;) > There seems to be a bit activity though when looking at the "last updated" > timestamps in the google code project wiki: > http://code.google.com/p/oceansearch/w/list > > The Tag Index feature sounds very interesting. > > -Janne > > > 2010/2/18 Otis Gospodnetic > >> Hi Janne, >> >> I *think* Ocean Realtime Search has been superseded by Lucene NRT search. >> >> Otis >> >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch >> Hadoop ecosystem search :: http://search-hadoop.com/ >> >> >> >> - Original Message >> > From: Janne Majaranta >> > To: solr-user@lucene.apache.org >> > Sent: Thu, February 18, 2010 2:12:37 AM >> > Subject: Re: Realtime search and facets with very frequent commits >> > >> > Hi, >> > >> > Yes, I did play with mergeFactor. >> > I didn't play with mergePolicy. >> > >> > Wouldn't that affect indexing speed and possibly memory usage ? >> > I don't have any problems with indexing speed ( 1000 - 2000 docs / sec >> via >> > the standard HTTP API ). >> > >> > My problem is that I need very warm caches to get fast faceting, and the >> > autowarming of the caches takes too long compared to the frequency of >> > commits I'm having. >> > So a commit every minute means less than a minute time to warm the >> caches. >> > >> > To give you a idea of what kind of queries needs to be autowarmed in my >> app, >> > the logevents indexed as documents have timestamps with different >> > granularity used for faceting. >> > For example, to get count of logevents for every hour using faceting >> there's >> > a timestamp field with the format mmddhh ( for example: 2010021808 >> > meaning 2010-02-18 8am). >> > One use case is to get hourly counts over the whole index. A non-cached >> > query counting the hourly counts over the 40M documents index takes a >> > while.. >> > And to my understanding autowarming means something like that this kind >> of >> > query would be basically re-executed against a cold cache. Probably not >> > exactly how it works, but it "feels" like it would. >> > >> > Moving the commits to a smaller index while using sharding to have a >> > transparent view to the index from the client app seems to solve my >> problem. >> > >> > I'm not sure if the (upcoming?) NRT features would keep the caches more >> > persistent, probably not in a environment where docs get frequent updates >> / >> > deletes. >> > >> > Also, I'm closely following the Ocean Realtime Search project AND it's >> SOLR >> > integration. It sounds like it has the "dream features" to enable >> realtime >> > updates to the index. >> > >> > -Janne >> > >> > >> > 2010/2/18 Jan Høydahl / Cominvent >> > >> > > Hi, >> > > >> > > Have you tried playing with mergeFactor or even mergePolicy? >> > > >> > > -- >> > > Jan Høydahl - search architect >> > > Cominvent AS - www.cominvent.com >> > > >> > > On 16. feb. 2010, at 08.26, Janne Majaranta wrote: >> > > >> > > > Hey Dipti, >> > > > >> > > > Basically query optimizations + setting cache sizes to a very high >> level. >> > > > Other than that, the config is about the same as the out-of-the-box >> > > config >> > > > that comes with the Solr download. 
>> > > > >> > > > I haven't found a magic switch to get very fast query responses + >> facet >> > > > counts with the frequency of commits I'm having using one single SOLR >> > > > instance. >> > > > Adding some TOP queries for a certain type of user to static warming >> > > queries >> > > > just moved the time of autowarming the caches to the time it took to >> warm >> > > > the caches with static queries. >> > > > I've been staging a setup where there's a small solr instance >> receiving >> > > all >> > > > the updates and a large instance which doesn't receive the live feed >> of >> > > > updates. >> > > > The small index will be merged with the large index periodically >> (once a >> > > > week or once a month). >> > > > The two instances are seen by the client app as one instance using >> the >> > > > sharding features of SOLR. >> > > > The instances are running on the same server inside their own JVM / >> > > jetty. >> > > > >> > > > In this setup the caches are very HOT for the large index and queries >> are >> > > > extremely fast, and the small index is small enough to get extremely >> fast >> > > > queries without having to warm up the caches too much. >> > > > >> > > > Basically I'm able to have a commit frequency of 10 seconds in a 40M >> docs >> > > > index while counting TOP5 facets over 14 fields in 200ms. >> > > > In reality the commit frequency of 10 seconds comes from the fact >> that >> > > the >> > > > updates are going into a 1M - 2M documents index, and the fast facet >> > > counts >> > > >
replications issue
Hi all, I've set up Solr replication as described in the wiki. When I start the replication, a directory called index.$numbers is created; after a while it disappears and a new index.$othernumbers is created. index/ remains untouched with an empty index. Any clue? Thank you in advance, Riccardo -- ciao, giskard
Schema error unknown field
I'm getting the following exception SEVERE: org.apache.solr.common.SolrException: ERROR:unknown field 'desc' I'm wondering what I need to do in order to add the "desc" field to the Solr schema for indexing?
@Field annotation support
Hello All, When I use Maven or Eclipse to try and compile my bean which has the @Field annotation as specified in http://wiki.apache.org/solr/Solrj page ... the compiler doesn't find any class to support the annotation. What jar should we use to bring in this custom Solr annotation?
Re: Schema error unknown field
Adding desc as a <field> in your schema.xml file would be my first guess. Providing some explanation of what you're trying to do would help diagnose your issues. HTH Erick On Thu, Feb 18, 2010 at 12:21 PM, Pulkit Singhal wrote: > I'm getting the following exception > SEVERE: org.apache.solr.common.SolrException: ERROR:unknown field 'desc' > > I'm wondering what I need to do in order to add the "desc" field to > the Solr schema for indexing? >
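[For the record, a minimal sketch of what such an entry could look like in schema.xml; the "text" type assumes the stock example schema, so pick whatever type and indexed/stored flags fit your data:

  <field name="desc" type="text" indexed="true" stored="true"/>]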
Re: parsing strings into phrase queries
The PositionFilter worked great for my purpose along with another filter that I build. In my case, my indexed data may be something like "X150". So, a query for "Nokia X150" should match. But I don't want random matches on "x". However, if my indexed data is "G7", I do want a query on "PowerShot G7" to match on "g" and "7". So, a simple length filter will not do. Instead I build a custom filter (that I am willing to contribute back) that filters out singletons that are surrounded by longer tokens (3 or more by default). So, "PowerShot G7" becomes "power" "shot" "g" "7", but "Nokia X150" becomes "nokia" "150". And then I put the results of this into a PositionFilter. This allows "Nokia X150ABC" to match against the "X150" part. So far I really like this for partial part number searches. And then to boost exact matches, I used copyField to create another field without PositionFilter. And then did an optional phrase query on that. From: Lance Norskog To: solr-user@lucene.apache.org Sent: Wed, February 17, 2010 7:23:23 PM Subject: Re: parsing strings into phrase queries That would be great. After reading this and the PositionFilter class I still don't know how to use it. On Wed, Feb 17, 2010 at 12:38 PM, Robert Muir wrote: > i think we can improve the docs/wiki to show this example use case, i > noticed the wiki explanation for this filter gives a more complex shingles > example, which is interesting, but this seems to be a common problem and > maybe we should add this use case. > > On Wed, Feb 17, 2010 at 1:54 PM, Chris Hostetter > wrote: > >> >> : take a look at PositionFilter >> >> Right, there was another thread recently where almost the exact same issue >> was discussed... >> >> http://old.nabble.com/Re%3A-Tokenizer-question-p27120836.html >> >> ..except that i was ignorant of the existence of PositionFilter when i >> wrote that message. >> >> >> >> -Hoss >> >> > > > -- > Robert Muir > rcm...@gmail.com > -- Lance Norskog goks...@gmail.com
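[For readers wanting to reproduce this setup, a hedged sketch of the schema side; the custom filter's factory class is hypothetical, and the surrounding analyzer chain is illustrative rather than the poster's exact configuration:

  <fieldType name="partno" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="com.example.SurroundedSingletonFilterFactory"/> <!-- hypothetical custom filter described above -->
      <filter class="solr.PositionFilterFactory"/>
    </analyzer>
  </fieldType>

  <field name="partno" type="partno" indexed="true" stored="true"/>
  <field name="partno_exact" type="text" indexed="true" stored="false"/>
  <copyField source="partno" dest="partno_exact"/>]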
Re: Faceting
have you used UIMA? i did a quick read on the docs and it seems to do what i'm looking for. 2010/2/11 Otis Gospodnetic > Note that UIMA doesn't doe NER itself (as far as I know), but instead > relies on GATE or OpenNLP or OpenCalais, AFAIK :) > > Those interested in UIMA and living close to New York should go to > http://www.meetup.com/NYC-Search-and-Discovery/calendar/12384559/ > > > Otis > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch > Hadoop ecosystem search :: http://search-hadoop.com/ > > > > - Original Message > > From: Jan Høydahl / Cominvent > > To: solr-user@lucene.apache.org > > Sent: Tue, February 9, 2010 9:57:26 AM > > Subject: Re: Faceting > > > > NOTE: Please start a new email thread for a new topic (See > > http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking) > > > > Your strategy could work. You might want to look into dedicated entity > > extraction frameworks like > > http://opennlp.sourceforge.net/ > > http://nlp.stanford.edu/software/CRF-NER.shtml > > http://incubator.apache.org/uima/index.html > > > > Or if that is too much work, look at > > http://issues.apache.org/jira/browse/SOLR-1725 for a way to plug in your > entity > > extraction code into Solr itself using a scripting language. > > > > -- > > Jan Høydahl - search architect > > Cominvent AS - www.cominvent.com > > > > On 5. feb. 2010, at 20.10, José Moreira wrote: > > > > > Hello, > > > > > > I'm planning to index a 'content' field for search and from that > > > fields text content i would like to facet (probably) according to if > > > the content has e-mails, urls and within urls, url's to pictures, > > > videos and others. > > > > > > As i'm a relatively new user to Solr, my plan was to regexp the > > > content in my application and add tags to a Solr field according to > > > the content, so for example the content "m...@email.com > > > http://www.site.com"; would have the tags "email, link". > > > > > > If i follow this path can i then facet on "email" and/or "link" ? For > > > example combining facet field with facet value params? > > > > > > Best > > > > > > -- > > > http://pt.linkedin.com/in/josemoreira > > > josemore...@irc.freenode.net > > > http://djangopeople.net/josemoreira/ > > -- josemore...@irc.freenode.net http://pt.linkedin.com/in/josemoreira http://djangopeople.net/josemoreira/
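[On the original question buried in the quotes (faceting on extracted "email"/"link" tags), a hedged example of what the request could look like once the application writes those tags into a multiValued field; the field name content_tags is illustrative:

  http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=content_tags&fq=content_tags:email]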
Re: Realtime search and facets with very frequent commits
Ok, thanks. -Janne 2010/2/18 Jason Rutherglen > Janne, > > I don't think there's any activity happening there. > > SOLR-1606 is the tracking issue for moving to per segment facets and > docsets. I haven't had an immediate commercial need to implement > those. > > Jason > > On Thu, Feb 18, 2010 at 7:04 AM, Janne Majaranta > wrote: > > Hi Otis, > > > > Ok, now I'm confused ;) > > There seems to be a bit activity though when looking at the "last > updated" > > timestamps in the google code project wiki: > > http://code.google.com/p/oceansearch/w/list > > > > The Tag Index feature sounds very interesting. > > > > -Janne > > > > > > 2010/2/18 Otis Gospodnetic > > > >> Hi Janne, > >> > >> I *think* Ocean Realtime Search has been superseded by Lucene NRT > search. > >> > >> Otis > >> > >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch > >> Hadoop ecosystem search :: http://search-hadoop.com/ > >> > >> > >> > >> - Original Message > >> > From: Janne Majaranta > >> > To: solr-user@lucene.apache.org > >> > Sent: Thu, February 18, 2010 2:12:37 AM > >> > Subject: Re: Realtime search and facets with very frequent commits > >> > > >> > Hi, > >> > > >> > Yes, I did play with mergeFactor. > >> > I didn't play with mergePolicy. > >> > > >> > Wouldn't that affect indexing speed and possibly memory usage ? > >> > I don't have any problems with indexing speed ( 1000 - 2000 docs / sec > >> via > >> > the standard HTTP API ). > >> > > >> > My problem is that I need very warm caches to get fast faceting, and > the > >> > autowarming of the caches takes too long compared to the frequency of > >> > commits I'm having. > >> > So a commit every minute means less than a minute time to warm the > >> caches. > >> > > >> > To give you a idea of what kind of queries needs to be autowarmed in > my > >> app, > >> > the logevents indexed as documents have timestamps with different > >> > granularity used for faceting. > >> > For example, to get count of logevents for every hour using faceting > >> there's > >> > a timestamp field with the format mmddhh ( for example: 2010021808 > >> > meaning 2010-02-18 8am). > >> > One use case is to get hourly counts over the whole index. A > non-cached > >> > query counting the hourly counts over the 40M documents index takes a > >> > while.. > >> > And to my understanding autowarming means something like that this > kind > >> of > >> > query would be basically re-executed against a cold cache. Probably > not > >> > exactly how it works, but it "feels" like it would. > >> > > >> > Moving the commits to a smaller index while using sharding to have a > >> > transparent view to the index from the client app seems to solve my > >> problem. > >> > > >> > I'm not sure if the (upcoming?) NRT features would keep the caches > more > >> > persistent, probably not in a environment where docs get frequent > updates > >> / > >> > deletes. > >> > > >> > Also, I'm closely following the Ocean Realtime Search project AND it's > >> SOLR > >> > integration. It sounds like it has the "dream features" to enable > >> realtime > >> > updates to the index. > >> > > >> > -Janne > >> > > >> > > >> > 2010/2/18 Jan Høydahl / Cominvent > >> > > >> > > Hi, > >> > > > >> > > Have you tried playing with mergeFactor or even mergePolicy? > >> > > > >> > > -- > >> > > Jan Høydahl - search architect > >> > > Cominvent AS - www.cominvent.com > >> > > > >> > > On 16. feb. 
2010, at 08.26, Janne Majaranta wrote: > >> > > > >> > > > Hey Dipti, > >> > > > > >> > > > Basically query optimizations + setting cache sizes to a very high > >> level. > >> > > > Other than that, the config is about the same as the > out-of-the-box > >> > > config > >> > > > that comes with the Solr download. > >> > > > > >> > > > I haven't found a magic switch to get very fast query responses + > >> facet > >> > > > counts with the frequency of commits I'm having using one single > SOLR > >> > > > instance. > >> > > > Adding some TOP queries for a certain type of user to static > warming > >> > > queries > >> > > > just moved the time of autowarming the caches to the time it took > to > >> warm > >> > > > the caches with static queries. > >> > > > I've been staging a setup where there's a small solr instance > >> receiving > >> > > all > >> > > > the updates and a large instance which doesn't receive the live > feed > >> of > >> > > > updates. > >> > > > The small index will be merged with the large index periodically > >> (once a > >> > > > week or once a month). > >> > > > The two instances are seen by the client app as one instance using > >> the > >> > > > sharding features of SOLR. > >> > > > The instances are running on the same server inside their own JVM > / > >> > > jetty. > >> > > > > >> > > > In this setup the caches are very HOT for the large index and > queries > >> are > >> > > > extremely fast, and the small index is small enough to get > extremely > >> fast > >> > > > queries without having to warm up the caches t
Re: including 'the' dismax query kills results
Use the common grams filter; it'll create tokens for stop words and their adjacent terms. On Thu, Feb 18, 2010 at 7:16 AM, Nagelberg, Kallin wrote: > I've noticed some peculiar behavior with the dismax searchhandler. > > In my case I'm making the search "The British Open", and am getting 0 > results. When I change it to "British Open" I get many hits. I looked at the > query analyzer and it should be broken down to "british" and "open" tokens > ('the' is a stopword). I imagine it is doing an 'and' type search, and by > setting the 'mm' parameter to 1 I once again get results for 'the british > open'. I would like mm to be 100% however, but just not care about stopwords. > Is there a way to do this? > > Thanks, > -Kal >
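[A hedged sketch of what that looks like in a Solr 1.4 text field type in schema.xml; the index side uses CommonGramsFilterFactory and the query side typically uses CommonGramsQueryFilterFactory, both fed by the same stopwords file:

  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.CommonGramsQueryFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>]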
Re: Deleting spelll checker index
Thanks. If this is really the case, I declared a new field called mySpellTextDup and retired the original field. Now I have a new field which powers my dictionary with no words in it, and now I am free to index whichever terms I want. This is not the best solution but I can't think of a reasonable workaround. Thanks darniz Lance Norskog-2 wrote: > > This is a quirk of Lucene - when you delete a document, the indexed > terms for the document are not deleted. That is, if 2 documents have > the word 'frampton' in an indexed field, the term dictionary contains > the entry 'frampton' and pointers to those two documents. When you > delete those two documents, the index contains the entry 'frampton' > with an empty list of pointers. So, the terms are still there even > when you delete all of the documents. > > Facets and the spellchecking dictionary build from this term > dictionary, not from the text strings that are 'stored' and returned > when you search for the documents. > > The <optimize/> command throws away these remnant terms. > > http://www.lucidimagination.com/blog/2009/03/18/exploring-lucenes-indexing-code-part-2/ > > On Wed, Feb 17, 2010 at 12:24 PM, darniz wrote: >> >> Please bear with me on the limitted understanding. >> i deleted all documents and i made a rebuild of my spell checker using >> the >> command >> spellcheck=true&spellcheck.build=true&spellcheck.dictionary=default >> >> After this i went to the schema browser and i saw that mySpellText still >> has >> around 2000 values. >> How can i make sure that i clean up that field. >> We had the same issue with facets too, even though we delete all the >> documents, and if we do a facet on make we still see facets but we can >> filter out facets by saying facet.mincount>0. >> >> Again coming back to my question how can i make mySpellText fields get >> rid >> of all previous terms >> >> Thanks a lot >> darniz >> >> >> >> hossman wrote: >>> >>> : But still i cant stop thinking about this. >>> : i deleted my entire index and now i have 0 documents. >>> : >>> : Now if i make a query with accrd i still get a suggestion of accord >>> even >>> : though there are no document returned since i deleted my entire index. >>> i >>> : hope it also clear the spell check index field. >>> >>> there are two Lucene indexes when you use spell checking. >>> >>> there is the "main" index which is goverend by your schema.xml and is >>> what >>> you add your own documents to, and what searches are run agains for the >>> result section of solr responses. >>> >>> There is also the "spell" index which has only two fields and in >>> which each "document" corrisponds to a "word" that might be returend as >>> a >>> spelling suggestion, and the other fields contain various >>> start/end/middle >>> ngrams that represent possible misspellings. >>> >>> When you use the spellchecker component it builds the "spell" index >>> makinga document out of every word it finds in whatever field name you >>> configure it to use. >>> >>> deleting your entire "main" index won't automaticly delete the "spell" >>> index (allthough you should be able rebuild the "spell" index using the >>> *empty* "main" index, that should work). >>> >>> : i am copying both fields to a field called >>> : >>> : >>> >>> ..at this point your "main" index has a field named mySpellText, and for >>> ever document it contains a copy of make and model.
>>> >>> : >>> : default >>> : mySpellText >>> : true >>> : true >>> >>> ...so whenever you commit or optimize your "main" index it will take >>> every >>> word from the mySpellText and use them all as individual documents in >>> the >>> "spell" index. >>> >>> In your previous email you said you changed hte copyField declaration, >>> and >>> then triggered a commit -- that rebuilt your "spell" index, but the data >>> was still all there in the mySpellText field of the "main" index, so the >>> rebuilt "spell" index was exactly the same. >>> >>> : i have buildOnOPtmize and buildOnCommit as true so when i index new >>> document >>> : i want my dictionary to be created but how can i make sure i remove >>> the >>> : preivious indexed terms. >>> >>> everytime the spellchecker component "builds" it will create a >>> completley >>> new "spell" index .. but if the old data is still in the "main" index >>> then >>> it will also be in the "spell" index. >>> >>> The only reason i can think of why you'd be seeing words in your "spell" >>> index after deleting documents from your "main" index is that even if >>> you >>> delete documents, the Terms are still there in the underlying index >>> untill >>> the segments are merged ... so if you do an optimize that will force >>> them >>> to be expunged --- but i honestly have no idea if that is what's causing >>> your problem, because quite frankly i really don't understand what your >>> problem is ... you have to provide specifics: reproducible steps anyone >>> can take using a clean
Re: Schema error unknown field
I guess my n00b-ness is showing :) I started off using the instructions directly from http://wiki.apache.org/solr/Solrj and there was no mention of schema there, and even after getting this error and searching for schema.xml in the wiki ... I found no meaningful hits, so I thought it best to ask. With your advice, I searched for schema.xml and found 13 instances of it:
\solr_1.4.0\client\ruby\solr-ruby\solr\conf\schema.xml
\solr_1.4.0\client\ruby\solr-ruby\test\conf\schema.xml
\solr_1.4.0\contrib\clustering\src\test\resource\schema.xml
\solr_1.4.0\contrib\extraction\src\test\resource\schema.xml
\solr_1.4.0\contrib\velocity\src\main\solr\conf\schema.xml
\solr_1.4.0\example\example-DIH\solr\db\conf\schema.xml
\solr_1.4.0\example\example-DIH\solr\mail\conf\schema.xml
\solr_1.4.0\example\example-DIH\solr\rss\conf\schema.xml
\solr_1.4.0\example\multicore\core0\conf\schema.xml
\solr_1.4.0\example\multicore\core1\conf\schema.xml
\solr_1.4.0\example\solr\conf\schema.xml
\solr_1.4.0\src\test\test-files\solr\conf\schema.xml
\solr_1.4.0\src\test\test-files\solr\shared\conf\schema.xml
I took a wild guess and added the field I wanted ("desc") into this file since its name seemed to be the most generic one: C:\apps\solr_1.4.0\example\solr\conf\schema.xml And it worked ... a bit strange that an example directory is used, but I suppose it is configurable somewhere? Thanks for your help Erick! Cheers, - Pulkit On Thu, Feb 18, 2010 at 9:53 AM, Erick Erickson wrote: > Adding desc as a <field> in your schema.xml > file would be my first guess. > > Providing some explanation of what you're trying to do > would help diagnose your issues. > > HTH > Erick > > On Thu, Feb 18, 2010 at 12:21 PM, Pulkit Singhal wrote: > >> I'm getting the following exception >> SEVERE: org.apache.solr.common.SolrException: ERROR:unknown field 'desc' >> >> I'm wondering what I need to do in order to add the "desc" field to >> the Solr schema for indexing? >>
Re: Schema error unknown field
NP. And I see why you'd be confused... What's actually happening is that if you're using the tutorial to make things run, a lot is happening under the covers. In particular, you're switching to the solr/example directory where you're invoking the start.jar, which is pre-configured to bring up the...you guessed it... example project. This is purposely designed so you have to do the absolute minimal work to get something to play with up and running, but once you start trying anything new, you quickly need to get deeper into what's really happening to make progress. It's the usual trade-off between having to set up an entire installation before doing your first "hello world" program (which would include setting up a servlet container, placing everything in the right place, creating a schema from scratch, ad nauseum) and having something that "just works" but conceals lots and lots of details. The Manning book for SOLR or LucidWorks are good resources Erick On Thu, Feb 18, 2010 at 3:03 PM, Pulkit Singhal wrote: > I guess my n00b-ness is showing :) > > I started off using the instructions directly from > http://wiki.apache.org/solr/Solrj and there was no mention of schema > there and even after gettign this error and searching for schema.xml > in the wiki ... I found no meaningful hits so I thought it best to > ask. > > With your advice, I searched for schema.xml and found 13 instances of it: > > \solr_1.4.0\client\ruby\solr-ruby\solr\conf\schema.xml > \solr_1.4.0\client\ruby\solr-ruby\test\conf\schema.xml > \solr_1.4.0\contrib\clustering\src\test\resource\schema.xml > \solr_1.4.0\contrib\extraction\src\test\resource\schema.xml > \solr_1.4.0\contrib\velocity\src\main\solr\conf\schema.xml > \solr_1.4.0\example\example-DIH\solr\db\conf\schema.xml > \solr_1.4.0\example\example-DIH\solr\mail\conf\schema.xml > \solr_1.4.0\example\example-DIH\solr\rss\conf\schema.xml > \solr_1.4.0\example\multicore\core0\conf\schema.xml > \solr_1.4.0\example\multicore\core1\conf\schema.xml > \solr_1.4.0\example\solr\conf\schema.xml > \solr_1.4.0\src\test\test-files\solr\conf\schema.xml > \solr_1.4.0\src\test\test-files\solr\shared\conf\schema.xml > > I took a wild guess and added the field I wanted ("desc") into this > file since its name seemed to be the most generic one: > C:\apps\solr_1.4.0\example\solr\conf\schema.xml > > And it worked ... a bit strange that an example directory is used but > I suppose it is configurable somewhere? > > Thanks for you help Erick! > > Cheers, > - Pulkit > > On Thu, Feb 18, 2010 at 9:53 AM, Erick Erickson > wrote: > > Add desc as a in your schema.xml > > file would be my first guess. > > > > Providing some explanation of what you're trying to do > > would help diagnose your issues. > > > > HTH > > Erick > > > > On Thu, Feb 18, 2010 at 12:21 PM, Pulkit Singhal < > pulkitsing...@gmail.com>wrote: > > > >> I'm getting the following exception > >> SEVERE: org.apache.solr.common.SolrException: ERROR:unknown field 'desc' > >> > >> I'm wondering what I need to do in order to add the "desc" field to > >> the Solr schema for indexing? > >> > > >
Re: What is largest reasonable setting for ramBufferSizeMB?
Thanks Otis, I don't know enough about Hadoop to understand the advantage of using Hadoop in this use case. How would using Hadoop differ from distributing the indexing over 10 shards on 10 machines with Solr? Tom Otis Gospodnetic wrote: > > Hi Tom, > > 32MB is very low, 320MB is medium, and I think you could go higher, just > pick whichever garbage collector is good for throughput. I know Java 1.6 > update 18 also has some Hotspot and maybe also GC fixes, so I'd use that. > Finally, this sounds like a good use case for reindexing with Hadoop! > > Otis > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch > Hadoop ecosystem search :: http://search-hadoop.com/ > > -- View this message in context: http://old.nabble.com/What-is-largest-reasonable-setting-for-ramBufferSizeMB--tp27631231p27645167.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: What is largest reasonable setting for ramBufferSizeMB?
On Thu, Feb 18, 2010 at 8:52 AM, Otis Gospodnetic wrote: > 32MB is very low, 320MB is medium, and I think you could go higher, just pick > whichever garbage collector is good for throughput. I know Java 1.6 update > 18 also has some Hotspot and maybe also GC fixes, so I'd use that. I think you misread Tom's email - it sounds like you are talking about JVM heap sizes, not ramBufferSizeMB? 32MB is certainly not low, and 320MB is not medium. -Yonik http://www.lucidimagination.com
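For anyone conflating the two settings: ramBufferSizeMB is an indexing buffer configured in solrconfig.xml, not the JVM heap (-Xmx). A rough sketch of where it lives in a Solr 1.4 solrconfig.xml, with an illustrative value:

  <indexDefaults>
    <!-- buffer this much indexed data in RAM before flushing a new segment -->
    <ramBufferSizeMB>320</ramBufferSizeMB>
  </indexDefaults>

The JVM heap still has to be large enough to hold that buffer on top of everything else, but the two settings are otherwise independent.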
Re: Schema error unknown field
On Feb 18, 2010, at 3:27 PM, Erick Erickson wrote: > The Manning book for SOLR or LucidWorks are good resources And of course the PACKT book ;-) ~ David Smiley Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/
Run Solr within my war
Hello Everyone, I do NOT want to host Solr separately. I want to run it within my war, alongside the Java application which is using it. How easy/difficult is that to set up? Can anyone with past experience on this topic please comment? thanks, - Pulkit
Re: Run Solr within my war
Why would you want to? Surely having it separate increases scalability? On 18 Feb 2010, at 22:23, "Pulkit Singhal" wrote: > Hello Everyone, > > I do NOT want to host Solr separately. I want to run it within my war > with the Java Application which is using it. How easy/difficult is > that to setup? Can anyone with past experience on this topic, please > comment. > > thanks, > - Pulkit
Re: Schema error unknown field
Oops, got my Manning MEAP edition of LIA II mixed up with my PACKT SOLR 1.4 book. But some author guy caught my gaffe ... Erick On Thu, Feb 18, 2010 at 5:13 PM, Smiley, David W. wrote: > On Feb 18, 2010, at 3:27 PM, Erick Erickson wrote: > > > The Manning book for SOLR or LucidWorks are good resources > > And of course the PACKT book ;-) > > ~ David Smiley > Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/ > > > > > >
Re: Run Solr within my war
Yeah, I have been pitching that, but I want all the functionality of Solr in a small package, because scalability is not a concern given the specifically limited data set being searched. I understand that the number of users is another part of this equation, but there just aren't that many at this time, and hosting Solr separately would add deployment complexity and kill the product before it ever takes off. Adoption is key for me. On Thu, Feb 18, 2010 at 2:25 PM, Dave Searle wrote: > Why would you want to? Surely having it seperate increases scalablity? > > On 18 Feb 2010, at 22:23, "Pulkit Singhal" > wrote: > >> Hello Everyone, >> >> I do NOT want to host Solr separately. I want to run it within my war >> with the Java Application which is using it. How easy/difficult is >> that to setup? Can anyone with past experience on this topic, please >> comment. >> >> thanks, >> - Pulkit >
spellcheck.build=true has no effect
Hello All. After doing a lot of research I came to this conclusion; please correct me if I am wrong. I noticed that if you have buildOnCommit and buildOnOptimize set to true in your spellcheck component, then the spellcheck index is rebuilt whenever a commit or optimize happens, which is the desired and correct behaviour. Please read on.

I am using an index-based spell checker and I am copying make and model to my spellcheck field. I index some documents, and make and model are copied to the spellcheck field when I commit. Then I stopped my Solr server and added one more field, bodyType, to be copied to my spellcheck field. I don't want to reindex the data, so I issued an HTTP request to rebuild my spellchecker: &spellcheck=true&spellcheck.build=true&spellcheck.dictionary=default.

It looks like the above command has no effect; the bodyType value is not being copied to the spellcheck field. The only time the spellcheck field has the bodyType value copied into it is when I reindex the documents again and do a commit.

Is this the desired behaviour? Adding buildOnCommit and buildOnOptimize will force the spellchecker to rebuild only if a commit or optimize happens. Please let me know if there are some configurable parameters.

thanks
darniz
--
View this message in context: http://old.nabble.com/spellcheck.build%3Dtrue-has-no-effect-tp27648346p27648346.html
Sent from the Solr - User mailing list archive at Nabble.com.
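For reference, a rough sketch of the configuration being described, with placeholder names ("spell" for the copyField target, "default" for the dictionary, matching the request above):

  <!-- solrconfig.xml -->
  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">spell</str>
      <str name="spellcheckIndexDir">./spellchecker</str>
      <str name="buildOnCommit">true</str>
      <str name="buildOnOptimize">true</str>
    </lst>
  </searchComponent>

One thing to keep in mind: copyField only runs while documents are being indexed, so a spellcheck.build=true rebuild can only read whatever is already stored in the spellcheck field. Documents indexed before the bodyType copyField was added will not contribute bodyType values until they are reindexed.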
Range Searches in Collections
Hi, I'm trying to do a search on a range of floats that are part of my Solr schema. Basically we have a collection of "fees" associated with each document in our index. The query I tried was: q=fees:[3 TO 10] This should return documents with fee values between 3 and 10 inclusive, which it does. However, I need the range check to apply to ALL fees in the collection, not just one that satisfies it. Currently it also returns documents with fee values above 10 or below 3, as long as the document contains at least one fee within the range. Any suggestions on how to accomplish this? -- View this message in context: http://old.nabble.com/Range-Searches-in-Collections-tp27648470p27648470.html Sent from the Solr - User mailing list archive at Nabble.com.
How does one sort facet queries?
All sorting of facets works great at the field level (count/index)...all good there...but how is sorting accomplished with range queries? The solrj response doesn't seem to maintain the order the queries are sent in, and the order is not in index or count order. What's the trick?

http://localhost:8983/solr/select?q=someterm
 &rows=0
 &facet=true
 &facet.limit=-1
 &facet.query=price:[* TO 100]
 &facet.query=price:[100 TO 200]
 &facet.query=price:[200 TO 300]
 &facet.query=price:[300 TO 400]
 &facet.query=price:[400 TO 500]
 &facet.query=price:[500 TO 600]
 &facet.query=price:[600 TO 700]
 &facet.query=price:[700 TO *]
 &facet.mincount=1
 &collapse.field=dedupe_hash
 &collapse.threshold=1
 &collapse.type=normal
 &collapse.facet=before

--
View this message in context: http://old.nabble.com/How-does-one-sort-facet-queries--tp27648587p27648587.html
Sent from the Solr - User mailing list archive at Nabble.com.
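One client-side workaround sketch (not from the thread): facet.query results come back keyed by the query string, so the client can keep its own ordered list of query strings and look each one up in whatever order it likes. Roughly, with SolrJ (the "server" instance and the price ranges are placeholders):

  import java.util.Map;
  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.response.QueryResponse;

  // ...given an existing SolrServer instance called "server"...
  SolrQuery query = new SolrQuery("someterm");
  query.setRows(0);
  query.setFacet(true);
  String[] ranges = {
      "price:[* TO 100]", "price:[100 TO 200]", "price:[200 TO 300]",
      "price:[300 TO 400]", "price:[400 TO 500]", "price:[500 TO 600]",
      "price:[600 TO 700]", "price:[700 TO *]"
  };
  for (String r : ranges) {
      query.addFacetQuery(r);
  }
  QueryResponse rsp = server.query(query);
  Map<String, Integer> counts = rsp.getFacetQuery();
  for (String r : ranges) {          // iterate in our own order, not the response's
      System.out.println(r + " => " + counts.get(r));
  }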
Re: Run Solr within my war
On 2/18/2010 4:22 PM, Pulkit Singhal wrote: Hello Everyone, I do NOT want to host Solr separately. I want to run it within my war with the Java Application which is using it. How easy/difficult is that to setup? Can anyone with past experience on this topic, please comment. thanks, - Pulkit So basically you're talking about running an embedded version of Solr, like the EmbeddedSolrServer? I have no experience with this, but that should give you the correct search term to find documentation on its use. From what little code I've seen that runs test cases against Solr, it looks relatively straightforward to get running. To use it, you would use the SolrJ library to communicate with the embedded Solr server. Richard
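A rough sketch of the EmbeddedSolrServer pattern from the SolrJ docs, for anyone who lands here; the paths and core name are placeholders and error handling is omitted:

  import java.io.File;
  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.core.CoreContainer;

  public class EmbeddedSolrExample {
      public static void main(String[] args) throws Exception {
          // The Solr home directory must contain solr.xml (placeholder path)
          File home = new File("/path/to/solr/home");
          CoreContainer container = new CoreContainer();
          container.load("/path/to/solr/home", new File(home, "solr.xml"));

          // "core0" must match a core name declared in solr.xml
          EmbeddedSolrServer server = new EmbeddedSolrServer(container, "core0");

          QueryResponse rsp = server.query(new SolrQuery("*:*"));
          System.out.println("Found " + rsp.getResults().getNumFound() + " documents");

          container.shutdown();
      }
  }

Since the war then bundles Solr's jars and hosts the index in-process, the usual caveat is that you lose the ability to scale or restart search independently of the application, which is the trade-off discussed above.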
Re: Range Searches in Collections
Hm, yes, it sounds like your "fees" field has multiple values/tokens, one for each fee. That's full-text search for you. :) How about having multiple fee fields, each with just one fee value? Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search :: http://search-hadoop.com/ From: cjkadakia To: solr-user@lucene.apache.org Sent: Thu, February 18, 2010 7:58:23 PM Subject: Range Searches in Collections Hi, I'm trying to do a search on a range of floats that are part of my solr schema. Basically we have a collection of "fees" that are associated with each document in our index. The query I tried was: q=fees:[3 TO 10] This should return me documents with Fee values between 3 and 10 inclusively, which it does. However, I need it to check for ALL items in this collection, not just one that satisfies it. Currently, this is returning me documents with fee values above 10 and below 3 as long as it contains at least one other within. Any suggestions on how to accomplish this? -- View this message in context: http://old.nabble.com/Range-Searches-in-Collections-tp27648470p27648470.html Sent from the Solr - User mailing list archive at Nabble.com.
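One way to sketch that in schema.xml is to give each kind of fee its own single-valued field, for example via a dynamic field, so a range query constrains one fee at a time. The field names here are made up for illustration, and "tfloat" is the trie float type from the 1.4 example schema:

  <dynamicField name="fee_*" type="tfloat" indexed="true" stored="true" multiValued="false"/>

and then query the specific fees that must fall in the range:

  q=fee_booking:[3 TO 10] AND fee_shipping:[3 TO 10]

If the requirement really is "every fee on the document is between 3 and 10", another option along the same lines is to index a precomputed minimum and maximum fee per document and query minfee:[3 TO *] AND maxfee:[* TO 10], which avoids the multivalued-field behaviour described above.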
Re: What is largest reasonable setting for ramBufferSizeMB?
Hi Tom, It wouldn't. I didn't see the mention of parallel indexing in the original email. :) Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search :: http://search-hadoop.com/ - Original Message > From: Tom Burton-West > To: solr-user@lucene.apache.org > Sent: Thu, February 18, 2010 3:30:05 PM > Subject: Re: What is largest reasonable setting for ramBufferSizeMB? > > > Thanks Otis, > > I don't know enough about Hadoop to understand the advantage of using Hadoop > in this use case. How would using Hadoop differ from distributing the > indexing over 10 shards on 10 machines with Solr? > > Tom > > > > Otis Gospodnetic wrote: > > > > Hi Tom, > > > > 32MB is very low, 320MB is medium, and I think you could go higher, just > > pick whichever garbage collector is good for throughput. I know Java 1.6 > > update 18 also has some Hotspot and maybe also GC fixes, so I'd use that. > > Finally, this sounds like a good use case for reindexing with Hadoop! > > > > Otis > > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch > > Hadoop ecosystem search :: http://search-hadoop.com/ > > > > > > -- > View this message in context: > http://old.nabble.com/What-is-largest-reasonable-setting-for-ramBufferSizeMB--tp27631231p27645167.html > Sent from the Solr - User mailing list archive at Nabble.com.
Re: parsing strings into phrase queries
This sounds useful to me! Here's a pointer: http://wiki.apache.org/solr/HowToContribute Thanks! Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search :: http://search-hadoop.com/ From: Kevin Osborn To: solr-user@lucene.apache.org Sent: Thu, February 18, 2010 1:15:11 PM Subject: Re: parsing strings into phrase queries The PositionFilter worked great for my purpose along with another filter that I build. In my case, my indexed data may be something like "X150". So, a query for "Nokia X150" should match. But I don't want random matches on "x". However, if my indexed data is "G7", I do want a query on "PowerShot G7" to match on "g" and "7". So, a simple length filter will not do. Instead I build a custom filter (that I am willing to contribute back) that filters out singletons that are surrounded by longer tokens (3 or more by default). So, "PowerShot G7" becomes "power" "shot" "g" "7", but "Nokia X150" becomes "nokia" "150". And then I put the results of this into a PositionFilter. This allows "Nokia X150ABC" to match against the "X150" part. So far I really like this for partial part number searches. And then to boost exact matches, I used copyField to create another field without PositionFilter. And then did an optional phrase query on that. From: Lance Norskog To: solr-user@lucene.apache.org Sent: Wed, February 17, 2010 7:23:23 PM Subject: Re: parsing strings into phrase queries That would be great. After reading this and the PositionFilter class I still don't know how to use it. On Wed, Feb 17, 2010 at 12:38 PM, Robert Muir wrote: > i think we can improve the docs/wiki to show this example use case, i > noticed the wiki explanation for this filter gives a more complex shingles > example, which is interesting, but this seems to be a common problem and > maybe we should add this use case. > > On Wed, Feb 17, 2010 at 1:54 PM, Chris Hostetter > wrote: > >> >> : take a look at PositionFilter >> >> Right, there was another thread recently where almost the exact same issue >> was discussed... >> >> http://old.nabble.com/Re%3A-Tokenizer-question-p27120836.html >> >> ..except that i was ignorant of the existence of PositionFilter when i >> wrote that message. >> >> >> >> -Hoss >> >> > > > -- > Robert Muir > rcm...@gmail.com > -- Lance Norskog goks...@gmail.com
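For the archives, the basic PositionFilter arrangement being discussed looks roughly like this in schema.xml (Kevin's custom singleton-dropping filter is not shown, and the attribute values are illustrative):

  <fieldType name="text_partno" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
              generateNumberParts="1" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
              generateNumberParts="1" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- put all query tokens at the same position so the query parser
           does not turn the split-up tokens into a phrase query -->
      <filter class="solr.PositionFilterFactory"/>
    </analyzer>
  </fieldType>

The copyField trick mentioned at the end would simply point at a second field whose query analyzer omits the PositionFilter, so exact matches can then be boosted with an optional phrase query against that field.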
Re: replications issue
giskard, Is this on the master or on the slave(s)? Maybe you can paste your replication handler config for the master and your replication handler config for the slave. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search :: http://search-hadoop.com/ From: giskard To: solr-user@lucene.apache.org Sent: Thu, February 18, 2010 12:16:37 PM Subject: replications issue Hi all, I've set up Solr replication as described in the wiki. When I start the replication, a directory called index.$numbers is created; after a while it disappears and a new index.$othernumbers is created. index/ remains untouched with an empty index. Any clue? thank you in advance, Riccardo -- ciao, giskard
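For comparison while debugging, a stock Solr 1.4 replication setup looks roughly like this (host, port, poll interval and conf file list are placeholders):

  <!-- solrconfig.xml on the master -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">startup</str>
      <str name="replicateAfter">commit</str>
      <str name="confFiles">schema.xml,stopwords.txt</str>
    </lst>
  </requestHandler>

  <!-- solrconfig.xml on the slave -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <str name="masterUrl">http://master-host:8983/solr/replication</str>
      <str name="pollInterval">00:00:60</str>
    </lst>
  </requestHandler>

The index.$numbers directories are the slave's temporary download area; once a full copy completes, their contents normally end up in index/, so an index/ that stays empty suggests the download or the final move is failing, which is why the configs are worth posting.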
Re: @Field annotation support
solrj jar On Thu, Feb 18, 2010 at 10:52 PM, Pulkit Singhal wrote: > Hello All, > > When I use Maven or Eclipse to try and compile my bean which has the > @Field annotation as specified in http://wiki.apache.org/solr/Solrj > page ... the compiler doesn't find any class to support the > annotation. What jar should we use to bring in this custom Solr > annotation? > -- - Noble Paul | Systems Architect| AOL | http://aol.com
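That is, the annotation is org.apache.solr.client.solrj.beans.Field, shipped in the SolrJ jar (apache-solr-solrj-1.4.0.jar, or the org.apache.solr:solr-solrj artifact if you use Maven). A small sketch of a bean using it; the class and field names are just examples:

  import org.apache.solr.client.solrj.beans.Field;

  public class Item {
      @Field
      String id;

      @Field("desc")   // maps this property to the Solr field named "desc"
      String description;
  }

Such beans can then be indexed with server.addBean(...) and read back from a query response with getBeans(Item.class).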
Re: optimize is taking too much time
Jagdish Vasani-2 wrote: > > Hi, > > you should not optimize index after each insert of document.insted you > should optimize it after inserting some good no of documents. > because in optimize it will merge all segments to one according to setting > of lucene index. > > thanks, > Jagdish > On Fri, Feb 12, 2010 at 4:01 PM, mklprasad wrote: > >> >> hi >> in my solr u have 1,42,45,223 records having some 50GB . >> Now when iam loading a new record and when its trying optimize the docs >> its >> taking 2 much memory and time >> >> >> can any body please tell do we have any property in solr to get rid of >> this. >> >> Thanks in advance >> >> -- >> View this message in context: >> http://old.nabble.com/optimize-is-taking-too-much-time-tp27561570p27561570.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > >

Yes, thanks for the reply. I have removed the optimize() call from the code, but I have a doubt:

1. Will mergeFactor internally do any optimization, or do we have to specify it explicitly?
2. Even if Solr initiates an optimize, with large data like 52GB will that take a huge amount of time?

Thanks,
Prasad

--
View this message in context: http://old.nabble.com/optimize-is-taking-too-much-time-tp27561570p27650028.html
Sent from the Solr - User mailing list archive at Nabble.com.
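On the mergeFactor question: segment merging is driven by the mergeFactor setting in solrconfig.xml and happens automatically as documents are added, without any explicit optimize call. An explicit optimize instead rewrites the whole index down to a single segment, which on an index of tens of GB can reasonably be expected to take a long time and a lot of disk I/O. A sketch of the relevant settings, with the stock 1.4 defaults shown for illustration:

  <!-- solrconfig.xml -->
  <indexDefaults>
    <!-- merge roughly every mergeFactor similar-sized segments automatically -->
    <mergeFactor>10</mergeFactor>
    <ramBufferSizeMB>32</ramBufferSizeMB>
  </indexDefaults>

A lower mergeFactor keeps the segment count down at the cost of more merging work during indexing, which in turn reduces how often a full optimize is needed at all.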