Re: Cache full text into memory
You have two options: 1. Store the compressed text as a stored field in Solr. 2. Use external caching. http://www.findbestopensource.com/tagged/distributed-caching You could use ehcache / Memcache / Membase. The problem with external caching is that you need to synchronize the deletions and modifications. Fetching the stored field from Solr is also faster. Regards Aditya www.findbestopensource.com

On Wed, Jul 14, 2010 at 12:08 PM, Li Li wrote:
> I want to cache full text into memory to improve performance.
> Full text is only used for highlighting in my application (but it's very
> time consuming; my avg query time is about 250ms, and I guess it would cost
> about 50ms if I just fetched the top 10 full texts. Things get worse when fetching
> more full text because on disk it is scattered everywhere for a query).
> My full text per machine is about 200GB. The memory available for
> storing full text is about 10GB. So I want to compress it in memory.
> Suppose the compression ratio is 1:5; then I can load 1/4 of the full text into
> memory. I need a Cache component for it. Has anyone faced this problem
> before? I need some advice. Is it possible to use external tools such
> as Memcached? Thank you.
Re: Ranking position in solr
I sent this command: curl http://localhost:8081/solr/update -F stream.body='<commit/>', but it doesn't reload. It doesn't reload automatically after every commit or optimize unless I add a new document and then commit. Any idea?

On Tue, Jul 13, 2010 at 4:54 PM, Ahmet Arslan wrote:
> > I'm using solr 1.4 and only one core.
> > The elevate xml file is quite big, and
> > I wonder whether solr can handle that? How to reload the core?
>
> Markus Jelsma's suggestion is more robust. You don't need to restart or
> reload anything. Put elevate.xml under the data directory. It will be reloaded
> automatically after every commit or optimize.

-- Chhorn Chamnap http://chamnapchhorn.blogspot.com/
Re: Cache full text into memory
I have already stored it in the Lucene index. But it is on disk, and when a query comes, it must seek the disk to get it. I am not familiar with the Lucene cache. I just want to fully use my memory: load 10GB of it into memory, with an LRU strategy when the cache is full. To load more into memory, I want to compress it "in memory". I don't care much about disk space, so it doesn't matter whether or not it's compressed in Lucene.

2010/7/14 findbestopensource :
> You have two options:
> 1. Store the compressed text as a stored field in Solr.
> 2. Use external caching. http://www.findbestopensource.com/tagged/distributed-caching
> You could use ehcache / Memcache / Membase.
> The problem with external caching is that you need to synchronize the deletions
> and modifications. Fetching the stored field from Solr is also faster.
> Regards
> Aditya
> www.findbestopensource.com
Re: ShingleFilter failing with more terms than index phrase
Hi Steve, Thanks for your kind response. I checked PositionFilterFactory (re-indexed as well) but that also didn't solve the problem. Interestingly, the problem is not reproducible from Solr's Field Analysis page; it manifests only when it's in a query. I guess the subject for this post is not quite right: it's not that ShingleFilter is failing, but that with ShingleFilter, no score is contributed by the shingle field when I pass more terms than the indexed terms. I observe this using debugQuery. I had actually posted to solr-user but have received no response yet, probably because the problem is not clear at first glance. However, there's an example in the mail for anyone interested to try out and check if there's a problem. Let's see if I receive any response. -Ethan

On Tue, Jul 13, 2010 at 9:15 PM, Steven A Rowe wrote:
> Hi Ethan,
> You'll probably get better answers about Solr specific stuff on the
> solr-u...@a.l.o list.
> Check out PositionFilterFactory - it may address your issue:
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PositionFilterFactory
> Steve
>
>> -Original Message-
>> From: Ethan Collins [mailto:collins.eth...@gmail.com]
>> Sent: Tuesday, July 13, 2010 3:42 AM
>> To: java-u...@lucene.apache.org
>> Subject: ShingleFilter failing with more terms than index phrase
>>
>> I am using lucene 2.9.3 (via Solr 1.4.1) on windows and am trying to
>> understand ShingleFilter. I wrote the following code and find that if I
>> provide more words than the actual phrase indexed in the field, then the
>> search on that field fails (no score found with debugQuery=true).
>>
>> Here is an example to reproduce, with field names:
>> Id: 1
>> title_1: Nina Simone
>> title_2: I put a spell on you
>>
>> Query (dismax) with:
>> - "Nina Simone I put" <- Fails, i.e. no score shown from title_1 search (using debugQuery)
>> - "Nina Simone" <- SUCCESS
>>
>> But, when I used Solr's Field Analysis with the 'shingle' field (given
>> below) and tried "Nina Simone I put", it succeeds. It's only during the
>> query that no score is provided. I also checked 'parsedquery' and it shows
>> disjunctionMaxQuery issuing the string "Nina_Simone Simone_I I_put" to the
>> title_1 field.
>>
>> title_1 and title_2 fields are of type 'shingle', defined as:
>>
>> <fieldType name="shingle" class="solr.TextField" positionIncrementGap="100" indexed="true" stored="true">
>>   <analyzer type="index">
>>     <tokenizer class="..."/>
>>     <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="false"/>
>>   </analyzer>
>>   <analyzer type="query">
>>     <tokenizer class="..."/>
>>     <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="false"/>
>>   </analyzer>
>> </fieldType>
>>
>> Note that I also have a catchall field which is text. I have qf set
>> to: 'id^2 catchall' and pf set to: 'title_1^1.5 title_2^1.2'
>>
>> If I am missing something or doing something wrong please let me know.
>>
>> -Ethan
Re: Cache full text into memory
I have just provided you two options. Since you already store it as part of the index, you could try external caching. Try ehcache / Membase: http://www.findbestopensource.com/tagged/distributed-caching . The caching system will do LRU and is much more efficient.

On Wed, Jul 14, 2010 at 12:39 PM, Li Li wrote:
> I have already stored it in the Lucene index. But it is on disk, and when a
> query comes, it must seek the disk to get it. I am not familiar with
> the Lucene cache. I just want to fully use my memory: load 10GB of it
> into memory, with an LRU strategy when the cache is full. To load more into
> memory, I want to compress it "in memory". I don't care much about
> disk space, so it doesn't matter whether or not it's compressed in Lucene.
Re: Ranking position in solr
> I sent this command: curl http://localhost:8081/solr/update -F stream.body='<commit/>', but it doesn't reload.
>
> It doesn't reload automatically after every commit or
> optimize unless I add a new document and then commit.

Hmm. Maybe there is an easier way to force it? (add an empty/dummy doc) But if you are okay with the core reload/restart, you can use this custom code to do it. You need to register this in solrconfig.xml. If you don't want to use custom code, you need to use http://wiki.apache.org/solr/CoreAdmin#RELOAD

public class DummyRequestHandler extends RequestHandlerBase {
  public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
    try {
      req.getCore().getCoreDescriptor().getCoreContainer().reload("");
      rsp.add("message", "core reloaded successfully");
    } catch (final Throwable t) {
      rsp.add("message", t.getMessage());
    }
  }
}
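For reference, the CoreAdmin RELOAD action linked above can also be invoked over plain HTTP, assuming the core is declared in solr.xml; the core name "core0" here is a placeholder:

curl 'http://localhost:8081/solr/admin/cores?action=RELOAD&core=core0'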
MultiValue dynamicField and copyField
Hi everyone, I was wondering if the following is possible somehow: using copyField to copy a multiValued field into another multiValued field. Cheers, Jan
Re: ShingleFilter failing with more terms than index phrase
Hi Steve, Thanks; wrapping with PositionFilter actually got the search and scoring working. I had made a mistake while re-indexing last time. Trying to analyze PositionFilter: I didn't understand why the earlier search for 'Nina Simone I Put' failed, since at least the phrase 'Nina Simone' should have matched against the title_1 field. Any clue? I am also trying to understand the impact of PositionFilter on phrase search quality and score. Unfortunately there is not much literature/help to be found via Google. -Ethan
Re: Cache full text into memory
Thank you. I don't know which cache system to use. In my application, the cache system must support a compression algorithm with a high compression ratio and fast decompression speed (because each time it gets something from the cache, it must decompress).

2010/7/14 findbestopensource :
> I have just provided you two options. Since you already store it as part of the
> index, you could try external caching. Try ehcache / Membase:
> http://www.findbestopensource.com/tagged/distributed-caching . The caching
> system will do LRU and is much more efficient.
Re: ShingleFilter failing with more terms than index phrase
> Trying to analyze PositionFilter: I didn't understand why the earlier
> search for 'Nina Simone I Put' failed, since at least the phrase 'Nina
> Simone' should have matched against the title_1 field. Any clue?

Please note that I have configured the ShingleFilter as bigrams without unigrams. [Honestly, I am still struggling to understand how this worked and the earlier one didn't] -Ethan
Re: Cache full text into memory
I doubt it. A caching system is a key-value store. You have to use some compression library to compress and decompress your data; the caching system helps you retrieve it fast. Anyway, please take a look at each caching system's features. Regards Aditya www.findbestopensource.com

On Wed, Jul 14, 2010 at 3:06 PM, Li Li wrote:
> Thank you. I don't know which cache system to use. In my application,
> the cache system must support a compression algorithm with a high
> compression ratio and fast decompression speed (because each time it
> gets something from the cache, it must decompress).
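A minimal sketch of the compress-on-put / decompress-on-get idea discussed in this thread, using only the JDK: Deflater/Inflater for compression and an access-ordered LinkedHashMap as the LRU. Class and method names are made up, and the eviction bound counts entries rather than bytes, which a real cache over 10GB would track instead:

import java.io.ByteArrayOutputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// In-memory cache of document full text, stored compressed.
public class CompressedTextCache {
  private final LinkedHashMap<String, byte[]> lru;

  public CompressedTextCache(final int maxEntries) {
    // access-order mode makes LinkedHashMap evict least-recently-used entries
    this.lru = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
        return size() > maxEntries;
      }
    };
  }

  public synchronized void put(String id, String fullText) throws Exception {
    byte[] raw = fullText.getBytes("UTF-8");
    Deflater deflater = new Deflater(Deflater.BEST_SPEED); // favor speed over ratio
    deflater.setInput(raw);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream(raw.length / 4 + 16);
    byte[] buf = new byte[4096];
    while (!deflater.finished()) {
      out.write(buf, 0, deflater.deflate(buf));
    }
    deflater.end();
    lru.put(id, out.toByteArray());
  }

  // Returns null on a miss; the caller falls back to the stored field on disk.
  public synchronized String get(String id) throws Exception {
    byte[] compressed = lru.get(id);
    if (compressed == null) return null;
    Inflater inflater = new Inflater();
    inflater.setInput(compressed);
    ByteArrayOutputStream out = new ByteArrayOutputStream(compressed.length * 4);
    byte[] buf = new byte[4096];
    while (!inflater.finished()) {
      out.write(buf, 0, inflater.inflate(buf));
    }
    inflater.end();
    return new String(out.toByteArray(), "UTF-8");
  }
}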
DataImporter
Hi all, Can someone help me with this? When importing 2 different entities one by one (specified through the entity parameter), why does the second import delete the previously created index for the first entity, and vice-versa? The documentation provided on the Solr website says: "entity : Name of an entity directly under the <document> tag. Use this to execute one or more entities selectively. Multiple 'entity' parameters can be passed on to run multiple entities at once. If nothing is passed, all entities are executed". The problem is the deletion of the index entries already present for the other entities. Regards. Sam
Re: DataImporter
Is it possible that you have the same IDs in both entities? Could you show your entity mappings here? Bilgin Ibryam

On Wed, Jul 14, 2010 at 11:48 AM, Amdebirhan, Samson, VF-Group <samson.amdebir...@vodafone.com> wrote:
> Importing 2 different entities one by one (specified through the entity
> parameter), why is the second import deleting the previously created index
> for the first entity and vice-versa?
question on wild card
I have a database field = hello world, and I am indexing it into the *text* field with the standard analyzer (text is a copy field in Solr). Now when a user gives the query text:"hello world%", how is the query interpreted in the background? Are we actually searching text:hello OR text:world% (consider the default operator is OR)? -- Nipen Mark
RE: DataImporter
Hi Bilgin, You're right, I have the same primary key. But I tested with the property "preImportDeleteQuery" in the entity tag of data_config.xml, and now it is working: it deletes only the index docs for the entity I run the full-import on, based on the field I declare in the preImportDeleteQuery. Thanks for your time.

-Original Message-
From: Bilgin Ibryam [mailto:bibr...@gmail.com]
Sent: Wednesday, 14 July 2010 14.46
To: solr-user@lucene.apache.org
Subject: Re: DataImporter

Is it possible that you have the same IDs in both entities? Could you show your entity mappings here? Bilgin Ibryam
Re: Strange "the" when search with dismax
Sounds like you want the 'text' fieldType (or equivalent) and are using 'string' or 'lowercase'. Those must all match exactly (well, case-insensitively in the case of 'lowercase'). The TextField types (like 'text') do tokenization, so matches will occur under many more conditions.
Re: MultiValue dynamicField and copyField
Yep, my schema does this all day long.
Re: Strange "the" when search with dismax
"the" sounds like it might be a stopword. Are you using stopwords in any of your fields covered by the dismax search? But not in some of the other fields covered by dismax? the combination of dismax and stopwords can result in unexpected behavior if you aren't careful. I wrote about this a bit here, you might find it helpful: http://bibwild.wordpress.com/2010/04/14/solr-stop-wordsdismax-gotcha/ marship wrote: > Hi. All. >I am using solr dismax to search over my books in db. I indexed them all > using solr. >the problem I noticed today is, > Everything start with I want to search for a book " > The Girl Who Kicked the Hornet's Nest > " > but nothing is returned. I'm sure I have this book in DB. So I stripped some > keyword and finally I found when I search for "the girl who kicked hornet's > nest" , I got the book. > Then I test more > when I search for "the first world war", solr return the book successfully to > me. > But when I search for "the first world war the", solr returns NOTHING! > > > So strange! > So the issue is, if there are 2 "the" in query keywords, solr/dismax simply > return nothing! > > > Why is this happening? > > > Please help. > Thanks. > Regards. > Scott > > > >
DIH: post-delta-import DB cleanup hook?
I'm updating my Solr index using a "queue" table in my database. When records get updated, a row gets inserted into the queue table with pk, timestamp, deleted flag, and status. DIH made it easy to use this to identify new/updated records as well as deletes. I need to do some post-processing, however, to either 1) update the status field so these queue records can be archived/deleted or 2) delete the queue record. This is a functional requirement, even though DIH can handle the old queue records remaining in there. I've perused the list and didn't see any concrete suggestions for handling this. The key requirement is that I'm not just blindly deleting from the queue table but updating or deleting by pk that was _actually_ processed by DIH. This way, if there is an error (onError="skip") we can detect this and follow up. Any thoughts? I'm using 1.4 but could move to trunk if there were a solution there. Thanks --Joachim
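For readers unfamiliar with the queue pattern described above, a rough data-config sketch of it (table and column names invented for illustration; ${dataimporter.last_index_time} and ${dataimporter.delta.pk} are standard DIH variables):

<entity name="item" pk="pk"
        query="SELECT * FROM item"
        deltaQuery="SELECT pk FROM queue WHERE ts &gt; '${dataimporter.last_index_time}' AND deleted = 0"
        deletedPkQuery="SELECT pk FROM queue WHERE ts &gt; '${dataimporter.last_index_time}' AND deleted = 1"
        deltaImportQuery="SELECT * FROM item WHERE pk = '${dataimporter.delta.pk}'"/>

What the question asks for is a hook that fires after the import with the set of pks DIH actually processed, so the matching queue rows can be updated or removed.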
RE: MultiValue dynamicField and copyField
I figured out where the problem was. The destination wildcard was actually matching the wrong field. I changed the field names around a bit and now everything works fine. Thanks!

> -Original Message-
> From: kenf_nc [mailto:ken.fos...@realestate.com]
> Sent: Wednesday, 14 July 2010 15:56
> To: solr-user@lucene.apache.org
> Subject: Re: MultiValue dynamicField and copyField
>
> Yep, my schema does this all day long.
RE: Foreign characters question
Thanks for the reply, but that didn't help. Tomcat is accepting foreign characters, but for some reason when it reads the synonyms file and encounters the character ñ, it doesn't appear correctly in the Field Analysis admin. It shows up as �. If I query exactly for ñ it will work, but the synonyms file is screwy.
Re: Foreign characters question
Is your synonyms file in UTF-8 encoding?

On Wed, Jul 14, 2010 at 11:11 AM, Blargy wrote:
> Tomcat is accepting foreign characters, but for some reason when it reads the
> synonyms file and encounters the character ñ, it doesn't appear correctly
> in the Field Analysis admin. It shows up as �. If I query exactly for ñ it
> will work, but the synonyms file is screwy.

-- Robert Muir rcm...@gmail.com
date boosting and dismax
I've started a couple of previous threads on this topic, but I did not have a good date field in my index to use at the time. I now have a schema with the document's post_date in tdate format, so I would like to actually do some implementation. Right now, we are not doing relevancy ranking at all - we sort by descending post_date. We have been working on our application code so we can switch to dismax and use relevancy, but it's still important to have a small bias towards newer content. The idea is nothing this list hasn't heard before - to give newer documents a slight relevancy boost. An important sub-goal is to ensure that the adjustment doesn't render Solr's caches useless. I'm thinking that this means that at a minimum, I need to round dates to a resolution of 1 day, but if it's doable, 1 week might be even better. I do like the idea of having different boosts for different time ranges. Can anyone give me a starting point on how to do this? I will need actual URL examples and dismax configuration snippets. Thanks, Shawn
RE: date boosting and dismax
I used this before my search term and it works well:

{!boost b=recip(ms(NOW,publishdate),3.16e-11,1,1)}

It's enough that when I search for *:* the articles appear in chronological order. Tim

-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: Wednesday, July 14, 2010 11:47 AM
To: solr-user@lucene.apache.org
Subject: date boosting and dismax
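For anyone puzzling over the constants: Solr's recip(x,m,a,b) computes a/(m*x + b), and 3.16e-11 is roughly 1 divided by the number of milliseconds in a year. So a document published right now gets a boost of 1/(0 + 1) = 1, a document exactly one year old gets about 1/(1 + 1) = 0.5, a two-year-old document about 1/3, and so on, decaying smoothly.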
Re: Foreign characters question
How can I tell and/or create a UTF-8 synonyms file? Do I have to instruct solr that this file is UTF-8?
Re: Foreign characters question
Nevermind. Apparently my IDE (Netbeans) was set to "No encoding"... wtf. Changed it to UTF-8 and recreated the file and all is good now. Thanks!
Re: date boosting and dismax
One of the replies I got on a previous thread mentioned range queries, with this example:

[NOW-6MONTHS TO NOW]^5.0
[NOW-1YEARS TO NOW-6MONTHS]^3.0
[NOW-2YEARS TO NOW-1YEARS]^2.0
[* TO NOW-2YEARS]^1.0

Something like this seems more flexible, and into it, I read an implication that the performance would be better than the boost function you've shown, but I don't know how to actually put it into a URL or handler config. I also seem to remember seeing something about how to do "less than" in range queries as well as the "less than or equal to" implied by the above, but I cannot find it now. Thanks, Shawn

On 7/14/2010 10:26 AM, Tim Gilbert wrote:
> I used this before my search term and it works well:
> {!boost b=recip(ms(NOW,publishdate),3.16e-11,1,1)}
> It's enough that when I search for *:* the articles appear in chronological order.
> Tim
RE: date boosting and dismax
Re: flexibility. This boost decays over time; the further it gets from now, the less of a boost it receives. You are right though, it doesn't allow a fine degree of control, particularly if you don't want to smoothly decay the boost. I hadn't considered your suggestion, so I'll keep it in mind if the need arises.

Re: Adding the boost to a query: I am no expert, but I did this and it worked:

SolrJ: solrQuery.setQuery("{!boost b=recip(ms(NOW,publishdate),3.16e-11,1,1)} " + queryparam);

where queryparam is what you are searching for. You quite literally just prepend it. Via http://localhost:8080/apache-solr-1.4.0/select, just prepend it to your q= like this:

q={!boost+b%3Drecip(ms(NOW,publishdate),3.16e-11,1,1)}+findthis

Tim

-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: Wednesday, July 14, 2010 1:16 PM
To: solr-user@lucene.apache.org
Subject: Re: date boosting and dismax
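If the goal is the range-style boosts from earlier in the thread under dismax, one possible way (a sketch, untested; the handler name is a placeholder and post_date is the field named earlier in the thread) is the bq (boost query) parameter in the handler's defaults, with dates rounded via date math so the ranges, and therefore the caches, stay stable for a whole day:

<requestHandler name="search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="bq">post_date:[NOW/DAY-6MONTHS TO NOW/DAY]^5.0
      post_date:[NOW/DAY-1YEAR TO NOW/DAY-6MONTHS]^3.0
      post_date:[NOW/DAY-2YEARS TO NOW/DAY-1YEAR]^2.0</str>
  </lst>
</requestHandler>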
Re: Foreign characters question
On Wed, Jul 14, 2010 at 12:59 PM, Blargy wrote: > > Nevermind. Apparently my IDE (Netbeans) was set to "No encoding"... wtf. > Changed it to UTF-8 and recreated the file and all is good now. Thanks! > > fyi I created an issue with your example here: https://issues.apache.org/jira/browse/SOLR-2003 In this case, the wrong encoding could have been detected and saved you some time... -- Robert Muir rcm...@gmail.com
Re: dismax and date boosts
I have finally figured out how to turn this off in Thunderbird 3: Go to Tools, Options, Display, and turn off "Display emoticons as graphics". On 4/12/2010 12:04 PM, Shawn Heisey wrote: On 4/12/2010 11:55 AM, Shawn Heisey wrote: [NOW-6MONTHS TO NOW]^5.0 , [NOW-1YEARS TO NOW-6MONTHS]^3.0 [NOW-2YEARS TO NOW-1YEARS]^2.0 [* TO NOW-2YEARS]^1.0 And here we have the perfect example of something I mentioned a while ago - my Thunderbird (v3.0.4 on Win7) turning Solr boost syntax into pretty exponents. Attached a cropped screenshot. If anyone has any idea how to turn this off, I would appreciate knowing how. Shawn
Re: date boosting and dismax
Shawn Heisey wrote: [* TO NOW-2YEARS]^1.0 I also seem to remember seeing something about how to do "less than" in range queries as well as the "less than or equal to" implied by the above, but I cannot find it now. Ranges with square brackets [] are inclusive. Ranges with parens () are exclusive. And you have a less than example above: [* TO value] is a 'less than or equal to value' (inclusive) (* TO value) is a 'less than not including value' (exclusive) Now, if you want inclusive on one end but exclusive on the other, I'm pretty sure you're out of luck. :) Jonathan
Re: Using hl.regex.pattern to print complete lines
Any other thoughts, Chris? I've been messing with this a bit, and can't seem to get (?m)^.*$ to do what I want. 1) I don't care how many characters it returns, I'd like entire lines all the time 2) I just want it to always return 3 lines: the line before, the actual line, and the line after. 3) This should be like "grep -C1" Thanks for your time! -Pete On Jul 9, 2010, at 12:08 AM, Peter Spam wrote: > Ah, this makes sense. I've changed my regex to "(?m)^.*$", and it works > better, but I still get fragments before and after some returns. > Thanks for the hint! > > > -Pete > > On Jul 8, 2010, at 6:27 PM, Chris Hostetter wrote: > >> >> : If you can use the latest branch_3x or trunk, hl.fragListBuilder=single >> : is available that is for getting entire field contents with search terms >> : highlighted. To use it, set hl.useFastVectorHighlighter to true. >> >> He doesn't want the entire field -- his stored field values contain >> multi-line strings (using newline characters) and he wants to make >> fragments per "line" (ie: bounded by newline characters, or the start/end >> of the entire field value) >> >> Peter: i haven't looked at the code, but i expect that the problem is that >> the java regex engine isn't being used in a way that makes ^ and $ match >> any line boundary -- they are probably only matching the start/end of the >> field (and . is probably only matching non-newline characters) >> >> java regexes support embedded flags (ie: "(?xyz)your regex") so you might >> try that (i don't remember what the correct modifier flag is for the >> multiline mode off the top of my head) >> >> -Hoss >> >
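To make the embedded-flag point from the quoted reply concrete, here is a tiny self-contained demo (input string invented) of how (?m) turns on MULTILINE so ^ and $ match at every line boundary rather than only at the ends of the whole value:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultilineDemo {
  public static void main(String[] args) {
    String stored = "line one\nline two\nline three";
    // (?m) is equivalent to compiling with Pattern.MULTILINE
    Matcher m = Pattern.compile("(?m)^.*$").matcher(stored);
    while (m.find()) {
      System.out.println("fragment: " + m.group()); // one match per line
    }
  }
}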
Multiple cores or not?
Hi, We are planning to host different websites that will use Solr on the same server. What would be best? One core with a field in the schema (site1, site2, etc.) and then adding that to every query? Or one core per site? Thanks for your help
limiting the total number of documents matched
I'd like to limit the total number of documents that are returned for a search, particularly when the sort order is not based on relevancy. In other words, if the user searches for a very common term, they might get tens of thousands of hits, and if they sort by "title", then very high relevancy documents will be interspersed with very low relevancy documents. I'd like to set a limit to the 1000 most relevant documents, then sort those by title. Is there a way to do this? I guess I could always retrieve the top 1000 documents and sort them in the client, but that seems particularly inefficient. I can't find any other way to do this, though. Thanks, Paul
RE: limiting the total number of documents matched
So you want to take the top 1000 sorted by score, then sort those by another field. It's a strange case, and I can't think of a clean way to accomplish it. You could do it in two queries, where the first is by score and you only request your IDs to keep it snappy, then do a second query against the IDs and sort by your other field. 1000 seems like a lot for that approach, but who knows until you try it on your data. -Kallin Nagelberg -Original Message- From: Paul [mailto:p...@nines.org] Sent: Wednesday, July 14, 2010 4:16 PM To: solr-user Subject: limiting the total number of documents matched I'd like to limit the total number of documents that are returned for a search, particularly when the sort order is not based on relevancy. In other words, if the user searches for a very common term, they might get tens of thousands of hits, and if they sort by "title", then very high relevancy documents will be interspersed with very low relevancy documents. I'd like to set a limit to the 1000 most relevant documents, then sort those by title. Is there a way to do this? I guess I could always retrieve the top 1000 documents and sort them in the client, but that seems particularly inefficient. I can't find any other way to do this, though. Thanks, Paul
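A sketch of that two-query idea as plain select URLs (host, field names, and ids invented; score-descending is the default, so the first query needs no sort parameter):

http://localhost:8983/solr/select?q=common+term&fl=id&rows=1000
http://localhost:8983/solr/select?q=id:(17+OR+42+OR+99)&sort=title+asc&rows=1000

The id list in the second query is built from the first response; with 1000 ids the URL gets very long, so sending the second query as a POST helps.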
setting up clustering
I'm trying to enable clustering in Solr 1.4. I'm following these instructions: http://wiki.apache.org/solr/ClusteringComponent However, `ant get-libraries` fails for me. Before it tries to download the 4 jar files, it tries to compile Lucene. Is this necessary? Has anyone gotten clustering working properly? My next attempt was to just copy contrib/clustering/lib/*.jar and contrib/clustering/lib/downloads/*.jar to WEB-INF/lib and enable clustering in solrconfig.xml, but this doesn't work either, and I can't tell from the error log whether it just couldn't find the jar files or if there is some other problem: SEVERE: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.clustering.ClusteringComponent'
Re: limiting the total number of documents matched
I was hoping for a way to do this purely by configuration and making the correct GET requests, but if there is a way to do it by creating a custom Request Handler, I suppose I could plunge into that. Would that yield the best results, and would that be particularly difficult?

On Wed, Jul 14, 2010 at 4:37 PM, Nagelberg, Kallin wrote:
> So you want to take the top 1000 sorted by score, then sort those by another
> field. It's a strange case, and I can't think of a clean way to accomplish
> it. You could do it in two queries, where the first is by score and you only
> request your IDs to keep it snappy, then do a second query against the IDs
> and sort by your other field. 1000 seems like a lot for that approach, but
> who knows until you try it on your data.
> -Kallin Nagelberg
Re: limiting the total number of documents matched
I thought of another way to do it, but I still have one thing I don't know how to do. I could do the search without sorting for the 50th page, then look at the relevancy score on the first item on that page, then repeat the search, but add score > that relevancy as a parameter. Is it possible to do a search with "score:[5 to *]"? It didn't work in my first attempt.

On Wed, Jul 14, 2010 at 5:34 PM, Paul wrote:
> I was hoping for a way to do this purely by configuration and making
> the correct GET requests, but if there is a way to do it by creating a
> custom Request Handler, I suppose I could plunge into that. Would that
> yield the best results, and would that be particularly difficult?
Less convoluted way to query for an empty string?
Hi all, I can't seem to find a way to query for an empty string that is simpler than this: field_name:[* to ""] Things that don't work: field_name:"" field_name["" TO ""] Is the one I'm using the simplest option? If so, is there a particular reason the other ones I mention don't work? Just curious mostly. Thanks! Mat
Re: csv response writer
I fixed the path of the queryResponseWriter class in the example solrconfig.xml. This was successfully applied against solr 4.0 trunk. A few quirks:

* When I didn't specify a default delimiter, it printed out null as the delimiter. I couldn't figure out why, because init(NamedList args) specifies it'll use a default of ",":
"organization"null"2"null"

* If I don't specify the column names, the output doesn't put in empty "" correctly. E.g., the output has a mismatched number of commas:
"organization","1","Test","Name","2"," ","200","8",
"organization","4","Solar","4","0",

Added the patch to https://issues.apache.org/jira/browse/SOLR-1925

@tommychheng Programmer and UC Irvine Graduate Student Find a great grad school based on research interests: http://gradschoolnow.com

On 7/13/10 1:41 PM, Erik Hatcher wrote:
> Tommy, It's not committed to trunk or any other branch at the moment, so no
> future released version until then. Have you tested it out? Any feedback we
> should incorporate? When I can carve out some time over the next week or so
> I'll review and commit if there are no issues brought up. Erik
>
> On Jul 13, 2010, at 3:42 PM, Tommy Chheng wrote:
>> Hi, Which next version of solr is the csv response writer set to be
>> included in? https://issues.apache.org/jira/browse/SOLR-1925
Re: Less convoluted way to query for an empty string?
On 15.07.2010, at 00:09, Mat Brown wrote:
> I can't seem to find a way to query for an empty string that is
> simpler than this:
> field_name:[* to ""]
> Is the one I'm using the simplest option? If so, is there a particular
> reason the other ones I mention don't work? Just curious mostly.

Yonik recently suggested:

Hmmm, if this is on a String field, it seemed to work for me.
http://localhost:8983/solr/select?debugQuery=on&q=foo_s:""

The raw query parser would also work (it skips analysis):
http://localhost:8983/solr/select?debugQuery=on&q={!raw f=foo_s}

regards, Lukas Kahwe Smith m...@pooteeweet.org
Re: Solr search streaming/callback
: I was wondering if anyone was aware of any existing functionality where : clients/server components could register some search criteria and be : notified of newly committed data matching the search when it becomes : available you can register a "postCommit" listener in your solrconfig.xml file ... that could either be a custom plugin to execute searches and "push" them somewhere else, or using the existing RunExecutableListener you could execute any command line app to "pull" the data on demand (and push it where ever you want) w/o customizing solr at all. -Hoss
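The stock solrconfig.xml ships a commented-out example of exactly that hook; a trimmed copy is below (the executable and arguments are just the example's values, typically the snapshooter replication script):

<listener event="postCommit" class="solr.RunExecutableListener">
  <str name="exe">solr/bin/snapshooter</str>
  <str name="dir">.</str>
  <bool name="wait">true</bool>
</listener>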
Re: Solr index optimizing help
Does your schema have a unique id specified? If so, is it possible that you indexed many documents that had the same ID, thus deleting previous documents with the same ID? That would account for it, but it's a shot in the dark... Best Erick On Tue, Jul 13, 2010 at 6:20 AM, Karthik K wrote: > Thanks a lot for the reply, > is it independent of merge factor?? > My index size reduced a lot (almost by 40%) after optimization and i am > worried that i might have lost the data. I have no deletes at all but a > high > merge factor. Any suggestions? > > Thanks, > Karthik >
Re: stemmed terms and phrases in a combined query
: My question is how do i query that? : q=text_clean:Nike's new text_orig:"running shoes" : seems like it would work, but not sure its the best way. that's a perfectly good way to do it. : Is there a way i can tell the parser(or extend it) so that every phrase : query it will use one field and for others will use the default field? not with any of the existing parsers. -Hoss
Re: Using stored terms for faceting
: is it possible to use the stored terms of a field for a faceted search? No, the only thing stored fields can be used for is document centric operations (ie: once you have a small set of individual docIds, you can access the stored fields to return to the user, or highlight, etc...) : I mean, I don't want to get the term frequency per document as it is : shown here: : http://wiki.apache.org/solr/TermVectorComponentExampleOptions : : I want to get the frequency of the term of my special search and show : only the 10 most frequent terms and all the nice things that I can do : for faceting. i honestly have no idea what you are saying you want -- can you provide a concrete use case explaining what you mean? describe some example data and then explain what type of logic would happen and what type of result you'd get back? -Hoss
Re: question on wild card
The best way to understand how things are parsed is to go to the solr admin page (Full interface link?) and click the "debug info" box and submit your query. That'll tell you exactly what happens. Alternatively, you can put &debugQuery=on on your URL... HTH Erick

On Wed, Jul 14, 2010 at 8:48 AM, Mark N wrote:
> I have a database field = hello world, and I am indexing it into the *text*
> field with the standard analyzer (text is a copy field in Solr).
> Now when a user gives the query text:"hello world%", how is the query
> interpreted in the background? Are we actually searching text:hello OR
> text:world% (consider the default operator is OR)?
> -- Nipen Mark
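One note on the original question: % is a SQL LIKE wildcard, not a Lucene one (Lucene uses * and ?), and wildcards inside a quoted phrase are not expanded in any case, so text:"hello world%" is analyzed as an ordinary phrase. The debug output makes this visible, e.g. (host invented, field from the thread):

http://localhost:8983/solr/select?q=text:%22hello+world%25%22&debugQuery=on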
Re: Strange "the" when search with dismax
If the other suggestions don't work, you need to show us the relevant portions of your schema.xml, and probably query output with &debugQuery=on tacked on... Here are some pointers for getting help: http://wiki.apache.org/solr/UsingMailingLists Best Erick

2010/7/14 Jonathan Rochkind
> "the" sounds like it might be a stopword. Are you using stopwords in any
> of the fields covered by your dismax search, but not in some of the other
> fields covered by dismax? The combination of dismax and stopwords can
> result in unexpected behavior if you aren't careful.
> I wrote about this a bit here; you might find it helpful:
> http://bibwild.wordpress.com/2010/04/14/solr-stop-wordsdismax-gotcha/
Re: How to find first document for the ALL search
: I have found that this search crashes: : : /solr/select?q=*%3A*&fq=&start=0&rows=1&fl=id Ouch .. that exception is kind of hairy. it suggests that your index may have been corrupted in some way -- do you have any idea what happened? have you tried using the CheckIndex tool to see what it says? (I'd hate to help you work around this but get bit by a timebomb of some other bad docs later) : It looks like just that first document is bad. I am happy to delete it - but : not sure how to get to it. Does anyone know how to find it? CheckIndex might help ... if it doesn't, the next thing you might try is asking for a legitimate field name that you know no document has (ie: if you have a dynamicField with the pattern "str_*" because you have fields like "str_foo" and "str_bar" but you never have a field named "str_BOGUS", then use fl=str_BOGUS) and then add debugQuery=true to the URL -- the debug info should contain the id. I'll be honest though: i'm guessing that if your example query doesn't work, my suggestion won't either -- because if you get that error just trying to access the "id" field, the same thing will probably happen when the debugComponent tries to look it up as well. -Hoss
Re: range faceting with integers
: Subject: range faceting with integers : References: : In-Reply-To: http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to and your question is "hidden" in that thread and gets less attention. It makes following discussions in the mailing list archives particularly difficult. See Also: http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking -Hoss
Re: Solr index optimizing help
Yeah, that happened :( I lost a lot of data because of it. Can someone explain the terms numDocs and maxDoc? Will the difference indicate the duplicates? Thank you, karthik
about warm up
I want to load full text into an external cache, so I added some code in newSearcher, where I found the warm up takes place. I add my code before the Solr warm up, which is configured in solrconfig.xml like this: ...

public void newSearcher(SolrIndexSearcher newSearcher, SolrIndexSearcher currentSearcher) {
  warmTextCache(newSearcher, warmTextCache, new String[]{"title","content"});
  for (NamedList nlst : (List)args.get("queries")) {
    ...
  }
}

In warmTextCache I need a reader to get some docs, for(int i=0;i<...;i++){...}, but it blocks in SolrCore.getSearcher, around line 1000:

if (onDeckSearchers > 0 && !forceNew && _searcher == null) {
  try {
    searcherLock.wait();
  } catch (InterruptedException e) {
    log.info(SolrException.toStr(e));
  }
}

And about 5 minutes later, it's ok. So how can I get a "safe" reader in this situation?
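One detail that may be relevant (a sketch only; whether it fits the poster's exact code path is unverified): the SolrIndexSearcher passed into newSearcher already wraps an open IndexReader, so warming code can pull stored documents from it directly instead of going back through SolrCore.getSearcher(), which is what waits while the new searcher is still on deck:

import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.solr.search.SolrIndexSearcher;

public class TextCacheWarmer {
  // Iterate the new searcher's own reader; no call back into SolrCore needed.
  public void warm(SolrIndexSearcher newSearcher) throws IOException {
    IndexReader reader = newSearcher.getReader();
    for (int i = 0; i < reader.maxDoc(); i++) {
      if (!reader.isDeleted(i)) {
        Document doc = reader.document(i); // stored fields for the cache
        // e.g. cache doc.get("title") and doc.get("content") here
      }
    }
  }
}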
Re: Solr index optimizing help
Hi, The difference indicates deletes. Optimize the index (which expunges docs marked as deleted) and the difference disappears. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/

- Original Message
> From: Karthik K
> To: solr-user@lucene.apache.org
> Sent: Wed, July 14, 2010 10:26:12 PM
> Subject: Re: Solr index optimizing help
>
> Yeah, that happened :( I lost a lot of data because of it.
> Can someone explain the terms numDocs and maxDoc? Will the difference
> indicate the duplicates?
> Thank you,
> karthik
Re: Multiple cores or not?
Hello there, I'm guessing the sites will be searched separately. In that case I'd recommend a core for each site. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message > From: "scr...@asia.com" > To: solr-user@lucene.apache.org > Sent: Wed, July 14, 2010 3:02:36 PM > Subject: Multiple cores or not? > > > > > Hi, > > We are planning to host on same server different website that will use solr. > > What will be the best? > > One core with a field i schema: site1, site2 etc... and then add this in > every >query > > Or one core per site? > > Thanks for your help > > > >
How to speed up solr search speed
Hi all. I have a problem with distributed Solr search. The issue is that I have 76M documents spread over 76 Solr instances, each instance handling 1M documents. Previously I put all 76 instances on a single server, and when I tested I found each search takes a long time, mostly 10-20s, to finish. Now I have split these instances across 2 servers, each with 38 instances; the search speed is about 5-10s per query. 10s is a bit unacceptable for me. Based on my observation, the slowness is caused by disk operations, since all these instances are on the same server: when I test each single instance on its own, it is quite fast, always ~400ms, but when I use distributed search, some instances say they need 7000+ms. Our server has plenty of free memory. I am wondering: is there a way to make Solr use more memory instead of the on-disk index, i.e., load all indexes into memory to speed things up? Any help is welcome. Thanks. Regards. Scott
how to eliminate scoring from a query?
In http://www.lucidimagination.com/files/file/LIWP_WhatsNew_Solr1.4.pdf under the performance section it mentions: "Queries that don't sort by score can eliminate scoring, which speeds up queries". How exactly can I do that? If I don't mention which sort I want, it automatically sorts by "score desc". Thanks
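One common way to get this effect (field names invented for illustration): move the restricting clauses out of q and into fq filter queries, which are cached as document sets and never scored, then sort explicitly on a field:

http://localhost:8983/solr/select?q=*:*&fq=category:books&sort=price+asc

The q=*:* match-all query produces a constant score, so effectively no per-document scoring work remains.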