query about the server configuration
Dear all, I am quite new and have not worked with Solr under heavy request loads. I have the following server configuration: 16GB RAM, 16 CPUs. I need to update the index every minute and index at least 5000 docs per day; the data size per day will be around 50 MB. I am expecting 10 to 30 concurrent hits on the server, which is 2 million hits per day, and around 30 to 40 concurrent users at peak hour. Right now I have configured a core and I am using a static instance to call the Solr server in SolrJ (SolrServer server = new HttpSolrServer();). I am worried that at peak hour the static server instance in SolrJ will not be able to keep up with responses and will become slow. Is there any way to open more than one connection to the server instance in SolrJ, like the connection pooling we use for database connections (Apache DBCP or Hibernate)? Please help me configure the server for these heavy requirements. thanks for your help. regards jonty
Re: about the SolrServer server = new CommonsHttpSolrServer(URL);
For heavy use (30 to 40 concurrent users), will it work? How can I open and maintain more connections at a time, like a connection pool, so users can receive fast responses? regards On Fri, Jun 17, 2011 at 12:50 PM, Ahmet Arslan wrote: > > SolrServer server = new CommonsHttpSolrServer(URL); > > > > throughout the class. How can I improve the connection, in > > my case: should I close the server after fetching > > the result, or will CommonsHttpSolrServer(URL) maintain it at > > its end? The other way: I can make this static and > > use it throughout the classes. > > As the wiki [1] says, you should use the same instance throughout all of the > classes. > > [1] http://wiki.apache.org/solr/Solrj#CommonsHttpSolrServer >
Re: about the SolrServer server = new CommonsHttpSolrServer(URL);
> For heavy use (30 to 40 concurrent > users), will it work? > How can I open and maintain more connections at a time, like a > connection pool, so > users can receive fast responses? It uses HttpClient under the hood. You can pass an httpClient to its constructor too. It seems that MultiThreadedHttpConnectionManager has a setMaxConnectionsPerHost method. String serverPath = "http://localhost:8983/solr"; HttpClient client = new HttpClient(new MultiThreadedHttpConnectionManager()); URL url = new URL(serverPath); CommonsHttpSolrServer solrServer = new CommonsHttpSolrServer(url, client);
Weird optimize performance degradation
Hello! Here is a puzzling experiment: I build an index of about 1.2MM documents using SOLR 3.1. The index has a large number of dynamic fields (about 15.000). Each document has about 100 fields. I add the documents in batches of 20, and every 50.000 documents I optimize the index. The first 10 optimizes (up to exactly 500k documents) take less than a minute and a half. But the 11th and all subsequent commits take north of 10 minutes. The commit logs look identical (in the INFOSTREAM.txt file), but what used to be Jun 19, 2011 4:03:59 AM IW 13 [Sun Jun 19 04:03:59 EDT 2011; Lucene Merge Thread #0]: merge: total 50 docs Jun 19, 2011 4:04:37 AM IW 13 [Sun Jun 19 04:04:37 EDT 2011; Lucene Merge Thread #0]: merge store matchedCount=2 vs 2 now eats a lot of time: Jun 19, 2011 4:37:06 AM IW 14 [Sun Jun 19 04:37:06 EDT 2011; Lucene Merge Thread #0]: merge: total 55 docs Jun 19, 2011 4:46:42 AM IW 14 [Sun Jun 19 04:46:42 EDT 2011; Lucene Merge Thread #0]: merge store matchedCount=2 vs 2 What could be happening between those two lines that takes 10 minutes at full CPU? (and with 50k docs less used to take so much less?). Thanks in advance, Santiago
Re: Is it true that I cannot delete stored content from the index?
That is correct, but you only need to commit; optimize is not a requirement here. François On Jun 18, 2011, at 11:54 PM, Mohammad Shariq wrote: > I have defined a uniqueKey in my solr schema and am deleting the docs from solr using > this uniqueKey, > and then doing optimization once in a day. > is this the right way to delete? > > On 19 June 2011 05:14, Erick Erickson wrote: > >> Yep, you've got to delete and re-add. Although if you have a uniqueKey >> defined you >> can just re-add that document and Solr will automatically delete the >> underlying >> document. >> >> You might have to optimize the index afterwards to get the data to really >> disappear since the deletion process just marks the document as >> deleted. >> >> Best >> Erick >> >> On Sat, Jun 18, 2011 at 1:20 PM, Gabriele Kahlout >> wrote: >>> Hello, >>> >>> I've been indexing with the content field stored. Now I'd like to delete all >>> stored content; is there a way to do that without re-indexing? >>> >>> It seems not, from the Lucene >>> FAQ< >> http://wiki.apache.org/lucene-java/LuceneFAQ#How_do_I_update_a_document_or_a_set_of_documents_that_are_already_indexed.3F >>> >>> : >>> How do I update a document or a set of documents that are already >>> indexed? There >>> is no direct update procedure in Lucene. To update an index incrementally >>> you must first *delete* the documents that were updated, and *then >>> re-add* them to the index. >>> >>> -- >>> Regards, >>> K. Gabriele >>> >> > > > -- > Thanks and Regards > Mohammad Shariq
"site:" feature in Solr?
Hello, Beside creating an index with just the site in question, is it possible like with Google to search for results only in a given domain? -- Regards, K. Gabriele --- unchanged since 20/9/10 --- P.S. If the subject contains "[LON]" or the addressee acknowledges the receipt within 48 hours then I don't resend the email. subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x) < Now + 48h) ⇒ ¬resend(I, this). If an email is sent by a sender that is not a trusted contact or the email does not contain a valid code then the email is not received. A valid code starts with a hyphen and ends with "X". ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈ L(-[a-z]+[0-9]X)).
Re: "site:" feature in Solr?
> Beside creating an index with just the site in question, is > it possible like > with Google to search for results only in a given domain? If you have an appropriate field that is indexed, yes. fq=site:foo.com http://wiki.apache.org/solr/CommonQueryParameters#fq
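For illustration, a minimal SolrJ sketch of that approach; the field name "site", the server URL and the example domain are assumptions about your schema and setup, not anything Solr ships with:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

// assumes each document was indexed with its host name in an indexed "site" field
SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
SolrQuery query = new SolrQuery("camera");      // the user's keywords
query.addFilterQuery("site:foo.com");           // restrict matches to one domain
QueryResponse rsp = server.query(query);
System.out.println("hits in foo.com: " + rsp.getResults().getNumFound());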
example doesn't run from source?
I'm trying to run the example app from the svn source, but it doesn't seem to work. I am able to run : java -jar start.jar and Jetty starts with: INFO::Started SocketConnector@0.0.0.0:8983 But then when I go to my browser and go to this address: http://localhost:8983/solr/ I get a 404 error. What else do I need to do to be able to run the example from source?
Re: example doesn't run from source?
Jason, which source did you use for the checkout and how did you build solr? Regards Stefan Am 19.06.2011 15:00, schrieb Jason Toy: I'm trying to run the example app from the svn source, but it doesn't seem to work. I am able to run : java -jar start.jar and Jetty starts with: INFO::Started SocketConnector@0.0.0.0:8983 But then when I go to my browser and go to this address: http://localhost:8983/solr/ I get a 404 error. What else do I need to do to be able to run the example from source?
Re: Multiple indexes
your data is being used to build an inverted index rather than being stored as a set of records. de-normalising is fine in most cases. what is your use case which requires a normalised set of indices ? 2011/6/18 François Schiettecatte : > You would need to run two independent searches and then 'join' the results. > > It is best not to apply a 'sql' mindset to SOLR when it comes to > (de)normalization, whereas you strive for normalization in sql, that is > usually counter-productive in SOLR. For example, I am working on a project > with 30+ normalized tables, but only 4 cores. > > Perhaps describing what you are trying to achieve would give us greater > insight and thus be able to make more concrete recommendation? > > Cheers > > François > > On Jun 18, 2011, at 2:36 PM, shacky wrote: > >> Il 18 giugno 2011 20:27, François Schiettecatte >> ha scritto: >>> Sure. >> >> So I can have some searches similar to JOIN on MySQL? >> The problem is that I need at least two tables in which search data.. > >
Re: Weird optimize performance degradation
First, there's absolutely no reason to optimize this often, if at all. Older versions of Lucene would search faster on an optimized index, but this is no longer necessary. Optimize will reclaim data from deleted documents, but is generally recommended to be performed fairly rarely, often at off-peak hours. Note that optimize will re-write your entire index into a single new segment, so following your pattern it'll take longer and longer each time. But the speed change happening at 500,000 documents is suspiciously close to the default mergeFactor of 10 X 50,000. Do subsequent optimizes (i.e. on the 750,000th document) still take that long? But this doesn't make sense because if you're optimizing instead of committing, each optimize should reduce your index to 1 segment and you'll never hit a merge. So I'm a little confused. If you're really optimizing every 50K docs, what I'd expect to see is successively longer times, and at the end of each optimize I'd expect there to be only one segment in your index. Are you sure you're not just seeing successively longer times on each optimize and just noticing it after 10? Best Erick On Sun, Jun 19, 2011 at 6:04 AM, Santiago Bazerque wrote: > Hello! > > Here is a puzzling experiment: > > I build an index of about 1.2MM documents using SOLR 3.1. The index has a > large number of dynamic fields (about 15.000). Each document has about 100 > fields. > > I add the documents in batches of 20, and every 50.000 documents I optimize > the index. > > The first 10 optimizes (up to exactly 500k documents) take less than a > minute and a half. > > But the 11th and all subsequent commits take north of 10 minutes. The commit > logs look identical (in the INFOSTREAM.txt file), but what used to be > > Jun 19, 2011 4:03:59 AM IW 13 [Sun Jun 19 04:03:59 EDT 2011; Lucene Merge > Thread #0]: merge: total 50 docs > > Jun 19, 2011 4:04:37 AM IW 13 [Sun Jun 19 04:04:37 EDT 2011; Lucene Merge > Thread #0]: merge store matchedCount=2 vs 2 > > > now eats a lot of time: > > > Jun 19, 2011 4:37:06 AM IW 14 [Sun Jun 19 04:37:06 EDT 2011; Lucene Merge > Thread #0]: merge: total 55 docs > > Jun 19, 2011 4:46:42 AM IW 14 [Sun Jun 19 04:46:42 EDT 2011; Lucene Merge > Thread #0]: merge store matchedCount=2 vs 2 > > > What could be happening between those two lines that takes 10 minutes at > full CPU? (and with 50k docs less used to take so much less?). > > > Thanks in advance, > > Santiago >
Re: Is it true that I cannot delete stored content from the index?
That'll work, but you could just as easily simply add the document. Solr will take care of deleting any other documents with the same uniqueKey as a document being added automatically. Optimizing once a day is reasonable, but note that about all you're doing here is reclaiming some space. So if you only do a few deletes a day (by a few I'm thinking several thousand), you may be able to reduce that to once a week. But there's no particular reason to make it less frequent if you're satisfied with how it works now. Best Erick On Sat, Jun 18, 2011 at 11:54 PM, Mohammad Shariq wrote: > I have defined a uniqueKey in my solr schema and am deleting the docs from solr using > this uniqueKey, > and then doing optimization once in a day. > is this the right way to delete? > > On 19 June 2011 05:14, Erick Erickson wrote: > >> Yep, you've got to delete and re-add. Although if you have a uniqueKey >> defined you >> can just re-add that document and Solr will automatically delete the >> underlying >> document. >> >> You might have to optimize the index afterwards to get the data to really >> disappear since the deletion process just marks the document as >> deleted. >> >> Best >> Erick >> >> On Sat, Jun 18, 2011 at 1:20 PM, Gabriele Kahlout >> wrote: >> > Hello, >> > >> > I've been indexing with the content field stored. Now I'd like to delete all >> > stored content; is there a way to do that without re-indexing? >> > >> > It seems not, from the Lucene >> > FAQ< >> http://wiki.apache.org/lucene-java/LuceneFAQ#How_do_I_update_a_document_or_a_set_of_documents_that_are_already_indexed.3F >> > >> > : >> > How do I update a document or a set of documents that are already >> > indexed? There >> > is no direct update procedure in Lucene. To update an index incrementally >> > you must first *delete* the documents that were updated, and *then >> > re-add* them to the index. >> > >> > -- >> > Regards, >> > K. Gabriele >> > >> > > > > -- > Thanks and Regards > Mohammad Shariq >
Re: example doesn't run from source?
Right, run "ant example" first to build the example code. You have to run it from the /solr directory. Best Erick On Sun, Jun 19, 2011 at 9:00 AM, Jason Toy wrote: > I'm trying to run the example app from the svn source, but it doesn't seem > to work. I am able to run : > java -jar start.jar > and Jetty starts with: > INFO::Started SocketConnector@0.0.0.0:8983 > > But then when I go to my browser and go to this address: > http://localhost:8983/solr/ > I get a 404 error. What else do I need to do to be able to run the example > from source? >
Re: Optimize taking two steps and extra disk space
With LogXMergePolicy (the default before 3.2), optimize respects mergeFactor, so it's doing 2 steps because you have 37 segments but 35 mergeFactor. With TieredMergePolicy (default on 3.2 and after), there is now a separate merge factor used for optimize (maxMergeAtOnceExplicit)... so you could eg set this factor higher and more often get a single merge for the optimize. Mike McCandless http://blog.mikemccandless.com On Sat, Jun 18, 2011 at 6:45 PM, Shawn Heisey wrote: > I've noticed something odd in Solr 3.2 when it does an optimize. One of my > shards (freshly built via DIH full-import) had 37 segments, totalling > 17.38GB of disk space. 13 of those segments were results of merges during > initial import, the other 24 were untouched after creation. Starting at _0, > the final segment before optimizing is _co. The mergefactor on the index is > 35, chosen because it makes merged segments line up nicely on "z" > boundaries. > > The optmization process created a _cp segment of 14.4GB, followed by a _cq > segment at the final 17.27GB size, so at the peak, it took 49GB of disk > space to hold the index. > > Is there any way to make it do the optimize in one pass? Is there a > compelling reason why it does it this way? > > Thanks, > Shawn > >
Re: Weird optimize performance degradation
Hello Erick, thanks for your answer! Yes, our over-optimization is mainly due to paranoia over these strange commit times. The long optimize time persisted in all the subsequent commits, and this is consistent with what we are seeing in other production indexes that have the same problem. Once the anomaly shows up, it never commits quickly again. I combed through the last 50k documents that were added before the first slow commit. I found one with a larger than usual number of fields (didn't write down the number, but it was a few thousands). I deleted it, and the following optimize was normal again (110 seconds). So I'm pretty sure a document with lots of fields is the cause of the slowdown. If that would be useful, I can do some further testing to confirm this hypothesis and send the document to the list. Thanks again for your answer. Best, Santiago On Sun, Jun 19, 2011 at 10:21 AM, Erick Erickson wrote: > First, there's absolutely no reason to optimize this often, if at all. > Older > versions of Lucene would search faster on an optimized index, but > this is no longer necessary. Optimize will reclaim data from > deleted documents, but is generally recommended to be performed > fairly rarely, often at off-peak hours. > > Note that optimize will re-write your entire index into a single new > segment, > so following your pattern it'll take longer and longer each time. > > But the speed change happening at 500,000 documents is suspiciously > close to the default mergeFactor of 10 X 50,000. Do subsequent > optimizes (i.e. on the 750,000th document) still take that long? But > this doesn't make sense because if you're optimizing instead of > committing, each optimize should reduce your index to 1 segment and > you'll never hit a merge. > > So I'm a little confused. If you're really optimizing every 50K docs, what > I'd expect to see is successively longer times, and at the end of each > optimize I'd expect there to be only one segment in your index. > > Are you sure you're not just seeing successively longer times on each > optimize and just noticing it after 10? > > Best > Erick > > On Sun, Jun 19, 2011 at 6:04 AM, Santiago Bazerque > wrote: > > Hello! > > > > Here is a puzzling experiment: > > > > I build an index of about 1.2MM documents using SOLR 3.1. The index has a > > large number of dynamic fields (about 15.000). Each document has about > 100 > > fields. > > > > I add the documents in batches of 20, and every 50.000 documents I > optimize > > the index. > > > > The first 10 optimizes (up to exactly 500k documents) take less than a > > minute and a half. > > > > But the 11th and all subsequent commits take north of 10 minutes. The > commit > > logs look identical (in the INFOSTREAM.txt file), but what used to be > > > > Jun 19, 2011 4:03:59 AM IW 13 [Sun Jun 19 04:03:59 EDT 2011; Lucene > Merge > > Thread #0]: merge: total 50 docs > > > > Jun 19, 2011 4:04:37 AM IW 13 [Sun Jun 19 04:04:37 EDT 2011; Lucene Merge > > Thread #0]: merge store matchedCount=2 vs 2 > > > > > > now eats a lot of time: > > > > > > Jun 19, 2011 4:37:06 AM IW 14 [Sun Jun 19 04:37:06 EDT 2011; Lucene > Merge > > Thread #0]: merge: total 55 docs > > > > Jun 19, 2011 4:46:42 AM IW 14 [Sun Jun 19 04:46:42 EDT 2011; Lucene Merge > > Thread #0]: merge store matchedCount=2 vs 2 > > > > > > What could be happening between those two lines that takes 10 minutes at > > full CPU? (and with 50k docs less used to take so much less?). > > > > > > Thanks in advance, > > > > Santiago > > >
Solr Multithreading
Hi, I am currently working on a search-based project which involves indexing data from a SQL Server database, including attachments, using DIH. For indexing attachments (varbinary DB objects), I am using TikaEntityProcessor. I am trying to use multithreading to speed up the indexing, but it seems to fail when indexing attachments, even after applying a few Solr fix patches. My question is: is the current multithreading feature stable in Solr 3.1, or does it need further enhancements? -- Thanks and Regards Rahul A. Warawdekar
fq vs adding to query
Are there any hard and fast rules about when to use fq vs adding to the query? For instance if I started with a search of camera then wanted to add another keyword say digital, is it better to do q=camera AND digital or q=camera&fq=digital I know that fq isn't taken into account when doing highlighting, so what I am currently doing is when there are facet based queries I am doing fqs but everything else is being added to the query, so in the case above I would have done q=camera AND digital. If however there was a field called category with values standard or digital I would have done q=camera&fq=category:digital. Any guidance would be appreciated.
Re: fq vs adding to query
fq is a filter query: filter based on category, timestamp, language etc., but I don't see any performance improvement if you use a plain keyword in fq. Use cases: fq=lang:English&q=camera AND digital OR fq=time:[13023567 TO 13023900]&q=camera AND digital On 19 June 2011 20:17, Jamie Johnson wrote: > Are there any hard and fast rules about when to use fq vs adding to the > query? For instance if I started with a search of > camera > > then wanted to add another keyword say digital, is it better to do > > q=camera AND digital > > or > > q=camera&fq=digital > > I know that fq isn't taken into account when doing highlighting, so what I > am currently doing is when there are facet based queries I am doing fqs but > everything else is being added to the query, so in the case above I would > have done q=camera AND digital. If however there was a field called > category with values standard or digital I would have done > q=camera&fq=category:digital. Any guidance would be appreciated. > -- Thanks and Regards Mohammad Shariq
Building Solr 3.2 from sources - can't get war
Hi, This is my first post here so excuse me please if it is not really related. At the moment I'm using Solr 1.4.1 with SOLR-236 (https://issues.apache.org/jira/browse/SOLR-236) patch applied to support field collapsing. One of the mandatory fields of documents indexed is generated from the *.doc/*.docx/*.pdf files uploaded by users, so Solr Cell is also heavily used in the project for the purpose of parsing documents to store their plain text content. Unfortunately, it can't parse correctly all the documents but in most cases it works well enough. Recently I learned (http://stackoverflow.com/questions/6369214/solr-cell-extractingrequesthandler-cannot-parse-some-doc-files/) that Solr Cell I'm using is old so by using its up-to-date version I can get more documents parsed correctly. As I am using apache-solr-cell-1.4.1.jar in my lib folder, first thing I tried was to replace it with apache-solr-cell-3.2.jar from the latest distribution without changing anything else (e.g. war file). After Solr instance was restarted, it worked (I managed to fetch the content of the parsed document) but after a number of requests crashed. Then, I decided that in order to use *-3.2 libraries properly I need to use 3.2 core war file as well. But as I need the collapsing functionality, I need to build a custom patched version of it as I did before with 1.4.1. -- So the first question is if I was really right in my assumption here - maybe it is possible to upgrade Solr Cell / Tika to the latest version while still using 1.4.1 Solr core? If that's possible, my following questions can be skipped. -- And the problem I am facing is that I can't build 3.2 version war file. I mean, when I get source from http://svn.apache.org/repos/asf/lucene/solr/tags/1.4.1/release-1.4.1 among the build options there is the "dist-war" key which allows to build war core and a set of standard libraries. Everything is simple in case you need to build 1.4.1 core. For 3.2, I can't see a similar build option. First, there is no release-3.2 folder, so I tried to checkout http://svn.apache.org/repos/asf/lucene/dev/trunk supposing this is the latest stable release (and I might be wrong there). However, there is no "dist-war" build option and I only get various jar files when building that branch with no war file at all. -- So the second question is what exactly am I doing wrong - do I checkout incorrect branch (and what is the correct one then?) or do I build it improperly (maybe I need to modify build.xml somehow)? -- Many thanks in advance. Feel free to ask for more details if that matters - I am a total noob in Java programming so very likely I've missed something here. -- Yuriy Akopov
Re: Optimize taking two steps and extra disk space
On 6/19/2011 7:32 AM, Michael McCandless wrote: With LogXMergePolicy (the default before 3.2), optimize respects mergeFactor, so it's doing 2 steps because you have 37 segments but 35 mergeFactor. With TieredMergePolicy (default on 3.2 and after), there is now a separate merge factor used for optimize (maxMergeAtOnceExplicit)... so you could eg set this factor higher and more often get a single merge for the optimize. This makes sense. The default for maxMergeAtOnceExplicit is 30 according to LUCENE-854, so it merges the first 30 segments, then it goes back and merges the new one plus the other 7 that remain. To counteract this behavior, I've put this in my solrconfig.xml (inside the mergePolicy section), to test next week: <int name="maxMergeAtOnceExplicit">70</int> I figure that twice the mergeFactor (35) will likely cover every possible outcome. Is that a correct thought? Thanks, Shawn
Re: Weird optimize performance degradation
I also have the solr with around 100mn docs. I do optimize once in a week, and it takes around 1 hour 30 mins to optimize. On 19 June 2011 20:02, Santiago Bazerque wrote: > Hello Erick, thanks for your answer! > > Yes, our over-optimization is mainly due to paranoia over these strange > commit times. The long optimize time persisted in all the subsequent > commits, and this is consistent with what we are seeing in other production > indexes that have the same problem. Once the anomaly shows up, it never > commits quickly again. > > I combed through the last 50k documents that were added before the first > slow commit. I found one with a larger than usual number of fields (didn't > write down the number, but it was a few thousands). > > I deleted it, and the following optimize was normal again (110 seconds). So > I'm pretty sure a document with lots of fields is the cause of the > slowdown. > > If that would be useful, I can do some further testing to confirm this > hypothesis and send the document to the list. > > Thanks again for your answer. > > Best, > Santiago > > On Sun, Jun 19, 2011 at 10:21 AM, Erick Erickson >wrote: > > > First, there's absolutely no reason to optimize this often, if at all. > > Older > > versions of Lucene would search faster on an optimized index, but > > this is no longer necessary. Optimize will reclaim data from > > deleted documents, but is generally recommended to be performed > > fairly rarely, often at off-peak hours. > > > > Note that optimize will re-write your entire index into a single new > > segment, > > so following your pattern it'll take longer and longer each time. > > > > But the speed change happening at 500,000 documents is suspiciously > > close to the default mergeFactor of 10 X 50,000. Do subsequent > > optimizes (i.e. on the 750,000th document) still take that long? But > > this doesn't make sense because if you're optimizing instead of > > committing, each optimize should reduce your index to 1 segment and > > you'll never hit a merge. > > > > So I'm a little confused. If you're really optimizing every 50K docs, > what > > I'd expect to see is successively longer times, and at the end of each > > optimize I'd expect there to be only one segment in your index. > > > > Are you sure you're not just seeing successively longer times on each > > optimize and just noticing it after 10? > > > > Best > > Erick > > > > On Sun, Jun 19, 2011 at 6:04 AM, Santiago Bazerque > > wrote: > > > Hello! > > > > > > Here is a puzzling experiment: > > > > > > I build an index of about 1.2MM documents using SOLR 3.1. The index has > a > > > large number of dynamic fields (about 15.000). Each document has about > > 100 > > > fields. > > > > > > I add the documents in batches of 20, and every 50.000 documents I > > optimize > > > the index. > > > > > > The first 10 optimizes (up to exactly 500k documents) take less than a > > > minute and a half. > > > > > > But the 11th and all subsequent commits take north of 10 minutes. 
The > > commit > > > logs look identical (in the INFOSTREAM.txt file), but what used to be > > > > > > Jun 19, 2011 4:03:59 AM IW 13 [Sun Jun 19 04:03:59 EDT 2011; Lucene > > Merge > > > Thread #0]: merge: total 50 docs > > > > > > Jun 19, 2011 4:04:37 AM IW 13 [Sun Jun 19 04:04:37 EDT 2011; Lucene > Merge > > > Thread #0]: merge store matchedCount=2 vs 2 > > > > > > > > > now eats a lot of time: > > > > > > > > > Jun 19, 2011 4:37:06 AM IW 14 [Sun Jun 19 04:37:06 EDT 2011; Lucene > > Merge > > > Thread #0]: merge: total 55 docs > > > > > > Jun 19, 2011 4:46:42 AM IW 14 [Sun Jun 19 04:46:42 EDT 2011; Lucene > Merge > > > Thread #0]: merge store matchedCount=2 vs 2 > > > > > > > > > What could be happening between those two lines that takes 10 minutes > at > > > full CPU? (and with 50k docs less used to take so much less?). > > > > > > > > > Thanks in advance, > > > > > > Santiago > > > > > > -- Thanks and Regards Mohammad Shariq
Re: fq vs adding to query
If you want to make good use of the filter cache then use filter queries. > fq is a filter query: filter based on category, timestamp, language etc., but > I don't see any performance improvement if you use a plain keyword in fq. > > Use cases: > fq=lang:English&q=camera AND digital > OR > fq=time:[13023567 TO 13023900]&q=camera AND digital > > On 19 June 2011 20:17, Jamie Johnson wrote: > > Are there any hard and fast rules about when to use fq vs adding to the > > query? For instance if I started with a search of > > camera > > > > then wanted to add another keyword say digital, is it better to do > > > > q=camera AND digital > > > > or > > > > q=camera&fq=digital > > > > I know that fq isn't taken into account when doing highlighting, so what > > I am currently doing is when there are facet based queries I am doing > > fqs but everything else is being added to the query, so in the case > > above I would have done q=camera AND digital. If however there was a > > field called category with values standard or digital I would have done > > q=camera&fq=category:digital. Any guidance would be appreciated.
Re: Building Solr 3.2 from sources - can't get war
On 6/19/2011 9:32 AM, Yuriy Akopov wrote: For 3.2, I can't see a similar build option. First, there is no release-3.2 folder, so I tried to checkout http://svn.apache.org/repos/asf/lucene/dev/trunk supposing this is the latest stable release (and I might be wrong there). However, there is no "dist-war" build option and I only get various jar files when building that branch with no war file at all. I don't know the answer to your first question about Tika, but I can tackle the second. In the checked out lucene (either trunk or one of the 3.x branches) source is a solr/ directory. You just cd into that directory, and dist-war becomes a build option. I tend to build solr with "ant dist" which also builds all the contrib jars. If you are using the dataimporthandler, you'll want the contrib jars. DIH has always been a contrib module, and in 3.1 it was removed from the .war file. Building dist succeeds, but I just tried dist-war on my checked out 3.2 and it failed, ending with the following error: BUILD FAILED /opt/ncindex/src/orig_3_2/solr/build.xml:620: /opt/ncindex/src/orig_3_2/solr/build/web not found. Shawn
Re: fq vs adding to query
On 6/19/2011 10:00 AM, Markus Jelsma wrote: If you wan't to make good use of the filter cache then use filter queries. Additionally, information in filter queries will not affect relevancy ranking. If you want the terms you are using to affect the document scores, include them in the main query. Filter queries are intended for just that -- filtering. They do it very efficiently, especially if you reuse them frequently, which hits the filter cache as Markus said. It's often good practice to break up your filter queries into multiple fq statements so that there's more likelihood that they will use the cache. Thanks, Shawn
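To make the split concrete, a small SolrJ sketch along those lines (the field names and values are only examples, and "server" is the application's shared SolrServer instance):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;

SolrQuery q = new SolrQuery();
q.setQuery("camera AND digital");        // scored: affects ranking and highlighting
q.addFilterQuery("category:digital");    // filter only: cached in the filter cache
q.addFilterQuery("lang:English");        // separate fq -> separate, reusable cache entry
QueryResponse rsp = server.query(q);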
Re: query about the server configuration
Please help, I am also in the same situation. regards On Sunday 19 June 2011 12:59 PM, Jonty Rhods wrote: Dear all, I am quite new and have not worked with Solr under heavy request loads. I have the following server configuration: 16GB RAM, 16 CPUs. I need to update the index every minute and index at least 5000 docs per day; the data size per day will be around 50 MB. I am expecting 10 to 30 concurrent hits on the server, which is 2 million hits per day, and around 30 to 40 concurrent users at peak hour. Right now I have configured a core and I am using a static instance to call the Solr server in SolrJ (SolrServer server = new HttpSolrServer();). I am worried that at peak hour the static server instance in SolrJ will not be able to keep up with responses and will become slow. Is there any way to open more than one connection to the server instance in SolrJ, like the connection pooling we use for database connections (Apache DBCP or Hibernate)? Please help me configure the server for these heavy requirements. thanks for your help. regards jonty
Re: about the SolrServer server = new CommonsHttpSolrServer(URL);
Thanks... however, a few more questions. How are connection threads maintained (max and min settings)? What would be an ideal setting for max in the setMaxConnectionsPerHost method? Will it be OK for 30 to 40 concurrent users? How are threads maintained by the MultiThreadedHttpConnectionManager class? On Sunday 19 June 2011 02:04 PM, Ahmet Arslan wrote: For heavy use (30 to 40 concurrent users), will it work? How can I open and maintain more connections at a time, like a connection pool, so users can receive fast responses? It uses HttpClient under the hood. You can pass an httpClient to its constructor too. It seems that MultiThreadedHttpConnectionManager has a setMaxConnectionsPerHost method. String serverPath = "http://localhost:8983/solr"; HttpClient client = new HttpClient(new MultiThreadedHttpConnectionManager()); URL url = new URL(serverPath); CommonsHttpSolrServer solrServer = new CommonsHttpSolrServer(url, client);
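Not an authoritative answer, but a sketch of where those knobs live on the HttpClient 3.x connection manager before it is handed to SolrJ; the numbers below are placeholders to tune against your own load, not recommendations:

import java.net.URL;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

MultiThreadedHttpConnectionManager cm = new MultiThreadedHttpConnectionManager();
cm.getParams().setDefaultMaxConnectionsPerHost(50);  // e.g. a little above the expected 30-40 concurrent users
cm.getParams().setMaxTotalConnections(100);          // overall cap across all hosts
HttpClient client = new HttpClient(cm);
CommonsHttpSolrServer solrServer = new CommonsHttpSolrServer(new URL("http://localhost:8983/solr"), client);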
Re: Building Solr 3.2 from sources - can't get war
In the checked out lucene (either trunk or one of the 3.x branches) source is a solr/ directory. You just cd into that directory, and dist-war becomes a build option. Thanks, Shawn! That worked, and by invoking the dist-war build I received the apache-solr-4.0-SNAPSHOT.war file successfully - but judging by its name it is a current 4.0 snapshot rather than stable 3.2. Alas, 4.0 doesn't suit me for two reasons: first, it is still experimental and hasn't been released yet (at least as far as I know), and second, it supports field collapsing natively, so it doesn't need to be patched. The problem is that the parameters Solr 4.0 uses to control collapsing are not compatible with the ones added by the SOLR-236 patch, so I would have to rewrite my client application as well. That is surely inevitable sooner or later, but until 4.0 is released I'd prefer to stick to an earlier version. So I need advice once again: which folder do I need to check out to get the 3.2 source code? It is clear with 1.4.1 (.../tags/release-1.4.1 is obvious enough), but ...dev/trunk turned out to contain 4.0. Surely my question is silly, but I can't figure out how I can get buildable Solr 3.2 source code. -y.
RE: Building Solr 3.2 from sources - can't get war
https://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_2/ > -Original Message- > From: Yuriy Akopov [mailto:ako...@hotmail.co.uk] > Sent: Sunday, June 19, 2011 4:38 PM > To: solr-user@lucene.apache.org > Subject: Re: Building Solr 3.2 from sources - can't get war > > > In the checked out lucene (either trunk or one of the 3.x branches) > source > > is a solr/ directory. You just cd into that directory, and dist-war > > becomes a build option. > > Thanks, Shawn! That worked and by invoking dist-war build I have received > apache-solr-4.0-SNAPSHOT.war file successfully - but judging by its name > it > is a current 4.0 snapshot rather than stable 3.2. > > Alas, 4.0 doesn't suit me for two reasons: first, it is still > experimental > and hasn't been released yet (at least as far as I know) and second, it > supports field collapsing natively, so it doesn't need to be patched. The > problem is that the parameters Solr 4.0 uses to control collapsing are > not > compatible with the ones added by SOLR-236 patch so I have to rewrite my > client application as well. Which is surely inevitable sooner or later > but > until 4.0 is released I'd prefer stick to earlier version. > > So I need an advice once again - which folder I need to checkout to get > 3.2 > source code? Is it clear with 1.4.1 (.../tags/release-1.4.1 is obvious > enough), and ...dev/trunk turned out to contain 4.0. Surely my question > is > silly but I can't figure out how can I get Solr 3.2 buildable source > code. > > -y.
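For example, the steps would look roughly like this (assuming svn and a recent ant are installed; the target directory name is arbitrary): check out https://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_2/ with svn co, cd into the solr/ directory of the checkout as Shawn described, and run ant dist-war; the war should then appear under solr/dist/.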
Re: Solr and Tag Cloud
Consider you have multivalued field _tag_ related to every document in your corpus. Then you can build tag cloud relevant for all data set or specific query by retrieving facets for field _tag_ for "*:*" or any other query. You'll get a list of popular _tag_ values relevant to this query with occurrence counts. If you want to build tag cloud for general analyzed text fields you still can do that the same way, but you should note that you can hit some performance/memory problems if you have significant data set and huge text fields. You should probably use stop words to filter popular general terms. On Sat, Jun 18, 2011 at 8:12 AM, Jamie Johnson wrote: > Does anyone have details of how to generate a tag cloud of popular terms > across an entire data set and then also across a query? >
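A rough SolrJ sketch of that faceting approach (the field name "tag" and the limits are placeholders, and "server" is your existing SolrServer instance):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

SolrQuery q = new SolrQuery("*:*");   // or any user query to scope the cloud
q.setRows(0);                         // only facet counts are needed, not documents
q.setFacet(true);
q.addFacetField("tag");
q.setFacetLimit(50);                  // top 50 tags
q.setFacetMinCount(1);
QueryResponse rsp = server.query(q);
for (FacetField.Count c : rsp.getFacetField("tag").getValues()) {
    // c.getName() is the tag text, c.getCount() can drive its size in the cloud
    System.out.println(c.getName() + " -> " + c.getCount());
}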
paging and maintaingin a cursor just like ScrollableResultSet
As you probably know, using Query in Hibernate/JPA gets slower and slower with each page since it starts all over on the index tree, WHILE ScrollableResultSet does NOT, because the database maintains a cursor into the index that just picks up where it left off; as you go to the next page, and the next, the speed stays the same. Does something like that exist in Solr? I was looking at the API and all the examples are just for returning all results, from what I could tell. I went into Lucene and it looks like it can do it, kind of, if you code up your own Collector and unfortunately make Collector.collect(int doc) block on a lock while waiting for the client to ask for the next page (or ask to release the resource since it is complete). I.e. ScrollableResultSet obviously has to be closed when complete, and so would this method as well. Any ideas on how to achieve this? My client is a computer, not a webapp with a human clicking next page, and we want the result set paging to stay linear as it really hurts our performance. Thanks, Dean
Re: Why are not query keywords treated as a set?
do you mean a phrase query? "past past" can you give some more detail? On 18 June 2011 13:02, Gabriele Kahlout wrote: > q=past past > > 1.0 = (MATCH) sum of: > * 0.5 = (MATCH) fieldWeight(content:past in 0), product of:* > 1.0 = tf(termFreq(content:past)=1) > 1.0 = idf(docFreq=1, maxDocs=2) > 0.5 = fieldNorm(field=content, doc=0) > * 0.5 = (MATCH) fieldWeight(content:past in 0), product of:* > 1.0 = tf(termFreq(content:past)=1) > 1.0 = idf(docFreq=1, maxDocs=2) > 0.5 = fieldNorm(field=content, doc=0) > > Is there how I can treat the query keywords as a set? > > -- > Regards, > K. Gabriele > > --- unchanged since 20/9/10 --- > P.S. If the subject contains "[LON]" or the addressee acknowledges the > receipt within 48 hours then I don't resend the email. > subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x) > < Now + 48h) ⇒ ¬resend(I, this). > > If an email is sent by a sender that is not a trusted contact or the email > does not contain a valid code then the email is not received. A valid code > starts with a hyphen and ends with "X". > ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈ > L(-[a-z]+[0-9]X)). >
Re: paging and maintaingin a cursor just like ScrollableResultSet
One technique I've used to page through huge result sets that could help: if you have a sortable key (like an id), you can just fetch all docs, sorted by the key, and then on subsequent page requests use the last value from the previous page as a filter in a range term like: id:[LAST_ID TO *], where LAST_ID is the last key value returned on the previous page. There may be a better approach though... -Mike On 6/19/2011 6:02 PM, Hiller, Dean x66079 wrote: As you probably know, using Query in Hibernate/JPA gets slower and slower with each page since it starts all over on the index tree, WHILE ScrollableResultSet does NOT, because the database maintains a cursor into the index that just picks up where it left off; as you go to the next page, and the next, the speed stays the same. Does something like that exist in Solr? I was looking at the API and all the examples are just for returning all results, from what I could tell. I went into Lucene and it looks like it can do it, kind of, if you code up your own Collector and unfortunately make Collector.collect(int doc) block on a lock while waiting for the client to ask for the next page (or ask to release the resource since it is complete). I.e. ScrollableResultSet obviously has to be closed when complete, and so would this method as well. Any ideas on how to achieve this? My client is a computer, not a webapp with a human clicking next page, and we want the result set paging to stay linear as it really hurts our performance. Thanks, Dean
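For what it's worth, a sketch of that pattern with SolrJ; the key field "id", the page size, and the duplicate-skipping are assumptions of this sketch, not the only way to do it:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

int rows = 500;                // page size
String lastId = null;          // highest key seen so far; "server" is your SolrServer
while (true) {
    SolrQuery q = new SolrQuery("*:*");
    q.setRows(rows);
    q.setSortField("id", SolrQuery.ORDER.asc);
    if (lastId != null) {
        // resume from the previous page; the inclusive bound returns the last doc
        // again, so it is skipped below
        q.addFilterQuery("id:[" + lastId + " TO *]");
    }
    SolrDocumentList page = server.query(q).getResults();
    for (SolrDocument doc : page) {
        String id = (String) doc.getFieldValue("id");
        if (id.equals(lastId)) continue;   // duplicate carried over by the range filter
        // ... process doc ...
        lastId = id;
    }
    if (page.size() < rows) break;         // last page reached
}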
Re: solr highliting feature
Hi, First, you should consider the SolrJ API if you're working from Java/JSP. Then, say you want to highlight title. In your loop across the N hits, instead of pulling the title from the hits themselves, check if you find a highlighted result with the same ID in the highlighting section. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 18. juni 2011, at 11.26, Romi wrote: > I want to highlight some search result values and used Solr for this, as I > understand Solr provides a highlighting feature. I configured > highlighting in solrconfig.xml and set hl=true and hl.fl=somefield at > query time in my URL. When I run the URL it gives me an XML representation of > the search results where I get a highlighting section. > > Further, I am parsing this XML response to show the results in a JSP page, but I > am not getting how I can highlight the fields in the JSP page. > > - > Thanks & Regards > Romi
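As a concrete sketch of that (assuming the uniqueKey field is "id" and the highlighted field is "title"; adjust to your schema, and "server" is your SolrServer instance):

import java.util.List;
import java.util.Map;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

SolrQuery q = new SolrQuery("some user query");
q.setHighlight(true);
q.addHighlightField("title");
QueryResponse rsp = server.query(q);
// highlighting is keyed by uniqueKey value, then by field name
Map<String, Map<String, List<String>>> hl = rsp.getHighlighting();
for (SolrDocument doc : rsp.getResults()) {
    String id = (String) doc.getFieldValue("id");
    List<String> snippets = (hl.get(id) != null) ? hl.get(id).get("title") : null;
    String title = (snippets != null && !snippets.isEmpty())
            ? snippets.get(0)                         // highlighted version (with <em> tags by default)
            : (String) doc.getFieldValue("title");    // fall back to the stored value
    System.out.println(title);
}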
why too many open files?
Hi, All I have 12 shards and ramBufferSizeMB=512, mergeFactor=5. But solr raise java.io.FileNotFoundException (Too many open files). mergeFactor is just 5. How can this happen? Below is segments of some shard. That is too many segments over mergFactor. What's wrong and How should I set the mergeFactor? == [root@solr solr]# ls indexData/multicore-us/usn02/data/index/ _0.fdt _gs.fdt _h5.tii _hl.nrm _i1.nrm _kn.nrm _l1.nrm _lq.tii _0.fdx _gs.fdx _h5.tis _hl.prx _i1.prx _kn.prx _l1.prx _lq.tis _3i.fdt _gs.fnm _h7.fnm _hl.tii _i1.tii _kn.tii _l1.tii lucene-2de7b31b5eabdff0b6ec7fd32eecf8c7-write.lock _3i.fdx _gs.frq _h7.frq _hl.tis _i1.tis _kn.tis _l1.tis _lu.fnm _3s.fnm _gs.nrm _h7.nrm _hn.fnm _j7.fdt _kp.fnm _l2.fnm _lu.frq _3s.frq _gs.prx _h7.prx _hn.frq _j7.fdx _kp.frq _l2.frq _lu.nrm _3s.nrm _gs.tii _h7.tii _hn.nrm _kb.fnm _kp.nrm _l2.nrm _lu.prx _3s.prx _gs.tis _h7.tis _hn.prx _kb.frq _kp.prx _l2.prx _lu.tii _3s.tii _gu.fnm _h9.fnm _hn.tii _kb.nrm _kp.tii _l2.tii _lu.tis _3s.tis _gu.frq _h9.frq _hn.tis _kb.prx _kp.tis _l2.tis _ly.fnm _48.fdt _gu.nrm _h9.nrm _hp.fnm _kb.tii _kq.fnm _l6.fnm _ly.frq _48.fdx _gu.prx _h9.prx _hp.frq _kb.tis _kq.frq _l6.frq _ly.nrm _4d.fnm _gu.tii _h9.tii _hp.nrm _kc.fnm _kq.nrm _l6.nrm _ly.prx _4d.frq _gu.tis _h9.tis _hp.prx _kc.frq _kq.prx _l6.prx _ly.tii _4d.nrm _gw.fnm _hb.fnm _hp.tii _kc.nrm _kq.tii _l6.tii _ly.tis _4d.prx _gw.frq _hb.frq _hp.tis _kc.prx _kq.tis _l6.tis _m3.fnm _4d.tii _gw.nrm _hb.nrm _hr.fnm _kc.tii _kr.fnm _la.fnm _m3.frq _4d.tis _gw.prx _hb.prx _hr.frq _kc.tis _kr.frq _la.frq _m3.nrm _5b.fdt _gw.tii _hb.tii _hr.nrm _kf.fdt _kr.nrm _la.nrm _m3.prx _5b.fdx _gw.tis _hb.tis _hr.prx _kf.fdx _kr.prx _la.prx _m3.tii _5b.fnm _gy.fnm _he.fdt _hr.tii _kf.fnm _kr.tii _la.tii _m3.tis _5b.frq _gy.frq _he.fdx _hr.tis _kf.frq _kr.tis _la.tis _m8.fnm _5b.nrm _gy.nrm _he.fnm _ht.fnm _kf.nrm _kt.fnm _le.fnm _m8.frq _5b.prx _gy.prx _he.frq _ht.frq _kf.prx _kt.frq _le.frq _m8.nrm _5b.tii _gy.tii _he.nrm _ht.nrm _kf.tii _kt.nrm _le.nrm _m8.prx _5b.tis _gy.tis _he.prx _ht.prx _kf.tis _kt.prx _le.prx _m8.tii _5m.fnm _h0.fnm _he.tii _ht.tii _kg.fnm _kt.tii _le.tii _m8.tis _5m.frq _h0.frq _he.tis _ht.tis _kg.frq _kt.tis _le.tis _md.fnm _5m.nrm _h0.nrm _hh.fnm _hv.fnm _kg.nrm _kw.fnm _li.fnm _md.frq _5m.prx _h0.prx _hh.frq _hv.frq _kg.prx _kw.frq _li.frq _md.nrm _5m.tii _h0.tii _hh.nrm _hv.nrm _kg.tii _kw.nrm _li.nrm _md.prx _5m.tis _h0.tis _hh.prx _hv.prx _kg.tis _kw.prx _li.prx _md.tii _5n.fnm _h2.fnm _hh.tii _hv.tii _kj.fdt _kw.tii _li.tii _md.tis _5n.frq _h2.frq _hh.tis _hv.tis _kj.fdx _kw.tis _li.tis _mi.fnm _5n.nrm _h2.nrm _hk.fnm _hz.fdt _kj.fnm _ky.fnm _lm.fnm _mi.frq _5n.prx _h2.prx _hk.frq _hz.fdx _kj.frq _ky.frq _lm.frq _mi.nrm _5n.tii _h2.tii _hk.nrm _hz.fnm _kj.nrm _ky.nrm _lm.nrm _mi.prx _5n.tis _h2.tis _hk.prx _hz.frq _kj.prx _ky.prx _lm.prx _mi.tii _5x.fnm _h5.fdt _hk.tii _hz.nrm _kj.tii _ky.tii _lm.tii _mi.tis _5x.frq _h5.fdx _hk.tis _hz.prx _kj.tis _ky.tis _lm.tis segments_1 _5x.nrm _h5.fnm _hl.fdt _hz.tii _kn.fdt _l1.fdt _lq.fnm segments.gen _5x.prx _h5.frq _hl.fdx _hz.tis _kn.fdx _l1.fdx _lq.frq _5x.tii _h5.nrm _hl.fnm _i1.fnm _kn.fnm _l1.fnm _lq.nrm _5x.tis _h5.prx _hl.frq _i1.frq _kn.frq _l1.frq _lq.prx == Thanks in advance. -- View this message in context: http://lucene.472066.n3.nabble.com/why-too-many-open-files-tp3084407p3084407.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: query about the server configuration
I forgot an important point that I need to commit the server in 2 to 5 minutes.. please help.. regards On Sun, Jun 19, 2011 at 11:29 PM, Ranveer wrote: > Please help I am also in same situation. > > regards > > > > On Sunday 19 June 2011 12:59 PM, Jonty Rhods wrote: > >> Dear all, >> >> I am quite new and not work on solr for heavy request. >> >> I have following server configuration: >> >> 16GB RAM >> 16 CPU >> >> I need to index update in every minutes and at least more than 5000 docs >> per >> day. Size of the data per day will be around 50 MB. I am expecting 10 to >> 30 concurrent hit on server which is 2 million hits per day and around 30 >> to >> 40 concurrent user at peak our. >> >> Right now I had configure core and using static method to call solr server >> in solrj (SolrServer server = new HttpSolrServer();). I am worried that at >> peak our static instance of the server in solrj will not able to perform >> the >> response and it will become slow. >> >> Is there any way to open more then one connection of server instance in >> the >> SolrJ like connection pool which we are using in Database related >> connection >> pooling (Apache DBCP or Hibernate). >> >> Please help me to configure the server as my heavy requirements. >> >> thanks for your help. >> >> regards >> jonty >> >> >
score of Infinity on dismax query
Hello, I have a solr search server running and in at least one very rare case, I'm seeing a strange scoring result. The following example will cause solr to return a score of "Infinity": Query: {!dismax tie=0.1 qf=lyrics pf=lyrics ps=5}drugs the drugs Here is the debug output: Infinity = (MATCH) sum of: 0.0758089 = (MATCH) sum of: 0.03790445 = (MATCH) weight(lyrics:drug in 0), product of: 0.40824828 = queryWeight(lyrics:drug), product of: 0.30685282 = idf(docFreq=1, maxDocs=1) 1.3304368 = queryNorm 0.09284656 = (MATCH) fieldWeight(lyrics:drug in 0), product of: 3.8729835 = tf(termFreq(lyrics:drug)=15) 0.30685282 = idf(docFreq=1, maxDocs=1) 0.078125 = fieldNorm(field=lyrics, doc=0) 0.03790445 = (MATCH) weight(lyrics:drug in 0), product of: 0.40824828 = queryWeight(lyrics:drug), product of: 0.30685282 = idf(docFreq=1, maxDocs=1) 1.3304368 = queryNorm 0.09284656 = (MATCH) fieldWeight(lyrics:drug in 0), product of: 3.8729835 = tf(termFreq(lyrics:drug)=15) 0.30685282 = idf(docFreq=1, maxDocs=1) 0.078125 = fieldNorm(field=lyrics, doc=0) Infinity = (MATCH) weight(lyrics:"drug ? drug"~5 in 0), product of: 0.81649655 = queryWeight(lyrics:"drug ? drug"~5), product of: 0.61370564 = idf(lyrics: drug=1 drug=1) 1.3304368 = queryNorm Infinity = fieldWeight(lyrics:"drug drug" in 0), product of: Infinity = tf(phraseFreq=Infinity) 0.61370564 = idf(lyrics: drug=1 drug=1) 0.078125 = fieldNorm(field=lyrics, doc=0) Here is the text of the 'lyrics' field entry that gives the Infinity score: http://pastebin.com/JcN5hM8c There seems to be some kind of issue with the search query having the two consecutive words, a reserved word (the) in the middle. To me it looks like a bug but I wanted to check here first. I'm seeing this in both 1.4.1 and 3.1.0. Regards, Chris
Re: Why are not query keywords treated as a set?
For q=past past, the query string "past past" is parsed into content:past content:past. I was expecting the query to get parsed into content:past only, and not content:past content:past. On Mon, Jun 20, 2011 at 12:12 AM, lee carroll wrote: > do you mean a phrase query? "past past" > can you give some more detail? > > On 18 June 2011 13:02, Gabriele Kahlout wrote: > > q=past past > > > > 1.0 = (MATCH) sum of: > > * 0.5 = (MATCH) fieldWeight(content:past in 0), product of:* > > 1.0 = tf(termFreq(content:past)=1) > > 1.0 = idf(docFreq=1, maxDocs=2) > > 0.5 = fieldNorm(field=content, doc=0) > > * 0.5 = (MATCH) fieldWeight(content:past in 0), product of:* > > 1.0 = tf(termFreq(content:past)=1) > > 1.0 = idf(docFreq=1, maxDocs=2) > > 0.5 = fieldNorm(field=content, doc=0) > > > > Is there how I can treat the query keywords as a set? > > > > -- > > Regards, > > K. Gabriele > > > -- Regards, K. Gabriele
Re: score of Infinity on dismax query
This is a bug, thanks for including all the information necessary to reproduce! https://issues.apache.org/jira/browse/LUCENE-3215 On Sun, Jun 19, 2011 at 10:24 PM, Chris Book wrote: > Hello, I have a solr search server running and in at least one very rare > case, I'm seeing a strange scoring result. The following example will cause > solr to return a score of "Infinity": > > Query: {!dismax tie=0.1 qf=lyrics pf=lyrics ps=5}drugs the drugs > > Here is the debug output: > Infinity = (MATCH) sum of: > 0.0758089 = (MATCH) sum of: > 0.03790445 = (MATCH) weight(lyrics:drug in 0), product of: > 0.40824828 = queryWeight(lyrics:drug), product of: > 0.30685282 = idf(docFreq=1, maxDocs=1) > 1.3304368 = queryNorm > 0.09284656 = (MATCH) fieldWeight(lyrics:drug in 0), product of: > 3.8729835 = tf(termFreq(lyrics:drug)=15) > 0.30685282 = idf(docFreq=1, maxDocs=1) > 0.078125 = fieldNorm(field=lyrics, doc=0) > 0.03790445 = (MATCH) weight(lyrics:drug in 0), product of: > 0.40824828 = queryWeight(lyrics:drug), product of: > 0.30685282 = idf(docFreq=1, maxDocs=1) > 1.3304368 = queryNorm > 0.09284656 = (MATCH) fieldWeight(lyrics:drug in 0), product of: > 3.8729835 = tf(termFreq(lyrics:drug)=15) > 0.30685282 = idf(docFreq=1, maxDocs=1) > 0.078125 = fieldNorm(field=lyrics, doc=0) > Infinity = (MATCH) weight(lyrics:"drug ? drug"~5 in 0), product of: > 0.81649655 = queryWeight(lyrics:"drug ? drug"~5), product of: > 0.61370564 = idf(lyrics: drug=1 drug=1) > 1.3304368 = queryNorm > Infinity = fieldWeight(lyrics:"drug drug" in 0), product of: > Infinity = tf(phraseFreq=Infinity) > 0.61370564 = idf(lyrics: drug=1 drug=1) > 0.078125 = fieldNorm(field=lyrics, doc=0) > > Here is the text of the 'lyrics' field entry that gives the Infinity score: > http://pastebin.com/JcN5hM8c > > > > There seems to be some kind of issue with the search query having the > two consecutive words, a reserved word (the) in the middle. To me it looks > like a bug but I wanted to check here first. I'm seeing this in both 1.4.1 > and 3.1.0. > > Regards, > Chris >
Re: solr highliting feature
Yes, I find the title in the highlighting section. If I am getting results by parsing the JSON object, then do I need to parse the highlighting section separately? - Thanks & Regards Romi
Re: solr highliting feature
Perhaps I don't understand your question right, but if you're working with the json response format, yes, you need to pull the highlighted version of the field from the highlighting section. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 20. juni 2011, at 07.22, Romi wrote: > yes, I find title in section. If i am getting results say by > parsing json object then do i need to parse ? > > - > Thanks & Regards > Romi > -- > View this message in context: > http://lucene.472066.n3.nabble.com/solr-highliting-feature-tp3079239p3084890.html > Sent from the Solr - User mailing list archive at Nabble.com.
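As a rough picture of what that section looks like with wt=json and default settings (the id "doc1" and field "title" are placeholders): the response contains "highlighting":{"doc1":{"title":["... <em>matched term</em> ..."]}}, i.e. it is keyed first by the document's uniqueKey value and then by field name, with an array of snippet strings as the value; in the JSP you look the snippet up by the document's id and field, and fall back to the stored field value when no snippet is present.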
Request handle solrconfig.xml Spellchecker
I am trying to set up the spellchecker according to the Solr documentation, but when I test it I don't get any suggestions. My piece of configuration follows: textSpell solr.IndexBasedSpellChecker default name ./spellchecker explicit default false false 1 spellcheck I build the dictionary with http://localhost:8983/solr/select/?q=*:*&spellcheck=true&spellcheck.build=true but when I run the query I am not getting any suggestions: http://localhost:8983/solr/select?q=komputer&spellcheck=true - Thanks & Regards Romi -- View this message in context: http://lucene.472066.n3.nabble.com/Request-handle-solrconfig-xml-Spellchecker-tp3085053p3085053.html Sent from the Solr - User mailing list archive at Nabble.com.
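The configuration above lost its XML markup in the archive. The values that survive (textSpell, solr.IndexBasedSpellChecker, default, name, ./spellchecker and the spellcheck defaults) match the stock SpellCheckComponent example shipped with Solr, so the intended solrconfig.xml most likely looked roughly like the sketch below; the handler name and the field are assumptions based on those values and on the /select URLs in the message, not Romi's exact file.

<!-- Assumed reconstruction of the flattened configuration above -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="name">default</str>
    <str name="field">name</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
  </lst>
</searchComponent>

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">1</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

With an IndexBasedSpellChecker the dictionary is built only from terms of the configured field (here name), so a query like q=komputer can only return a suggestion if something close to "komputer" actually occurs in that field and the spellcheck component is attached to the handler being queried.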
Re: why too many open files?
Hi, did you have checked the max opened files of your OS? see: http://lj4newbies.blogspot.com/2007/04/too-many-open-files.html 2011/6/20 Jason, Kim > Hi, All > > I have 12 shards and ramBufferSizeMB=512, mergeFactor=5. > But solr raise java.io.FileNotFoundException (Too many open files). > mergeFactor is just 5. How can this happen? > Below is segments of some shard. That is too many segments over mergFactor. > What's wrong and How should I set the mergeFactor? > > == > [root@solr solr]# ls indexData/multicore-us/usn02/data/index/ > _0.fdt _gs.fdt _h5.tii _hl.nrm _i1.nrm _kn.nrm _l1.nrm _lq.tii > _0.fdx _gs.fdx _h5.tis _hl.prx _i1.prx _kn.prx _l1.prx _lq.tis > _3i.fdt _gs.fnm _h7.fnm _hl.tii _i1.tii _kn.tii _l1.tii > lucene-2de7b31b5eabdff0b6ec7fd32eecf8c7-write.lock > _3i.fdx _gs.frq _h7.frq _hl.tis _i1.tis _kn.tis _l1.tis _lu.fnm > _3s.fnm _gs.nrm _h7.nrm _hn.fnm _j7.fdt _kp.fnm _l2.fnm _lu.frq > _3s.frq _gs.prx _h7.prx _hn.frq _j7.fdx _kp.frq _l2.frq _lu.nrm > _3s.nrm _gs.tii _h7.tii _hn.nrm _kb.fnm _kp.nrm _l2.nrm _lu.prx > _3s.prx _gs.tis _h7.tis _hn.prx _kb.frq _kp.prx _l2.prx _lu.tii > _3s.tii _gu.fnm _h9.fnm _hn.tii _kb.nrm _kp.tii _l2.tii _lu.tis > _3s.tis _gu.frq _h9.frq _hn.tis _kb.prx _kp.tis _l2.tis _ly.fnm > _48.fdt _gu.nrm _h9.nrm _hp.fnm _kb.tii _kq.fnm _l6.fnm _ly.frq > _48.fdx _gu.prx _h9.prx _hp.frq _kb.tis _kq.frq _l6.frq _ly.nrm > _4d.fnm _gu.tii _h9.tii _hp.nrm _kc.fnm _kq.nrm _l6.nrm _ly.prx > _4d.frq _gu.tis _h9.tis _hp.prx _kc.frq _kq.prx _l6.prx _ly.tii > _4d.nrm _gw.fnm _hb.fnm _hp.tii _kc.nrm _kq.tii _l6.tii _ly.tis > _4d.prx _gw.frq _hb.frq _hp.tis _kc.prx _kq.tis _l6.tis _m3.fnm > _4d.tii _gw.nrm _hb.nrm _hr.fnm _kc.tii _kr.fnm _la.fnm _m3.frq > _4d.tis _gw.prx _hb.prx _hr.frq _kc.tis _kr.frq _la.frq _m3.nrm > _5b.fdt _gw.tii _hb.tii _hr.nrm _kf.fdt _kr.nrm _la.nrm _m3.prx > _5b.fdx _gw.tis _hb.tis _hr.prx _kf.fdx _kr.prx _la.prx _m3.tii > _5b.fnm _gy.fnm _he.fdt _hr.tii _kf.fnm _kr.tii _la.tii _m3.tis > _5b.frq _gy.frq _he.fdx _hr.tis _kf.frq _kr.tis _la.tis _m8.fnm > _5b.nrm _gy.nrm _he.fnm _ht.fnm _kf.nrm _kt.fnm _le.fnm _m8.frq > _5b.prx _gy.prx _he.frq _ht.frq _kf.prx _kt.frq _le.frq _m8.nrm > _5b.tii _gy.tii _he.nrm _ht.nrm _kf.tii _kt.nrm _le.nrm _m8.prx > _5b.tis _gy.tis _he.prx _ht.prx _kf.tis _kt.prx _le.prx _m8.tii > _5m.fnm _h0.fnm _he.tii _ht.tii _kg.fnm _kt.tii _le.tii _m8.tis > _5m.frq _h0.frq _he.tis _ht.tis _kg.frq _kt.tis _le.tis _md.fnm > _5m.nrm _h0.nrm _hh.fnm _hv.fnm _kg.nrm _kw.fnm _li.fnm _md.frq > _5m.prx _h0.prx _hh.frq _hv.frq _kg.prx _kw.frq _li.frq _md.nrm > _5m.tii _h0.tii _hh.nrm _hv.nrm _kg.tii _kw.nrm _li.nrm _md.prx > _5m.tis _h0.tis _hh.prx _hv.prx _kg.tis _kw.prx _li.prx _md.tii > _5n.fnm _h2.fnm _hh.tii _hv.tii _kj.fdt _kw.tii _li.tii _md.tis > _5n.frq _h2.frq _hh.tis _hv.tis _kj.fdx _kw.tis _li.tis _mi.fnm > _5n.nrm _h2.nrm _hk.fnm _hz.fdt _kj.fnm _ky.fnm _lm.fnm _mi.frq > _5n.prx _h2.prx _hk.frq _hz.fdx _kj.frq _ky.frq _lm.frq _mi.nrm > _5n.tii _h2.tii _hk.nrm _hz.fnm _kj.nrm _ky.nrm _lm.nrm _mi.prx > _5n.tis _h2.tis _hk.prx _hz.frq _kj.prx _ky.prx _lm.prx _mi.tii > _5x.fnm _h5.fdt _hk.tii _hz.nrm _kj.tii _ky.tii _lm.tii _mi.tis > _5x.frq _h5.fdx _hk.tis _hz.prx _kj.tis _ky.tis _lm.tis segments_1 > _5x.nrm _h5.fnm _hl.fdt _hz.tii _kn.fdt _l1.fdt _lq.fnm segments.gen > _5x.prx _h5.frq _hl.fdx _hz.tis _kn.fdx _l1.fdx _lq.frq > _5x.tii _h5.nrm _hl.fnm _i1.fnm _kn.fnm _l1.fnm _lq.nrm > _5x.tis _h5.prx _hl.frq _i1.frq _kn.frq _l1.frq _lq.prx > == > > Thanks in advance. 
> > -- > View this message in context: > http://lucene.472066.n3.nabble.com/why-too-many-open-files-tp3084407p3084407.html > Sent from the Solr - User mailing list archive at Nabble.com. >
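For reference, on most Linux systems the limit the reply points at can be checked and raised roughly like this (a sketch; the username and the value are placeholders, and the persistent-configuration file varies by distribution):

# Show the current per-process open-file limit in this shell
ulimit -n
# Raise it for the current session (usually needs root or a PAM limits entry)
ulimit -n 65536
# To make it permanent, add lines like these to /etc/security/limits.conf,
# where "solr" stands for the user that runs the servlet container:
#   solr  soft  nofile  65536
#   solr  hard  nofile  65536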
Re: Why are not query keywords treated as a set?
this might help in your analysis chain http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.RemoveDuplicatesTokenFilterFactory On 20 June 2011 04:21, Gabriele Kahlout wrote: > past past > *past past* > *content:past content:past* > > I was expecting the query to get parsed into content:past only and not > content:past content:past. > > On Mon, Jun 20, 2011 at 12:12 AM, lee carroll > wrote: > >> do you mean a phrase query? "past past" >> can you give some more detail? >> >> On 18 June 2011 13:02, Gabriele Kahlout wrote: >> > q=past past >> > >> > 1.0 = (MATCH) sum of: >> > * 0.5 = (MATCH) fieldWeight(content:past in 0), product of:* >> > 1.0 = tf(termFreq(content:past)=1) >> > 1.0 = idf(docFreq=1, maxDocs=2) >> > 0.5 = fieldNorm(field=content, doc=0) >> > * 0.5 = (MATCH) fieldWeight(content:past in 0), product of:* >> > 1.0 = tf(termFreq(content:past)=1) >> > 1.0 = idf(docFreq=1, maxDocs=2) >> > 0.5 = fieldNorm(field=content, doc=0) >> > >> > Is there how I can treat the query keywords as a set? >> > >> > -- >> > Regards, >> > K. Gabriele >> > >> > --- unchanged since 20/9/10 --- >> > P.S. If the subject contains "[LON]" or the addressee acknowledges the >> > receipt within 48 hours then I don't resend the email. >> > subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ >> time(x) >> > < Now + 48h) ⇒ ¬resend(I, this). >> > >> > If an email is sent by a sender that is not a trusted contact or the >> email >> > does not contain a valid code then the email is not received. A valid >> code >> > starts with a hyphen and ends with "X". >> > ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈ >> > L(-[a-z]+[0-9]X)). >> > >> > > > > -- > Regards, > K. Gabriele > > --- unchanged since 20/9/10 --- > P.S. If the subject contains "[LON]" or the addressee acknowledges the > receipt within 48 hours then I don't resend the email. > subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x) > < Now + 48h) ⇒ ¬resend(I, this). > > If an email is sent by a sender that is not a trusted contact or the email > does not contain a valid code then the email is not received. A valid code > starts with a hyphen and ends with "X". > ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈ > L(-[a-z]+[0-9]X)). >
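A minimal sketch of wiring that filter into a query-time analyzer in schema.xml; the field type name and tokenizer are placeholders, not taken from Gabriele's schema. One caveat: the stock RemoveDuplicatesTokenFilterFactory only drops tokens that have the same text and sit at the same position, so whether it collapses a repeated query word like "past past" depends on the rest of the analysis chain.

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- removes duplicate tokens that end up at the same position -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>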