Re: Multi Field AND Search
Hello,

I am indexing books. The fields are Title, Author, Subtitle, Category, Pages, ... The boosts are TitleBoost 1, AuthorBoost .8, SubtitleBoost .6.

Say someone enters the query "Hitchhiker Guide". I want to show the results in which both words occur, ordered by their boost score. I used:

    MultiFieldQueryParser queryParser = new MultiFieldQueryParser(
        new String[] { Constants.TITLEminusAUTHOR, Constants.AUTHOR_FIELD_NAME, Constants.SUBTITLEminusAUTHOR },
        new StandardAnalyzer());
    queryParser.setDefaultOperator(OPERATOR_AND);

The problem is that I have the author name Douglas Adams, and if I search for that as a query string it does not return any result. But if I remove setDefaultOperator, the same query returns all the Douglas Adams entries.

Saurabh

Otis Gospodnetic wrote:
>
> Hello,
>
> I'm not sure what the question is. You may also want to include an
> example of what you are doing.
> Note that results are typically ordered based on the score of the match
> for each result/hit, not purely the boost, unless you created your own
> Similarity that ignores all other scoring factors except boost/lengthNorm.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
>> From: saurabhs_iitk
>> To: solr-user@lucene.apache.org
>> Sent: Friday, June 19, 2009 2:56:22 AM
>> Subject: Multi Field AND Search
>>
>> Hi
>> I have indexed 8 fields with different boosts. Now I have been given a
>> search string which consists of words and phrases. I want to do an AND
>> search of that search string on four fields and show the results based on
>> boost. For me the search string should occur completely in one of the
>> fields, and then the boosts come into the picture. I know I can index a
>> combination of the four fields in one and then search in that, but then
>> the boosts will not work.
>>
>> Regards
>> Saurabh
>> --
>> View this message in context:
>> http://www.nabble.com/Multi-Field-AND-Search-tp24106434p24106434.html
>> Sent from the Solr - User mailing list archive at Nabble.com.

--
View this message in context: http://www.nabble.com/Multi-Field-AND-Search-tp24106434p24123205.html
Sent from the Solr - User mailing list archive at Nabble.com.
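The requirement in the original post (the whole search string must occur within a single field, with the per-field boosts then influencing the order) can also be spelled out explicitly in Lucene query syntax instead of relying on the parser's default operator. What follows is only a sketch in Python that builds such a query string. The field names title, author and subtitle and the helper name are assumptions standing in for the poster's constants, and the sketch does no escaping of special characters or phrases.

```python
def all_terms_in_one_field(terms, field_boosts):
    """Build a query where every term is required within at least one single field."""
    # "+term" marks a term as required inside the field group.
    required = " ".join(f"+{term}" for term in terms)
    # One boosted group per field; the groups are alternatives (OR).
    clauses = [f"{field}:({required})^{boost}" for field, boost in field_boosts.items()]
    return " OR ".join(clauses)


# Hypothetical field names with the boosts quoted in the thread.
query = all_terms_in_one_field(
    ["hitchhiker", "guide"],
    {"title": 1.0, "author": 0.8, "subtitle": 0.6},
)
print(query)
```

As Otis points out, the final order still depends on the full score of each hit and not on the boosts alone.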
Re: Result order is different from what I expect
Thanks,

The result of adding &debugQuery=true follows. Does this mean the order is always determined by the score? If that is the case, do I have to adjust the way Solr calculates the score? How can I do that?

I also followed Otis's suggestion and added &sort=word+asc, but the first result is not "apple" but "A as in apple"...

Best,

---
rawquerystring: apple
querystring: apple
parsedquery: word:appl
parsedquery_toString: word:appl

explain:

9.692953 = (MATCH) fieldWeight(word:appl in 3178), product of:
  1.0 = tf(termFreq(word:appl)=1)
  9.692953 = idf(docFreq=287, numDocs=1123433)
  1.0 = fieldNorm(field=word, doc=3178)

9.692953 = (MATCH) fieldWeight(word:appl in 76151), product of:
  1.0 = tf(termFreq(word:appl)=1)
  9.692953 = idf(docFreq=287, numDocs=1123433)
  1.0 = fieldNorm(field=word, doc=76151)

9.692953 = (MATCH) fieldWeight(word:appl in 156584), product of:
  1.0 = tf(termFreq(word:appl)=1)
  9.692953 = idf(docFreq=287, numDocs=1123433)
  1.0 = fieldNorm(field=word, doc=156584)

9.692953 = (MATCH) fieldWeight(word:appl in 156637), product of:
  1.0 = tf(termFreq(word:appl)=1)
  9.692953 = idf(docFreq=287, numDocs=1123433)
  1.0 = fieldNorm(field=word, doc=156637)

9.692953 = (MATCH) fieldWeight(word:appl in 156638), product of:
  1.0 = tf(termFreq(word:appl)=1)
  9.692953 = idf(docFreq=287, numDocs=1123433)
  1.0 = fieldNorm(field=word, doc=156638)

9.692953 = (MATCH) fieldWeight(word:appl in 156742), product of:
  1.0 = tf(termFreq(word:appl)=1)
  9.692953 = idf(docFreq=287, numDocs=1123433)
  1.0 = fieldNorm(field=word, doc=156742)

9.692953 = (MATCH) fieldWeight(word:appl in 157509), product of:
  1.0 = tf(termFreq(word:appl)=1)
  9.692953 = idf(docFreq=287, numDocs=1123433)
  1.0 = fieldNorm(field=word, doc=157509)

8.567441 = (MATCH) fieldWeight(word:appl in 156746), product of:
  1.4142135 = tf(termFreq(word:appl)=2)
  9.692953 = idf(docFreq=287, numDocs=1123433)
  0.625 = fieldNorm(field=word, doc=156746)

6.058096 = (MATCH) fieldWeight(word:appl in 97069), product of:
  1.0 = tf(termFreq(word:appl)=1)
  9.692953 = idf(docFreq=287, numDocs=1123433)
  0.625 = fieldNorm(field=word, doc=97069)

6.058096 = (MATCH) fieldWeight(word:appl in 123198), product of:
  1.0 = tf(termFreq(word:appl)=1)
  9.692953 = idf(docFreq=287, numDocs=1123433)
  0.625 = fieldNorm(field=word, doc=123198)

Erik Hatcher wrote:
>
> Add &debugQuery=true and look at the enlightening scoring explanations.
>
> Erik
>
> On Jun 19, 2009, at 11:23 AM, akinori wrote:
>
>> I am struggling with the search result order of Solr.
>> I indexed a dictionary from English to a certain language into Solr.
>>
>> Below is the result of query="apple", and I am confused by it. Why
>> doesn't "apple" come first, and then "Apple"?
>> I'd like to have your suggestions on how to fix this. I am really
>> stressed about this these days.
>> Any input is much appreciated.
>>
>> Thanks
>>
>> (example word list)
>> An apple!
>> A as in apple
>> appl.
>> apple
>> Apple
>> apples
>> Appling
>> apples to apples
>> Adam's apple
>> allergic apples
>> alley apple
>> apple allergy
>> apple compote
>> apple divider
>> bad apple
>>
>> (schema.xml)
>> sortMissingLast="true" omitNorms="true"/>
>> sortMissingLast="true" omitNorms="true"/>
>> omitNorms="true"/>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Result-orde-is-different-from-I-expect-tp24113572p24113572.html
>> Sent from the Solr - User mailing list archive at Nabble.com.

--
View this message in context: http://www.nabble.com/Result-order-is-different-from-I-expect-tp24113572p24124583.html
Sent from the Solr - User mailing list archive at Nabble.com.
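The explain entries above can be checked by hand: each fieldWeight is the product of tf, idf and fieldNorm, and the tf values shown (1.0 for one occurrence, 1.4142135 for two) are the square root of the term frequency. A small sketch that redoes the arithmetic with the numbers copied from the debug output; the function and variable names are made up for illustration.

```python
import math


def field_weight(term_freq, idf, field_norm):
    """fieldWeight as shown in the explain output: sqrt(termFreq) * idf * fieldNorm."""
    return math.sqrt(term_freq) * idf * field_norm


IDF = 9.692953  # idf(docFreq=287, numDocs=1123433), copied from the debug output

score_doc_3178 = field_weight(1, IDF, 1.0)      # one occurrence, fieldNorm 1.0
score_doc_156746 = field_weight(2, IDF, 0.625)  # two occurrences, fieldNorm 0.625
score_doc_97069 = field_weight(1, IDF, 0.625)   # one occurrence, fieldNorm 0.625

print(round(score_doc_3178, 6), round(score_doc_156746, 6), round(score_doc_97069, 6))
```

The first seven hits all end up with the same score, so the score by itself cannot put "apple" ahead of the other entries in that group.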
Re: ExtractRequestHandler - not properly indexing office docs?
Thanks for the quick response.

Here are the fields from the schema:

I use text as the content field for the default field for the ERH.

Here's the config of the ERH:

last_modified
true

Here's the output of a curl request w/ the file:

0
QTime: 650
;[Content_Types].xml
;_rels/.rels
<?xml version="1.0" encoding="UTF-8" standalone="yes"?> <Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships"><Relationship Id="rId4" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/extended-properties"; Target="docProps/app.xml"/><Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"; Target="word/document.xml"/><Relationship Id="rId2" Type="http://schemas.openxmlformats.org/package/2006/relationships/metadata/thumbnail"; Target="docProps/thumbnail.jpeg"/><Relationship Id="rId3" Type="http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties"; Target="docProps/core.xml"/></Relationships>
word/_rels/document.xml.rels
<?xml version="1.0" encoding="UTF-8" standalone="yes"?> <Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships"><Relationship Id="rId4" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/fontTable"; Target="fontTable.xml"/><Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles"; Target="styles.xml"/><Relationship Id="rId2" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/settings"; Target="settings.xml"/><Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/webSettings"; Target="webSettings.xml"/><Relationship Id="rId5" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/theme"; Target="theme/theme1.xml"/></Relationships>
word/document.xml
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum
word/theme/theme1.xml
;docProps/thumbnail.jpeg
word/settings.xml
;word/fontTable.xml
;word/webSettings.xml
;docProps/core.xml
Joe Doe12009-06-17T20:29:00Z2009-06-17T20:41:00Z
word/styles.xml
;myfileafetest.docx
stream_content_type: application/octet-stream
Content-Type: application/zip
38200

Query looks like:

INFO: [] webapp=/solr path=/select params={wt=standard&rows=10&start=0&explainOther=&hl.fl=&indent=on&q=text:laborum+AND+uploaded_by_user:joe&fl=*,score&qt=standard&version=2.2} hits=0 status=0 QTime=3

Please note that searching solely by "uploaded_by_user:joe" will properly return the document.

Thanks again.
-joe

Grant Ingersoll-6 wrote:
>
> Can you share your schema for the fields you are indexing, the
> configuration of the ExtractingRequestHandler and what your requests
> look like? Also, can you share what the output of the extract only
> stuff looks like?
>
> Also, can you post .doc files to the example per
> http://wiki.apache.org/solr/ExtractingRequestHandler
> ? I was able to do that and search for the doc that I entered and
> it was able to handle both .doc and .docx.
>
> -Grant
>
> --
> Grant Ingersoll
> http://www.lucidimagination.com

docProps/app.xml
Normal.dotm1100Microsoft Macintosh Word011false10genfalse0falsefalse12.
Re: ExtractRequestHandler - not properly indexing office docs?
Do you have a default field declared? &ext.default.fl=
Either that, or you need to explicitly capture the fields you are interested in using &ext.capture=

You could add this to your curl statement to try out.

-Grant

On Jun 20, 2009, at 8:41 AM, cloax wrote:

Thanks for the quick response.

Here are the fields from the schema:

required="true" />
stored="true"/>
stored="true"/>

I use text as the content field for the default field for the ERH.

Here's the config of the ERH:

last_modified
true

Here's the output of a curl request w/ the file:

0
QTime: 650
;[Content_Types].xml
;myfileafetest.docx
stream_content_type: application/octet-stream
Content-Type: application/zip
38200

Query looks like:

INFO: [] webapp=/solr path=/select params={wt=standard&rows=10&start=0&explainOther=&hl.fl=&indent=on&q=text:laborum+AND+uploaded_by_user:joe&fl=*,score&qt=standard&version=2.2} hits=0 status=0 QTime=3

Please note that searching solely by "uploaded_by_user:joe" will properly return the document.

Thanks again.
-joe

Grant Ingersoll-6 wrote:

Can you share your schema for the fields you are indexing, the configuration of the ExtractingRequestHandler and what your requests lo

docProps/app.xml
Normal.dotm1100Microsoft Macintosh Word011false10genfalse0falsefalse12.
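Grant's suggestion amounts to adding one more request parameter to the curl call. A minimal sketch of how that URL could be put together; the host, port and /update/extract path are assumptions (not taken from the thread), "text" is the content field the poster says he uses, and the ext.default.fl name is exactly the one Grant wrote, which may differ in other Solr releases.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def extract_url(base, default_field):
    """Append the default-field parameter from Grant's reply to an extract request URL."""
    return base + "?" + urlencode({"ext.default.fl": default_field})


# Assumed local Solr instance; only the parameter name and the field come from the thread.
url = extract_url("http://localhost:8983/solr/update/extract", "text")
print(url)
```

The resulting URL would be the target of the same curl upload that produced the output quoted above.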
Re: pk vs. uniqueKey with DIH delta-import
https://issues.apache.org/jira/browse/SOLR-1191 describes a different problem, but I think Ali's solution applies here.

I tried 'select concat("",id) from table' and this also had the same exception.

I can't test now, but I think this is the solution:

select concat("prefix",id) AS ID

The JDBC code may be hunting for "id" as a return from the query instead of just accepting a return with an unnamed value?

On Thu, Jun 18, 2009 at 6:35 AM, Erik Hatcher wrote:

> On Jun 18, 2009, at 4:51 AM, Noble Paul നോബിള് नोब्ळ् wrote:
>
>> apparently the row return a null 'board_id'
>
> I replied "No" earlier, but of course you're right here. The
> deletedPkQuery I originally used was not returning a board_id column. And
> even if it did, that isn't the uniqueKey (id field) value.
>
> What about having the results of the deletedPkQuery run through the same
> transformation process that indexing would, only for the field that matches
> Solr's uniqueKey setting would be necessary??
>
> Erik

--
Lance Norskog
goks...@gmail.com
650-922-8831 (US)
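Lance's guess is that the consumer looks the value up by column name, so an expression without an alias comes back under a name nobody asks for. The effect is easy to see in any SQL database. The sketch below uses Python's built-in SQLite purely as a stand-in (the thread's database and DIH itself are not involved), with a made-up table called board and SQLite's || operator in place of concat().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE board (id INTEGER)")  # hypothetical table for the demo
conn.execute("INSERT INTO board VALUES (1)")

# Without an alias the result column gets a database-chosen name
# (in SQLite, typically the expression text), not "id".
unaliased = conn.execute("SELECT 'prefix' || id FROM board")
unaliased_name = unaliased.description[0][0]

# With an alias, code that looks the value up by name can find it.
aliased = conn.execute("SELECT 'prefix' || id AS ID FROM board")
aliased_name = aliased.description[0][0]
aliased_value = aliased.fetchone()[0]

print(unaliased_name, "|", aliased_name, "|", aliased_value)
```

Whether DIH's delta-import code really does a by-name lookup here is exactly the open question in the message above; the sketch only shows why the alias could matter.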
Re: trouble with 'unique' key - documents add, not replace
Thank you Erik and Otis!

I tried the switch to string and it worked perfectly! Int wasn't appropriate, because our UIDs are based on object-type + object-id, and i'd rather not work on some method of creating universal serials for everything.

On Jun 19, 2009, at 11:58 PM, Erik Hatcher wrote:

> "text" is likely analyzing to multiple terms which is incorrect for a
> uid type of field. set it to type="string" instead. and reindex from
> scratch.
>
> Erik
>
> On Jun 19, 2009, at 8:49 PM, Jonathan Vanasco wrote:
>
>> as far as i understand from the docs, with a schema.xml like this...
>>
>> required="true" />
>> uid
>>
>> any items with the same uid in it should replace existing ones on the
>> commit. is this correct ? because it seems that i do not replace,
>> but only add new records
>>
>> i've obviously gotten something wrong. any help would be appreciated.
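Erik's point is that an analyzed "text" field breaks a key into several terms, so there is no single term left that equals the uid. The toy model below illustrates that with a crude lowercase-and-split tokenizer; it is not Solr's analyzer chain, and the uid value is a hypothetical object-type + object-id key.

```python
import re


def analyze_as_text(value):
    """Toy stand-in for a tokenizing 'text' field: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", value.lower())


def analyze_as_string(value):
    """A 'string' field keeps the whole value as one term."""
    return [value]


uid = "photo-123"  # hypothetical object-type + object-id key

text_terms = analyze_as_text(uid)
string_terms = analyze_as_string(uid)

print(text_terms)    # several terms, none of them equal to the uid
print(string_terms)  # exactly one term: the uid itself
```

With the string variant the indexed term and the key are identical, which is what lets a later add with the same uid find and replace the earlier document.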
are there any good samples / tutorials on making queries & facets ?
i've gone through the official docs a few times, and then found some offsite stuff of varying quality regarding how-tos.

can anyone here recommend either howtos/tutorials or sample applications that they have found worthwhile ?

specifically i'm looking to do the following:

- with regular searching, query the system with a single term, and have solr search multiple fields - each one having a different weight
- implement faceted browsing

i know this is quite easy to do with solr, i'm just not seeing docs that resonate with me yet.

thanks!
Use DIH with large xml file
Hi, I have about 50GB of data to be indexed each day using DIH. Some of the files are as large as 6GB. I set the JVM Xmx to be 3GB, but the DIH crashes on those big files. Is there any way to handle it? Thanks. JB
Re: Use DIH with large xml file
How are you configuring DIH to read those files? It is likely that you'll need at least as much RAM to the JVM as the largest file you're processing, though that depends entirely on how the file is being processed.

Erik

On Jun 20, 2009, at 9:23 PM, Jianbin Dai wrote:

> Hi,
>
> I have about 50GB of data to be indexed each day using DIH. Some of
> the files are as large as 6GB. I set the JVM Xmx to be 3GB, but the
> DIH crashes on those big files. Is there any way to handle it?
>
> Thanks.
>
> JB
Re: are there any good samples / tutorials on making queries & facets ?
Hi Jonathan,

I think this is the best article related to faceted search.

http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Faceted-Search-Solr

On Sat, Jun 20, 2009 at 9:56 PM, Jonathan Vanasco wrote:

> i've gone through the official docs a few times, and then found some
> offsite stuff of varying quality regarding how-tos.
>
> can anyone here recommend either howtos/tutorials or sample applications
> that they have found worthwhile ?
>
> specifically i'm looking to do the following:
>
> - with regular searching, query the system with a single term, and
>   have solr search multiple fields - each one having a different weight

In order to search into multiple fields and have a different weight for each of them, you could use the Dismax requesthandler and boost each field.

- use dismax
- boost weights of each field using bq parameter &bq=foofield:term^0.5

http://wiki.apache.org/solr/DisMaxRequestHandler#head-6862070cf279d9a09bdab971309135c7aea22fb3

> - implement faceted browsing
>
> i know this is quite easy to do with solr, i'm just not seeing docs that
> resonate with me yet.
>
> thanks!

Cheers,
Michel
Re: are there any good samples / tutorials on making queries & facets ?
Yeah the lucid imagination articles are great! Jonathan, you can also use the dismax query parser and apply boosts using the qf (query fields) param:

q=my query here&qf=title^0.5 author^0.1

http://wiki.apache.org/solr/DisMaxRequestHandler#head-af452050ee272a1c88e2ff89dc0012049e69e180

Matt

On Sat, Jun 20, 2009 at 10:11 PM, Michel Bottan wrote:

> Hi Jonathan,
>
> I think this is the best article related to faceted search.
>
> http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Faceted-Search-Solr
>
> On Sat, Jun 20, 2009 at 9:56 PM, Jonathan Vanasco wrote:
>
> > i've gone through the official docs a few times, and then found some
> > offsite stuff of varying quality regarding how-tos.
> >
> > can anyone here recommend either howtos/tutorials or sample applications
> > that they have found worthwhile ?
> >
> > specifically i'm looking to do the following:
> >
> > - with regular searching, query the system with a single term, and
> >   have solr search multiple fields - each one having a different weight
>
> In order to search into multiple fields and have a different weight for
> each of them, you could use the Dismax requesthandler and boost each field.
>
> - use dismax
> - boost weights of each field using bq parameter &bq=foofield:term^0.5
>
> http://wiki.apache.org/solr/DisMaxRequestHandler#head-6862070cf279d9a09bdab971309135c7aea22fb3
>
> > - implement faceted browsing
> >
> > i know this is quite easy to do with solr, i'm just not seeing docs that
> > resonate with me yet.
> >
> > thanks!
>
> Cheers,
> Michel
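Matt's q/qf example needs URL encoding before it can be sent, and the faceting half of Jonathan's question is just two more parameters on the same request. A hedged sketch of the query string: q, qf, facet and facet.field are standard Solr parameters, while the use of defType=dismax to select the parser, the localhost URL and the category facet field are assumptions for illustration (how dismax is selected depends on the Solr version and solrconfig.xml).

```python
from urllib.parse import urlencode, parse_qs


def dismax_query(user_query, field_weights, facet_fields=()):
    """Build a query string for a dismax search with per-field weights and optional facets."""
    params = [
        ("defType", "dismax"),  # assumption: parser selected per request
        ("q", user_query),
        ("qf", " ".join(f"{field}^{weight}" for field, weight in field_weights.items())),
    ]
    if facet_fields:
        params.append(("facet", "true"))
        params.extend(("facet.field", field) for field in facet_fields)
    return urlencode(params)


# Weights from Matt's message; "category" is a hypothetical facet field.
qs = dismax_query("my query here", {"title": 0.5, "author": 0.1}, facet_fields=["category"])
print("http://localhost:8983/solr/select?" + qs)
```

Keeping the weights in qf means the user's text goes into q untouched, which is the difference from appending a separate boost query for each term.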
Re: Slowness during submit the index
Hi Francis,

I can't tell what the problem is from the information you've provided so far. My gut instinct is that this is due to some difference in QA vs. PROD environments that isn't Solr-specific.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Francis Yakin
> To: "solr-user@lucene.apache.org"
> Sent: Saturday, June 20, 2009 2:18:07 AM
> Subject: RE: Slowness during submit the index
>
> The amount of data in Prod is about 20% more than QA.
> We tested the network speed is fine. The hardware in Prod is larger and more
> powerful than QA.
> But QA is faster during reload. It takes QA only one hour than 6 hours in
> Prod.
>
> That's why we don't understand what's the reason, the amount of data is only
> 20% more but it will not take 5 times slower because the data only 20% more.
>
> So, we looked into the config file for solr, but it's not much different,
> except Prod has master/slave environment which QA only master.
>
> Thanks for the response.
>
> Francis
>
> -----Original Message-----
> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
> Sent: Friday, June 19, 2009 8:58 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Slowness during submit the index
>
> Francis,
>
> So it could easily be that your QA and PROD DBs are really just simply
> different (different amount of data, different network speed, different
> hardware...)
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
> > From: Francis Yakin
> > To: "solr-user@lucene.apache.org"
> > Sent: Friday, June 19, 2009 10:39:48 PM
> > Subject: RE: Slowness during submit the index
> >
> > * is the java version the same on both machines (QA vs. PROD) - YES
> > * are the same java parameters being used on both machines - YES
> > * is the connection to the DB the same on both machines - Not sure,
> >   need to ask the network guy
> > * are both the PROD and QA DB servers the same and are both DB instances
> >   the same - they are not from the same DB
> >
> > -----Original Message-----
> > From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
> > Sent: Friday, June 19, 2009 6:23 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Slowness during submit the index
> >
> > Francis,
> >
> > I'm not sure if I understood your email correctly, but I think you are
> > saying you are indexing your DB content into a Solr index. If this is
> > correct, here are things to look at:
> > * is the java version the same on both machines (QA vs. PROD)
> > * are the same java parameters being used on both machines
> > * is the connection to the DB the same on both machines
> > * are both the PROD and QA DB servers the same and are both DB instances
> >   the same
> > ...
> >
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >
> > ----- Original Message ----
> > > From: Francis Yakin
> > > To: "solr-user@lucene.apache.org"
> > > Sent: Friday, June 19, 2009 5:27:59 PM
> > > Subject: Slowness during submit the index
> > >
> > > We are experiencing slowness during reloading/resubmitting index from
> > > Database to the master.
> > >
> > > We have two environments:
> > >
> > > QA and Prod.
> > >
> > > The slowness is happened only in Production but not in QA.
> > >
> > > It only takes one hours to reload 2.5Mil indexes compare 5-6 hours to
> > > load the same size of index in Prod.
> > >
> > > I checked both the config files in QA and Prod, they are all identical,
> > > except:
> > >
> > > In QA:
> > > false
> > > In Prod:
> > > true
> > >
> > > I believe that we use "http" protocol reload/submit the index from
> > > Database to Solr Master.
> > >
> > > I did test copying big files thru network from database to the solr
> > > box, I don't see any issue.
> > >
> > > We are running solr 1.2
> > >
> > > Any inputs will be much appreciated.
Re: Slowness during submit the index
Isn't it possible that the production equipment is simply under much higher load (given that, since it's in production, your various users are all actually using it), vs the QA equipment, which is only in use by the people doing QA?

We've found the same thing at one point - we had a very small index (< 4 rows), so small it didn't seem worth the effort to do delta updates. So we would just refresh the whole thing every time - or so we planned. In the test environment it updated within a minute. In production, it would take as long as 15 minutes. What we finally realized was, because the DB was under much higher load in production than in the test environment, especially considering the amount of joins that needed to take place to pull out the data properly, various writes from the users to the affected tables would slow down the data selection process dramatically as the indexer would have to wait for locks to clear. Now of course we do delta updates and everything's fine (and blazingly fast in both environments).

Try simulating higher load (involving a "normal" amount of writes to the DB) against your QA equipment and then building the index. See if the QA equipment still runs so quickly.

--
Steve

On Jun 20, 2009, at 11:29 PM, Otis Gospodnetic wrote:

Hi Francis,

I can't tell what the problem is from the information you've provided so far. My gut instinct is that this is due to some difference in QA vs. PROD environments that isn't Solr-specific.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Francis Yakin
To: "solr-user@lucene.apache.org"
Sent: Saturday, June 20, 2009 2:18:07 AM
Subject: RE: Slowness during submit the index

The amount of data in Prod is about 20% more than QA.
We tested the network speed is fine. The hardware in Prod is larger and more powerful than QA.
But QA is faster during reload. It takes QA only one hour than 6 hours in Prod.
Re: Use DIH with large xml file
Can DIH read item by item instead of the whole file before indexing? my biggest file size is 6GB, larger than the JVM max ram value.

--- On Sat, 6/20/09, Erik Hatcher wrote:

> From: Erik Hatcher
> Subject: Re: Use DIH with large xml file
> To: solr-user@lucene.apache.org
> Date: Saturday, June 20, 2009, 6:52 PM
>
> How are you configuring DIH to read those files? It is likely that
> you'll need at least as much RAM to the JVM as the largest file you're
> processing, though that depends entirely on how the file is being
> processed.
>
> Erik
>
> On Jun 20, 2009, at 9:23 PM, Jianbin Dai wrote:
>
> > Hi,
> >
> > I have about 50GB of data to be indexed each day using DIH. Some of
> > the files are as large as 6GB. I set the JVM Xmx to be 3GB, but the
> > DIH crashes on those big files. Is there any way to handle it?
> >
> > Thanks.
> >
> > JB
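The difference between reading "item by item" and reading the whole file first is the difference between a streaming parser and one that builds the full tree in memory. The sketch below shows the streaming style with Python's standard library on a tiny in-memory document. It is not DIH code and says nothing about how DIH itself reads files; the element names docs and doc are invented for the example.

```python
import io
import xml.etree.ElementTree as ET


def iter_items(stream, tag):
    """Yield the text of each <tag> element as soon as its end tag has been parsed."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == tag:
            yield elem.text
            # Release the element's content; only an empty shell stays attached to the root.
            elem.clear()


# Tiny stand-in for a multi-gigabyte file.
sample = io.BytesIO(b"<docs><doc>first</doc><doc>second</doc><doc>third</doc></docs>")
items = list(iter_items(sample, "doc"))
print(items)
```

Each item can be handed to the indexer as soon as it is yielded, so peak memory depends on the size of one item rather than on the size of the file.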
Re: Slowness during submit the index
We were having performance issues using servers running on VM. Are you running QA or Prod in a VM?

2009/6/21, Stephen Weiss:
> Isn't it possible that the production equipment is simply under much
> higher load (given that, since it's in production, your various users
> are all actually using it), vs the QA equipment, which is only in use
> by the people doing QA?
>
> We've found the same thing at one point - we had a very small index (<
> 4 rows), so small it didn't seem worth the effort to do delta
> updates. So we would just refresh the whole thing every time - or so
> we planned. In the test environment it updated within a minute. In
> production, it would take as long as 15 minutes. What we finally
> realized was, because the DB was under much higher load in production
> than in the test environment, especially considering the amount of
> joins that needed to take place to pull out the data properly, various
> writes from the users to the affected tables would slow down the data
> selection process dramatically as the indexer would have to wait for
> locks to clear. Now of course we do delta updates and everything's
> fine (and blazingly fast in both environments).
>
> Try simulating higher load (involving a "normal" amount of writes to
> the DB) against your QA equipment and then building the index. See if
> the QA equipment still runs so quickly.
>
> --
> Steve
>
> On Jun 20, 2009, at 11:29 PM, Otis Gospodnetic wrote:
>
>> Hi Francis,
>>
>> I can't tell what the problem is from the information you've
>> provided so far. My gut instinct is that this is due to some
>> difference in QA vs. PROD environments that isn't Solr-specific.
>>
>> Otis
>> --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>
>> ----- Original Message ----
>>> From: Francis Yakin
>>> To: "solr-user@lucene.apache.org"
>>> Sent: Saturday, June 20, 2009 2:18:07 AM
>>> Subject: RE: Slowness during submit the index
>>>
>>> The amount of data in Prod is about 20% more than QA.
>>> We tested the network speed is fine. The hardware in Prod is larger >>> and more >>> powerful than QA. >>> But QA is faster during reload. It takes QA only one hour than 6 >>> hours in Prod. >>> >>> That's why we don't understand what's the reason, the amount of >>> data is only 20% >>> more but it will not take 5 times slower because the data only 20% >>> more. >>> >>> So, we looked into the config file for solr, but it's not much >>> different, except >>> Prod has master/slave environment which QA only master. >>> >>> Thanks for the response. >>> >>> Francis >>> >>> >>> -Original Message- >>> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] >>> Sent: Friday, June 19, 2009 8:58 PM >>> To: solr-user@lucene.apache.org >>> Subject: Re: Slowness during submit the index >>> >>> >>> Francis, >>> >>> So it could easily be that your QA and PROD DBs are really just >>> simply different >>> (different amount of data, different network speed, different >>> hardware...) >>> >>> Otis >>> -- >>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch >>> >>> >>> >>> - Original Message From: Francis Yakin To: "solr-user@lucene.apache.org" Sent: Friday, June 19, 2009 10:39:48 PM Subject: RE: Slowness during submit the index * is the java version the same on both machines (QA vs. PROD) - YES * are the same java parameters being used on both machines - YES * is the connection to the DB the same on both machines - Not sure, >>> need to ask the network guy * are both the PROD and QA DB servers the same and are both DB instances the same - they are not from the same DB -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Friday, June 19, 2009 6:23 PM To: solr-user@lucene.apache.org Subject: Re: Slowness during submit the index Francis, I'm not sure if I understood your email correctly, but I think you are saying you are indexing your DB content into a Solr index. 
If this is correct, here are things to look at: * is the java version the same on both machines (QA vs. PROD) * are the same java parameters being used on both machines * is the connection to the DB the same on both machines * are both the PROD and QA DB servers the same and are both DB instances the same ... Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Francis Yakin > To: "solr-user@lucene.apache.org" > Sent: Friday, June 19, 2009 5:27:59 PM > Subject: Slowness during submit the index > > > We are experiencing slowness during reloading/resubmitting index > from >>> Database > to the master. > > We have two environments: > > QA and Prod. > > The slowness is happened