RE: Newbie SolR - Need advice
Hi Fabio,

Like Jack says, try the tutorial. But to answer your question, SOLR isn't a bolt-on to SQLServer or any other DB. It's a fantastically fast indexing/searching tool. You'll need to use the DataImportHandler (see the tutorial) to import your data from the DB into the indices that SOLR uses. Once it's in there, you'll have more power and flexibility than SQLServer would ever give you!

I haven't tried SOLR on Windows (I guess that's your environment), but I'm sure it'll work using Jetty or Tomcat as the web container.

Stick with it. The ride can be bumpy but the experience is sensational!

DQ

-----Original Message-----
From: fabio1605 [mailto:fabio.to...@btinternet.com]
Sent: 02 July 2013 16:16
To: solr-user@lucene.apache.org
Subject: Newbie SolR - Need advice

Hi, we have an MSSQL Server which is just getting far too large now, and performance is dying! The majority of our webservers are mainly doing search functions, so I thought it may be best to move to SolR. But I know very little about it!

My questions are: Does SolR run as a bolt-on to MSSQL, as in the data is still in MSSQL and SolR is just the search bit in between? I'm really struggling to understand the point of SOLR etc., so if someone could point me to a Dummies website I'd appreciate it! Google is throwing too much confusion at me!
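The DataImportHandler is driven by HTTP requests to its handler. As a rough sketch, the Python below only builds those request URLs. It assumes Solr runs at http://localhost:8983/solr, the core is called collection1, and DIH is registered at /dataimport in solrconfig.xml; all three are assumptions to adjust for a real installation.

```python
from urllib.parse import urlencode

# Assumptions: Solr's address, the core name and the DIH handler path
# ("/dataimport") must match your own installation and solrconfig.xml.
SOLR_CORE_URL = "http://localhost:8983/solr/collection1"
DIH_PATH = "/dataimport"


def dih_url(command, clean=None, commit=True):
    """Build a DataImportHandler request URL.

    command: a DIH command such as "full-import", "delta-import" or "status".
    clean:   for imports, whether to delete the existing index first
             (left to Solr's default when None).
    commit:  for imports, whether to commit once the import finishes.
    """
    params = {"command": command}
    if command in ("full-import", "delta-import"):
        params["commit"] = "true" if commit else "false"
        if clean is not None:
            params["clean"] = "true" if clean else "false"
    return SOLR_CORE_URL + DIH_PATH + "?" + urlencode(params)


if __name__ == "__main__":
    # Initial load of everything the DIH config selects from the database.
    print(dih_url("full-import", clean=True))
    # Later runs import only changed rows (requires a deltaQuery in the DIH config).
    print(dih_url("delta-import", clean=False))
```

Sending the request is then a plain HTTP GET (for example with urllib.request.urlopen); the sketch only prints the URLs so it runs without a Solr server.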
RE: Newbie SolR - Need advice
Don’t worry Fabio - nobody knows everything (apart from Hossman).

Following on from Sandeep: to use SOLR, you extract the data from your MSSQL DB using the DataImportHandler, and you can then query it, facet it and pivot it to your heart's content. And fast! You can use almost anything to build the SOLR queries - Java and PHP are probably the most popular. There is a library for Perl, I think, but I've never tried it.

So, you keep your MSSQL database; you just don't use it for searches, which will relieve some of the load. Searches then all go through SOLR and its Lucene indexes. If your various tables need SQL joins, you specify those in the DataImportHandler (DIH) config. That way, when SOLR indexes everything, it indexes the data the way you want to see it. DIH handles the data export from MSSQL -> SOLR, and it's not too difficult to set up.

You imply you're adding (inserting) data. How much, and how often? DIH has a delta import feature, so you can add data to SOLR's indexes on the fly. Much of it comes down to the data model you have. My advice would be to try it and see. You will be pleasantly surprised!

-----Original Message-----
From: fabio1605 [mailto:fabio.to...@btinternet.com]
Sent: 02 July 2013 17:10
To: solr-user@lucene.apache.org
Subject: RE: Newbie SolR - Need advice

Thanks guys. So SolR is actually a database replacement for MSSQL... am I right?

We have a lot of Perl scripts that contain lots of SQL insert queries, etc. How do we query the SolR database from scripts? I know I have a lot to learn still, so excuse my ignorance.

Also... what is Mongo, and how does it compare?

I just don't understand how in 10 years of web development I had never heard of SolR until last week.
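As DQ says, almost any language that can send an HTTP request can query SOLR, so Perl, PHP or Java scripts all work. A minimal Python sketch of the idea, assuming a core at http://localhost:8983/solr/collection1 and hypothetical fields id and name: build a /select URL that asks for JSON (wt=json), then read numFound and docs from the response. The sample response body is made up for illustration.

```python
import json
from urllib.parse import urlencode

# Assumed location of the Solr core; replace with your own.
SOLR_CORE_URL = "http://localhost:8983/solr/collection1"


def select_url(query, fields=None, rows=10):
    """Build a /select URL asking Solr for a JSON response (wt=json)."""
    params = {"q": query, "rows": rows, "wt": "json"}
    if fields:
        params["fl"] = ",".join(fields)
    return SOLR_CORE_URL + "/select?" + urlencode(params)


def parse_select_response(body):
    """Return (numFound, docs) from the body of a JSON /select response."""
    result = json.loads(body)["response"]
    return result["numFound"], result["docs"]


if __name__ == "__main__":
    # 'name' and 'id' are hypothetical field names.
    print(select_url("name:boots", fields=["id", "name"], rows=5))
    # A real script would fetch the body, e.g.:
    #   from urllib.request import urlopen
    #   body = urlopen(select_url("name:boots")).read().decode("utf-8")
    sample = ('{"responseHeader":{"status":0},'
              '"response":{"numFound":1,"start":0,'
              '"docs":[{"id":"42","name":"Engineer boots"}]}}')
    count, docs = parse_select_response(sample)
    print(count, docs[0]["name"])
```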
RE: Newbie SolR - Need advice
Hi Fabio,

Sandeep is right - it'll take time. SOLR isn't straightforward when you first start out, but the tutorial is the best first step. You can then adapt the tutorial's config files to your situation. I'd recommend a simple approach to get the hang of it: just index one table, specifying some fields to be searched in schema.xml.

There are some good books around too (Sandeep's recommendation on Lucidworks is good too): Apache Solr 3.1 Cookbook by Rafal Kuc (still valid for 4.x.x), Jack Krupansky's Solr 4.x Deep Dive - Early Access Release, and Solr in Action by Trey Grainger & Tim Potter.

If you need help, shout! It's a great community.

Cheers,
DQ

-----Original Message-----
From: fabio1605 [mailto:fabio.to...@btinternet.com]
Sent: 03 July 2013 09:55
To: solr-user@lucene.apache.org
Subject: Re: Newbie SolR - Need advice

Hi Sandeep,

Thank you for your reply. I'll have a read through the tutorials. Now that I understand the principle of all this, I would ideally like to keep MSSQL and bolt Solr on top of it, since we have a 200GB database.

Cheers
SOLR 4.0 frequent admin problem
Hi,

About once a week the admin system comes up with "SolrCore Initialization Failures". There's nothing in the logs, and SOLR continues to work both in the application it's supporting and in 'direct access' mode (i.e. http://123.465.789.100:8080/solr/collection1/select?q=bingo:*).

The cure is to restart Jetty (8.1.7), after which we can use the admin system again from PCs. However, a colleague can get into admin on an iPad with no trouble when no browser on a PC can!

Anyone any ideas? It's really frustrating!

Best regards,
DQ
RE: SOLR 4.0 frequent admin problem
Cheers, Roman! It was a default Jetty setup, so I've now added a 'work' directory and Jetty is using it.

-----Original Message-----
From: Roman Chyla [mailto:roman.ch...@gmail.com]
Sent: 04 July 2013 15:00
To: solr-user@lucene.apache.org
Subject: Re: SOLR 4.0 frequent admin problem

Yes :-) see SOLR-118, seems an old issue...
RE: Commit different database rows to solr with same "id" value?
Hi Jason,

Assuming you're using DIH, why not build a new, unique id within the query to use as the 'doc_id' for SOLR? We do something like this in one of our collections. In MySQL, try this (I don't know what it would be for other databases, but there must be equivalents):

    select @rownum:=@rownum+1 rowid, t.* from (main select query) t, (select @rownum:=0) s

Regards,
DQ

-----Original Message-----
From: Jason Huang [mailto:jason.hu...@icare.com]
Sent: 10 July 2013 15:50
To: solr-user@lucene.apache.org
Subject: Commit different database rows to solr with same "id" value?

Hello,

I am trying to use Solr to store fields from two different database tables, where the primary keys are in the format of "1, 2, 3, ".

In Java, we build different POJO classes for these two database tables:

    table1.java
    @SolrIndex(name="id")
    private String idTable1;

    table2.java
    @SolrIndex(name="id")
    private String idTable2;

Later we add these fields defined in the two different types of tables and commit them to solrServer. Here is the scenario where I am having issues:

(1) commit a row from table1 with primary key = "3"; this generates a document in Solr
(2) commit another row from table2 with the same primary key value = "3"; this overwrites the document generated in step (1).

What we really want is to keep both rows from (1) and (2), because they are from different tables. From some searching, it appears that we might be able to do this by keeping multiple cores in Solr? Could anyone point out how to implement multiple cores to achieve this? To be more specific, when I commit the row as a document, I don't have a place to pick a certain core, and I am not sure it makes sense for me to specify a core when I commit the document, since the layer I am working on should abstract that away from me.

The second question is: if we don't want to use multiple cores (since we can't easily search for related data across multiple cores), how can we resolve this issue so that both rows from different database tables that share the same primary key still exist? We don't want to have to always change the primary key format to ensure uniqueness of the primary key across all types of database tables.

Thanks!
Jason
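Jason's overwrite and DQ's row-number fix can be simulated without Solr. In this Python sketch a dict stands in for the index, because documents sharing a uniqueKey value replace one another; add_row_numbers mimics the MySQL @rownum trick over the combined rows. The field names table and doc_id and the sample rows are hypothetical.

```python
def index_documents(docs, key_field):
    """Simulate Solr's uniqueKey handling with a dict: a document whose key
    already exists replaces the earlier one instead of being added."""
    index = {}
    for doc in docs:
        index[doc[key_field]] = doc
    return index


def add_row_numbers(rows, id_field="doc_id"):
    """Mimic the MySQL @rownum trick: number every row of the combined result
    set and store that number as the document id, keeping the original key."""
    numbered = []
    for rownum, row in enumerate(rows, start=1):
        doc = dict(row)  # copy so the input rows are not modified
        doc[id_field] = str(rownum)
        numbered.append(doc)
    return numbered


if __name__ == "__main__":
    # Hypothetical rows: both tables have a row whose primary key is "3".
    rows = [
        {"table": "table1", "id": "3", "name": "row from table1"},
        {"table": "table2", "id": "3", "name": "row from table2"},
    ]
    # Keyed on the database primary key, the table2 row overwrites table1's.
    print(len(index_documents(rows, "id")))                       # 1
    # Keyed on a generated row number, both rows survive.
    print(len(index_documents(add_row_numbers(rows), "doc_id")))  # 2
```

One caveat with generated row numbers: they are only stable if the import query returns rows in the same order every time. Otherwise a re-import can give the same database row a different Solr id.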
RE: Facet sorting seems weird
Hi Henrik,

Try setting up a copyField in your schema, and give the copied field a type whose analyzer lowercases the whole value, such as KeywordTokenizerFactory followed by LowerCaseFilterFactory (the example schema's 'text_ws' only splits on whitespace and does not lowercase). Then facet on the copy field.

Regards,
DQ

-----Original Message-----
From: Henrik Ossipoff Hansen [mailto:h...@entertainment-trading.com]
Sent: 15 July 2013 15:08
To: solr-user@lucene.apache.org
Subject: Facet sorting seems weird

Hello, first time writing to the list. I am a developer for a company where we recently switched all of our search core from Sphinx to Solr, with great results. In general we've been very happy with the switch, and everything seems to work just as we want it to.

Today, however, we've run into a bit of an issue regarding faceted sort.

For example, we have a field called "brand" in our core, defined as the text_en datatype from the example Solr core. This field is copied into facet_brand with the datatype string (since we don't really need to do much with it except show it for faceted navigation).

Now, given these two entries in the field on different documents, "LEGO" and "bObles", and given facet.sort=index, it appears that LEGO is sorted before bObles. I assume this is because of casing differences.

My question then is: how do we define a decent datatype in our schema, where the casing is exact, but we are able to sort it without casing mattering?

Thank you :)

Best regards,
Henrik Ossipoff
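Henrik's guess about casing is easy to check outside Solr. facet.sort=index orders ASCII terms by character code, and in that order every uppercase letter comes before every lowercase one. This small Python illustration compares that ordering with ordering on a lowercased copy, which is what a lowercasing copy field provides.

```python
brands = ["bObles", "LEGO"]

# Plain character-code order, comparable to facet.sort=index on a string field:
# 'L' (U+004C) sorts before 'b' (U+0062), so LEGO comes first.
print(sorted(brands))                 # ['LEGO', 'bObles']

# Ordering on a lowercased copy of each value puts bObles first.
print(sorted(brands, key=str.lower))  # ['bObles', 'LEGO']
```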
Edismax odd results
Hi all,

We have an index of boots which contains harness boots, engineer boots, ankle boots, etc. An edismax search on the index for 'harness boots' brings back 2,175 boots with 'harness' results at the top. Searching for 'engineer boots' brings back everything but 'engineer boots', and the same happens for 'ankle boots' - in fact, both return the same result set of 1,873 items, mostly boots with a few other products mixed in.

We're on SOLR 4.0, and the field we're querying is stemmed (Snowball) and lowercased, using the WhitespaceTokenizer. Any ideas?

Regards,
David Q
RE: Edismax odd results
Hi Jack,

Here's a test query we've been using:

    select?q=+engineer+boots&defType=edismax&fl=prodname&qf=prodnameplurals&pf2=prodnameplurals^2.0

This still produces a result set where the first 'engineer boot' is way down the list and subsequent ones are interspersed with other boots. They're all in there, just not at the top. Below is the debug output for the first item that is an engineer boot:

    0.23492618 = (MATCH) sum of:
      0.23492618 = (MATCH) product of:
        0.46985236 = (MATCH) sum of:
          0.46985236 = (MATCH) weight(prodnameplurals:boot in 48270) [DefaultSimilarity], result of:
            0.46985236 = score(doc=48270,freq=1.0 = termFreq=1.0), product of:
              0.22236869 = queryWeight, product of:
                4.8295836 = idf(docFreq=1867, maxDocs=86009)
                0.046043035 = queryNorm
              2.112943 = fieldWeight in 48270, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8295836 = idf(docFreq=1867, maxDocs=86009)
                0.4375 = fieldNorm(doc=48270)
        0.5 = coord(1/2)

Regards,
DQ

-----Original Message-----
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: 19 February 2013 15:31
To: solr-user@lucene.apache.org
Subject: Re: Edismax odd results

Show us your qf and pf params. Do you have pf2 set? That's the key for getting the phrase "engineer boots" boosted higher than just boots. You may also simply have to give a higher pf2 boost, since "boots" probably has a much higher term frequency than "engineer", or even than the natural Lucene score for "engineer boot".

Also check the &debugQuery=true "explain" scoring to see how engineer, boot, and "engineer boot" are being scored - you may have to add some specific query phrases to force "engineer boot" into the top results to compare the scoring.

-- Jack Krupansky
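Jack's suggestions (qf plus a pf2 phrase boost, with debugQuery=true for the explain output) can be assembled programmatically rather than typed by hand. A Python sketch using the field names from DQ's test query; the boost of 10.0 in the usage line is an arbitrary example of "a higher pf2 boost", not a recommended value.

```python
from urllib.parse import urlencode


def edismax_query_string(user_query, field, pf2_boost=2.0, debug=False):
    """Build edismax request parameters.

    qf  - field the individual words are matched against
    pf2 - boosts documents where adjacent pairs of query words
          appear together in `field`, e.g. "engineer boots"
    """
    params = [
        ("q", user_query),
        ("defType", "edismax"),
        ("qf", field),
        ("pf2", f"{field}^{pf2_boost}"),
        ("fl", "prodname,score"),
    ]
    if debug:
        params.append(("debugQuery", "true"))  # adds the "explain" section
    return urlencode(params)


if __name__ == "__main__":
    print("select?" + edismax_query_string("engineer boots", "prodnameplurals",
                                           pf2_boost=10.0, debug=True))
```

In a hand-typed URL a literal '+' is decoded as a space, so the leading '+' in q=+engineer+boots does not reach edismax as a "required" operator; letting urlencode do the escaping avoids that kind of surprise.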
RE: Edismax odd results
Hi Shawn,

I checked the admin analysis earlier. Stemming is taking 'engineer' down to 'engin', but then I'd have thought that a search on 'engin boots' would work, and it doesn't. I'll try turning the wick back up on the logging - we set it to 'warning'.

Regards,
DQ

-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: 19 February 2013 16:25
To: solr-user@lucene.apache.org
Subject: Re: Edismax odd results

I do not see the word engineer (or any other similar word) in the score calculation, only boots. A test on my own index shows both words in the calculations.

I would use the analysis admin page on the prodnameplurals field to see what happens to the input "engineer boots" at both index and query time - see what part of your analysis chain removes it. If you don't see any problem there, then the Solr log (assuming you haven't changed the default log level of INFO) should have a record of what parameters were actually received when the query was made.

Thanks,
Shawn
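Shawn's observation can be checked against the arithmetic of Lucene's DefaultSimilarity, assuming no query-time boosts: queryWeight = idf × queryNorm, fieldWeight = tf × idf × fieldNorm, and the clause score is their product, scaled by coord. This Python sketch re-derives the explain numbers from DQ's debug output. The entire score comes from the 'boot' term, and coord(1/2) suggests that only one of the two top-level clauses matched, which is consistent with 'engin' contributing nothing for this document.

```python
import math

# Values copied from the explain output in this thread.
idf = 4.8295836          # idf(docFreq=1867, maxDocs=86009)
query_norm = 0.046043035
tf = 1.0                 # the term occurs once in the field
field_norm = 0.4375      # fieldNorm(doc=48270)
coord = 1 / 2            # coord(1/2): one of two query clauses matched

query_weight = idf * query_norm           # reported as 0.22236869
field_weight = tf * idf * field_norm      # reported as 2.112943
boot_score = query_weight * field_weight  # reported as 0.46985236
final_score = boot_score * coord          # reported as 0.23492618

for name, value, reported in [
    ("queryWeight", query_weight, 0.22236869),
    ("fieldWeight", field_weight, 2.112943),
    ("boot clause", boot_score, 0.46985236),
    ("final score", final_score, 0.23492618),
]:
    # Tiny differences come from the explain output printing rounded floats.
    print(f"{name}: {value:.8f} (explain: {reported}) "
          f"close={math.isclose(value, reported, rel_tol=1e-6)}")
```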
RE: Edismax odd results
Hi Shawn/Jack,

The log shows the query going in okay; nothing gets stripped out, so we're still at a loss to understand this. Could it be that Snowball stemming is too invasive?

Regards,
DQ
RE: Edismax odd results
Hi,

This is definitely driving us mad now! We changed to PorterStemming and there's very little difference.

If we add fq=engineer, we get 0 results. Add fq=engineer* and we get the 90 in the system. Try fq=ankle* and we get 2, which is correct. Try fq=harness* and we get 0!

The stemming reduces 'engineer' to 'engin', so I'd have expected a lot more results.

Anyone got any ideas?

Regards,
DQ
Re: Edismax odd results
Hi Shawn,

Now finished for the day, but I will post the schema tomorrow. Thanks for the help (and Jack too).

Regards,
DQ

P.S. We did reindex after changing the schema, and the index and query analysis match precisely!!

Shawn Heisey wrote:

Did you completely reindex when you changed your schema? You must reindex.

Does the index analysis match the query analysis? Some specific differences are allowed (and sometimes encouraged), but stemming must be done to both. Can you share your schema? Use a paste website like pastie.org for that.

Thanks,
Shawn
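Shawn's point that stemming must happen at both index and query time can be shown with a toy analyzer. In this Python sketch, TOY_STEMS is a hypothetical stand-in for a real stemming filter. It holds only the reductions mentioned in this thread ('engineer' -> 'engin', 'boots' -> 'boot'), and a document matches only when every query token equals an indexed token.

```python
# Hypothetical stand-in for a stemming filter: a lookup holding only the
# reductions mentioned in this thread ('engineer' -> 'engin', 'boots' -> 'boot').
TOY_STEMS = {"engineer": "engin", "boots": "boot"}


def analyze(text, stem=True):
    """Whitespace-tokenize and lowercase, optionally applying the toy stems."""
    tokens = text.lower().split()
    if stem:
        tokens = [TOY_STEMS.get(token, token) for token in tokens]
    return tokens


def all_terms_match(query_tokens, indexed_tokens):
    """True if every query token is present among the indexed tokens."""
    indexed = set(indexed_tokens)
    return all(token in indexed for token in query_tokens)


indexed = analyze("Engineer Boots")  # index-time analysis: ['engin', 'boot']
print(all_terms_match(analyze("engineer boots"), indexed))              # True
print(all_terms_match(analyze("engineer boots", stem=False), indexed))  # False
```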
RE: Edismax odd results
Hi Shawn,

The schema's at http://justpaste.it/davidqhog. It's the basic SOLR 4.0 schema with additions!

Regards,
DQ
RE: Edismax odd results
Hi Erick,

The debug=all output is posted at http://justpaste.it/davidqhogdebug. I can't see anything obvious myself, but then I'm not an expert!

Regards,
DQ

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 20 February 2013 02:02
To: solr-user@lucene.apache.org
Subject: Re: Edismax odd results

When you get back to this tomorrow, also try pasting the parsed query bits you get back when you append &debug=all. Sometimes it's surprising what the parsed query _really_ looks like.

Best
Erick
RE: Edismax odd results
Hi Erick,

I understand the wildcard issue - that was more desperation on our part than logic! TermsComponent showed 222 197 so the term is in the index. Using explainOther, I can see that the relevance of documents with 'engineer boots' in the name is low compared to the others, and they appear randomly distributed through the result set (I know it's not random). We've tried all sorts of things to boost them, but to no avail. Trying 'logger boots' or 'harness boots' gives good results, with the required terms at the top of the set. I'm mystified.

Regards,

DQ

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 20 February 2013 12:49
To: solr-user@lucene.apache.org
Subject: Re: Edismax odd results

OK, first: wildcarding and stemming don't get along well together. Since you've stemmed the field, enginee* would not match the stemmed term engin. This is actually pretty tricky to try to implement. For instance, how would enginee stem? So the fqs you posted are going to mislead you in that regard.

If you want to examine the actual values in your index, consider using TermsComponent or Luke. Either will show you exactly what's being searched against.

I suspect that your fq entries (as typed) are going against the default field of "text" as defined in your schema, which doesn't stem, so that's possibly leading you astray.

Finally, you may be getting bitten by scoring, field norms and all that. If you have a doc ID that you _know_ contains "engineers boots", try using debug with explainOther (http://wiki.apache.org/solr/CommonQueryParameters#explainOther), which might help you understand what's happening with the doc you care about.

Best
Erick

On Wed, Feb 20, 2013 at 7:13 AM, David Quarterman wrote:

> Hi Erick,
>
> Debug=all posted on http://justpaste.it/davidqhogdebug. Can't see anything obvious myself, but then I'm not an expert!
>
> Regards,
>
> DQ
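A toy illustration of Erick's first point, assuming made-up term dictionaries: a wildcard term is not stemmed at query time, so the prefix is compared literally against whatever tokens index-time analysis produced. In a stemmed field, 'engineer' is stored as 'engin', so engineer* (or enginee*) finds nothing, while an unstemmed field such as the default "text" field still holds 'engineer'. This mimics the matching behaviour only; it is not Lucene code.

```python
def prefix_matches(indexed_terms, prefix):
    """Mimic a wildcard (prefix) query: the prefix itself is NOT stemmed."""
    return sorted(term for term in indexed_terms if term.startswith(prefix))


# Hypothetical term dictionaries for the same few documents.
stemmed_name_terms = {"engin", "boot", "ankl"}          # after Porter-style stemming
unstemmed_text_terms = {"engineer", "boots", "ankle"}   # unstemmed "text" field

print(prefix_matches(stemmed_name_terms, "engineer"))    # []
print(prefix_matches(stemmed_name_terms, "engin"))       # ['engin']
print(prefix_matches(unstemmed_text_terms, "engineer"))  # ['engineer']
```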
RE: If we Open Source our platform, would it be interesting to you?
Hi Marcelo,

I looked through your site, and the framework looks very powerful as an aggregator. We do a lot of data aggregation from many different sources in many different formats (XML, JSON, text, CSV, etc.), using an RDBMS as the main repository before eventual SOLR indexing. A 'one-stop shop' for all this would be very appealing.

Have you looked at products like Talend & Jitterbit? These offer transformation from almost anything to almost anything using graphical interfaces (Jitterbit is better) and a PHP-like coding format for trickier work. If you (or somebody) could add a graphical interface, the world would beat a path to your door!

Regards,

DQ

-----Original Message-----
From: Marcelo Elias Del Valle [mailto:marc...@s1mbi0se.com.br]
Sent: 20 February 2013 18:18
To: solr-user@lucene.apache.org
Subject: If we Open Source our platform, would it be interesting to you?

Hello All,

I’m sending this email because I think it may be interesting for Solr users, as this project makes heavy use of the Solr platform. We are strongly considering opening the source of our DMP (Data Management Platform), if it proves to be technically interesting to other developers / companies.

More details: http://www.s1mbi0se.com/s1mbi0se_DMP.html

All comments, questions and criticism are being discussed on HN: http://news.ycombinator.com/item?id=5251780

Please feel free to send questions, comments and criticism... We will try to reply to them all.

Regards,
Marcelo
RE: Edismax odd results
Hi Erick,

Funnily enough, I cracked it about 5 minutes before your email arrived! The problem was that we were using WhitespaceTokenizer instead of StandardTokenizer AND had the LowerCaseFilter after the PorterStemFilter. Getting them in the right order has solved all the problems, and we now get all our engineer boots and ankle boots at the top of the set!

Many thanks to all who took part.

Regards,

DQ

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 22 February 2013 12:59
To: solr-user@lucene.apache.org
Subject: Re: Edismax odd results

OK, let's see the debug data for explainOther.

One thing, though. Your analysis chain is apt to be surprising. The fact that you have 222 terms with the ":" says that you're probably not getting what I'd guess you want. That ':' is part of your token and will not match "engineering". Consider changing some of your filters to remove stuff like that.

Best
Erick
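Why the order of tokenizer and filters matters can be shown with a deliberately simplified chain. This is a toy sketch, NOT Lucene's analysis code or the Porter algorithm: toy_stem is a hypothetical suffix-stripper that, like Lucene's PorterStemFilter (which expects lowercased input), leaves tokens containing uppercase letters alone, and the two tokenizers are crude stand-ins for whitespace splitting versus a standard-style tokenizer that drops punctuation such as ':'.

```python
import re

SUFFIXES = ("ers", "er", "s")  # hypothetical, lowercase-only suffix list


def whitespace_tokenize(text):
    """Split on whitespace only, so punctuation such as ':' stays in the token."""
    return text.split()


def word_tokenize(text):
    """Crude stand-in for a standard-style tokenizer: keep runs of letters/digits."""
    return re.findall(r"[A-Za-z0-9]+", text)


def toy_stem(token):
    """Strip the first matching suffix; assumes lowercased input like a real stemmer."""
    if token != token.lower():
        return token  # mixed/upper case: suffixes are not recognized, token unchanged
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def analyze(text, tokenizer, lowercase_first):
    """Run tokenizer, then lowercase and stem in the requested order."""
    tokens = tokenizer(text)
    if lowercase_first:
        return [toy_stem(t.lower()) for t in tokens]
    return [toy_stem(t).lower() for t in tokens]


# Broken chain: whitespace tokenizer, stem before lowercasing.
print(analyze("Engineer Boots: Black", whitespace_tokenize, lowercase_first=False))
print(analyze("engineer boots", whitespace_tokenize, lowercase_first=False))
# Fixed chain: word tokenizer, lowercase before stemming.
print(analyze("Engineer Boots: Black", word_tokenize, lowercase_first=True))
```

With the broken chain the indexed tokens come out as 'engineer', 'boots:', 'black' while the query becomes 'engine', 'boot', so nothing matches; with the fixed chain both sides agree on 'engine' and 'boot'.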
RE: Building a central index with Lucene + Solr
Hi Alvaro,

I agree with Otis & Alexandre (esp. Windows + PHP!). However, there are plenty of people using Solr & PHP out there very successfully. There's another good package at http://code.google.com/p/solr-php-client/ which is easy to implement and has some example usage.

Regards,

DQ

From: Álvaro Vargas Quezada [mailto:al...@outlook.com]
Sent: 05 March 2013 14:53
To: solr-user@lucene.apache.org
Subject: Building a central index with Lucene + Solr

Hi everyone!

I'm trying to develop a central index. I installed Solr and reached the screen shown in the attachment, but the problem is that I don't know how to continue from this point. I want to develop an app in PHP that uses Solr, but I don't know how. Can anyone help me, maybe with a tutorial or something like that?

Thanks and greetings from Chile!
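Whatever language the app is written in, Solr is queried over plain HTTP, and a PHP client such as the one linked above ultimately sends the same kind of request and reads the same kind of response. A minimal Python sketch of that round trip, assuming a hypothetical core URL and a hypothetical "title" field; the response handling follows the shape of Solr's JSON output (response.numFound, response.docs).

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SOLR_SELECT = "http://localhost:8983/solr/collection1/select"  # hypothetical core


def build_query(text, rows=10):
    """Build a basic select URL asking for a JSON response."""
    return SOLR_SELECT + "?" + urlencode({"q": text, "rows": rows, "wt": "json"})


def summarize(response_text):
    """Extract the hit count and the (hypothetical) title of each returned doc."""
    body = json.loads(response_text)["response"]
    return body["numFound"], [doc.get("title") for doc in body["docs"]]


def search(text):
    """Live call; needs a running Solr instance at SOLR_SELECT."""
    with urlopen(build_query(text)) as resp:
        return summarize(resp.read().decode("utf-8"))
```

The same two steps (build the URL, decode the JSON) are what a PHP app would do with its HTTP or client library of choice.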
SOLR 4.0 Beta documents being duplicated
Hi,

We've been using V4.x of SOLR since last November without too much trouble. Our MySQL database is refreshed daily, and a full import is run automatically after the refresh; it generally produces around 86,000 products, obviously with unique doc_ids.

So, we upgraded to 4.0 Beta a few days ago, with only mild difficulty, reindexed and all was fine. However, after the next data refresh and full-import, we had duplicate products appearing under different unique doc_ids. Not all products are being duplicated, just random ones. We've just deleted the data directory and reindexed, and the product count has dropped from 116,711 to 86,543. There'll be another refresh/import early tomorrow morning, and I fear we'll have more duplicates.

The call to the import now contains clean=true, commit=true and optimize=true, but it seems to make no difference.

Anyone have any ideas?

Regards,

David Q
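To see how widespread the duplication is after each refresh, one option is to trigger the full import with the parameters mentioned above and then group the returned documents by their business key. A minimal sketch, assuming a hypothetical /dataimport handler path and hypothetical doc_id / product_code field names (the real unique key and product identifier depend on the schema); it only detects duplicates, it does not explain their cause.

```python
from collections import defaultdict
from urllib.parse import urlencode

DATAIMPORT = "http://localhost:8983/solr/products/dataimport"  # hypothetical handler path


def full_import_url():
    """DataImportHandler full import that cleans, commits and optimizes."""
    params = {"command": "full-import", "clean": "true", "commit": "true", "optimize": "true"}
    return DATAIMPORT + "?" + urlencode(params)


def find_duplicates(docs, key_field="product_code", id_field="doc_id"):
    """Map each product key to its doc ids, keeping only keys indexed more than once."""
    ids_by_key = defaultdict(list)
    for doc in docs:
        ids_by_key[doc[key_field]].append(doc[id_field])
    return {key: ids for key, ids in ids_by_key.items() if len(ids) > 1}
```

Feeding it the documents from a query with a large rows value (or paging through the index) gives a list of products that appear under more than one doc_id, which is easier to compare between imports than the raw product count.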
RE: SOLR 4.0 Beta documents being duplicated
Thanks Erick. We've added the '_version_' field and we'll see if that makes a difference tomorrow. Also, we have downloaded RC1 and will try that next week.

Regards,

David Q

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 05 October 2012 15:40
To: solr-user@lucene.apache.org
Subject: Re: SOLR 4.0 Beta documents being duplicated

How are you indexing? There was a problem with indexing from SolrJ if you indexed documents in batches (server.add(doclist)); that's fixed in 4.0 RC#. The work-around is to add docs singly (server.add(doc)).

Second thing. Bad Things Happen if you don't have a _version_ field in your schema.xml. Solr 4.0 RC# isn't happy on startup if this field is missing...

Personally, I think you'd be better off using one of the release candidates. Robert cut one here: http://people.apache.org/~rmuir/staging_area/lucene-solr-4.0RC1-rev1391144/solr/

There will be an RC2 at some point, since a couple of problems have been found, but using RC1 should minimize any update to the official 4.0 release, plus it has a lot of improvements over BETA...

Best
Erick

On Fri, Oct 5, 2012 at 10:25 AM, David Quarterman wrote:

> Hi,
>
> We've been using V4.x of SOLR since last November without too much trouble. Our MySQL database is refreshed daily and a full import is run automatically after the refresh and generally produces around 86,000 products, obviously on unique doc_id's.
>
> So, we upgraded to 4.0 Beta a few days ago, with only mild difficulty, reindexed and all was fine. Except after the next data refresh and full-import, we had duplicate products appearing on different unique doc_ids. Not all products are being duplicated, just random ones. We've just deleted the data directory and reindexed and the product count has dropped from 116,711 to 86,543. There'll be another refresh/import early tomorrow morning and I fear we'll have more duplicates.
>
> The call to the import now contains clean=true, commit=true and optimize=true but it seems to make no difference.
>
> Anyone have any ideas?
>
> Regards,
>
> David Q
RE: Feature & design question: use autocomplete to search on 2 different fields, and return 2 different data groups
We had a similar requirement and found the best solution (unfortunately) was to spend a small amount of money. Have a look at Sematext's site (www.sematext.com). Their Autocomplete is awesome, and we now have a fantastic-looking AC on our development site, grouped by category, product & brand, with product pictures to boot! It's very, very quick in operation too.

Best,

DQ

-----Original Message-----
From: fernando.beck [mailto:fernando.b...@gmail.com]
Sent: 01 November 2012 13:40
To: solr-user@lucene.apache.org
Subject: Feature & design question: use autocomplete to search on 2 different fields, and return 2 different data groups

Hello,

We're facing a new feature request, and we can't find the right way to come up with a working solution.

Context: we have a list of businesses. For each business we have: name, category, address, city. One business may have 1 or more categories. Examples:

Name: Outback SteakHouse
Category: Restaurants, American
Address: xx
City: Rio de Janeiro

Name: Starbucks
Category: Bar, Coffee
Address: y
City: Rio de Janeiro

Name: Pizza Hut
Category: Restaurant, Pizza
Address:
City: New York

and so on.

What we need to do: create an "autocomplete" feature; whenever someone starts to type, we need to search the term on BOTH CompanyName AND Category. Example: I type pizz and the results should come back in 2 groups:

Group 1: Categories (displaying Pizza)
Group 2: all those businesses featuring pizza in their name, i.e. Pizza Hut

Right now we cannot find a way to get this done.

Schema (since we're running a Portuguese-based application, there are 2 fieldTypes added for it):
--> LocalBusinessId

Thanks,
F
RE: Feature & design question: use autocomplete to search on 2 different fields, and return 2 different data groups
Fernando,

That's pretty much the problem we came up against. We had a basic AC running using SpellChecker a while ago, but it was the grouping that floored us and sent us elsewhere. Again, multiple queries seemed like the only possible answer, but in an AC scenario, even with SOLR's speed, that would probably be too slow under load.

Best,

DQ

-----Original Message-----
From: fernando.beck [mailto:fernando.b...@gmail.com]
Sent: 01 November 2012 13:55
To: solr-user@lucene.apache.org
Subject: RE: Feature & design question: use autocomplete to search on 2 different fields, and return 2 different data groups

David, I appreciate the suggestion. Our current autocomplete feature is actually working pretty well: no performance issues, and functionally it is returning 100% of the expected results. I checked Sematext and also http://www.cominvent.com; they are great, but our budget to go get them is 0.

At this point, and given the schema presented, my question would be: is it even possible to get this done somehow, with 1 query, "grouping" those results while autocompleting on 2 different search fields?
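One possible single-request approach (not something tried in this thread) is to let the returned documents form the business-name group while a facet on the category field forms the category group. Tagging the name filter and excluding it from the facet ({!tag=...} / {!ex=...} local params) means the category counts are computed over all businesses, and facet.prefix keeps only categories starting with what was typed. A sketch assuming hypothetical fields: name_prefix (e.g. an edge n-gram copy of the name), category_ac (an untokenized, lowercased copy of the categories) and a stored name field.

```python
import json
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/business/select"  # hypothetical core


def autocomplete_params(prefix, limit=5):
    """One request that yields both groups; field names are hypothetical."""
    prefix = prefix.lower()
    return [
        ("q", "*:*"),
        # Businesses group: names starting with the prefix, as a tagged filter.
        ("fq", "{!tag=nm}name_prefix:" + prefix),
        ("fl", "name"),
        ("rows", str(limit)),
        # Categories group: facet over ALL businesses by excluding the tagged filter.
        ("facet", "true"),
        ("facet.field", "{!ex=nm}category_ac"),
        ("facet.prefix", prefix),
        ("facet.mincount", "1"),
        ("facet.limit", str(limit)),
        ("wt", "json"),
    ]


def autocomplete_url(prefix, limit=5):
    return SOLR_SELECT + "?" + urlencode(autocomplete_params(prefix, limit))


def split_groups(response_text):
    """Return (categories, business names) from one Solr JSON response."""
    body = json.loads(response_text)
    flat = body["facet_counts"]["facet_fields"]["category_ac"]  # [term, count, term, count, ...]
    categories = [flat[i] for i in range(0, len(flat), 2)]
    names = [doc["name"] for doc in body["response"]["docs"]]
    return categories, names
```

User input would need escaping before it goes into the fq, and because facet.prefix is compared against the indexed terms as-is, the category copy field has to be lowercased the same way as the typed prefix.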