Re: suggester issues
Finally got it working - turns out you can't just add it to the lib dir as the wiki suggests. Unfortunately the only way is adding it to solr.war. Thanks for your help.

--
From: "William Oberman"
Sent: Friday, August 19, 2011 5:07 PM
To:
Subject: Re: suggester issues

Hard to say, so I'll list the exact steps I took:

- Downloaded apache-solr-3.3.0 (I like to stick with releases vs. svn)
- Untar and cd
- ant
- Wrote my class below (under a peer directory in apache-solr-3.3.0)
- javac -cp ../dist/apache-solr-core-3.3.0.jar:../lucene/build/lucene-core-3.3-SNAPSHOT.jar com/civicscience/SpellingQueryConverter.java
- jar cf cs.jar com
- Unzipped solr.war (under example)
- Added my cs.jar to lib (under WEB-INF)
- Rezipped solr.war
- Added <queryConverter name="queryConverter" class="com.civicscience.SpellingQueryConverter"/> to solrconfig.xml
- Restarted jetty

And, that seemed to all work.

will

On Aug 19, 2011, at 10:44 AM, Kuba Krzemien wrote:

As far as I checked, creating a custom query converter is the only way to make this work. Unfortunately I have some problems with running it - after creating a JAR with my class (I'm using your source code, obviously besides package and class names) and throwing it into the lib dir, I've added <queryConverter name="queryConverter" class="mypackage.MySpellingQueryConverter"/> to solrconfig.xml. I get a "SEVERE: org.apache.solr.common.SolrException: Error Instantiating QueryConverter, mypackage.MySpellingQueryConverter is not a org.apache.solr.spelling.QueryConverter". What am I doing wrong?

--
From: "William Oberman"
Sent: Thursday, August 18, 2011 10:35 PM
To:
Subject: Re: suggester issues

I tried this:

package com.civicscience;

import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;

import org.apache.lucene.analysis.Token;
import org.apache.solr.spelling.QueryConverter;

/**
 * Converts the query string to a Collection of Lucene tokens.
 **/
public class SpellingQueryConverter extends QueryConverter {

  /**
   * Converts the original query string to a collection of Lucene Tokens.
   * @param original the original query string
   * @return a Collection of Lucene Tokens
   */
  @Override
  public Collection convert(String original) {
    if (original == null) {
      return Collections.emptyList();
    }
    Collection result = new ArrayList();
    Token token = new Token(original, 0, original.length(), "word");
    result.add(token);
    return result;
  }
}

And added it to the classpath, and now it does what I expect.

will

On Aug 18, 2011, at 2:33 PM, Alexei Martchenko wrote:

It can be done, I did that with shingles, but it's not the way it's meant to be. The main problem with the suggester is that we want compound words and we never get them. I try to get "internet explorer", but when I enter the second word, "internet e", the suggester never finds "explorer".

2011/8/18 oberman_cs

I was trying to deal with the exact same issue, with the exact same results. Is there really no way to feed a phrase into the suggester (spellchecker) without it splitting the input phrase into words?

--
View this message in context: http://lucene.472066.n3.nabble.com/suggester-issues-tp3262718p3265803.html
Sent from the Solr - User mailing list archive at Nabble.com.

--
*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11) 5083.1018/5080.3535/5080.3533
Re: Terms.regex performance issue
Wait. Sometimes I get confused because gmail will substitute * for bolding, so in my client it looks like you're searching infix (e.g. leading and trailing wildcards). If that's the case, then your performance will always be poor - it has to enumerate all the terms in the field... If it's just bolding confusing me, then never mind.

Best
Erick

On Fri, Aug 19, 2011 at 8:27 PM, O. Klein wrote:
> Terms.prefix was just to compare performance.
>
> The use case was terms.regex=.*query.* And as Markus pointed out, this will
> prolly remain a bottleneck.
>
> I looked at the Suggester. But like many others I have been struggling to
> make it useful. It needs a custom queryConverter to give proper suggestions,
> but I havent tried this yet.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Terms-regex-performance-issue-tp3268994p3269628.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
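For what it's worth, the asymmetry Erick describes is visible at the Lucene 3.x term-dictionary level: a prefix can seek straight to its place in the sorted terms and stop early, while an infix pattern such as .*query.* has to visit every term in the field. A rough sketch of both scans (the field name "text" is a placeholder):

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;

// Sketch: why a prefix lookup can be fast while terms.regex=.*query.* cannot.
public class TermScanSketch {

  // Prefix: seek into the sorted term dictionary and stop as soon as
  // the prefix no longer matches.
  static void prefixScan(IndexReader reader, String prefix) throws IOException {
    TermEnum te = reader.terms(new Term("text", prefix));
    try {
      do {
        Term t = te.term();
        if (t == null || !t.field().equals("text") || !t.text().startsWith(prefix)) {
          break;                       // past the prefix range - done
        }
        System.out.println(t.text());
      } while (te.next());
    } finally {
      te.close();
    }
  }

  // Infix: there is nothing to seek to, so every term in the field is visited.
  static void infixScan(IndexReader reader, String needle) throws IOException {
    TermEnum te = reader.terms(new Term("text", ""));
    try {
      do {
        Term t = te.term();
        if (t == null || !t.field().equals("text")) {
          break;
        }
        if (t.text().contains(needle)) {
          System.out.println(t.text());
        }
      } while (te.next());
    } finally {
      te.close();
    }
  }
}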
Re: Too many results in dismax queries with one word
The root problem here is "This is unacceptable for my client". The first thing I'd suggest is that you work with your client and get them to define what is acceptable. You'll be forever changing things (to no good purpose) if all they can say is "that's not right". For instance, you apparently have two competing requirements: 1> try to correct users input, which inevitably increases the results returned 2> narrow the search to the "right" results. You can't have both every time! So you could try something like going with a more-restrictive search (no metaphone comparison) first and, if the results returned weren't sufficient firing the "broader" query back, without showing the too-small results first. You could work with your client and see if what they really want is just the most relevant results at the top of the list, in which case you can play with the dismax field boosts (by the way, what version of Solr are you using?) You could work with the client to understand the user experience if you use autocomplete and/or faceting etc. to guide their explorations. You could... But none of that will help unless and until you and your client can agree what is the correct behavior ahead of time Best Erick On Sat, Aug 20, 2011 at 11:04 AM, Rafał Piekarski (RaVbaker) wrote: > Hi all, > > I have a database of e-commerce products (5M) and trying to build a search > solution for it. > > I have used steemer, edgengram and doublemetaphone phonetic fields for > omiting common typos in queries. It works quite good with dismax QParser > for queries longer than one word: "tv lc20", "sny psp 3001", "cannon 5d" > etc. For not having too many results I manipulated with `mm` parameter. But > when user type a single word like "ipad", "cannon". I always having a lot of > results (~6). This is unacceptable for my client. He would like to have > then only the `good` results. That particulary match specific query. It's > hard to acomplish for me cause of use doublemetaphone field which converts > words like "apt", "opt" and "ipad" and even "ipod" to the same phonetic word > - APT. And then all of these words are matched fairly the same gives me > huge amount of results. Similar problems I have with other words like > "canon", "canine" and "cannon" which are KNN in phonetic way. But lexically > have different meanings: "canon" - camera, "canine" - cat food , "cannon" - > may be a misspell for canon or part of book title about cannon weapons. > > My first idea was to make a second requestHandler without searching in > *_phonetic fields. And use it for queries with only one word. But it didn't > worked cause sometimes I want to correct user even if there is only one word > and suggest him something better. Query "cannon" is a good example. I'm > fairly sure that most of the time when someone type "cannon" it would be a > typo for "canon" and I want to show user also CANON cameras. That's why I > can't use second requestHandler for one word queries. > > I'm looking for any ideas how could I change my requestHandler. > > My regular queries are: http://localhost:8983/solr/select?q=cannon > > Below I put my configuration for requestHandler and schema.xml. 
> > > > solrconfig.xml: > > > > *:* > dismax > > title^1.3 title_text^0.9 title_phonetic^0.74 title_ng^0.17 > title_ngram^0.54 > producer_name^0.9 producer_name_text^0.89 > category_path_text^0.8 category_path_phonetic^0.65 > description^0.60 description_text^0.56 > > title_text^1.1 title^1.2 description^0.3 > 3 > 0.1 > 2<100% 3<-1 5<85% > > *,score > > > > > schema.xml: > > > > > omitNorms="true" positionIncrementGap="0" /> > omitNorms="true" positionIncrementGap="0"/> > sortMissingLast="true" omitNorms="true" /> > sortMissingLast="true" omitNorms="true" /> > precisionStep="2" omitNorms="true" positionIncrementGap="0" /> > > positionIncrementGap="100"> > > > > > ignoreCase="true" > words="stopwords_pl.txt" > enablePositionIncrements="true" > /> > generateWordParts="1" generateNumberParts="1" catenateWords="1" > catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/> > > > > > > > > positionIncrementGap="100"> > > > > ignoreCase="true" > words="stopwords_pl.txt" > enablePositionIncrements="true" > /> > generateWordParts="1" generateNumberParts="1" catenateWords="1" > catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/> > > > > > > > > class="solr.TextField" > > > > ignoreCase="true" > words="stopwords_pl.txt" >
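The collisions described above are easy to reproduce outside Solr: the phonetic filters are built on Apache Commons Codec, and a few lines against that library show "ipad", "ipod", "apt" and "opt" collapsing to the same code, and "canon"/"cannon"/"canine" doing the same (a sketch; exact codes can vary slightly with the codec version):

import org.apache.commons.codec.language.DoubleMetaphone;

// Demonstrates why a doublemetaphone field matches "ipad" against UPS model
// names, bath tubs, etc.: several unrelated words share the same phonetic code.
public class MetaphoneCollisionDemo {
  public static void main(String[] args) {
    DoubleMetaphone dm = new DoubleMetaphone();
    String[] words = {"ipad", "ipod", "apt", "opt", "canon", "cannon", "canine"};
    for (String w : words) {
      // primary and alternate encodings
      System.out.println(w + " -> " + dm.doubleMetaphone(w)
          + " / " + dm.doubleMetaphone(w, true));
    }
  }
}

On a stock commons-codec this should print APT for the first four words and KNN for the last three, which is exactly the behaviour described in the thread.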
Re: Update field value in the document based on value of another field in the document
Publishing stack traces does no good unless you also tell us what version of Solr you are using. The source-file numbers do move around between versions Also, what line in your code is at the root of this chain? The very first thing I'd do is just comment out your custom code (i.e. just ahve the super.processAdd) in your code and build up from there. Some printlns might show the problem by testing, for instance, that doc is not null (I can't imagine why it would be, but it's been said that "It's not the things you don't know that'll kill you, it's the things you do know that aren't true". Best Erick On Sat, Aug 20, 2011 at 2:39 PM, bhawna singh wrote: > Now that I have set it up using UpdateProcessorChain, I am running into null > exeception. > Here is what I have- > In SolrConfig.xml > > > > > > > > > startup="lazy" > > > mychain > > > > > Here is my java code- > package mysolr; > > > import java.io.IOException; > > import org.apache.solr.common.SolrInputDocument; > import org.apache.solr.request.SolrQueryRequest; > import org.apache.solr.request.SolrQueryResponse; > import org.apache.solr.update.AddUpdateCommand; > import org.apache.solr.update.processor.UpdateRequestProcessor; > import org.apache.solr.update.processor.UpdateRequestProcessorFactory; > > public class AddConditionalFieldsFactory extends > UpdateRequestProcessorFactory > { > @Override > public UpdateRequestProcessor getInstance(SolrQueryRequest req, > SolrQueryResponse rsp, UpdateRequestProcessor next) > { > System.out.println("From customization:"); > return new AddConditionalFields(next); > } > } > > class AddConditionalFields extends UpdateRequestProcessor > { > public AddConditionalFields( UpdateRequestProcessor next) { > > super( next ); > } > > @Override > public void processAdd(AddUpdateCommand cmd) throws IOException { > SolrInputDocument doc = cmd.getSolrInputDocument(); > > Object v = doc.getFieldValue( "url" ); > if( v != null ) { > String url = v.toString(); > if( url.contains("question") ) { > doc.addField( "tierFilter", "1" ); > } > } > > // pass it up the chain > super.processAdd(cmd); > } > } > > Here is my Java code- > and I get the following error when I try to index- > Aug 20, 2011 10:48:43 AM org.apache.solr.common.SolrException log > SEVERE: java.lang.AbstractMethodError at > org.apache.solr.update.processor.UpdateRequestProcessorChain.createProcessor(UpdateRequestProcessorChain.java:74) > at > org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:53) > > > Any pointers please. I am using Solr 3.3 > > Thanks, > Bhawna > > On Thu, Aug 18, 2011 at 2:04 PM, simon wrote: > >> An UpdateRequestProcessor would do the trick. Look at the (rather minimal) >> documentation and code example in >> http://wiki.apache.org/solr/UpdateRequestProcessor >> >> -Simon >> >> On Thu, Aug 18, 2011 at 4:15 PM, bhawna singh >> wrote: >> >> > Hi All, >> > I have a requirement to update a certain field value depending on the >> field >> > value of another field. >> > To elaborate- >> > I have a field called 'popularity' and a field called 'URL'. I need to >> > assign popularity value depending on the domain (URL) ( I have the >> > popularity and domain mapping in a text file). >> > >> > I am using CSVRequestHandler to import the data. >> > >> > What are the suggested ways to achieve this. >> > Your quick response is much appreciated. >> > >> > Thanks, >> > Bhawna >> > >> >
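A common cause of AbstractMethodError with custom processors is a compile-time/runtime version mismatch: if the jar was built against older Solr libraries, the compiled getInstance() may no longer override the abstract method the 3.3 runtime calls. Purely as a sketch, not a confirmed diagnosis, here is the factory from the thread arranged to compile against the Solr 3.3 jars, with SolrQueryResponse imported from its 3.x package, org.apache.solr.response:

package mysolr;

import java.io.IOException;

import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;   // 3.x location, not org.apache.solr.request
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

// Minimal factory/processor pair; if the compiled getInstance() resolves against
// a different SolrQueryResponse class than the runtime expects, the container
// can throw AbstractMethodError when it builds the chain.
public class AddConditionalFieldsFactory extends UpdateRequestProcessorFactory {

  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
      SolrQueryResponse rsp, UpdateRequestProcessor next) {
    return new AddConditionalFields(next);
  }

  static class AddConditionalFields extends UpdateRequestProcessor {
    AddConditionalFields(UpdateRequestProcessor next) {
      super(next);
    }

    @Override
    public void processAdd(AddUpdateCommand cmd) throws IOException {
      SolrInputDocument doc = cmd.getSolrInputDocument();
      Object v = doc.getFieldValue("url");
      if (v != null && v.toString().contains("question")) {
        doc.addField("tierFilter", "1");
      }
      super.processAdd(cmd);            // pass it up the chain
    }
  }
}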
Re: get update record from database using DIH
At a guess, you're not getting as many rows as you think, and commit has nothing to do with it. But that's just a guess. So the very first thing I'd do is be sure the SQL is doing what you think. There's a little-known data import debugging page that might help: blahblahblah./solr/admin/dataimport.jsp Best Erick On Sat, Aug 20, 2011 at 4:30 PM, Alexandre Sompheng wrote: > Actually I requested .../dataimport?command=delta-import&commit=true > And DIH in delta-import mode does not commit, you can se log below. My index > is quite empty, maybe 10 data rows max... It's just the beginning. > > > INFO: Starting Delta Import > > Aug 14, 2011 1:42:02 AM org.apache.solr.core.SolrCore execute > > INFO: [] webapp=/apache-solr-3.3.0 path=/dataimport > params={commit=true&command=delta-import} status=0 QTime=0 > > Aug 14, 2011 1:42:02 AM org.apache.solr.handler.dataimport.SolrWriter > readIndexerProperties > > INFO: Read dataimport.properties > > Aug 14, 2011 1:42:02 AM org.apache.solr.handler.dataimport.DocBuilder > doDelta > > INFO: Starting delta collection. > > Aug 14, 2011 1:42:02 AM org.apache.solr.handler.dataimport.DocBuilder > collectDelta > > INFO: Running ModifiedRowKey() for Entity: event > > Aug 14, 2011 1:42:02 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 > call > > INFO: Creating a connection for entity event with URL: jdbc:mysql:// > 85.168.123.207:3306/AGENDA > > Aug 14, 2011 1:42:03 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 > call > > INFO: Time taken for getConnection(): 865 > > Aug 14, 2011 1:42:03 AM org.apache.solr.handler.dataimport.DocBuilder > collectDelta > > INFO: Completed ModifiedRowKey for Entity: event rows obtained : 3 > > Aug 14, 2011 1:42:03 AM org.apache.solr.handler.dataimport.DocBuilder > collectDelta > > INFO: Completed DeletedRowKey for Entity: event rows obtained : 0 > > Aug 14, 2011 1:42:03 AM org.apache.solr.handler.dataimport.DocBuilder > collectDelta > > INFO: Completed parentDeltaQuery for Entity: event > > Aug 14, 2011 1:42:03 AM org.apache.solr.handler.dataimport.DocBuilder > doDelta > > INFO: Delta Import completed successfully > > Aug 14, 2011 1:42:03 AM org.apache.solr.update.processor.LogUpdateProcessor > finish > > INFO: {} 0 0 > > Aug 14, 2011 1:42:03 AM org.apache.solr.handler.dataimport.DocBuilder > execute > > INFO: Time taken = 0:0:1.282 > > > On 19 août 2011, at 10:39, Gora Mohanty wrote: > > On Fri, Aug 19, 2011 at 5:32 AM, Alexandre Sompheng > wrote: > > Hi guys, i try the delta import, i got logs saying that it found delta > > data to update. But it seems that the index is not updated. Amy guess > > why this happens ? Did i miss something? I'm on solr 3.3 with no > > patch. > > [...] > > Please show us the following: > * The exact URL you loaded for delta-import > * The Solr response which shows the delta documents that it found, > and the status of the delta-import. > If your index is large, and if you are running an optimise after the > delta-import (the default is to optimise), it can take some time. > Check the status: It will say "busy" if the optimise is still running. > > Regards, > Gora >
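Following Erick's advice, one quick way to be sure what the delta queries really return is to run them by hand against the same MySQL instance, outside DIH entirely. A throwaway sketch - the SQL, table and column names are placeholders for whatever deltaQuery/deltaImportQuery the data-config.xml actually uses, and the credentials are made up:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Run the configured deltaQuery by hand to see exactly which rows DIH will pick up.
public class DeltaQueryCheck {
  public static void main(String[] args) throws Exception {
    // Same URL the DIH log reports: jdbc:mysql://85.168.123.207:3306/AGENDA
    Connection conn = DriverManager.getConnection(
        "jdbc:mysql://85.168.123.207:3306/AGENDA", "user", "password");
    Statement st = conn.createStatement();
    // Placeholder SQL - paste the real deltaQuery / deltaImportQuery here.
    ResultSet rs = st.executeQuery(
        "SELECT id FROM event WHERE last_modified > '2011-08-14 01:42:02'");
    int n = 0;
    while (rs.next()) {
      System.out.println(rs.getString(1));
      n++;
    }
    System.out.println(n + " rows");
    conn.close();
  }
}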
Re: Too many results in dismax queries with one word
Thanks for reply. I know that sometimes meeting all clients needs would be impossible but then client recalls that competitive (commercial) product already do that (but has other problems, like performance). And then I'm obligated to try more tricks. :/ I'm currently using Solr 3.1 but thinking about migrating to latest stable version - 3.3. You correct, to meet client needs I have also used some hacks with boosting queries (`bq` and `bf` parameters) but I omit that to make XMLs clearer. You mentioned faceting. This is also one of my(my client?) problems. In the user interface they want to have 5 categories for products. Those 5 should be most relevance ones. When I get those with highest counts for one word queries they are most of the time "not that which should be there". For example with phrase "ipad" which actually has only 12 most relevant products in category "tablets" but phonetic APT matches also part of model name for hundreds of UPS power supplies and bath tubes . And these are on the list, not tablets. :/ But you mentioned autocomplete which is something what I haven't watched yet. I'll try with that and show it to my client. -- Rafał "RaVbaker" Piekarski. web: http://ja.ravbaker.net mail: ravba...@gmail.com jid/xmpp/aim: ravba...@gmail.com mobile: +48-663-808-481 On Sun, Aug 21, 2011 at 4:20 PM, Erick Erickson wrote: > The root problem here is "This is unacceptable for my client". The first > thing I'd suggest is that you work with your client and get them to define > what is acceptable. You'll be forever changing things (to no good purpose) > if all they can say is "that's not right". > > For instance, you apparently have two competing requirements: > 1> try to correct users input, which inevitably increases the results > returned > 2> narrow the search to the "right" results. > > You can't have both every time! > > So you could try something like going with a more-restrictive search > (no metaphone > comparison) first and, if the results returned weren't sufficient > firing the "broader" query > back, without showing the too-small results first. > > You could work with your client and see if what they really want is > just the most relevant > results at the top of the list, in which case you can play with the > dismax field boosts > (by the way, what version of Solr are you using?) > > You could work with the client to understand the user experience if > you use autocomplete > and/or faceting etc. to guide their explorations. > > You could... > > But none of that will help unless and until you and your client can > agree what is the > correct behavior ahead of time > > Best > Erick > > On Sat, Aug 20, 2011 at 11:04 AM, Rafał Piekarski (RaVbaker) > wrote: > > Hi all, > > > > I have a database of e-commerce products (5M) and trying to build a > search > > solution for it. > > > > I have used steemer, edgengram and doublemetaphone phonetic fields for > > omiting common typos in queries. It works quite good with dismax QParser > > for queries longer than one word: "tv lc20", "sny psp 3001", "cannon 5d" > > etc. For not having too many results I manipulated with `mm` parameter. > But > > when user type a single word like "ipad", "cannon". I always having a lot > of > > results (~6). This is unacceptable for my client. He would like to > have > > then only the `good` results. That particulary match specific query. 
It's > > hard to acomplish for me cause of use doublemetaphone field which > converts > > words like "apt", "opt" and "ipad" and even "ipod" to the same phonetic > word > > - APT. And then all of these words are matched fairly the same gives me > > huge amount of results. Similar problems I have with other words like > > "canon", "canine" and "cannon" which are KNN in phonetic way. But > lexically > > have different meanings: "canon" - camera, "canine" - cat food , "cannon" > - > > may be a misspell for canon or part of book title about cannon weapons. > > > > My first idea was to make a second requestHandler without searching in > > *_phonetic fields. And use it for queries with only one word. But it > didn't > > worked cause sometimes I want to correct user even if there is only one > word > > and suggest him something better. Query "cannon" is a good example. I'm > > fairly sure that most of the time when someone type "cannon" it would be > a > > typo for "canon" and I want to show user also CANON cameras. That's why I > > can't use second requestHandler for one word queries. > > > > I'm looking for any ideas how could I change my requestHandler. > > > > My regular queries are: http://localhost:8983/solr/select?q=cannon > > > > Below I put my configuration for requestHandler and schema.xml. > > > > > > > > solrconfig.xml: > > > > > > > > *:* > > dismax > > > > title^1.3 title_text^0.9 title_phonetic^0.74 title_ng^0.17 > > title_ngram^0.54 > > producer_name^0.9 producer_name_text^0.89 > > category_path_text^0.8 category_path
Re: Requiring multiple matches of a term
On Fri, Aug 19, 2011 at 6:26 PM, Michael Ryan wrote:
> Is there a way to specify in a query that a term must match at least X times in a document, where X is some value greater than 1?

One simple way of doing this is maybe to write a wrapper for TermQuery that only returns docs with a Term Frequency > X. As far as I understand the question, those terms don't have to be within a certain window, right?

simon

> For example, I want to only get documents that contain the word "dog" three times. I've thought that using a proximity query with an arbitrary large distance value might do it:
> "dog dog dog"~10
> And that does seem to return the results I expect.
>
> But when I try for more than three, I start getting unexpected result counts as I change the proximity value:
> "dog dog dog dog"~10 returns 6403 results
> "dog dog dog dog"~20 returns 9291 results
> "dog dog dog dog"~30 returns 6395 results
>
> Anyone ever do something like this and know how I can accomplish this?
>
> -Michael
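Simon's idea can be sketched against the Lucene 3.x API without touching the phrase machinery at all: a Filter that walks the postings for the term and keeps only documents whose term frequency reaches the threshold. A rough, untested sketch (the field name is a placeholder):

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.Filter;
import org.apache.lucene.util.OpenBitSet;

// Accepts only documents in which `term` occurs at least `minFreq` times,
// regardless of where in the document the occurrences are.
public class MinTermFreqFilter extends Filter {
  private final Term term;
  private final int minFreq;

  public MinTermFreqFilter(Term term, int minFreq) {
    this.term = term;
    this.minFreq = minFreq;
  }

  @Override
  public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
    OpenBitSet bits = new OpenBitSet(reader.maxDoc());
    TermDocs td = reader.termDocs(term);
    try {
      while (td.next()) {
        if (td.freq() >= minFreq) {   // freq() = term frequency within this doc
          bits.set(td.doc());
        }
      }
    } finally {
      td.close();
    }
    return bits;
  }
}

Something like new MinTermFreqFilter(new Term("text", "dog"), 3), applied as a filter alongside q=dog, would then express "dog at least three times" directly instead of relying on the sloppy-phrase trick.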
Re: Too many results in dismax queries with one word
Would it make sense to have a "Did you mean?" type of functionality for which you use the EdgeNGram and Metaphone filters /if/ you don't get appropriate results for the user query? So when user types "cannon" and the application notices that there are no cannons for sale in the index (0 results with standard analysis), it then makes another query with the EdgeNGram and/or Metaphone filters and come back with: Did you mean "Canon", "Canine"? Clicking on "Canon" or "Canine" would fire off a query for these terms. That way your application doesn't guess what is right, it goes back and asks the user what he wants. -sujit On Sun, 2011-08-21 at 17:19 +0200, Rafał Piekarski (RaVbaker) wrote: > Thanks for reply. I know that sometimes meeting all clients needs would be > impossible but then client recalls that competitive (commercial) product > already do that (but has other problems, like performance). And then I'm > obligated to try more tricks. :/ > > I'm currently using Solr 3.1 but thinking about migrating to latest stable > version - 3.3. > > You correct, to meet client needs I have also used some hacks with boosting > queries (`bq` and `bf` parameters) but I omit that to make XMLs clearer. > > You mentioned faceting. This is also one of my(my client?) problems. In the > user interface they want to have 5 categories for products. Those 5 should > be most relevance ones. When I get those with highest counts for one word > queries they are most of the time "not that which should be there". For > example with phrase "ipad" which actually has only 12 most relevant products > in category "tablets" but phonetic APT matches also part of model name for > hundreds of UPS power supplies and bath tubes . And these are on the list, > not tablets. :/ > > But you mentioned autocomplete which is something what I haven't watched > yet. I'll try with that and show it to my client. > > -- > Rafał "RaVbaker" Piekarski. > > web: http://ja.ravbaker.net > mail: ravba...@gmail.com > jid/xmpp/aim: ravba...@gmail.com > mobile: +48-663-808-481 > > > On Sun, Aug 21, 2011 at 4:20 PM, Erick Erickson > wrote: > > > The root problem here is "This is unacceptable for my client". The first > > thing I'd suggest is that you work with your client and get them to define > > what is acceptable. You'll be forever changing things (to no good purpose) > > if all they can say is "that's not right". > > > > For instance, you apparently have two competing requirements: > > 1> try to correct users input, which inevitably increases the results > > returned > > 2> narrow the search to the "right" results. > > > > You can't have both every time! > > > > So you could try something like going with a more-restrictive search > > (no metaphone > > comparison) first and, if the results returned weren't sufficient > > firing the "broader" query > > back, without showing the too-small results first. > > > > You could work with your client and see if what they really want is > > just the most relevant > > results at the top of the list, in which case you can play with the > > dismax field boosts > > (by the way, what version of Solr are you using?) > > > > You could work with the client to understand the user experience if > > you use autocomplete > > and/or faceting etc. to guide their explorations. > > > > You could... 
> > > > But none of that will help unless and until you and your client can > > agree what is the > > correct behavior ahead of time > > > > Best > > Erick > > > > On Sat, Aug 20, 2011 at 11:04 AM, Rafał Piekarski (RaVbaker) > > wrote: > > > Hi all, > > > > > > I have a database of e-commerce products (5M) and trying to build a > > search > > > solution for it. > > > > > > I have used steemer, edgengram and doublemetaphone phonetic fields for > > > omiting common typos in queries. It works quite good with dismax QParser > > > for queries longer than one word: "tv lc20", "sny psp 3001", "cannon 5d" > > > etc. For not having too many results I manipulated with `mm` parameter. > > But > > > when user type a single word like "ipad", "cannon". I always having a lot > > of > > > results (~6). This is unacceptable for my client. He would like to > > have > > > then only the `good` results. That particulary match specific query. It's > > > hard to acomplish for me cause of use doublemetaphone field which > > converts > > > words like "apt", "opt" and "ipad" and even "ipod" to the same phonetic > > word > > > - APT. And then all of these words are matched fairly the same gives me > > > huge amount of results. Similar problems I have with other words like > > > "canon", "canine" and "cannon" which are KNN in phonetic way. But > > lexically > > > have different meanings: "canon" - camera, "canine" - cat food , "cannon" > > - > > > may be a misspell for canon or part of book title about cannon weapons. > > > > > > My first idea was to make a second requestHandler without searching in > > > *_phonetic fields. And use it for queries with
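Sujit's two-pass fallback maps fairly directly onto SolrJ. A rough sketch under made-up assumptions: two request handlers are configured, a strict one without the phonetic fields (here called "/select-strict") and the existing broader one ("/select-phonetic"); both handler names are invented for the example.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

// First try the strict (non-phonetic) handler; only if it finds nothing,
// fall back to the phonetic handler and offer its hits as "Did you mean?".
public class DidYouMeanSearch {
  public static void main(String[] args) throws Exception {
    SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

    SolrQuery strict = new SolrQuery("cannon");
    strict.set("qt", "/select-strict");          // hypothetical handler name
    QueryResponse rsp = solr.query(strict);

    if (rsp.getResults().getNumFound() == 0) {
      SolrQuery broad = new SolrQuery("cannon");
      broad.set("qt", "/select-phonetic");       // hypothetical handler name
      QueryResponse suggestions = solr.query(broad);
      // Render these as "Did you mean ...?" links instead of as results.
      System.out.println("Did you mean? candidates: "
          + suggestions.getResults().getNumFound());
    } else {
      System.out.println("Strict matches: " + rsp.getResults().getNumFound());
    }
  }
}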
Re: Too many results in dismax queries with one word
I think Sujit has hit the nail on the head. Any program you try to write that tries to guess what the user *really* meant will require endless tinkering and *still* won't be right. If you only knew how annoying I find Google's attempts to "help". So perhaps concentrating on some interaction with the user, who is, after all, the only one who really knows what they want is the best approach. Best Erick On Sun, Aug 21, 2011 at 12:26 PM, Sujit Pal wrote: > Would it make sense to have a "Did you mean?" type of functionality for > which you use the EdgeNGram and Metaphone filters /if/ you don't get > appropriate results for the user query? > > So when user types "cannon" and the application notices that there are > no cannons for sale in the index (0 results with standard analysis), it > then makes another query with the EdgeNGram and/or Metaphone filters and > come back with: > > Did you mean "Canon", "Canine"? > > Clicking on "Canon" or "Canine" would fire off a query for these terms. > > That way your application doesn't guess what is right, it goes back and > asks the user what he wants. > > -sujit > > On Sun, 2011-08-21 at 17:19 +0200, Rafał Piekarski (RaVbaker) wrote: >> Thanks for reply. I know that sometimes meeting all clients needs would be >> impossible but then client recalls that competitive (commercial) product >> already do that (but has other problems, like performance). And then I'm >> obligated to try more tricks. :/ >> >> I'm currently using Solr 3.1 but thinking about migrating to latest stable >> version - 3.3. >> >> You correct, to meet client needs I have also used some hacks with boosting >> queries (`bq` and `bf` parameters) but I omit that to make XMLs clearer. >> >> You mentioned faceting. This is also one of my(my client?) problems. In the >> user interface they want to have 5 categories for products. Those 5 should >> be most relevance ones. When I get those with highest counts for one word >> queries they are most of the time "not that which should be there". For >> example with phrase "ipad" which actually has only 12 most relevant products >> in category "tablets" but phonetic APT matches also part of model name for >> hundreds of UPS power supplies and bath tubes . And these are on the list, >> not tablets. :/ >> >> But you mentioned autocomplete which is something what I haven't watched >> yet. I'll try with that and show it to my client. >> >> -- >> Rafał "RaVbaker" Piekarski. >> >> web: http://ja.ravbaker.net >> mail: ravba...@gmail.com >> jid/xmpp/aim: ravba...@gmail.com >> mobile: +48-663-808-481 >> >> >> On Sun, Aug 21, 2011 at 4:20 PM, Erick Erickson >> wrote: >> >> > The root problem here is "This is unacceptable for my client". The first >> > thing I'd suggest is that you work with your client and get them to define >> > what is acceptable. You'll be forever changing things (to no good purpose) >> > if all they can say is "that's not right". >> > >> > For instance, you apparently have two competing requirements: >> > 1> try to correct users input, which inevitably increases the results >> > returned >> > 2> narrow the search to the "right" results. >> > >> > You can't have both every time! >> > >> > So you could try something like going with a more-restrictive search >> > (no metaphone >> > comparison) first and, if the results returned weren't sufficient >> > firing the "broader" query >> > back, without showing the too-small results first. 
>> > >> > You could work with your client and see if what they really want is >> > just the most relevant >> > results at the top of the list, in which case you can play with the >> > dismax field boosts >> > (by the way, what version of Solr are you using?) >> > >> > You could work with the client to understand the user experience if >> > you use autocomplete >> > and/or faceting etc. to guide their explorations. >> > >> > You could... >> > >> > But none of that will help unless and until you and your client can >> > agree what is the >> > correct behavior ahead of time >> > >> > Best >> > Erick >> > >> > On Sat, Aug 20, 2011 at 11:04 AM, Rafał Piekarski (RaVbaker) >> > wrote: >> > > Hi all, >> > > >> > > I have a database of e-commerce products (5M) and trying to build a >> > search >> > > solution for it. >> > > >> > > I have used steemer, edgengram and doublemetaphone phonetic fields for >> > > omiting common typos in queries. It works quite good with dismax QParser >> > > for queries longer than one word: "tv lc20", "sny psp 3001", "cannon 5d" >> > > etc. For not having too many results I manipulated with `mm` parameter. >> > But >> > > when user type a single word like "ipad", "cannon". I always having a lot >> > of >> > > results (~6). This is unacceptable for my client. He would like to >> > have >> > > then only the `good` results. That particulary match specific query. It's >> > > hard to acomplish for me cause of use doublemetaphone field which >> > converts >> > > words like "apt", "opt" and "ipad" and even
Re: Terms.regex performance issue
Yeah, I was searching infix. It worked very nicely for autocomplete.

Made a custom QueryConverter for the Suggester so it gives proper suggestions for shingles. Will stick with that for now.

Thanx for the feedback.

--
View this message in context: http://lucene.472066.n3.nabble.com/Terms-regex-performance-issue-tp3268994p3273145.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Too many results in dismax queries with one word
Thanks very much for your advice. I think I now better understand how to make better use of solr. I have tested spellchecker and it looks like it let me to achieve better results and hopefully we will satisfy the client. In my solution I will change user query to use or not to use phonetic fields based on results from spellcheck.collation and frequency of words. If I wouldn't be sure what is better then I'll ask user through "did you mean" and log his reply to make better choices in future. Once again thanks a lot guys. This is my example of query to spellchecker: http://localhost:8983/solr/select?spellcheck=true&q=cannon&rows=0&spellcheck.collate=true&spellcheck.count=10&spellcheck.onlyMorePopular=true&spellcheck.extendedResults=on -- Rafał "RaVbaker" Piekarski. web: http://ja.ravbaker.net mail: ravba...@gmail.com jid/xmpp/aim: ravba...@gmail.com mobile: +48-663-808-481 On Sun, Aug 21, 2011 at 6:36 PM, Erick Erickson wrote: > I think Sujit has hit the nail on the head. Any program you try to write > that tries to guess what the user *really* meant will require endless > tinkering and *still* won't be right. If you only knew how annoying I > find Google's attempts to "help". > > So perhaps concentrating on some interaction with the user, who is, > after all, the only one who really knows what they want is the best > approach. > > Best > Erick > > On Sun, Aug 21, 2011 at 12:26 PM, Sujit Pal wrote: > > Would it make sense to have a "Did you mean?" type of functionality for > > which you use the EdgeNGram and Metaphone filters /if/ you don't get > > appropriate results for the user query? > > > > So when user types "cannon" and the application notices that there are > > no cannons for sale in the index (0 results with standard analysis), it > > then makes another query with the EdgeNGram and/or Metaphone filters and > > come back with: > > > > Did you mean "Canon", "Canine"? > > > > Clicking on "Canon" or "Canine" would fire off a query for these terms. > > > > That way your application doesn't guess what is right, it goes back and > > asks the user what he wants. > > > > -sujit > > > > On Sun, 2011-08-21 at 17:19 +0200, Rafał Piekarski (RaVbaker) wrote: > >> Thanks for reply. I know that sometimes meeting all clients needs would > be > >> impossible but then client recalls that competitive (commercial) product > >> already do that (but has other problems, like performance). And then I'm > >> obligated to try more tricks. :/ > >> > >> I'm currently using Solr 3.1 but thinking about migrating to latest > stable > >> version - 3.3. > >> > >> You correct, to meet client needs I have also used some hacks with > boosting > >> queries (`bq` and `bf` parameters) but I omit that to make XMLs clearer. > >> > >> You mentioned faceting. This is also one of my(my client?) problems. In > the > >> user interface they want to have 5 categories for products. Those 5 > should > >> be most relevance ones. When I get those with highest counts for one > word > >> queries they are most of the time "not that which should be there". For > >> example with phrase "ipad" which actually has only 12 most relevant > products > >> in category "tablets" but phonetic APT matches also part of model name > for > >> hundreds of UPS power supplies and bath tubes . And these are on the > list, > >> not tablets. :/ > >> > >> But you mentioned autocomplete which is something what I haven't watched > >> yet. I'll try with that and show it to my client. > >> > >> -- > >> Rafał "RaVbaker" Piekarski. 
> >> > >> web: http://ja.ravbaker.net > >> mail: ravba...@gmail.com > >> jid/xmpp/aim: ravba...@gmail.com > >> mobile: +48-663-808-481 > >> > >> > >> On Sun, Aug 21, 2011 at 4:20 PM, Erick Erickson < > erickerick...@gmail.com>wrote: > >> > >> > The root problem here is "This is unacceptable for my client". The > first > >> > thing I'd suggest is that you work with your client and get them to > define > >> > what is acceptable. You'll be forever changing things (to no good > purpose) > >> > if all they can say is "that's not right". > >> > > >> > For instance, you apparently have two competing requirements: > >> > 1> try to correct users input, which inevitably increases the results > >> > returned > >> > 2> narrow the search to the "right" results. > >> > > >> > You can't have both every time! > >> > > >> > So you could try something like going with a more-restrictive search > >> > (no metaphone > >> > comparison) first and, if the results returned weren't sufficient > >> > firing the "broader" query > >> > back, without showing the too-small results first. > >> > > >> > You could work with your client and see if what they really want is > >> > just the most relevant > >> > results at the top of the list, in which case you can play with the > >> > dismax field boosts > >> > (by the way, what version of Solr are you using?) > >> > > >> > You could work with the client to understand the user experience if > >> > you use autocomplete > >> > an
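For completeness, the collation from a request like the one above is also reachable through SolrJ, which makes the "use the phonetic fields or not" decision easy to automate. A small sketch along those lines (parameters mirror the URL above):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.SpellCheckResponse;

// Reads the spellcheck collation for a one-word query such as "cannon" and
// decides whether to re-run the search with the corrected term.
public class CollationCheck {
  public static void main(String[] args) throws Exception {
    SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

    SolrQuery q = new SolrQuery("cannon");
    q.setRows(0);
    q.set("spellcheck", "true");
    q.set("spellcheck.collate", "true");
    q.set("spellcheck.count", "10");
    q.set("spellcheck.onlyMorePopular", "true");
    q.set("spellcheck.extendedResults", "on");

    QueryResponse rsp = solr.query(q);
    SpellCheckResponse spell = rsp.getSpellCheckResponse();
    if (spell != null && spell.getCollatedResult() != null) {
      // e.g. "canon" - either re-query with it or show it as "Did you mean?"
      System.out.println("Collation: " + spell.getCollatedResult());
    }
  }
}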
Re: Terms.regex performance issue
Ah, in that case, comparing prefix and regex is an apples-to-oranges comparison. I expect regex to be slower, but a fairer comparison would be prefix to stuff* (which may be changed into a prefix enumeration for all I know). But comparing infix to prefix doesn't tell you much, really.

Best
Erick

P.S. There's no reason to do anything if you have a solution that works already, though.

On Sun, Aug 21, 2011 at 12:56 PM, O. Klein wrote:
> Yeah, I was searching infix. It worked very nice for autocomplete.
>
> Made a custom QueryConverter for the Suggester so it gives proper
> suggestions for shingles. Will stick with that for now.
>
> Thanx for the feedback.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Terms-regex-performance-issue-tp3268994p3273145.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
Re: Terms.regex performance issue
Of course. That's why I compared prefix to bla* and saw it was already a lot slower.

--
View this message in context: http://lucene.472066.n3.nabble.com/Terms-regex-performance-issue-tp3268994p3273370.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Terms.regex performance issue
I see now in the Suggester wiki: "Support for infix-suggestions is planned for FSTLookup (which would be the only structure to support these)."

--
View this message in context: http://lucene.472066.n3.nabble.com/Terms-regex-performance-issue-tp3268994p3273711.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: Requiring multiple matches of a term
> One simple way of doing this is maybe to write a wrapper for TermQuery > that only returns docs with a Term Frequency > X as far as I > understand the question those terms don't have to be within a certain > window right? Correct. Terms can be anywhere in the document. I figured term frequencies might be involved, but wasn't sure how to actually do this. > Hmmm... i would think the phrase query approach should work, but it's > totally possible that there's something odd in the way phrase queries > work that could cause a problem -- the best way to sanity test something > like this is to try a really small self contained example that you can post > for other people to try. I've been able to reduce it pretty far, but I don't have a totally self-contained example yet. I haven't tried it out yet on a stock build of Solr (I'm using 3.2 with various patches). Right now I'm inserting a few documents with a text field that contains "dog dog dog", then repeatedly running q="dog dog dog dog"~1 with the queryResultCache disabled. The query is not giving me the same results each time (!!!). Sometimes all the documents are returned, sometimes a subset is returned, and sometimes no documents are returned. So far I've traced it down to the "repeats" array in SloppyPhraseScorer.initPhrasePositions() - depending on the order of the elements in this array, the document may or may not match. I think the HashSet.toArray() call is to blame here, but I don't yet fully understand the expected behavior of the initPhrasePositions function... -Michael
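The run-to-run variation is at least consistent with how HashSet behaves when its elements fall back on identity hash codes (PhrasePositions doesn't appear to override hashCode() in the 3.x sources): identity hashes differ between JVM runs, and so can the order toArray() hands back. A tiny stand-alone illustration of that effect, with plain objects standing in for PhrasePositions:

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Objects without hashCode()/equals() overrides land in a HashSet according to
// their identity hash codes, which differ between JVM runs - so toArray()
// ordering is not stable across runs even for the "same" input.
public class HashSetOrderDemo {
  static class Item {
    final String name;
    Item(String name) { this.name = name; }
    @Override public String toString() { return name; }
    // no hashCode()/equals(), similar to Lucene's PhrasePositions
  }

  public static void main(String[] args) {
    Set<Item> repeats = new HashSet<Item>();
    repeats.add(new Item("pp1"));
    repeats.add(new Item("pp2"));
    repeats.add(new Item("pp3"));
    // Run this a few times: the printed order can change between runs.
    System.out.println(Arrays.toString(repeats.toArray()));
  }
}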
Solr Join with multiple query parameters
Hi all,

I am trying to use the Join feature in Solr trunk with limited success. I am able to make simple searches and get the documents returned as expected. A query such as the following works perfectly fine:

http://localhost:8983/solr/core0/select?q={!join%20from=matchset_id_ss%20to=id}*:*

I can then add parameters to this search and it still works fine:

http://localhost:8983/solr/core0/select?q={!join%20from=matchset_id_ss%20to=id}*:*&fq=status_s:completed

I get filtered results of documents that are completed. The issue I am now facing is how to filter the initial set of documents on multiple conditions and then get a list of documents through the join. Here is the search I am trying to do:

http://localhost:8983/solr/core0/select?start=0&q=*:*&fq=status_i:1&rows=30&fq=team_id_i:1223

This search returns everything I want as expected; now I want to apply the join statement. I have added the join statement to the search above in every place I can think of, but it seems that the join takes place before any of the other filters are applied. The issue is that the returned documents mapped by matchset_id_ss do not have the fields status_i or team_id_i - these only exist on the initial documents I am searching. Is there a way that I can apply multiple filters first, then complete the join? And if that is possible, can I then add more filters after the join?

Thanks for the help,
Cameron
Re: How to implement Spell Checker using Solr?
The changes to solrconfig.xml are as follows:

default solr.IndexBasedSpellChecker spell ./spellchecker 0.7 .0001 jarowinkler lowerfilt org.apache.lucene.search.spell.JaroWinklerDistance ./spellchecker textSpell

And for the request handler, I have incorporated the following changes:

true false default false 5 true true spellcheck

The same is failing while crawling. I have reverted my code for now, but I can try it once again and post the exception that I have been getting while crawling.

--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-implement-Spell-Checker-using-Solr-tp3268450p3274069.html
Sent from the Solr - User mailing list archive at Nabble.com.