Re: solr synonyms behaviour
hossman wrote:
>
> This is "Issue #1" regarding trying to use query time multi word synonyms
> discussed on the wiki...
>
>>> "The Lucene QueryParser tokenizes on white space before giving any
>>> text to the Analyzer, so if a person searches for the words sea biscit
>>> the analyzer will be given the words "sea" and "biscit" separately, and
>>> will not know that they match a synonym.
>
> on the "boosting" part of the query (where the dismax handler
> automagically quotes the entire input and queries it against the "pf"
> fields), the synonyms do get used (because the whole input is analyzed as
> one string) but in this case the phrase queries will match any of these
> phrases...
>
>    divorce dispute resolution
>    alternative mediation resolution
>    divorce mediation resolution
>    etc...
>
> ...it will *NOT* match either of these phrases...
>
>    divorce mediation
>    alternative dispute resolution
>
> ...because the SynonymFilter has no way to tell the query parser which
> words should be linked to which other words when building up the phrase
> query.
>
> This is "Issue #2" regarding trying to use query time multi word synonyms
> discussed on the wiki...
>
>>> Phrase searching (ie: "sea biscit") will cause the QueryParser to pass
>>> the entire string to the analyzer, but if the SynonymFilter is
>>> configured to expand the synonyms, then when the QueryParser gets the
>>> resulting list of tokens back from the Analyzer, it will construct a
>>> MultiPhraseQuery that will not have the desired effect. This is because
>>> of the limited mechanism available for the Analyzer to indicate that
>>> two terms occupy the same position: there is no way to indicate that a
>>> "phrase" occupies the same position as a term. For our example the
>>> resulting MultiPhraseQuery would be "(sea | sea | seabiscuit) (biscuit
>>> | biscit)" which would not match the simple case of "seabiscuit"
>>> occurring in a document
>
> : I have the synonym filter only at query time coz i can't re-index data (or
> : portion of data) everytime i add a synonym and a couple of other reasons.
>
> Use cases like yours will *never* work as a query time synonym ... hence
> all of the information about multi-word synonyms and the caveats about
> using them in the wiki...
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#SynonymFilter
>
> -Hoss
>

We have a very similar problem, and want to make sure that this is hopeless with Solr before we try something else...

I have a synonyms.txt file similar to the following:

bar=>bar, club
club=>club, bar, night club
...

A search for 'bar' returns the exact results we want: anything with 'bar' or 'club' in the name. However, a search for 'club' produces very strange results:

name:"(club bar night) club"

Knowing that Lucene struggles with multi-word query-time synonyms, my question is, does this also affect index-time synonyms? What other alternatives do we have if we require multi-word synonyms?

--
View this message in context: http://www.nabble.com/solr-synonyms-behaviour-tp15051211p18349953.html
Sent from the Solr - User mailing list archive at Nabble.com.
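The "Issue #1" behaviour quoted above can be illustrated with a toy model. This is only a sketch: `analyze`, `parse_unquoted` and `parse_quoted` are hypothetical names invented for the illustration, not Lucene or Solr APIs, and the rule replaces the matched words instead of expanding them, purely to keep the example short.

```python
# Toy model only: none of these functions are Lucene or Solr APIs.
# One multi-word rule, taken from the wiki example quoted in the thread.
SYNONYMS = {
    ("sea", "biscit"): ["seabiscuit"],
}

def analyze(tokens):
    """Apply synonym rules; a rule fires only if all its words arrive together, in order."""
    out = []
    i = 0
    while i < len(tokens):
        for key, replacement in SYNONYMS.items():
            if tuple(tokens[i:i + len(key)]) == key:
                out.extend(replacement)
                i += len(key)
                break
        else:
            # No rule matched at this position: keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return out

def parse_unquoted(query):
    """Mimic a parser that splits on whitespace first and analyzes each chunk alone."""
    result = []
    for chunk in query.split():
        result.extend(analyze([chunk]))
    return result

def parse_quoted(query):
    """Mimic a phrase query: the whole string reaches the analyzer in one piece."""
    return analyze(query.split())

print(parse_unquoted("sea biscit"))
print(parse_quoted("sea biscit"))
```

In the unquoted case the analyzer only ever sees one word at a time, so the two-word rule can never fire; in the quoted case it does, which is where "Issue #2" then takes over.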
Re: solr synonyms behaviour
matt connolly wrote:
>
> swarag wrote:
>>
>> Knowing that Lucene struggles with multi-word query-time synonyms, my
>> question is, does this also affect index-time synonyms? What other
>> alternatives do we have if we require multi-word synonyms?
>>
>
> No, the multiple word problem doesn't happen with index synonyms, only
> query synonyms.
>
> See:
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-2c461ac74b4ddd82e453dc68fcfc92da77358d46
>
> I ended up using index time synonyms, but ideally, I'd like to see a
> filter factory that does something like the SynsExpand tool does (which
> was written for lucene, not solr).
>

I've tried this and it doesn't seem to work. Here are the basics of my config:

...
Synonyms for queryTime is off

Here is a basic example of some synonyms in my synonyms.txt:

club=>club,bar,night cabaret
bar=>bar,club

As you can see, a search for 'bar' will return any documents with 'bar' or 'club' in the name. This works fine. However, a search for 'club' SHOULD return any documents with 'club', 'bar' or 'night cabaret' in the name, but it does not. It only returns 'bar' and 'club'.

Interestingly, a search for 'night cabaret' gives me all 'night cabaret's, 'bar's and 'club's, which is quite unexpected since I'm using a uni-directional synonym config (using the => symbol).

Does your config give you my desired behavior?
Re: solr synonyms behaviour
matt connolly wrote:
>
> You won't have the multiple word problem if you use synonyms at index time
> instead of query time.
>
> swarag wrote:
>>
>> Here is a basic example of some synonyms in my synonyms.txt:
>> club=>club,bar,night cabaret
>> bar=>bar,club
>>
>> As you can see, a search for 'bar' will return any documents with 'bar'
>> or 'club' in the name. This works fine. However, a search for 'club'
>> SHOULD return any documents with 'club', 'bar' or 'night cabaret' in the
>> name, but it does not. It only returns 'bar' and 'club'.
>>
>> Interestingly, a search for 'night cabaret' gives me all 'night
>> cabaret's, 'bar's and 'club's...which is quite unexpected since I'm using
>> uni-directional synonym config (using the => symbol)
>>
>> Does your config give you my desired behavior?
>>
>

Is there something I am missing here? This is an excerpt from my schema.xml:

To my understanding, this means I am using synonyms at index time and NOT query time. And yet, I am still having these problems with synonyms.
Re: solr synonyms behaviour
Yonik Seeley wrote:
>
> On Tue, Jul 15, 2008 at 2:27 PM, swarag <[EMAIL PROTECTED]> wrote:
>> To my understanding, this means I am using synonyms at index time and NOT
>> query time. And yet, I am still having these problems with synonyms.
>
> Can you give a specific example? Use debugQuery=true to see what the
> resulting query is.
> You can also use the admin analysis page to see the output of the
> index and query analyzers.
>
> -Yonik
>

So it sounds like using the '=>' operator for synonyms that may or may not contain multiple words causes problems. So I changed my synonyms.txt to the following:

club,bar,night cabaret

In schema.xml, I now have the following:

As you can see, 'night cabaret' is my only multi-word synonym term. Searches for 'bar' and 'club' now behave as expected. However, if I search for JUST 'night' or JUST 'cabaret', it looks like it is still using the synonyms 'bar' and 'club', which is not what is desired. I only want 'bar' and 'club' to be returned if a search for the complete 'night cabaret' is submitted.

Since query-time synonyms are turned "off", the resulting parsedquery_toString is simply "name:night", "name:cabaret", etc...

Thanks!
Re: solr synonyms behaviour
swarag wrote:
>
> Yonik Seeley wrote:
>>
>> On Tue, Jul 15, 2008 at 2:27 PM, swarag <[EMAIL PROTECTED]> wrote:
>>> To my understanding, this means I am using synonyms at index time and
>>> NOT query time. And yet, I am still having these problems with synonyms.
>>
>> Can you give a specific example? Use debugQuery=true to see what the
>> resulting query is.
>> You can also use the admin analysis page to see the output of the
>> index and query analyzers.
>>
>> -Yonik
>>
>
> So it sounds like using the '=>' operator for synonyms that may or may not
> contain multiple words causes problems. So I changed my synonyms.txt to
> the following:
>
> club,bar,night cabaret
>
> In schema.xml, I now have the following:
> positionIncrementGap="100">
>
>
> ignoreCase="true" expand="true"/>
> words="stopwords.txt" enablePositionIncrements="true"/>
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>
> protected="protwords.txt"/>
>
>
>
>
> words="stopwords.txt"/>
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>
> protected="protwords.txt"/>
>
>
>
>
> As you can see, 'night cabaret' is my only multi-word synonym term.
> Searches for 'bar' and 'club' now behave as expected. However, if I
> search for JUST 'night' or JUST 'cabaret', it looks like it is still using
> the synonyms 'bar' and 'club', which is not what is desired. I only want
> 'bar' and 'club' to be returned if a search for the complete 'night
> cabaret' is submitted.
>
> Since query-time synonyms are turned "off", the resulting
> parsedquery_toString is simply "name:night", "name:cabaret", etc...
>
> Thanks!
>

We are still having problems. Searches for single words that are part of a multi-word synonym seem to be affected by the synonyms, when they should not be. Has anyone else experienced this? If not, would you mind explaining your config and the format of your synonyms.txt file?
RE: solr synonyms behaviour
Hi Laurent

Laurent Gilles wrote:
>
> Hi,
>
> I was faced with the same issues regarding multi-word synonyms.
> Let's say a synonyms list like:
>
> club, bar, night cabaret
>
> Now if we have a document containing "club", with the default synonyms
> filter behaviour with expand=true, we will end up in the lucene index with
> a document containing "club|bar|night cabaret".
> So if the user searches for "night", the query will search for "night" in
> the index and will match our document since it had been "enriched" @
> index-time, and it really contains the token "night".
>
> The only valid solution I've found was to create a field-type exclusively
> used for synonyms search where:
>
> @IndexTime
> ignoreCase="true" expand="false" />
> @QueryTime
> ignoreCase="true" expand="false" />
>
> And with a customised synonyms file that looks like:
>
> SYN_ID_1, club, bar, night cabaret
>
> So for our document containing "club", the synonym filter at index time
> with expand=false will replace every matching token/expression in the
> document with the SYN_ID_1.
>
> And at query time, when a user searches for "night", since "night" is not
> alone in the synonyms definition, it will not be matched, even by "normal"
> search, because every document containing "club" or "bar" would have been
> "enriched" with "SYN_ID_1" and NOT with "club|bar|night cabaret", so the
> final indexed document will not contain isolated tokens from a synonym
> expression that risk being matched later without notice.
>
> In order to match our document containing "club", the user HAS TO type
> the entire expression "night cabaret", and not only part of the expression.
>
> Of course, as I said before, this field was exclusively used for synonym
> matching, so it requires another field for normal full-text-stemmed search
> to add normal results. This approach gives us the opportunity to set up
> boosting separately on full-text-stemmed search VS synonyms search, let's
> say:
>
> "title_stem":"club"^100 OR "title_syns":"club"^10
>
> I hope to have been clear, even if I don't believe so. Fact is this
> approach has fixed our problem, since we didn't want synonym matching if
> the user only types part of a synonymic expression.
>
> Regards,
> Laurent
>

This seems to have solved our problem. Thank you very much for your help. Once we have our environment set up and all of our data indexed, it may even provide an extra 'bonus' to be able to add different weights/boosts for the different fields.

Now, not to be too greedy, but I am wondering if there is a way to use this technique for "Explicit synonym matching" (i.e. synonym mappings that use the '=>' operator). For example, we may have a couple of mappings like the following:

night club=>club, bar
swim club=>club, team

As you can see, both night clubs and swim clubs are clubs, but are not necessarily equivalent to the term "club". It would be nice to be able to search for "night club" and only see results for "clubs" and "bars", but not necessarily "teams", which would otherwise show up in the results if we use Equivalent synonyms.

Just wondering if you have been able to do this as well. Again, thank you for your help!
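The difference between the two index-time strategies discussed in this message can be sketched with a small simulation. This is a toy model under stated assumptions, not Solr's SynonymFilter: `index_expanded`, `rewrite` and `contains_phrase` are hypothetical helper names, and the document text "jazz club" is made up for the illustration. Only the synonym group and the `SYN_ID_1` placeholder come from the thread.

```python
# Toy model only: not Solr's SynonymFilter, just the idea behind the two setups.
GROUP = ["club", "bar", "night cabaret"]   # synonym group from the thread
SYN_ID = "SYN_ID_1"                        # placeholder token from Laurent's example

def contains_phrase(tokens, phrase):
    """True if the words of `phrase` occur contiguously in `tokens`."""
    words = phrase.split()
    n = len(words)
    return any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1))

def index_expanded(text):
    """expand=true at index time: a hit adds every word of every group entry."""
    tokens = text.lower().split()
    terms = set(tokens)
    if any(contains_phrase(tokens, entry) for entry in GROUP):
        for entry in GROUP:
            terms.update(entry.split())
    return terms

def rewrite(text):
    """expand=false with the ID listed first: each group entry collapses to SYN_ID."""
    tokens = text.lower().split()
    entries = sorted((e.split() for e in GROUP), key=len, reverse=True)
    out, i = [], 0
    while i < len(tokens):
        for words in entries:
            if tokens[i:i + len(words)] == words:
                out.append(SYN_ID)
                i += len(words)
                break
        else:
            # Not part of any group entry: keep the token as typed.
            out.append(tokens[i])
            i += 1
    return out

doc = "jazz club"  # hypothetical document text

# Expanded index: the lone word "night" is now a term of the document.
print("night" in index_expanded(doc))

# ID-based field: the same analysis runs at index and query time.
print(set(rewrite("night")) & set(rewrite(doc)))
print(set(rewrite("night cabaret")) & set(rewrite(doc)))
```

With expansion, the document picks up "night" and "cabaret" as separate terms, so a search for either word alone hits it. With the ID-based field, only a query that contains a complete group entry is rewritten to the shared ID, so partial matches disappear.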
Quick shards question
I'm currently looking through the source, but just wanted to verify how shards work. If a request is made to:

http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr

Does the 8983 instance of Solr make an HTTP request to both 7574 AND 8983 to search? Or does it know that it is itself 8983, and only make the HTTP request to 7574 while running the search on itself locally?

The source code seems to tell me that it does the former (makes an HTTP request to itself), but I just wanted to make sure.
Lower Case Filter Factory
Hi,

I am using the basic text field in schema.xml. Here is an excerpt.

and the fieldType text is as follows:

When I query:

http://localhost:8983/solr/select?q=p*

I get results back, but when I query:

http://localhost:8983/solr/select?q=P*

I get no results. Is there anything I'm doing wrong?

Thanks,
Swarag
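One possible explanation, offered as an assumption and not something confirmed in this thread: if the field lowercases terms at index time but a wildcard or prefix term in the query is not passed through the same analysis, then `P*` is compared as typed against lowercased index terms and finds nothing. The sketch below is a toy model of that idea; `index_tokens`, `prefix_search` and the sample text are hypothetical and are not Solr APIs or data.

```python
# Toy model only: illustrates the assumed behaviour, not Solr's query parser.

def index_tokens(text):
    """Index-time analysis with a lowercase filter (simplified)."""
    return [t.lower() for t in text.split()]

def prefix_search(indexed, raw_query, lowercase_prefix=False):
    """Match a trailing-* query; the prefix is used as typed unless lowercase_prefix is set."""
    prefix = raw_query.rstrip("*")
    if lowercase_prefix:
        prefix = prefix.lower()
    return [t for t in indexed if t.startswith(prefix)]

terms = index_tokens("Pizza Place")  # hypothetical field value

print(prefix_search(terms, "p*"))
print(prefix_search(terms, "P*"))
print(prefix_search(terms, "P*", lowercase_prefix=True))
```

If this is indeed what is happening, lowercasing the user's input in the client before sending a wildcard query would be one workaround worth trying.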
RAM Based Index for Solr
In Lucene there is a RAM-based index, "org.apache.lucene.store.RAMDirectory". Is there a way to set up my index in Solr to use a RAMDirectory?
Master Slave Replication
Can we use index replication when we have segmented indexes spread over multiple Solr instances?
Number of docs per segments
I have about 15 million documents totalling 27 GB. I would like to know how many segments I would need to split them into. I am trying to achieve 100 qps.
Re: restrictions on distributed search
So does this mean that QueryElevation (boost values) wouldn't work on a distributed search?

Koji Sekiguchi-2 wrote:
>
> Thank you, Yonik!
>
> Koji
>> - doesn't currently support date faceting
>> - currently only supports sorted field facets
>>
>
Re: Number of docs per segments
I have 8 GB of RAM and 4 CPUs at 2.4 GHz each. Is this the information you were looking for?

Otis Gospodnetic wrote:
>
> You'll need to provide more information about your environment and index
> if you want guesstimates.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> - Original Message
> From: swarag <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Monday, April 7, 2008 2:31:55 PM
> Subject: Number of docs per segments
>
> I have about 15 millions documents totalling 27GB I would like to know
> how many segments I would need split them into. I am trying to achieve a
> qps of 100?
>
Distributed Search
Hi,

I am trying to search through a distributed index, using this link:

http://wil1devsch1.cs.tmcs:8983/select?shards=wil1devsch1.cs.tmcs:8983,wil1devsch1.cs.tmcs:8080&q=pizza

But it always gives me results from the index stored on 8983 and never from the one on 8080. Is there anything wrong in what I am doing?
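When a distributed request misbehaves, it can help to build the URL programmatically so that every shard entry has the same shape. The sketch below assumes the `host:port/path` form that appears in the "Quick shards question" message of this archive (`localhost:8983/solr`); whether the missing path is the cause of the problem described here is an assumption, not a diagnosis. `distributed_url` is a hypothetical helper, not part of Solr or any client library.

```python
from urllib.parse import urlencode

def distributed_url(base, shards, query):
    """Build a select URL with a comma-separated shards parameter.

    base   -- URL of the instance that receives the request
    shards -- list of shard addresses in host:port/path form (no scheme)
    query  -- the q parameter
    """
    params = {"shards": ",".join(shards), "q": query}
    # Keep ':', '/' and ',' readable inside the shards value.
    return f"{base}/select?{urlencode(params, safe=':/,')}"

url = distributed_url(
    "http://localhost:8983/solr",
    ["localhost:8983/solr", "localhost:7574/solr"],
    "pizza",
)
print(url)
```

Comparing the generated URL with the one typed by hand makes differences in host, port or path easy to spot.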
Distributed Search Caching
Hey,

I have a distributed search environment with one server hitting 3 shards. For example:

http://server1.cs.tmcs:15100/solr/search/?q=starbucks&shards=server1.cs.tmcs:8983/solr,server2.cs.tmcs:8983/solr,server3.cs.tmcs:8983/solr&collapse.field=locChainId

So, where is the cache stored? Is it distributed across the 3 servers, or is it on server1.cs.tmcs:15100?