Doing spatial search on multiple location points
Hi, I am trying to find out if Solr supports doing a spatial search on multiple location points. Basically, while querying Solr, I will be giving multiple lat-long points and Solr should return documents that are close to any of the given points. If this is not possible, is there any way to make it work without hitting Solr once for each lat-long point and then collating the results? Thanks in advance. -- Varun Gupta
Re: Doing spatial search on multiple location points
Hi David, Thanks for the quick reply. As I haven't migrated to 4.7 (I am still using 4.6), I tested using an OR clause over multiple geofilt-based query clauses and it seems to be working great. But I have one more question: How do I boost the score of the matching documents based on geodist? How will I get the geodist based on the closest matching lat-long point? Thanks in advance. -- Varun Gupta On Mon, Mar 17, 2014 at 7:27 PM, Smiley, David W. wrote: > Absolutely. The most straight-forward approach is to use the default > query parser comprised of OR clauses of geofilt query parser based > clauses. Another way to do it in Solr 4.7 that is probably faster is to > use WKT with the custom "buffer" extension: > myLocationRptField:"BUFFER(MULTIPOINT(x y, x y, x y, x y), d) > distErrPct=0" (where 'd' is a distance in degrees, not km). > > ~ David > > On 3/17/14, 9:28 AM, "Varun Gupta" wrote: > > >Hi, > > > >I am trying to find out if solr supports doing a spatial search on > >multiple > >location points. Basically, while querying solr, I will be giving multiple > >lat-long points and solr will be returning documents which are closer to > >any of the given points. > > > >If this is not possible, is there any way to make it work without hitting > >solr for each of the lat-long and then collating results. > > > >Thanks in advance. > > > >-- > >Varun Gupta > >
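A sketch of building such a request, untested against a live server. The field name `location`, the radius `d=10`, and the lat-long values are made-up placeholders; the nested `min()` over `geodist()` calls for "distance to the closest point" is an assumption about how to combine per-point distances, not something confirmed in the thread.

```python
from functools import reduce
from urllib.parse import urlencode

# Hypothetical points -- substitute your own lat-long pairs.
points = [("28.61", "77.20"), ("19.07", "72.87"), ("12.97", "77.59")]

# One geofilt clause per point, OR-ed together, as in the approach above.
clauses = " OR ".join(
    '_query_:"{!geofilt sfield=location pt=%s,%s d=10}"' % (lat, lon)
    for lat, lon in points
)

# To rank by distance to the *closest* point, one option is to sort on the
# minimum of the per-point distances (nested min(), since geodist() takes
# a single point).
dists = ["geodist(location,%s,%s)" % (lat, lon) for lat, lon in points]
sort_expr = reduce(lambda a, b: "min(%s,%s)" % (a, b), dists) + " asc"

params = urlencode({"q": clauses, "sort": sort_expr})
print(params)
```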
Using ExternalFileField on SolrCloud
Hi, I am trying to use ExternalFileField on Solr 4.6 running on SolrCloud, for the purpose of changing the document score based on a frequently changing field. According to the documentation, the external file needs to be present in the "data" folder of the collection. I am confused about where I should upload the external file in ZooKeeper so that it ends up in the "data" folder; I can see "/configs/" and "/collections/" in my ZooKeeper instance. Am I right in trying to propagate the external file using ZooKeeper, or should I be looking into some other way to sync the file to all Solr instances? -- Thanks Varun Gupta
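For reference, a sketch of the file ExternalFileField reads. It lives in each core's data/ directory (ZooKeeper's /configs/ tree holds configuration, not data-directory contents, so as far as I understand the file has to be copied to every node by some other means). The field name "popularity" and the document ids below are made up.

```python
# Each line of the external file maps a document key to a float value; the
# file is named external_<fieldName> and placed in the core's data/ dir.
scores = {"doc3": 0.77, "doc1": 0.95, "doc2": 0.31}

# Sorting the keys is recommended so Solr can load large files faster.
content = "".join("%s=%s\n" % (k, scores[k]) for k in sorted(scores))

with open("external_popularity", "w") as f:
    f.write(content)
print(content)
```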
Getting min and max of a solr field for each group while doing field collapsing/result grouping
Hi, I am using SolrCloud for getting results grouped by a particular field. Now, I also want to get the min and max value of a particular field for each group. For example, if I am grouping results by city, then I also want to get the minimum and maximum price for each city. Is this possible to do with Solr? Thanks in advance! -- Varun Gupta
Re: Getting min and max of a solr field for each group while doing field collapsing/result grouping
Hi Ahmet, Thanks for the information! But as per Solr documentation, group.truncate is not supported in distributed searches and I am looking for a solution that can work on SolrCloud. -- Varun Gupta On Thu, May 1, 2014 at 4:12 PM, Ahmet Arslan wrote: > Hi Varun, > > I think you can use group.truncate=true with stats component > http://wiki.apache.org/solr/StatsComponent > > > If true, facet counts are based on the most relevant document of each > group matching the query. Same applies for StatsComponent. Default is > false. Solr3.4 Supported from Solr 3.4 and up. > > On Thursday, May 1, 2014 12:30 PM, Varun Gupta > wrote: > > Hi, > > I am using SolrCloud for getting results grouped by a particular field. > Now, I also want to get min and max value for a particular field for each > group. For example, if I am grouping results by city, then I also want to > get the minimum and maximum price for each city. > > Is this possible to do with Solr. > > Thanks in Advance! > > -- > Varun Gupta >
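Since group.truncate does not help in distributed mode here, one alternative worth noting: later Solr releases (well after the 4.x discussed in this thread) added the JSON Facet API, which can compute min/max aggregations per terms bucket and does work distributed. A sketch of such a request body, using the city/price fields from the example:

```python
import json

# Hypothetical JSON Facet API request: min and max price per city bucket.
# (This API does not exist in Solr 4.x; it arrived in later releases.)
facet_request = {
    "query": "*:*",
    "facet": {
        "cities": {
            "type": "terms",
            "field": "city",
            "facet": {
                "min_price": "min(price)",
                "max_price": "max(price)",
            },
        }
    },
}
body = json.dumps(facet_request)
print(body)
```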
Add a new replica to SolrCloud
Hi, I am currently using Solr 4.7.2 and have a SolrCloud setup running on 2 servers with the number of shards as 2, replication factor as 2 and max shards per node as 4. Now, I want to add another server to the SolrCloud cluster as a replica. I can see a Collection API call to add a new replica, but that was added in Solr 4.8. Is there some way to add a new replica in Solr 4.7.2? -- Thanks Varun Gupta
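One approach used before ADDREPLICA existed: create a core on the new node via the CoreAdmin API, naming the target collection and shard, and it joins the cluster as a replica of that shard. A sketch of the request URL; the host, core name and collection name are made up.

```python
from urllib.parse import urlencode

# Hypothetical CoreAdmin CREATE call issued against the new node.
params = {
    "action": "CREATE",
    "name": "mycollection_shard1_replica3",
    "collection": "mycollection",
    "shard": "shard1",
}
url = "http://newnode:8983/solr/admin/cores?" + urlencode(params)
print(url)
```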
How do I this in Solr?
Hi, I have a lot of small documents (each containing 1 to 15 words) indexed in Solr. For the search query, I want the search results to contain only those documents that satisfy this criterion: "All of the words of the search result document are present in the search query". For example: If I have the following documents indexed: "nokia n95", "GPS", "android", "samsung", "samsung andriod", "nokia andriod", "mobile with GPS" If I search with the text "samsung andriod GPS", the search results should only contain "samsung", "GPS", "andriod" and "samsung andriod". Is there a way to do this in Solr? -- Thanks Varun Gupta
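The requirement is the reverse of a normal search: a document qualifies when every one of its words occurs in the query. As a plain set-inclusion check (a client-side sketch, not the in-Solr solutions the replies below discuss), using the documents from the example:

```python
docs = ["nokia n95", "GPS", "android", "samsung",
        "samsung andriod", "nokia andriod", "mobile with GPS"]

def matches(doc, query):
    # All of the document's words must be present in the query.
    return set(doc.lower().split()) <= set(query.lower().split())

query = "samsung andriod GPS"
hits = [d for d in docs if matches(d, query)]
print(hits)
```

Note that the correctly spelled "android" does not match, because the query contains the misspelled "andriod" -- a wrinkle the thread itself jokes about later.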
Re: How do I this in Solr?
Thanks everybody for the inputs. Looks like Steven's solution is the closest one but will lead to performance issues when the query string has many terms. I will try to implement the two filters suggested by Steven and see how the performance matches up. -- Thanks Varun Gupta On Wed, Oct 27, 2010 at 8:04 AM, scott chu (朱炎詹) wrote: > I think you have to write a "yet exact match" handler yourself (I mean yet > cause it's not quite exact match we normally know). Steve's answer is quite > near your request. You can do further work based on his solution. > > At the last step, I'll suggest you eat up all blanks within the query string and > query results, respectively, & only return those results that have the same > string length as the query string's. > > For example, giving: > *query string = "Samsung with GPS" > *query results: > result 1 = "Samsung has lots of mobile with GPS" > result 2 = "with GPS Samsung" > result 3 = "GPS mobile with vendors, such as Sony, Samsung" > > they become: > *query result = "SamsungwithGPS" (length =14) > *query results: > result 1 = "SamsunghaslotsofmobilewithGPS" (length =29) > result 2 = "withGPSSamsung" (length =14) > result 3 = "GPSmobilewithvendors,suchasSony,Samsung" (length =39) > > so result 2 matches your request. > > In this way, you can avoid case-sensitivity and word-order-rearrangement work. Furthermore, you can do refined work, such as removing white > characters, etc. > > Scott @ Taiwan > > > - Original Message - From: "Varun Gupta" > > To: > Sent: Tuesday, October 26, 2010 9:07 PM > > Subject: How do I this in Solr? > > > Hi, >> >> I have lot of small documents (each containing 1 to 15 words) indexed in >> Solr. 
For the search query, I want the search results to contain only >> those >> documents that satisfy this criteria "All of the words of the search >> result >> document are present in the search query" >> >> For example: >> If I have the following documents indexed: "nokia n95", "GPS", "android", >> "samsung", "samsung andriod", "nokia andriod", "mobile with GPS" >> >> If I search with the text "samsung andriod GPS", search results should >> only >> conain "samsung", "GPS", "andriod" and "samsung andriod". >> >> Is there a way to do this in Solr. >> >> -- >> Thanks >> Varun Gupta >> >>
Re: How do I this in Solr?
Toke, the search query will contain 4-5 words on an average (excluding the stopwords). Mike, I don't care about the result count. Excluding the terms at the client side may be a good idea. Is there any way to alter scoring such that the docs containing only the searched-for terms are shown first? Can I use term frequency to do such kind of thing? -- Thanks Varun Gupta On Wed, Oct 27, 2010 at 7:13 PM, Mike Sokolov wrote: > Yes I missed that requirement (as Steven also pointed out in a private > e-mail). I now agree that the combinatorics are required. > > Another possibility to consider (if the queries are large, which actually > seems unlikely) is to use the default behavior where all terms are optional, > sort by relevance, and truncate the result list on the client side after > some unwanted term is found. I *think* the scoring should find only docs > with the searched-for terms first, although if there are a lot of repeated > terms maybe not? Also result counts will be screwy. > > -Mike > > > On 10/27/2010 09:34 AM, Toke Eskildsen wrote: > >> That does not work either as it requires that all the terms in the query >> are present in the document. The original poster did not state this >> requirement. On the contrary, his examples were mostly single-word >> matches, implying an OR-search at the core. >> >> The query-explosion still seems like the only working idea. Maybe Varun >> could comment on the maximum numbers of terms that his queries will >> contain? >> >> Regards, >> Toke Eskildsen >> >> On Wed, 2010-10-27 at 15:02 +0200, Mike Sokolov wrote: >> >> >>> Right - my point was to combine this with the previous approaches to >>> form a query like: >>> >>> samsung AND android AND GPS AND word_count:3 >>> >>> in order to exclude documents containing additional words. This would >>> avoid the combinatoric explosion problem otehrs had alluded to earlier. 
>>> Of course this would fail because android is "mis-" spelled :) >>> >>> -Mike >>> >>> On 10/27/2010 08:45 AM, Steven A Rowe wrote: >>> >>> >>>> I'm pretty sure the word-count strategy won't work. >>>> >>>> >>>> >>>> >>>>> If I search with the text "samsung andriod GPS", search results >>>>> should only conain "samsung", "GPS", "andriod" and "samsung andriod". >>>>> >>>>> >>>>> >>>> Using the word-count strategy, a document containing "samsung andriod >>>> PDQ" would be a hit, but Varun doesn't want it, because it contains a word >>>> that is not in the query. >>>> >>>> Steve >>>> >>>> >>>> >>>> >>>>> -Original Message- >>>>> From: Michael Sokolov [mailto:soko...@ifactory.com] >>>>> Sent: Wednesday, October 27, 2010 7:44 AM >>>>> To: solr-user@lucene.apache.org >>>>> Subject: RE: How do I this in Solr? >>>>> >>>>> You might try adding a field containing the word count and making sure >>>>> that >>>>> matches the query's word count? >>>>> >>>>> This would require you to tokenize the query and document yourself, >>>>> perhaps. >>>>> >>>>> -Mike >>>>> >>>>> >>>>> >>>>> >>>>>> -Original Message- >>>>>> From: Varun Gupta [mailto:varun.vgu...@gmail.com] >>>>>> Sent: Tuesday, October 26, 2010 11:26 PM >>>>>> To: solr-user@lucene.apache.org >>>>>> Subject: Re: How do I this in Solr? >>>>>> >>>>>> Thanks everybody for the inputs. >>>>>> >>>>>> Looks like Steven's solution is the closest one but will lead >>>>>> to performance issues when the query string has many terms. >>>>>> >>>>>> I will try to implement the two filters suggested by Steven >>>>>> and see how the performance matches up. >>>>>> >>>>>> -- >>>>>> Thanks >>>>>> Varun Gupta >>>>>> >>>>>> >>>>>> On Wed, Oct 27, 2010 at 8:04 AM, scott chu (???) >>>>>> wrote: >>>>>> >>>>>> >>>>>> >>>>&g
Re: How do I this in Solr?
I haven't been able to work on it because of some other commitments. The MemoryIndex approach seems promising. Only thing I will have to check is the memory requirement as I have close to 2 million documents. Will let you know if I can make it work. Thanks a lot! -- Varun Gupta On Sat, Nov 6, 2010 at 3:48 AM, Steven A Rowe wrote: > Hi Varun, > > On 10/26/2010 at 11:26 PM, Varun Gupta wrote: > > I will try to implement the two filters suggested by Steven and see how > > the performance matches up. > > Have you made any progress? > > I was thinking about your use case, and it occurred to me that you could > get what you want by reversing the problem, using Lucene's MemoryIndex < > http://lucene.apache.org/java/3_0_2/api/contrib-memory/org/apache/lucene/index/memory/MemoryIndex.html>. > (As far as I can tell, this functionality -- i.e. standing queries a.k.a. > routing a.k.a. filtering -- is not present in Solr.) > > You can load your query (as a document) into a MemoryIndex, and then use > each of your documents to query against it, something like (untested!): > >Map<String, Query> documents = new HashMap<String, Query>(); >Analyzer analyzer = new WhitespaceAnalyzer(); >QueryParser parser = new QueryParser("content", analyzer); >parser.setDefaultOperator(QueryParser.Operator.AND); >documents.put("ID001", parser.parse("nokia n95")); >documents.put("ID002", parser.parse("GPS")); >documents.put("ID003", parser.parse("android")); >documents.put("ID004", parser.parse("samsung")); > documents.put("ID005", parser.parse("samsung android")); > documents.put("ID006", parser.parse("nokia android")); > documents.put("ID007", parser.parse("mobile with GPS")); > >MemoryIndex index = new MemoryIndex(); >index.addField("content", "samsung with GPS", analyzer); > >for (Map.Entry<String, Query> entry : documents.entrySet()) { > Query query = entry.getValue(); > if (index.search(query) > 0.0f) { >String docId = entry.getKey(); >// Do something with the hits here ... 
> } >} > > In the above example, the documents "samsung", "GPS", "android" and > "samsung android" would be hits, and the other documents would not be, just > as you wanted. > > MemoryIndex is designed to be very fast for this kind of usage, so even > 100's of thousands of documents should be feasible. > > Steve > > > -Original Message- > > From: Varun Gupta [mailto:varun.vgu...@gmail.com] > > Sent: Tuesday, October 26, 2010 11:26 PM > > To: solr-user@lucene.apache.org > > Subject: Re: How do I this in Solr? > > > > Thanks everybody for the inputs. > > > > Looks like Steven's solution is the closest one but will lead to > > performance > > issues when the query string has many terms. > > > > I will try to implement the two filters suggested by Steven and see how > > the > > performance matches up. > > > > -- > > Thanks > > Varun Gupta > > > > > > On Wed, Oct 27, 2010 at 8:04 AM, scott chu (朱炎詹) > > wrote: > > > > > I think you have to write a "yet exact match" handler yourself (I mean > > yet > > > cause it's not quite exact match we normally know). Steve's answer is > > quite > > > near your request. You can do further work based on his solution. > > > > > > At the last step, I'll suggest you eat up all blank within query string > > and > > > query result, respevtively & only returns those results that has equal > > > string length as the query string's. > > > > > > For example, giving: > > > *query string = "Samsung with GPS" > > > *query results: > > > resutl 1 = "Samsung has lots of mobile with GPS" > > > result 2 = "with GPS Samsng" > > > result 3 = "GPS mobile with vendors, such as Sony, Samsung" > > > > > > they become: > > > *query result = "SamsungwithGPS" (length =14) > > > *query results: > > > resutl 1 = "SamsunghaslotsofmobilewithGPS" (length =29) > > > result 2 = "withGPSSamsng" (length =14) > > > result 3 = "GPSmobilewithvendors,suchasSony,Samsung" (length =43) > > > > > > so result 2 matches your request. > > > > > > In this way, you can avoi
Showing few results for each category (facet)
Hi, I am looking for a way to do the following in Solr: When somebody does a search, I want to show results by category (facet), displaying 5 results from each category (along with the total number of results in each category, which I can always get using facet search). This is kind of an overview of all the search results, and the user can click on a category to see all the results pertaining to that category (a normal facet search with a filter). One way that I can think of doing this is by making as many queries as there are categories and showing the results under each category, but this would be very inefficient. Is there any way I can do this? Thanks & Regards, Varun Gupta
Re: Showing few results for each category (facet)
Thanks Matt!! I will take a look at the patch for field collapsing. Thanks Marian for pointing that out. If the field collapse does not work then I will have to rely on Solr caching. Thanks, Varun Gupta On Wed, Sep 30, 2009 at 1:44 AM, Matt Weber wrote: > So, you want to display 5 results from each category and still know how > many results are in each category. This is a perfect situation for the > field collapsing patch: > > https://issues.apache.org/jira/browse/SOLR-236 > http://wiki.apache.org/solr/FieldCollapsing > > Here is how I would do it. > > Add a field to your schema called category or whatever. Then while > indexing you populate that field with whatever category the document belongs > in. While executing a search, collapse the results on that field with a max > collapse of 5. This will give you at most 5 results per category. Now, at > the same time enable faceting on that field and DO NOT use the collapsing > parameter to recount the facet values. This means that the facet counts will > reflect the non-collapsed results. This facet should only be used to get > the count for each category, not displayed to the user. On your search > results page that gets the collapsed results, you can put a link that says > "Show all X results from this category" where X is the value you pull out of > the facet. When a user clicks that link you basically do the same search > with field collapsing disabled, and a filter query on the specific category > they want to see, for example: &fq=category:people. > > Hope this helps. > > Thanks, > > Matt Weber > > > On Sep 29, 2009, at 4:55 AM, Marian Steinbach wrote: > > On Tue, Sep 29, 2009 at 11:36 AM, Varun Gupta >> wrote: >> >>> ... >>> >>> One way that I can think of doing this is by making as many queries as >>> there >>> are categories and show these results under each category. But this will >>> be >>> very inefficient. Is there any way I can do this ? >>> >> >> >> Hi Varun! 
>> >> I think that doing multiple queries doesn't have to be inefficient, >> since Solr caches subsequent queries for the same term and facets. >> >> Imagine this as your first query: >> - q: xyz >> - facets: myfacet >> >> and this as a second query: >> - q: xyz >> - fq: myfacet=a >> >> Compared to the first query, the second query will be very fast, since >> all the hard work has been done in query one and then cached. >> >> At least that's my understanding. Please correct me if I'm wrong. >> >> Marian >> > >
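In later Solr releases, the SOLR-236 patch discussed above became built-in result grouping, so Matt's recipe -- 5 results per category plus ungrouped facet counts -- can be expressed with standard parameters. A sketch (the query term and field name `category` are placeholders):

```python
from urllib.parse import urlencode

# Hypothetical request: collapse to 5 docs per category while faceting on
# the same field to get the full per-category counts.
params = urlencode({
    "q": "xyz",
    "group": "true",
    "group.field": "category",
    "group.limit": "5",
    "facet": "true",
    "facet.field": "category",
})
print(params)
```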
SpellCheck Index not building
Hi, I am using Solr 1.3 for spell checking. I am facing a strange problem of the spell checking index not being generated. When I have fewer documents (less than 1000) indexed, the spell check index builds, but when there are more documents (around 40K), the index for spell checking does not build. I can see that the directory for the spell check index was created and there are two files under it: segments_3 & segments.gen. I am using the following query to build the spell checking index: /select params={spellcheck=true&start=0&qt=contentsearch&wt=xml&rows=0&spellcheck.build=true&version=2.2} In the logs I see: INFO: [] webapp=/solr path=/select params={spellcheck=true&start=0&qt=contentsearch&wt=xml&rows=0&spellcheck.build=true&version=2.2} hits=37467 status=0 QTime=44 Please help me solve this problem. Here is my configuration: *schema.xml:* *solrconfig.xml:* <requestHandler name="contentsearch" class="solr.SearchHandler"> <lst name="defaults"> <str name="defType">dismax</str> <str name="spellcheck.onlyMorePopular">false</str> <str name="spellcheck.extendedResults">false</str> <str name="spellcheck.count">5</str> <str name="spellcheck.collate">true</str> <str name="spellcheck.dictionary">jarowinkler</str> </lst> <arr name="last-components"> <str>spellcheck</str> </arr> </requestHandler> <searchComponent name="spellcheck" class="solr.SpellCheckComponent"> <str name="queryAnalyzerFieldType">textSpell</str> <lst name="spellchecker"> <str name="name">a_spell</str> <str name="field">a_spell</str> <str name="spellcheckIndexDir">./spellchecker_a_spell</str> <str name="accuracy">0.7</str> </lst> <lst name="spellchecker"> <str name="name">jarowinkler</str> <str name="field">a_spell</str> <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str> <str name="spellcheckIndexDir">./spellchecker_a_spell</str> <str name="accuracy">0.7</str> </lst> </searchComponent> -- Thanks Varun Gupta
Re: SpellCheck Index not building
No, there are no exceptions in the logs. -- Thanks Varun Gupta On Tue, Oct 13, 2009 at 8:46 AM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: > On Tue, Oct 13, 2009 at 8:36 AM, Varun Gupta > wrote: > > > Hi, > > > > I am using Solr 1.3 for spell checking. I am facing a strange problem of > > spell checking index not been generated. When I have less number of > > documents (less than 1000) indexed then the spell check index builds, but > > when the documents are more (around 40K), then the index for spell > checking > > does not build. I can see the directory for spell checking build and > there > > are two files under it: segments_3 & segments.gen > > > > > It seems that you might be running out of memory with a larger index. Can > you check the logs to see if it has any exceptions recorded? > > -- > Regards, > Shalin Shekhar Mangar. >
Results after using Field Collapsing are not matching the results without using Field Collapsing
Hi, I have documents under 6 different categories. While searching, I want to show 3 documents from each category along with a link to see all the documents under a single category. I decided to use field collapsing so that I don't have to make 6 queries (one for each category). Currently I am using the field collapsing patch uploaded on 29th Nov. Now, the results that come back when using field collapsing do not match the results for a single category. For example, for category C1, I am getting results R1, R2 and R3 using field collapsing, but when I look at results from only category C1 (without field collapsing), these results are nowhere in the first 10 results. Am I doing something wrong, or am I using field collapsing for the wrong feature? I am using the following field collapsing parameters while querying: collapse.field=category collapse.facet=before collapse.threshold=3 -- Thanks Varun Gupta
Re: Results after using Field Collapsing are not matching the results without using Field Collapsing
Hi Martijn, I am not sending the collapse parameters for the second query. Here are the queries I am using: *When using field collapsing (searching over all categories):* spellcheck=true&collapse.info.doc=true&facet=true&collapse.threshold=3&facet.mincount=1&spellcheck.q=weight+loss&collapse.facet=before&wt=xml&f.content.hl.snippets=2&hl=true&version=2.2&rows=20&collapse.field=ctype&fl=id,sid,title,image,ctype,score&start=0&q=weight+loss&collapse.info.count=false&facet.field=ctype&qt=contentsearch categories is represented as the field "ctype" above. *Without using field collapsing:* spellcheck=true&facet=true&facet.mincount=1&spellcheck.q=weight+loss&wt=xml&hl=true&rows=10&version=2.2&fl=id,sid,title,image,ctype,score&start=0&q=weight+loss&facet.field=ctype&qt=contentsearch I append "&fq=ctype:1" to the above queries when trying to get results for a particular category. -- Thanks Varun Gupta On Thu, Dec 10, 2009 at 5:58 PM, Martijn v Groningen < martijn.is.h...@gmail.com> wrote: > Hi Varun, > > Can you send the whole requests (with params), that you send to Solr > for both queries? > In your situation the collapse parameters only have to be used for the > first query and not the second query. > > Martijn > > 2009/12/10 Varun Gupta : > > Hi, > > > > I have documents under 6 different categories. While searching, I want to > > show 3 documents from each category along with a link to see all the > > documents under a single category. I decided to use field collapsing so > that > > I don't have to make 6 queries (one for each category). Currently I am > using > > the field collapsing patch uploaded on 29th Nov. > > > > Now, the results that are coming after using field collapsing are not > > matching the results for a single category. For example, for category C1, > I > > am getting results R1, R2 and R3 using field collapsing, but after I see > > results only from the category C1 (without using field collapsing) these > > results are nowhere in the first 10 results. 
> > > > Am I doing something wrong or using the field collapsing for the wrong > > feature? > > > > I am using the following field collapsing parameters while querying: > > collapse.field=category > > collapse.facet=before > > collapse.threshold=3 > > > > -- > > Thanks > > Varun Gupta > > > > > > -- > Met vriendelijke groet, > > Martijn van Groningen >
Re: Results after using Field Collapsing are not matching the results without using Field Collapsing
Here is the field configuration of ctype: <field name="ctype" ... omitNorms="true" /> In solrconfig.xml, this is how I am enabling field collapsing: <searchComponent name="collapse" class="org.apache.solr.handler.component.CollapseComponent"/> Apart from this, I made no changes in solrconfig.xml for field collapse. I am currently not using the field collapse cache. I have applied the patch on the Solr 1.4 build. I am not using the latest Solr nightly build. Can that cause any problem? -- Thanks Varun Gupta On Fri, Dec 11, 2009 at 3:44 AM, Martijn v Groningen < martijn.is.h...@gmail.com> wrote: > I tried to reproduce a similar situation here, but I got the expected > and correct results. Those three documents that you saw in your first > search result should be the first in your second search result (unless > the index changes or the sort changes) when fq on that specific > category. I'm not sure what is causing this problem. Can you give me > some more information like the field type configuration for the ctype > field and how you have configured field collapsing? > > I did find another problem to do with field collapse caching. The > collapse.threshold or collapse.maxdocs parameters are not taken into > account when caching, which is of course wrong because they do matter > when collapsing. Based on the information you have given me this > caching problem is not the cause of the situation you have. I will > update the patch that fixes this problem shortly. > > Martijn > > 2009/12/10 Varun Gupta : > > Hi Martijn, > > > > I am not sending the collapse parameters for the second query. Here are > the > > queries I am using: > > > > *When using field collapsing (searching over all categories):* > > > spellcheck=true&collapse.info.doc=true&facet=true&collapse.threshold=3&facet.mincount=1&spellcheck.q=weight+loss&collapse.facet=before&wt=xml&f.content.hl.snippets=2&hl=true&version=2.2&rows=20&collapse.field=ctype&fl=id,sid,title,image,ctype,score&start=0&q=weight+loss&collapse.info.count=false&facet.field=ctype&qt=contentsearch > > > > categories is represented as the field "ctype" above. 
> > > > *Without using field collapsing:* > > > spellcheck=true&facet=true&facet.mincount=1&spellcheck.q=weight+loss&wt=xml&hl=true&rows=10&version=2.2&fl=id,sid,title,image,ctype,score&start=0&q=weight+loss&facet.field=ctype&qt=contentsearch > > > > I append "&fq=ctype:1" to the above queries when trying to get results > for a > > particular category. > > > > -- > > Thanks > > Varun Gupta > > > > > > On Thu, Dec 10, 2009 at 5:58 PM, Martijn v Groningen < > > martijn.is.h...@gmail.com> wrote: > > > >> Hi Varun, > >> > >> Can you send the whole requests (with params), that you send to Solr > >> for both queries? > >> In your situation the collapse parameters only have to be used for the > >> first query and not the second query. > >> > >> Martijn > >> > >> 2009/12/10 Varun Gupta : > >> > Hi, > >> > > >> > I have documents under 6 different categories. While searching, I want > to > >> > show 3 documents from each category along with a link to see all the > >> > documents under a single category. I decided to use field collapsing > so > >> that > >> > I don't have to make 6 queries (one for each category). Currently I am > >> using > >> > the field collapsing patch uploaded on 29th Nov. > >> > > >> > Now, the results that are coming after using field collapsing are not > >> > matching the results for a single category. For example, for category > C1, > >> I > >> > am getting results R1, R2 and R3 using field collapsing, but after I > see > >> > results only from the category C1 (without using field collapsing) > these > >> > results are nowhere in the first 10 results. > >> > > >> > Am I doing something wrong or using the field collapsing for the wrong > >> > feature? 
> >> > > >> > I am using the following field collapsing parameters while querying: > >> > collapse.field=category > >> > collapse.facet=before > >> > collapse.threshold=3 > >> > > >> > -- > >> > Thanks > >> > Varun Gupta > >> > > >> > >> > >> > >> -- > >> Met vriendelijke groet, > >> > >> Martijn van Groningen > >> > > > > > > -- > Met vriendelijke groet, > > Martijn van Groningen >
Re: Results after using Field Collapsing are not matching the results without using Field Collapsing
When I used collapse.threshold=1, out of the 5 categories 4 had the same top result, but 1 category had a different result (it was the 3rd result coming for that category when I used threshold as 3). -- Thanks, Varun Gupta On Mon, Dec 14, 2009 at 2:56 AM, Martijn v Groningen < martijn.is.h...@gmail.com> wrote: > I would not expect that Solr 1.4 build is the cause of the problem. > Just out of curiosity does the same happen when collapse.threshold=1? > > 2009/12/11 Varun Gupta : > > Here is the field type configuration of ctype: > > > omitNorms="true" /> > > > > In solrconfig.xml, this is how I am enabling field collapsing: > > > class="org.apache.solr.handler.component.CollapseComponent"/> > > > > Apart from this, I made no changes in solrconfig.xml for field collapse. > I > > am currently not using the field collapse cache. > > > > I have applied the patch on the Solr 1.4 build. I am not using the latest > > solr nightly build. Can that cause any problem? > > > > -- > > Thanks > > Varun Gupta > > > > > > On Fri, Dec 11, 2009 at 3:44 AM, Martijn v Groningen < > > martijn.is.h...@gmail.com> wrote: > > > >> I tried to reproduce a similar situation here, but I got the expected > >> and correct results. Those three documents that you saw in your first > >> search result should be the first in your second search result (unless > >> the index changes or the sort changes ) when fq on that specific > >> category. I'm not sure what is causing this problem. Can you give me > >> some more information like the field type configuration for the ctype > >> field and how have configured field collapsing? > >> > >> I did find another problem to do with field collapse caching. The > >> collapse.threshold or collapse.maxdocs parameters are not taken into > >> account when caching, which is off course wrong because they do matter > >> when collapsing. Based on the information you have given me this > >> caching problem is not the cause of the situation you have. 
I will > >> update the patch that fixes this problem shortly. > >> > >> Martijn > >> > >> 2009/12/10 Varun Gupta : > >> > Hi Martijn, > >> > > >> > I am not sending the collapse parameters for the second query. Here > are > >> the > >> > queries I am using: > >> > > >> > *When using field collapsing (searching over all categories):* > >> > > >> > spellcheck=true&collapse.info.doc=true&facet=true&collapse.threshold=3&facet.mincount=1&spellcheck.q=weight+loss&collapse.facet=before&wt=xml&f.content.hl.snippets=2&hl=true&version=2.2&rows=20&collapse.field=ctype&fl=id,sid,title,image,ctype,score&start=0&q=weight+loss&collapse.info.count=false&facet.field=ctype&qt=contentsearch > >> > > >> > categories is represented as the field "ctype" above. > >> > > >> > *Without using field collapsing:* > >> > > >> > spellcheck=true&facet=true&facet.mincount=1&spellcheck.q=weight+loss&wt=xml&hl=true&rows=10&version=2.2&fl=id,sid,title,image,ctype,score&start=0&q=weight+loss&facet.field=ctype&qt=contentsearch > >> > > >> > I append "&fq=ctype:1" to the above queries when trying to get results > >> for a > >> > particular category. > >> > > >> > -- > >> > Thanks > >> > Varun Gupta > >> > > >> > > >> > On Thu, Dec 10, 2009 at 5:58 PM, Martijn v Groningen < > >> > martijn.is.h...@gmail.com> wrote: > >> > > >> >> Hi Varun, > >> >> > >> >> Can you send the whole requests (with params), that you send to Solr > >> >> for both queries? > >> >> In your situation the collapse parameters only have to be used for > the > >> >> first query and not the second query. > >> >> > >> >> Martijn > >> >> > >> >> 2009/12/10 Varun Gupta : > >> >> > Hi, > >> >> > > >> >> > I have documents under 6 different categories. While searching, I > want > >> to > >> >> > show 3 documents from each category along with a link to see all > the > >> >> > documents under a single category. I decided to use field > collap
Re: Results after using Field Collapsing are not matching the results without using Field Collapsing
After a lot of debugging, I finally found why the order of the collapsed results does not match the uncollapsed results. I can't say whether it is a bug in the implementation of field collapsing or not. *Explanation:* I am querying with field collapsing plus some filters that restrict the collapsing to particular categories, by appending the parameter fq=ctype:(1+2+8+6+3). In: NonAdjacentDocumentCollapser.doQuery() Line: DocSet filter = searcher.getDocSet(filterQueries); Here the filter DocSet is obtained without any scores (since my query has a filter, this line actually gets executed) and is also stored in the filter cache. In the next line of the code, the actual uncollapsed DocSet is obtained by passing the DocSetScoreCollector. Now, in: SolrIndexSearcher.getDocSet(Query query, DocSet filter, DocSetAwareCollector collector) Line: if (filterCache != null) Because the filter cache is not null and there is no result for the query in the cache, the line first = getDocSetNC(absQ,null); gets executed. Notice that the DocSetScoreCollector is not passed here, so the results are collected without any scores. This leaves the uncollapsed DocSet without any scores, and hence the sorting is not done based on score. @Martijn: Is my analysis right, or should I be using field collapsing in some other way? Otherwise, what is the ideal fix for this problem (I am not an active developer, so I can't say that a fix I make will not break anything)? -- Thanks, Varun Gupta On Mon, Dec 14, 2009 at 10:35 AM, Varun Gupta wrote: > When I used collapse.threshold=1, out of the 5 categories 4 had the same > top result, but 1 category had a different result (it was the 3rd result > coming for that category when I used threshold as 3). > > -- > Thanks, > Varun Gupta > > On Mon, Dec 14, 2009 at 2:56 AM, Martijn v Groningen < > martijn.is.h...@gmail.com> wrote: > >> I would not expect that the Solr 1.4 build is the cause of the problem. >> Just out of curiosity, does the same happen when collapse.threshold=1?
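The cache-miss path described in the message above can be distilled into a small runnable sketch outside Solr. All class and method names here are hypothetical stand-ins for Solr's internals (DocSetScoreCollector, getDocSetNC, the filter cache), kept only close enough to show why dropping the collector on a cache miss loses the scores:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Sketch (not Solr's real code) of the pitfall described above: when the
// filter cache misses, the doc set is computed WITHOUT the score-aware
// collector, so scores are silently never gathered.
public class FilterCachePitfall {
    // Stand-in for DocSetScoreCollector: records a score per doc id.
    static class ScoreCollector {
        final Map<Integer, Float> scores = new HashMap<>();
        void collect(int doc, float score) { scores.put(doc, score); }
    }

    static final Map<String, Set<Integer>> filterCache = new HashMap<>();

    // Mirrors a collector-aware getDocSetNC: walks matching docs and,
    // only if a collector is supplied, records their scores.
    static Set<Integer> getDocSetNC(String query, ScoreCollector collector) {
        Set<Integer> docs = new LinkedHashSet<>(Arrays.asList(1, 2, 3));
        if (collector != null) {
            float score = 3.0f;
            for (int doc : docs) collector.collect(doc, score--);
        }
        return docs;
    }

    // Buggy version: on a cache miss the collector is dropped, as in
    // "first = getDocSetNC(absQ, null);" from the message above.
    static Set<Integer> getDocSetBuggy(String q, ScoreCollector c) {
        Set<Integer> first = filterCache.get(q);
        if (first == null) {
            first = getDocSetNC(q, null);   // collector not passed!
            filterCache.put(q, first);
        }
        return first;
    }

    public static void main(String[] args) {
        ScoreCollector collector = new ScoreCollector();
        Set<Integer> docs = getDocSetBuggy("ctype:(1 2 8 6 3)", collector);
        System.out.println("docs=" + docs.size()
                + " scores=" + collector.scores.size());
        // docs=3 scores=0 -> the doc set exists, but no scores were
        // gathered, so a later sort-by-score over these docs cannot work.
    }
}
```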
Re: Results after using Field Collapsing are not matching the results without using Field Collapsing
Hi Martijn, Yes, it is working after making these changes. -- Thanks Varun Gupta On Sun, Dec 20, 2009 at 5:54 PM, Martijn v Groningen < martijn.is.h...@gmail.com> wrote: > Hi Varun, > > Yes, after going over the code I think you are right. If you change > the following if block in SolrIndexSearcher.getDocSet(Query query, > DocSet filter, DocSetAwareCollector collector): > if (first == null) { >   first = getDocSetNC(absQ, null); >   filterCache.put(absQ, first); > } > to: > if (first == null) { >   first = getDocSetNC(absQ, null, collector); >   filterCache.put(absQ, first); > } > it should work. Let me know if this solves your problem. > > Martijn
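The effect of threading the collector through on a cache miss, as in the fix confirmed above, can be shown with the same kind of stand-in sketch (hypothetical names, not Solr's actual code):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Sketch (not Solr's real code) of the confirmed fix: on a filter-cache
// miss, the score-aware collector is passed through to getDocSetNC, so
// scores are gathered even when the doc set is computed fresh and cached.
public class FilterCacheFixed {
    static class ScoreCollector {
        final Map<Integer, Float> scores = new HashMap<>();
        void collect(int doc, float score) { scores.put(doc, score); }
    }

    static final Map<String, Set<Integer>> filterCache = new HashMap<>();

    static Set<Integer> getDocSetNC(String query, ScoreCollector collector) {
        Set<Integer> docs = new LinkedHashSet<>(Arrays.asList(1, 2, 3));
        if (collector != null) {
            float score = 3.0f;
            for (int doc : docs) collector.collect(doc, score--);
        }
        return docs;
    }

    // Fixed version: the collector is no longer dropped on a miss,
    // mirroring "first = getDocSetNC(absQ, null, collector);" above.
    static Set<Integer> getDocSet(String q, ScoreCollector c) {
        Set<Integer> first = filterCache.get(q);
        if (first == null) {
            first = getDocSetNC(q, c);      // collector passed through
            filterCache.put(q, first);
        }
        return first;
    }

    public static void main(String[] args) {
        ScoreCollector collector = new ScoreCollector();
        Set<Integer> docs = getDocSet("ctype:(1 2 8 6 3)", collector);
        System.out.println("docs=" + docs.size()
                + " scores=" + collector.scores.size());
        // docs=3 scores=3 -> scores are now available, so the
        // uncollapsed doc set can be sorted by score as expected.
    }
}
```

Note that the cached entry itself is still only the doc set; it is the caller's collector that now receives the scores on the computing pass, which is exactly what the collapse code needed.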