Doing spatial search on multiple location points

2014-03-17 Thread Varun Gupta
Hi,

I am trying to find out if Solr supports doing a spatial search on multiple
location points. Basically, while querying Solr, I will be giving multiple
lat-long points, and Solr should return documents that are close to any of
the given points.

If this is not possible, is there any way to make it work without hitting
Solr for each lat-long point and then collating the results?

Thanks in advance.

--
Varun Gupta


Re: Doing spatial search on multiple location points

2014-03-18 Thread Varun Gupta
Hi David,

Thanks for the quick reply.

As I haven't migrated to 4.7 (I am still using 4.6), I tested using an OR
clause of multiple geofilt-based query phrases, and it seems to be working
great. But I have one more question: how do I boost the score of the
matching documents based on geodist? How will I get the geodist based on
the closest matching lat-long point?

Thanks in advance.

--
Varun Gupta

On Mon, Mar 17, 2014 at 7:27 PM, Smiley, David W.  wrote:

> Absolutely.  The most straight-forward approach is to use the default
> query parser comprised of OR clauses of geofilt query parser based
> clauses.  Another way to do it in Solr 4.7 that is probably faster is to
> use WKT with the custom "buffer" extension:
> myLocationRptField:"BUFFER(MULTIPOINT(x y, x y, x y, x y), d)
> distErrPct=0" (whereas 'd' is distance in degrees, not km).
>
> ~ David
>
> On 3/17/14, 9:28 AM, "Varun Gupta"  wrote:
>
> >Hi,
> >
> >I am trying to find out if solr supports doing a spatial search on
> >multiple
> >location points. Basically, while querying solr, I will be giving multiple
> >lat-long points and solr will be returning documents which are closer to
> >any of the given points.
> >
> >If this is not possible, is there any way to make it work without hitting
> >solr for each of the lat-long and then collating results.
> >
> >Thanks in advance.
> >
> >--
> >Varun Gupta
>
>
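The approach David describes — an OR of geofilt clauses, plus a boost by distance to the nearest point — can be sketched as request parameters. This is only an illustrative, unescaped sketch: the field name `location`, the points, and the distances are made up, and it assumes a single-valued lat-lon field so that `geodist()` is usable:

```text
q=_query_:"{!geofilt sfield=location pt=28.61,77.21 d=5}"
  OR _query_:"{!geofilt sfield=location pt=28.70,77.10 d=5}"
&bf=recip(min(geodist(location,28.61,77.21),geodist(location,28.70,77.10)),1,1000,1000)
```

Here `min()` over per-point `geodist()` calls approximates "distance to the closest given point", and `recip()` turns it into a score contribution that decays with distance. Depending on the query parser, `bf` (additive, dismax) or `boost` (multiplicative, edismax) would carry the function.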


Using ExternalFileField on SolrCloud

2014-04-21 Thread Varun Gupta
Hi,

I am trying to use ExternalFileField on Solr 4.6 running on SolrCloud for
the purpose of changing the document score based on a frequently changing
field. According to the documentation, the external file needs to be
present in the "data" folder of the collection.

I am confused here about where I should upload the external file in
ZooKeeper so that the file will end up in the "data" folder. I can see
"/configs/" and "/collections/" in my
ZooKeeper instance. Am I right in trying to propagate the external file
using ZooKeeper, or should I be looking into some other way to sync the
file to all Solr instances?

--
Thanks
Varun Gupta


Getting min and max of a solr field for each group while doing field collapsing/result grouping

2014-05-01 Thread Varun Gupta
Hi,

I am using SolrCloud for getting results grouped by a particular field.
Now, I also want to get the min and max value of a particular field for
each group. For example, if I am grouping results by city, then I also want
to get the minimum and maximum price for each city.

Is this possible to do with Solr?

Thanks in Advance!

--
Varun Gupta


Re: Getting min and max of a solr field for each group while doing field collapsing/result grouping

2014-05-01 Thread Varun Gupta
Hi Ahmet,

Thanks for the information! But as per the Solr documentation,
group.truncate is not supported in distributed search, and I am looking for
a solution that can work on SolrCloud.

--
Varun Gupta

On Thu, May 1, 2014 at 4:12 PM, Ahmet Arslan  wrote:

> Hi Varun,
>
> I think you can use group.truncate=true with stats component
> http://wiki.apache.org/solr/StatsComponent
>
>
> If true, facet counts are based on the most relevant document of each
> group matching the query. The same applies to StatsComponent. Default is
> false. Supported from Solr 3.4 and up.
>
> On Thursday, May 1, 2014 12:30 PM, Varun Gupta 
> wrote:
>
> Hi,
>
> I am using SolrCloud for getting results grouped by a particular field.
> Now, I also want to get min and max value for a particular field for each
> group. For example, if I am grouping results by city, then I also want to
> get the minimum and maximum price for each city.
>
> Is this possible to do with Solr.
>
> Thanks in Advance!
>
> --
> Varun Gupta
>


Add a new replica to SolrCloud

2014-07-08 Thread Varun Gupta
Hi,

I am currently using Solr 4.7.2 and have a SolrCloud setup running on 2
servers with the number of shards as 2, replication factor as 2, and max
shards per node as 4.

Now, I want to add another server to the SolrCloud as a replica. I can see
a Collection API call to add a new replica, but that was added in Solr 4.8.
Is there some way to add a new replica in Solr 4.7.2?

--
Thanks
Varun Gupta
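For reference, before the ADDREPLICA Collection API call arrived in 4.8, a replica was typically added via the Core Admin API by creating a core on the new node that names the target collection and shard; the new core registers itself in ZooKeeper and syncs from the shard leader. A sketch (host, core, and collection names are hypothetical):

```text
http://newserver:8983/solr/admin/cores?action=CREATE
  &name=mycollection_shard1_replica3
  &collection=mycollection
  &shard=shard1
```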


How do I this in Solr?

2010-10-26 Thread Varun Gupta
Hi,

I have a lot of small documents (each containing 1 to 15 words) indexed in
Solr. For the search query, I want the search results to contain only those
documents that satisfy this criterion: "All of the words of the search
result document are present in the search query."

For example:
If I have the following documents indexed: "nokia n95", "GPS", "android",
"samsung", "samsung andriod", "nokia andriod", "mobile with GPS"

If I search with the text "samsung andriod GPS", the search results should
only contain "samsung", "GPS", "andriod" and "samsung andriod".

Is there a way to do this in Solr?

--
Thanks
Varun Gupta
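The criterion in the question — every word of a matching document must appear in the query — can be sketched outside Solr as a plain token-subset check. This is illustrative only: it assumes whitespace tokenization and lowercasing, with no stemming or stopword handling:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class SubsetMatch {

    // A document is a hit when every one of its tokens occurs in the query.
    static boolean matches(String doc, String query) {
        Set<String> queryTokens =
            new HashSet<>(Arrays.asList(query.toLowerCase().split("\\s+")));
        for (String token : doc.toLowerCase().split("\\s+")) {
            if (!queryTokens.contains(token)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String query = "samsung andriod GPS";
        String[] docs = {"nokia n95", "GPS", "android", "samsung",
                         "samsung andriod", "nokia andriod", "mobile with GPS"};
        for (String doc : docs) {
            if (matches(doc, query)) {
                System.out.println(doc);
            }
        }
    }
}
```

Note that the correctly spelled document "android" does not match the query's "andriod" — the same misspelling pitfall that comes up later in the thread. Running this check per document obviously does not scale against a large index, which is why the thread turns to indexed word counts and, later, MemoryIndex.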


Re: How do I this in Solr?

2010-10-26 Thread Varun Gupta
Thanks everybody for the inputs.

Looks like Steven's solution is the closest one, but it will lead to
performance issues when the query string has many terms.

I will try to implement the two filters suggested by Steven and see how the
performance matches up.

--
Thanks
Varun Gupta


On Wed, Oct 27, 2010 at 8:04 AM, scott chu (朱炎詹) wrote:

> I think you have to write a "yet exact match" handler yourself (I say "yet"
> because it's not quite the exact match we normally know). Steve's answer is
> quite near your request. You can do further work based on his solution.
>
> As the last step, I'd suggest you eat up all blanks within the query string
> and each query result, respectively, and only return those results that
> have the same string length as the query string's.
>
> For example, given:
> *query string = "Samsung with GPS"
> *query results:
> result 1 = "Samsung has lots of mobile with GPS"
> result 2 = "with GPS Samsung"
> result 3 = "GPS mobile with vendors, such as Sony, Samsung"
>
> they become:
> *query string = "SamsungwithGPS" (length = 14)
> *query results:
> result 1 = "SamsunghaslotsofmobilewithGPS" (length = 29)
> result 2 = "withGPSSamsung" (length = 14)
> result 3 = "GPSmobilewithvendors,suchasSony,Samsung" (length = 39)
>
> so result 2 matches your request.
>
> In this way, you can avoid case-sensitivity and word-order-rearrangement
> work. Furthermore, you can do refined work, such as removing whitespace
> characters, etc.
>
> Scott @ Taiwan
>
>
> - Original Message - From: "Varun Gupta" 
>
> To: 
> Sent: Tuesday, October 26, 2010 9:07 PM
>
> Subject: How do I this in Solr?
>
>
>  Hi,
>>
>> I have lot of small documents (each containing 1 to 15 words) indexed in
>> Solr. For the search query, I want the search results to contain only
>> those
>> documents that satisfy this criteria "All of the words of the search
>> result
>> document are present in the search query"
>>
>> For example:
>> If I have the following documents indexed: "nokia n95", "GPS", "android",
>> "samsung", "samsung andriod", "nokia andriod", "mobile with GPS"
>>
>> If I search with the text "samsung andriod GPS", search results should
>> only
>> conain "samsung", "GPS", "andriod" and "samsung andriod".
>>
>> Is there a way to do this in Solr.
>>
>> --
>> Thanks
>> Varun Gupta
>>
>>
>
>
> 
>
>
>
>
>


Re: How do I this in Solr?

2010-10-27 Thread Varun Gupta
Toke, the search query will contain 4-5 words on average (excluding
stopwords).

Mike, I don't care about the result count. Excluding the terms at the
client side may be a good idea. Is there any way to alter scoring such that
the docs containing only the searched-for terms are shown first? Can I use
term frequency to do that kind of thing?

--
Thanks
Varun Gupta

On Wed, Oct 27, 2010 at 7:13 PM, Mike Sokolov  wrote:

> Yes I missed that requirement (as Steven also pointed out in a private
> e-mail).  I now agree that the combinatorics are required.
>
> Another possibility to consider (if the queries are large, which actually
> seems unlikely) is to use the default behavior where all terms are optional,
> sort by relevance, and truncate the result list on the client side after
> some unwanted term is found.  I *think* the scoring should find only docs
> with the searched-for terms first, although if there are a lot of repeated
> terms maybe not? Also result counts will be screwy.
>
> -Mike
>
>
> On 10/27/2010 09:34 AM, Toke Eskildsen wrote:
>
>> That does not work either as it requires that all the terms in the query
>> are present in the document. The original poster did not state this
>> requirement. On the contrary, his examples were mostly single-word
>> matches, implying an OR-search at the core.
>>
>> The query-explosion still seems like the only working idea. Maybe Varun
>> could comment on the maximum numbers of terms that his queries will
>> contain?
>>
>> Regards,
>> Toke Eskildsen
>>
>> On Wed, 2010-10-27 at 15:02 +0200, Mike Sokolov wrote:
>>
>>
>>> Right - my point was to combine this with the previous approaches to
>>> form a query like:
>>>
>>> samsung AND android AND GPS AND word_count:3
>>>
>>> in order to exclude documents containing additional words. This would
>>> avoid the combinatoric explosion problem others had alluded to earlier.
>>> Of course this would fail because android is "mis-" spelled :)
>>>
>>> -Mike
>>>
>>> On 10/27/2010 08:45 AM, Steven A Rowe wrote:
>>>
>>>
>>>> I'm pretty sure the word-count strategy won't work.
>>>>
>>>>
>>>>
>>>>
>>>>> If I search with the text "samsung andriod GPS", search results
>>>>> should only conain "samsung", "GPS", "andriod" and "samsung andriod".
>>>>>
>>>>>
>>>>>
>>>> Using the word-count strategy, a document containing "samsung andriod
>>>> PDQ" would be a hit, but Varun doesn't want it, because it contains a word
>>>> that is not in the query.
>>>>
>>>> Steve
>>>>
>>>>
>>>>
>>>>
>>>>> -Original Message-
>>>>> From: Michael Sokolov [mailto:soko...@ifactory.com]
>>>>> Sent: Wednesday, October 27, 2010 7:44 AM
>>>>> To: solr-user@lucene.apache.org
>>>>> Subject: RE: How do I this in Solr?
>>>>>
>>>>> You might try adding a field containing the word count and making sure
>>>>> that
>>>>> matches the query's word count?
>>>>>
>>>>> This would require you to tokenize the query and document yourself,
>>>>> perhaps.
>>>>>
>>>>> -Mike
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> -Original Message-
>>>>>> From: Varun Gupta [mailto:varun.vgu...@gmail.com]
>>>>>> Sent: Tuesday, October 26, 2010 11:26 PM
>>>>>> To: solr-user@lucene.apache.org
>>>>>> Subject: Re: How do I this in Solr?
>>>>>>
>>>>>> Thanks everybody for the inputs.
>>>>>>
>>>>>> Looks like Steven's solution is the closest one but will lead
>>>>>> to performance issues when the query string has many terms.
>>>>>>
>>>>>> I will try to implement the two filters suggested by Steven
>>>>>> and see how the performance matches up.
>>>>>>
>>>>>> --
>>>>>> Thanks
>>>>>> Varun Gupta
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 27, 2010 at 8:04 AM, scott chu (???)
>>>>>> wrote:
>>>>>>
>>>>>>
>>>>>>

Re: How do I this in Solr?

2010-11-08 Thread Varun Gupta
I haven't been able to work on it because of some other commitments. The
MemoryIndex approach seems promising. The only thing I will have to check
is the memory requirement, as I have close to 2 million documents.

Will let you know if I can make it work.

Thanks a lot!

--
Varun Gupta

On Sat, Nov 6, 2010 at 3:48 AM, Steven A Rowe  wrote:

> Hi Varun,
>
> On 10/26/2010 at 11:26 PM, Varun Gupta wrote:
> > I will try to implement the two filters suggested by Steven and see how
> > the performance matches up.
>
> Have you made any progress?
>
> I was thinking about your use case, and it occurred to me that you could
> get what you want by reversing the problem, using Lucene's MemoryIndex <
> http://lucene.apache.org/java/3_0_2/api/contrib-memory/org/apache/lucene/index/memory/MemoryIndex.html>.
>  (As far as I can tell, this functionality -- i.e. standing queries a.k.a.
> routing a.k.a. filtering -- is not present in Solr.)
>
> You can load your query (as a document) into a MemoryIndex, and then use
> each of your documents to query against it, something like (untested!):
>
>    Map<String, Query> documents = new HashMap<String, Query>();
>    Analyzer analyzer = new WhitespaceAnalyzer();
>    QueryParser parser = new QueryParser("content", analyzer);
>    parser.setDefaultOperator(QueryParser.Operator.AND);
>    documents.put("ID001", parser.parse("nokia n95"));
>    documents.put("ID002", parser.parse("GPS"));
>    documents.put("ID003", parser.parse("android"));
>    documents.put("ID004", parser.parse("samsung"));
>    documents.put("ID005", parser.parse("samsung android"));
>    documents.put("ID006", parser.parse("nokia android"));
>    documents.put("ID007", parser.parse("mobile with GPS"));
>
>    MemoryIndex index = new MemoryIndex();
>    index.addField("content", "samsung with GPS", analyzer);
>
>    for (Map.Entry<String, Query> entry : documents.entrySet()) {
>      Query query = entry.getValue();
>      if (index.search(query) > 0.0f) {
>        String docId = entry.getKey();
>        // Do something with the hits here ...
>      }
>    }
>
> In the above example, the documents "samsung", "GPS", "android" and
> "samsung android" would be hits, and the other documents would not be, just
> as you wanted.
>
> MemoryIndex is designed to be very fast for this kind of usage, so even
> 100's of thousands of documents should be feasible.
>
> Steve
>
> > -Original Message-
> > From: Varun Gupta [mailto:varun.vgu...@gmail.com]
> > Sent: Tuesday, October 26, 2010 11:26 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: How do I this in Solr?
> >
> > Thanks everybody for the inputs.
> >
> > Looks like Steven's solution is the closest one but will lead to
> > performance
> > issues when the query string has many terms.
> >
> > I will try to implement the two filters suggested by Steven and see how
> > the
> > performance matches up.
> >
> > --
> > Thanks
> > Varun Gupta
> >
> >
> > On Wed, Oct 27, 2010 at 8:04 AM, scott chu (朱炎詹)
> > wrote:
> >
> > > I think you have to write a "yet exact match" handler yourself (I mean
> > yet
> > > cause it's not quite exact match we normally know). Steve's answer is
> > quite
> > > near your request. You can do further work based on his solution.
> > >
> > > At the last step, I'll suggest you eat up all blank within query string
> > and
> > > query result, respevtively & only returns those results that has equal
> > > string length as the query string's.
> > >
> > > For example, giving:
> > > *query string = "Samsung with GPS"
> > > *query results:
> > > resutl 1 = "Samsung has lots of mobile with GPS"
> > > result 2 = "with GPS Samsng"
> > > result 3 = "GPS mobile with vendors, such as Sony, Samsung"
> > >
> > > they become:
> > > *query result = "SamsungwithGPS" (length =14)
> > > *query results:
> > > resutl 1 = "SamsunghaslotsofmobilewithGPS" (length =29)
> > > result 2 = "withGPSSamsng" (length =14)
> > > result 3 = "GPSmobilewithvendors,suchasSony,Samsung" (length =43)
> > >
> > > so result 2 matches your request.
> > >
> > > In this way, you can avoi

Showing few results for each category (facet)

2009-09-29 Thread Varun Gupta
Hi,

I am looking for a way to do the following in solr:
When somebody does a search, I want to show results by category (facet)
such that I display 5 results from each category (along with the total
number of results in each category, which I can always get using facet
search). This is kind of an overview of all the search results, and the
user can click on a category to see all the results pertaining to that
category (a normal facet search with a filter).

One way I can think of doing this is making as many queries as there are
categories and showing these results under each category. But this will be
very inefficient. Is there any way I can do this?

Thanks & Regards,
Varun Gupta


Re: Showing few results for each category (facet)

2009-09-30 Thread Varun Gupta
Thanks Matt!! I will take a look at the patch for field collapsing.

Thanks Marian for pointing that out. If field collapsing does not work,
then I will have to rely on Solr caching.

Thanks,
Varun Gupta


On Wed, Sep 30, 2009 at 1:44 AM, Matt Weber  wrote:

> So, you want to display 5 results from each category and still know how
> many results are in each category.  This is a perfect situation for the
> field collapsing patch:
>
> https://issues.apache.org/jira/browse/SOLR-236
> http://wiki.apache.org/solr/FieldCollapsing
>
> Here is how I would do it.
>
> Add a field to your schema called category or whatever.  Then while
> indexing you populate that field with whatever category the document
> belongs in.  While executing a search, collapse the results on that field
> with a max collapse of 5.  This will give you at most 5 results per
> category.  Now, at the same time enable faceting on that field and DO NOT
> use the collapsing parameter to recount the facet values.  This means that
> the facet counts will reflect the non-collapsed results.  This facet
> should only be used to get the count for each category, not displayed to
> the user.  On your search results page that gets the collapsed results,
> you can put a link that says "Show all X results from this category" where
> X is the value you pull out of the facet.  When a user clicks that link
> you basically do the same search with field collapsing disabled, and a
> filter query on the specific category they want to see, for example:
> &fq=category:people.
>
> Hope this helps.
>
> Thanks,
>
> Matt Weber
>
>
> On Sep 29, 2009, at 4:55 AM, Marian Steinbach wrote:
>
>  On Tue, Sep 29, 2009 at 11:36 AM, Varun Gupta 
>> wrote:
>>
>>> ...
>>>
>>> One way that I can think of doing this is by making as many queries as
>>> there
>>> are categories and show these results under each category. But this will
>>> be
>>> very inefficient. Is there any way I can do this ?
>>>
>>
>>
>> Hi Varun!
>>
>> I think that doing multiple queries doesn't have to be inefficient,
>> since Solr caches subsequent queries for the same term and facets.
>>
>> Imagine this as your first query:
>> - q: xyz
>> - facets: myfacet
>>
>> and this as a second query:
>> - q:xyz
>> - fq: myfacet=a
>>
>> Compared to the first query, the second query will be very fast, since
>> all the hard work has been done in query one and then cached.
>>
>> At least that's my understanding. Please correct me if I'm wrong.
>>
>> Marian
>>
>
>
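Matt's recipe, in the SOLR-236 patch's parameter style, amounts to something like the following (the field name `category` and query text are illustrative):

```text
q=xyz
&collapse.field=category&collapse.threshold=5
&facet=true&facet.field=category&facet.mincount=1
```

`collapse.threshold=5` keeps at most 5 documents per category in the result list, while the uncollapsed facet counts on `category` supply the "Show all X results" totals. The drill-down search then drops the `collapse.*` parameters and adds, for example, `fq=category:people`.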


SpellCheck Index not building

2009-10-12 Thread Varun Gupta
Hi,

I am using Solr 1.3 for spell checking. I am facing a strange problem of
the spell check index not being generated. When I have a small number of
documents (less than 1000) indexed, the spell check index builds, but when
there are more documents (around 40K), the index for spell checking does
not build. I can see the directory for the spell check index, and there are
two files under it: segments_3 & segments.gen

I am using the following query to build the spell checking index:
/select
params={spellcheck=true&start=0&qt=contentsearch&wt=xml&rows=0&spellcheck.build=true&version=2.2}

In the logs I see:
INFO: [] webapp=/solr path=/select
params={spellcheck=true&start=0&qt=contentsearch&wt=xml&rows=0&spellcheck.build=true&version=2.2}
hits=37467 status=0 QTime=44

Please help me solve this problem.

Here is my configuration:
*schema.xml:*

  




  

   
   
   

*solrconfig.xml:*
  <requestHandler name="contentsearch" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <str name="spellcheck.onlyMorePopular">false</str>
      <str name="spellcheck.extendedResults">false</str>
      <str name="spellcheck.count">5</str>
      <str name="spellcheck.collate">true</str>
      <str name="spellcheck.dictionary">jarowinkler</str>
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">textSpell</str>
    <lst name="spellchecker">
      <str name="name">a_spell</str>
      <str name="field">a_spell</str>
      <str name="spellcheckIndexDir">./spellchecker_a_spell</str>
      <str name="accuracy">0.7</str>
    </lst>
    <lst name="spellchecker">
      <str name="name">jarowinkler</str>
      <str name="field">a_spell</str>
      <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
      <str name="spellcheckIndexDir">./spellchecker_a_spell</str>
      <str name="accuracy">0.7</str>
    </lst>
  </searchComponent>

--
Thanks
Varun Gupta


Re: SpellCheck Index not building

2009-10-12 Thread Varun Gupta
No, there are no exceptions in the logs.

--
Thanks
Varun Gupta

On Tue, Oct 13, 2009 at 8:46 AM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> On Tue, Oct 13, 2009 at 8:36 AM, Varun Gupta 
> wrote:
>
> > Hi,
> >
> > I am using Solr 1.3 for spell checking. I am facing a strange problem of
> > spell checking index not been generated. When I have less number of
> > documents (less than 1000) indexed then the spell check index builds, but
> > when the documents are more (around 40K), then the index for spell
> checking
> > does not build. I can see the directory for spell checking build and
> there
> > are two files under it: segments_3  & segments.gen
> >
> >
> It seems that you might be running out of memory with a larger index. Can
> you check the logs to see if it has any exceptions recorded?
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Results after using Field Collapsing are not matching the results without using Field Collapsing

2009-12-10 Thread Varun Gupta
Hi,

I have documents under 6 different categories. While searching, I want to
show 3 documents from each category, along with a link to see all the
documents under a single category. I decided to use field collapsing so
that I don't have to make 6 queries (one for each category). Currently I am
using the field collapsing patch uploaded on 29th Nov.

Now, the results coming back with field collapsing are not matching the
results for a single category. For example, for category C1, I am getting
results R1, R2 and R3 using field collapsing, but when I look at results
only from category C1 (without field collapsing), these results are
nowhere in the first 10 results.

Am I doing something wrong, or am I using field collapsing for the wrong
feature?

I am using the following field collapsing parameters while querying:
   collapse.field=category
   collapse.facet=before
   collapse.threshold=3

--
Thanks
Varun Gupta


Re: Results after using Field Collapsing are not matching the results without using Field Collapsing

2009-12-10 Thread Varun Gupta
Hi Martijn,

I am not sending the collapse parameters for the second query. Here are the
queries I am using:

*When using field collapsing (searching over all categories):*
spellcheck=true&collapse.info.doc=true&facet=true&collapse.threshold=3&facet.mincount=1&spellcheck.q=weight+loss&collapse.facet=before&wt=xml&f.content.hl.snippets=2&hl=true&version=2.2&rows=20&collapse.field=ctype&fl=id,sid,title,image,ctype,score&start=0&q=weight+loss&collapse.info.count=false&facet.field=ctype&qt=contentsearch

categories is represented as the field "ctype" above.

*Without using field collapsing:*
spellcheck=true&facet=true&facet.mincount=1&spellcheck.q=weight+loss&wt=xml&hl=true&rows=10&version=2.2&fl=id,sid,title,image,ctype,score&start=0&q=weight+loss&facet.field=ctype&qt=contentsearch

I append "&fq=ctype:1" to the above queries when trying to get results for a
particular category.

--
Thanks
Varun Gupta


On Thu, Dec 10, 2009 at 5:58 PM, Martijn v Groningen <
martijn.is.h...@gmail.com> wrote:

> Hi Varun,
>
> Can you send the whole requests (with params), that you send to Solr
> for both queries?
> In your situation the collapse parameters only have to be used for the
> first query and not the second query.
>
> Martijn
>
> 2009/12/10 Varun Gupta :
> > Hi,
> >
> > I have documents under 6 different categories. While searching, I want to
> > show 3 documents from each category along with a link to see all the
> > documents under a single category. I decided to use field collapsing so
> that
> > I don't have to make 6 queries (one for each category). Currently I am
> using
> > the field collapsing patch uploaded on 29th Nov.
> >
> > Now, the results that are coming after using field collapsing are not
> > matching the results for a single category. For example, for category C1,
> I
> > am getting results R1, R2 and R3 using field collapsing, but after I see
> > results only from the category C1 (without using field collapsing) these
> > results are nowhere in the first 10 results.
> >
> > Am I doing something wrong or using the field collapsing for the wrong
> > feature?
> >
> > I am using the following field collapsing parameters while querying:
> >   collapse.field=category
> >   collapse.facet=before
> >   collapse.threshold=3
> >
> > --
> > Thanks
> > Varun Gupta
> >
>
>
>
> --
> Met vriendelijke groet,
>
> Martijn van Groningen
>


Re: Results after using Field Collapsing are not matching the results without using Field Collapsing

2009-12-10 Thread Varun Gupta
Here is the field type configuration of ctype:


In solrconfig.xml, this is how I am enabling field collapsing:


Apart from this, I made no changes in solrconfig.xml for field collapse. I
am currently not using the field collapse cache.

I have applied the patch on the Solr 1.4 build. I am not using the latest
solr nightly build. Can that cause any problem?

--
Thanks
Varun Gupta


On Fri, Dec 11, 2009 at 3:44 AM, Martijn v Groningen <
martijn.is.h...@gmail.com> wrote:

> I tried to reproduce a similar situation here, but I got the expected
> and correct results. Those three documents that you saw in your first
> search result should be the first in your second search result (unless
> the index changes or the sort changes ) when fq on that specific
> category. I'm not sure what is causing this problem. Can you give me
> some more information like the field type configuration for the ctype
> field and how have configured field collapsing?
>
> I did find another problem to do with field collapse caching. The
> collapse.threshold or collapse.maxdocs parameters are not taken into
> account when caching, which is off course wrong because they do matter
> when collapsing. Based on the information you have given me this
> caching problem is not the cause of the situation you have. I will
> update the patch that fixes this problem shortly.
>
> Martijn
>
> 2009/12/10 Varun Gupta :
> > Hi Martijn,
> >
> > I am not sending the collapse parameters for the second query. Here are
> the
> > queries I am using:
> >
> > *When using field collapsing (searching over all categories):*
> >
> spellcheck=true&collapse.info.doc=true&facet=true&collapse.threshold=3&facet.mincount=1&spellcheck.q=weight+loss&collapse.facet=before&wt=xml&f.content.hl.snippets=2&hl=true&version=2.2&rows=20&collapse.field=ctype&fl=id,sid,title,image,ctype,score&start=0&q=weight+loss&collapse.info.count=false&facet.field=ctype&qt=contentsearch
> >
> > categories is represented as the field "ctype" above.
> >
> > *Without using field collapsing:*
> >
> spellcheck=true&facet=true&facet.mincount=1&spellcheck.q=weight+loss&wt=xml&hl=true&rows=10&version=2.2&fl=id,sid,title,image,ctype,score&start=0&q=weight+loss&facet.field=ctype&qt=contentsearch
> >
> > I append "&fq=ctype:1" to the above queries when trying to get results
> for a
> > particular category.
> >
> > --
> > Thanks
> > Varun Gupta
> >
> >
> > On Thu, Dec 10, 2009 at 5:58 PM, Martijn v Groningen <
> > martijn.is.h...@gmail.com> wrote:
> >
> >> Hi Varun,
> >>
> >> Can you send the whole requests (with params), that you send to Solr
> >> for both queries?
> >> In your situation the collapse parameters only have to be used for the
> >> first query and not the second query.
> >>
> >> Martijn
> >>
> >> 2009/12/10 Varun Gupta :
> >> > Hi,
> >> >
> >> > I have documents under 6 different categories. While searching, I want
> to
> >> > show 3 documents from each category along with a link to see all the
> >> > documents under a single category. I decided to use field collapsing
> so
> >> that
> >> > I don't have to make 6 queries (one for each category). Currently I am
> >> using
> >> > the field collapsing patch uploaded on 29th Nov.
> >> >
> >> > Now, the results that are coming after using field collapsing are not
> >> > matching the results for a single category. For example, for category
> C1,
> >> I
> >> > am getting results R1, R2 and R3 using field collapsing, but after I
> see
> >> > results only from the category C1 (without using field collapsing)
> these
> >> > results are nowhere in the first 10 results.
> >> >
> >> > Am I doing something wrong or using the field collapsing for the wrong
> >> > feature?
> >> >
> >> > I am using the following field collapsing parameters while querying:
> >> >   collapse.field=category
> >> >   collapse.facet=before
> >> >   collapse.threshold=3
> >> >
> >> > --
> >> > Thanks
> >> > Varun Gupta
> >> >
> >>
> >>
> >>
> >> --
> >> Met vriendelijke groet,
> >>
> >> Martijn van Groningen
> >>
> >
>
>
>
> --
> Met vriendelijke groet,
>
> Martijn van Groningen
>


Re: Results after using Field Collapsing are not matching the results without using Field Collapsing

2009-12-13 Thread Varun Gupta
When I used collapse.threshold=1, out of the 5 categories, 4 had the same
top result, but 1 category had a different result (it was the 3rd result
for that category when I used a threshold of 3).

--
Thanks,
Varun Gupta


On Mon, Dec 14, 2009 at 2:56 AM, Martijn v Groningen <
martijn.is.h...@gmail.com> wrote:

> I would not expect that Solr 1.4 build is the cause of the problem.
> Just out of curiosity does the same happen when collapse.threshold=1?
>
> 2009/12/11 Varun Gupta :
> > Here is the field type configuration of ctype:
> > > omitNorms="true" />
> >
> > In solrconfig.xml, this is how I am enabling field collapsing:
> > > class="org.apache.solr.handler.component.CollapseComponent"/>
> >
> > Apart from this, I made no changes in solrconfig.xml for field collapse.
> I
> > am currently not using the field collapse cache.
> >
> > I have applied the patch on the Solr 1.4 build. I am not using the latest
> > solr nightly build. Can that cause any problem?
> >
> > --
> > Thanks
> > Varun Gupta
> >
> >
> > On Fri, Dec 11, 2009 at 3:44 AM, Martijn v Groningen <
> > martijn.is.h...@gmail.com> wrote:
> >
> >> I tried to reproduce a similar situation here, but I got the expected
> >> and correct results. Those three documents that you saw in your first
> >> search result should be the first in your second search result (unless
> >> the index changes or the sort changes ) when fq on that specific
> >> category. I'm not sure what is causing this problem. Can you give me
> >> some more information like the field type configuration for the ctype
> >> field and how have configured field collapsing?
> >>
> >> I did find another problem to do with field collapse caching. The
> >> collapse.threshold or collapse.maxdocs parameters are not taken into
> >> account when caching, which is off course wrong because they do matter
> >> when collapsing. Based on the information you have given me this
> >> caching problem is not the cause of the situation you have. I will
> >> update the patch that fixes this problem shortly.
> >>
> >> Martijn
> >>
> >> 2009/12/10 Varun Gupta :
> >> > Hi Martijn,
> >> >
> >> > I am not sending the collapse parameters for the second query. Here
> are
> >> the
> >> > queries I am using:
> >> >
> >> > *When using field collapsing (searching over all categories):*
> >> >
> >>
> spellcheck=true&collapse.info.doc=true&facet=true&collapse.threshold=3&facet.mincount=1&spellcheck.q=weight+loss&collapse.facet=before&wt=xml&f.content.hl.snippets=2&hl=true&version=2.2&rows=20&collapse.field=ctype&fl=id,sid,title,image,ctype,score&start=0&q=weight+loss&collapse.info.count=false&facet.field=ctype&qt=contentsearch
> >> >
> >> > categories is represented as the field "ctype" above.
> >> >
> >> > *Without using field collapsing:*
> >> >
> >>
> spellcheck=true&facet=true&facet.mincount=1&spellcheck.q=weight+loss&wt=xml&hl=true&rows=10&version=2.2&fl=id,sid,title,image,ctype,score&start=0&q=weight+loss&facet.field=ctype&qt=contentsearch
> >> >
> >> > I append "&fq=ctype:1" to the above queries when trying to get
> >> > results for a particular category.
> >> >
> >> > --
> >> > Thanks
> >> > Varun Gupta
> >> >
> >> >
> >> > On Thu, Dec 10, 2009 at 5:58 PM, Martijn v Groningen <
> >> > martijn.is.h...@gmail.com> wrote:
> >> >
> >> >> Hi Varun,
> >> >>
> >> >> Can you send the whole requests (with params), that you send to Solr
> >> >> for both queries?
> >> >> In your situation the collapse parameters only have to be used for
> the
> >> >> first query and not the second query.
> >> >>
> >> >> Martijn
> >> >>
> >> >> 2009/12/10 Varun Gupta :
> >> >> > Hi,
> >> >> >
> >> >> > I have documents under 6 different categories. While searching, I
> >> >> > want to show 3 documents from each category along with a link to see
> >> >> > all the documents under a single category. I decided to use field
> >> >> > collapsing

Re: Results after using Field Collapsing are not matching the results without using Field Collapsing

2009-12-17 Thread Varun Gupta
After a lot of debugging, I finally found why the order of collapsed results
is not matching the uncollapsed results. I can't say if it is a bug in the
implementation of fieldcollapse or not.

*Explanation:*
Actually, I am querying the fieldcollapse with some filters to restrict the
collapsing to some particular categories only by appending the parameter:
fq=ctype:(1+2+8+6+3).
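For clarity, a request with that filter can be assembled and URL-encoded programmatically. Below is a minimal plain-Java sketch (no SolrJ dependency); the host, port, and handler path are placeholders, and buildUrl is a hypothetical helper, not part of Solr:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class CollapseUrlSketch {
    // Build a Solr GET request URL, URL-encoding each parameter value.
    static String buildUrl(String base, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(base).append('?');
        boolean first = true;
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (!first) sb.append('&');
            sb.append(e.getKey()).append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
            first = false;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", "weight loss");
        p.put("qt", "contentsearch");
        p.put("collapse.field", "ctype");
        p.put("collapse.threshold", "3");
        p.put("fq", "ctype:(1 2 8 6 3)"); // restrict collapsing to these categories
        // placeholder host/core path below
        System.out.println(buildUrl("http://localhost:8983/solr/select", p));
    }
}
```

Note that URLEncoder encodes spaces as '+', which yields exactly the fq=ctype:(1+2+8+6+3) form shown above.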

In: NonAdjacentDocumentCollapser.doQuery()
Line: DocSet filter = searcher.getDocSet(filterQueries);

Here, the filter DocSet is obtained without any scores (since I have a filter
in my query, this line actually gets executed) and is also stored in the
filter cache. In the next line in the code, the actual uncollapsed DocSet is
obtained by passing the DocSetScoreCollector.

Now, in: SolrIndexSearcher.getDocSet(Query query, DocSet filter,
DocSetAwareCollector collector)
Line: if (filterCache != null)
Because the filter cache is not null and there is no cached result for the
query, the line first = getDocSetNC(absQ,null); gets executed. Notice that
the DocSetScoreCollector is not passed here. Hence, results are collected
without any scores.

This leaves the uncollapsed DocSet without any scores, and hence the sorting
is not done by score.

@Martijn: Is what I am saying right, or should I be using field collapsing in
some other way? Otherwise, what is the ideal fix for this problem? (I am not
an active developer, so I can't say that a fix I make will not break
anything.)

--
Thanks,
Varun Gupta
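The cache interaction described above can be reduced to a small self-contained model. This is not actual Solr code: DocSet, ScoreCollector, getDocSet, and runQuery here are simplified stand-ins for the classes named in the message, and the "fixed" variant is only a sketch of passing the collector through on a cache miss:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class FilterCacheSketch {
    // A "DocSet": doc ids, optionally with scores (null when no collector ran).
    static class DocSet {
        final Set<Integer> ids;
        final Map<Integer, Float> scores;
        DocSet(Set<Integer> ids, Map<Integer, Float> scores) { this.ids = ids; this.scores = scores; }
    }

    interface ScoreCollector { void collect(int doc, float score); }

    static final Map<String, DocSet> filterCache = new HashMap<>();

    // Buggy path: on a cache miss the query runs WITHOUT the collector,
    // so the caller's score map stays empty and sort-by-score breaks.
    static DocSet getDocSetBuggy(String query, Map<Integer, Float> collected) {
        DocSet cached = filterCache.get(query);
        if (cached == null) {
            cached = runQuery(query, null);   // collector dropped here
            filterCache.put(query, cached);
        }
        return cached;
    }

    // Fixed path: the collector is passed through on the cache miss.
    static DocSet getDocSetFixed(String query, Map<Integer, Float> collected) {
        DocSet cached = filterCache.get(query);
        if (cached == null) {
            cached = runQuery(query, (doc, score) -> collected.put(doc, score));
            filterCache.put(query, cached);
        }
        return cached;
    }

    // Stand-in for the real index search: three docs with descending scores.
    static DocSet runQuery(String query, ScoreCollector collector) {
        Map<Integer, Float> scores = new LinkedHashMap<>();
        scores.put(1, 3.0f); scores.put(2, 2.0f); scores.put(3, 1.0f);
        if (collector != null) scores.forEach(collector::collect);
        return new DocSet(scores.keySet(), collector == null ? null : scores);
    }

    public static void main(String[] args) {
        Map<Integer, Float> collected = new HashMap<>();
        getDocSetBuggy("ctype:(1 2 8 6 3)", collected);
        System.out.println("buggy path collected scores: " + collected.size());   // 0

        filterCache.clear();
        collected.clear();
        getDocSetFixed("ctype:(1 2 8 6 3)", collected);
        System.out.println("fixed path collected scores: " + collected.size());   // 3
    }
}
```

The "fixed" variant corresponds to the change discussed later in this thread: the collector must reach the uncached query execution, or the cached DocSet is built score-less.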


On Mon, Dec 14, 2009 at 10:35 AM, Varun Gupta wrote:

> When I used collapse.threshold=1, out of the 5 categories 4 had the same
> top result, but 1 category had a different result (it was the 3rd result
> coming for that category when I used threshold as 3).
>
> --
> Thanks,
> Varun Gupta
>
>
>
> On Mon, Dec 14, 2009 at 2:56 AM, Martijn v Groningen <
> martijn.is.h...@gmail.com> wrote:
>
>> I would not expect that the Solr 1.4 build is the cause of the problem.
>> Just out of curiosity does the same happen when collapse.threshold=1?
>>
>> 2009/12/11 Varun Gupta :
>> > Here is the field type configuration of ctype:
>> >> > omitNorms="true" />
>> >
>> > In solrconfig.xml, this is how I am enabling field collapsing:
>> >> > class="org.apache.solr.handler.component.CollapseComponent"/>
>> >
>> > Apart from this, I made no changes in solrconfig.xml for field
>> > collapse. I am currently not using the field collapse cache.
>> >
>> > I have applied the patch on the Solr 1.4 build. I am not using the
>> > latest Solr nightly build. Can that cause any problem?
>> >
>> > --
>> > Thanks
>> > Varun Gupta
>> >
>> >
>> > On Fri, Dec 11, 2009 at 3:44 AM, Martijn v Groningen <
>> > martijn.is.h...@gmail.com> wrote:
>> >
>> >> I tried to reproduce a similar situation here, but I got the expected
>> >> and correct results. Those three documents that you saw in your first
>> >> search result should be the first in your second search result (unless
>> >> the index changes or the sort changes ) when fq on that specific
>> >> category. I'm not sure what is causing this problem. Can you give me
>> >> some more information like the field type configuration for the ctype
>> >> field and how have configured field collapsing?
>> >>
>> >> I did find another problem to do with field collapse caching. The
>> >> collapse.threshold or collapse.maxdocs parameters are not taken into
>> >> account when caching, which is of course wrong because they do matter
>> >> when collapsing. Based on the information you have given me this
>> >> caching problem is not the cause of the situation you have. I will
>> >> update the patch that fixes this problem shortly.
>> >>
>> >> Martijn
>> >>
>> >> 2009/12/10 Varun Gupta :
>> >> > Hi Martijn,
>> >> >
>> >> > I am not sending the collapse parameters for the second query. Here
>> >> > are the queries I am using:
>> >> >
>> >> > *When using field collapsing (searching over all categories):*
>> >> >
>> >>
>> spellcheck=true&collapse.info.doc=true&facet=true&collapse.threshold=3&facet.mincount=1&spellcheck.q=weight+loss&collapse.facet=before&wt=xml&f.content.hl.snippets=2&hl=true&version=2.2&rows=20&collapse.field=ctype&fl=id,sid,title,image,ctype,score&start=0&q=weight+loss&collapse.info.count=false&facet.field=ctype

Re: Results after using Field Collapsing are not matching the results without using Field Collapsing

2009-12-21 Thread Varun Gupta
Hi Martijn,

Yes, it is working after making these changes.

--
Thanks
Varun Gupta

On Sun, Dec 20, 2009 at 5:54 PM, Martijn v Groningen <
martijn.is.h...@gmail.com> wrote:

> Hi Varun,
>
> Yes, after going over the code I think you are right. If you change
> the following if block in SolrIndexSearcher.getDocSet(Query query,
> DocSet filter, DocSetAwareCollector collector):
> if (first==null) {
>    first = getDocSetNC(absQ, null);
>    filterCache.put(absQ,first);
> }
> with:
> if (first==null) {
>    first = getDocSetNC(absQ, null, collector);
>    filterCache.put(absQ,first);
> }
> It should work then. Let me know if this solves your problem.
>
> Martijn
>
>
> 2009/12/18 Varun Gupta :
> > After a lot of debugging, I finally found why the order of collapsed
> > results is not matching the uncollapsed results. I can't say if it is a
> > bug in the implementation of fieldcollapse or not.
> >
> > *Explanation:*
> > Actually, I am querying the fieldcollapse with some filters to restrict
> > the collapsing to some particular categories only by appending the
> > parameter: fq=ctype:(1+2+8+6+3).
> >
> > In: NonAdjacentDocumentCollapser.doQuery()
> > Line: DocSet filter = searcher.getDocSet(filterQueries);
> >
> > Here, the filter DocSet is obtained without any scores (since I have a
> > filter in my query, this line actually gets executed) and is also stored
> > in the filter cache. In the next line in the code, the actual uncollapsed
> > DocSet is obtained by passing the DocSetScoreCollector.
> >
> > Now, in: SolrIndexSearcher.getDocSet(Query query, DocSet filter,
> > DocSetAwareCollector collector)
> > Line: if (filterCache != null)
> > Because the filter cache is not null and there is no cached result for
> > the query, the line first = getDocSetNC(absQ,null); gets executed.
> > Notice that the DocSetScoreCollector is not passed here. Hence, results
> > are collected without any scores.
> >
> > This leaves the uncollapsed DocSet without any scores, and hence the
> > sorting is not done by score.
> >
> > @Martijn: Is what I am saying right, or should I be using field
> > collapsing in some other way? Otherwise, what is the ideal fix for this
> > problem? (I am not an active developer, so I can't say that a fix I make
> > will not break anything.)
> >
> > --
> > Thanks,
> > Varun Gupta
> >
> >
> > On Mon, Dec 14, 2009 at 10:35 AM, Varun Gupta wrote:
> >
> >> When I used collapse.threshold=1, out of the 5 categories 4 had the same
> >> top result, but 1 category had a different result (it was the 3rd result
> >> coming for that category when I used threshold as 3).
> >>
> >> --
> >> Thanks,
> >> Varun Gupta
> >>
> >>
> >>
> >> On Mon, Dec 14, 2009 at 2:56 AM, Martijn v Groningen <
> >> martijn.is.h...@gmail.com> wrote:
> >>
> >>> I would not expect that the Solr 1.4 build is the cause of the problem.
> >>> Just out of curiosity does the same happen when collapse.threshold=1?
> >>>
> >>> 2009/12/11 Varun Gupta :
> >>> > Here is the field type configuration of ctype:
> >>> > >>> > omitNorms="true" />
> >>> >
> >>> > In solrconfig.xml, this is how I am enabling field collapsing:
> >>> > >>> > class="org.apache.solr.handler.component.CollapseComponent"/>
> >>> >
> >>> > Apart from this, I made no changes in solrconfig.xml for field
> collapse.
> >>> I
> >>> > am currently not using the field collapse cache.
> >>> >
> >>> > I have applied the patch on the Solr 1.4 build. I am not using the
> >>> > latest Solr nightly build. Can that cause any problem?
> >>> >
> >>> > --
> >>> > Thanks
> >>> > Varun Gupta
> >>> >
> >>> >
> >>> > On Fri, Dec 11, 2009 at 3:44 AM, Martijn v Groningen <
> >>> > martijn.is.h...@gmail.com> wrote:
> >>> >
> >>> >> I tried to reproduce a similar situation here, but I got the
> expected
> >>> >> and correct results. Those three documents that you saw in your first
> >>> >> search result should be the first in your second search result (unless
> >>> >> the index changes or the sort changes ) when fq on that specific