Duplicate docs in Solr pagination

2016-12-11 Thread atawfik
Hi all,

I am experiencing weird behavior with Solr: pagination gives duplicate
results.

Requesting
*http://localhost:8983/solr/tweets/select?q=text:test&start=0&wt=csv&fl=id,timestamp&fq=doc_type:tweet*
gives me:

id,timestamp
801943081268428800,2016-11-25T00:18:24.613Z
802159834942541824,2016-11-25T14:39:42.716Z
801932818301521920,2016-11-24T23:37:37.731Z
801945904328544256,2016-11-25T00:29:37.683Z
801947217439272960,2016-11-25T00:34:50.753Z
801944318885982208,2016-11-25T00:23:19.684Z
801944683282894848,2016-11-25T00:24:46.563Z
801945048527097856,2016-11-25T00:26:13.644Z
*802339145678848000*,2016-11-26T02:32:13.727Z
802340356973010944,2016-11-26T02:37:02.522Z

However, requesting
*http://localhost:8983/solr/tweets/select?q=text:test&start=1&wt=csv&fl=id,timestamp&fq=doc_type:tweet*
gives me:

id,timestamp
802159834942541824,2016-11-25T14:39:42.716Z
801932818301521920,2016-11-24T23:37:37.731Z
801945904328544256,2016-11-25T00:29:37.683Z
801947217439272960,2016-11-25T00:34:50.753Z
801944318885982208,2016-11-25T00:23:19.684Z
801944683282894848,2016-11-25T00:24:46.563Z
801945048527097856,2016-11-25T00:26:13.644Z
*802339145678848000*,2016-11-26T02:32:13.727Z
802340356973010944,2016-11-26T02:37:02.522Z
802345158679363584,2016-11-26T02:56:07.338Z


The index is already optimized, I am not adding any documents while I issue
the queries, and I am using Solr 6.2.1.
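
For reference, Solr's `start` parameter is a zero-based offset into the ranked result list, not a page number. `rows` defaults to 10 when it is not given. Read that way, `start=1` returns results 2 through 11 of the same ranking. That is the one-position shift visible in the two listings above, where nine IDs appear in both. Below is a minimal sketch of page-based offsets. It assumes zero-based page numbers and a hypothetical page size of 10, and it reuses the query from the URLs above.

```python
from urllib.parse import urlencode

# Hypothetical page size; Solr uses rows=10 when the parameter is omitted.
ROWS = 10


def page_params(page: int, rows: int = ROWS) -> dict:
    """Build select parameters for a zero-based page number.

    `start` is a document offset, so page N begins at N * rows.
    """
    return {
        "q": "text:test",
        "fq": "doc_type:tweet",
        "fl": "id,timestamp",
        "wt": "csv",
        "start": page * rows,
        "rows": rows,
    }


def result_window(start: int, rows: int = ROWS) -> range:
    """Zero-based positions in the ranked list covered by one request."""
    return range(start, start + rows)


if __name__ == "__main__":
    for page in range(3):
        print(urlencode(page_params(page)))
    # start=0 and start=1 share nine positions (1..9), matching the
    # overlap between the two CSV listings.
    print(sorted(set(result_window(0)) & set(result_window(1))))
```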

Regards
Ameer




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Duplicate-docs-in-Solr-pagination-tp4309292.html
Sent from the Solr - User mailing list archive at Nabble.com.


Join and non-Join query give different results

2014-07-13 Thread atawfik
Hi everyone,

I am trying to link two types of documents in my Solr index. The parent is
named "house" and the child is named "available". I want to return a list
of houses that have available documents matching some filters. However,
the following query gives me around 18 documents, which is wrong. It should
return 0 documents.

q=*:*
&fq={!join from=house_id_fk to=house_id}doctype:available AND discount:[1 TO *] AND start_date:[NOW/DAY TO NOW/DAY%2B21DAYS]
&fq={!join from=house_id_fk to=house_id}doctype:available AND sd_year:2014 AND sd_month:11

To debug it, I first checked whether there are any available documents
matching the given filter queries, using the following query:
q=*:*
&fq=doctype:available AND discount:[1 TO *] AND start_date:[NOW/DAY TO NOW/DAY%2B21DAYS]
&fq=doctype:available AND sd_year:2014 AND sd_month:11

The query gives 0 results, which is correct. As you can see, both queries
are the same; the only difference is the use of the join query parser. I am
a bit confused about why the first query returns results. My understanding
is that this should not happen, because the second query shows that there
are no available documents that satisfy the given filter queries.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Join-and-non-Join-query-give-different-results-tp4146922.html


Re: Join and non-Join query give different results

2014-07-19 Thread atawfik
I have figured it out. 

The reason is simply the way joins work in Solr: each {!join} filter query
is executed separately. So a house that has one available document with
discount >= 1 (and a start_date in range) and another, possibly different,
available document with sd_year:2014 AND sd_month:11 will be returned, even
though my intention was to apply both conditions to the same available
document.

In the second case, however, both conditions are applied at the same time
to find available documents, and only houses linked to a matching available
document could be returned. Since no available document satisfies both
conditions, there is no matching house, which gives zero results.
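
The difference can be reproduced with plain Python sets. This is a minimal sketch with made-up documents. The house IDs and field values are hypothetical, and the start_date condition is left out for brevity. Each separate join maps its own matching available documents to houses, and the two filter results are then intersected. A single join instead requires one available document to satisfy both conditions.

```python
from urllib.parse import urlencode

# Hypothetical "available" child documents pointing at their house.
available = [
    {"house_id_fk": "h1", "discount": 5, "sd_year": 2014, "sd_month": 8},
    {"house_id_fk": "h1", "discount": 0, "sd_year": 2014, "sd_month": 11},
    {"house_id_fk": "h2", "discount": 0, "sd_year": 2014, "sd_month": 11},
]


def has_discount(doc):  # stands in for discount:[1 TO *] (start_date omitted)
    return doc["discount"] >= 1


def in_nov_2014(doc):  # stands in for sd_year:2014 AND sd_month:11
    return doc["sd_year"] == 2014 and doc["sd_month"] == 11


def join(pred):
    """Map matching available docs to parent house IDs (from=house_id_fk)."""
    return {d["house_id_fk"] for d in available if pred(d)}


# Two join filters: each is evaluated on its own, then the results intersect.
separate = join(has_discount) & join(in_nov_2014)
# One join whose inner query carries both conditions.
combined = join(lambda d: has_discount(d) and in_nov_2014(d))

print(separate)  # {'h1'}
print(combined)  # set()

# The single-join form of the filter; urlencode turns '+' into %2B.
COMBINED_FQ = (
    "{!join from=house_id_fk to=house_id}doctype:available"
    " AND discount:[1 TO *] AND start_date:[NOW/DAY TO NOW/DAY+21DAYS]"
    " AND sd_year:2014 AND sd_month:11"
)
print(urlencode({"q": "*:*", "fq": COMBINED_FQ}))
```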

It really took some time to figure this out; I hope this will help someone
else.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Join-and-non-Join-query-give-different-results-tp4146922p4148131.html


Re: Implementing custom analyzer for multi-language stemming

2014-09-18 Thread atawfik
Hi,

The author of Solr in Action has produced something similar to what you
want. I have even used it in one of my projects where I needed to
automatically analyze languages. Here is the link to its code:
https://github.com/treygrainger/solr-in-action/tree/master/src/main/java/sia/ch14

Note, however, that not all languages are supported by Lucene or Solr.
Therefore, some of the languages detected by the Google API will not have a
corresponding analysis chain, and you will need to develop one.
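
A simple pattern for handling this is to route each document to a per-language field and fall back to a generic field when no chain exists. This is not the Solr in Action implementation. Below is a minimal sketch of that routing step. It assumes the language code has already been detected by the Google API or any other detector. The field names and the `text_general` fallback are hypothetical.

```python
# Hypothetical mapping from ISO 639-1 codes to per-language Solr fields.
# Only languages that have an analysis chain in the schema appear here.
LANGUAGE_FIELDS = {
    "en": "text_en",
    "de": "text_de",
    "fr": "text_fr",
    "ar": "text_ar",
}

# Hypothetical catch-all field with language-neutral analysis.
FALLBACK_FIELD = "text_general"


def target_field(lang_code: str) -> str:
    """Return the field whose analysis chain should process this text."""
    return LANGUAGE_FIELDS.get(lang_code.lower(), FALLBACK_FIELD)


def route(doc_id: str, text: str, lang_code: str) -> dict:
    """Build a document that stores the text in its language-specific field."""
    return {"id": doc_id, "language": lang_code, target_field(lang_code): text}


if __name__ == "__main__":
    print(route("1", "Häuser am See", "de"))
    # No chain is mapped for "sw" here, so the text goes to the fallback field.
    print(route("2", "Habari za asubuhi", "sw"))
```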

In another project, I am following the same approach to develop an
AutoAnalyzer for Lucene without using Solr, so let me know if you want
directions on how to do it.

Regards
Ameer



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Implementing-custom-analyzer-for-multi-language-stemming-tp4150156p4159588.html


Re: Facets for Child Documents?

2014-10-10 Thread atawfik
Yes. One way is to use a join query to link authors to books. The query will
look like this:

q={!join from=author_id_fk to=author_id} publication_date:[...]


The other way is to use grouping. Here, you first retrieve books based on
their publication date, then group them by author.
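
The two options can be sketched as request parameters. This minimal sketch assumes that books carry an `author_id_fk` field pointing at the author's `author_id`. The date range is hypothetical and stands in for the elided `[...]`. The join returns author documents. Grouping (`group=true`, `group.field`) returns the matching book documents bucketed by author. The small in-memory grouping at the end only illustrates what a grouped response contains.

```python
from collections import defaultdict
from urllib.parse import urlencode

# Hypothetical range standing in for publication_date:[...]
DATE_RANGE = "publication_date:[2010-01-01T00:00:00Z TO *]"

# Option 1: match books, then return the authors they point to.
join_params = {"q": "{!join from=author_id_fk to=author_id}" + DATE_RANGE}

# Option 2: return the matching books themselves, grouped by author.
group_params = {
    "q": DATE_RANGE,
    "group": "true",
    "group.field": "author_id_fk",
}

print(urlencode(join_params))
print(urlencode(group_params))

# What grouping yields, shown on made-up book documents.
books = [
    {"id": "b1", "author_id_fk": "a1"},
    {"id": "b2", "author_id_fk": "a2"},
    {"id": "b3", "author_id_fk": "a1"},
]
groups = defaultdict(list)
for book in books:
    groups[book["author_id_fk"]].append(book["id"])
print(dict(groups))  # {'a1': ['b1', 'b3'], 'a2': ['b2']}
```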



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Facets-for-Child-Documents-tp4163592p4163751.html


Large scale Update of solr indexed documents

2014-12-17 Thread atawfik
Hi all,

I have a scenario where I need to generate summaries of indexed documents.
I initially thought I should do that in Nutch, because I am using Nutch to
push documents to Solr. However, I will need some statistics about terms
and documents, so I would have to duplicate the analysis in Nutch.
Therefore, Nutch is not the right place to handle this.

I ended up with two potential solutions. The first is to use Solr. However,
I am not sure how to handle that. 

The second solution is to read directly from the Lucene index, access
whatever statistics I need, and then generate the summary.
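
One common way to turn term and document statistics into a summary is extractive sentence scoring with TF-IDF. This is one possible technique, not the approach settled on in this thread. The minimal sketch below assumes the document frequencies and total document count have already been read from the index. In Lucene, `IndexReader.docFreq(Term)` and `IndexReader.numDocs()` provide these. The naive tokenizer and the statistics in the usage example are hypothetical stand-ins for the real analysis chain and index.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive stand-in for the field's real analysis chain.
    return re.findall(r"\w+", text.lower())


def idf(term: str, doc_freq: dict[str, int], num_docs: int) -> float:
    # Smoothed IDF; terms missing from doc_freq are treated as df = 0.
    return math.log((num_docs + 1) / (doc_freq.get(term, 0) + 1)) + 1.0


def summarize(text: str, doc_freq: dict[str, int], num_docs: int,
              k: int = 2) -> list[str]:
    """Return the k highest-scoring sentences, kept in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tf = Counter(tokenize(text))  # term frequencies over the whole document

    def score(sentence: str) -> float:
        # Average TF-IDF weight of the sentence's terms.
        terms = tokenize(sentence)
        if not terms:
            return 0.0
        return sum(tf[t] * idf(t, doc_freq, num_docs) for t in terms) / len(terms)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


if __name__ == "__main__":
    # Hypothetical statistics, as if read from the index.
    stats = {"the": 1000, "solr": 5, "index": 50, "lucene": 5}
    print(summarize("Solr Solr index. The the the. Lucene index.", stats, 1000, k=2))
    # → ['Solr Solr index.', 'Lucene index.']
```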

The other challenge is that Solr has around 5 million documents, so the
solution needs to be scalable as well.
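
For scanning all of the roughly 5 million documents through Solr, paging with `start`/`rows` gets slower as the offset grows. Solr's cursor-based paging (`cursorMark`, available since Solr 4.7) is designed for this case. Below is a minimal sketch of the loop. It assumes a hypothetical `fetch(params)` callable that sends the request to `/select` and returns the parsed JSON response. The sort must include the uniqueKey field, assumed here to be `id`. The loop stops when `nextCursorMark` comes back unchanged.

```python
from typing import Callable, Iterator


def iterate_all(fetch: Callable[[dict], dict], query: str = "*:*",
                rows: int = 500) -> Iterator[dict]:
    """Yield every matching document using Solr cursor paging.

    `fetch` is a hypothetical callable that sends `params` to /select
    and returns the parsed JSON response.
    """
    cursor = "*"  # "*" starts a new cursor
    while True:
        params = {
            "q": query,
            "rows": rows,
            "sort": "id asc",  # cursors require a sort on the uniqueKey (assumed "id")
            "cursorMark": cursor,
            "wt": "json",
        }
        response = fetch(params)
        yield from response["response"]["docs"]
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:  # unchanged cursor: no more results
            break
        cursor = next_cursor
```

Each yielded document could then be summarized and written back in batches, so memory use stays bounded regardless of index size.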

Any ideas or thoughts are very much welcome.

Ameer



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Large-scale-Update-of-solr-indexed-documents-tp4174695.html


Highlight documents using group.query?

2018-10-18 Thread atawfik
Hi,

If I am using a group.query to get documents, is there a way to highlight
the documents matching the group.query using that query itself?

If I am not mistaken, Solr currently highlights documents using the main
query passed via the request's q parameter?
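
For reference, Solr's highlighter accepts an `hl.q` parameter that overrides the query used for highlighting, which otherwise defaults to the main `q`. Below is a minimal sketch of such a request. It assumes a single `group.query` whose text is reused as `hl.q`. The query and field names are hypothetical. `hl.q` is set once per request, so by itself it cannot give each of several `group.query` values its own highlighting.

```python
from urllib.parse import urlencode

# Hypothetical group query and highlight field for illustration.
GROUP_QUERY = "title:solr"

# A list of pairs, so several group.query values could be added if needed.
params = [
    ("q", "*:*"),
    ("group", "true"),
    ("group.query", GROUP_QUERY),
    ("hl", "true"),
    ("hl.fl", "title"),
    ("hl.q", GROUP_QUERY),  # highlight with the group query instead of q
]

print(urlencode(params))
```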





--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html