solr export get wrong results

2014-12-26 Thread Sandy Ding
Hi, all

I've recently set up a Solr cluster and found that "export" returns
different results from "select".
I confirmed that the "export" results are wrong by querying the results
manually.
Even simple queries such as the following return different results:

curl "http://localhost:8983/solr/pa_info/select?q=*:*&fl=id&sort=id+desc":

<lst name="responseHeader"><int name="status">0</int><int name="QTime">11</int><lst name="params"><str name="sort">id desc</str><str name="fl">id</str><str name="q">*:*</str></lst></lst><result name="response" *numFound="1197"* start="0">...

curl "http://localhost:8983/solr/pa_info/export?q=*:*&fl=id&sort=id+desc":
{*"numFound":172*, "docs":[..]

I don't have a clue why this happens. Can anyone help?

Best,
Sandy


Re: solr export get wrong results

2014-12-26 Thread Sandy Ding
Thanks for your reply, Jack.

The export result sets are incorrect in the sense that the results don't
match the query at all.
For example, when I query age=20 (age is an int field), the results contain
age=14, 22...
  curl "http://localhost:8983/solr/pa_info/export?q=age:20&fl=id,age"
returns the following (status 0, QTime 5):

id=26650337 age=50
id=26650348 age=14
id=26650351 age=43
id=26650353 age=59
id=26650355 age=52
id=26650357 age=47
id=26650361 age=6
id=26650367 age=7
id=26650372 age=35
id=26650374 age=22


I've read the cwiki document, but I'm still not sure whether export will
return partial results, since the doc says: "It's possible to export fully
sorted result sets using a special rank query parser
<https://cwiki.apache.org/confluence/display/solr/Query+Re-Ranking>
and response
writer <https://cwiki.apache.org/confluence/display/solr/Response+Writers>".
But as you can see from the above example, the results are not just
partial, they are simply wrong.

2014-12-26 20:18 GMT+08:00 Jack Krupansky :

> You neglected to tell us specifically in what way the export result is
> incorrect. Is some of the data missing, duplicated, garbled, or... what?
> Provide an example and be specific about what you think is "wrong" in the
> results.
>
> Have you modified the default solrconfig file?
>
> I notice that you don't have distrib=false on your select, which would make
> your select be from all nodes, while export would only be docs from the
> specific node you sent the request to.
>
> Please confirm whether you have read the doc for the Solr export feature:
> https://cwiki.apache.org/confluence/display/solr/Exporting+Result+Sets
>
>
> -- Jack Krupansky
>
> On Fri, Dec 26, 2014 at 3:58 AM, Sandy Ding 
> wrote:
>
> > Hi, all
> >
> > I've recently set up a solr cluster and found that "export" returns
> > different results from "select".
> > And I confirmed that the "export" results are wrong by manually query the
> > results.
> > Even simple queries as follows will get different results:
> >
> > curl "http://localhost:8983/solr/pa_info/select?q=*:*&fl=id&sort=id+desc":
> >
> > <lst name="responseHeader"><int name="status">0</int><int name="QTime">11</int><lst name="params"><str name="sort">id desc</str><str name="fl">id</str><str name="q">*:*</str></lst></lst><result name="response" *numFound="1197"* start="0">...
> >
> > curl "http://localhost:8983/solr/pa_info/export?q=*:*&fl=id&sort=id+desc":
> > {*"numFound":172*, "docs":[..]
> >
> > Don't have a clue why this happen! Anyone help?
> >
> > Best,
> > Sandy
> >
>

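Jack's point about distrib=false can be checked by sending both requests to
the same node with the same parameters. Here is a minimal Python sketch,
assuming the pa_info core URL from the original post. build_query_url is a
hypothetical helper, not part of any Solr client. It only builds the two
URLs and sends nothing:

```python
from urllib.parse import urlencode


def build_query_url(core_url, handler, **params):
    """Return core_url/handler?params with the parameters URL-encoded."""
    return f"{core_url.rstrip('/')}/{handler}?{urlencode(params)}"


CORE = "http://localhost:8983/solr/pa_info"  # core URL taken from the thread
common = {"q": "*:*", "fl": "id", "sort": "id desc"}

# /select fans out to every shard unless distrib=false is passed.
select_local = build_query_url(CORE, "select", distrib="false", **common)
# /export only answers from the node that receives the request.
export_local = build_query_url(CORE, "export", **common)

print(select_local)
print(export_local)
```

If numFound still differs between the two responses, the cause is more
likely the handler configuration than sharding.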

Re: solr export get wrong results

2014-12-26 Thread Sandy Ding
Hi, Ahmet,

I use libuuid for unique id and I guess there shouldn't be duplicate ids.
Also, the results are not just incomplete, they are screwed.

2014-12-26 20:19 GMT+08:00 Ahmet Arslan :

> Hi,
>
> Two different things :
>
> If you have a unique key defined, documents with the same id overwrite
> each other within a single shard.
>
> Also, unique IDs are expected to be unique across shards.
>
> Ahmet
>


Re: solr export get wrong results

2014-12-28 Thread Sandy Ding
Hi, Joel

Thanks for your reply.
It seems the weird export results occur because I removed the "xsort"
invariant of the export request handler in the default solrconfig.xml to
get CSV-format output.
I don't quite understand the meaning of "xsort", but I removed it because I
always get a JSON response (as you said) with the xsort invariant.
Is there a way to get CSV output using export?
Also, can I get full results from all shards? (I tried to set
"distrib=true" but got "SyntaxError:xport RankQuery is required for xsort:
rq={!xport}", even though I do have rq={!xport} in the export invariants.)


2014-12-27 3:21 GMT+08:00 Joel Bernstein :

> Hi Sandy,
>
> I pulled Solr 4.10.3 to see if I could recreate the issue you are seeing
> with export and I wasn't able to recreate the bug you are seeing. For
> example the following query:
>
> http://localhost:8983/solr/collection1/export?q=join_i:[50 TO 500010]&wt=json&indent=true&sort=join_i+asc&fl=join_i,ShopId_i
>
>
> Brings back the following result:
>
>
> {"responseHeader": {"status": 0}, "response":{"numFound":11,
>
> "docs":[{"join_i":50,"ShopId_i":578917},{"join_i":51,"ShopId_i":294217},{"join_i":52,"ShopId_i":199805},{"join_i":53,"ShopId_i":633461},{"join_i":54,"ShopId_i":472995},{"join_i":55,"ShopId_i":672122},{"join_i":56,"ShopId_i":394637},{"join_i":57,"ShopId_i":446443},{"join_i":58,"ShopId_i":697329},{"join_i":59,"ShopId_i":166988},{"join_i":500010,"ShopId_i":191261}]}}
>
>
> Notice the join_i values are all within the correct range.
>
> If you can post the export handler configuration we should be able to
> see the issue.
>
>
> Joel Bernstein
> Search Engineer at Heliosearch
>
> On Fri, Dec 26, 2014 at 1:50 PM, Joel Bernstein 
> wrote:
>
> > Hi Sandy,
> >
> > The export handler should only return documents in JSON format. The
> > results in your second example are in XML format, so something looks
> > to be wrong in the configuration. Can you post what your solrconfig
> > looks like?
> >
> > Joel
> >
> > Joel Bernstein
> > Search Engineer at Heliosearch
> >
> > On Fri, Dec 26, 2014 at 12:43 PM, Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >
> >> I think you missed a very important part of Jack's reply:
> >>
> >> bq: I notice that you don't have distrib=false on your select, which
> >> would make your select be from all nodes, while export would only be
> >> docs from the specific node you sent the request to.
> >>
> >> And from the Reference Guide on export
> >>
> >> bq: The initial release treats all queries as non-distributed
> >> requests. So the client is responsible for making the calls to each
> >> Solr instance and merging the results.
> >>
> >> So the export statement you're sending is _only_ exporting the results
> >> from the shard on 8983 and completely ignoring the other (6?) shards,
> >> whereas the query you're sending is getting the results from all the
> >> shards.
> >>
> >> As Jack said, add &distrib=false to the query, send it to the same
> >> shard you send the export command to and the results should match.
> >>
> >> Also, be sure your configuration for the /select handler doesn't have
> >> any additional default parameters that might alter the results, but I
> >> doubt that's really a problem here.
> >>
> >> Best,
> >> Erick
> >>
> >> On Fri, Dec 26, 2014 at 7:02 AM, Ahmet Arslan wrote:
> >> > Hi,
> >> >
> >> > Do you have any custom Solr components deployed? Maybe a custom
> >> > response writer?
> >> >
> >> > Ahmet

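Since the export handler returns JSON only, as Joel notes above, CSV output
has to be produced on the client for now. Here is a minimal Python sketch
that uses the first two docs from Joel's sample response. docs_to_csv is a
hypothetical helper, and it assumes single-valued fields (a multi-valued
field would be written as the text of a Python list):

```python
import csv
import io


def docs_to_csv(docs, fields):
    """Render a list of Solr docs (dicts) as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=fields, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for doc in docs:
        # Fields missing from a doc are written as empty cells.
        writer.writerow(doc)
    return buf.getvalue()


# The "docs" array of an /export response, as in Joel's example.
sample_docs = [
    {"join_i": 50, "ShopId_i": 578917},
    {"join_i": 51, "ShopId_i": 294217},
]
csv_text = docs_to_csv(sample_docs, ["join_i", "ShopId_i"])
print(csv_text, end="")
```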
Re: solr export get wrong results

2015-01-03 Thread Sandy Ding
Thanks a lot for your help, Joel.
Just wondering, why does "export" have such limitations? It uses the same
query handler as "select", doesn't it?

2014-12-31 10:28 GMT+08:00 Joel Bernstein :

> For the initial release only JSON output format is supported with the
> /export feature. Also there is no built-in distributed support yet. Both of
> these features are likely to follow in future releases.
>
> For the initial release you'll need a client that can handle the JSON
> format and distributed logic. The Heliosearch project includes a client
> called CloudSolrStream that you can use for this purpose. Here are two
> links to get started with CloudSolrStream:
>
>
> https://github.com/Heliosearch/heliosearch/blob/helio_4_10/solr/solrj/src/java/org/apache/solr/client/solrj/streaming/CloudSolrStream.java
> http://heliosearch.org/streaming-aggregation-for-solrcloud/
>
>
>
>
>
> Joel Bernstein
> Search Engineer at Heliosearch
>

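Joel's reply says the client has to handle the distributed logic itself:
call /export on each node and merge the results. Here is a rough Python
sketch of that idea. It is not CloudSolrStream, and both function names are
hypothetical. It assumes every node is queried with the same sort, that the
sort field is included in fl, and that each node's docs fit in memory:

```python
import heapq
import json
from operator import itemgetter
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_export_docs(core_url, params):
    """Call /export on one node and return the docs from its JSON response."""
    with urlopen(f"{core_url}/export?{urlencode(params)}") as resp:
        return json.load(resp)["response"]["docs"]


def merge_shard_docs(per_shard_docs, sort_field, descending=False):
    """Lazily merge per-shard doc lists that are each sorted on sort_field."""
    return heapq.merge(
        *per_shard_docs, key=itemgetter(sort_field), reverse=descending
    )


# Made-up per-shard results, each already sorted by join_i ascending.
shard_a = [{"join_i": 50}, {"join_i": 53}]
shard_b = [{"join_i": 51}, {"join_i": 500010}]
merged = list(merge_shard_docs([shard_a, shard_b], "join_i"))
print([doc["join_i"] for doc in merged])
```

In real use, per_shard_docs would be built by calling fetch_export_docs once
per node with identical q, fl and sort parameters.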
How to limit the number of result sets of the 'export' handler

2015-01-05 Thread Sandy Ding
Using rows=xxx doesn't seem to work.
Is there a way to do this?


Re: How to limit the number of result sets of the 'export' handler

2015-01-06 Thread Sandy Ding
Thanks Alexandre.
I actually need the whole result set, but it is large (perhaps 10m-100m) and
I find select is slow.
How does export differ from select, other than that select makes distributed
requests and merges the results?
Will select with ‘distrib=false’ perform comparably to export?


2015-01-06 20:55 GMT+08:00 Alexandre Rafalovitch :

> Export was specifically designed to get everything which is very
> expensive otherwise.
>
> If you just want the subset, you might be better off with normal
> queries and/or with deep paging (cursor).
>
> Regards,
>Alex.
> 
> Sign up for my Solr resources newsletter at http://www.solr-start.com/
>
>
> On 6 January 2015 at 00:30, Sandy Ding  wrote:
> > Using rows=xxx doesn't seem to work.
> > Is there a way to do this?
>

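The deep paging Alexandre mentions uses the cursorMark parameter of /select.
Here is a minimal Python sketch, assuming the sort includes the uniqueKey
field (taken to be "id" here) as cursors require. iter_all_docs and
fetch_json are hypothetical helper names:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_json(url):
    """GET a URL and decode the JSON body."""
    with urlopen(url) as resp:
        return json.load(resp)


def iter_all_docs(select_url, params, rows=1000, fetch=fetch_json):
    """Yield every matching doc by following nextCursorMark page by page."""
    cursor = "*"  # "*" asks Solr for the first page
    while True:
        page_params = dict(params, rows=rows, cursorMark=cursor, wt="json")
        page = fetch(f"{select_url}?{urlencode(page_params)}")
        yield from page["response"]["docs"]
        next_cursor = page["nextCursorMark"]
        if next_cursor == cursor:
            return  # an unchanged cursor means there are no more results
        cursor = next_cursor


# Usage (not run here):
# for doc in iter_all_docs("http://localhost:8983/solr/pa_info/select",
#                          {"q": "*:*", "fl": "id", "sort": "id asc"}):
#     print(doc["id"])
```

The fetch argument exists only so the loop can be exercised without a
running Solr instance.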

indexed and stored fields don't appear in the response

2015-02-09 Thread Sandy Ding
Part of my schema is as follows:



 

When I issue the following command,

curl "http://localhost:8983/solr/pa_info/select?q=*:*&rows=10"

The response is:


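status=0, QTime=16, q=*:*, rows=10, numFound="2831784" start="0" maxScore="1.0"
id=659404797 _version_=1492348841962242048
id=1012335897 _version_=1492348841963290626
id=1783978259 _version_=1492348841963290627
id=1197289303 _version_=1492348841963290629
id=941465004 _version_=1492348841963290630
id=609776598 _version_=1492348841963290632
id=414796514 _version_=1492348841963290633
id=2296978215 _version_=1492348841964339202
id=1007119321 _version_=1492348841964339205
id=2404196933 _version_=1492348841964339208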


Any idea why this happens?

Thanks~


Re: indexed and stored fields don't appear in the response

2015-02-09 Thread Sandy Ding
Sorry about the error, I have copied the wrong schema file :(

The schema.xml file is actually as follows:







The command
curl "http://localhost:8983/solr/pa_info/select?q=bizid:2380505101&rows=10"
returns the following response:


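status=0, QTime=15, q=bizid:2380505101, rows=10
id=287447279 _version_=1492411976981151745
id=782566232 _version_=1492340159745622018
id=857940024 _version_=1492411976981151747
id=1282927396 _version_=1492340159742476289
id=2998557484 _version_=1492340159744573441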


My question is: why don't bizid and tagid appear in the response?


2015-02-09 17:17 GMT+08:00 Anshum Gupta :

> What happens? You seem to be getting back the stored fields for the top 10
> documents.
> What do you want/think should happen?
>
> On Mon, Feb 9, 2015 at 12:56 AM, Sandy Ding 
> wrote:
>
> > Part of my schema is as follows:
> >
> > 
> >  > required="true" multiValued="false" docValues="true" />
> >   > required="false" multiValued="true" docValues="true"/>
> >
> > When I issue the following command,
> >
> > curl "http://localhost:8983/solr/pa_info/select?q=*:*&rows=10";
> >
> > The response is:
> >
> > 
> > status=0, QTime=16, q=*:*, rows=10, numFound="2831784" start="0" maxScore="1.0"
> > id=659404797 _version_=1492348841962242048
> > id=1012335897 _version_=1492348841963290626
> > id=1783978259 _version_=1492348841963290627
> > id=1197289303 _version_=1492348841963290629
> > id=941465004 _version_=1492348841963290630
> > id=609776598 _version_=1492348841963290632
> > id=414796514 _version_=1492348841963290633
> > id=2296978215 _version_=1492348841964339202
> > id=1007119321 _version_=1492348841964339205
> > id=2404196933 _version_=1492348841964339208
> > 
> >
> > Any idea why this happen?
> >
> > Thanks~
> >
>
>
>
> --
> Anshum Gupta
> http://about.me/anshumgupta
>


Re: indexed and stored fields don't appear in the response

2015-02-09 Thread Sandy Ding
Thanks to Anshum and Gora for the suggestions.
I haven't set a default fl in solrconfig.xml, and the documents do contain
the tagid and bizid fields (I've tried both q=*:* and q=bizid:2380505101).
I'll look into the reindex possibility that Gora mentioned.

2015-02-09 18:37 GMT+08:00 Gora Mohanty :

> On 9 February 2015 at 15:50, Anshum Gupta  wrote:
> > Common reasons for that would be
> > 1. Your default fl in solrconfig is set to id, _version_. Can you try
> > explicitly mentioning fl=id,tagid,bizid in the request? Also, it'd be
> > good to look at your solrconfig.xml.
> > 2. Chances are, those documents do not contain those fields to begin
> > with. Both bizid and tagid aren't required fields and so those documents
> > might not even have those. What you've shared confuses me a bit. Does
> > your query contain q=bizid:2380505101 or q=*:* ? If you are querying for
> > bizid, the field (with that value) should be a part of the document and
> > the previous point should be the reason why you're seeing this behavior.
>
> One more possibility to consider is that if you change the schema, you
> would need to reload the Solr container, and reindex.
>
> Regards,
> Gora
>
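
Anshum's two explanations can be told apart by requesting the fields
explicitly with fl=id,tagid,bizid and then counting how many of the returned
documents lack each field. Here is a minimal Python sketch.
missing_field_counts is a hypothetical helper, and the field values in the
sample docs are invented for illustration:

```python
def missing_field_counts(docs, expected_fields):
    """Map each expected field to the number of docs that do not contain it."""
    return {
        field: sum(1 for doc in docs if field not in doc)
        for field in expected_fields
    }


# Invented docs standing in for a response to fl=id,tagid,bizid.
sample_docs = [
    {"id": "287447279", "bizid": "2380505101"},
    {"id": "782566232"},
]
counts = missing_field_counts(sample_docs, ["tagid", "bizid"])
print(counts)
```

A count equal to the number of docs for a field that was requested
explicitly points at the stored data (or a stale index after a schema
change) rather than at a default fl.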