solr export get wrong results
Hi all,

I've recently set up a Solr cluster and found that "export" returns different results from "select". I confirmed that the "export" results are wrong by querying for the results manually.

Even simple queries such as the following return different results:

curl "http://localhost:8983/solr/pa_info/select?q=*:*&fl=id&sort=id+desc"

<response><lst name="responseHeader"><int name="status">0</int><int name="QTime">11</int><lst name="params"><str name="sort">id desc</str><str name="fl">id</str><str name="q">*:*</str></lst></lst><result name="response" numFound="1197" start="0">...

curl "http://localhost:8983/solr/pa_info/export?q=*:*&fl=id&sort=id+desc"

{"numFound":172, "docs":[..]

I don't have a clue why this happens. Can anyone help?

Best,
Sandy
Re: solr export get wrong results
Thanks for your reply, Jack.

The export result sets are incorrect in the sense that the results don't match the query at all. For example, when I query age:20 (age is an int field), the results contain age=14, 22...

curl "http://localhost:8983/solr/pa_info/export?q=age:20&fl=id,age"

returns the following result:

0526650337502665034814266503514326650353592665035552266503574726650361626650367726650372352665037422

I've read the cwiki document, but I'm still not sure whether export is expected to return partial results, since the doc says: "It's possible to export fully sorted result sets using a special rank query parser <https://cwiki.apache.org/confluence/display/solr/Query+Re-Ranking> and response writer <https://cwiki.apache.org/confluence/display/solr/Response+Writers>". But as you can see from the example above, the results are not just partial, they are simply wrong.

2014-12-26 20:18 GMT+08:00 Jack Krupansky:

> You neglected to tell us specifically in what way the export result is
> incorrect. Is some of the data missing, duplicated, garbled, or... what?
> Provide an example and be specific about what you think is "wrong" in the
> results.
>
> Have you modified the default solrconfig file?
>
> I notice that you don't have distrib=false on your select, which would make
> your select be from all nodes, while export would only be docs from the
> specific node you sent the request to.
>
> Please confirm whether you have read the doc for the Solr export feature:
> https://cwiki.apache.org/confluence/display/solr/Exporting+Result+Sets
>
> -- Jack Krupansky
>
> On Fri, Dec 26, 2014 at 3:58 AM, Sandy Ding wrote:
>
> > Hi, all
> >
> > I've recently set up a solr cluster and found that "export" returns
> > different results from "select".
> > And I confirmed that the "export" results are wrong by manually query the
> > results.
> > Even simple queries as follows will get different results:
> >
> > curl "http://localhost:8983/solr/pa_info/select?q=*:*&fl=id&sort=id+desc"
> >
> > <response><lst name="responseHeader"><int name="status">0</int><int name="QTime">11</int><lst name="params"><str name="sort">id desc</str><str name="fl">id</str><str name="q">*:*</str></lst></lst><result name="response" numFound="1197" start="0">...
> >
> > curl "http://localhost:8983/solr/pa_info/export?q=*:*&fl=id&sort=id+desc"
> >
> > {"numFound":172, "docs":[..]
> >
> > Don't have a clue why this happen! Anyone help?
> >
> > Best,
> > Sandy
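Jack's point about distrib=false can be checked with a like-for-like comparison: send both requests to the same node and switch off distribution on the select side. The following is only a minimal sketch under assumptions. The host and collection are the ones quoted in this thread, the helper names are made up for illustration, and wt=json and rows=0 on the select request are added here for convenience.

```python
from urllib.parse import urlencode

# Host and collection as quoted in the thread; adjust for your own node.
BASE_URL = "http://localhost:8983/solr/pa_info"


def build_url(handler, params):
    """Return the request URL for a handler such as 'select' or 'export'."""
    return f"{BASE_URL}/{handler}?{urlencode(params)}"


def comparison_urls(query="*:*", fl="id", sort="id desc"):
    """Build a non-distributed select URL and the matching export URL."""
    common = {"q": query, "fl": fl, "sort": sort}
    # distrib=false keeps select on the node that receives the request,
    # which is what export does in this release according to the replies.
    select_params = dict(common, distrib="false", rows=0, wt="json")
    return build_url("select", select_params), build_url("export", common)


def num_found(payload):
    """Read numFound from a parsed JSON response.

    The thread shows export output both with numFound at the top level
    and nested under "response", so both shapes are accepted here.
    """
    body = payload.get("response", payload)
    return body["numFound"]


select_url, export_url = comparison_urls()
print(select_url)
print(export_url)
```

If the two numFound values still differ when both URLs go to the same node, the difference is not explained by sharding alone and the handler configuration is the next thing to look at.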
Re: solr export get wrong results
Hi Ahmet,

I use libuuid to generate the unique id, so I don't think there are duplicate ids. Also, the results are not just incomplete, they are completely wrong.

2014-12-26 20:19 GMT+08:00 Ahmet Arslan:

> Hi,
>
> Two different things:
>
> If you have a unique key defined, documents with the same id overwrite
> each other within a single shard.
>
> Plus, uniqueIDs are expected to be unique across shards.
>
> Ahmet
>
> On Friday, December 26, 2014 11:00 AM, Sandy Ding wrote:
> Hi, all
>
> I've recently set up a solr cluster and found that "export" returns
> different results from "select".
> And I confirmed that the "export" results are wrong by manually query the
> results.
> Even simple queries as follows will get different results:
>
> curl "http://localhost:8983/solr/pa_info/select?q=*:*&fl=id&sort=id+desc"
>
> <response><lst name="responseHeader"><int name="status">0</int><int name="QTime">11</int><lst name="params"><str name="sort">id desc</str><str name="fl">id</str><str name="q">*:*</str></lst></lst><result name="response" numFound="1197" start="0">...
>
> curl "http://localhost:8983/solr/pa_info/export?q=*:*&fl=id&sort=id+desc"
>
> {"numFound":172, "docs":[..]
>
> Don't have a clue why this happen! Anyone help?
>
> Best,
> Sandy
Re: solr export get wrong results
Hi Joel,

Thanks for your reply.

It seems that the weird export results occur because I removed the "xsort" invariant from the export request handler in the default solrconfig.xml in order to get CSV-format output. I don't quite understand what "xsort" means, but I removed it because, with the xsort invariant in place, I always get a JSON response (as you said).

Is there a way to get CSV output using export? Also, can I get full results from all shards? (I tried setting "distrib=true" but got "SyntaxError:xport RankQuery is required for xsort: rq={!xport}", even though I do have rq={!xport} in the export invariants.)

2014-12-27 3:21 GMT+08:00 Joel Bernstein:

> Hi Sandy,
>
> I pulled Solr 4.10.3 to see if I could recreate the issue you are seeing
> with export and I wasn't able to recreate the bug you are seeing. For
> example the following query:
>
> http://localhost:8983/solr/collection1/export?q=join_i:[50 TO 500010]&wt=json&indent=true&sort=join_i+asc&fl=join_i,ShopId_i
>
> Brings back the following result:
>
> {"responseHeader": {"status": 0}, "response":{"numFound":11,
> "docs":[{"join_i":50,"ShopId_i":578917},{"join_i":51,"ShopId_i":294217},{"join_i":52,"ShopId_i":199805},{"join_i":53,"ShopId_i":633461},{"join_i":54,"ShopId_i":472995},{"join_i":55,"ShopId_i":672122},{"join_i":56,"ShopId_i":394637},{"join_i":57,"ShopId_i":446443},{"join_i":58,"ShopId_i":697329},{"join_i":59,"ShopId_i":166988},{"join_i":500010,"ShopId_i":191261}]}}
>
> Notice the join_i values are all within the correct range.
>
> If you can post the export handler configuration we should be able to
> see the issue.
>
> Joel Bernstein
> Search Engineer at Heliosearch
>
> On Fri, Dec 26, 2014 at 1:50 PM, Joel Bernstein wrote:
>
> > Hi Sandy,
> >
> > The export handler should only return documents in JSON format. The
> > results in your second example are in XML format so something looks to
> > be wrong in the configuration. Can you post what your solrconfig looks
> > like?
> >
> > Joel
> >
> > Joel Bernstein
> > Search Engineer at Heliosearch
> >
> > On Fri, Dec 26, 2014 at 12:43 PM, Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >
> >> I think you missed a very important part of Jack's reply:
> >>
> >> bq: I notice that you don't have distrib=false on your select, which
> >> would make your select be from all nodes, while export would only be
> >> docs from the specific node you sent the request to.
> >>
> >> And from the Reference Guide on export
> >>
> >> bq: The initial release treats all queries as non-distributed
> >> requests. So the client is responsible for making the calls to each
> >> Solr instance and merging the results.
> >>
> >> So the export statement you're sending is _only_ exporting the results
> >> from the shard on 8983 and completely ignoring the other (6?) shards,
> >> whereas the query you're sending is getting the results from all the
> >> shards.
> >>
> >> As Jack said, add &distrib=false to the query, send it to the same
> >> shard you send the export command to and the results should match.
> >>
> >> Also, be sure your configuration for the /select handler doesn't have
> >> any additional default parameters that might alter the results, but I
> >> doubt that's really a problem here.
> >>
> >> Best,
> >> Erick
> >>
> >> On Fri, Dec 26, 2014 at 7:02 AM, Ahmet Arslan wrote:
> >> > Hi,
> >> >
> >> > Do you have any custom solr components deployed? May be custom
> >> > response writer?
> >> >
> >> > Ahmet
> >> >
> >> > On Friday, December 26, 2014 3:26 PM, Sandy Ding <sandy.ding...@gmail.com> wrote:
> >> > Hi, Ahmet,
> >> >
> >> > I use libuuid for unique id and I guess there shouldn't be duplicate
> >> > ids.
> >> > Also, the results are not just incomplete, they are screwed.
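The Reference Guide passage quoted by Erick says the client has to call each Solr instance and merge the results itself. A minimal sketch of the merge step follows, assuming every shard was exported with the same sort on a single field so that each per-shard list is already ordered. The function name and the sample documents are made up for illustration, and fetching the per-shard lists is left out.

```python
import heapq


def merge_shard_docs(shard_docs, sort_field, descending=False):
    """Merge per-shard document lists into one globally sorted list.

    Each inner list must already be sorted on sort_field in the same
    direction, which is what an export request with sort=<field> returns.
    """
    merged = heapq.merge(
        *shard_docs,
        key=lambda doc: doc[sort_field],
        reverse=descending,
    )
    return list(merged)


# Hypothetical output of two shards, both exported with sort=id asc.
shard_a = [{"id": "a"}, {"id": "c"}]
shard_b = [{"id": "b"}, {"id": "d"}]
print(merge_shard_docs([shard_a, shard_b], "id"))
```

heapq.merge walks the inputs lazily, so the same approach can work on streamed per-shard results without loading every shard into memory first; the list() call here is only for printing.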
Re: solr export get wrong results
Thanks a lot for your help, Joel.

Just wondering, why does "export" have such limitations? It uses the same query handler as "select", doesn't it?

2014-12-31 10:28 GMT+08:00 Joel Bernstein:

> For the initial release only JSON output format is supported with the
> /export feature. Also there is no built-in distributed support yet. Both of
> these features are likely to follow in future releases.
>
> For the initial release you'll need a client that can handle the JSON
> format and distributed logic. The Heliosearch project includes a client
> called CloudSolrStream that you can use for this purpose. Here are two
> links to get started with CloudSolrStream:
>
> https://github.com/Heliosearch/heliosearch/blob/helio_4_10/solr/solrj/src/java/org/apache/solr/client/solrj/streaming/CloudSolrStream.java
> http://heliosearch.org/streaming-aggregation-for-solrcloud/
>
> Joel Bernstein
> Search Engineer at Heliosearch
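Since the replies say /export only produces JSON in this release, one way to get CSV is to convert on the client. The sketch below assumes the response shape shown in Joel's example (documents under "response" and then "docs"); it also accepts a top-level "docs" list, as in the first message of the thread. The function name is made up for illustration.

```python
import csv
import io
import json


def export_json_to_csv(payload_text, fields):
    """Convert the JSON text returned by /export into CSV text.

    fields sets the column order; keys not listed in fields are ignored
    and missing keys become empty cells. Multi-valued fields would be
    written as their Python list representation.
    """
    payload = json.loads(payload_text)
    body = payload.get("response", payload)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fields, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for doc in body["docs"]:
        writer.writerow(doc)
    return buffer.getvalue()


# First two documents from Joel's example response.
sample = (
    '{"responseHeader": {"status": 0}, "response": {"numFound": 2, '
    '"docs": [{"join_i": 50, "ShopId_i": 578917}, '
    '{"join_i": 51, "ShopId_i": 294217}]}}'
)
print(export_json_to_csv(sample, ["join_i", "ShopId_i"]))
```

This keeps the export handler configuration untouched, which avoids the situation described earlier in the thread where removing the xsort invariant led to wrong results.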
How to limit the number of result sets of the 'export' handler
Using rows=xxx doesn't seem to work. Is there a way to do this?
Re: How to limit the number of result sets of the 'export' handler
Thanks, Alexandre.

I actually need the whole result set, but it is large (perhaps 10m-100m) and I find select slow. How does export differ from select, other than that select makes distributed requests and merges the results? Will select with "distrib=false" have performance comparable to export?

2015-01-06 20:55 GMT+08:00 Alexandre Rafalovitch:

> Export was specifically designed to get everything which is very
> expensive otherwise.
>
> If you just want the subset, you might be better off with normal
> queries and/or with deep paging (cursor).
>
> Regards,
>    Alex.
>
> On 6 January 2015 at 00:30, Sandy Ding wrote:
> > Using rows=xxx doesn't seem to work.
> > Is there a way to do this?
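The deep paging (cursor) that Alexandre mentions uses Solr's cursorMark parameter: start with cursorMark=*, pass the returned nextCursorMark on the next request, and stop when the mark no longer changes. The following is a rough sketch only. It assumes a Solr version with cursor support, a sort that includes the uniqueKey field, and wt=json; the function names and the default page size are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_page_http(base_url, params):
    """Run one /select request and return the parsed JSON response."""
    with urlopen(f"{base_url}/select?{urlencode(params)}") as response:
        return json.load(response)


def iter_with_cursor(fetch_page, query, sort, rows=1000):
    """Yield every matching document, one cursor page at a time.

    fetch_page takes a dict of request parameters and returns the parsed
    response. sort must include the uniqueKey field, e.g. "id asc".
    """
    cursor = "*"
    while True:
        page = fetch_page(
            {"q": query, "sort": sort, "rows": rows,
             "cursorMark": cursor, "wt": "json"}
        )
        yield from page["response"]["docs"]
        next_cursor = page["nextCursorMark"]
        if next_cursor == cursor:
            # An unchanged mark means the result set is exhausted.
            break
        cursor = next_cursor
```

A possible call, with the host and collection from this thread, would be iter_with_cursor(lambda p: fetch_page_http("http://localhost:8983/solr/pa_info", p), "*:*", "id asc"). To stop after a fixed number of documents, wrap the generator in itertools.islice.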
indexed and stored fields don't appear in the response
Part of my schema is as follows:

When I issue the following command,

curl "http://localhost:8983/solr/pa_info/select?q=*:*&rows=10"

the response is (status=0, QTime=16, q=*:*, rows=10, numFound=2831784, maxScore=1.0):

id          _version_
659404797   1492348841962242048
1012335897  1492348841963290626
1783978259  1492348841963290627
1197289303  1492348841963290629
941465004   1492348841963290630
609776598   1492348841963290632
414796514   1492348841963290633
2296978215  1492348841964339202
1007119321  1492348841964339205
2404196933  1492348841964339208

Any idea why this happens?

Thanks~
Re: indexed and stored fields don't appear in the response
Sorry about the error, I copied the wrong schema file :( The schema.xml file is actually as follows:

The command

curl "http://localhost:8983/solr/pa_info/select?q=bizid:2380505101&rows=10"

returns the following response (status=0, QTime=15, q=bizid:2380505101, rows=10):

id          _version_
287447279   1492411976981151745
782566232   1492340159745622018
857940024   1492411976981151747
1282927396  1492340159742476289
2998557484  1492340159744573441

My question is: why don't bizid and tagid appear in the response?

2015-02-09 17:17 GMT+08:00 Anshum Gupta:

> What happens? You seem to be getting back the stored fields for the top 10
> documents.
> What do you want/think should happen?
>
> On Mon, Feb 9, 2015 at 12:56 AM, Sandy Ding wrote:
>
> > Part of my schema is as follows:
> >
> > required="true" multiValued="false" docValues="true" />
> > required="false" multiValued="true" docValues="true"/>
> >
> > When I issue the following command,
> >
> > curl "http://localhost:8983/solr/pa_info/select?q=*:*&rows=10"
> >
> > The response is (status=0, QTime=16, q=*:*, rows=10, numFound="2831784" start="0" maxScore="1.0"):
> >
> > id          _version_
> > 659404797   1492348841962242048
> > 1012335897  1492348841963290626
> > 1783978259  1492348841963290627
> > 1197289303  1492348841963290629
> > 941465004   1492348841963290630
> > 609776598   1492348841963290632
> > 414796514   1492348841963290633
> > 2296978215  1492348841964339202
> > 1007119321  1492348841964339205
> > 2404196933  1492348841964339208
> >
> > Any idea why this happen?
> >
> > Thanks~
>
> --
> Anshum Gupta
> http://about.me/anshumgupta
Re: indexed and stored fields don't appear in the response
Thanks to Anshum and Gora for the suggestions.

I haven't set a default fl in solrconfig.xml, and the documents do contain the tagid and bizid fields (I've tried both q=*:* and q=bizid:2380505101). I'll look into the reindexing possibility that Gora mentioned.

2015-02-09 18:37 GMT+08:00 Gora Mohanty:

> On 9 February 2015 at 15:50, Anshum Gupta wrote:
> > Common reasons for that would be
> > 1. Your default fl in solrconfig is set to id, _version_. Can you try
> > explicitly mentioning fl=id,tagid,bizid in the request? Also, it'd be
> > good to look at your solrconfig.xml.
> > 2. Chances are, those documents do not contain those fields to begin
> > with. Both bizid and tagid aren't required fields and so those documents
> > might not even have those. What you've shared confuses me a bit. Does
> > your query contain q=bizid:2380505101 or q=*:* ? If you are querying for
> > bizid, the field (with that value) should be a part of the document and
> > the previous point should be the reason why you're seeing this behavior.
>
> One more possibility to consider is that if you change the schema, you
> would need to reload the Solr container, and reindex.
>
> Regards,
> Gora
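Anshum's first suggestion, requesting the fields explicitly with fl=id,tagid,bizid, can be turned into a quick check of which fields actually come back. This is only a sketch under assumptions: the field names are the ones mentioned in the thread, wt=json is added so the response is easy to parse, and both helper names are made up for illustration.

```python
from urllib.parse import urlencode


def select_url_with_fl(base_url, query, fields, rows=10):
    """Build a /select URL that asks for the given fields explicitly."""
    params = {"q": query, "fl": ",".join(fields), "rows": rows, "wt": "json"}
    return f"{base_url}/select?{urlencode(params)}"


def missing_fields(docs, expected):
    """Count, per expected field, how many returned docs lack it."""
    return {
        field: sum(1 for doc in docs if field not in doc)
        for field in expected
    }


url = select_url_with_fl(
    "http://localhost:8983/solr/pa_info",
    "bizid:2380505101",
    ["id", "tagid", "bizid"],
)
print(url)

# A doc shaped like the ones in the reported response: id and _version_ only.
docs = [{"id": "287447279", "_version_": 1492411976981151745}]
print(missing_fields(docs, ["id", "tagid", "bizid"]))
```

If tagid and bizid are still missing from every document with fl set explicitly, a default fl is ruled out, and the reload and reindex that Gora describes become the more likely fix.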