Did you try QueryParsing.toString? As in:
logger.info("db retrieve time=" + (System.currentTimeMillis() - start)
    + ", query=" + QueryParsing.toString(rb.getQuery(), rb.req.getSchema())
    + ", indexIds=" + getIndexIds(rb));
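As I understand it, QueryParsing.toString walks the query and, for each term, asks the schema's FieldType for that field to turn the indexed bytes back into readable text (indexedToReadable), falling back to the raw value otherwise. That is essentially the "walk the query and look up each field's type" approach from the original question, already built in. The stdlib-only sketch below models the idea with hypothetical names (SchemaAwarePrinter, TermClause, a per-field converter map); it is not Solr's code.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class SchemaAwarePrinter {
    // Hypothetical stand-in for one boosted term clause, e.g. collection_id:<indexed>^0.027.
    record TermClause(String field, String indexedValue, float boost) {}

    // Prints each clause as field:value^boost, converting the indexed value with the
    // converter registered for its field (playing the role of a FieldType's
    // indexedToReadable); fields without a converter are printed as-is.
    static String toReadableString(List<TermClause> clauses,
                                   Map<String, Function<String, String>> readableByField) {
        StringBuilder sb = new StringBuilder();
        for (TermClause c : clauses) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            Function<String, String> toReadable =
                    readableByField.getOrDefault(c.field(), Function.identity());
            sb.append(c.field()).append(':').append(toReadable.apply(c.indexedValue()));
            if (c.boost() != 1.0f) {
                sb.append('^').append(c.boost());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical converter: pretend the indexed form "<raw>" maps back to 42.
        Map<String, Function<String, String>> converters =
                Map.of("collection_id", indexed -> "42");
        List<TermClause> clauses = List.of(
                new TermClause("collection_id", "<raw>", 0.027f),
                new TermClause("record_country", "Canada", 0.1f));
        System.out.println(toReadableString(clauses, converters));
    }
}
```

The converter map stands in for the schema; in Solr the lookup would go through the field's type rather than a hand-built map.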
-- Jack Krupansky
-----Original Message-----
From: Andrew Lundgren
Sent: Tuesday, March 19, 2013 11:52 AM
To: solr-user@lucene.apache.org
Subject: RE: Query.toString printing binary in the output...
Thank you for clarifying.
The logging line is this:
logger.info("db retrieve time=" + (System.currentTimeMillis() - start)
    + ", query=" + rb.getQuery().toString().replaceAll("\\p{Cntrl}", "_")
    + ", indexIds=" + getIndexIds(rb));
(The replaceAll call is used to clean out the binary.)
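The workaround in isolation: in java.util.regex, \p{Cntrl} is the POSIX control class [\x00-\x1F\x7F], so each such character becomes '_'. A minimal helper (LogSanitizer and sanitize are hypothetical names) that compiles the pattern once instead of on every log call, which is what String.replaceAll does:

```java
import java.util.regex.Pattern;

public class LogSanitizer {
    // \p{Cntrl} is the POSIX control-character class in java.util.regex: [\x00-\x1F\x7F].
    private static final Pattern CONTROL_CHARS = Pattern.compile("\\p{Cntrl}");

    // Replaces each control character with '_' so tail and similar tools see plain text.
    // Note: tabs and newlines are replaced too, and the original bytes cannot be recovered.
    static String sanitize(String s) {
        return CONTROL_CHARS.matcher(s).replaceAll("_");
    }

    public static void main(String[] args) {
        // Sample string containing backspace, NUL and U+0001, like an encoded term.
        System.out.println(sanitize("collection_id:`\b\0z\1[^0.027"));
    }
}
```

Because different control characters all collapse to '_', this keeps the log readable but loses information; it hides the encoded values rather than decoding them.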
A complete log entry looks like this (I removed some values and replaced them
with Zs):
2013-03-19 01:36:58,648 INFO
[org.apache.solr.handler.component.DatabaseComponent] (http-8080-19) [] []
[] [] [] ip-10-212-91-229/10.212.91.229 db retrieve time=53,
query=+(+(givenname:ZZZZ^1.8 | givenname_standard:ZZZZ^1.08 |
givenname:?^-3.6179998 | givenname:Z^0.17999999) +(surname:ZZZZ^1.8 |
surname_standard:ZZZZ^1.08) +(birth_year:1855^0.495 | birth_year:1856^0.495
| (-marriage_year:[1850 TO 1854]^1.0E-4 -death_year:[1850 TO
1854]^1.0E-4 -residence_year:[1850 TO 1854]^1.0E-4 -other_year:[1850 TO
1854]^1.0E-4 +est_birth_year_range:[180 TO 185]^-1.005))
+((+(birth_place:amherst,1929953 |
birth_place_ancestors:amherst,1929953^0.99 | birth_place:amherst,6279984 |
birth_place_ancestors:amherst,6279984^0.99 |
birth_place:novascotia,1927164^0.7 |
birth_place_ancestors:novascotia,1927164^0.69 |
birth_place:cumberland,1929953^0.7 |
birth_place_ancestors:cumberland,1929953^0.69 | birth_place:canada,-1^0.2))
| (+birth_place:?^-2.01 +((record_place:amherst,1929953^0.7 |
record_place_ancestors:amherst,1929953^0.69299996 |
record_place:amherst,6279984^0.7 |
record_place_ancestors:amherst,6279984^0.69299996 |
record_place:novascotia,1927164^0.48999998 |
record_place_ancestors:novascotia,1927164^0.48299998 |
record_place:cumberland,1929953^0.48999998 |
record_place_ancestors:cumberland,1929953^0.48299998 |
record_place:canada,-1^0.14))))) is_principal:T^0.01
(collection_id:`__z_[^0.027 collection_id:`__nB+^0.026
collection_id:`__Zl_^0.025 collection_id:`__i49^0.024
collection_id:`__Pq%^0.023 collection_id:`__VCS^0.022
collection_id:`__WbH^0.021 collection_id:`__Yu_^0.02
collection_id:`__UF&^0.019 collection_id:`__I2g^0.018
collection_id:`__PP_^0.016999999 collection_id:`__Ysv^0.015999999
collection_id:`__Oe_^0.014999999 collection_id:`__Ysw^0.013999999
collection_id:`__Wi_^0.012999998 collection_id:`__fLi^0.011999998
collection_id:`__XRk^0.010999998 collection_id:`__Uz[^0.009999998
collection_id:`__SE_^0.008999998 collection_id:`__Ysx^0.007999998
collection_id:`__Ysh^0.0069999974 collection_id:`__fLh^0.0059999973
collection_id:`__f _^0.004999997 collection_id:`__`^C^0.003999997
collection_id:`__fKM^0.002999997 collection_id:`__Szo^0.001999997
collection_id:`__f ]^9.99997E-4) record_type:`_____^0.11
record_country:Canada^0.1 record_subcountry:Canada,Nova Scotia^0.1,
indexIds=5649621248770, 5649707485955, 5649774056450, 5650368372995,
5650800358658, 40314148353, 17914147586, 77849158944, 77849158945,
77849158946, 77849158947, 77849158948, 77849158949, 77849158950,
77849158951, 77849158952, 77849158953, 77849158954, 77849158955, 77849158956
We have seen this kind of issue (though in the opposite direction) when
querying with non-encoded ints.
When preparing the query we have to encode the collection IDs like this:
Query q = new TermQuery(new Term(SolrTag.COLLECTION_ID.getName(),
type.readableToIndexed(Integer.toString(collectionId))));
So perhaps I was using the wrong word when I said "encoded"; maybe it should
have been "indexed"? But that word has other meanings and would potentially
be more confusing. These are the Terms printed above that remain in
non-readable form when toString is called.
(Perhaps we should be using something other than readableToIndexed?)
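For context on why the terms look the way they do: assuming collection_id is a Trie int field (the schema isn't shown in the thread), readableToIndexed should produce Lucene's prefix-coded form. As far as I know, Lucene 4.x's NumericUtils writes a full-precision int as a marker byte 0x60 (a backtick) followed by five 7-bit chunks of the value with its sign bit flipped. For any non-negative id below 2^21 the first two chunks are 0x08 and 0x00, both control characters, which would be consistent with every logged term starting with "`__". The sketch below re-implements that scheme for illustration only (PrefixCodedInt and its methods are hypothetical names); real code should go through the field type or NumericUtils instead.

```java
import java.nio.charset.StandardCharsets;

public class PrefixCodedInt {
    // Assumed marker byte for full-precision int terms (0x60, the backtick character);
    // this mirrors what I believe Lucene 4.x NumericUtils uses, not a verified constant.
    static final int SHIFT_START_INT = 0x60;

    // Encodes an int at shift 0: marker byte, then 5 chunks of 7 bits (35 bits cover 32),
    // most significant chunk first, so byte order matches numeric order.
    static byte[] encode(int val) {
        int sortableBits = val ^ 0x80000000; // flip sign bit so negative values sort first
        byte[] term = new byte[6];
        term[0] = (byte) SHIFT_START_INT;
        for (int i = 5; i >= 1; i--) {
            term[i] = (byte) (sortableBits & 0x7f);
            sortableBits >>>= 7;
        }
        return term;
    }

    // Reverses encode(): check the marker, reassemble the chunks, flip the sign bit back.
    static int decode(byte[] term) {
        if (term.length != 6 || term[0] != SHIFT_START_INT) {
            throw new IllegalArgumentException("not a shift-0 prefix-coded int");
        }
        int sortableBits = 0;
        for (int i = 1; i < term.length; i++) {
            sortableBits = (sortableBits << 7) | (term[i] & 0x7f);
        }
        return sortableBits ^ 0x80000000;
    }

    // What the term looks like in a log after the \p{Cntrl} -> '_' workaround.
    static String asLoggedWithWorkaround(byte[] term) {
        return new String(term, StandardCharsets.US_ASCII).replaceAll("\\p{Cntrl}", "_");
    }

    public static void main(String[] args) {
        byte[] term = encode(1234567);
        System.out.println(asLoggedWithWorkaround(term) + " -> " + decode(term));
    }
}
```

For example, encode(1234567) gives the bytes 0x60 0x08 0x00 'K' '-' 0x07, which the workaround logs as `__K-_. Decoding needs the original bytes, so it has to happen before the control characters are replaced, not after.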
Thanks!
-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Monday, March 18, 2013 7:55 PM
To: solr-user@lucene.apache.org
Subject: Re: Query.toString printing binary in the output...
If you simply attach &debug=all to your URL, you should see the query come
back in your response, XML, JSON, whatever. If that also shows bizarre
characters, then that will give you some idea whether it's in Solr or not.
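To try this from code rather than a browser, a request URL can be built as below. The base URL and core name are placeholders, and DebugUrl/selectWithDebug are hypothetical names. If I recall correctly, the debug section of the response then contains both parsedquery (produced by QueryParsing.toString) and parsedquery_toString (plain Query.toString()), which would explain why the debug output looks readable while the plain toString in a log does not.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class DebugUrl {
    // Builds a Solr select URL that asks for debug output (debug=all) as JSON.
    // baseUrl is a placeholder, e.g. "http://localhost:8983/solr/collection1".
    static String selectWithDebug(String baseUrl, String q) {
        return baseUrl + "/select?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8)
                + "&debug=all&wt=json";
    }

    public static void main(String[] args) {
        System.out.println(selectWithDebug("http://localhost:8983/solr/collection1",
                "collection_id:42"));
    }
}
```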
But you haven't given us much info about how or where you call toString. You
may be getting into trouble with character sets (although I'd find that
quite odd, but it's a possibility).
What I'm really finding confusing is that you're mentioning Term alongside
query.toString() (at least that's what I think you're saying), which has
nothing at all to do with Terms; it's just the query string passed in. So
I'm really puzzled as to what you're doing to get this kind of output. It
almost looks like you're trying to print out the _results_ of a query, not
the query.
So some clarification would be helpful...
Best
Erick
On Mon, Mar 18, 2013 at 12:01 PM, Andrew Lundgren <lundg...@familysearch.org>
wrote:
I am sorry, I don't follow what you mean by debug=query. Can you
elaborate on that a bit?
Thanks!
-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Sunday, March 17, 2013 8:09 AM
To: solr-user@lucene.apache.org
Subject: Re: Query.toString printing binary in the output...
Hmmm, without looking at the code: somehow when you specify debug=query you
get readable results, so maybe that code would be a place to start?
And are you looking for the parsed output? Otherwise you could print the
original query.
Not much help....
Erick
On Fri, Mar 15, 2013 at 3:24 PM, Andrew Lundgren
<lundg...@familysearch.org> wrote:
> We use the toString call on the query in our logs. For some numeric
> types, the encoded form of the number is being printed instead of
> the readable form.
>
> This makes tail and some other tools very unhappy...
>
> Here is a partial example of a query.toString() that would have had
> binary in it. As a short term work around I replaced all
> non-printable characters in the string with an '_'.
>
> (collection_id:`__z_[^0.027 collection_id:`__nB+^0.026
> collection_id:`__Zl_^0.025 collection_id:`__i49^0.024
> collection_id:`__Pq%^0.023 collection_id:`__VCS^0.022
> collection_id:`__WbH^0.021 collection_id:`__Yu_^0.02
> collection_id:`__UF&^0.019 collection_id:`__I2g^0.018
> collection_id:`__PP_^0.016999999 collection_id:`__Ysv^0.015999999
> collection_id:`__Oe_^0.014999999 collection_id:`__Ysw^0.013999999
> collection_id:`__Wi_^0.012999998 collection_id:`__fLi^0.011999998
> collection_id:`__XRk^0.010999998 collection_id:`__Uz[^0.009999998
> collection_id:`__SE_^0.008999998 collection_id:`__Ysx^0.007999998
> collection_id:`__Ysh^0.0069999974 collection_id:`__fLh^0.0059999973
> collection_id:`__f _^0.004999997 collection_id:`__`^C^0.003999997
> collection_id:`__fKM^0.002999997 collection_id:`__Szo^0.001999997
> collection_id:`__f ]^9.99997E-4)
>
> But, as you can see, that is less than useful...
>
> I spent some time looking at the source and found that Term does not
> contain the type of the embedded data. Are there any possible solutions
> to this, short of walking the query, getting the type of each field
> from the schema, and writing my own print function?
>
> Thanks!
>
> --
> Andrew
>
>
>
>
> NOTICE: This email message is for the sole use of the intended
> recipient(s) and may contain confidential and privileged information.
> Any unauthorized review, use, disclosure or distribution is
> prohibited. If you are not the intended recipient, please contact
> the sender by reply email and destroy all copies of the original
> message.
>
>