Re: analysis tool vs. reality

Chris Hostetter Thu, 12 Aug 2010 16:56:25 -0700

: Furthermore, I would like to add it's not just the highlight matches
: functionality that is horribly broken here, but the output of the analysis
: itself is misleading.
: 
: let's say I take 'textTight' from the example, and add the following synonym:
: 
: this is broken => broke
: 
: the query time analysis is wrong, as it clearly shows synonymfilter
: collapsing "this is broken" to broke, but in reality with the qp for that
: field, you are gonna get 3 separate tokenstreams and this will never
: actually happen (because the qp will divide it up on whitespace first)
: 
: So really the output from 'Query Analyzer' is completely bogus.


analysis.jsp is only intended to explain *analysis* ... it accurately 
tells you what the <analyzer type="query" ...> for the specified field (or 
fieldType) is going to produce given a hunk of text.

That is what it does, that is all that it does, that is all it has ever 
done, and all it has ever purported to do.

You say it's bogus because the qp will divide on whitespace first -- but 
you're assuming you know what query parser will be used ... the "field" 
query parser (to name one) doesn't split on whitespace first.  That's my 
point: analysis.jsp doesn't make any assumptions about what query parser 
*might* be used, it just tells you what your analyzers do with strings.
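
To make that distinction concrete, here is a toy sketch in Python.  It is 
not Solr or Lucene code: every name in it is hypothetical, and the 
"analyzer" is just a whitespace tokenizer plus the one synonym rule from 
the quoted example.  It only illustrates that the same analyzer can yield 
different terms depending on what string the parser hands it.

```python
# Toy model only: these helpers are hypothetical, not Solr or Lucene APIs.

# The single synonym rule from the example: "this is broken => broke"
SYNONYMS = {("this", "is", "broken"): ["broke"]}


def analyze(text):
    """Stand-in for a query analyzer: tokenize on whitespace, lowercase,
    then collapse any token run that matches a synonym rule."""
    tokens = text.lower().split()
    out = []
    i = 0
    while i < len(tokens):
        for source, target in SYNONYMS.items():
            if tuple(tokens[i:i + len(source)]) == source:
                out.extend(target)
                i += len(source)
                break
        else:
            # No rule matched at this position; keep the token as-is.
            out.append(tokens[i])
            i += 1
    return out


def whitespace_splitting_parser(query):
    """Stand-in for a parser that splits on whitespace first and then
    analyzes each chunk separately."""
    return [tok for chunk in query.split() for tok in analyze(chunk)]


def whole_string_parser(query):
    """Stand-in for a parser that hands the entire string to the analyzer."""
    return analyze(query)
```

In this sketch, analyze("this is broken") collapses the phrase to a single 
token, which corresponds to what analysis.jsp reports.  The 
whitespace-splitting parser never shows the analyzer more than one word at 
a time, so the rule cannot fire; the whole-string parser gets the same 
result as the analyzer alone.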

Saying the output of analysis.jsp is bogus because it doesn't take into 
account QueryParsing is like saying the output of stats.jsp is bogus 
because those are only the stats of the local solr instance on that 
machine, and it doesn't do distributed stats -- yeah that would be nice to 
have, but the stats.jsp never implies that's what it's giving you.

If there are ways we can make the purpose of analysis.jsp more obvious, 
and less misleading for people who don't understand the distinction 
between query parsing and analysis, then I am all for it.  If you really 
believe getting rid of the "highlight" check box is going to help, then 
fine -- but I have yet to see any evidence that people who don't 
understand the relationship between query parsing and analysis are 
confused by the blue boxes.

What people seem to be confused by is when they see the same tokens 
ultimately produced by both the "index" analyzer and the "query" analyzer 
-- it doesn't matter if those tokens are in blue or not, if they see that 
the tokens in the "index" analyzer output are a superset of the tokens in 
the "query" analyzer output then they tend to assume that means searching 
for the string in the "query" box will match documents containing the 
string in the "index" text box.

Getting rid of the blue table cell is just going to make it harder to 
notice matching tokens in the output -- not reduce the confusion when 
those matching tokens exist in the output.
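
As a rough illustration of why matching tokens are not the same thing as a 
matching search, here is a small Python sketch.  The function and the 
token lists are hypothetical: they are not taken from analysis.jsp, and 
the lists are hand-written stand-ins for what the two analyzers and a 
whitespace-splitting parser might produce for the synonym example above.

```python
# Toy model only: names and token lists are hypothetical, not Solr output.

def highlight_matches(index_tokens, query_tokens):
    """Flag each index-side token that also appears in the query-side
    analyzer output.  This is a plain token comparison, nothing more."""
    query_set = set(query_tokens)
    return [(tok, tok in query_set) for tok in index_tokens]


# Assumed analyzer output when each analyzer is given the whole string.
index_tokens = ["broke"]
query_tokens = ["broke"]

# Assumed terms actually searched when a parser splits on whitespace
# before analysis, so the multi-word synonym rule never applies.
parsed_terms = ["this", "is", "broken"]

highlighted = highlight_matches(index_tokens, query_tokens)
overlap = set(parsed_terms) & set(index_tokens)
```

Here "highlighted" marks the one index token as a match, while "overlap" 
is empty.  The comparison of analyzer outputs is accurate for what it 
compares; it just says nothing about what a particular query parser will 
feed to the analyzer.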

My question is: What can we do to make it more clear what the *purpose* of 
analysis.jsp is?  Is there verbiage we can add to the page to make it more 
obvious?

NOTE: I'm not just asking Robert, this is a question for the solr-user 
community as a whole.  I *know* what analysis.jsp is for, I've never been 
confused -- for people who have been confused in the past (or are still 
confused) please help us understand what type of changes we could make to 
the output of analysis.jsp to make its functionality more understandable.



-Hoss
