Re: Apply clustering to field names?

Erick Erickson Fri, 23 Nov 2012 05:16:48 -0800

Per:

1> relevancy sorting on field names: First, you have to define what that
means <G>... Relevant to your query terms? Relevant by the count of field
names in a particular document? Either way, this seems like it's heading
towards some kind of analytics. Take a look at FunctionQueries
(http://wiki.apache.org/solr/FunctionQuery). But remember that these are
generally intended to score documents independently of one another. The
whole notion of corpus-wide scoring is intended for documents, not
necessarily parts of documents.


2> "..... can retrieve field names that are relevant but not in the result
set." Much the same problem. How do you define "relevant"?

When you say "scraping the result set", I assume you're talking about
inspecting the returned results. The problem here is that there's no
information about the documents that didn't make it into the returned list
(unless you're returning _all_ documents by, say, setting &rows=(some
number bigger than maxDocs)). And returning everything won't scale.
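
To make the limitation concrete, here is a minimal sketch (plain Python, not
Solr code) of a 'frequency in result set' tally over scraped results. The
docs list, the helper name and the sample values are made up for
illustration; only the "p_" prefix comes from your description. Any field
name that occurs only in documents outside the returned rows never shows up
in the tally.

```python
from collections import Counter

def field_name_counts(docs, prefix="p_"):
    """Count how many of the returned docs carry each dynamic field name."""
    counts = Counter()
    for doc in docs:
        # Iterating a dict yields its keys, i.e. the field names.
        counts.update(name for name in doc if name.startswith(prefix))
    return counts

# Hypothetical page of returned documents (one dict per document).
docs = [
    {"id": "1", "p_Material": "Wood", "p_length": "20"},
    {"id": "2", "p_Material": "Glass"},
    {"id": "3", "p_Battery_type": "AA"},
]
counts = field_name_counts(docs)
print(counts.most_common())
```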

Your idea of putting all the field names into a single field and faceting
on _that_ would fix this problem, but then you're right back where you
started: you have a zillion values you're trying to show the user, and
that's really hard to navigate. You could supply some set of rules at that
point, though, because you have the complete set of values for all field
names in any document that matches your initial query.
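
A rough sketch of that idea, in Python just to show the shape: at index time
copy each document's dynamic field names into one multi-valued field, then
ask Solr to facet on that field with rows=0 so that only counts come back,
not documents. The field name field_names, the helper names, the base URL
and the sample query are assumptions of mine; q, rows, facet, facet.field,
facet.mincount, facet.limit and wt are the standard request parameters.

```python
from urllib.parse import urlencode

FIELD_NAMES_FIELD = "field_names"  # hypothetical multi-valued string field

def with_field_names(doc, prefix="p_"):
    """Return a copy of doc with its dynamic field names listed in one field."""
    enriched = dict(doc)
    enriched[FIELD_NAMES_FIELD] = sorted(k for k in doc if k.startswith(prefix))
    return enriched

def field_name_facet_url(base_url, query, limit=50):
    """Build a select URL that returns only facet counts over field names."""
    params = [
        ("q", query),
        ("rows", 0),                      # no documents, just the facet counts
        ("facet", "true"),
        ("facet.field", FIELD_NAMES_FIELD),
        ("facet.mincount", 1),            # skip names absent from the matches
        ("facet.limit", limit),           # cap the number of names returned
        ("wt", "json"),
    ]
    return base_url + "/select?" + urlencode(params)

# Hypothetical document and Solr location.
doc = with_field_names({"id": "1", "p_Material": "Wood", "p_length": "20"})
url = field_name_facet_url("http://localhost:8983/solr", "chair")
print(doc[FIELD_NAMES_FIELD])
print(url)
```

Capping facet.limit is one crude example of such a rule: it keeps only the
most frequent field names for the current query.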

Maybe you can back up a step and ask what value you're trying to provide
the user by faceting this way? Perhaps if you state the use-case there
might be a completely different approach...

Best
Erick


On Tue, Nov 20, 2012 at 9:22 AM, Per Fredelius <per.fredel...@gmail.com> wrote:

> (Sorry for spamming) It does not solve the whole issue though. I'm still
> looking for a way to "cluster the terms of a field".
>
>
> 2012/11/20 Per Fredelius <per.fredel...@gmail.com>
>
> > I see now that the TermsComponent
> > <http://wiki.apache.org/solr/TermsComponent> supplies
> > a lot of the data I was looking for.
> >
> > // Per
> >
> >
> > 2012/11/20 Per Fredelius <per.fredel...@gmail.com>
> >
> >> Hello Solr users,
> >>
> >> I'm new at using Solr, working with it for my thesis. I have a
> >> configuration up and running, doing the basic stuff: data import,
> >> running queries from a web front end and some faceting. I may still be
> >> a bit off on the faceting terminology but here goes.
> >>
> >> *What my setup is doing at the moment:*
> >> In addition to a small number of static fields that are common to all
> >> articles there is a large variety of dynamic fields with names such as
> >> "p_Material" or "p_Secondary_color_scheme". This is neatly dealt with in
> >> the schema using dynamic fields with a "p_*" wildcard. And while each
> >> article may have a small number of such properties, say 0 to 20, the
> >> total number of unique properties is quite large, say >1000. For a
> >> single result set of ~20 I sometimes get 100 different fields or more.
> >> Each field can in turn have 100+ possible values throughout the database.
> >>
> >> *What I'm looking to accomplish:*
> >> I want the user to be able to select from relevant properties
> >> and property values, adding them iteratively/interactively to the
> >> query to refine the result set.
> >>
> >> *How I do this at the moment:*
> >> I scrape field names from the result set and display them in a side bar.
> >> The user may click a field name to 'expand it'. When expansion happens,
> >> a new request is sent to Solr, asking for facets of that particular
> >> field (or is it 'values of that particular facet' in IR-speak?), and so
> >> the field UI component is expanded to show the applicable field values.
> >>
> >> """
> >> p_some_property_1
> >> p_Material [expanded]
> >>   >  Concrete
> >>   >  Glass
> >>   >  Wood
> >>   >  Cotton
> >> p_Secondary_color_scheme
> >> p_SomeProperty_31
> >> p_Battery_type
> >> p_length
> >> p_...
> >> ...
> >> """
> >>
> >> *Problems with my current approach:*
> >> 1. I don't have any good idea on how to apply *relevancy sorting on my
> >> list of field names*. Currently the user has to comb through a large
> >> number of field names in a plain list format.
> >>     I only have 'frequency in result set' as a metric at the moment.
> >> There may be better metrics that take the whole document database into
> >> account. Also, I haven't found a reasonable way to query Solr for field
> >> names relevant to a query. Perhaps I'm overlooking some obvious feature
> >> for this use case?
> >>
> >> 2. It would be nice to *apply clustering to the field names*, so that I
> >> may order them into sub directories in the UI and so that I can retrieve
> >> field names that are relevant but not in the result set.
> >>     I have a vague idea how this could be done and it seems to me that
> >> field names would be a very good candidate set for clustering. I could
> >> cluster them according to what documents they appear in. Field names
> >> appearing in the same document would be closely connected. Although I
> >> don't know where to begin in practical terms. What would be the best
> >> approach? Should I make a plugin replacing the default clustering
> >> component? Should I create a separate index, or separate core? I'm
> >> thinking of creating documents for each field name with article
> >> identifiers as document content.
> >>     Has this been done before? Am I heading into a dead end?
> >>
> >> *Late edit: *
> >> Another perhaps obvious addition that I could make would be to store all
> >> field names of each article in a separate 'field names' field, allowing
> >> facet queries "one level up". I'm at the moment uncertain what
> >> possibilities that would allow though.
> >>
> >> // Thanks for any feedback
> >> Per
> >>
> >
> >
>
