Re: convert custom facets to Solr facets...

Erik Hatcher Fri, 02 Feb 2007 21:16:58 -0800


On Feb 2, 2007, at 4:29 PM, Yonik Seeley wrote:

One downside of doing joins is that it makes it pretty hard to
distribute/federate in the future because a document doesn't stand
alone.


The connection between objects is key in our library domain though.

A flat structure for tagging could be to add a
taguser and tag field to the actual document each time a user tagged a document.

I've been contemplating how that would look and work. But the downsides you mention are sorta show-stoppers for our needs:

- filter query results by a constraint tag=foo
fq=tag:foo

You wouldn't be able to query for:
- total number of tags


Couldn't that be the term frequency information?

- items with the largest number of tags

Tag frequency is very important, but having a tag field would give us frequency per tag term. So I don't see this as a problem.

- a tag by a specific user... that would require something like a
phrase match across fields.

This is necessary too. The Collex sidebar allows you to see all objects tagged as "foo" by a specific user.

Downsides of a flat structure:
- you need to reindex the whole document, or have updateable documents
- even with updateable documents, it could be costly to update
 (if people's tagging rate is fairly low, this may not matter much)

I figured this use case would lend itself well to updateable docs, though I've not yet visualized how this would work entirely.


--- Separate tag or collectible objects ---

   - all collected objects

The count of all tagged objects?  how would you do this?

   - all objects collected by erikhatcher

facet.query=C_user:erikhatcher

   - all collected objects with tag "foo"

facet.query=C_tag:foo


The "all objects tagged 'foo' by erikhatcher" query is the holy grail, eh?

- facet by tag
facet.field=C_tag   (this would give counts of *tags* not documents)


These are important numbers too.  But object count per tag is the ideal.

- filter query results by a constraint tag=foo
Not currently doable, would need to build up a filter somehow...
indirectFilter=id:((C_tag:foo).C_uid)

If an indirect approach has enough advantages, we could perhaps come
up with a way to express it.


I like it!
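
To make sure I'm reading the indirectFilter idea right, here is a rough in-memory sketch in Java. It is not Solr code: CDoc, idsTaggedWith, and filter are names made up for illustration, and I'm assuming C_uid holds the id of the collected "A" document. The first step gathers the C_uid values of every "C" doc matching the tag, the second keeps only the "A" results whose id is in that set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of an indirect filter; not part of Solr.
class IndirectFilterSketch {

    // Hypothetical "C" doc: one user's tags on one collected object.
    static final class CDoc {
        final String uid;        // assumed: id of the collected "A" doc (C_uid)
        final String user;       // C_user
        final Set<String> tags;  // C_tag values

        CDoc(String uid, String user, String... tags) {
            this.uid = uid;
            this.user = user;
            this.tags = new HashSet<>(Arrays.asList(tags));
        }
    }

    // Step 1: (C_tag:foo).C_uid -> ids of "A" docs carrying the tag.
    static Set<String> idsTaggedWith(List<CDoc> cDocs, String tag) {
        Set<String> ids = new TreeSet<>();
        for (CDoc c : cDocs) {
            if (c.tags.contains(tag)) {
                ids.add(c.uid);
            }
        }
        return ids;
    }

    // Step 2: keep only the result ids that appear in the allowed set.
    static List<String> filter(List<String> resultIds, Set<String> allowed) {
        List<String> out = new ArrayList<>();
        for (String id : resultIds) {
            if (allowed.contains(id)) {
                out.add(id);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<CDoc> cDocs = Arrays.asList(
            new CDoc("a1", "erikhatcher", "foo"),
            new CDoc("a2", "yonik", "bar"),
            new CDoc("a3", "erikhatcher", "foo", "bar"));
        List<String> results = Arrays.asList("a1", "a2", "a3");
        System.out.println(filter(results, idsTaggedWith(cDocs, "foo"))); // prints [a1, a3]
    }
}
```

A real implementation would presumably intersect DocSets rather than compare string ids, which is where the cross-reference cache comes in.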

My custom facet cache differs from the built-in facets
in that it builds a cross-reference cache from the "C" types to the
"A" types (a JOIN, heh).

What does the cross-reference cache look like when it's built? A simple int[]?

To do this more efficiently, it seems like one would want separate indices
for the A and C docs to keep maxDoc() down.



    // top-level cache: facet name -> (key -> DocSet, or a nested map for "usertag")
    cache = new HashMap<String, Map>();

    // username -> tag -> objects that user tagged with that tag
    Map<String, Map<String, DocSet>> userTagMap = new HashMap<String, Map<String, DocSet>>();

    Map<String, DocSet> tagMap = new HashMap<String, DocSet>();        // tag -> objects
    Map<String, DocSet> userMap = new HashMap<String, DocSet>();       // username -> objects
    Map<String, DocSet> collectedMap = new HashMap<String, DocSet>();
    DocSet collectedSet = new BitDocSet();                             // all collected objects
    collectedMap.put("collected", collectedSet);
    cache.put("tag", tagMap);
    cache.put("usertag", userTagMap);
    cache.put("username", userMap);
    cache.put("collected", collectedMap);

so basically (in Ruby code) I have the following to get a DocSet:

        cache['tag'][tag]

or
        cache['usertag'][username][tag]
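
For anyone who wants to play with the shape of those lookups outside Solr, here is a minimal standalone sketch in Java. It is only an approximation of the cache above: java.util.BitSet stands in for Solr's DocSet, and the class and method names (TagCacheSketch, record, byTag, byUserTag) are hypothetical, not what Collex actually uses.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the cross-reference cache; BitSet replaces DocSet.
class TagCacheSketch {
    // tag -> ids of objects carrying that tag
    private final Map<String, BitSet> tagMap = new HashMap<>();
    // username -> tag -> ids of objects that user tagged with that tag
    private final Map<String, Map<String, BitSet>> userTagMap = new HashMap<>();

    // Register that `username` applied `tag` to the object with id `docId`.
    void record(String username, String tag, int docId) {
        tagMap.computeIfAbsent(tag, t -> new BitSet()).set(docId);
        userTagMap.computeIfAbsent(username, u -> new HashMap<>())
                  .computeIfAbsent(tag, t -> new BitSet())
                  .set(docId);
    }

    // Equivalent of cache['tag'][tag]; empty set when the tag is unknown.
    BitSet byTag(String tag) {
        return tagMap.getOrDefault(tag, new BitSet());
    }

    // Equivalent of cache['usertag'][username][tag]; empty set when absent.
    BitSet byUserTag(String username, String tag) {
        Map<String, BitSet> tags = userTagMap.get(username);
        if (tags == null) {
            return new BitSet();
        }
        return tags.getOrDefault(tag, new BitSet());
    }

    public static void main(String[] args) {
        TagCacheSketch cache = new TagCacheSketch();
        cache.record("erikhatcher", "foo", 1);
        cache.record("erikhatcher", "bar", 2);
        cache.record("yonik", "foo", 3);
        System.out.println(cache.byTag("foo"));                     // prints {1, 3}
        System.out.println(cache.byUserTag("erikhatcher", "foo"));  // prints {1}
    }
}
```

The nested map is what makes the "tagged foo by this user" lookup a direct access instead of a set intersection, at the cost of holding one set per (user, tag) pair.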

Interestingly, I do build a separate RAMDirectory index for another purpose under Collex: agent name lookup, where agents are associated with one or more roles.

What's the id for the C docs?  user catenated with id of the collected
doc, so all tags/comments for a particular user on a particular doc go
in the same C doc?

Yes, a collectable object has a URI in this form: "#{object_id}/#{username}"
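
As a small illustration of that id scheme, here is a sketch in Java. The helper names (CollectableId, build, parse) are made up for this example, and it assumes usernames never contain a slash, so the id is split on the last "/" since the object id itself may be a URI with slashes in it.

```java
// Hypothetical helper for "<object_id>/<username>" ids; not Collex code.
class CollectableId {

    // Build the id of the "C" doc for one user's collection of one object.
    static String build(String objectId, String username) {
        return objectId + "/" + username;
    }

    // Split an id back into { objectId, username } at the last slash.
    // Assumes the username itself contains no '/'.
    static String[] parse(String id) {
        int slash = id.lastIndexOf('/');
        if (slash < 0) {
            throw new IllegalArgumentException("not a collectable id: " + id);
        }
        return new String[] { id.substring(0, slash), id.substring(slash + 1) };
    }

    public static void main(String[] args) {
        String id = build("http://example.org/objects/42", "erikhatcher");
        String[] parts = parse(id);
        System.out.println(parts[0]); // prints http://example.org/objects/42
        System.out.println(parts[1]); // prints erikhatcher
    }
}
```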

Thanks for the feedback thus far. I'm optimistic we'll find a good solution to this. Worst case, I continue to use my hack for mapping associations, but tune the cache generation a bit.


        Erik
