On Feb 3, 2007, at 11:55 AM, Yonik Seeley wrote:
On 2/3/07, Erik Hatcher <[EMAIL PROTECTED]> wrote:
On Feb 2, 2007, at 4:29 PM, Yonik Seeley wrote:
> One downside of doing joins is that it makes it pretty hard to
> distribute/federate in the future because a document doesn't stand
> alone.
The connection between objects is key in our library domain though.
> A flat structure for tagging could be to add a
> taguser and tag field to the actual document each time a user
> tagged a document.
I've been contemplating how that would look and work. But the
downsides you mention are sorta show-stoppers for our needs:
The main one being query or facet by all tags for a specific user?
Yeah, being able to query across all objects, or all objects a user
has collected. Both the all/mine modes are requirements in Collex
(and already there, with cache reloading now being too slow).
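For illustration only, here is a rough Python sketch of the flat structure and of the "all/mine" tag faceting over it. It assumes `taguser` and `tag` are parallel multi-valued fields on each document, which is one reading of the proposal and not something stated in this thread. `add_tag` and `facet_tags` are made-up helper names, and plain dicts stand in for indexed documents.

```python
from collections import Counter

def add_tag(doc, user, tag):
    # Assumption: each tagging event appends one value to each of two
    # parallel multi-valued fields on the document itself.
    doc.setdefault("taguser", []).append(user)
    doc.setdefault("tag", []).append(tag)

def facet_tags(docs, user=None):
    # Count tags across all documents ("all" mode), or only the tags
    # added by one user ("mine" mode). The user/tag pairing is recovered
    # purely from list position.
    counts = Counter()
    for doc in docs:
        for tag_user, tag in zip(doc.get("taguser", []), doc.get("tag", [])):
            if user is None or tag_user == user:
                counts[tag] += 1
    return dict(counts)
```

The per-user case works here only because the sketch can pair the two lists by position; an index that treats each field as an independent bag of values would have no such pairing to lean on.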
I assume an annotation is a comment (like a few sentences)?
If you search on comments, do you just get the comments back with a
pointer to the original doc, or do you get the original doc back?
Currently we don't have a feature to search annotations (yeah, they
are just private user-specific comments). If we searched on it we'd
want both back, the original object and the annotation.
To load a specific object in Collex, I have a special request handler
that pulls the original object by id, and also folds in the tags/
annotation for the username parameter specified in the request.
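A minimal sketch of that fold-in step, in Python rather than as an actual Solr request handler. Everything here is hypothetical: `load_object`, the `objects` and `user_data` dicts, and the field names are stand-ins for whatever the real handler and index use.

```python
def load_object(objects, user_data, object_id, username=None):
    # Copy the stored object so the caller's view can be extended
    # without touching the original record.
    doc = dict(objects[object_id])
    # Look up what this user has attached to this object, if anything.
    mine = user_data.get((username, object_id))
    if mine:
        doc["tags"] = list(mine.get("tags", []))
        doc["annotation"] = mine.get("annotation")
    return doc
```

Calling it without a username, or with a user who has not collected the object, returns the plain object with nothing folded in.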
Storing comments on a document:
- could lead to increased relevancy... all comments from all users
would be considered together for term-freq
Note, we do keep annotations private between users, but tags are public.
- easy to get comments for a list of documents in a single query
- can use lucene syntax across "A" fields like title and commentary:
+title:solr +comments:great
- harder to search for comments from a specific user only
(need sloppy phrase or span queries to do this?)
Keeping things separate between users is important, as well as
folding them together on tags. Again, annotations are currently
private in our system.
Storing comments separately:
- if you search in comments, you get the exact comment that matched...
if you stored all comments on the A doc, you wouldn't know which
matched (but highlighting could help with that).
With annotations being private this won't be an issue. Any search in
annotations would be ANDed with the logged in username. And there is
only one annotation per collected object per user.
It has been discussed to allow the user to set whether an annotation
is public or private.
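As a sketch of the "ANDed with the logged in username" idea, the Python below builds a Lucene-syntax query string with two required clauses. The field names `annotation` and `username` are assumptions and not the actual Collex schema, the helper names are made up, and treating the user's text as a single phrase is a simplification.

```python
def _quote(value):
    # Escape backslashes and double quotes so the value can sit safely
    # inside a quoted Lucene phrase.
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

def annotation_query(text, username):
    # Both clauses are required (+), so a match must contain the text
    # and belong to the logged-in user.
    return f"+annotation:{_quote(text)} +username:{_quote(username)}"
```

For example, `annotation_query("great", "erik")` gives `+annotation:"great" +username:"erik"`.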
- easy to search comments only from a specific user
Do comments need to be included in faceting in any way?
No, not at all. Again, we've not done any annotation searching in
Collex yet. That is a very desirable feature though.
ps: If I'm making less sense than usual, it might just be because it's
the time of the year that kids bring home nasty germs, and I'm feeling
rather fuzzy headed :-)
I know the feeling!
Erik
p.s. If Solr can solve this situation of tagging objects in a
generalizable way, we are really really rocking! Consider Flickr's
latest "machine tags":
<http://www.flickr.com/groups/api/discuss/72157594497877875/>