Re: How to make string fields (solr.StrField) case insensitive for search?

2007-02-03 Thread Mike Klaas

On 2/2/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:


LowerCaseAnalyzer does some tokenization (on whitespace, I think ... I
can't remember off the top of my head). You can also use the
KeywordAnalyzer along with the LowerCaseFilter if you want the entire
string left as a single token (for sorting, perhaps).


Good point--I wasn't aware of the difference in behaviour.

-Mike
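
To make the difference concrete, here is a minimal plain-Python sketch. It is
not Lucene code: the two helper functions are hypothetical, and the whitespace
split simply follows Hoss's guess above about where the tokenizing analyzer
breaks the text. It only shows why one approach yields several tokens and the
other a single sortable token.

```python
def tokenize_and_lowercase(text):
    """Lowercase the text and split it on whitespace: one token per word."""
    return text.lower().split()


def keyword_and_lowercase(text):
    """Lowercase the text but keep the whole string as a single token."""
    return [text.lower()]


title = "The Quick Brown Fox"
print(tokenize_and_lowercase(title))  # several tokens, good for word search
print(keyword_and_lowercase(title))   # one token, good for sorting
```

With a single token per value, a case-insensitive exact match or sort works on
the whole string; with several tokens, each word matches on its own.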


Re: Date ranges

2007-02-03 Thread Michael Kimsal

Thanks Hoss - I'll give that a try - intuitively that sounds like it'll work
(I'm still new to this - it's not second nature to me just yet!)

On 2/3/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:



: However, when I run the following search
: foobar date:[2005-08-01T00:00:00Z TO 2005-08-01T23:59:59Z]

: I get values back that do not have a date value in the 08/01/2005 range.

unless you changed something else to make queries default to "all clauses
mandatory" (aka an "AND" query), that's searching for anything matching
foobar, or anything in that date range.

try this...

+foobar +date:[2005-08-01T00:00:00Z TO 2005-08-01T23:59:59Z]

: Does anyone have any clues/pointers to help me debug this?

adding debugQuery=1 to any URL will help you see exactly what query is
being used, and show you an "explanation" of why each document matched.
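
A small sketch of building such a request URL, assuming a hypothetical Solr
select handler at http://localhost:8983/solr/select (the host, port, and path
are assumptions, not taken from this thread). It marks both clauses as
mandatory with a leading '+' and adds debugQuery=1.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; adjust host, port, and path to your installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_query_url(keyword, field, start, end, debug=True):
    """Build a select URL where both the keyword and the range are mandatory."""
    # A leading '+' marks a clause as required in Lucene query syntax.
    q = f"+{keyword} +{field}:[{start} TO {end}]"
    params = {"q": q}
    if debug:
        params["debugQuery"] = "1"
    # urlencode percent-encodes '+', ':', '[' and ']' so they survive the trip.
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_query_url("foobar", "date",
                      "2005-08-01T00:00:00Z", "2005-08-01T23:59:59Z")
print(url)
```

Encoding matters here: an unencoded '+' in a URL is decoded as a space, which
would silently turn the mandatory clauses back into optional ones.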

:
: Thanks!
:



-Hoss





--
Michael Kimsal
http://webdevradio.com


Re: Date ranges

2007-02-03 Thread Yonik Seeley

On 2/3/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:


: print c.search(q='a_dt:["2005-08-01T00:00:00Z" TO "2005-08-01T23:59:59Z"]')
:
: Note that without the quotes around the date in the range query, I get
: an exception because the ':' causes the value to be truncated by the
: queryparser.

uhh... i'm not sure what you're talking about dude ... it works just fine
for me when i paste...
a_dt:[2005-08-01T00:00:00Z TO 2005-08-01T23:59:59Z]
..into the query form on the admin screen.


Oops, right... the range query works... it's the term query that gives
the exception:
a_dt:2005-08-01T23:59:59Z

-Yonik
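
One way to avoid the truncation in a term query is to backslash-escape the
characters the Lucene query parser treats as syntax. The sketch below is a
hand-written helper (escape_term and term_query are hypothetical names, not
part of any Solr client), and whether the escaped form is then accepted for a
date field in this Solr version is not confirmed in this thread.

```python
import re

# Characters with special meaning in Lucene query parser syntax.
_SPECIAL = re.compile(r'([\\+\-!():^\[\]"{}~*?|&])')


def escape_term(value):
    """Prefix every special character with a backslash."""
    return _SPECIAL.sub(r"\\\1", value)


def term_query(field, value):
    """Build field:value with the value escaped for the query parser."""
    return f"{field}:{escape_term(value)}"


print(term_query("a_dt", "2005-08-01T23:59:59Z"))
```

The field name is left alone, so the only unescaped ':' in the result is the
separator between field and value.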


Re: Custom Tokenizer

2007-02-03 Thread Yonik Seeley

Hmmm, classloader hell...
I assume you are putting your analyzer in solr/lib?

Perhaps try to explode the solr webapp and put your custom analyzer
directly in WEB-INF/lib/

-Yonik

On 2/2/07, Smith,Devon <[EMAIL PROTECTED]> wrote:

Hi,

I'm trying to get a custom tokenizer working, but I'm having some
problems. Per the instructions on various pages [1][2], I've been able
to develop and build the factory and tokenizer. However, when I start
Solr up, I get a stack trace that says "java.lang.NoClassDefFoundError:
org/apache/solr/analysis/BaseTokenizerFactory". That's really confusing.

Any thoughts on what I'm missing/doing wrong?

Devon

[1] http://wiki.apache.org/solr/SolrPlugins
[2] http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

...
Feb 2, 2007 1:40:53 PM org.apache.solr.schema.IndexSchema readConfig
INFO: Schema name=mapstore
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.mortbay.start.Main.invokeMain(Main.java:151)
    at org.mortbay.start.Main.start(Main.java:476)
    at org.mortbay.start.Main.main(Main.java:94)
Caused by: java.lang.NoClassDefFoundError: org/apache/solr/analysis/BaseTokenizerFactory
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:620)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:260)
    at java.net.URLClassLoader.access$000(URLClassLoader.java:56)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:195)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
    at org.mortbay.http.ContextLoader.loadClass(ContextLoader.java:233)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:299)
    at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:594)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
    at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.solr.core.Config.findClass(Config.java:192)
    at org.apache.solr.core.Config.newInstance(Config.java:213)
    at org.apache.solr.schema.IndexSchema.readTokenizerFactory(IndexSchema.java:504)
    at org.apache.solr.schema.IndexSchema.readAnalyzer(IndexSchema.java:478)
    at org.apache.solr.schema.IndexSchema.readConfig(IndexSchema.java:296)
    at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:69)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:191)
    at org.apache.solr.core.SolrCore.getSolrCore(SolrCore.java:172)
    at org.apache.solr.servlet.SolrServlet.init(SolrServlet.java:72)
    at javax.servlet.GenericServlet.init(GenericServlet.java:168)
    at org.mortbay.jetty.servlet.ServletHolder.initServlet(ServletHolder.java:383)
    at org.mortbay.jetty.servlet.ServletHolder.start(ServletHolder.java:243)
    at org.mortbay.jetty.servlet.ServletHandler.initializeServlets(ServletHandler.java:446)
    at org.mortbay.jetty.servlet.WebApplicationHandler.initializeServlets(WebApplicationHandler.java:321)
    at org.mortbay.jetty.servlet.WebApplicationContext.doStart(WebApplicationContext.java:509)
    at org.mortbay.util.Container.start(Container.java:72)
    at org.mortbay.http.HttpServer.doStart(HttpServer.java:708)
    at org.mortbay.util.Container.start(Container.java:72)
    at org.mortbay.jetty.Server.main(Server.java:460)
    ... 7 more

--
Devon Smith <[EMAIL PROTECTED]>
Senior Software Engineer, Office of Research OCLC Online Computer
Library Center, Inc http://www.oclc.org/research/
http://www.oclc.org/research/staff/smith.htm



Re: convert custom facets to Solr facets...

2007-02-03 Thread Yonik Seeley

On 2/3/07, Erik Hatcher <[EMAIL PROTECTED]> wrote:

On Feb 2, 2007, at 4:29 PM, Yonik Seeley wrote:
> One downside of doing joins is that it makes it pretty hard to
> distribute/federate in the future because a document doesn't stand
> alone.

The connection between objects is key in our library domain though.

> A flat structure for tagging could be to add a
> taguser and tag field to the actual document each time a user
> tagged a document.

I've been contemplating how that would look and work.  But the
downsides you mention are sorta show-stoppers for our needs:


The main one being query or facet by all tags for a specific user?
That's doable I think.

I assume an annotation is a comment (like a few sentences)?
If you search on comments, do you just get the comments back with a
pointer to the original doc, or do you get the original doc back?

Storing comments on a document:
- could lead to increased relevancy... all comments from all users would be
  considered together for term-freq
- easy to get comments for a list of documents in a single query
- can use lucene syntax across "A" fields like title, and commentary.
   +title:solr +comments:great
- harder to search for comments from a specific user only
  (need sloppy phrase or span queries to do this?)

Storing comments separately:
- if you search in comments, you get the exact comment that matched... if you
  stored all comments on the A doc, you wouldn't know which matched
  (but highlighting could help with that).
- easy to search comments only from a specific user
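
The two layouts above can be sketched with plain Python dictionaries standing
in for indexed documents. All field names, ids, and user names here are
hypothetical; the point is only to show which questions each layout answers
easily.

```python
# Layout A: comments live on the original document as a multi-valued field.
doc_with_comments = {
    "id": "doc1",
    "title": "solr",
    "comments": ["great tool", "needs joins"],
}

# Layout B: each comment is its own document pointing back at the original.
comment_docs = [
    {"id": "c1", "doc_id": "doc1", "user": "alice", "text": "great tool"},
    {"id": "c2", "doc_id": "doc1", "user": "bob", "text": "needs joins"},
]


def doc_matches(doc, word):
    """Layout A: we learn that the document matched, not which comment or user."""
    return any(word in comment.split() for comment in doc["comments"])


def comments_by_user(comments, user):
    """Layout B: restricting to one user's comments is a simple filter."""
    return [c for c in comments if c["user"] == user]


print(doc_matches(doc_with_comments, "great"))
print([c["id"] for c in comments_by_user(comment_docs, "alice")])
```

Layout A answers "which documents mention this word anywhere" in one step,
while layout B keeps the author and the exact matching comment available.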

Do comments need to be included in faceting in any way?

-Yonik

ps: If I'm making less sense than usual, it might just be because it's
the time of the year that kids bring home nasty germs, and I'm feeling
rather fuzzy headed :-)


JOIN in Solr (was: convert custom facets to Solr facets...)

2007-02-03 Thread Brian Whitman

On Feb 2, 2007, at 4:46 PM, Ryan McKinley wrote:


I would LOVE to see a JOIN in SOLR.

I have an index of artists, albums, and songs.  The artists have lots
of metadata and the songs very little.  I'd love to be able to search
for songs using the artist metadata.  Right now, I have to add all the
metadata for each artist and album to each song.


I share my dream with Ryan. My dream API looks like this, sticking  
with the artist/track metaphor, in which we have metadata, say the  
artist's name, associated to artists that should immediately be  
available in track searches:


schema.xml: 

For track documents, we never  with a filled-in artistName. But  
whenever we query: type:track artistName:aerosmith bpm:[120 TO 140] ,  
the aliasField live searches the solr index for  
artistName:aerosmith&fl=artistID, and limits the rest of search space  
by those results.


And for each result, if artistName is an aliasField, solr searches on  
artistID:X&fl=artistName for each artistID X in the results.


If the field denoted in aliasField is given (as it would be for  
artistName in artist-type documents), the aliasField is disabled for  
that document.


Possibly insane, possibly breaking the intent of Solr, could get slow  
quick, but also very useful.
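
The lookup described above can be sketched in plain Python over an in-memory
list of documents. This is only an illustration of the proposed behaviour, not
Solr code: the sample documents, the search_tracks function, and the field
values are all hypothetical.

```python
index = [
    {"id": "a1", "type": "artist", "artistID": "a1", "artistName": "aerosmith"},
    {"id": "a2", "type": "artist", "artistID": "a2", "artistName": "queen"},
    {"id": "t1", "type": "track", "artistID": "a1", "bpm": 126},
    {"id": "t2", "type": "track", "artistID": "a1", "bpm": 96},
    {"id": "t3", "type": "track", "artistID": "a2", "bpm": 130},
]


def search_tracks(docs, artist_name, bpm_min, bpm_max):
    """Emulate type:track artistName:X bpm:[min TO max] via an alias lookup."""
    # Step 1: resolve the alias, i.e. artistName -> matching artistIDs.
    artist_ids = {d["artistID"] for d in docs
                  if d["type"] == "artist" and d["artistName"] == artist_name}
    # Step 2: limit the track search to those artistIDs.
    hits = [d for d in docs
            if d["type"] == "track"
            and d["artistID"] in artist_ids
            and bpm_min <= d["bpm"] <= bpm_max]
    # Step 3: fill in artistName on each result from its artistID.
    names = {d["artistID"]: d["artistName"] for d in docs if d["type"] == "artist"}
    return [dict(hit, artistName=names[hit["artistID"]]) for hit in hits]


print(search_tracks(index, "aerosmith", 120, 140))
```

Steps 1 and 3 are the extra searches that "could get slow quick": their cost
grows with the number of matching artists and returned tracks.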


-Brian




Re: Custom Tokenizer

2007-02-03 Thread Erik Hatcher


On Feb 3, 2007, at 11:18 AM, Yonik Seeley wrote:

Hmmm, classloader hell...


Yeah, I had a bad feeling about that external lib thing.  It's a holy  
grail to allow dynamic pluggability in Java, but it's much more  
difficult than it perhaps should be.



I assume you are putting your analyzer in solr/lib?

Perhaps try to explode the solr webapp and put your custom analyzer
directly in WEB-INF/lib/


I recommended this to Devon in the #code4lib room as well when he  
mentioned this to me.  I'm curious to see how this resolves, as it  
would be mighty handy to allow external classes but from past  
experiences with classloaders I'd be surprised if this works out as  
well as we'd like.


Erik



Re: convert custom facets to Solr facets...

2007-02-03 Thread Erik Hatcher


On Feb 3, 2007, at 11:55 AM, Yonik Seeley wrote:

On 2/3/07, Erik Hatcher <[EMAIL PROTECTED]> wrote:

On Feb 2, 2007, at 4:29 PM, Yonik Seeley wrote:
> One downside of doing joins is that it makes it pretty hard to
> distribute/federate in the future because a document doesn't stand
> alone.

The connection between objects is key in our library domain though.

> A flat structure for tagging could be to add a
> taguser and tag field to the actual document each time a user
> tagged a document.

I've been contemplating how that would look and work.  But the
downsides you mention are sorta show-stoppers for our needs:


The main one being query or facet by all tags for a specific user?


Yeah, being able to query across all objects, or all objects a user  
has collected.  Both the all/mine modes are requirements in Collex  
(and already there, with cache reloading now being too slow).



I assume an annotation is a comment (like a few sentences)?
If you search on comments, do you just get the comments back with a
pointer to the original doc, or do you get the original doc back?


Currently we don't have a feature to search annotations (yeah, they  
are just private user-specific comments).  If we searched on it we'd  
want both back, the original object and the annotation.


To load a specific object in Collex, I have a special request handler  
that pulls the original object by id, and also folds in the tags/ 
annotation for the username parameter specified in the request.



Storing comments on a document:
- could lead to increased relevancy... all comments from all users would be
  considered together for term-freq


Note, we do keep annotations private between users, but tags are public.


- easy to get comments for a list of documents in a single query
- can use lucene syntax across "A" fields like title, and commentary.
   +title:solr +comments:great
- harder to search for comments from a specific user only
  (need sloppy phrase or span queries to do this?)


Keeping things separate between users is important, as well as  
folding them together on tags.  Again, annotations are currently  
private in our system.



Storing comments separately:
- if you search in comments, you get the exact comment that matched... if you
  stored all comments on the A doc, you wouldn't know which matched
  (but highlighting could help with that).


With annotations being private this won't be an issue.  Any search in  
annotations would be ANDed with the logged in username.  And there is  
only one annotation per collected object per user.


It has been discussed to allow the user to set whether an annotation  
is public or private.



- easy to search comments only from a specific user

Do comments need to be included in faceting in any way?


No, not at all.  Again, we've not done any annotation searching in  
Collex yet.  That is a very desirable feature though.



ps: If I'm making less sense than usual, it might just be because it's
the time of the year that kids bring home nasty germs, and I'm feeling
rather fuzzy headed :-)


I know the feeling!

Erik

p.s. If Solr can solve this situation of tagging objects in a  
generalizable way, we are really really rocking!   Consider Flickr's  
latest "machine tags": 







Re: JOIN in Solr (was: convert custom facets to Solr facets...)

2007-02-03 Thread Walter Underwood
We would never use JOIN. We denormalize for speed. Not a big deal.
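
As a rough illustration of what denormalizing means for the artist/track
example in this thread, the sketch below copies the artist metadata onto each
track before it is indexed. The data and the denormalize function are
hypothetical, not anyone's actual indexing code.

```python
artists = {
    "a1": {"artistName": "aerosmith", "genre": "rock"},
}
tracks = [
    {"id": "t1", "artistID": "a1", "bpm": 126},
]


def denormalize(track, artists_by_id):
    """Return a flat copy of the track with its artist's metadata merged in."""
    flat = dict(track)
    flat.update(artists_by_id[track["artistID"]])
    return flat


flat_docs = [denormalize(t, artists) for t in tracks]
print(flat_docs)
```

Each flat document can then be matched by a single query with no lookup at
search time; the cost is that a change to an artist means re-indexing every
one of that artist's tracks.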

wunder
==
Search Guru, Netflix





Re: JOIN in Solr (was: convert custom facets to Solr facets...)

2007-02-03 Thread Erik Hatcher


On Feb 3, 2007, at 4:54 PM, Walter Underwood wrote:

We would never use JOIN. We denormalize for speed. Not a big deal.


So out of curiosity then, what would your design be for indexing  
"objects" which have attributes that change frequently, where the  
quicker that gets reflected to users the better?  Consider that  
the objects may have large amounts of full text associated with them as  
well.


Erik










Re: JOIN in Solr (was: convert custom facets to Solr facets...)

2007-02-03 Thread Ryan McKinley

On 2/3/07, Walter Underwood <[EMAIL PROTECTED]> wrote:

We would never use JOIN. We denormalize for speed. Not a big deal.



I'm looking at an application where speed is not the only concern.  If
I can remove the need for a 'normalized' and 'denormalized' form it
would be a HUGE win.  Essentially I'd like solr to handle the JOIN
rather than embed an SQL database and keep two databases synchronized.
If this comes at a slight performance hit, that's fine by me!

When speed is the #1 concern, this may not be an option people should
use.  Likewise, I don't think the fact this would break federated
searching should deter the ambition to enable JOIN-like support for
solr.


Re: JOIN in Solr (was: convert custom facets to Solr facets...)

2007-02-03 Thread Ryan McKinley


I share my dream with Ryan. My dream API looks like this, sticking
with the artist/track metaphor, in which we have metadata, say the
...


you share my dream!  that's amazing!

I really hope Erik persists with the JOIN direction...  (it would be
great to get the Lucene in Action guys working on this!)  I just sent
off for the ruby on rails books...  I've been fighting it for a while,
but Erik is working on solr 'flare' which essentially looks *exactly*
like what I'm building for the BPL portal


Re: JOIN in Solr (was: convert custom facets to Solr facets...)

2007-02-03 Thread Ryan McKinley

oops!!!  I meant to reply directly to Brian - an old friend of mine
from graduate school...

next time I'll check the reply-to button more closely.


Re: JOIN in Solr (was: convert custom facets to Solr facets...)

2007-02-03 Thread Erik Hatcher
I'm quite open to NOT having a JOIN in Solr if flattening the model  
still provides the querying capability desired.  I've not fully  
followed the specifics that Yonik has mentioned on this thread, but  
it certainly is the case that denormalizing/flattening our domain  
does not exactly lend itself (easily) to querying exactly how we want.


I implemented the cross-reference caches in Collex knowing it wasn't  
very scalable and was implemented very crudely.  I think the warming  
of these cross-references can be made much smarter (it's braindead  
right now, and builds *everything* over again on a single commit of a  
single object being tagged) - but I've not yet grasped how to be more  
clever with the warming.  If I have the full picture of what changed  
document-wise between the current searcher and the warming one,  
reducing the effort the warmer takes shouldn't be too hard.  Can a  
warming routine know about what changed precisely?
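
A minimal sketch of what a smarter warmer could look like, under one big
assumption: that the set of changed document ids is available, which is
exactly the open question above. The cache here is a plain tag -> set of
document ids map; the function names and sample data are hypothetical and this
is not Collex or Solr code.

```python
def build_cache(docs):
    """Full rebuild: walk every document (what the current warmer does)."""
    cache = {}
    for doc_id, doc in docs.items():
        for tag in doc["tags"]:
            cache.setdefault(tag, set()).add(doc_id)
    return cache


def rewarm(cache, old_docs, new_docs, changed_ids):
    """Incremental update: only touch entries for the documents that changed."""
    warmed = {tag: set(ids) for tag, ids in cache.items()}
    for doc_id in changed_ids:
        # Remove the document from the tags it had before the commit.
        for tag in old_docs.get(doc_id, {}).get("tags", []):
            ids = warmed.get(tag)
            if ids is not None:
                ids.discard(doc_id)
                if not ids:
                    del warmed[tag]
        # Add it under the tags it has after the commit.
        for tag in new_docs.get(doc_id, {}).get("tags", []):
            warmed.setdefault(tag, set()).add(doc_id)
    return warmed


old = {"d1": {"tags": ["poetry"]}, "d2": {"tags": ["poetry", "blake"]}}
new = {"d1": {"tags": ["poetry", "keats"]}, "d2": {"tags": ["poetry", "blake"]}}
print(rewarm(build_cache(old), old, new, {"d1"}) == build_cache(new))
```

The incremental path does work proportional to the number of changed
documents rather than to the size of the whole index.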


In short, the JOIN is merely a means to an end.  If I can get to that  
end with Solr as-is, JOIN?  What JOIN?


Erik


