DIH nested cached entities not working after upgrade

2013-07-20 Thread Zac Smith
I recently upgraded a solr index from 3.5 to 4.3.0. I'm now having trouble with 
the data import handler when using the CachedSqlEntityProcessor.

The first issue I found was that the 'where' option doesn't work anymore. 
Instead I am now using 'cacheKey' and 'cacheLookup'.

My next issue is that if any nested entities are used, the delta import does 
not process more than 2 documents.
e.g. (simplified from my actual import file)



 ...
  







Full imports run fine, but a delta import shows 2 documents as processed and 
then keeps fetching more rows until it eventually runs out of memory. For some 
reason, no additional documents are processed. This was working fine in 3.x 
versions of Solr (up to 3.5).

I'm aware that there have been some significant changes to caching in 
SOLR-2382, but don't think this scenario should be affected. It seems to be 
specifically when there is an entity using caching that contains a sub entity 
that is also using caching.


RE: DIH nested cached entities not working after upgrade

2013-07-21 Thread Zac Smith
Same problem with 4.4.0 RC1.

-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] 
Sent: Sunday, July 21, 2013 5:57 AM
To: solr-user@lucene.apache.org
Subject: Re: DIH nested cached entities not working after upgrade

Could you check with Solr 4.4 RC1?
http://people.apache.org/~sarowe/staging_area/lucene-solr-4.4.0-RC1-rev1504776/solr/

There were some issues with nested keys ${a.b.c} due to the scoping mechanism 
implementation changes. Not a direct match, but might be easier to check this 
first than dig into deeper causes.

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. 
Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


How to specify dismax boost based on rating

2011-12-03 Thread Zac Smith
Hi,

I think this is a pretty common requirement so hoping someone can easily point 
out the solution:

I have an average rating field defined in my schema that is a tdouble and can 
be anything from 0 - 5 (including decimals). I am using dismax so I want to 
define a boost based on the average rating. So the higher the number, the more 
it gets boosted. I am not sure how to specify the boost in my request handler. 
The examples I have found show something like this:
 
rating:1^1.0 rating:2^2.0 rating:3^3.0 rating:4^4.0 rating:5^5.0


But that seems to assume I would be using whole numbers. I need my rating to 
take into account decimal values as well.

Any pointers?
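
One possible direction, offered as a sketch rather than an answer given in this thread: dismax accepts a `bf` (boost function) parameter whose value is a function query, so a numeric field can feed the score directly and decimal ratings need no enumeration. The field names `avg_rating`, `name` and `description` and the weight are assumptions made up for the example.

```python
from urllib.parse import urlencode


def rating_boost_params(user_query, rating_field="avg_rating", weight=2.0):
    """Build dismax request parameters that add a rating-based boost.

    rating_field, the qf fields and weight are hypothetical values.
    """
    return {
        "defType": "dismax",
        "q": user_query,
        "qf": "name description",  # hypothetical searchable fields
        # bf takes a function query; product() scales the field value,
        # so a rating of 4.5 contributes more than a rating of 2.3.
        "bf": f"product({rating_field},{weight})",
    }


params = rating_boost_params("chicken soup")
print(urlencode(params))
```

The boost is added to the relevance score, so the weight usually needs tuning against real queries.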


Multi word synonyms

2012-02-04 Thread Zac Smith
Hi

I have seen several questions on this already but haven't been able to sort my 
issue. My problem is that multi-word synonyms aren't behaving as I would 
expect. I have copied my field type definition at the bottom of this message, 
but the relevant synonym filter is here (used at index time):


Say I have synonyms.txt setup like this:
syrup,sugar syrup,stock syrup

When indexing the text 'syrup', the 3 phrases are treated equivalently as 
expected. I can see this in the Index Analyzer as they all occupy the same term 
position.

But if all of the synonyms are a phrase, it doesn't work. 
e.g. synonyms.txt looks like:
simple syrup,sugar syrup,stock syrup

Now when putting the text 'simple syrup' into the Index Analyzer I can only see 
the original term listed. It is not finding the synonyms.

Anyone know how to fix this?

Zac

Field Type definition:


  




















RE: Multi word synonyms

2012-02-05 Thread Zac Smith
Thanks for the response. This almost worked: I created a new field using the 
KeywordTokenizerFactory as you suggested. The only problem was that searches 
only found documents when quotes were used. 
E.g. 
synonyms.txt setup like this:
simple syrup,sugar syrup,stock syrup

I indexed a document with the value 'simple syrup'. Searches only found the 
document when using quotes:
e.g.
"simple syrup" or "stock syrup" matched
simple syrup (no quotes) did not match

Here is the field I created:


  


  




  

  




Any ideas? Also, I am using dismax and solr 3.5.0.

Thanks
Zac

-Original Message-
From: O. Klein [mailto:kl...@octoweb.nl] 
Sent: Sunday, February 05, 2012 5:22 AM
To: solr-user@lucene.apache.org
Subject: Re: Multi word synonyms

Your query analyzer will tokenize "simple syrup" into "simple" and "syrup"
and won't match on "simple syrup" in the synonyms.txt

So you have to change the query analyzer into KeywordTokenizerFactory as well.

It might be an idea to make a field for synonyms only with this tokenizer and 
another field to search on and use dismax. Never tried this though.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multi-word-synonyms-tp3716292p3717215.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Multi word synonyms

2012-02-05 Thread Zac Smith
Thanks for your response. When I don't include the KeywordTokenizerFactory in 
the SynonymFilter definition, I get additional term values that I don't want.

e.g. synonyms.txt looks like:
simple syrup,sugar syrup,stock syrup

A document with a value containing 'simple syrup' can now be found when 
searching for just 'stock'.

So the problem I am trying to address with KeywordTokenizerFactory, is to 
prevent my multi word synonyms from getting broken down into single words.

Thanks
Zac

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Sunday, February 05, 2012 8:07 AM
To: solr-user@lucene.apache.org
Subject: Re: Multi word synonyms

I'm not quite sure what you're trying to do with KeywordTokenizerFactory in 
your SynonymFilter definition, but if I use the defaults, then the all-phrase 
form works just fine.

So the question is "what problem are you trying to address by using 
KeywordTokenizerFactory?"

Best
Erick





RE: Multi word synonyms

2012-02-07 Thread Zac Smith
I suppose I could translate every user query to include the term with quotes.

e.g. if someone searches for stock syrup I send a query like:
q=stock syrup OR "stock syrup"

Seems like a bit of a hack though, is there a better way of doing this?

Zac
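
The rewrite described above can be sketched client-side as follows. This is only an illustration of the workaround, and whether `OR` is honored as an operator depends on the query parser in use (plain dismax treats it differently from edismax or the standard parser). The function name is made up for the example.

```python
def with_phrase_variant(user_query):
    """Return the query ORed with a quoted copy of itself.

    Single-word queries are returned unchanged, since a phrase adds nothing.
    """
    terms = user_query.split()
    if len(terms) < 2:
        return user_query
    phrase = " ".join(terms)
    # Escape characters that would otherwise end the quoted phrase early.
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    return f'{phrase} OR "{escaped}"'


print(with_phrase_variant("stock syrup"))
```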





RE: Multi word synonyms

2012-02-07 Thread Zac Smith
It doesn't seem to do it for me. My field type is:





 
  
  



I am using edismax and solr 3.5 and multi word values can only be matched when 
using quotes.

-Original Message-
From: O. Klein [mailto:kl...@octoweb.nl] 
Sent: Tuesday, February 07, 2012 12:49 PM
To: solr-user@lucene.apache.org
Subject: RE: Multi word synonyms

Isn't that what autoGeneratePhraseQueries="true" is for?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multi-word-synonyms-tp3716292p3723886.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Multi word synonyms

2012-02-07 Thread Zac Smith
Are you able to explain how I would create another field to fit my scenario?

-Original Message-
From: O. Klein [mailto:kl...@octoweb.nl] 
Sent: Tuesday, February 07, 2012 1:28 PM
To: solr-user@lucene.apache.org
Subject: RE: Multi word synonyms

Well, if you want both multi word and single words I guess you will have to 
create another field :) Or make queries like you suggested.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multi-word-synonyms-tp3716292p3724009.html
Sent from the Solr - User mailing list archive at Nabble.com.




Keyword Tokenizer Phrase Issue

2012-02-09 Thread Zac Smith
Hi,

I have a simple field type that uses the KeywordTokenizerFactory. I would like 
to use this so that values in this field are only matched with the full text of 
the field.
e.g. If I indexed the text 'chicken stock', searches on this field would only 
match when searching for 'chicken stock'. If searching for just 'chicken' or 
just 'stock' there should be no match.

This mostly works, except if there is more than one word in the text I only get 
a match when searching with quotes. e.g.
"chicken stock" (matches)
chicken stock (doesn't match)

Is there any way I can set this up so that I don't have to provide quotes? I am 
using dismax and if I put quotes in it will mess up the search for the rest of 
my fields. I had an idea that I could issue a separate search using the regular 
query parser, but couldn't work out how to do this:
I thought I could do something like this: qt=dismax&q=fish OR 
_query_:ingredient:"chicken stock"
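
A hedged sketch of that idea, not something confirmed in this thread: with the standard Lucene parser as the top-level parser, `_query_:"..."` can wrap sub-queries handled by other parsers, and `v=$param` lets each sub-query read its text from a separate request parameter, which avoids nested quoting. The parameter names `userq` and `ing` and the `qf` fields are assumptions for the example.

```python
def nested_query_params(user_query, ingredient, ingredient_field="ingredient"):
    """Combine a dismax sub-query with an exact field sub-query.

    userq, ing and the qf value are hypothetical names chosen for this sketch.
    """
    q = (
        '_query_:"{!dismax v=$userq}" OR '
        '_query_:"{!field f=%s v=$ing}"' % ingredient_field
    )
    return {
        "defType": "lucene",        # top-level parser must understand _query_
        "q": q,
        "userq": user_query,        # parsed by dismax
        "ing": ingredient,          # analyzed as one value by the field parser
        "qf": "name description",   # hypothetical dismax fields
    }


params = nested_query_params("fish", "chicken stock")
print(params["q"])
```

The `{!field}` parser runs the whole value through the field's analyzer, so a KeywordTokenizer field would see 'chicken stock' as a single term without quotes in the user-facing query.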

I am using solr 3.5.0. My field type is:









Thanks
Zac


RE: Keyword Tokenizer Phrase Issue

2012-02-10 Thread Zac Smith
I have done some further analysis on this and I am now even more confused. When 
I use the Field Analysis tool with the text 'chicken stock' it highlights that 
text as a match.
The dismax query looks ok to me:
+(DisjunctionMaxQuery((ingredient_synonyms:chicken^0.6)~0.01) 
DisjunctionMaxQuery((ingredient_synonyms:stock^0.6)~0.01)) 
DisjunctionMaxQuery((ingredient_synonyms:chicken stock^0.6)~0.01)

Then I have done an explainOther and it shows a failure to meet condition. 
However there does seem to be some kind of match registered:
0.0 = (NON-MATCH) Failure to meet condition(s) of required/prohibited clause(s)
  0.0 = no match on required clause (ingredient_synonyms:chicken^0.6 ingredient_synonyms:stock^0.6)
  0.0650662 = (MATCH) weight(ingredient_synonyms:chicken stock^0.6 in 0), product of:
    0.21204369 = queryWeight(ingredient_synonyms:chicken stock^0.6), product of:
      0.6 = boost
      0.30685282 = idf(docFreq=1, maxDocs=1)
      1.1517122 = queryNorm
    0.30685282 = (MATCH) fieldWeight(ingredient_synonyms:chicken stock in 0), product of:
      1.0 = tf(termFreq(ingredient_synonyms:chicken stock)=1)
      0.30685282 = idf(docFreq=1, maxDocs=1)
      1.0 = fieldNorm(field=ingredient_synonyms, doc=0)

Any ideas?

My dismax handler is setup like this:
  

 defType: dismax
 echoParams: explicit
 tie: 0.01
 qf: ingredient_synonyms^0.6
 pf: ingredient_synonyms^0.6


Zac



RE: Keyword Tokenizer Phrase Issue

2012-02-10 Thread Zac Smith
Thanks, that explains why the individual terms 'chicken' and 'stock' are still 
in the query (and are required).
So I have tried a few things to get around this, but to no avail:

Changed the query analyzer to use the WhitespaceTokenizerFactory with 
autoGeneratePhraseQueries=true. This creates the correct phrase query, but the 
dismax query still requires the individual terms to match ('chicken' and 
'stock'):
+(DisjunctionMaxQuery((ingredient_synonyms:chicken)~0.01) 
DisjunctionMaxQuery((ingredient_synonyms:stock)~0.01)) 
DisjunctionMaxQuery((ingredient_synonyms:"chicken stock"~100)~0.01)

So the next thing I have tried is to remove the individual terms during the 
query analysis. I did this using the ShingleFilterFactory, so my query analyzer 
now looks like this:

   



This leaves the single term 'chicken stock' in the query analysis and the 
dismax query is:
+() DisjunctionMaxQuery((ingredient_synonyms:chicken stock)~0.01)

Which looks OK except for the +(). It looks like it is requiring an empty 
clause.

This seems like a pretty simple requirement - to only have exact matches on 
multi word text. Am I missing something here?

Thanks
Zac


-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: Friday, February 10, 2012 1:50 AM
To: solr-user@lucene.apache.org
Subject: RE: Keyword Tokenizer Phrase Issue

Hi Zac,

The Field Analysis tool (analysis.jsp) does not perform actual query parsing.

One thing to be aware of when using the Keyword Tokenizer at query time: the 
query string (chicken stock) is pre-tokenized on whitespace by the query parser 
before it reaches the keyword tokenizer.

If you use quotes ("chicken stock"), the query parser does not pre-tokenize it, though.
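
The behaviour described above can be pictured with a toy model. This is not Solr code, only a simplified simulation written for illustration: the "parser" splits unquoted input on whitespace and keeps quoted phrases whole, and the "keyword tokenizer" returns whatever chunk it is handed as a single token.

```python
import re


def parser_chunks(query):
    """Split like a whitespace-splitting query parser; quoted phrases stay whole."""
    chunks = []
    for match in re.finditer(r'"([^"]*)"|(\S+)', query):
        quoted, bare = match.group(1), match.group(2)
        chunks.append(quoted if quoted is not None else bare)
    return chunks


def keyword_analyze(chunk):
    """A keyword tokenizer emits its entire input as one token."""
    return [chunk]


def query_terms(query):
    """Terms the field analyzer ends up producing for a raw query string."""
    return [tok for chunk in parser_chunks(query) for tok in keyword_analyze(chunk)]


print(query_terms('chicken stock'))    # ['chicken', 'stock']
print(query_terms('"chicken stock"'))  # ['chicken stock']
```

Only the quoted form hands the analyzer the full string, which is why the indexed term 'chicken stock' matches with quotes and not without.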





RE: Keyword Tokenizer Phrase Issue

2012-02-12 Thread Zac Smith
I have come to the conclusion that this isn't possible due to the way dismax 
queries are created. I found someone else that had the exact same issue last 
year: 
http://lucene.472066.n3.nabble.com/Multi-word-exact-keyword-case-insensitive-search-suggestions-td2246516.html
I believe this makes it impossible to do exact matching on multi word terms 
with dismax.

So I have created two JIRA tickets that hopefully address the issue:
1) a suggested improvement to dismax specific to the KeywordTokenizerFactory: 
https://issues.apache.org/jira/browse/SOLR-3127
2) what I believe is a bug when removing terms from the query: 
https://issues.apache.org/jira/browse/SOLR-3128

Feedback welcome.

Thanks
Zac




Sort by bayesian function for 5 star rating

2012-03-12 Thread Zac Smith
Does anyone have an example formula that can be used to sort by a 5 star rating 
in SOLR?
I am looking at an example on IMDB's top 250 movie list: 

The formula for calculating the Top Rated 250 Titles gives a true Bayesian 
estimate:
 weighted rating (WR) = (v ÷ (v+m)) × R + (m ÷ (v+m)) × C 
where:
R = average for the movie (mean) = (Rating)
v = number of votes for the movie = (votes)
m = minimum votes required to be listed in the Top 250 (currently 3000)
C = the mean vote across the whole report (currently 6.9)
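
As a minimal sketch of the formula above: the first function evaluates WR directly, and the second builds an equivalent Solr function-query string from `sum`, `product` and `div`, which could be used as a sort expression in Solr versions that support sorting by function. The field names `rating` and `votes` are assumptions, and the defaults for m and C are the IMDB figures quoted above, so a 5 star scale would need its own values.

```python
def weighted_rating(rating, votes, min_votes=3000, mean_vote=6.9):
    """WR = (v / (v + m)) * R + (m / (v + m)) * C, with m > 0."""
    total = votes + min_votes
    return (votes / total) * rating + (min_votes / total) * mean_vote


def solr_sort_function(rating_field="rating", votes_field="votes",
                       min_votes=3000, mean_vote=6.9):
    """Build the same formula as a Solr function query (field names hypothetical)."""
    denom = f"sum({votes_field},{min_votes})"
    weighted_item = f"product(div({votes_field},{denom}),{rating_field})"
    weighted_prior = f"product(div({min_votes},{denom}),{mean_vote})"
    return f"sum({weighted_item},{weighted_prior})"


print(weighted_rating(8.0, 3000))
print(solr_sort_function())
```

An item with no votes falls back to C, and as votes grow far beyond m the result approaches the item's own average.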



Using the Data Import Handler with SQLite

2011-04-01 Thread Zac Smith
I hope this question is being directed to the right place ...

I am trying to use SQLite (v3) as a source for the Data Import Handler. I am 
using a SQLite JDBC driver (link below) and this works with only one entity. 
As soon as I add a sub-entity it falls over with a locked DB error: 
"java.sql.SQLException: database is locked".
Now I realize that you can only have one connection open to SQLite at a time. 
So I assume that the first query is leaving a connection open before it moves 
on to the sub-query. I am not sure if the issue would be in the JDBC driver or 
the DIH. It works fine with SQL Server.

Is this a bug? Or something that just isn't possible with SQLite?

Here is a sample of my data config file:

  
  


 




  


SQLite JDBC driver: http://www.zentus.com/sqlitejdbc/


RE: Using the Data Import Handler with SQLite

2011-04-04 Thread Zac Smith
I was able to resolve this issue by using a different JDBC driver: 
http://www.xerial.org/trac/Xerial/wiki/SQLiteJDBC




DIH CachedSqlEntityProcessor null exception

2011-04-13 Thread Zac Smith
I have come across an issue with the DIH where I get a null exception when 
pre-caching entities. I expect my entity to have null values so this is a bit 
of a roadblock for me. The issue was described more succinctly in this 
discussion: 
http://lucene.472066.n3.nabble.com/DataImportHandlerException-when-cache-key-is-null-in-SOLR-1-4-1-td2003059.html

Anyone know anything about this?



Schema Design Question

2011-05-13 Thread Zac Smith
Let's say I have a data model that involves books and bookshelves. I have tens 
of thousands of books and thousands of bookshelves. There is a many-many 
relationship between books & bookshelves. All of the books are indexed by SOLR.

I need to be able to query SOLR and get all the books for a given bookshelf. I 
see two schema design options here:


1)  Each book has a multi-value field that contains a list of all the 
bookshelf IDs. Many books will have thousands of bookshelf IDs. In this case 
the query is simple: I just send Solr the bookshelf ID.

2)  I send Solr a query with each book on the bookshelf e.g. 
q=book_id:(1+OR+2+OR+3 ). Many bookshelves will have thousands of book IDs, 
so the query can get rather large.

Right now I am using option 2 and it seems to be working fine. I have had to 
crank 'maxBooleanClauses' right up but it does seem to be pretty fast.

Anyone have an opinion?
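
For comparison, the two options can be sketched as query builders. The field name `bookshelf_ids` is an assumption (only `book_id` appears above), and the clause check uses 1024, the stock default for 'maxBooleanClauses', as an assumed limit.

```python
def option1_query(shelf_id, field="bookshelf_ids"):
    """Option 1: one term lookup against a multi-valued field (name hypothetical)."""
    return f"{field}:{shelf_id}"


def option2_query(book_ids, field="book_id", max_clauses=1024):
    """Option 2: OR together every book ID on the shelf.

    max_clauses mirrors maxBooleanClauses; 1024 is assumed here.
    """
    ids = [str(b) for b in book_ids]
    if len(ids) > max_clauses:
        raise ValueError(f"{len(ids)} clauses exceeds the limit of {max_clauses}")
    joined = " OR ".join(ids)
    return f"{field}:({joined})"


print(option1_query(42))
print(option2_query([1, 2, 3]))
```

Option 1 keeps the query constant in size and moves the cost to index time, while option 2 keeps documents small and grows the query with the shelf.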



RE: Schema Design Question

2011-05-13 Thread Zac Smith
Thanks that looks interesting. Don't think it helps my situation though as I 
would have to index all the bookshelves and will still end up having to put 
thousands of Book ID values in a multi-value field.

I guess the question I have is: Is it more appropriate to load a multi-value 
field with a large number of values or should you pass a large number of values 
in as a Boolean clause?

Zac

-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] 
Sent: Friday, May 13, 2011 10:37 AM
To: solr-user@lucene.apache.org
Subject: Re: Schema Design Question

Hi Zac,

Solr 4.0 (trunk) has support for relationships/JOIN.  Have a look: 
http://search-lucene.com/?q=solr+join

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem 
search :: http://search-lucene.com/





RE: Schema Design Question

2011-05-15 Thread Zac Smith
Ok thanks for the responses. My option #2 will be easier to implement than 
having the new doc with combinations so will give it a try. But that has opened 
my eyes to different possibilities!

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Sunday, May 15, 2011 8:55 AM
To: solr-user@lucene.apache.org
Subject: Re: Schema Design Question

Of your first two options, I'd go with a multi-valued field for each book (1).

But kenf_nc's suggestion is a good one too.

On Sun, May 15, 2011 at 3:54 AM, kenf_nc  wrote:
> create a separate document for each book-bookshelf combination.
> doc 1 = book 1,shelf 1
> doc 2 = book 1,shelf 3
> doc 3 = book 2,shelf 1
> etc.
>
> then your queries are q=book_id   to get all bookshelfs a given book 
> is on or q=shelf_id to get all books on a given bookshelf.
>
> Biggest problem people face with Solr schema design is thinking either 
> object orientedly or RDBMs orientedly. You need to think differently.
> Solr/Lucene find text and they find it very fast over huge amounts of data.
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Schema-Design-Question-tp2939045p2942809.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Spatial Solr 3.1: filter by viewport

2011-05-22 Thread Zac Smith
How would I specify a filter that covered a rectangular viewport? I have 4 
coordinate points for the corners and I want to return everything inside that 
area.
My first naive attempt was this:
q=*:*&fq=coords:[44.119141,-125.948638 TO 47.931066,-111.029205]

At first this seems to work OK, except where the viewport crosses over a point 
where the longitude goes from a positive value to a negative value.

Thanks
Zac
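
One way to handle the wrap-around case, sketched under assumptions rather than taken from a reply in this thread: if the viewport's west edge is numerically greater than its east edge, it crosses the +180/-180 line, and it can be covered by two range filters joined with OR. The sketch assumes `coords` accepts the "lower-left TO upper-right" range form used above.

```python
def viewport_filter(south, west, north, east, field="coords"):
    """Build an fq value for a lat/lon viewport, splitting at +180/-180 if needed."""

    def box(w, e):
        # Range runs from the lower-left corner to the upper-right corner.
        return f"{field}:[{south},{w} TO {north},{e}]"

    if west <= east:
        return box(west, east)
    # The viewport wraps past the antimeridian: cover it with two boxes.
    return f"{box(west, 180)} OR {box(-180, east)}"


print(viewport_filter(44.119141, -125.948638, 47.931066, -111.029205))
print(viewport_filter(-20, 170, -10, -170))
```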


RE: Spatial Solr 3.1: filter by viewport

2011-05-23 Thread Zac Smith
It looks like someone asked this question a few months ago and didn't get an 
answer either ... 
http://lucene.472066.n3.nabble.com/Spatial-Solr-Representing-a-bounding-box-and-searching-for-it-tc2447262.html#none

I really thought this would be a pretty simple question to answer? Is there no 
way to specify the exact coordinates of the bounding box - 
http://wiki.apache.org/solr/SpatialSearch#bbox_-_Bounding-box_filter ??


Zac



RE: newbie question for DataImportHandler

2011-05-24 Thread Zac Smith
Sounds like you might not be committing the delete. How are you deleting it?
If you run the data import handler with clean=true (which is the default) it 
will delete the data for you anyway so you don't need to delete it yourself.

Hope that helps.
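
For reference, a full import with an explicit clean and commit can be requested as sketched below. The base URL and the `/dataimport` handler path are assumptions about a typical setup; `command`, `clean` and `commit` are the Data Import Handler request parameters.

```python
from urllib.parse import urlencode


def full_import_url(base="http://localhost:8983/solr", handler="/dataimport",
                    clean=True, commit=True):
    """Build a DIH full-import URL (base and handler path are hypothetical)."""
    params = {
        "command": "full-import",
        "clean": str(clean).lower(),    # delete existing documents first
        "commit": str(commit).lower(),  # commit when the import finishes
    }
    return f"{base}{handler}?{urlencode(params)}"


print(full_import_url())
```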

-Original Message-
From: antoniosi [mailto:antonio...@gmail.com] 
Sent: Tuesday, May 24, 2011 4:43 PM
To: solr-user@lucene.apache.org
Subject: newbie question for DataImportHandler

Hi,

I am new to Solr; I apologize in advance if this is a stupid question.

I have created a simple database, with only 1 table with 3 columns, id, name, 
and last_update fields.

I populate the database with 1 million test rows.
I run solr, go to the data import handler development console and do a full 
import. I use the "Luke" tool to look at the content of the lucene index.

This all works fine so far.

I remove all the 1 million rows from my table and populate the table with 
another million rows of data.
I remove the index that Solr previously created. I restart Solr and go to the 
data import handler development console and do the full import again.

I use the "Luke" tool to look at the content of the lucene index. However, I am 
seeing the old data in my new index.

Does Solr keep a cached copy of the index somewhere?

I hope I have described my problem clearly.

Thanks in advance.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/newbie-question-for-DataImportHandler-tp2982277p2982277.html
Sent from the Solr - User mailing list archive at Nabble.com.