Re: REST API Alternative to admin/luke

2014-12-05 Thread Ahmet Arslan
Hi,

I use it in production with the numTerms=0 parameter set.
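For reference, a request of that shape looks like this (host and core name are 
placeholders):

    http://localhost:8983/solr/mycore/admin/luke?numTerms=0&wt=json

numTerms=0 skips computing the top terms per field, which keeps the call cheap 
on large indexes.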

Ahmet


On Thursday, December 4, 2014 10:48 PM, Constantin Wolber 
 wrote:
Hi,

Basically, using an endpoint in the admin section is something that makes me 
wonder whether there is an alternative.

And it would have been nice to have a straightforward, resource-oriented 
approach, which the Luke endpoint certainly is not. 

Regards 

Constantin




> On 04.12.2014 at 20:46, Chris Hostetter wrote:
> 
> 
> : I did not overlook a feature of the REST endpoints. So probably we will 
> : stick with the admin/luke endpoint to achieve our goal.
> 
> Ok ... i mean ... yeah -- the /admin/luke endpoint exists to tell you what 
> fields are *actually* in your index, regardless of who put them there or 
> how they got there.
> 
> the Schema API is for letting you do CRUD operations on your *schema* - 
> even if those fields (or dynamic fields patterns) aren't used in your 
> index.
> 
> so based on what you said your goal is, /admin/luke is exactly what you 
> want.
> 
> but since you already knew about /admin/luke, and already knew it gave you 
> exactly what you wanted, i'm still not sure i understand what prompted you 
> to ask your question about trying to find a different way of doing this... ?
> 
> -Hoss
> http://www.lucidworks.com/


Re: Anti-Pattern in lucent-join jar?

2014-12-05 Thread Mikhail Khludnev
Thanks Roman! Let's expand on it for the sake of completeness.
Such an issue is not possible in Solr, because caches are associated with the
searcher. As long as you follow this design (see Solr's userCache) and don't
update what has been cached, there is no chance to shoot yourself in the foot.
There were a few caches inside Lucene (the old FieldCache,
CachingWrapperFilter, ExternalFileField, etc.), but they are properly mapped
onto segment keys, which excludes such leakage across different
searchers.

On Fri, Dec 5, 2014 at 6:43 AM, Roman Chyla  wrote:

> +1, additionally (as it follows from your observation) the query can get
> out of sync with the index if, e.g., it was saved for later use and run
> against a newly opened searcher
>
> Roman
> On 4 Dec 2014 10:51, "Darin Amos"  wrote:
>
> > Hello All,
> >
> > I have been doing a lot of research in building some custom queries and I
> > have been looking at the Lucene Join library as a reference. I noticed
> > something that I believe could actually have a negative side effect.
> >
> > Specifically I was looking at the JoinUtil.createJoinQuery(…) method and
> > within that method you see the following code:
> >
> > TermsWithScoreCollector termsWithScoreCollector =
> > TermsWithScoreCollector.create(fromField,
> > multipleValuesPerDocument, scoreMode);
> > fromSearcher.search(fromQuery, termsWithScoreCollector);
> >
> > As you can see, when the JoinQuery is being built, the code is executing
> > the query that it wraps with its own collector to collect all the
> scores.
> > If I were to write a query parser using this library (which someone has
> > done here), doesn’t this reduce the benefit of the SOLR query cache? The
> > wrapped query is being executed when the Join Query is being
> constructed,
> > not when it is executed.
> >
> > Thanks
> >
> > Darin
> >
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





RE: Proximity Search with Grouping

2014-12-05 Thread Allison, Timothy B.
Yes, if you use the ComplexPhraseQueryParser: 
http://wiki.apache.org/solr/ComplexPhraseQueryParser .
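For example, something along these lines should work (an untested sketch; the 
field name "text" is a placeholder):

    q={!complexphrase}text:"A B (C D)"~19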





-Original Message-
From: Emre ERKEK [mailto:h.emre.er...@gmail.com] 
Sent: Friday, December 05, 2014 2:42 AM
To: solr
Subject: Proximity Search with Grouping

Hi All,

Can I use proximity search with grouping like this "A B (C D)"~19   ?

Thanks,
Emre


RE: Proximity Search with Grouping

2014-12-05 Thread Allison, Timothy B.
With updated link (sorry!): 
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser
 

-Original Message-
From: Emre ERKEK [mailto:h.emre.er...@gmail.com] 
Sent: Friday, December 05, 2014 2:42 AM
To: solr
Subject: Proximity Search with Grouping

Hi All,

Can I use proximity search with grouping like this "A B (C D)"~19   ?

Thanks,
Emre


Get the new terms of fields since last update

2014-12-05 Thread lboutros
Dear all,

I would like to get the new terms of fields since the last update (once a week). 
If I retrieve some terms which were already present, that's not a problem (but
terms which did not exist before must be retrieved).

Is there an easy way to do that? 

I'm currently investigating the possibility of creating a CompositeReader on
the new segments (readers?) to read terms from it.

The initial need is to scroll through field dictionaries with a given position.
And some dictionaries can be merged for this visualization...

Ludovic.



-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Get-the-new-terms-of-fields-since-last-update-tp4172755.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Get the new terms of fields since last update

2014-12-05 Thread Erik Hatcher
Interesting problem.   Can you save last week's results and then “diff” them?   
My first thought would be to use the terms component or faceting to get all the 
terms, save that off (in a simple alpha-sorted text file, maybe) and then next 
week do the same thing and diff the files?   The lower-level stuff you 
mentioned sounds kinda scary, and maybe a simpler facility might suffice.
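For example, the terms component can dump a field's full term list in one 
request (a sketch only; core, handler path, and field name are assumptions, and 
the /terms handler must be registered in solrconfig.xml):

    http://localhost:8983/solr/collection1/terms?terms.fl=myfield&terms.limit=-1&wt=json

Save the sorted term list each week, and a plain diff (or comm) of the two 
files yields the new terms.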

Erik



> On Dec 5, 2014, at 7:54 AM, lboutros  wrote:
> 
> Dear all,
> 
> I would like to get the new terms of fields since last update (once a week). 
> If I retrieve some terms which were already present, it's not a problem (but
> terms which did not exist before must be retrieved).
> 
> Is there an easy way to do that ? 
> 
> I'm currently investigating the possibility to create an CompositeReader on
> new Segments (Readers ?) to read terms from it.
> 
> The initial need is to scroll thru field dictionaries with a given position.
> And some dictionaries can be merged for this visualization...
> 
> Ludovic.
> 
> 
> 
> -
> Jouve
> France.
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Get-the-new-terms-of-fields-since-last-update-tp4172755.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Get the new terms of fields since last update

2014-12-05 Thread Alexandre Rafalovitch
What about using payloads to store timestamps? And then some sort of
post-filtering to remove what's too old.

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 5 December 2014 at 07:54, lboutros  wrote:
> Dear all,
>
> I would like to get the new terms of fields since last update (once a week).
> If I retrieve some terms which were already present, it's not a problem (but
> terms which did not exist before must be retrieved).
>
> Is there an easy way to do that ?
>
> I'm currently investigating the possibility to create an CompositeReader on
> new Segments (Readers ?) to read terms from it.
>
> The initial need is to scroll thru field dictionaries with a given position.
> And some dictionaries can be merged for this visualization...
>
> Ludovic.
>
>
>
> -
> Jouve
> France.
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Get-the-new-terms-of-fields-since-last-update-tp4172755.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Using Solr for finding Flight Routes

2014-12-05 Thread Alexandre Rafalovitch
Sounds like a standard graph-database problem. I think some GraphDBs
integrate with Solr (or at least Lucene) for search.

Regards,
   Alex.


Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 5 December 2014 at 01:11, Robin Woods  wrote:
> Hello,
>
> Anyone implemented Solr for searching the flights between two destinations,
> sort by shortest trip and best price? is geo-spatial search a right module
> to use?
>
> Thanks!


Re: Get the new terms of fields since last update

2014-12-05 Thread Michael Sokolov
How about creating a new core that only holds a single week's documents, 
and retrieving all of its terms?  Then each week, flush it and start over.
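If you script that with the CoreAdmin API, the weekly flush could be as simple 
as the following (a sketch; the core name and instanceDir are placeholders):

    http://localhost:8983/solr/admin/cores?action=UNLOAD&core=weekly&deleteIndex=true
    http://localhost:8983/solr/admin/cores?action=CREATE&name=weekly&instanceDir=weekly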


-Mike

On 12/05/2014 07:54 AM, lboutros wrote:

Dear all,

I would like to get the new terms of fields since last update (once a week).
If I retrieve some terms which were already present, it's not a problem (but
terms which did not exist before must be retrieved).

Is there an easy way to do that ?

I'm currently investigating the possibility to create an CompositeReader on
new Segments (Readers ?) to read terms from it.

The initial need is to scroll thru field dictionaries with a given position.
And some dictionaries can be merged for this visualization...

Ludovic.



-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Get-the-new-terms-of-fields-since-last-update-tp4172755.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: Get the new terms of fields since last update

2014-12-05 Thread lboutros
The Apache Solr community is sooo great !

Interesting problem with 3 interesting answers in less than 2 hours !

Thank you all, really.

Erik,

I'm already saving the billion terms each week. It's hard to diff 1
billion terms.
I'm already rebuilding the whole dictionaries each week in a custom
distributed terms query handler.

I'm saving the result in MongoDB in order to scroll through it quickly with
term position in the dictionary.

It takes 3-4 hours each week. Now I would like to update the result in order
to do it faster.

Alex, I will check, this seems to be a good idea.
Is it possible to filter terms with payloads in index readers? I did not
see anything like that in my first investigation. 
I suppose it would take some additional disk space.

Michael,

this is the easiest way to do it. You are right. But I'm not sure that
indexing twice and updating the dictionaries would be faster than the current
process. But it's worth doing some math ;)

Ludovic.





-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Get-the-new-terms-of-fields-since-last-update-tp4172755p4172785.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Proximity Search with Grouping

2014-12-05 Thread Emre ERKEK
Thanks for answer.

On Fri, Dec 5, 2014 at 2:01 PM, Allison, Timothy B. 
wrote:

> With updated link (sorry!):
> https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser
>
> -Original Message-
> From: Emre ERKEK [mailto:h.emre.er...@gmail.com]
> Sent: Friday, December 05, 2014 2:42 AM
> To: solr
> Subject: Proximity Search with Grouping
>
> Hi All,
>
> Can I use proximity search with grouping like this "A B (C D)"~19   ?
>
> Thanks,
> Emre
>



-- 
Best regards,
Hasan Emre ERKEK


How to stop Solr tokenising search terms with spaces

2014-12-05 Thread Dinesh Babu
Hi,

We are using Solr 4.10.2 to store user names from LDAP. I want Solr not to 
tokenise my search term when it has spaces in it. E.g., if there is a user by the 
name Tom Hanks Major, then:

1) When I do a query for "Tom Hanks Major", I don't want Solr to break this 
search phrase up and search for the individual words (i.e., Tom, Hanks, Major), but 
to search for the whole phrase and get me the Tom Hanks Major user.

2) Also, if I query for "Hanks Major" I should get the Tom Hanks Major user back.

We used {!prefix}, but that does not allow scenario 2. Also, {!prefix} will 
restrict the search to one field and can't search multiple fields. Any solutions?

Regards,
Dinesh Babu.





Re: Get the new terms of fields since last update

2014-12-05 Thread Sujit Pal
Hi Ludovic,

A bit late to the party, sorry, but here is a bit of a riff off Erik's
idea. Why not store the previous terms in a Bloom filter, and once you get
the terms from this week, check to see if they are not in the set. Once you
find the new ones, add them to the Bloom filter. Bloom filters are space
efficient; by increasing the false-positive rate you can make one consume
less space (more keys hash to the same element), since you are only
concerned with finding whether something is not in the set.
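A minimal sketch with Guava's BloomFilter (the sizing numbers are made up, and
handleNewTerm() is a hypothetical callback, e.g. a MongoDB insert):

import java.util.List;

import com.google.common.base.Charsets;
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

public class NewTermFinder {

    // One filter kept across weeks (persist it between runs, e.g. via
    // BloomFilter.writeTo()/readFrom()).
    private final BloomFilter<CharSequence> seen = BloomFilter.create(
            Funnels.stringFunnel(Charsets.UTF_8), 1_000_000_000, 0.03);

    public void scan(List<String> weeklyTerms) {
        for (String term : weeklyTerms) {
            // Bloom filters have no false negatives: mightContain() == false
            // means the term is definitely new.
            if (!seen.mightContain(term)) {
                handleNewTerm(term);
                seen.put(term);
            }
        }
    }

    private void handleNewTerm(String term) {
        // hypothetical: record the new term somewhere
    }
}

The trade-off: a false positive means an occasional genuinely new term is 
treated as already seen, so the false-positive rate has to be tuned against the 
"new terms must be retrieved" requirement.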

-sujit

On Fri, Dec 5, 2014 at 7:21 AM, lboutros  wrote:

> The Apache Solr community is sooo great !
>
> Interesting problem with 3 interesting answers in less than 2 hours !
>
> Thank you all, really.
>
> Erik,
>
> I'm already saving the billion of terms each week. It's hard to diff 1
> billion of terms.
> I'm already rebuilding the whole dictionaries each week in a custom
> distributed terms query handler.
>
> I'm saving the result in Mongo DB in order to scroll thru it quickly with
> term position in the dictionary.
>
> It takes 3-4 hours each week. Now I would like to update the result in
> order
> to do it faster.
>
> Alex, I will check, this seems to be a good idea.
> Is it possible to filter terms with payloads in index readers ? I did not
> see anything like that in my first investigation.
> I suppose it would take some additional disk space.
>
> Michael,
>
> this is the easiest way to do it. You are right. But I'm not sure that
> indexing twice and update the dictionaries would be faster than the current
> process. But it worth it to do some math ;)
>
> Ludovic.
>
>
>
>
>
> -
> Jouve
> France.
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Get-the-new-terms-of-fields-since-last-update-tp4172755p4172785.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: How to stop Solr tokenising search terms with spaces

2014-12-05 Thread Erik Hatcher
try using {!field} instead of {!prefix}.  {!field} will create a phrase query 
(or term query if it’s just one term) after analysis.  [it also could construct 
other query types if the analysis overlaps tokens, but maybe not relevant here]

Also note that you can use multiple of these expressions if needed:  q={!prefix 
f=field1 v=$f1_val} OR {!prefix f=field2 v=$f2_val} where &f1_val=&f2_val=
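For the example in this thread, that might look like the following (the field 
names are just illustrations):

    q={!field f=displayName v=$name} OR {!field f=mail v=$name}&name=Hanks Major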

Erik



> On Dec 5, 2014, at 10:45 AM, Dinesh Babu  wrote:
> 
> Hi,
> 
> We are using Solr 4.10.2 to store user names from LDAP. I want Solr not to 
> tokenise my search term which has space in it Eg: If there is a user by the 
> name Tom Hanks Major, then
> 
> 1) When I do a query for " Tom Hanks Major " , I don't want solr break this 
> search phrase and search for individual words (ie, Tom ,Hanks, Major), but 
> search for the whole phrase and get me the Tom Hanks Major user
> 
> 2) Also if I query for "Hanks Major" I should get the Tom Hanks Major user 
> back
> 
> We used !prefix, but that does no allow the scenario 2. Also !prefix will 
> restrict the search to one field and can't do on mutiple fields. Any 
> solutions?
> 
> Regards,
> Dinesh Babu.
> 
> 
> 



RE: How to stop Solr tokenising search terms with spaces

2014-12-05 Thread Dinesh Babu
One more quick question Erik,

If I want to search on multiple fields using {!field}, do we have a query 
similar to what {!prefix} has:
q={!prefix f=field1 v=$f1_val} OR {!prefix f=field2 v=$f2_val} where 
&f1_val=&f2_val=

Regards,
Dinesh Babu.



-Original Message-
From: Dinesh Babu
Sent: 05 December 2014 16:26
To: solr-user@lucene.apache.org
Subject: RE: How to stop Solr tokenising search terms with spaces

Thanks a lot Erik. {!field} seems to solve our issue. Much appreciate your help


Regards,
Dinesh Babu.



-Original Message-
From: Erik Hatcher [mailto:erik.hatc...@gmail.com]
Sent: 05 December 2014 16:00
To: solr-user@lucene.apache.org
Subject: Re: How to stop Solr tokenising search terms with spaces

try using {!field} instead of {!prefix}.  {!field} will create a phrase query 
(or term query if it’s just one term) after analysis.  [it also could construct 
other query types if the analysis overlaps tokens, but maybe not relevant here]

Also note that you can use multiple of these expressions if needed:  q={!prefix 
f=field1 v=$f1_val} OR {!prefix f=field2 v=$f2_val} where &f1_val=&f2_val=

Erik



> On Dec 5, 2014, at 10:45 AM, Dinesh Babu  wrote:
>
> Hi,
>
> We are using Solr 4.10.2 to store user names from LDAP. I want Solr not to 
> tokenise my search term which has space in it Eg: If there is a user by the 
> name Tom Hanks Major, then
>
> 1) When I do a query for " Tom Hanks Major " , I don't want solr break this 
> search phrase and search for individual words (ie, Tom ,Hanks, Major), but 
> search for the whole phrase and get me the Tom Hanks Major user
>
> 2) Also if I query for "Hanks Major" I should get the Tom Hanks Major user 
> back
>
> We used !prefix, but that does no allow the scenario 2. Also !prefix will 
> restrict the search to one field and can't do on mutiple fields. Any 
> solutions?
>
> Regards,
> Dinesh Babu.
>
> 
>






RE: How to stop Solr tokenising search terms with spaces

2014-12-05 Thread Dinesh Babu
Thanks a lot Erik. {!field} seems to solve our issue. Much appreciate your help


Regards,
Dinesh Babu.



-Original Message-
From: Erik Hatcher [mailto:erik.hatc...@gmail.com]
Sent: 05 December 2014 16:00
To: solr-user@lucene.apache.org
Subject: Re: How to stop Solr tokenising search terms with spaces

try using {!field} instead of {!prefix}.  {!field} will create a phrase query 
(or term query if it’s just one term) after analysis.  [it also could construct 
other query types if the analysis overlaps tokens, but maybe not relevant here]

Also note that you can use multiple of these expressions if needed:  q={!prefix 
f=field1 v=$f1_val} OR {!prefix f=field2 v=$f2_val} where &f1_val=&f2_val=

Erik



> On Dec 5, 2014, at 10:45 AM, Dinesh Babu  wrote:
>
> Hi,
>
> We are using Solr 4.10.2 to store user names from LDAP. I want Solr not to 
> tokenise my search term which has space in it Eg: If there is a user by the 
> name Tom Hanks Major, then
>
> 1) When I do a query for " Tom Hanks Major " , I don't want solr break this 
> search phrase and search for individual words (ie, Tom ,Hanks, Major), but 
> search for the whole phrase and get me the Tom Hanks Major user
>
> 2) Also if I query for "Hanks Major" I should get the Tom Hanks Major user 
> back
>
> We used !prefix, but that does no allow the scenario 2. Also !prefix will 
> restrict the search to one field and can't do on mutiple fields. Any 
> solutions?
>
> Regards,
> Dinesh Babu.
>
> 
>






Re: Get the new terms of fields since last update

2014-12-05 Thread Alexandre Rafalovitch
On 5 December 2014 at 10:21, lboutros  wrote:
> Alex, I will check, this seems to be a good idea.
> Is it possible to filter terms with payloads in index readers ? I did not
> see anything like that in my first investigation.
> I suppose it would take some additional disk space.

Payloads are kind of step-children. The index - I believe - provides
space for them, but end-to-end tooling is not there. So, you have to
dig a bit harder. It will, of course, take more space. Though, if you
are rewriting your index every week, you only need to store a flag or
an offset from the last rewrite, not a full timestamp.

Regards,
   Alex.

Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


Re: How to stop Solr tokenising search terms with spaces

2014-12-05 Thread Erik Hatcher
But also, to spell out the more typical way to do that:

   q=field1:”…” OR field2:”…”

The nice thing about {!field} is that the value doesn’t have to have quotes and 
deal with escaping issues, but if you just want phrase queries and 
quote/escaping isn’t a hassle maybe that’s cleaner for you.

Erik


> On Dec 5, 2014, at 11:30 AM, Dinesh Babu  wrote:
> 
> One more quick question Erik,
> 
> If I want to do search on multiple fields using {!field} do we have a query 
> similar to what  {!prefix} has
> :  q={!prefix f=field1 v=$f1_val} OR {!prefix f=field2 v=$f2_val} where 
> &f1_val=&f2_val=
> 
> Regards,
> Dinesh Babu.
> 
> 
> 
> -Original Message-
> From: Dinesh Babu
> Sent: 05 December 2014 16:26
> To: solr-user@lucene.apache.org
> Subject: RE: How to stop Solr tokenising search terms with spaces
> 
> Thanks a lot Erik. {!field} seems to solve our issue. Much appreciate your 
> help
> 
> 
> Regards,
> Dinesh Babu.
> 
> 
> 
> -Original Message-
> From: Erik Hatcher [mailto:erik.hatc...@gmail.com]
> Sent: 05 December 2014 16:00
> To: solr-user@lucene.apache.org
> Subject: Re: How to stop Solr tokenising search terms with spaces
> 
> try using {!field} instead of {!prefix}.  {!field} will create a phrase query 
> (or term query if it’s just one term) after analysis.  [it also could 
> construct other query types if the analysis overlaps tokens, but maybe not 
> relevant here]
> 
> Also note that you can use multiple of these expressions if needed:  
> q={!prefix f=field1 v=$f1_val} OR {!prefix f=field2 v=$f2_val} where 
> &f1_val=&f2_val=
> 
>Erik
> 
> 
> 
>> On Dec 5, 2014, at 10:45 AM, Dinesh Babu  wrote:
>> 
>> Hi,
>> 
>> We are using Solr 4.10.2 to store user names from LDAP. I want Solr not to 
>> tokenise my search term which has space in it Eg: If there is a user by the 
>> name Tom Hanks Major, then
>> 
>> 1) When I do a query for " Tom Hanks Major " , I don't want solr break this 
>> search phrase and search for individual words (ie, Tom ,Hanks, Major), but 
>> search for the whole phrase and get me the Tom Hanks Major user
>> 
>> 2) Also if I query for "Hanks Major" I should get the Tom Hanks Major user 
>> back
>> 
>> We used !prefix, but that does no allow the scenario 2. Also !prefix will 
>> restrict the search to one field and can't do on mutiple fields. Any 
>> solutions?
>> 
>> Regards,
>> Dinesh Babu.
>> 
>> 
>> 
> 
> 
> 
> 



Re: How to stop Solr tokenising search terms with spaces

2014-12-05 Thread Erik Hatcher
Dinesh - indeed.  You can compose arbitrarily complex queries using what has 
been termed “nested queries” like this.

It used to be q=_query_:”{!…}...” OR _query_:”{!…}…”, but the _query_ trick 
isn’t strictly necessary now (though care has to be taken to make sure these 
complex nested expressions parse as you expect)

See slide #12 here: http://www.slideshare.net/erikhatcher/sa-22830939 
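For example, the explicit form for the two-field phrase search discussed above 
would be something like this (field names are placeholders):

    q=_query_:"{!field f=displayName v=$n}" OR _query_:"{!field f=mail v=$n}"&n=Hanks Major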


Erik


> On Dec 5, 2014, at 11:30 AM, Dinesh Babu  wrote:
> 
> One more quick question Erik,
> 
> If I want to do search on multiple fields using {!field} do we have a query 
> similar to what  {!prefix} has
> :  q={!prefix f=field1 v=$f1_val} OR {!prefix f=field2 v=$f2_val} where 
> &f1_val=&f2_val=
> 
> Regards,
> Dinesh Babu.
> 
> 
> 
> -Original Message-
> From: Dinesh Babu
> Sent: 05 December 2014 16:26
> To: solr-user@lucene.apache.org
> Subject: RE: How to stop Solr tokenising search terms with spaces
> 
> Thanks a lot Erik. {!field} seems to solve our issue. Much appreciate your 
> help
> 
> 
> Regards,
> Dinesh Babu.
> 
> 
> 
> -Original Message-
> From: Erik Hatcher [mailto:erik.hatc...@gmail.com]
> Sent: 05 December 2014 16:00
> To: solr-user@lucene.apache.org
> Subject: Re: How to stop Solr tokenising search terms with spaces
> 
> try using {!field} instead of {!prefix}.  {!field} will create a phrase query 
> (or term query if it’s just one term) after analysis.  [it also could 
> construct other query types if the analysis overlaps tokens, but maybe not 
> relevant here]
> 
> Also note that you can use multiple of these expressions if needed:  
> q={!prefix f=field1 v=$f1_val} OR {!prefix f=field2 v=$f2_val} where 
> &f1_val=&f2_val=
> 
>Erik
> 
> 
> 
>> On Dec 5, 2014, at 10:45 AM, Dinesh Babu  wrote:
>> 
>> Hi,
>> 
>> We are using Solr 4.10.2 to store user names from LDAP. I want Solr not to 
>> tokenise my search term which has space in it Eg: If there is a user by the 
>> name Tom Hanks Major, then
>> 
>> 1) When I do a query for " Tom Hanks Major " , I don't want solr break this 
>> search phrase and search for individual words (ie, Tom ,Hanks, Major), but 
>> search for the whole phrase and get me the Tom Hanks Major user
>> 
>> 2) Also if I query for "Hanks Major" I should get the Tom Hanks Major user 
>> back
>> 
>> We used !prefix, but that does no allow the scenario 2. Also !prefix will 
>> restrict the search to one field and can't do on mutiple fields. Any 
>> solutions?
>> 
>> Regards,
>> Dinesh Babu.
>> 
>> 
>> 
> 
> 
> 
> 



Re: Get the new terms of fields since last update

2014-12-05 Thread lboutros
I think payloads are per-posting information, which means that it's not
trivial (to me at least ;)) to get the terms for a given payload. And it's quite
expensive to scan all postings.

I will check for the bloom filter idea.

Thx

Ludovic.



-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Get-the-new-terms-of-fields-since-last-update-tp4172755p4172832.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Tika HTTP 400 Errors with DIH

2014-12-05 Thread Teague James
Alex,

Your suggestion might be a solution, but the issue isn't that the resource 
isn't found. Like Walter said, 400 is a "bad request", which makes me wonder: 
what is DIH/Tika doing when trying to access the documents? What is the 
"request" that is bad? Is there any other way to suss this out? Placing a 
network monitor in this case would be on the extreme end of difficult.

I know that the URL stored is good and that the resource exists, because copying it 
out of a Solr query and pasting it into the browser works, so that eliminates 404 and 
500 errors. Is the format of the URL correct? Is there some other setting I've 
missed?

I appreciate the suggestions!

-Teague


-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] 
Sent: Thursday, December 04, 2014 12:22 PM
To: solr-user
Subject: Re: Tika HTTP 400 Errors with DIH

Right. Resource not found (on server).

The end result is the same. If it works in the browser but not from the 
application, then either it is not the same URL being requested or - somehow - not 
even the same server.

The solution (watching network traffic) is still the same, right?

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and 
newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers 
community: https://www.linkedin.com/groups?gid=6713853


On 4 December 2014 at 11:51, Walter Underwood  wrote:
> No, 400 should mean that the request was bad. When the server fails, that is 
> a 500.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/
>
>
> On Dec 4, 2014, at 8:43 AM, Alexandre Rafalovitch  wrote:
>
>> 400 error means something wrong on the server (resource not found).
>> So, it would be useful to see what URL is actually being requested.
>>
>> Can you run some sort of network tracer to see the actual network 
>> request (dtrace, Wireshark, etc)? That will dissect the problem into 
>> half for you.
>>
>> Regards,
>>   Alex.
>> Personal: http://www.outerthoughts.com/ and @arafalov Solr resources 
>> and newsletter: http://www.solr-start.com/ and @solrstart Solr 
>> popularizers community: https://www.linkedin.com/groups?gid=6713853
>>
>>
>> On 4 December 2014 at 09:42, Teague James  wrote:
>>> The database stores the URL as a CLOB. Querying Solr shows that the field 
>>> value is "http://www.someaddress.com/documents/document1.docx";
>>> The URL works if I copy and paste it to the browser, but Tika gets a 400 
>>> error.
>>>
>>> Any ideas?
>>>
>>> Thanks!
>>> -Teague
>>> -Original Message-
>>> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
>>> Sent: Tuesday, December 02, 2014 1:45 PM
>>> To: solr-user
>>> Subject: Re: Tika HTTP 400 Errors with DIH
>>>
>>> On 2 December 2014 at 13:19, Teague James  wrote:
 clob="true"
>>>
>>> What does ClobTransformer is doing on the DownloadURL field? Is it possible 
>>> it is corrupting the value somehow?
>>>
>>> Regards,
>>>   Alex.
>>>
>>> Personal: http://www.outerthoughts.com/ and @arafalov Solr resources 
>>> and newsletter: http://www.solr-start.com/ and @solrstart Solr 
>>> popularizers community: https://www.linkedin.com/groups?gid=6713853
>>>
>



Re: Anti-Pattern in lucent-join jar?

2014-12-05 Thread Darin Amos
Thanks for the information!

The reason I ask is that I am doing a POC on building a custom 
Query + QueryParser + Facet Component customization. I have had some issues finding 
exactly what I am looking for OOTB, and I believe I need something custom. (It's 
also a really good learning exercise.)


I do ecommerce: when you type a search into the website, we execute the search 
against available SKUs (e.g. small red shirt, blue 34x34 jeans), then want to 
perform a rollup into those items' products (shirt, jeans) and return the top 
level document. (We are on SOLR 4.3.0, and don’t have parent/child support yet.)

I can’t use grouping because it throws off the pagination and is another can of 
worms. What I want can easily be done with the join query parser {![score]join 
from=blah to=blah}, but there are two gaps:

1) My customer wants to return facets calculated by the child dataset, not the 
final parent dataset. Meaning if my search returns a couple shirts, I don’t 
want to show the “small” facet if none of the shirts have the small size in 
stock (meaning the small shirt wasn’t in the child docset). The product 
documents won’t even have the “size” field populated anyway.

2) We want to be able to add filters to the search string that will filter the 
child documents, not the final result set documents. Example:
q={!join from=parent to=id}name:(*Shirt*)&fq=size:small&fq=color:blue

* In this case… if the shirt doesn’t have small or blue in stock, it 
won’t be returned at all.


Hence, I have been working on a customization. I am looking to build my own 
custom join query (been calling it a rollup) and base it off of the scorejoin 
query implementation pointed out to me.

A sample query for small red shirts would look like the following:

q={!rollup from=parent to=id}name:(*Shirt*)&childfq=size:small&childfq=color:red

My code would do the following:
1) Custom query parser (extends ExtendedDisMaxQParser) would wrap the 
main query in a BooleanQuery and add the “childfq” queries (Occur.MUST) similar 
to how the edismax boost queries work today.
2) Custom query parser would wrap the final BooleanQuery in my custom 
Rollup query (would use the score mode for max child score in my case)
3) Custom rollup query would record and save the child documents' docset 
and make it available via an accessor method, so that in a custom facet 
component I can execute the following:

Query q = rb.getQuery();
if (q instanceof RollupQuery) {
    RollupQuery rq = (RollupQuery) q;
    DocSet children = rq.getChildren();

    // Build facets from the children docset.

}
else {
    // Build facets from rb.getResults().docSet….
}


If anyone has taken the time to read this example, I greatly appreciate it 
and would appreciate any feedback. I would be glad to share the final 
implementation for review.

Thanks

Darin



> On Dec 5, 2014, at 4:52 AM, Mikhail Khludnev  
> wrote:
> 
> Thanks Roman! Let's expand it for the sake of completeness.
> Such issue is not possible in Solr, because caches are associated with the
> searcher. While you follow this design (see Solr userCache), and don't
> update what's cached once, there is no chance to shoot the foot.
> There were few caches inside of Lucene (old FieldCache,
> CachingWrapperFilter, ExternalFileField, etc), but they are properly mapped
> onto segment keys, hence it exclude such leakage across different
> searchers.
> 
> On Fri, Dec 5, 2014 at 6:43 AM, Roman Chyla  wrote:
> 
>> +1, additionally (as it follows from your observation) the query can get
>> out of sync with the index, if eg it was saved for later use and ran
>> against newly opened searcher
>> 
>> Roman
>> On 4 Dec 2014 10:51, "Darin Amos"  wrote:
>> 
>>> Hello All,
>>> 
>>> I have been doing a lot of research in building some custom queries and I
>>> have been looking at the Lucene Join library as a reference. I noticed
>>> something that I believe could actually have a negative side effect.
>>> 
>>> Specifically I was looking at the JoinUtil.createJoinQuery(…) method and
>>> within that method you see the following code:
>>> 
>>>TermsWithScoreCollector termsWithScoreCollector =
>>>TermsWithScoreCollector.create(fromField,
>>> multipleValuesPerDocument, scoreMode);
>>>fromSearcher.search(fromQuery, termsWithScoreCollector);
>>> 
>>> As you can see, when the JoinQuery is being built, the code is executing
>>> the query that it wraps with its own collector to collect all the
>> scores.
>>> If I were to write a query parser using this library (which someone has
>>> done here), doesn’t this reduce the benefit of the SOLR query cache? The
>>> wrapped query is being executed when the Join Query is being
>> constructed,
>>> not when it is executed.
>>> 
>>> Thanks
>>> 
>>> Darin
>>> 
>> 
> 
> 
> 
> -

RE: How to stop Solr tokenising search terms with spaces

2014-12-05 Thread Dinesh Babu
Hi Erik,

Probably I celebrated too soon. When I tested {!field} it seemed to work, but 
only because the data I queried happened to make it look like it was working. Using 
the example that I originally mentioned, searching for Tom Hanks Major:

1) If I search {!field f=displayName}: Hanks Major,  it works

2) If I provide a partial word, {!field f=displayName}: Hanks Ma,  it does not work

Is this how {!field} is designed to work?

Also, I tried with and without escaping the space, as you suggested. It has the 
same issue:

1) q= field1:"Hanks Major" , it works
2) q= field1:"Hanks Maj" , it does not work

Regards,
Dinesh Babu.



-Original Message-
From: Erik Hatcher [mailto:erik.hatc...@gmail.com]
Sent: 05 December 2014 16:44
To: solr-user@lucene.apache.org
Subject: Re: How to stop Solr tokenising search terms with spaces

But also, to spell out the more typical way to do that:

   q=field1:”…” OR field2:”…”

The nice thing about {!field} is that the value doesn’t have to have quotes and 
deal with escaping issues, but if you just want phrase queries and 
quote/escaping isn’t a hassle maybe that’s cleaner for you.

Erik


> On Dec 5, 2014, at 11:30 AM, Dinesh Babu  wrote:
>
> One more quick question Erik,
>
> If I want to do search on multiple fields using {!field} do we have a query 
> similar to what  {!prefix} has
> :  q={!prefix f=field1 v=$f1_val} OR {!prefix f=field2 v=$f2_val} where 
> &f1_val=&f2_val=
>
> Regards,
> Dinesh Babu.
>
>
>
> -Original Message-
> From: Dinesh Babu
> Sent: 05 December 2014 16:26
> To: solr-user@lucene.apache.org
> Subject: RE: How to stop Solr tokenising search terms with spaces
>
> Thanks a lot Erik. {!field} seems to solve our issue. Much appreciate your 
> help
>
>
> Regards,
> Dinesh Babu.
>
>
>
> -Original Message-
> From: Erik Hatcher [mailto:erik.hatc...@gmail.com]
> Sent: 05 December 2014 16:00
> To: solr-user@lucene.apache.org
> Subject: Re: How to stop Solr tokenising search terms with spaces
>
> try using {!field} instead of {!prefix}.  {!field} will create a phrase query 
> (or term query if it’s just one term) after analysis.  [it also could 
> construct other query types if the analysis overlaps tokens, but maybe not 
> relevant here]
>
> Also note that you can use multiple of these expressions if needed:  
> q={!prefix f=field1 v=$f1_val} OR {!prefix f=field2 v=$f2_val} where 
> &f1_val=&f2_val=
>
>Erik
>
>
>
>> On Dec 5, 2014, at 10:45 AM, Dinesh Babu  wrote:
>>
>> Hi,
>>
>> We are using Solr 4.10.2 to store user names from LDAP. I want Solr not to 
>> tokenise my search term which has space in it Eg: If there is a user by the 
>> name Tom Hanks Major, then
>>
>> 1) When I do a query for " Tom Hanks Major " , I don't want solr break this 
>> search phrase and search for individual words (ie, Tom ,Hanks, Major), but 
>> search for the whole phrase and get me the Tom Hanks Major user
>>
>> 2) Also if I query for "Hanks Major" I should get the Tom Hanks Major user 
>> back
>>
>> We used !prefix, but that does no allow the scenario 2. Also !prefix will 
>> restrict the search to one field and can't do on mutiple fields. Any 
>> solutions?
>>
>> Regards,
>> Dinesh Babu.
>>
>> 
>>
>
>
> 
>






[ANN] Heliosearch 0.09 (JSON Request API + Distrib for Facet API)

2014-12-05 Thread Yonik Seeley
http://heliosearch.org/download

Heliosearch v0.09 Features:

o Heliosearch v0.09 is based on (and contains all features of)
Lucene/Solr 4.10.2 + most of 4.10.3

o Distributed search support for the new faceted search module / JSON
Facet API: http://heliosearch.org/json-facet-api/

o Automatic conversion of legacy field/range/query facets when
facet.version=2 is passed. This includes support for the deprecated
heliosearch syntax of facet.stat=facet_function and
subfacet.parentfacet.type=facet_param.

o New JSON Request API:
http://heliosearch.org/heliosearch-solr-json-request-api/

Example:
$ curl -XGET http://localhost:8983/solr/query -d '
{
  query : "*:*",
  filter : [
"author:brandon",
"genre_s:fantasy"
  ],
  offset : 0,
  limit : 5,
  fields : ["title","author"],  // we could also use the string form "title,author"
  sort : "sequence_i desc",

  facet : {  // the JSON Facet API is nicely integrated as well
avg_price : "avg(price)",
top_authors : {terms : author}
  }
}'

This includes "smart JSON merging", with support for a mixed
environment of normal request params and JSON objects / snippets.


-Yonik
http://heliosearch.org - native code faceting, facet functions,
sub-facets, off-heap data


Re: Anti-Pattern in lucent-join jar?

2014-12-05 Thread Roman Chyla
Hi Mikhail, I think you are right: it won't be a problem for SOLR, but it is
likely an antipattern inside a Lucene component, because custom components
may create join queries, hold on to them, and then execute them much later
against a different searcher. One approach would be to postpone term collection
until the query actually runs. I looked far and wide for an appropriate place, but
only found createWeight() - but at least that gives developers NO
opportunity to shoot themselves in the foot! ;-)

Since it may serve as an inspiration to someone, here is a link:
https://github.com/romanchyla/montysolr/blob/master-next/contrib/adsabs/src/java/org/apache/lucene/search/SecondOrderQuery.java#L101

roman

On Fri, Dec 5, 2014 at 4:52 AM, Mikhail Khludnev  wrote:

> Thanks Roman! Let's expand it for the sake of completeness.
> Such issue is not possible in Solr, because caches are associated with the
> searcher. While you follow this design (see Solr userCache), and don't
> update what's cached once, there is no chance to shoot the foot.
> There were few caches inside of Lucene (old FieldCache,
> CachingWrapperFilter, ExternalFileField, etc), but they are properly mapped
> onto segment keys, hence it exclude such leakage across different
> searchers.
>
> On Fri, Dec 5, 2014 at 6:43 AM, Roman Chyla  wrote:
>
> > +1, additionally (as it follows from your observation) the query can get
> > out of sync with the index, if eg it was saved for later use and ran
> > against newly opened searcher
> >
> > Roman
> > On 4 Dec 2014 10:51, "Darin Amos"  wrote:
> >
> > > Hello All,
> > >
> > > I have been doing a lot of research in building some custom queries
> and I
> > > have been looking at the Lucene Join library as a reference. I noticed
> > > something that I believe could actually have a negative side effect.
> > >
> > > Specifically I was looking at the JoinUtil.createJoinQuery(…) method
> and
> > > within that method you see the following code:
> > >
> > > TermsWithScoreCollector termsWithScoreCollector =
> > > TermsWithScoreCollector.create(fromField,
> > > multipleValuesPerDocument, scoreMode);
> > > fromSearcher.search(fromQuery, termsWithScoreCollector);
> > >
> > > As you can see, when the JoinQuery is being built, the code is
> executing
> > > the query that it wraps with its own collector to collect all the
> > scores.
> > > If I were to write a query parser using this library (which someone has
> > > done here), doesn’t this reduce the benefit of the SOLR query cache?
> The
> > > wrapped query is being executed when the Join Query is being
> > constructed,
> > > not when it is executed.
> > >
> > > Thanks
> > >
> > > Darin
> > >
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
> 
>


Re: Using Solr for finding Flight Routes

2014-12-05 Thread Robin Woods
Thanks Alex. I'll check the GraphDB solutions.

On Fri, Dec 5, 2014 at 6:20 AM, Alexandre Rafalovitch 
wrote:

> Sounds like a standard graph-database problem. I think some GraphDBs
> integrate with Solr (or at least Lucene) for search.
>
> Regards,
>Alex.
>
>
> Personal: http://www.outerthoughts.com/ and @arafalov
> Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>
>
> On 5 December 2014 at 01:11, Robin Woods  wrote:
> > Hello,
> >
> > Anyone implemented Solr for searching the flights between two
> destinations,
> > sort by shortest trip and best price? is geo-spatial search a right
> module
> > to use?
> >
> > Thanks!
>


Re: Anti-Pattern in lucent-join jar?

2014-12-05 Thread Darin Amos
Couldn’t you just keep passing the wrapped query and searcher down to 
Weight.scorer()?

This would allow you to wait until the query is executed to do term collection. 
If you want to protect against creating and executing the query with different 
searchers, you would have to make the query factory (or constructor) only 
visible to the query parser or parser plugin?

I might not have followed you; this discussion challenges my understanding of 
Lucene and SOLR.

Darin



> On Dec 5, 2014, at 12:47 PM, Roman Chyla  wrote:
> 
> Hi Mikhail, I think you are right, it won't be problem for SOLR, but it is
> likely an antipattern inside a lucene component. Because custom components
> may create join queries, hold to them and then execute much later against a
> different searcher. One approach would be to postpone term collection until
> the query actually runs, I looked far and wide for appropriate place, but
> only found createWeight() - but at least it does give developers NO
> opportunity to shoot their feet! ;-)
> 
> Since it may serve as an inspiration to someone, here is a link:
> https://github.com/romanchyla/montysolr/blob/master-next/contrib/adsabs/src/java/org/apache/lucene/search/SecondOrderQuery.java#L101
> 
> roman
> 
> On Fri, Dec 5, 2014 at 4:52 AM, Mikhail Khludnev > wrote:
> 
>> Thanks Roman! Let's expand it for the sake of completeness.
>> Such issue is not possible in Solr, because caches are associated with the
>> searcher. While you follow this design (see Solr userCache), and don't
>> update what's cached once, there is no chance to shoot the foot.
>> There were few caches inside of Lucene (old FieldCache,
>> CachingWrapperFilter, ExternalFileField, etc), but they are properly mapped
>> onto segment keys, hence it exclude such leakage across different
>> searchers.
>> 
>> On Fri, Dec 5, 2014 at 6:43 AM, Roman Chyla  wrote:
>> 
>>> +1, additionally (as it follows from your observation) the query can get
>>> out of sync with the index, if eg it was saved for later use and ran
>>> against newly opened searcher
>>> 
>>> Roman
>>> On 4 Dec 2014 10:51, "Darin Amos"  wrote:
>>> 
 Hello All,
 
 I have been doing a lot of research in building some custom queries
>> and I
 have been looking at the Lucene Join library as a reference. I noticed
 something that I believe could actually have a negative side effect.
 
 Specifically I was looking at the JoinUtil.createJoinQuery(…) method
>> and
 within that method you see the following code:
 
TermsWithScoreCollector termsWithScoreCollector =
TermsWithScoreCollector.create(fromField,
 multipleValuesPerDocument, scoreMode);
fromSearcher.search(fromQuery, termsWithScoreCollector);
 
 As you can see, when the JoinQuery is being built, the code is
>> executing
 the query that it wraps with its own collector to collect all the
>>> scores.
 If I were to write a query parser using this library (which someone has
 done here), doesn’t this reduce the benefit of the SOLR query cache?
>> The
 wrapped query is being executed when the Join Query is being
>>> constructed,
 not when it is executed.
 
 Thanks
 
 Darin
 
>>> 
>> 
>> 
>> 
>> --
>> Sincerely yours
>> Mikhail Khludnev
>> Principal Engineer,
>> Grid Dynamics
>> 
>> 
>> 
>> 



unable to build spellcheck in solr

2014-12-05 Thread Min L
Hi all:

My code using the Solr spellchecker to suggest keywords worked fine locally;
however, in the QA Solr environment it failed to build, with the following error
in the Solr log:

ERROR Suggester Store Lookup build from index on field: myfieldname failed
reader has: xxx docs

I checked the solr directory and the file fst.bin was created successfully
though. Does anyone know what caused the issue?

command to build:
http://:8080/solr/mycore/suggestkeyword?spellcheck.build=true

Thanks a lot!


Re: Anti-Pattern in lucent-join jar?

2014-12-05 Thread Roman Chyla
Not sure I understand. It is the searcher which executes the query; how
would you 'convince' it to pass the query? First the Weight is created,
then the weight instance creates the scorer - you would have to change the API
to do the passing (or maybe not...?)
In my case, the relationships were across index segments, so I had to
collect them first - but in some other situations, when you look only at
the data inside one index segment, it _might_ be better to wait



On Fri, Dec 5, 2014 at 1:25 PM, Darin Amos  wrote:

> Couldn’t you just keep passing the wrapped query and searcher down to
> Weight.scorer()?
>
> This would allow you to wait until the query is executed to do term
> collection. If you want to protect against creating and executing the query
> with different searchers, you would have to make the query factory (or
> constructor) only visible to the query parser or parser plugin?
>
> I might not have followed you, this discussing challenges my understanding
> of Lucene and SOLR.
>
> Darin
>
>
>
> > On Dec 5, 2014, at 12:47 PM, Roman Chyla  wrote:
> >
> > Hi Mikhail, I think you are right, it won't be problem for SOLR, but it
> is
> > likely an antipattern inside a lucene component. Because custom
> components
> > may create join queries, hold to them and then execute much later
> against a
> > different searcher. One approach would be to postpone term collection
> until
> > the query actually runs, I looked far and wide for appropriate place, but
> > only found createWeight() - but at least it does give developers NO
> > opportunity to shoot their feet! ;-)
> >
> > Since it may serve as an inspiration to someone, here is a link:
> >
> https://github.com/romanchyla/montysolr/blob/master-next/contrib/adsabs/src/java/org/apache/lucene/search/SecondOrderQuery.java#L101
> >
> > roman
> >
> > On Fri, Dec 5, 2014 at 4:52 AM, Mikhail Khludnev <
> mkhlud...@griddynamics.com
> >> wrote:
> >
> >> Thanks Roman! Let's expand it for the sake of completeness.
> >> Such issue is not possible in Solr, because caches are associated with
> the
> >> searcher. While you follow this design (see Solr userCache), and don't
> >> update what's cached once, there is no chance to shoot the foot.
> >> There were few caches inside of Lucene (old FieldCache,
> >> CachingWrapperFilter, ExternalFileField, etc), but they are properly
> mapped
> >> onto segment keys, hence it exclude such leakage across different
> >> searchers.
> >>
> >> On Fri, Dec 5, 2014 at 6:43 AM, Roman Chyla 
> wrote:
> >>
> >>> +1, additionally (as it follows from your observation) the query can
> get
> >>> out of sync with the index, if eg it was saved for later use and ran
> >>> against newly opened searcher
> >>>
> >>> Roman
> >>> On 4 Dec 2014 10:51, "Darin Amos"  wrote:
> >>>
>  Hello All,
> 
>  I have been doing a lot of research in building some custom queries
> >> and I
>  have been looking at the Lucene Join library as a reference. I noticed
>  something that I believe could actually have a negative side effect.
> 
>  Specifically I was looking at the JoinUtil.createJoinQuery(…) method
> >> and
>  within that method you see the following code:
> 
> TermsWithScoreCollector termsWithScoreCollector =
> TermsWithScoreCollector.create(fromField,
>  multipleValuesPerDocument, scoreMode);
> fromSearcher.search(fromQuery, termsWithScoreCollector);
> 
>  As you can see, when the JoinQuery is being built, the code is
> >> executing
>  the query that is wraps with it’s own collector to collect all the
> >>> scores.
>  If I were to write a query parser using this library (which someone
> has
>  done here), doesn’t this reduce the benefit of the SOLR query cache?
> >> The
>  wrapped query is being executing when the Join Query is being
> >>> constructed,
>  not when it is executed.
> 
>  Thanks
> 
>  Darin
> 
> >>>
> >>
> >>
> >>
> >> --
> >> Sincerely yours
> >> Mikhail Khludnev
> >> Principal Engineer,
> >> Grid Dynamics
> >>
> >> 
> >> 
> >>
>
>


Re: Anti-Pattern in lucent-join jar?

2014-12-05 Thread Darin Amos
In this case I was thinking about something like the following, if you changed 
the Query implementation or created your own similar query:

If you consider this query: q={!scorejoin from=parent to=id}type:child

public class ScoreJoinQuery extends Query {

    private Query q = null;
    private IndexSearcher s = null;

    public ScoreJoinQuery(Query q, IndexSearcher s) {
        this.q = q;   // this is the term query type:child
        this.s = s;
    }

    .
    .
    .
    public Weight createWeight(…..) {
        return new Weight() {
            .
            .
            .
            public Scorer scorer() {
                TermsWithScoreCollector collector = new
                        TermsWithScoreCollector();
                ScoreJoinQuery.this.s.search(ScoreJoinQuery.this.q,
                        collector);

                // do the rest...
            }
        };
    }
}

This is what I was thinking in my head…. but I don’t really believe it offers 
any value above how the scorejoin query works today.



> On Dec 5, 2014, at 2:16 PM, Roman Chyla  wrote:
> 
> Not sure I understand. It is the searcher which executes the query, how
> would you 'convince' it to pass the query? First the Weight is created,
> weight instance creates scorer - you would have to change the API to do the
> passing (or maybe not...?)
> In my case, the relationships were across index segments, so I had to
> collect them first - but in some other situations, when you look only at
> the data inside one index segments, it _might_ be better to wait
> 
> 
> 
> On Fri, Dec 5, 2014 at 1:25 PM, Darin Amos  wrote:
> 
>> Couldn’t you just keep passing the wrapped query and searcher down to
>> Weight.scorer()?
>> 
>> This would allow you to wait until the query is executed to do term
>> collection. If you want to protect against creating and executing the query
>> with different searchers, you would have to make the query factory (or
>> constructor) only visible to the query parser or parser plugin?
>> 
>> I might not have followed you, this discussing challenges my understanding
>> of Lucene and SOLR.
>> 
>> Darin
>> 
>> 
>> 
>>> On Dec 5, 2014, at 12:47 PM, Roman Chyla  wrote:
>>> 
>>> Hi Mikhail, I think you are right, it won't be problem for SOLR, but it
>> is
>>> likely an antipattern inside a lucene component. Because custom
>> components
>>> may create join queries, hold to them and then execute much later
>> against a
>>> different searcher. One approach would be to postpone term collection
>> until
>>> the query actually runs, I looked far and wide for appropriate place, but
>>> only found createWeight() - but at least it does give developers NO
>>> opportunity to shoot their feet! ;-)
>>> 
>>> Since it may serve as an inspiration to someone, here is a link:
>>> 
>> https://github.com/romanchyla/montysolr/blob/master-next/contrib/adsabs/src/java/org/apache/lucene/search/SecondOrderQuery.java#L101
>>> 
>>> roman
>>> 
>>> On Fri, Dec 5, 2014 at 4:52 AM, Mikhail Khludnev <
>> mkhlud...@griddynamics.com
 wrote:
>>> 
 Thanks Roman! Let's expand it for the sake of completeness.
 Such issue is not possible in Solr, because caches are associated with
>> the
 searcher. While you follow this design (see Solr userCache), and don't
 update what's cached once, there is no chance to shoot the foot.
 There were few caches inside of Lucene (old FieldCache,
 CachingWrapperFilter, ExternalFileField, etc), but they are properly
>> mapped
 onto segment keys, hence it exclude such leakage across different
 searchers.
 
 On Fri, Dec 5, 2014 at 6:43 AM, Roman Chyla 
>> wrote:
 
> +1, additionally (as it follows from your observation) the query can
>> get
> out of sync with the index, if eg it was saved for later use and ran
> against newly opened searcher
> 
> Roman
> On 4 Dec 2014 10:51, "Darin Amos"  wrote:
> 
>> Hello All,
>> 
>> I have been doing a lot of research in building some custom queries
 and I
>> have been looking at the Lucene Join library as a reference. I noticed
>> something that I believe could actually have a negative side effect.
>> 
>> Specifically I was looking at the JoinUtil.createJoinQuery(…) method
 and
>> within that method you see the following code:
>> 
>>   TermsWithScoreCollector termsWithScoreCollector =
>>   TermsWithScoreCollector.create(fromField,
>> multipleValuesPerDocument, scoreMode);
>>   fromSearcher.search(fromQuery, termsWithScoreCollector);
>> 
>> As you can see, when the JoinQuery is being built, the code is
 executing
>> the query that it wraps with its own collector to collect all the
> scores.
>> If I were to write a query parser using this library (which someone
>> has
>>

Logging in Solr's DataImportHandler

2014-12-05 Thread Dan Davis
I have a script transformer and a log transformer, and I'm not seeing the
log messages, at least not where I expect them.
Is there any way I can simply log a custom message from within my script?
Can the script easily interact with its container's logger?


Re: Using Solr for finding Flight Routes

2014-12-05 Thread Nazik Huq
Check Grant's SOLR Air reference app here 
http://www.ibm.com/developerworks/library/j-solr-lucene/index.html .

@Nazik_Huq


On Dec 5, 2014, at 1:19 PM, Robin Woods  wrote:

> Thanks Alex. I'll check the GraphDB solutions.
> 
> On Fri, Dec 5, 2014 at 6:20 AM, Alexandre Rafalovitch 
> wrote:
> 
>> Sounds like a standard graph-database problem. I think some GraphDBs
>> integrate with Solr (or at least Lucene) for search.
>> 
>> Regards,
>>   Alex.
>> 
>> 
>> Personal: http://www.outerthoughts.com/ and @arafalov
>> Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
>> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>> 
>> 
>> On 5 December 2014 at 01:11, Robin Woods  wrote:
>>> Hello,
>>> 
>>> Anyone implemented Solr for searching the flights between two
>> destinations,
>>> sort by shortest trip and best price? is geo-spatial search a right
>> module
>>> to use?
>>> 
>>> Thanks!
>> 


Re: unable to build spellcheck in solr

2014-12-05 Thread Erick Erickson
What's the rest of the stack trace? There should
be a root cause somewhere.

Best,
Erick

On Fri, Dec 5, 2014 at 11:07 AM, Min L  wrote:
> Hi all:
>
> My code using solr spellchecker to suggest keywords worked fine locally,
> however in qa solr env, it failed to build it with the following error in
> solr log:
>
> ERROR Suggester Store Lookup build from index on field: myfieldname failed
> reader has: xxx docs
>
> I checked the solr directory and the file fst.bin was created successfully
> though. Does anyone know what caused the issue?
>
> command to build:
> http://:8080/solr/mycore/suggestkeyword?spellcheck.build=true
>
> Thanks a lot!


RE: Tika HTTP 400 Errors with DIH

2014-12-05 Thread steve
A good HTTP debugger would likely help (Wireshark or Fiddler2, for example):
http://www.telerik.com/fiddler
https://www.wireshark.org/download.html
For example, it can show the HTTP header that the "client" uses to request 
info from an API, and then show the results of that query. One small caveat: I have 
not tried this with a "standalone" server or with any SOLR-type project.
Cheers!
Steve

> From: teag...@insystechinc.com
> To: solr-user@lucene.apache.org
> Subject: RE: Tika HTTP 400 Errors with DIH
> Date: Fri, 5 Dec 2014 12:03:23 -0500
> 
> Alex,
> 
> Your suggestion might be a solution, but the issue isn't that the resource
> isn't found. Like Walter said, 400 is a "bad request", which makes me wonder:
> what is DIH/Tika doing when trying to access the documents? What is the
> "request" that is bad? Is there any other way to suss this out? Placing a
> network monitor in this case would be on the extreme end of difficult.
> 
> I know that the stored URL is good and that the resource exists, because
> copying it out of a Solr query and pasting it into the browser works; that
> eliminates 404 and 500 errors. Is the format of the URL correct? Is there
> some other setting I've missed?
> 
> I appreciate the suggestions!
> 
> -Teague
> 
> 
> -Original Message-
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] 
> Sent: Thursday, December 04, 2014 12:22 PM
> To: solr-user
> Subject: Re: Tika HTTP 400 Errors with DIH
> 
> Right. Resource not found (on server).
> 
> The end result is the same. If it works in the browser but not from the 
> application then either not the same URL is being requested or - somehow - 
> not even the same server.
> 
> The solution (watching network traffic) is still the same, right?
> 
> Regards,
>Alex.
> Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and 
> newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers 
> community: https://www.linkedin.com/groups?gid=6713853
> 
> 
> On 4 December 2014 at 11:51, Walter Underwood  wrote:
> > No, 400 should mean that the request was bad. When the server fails, that 
> > is a 500.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/
> >
> >
> > On Dec 4, 2014, at 8:43 AM, Alexandre Rafalovitch  
> > wrote:
> >
> >> 400 error means something wrong on the server (resource not found).
> >> So, it would be useful to see what URL is actually being requested.
> >>
> >> Can you run some sort of network tracer to see the actual network 
> >> request (dtrace, Wireshark, etc)? That will dissect the problem into 
> >> half for you.
> >>
> >> Regards,
> >>   Alex.
> >> Personal: http://www.outerthoughts.com/ and @arafalov Solr resources 
> >> and newsletter: http://www.solr-start.com/ and @solrstart Solr 
> >> popularizers community: https://www.linkedin.com/groups?gid=6713853
> >>
> >>
> >> On 4 December 2014 at 09:42, Teague James  wrote:
> >>> The database stores the URL as a CLOB. Querying Solr shows that the field 
> >>> value is "http://www.someaddress.com/documents/document1.docx"
> >>> The URL works if I copy and paste it to the browser, but Tika gets a 400 
> >>> error.
> >>>
> >>> Any ideas?
> >>>
> >>> Thanks!
> >>> -Teague
> >>> -Original Message-
> >>> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> >>> Sent: Tuesday, December 02, 2014 1:45 PM
> >>> To: solr-user
> >>> Subject: Re: Tika HTTP 400 Errors with DIH
> >>>
> >>> On 2 December 2014 at 13:19, Teague James  
> >>> wrote:
>  clob="true"
> >>>
> >>> What is the ClobTransformer doing on the DownloadURL field? Is it 
> >>> possible it is corrupting the value somehow?
> >>>
> >>> Regards,
> >>>   Alex.
> >>>
> >>> Personal: http://www.outerthoughts.com/ and @arafalov Solr resources 
> >>> and newsletter: http://www.solr-start.com/ and @solrstart Solr 
> >>> popularizers community: https://www.linkedin.com/groups?gid=6713853
> >>>
> >
> 
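
A quick, standalone way to see the status the server actually returns for
the stored value, using the URL quoted earlier in the thread; a minimal
sketch:

import java.net.HttpURLConnection;
import java.net.URL;

public class CheckUrl {
  public static void main(String[] args) throws Exception {
    URL url = new URL("http://www.someaddress.com/documents/document1.docx");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    // Unlike a browser, java.net.URL does not percent-encode the path for
    // you; a space or other reserved character in the stored value is a
    // classic cause of a 400 that never happens from the address bar.
    System.out.println(conn.getResponseCode() + " " + conn.getResponseMessage());
  }
}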
  

Re: unable to build spellcheck in solr

2014-12-05 Thread Min L
Thanks for your reply.

This is all there is in the Solr log; there is no stack trace. It fails
regardless of whether buildOnCommit is true or false, and also when building
it manually. The file fst.bin was created.

I found the place in Suggester.java where the error is logged. Perhaps
lookup.store(new FileOutputStream(target)) failed? I wonder why.

if (storeDir != null) {
  File target = new File(storeDir, factory.storeFileName());
  if (!lookup.store(new FileOutputStream(target))) {
    if (sourceLocation == null) {
      assert reader != null && field != null;
      LOG.error("Store Lookup build from index on field: " + field
          + " failed reader has: " + reader.maxDoc() + " docs");
    } else {
      LOG.error("Store Lookup build from sourceloaction: "
          + sourceLocation + " failed");
    }
  } else {
    LOG.info("Stored suggest data to: " + target.getAbsolutePath());
  }
}

On Fri, Dec 5, 2014 at 12:59 PM, Erick Erickson 
wrote:

> What's the rest of the stack trace? There should
> be a root cause somewhere.
>
> Best,
> Erick
>
> On Fri, Dec 5, 2014 at 11:07 AM, Min L  wrote:
> > Hi all:
> >
> > My code using solr spellchecker to suggest keywords worked fine locally,
> > however in qa solr env, it failed to build it with the following error in
> > solr log:
> >
> > ERROR Suggester Store Lookup build from index on field: myfieldname
> failed
> > reader has: xxx docs
> >
> > I checked the solr directory and the file fst.bin was created
> successfully
> > though. Does anyone know what caused the issue?
> >
> > command to build:
> > http://:8080/solr/mycore/suggestkeyword?spellcheck.build=true
> >
> > Thanks a lot!
>


Re: unable to build spellcheck in solr

2014-12-05 Thread Erick Erickson
Not sure, of course. It sure seems like a better error message is in order.
Is there anything above the message you pasted in the log file that sheds
more light on the subject?

Erick

On Fri, Dec 5, 2014 at 1:22 PM, Min L  wrote:
> Thanks for your reply.
>
> This is all it is in the solr log, no stack. It fails regardless of
> buildOncommit=true or false by building it manually. The file fst.bin was
> created.
>
> I found the source code in suggester.java where it logged the error.
> Perhaps lookup.store(new FileOutputStream(target)) failed? Wonder why.
>
> if (storeDir != null) {
>
>   File target = new File(storeDir, factory.storeFileName());
>
>   if(!lookup.store(new FileOutputStream(target))) {
>
> if (sourceLocation == null) {
>
>   assert reader != null && field != null;
>
>   LOG.error("Store Lookup build from index on field: " + field + "
> failed reader has: " + reader.maxDoc() + " docs");
>
> } else {
>
>   LOG.error("Store Lookup build from sourceloaction: " +
> sourceLocation + " failed");
>
> }
>
>   } else {
>
> LOG.info("Stored suggest data to: " + target.getAbsolutePath());
>
>   }
>
> }
>
> On Fri, Dec 5, 2014 at 12:59 PM, Erick Erickson 
> wrote:
>
>> What's the rest of the stack trace? There should
>> be a root cause somewhere.
>>
>> Best,
>> Erick
>>
>> On Fri, Dec 5, 2014 at 11:07 AM, Min L  wrote:
>> > Hi all:
>> >
>> > My code using solr spellchecker to suggest keywords worked fine locally,
>> > however in qa solr env, it failed to build it with the following error in
>> > solr log:
>> >
>> > ERROR Suggester Store Lookup build from index on field: myfieldname
>> failed
>> > reader has: xxx docs
>> >
>> > I checked the solr directory and the file fst.bin was created
>> successfully
>> > though. Does anyone know what caused the issue?
>> >
>> > command to build:
>> > http://:8080/solr/mycore/suggestkeyword?spellcheck.build=true
>> >
>> > Thanks a lot!
>>


Re: unable to build spellcheck in solr

2014-12-05 Thread Alexandre Rafalovitch
What's your suggester XML definition?

Do you have a line similar to:
fuzzysuggest.txt

That particular code path seems to be expecting it.
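
For reference, the classic Suggester definition in solrconfig.xml looks
roughly like the sketch below; the component, field, and file names are
illustrative, and sourceLocation is the optional line that switches the
build from the index field to a flat file:

<searchComponent name="suggestkeyword" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggestkeyword</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookupFactory</str>
    <str name="field">myfieldname</str>
    <str name="storeDir">suggest</str>
    <str name="buildOnCommit">false</str>
    <!-- <str name="sourceLocation">fuzzysuggest.txt</str> -->
  </lst>
</searchComponent>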

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 5 December 2014 at 14:07, Min L  wrote:
> Hi all:
>
> My code using solr spellchecker to suggest keywords worked fine locally,
> however in qa solr env, it failed to build it with the following error in
> solr log:
>
> ERROR Suggester Store Lookup build from index on field: myfieldname failed
> reader has: xxx docs
>
> I checked the solr directory and the file fst.bin was created successfully
> though. Does anyone know what caused the issue?
>
> command to build:
> http://:8080/solr/mycore/suggestkeyword?spellcheck.build=true
>
> Thanks a lot!


Re: Preferred Schema/Config for Chinese Language Cores?

2014-12-05 Thread Tom Zimmermann
Thanks for the links. The dzone link was nice and concise, but unfortunately
it makes use of the now-deprecated CJK tokenizer. Does anyone out there have
examples or experience working with the recommended replacement for CJK?

Thanks,
TZ
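
For what it's worth, the example schema shipped with Solr 4.x replaces the
old CJKTokenizer with a StandardTokenizer plus CJK filters; a sketch of that
field type (the name is illustrative):

<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.CJKWidthFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.CJKBigramFilterFactory"/>
  </analyzer>
</fieldType>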


DocsEnum and TermsEnum "reuse" in lucene join library?

2014-12-05 Thread Darin Amos
Hi All,

I have been working on a custom query, using the samples in the Lucene join
library (4.3.0) as a reference, and I am a little unclear about a couple of
lines.

1) When getting a TermsEnum in
TermsIncludingScoreQuery.createWeight(…).scorer()…, a previous TermsEnum is
passed in like the following:

segmentTermsEnum = terms.iterator(segmentTermsEnum);

2) When getting a DocsEnum in SVInOrderScorer.fillDocsAndScores:

for (int i = 0; i < terms.size(); i++) {
  if (termsEnum.seekExact(terms.get(ords[i], spare), true)) {
    docsEnum = termsEnum.docs(acceptDocs, docsEnum, DocsEnum.FLAG_NONE);

My assumption is that the previous enums' values are not reused, but that
passing them in is a tuning mechanism to reduce garbage collection. Is that
the correct assumption?

Thanks!

Darin
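
For what it's worth, a minimal sketch of the reuse idiom, assuming the
Lucene 4.x APIs; the field and value names are illustrative. Passing the
previous enum back in lets the codec recycle the object and its buffers
instead of allocating new ones per segment; the enum's position and state
are reset, not carried over:

import java.io.IOException;

import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.index.DocsEnum;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.BytesRef;

public class ReuseExample {
  static int countDocs(IndexReader reader) throws IOException {
    TermsEnum termsEnum = null;
    DocsEnum docsEnum = null;
    int count = 0;
    for (AtomicReaderContext ctx : reader.leaves()) {
      Terms terms = ctx.reader().terms("myfield");
      if (terms == null) continue;
      termsEnum = terms.iterator(termsEnum);   // may recycle the old instance
      if (termsEnum.seekExact(new BytesRef("myvalue"), true)) {
        docsEnum = termsEnum.docs(ctx.reader().getLiveDocs(), docsEnum,
            DocsEnum.FLAG_NONE);               // same recycling contract
        while (docsEnum.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
          count++;
        }
      }
    }
    return count;
  }
}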

Creating a Custom Query Response Writer

2014-12-05 Thread Ryan Yacyshyn
Hey Everyone,

I'm a little stuck on building a custom query response writer. I want to
create a response writer similar to the one explained in the book Taming
Text, the TypeAheadResponseWriter. I know I need to implement the
QueryResponseWriter interface, but I'm not sure where to find the Solr JAR
files I need to include. Where can I find these?

Thanks,
Ryan
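
The interface lives in the solr-core JAR, which ships in the dist/ folder of
the Solr download (or as org.apache.solr:solr-core on Maven Central). A
minimal sketch, assuming Solr 4.x; the class name and output here are
illustrative, not the Taming Text implementation:

import java.io.IOException;
import java.io.Writer;

import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.QueryResponseWriter;
import org.apache.solr.response.SolrQueryResponse;

public class TypeAheadResponseWriter implements QueryResponseWriter {

  @Override
  public void init(NamedList args) {
    // no configuration needed for this sketch
  }

  @Override
  public String getContentType(SolrQueryRequest request, SolrQueryResponse response) {
    return "text/plain; charset=UTF-8";
  }

  @Override
  public void write(Writer writer, SolrQueryRequest request, SolrQueryResponse response)
      throws IOException {
    // A real writer would walk the DocList in the response and emit just
    // the type-ahead field; this sketch dumps the raw response values.
    writer.write(String.valueOf(response.getValues()));
  }
}

It is registered in solrconfig.xml with a queryResponseWriter element naming
the class, and selected per request with the wt parameter.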