Re: using PositionIncrementAttribute to increment certain term positions to large values

2012-12-27 Thread Dmitry Kan
Hi,

answering my own question for the record: the experiments show that the
described functionality is achievable with a TokenFilter subclass. The
only caveat is that the Highlighter component stops working properly if
the match position goes beyond the length of the text field.

As for performance, no major delays compared to the original proximity
search implementation have been observed.

Best,

Dmitry Kan
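For anyone finding this thread later: the boundary trick can be illustrated without Lucene. The sketch below is a toy Python simulation of the idea, not the actual TokenFilter code — the delimiter token `#` and the gap of 1000 are assumptions for illustration. It shows why a large position increment at each sentence delimiter keeps small-slop phrase matches inside one sentence:

```python
GAP = 1000  # assumed large position increment for the artificial delimiter


def positions(tokens, delimiter="#"):
    """Assign Lucene-style term positions: ordinary tokens advance the
    position by 1, delimiter tokens advance it by a large gap."""
    pos, out = -1, {}
    for tok in tokens:
        pos += GAP if tok == delimiter else 1
        out.setdefault(tok, []).append(pos)
    return out


def phrase_within_slop(pos_map, t1, t2, slop):
    """True if some occurrence of t1 and t2 lie within `slop` positions
    of each other (a simplified stand-in for sloppy phrase matching)."""
    return any(abs(p2 - p1) - 1 <= slop
               for p1 in pos_map.get(t1, [])
               for p2 in pos_map.get(t2, []))
```

With `positions("the cat sat # dogs bark".split())`, "cat sat" matches at slop 0, while "sat dogs" is 1000 positions apart, so no reasonable slop crosses the sentence boundary.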

On Wed, Dec 19, 2012 at 10:53 AM, Dmitry Kan  wrote:

> Dear list,
>
> We are currently evaluating proximity searches ("term1 term2"~slop) for
> a specific use case. In particular, each document contains artificial
> delimiter characters (one character between each pair of sentences in the
> text). Our goal is to hit sentences individually for any proximity
> search and avoid matches that cross sentence boundaries.
>
> We figured that by using PositionIncrementAttribute as a field in a
> TokenFilter subclass it is possible to set the position increment of
> each artificial character (which is a term in Lucene/Solr terms) to an
> arbitrarily large number. Thus any proximity search with a reasonably
> small slop value should automatically match only within sentence
> boundaries.
>
> Does this sound like a right way to tackle the problem? Are there any
> performance costs involved?
>
> Thanks in advance for any input,
>
> Dmitry Kan
>


Re: Which token filter can combine 2 terms into 1?

2012-12-27 Thread Dmitry Kan
Hi,

Have a look at TokenFilter. Extending it will give you access to the
TokenStream.

Regards,

Dmitry Kan
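As a language-agnostic illustration of the buffering such a filter needs (in a real Lucene TokenFilter you would buffer attribute state across incrementToken() calls), here is a toy Python sketch. The merge rule is an assumption, since the original post does not say how pairs are chosen:

```python
def merge_pairs(tokens, attach):
    """Combine a token with the following one when the pair appears in
    `attach`, a set of (first, second) tuples. E.g. ("t2", "t2a") turns
    the stream t1 t2 t2a t3 into t1 t2t2a t3."""
    out, i = [], 0
    while i < len(tokens):
        # Look ahead one token; merge if this pair is configured.
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in attach:
            out.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out
```

The one-token lookahead is the key point: a combining TokenFilter cannot emit the current token until it has seen the next one.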

On Fri, Dec 21, 2012 at 9:05 AM, Xi Shen  wrote:

> Hi,
>
> I am looking for a token filter that can combine 2 terms into 1? E.g.
>
> the input has been tokenized by white space:
>
> t1 t2 t2a t3
>
> I want a filter that output:
>
> t1 t2t2a t3
>
> I know it is a very special case, and I am thinking about developing a
> filter of my own. But I cannot figure out which API I should use to look
> for terms in a TokenStream.
>
>
> --
> Regards,
> David Shen
>
> http://about.me/davidshen
> https://twitter.com/#!/davidshen84
>


search with spaces

2012-12-27 Thread Sangeetha
Hi,

I have a text field with the value "O O Jaane Jaane". When I search with
q="Jaane Jaane" it returns results, but if I search with q="O O Jaane
Jaane" it does not. What could be the reason?

Thanks,
Sangeetha



--
View this message in context: 
http://lucene.472066.n3.nabble.com/search-with-spaces-tp4029265.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: search with spaces

2012-12-27 Thread Chandan Tamrakar
Which analyzer is used by the field you indexed?
Maybe you can use the analysis page in the Solr admin UI to see how your
text is analyzed at index time.

thanks
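One possible cause — an assumption here, since the schema is not shown — is an index-time analyzer that drops very short or stop-listed tokens such as "O", so the indexed terms no longer line up with the query. A toy Python illustration of that kind of mismatch:

```python
STOPWORDS = {"o"}  # assumed: "o" is treated as a stopword


def analyze(text, min_len=2):
    """Toy analyzer: lowercase, split on whitespace, drop stopwords and
    tokens shorter than min_len (mimicking StopFilter / LengthFilter)."""
    return [t for t in text.lower().split()
            if t not in STOPWORDS and len(t) >= min_len]
```

Under this analyzer both "O O Jaane Jaane" and "Jaane Jaane" index as `["jaane", "jaane"]`; if the query side keeps the "O" terms (or expects them at specific phrase positions), the full query cannot match. The admin analysis page shows exactly this index-time vs. query-time difference.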

On Thu, Dec 27, 2012 at 2:30 PM, Sangeetha  wrote:

> Hi,
>
> I have a text field with value O O Jaane Jaane. When i search with *q=Jaane
> Jaane* it is giving the results. But if i give *q=O O Jaane Jaane* it is
> not
> working? What could be the reason?
>
> Thanks,
> Sangeetha
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/search-with-spaces-tp4029265.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Chandan Tamrakar
*
*


solr + jetty deployment issue

2012-12-27 Thread Sushrut Bidwai
Hi,

I am having trouble getting Solr + Jetty to work. I am following all the
instructions to the letter from http://wiki.apache.org/solr/SolrJetty. I
also created a work folder, /opt/solr/work, and set the tmpdir to a new
path in /etc/default/jetty. I have confirmed from the admin dashboard,
under Args, that the tmpdir is set to the new path.

It works like a charm at first, but after 3-4 restarts of Jetty it starts
hanging. The admin pages just don't load, and my app fails to acquire a
connection to Solr.

What might I be missing? Should I rather be looking at my code to see
whether I am committing correctly?

Please let me know if you have faced similar issue in the past and how to
tackle it.

Thank you.

-- 
Best Regards,
Sushrut


Re: Reindex ALL Solr CORES in one GO..

2012-12-27 Thread Anupam Bhattacharya
Thanks Gora,

I can definitely trigger full re-indexing using curl for multiple cores,
but if I try to index many cores (more than 4-5) simultaneously, the
re-indexing fails due to DB connection pool problems (connection not
available). Thus I need to schedule each indexing run once the previous
one is over. Unfortunately, to track the indexing status of a core one
needs to keep pinging the server to check for completion. Is there a way
to get a response from Solr once the indexing is complete?

How can I increase the connection pool size in Solr?

Regards
Anupam
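A sequential scheduler of the kind described (start a full-import on one core, poll DIH status until it is idle, then move to the next core) could be sketched as below. The base URL and core names are hypothetical, and the status check is deliberately crude:

```python
import time
import urllib.request

SOLR = "http://localhost:8983/solr"    # assumed Solr base URL
CORES = ["core0", "core1", "core2"]    # hypothetical core names


def dih_busy(status_body: str) -> bool:
    """Crude check of a DIH status response: DIH reports 'busy' while an
    import is running and 'idle' once it has finished."""
    return "busy" in status_body


def reindex_sequentially(fetch=lambda url: urllib.request.urlopen(url).read().decode()):
    """Run full-import on one core at a time, polling status until idle,
    so only one core's DB connections are in use at any moment."""
    for core in CORES:
        fetch(f"{SOLR}/{core}/dataimport?command=full-import")
        while dih_busy(fetch(f"{SOLR}/{core}/dataimport?command=status")):
            time.sleep(10)  # poll interval is an arbitrary choice
```

The `fetch` parameter exists only so the scheduler can be exercised without a live server; in practice curl in a shell loop does the same job.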


On Wed, Dec 26, 2012 at 7:06 PM, Gora Mohanty  wrote:

> On 26 December 2012 18:06, Anupam Bhattacharya 
> wrote:
> > Hello Everyone,
> >
> > Is it possible to schedule full re-indexing of all Solr cores without
> > going individually to the DIH screen of each core?
>
> One could quite easily write a wrapper around Solr's
> URLs for indexing. You could use a tool like curl, a
> simple shell script, or pretty much any programming
> language to do this.
>
> Regards,
> Gora
>



-- 
Thanks & Regards
Anupam Bhattacharya


Re: Reindex ALL Solr CORES in one GO..

2012-12-27 Thread Ahmet Arslan
> Unfortunately to track the status of indexing for a core one need to
> keeping pinging the server to check completion status. Is there a way
> to get a response from SOLR once the indexing is complete?

Yes it is possible : 
http://wiki.apache.org/solr/DataImportHandler#EventListeners
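For reference, the listener from that wiki page is wired up in the DIH configuration file. A hedged sketch — the listener class names here are hypothetical; any class implementing the DIH EventListener interface should work:

```xml
<dataConfig>
  <!-- onImportStart / onImportEnd name classes implementing
       org.apache.solr.handler.dataimport.EventListener;
       onImportEnd fires once the import has finished. -->
  <document onImportStart="com.example.ImportStartListener"
            onImportEnd="com.example.ImportDoneListener">
    <!-- entities as usual -->
  </document>
</dataConfig>
```

The onImportEnd callback could, for example, ping the scheduler to start the next core's import instead of having it poll.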


Re: Dynamic collections in SolrCloud for log indexing

2012-12-27 Thread Otis Gospodnetic
Added https://issues.apache.org/jira/browse/SOLR-4237

Otis
--
Performance Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html
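Until aliases exist, the push-and-pop scheme discussed in this thread can be driven from the client side. A sketch under assumed conventions — the `logs_YYYY_MM` naming scheme is made up, and it relies on Solr's multi-collection `collection` request parameter:

```python
from datetime import date


def monthly_collections(start: date, end: date, prefix="logs"):
    """List the monthly collection names (assumed naming scheme
    '<prefix>_YYYY_MM') covering the window [start, end]."""
    names, y, m = [], start.year, start.month
    while (y, m) <= (end.year, end.month):
        names.append(f"{prefix}_{y:04d}_{m:02d}")
        m += 1
        if m == 13:
            y, m = y + 1, 1
    return names


def historic_query_url(base, start, end, q):
    """Cross-search several collections in one request by listing them
    in the 'collection' parameter (distributed-search style)."""
    cols = ",".join(monthly_collections(start, end))
    return f"{base}/select?collection={cols}&q={q}"
```

A fixed "last 7 days" alias would make this helper unnecessary on the query side, which is exactly what SOLR-4237 asks for.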



On Tue, Dec 25, 2012 at 9:13 PM, Mark Miller  wrote:

> I've been thinking about aliases for a while as well. They seem very
> handy and fairly easy to implement. So far there have just always been
> higher-priority things (need to finish collection API responses this
> week…), but this is something I'd definitely help work on.
>
> - Mark
>
> On Dec 25, 2012, at 1:49 AM, Otis Gospodnetic 
> wrote:
>
> > Hi,
> >
> > Right, this is not really about routing in the ElasticSearch sense.
> > What's handy for indexing logs are index aliases, which I thought I
> > had added to JIRA a while back, but it looks like I have not.
> > Index aliases would let you keep a "last 7 days" alias fixed while
> > underneath you push and pop an index every day without the client app
> > having to adjust.
> >
> > Otis
> > --
> > Performance Monitoring - http://sematext.com/spm/index.html
> > Search Analytics - http://sematext.com/search-analytics/index.html
> >
> >
> >
> > On Mon, Dec 24, 2012 at 4:30 AM, Per Steffensen 
> wrote:
> >
> >> I believe it is a misunderstanding to use custom routing (or sharding,
> >> as Erick calls it) for this kind of stuff. Custom routing is nice if
> >> you want to control which slice/shard under a collection a specific
> >> document goes to - mainly to be able to control that two (or more)
> >> documents are indexed on the same slice/shard, but also just to be
> >> able to control on which slice/shard a specific document is indexed.
> >> Knowing/controlling this kind of stuff can be used for a lot of nice
> >> purposes. But you don't want to move slices/shards around among
> >> collections or delete/add slices from/to a collection - unless it's
> >> for elasticity reasons.
> >>
> >> I think you should fill a collection every week/month and just keep
> >> those collections as they are. Instead of ending up with one big
> >> "historic" collection containing many slices/shards/cores (one for
> >> each historic week/month), you will end up with many historic
> >> collections (one for each historic week/month). Searching historic
> >> data you will have to cross-search those historic collections, but
> >> that is no problem at all. If Solr Cloud is made as it is supposed to
> >> be made (and I believe it is), it shouldn't require more resources or
> >> be harder in any way to cross-search X slices across many collections
> >> than it is to cross-search X slices under the same collection.
> >>
> >> Besides that see my answer for topic "Will SolrCloud always slice by ID
> >> hash?" a few days back.
> >>
> >> Regards, Per Steffensen
> >>
> >>
> >> On 12/24/12 1:07 AM, Erick Erickson wrote:
> >>
> >>> I think this is one of the primary use-cases for custom sharding.
> >>> Solr 4.0 doesn't really lend itself to this scenario, but I _believe_
> >>> that the patch for custom sharding has been committed...
> >>>
> >>> That said, I'm not quite sure how you drop off the old shard if you
> >>> don't need to keep old data. I'd guess it's possible, but haven't
> >>> implemented anything like that myself.
> >>>
> >>> FWIW,
> >>> Erick
> >>>
> >>>
> >>> On Fri, Dec 21, 2012 at 12:17 PM, Upayavira  wrote:
> >>>
> >>> I'm working on a system for indexing logs. We're probably looking at
>  filling one core every month.
> 
>  We'll maintain a short term index containing the last 7 days - that
> one
>  is easy to handle.
> 
>  For the longer term stuff, we'd like to maintain a collection that
> will
>  query across all the historic data, but that means every month we need
>  to add another core to an existing collection, which as I understand
> it
>  in 4.0 is not possible.
> 
>  How do people handle this sort of situation where you have rolling new
>  content arriving? I'm sure I've heard people using SolrCloud for this
>  sort of thing.
> 
>  Given it is logs, distributed IDF has no real bearing.
> 
>  Upayavira
> 
> 
> >>
>
>


Re: Which token filter can combine 2 terms into 1?

2012-12-27 Thread Mattmann, Chris A (388J)
Hi Guys,

I also worked on a CombiningTokenFilter, see:

https://issues.apache.org/jira/browse/LUCENE-3413


Patch has been up and available for a while.

HTH!

Cheers,
Chris


On 12/27/12 12:26 AM, "Dmitry Kan"  wrote:

>Hi,
>
>Have a look onto TokenFilter. Extending it will give you access to a
>TokenStream.
>
>Regards,
>
>Dmitry Kan
>
>On Fri, Dec 21, 2012 at 9:05 AM, Xi Shen  wrote:
>
>> Hi,
>>
>> I am looking for a token filter that can combine 2 terms into 1? E.g.
>>
>> the input has been tokenized by white space:
>>
>> t1 t2 t2a t3
>>
>> I want a filter that output:
>>
>> t1 t2t2a t3
>>
>> I know it is a very special case, and I am thinking about develop a
>>filter
>> of my own. But I cannot figure out which API I should use to look for
>>terms
>> in a Token Stream.
>>
>>
>> --
>> Regards,
>> David Shen
>>
>> http://about.me/davidshen
>> https://twitter.com/#!/davidshen84
>>



Re: Converting fq params to Filter object

2012-12-27 Thread Nalini Kartha
Hi Lance,

Thanks for the response.

I didn't quite understand how to issue the queries from DirectSpellChecker
with the fq params applied like you were suggesting - could you point me to
the API that can be used for this?

Also, we haven't benchmarked the DirectSpellChecker against the
IndexBasedSpellChecker.

I considered issuing one large OR query with all the corrections, but that
doesn't ensure that *every* correction would return some hits with the fq
params applied; it only tells us that some correction returned hits, so it
isn't restrictive enough for us. And ANDing the corrections together is
too restrictive, since it requires that *all* corrections exist in the
same documents, instead of checking that they individually exist in some
docs (which satisfy the filter queries, of course).

Thanks,
Nalini


On Wed, Dec 26, 2012 at 9:32 PM, Lance Norskog  wrote:

> A Solr facet query does a boolean query, caches the Lucene facet data
> structure, and uses it as a Lucene filter. After that until you do a full
> commit, using the same fq=string (you must match the string exactly)
> fetches the cached data structure and uses it again as a Lucene filter.
>
> Have you benchmarked the DirectSpellChecker against
> IndexBasedSpellChecker? If you use the fq= filter query as the
> spellcheck.q= query it should use the cached filter.
>
> Also, since you are checking all words against the same filter query, can
> you just do one large OR query with all of the words?
>
>
> On 12/26/2012 03:10 PM, Nalini Kartha wrote:
>
>> Hi Otis,
>>
>> Sorry, let me be more specific.
>>
>> The end goal is for the DirectSpellChecker to make sure that the
>> corrections it is returning will return some results taking into account
>> the fq params included in the original query. This is a follow up question
>> to another question I had posted earlier -
>>
>> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201212.mbox/%3CCAMqOzYFTgiWyRbvwSdF0hFZ1SZNkQ9gnBJfDb_OBNeLsMvR0XA@mail.gmail.com%3E
>>
>> Initially, the way I was thinking of implementing this was to call one of
>> the SolrIndexSearcher.getDocSet() methods for ever correction, passing in
>> the correction as the Query and a DocSet created from the fq queries. But
>> I
>> didn't think that calling a SolrIndexSearcher method in Lucene code
>> (DirectSpellChecker) was a good idea. So I started looking at which method
>> on IndexSearcher would accomplish this. That's where I'm stuck trying to
>> figure out how to convert the fq params into a Filter object.
>>
>> Does this approach make sense? Also I realize that this implementation is
>> probably non-performant but wanted to give it a try and measure how it
>> does. Any advice about what the perf overhead from issuing such queries
>> for
>> say 50 corrections would be? Note that the filter from the fq params is
>> the
>> same for every query - would that be cached and help speed things up?
>>
>> Thanks,
>> Nalini
>>
>>
>> On Wed, Dec 26, 2012 at 3:34 PM, Otis Gospodnetic <
>> otis.gospodne...@gmail.com> wrote:
>>
>>  Hi,
>>>
>>> The fq *is* for filtering.
>>>
>>> What is your end goal, what are you trying to achieve?
>>>
>>> Otis
>>> Solr & ElasticSearch Support
>>> http://sematext.com/
>>> On Dec 26, 2012 11:22 AM, "Nalini Kartha" 
>>> wrote:
>>>
>>>  Hi,

 I'm trying to figure out how to convert the fq params that are being

>>> passed
>>>
 to Solr into something that can be used to filter the results of a query
 that's being issued against the Lucene IndexSearcher (I'm modifying some
 Lucene code to issue the query so calling through to one of the
 SolrIndexSearcher methods would be ugly).

 Looks like one of the IndexSearcher.search(Query query, Filter filter,

>>> ...)
>>>
   methods would do what I want but I'm wondering if there's any easy way

>>> of
>>>
 converting the fq params into a Filter? Or is there a better way of
 doing
 all of this?

 Thanks,
 Nalini


>


Re: Converting fq params to Filter object

2012-12-27 Thread Erik Hatcher
I think the answer is yes, there's a better way of doing all of this. But
I'm not yet sure what it all entails in your situation. What are you
overriding in the Lucene searches? I imagine Solr has the flexibility to
handle what you're trying to do without overriding anything core in
SolrIndexSearcher.

Generally, the way to get a custom filter in place is to create a custom query 
parser and use that for your fq parameter, like fq={!myparser param1='some 
value'}possible+expression+if+needed, so maybe that helps?

Tell us more about what you're doing specifically, and maybe we can guide you 
to a more elegant way to plug in any custom logic you want.

Erik

On Dec 26, 2012, at 11:21 , Nalini Kartha wrote:

> Hi,
> 
> I'm trying to figure out how to convert the fq params that are being passed
> to Solr into something that can be used to filter the results of a query
> that's being issued against the Lucene IndexSearcher (I'm modifying some
> Lucene code to issue the query so calling through to one of the
> SolrIndexSearcher methods would be ugly).
> 
> Looks like one of the IndexSearcher.search(Query query, Filter filter, ...)
> methods would do what I want but I'm wondering if there's any easy way of
> converting the fq params into a Filter? Or is there a better way of doing
> all of this?
> 
> Thanks,
> Nalini



Re: Converting fq params to Filter object

2012-12-27 Thread Nalini Kartha
Hi Eric,

Sorry, I think I wasn't very clear in explaining what we need to do.

We don't really need to do any complicated overriding; we just want to
change the DirectSpellChecker to issue a query for every correction it
finds, *with the fq params from the original query taken into account*, so
that we can check whether the correction would actually result in some
hits.

I was thinking of implementing this using the IndexSearcher.search(Query
query, Filter filter, int n) method where 'query' is a regular TermQuery
(the term is the correction) and 'filter' would represent the fq params.
What I'm not sure about is how to convert the fq params from Solr into a
Filter object and whether this is something we need to build ourselves or
if there's an existing API for this.

Also, I'm new to this code so not sure if I'm approaching this the wrong
way. Any advice/pointers are much appreciated.

Thanks,
Nalini



On Thu, Dec 27, 2012 at 12:53 PM, Erik Hatcher wrote:

> I think the answer is yes, that there's a better way to doing all of this.
>  But I'm not yet sure what this all entails in your situation.  What are
> you overriding with the Lucene searches?   I imagine Solr has the
> flexibility to handle what you're trying to do without overriding anything
> core in SolrIndexSearcher.
>
> Generally, the way to get a custom filter in place is to create a custom
> query parser and use that for your fq parameter, like fq={!myparser
> param1='some value'}possible+expression+if+needed, so maybe that helps?
>
> Tell us more about what you're doing specifically, and maybe we can guide
> you to a more elegant way to plug in any custom logic you want.
>
> Erik
>
> On Dec 26, 2012, at 11:21 , Nalini Kartha wrote:
>
> > Hi,
> >
> > I'm trying to figure out how to convert the fq params that are being
> passed
> > to Solr into something that can be used to filter the results of a query
> > that's being issued against the Lucene IndexSearcher (I'm modifying some
> > Lucene code to issue the query so calling through to one of the
> > SolrIndexSearcher methods would be ugly).
> >
> > Looks like one of the IndexSearcher.search(Query query, Filter filter,
> ...)
> > methods would do what I want but I'm wondering if there's any easy way of
> > converting the fq params into a Filter? Or is there a better way of doing
> > all of this?
> >
> > Thanks,
> > Nalini
>
>


Re: Converting fq params to Filter object

2012-12-27 Thread Erik Hatcher
Apologies for misunderstanding.

Does what you're trying to do already work using the maxCollationTries
feature of the spellcheck component?

It looks like it even passes the fq's through, so that the hit count in
the extended results is inclusive of the filters.

Maybe I'm missing something though, sorry.

Erik

On Dec 27, 2012, at 14:09 , Nalini Kartha wrote:

> Hi Eric,
> 
> Sorry, I think I wasn't very clear in explaining what we need to do.
> 
> We don't really need to do any complicated overriding, just want to change
> the DirectSpellChecker to issue a query for every correction it finds *with
> fq params from the original query taken into account* so that we can check
> if the correction would actually result in some hits.
> 
> I was thinking of implementing this using the IndexSearcher.search(Query
> query, Filter filter, int n) method where 'query' is a regular TermQuery
> (the term is the correction) and 'filter' would represent the fq params.
> What I'm not sure about is how to convert the fq params from Solr into a
> Filter object and whether this is something we need to build ourselves or
> if there's an existing API for this.
> 
> Also, I'm new to this code so not sure if I'm approaching this the wrong
> way. Any advice/pointers are much appreciated.
> 
> Thanks,
> Nalini
> 
> 
> 
> On Thu, Dec 27, 2012 at 12:53 PM, Erik Hatcher wrote:
> 
>> I think the answer is yes, that there's a better way to doing all of this.
>> But I'm not yet sure what this all entails in your situation.  What are
>> you overriding with the Lucene searches?   I imagine Solr has the
>> flexibility to handle what you're trying to do without overriding anything
>> core in SolrIndexSearcher.
>> 
>> Generally, the way to get a custom filter in place is to create a custom
>> query parser and use that for your fq parameter, like fq={!myparser
>> param1='some value'}possible+expression+if+needed, so maybe that helps?
>> 
>> Tell us more about what you're doing specifically, and maybe we can guide
>> you to a more elegant way to plug in any custom logic you want.
>> 
>>Erik
>> 
>> On Dec 26, 2012, at 11:21 , Nalini Kartha wrote:
>> 
>>> Hi,
>>> 
>>> I'm trying to figure out how to convert the fq params that are being
>> passed
>>> to Solr into something that can be used to filter the results of a query
>>> that's being issued against the Lucene IndexSearcher (I'm modifying some
>>> Lucene code to issue the query so calling through to one of the
>>> SolrIndexSearcher methods would be ugly).
>>> 
>>> Looks like one of the IndexSearcher.search(Query query, Filter filter,
>> ...)
>>> methods would do what I want but I'm wondering if there's any easy way of
>>> converting the fq params into a Filter? Or is there a better way of doing
>>> all of this?
>>> 
>>> Thanks,
>>> Nalini
>> 
>> 



RE: Converting fq params to Filter object

2012-12-27 Thread Dyer, James
Nalini,

You could take the code from SpellCheckCollator#collate and have it issue a 
test query for each word individually instead of for each collation.  This 
would do exactly what you want. See 
http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/solr/core/src/java/org/apache/solr/spelling/SpellCheckCollator.java

If you are concerned this isn't low-level enough and that performance would 
suffer, then see https://issues.apache.org/jira/browse/SOLR-3240 , which has a 
patch that uses a collector that quits after finding one document.  This makes 
each test query faster at the expense of not getting exact hit-counts.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Nalini Kartha [mailto:nalinikar...@gmail.com] 
Sent: Thursday, December 27, 2012 1:09 PM
To: solr-user@lucene.apache.org
Subject: Re: Converting fq params to Filter object

Hi Eric,

Sorry, I think I wasn't very clear in explaining what we need to do.

We don't really need to do any complicated overriding, just want to change
the DirectSpellChecker to issue a query for every correction it finds *with
fq params from the original query taken into account* so that we can check
if the correction would actually result in some hits.

I was thinking of implementing this using the IndexSearcher.search(Query
query, Filter filter, int n) method where 'query' is a regular TermQuery
(the term is the correction) and 'filter' would represent the fq params.
What I'm not sure about is how to convert the fq params from Solr into a
Filter object and whether this is something we need to build ourselves or
if there's an existing API for this.

Also, I'm new to this code so not sure if I'm approaching this the wrong
way. Any advice/pointers are much appreciated.

Thanks,
Nalini



On Thu, Dec 27, 2012 at 12:53 PM, Erik Hatcher wrote:

> I think the answer is yes, that there's a better way to doing all of this.
>  But I'm not yet sure what this all entails in your situation.  What are
> you overriding with the Lucene searches?   I imagine Solr has the
> flexibility to handle what you're trying to do without overriding anything
> core in SolrIndexSearcher.
>
> Generally, the way to get a custom filter in place is to create a custom
> query parser and use that for your fq parameter, like fq={!myparser
> param1='some value'}possible+expression+if+needed, so maybe that helps?
>
> Tell us more about what you're doing specifically, and maybe we can guide
> you to a more elegant way to plug in any custom logic you want.
>
> Erik
>
> On Dec 26, 2012, at 11:21 , Nalini Kartha wrote:
>
> > Hi,
> >
> > I'm trying to figure out how to convert the fq params that are being
> passed
> > to Solr into something that can be used to filter the results of a query
> > that's being issued against the Lucene IndexSearcher (I'm modifying some
> > Lucene code to issue the query so calling through to one of the
> > SolrIndexSearcher methods would be ugly).
> >
> > Looks like one of the IndexSearcher.search(Query query, Filter filter,
> ...)
> > methods would do what I want but I'm wondering if there's any easy way of
> > converting the fq params into a Filter? Or is there a better way of doing
> > all of this?
> >
> > Thanks,
> > Nalini
>
>



Re: Converting fq params to Filter object

2012-12-27 Thread Nalini Kartha
Hi James,

Yup, that is what I tried to do initially, but it seems like calling
through to those Solr methods from DirectSpellChecker is not a good idea -
am I wrong? And as you mentioned, this seemed like it wasn't low-level
enough.

Eric: Unfortunately the collate functionality does not work for our use
case since the queries we're correcting are default OR. Here's the original
thread about this -

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201212.mbox/%3ccamqozyftgiwyrbvwsdf0hfz1sznkq9gnbjfdb_obnelsmvr...@mail.gmail.com%3E

Thanks,
Nalini

On Thu, Dec 27, 2012 at 2:46 PM, Dyer, James
wrote:

> https://issues.apache.org/jira/browse/SOLR-3240


RE: Converting fq params to Filter object

2012-12-27 Thread Dyer, James
Nalini,

Assuming that you're using Solr, the hook into the collate functionality is in 
SpellCheckComponent#addCollationsToResponse .  To do what you want, you would 
have to modify the call to SpellCheckCollator to issue test queries against the 
individual words instead of the collations.

See 
http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/solr/core/src/java/org/apache/solr/handler/component/SpellCheckComponent.java

Of course, if you're using Lucene directly and not Solr, you would want to
build a series of queries that each query one word with the filters
applied. DirectSpellChecker#suggestSimilar returns an array of SuggestWord
instances containing the individual words you would want to try. To
optimize this, you can use the same approach as in SOLR-3240, implementing
a Collector that looks for only one document and then quits.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311
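The SOLR-3240-style "stop after one document" collector mentioned above can be simulated outside Lucene. This Python sketch illustrates only the control flow (Lucene's actual Collector API is Java and operates on segment readers); the point is that you learn "at least one hit exists" without paying for an exact hit count:

```python
class FirstHitFound(Exception):
    """Raised to abort collection as soon as one hit is seen."""


def has_any_hit(doc_ids, matches):
    """Scan candidate docs, stopping at the first match -- the same idea
    as SOLR-3240's one-document collector."""
    try:
        for doc in doc_ids:
            if matches(doc):
                raise FirstHitFound  # abort: one hit is all we need
    except FirstHitFound:
        return True
    return False
```

A real Lucene implementation throws a similar control-flow exception from Collector#collect to terminate the search early.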


-Original Message-
From: Nalini Kartha [mailto:nalinikar...@gmail.com] 
Sent: Thursday, December 27, 2012 2:31 PM
To: solr-user@lucene.apache.org
Subject: Re: Converting fq params to Filter object

Hi James,

Yup, that was what I tried to do initially but it seems like calling
through to those Solr methods from DirectSpellChecker was not a good idea -
am I wrong? And like you mentioned, this seemed like it wasn't low-level
enough.

Eric: Unfortunately the collate functionality does not work for our use
case since the queries we're correcting are default OR. Here's the original
thread about this -

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201212.mbox/%3ccamqozyftgiwyrbvwsdf0hfz1sznkq9gnbjfdb_obnelsmvr...@mail.gmail.com%3E

Thanks,
Nalini

On Thu, Dec 27, 2012 at 2:46 PM, Dyer, James
wrote:

> https://issues.apache.org/jira/browse/SOLR-3240



Re: search with spaces

2012-12-27 Thread Jack Krupansky

That's &debugQuery=true or &debug=query.

-- Jack Krupansky

-Original Message- 
From: Otis Gospodnetic

Sent: Thursday, December 27, 2012 10:56 AM
To: solr-user@lucene.apache.org
Subject: Re: search with spaces

Hi,

Add &debugQuery=query to your search requests.  That will point you in the
right direction.

Otis
--
Performance Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html



On Thu, Dec 27, 2012 at 3:45 AM, Sangeetha  wrote:


Hi,

I have a text field with value O O Jaane Jaane. When i search with
*q=Jaane Jaane* it is giving the results. But if i give *q=O O Jaane
Jaane* it is not working? What could be the reason?

Thanks,
Sangeetha



--
View this message in context:
http://lucene.472066.n3.nabble.com/search-with-spaces-tp4029265.html
Sent from the Solr - User mailing list archive at Nabble.com.





Frequent OOM - (Unknown source in logs).

2012-12-27 Thread shreejay
Hello, 

I am seeing frequent OOMs for the past 2 days on a SolrCloud cluster
(Solr 4.0 with a patch from SOLR-2592): 3 shards, each shard with 2
instances, each instance running CentOS with 30GB memory and 500GB disk
space, plus a separate ZooKeeper ensemble of 3.

Here is the stacktrace: http://pastebin.com/cV5DxD4N

I also saw a Jira issue which looks similar, the difference being that in
the stacktrace I get, I cannot see which source line triggers
expandCapacity:

java.lang.AbstractStringBuilder.expandCapacity(Unknown Source)

whereas the stacktrace mentioned in that issue
(https://issues.apache.org/jira/browse/SOLR-3881) has:

at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)

Has anyone seen this issue before? Any fixes for it?





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Frequent-OOM-Unknown-source-in-logs-tp4029361.html
Sent from the Solr - User mailing list archive at Nabble.com.


old index not cleaned up on the slave

2012-12-27 Thread Jason
Hi,
I'm using master/slave replication on Solr 4.0. Replication runs
successfully, but the old index files are not cleaned up on the slave.
Is that a bug or not?

My slave index directory is below...

$ ls -l solr_kr/krg01/data/index/
total 23472512
-rw-r--r--. 1 tomcat tomcat  563722625 Dec 24 21:48 _15.fdt
-rw-r--r--. 1 tomcat tomcat    4855210 Dec 24 21:48 _15.fdx
-rw-r--r--. 1 tomcat tomcat       4155 Dec 24 22:01 _15.fnm
-rw-r--r--. 1 tomcat tomcat 3367203143 Dec 24 22:01 _15_Lucene40_0.frq
-rw-r--r--. 1 tomcat tomcat 6951612380 Dec 24 22:01 _15_Lucene40_0.prx
-rw-r--r--. 1 tomcat tomcat 1096591353 Dec 24 22:01 _15_Lucene40_0.tim
-rw-r--r--. 1 tomcat tomcat   26026916 Dec 24 22:01 _15_Lucene40_0.tip
-rw-r--r--. 1 tomcat tomcat        388 Dec 24 22:01 _15.si
-rw-r--r--. 1 tomcat tomcat         98 Nov 30 13:43 segments_3
-rw-r--r--. 1 tomcat tomcat         99 Dec 24 22:01 segments_4
-rw-r--r--. 1 tomcat tomcat         20 Aug 12 07:21 segments.gen
-rw-r--r--. 1 tomcat tomcat  563742324 Nov 30 13:32 _t.fdt
-rw-r--r--. 1 tomcat tomcat    4855210 Nov 30 13:32 _t.fdx
-rw-r--r--. 1 tomcat tomcat       4155 Nov 30 13:43 _t.fnm
-rw-r--r--. 1 tomcat tomcat 3382846438 Nov 30 13:43 _t_Lucene40_0.frq
-rw-r--r--. 1 tomcat tomcat 6951620034 Nov 30 13:43 _t_Lucene40_0.prx
-rw-r--r--. 1 tomcat tomcat 1096654275 Nov 30 13:43 _t_Lucene40_0.tim
-rw-r--r--. 1 tomcat tomcat   26027222 Nov 30 13:43 _t_Lucene40_0.tip
-rw-r--r--. 1 tomcat tomcat        379 Nov 30 13:43 _t.si




--
View this message in context: 
http://lucene.472066.n3.nabble.com/old-index-not-cleaned-up-on-the-slave-tp4029370.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: solr + jetty deployment issue

2012-12-27 Thread David Parks
Do you see any errors coming in on the console, stderr?

I start solr this way and redirect the stdout and stderr to log files, when
I have a problem stderr generally has the answer:

java \
  -server \
  -Djetty.port=8080 \
  -Dsolr.solr.home=/opt/solr \
  -Dsolr.data.dir=/mnt/solr_data \
  -jar /opt/solr/start.jar \
  >/opt/solr/logs/stdout.log 2>/opt/solr/logs/stderr.log &



-Original Message-
From: Sushrut Bidwai [mailto:bidwai.sush...@gmail.com] 
Sent: Thursday, December 27, 2012 7:40 PM
To: solr-user@lucene.apache.org
Subject: solr + jetty deployment issue

Hi,

I am having trouble with getting solr + jetty to work. I am following all
instructions to the letter from - http://wiki.apache.org/solr/SolrJetty. I
also created a work folder - /opt/solr/work. I am also setting tmpdir to a
new path in /etc/default/jetty . I am confirming the tmpdir is set to the
new path from admin dashboard, under args.

It works like a charm. But when I restart jetty multiple times, after 3/4
such restarts it starts hanging. Admin pages just don't load and my app fails
to acquire a connection with Solr.

What might I be missing? Should I rather be looking at my code to see if I
am not committing correctly?

Please let me know if you have faced similar issue in the past and how to
tackle it.

Thank you.

--
Best Regards,
Sushrut



MoreLikeThis only returns 1 result

2012-12-27 Thread David Parks
I'm doing a query like this for MoreLikeThis, sending it a document ID. But
the only result I ever get back is the document ID I sent it. The debug
response is below.

If I read it correctly, it's taking "id:1004401713626" as the term (not the
document ID) and only finding it once. But I want it to match the document
with ID 1004401713626, of course. I tried &q=id:[1004401713626], but that
generates an exception:

Caused by: org.apache.lucene.queryParser.ParseException: Cannot parse
'id:[1004401713626]': Encountered " "]" "] "" at line 1, column 17.
Was expecting one of:
    "TO" ...
    <RANGE_QUOTED> ...
    <RANGE_GOOP> ...

This must be easy, but the documentation is minimal.

My Query:
http://107.23.102.164:8080/solr/select/?qt=mlt&q=id:[1004401713626]&rows=10&mlt.fl=item_name,item_brand,short_description,long_description,catalog_names,categories,keywords,attributes,facetime&mlt.mintf=2&mlt.mindf=5&mlt.maxqt=100&mlt.boost=false&debugQuery=true


  
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="mlt.mindf">5</str>
      <str name="mlt.fl">item_name,item_brand,short_description,long_description,catalog_names,categories,keywords,attributes,facetime</str>
      <str name="mlt.boost">false</str>
      <str name="debugQuery">true</str>
      <str name="q">id:1004401713626</str>
      <str name="mlt.mintf">2</str>
      <str name="mlt.maxqt">100</str>
      <str name="qt">mlt</str>
      <str name="rows">10</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="id">1004401713626</str>
    </doc>
  </result>
  <lst name="debug">
    <str name="rawquerystring">id:1004401713626</str>
    <str name="querystring">id:1004401713626</str>
    <str name="parsedquery">id:1004401713626</str>
    <str name="parsedquery_toString">id:1004401713626</str>
    <lst name="explain">
      <str name="1004401713626">
18.29481 = (MATCH) fieldWeight(id:1004401713626 in 2843152), product of:
  1.0 = tf(termFreq(id:1004401713626)=1)
  18.29481 = idf(docFreq=1, maxDocs=64873893)
  1.0 = fieldNorm(field=id, doc=2843152)
      </str>
    </lst>
  </lst>
</response>



Re: solr + jetty deployment issue

2012-12-27 Thread Sushrut Bidwai
Hi David,

From what I see in the log and the thread dump, it seems that the getSearcher
method in SolrCore is not able to acquire the required lock, and because of
that it's blocking startup of the server. Here is the thread dump -
http://pastebin.com/GPnAzF1q .

On Fri, Dec 28, 2012 at 8:01 AM, David Parks  wrote:

> Do you see any errors coming in on the console, stderr?
>
> I start solr this way and redirect the stdout and stderr to log files, when
> I have a problem stderr generally has the answer:
>
> java \
> -server \
> -Djetty.port=8080 \
> -Dsolr.solr.home=/opt/solr \
> -Dsolr.data.dir=/mnt/solr_data \
> -jar /opt/solr/start.jar >/opt/solr/logs/stdout.log
> 2>/opt/solr/logs/stderr.log &
>
>
>
> -Original Message-
> From: Sushrut Bidwai [mailto:bidwai.sush...@gmail.com]
> Sent: Thursday, December 27, 2012 7:40 PM
> To: solr-user@lucene.apache.org
> Subject: solr + jetty deployment issue
>
> Hi,
>
> I am having trouble with getting solr + jetty to work. I am following all
> instructions to the letter from - http://wiki.apache.org/solr/SolrJetty. I
> also created a work folder - /opt/solr/work. I am also setting tmpdir to a
> new path in /etc/default/jetty . I am confirming the tmpdir is set to the
> new path from admin dashboard, under args.
>
> It works like a charm. But when I restart jetty multiple times, after 3/4
> such restarts it starts hanging. Admin pages just dont load and my app
> fails
> to acquire a connection with solr.
>
> What I might be missing? Should I be rather looking at my code and see if I
> am not committing correctly?
>
> Please let me know if you have faced similar issue in the past and how to
> tackle it.
>
> Thank you.
>
> --
> Best Regards,
> Sushrut
>
>


-- 
Best Regards,
Sushrut
http://sushrutbidwai.com


Re: MoreLikeThis only returns 1 result

2012-12-27 Thread Jack Krupansky
Sounds like it is simply dispatching to the normal search request handler. 
Although you specified qt=mlt, make sure you enable the legacy select 
handler dispatching in solrconfig.xml.


Change:

    <requestDispatcher handleSelect="false" >

to

    <requestDispatcher handleSelect="true" >

Or, simply address the MLT handler directly:

   http://107.23.102.164:8080/solr/mlt?q=...

Or, use the MoreLikeThis search component:

   http://localhost:8983/solr/select?q=...&mlt=true&...

See:
http://wiki.apache.org/solr/MoreLikeThis
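For completeness, a sketch of how a dedicated /mlt handler is typically declared in solrconfig.xml; the defaults shown here are illustrative assumptions, not values taken from this thread:

```xml
<!-- Hypothetical /mlt declaration; adjust the field list and defaults to your schema. -->
<requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
  <lst name="defaults">
    <str name="mlt.fl">item_name,short_description</str>
    <int name="mlt.mintf">2</int>
    <int name="mlt.mindf">5</int>
  </lst>
</requestHandler>
```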

-- Jack Krupansky

-Original Message- 
From: David Parks

Sent: Thursday, December 27, 2012 9:59 PM
To: solr-user@lucene.apache.org
Subject: MoreLikeThis only returns 1 result

I'm doing a query like this for MoreLikeThis, sending it a document ID. But
the only result I ever get back is the document ID I sent it. The debug
response is below.

If I read it correctly, it's taking "id:1004401713626" as the term (not the
document ID) and only finding it once. But I want it to match the document
with ID 1004401713626 of course. I tried &q=id[1004410713626], but that
generates an exception:

Caused by: org.apache.lucene.queryParser.ParseException: Cannot parse
'id:[1004401713626]': Encountered " "]" "] "" at line 1, column 17.
Was expecting one of:
    "TO" ...
    <RANGE_QUOTED> ...
    <RANGE_GOOP> ...

This must be easy, but the documentation is minimal.

My Query:
http://107.23.102.164:8080/solr/select/?qt=mlt&q=id:[1004401713626]&rows=10&mlt.fl=item_name,item_brand,short_description,long_description,catalog_names,categories,keywords,attributes,facetime&mlt.mintf=2&mlt.mindf=5&mlt.maxqt=100&mlt.boost=false&debugQuery=true


 
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="mlt.mindf">5</str>
      <str name="mlt.fl">item_name,item_brand,short_description,long_description,catalog_names,categories,keywords,attributes,facetime</str>
      <str name="mlt.boost">false</str>
      <str name="debugQuery">true</str>
      <str name="q">id:1004401713626</str>
      <str name="mlt.mintf">2</str>
      <str name="mlt.maxqt">100</str>
      <str name="qt">mlt</str>
      <str name="rows">10</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="id">1004401713626</str>
    </doc>
  </result>
  <lst name="debug">
    <str name="rawquerystring">id:1004401713626</str>
    <str name="querystring">id:1004401713626</str>
    <str name="parsedquery">id:1004401713626</str>
    <str name="parsedquery_toString">id:1004401713626</str>
    <lst name="explain">
      <str name="1004401713626">
18.29481 = (MATCH) fieldWeight(id:1004401713626 in 2843152), product of:
  1.0 = tf(termFreq(id:1004401713626)=1)
  18.29481 = idf(docFreq=1, maxDocs=64873893)
  1.0 = fieldNorm(field=id, doc=2843152)
      </str>
    </lst>
  </lst>
</response>



RE: MoreLikeThis only returns 1 result

2012-12-27 Thread David Parks
Ok, that worked, I had the /mlt request handler misconfigured (forgot a
'/'). It's working now. Thanks!

-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com] 
Sent: Friday, December 28, 2012 11:38 AM
To: solr-user@lucene.apache.org
Subject: Re: MoreLikeThis only returns 1 result

Sounds like it is simply dispatching to the normal search request handler. 
Although you specified qt=mlt, make sure you enable the legacy select
handler dispatching in solrconfig.xml.

Change:

    <requestDispatcher handleSelect="false" >

to

    <requestDispatcher handleSelect="true" >

Or, simply address the MLT handler directly:

http://107.23.102.164:8080/solr/mlt?q=...

Or, use the MoreLikeThis search component:

http://localhost:8983/solr/select?q=...&mlt=true&...

See:
http://wiki.apache.org/solr/MoreLikeThis

-- Jack Krupansky

-Original Message-
From: David Parks
Sent: Thursday, December 27, 2012 9:59 PM
To: solr-user@lucene.apache.org
Subject: MoreLikeThis only returns 1 result

I'm doing a query like this for MoreLikeThis, sending it a document ID. But
the only result I ever get back is the document ID I sent it. The debug
response is below.

If I read it correctly, it's taking "id:1004401713626" as the term (not the
document ID) and only finding it once. But I want it to match the document
with ID 1004401713626 of course. I tried &q=id[1004410713626], but that
generates an exception:

Caused by: org.apache.lucene.queryParser.ParseException: Cannot parse
'id:[1004401713626]': Encountered " "]" "] "" at line 1, column 17.
Was expecting one of:
    "TO" ...
    <RANGE_QUOTED> ...
    <RANGE_GOOP> ...

This must be easy, but the documentation is minimal.

My Query:
http://107.23.102.164:8080/solr/select/?qt=mlt&q=id:[1004401713626]&rows=10&mlt.fl=item_name,item_brand,short_description,long_description,catalog_names,categories,keywords,attributes,facetime&mlt.mintf=2&mlt.mindf=5&mlt.maxqt=100&mlt.boost=false&debugQuery=true


  
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="mlt.mindf">5</str>
      <str name="mlt.fl">item_name,item_brand,short_description,long_description,catalog_names,categories,keywords,attributes,facetime</str>
      <str name="mlt.boost">false</str>
      <str name="debugQuery">true</str>
      <str name="q">id:1004401713626</str>
      <str name="mlt.mintf">2</str>
      <str name="mlt.maxqt">100</str>
      <str name="qt">mlt</str>
      <str name="rows">10</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="id">1004401713626</str>
    </doc>
  </result>
  <lst name="debug">
    <str name="rawquerystring">id:1004401713626</str>
    <str name="querystring">id:1004401713626</str>
    <str name="parsedquery">id:1004401713626</str>
    <str name="parsedquery_toString">id:1004401713626</str>
    <lst name="explain">
      <str name="1004401713626">
18.29481 = (MATCH) fieldWeight(id:1004401713626 in 2843152), product of:
  1.0 = tf(termFreq(id:1004401713626)=1)
  18.29481 = idf(docFreq=1, maxDocs=64873893)
  1.0 = fieldNorm(field=id, doc=2843152)
      </str>
    </lst>
  </lst>
</response>



RE: MoreLikeThis supporting multiple document IDs as input?

2012-12-27 Thread David Parks
I'm somewhat new to Solr (it's running, I've been through the books, but I'm
no master). What I hear you say is that MLT *can* accept, say, 5 documents
and provide results, but the results would essentially be the same as
running the query 5 times, once for each document?

If that's the case, I might accept it. I would just have to merge them
together at the end (perhaps I'd take the top 2 of each result, for
example).

Being somewhat new, I'm a little confused by the difference between a "Search
Component" and a "Handler". I've got the /mlt handler working and I'm using
that. But how's that different from a "Search Component"? Is that referring
to the default /solr/select?q="..." style query?

And if what I said about multiple documents above is correct, what's the
syntax to try that out?

Thanks very much for the great help!
Dave


-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com] 
Sent: Wednesday, December 26, 2012 12:07 PM
To: solr-user@lucene.apache.org
Subject: Re: MoreLikeThis supporting multiple document IDs as input?

MLT has both a request handler and a search component.

The MLT handler returns similar documents only for the first document that
the query matches.

The MLT search component returns similar documents for each of the documents
in the search results, but processes each search result base document one at
a time and keeps its similar documents segregated by each of the base
documents.

It sounds like you wanted to merge the base search results and then find
documents similar to that merged super-document. Is that what you were
really seeking, as opposed to what the MLT component does? Unfortunately,
you can't do that with the components as they are.

You would have to manually merge the values from the base documents and then
you could POST that text back to the MLT handler and find similar documents
using the posted text rather than a query. Kind of messy, but in theory that
should work.
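A rough sketch of that workflow in shell. The host, core name, endpoint, and field list are assumptions, and stream.body support may require remote streaming to be enabled in solrconfig.xml:

```shell
# Manually merged text from the base documents (placeholder content).
MERGED_TEXT="text of doc 1 text of doc 2"

# URL-encode the merged text (python3 used here only as a urlencoder).
ENCODED=$(python3 -c 'import sys,urllib.parse;print(urllib.parse.quote(sys.argv[1]))' "$MERGED_TEXT")

# Build an MLT request that posts the text via stream.body instead of a query.
REQUEST="http://localhost:8983/solr/mlt?stream.body=$ENCODED&mlt.fl=item_name,short_description&mlt.mintf=2&mlt.mindf=5&rows=10"
echo "$REQUEST"
# Then fetch it, e.g.: curl "$REQUEST"
```

The key point is that no q parameter is sent at all; the handler treats the streamed text itself as the "document" to find neighbors for.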

-- Jack Krupansky

-Original Message-
From: David Parks
Sent: Tuesday, December 25, 2012 5:04 AM
To: solr-user@lucene.apache.org
Subject: MoreLikeThis supporting multiple document IDs as input?

I'm unclear on this point from the documentation. Is it possible to give
Solr X # of document IDs and tell it that I want documents similar to those
X documents?

Example:

  - The user is browsing 5 different articles
  - I send Solr the IDs of these 5 articles so I can present the user other
similar articles

I see this example for sending it 1 document ID:
http://localhost:8080/solr/select/?qt=mlt&q=id:[document
id]&mlt.fl=[field1],[field2],[field3]&fl=id&rows=10

But can I send it 2+ document IDs as the query? 



Re: solr + jetty deployment issue

2012-12-27 Thread Sushrut Bidwai
Here is the latest thread dump, taken after setting up the latest nightly
build - apache-solr-4.1-2012-12-27_04-32-37 - http://pastebin.com/eum7CxX4

I have been stuck on this for a few days now, so I could use a little help.

Here are more details on the issue -
1. Setting up jetty + solr using the instructions at
http://wiki.apache.org/solr/SolrJetty
2. The initial install with clean data dirs goes smoothly.
3. I can connect to the server and index 10K+ documents without any issues. I
use 10 threads in my app to do so. Not experiencing any
concurrency/deadlock issues.
4. When I stop my app and then restart jetty, after a few restarts I get the
above-mentioned thread dump and startup of the server stays blocked forever.
5. If I delete the data dir and start again, the problem goes away. But it
reappears on server restarts.

On Fri, Dec 28, 2012 at 9:03 AM, Sushrut Bidwai wrote:

> Hi David,
>
> From what I see in the log and threaddump it seems that getSearcher method
> in SolrCore is not able to acquire required lock and because of that its
> blocking startup of the server. Here is threaddump -
> http://pastebin.com/GPnAzF1q .
>
>
> On Fri, Dec 28, 2012 at 8:01 AM, David Parks wrote:
>
>> Do you see any errors coming in on the console, stderr?
>>
>> I start solr this way and redirect the stdout and stderr to log files,
>> when
>> I have a problem stderr generally has the answer:
>>
>> java \
>> -server \
>> -Djetty.port=8080 \
>> -Dsolr.solr.home=/opt/solr \
>> -Dsolr.data.dir=/mnt/solr_data \
>> -jar /opt/solr/start.jar >/opt/solr/logs/stdout.log
>> 2>/opt/solr/logs/stderr.log &
>>
>>
>>
>> -Original Message-
>> From: Sushrut Bidwai [mailto:bidwai.sush...@gmail.com]
>> Sent: Thursday, December 27, 2012 7:40 PM
>> To: solr-user@lucene.apache.org
>> Subject: solr + jetty deployment issue
>>
>> Hi,
>>
>> I am having trouble with getting solr + jetty to work. I am following all
>> instructions to the letter from - http://wiki.apache.org/solr/SolrJetty.
>> I
>> also created a work folder - /opt/solr/work. I am also setting tmpdir to a
>> new path in /etc/default/jetty . I am confirming the tmpdir is set to the
>> new path from admin dashboard, under args.
>>
>> It works like a charm. But when I restart jetty multiple times, after 3/4
>> such restarts it starts hanging. Admin pages just dont load and my app
>> fails
>> to acquire a connection with solr.
>>
>> What I might be missing? Should I be rather looking at my code and see if
>> I
>> am not committing correctly?
>>
>> Please let me know if you have faced similar issue in the past and how to
>> tackle it.
>>
>> Thank you.
>>
>> --
>> Best Regards,
>> Sushrut
>>
>>
>
>
> --
> Best Regards,
> Sushrut
> http://sushrutbidwai.com
>



-- 
Best Regards,
Sushrut
http://sushrutbidwai.com


RE: MoreLikeThis supporting multiple document IDs as input?

2012-12-27 Thread Otis Gospodnetic
Hi Dave,

Think of search components as a chain of Java classes that get executed
during each search request. If you open solrconfig.xml you will see how
they are defined and used.

HTH

Otis
Solr & ElasticSearch Support
http://sematext.com/
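To make the chain concrete, here is roughly how a component and a handler relate in solrconfig.xml. The stock class names are used, but treat the wiring as an illustrative assumption rather than an exact copy of any shipped config:

```xml
<!-- A search *component*: one link in the chain run on each request. -->
<searchComponent name="mlt" class="solr.MoreLikeThisComponent"/>

<!-- A request *handler*: owns a URL path and invokes its component chain. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <arr name="last-components">
    <str>mlt</str>
  </arr>
</requestHandler>
```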
On Dec 28, 2012 12:06 AM, "David Parks"  wrote:

> I'm somewhat new to Solr (it's running, I've been through the books, but
> I'm
> no master). What I hear you say is that MLT *can* accept, say 5, documents
> and provide results, but the results would essentially be the same as
> running the query 5 times for each document?
>
> If that's the case, I might accept it. I would just have to merge them
> together at the end (perhaps I'd take the top 2 of each result, for
> example).
>
> Being somewhat new I'm a little confused by the difference between a
> "Search
> Component" and a "Handler". I've got the /mlt handler working and I'm using
> that. But how's that different from a "Search Component"? Is that referring
> to the default /solr/select?q="..." style query?
>
> And if what I said about multiple documents above is correct, what's the
> syntax to try that out?
>
> Thanks very much for the great help!
> Dave
>
>
> -Original Message-
> From: Jack Krupansky [mailto:j...@basetechnology.com]
> Sent: Wednesday, December 26, 2012 12:07 PM
> To: solr-user@lucene.apache.org
> Subject: Re: MoreLikeThis supporting multiple document IDs as input?
>
> MLT has both a request handler and a search component.
>
> The MLT handler returns similar documents only for the first document that
> the query matches.
>
> The MLT search component returns similar documents for each of the
> documents
> in the search results, but processes each search result base document one
> at
> a time and keeps its similar documents segregated by each of the base
> documents.
>
> It sounds like you wanted to merge the base search results and then find
> documents similar to that merged super-document. Is that what you were
> really seeking, as opposed to what the MLT component does? Unfortunately,
> you can't do that with the components as they are.
>
> You would have to manually merge the values from the base documents and
> then
> you could POST that text back to the MLT handler and find similar documents
> using the posted text rather than a query. Kind of messy, but in theory
> that
> should work.
>
> -- Jack Krupansky
>
> -Original Message-
> From: David Parks
> Sent: Tuesday, December 25, 2012 5:04 AM
> To: solr-user@lucene.apache.org
> Subject: MoreLikeThis supporting multiple document IDs as input?
>
> I'm unclear on this point from the documentation. Is it possible to give
> Solr X # of document IDs and tell it that I want documents similar to those
> X documents?
>
> Example:
>
>   - The user is browsing 5 different articles
>   - I send Solr the IDs of these 5 articles so I can present the user other
> similar articles
>
> I see this example for sending it 1 document ID:
> http://localhost:8080/solr/select/?qt=mlt&q=id:[document
> id]&mlt.fl=[field1],[field2],[field3]&fl=id&rows=10
>
> But can I send it 2+ document IDs as the query?
>
>


Re: solr + jetty deployment issue

2012-12-27 Thread Sushrut Bidwai
If I comment out the /browse requesthandler from solrconfig.xml, the problem
goes away. So the issue is definitely with the way I am configuring
solrconfig.xml. I will debug it on my side.

On Fri, Dec 28, 2012 at 11:55 AM, Sushrut Bidwai
wrote:

> Here is latest threaddump taken after setting up latest nightly build
> version - apache-solr-4.1-2012-12-27_04-32-37 -
> http://pastebin.com/eum7CxX4
>
> Kind of stuck with this from few days now, so can use little help.
>
> Here is more details on the issue -
> 1. Setting up jetty + solr using instructions -
> http://wiki.apache.org/solr/SolrJetty
> 2. Initial install with clean data dirs goes smoothly.
> 3. I can connect to server and index 10K+ documents with out any issues. I
> use 10 threads in my app to do so. Not experiencing any
> concurrency/deadlock issues.
> 4. When stop my app and then restart jetty, after few restarts - I get
> above mentioned threaddump and startup of server stays blocked forever.
> 5. If I delete data dir and start again, problem goes away. But reappears
> on server restarts.
>
>
> On Fri, Dec 28, 2012 at 9:03 AM, Sushrut Bidwai 
> wrote:
>
>> Hi David,
>>
>> From what I see in the log and threaddump it seems that getSearcher
>> method in SolrCore is not able to acquire required lock and because of that
>> its blocking startup of the server. Here is threaddump -
>> http://pastebin.com/GPnAzF1q .
>>
>>
>> On Fri, Dec 28, 2012 at 8:01 AM, David Parks wrote:
>>
>>> Do you see any errors coming in on the console, stderr?
>>>
>>> I start solr this way and redirect the stdout and stderr to log files,
>>> when
>>> I have a problem stderr generally has the answer:
>>>
>>> java \
>>> -server \
>>> -Djetty.port=8080 \
>>> -Dsolr.solr.home=/opt/solr \
>>> -Dsolr.data.dir=/mnt/solr_data \
>>> -jar /opt/solr/start.jar >/opt/solr/logs/stdout.log
>>> 2>/opt/solr/logs/stderr.log &
>>>
>>>
>>>
>>> -Original Message-
>>> From: Sushrut Bidwai [mailto:bidwai.sush...@gmail.com]
>>> Sent: Thursday, December 27, 2012 7:40 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: solr + jetty deployment issue
>>>
>>> Hi,
>>>
>>> I am having trouble with getting solr + jetty to work. I am following all
>>> instructions to the letter from - http://wiki.apache.org/solr/SolrJetty.
>>> I
>>> also created a work folder - /opt/solr/work. I am also setting tmpdir to
>>> a
>>> new path in /etc/default/jetty . I am confirming the tmpdir is set to the
>>> new path from admin dashboard, under args.
>>>
>>> It works like a charm. But when I restart jetty multiple times, after 3/4
>>> such restarts it starts hanging. Admin pages just dont load and my app
>>> fails
>>> to acquire a connection with solr.
>>>
>>> What I might be missing? Should I be rather looking at my code and see
>>> if I
>>> am not committing correctly?
>>>
>>> Please let me know if you have faced similar issue in the past and how to
>>> tackle it.
>>>
>>> Thank you.
>>>
>>> --
>>> Best Regards,
>>> Sushrut
>>>
>>>
>>
>>
>> --
>> Best Regards,
>> Sushrut
>> http://sushrutbidwai.com
>>
>
>
>
> --
> Best Regards,
> Sushrut
> http://sushrutbidwai.com
>



-- 
Best Regards,
Sushrut
http://sushrutbidwai.com


RE: MoreLikeThis supporting multiple document IDs as input?

2012-12-27 Thread David Parks
So the Search Components are executed in series on _every_ request. I
presume then that they look at the request parameters and decide whether
and how to take action.

So in the case of the MLT component this was said:

> The MLT search component returns similar documents for each of the 
> documents in the search results, but processes each search result base 
> document one at a time and keeps its similar documents segregated by 
> each of the base documents.

So what I think I understand is that the Query Component (presumably this
guy: org.apache.solr.handler.component.QueryComponent) takes the input from
the "q" parameter and returns a result (the "q=id:123456" ensures that the
Query Component will return just this one document).

The MltComponent then looks at the result from the QueryComponent and
generates its results.

The part that is still confusing is understanding the difference between
these two comments:

 - The MLT search component returns similar documents for each of the
documents in the search results
 - The MLT handler returns similar documents only for the first document
that the query matches.
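The distinction above can be sketched as two request shapes; the host, field names, and the id:(...) multi-ID query syntax are assumptions for illustration, not something verified in this thread:

```shell
# MLT *handler*: its own endpoint; similar docs for the first match only.
HANDLER_URL="http://localhost:8983/solr/mlt?q=id:123456&mlt.fl=item_name&rows=10"

# MLT *component*: piggybacks on /select; similar docs reported per result
# document, segregated under each base document.
COMPONENT_URL="http://localhost:8983/solr/select?q=id:(123456+OR+654321)&mlt=true&mlt.fl=item_name&mlt.count=2"

echo "$HANDLER_URL"
echo "$COMPONENT_URL"
# Each could then be fetched with: curl "$URL"
```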



-Original Message-
From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] 
Sent: Friday, December 28, 2012 1:26 PM
To: solr-user@lucene.apache.org
Subject: RE: MoreLikeThis supporting multiple document IDs as input?

Hi Dave,

Think of search components as a chain of Java classes that get executed
during each search request. If you open solrconfig.xml you will see how they
are defined and used.

HTH

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Dec 28, 2012 12:06 AM, "David Parks"  wrote:

> I'm somewhat new to Solr (it's running, I've been through the books, 
> but I'm no master). What I hear you say is that MLT *can* accept, say 
> 5, documents and provide results, but the results would essentially be 
> the same as running the query 5 times for each document?
>
> If that's the case, I might accept it. I would just have to merge them 
> together at the end (perhaps I'd take the top 2 of each result, for 
> example).
>
> Being somewhat new I'm a little confused by the difference between a 
> "Search Component" and a "Handler". I've got the /mlt handler working 
> and I'm using that. But how's that different from a "Search 
> Component"? Is that referring to the default /solr/select?q="..." 
> style query?
>
> And if what I said about multiple documents above is correct, what's 
> the syntax to try that out?
>
> Thanks very much for the great help!
> Dave
>
>
> -Original Message-
> From: Jack Krupansky [mailto:j...@basetechnology.com]
> Sent: Wednesday, December 26, 2012 12:07 PM
> To: solr-user@lucene.apache.org
> Subject: Re: MoreLikeThis supporting multiple document IDs as input?
>
> MLT has both a request handler and a search component.
>
> The MLT handler returns similar documents only for the first document 
> that the query matches.
>
> The MLT search component returns similar documents for each of the 
> documents in the search results, but processes each search result base 
> document one at a time and keeps its similar documents segregated by 
> each of the base documents.
>
> It sounds like you wanted to merge the base search results and then 
> find documents similar to that merged super-document. Is that what you 
> were really seeking, as opposed to what the MLT component does? 
> Unfortunately, you can't do that with the components as they are.
>
> You would have to manually merge the values from the base documents 
> and then you could POST that text back to the MLT handler and find 
> similar documents using the posted text rather than a query. Kind of 
> messy, but in theory that should work.
>
> -- Jack Krupansky
>
> -Original Message-
> From: David Parks
> Sent: Tuesday, December 25, 2012 5:04 AM
> To: solr-user@lucene.apache.org
> Subject: MoreLikeThis supporting multiple document IDs as input?
>
> I'm unclear on this point from the documentation. Is it possible to 
> give Solr X # of document IDs and tell it that I want documents 
> similar to those X documents?
>
> Example:
>
>   - The user is browsing 5 different articles
>   - I send Solr the IDs of these 5 articles so I can present the user 
> other similar articles
>
> I see this example for sending it 1 document ID:
> http://localhost:8080/solr/select/?qt=mlt&q=id:[document
> id]&mlt.fl=[field1],[field2],[field3]&fl=id&rows=10
>
> But can I send it 2+ document IDs as the query?
>
>