Re: spellcheck-index is rebuilt on commit

2012-01-03 Thread OliverS
Hi all

Thanks a lot. It seems to be a bug, and not only in 4.0. You are right,
I was doing a commit on an optimized index without adding any new docs (in
fact, I did this for replication on the master). I will open a ticket as
soon as I fully understand what's going on. I have difficulties
understanding Simon's answer:
* building the spellcheck-index is triggered by a new searcher?
* why would this not happen after post/commit?

Thanks
Oliver

--
View this message in context: 
http://lucene.472066.n3.nabble.com/spellcheck-index-is-rebuilt-on-commit-tp3626492p3628423.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: spellcheck-index is rebuilt on commit

2012-01-03 Thread Simon Willnauer
On Tue, Jan 3, 2012 at 9:12 AM, OliverS  wrote:
> Hi all
>
> Thanks a lot. It seems to be a bug, and not only in 4.0. You are right,
> I was doing a commit on an optimized index without adding any new docs (in
> fact, I did this for replication on the master). I will open a ticket as
> soon as I fully understand what's going on. I have difficulties
> understanding Simon's answer:
> * building the spellcheck-index is triggered by a new searcher?
> * why would this not happen after post/commit?

a commit in solr forces a new searcher to be opened. this new searcher
is passed to the spellchecker's listener, which reopens / rebuilds the
spellcheck index. Yet, if you set rebuildOnOptimize=true it only
checks if the index has a single segment. since you didn't change
anything since this was last checked it still has one segment. The
problem is that the listener doesn't save any state or the version of
the index since it was last called and assumes the index was just
optimized.
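For context, the behavior described above is driven by the spellchecker's build options in solrconfig.xml. A minimal sketch follows — the component name and source field are illustrative assumptions, not taken from this thread, and note that the stock parameter is spelled buildOnOptimize:

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <!-- source field for the spellcheck dictionary (illustrative name) -->
    <str name="field">spell</str>
    <!-- rebuild only when the new searcher sees an "optimized" (single-segment) index -->
    <str name="buildOnOptimize">true</str>
    <!-- alternative: rebuild on every commit -->
    <!-- <str name="buildOnCommit">true</str> -->
  </lst>
</searchComponent>
```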

simon


Re: spellcheck-index is rebuilt on commit

2012-01-03 Thread OliverS
Thanks for the clear explanation. I'll open a ticket as soon as Jira is up
and running again.

Oliver

--
View this message in context: 
http://lucene.472066.n3.nabble.com/spellcheck-index-is-rebuilt-on-commit-tp3626492p3628603.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: spellcheck-index is rebuilt on commit

2012-01-03 Thread OliverS
A Jira ticket has been opened; this discussion is closed.
https://issues.apache.org/jira/browse/SOLR-2999

Oliver

--
View this message in context: 
http://lucene.472066.n3.nabble.com/spellcheck-index-is-rebuilt-on-commit-tp3626492p3628894.html
Sent from the Solr - User mailing list archive at Nabble.com.


soft commit 2

2012-01-03 Thread ramires
hi 

A soft commit works with the command below, but it doesn't work via
solrconfig.xml. What is wrong with the XML fragment below?

curl http://localhost:8984/solr/update -H "Content-Type: text/xml"
--data-binary '<commit softCommit="true"/>'

 

<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>
 
 


--
View this message in context: 
http://lucene.472066.n3.nabble.com/soft-commit-2-tp3628975p3628975.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Highlighting in 3.5?

2012-01-03 Thread Juan Grande
Hi Darren,

Would you please tell us all the parameters that you are sending in the
request? You can use the parameter "echoParams=all" to get the list in the
output.

Thanks,

*Juan*



On Mon, Jan 2, 2012 at 8:37 PM, Darren Govoni  wrote:

> Forgot to add, that the time when I DO want the highlight to appear would
> be with a query that DOES match the default field.
>
> {!lucene q.op=OR df=text_t}  kind_s:doc AND (( field_t:[* TO *] ))  cars
>
> Where the term 'cars' would be matched against the df. Then I want the
> highlight for it.
>
> If there are no query term matches for the df, then getting ALL the field
> terms highlighted (as it does now) is a rather perplexing feature.
>
> Darren
>
>
> On 01/02/2012 06:28 PM, Darren Govoni wrote:
>
>> Hi Juan,
>>  Setting that parameter produces the same extraneous results. Here is my
>> query:
>>
>> {!lucene q.op=OR df=text_t}  kind_s:doc AND (( field_t:[* TO *] ))
>>
>> Clearly, the default field (text_t) is not being searched by this query
>> and highlighting it would be semantically incongruent with the query.
>>
>> Is it a bug?
>>
>> Darren
>>
>> On 01/02/2012 04:39 PM, Juan Grande wrote:
>>
>>> Hi Darren,
>>>
>>> This is the expected behavior. Have you tried setting the
>>> hl.requireFieldMatch parameter to true? See:
>>> http://wiki.apache.org/solr/**HighlightingParameters#hl.**
>>> requireFieldMatch
>>>
>>> *Juan*
>>>
>>>
>>>
>>> On Mon, Jan 2, 2012 at 10:54 AM, Darren Govoni
>>>  wrote:
>>>
>>>> Hi,
>>>> Can someone tell me if this is correct behavior from Solr.
>>>>
>>>> I search on a dynamic field:
>>>>
>>>> field_t:[* TO *]
>>>>
>>>> I set highlight fields to "field_t,text_t" but I am not searching
>>>> specifically inside the text_t field.
>>>>
>>>> The highlights for text_t come back with EVERY WORD. Maybe because of
>>>> the [* TO *], but the query semantics indicate not searching on text_t
>>>> even though highlighting is enabled.
>>>>
>>>> Is this correct behavior? It produces unwanted highlight results.
>>>>
>>>> I would expect Solr to know what fields are participating in the query
>>>> and only highlight those that are involved in the result set.
>>>>
>>>> Thanks,
>>>> Darren


>>
>


Re: Using SOLR Autocomplete for addresses (i.e. multiple terms)

2012-01-03 Thread Jan Høydahl
Hi,

As you see, you've got an answer at StackOverflow already with a proposed 
solution to implement your own QueryConverter.

Another way is to create a Solr core solely for Suggest, and tune it exactly 
the way you like. Then you can have it suggest from the whole input as well as 
individual tokens and weigh these as you choose, as well as implement phonetic 
normalization and other useful tricks.
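As one possible starting point for the "whole input plus tokens" idea, a dedicated suggest core can use a field type that keeps the entire entry as one lowercase token and indexes its prefixes. This is a sketch under stated assumptions — the type name and gram sizes are illustrative, not from this thread:

```xml
<fieldType name="text_suggest" class="solr.TextField">
  <analyzer type="index">
    <!-- keep "brooklyn, new york, ..." as one token instead of splitting on commas/spaces -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- index prefixes so partial input like "brooklyn, n" still matches -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="50"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

A second field analyzed with a standard tokenizer can then cover the individual-token case ("ny", "new york"), with the two weighted against each other at query time.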

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 3. jan. 2012, at 00:52, Dave wrote:

> Hi,
> 
> I'm reposting my StackOverflow question to this thread as I'm not getting
> much of a response there. Thank you for any assistance you can provide!
> 
> http://stackoverflow.com/questions/8705600/using-solr-autocomplete-for-addresses
> 
> I'm new to SOLR, but I've got it up and running, indexing data via the DIH,
> and properly returning results for queries. I'm trying to set up another
> core to run the Suggester, in order to autocomplete geographical locations. We
> have a web application that needs to take a city, state / region, country
> input. We'd like to do this in a single entry box. Here are some examples:
> 
> Brooklyn, New York, United States of America
> Philadelphia, Pennsylvania, United States of America
> Barcelona, Catalunya, Spain
> 
> Assume for now that every location around the world can be split into this
> 3-part input. I've set up my DIH to create a TemplateTransformer field that
> combines the 4 tables (city, state and country are all independent tables
> connected to each other by a master places table) into a field called
> "fullplacename":
> 
> 
> 
> I've defined a "text_auto" field in schema.xml:
> 
> 
>
>
>
>
> 
> 
> and have defined these two fields as well:
> 
>  stored="true" multiValued="true" />
> 
> 
> Now, here's my problem. This works fine for the first term, i.e. if I type
> "brooklyn" I get the results I'd expect, using this URL to query:
> 
> http://localhost:8983/solr/places/suggest?q=brooklyn
> 
> However, as soon as I put a comma and/or a space in there, it breaks them
> up into 2 suggestions, and I get a suggestion for each:
> 
> http://localhost:8983/solr/places/suggest?q=brooklyn%2C%20ny
> 
> Gives me a suggestion for "brooklyn" and a suggestion for "ny" instead of a
> suggestion that matches "brooklyn, ny". I've tried every solution I can
> find via google and haven't had any luck. Is there something simple that
> I've missed, or is this the wrong approach?
> 
> Just in case, here's the searchComponent and requestHandler definition:
> 
> <requestHandler name="/suggest"
>     class="org.apache.solr.handler.component.SearchHandler">
>   <lst name="defaults">
>     <str name="spellcheck">true</str>
>     <str name="spellcheck.dictionary">suggest</str>
>     <str name="spellcheck.count">10</str>
>   </lst>
>   <arr name="components">
>     <str>suggest</str>
>   </arr>
> </requestHandler>
>
> <searchComponent name="suggest" class="solr.SpellCheckComponent">
>   <lst name="spellchecker">
>     <str name="name">suggest</str>
>     <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
>     <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
>     <str name="field">name_autocomplete</str>
>   </lst>
> </searchComponent>
> 
> 
> Thanks for any assistance!



doing snapshot after optimize - rotation parameter?

2012-01-03 Thread Torsten Krah
Hi,

I am taking snapshots of my master index after optimize calls (run once
each day), to get a clean backup of the index.
Is there a parameter to tell the replication handler how many snapshots
to keep, with the rest being deleted? Or must I use a custom script
via cron?
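Depending on the Solr version, the replication handler can manage backup retention itself. A sketch, assuming a release that supports maxNumberOfBackups (added in later 4.x versions) — on older versions, a cron script that prunes old snapshot.* directories is the usual fallback:

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- take a backup automatically after each optimize -->
    <str name="backupAfter">optimize</str>
    <!-- keep only the two most recent backups (newer Solr versions only) -->
    <str name="maxNumberOfBackups">2</str>
  </lst>
</requestHandler>
```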

regards

Torsten




Re: Highlighting in 3.5?

2012-01-03 Thread darren
I will. Thanks.

> Hi Darren,
>
> Would you please tell us all the parameters that you are sending in the
> request? You can use the parameter "echoParams=all" to get the list in the
> output.
>
> Thanks,
>
> *Juan*



Re: Using SOLR Autocomplete for addresses (i.e. multiple terms)

2012-01-03 Thread Dave
Hi Jan,

Yes, I just saw the answer. I've implemented that, and it's working as
expected. I do have Suggest running on its own core, separate from my
standard search handler. I think, however, that the custom QueryConverter
that was linked to is now too restrictive. For example, it works perfectly
when someone enters "brooklyn, n", but if they start by entering "ny" or
"new york" it doesn't return anything. I think what you're talking about,
suggesting from the whole input as well as from individual tokens, is the
way to go. Is there anything you can point me to as a starting point? I
think I've got the basic setup, but I'm not quite comfortable enough with
SOLR and the SOLR architecture yet (honestly I've only been using it for
about 2 weeks now).

Thanks for the help!

Dave

On Tue, Jan 3, 2012 at 8:24 AM, Jan Høydahl  wrote:

> Hi,
>
> As you see, you've got an answer at StackOverflow already with a proposed
> solution to implement your own QueryConverter.
>
> Another way is to create a Solr core solely for Suggest, and tune it
> exactly the way you like. Then you can have it suggest from the whole input
> as well as individual tokens and weigh these as you choose, as well as
> implement phonetic normalization and other useful tricks.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com


Re: Solr and External Fields

2012-01-03 Thread astubbs
I'm also very interested in this, for my regex augmenter. If we could get an
augmenter to add highlighting results directly to the doc, like the explain
augmenter does, then I could definitely write up that regex augmenter:

http://lucene.472066.n3.nabble.com/Regex-DocTransformer-td3627314.html

I've been looking at the code, but I don't see where to get the highlighting
results from the context object, or whether highlighting analysis has even
been performed by the time the DocTransformers get run. In my opinion,
DocTransformers should be run when the result is ready to be sent to the
client, but before serialisation.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-and-External-Fields-tp3180030p3629669.html
Sent from the Solr - User mailing list archive at Nabble.com.


charFilter PatternReplaceCharFilterFactory and highlighting

2012-01-03 Thread darul
Hello,

I wanted to use the char filter PatternReplaceCharFilterFactory to keep
specific content from being indexed.

In the end I got many issues with highlights and offsets, so I removed it.
Example:



Example of content :



My charfilter should clean it like :



I do not understand why the highlight offsets are disturbed by the
charFilter: since it is defined first, it should change the content before
highlight processing occurs, shouldn't it?



Do you have any solutions? We really need the charFilter feature.

Thanks,

Jul





--
View this message in context: 
http://lucene.472066.n3.nabble.com/charFilter-PatternReplaceCharFilterFactory-and-highlighting-tp3629699p3629699.html
Sent from the Solr - User mailing list archive at Nabble.com.


Commit without an update handler?

2012-01-03 Thread Martin Koch
Hi List

I have a Solr cluster set up in a master/slave configuration where the
master acts as an indexing node and the slaves serve user requests.

To avoid accidental posts of new documents to the slaves, I have disabled
the update handlers.

However, I use an externalFileField. When the file is updated, I need to
issue a commit to reload the new file. This requires an update handler. Is
there an update handler that doesn't accept new documents, but will effect
a commit?

Thanks,
/Martin


Re: Using SOLR Autocomplete for addresses (i.e. multiple terms)

2012-01-03 Thread Dave
I've got another question for anyone that might have some insight: how do
you get all of your indexed information along with the suggestions? I.e. if
each suggestion has an ID# associated with it, do I have to then query for
that ID#, or is there some way of specifying a field list in the URL to the
suggester?

Thanks!
Dave

On Tue, Jan 3, 2012 at 9:41 AM, Dave  wrote:

> Hi Jan,
>
> Yes, I just saw the answer. I've implemented that, and it's working as
> expected. I do have Suggest running on its own core, separate from my
> standard search handler. I think, however, that the custom QueryConverter
> that was linked to is now too restrictive. For example, it works perfectly
> when someone enters "brooklyn, n", but if they start by entering "ny" or
> "new york" it doesn't return anything. I think what you're talking about,
> suggesting from whole input and individual tokens is the way to go. Is
> there anything you can point me to as a starting point? I think I've got
> the basic setup, but I'm not quite comfortable enough with SOLR and the
> SOLR architecture yet (honestly I've only been using it for about 2 weeks
> now).
>
> Thanks for the help!
>
> Dave


Re: Highlighting with prefix queries and maxBooleanClause

2012-01-03 Thread Chris Hostetter

: About bumping MaxBooleanQueries. You can certainly
: bump it up, but it's a legitimate question whether the
: user is well served by allowing that pattern as opposed
: to requiring 2 or 3 leading characters. The assumption

i think the root of the issue here is that when executing queries, really 
broad prefix queries like "q=*" generate constant score queries, so really 
broad prefix queries are "safe" to execute.  but (based on his error) it 
seems like the highlighter fails loudly and painfully on these otherwise 
"safe" queries.

understandably, part of the reason this happens is that the highlighter 
needs to know all the terms that that prefix expands to in order to know 
what to highlight, but the fact that it generates an error when 
maxBooleanClause is hit seems unfortunate -- maybe there is no way around 
it, but i *thought* there were options that could be used related to 
highlighting to mitigate these issues, i just couldn't remember what they 
are (does the FastVectorHighlighter have these problems? is it only if you 
use WeightedSpanTermExtractor?) and hence my suggestion to Michael to 
start a thread here in the hopes that the highlighting experts (Yeah Koji! 
... better you than me!) would chime in.


-Hoss


Doubt Regarding Shards Index

2012-01-03 Thread Suneel
I am using Solr. My index has become too large, so I want to implement the
shards concept, but I have some doubts. I searched a lot but did not find a
satisfying answer.

1. Do we need to create a handler for shards in solrconfig.xml?

2. Will the index be different for each shard instance, i.e. do we need to
break the data into parts to create an index for each instance, or will the
index be the same?

3. How will I recognize which instance returned the result?

Please provide the above details; this will be very helpful for me.

Thanks & Regards
Suneel Pandey

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Doubt-Regarding-Shards-Index-tp3629964p3629964.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: best practice to introducing singletons inside of Solr (IoC)

2012-01-03 Thread Chris Hostetter

: Ok. Let me try with plain java one. Possibly I'll need more tight
: integration like injecting a core into the singleton, etc. But I don't know
: yet.

yeah ... it really depends on what you mean by "singleton" ...

...single instance in entire JVM?
...single instance in each webapp?
...single instance in each solr core?

If you want the first or the second, standard java patterns will be your 
best bet, but you'll need to be careful with the classpath and make sure 
your class is loaded at the appropriate level (and in the first case: it 
can't directly know about any Solr specific classes).

If you want one instance per solr core, that is part of the solr core 
lifecycle (so a new one is created if/when the core is replaced) i 
think the approach of an explicitly named request handler is simplest way 
to go.  
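As a sketch of that per-core approach: an explicitly named request handler declared in solrconfig.xml gets one instance per SolrCore, created and discarded with the core's lifecycle. The class name below is a hypothetical placeholder, not something from this thread:

```xml
<!-- one instance of this class is created per SolrCore; if the core is
     reloaded or replaced, a fresh instance is created along with it -->
<requestHandler name="/mysingleton" class="com.example.MySingletonHandler"/>
```

Other components in the same core can then look the instance up by name through the core's handler registry instead of a JVM-wide static.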

-Hoss


Re: Interpreting solr response time from log

2012-01-03 Thread Chris Hostetter
: If your log level is set at least to INFO, as it should be by default,
: Solr does log response time to a different file. E.g., I have
: INFO: [] webapp=/solr path=/select/
: params={indent=on&start=0&q=*:*&version=2.2&rows=10} hits=22 status=0
: QTime=40
: where the QTime is 40ms, as also reflected in the HTTP response. You

It's also really important to understand exactly what QTime is measuring.  

I've added a FAQ to try and make this more obvious...

https://wiki.apache.org/solr/FAQ#Why_is_the_QTime_Solr_returns_lower_then_the_amount_of_time_I.27m_measuring_in_my_client.3F

-Hoss


Re: Doubt Regarding Shards Index

2012-01-03 Thread Sethi, Parampreet
Hi Suneel,

I have implemented Solr sharding in one of my projects where the data was on
the order of 1 billion documents and my queries were throwing out-of-memory
exceptions because of the huge index. Here are my views:

- Have identical Solr server setups for each shard, with the same schema.

1. Do we need to create a handler for shards in solrconfig.xml?
- In my case, I did not add any handlers in solrconfig.xml for sharding.

2. Will the index be different for each shard instance, i.e. do we need to
break the data into parts to create an index for each instance?
- Yes, the index needs to be broken up across the shard instances. I used a
creation_date field in my case to divide the data by year into each shard
(for example, all documents from 2007 go to shard 1, all from 2008 go to
shard 2, and so on); similarly, when writing data, look at the same field
and index into the corresponding shard.

3. How will I recognize which instance returned the result?
- Once you know how the data is divided, you can easily figure out which
shard is serving it.

I have put some of my analysis in this blog post:
http://www.params.me/2010/04/working-with-solr.html. Hope it helps!

Best,
Param
http://params.me






Solr support for compound geospatial indexs?

2012-01-03 Thread Maxim Veksler
Hello,

I've started to evaluate Solr and so far haven't seen any mention of
support for compound indexes.

I'm looking to do either radius- or shape-based geospatial proximity
queries (find all documents that are within 20km of a given lat,lng).
I would also at times be doing geo queries combined with another term (for
example, "house rooms" = 5).

My aim is to do very fast queries against the indexed data. I have no real
constraints on the time it would take to build this index.

Does Solr support building an index on the 2 types of fields, lat,lng &
"house rooms"?


Thank you,
Maxim.


Re: Solr support for compound geospatial indexs?

2012-01-03 Thread Mikhail Khludnev
Hello,

Please find my thoughts below.

On Wed, Jan 4, 2012 at 12:39 AM, Maxim Veksler  wrote:
>
> Hello,
>
> I've started to evaluate Solr and so far haven't seen anything mentions for
> support of compound indexes.

If I get you right, it doesn't. AFAIK it combines separate indexes
based on the condensed internal ids, aka docNums.

>
> I'm looking to either radius or share based geospatial proximity queries
> (find all document that are 20km from given lat,lng)

http://wiki.apache.org/solr/SpatialSearch#geofilt_-_The_distance_filter
consider https://issues.apache.org/jira/browse/SOLR-2155 if you are
dealing with multivalue coordinates.

> I would also at times be doing geo queries bonded with another term (for
> ex. "house rooms" = 5).

just add separate ...&fq=H_ROOMS:5&...
http://wiki.apache.org/solr/CommonQueryParameters#fq

>
>
> My aim is to do very fast queries against the indexed data. I have no real
> constraints on the time it would take to build this index.
>
> Does Solr support building and index on the 2 types of fields lat,lng &
> "house rooms" ?
Sure. It sounds like intersecting fqs.

Please let me know if it works for you.

>
>
> Thank you,
> Maxim.




--
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics


Re: soft commit

2012-01-03 Thread Jason Rutherglen
*Laugh*

I stand by what Mark said:

"Right - in most NRT cases (very frequent soft commits), the cache should
probably be disabled."

On Mon, Jan 2, 2012 at 7:45 PM, Yonik Seeley  wrote:
> On Mon, Jan 2, 2012 at 9:58 PM, Jason Rutherglen
>  wrote:
>>> It still normally makes sense to have the caches enabled (esp filter and 
>>> document caches).
>>
>> In the NRT case that statement is completely incorrect
>
> *shrug*
>
> To each their own.  I stand by my statement.
>
> -Yonik
> http://www.lucidimagination.com


Re: Solr support for compound geospatial indexs?

2012-01-03 Thread Maxim Veksler
Hello Mikhail

Thank you for the fast reply, please find my answers inline.

On Tue, Jan 3, 2012 at 11:00 PM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> Hello,
>
> Please find my thoughts below.
>
> On Wed, Jan 4, 2012 at 12:39 AM, Maxim Veksler  wrote:
> >
> > Hello,
> >
> > I've started to evaluate Solr and so far haven't seen anything mentions
> for
> > support of compound indexes.
>
> If I get you right, it doesn't. AFAIK It combines separate indexes
> basing on the condensed internal ids aka docNums
>
> >
> > I'm looking to either radius or share based geospatial proximity queries
> > (find all document that are 20km from given lat,lng)
>
> http://wiki.apache.org/solr/SpatialSearch#geofilt_-_The_distance_filter
> consider https://issues.apache.org/jira/browse/SOLR-2155 if you are
> dealing with multivalue coordinates.
>
>
Thank you for the reference to SOLR-2155.

I've studied geohash[1] and the work David Smiley[2] is doing[3] for Solr
4 thoroughly.
I think that my problem is simpler: I don't need multivalue coordinate
support, because my locations are represented by a single lat,lng point and
I will be searching for all the points that fall into my defined radius,
where there is a 1:1 mapping between a document and the point it is
categorized by.
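For that single-point case, the stock spatial field type in Solr 3.x/4 is usually enough. A minimal schema sketch, with field names borrowed from the SpatialSearch wiki example rather than this thread:

```xml
<!-- holds a single "lat,lng" pair per document -->
<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
<field name="store" type="location" indexed="true" stored="true"/>
<!-- dynamic field backing the two numeric sub-fields LatLonType creates -->
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>
```

A 20km radius filter then looks like `fq={!geofilt sfield=store pt=40.7,-73.9 d=20}`, and it can be freely combined with a second filter such as `fq=H_ROOMS:5`.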


> I would also at times be doing geo queries bonded with another term (for
> > ex. "house rooms" = 5).
>
> just add separate ...&fq=H_ROOMS:5&...
> http://wiki.apache.org/solr/CommonQueryParameters#fq
>
> >
> >
> > My aim is to do very fast queries against the indexed data. I have no
> real
> > constraints on the time it would take to build this index.
> >
> > Does Solr support building and index on the 2 types of fields lat,lng &
> > "house rooms" ?
> Sure. It sounds like intersecting fqs.
>
>
Wonderful to hear this; I guess I'm not really understanding how Solr /
Lucene works then.
Could you please point me to something, or explain, how Solr builds its
index? I'm especially interested in how the search is implemented under the
hood. Given geo & regular terms, what would Lucene do? How would it do the
actual searching, or perhaps what I need to be asking is what & how the
"intersecting fqs" are implemented?

I apologize for the messy question; I'm only starting to understand Lucene.

Please let me know if it works for you.
>
> >
> >
> > Thank you,
> > Maxim.
>
>
[1]
http://gis.stackexchange.com/questions/18330/would-it-be-possible-to-use-geohash-for-proximity-searches
[2]
http://www.basistech.com/pdf/events/open-source-search-conference/oss-2011-smiley-geospatial-search.pdf
[3] http://code.google.com/p/lucene-spatial-playground/
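For anyone following along, Mikhail's two suggestions (a geofilt filter plus a separate term fq) simply compose in one request, since Solr intersects every fq with the main query. A minimal sketch of building such a request's query string; the field names `location` and `house_rooms` are hypothetical, and `d` is in kilometers per the geofilt documentation:

```python
from urllib.parse import urlencode

# Hypothetical field names; Solr intersects all fq filters with the main query,
# so the geofilt radius and the term filter combine automatically.
params = [
    ("q", "*:*"),
    ("fq", "{!geofilt sfield=location pt=45.15,-93.85 d=20}"),  # 20 km radius
    ("fq", "house_rooms:5"),
]
query_string = urlencode(params)
print(query_string)
```

Appending this to `/select?` on a Solr instance (hypothetical URL) would run the intersected filters; each fq result is also independently cacheable in the filter cache.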


Re: soft commit

2012-01-03 Thread Erik Hatcher
As I understand it, the document and filter caches add value *intra* request,
in that they keep additional work (like fetching stored fields from disk more
than once) from occurring.

Erik

On Jan 3, 2012, at 16:26 , Jason Rutherglen wrote:

> *Laugh*
> 
> I stand by what Mark said:
> 
> "Right - in most NRT cases (very frequent soft commits), the cache should
> probably be disabled."
> 
> On Mon, Jan 2, 2012 at 7:45 PM, Yonik Seeley  
> wrote:
>> On Mon, Jan 2, 2012 at 9:58 PM, Jason Rutherglen
>>  wrote:
>>>> It still normally makes sense to have the caches enabled (esp filter and
>>>> document caches).
>>> 
>>> In the NRT case that statement is completely incorrect
>> 
>> *shrug*
>> 
>> To each their own.  I stand by my statement.
>> 
>> -Yonik
>> http://www.lucidimagination.com



Re: soft commit

2012-01-03 Thread Yonik Seeley
On Tue, Jan 3, 2012 at 4:36 PM, Erik Hatcher  wrote:
> As I understand it, the document and filter caches add value *intra* request,
> in that they keep additional work (like fetching stored fields from disk
> more than once) from occurring.

Yep.  Highlighting, multi-select faceting, and distributed search are
just some of the scenarios where the caches are utilized in the scope
of a single request.
Please folks, don't disable your caches!

-Yonik
http://www.lucidimagination.com
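For readers wondering where these caches live: they are configured per searcher in solrconfig.xml. A sketch of the relevant section (the cache classes are real Solr implementations, but the sizes are illustrative only, not recommendations). With very frequent soft commits, the usual compromise is to keep autowarmCount low rather than remove the caches outright, since autowarming on every new searcher is what gets expensive:

```xml
<!-- solrconfig.xml excerpt; sizes are illustrative only -->
<query>
  <filterCache   class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <documentCache class="solr.LRUCache"     size="512" initialSize="512" autowarmCount="0"/>
</query>
```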


Re: soft commit

2012-01-03 Thread Jason Rutherglen
> multi-select faceting

Yikes.  I'd love to see a test showing that un-inverted field cache
(which is for ALL segments as a single unit) can be used efficiently
with NRT / soft commit.

On Tue, Jan 3, 2012 at 1:50 PM, Yonik Seeley  wrote:
> On Tue, Jan 3, 2012 at 4:36 PM, Erik Hatcher  wrote:
>> As I understand it, the document and filter caches add value *intra* request,
>> in that they keep additional work (like fetching stored fields from disk
>> more than once) from occurring.
>
> Yep.  Highlighting, multi-select faceting, and distributed search are
> just some of the scenarios where the caches are utilized in the scope
> of a single request.
> Please folks, don't disable your caches!
>
> -Yonik
> http://www.lucidimagination.com


Re: soft commit

2012-01-03 Thread Jason Rutherglen
The main point is that Solr, unlike for example Elastic Search and other
Lucene-based systems, does NOT cache filters or facets per-segment.

This is a fundamental design flaw.

On Tue, Jan 3, 2012 at 1:50 PM, Yonik Seeley  wrote:
> On Tue, Jan 3, 2012 at 4:36 PM, Erik Hatcher  wrote:
>> As I understand it, the document and filter caches add value *intra* request,
>> in that they keep additional work (like fetching stored fields from disk
>> more than once) from occurring.
>
> Yep.  Highlighting, multi-select faceting, and distributed search are
> just some of the scenarios where the caches are utilized in the scope
> of a single request.
> Please folks, don't disable your caches!
>
> -Yonik
> http://www.lucidimagination.com


Re: soft commit

2012-01-03 Thread Yonik Seeley
On Tue, Jan 3, 2012 at 5:03 PM, Jason Rutherglen
 wrote:
> Yikes.  I'd love to see a test showing that un-inverted field cache
> (which is for ALL segments as a single unit) can be used efficiently
> with NRT / soft commit.

Please stop being a troll.
Solr has multiple faceting methods - only one uses the un-inverted field cache.

Oh, and for the record, Solr does have a faceting method in trunk that
caches per-segment.
There are always tradeoffs though - string faceting per-segment will
always be slower than string faceting over the complete index (due to
the cost of merging per-segment counts).

Anyway, disabling any of those caches won't make anything any
faster... the data structures will still be built, they just won't be
reused.
Seems like you realized your original statement was erroneous and have
just reverted to troll state, trying to find something to pick at.

-Yonik
http://www.lucidimagination.com


Re: soft commit

2012-01-03 Thread Jason Rutherglen
Address the points I brought up or don't reply with funny name calling.

Below are two key points, reiterated and re-articulated in an easy-to-answer way:

* Multi-select faceting is per-segment (true or false)

* Filters are cached per-segment (true or false)

On Tue, Jan 3, 2012 at 2:16 PM, Yonik Seeley  wrote:
> On Tue, Jan 3, 2012 at 5:03 PM, Jason Rutherglen
>  wrote:
>> Yikes.  I'd love to see a test showing that un-inverted field cache
>> (which is for ALL segments as a single unit) can be used efficiently
>> with NRT / soft commit.
>
> Please stop being a troll.
> Solr has multiple faceting methods - only one uses the un-inverted field cache.
>
> Oh, and for the record, Solr does have a faceting method in trunk that
> caches per-segment.
> There are always tradeoffs though - string faceting per-segment will
> always be slower than string faceting over the complete index (due to
> the cost of merging per-segment counts).
>
> Anyway, disabling any of those caches won't make anything any
> faster... the data structures will still be built, they just won't be
> reused.
> Seems like you realized your original statement was erroneous and have
> just reverted to troll state, trying to find something to pick at.
>
> -Yonik
> http://www.lucidimagination.com


Re: charFilter PatternReplaceCharFilterFactory and highlighting

2012-01-03 Thread Koji Sekiguchi

Jul,

Maybe you missed "Example of content :" and "My charfilter should clean it like :"
in your previous mail? We need them in order to consider your problem. :->

koji
--
http://www.rondhuit.com/en/


(12/01/04 2:19), darul wrote:

Hello,

I wanted to use char filter PatternReplaceCharFilterFactory to avoid
specific content to be indexed.

At the end I get many issues with highlights and offsets...so I remove it,
example :



Example of content :



My charfilter should clean it like :



I do not understand why the highlight offsets are disturbed by the charFilter
when it is defined first; shouldn't it change the content before highlight
processing occurs?



Do you have any solutions? We really need the charFilter feature.

Thanks,

Jul





--
View this message in context: 
http://lucene.472066.n3.nabble.com/charFilter-PatternReplaceCharFilterFactory-and-highlighting-tp3629699p3629699.html
Sent from the Solr - User mailing list archive at Nabble.com.
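For context on where a charFilter sits: it rewrites the character stream before the tokenizer runs, and the factory is supposed to correct token offsets back to the original text, which is exactly where highlighting trouble shows up if offset correction misbehaves. A hedged sketch of such a field type; the field type name and the pattern below are hypothetical, since Jul's actual examples did not come through in the mail:

```xml
<!-- schema.xml excerpt; name and pattern are hypothetical examples -->
<fieldType name="text_cleaned" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Runs first, before tokenization; replaces matches with the empty string -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="\[template\][\s\S]*?\[/template\]" replacement=""/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```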





Re: Interpreting solr response time from log

2012-01-03 Thread Jithin
Thanks, Chris, for clarifying. This helps a lot.

On Wed, Jan 4, 2012 at 2:07 AM, Chris Hostetter-3 [via Lucene] <
ml-node+s472066n3630181...@n3.nabble.com> wrote:

> : If your log level is set at least to INFO, as it should be by default
> Solr does
> : log response time to a different file. E.g., I have
> : INFO: [] webapp=/solr path=/select/
> : params={indent=on&start=0&q=*:*&version=2.2&rows=10} hits=22 status=0
> : QTime=40
> : where the QTime is 40ms, as also reflected in the HTTP response. You
>
> It's also really important to understand exactly what QTime is measuring.
>
>
> I've added a FAQ to try and make this more obvious...
>
>
> https://wiki.apache.org/solr/FAQ#Why_is_the_QTime_Solr_returns_lower_then_the_amount_of_time_I.27m_measuring_in_my_client.3F
>
> -Hoss
>
>
>



-- 
Thanks
Jithin Emmanuel


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Interpreting-solr-response-time-from-log-tp3624340p3630843.html
Sent from the Solr - User mailing list archive at Nabble.com.
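The log line quoted in this thread is easy to mine after the fact; a minimal sketch, assuming the format shown in the quote:

```python
import re

# Log line as quoted earlier in the thread
line = ("INFO: [] webapp=/solr path=/select/ "
        "params={indent=on&start=0&q=*:*&version=2.2&rows=10} "
        "hits=22 status=0 QTime=40")

# Pull the request-level counters out of the tail of the line
m = re.search(r"hits=(\d+) status=(\d+) QTime=(\d+)", line)
hits, status, qtime = (int(g) for g in m.groups())
print(hits, status, qtime)  # 22 0 40
```

As the FAQ Hoss links explains, QTime covers only the query-processing portion inside Solr, not response writing or network transfer, so client-side measurements will typically be higher.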

Optional filter queries

2012-01-03 Thread Allistair Crossley
Evening all,

A subset of my documents have a field, filterMinutes, that some other documents 
do not. filterMinutes stores a number.

I often issue a query that contains a filter query range, e.g.

q=filterMinutes:[* TO 50]

I am finding that adding this query excludes all documents that do not feature 
this field, but what I want is for the filter query to act upon those documents 
that do have the field but also to return documents that don't have it at all.

Is this a possibility?

Best,

Allistair

Re: Optional filter queries

2012-01-03 Thread Christopher Childs
-filterMinutes:[* TO *] should return documents that do not have a value 
assigned to that field.

On Jan 3, 2012, at 11:30 PM, Allistair Crossley wrote:

> Evening all,
> 
> A subset of my documents have a field, filterMinutes, that some other 
> documents do not. filterMinutes stores a number.
> 
> I often issue a query that contains a filter query range, e.g.
> 
> q=filterMinutes:[* TO 50]
> 
> I am finding that adding this query excludes all documents that do not 
> feature this field, but what I want is for the filter query to act upon those 
> documents that do have the field but also to return documents that don't have 
> it at all.
> 
> Is this a possibility?
> 
> Best,
> 
> Allistair

-- 
Christopher Childs