Re: Update on shards

2013-04-25 Thread Arkadi Colson

Hi

It seems not to work in my case. We are using the Solr PHP module for 
talking to Solr. Currently we have 2 collections, 'intradesk' and 'lvs', 
spread over 10 Solr hosts (shards: 5 - repl: 2). Because there is no more disk 
space, I created 6 new hosts for collection 'messages' (shards: 3 - repl: 2).


'intradesk + lvs':
solr01-dcg
solr01-gs
solr02-dcg
solr02-gs
solr03-dcg
solr03-gs
solr04-dcg
solr04-gs
solr05-dcg
solr05-gs

'messages':
solr06-dcg
solr06-gs
solr07-dcg
solr07-gs
solr08-dcg
solr08-gs

So when doing a select, I can talk to any host. When updating I must 
talk to a host with at least 1 shard on it.


I created the new 'messages' collection with the following command to get its 
shards onto the new hosts (06 -> 08): 
http://solr01-dcg.intnet.smartbit.be:8983/solr/admin/collections?action=CREATE&name=messages&numShards=3&replicationFactor=2&collection.configName=smsc&createNodeSet=solr06-gs.intnet.smartbit.be:8983_solr,solr06-dcg.intnet.smartbit.be:8983_solr,solr07-gs.intnet.smartbit.be:8983_solr,solr07-dcg.intnet.smartbit.be:8983_solr,solr08-gs.intnet.smartbit.be:8983_solr,solr08-dcg.intnet.smartbit.be:8983_solr 



They are all in the same config set 'smsc'.

Below is the code:

$client = new SolrClient(
    array(
        'hostname' => "solr01-dcg.intnet.smartbit.be",
        'port'     => "8983",
        'login'    => "***",
        'password' => "***",
        'path'     => "solr/messages",
        'wt'       => "json"
    )
);

$doc = new SolrInputDocument();

$doc->addField('id',               $uniqueID);
$doc->addField('smsc_ssid',        $ssID);
$doc->addField('smsc_module',      $i['module']);
$doc->addField('smsc_modulekey',   $i['moduleKey']);
$doc->addField('smsc_courseid',    $courseID);
$doc->addField('smsc_description', $i['description']);
$doc->addField('smsc_content',     $i['content']);
$doc->addField('smsc_lastdate',    $lastdate);
$doc->addField('smsc_userid',      $userID);

$client->addDocument($doc);

The exception I get looks like this:
exception 'SolrClientException' with message 'Unsuccessful update 
request. Response Code 200. (null)'


Nothing special to find in the solr log.

Any idea?


Arkadi

On 04/24/2013 08:43 PM, Mark Miller wrote:

Sorry - need to correct myself - updates worked the same as read requests - 
they also needed to hit a SolrCore in order to get forwarded to the right node. 
I was not thinking clearly when I said this applied to just reads and not 
writes. Both needed a SolrCore to do their work - with the request proxying, 
this is no longer the case, so you can hit Solr instances with no SolrCores or 
with SolrCores that are not part of the collection you are working with, and 
both read and write side requests are now proxied to a suitable node that has a 
SolrCore that can do the search or forward the update (or accept the update).

- Mark

On Apr 23, 2013, at 3:38 PM, Mark Miller  wrote:


We have a 3rd release candidate for 4.3 being voted on now.

I have never tested this feature with Tomcat - only Jetty. Users have reported 
it does not work with Tomcat. That leads one to think it may have a problem in 
other containers as well.

A previous contributor donated a patch that explicitly flushes a stream in our 
proxy code - he says this allows the feature to work with Tomcat. I committed 
this feature - the flush can't hurt, and given the previous contributions of 
this individual, I'm fairly confident the fix makes things work in Tomcat. I 
have no first hand knowledge that it does work though.

You might take the RC for a spin and test it out yourself: 
http://people.apache.org/~simonw/staging_area/lucene-solr-4.3.0-RC3-rev1470846/

- Mark

On Apr 23, 2013, at 3:20 PM, Furkan KAMACI  wrote:


Hi Mark;

All in all, you are saying that when 4.3 is tagged at the repository (I mean,
when it is ready) this feature will work for Tomcat too, in a stable version?


2013/4/23 Mark Miller 


On Apr 23, 2013, at 2:49 PM, Shawn Heisey  wrote:


What exactly is the 'request proxying' thing that doesn't work on

tomcat?  Is this something different from basic SolrCloud operation where
you send any kind of request to any server and they get directed where they
need to go? I haven't heard of that not working on tomcat before.

Before 4.2, if you made a read request to a node that didn't contain part
of the collection you were searching, it would return 404. Write requests
w

Re: Using Solr For a Real Search Engine

2013-04-25 Thread Furkan KAMACI
Hi Otis;

You are right. start.jar starts up a Jetty instance, and there is a war file under
the example directory that it deploys, is that true?

2013/4/25 Otis Gospodnetic 

> Suggestion :
> Don't call this embedded Jetty to avoid confusion with the actual embedded
> jetty.
>
> Otis
> Solr & ElasticSearch Support
> http://sematext.com/
> On Apr 23, 2013 4:56 PM, "Furkan KAMACI"  wrote:
>
> > Thanks for the answers. I will go with embedded Jetty for my SolrCloud.
> If
> > I face with something important I would want to share my experiences with
> > you.
> >
> > 2013/4/23 Shawn Heisey 
> >
> > > On 4/23/2013 2:25 PM, Furkan KAMACI wrote:
> > >
> > >> Is there any documentation that explains using Jetty as embedded or
> > not? I
> > >> use Solr deployed at Tomcat but after you message I will consider
> about
> > >> Jetty. If we think about other issues i.e. when I want to update my
> Solr
> > >> jars/wars etc.(this is just an foo example) does any pros and cons
> > Tomcat
> > >> or Jetty has?
> > >>
> > >
> > > The Jetty in the example is only 'embedded' in the sense that you don't
> > > have to install it separately.  It is not special -- the Jetty
> components
> > > are not changed at all, a subset of them is just included in the Solr
> > > download with a tuned configuration file.
> > >
> > > If you go to www.eclipse.org/jetty and download the latest stable-8
> > > version, you'll see some familiar things - start.jar, an etc
> directory, a
> > > lib directory, and a contexts directory.  They have more in them than
> the
> > > example does -- extra functionality Solr doesn't need.  If you want to
> > > start the downloaded version, you can use 'java -jar start.jar' just
> like
> > > you do with Solr.
> > >
> > > Thanks,
> > > Shawn
> > >
> > >
> >
>


Re: filter before facet

2013-04-25 Thread Toke Eskildsen
On Wed, 2013-04-24 at 23:10 +0200, Daniel Tyreus wrote:
> But why is it slow to generate facets on a result set of 0? Furthermore,
> why does it take the same amount of time to generate facets on a result set
> of 2000 as 100,000 documents?

The default faceting method for your query is field cache. Field cache
faceting works by generating a structure for all the values for the
field in the whole corpus. It is exactly the same work whether you hit
0, 2K or 100M documents with your query.

After the structure has been built, the actual counting of values in the
facet is fast. There is not much difference between 2K and 100K hits.

> This leads me to believe that the FQ is being applied AFTER the facets are
> calculated on the whole data set. For my use case it would make a ton of
> sense to apply the FQ first and then facet. Is it possible to specify this
> behavior or do I need to get into the code and get my hands dirty?

As you write later, you have tried fc, enum and fcs, with fcs having the
fastest first-request time. That is understandable as it is
segment-oriented and (nearly) just a matter of loading the values
sequentially from storage. However, the general observation is that it
is about 10 times as slow as the fc-method for subsequent queries. Since
you are doing NRT that might still leave fcs as the best method for you.
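
For reference, the faceting method is switched per request with the facet.method
parameter - a minimal example (host, collection, and field name are illustrative):

http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=category&facet.method=fcs

It can also be set per field, e.g. f.category.facet.method=fcs.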

As for creating a new faceting implementation that avoids the startup
penalty by using only the found documents: it is technically quite
simple. Use stored fields, iterate the hits and request the values.
Unfortunately this scales poorly with the number of hits, so unless you
can guarantee that you will always have small result sets, this is
probably not a viable option.

- Toke Eskildsen, State and University Library, Denmark
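
A minimal SolrJ sketch of the stored-fields approach outlined above - for
illustration only: it assumes SolrJ 4.x, and the URL, query, and field name
("category") are made up. As noted, it is only viable for small result sets:

import java.util.HashMap;
import java.util.Map;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class StoredFieldFacetSketch {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    // fetch only the stored field to be counted, for the current query
    SolrQuery q = new SolrQuery("text:example");
    q.setFields("category");
    q.setRows(10000); // only sane when the result set is known to be small
    QueryResponse rsp = server.query(q);
    // iterate the hits and count the stored values client-side
    Map<String, Integer> counts = new HashMap<String, Integer>();
    for (SolrDocument doc : rsp.getResults()) {
      Object value = doc.getFieldValue("category");
      if (value == null) continue;
      Integer old = counts.get(value.toString());
      counts.put(value.toString(), old == null ? 1 : old + 1);
    }
    System.out.println(counts);
  }
}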



Re: JVM Parameters to Startup Solr?

2013-04-25 Thread Toke Eskildsen
On Wed, 2013-04-24 at 18:03 +0200, Mark Miller wrote:
> On Apr 24, 2013, at 12:00 PM, Mark Miller  wrote:
> 
> >> -XX:OnOutOfMemoryError="kill -9 %p" -XX:+HeapDumpOnOutOfMemoryError
> 
> The way I like to handle this is to have the OOM trigger a little script or 
> set of cmds that logs the issue and kills the process.

We treat all Errors as fatal by writing to a dedicated log and shutting
down the JVM (which triggers the load balancer etc.). Unfortunately that
means that some XML + XSLT combinations can bring the JVM down due to
StackOverflowError. This might be a little too diligent as the Oracle
JVM running on Linux (our current setup) is resilient to Threads hitting
stack overflow.

- Toke Eskildsen, State and University Library, Denmark



Re: solr.StopFilterFactory doesn't work with wildcard

2013-04-25 Thread Dmitry Baranov
1) I use StopFilterFactory in the "multiterm" analyzer because without it the "query"
analyzer doesn't work with multi-terms, in particular terms with a wildcard.
2) I expect that:
search_string_ss_i:(hp* pavilion* series* d4*)
is parsed to:
+search_string_ss_i:hp* +search_string_ss_i:pavilion* +search_string_ss_i:d4*
i.e. I expect that StopFilterFactory will work on the wildcard terms just as it
does for the same query without wildcards (here, dropping the stopword "series*").



Thanks for your answer





Re: JVM Parameters to Startup Solr?

2013-04-25 Thread Furkan KAMACI
Could you explain what you mean by such kind of scripts? What does it
check and do exactly?
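
For illustration, such a hook might look like the sketch below; the script path
and log location are made-up examples, only the -XX flags come from this thread:

-XX:OnOutOfMemoryError="/opt/solr/bin/oom_handler.sh %p" -XX:+HeapDumpOnOutOfMemoryError

#!/bin/sh
# oom_handler.sh: log the OutOfMemoryError with a timestamp, then kill the
# JVM that raised it (the JVM substitutes %p with its own pid, passed as $1)
echo "$(date) OutOfMemoryError in Solr JVM, pid $1" >> /var/log/solr/oom.log
kill -9 "$1"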


2013/4/25 Toke Eskildsen 

> On Wed, 2013-04-24 at 18:03 +0200, Mark Miller wrote:
> > On Apr 24, 2013, at 12:00 PM, Mark Miller  wrote:
> >
> > >> -XX:OnOutOfMemoryError="kill -9 %p" -XX:+HeapDumpOnOutOfMemoryError
> >
> > The way I like to handle this is to have the OOM trigger a little script
> or set of cmds that logs the issue and kills the process.
>
> We treat all Errors as fatal by writing to a dedicated log and shutting
> down the JVM (which triggers the load balancer etc.). Unfortunately that
> means that some XML + XSLT combinations can bring the JVM down due to
> StackOverflowError. This might be a little too diligent as the Oracle
> JVM running on Linux (our current setup) is resilient to Threads hitting
> stack overflow.
>
> - Toke Eskildsen, State and University Library, Denmark
>
>


Re: Solr metrics in Codahale metrics and Graphite?

2013-04-25 Thread Alan Woodward
Hi Walter, Dmitry,

I opened https://issues.apache.org/jira/browse/SOLR-4735 for this, with some 
work-in-progress.  Have a look!

Alan Woodward
www.flax.co.uk


On 23 Apr 2013, at 07:40, Dmitry Kan wrote:

> Hello Walter,
> 
> Have you had a chance to get something working with graphite, codahale and
> solr?
> 
> Has anyone else tried these tools with Solr 3.x family? How much work is it
> to set things up?
> 
> We have tried zabbix in the past. Even though it required lots of up front
> investment on configuration, it looks like a compelling option.
> In the meantime, we are looking into something more "solr-tailed" yet
> simple. Even without metrics persistence. Tried: jconsole and viewing stats
> via jmx. Main point for us now is to gather the RAM usage.
> 
> Dmitry
> 
> 
> On Tue, Apr 9, 2013 at 9:43 PM, Walter Underwood wrote:
> 
>> If it isn't obvious, I'm glad to help test a patch for this. We can run a
>> simulated production load in dev and report to our metrics server.
>> 
>> wunder
>> 
>> On Apr 8, 2013, at 1:07 PM, Walter Underwood wrote:
>> 
>>> That approach sounds great. --wunder
>>> 
>>> On Apr 7, 2013, at 9:40 AM, Alan Woodward wrote:
>>> 
 I've been thinking about how to improve this reporting, especially now
>> that metrics-3 (which removes all of the funky thread issues we ran into
>> last time I tried to add it to Solr) is close to release.  I think we could
>> go about it as follows:
 
 * refactor the existing JMX reporting to use metrics-3.  This would
>> mean replacing the SolrCore.infoRegistry map with a MetricsRegistry, and
>> adding a JmxReporter, keeping the existing config logic to determine which
>> JMX server to use.  PluginInfoHandler and SolrMBeanInfoHandler translate
>> the metrics-3 data back into SolrMBean format to keep the reporting
>> backwards-compatible.  This seems like a lot of work for no visible
>> benefit, but…
 * we can then add the ability to define other metrics reporters in
>> solrconfig.xml.  There are already reporters for Ganglia and Graphite - you
>> just add them to the Solr lib/ directory, configure them in solrconfig, and
>> voila - Solr can be monitored using the same devops tools you use to
>> monitor everything else.
 
 Does this sound sane?
 
 Alan Woodward
 www.flax.co.uk
 
 
 On 6 Apr 2013, at 20:49, Walter Underwood wrote:
 
> Wow, that really doesn't help at all, since these seem to only be
>> reported in the stats page.
> 
> I don't need another non-standard app-specific set of metrics,
>> especially one that needs polling. I need metrics delivered to the common
>> system that we use for all our servers.
> 
> This is also why SPM is not useful for us, sorry Otis.
> 
> Also, there is no time period on these stats. How do you graph the
>> 95th percentile? I know there was a lot of work on these, but they seem
>> really useless to me. I'm picky about metrics, working at Netflix does that
>> to you.
> 
> wunder
> 
> On Apr 3, 2013, at 4:01 PM, Walter Underwood wrote:
> 
>> In the Jira, but not in the docs.
>> 
>> It would be nice to have VM stats like GC, too, so we can have common
>> monitoring and alerting on all our services.
>> 
>> wunder
>> 
>> On Apr 3, 2013, at 3:31 PM, Otis Gospodnetic wrote:
>> 
>>> It's there! :)
>>> http://search-lucene.com/?q=percentile&fc_project=Solr&fc_type=issue
>>> 
>>> Otis
>>> --
>>> Solr & ElasticSearch Support
>>> http://sematext.com/
>>> 
>>> On Wed, Apr 3, 2013 at 6:29 PM, Walter Underwood <
>> wun...@wunderwood.org> wrote:
 That sounds great. I'll check out the bug, I didn't see anything in
>> the docs about this. And if I can't find it with a search engine, it
>> probably isn't there.  --wunder
 
 On Apr 3, 2013, at 6:39 AM, Shawn Heisey wrote:
 
> On 3/29/2013 12:07 PM, Walter Underwood wrote:
>> What are folks using for this?
> 
> I don't know that this really answers your question, but Solr 4.1
>> and
> later includes a big chunk of codahale metrics internally for
>> request
> handler statistics - see SOLR-1972.  First we tried including the
>> jar
> and using the API, but that created thread leak problems, so the
>> source
> code was added.
> 
> Thanks,
> Shawn
> 
> 
> 
> 
 
>>> 
>>> --
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> 
>>> 
>>> 
>> 
>> --
>> Walter Underwood
>> wun...@wunderwood.org
>> 
>> 
>> 
>> 
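
A minimal sketch of the reporter wiring described in the plan above, assuming
the metrics-3 (com.codahale.metrics) API; the registry contents, Graphite host,
and metric names are illustrative, not Solr's actual integration:

import java.net.InetSocketAddress;
import java.util.concurrent.TimeUnit;

import com.codahale.metrics.JmxReporter;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;
import com.codahale.metrics.graphite.Graphite;
import com.codahale.metrics.graphite.GraphiteReporter;

public class ReporterSketch {
  public static void main(String[] args) {
    MetricRegistry registry = new MetricRegistry();

    // the JMX side: what the refactored infoRegistry reporting would hang off
    JmxReporter jmxReporter = JmxReporter.forRegistry(registry).build();
    jmxReporter.start();

    // an extra reporter of the kind that would be declared in solrconfig.xml
    Graphite graphite = new Graphite(new InetSocketAddress("graphite.example.com", 2003));
    GraphiteReporter graphiteReporter = GraphiteReporter.forRegistry(registry)
        .prefixedWith("solr.core1")
        .convertRatesTo(TimeUnit.SECONDS)
        .convertDurationsTo(TimeUnit.MILLISECONDS)
        .build(graphite);
    graphiteReporter.start(1, TimeUnit.MINUTES);

    // a request-handler-style timer, updated once for demonstration
    Timer requests = registry.timer(MetricRegistry.name("requestHandler", "select", "requests"));
    Timer.Context ctx = requests.time();
    // ... handle a request ...
    ctx.stop();
  }
}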



Re: Solr 3.6.1: changing a field from stored to not stored

2013-04-25 Thread Majirus FANSI
Good to know I missed something about solr replication.
Thanks Jan


On 24 April 2013 17:42, Jan Høydahl  wrote:

> > I would create a new core as slave of the existing configuration without
> > replicating the core schema and configuration. This way I can get the
>
> This won't work, as master/slave replication copies the index files as-is.
>
> You should re-index all your data. You don't need to take down the cluster
> to do that, just re-index on top of what's there already, and your index
> will become smaller and smaller as merging kicks out the old data :)
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
> 24. apr. 2013 kl. 15:59 skrev Majirus FANSI :
>
> > I would create a new core as slave of the existing configuration without
> > replicating the core schema and configuration. This way I can get the
> > information from one index to the other while saving the space as fields
> in
> > the new schema are mainly not stored. After the replication I would swap
> > the cores for the online core to point to the right index dir and conf.
> > i.e. the one with less stored fields.
> >
> > Maj
> >
> >
> > On 24 April 2013 01:48, Petersen, Robert
> > wrote:
> >
> >> Hey I just want to verify one thing before I start doing this:  function
> >> queries only require fields to be indexed but don't require them to be
> >> stored right?
> >>
> >> -Original Message-
> >> From: Petersen, Robert [mailto:robert.peter...@mail.rakuten.com]
> >> Sent: Tuesday, April 23, 2013 4:39 PM
> >> To: solr-user@lucene.apache.org
> >> Subject: RE: Solr 3.6.1: changing a field from stored to not stored
> >>
> >> Good info, Thanks Hoss!  I was going to add a more specific fl=
> parameter
> >> to my queries at the same time.  Currently I am doing fl=*,score so that
> >> will have to be changed.
> >>
> >>
> >> -Original Message-
> >> From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
> >> Sent: Tuesday, April 23, 2013 4:18 PM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Solr 3.6.1: changing a field from stored to not stored
> >>
> >>
> >> : index?  I noticed I am unnecessarily storing some fields in my index
> and
> >> : I'd like to stop storing them without having to 'reindex the world'
> and
> >> : let the changes just naturally percolate into my index as updates come
> >> : in the normal course of things.  Do you guys think I could get away
> with
> >> : this?
> >>
> >> Yes, you can easily get away with this type of change w/o re-indexing,
> >> however you won't gain any immediate index size savings until each and
> >> every existing doc has been reindexed and the old copies expunged from
> the
> >> index via segment merges.
> >>
>> the one hiccup that can affect people when doing this is what happens if
> >> you use something like "fl=*" (and likely "hl=*" as well) ... many
> places
> >> in Solr will try to "avoid failure" if a stored field is found in the
> index
> >> which isn't defined in the schema, and treat that stored value as a
> string
> >> (legacy behavior designed to make it easier for people to point Solr at
> old
> >> lucene indexes built w/o using Solr) ... so if these stored values are
> not
> >> strings, you might get some weird data in your response for these
> documents.
> >>
> >>
> >> -Hoss
> >>
> >>
> >>
> >>
> >>
>
>


Re: Solr metrics in Codahale metrics and Graphite?

2013-04-25 Thread Dmitry Kan
Hi Alan,

Great! What is the solr version you are patching?

Speaking of graphite, we have set it up recently to monitor our shard farm.
So far, since RAM usage has been the most important metric, we were fine with
the pidstat command and a little script generating stats for carbon.
Having some additional stats from Solr itself would certainly be great to
have.

Dmitry
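
A minimal sketch of pushing one such stat to carbon, for illustration - it
assumes carbon's plaintext protocol ("metric value unix-timestamp") on the
default port 2003; the host, metric path, and value are made up:

import java.io.PrintWriter;
import java.net.Socket;

public class CarbonPush {
  public static void main(String[] args) throws Exception {
    long rssKb = 123456; // e.g. parsed from pidstat output
    long now = System.currentTimeMillis() / 1000;
    Socket socket = new Socket("carbon.example.com", 2003);
    PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
    // carbon accepts one datapoint per line: <metric.path> <value> <unix-timestamp>
    out.printf("solr.shard01.ram_rss_kb %d %d\n", rssKb, now);
    out.close();
    socket.close();
  }
}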

On Thu, Apr 25, 2013 at 12:01 PM, Alan Woodward  wrote:

> Hi Walter, Dmitry,
>
> I opened https://issues.apache.org/jira/browse/SOLR-4735 for this, with
> some work-in-progress.  Have a look!
>
> Alan Woodward
> www.flax.co.uk
>
>
> On 23 Apr 2013, at 07:40, Dmitry Kan wrote:
>
> > Hello Walter,
> >
> > Have you had a chance to get something working with graphite, codahale
> and
> > solr?
> >
> > Has anyone else tried these tools with Solr 3.x family? How much work is
> it
> > to set things up?
> >
> > We have tried zabbix in the past. Even though it required lots of up
> front
> > investment on configuration, it looks like a compelling option.
> > In the meantime, we are looking into something more "solr-tailed" yet
> > simple. Even without metrics persistence. Tried: jconsole and viewing
> stats
> > via jmx. Main point for us now is to gather the RAM usage.
> >
> > Dmitry
> >
> >
> > On Tue, Apr 9, 2013 at 9:43 PM, Walter Underwood  >wrote:
> >
> >> If it isn't obvious, I'm glad to help test a patch for this. We can run
> a
> >> simulated production load in dev and report to our metrics server.
> >>
> >> wunder
> >>
> >> On Apr 8, 2013, at 1:07 PM, Walter Underwood wrote:
> >>
> >>> That approach sounds great. --wunder
> >>>
> >>> On Apr 7, 2013, at 9:40 AM, Alan Woodward wrote:
> >>>
>  I've been thinking about how to improve this reporting, especially now
> >> that metrics-3 (which removes all of the funky thread issues we ran into
> >> last time I tried to add it to Solr) is close to release.  I think we
> could
> >> go about it as follows:
> 
>  * refactor the existing JMX reporting to use metrics-3.  This would
> >> mean replacing the SolrCore.infoRegistry map with a MetricsRegistry, and
> >> adding a JmxReporter, keeping the existing config logic to determine
> which
> >> JMX server to use.  PluginInfoHandler and SolrMBeanInfoHandler translate
> >> the metrics-3 data back into SolrMBean format to keep the reporting
> >> backwards-compatible.  This seems like a lot of work for no visible
> >> benefit, but…
>  * we can then add the ability to define other metrics reporters in
> >> solrconfig.xml.  There are already reporters for Ganglia and Graphite -
> you
>> just add them to the Solr lib/ directory, configure them in solrconfig,
> and
> >> voila - Solr can be monitored using the same devops tools you use to
> >> monitor everything else.
> 
>  Does this sound sane?
> 
>  Alan Woodward
>  www.flax.co.uk
> 
> 
>  On 6 Apr 2013, at 20:49, Walter Underwood wrote:
> 
> > Wow, that really doesn't help at all, since these seem to only be
> >> reported in the stats page.
> >
> > I don't need another non-standard app-specific set of metrics,
> >> especially one that needs polling. I need metrics delivered to the
> common
> >> system that we use for all our servers.
> >
> > This is also why SPM is not useful for us, sorry Otis.
> >
> > Also, there is no time period on these stats. How do you graph the
> >> 95th percentile? I know there was a lot of work on these, but they seem
> >> really useless to me. I'm picky about metrics, working at Netflix does
> that
> >> to you.
> >
> > wunder
> >
> > On Apr 3, 2013, at 4:01 PM, Walter Underwood wrote:
> >
> >> In the Jira, but not in the docs.
> >>
> >> It would be nice to have VM stats like GC, too, so we can have
> common
> >> monitoring and alerting on all our services.
> >>
> >> wunder
> >>
> >> On Apr 3, 2013, at 3:31 PM, Otis Gospodnetic wrote:
> >>
> >>> It's there! :)
> >>>
> http://search-lucene.com/?q=percentile&fc_project=Solr&fc_type=issue
> >>>
> >>> Otis
> >>> --
> >>> Solr & ElasticSearch Support
> >>> http://sematext.com/
> >>>
> >>> On Wed, Apr 3, 2013 at 6:29 PM, Walter Underwood <
> >> wun...@wunderwood.org> wrote:
>  That sounds great. I'll check out the bug, I didn't see anything
> in
> >> the docs about this. And if I can't find it with a search engine, it
> >> probably isn't there.  --wunder
> 
>  On Apr 3, 2013, at 6:39 AM, Shawn Heisey wrote:
> 
> > On 3/29/2013 12:07 PM, Walter Underwood wrote:
> >> What are folks using for this?
> >
> > I don't know that this really answers your question, but Solr 4.1
> >> and
> > later includes a big chunk of codahale metrics internally for
> >> request
> > handler statistics - see SOLR-1972.  First we tried including the
> >> jar
> > and using the API, but that created thread leak problems,

Exact matching in Solr 3.6.1

2013-04-25 Thread vsl
Hi,
 is it possible to get an exact-match result if the search term is combined,
e.g. "cats" AND London NOT Leeds


In the previous threads I have read that it is possible to create a new field
of String type and perform a phrase search on it, but nowhere was the
above-mentioned combined search term taken into consideration.

BR
Pawel





Re: Solr metrics in Codahale metrics and Graphite?

2013-04-25 Thread Alan Woodward
This is on top of trunk at the moment, but would be back ported to 4.4 if there 
was interest.

Alan Woodward
www.flax.co.uk


On 25 Apr 2013, at 10:32, Dmitry Kan wrote:

> Hi Alan,
> 
> Great! What is the solr version you are patching?
> 
> Speaking of graphite, we have set it up recently to monitor our shard farm.
> So far since the RAM usage has been most important metric we were fine with
> pidstat command and a little script generating stats for carbon.
> Having some additional stats from SOLR itself would certainly be great to
> have.
> 
> Dmitry
> 
> On Thu, Apr 25, 2013 at 12:01 PM, Alan Woodward  wrote:
> 
>> Hi Walter, Dmitry,
>> 
>> I opened https://issues.apache.org/jira/browse/SOLR-4735 for this, with
>> some work-in-progress.  Have a look!
>> 
>> Alan Woodward
>> www.flax.co.uk
>> 
>> 
>> On 23 Apr 2013, at 07:40, Dmitry Kan wrote:
>> 
>>> Hello Walter,
>>> 
>>> Have you had a chance to get something working with graphite, codahale
>> and
>>> solr?
>>> 
>>> Has anyone else tried these tools with Solr 3.x family? How much work is
>> it
>>> to set things up?
>>> 
>>> We have tried zabbix in the past. Even though it required lots of up
>> front
>>> investment on configuration, it looks like a compelling option.
>>> In the meantime, we are looking into something more "solr-tailed" yet
>>> simple. Even without metrics persistence. Tried: jconsole and viewing
>> stats
>>> via jmx. Main point for us now is to gather the RAM usage.
>>> 
>>> Dmitry
>>> 
>>> 
>>> On Tue, Apr 9, 2013 at 9:43 PM, Walter Underwood >> wrote:
>>> 
 If it isn't obvious, I'm glad to help test a patch for this. We can run
>> a
 simulated production load in dev and report to our metrics server.
 
 wunder
 
 On Apr 8, 2013, at 1:07 PM, Walter Underwood wrote:
 
> That approach sounds great. --wunder
> 
> On Apr 7, 2013, at 9:40 AM, Alan Woodward wrote:
> 
>> I've been thinking about how to improve this reporting, especially now
 that metrics-3 (which removes all of the funky thread issues we ran into
 last time I tried to add it to Solr) is close to release.  I think we
>> could
 go about it as follows:
>> 
>> * refactor the existing JMX reporting to use metrics-3.  This would
 mean replacing the SolrCore.infoRegistry map with a MetricsRegistry, and
 adding a JmxReporter, keeping the existing config logic to determine
>> which
 JMX server to use.  PluginInfoHandler and SolrMBeanInfoHandler translate
 the metrics-3 data back into SolrMBean format to keep the reporting
 backwards-compatible.  This seems like a lot of work for no visible
 benefit, but…
>> * we can then add the ability to define other metrics reporters in
 solrconfig.xml.  There are already reporters for Ganglia and Graphite -
>> you
just add them to the Solr lib/ directory, configure them in solrconfig,
>> and
 voila - Solr can be monitored using the same devops tools you use to
 monitor everything else.
>> 
>> Does this sound sane?
>> 
>> Alan Woodward
>> www.flax.co.uk
>> 
>> 
>> On 6 Apr 2013, at 20:49, Walter Underwood wrote:
>> 
>>> Wow, that really doesn't help at all, since these seem to only be
 reported in the stats page.
>>> 
>>> I don't need another non-standard app-specific set of metrics,
 especially one that needs polling. I need metrics delivered to the
>> common
 system that we use for all our servers.
>>> 
>>> This is also why SPM is not useful for us, sorry Otis.
>>> 
>>> Also, there is no time period on these stats. How do you graph the
 95th percentile? I know there was a lot of work on these, but they seem
 really useless to me. I'm picky about metrics, working at Netflix does
>> that
 to you.
>>> 
>>> wunder
>>> 
>>> On Apr 3, 2013, at 4:01 PM, Walter Underwood wrote:
>>> 
 In the Jira, but not in the docs.
 
 It would be nice to have VM stats like GC, too, so we can have
>> common
 monitoring and alerting on all our services.
 
 wunder
 
 On Apr 3, 2013, at 3:31 PM, Otis Gospodnetic wrote:
 
> It's there! :)
> 
>> http://search-lucene.com/?q=percentile&fc_project=Solr&fc_type=issue
> 
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
> 
> On Wed, Apr 3, 2013 at 6:29 PM, Walter Underwood <
 wun...@wunderwood.org> wrote:
>> That sounds great. I'll check out the bug, I didn't see anything
>> in
 the docs about this. And if I can't find it with a search engine, it
 probably isn't there.  --wunder
>> 
>> On Apr 3, 2013, at 6:39 AM, Shawn Heisey wrote:
>> 
>>> On 3/29/2013 12:07 PM, Walter Underwood wrote:
 What are folks using for this?
>>> 
>>> I don't know that this really answers your question, 

Re: Exact matching in Solr 3.6.1

2013-04-25 Thread Sandeep Mestry
Hi Pawel,

Not sure which parser you are using, I am using edismax and tried using the
bq parameter to boost the results having exact matches at the top.
You may try something like:
q="cats" AND London NOT Leeds&bq="cats"^50

In edismax, pf and pf2 parameters also need some tuning to get the results
at the top.

HTH,
Sandeep


On 25 April 2013 10:33, vsl  wrote:

> Hi,
>  is it possible to get an exact-match result if the search term is combined,
> e.g. "cats" AND London NOT Leeds
>
>
> In the previous threads I have read that it is possible to create a new field
> of String type and perform a phrase search on it, but nowhere was the
> above-mentioned combined search term taken into consideration.
>
> BR
> Pawel
>
>
>
>


Preparing Solr 4.2.1 for IntelliJ fails - invalid sha1

2013-04-25 Thread Shahar Davidson
Hi all,

I'm trying to run 'ant idea' on 4.2.* and I'm getting "invalid sha1" error 
messages. (see below)

I'll appreciate any help,

Shahar
===
.
.
.
resolve
ivy:retrieve

:: problems summary ::
 WARNINGS
problem while downloading module descriptor: 
http://repo1.maven.org/maven2/org/apache/ant/ant/1.8.2/ant-1.8.2.pom: invalid 
sha1: expected=

Re: Solr faceted search UI

2013-04-25 Thread Majirus FANSI
Hi Rocha,
In your webapp I guess you have at least a view and a service layer.
The indexing and search modules should preferably be hosted at the service
layer.
I recommend you read the API doc (
http://lucene.apache.org/solr/4_2_1/solr-solrj/index.html) to get a sense
of what you can do with SolrJ.
Following is a basic facets example with SolrJ:
// add the query keyword to the SolrQuery object
mySolrQuery.setQuery(queryBuilder.toString());
// add a facet field
mySolrQuery.addFacetField("myFieldName");
// add a facet query
String validatedFromTheLast7DaysFacetQuery = validationDateField + ":[NOW/DAY-7DAY TO NOW]";
mySolrQuery.addFacetQuery(validatedFromTheLast7DaysFacetQuery);

// send the request as HTTP POST; with HTTP GET you run into issues when
// the request string is too long
QueryResponse queryResponse = getSolrHttpServer().query(mySolrQuery, METHOD.POST);

// write a transformer to convert the Solr response to a format
// understandable by the caller (the client of the search service)
// list of results to transform
SolrDocumentList responseSolrDocumentList = queryResponse.getResults();
// get the facet fields, iterate over the list, parse each FacetField and
// extract the information you are interested in
List<FacetField> facetFields = queryResponse.getFacetFields();
// get the facet queries from the response
Map<String, Integer> mapOfFacetQueries = queryResponse.getFacetQuery();
The keys of this map are your facet queries. The values are the counts you
display to the user. In general, I have an identifier for each facetQuery.
When I parse the keys of this map of facet queries, I return the identifier
of each facet along with its count (if the count > 0 of course). The caller
is aware of this identifier so it knows what to display to the user.

When the user clicks on a facet, you send it as a search criterion along
with the initial keywords to the search service. The criterion resulting
from the facet is treated as a filter query. That is how faceted search
works. Adding a filter to your query is as simple as this snippet:
mySolrQuery.addFilterQuery(myFilterQuery). Should you be filtering because
your user clicked on the previously defined facet query, then the filter
query is the same as the facet query, that is myFilterQuery =
validationDateField + ":[NOW/DAY-7DAY TO NOW]".

I hope this helps.

Cheers,

Maj


On 24 April 2013 17:27, richa  wrote:

> Hi Maj,
>
> Thanks for your suggestion.
> Tell me one thing, do you have any example on solrj? suppose I decide to
> use solrj in simple web application, to display faceted search on web page.
> Where will this fit into? what will be the flow?
>
> Please suggest.
>
> Thanks
>
>
> On Wed, Apr 24, 2013 at 11:01 AM, Majirus FANSI [via Lucene] <
> ml-node+s472066n4058610...@n3.nabble.com> wrote:
>
> > Hi richa,
> > You can use solrJ (
> > http://wiki.apache.org/solr/Solrj#Reading_Data_from_Solr)
> > to query your solr index.
> > On the wiki page indicated, you will see example of faceted search using
> > solrJ.
> > The 2009 article by Yonik available on searchhub is a good tutorial on
> > faceted search.
> > Whether you go for an MVC framework or not is up to you. It is recommended,
> > though, to develop search engine applications in a Service Oriented
> > Architecture.
> > Regards,
> >
> > Maj
> >
> >
> > On 24 April 2013 16:43, richa <[hidden email]> wrote:
> >
> > > Hi,
> > > I am working on a POC, where I have to display faceted search result on
> > web
> > > page. can anybody please help me to suggest what all set up I need to
> > > configure to display. I would prefer java technologies. Just to
> mention,
> > I
> > > have solr cloud running on remote server.
> > > I would like to know:
> > > 1. Should I use MVC framework?
> > > 2. How will my local interact with remote solr server?
> > > 3. How will I send query through java code and what technology I should
> > use
> > > to display faceted search result?
> > >
> > > Please help me on this.
> > >
> > > Thanks,
> > >
> > >
> > >
> > >
> >
> >

Re: Exact matching in Solr 3.6.1

2013-04-25 Thread vsl
Thanks for your reply. I am using edismax as well. What I want to get is the
exact match without other results that could be close to the given term.





Re: Exact matching in Solr 3.6.1

2013-04-25 Thread Jack Krupansky
As indicated previously, yes, exact matching is possible in Solr. You, the 
developer, have full control over the exactness or inexactness of all 
queries. If any query is inexact in some way, it is solely due to decisions 
that you, the developer, have made.


Generally speaking, inexactness, fuzziness if you will, is the precise 
quality that most developers - and users - are looking for in search. I 
mean, generally, having to be precise and "exact" in search requests... is 
tedious and a real drag, and something to be avoided - in general.


But, that's what string fields, the white space tokenizer, the regular 
expression tokenizer, and full developer control of the token filter 
sequence are for - to let you, the developer, to have full control, 
including all aspects of "exactness" of search.


As to your specific question - there is nothing about the "AND", "OR", or 
"NOT" (or "+" or "-") operators that is in any way anything other than 
"exact", in terms of document matching. "OR" can be considered a form of 
"inexactness" in that presence of a term is optional, but "AND" means 
absolutely MUST, and "NOT" means absolutely MUST_NOT. About as exact as 
anything could get.


Scoring and relevancy are another story, but have nothing to do with 
matching or "exactness". Exactness and matching only affect whether a 
document is counted in "numFound" and included in results or not, not the 
ordering of results.


But why are you asking? Is there some problem you are trying to solve? Is 
there some query that is not giving you the results you expect? If this is 
simply a general information question, fine, answered. But if you are trying 
to solve some problem, you will need to clearly state your problem rather 
than asking some general, abstract question.


-- Jack Krupansky

-Original Message- 
From: vsl

Sent: Thursday, April 25, 2013 5:33 AM
To: solr-user@lucene.apache.org
Subject: Exact matching in Solr 3.6.1

Hi,
is it possible to get an exact-match result if the search term is combined,
e.g. "cats" AND London NOT Leeds


In the previous threads I have read that it is possible to create a new field
of String type and perform a phrase search on it, but nowhere was the
above-mentioned combined search term taken into consideration.

BR
Pawel






Re: Exact matching in Solr 3.6.1

2013-04-25 Thread Sandeep Mestry
I think in that case, making the field a String type is your option; however,
remember that it'd be case sensitive.
Another approach is to create a case-insensitive field type and do searches on
those fields only, e.g.:

<fieldType name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Can you provide your fields and dismax config and, if possible, records you
would like returned and records you do not want?

-S


On 25 April 2013 11:50, vsl  wrote:

> Thanks for your reply. I am using edismax as well. What I want to get is
> the
> exact match without other results that could be close to the given term.
>
>
>
>


Re: Exact matching in Solr 3.6.1

2013-04-25 Thread vsl
I will explain my case in the example below:

We have three documents with given content:

First document:
london cats glenvilet

Second document
london cat glenvilet leeds

Third document
london cat glenvilet 

Search term: "cats" AND London NOT Leeds 

Expected result: First document
Current result: First document, Third document

Additionally, the next requirement says that when I type as search term: "cats"
AND Londo NOT Leeds,
then I should get the spell check collation: "cats" AND London NOT Leeds.






Re: Exact matching in Solr 3.6.1

2013-04-25 Thread Jack Krupansky
It sounds as if your field type is doing stemming - mapping "cats" to "cat". 
That is a valuable feature of search, but if you wish to turn it off... go 
ahead and do so by editing the field type. But just be aware that turning 
off stemming is a great loss of search flexibility.


Who knows, maybe you might want to have both stemmed and unstemmed fields in 
an edismax query and give a higher boost to the unstemmed field - but it's 
not up to us to guess your requirements. We're dependent on you clearly 
expressing your requirements.


As indicated before, you, the developer have complete control here. But... 
it is up to you, the developer to choose wisely, to suit your application 
requirements. But if you don't describe your requirements with greater 
precision and detail, we won't be able to be of much help to you.


Your second (only two) requirement relates to spellcheck, which is 
completely unrelated to query matching and exactness. Yes, Solr has a 
spellcheck capability, and yes, it does collation. Is that all you are 
asking? If there is a specific issue, please be specific about it.


-- Jack Krupansky

-Original Message- 
From: vsl

Sent: Thursday, April 25, 2013 8:00 AM
To: solr-user@lucene.apache.org
Subject: Re: Exact matching in Solr 3.6.1

I will explain my case in the example below:

We have three documents with given content:

First document:
london cats glenvilet

Second document
london cat glenvilet leeds

Third document
london cat glenvilet

Search term: "cats" AND London NOT Leeds

Expected result: First document
Current result: First document, Third document

Additionally, the next requirement says that when I type as search term: "cats"
AND Londo NOT Leeds
then I should get spell check collation: "cats" AND London NOT Leeds







Re: Solr metrics in Codahale metrics and Graphite?

2013-04-25 Thread Dmitry Kan
We are very much interested in 3.4.


On Thu, Apr 25, 2013 at 12:55 PM, Alan Woodward  wrote:

> This is on top of trunk at the moment, but would be back ported to 4.4 if
> there was interest.
>
> Alan Woodward
> www.flax.co.uk
>
>
> On 25 Apr 2013, at 10:32, Dmitry Kan wrote:
>
> > Hi Alan,
> >
> > Great! What is the solr version you are patching?
> >
> > Speaking of graphite, we have set it up recently to monitor our shard
> farm.
> > So far since the RAM usage has been most important metric we were fine
> with
> > pidstat command and a little script generating stats for carbon.
> > Having some additional stats from SOLR itself would certainly be great to
> > have.
> >
> > Dmitry
> >
> > On Thu, Apr 25, 2013 at 12:01 PM, Alan Woodward  wrote:
> >
> >> Hi Walter, Dmitry,
> >>
> >> I opened https://issues.apache.org/jira/browse/SOLR-4735 for this, with
> >> some work-in-progress.  Have a look!
> >>
> >> Alan Woodward
> >> www.flax.co.uk
> >>
> >>
> >> On 23 Apr 2013, at 07:40, Dmitry Kan wrote:
> >>
> >>> Hello Walter,
> >>>
> >>> Have you had a chance to get something working with graphite, codahale
> >> and
> >>> solr?
> >>>
> >>> Has anyone else tried these tools with Solr 3.x family? How much work
> is
> >> it
> >>> to set things up?
> >>>
> >>> We have tried zabbix in the past. Even though it required lots of up
> >> front
> >>> investment on configuration, it looks like a compelling option.
> >>> In the meantime, we are looking into something more "solr-tailed" yet
> >>> simple. Even without metrics persistence. Tried: jconsole and viewing
> >> stats
> >>> via jmx. Main point for us now is to gather the RAM usage.
> >>>
> >>> Dmitry
> >>>
> >>>
> >>> On Tue, Apr 9, 2013 at 9:43 PM, Walter Underwood <
> wun...@wunderwood.org
> >>> wrote:
> >>>
>  If it isn't obvious, I'm glad to help test a patch for this. We can
> run
> >> a
>  simulated production load in dev and report to our metrics server.
> 
>  wunder
> 
>  On Apr 8, 2013, at 1:07 PM, Walter Underwood wrote:
> 
> > That approach sounds great. --wunder
> >
> > On Apr 7, 2013, at 9:40 AM, Alan Woodward wrote:
> >
> >> I've been thinking about how to improve this reporting, especially
> now
>  that metrics-3 (which removes all of the funky thread issues we ran
> into
>  last time I tried to add it to Solr) is close to release.  I think we
> >> could
>  go about it as follows:
> >>
> >> * refactor the existing JMX reporting to use metrics-3.  This would
>  mean replacing the SolrCore.infoRegistry map with a MetricsRegistry,
> and
>  adding a JmxReporter, keeping the existing config logic to determine
> >> which
>  JMX server to use.  PluginInfoHandler and SolrMBeanInfoHandler
> translate
>  the metrics-3 data back into SolrMBean format to keep the reporting
>  backwards-compatible.  This seems like a lot of work for no visible
>  benefit, but…
> >> * we can then add the ability to define other metrics reporters in
>  solrconfig.xml.  There are already reporters for Ganglia and Graphite
> -
> >> you
>  just add then to the Solr lib/ directory, configure them in
> solrconfig,
> >> and
>  voila - Solr can be monitored using the same devops tools you use to
>  monitor everything else.
> >>
> >> Does this sound sane?
> >>
> >> Alan Woodward
> >> www.flax.co.uk
> >>
> >>
> >> On 6 Apr 2013, at 20:49, Walter Underwood wrote:
> >>
> >>> Wow, that really doesn't help at all, since these seem to only be
>  reported in the stats page.
> >>>
> >>> I don't need another non-standard app-specific set of metrics,
>  especially one that needs polling. I need metrics delivered to the
> >> common
>  system that we use for all our servers.
> >>>
> >>> This is also why SPM is not useful for us, sorry Otis.
> >>>
> >>> Also, there is no time period on these stats. How do you graph the
>  95th percentile? I know there was a lot of work on these, but they
> seem
>  really useless to me. I'm picky about metrics, working at Netflix does
> >> that
>  to you.
> >>>
> >>> wunder
> >>>
> >>> On Apr 3, 2013, at 4:01 PM, Walter Underwood wrote:
> >>>
>  In the Jira, but not in the docs.
> 
>  It would be nice to have VM stats like GC, too, so we can have
> >> common
>  monitoring and alerting on all our services.
> 
>  wunder
> 
>  On Apr 3, 2013, at 3:31 PM, Otis Gospodnetic wrote:
> 
> > It's there! :)
> >
> >> http://search-lucene.com/?q=percentile&fc_project=Solr&fc_type=issue
> >
> > Otis
> > --
> > Solr & ElasticSearch Support
> > http://sematext.com/
> >
> > On Wed, Apr 3, 2013 at 6:29 PM, Walter Underwood <
>  wun...@wunderwood.org> wrote:
> >> That sounds great. I'll check out the bug, I didn't see anything

Re: [solr 3.4] anomaly during distributed facet query with 102 shards

2013-04-25 Thread Dmitry Kan
Are there any distrib facet gurus on the list? I would be ready to try
sensible ideas, including on the source code level, if one of you could
give me a hand.

Dmitry


On Wed, Apr 24, 2013 at 3:08 PM, Dmitry Kan  wrote:

> Hello list,
>
> We deal with an anomaly when doing a distributed facet query against 102
> shards.
>
> The problem manifests itself in both the frontend solr (router) and a
> shard. Each time the request is executed, always different shard is
> affected (at random, hence the "anomaly").
>
> The query is: http://router_host:router_port
> /solr/select?q=test&facet=true&facet.field=field_of_type_long&facet.limit=1330&facet.mincount=1&rows=1&facet.sort=index&facet.zeros=false&facet.offset=0
> I have omitted the shards parameter.
>
> The router log:
>
> request: http://10.155.244.181:9150/solr/select
> at 
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
> at 
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
> at 
> org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:421)
> at 
> org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:393)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:722)
>
> Notice the port of the shard that is affected. That port changes all the
> time, even for the same request.
> The log entry is prepended with lines:
>
> SEVERE: org.apache.solr.common.SolrException: Internal Server Error
>
> Internal Server Error
>
> (they are not in the pastebin link)
>
> The shard log:
>
> Apr 24, 2013 11:08:49 AM org.apache.solr.common.SolrException log
> SEVERE: java.lang.NullPointerException
> at java.io.StringReader.<init>(StringReader.java:50)
> at 
> org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:203)
> at 
> org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:80)
> at org.apache.solr.search.QParser.getQuery(QParser.java:142)
> at 
> org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:81)
> at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:173)
> at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
> at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> at 
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> at 
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> at 
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
> at 
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> at 
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> at 
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
> at 
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
> at 
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
> at 
> org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
> at java.lang.Thread.run(Thread.java:722)
>
> Apr 24, 2013 11:08:49 AM org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/select params={} status=500 QTime=2
> Apr 24, 2013 11:08:49 AM org.apache.solr.common.SolrException log
> SEVERE: java.lang.NullPointerException
> at java.io.StringReader.<init>(StringReader.java:50)
> at 
> org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:203)
> at 
> org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:80)
> at org.apache.solr.search.QParser.getQuery(QParser.java:142)
> at 
> org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:81)
> at 
> org.apache.solr.handler.com

Re: Exact matching in Solr 3.6.1

2013-04-25 Thread vsl
Exact matching is just one of my cases.  Currently I perform search on field
with the given definition:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"
            splitOnCaseChange="1" preserveOriginal="1" types="characters.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"
            splitOnCaseChange="1" preserveOriginal="1" types="characters.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>

This field definition fulfils all the other requirements.
Examples:
- special characters
- passengers <-> passenger

The case with exact matching is the last one I have to complete.

The problem with cats <-> cat is caused by SnowballPorterFilterFactory. This
is what I know.

The question is whether it is possible to handle exact matching (edismax)
with only one result, as described in the previous post, without influencing
the existing functionality?

BR 
Pawel





Solr maven install - authorization problem when downloading maven.restlet.org dependencies

2013-04-25 Thread Shahar Davidson
Hi,

I'm trying to build Solr 4.2.x with Maven and I'm getting the following error 
in solr-core:

[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 1.341s
[INFO] Finished at: Thu Apr 25 15:33:09 IDT 2013
[INFO] Final Memory: 12M/174M
[INFO] 
[ERROR] Failed to execute goal on project solr-core: Could not resolve 
dependencies for project org.apache.solr:solr-core:jar:4.2.1-SNAPSHOT: Failed 
to collect dependencies for [org.apache.solr:solr-solrj:jar:4.2.1-SNAPSHOT 
(compile), org.apache.lucene:lucene-core:jar:4.2.1-SNAPSHOT (compile), 
org.apache.lucene:lucene-codecs:jar:4.2.1-SNAPSHOT (compile), 
org.apache.lucene:lucene-analyzers-common:jar:4.2.1-SNAPSHOT (compile), 
org.apache.lucene:lucene-analyzers-kuromoji:jar:4.2.1-SNAPSHOT (compile), 
org.apache.lucene:lucene-analyzers-morfologik:jar:4.2.1-SNAPSHOT (compile), 
org.apache.lucene:lucene-analyzers-phonetic:jar:4.2.1-SNAPSHOT (compile), 
org.apache.lucene:lucene-highlighter:jar:4.2.1-SNAPSHOT (compile), 
org.apache.lucene:lucene-memory:jar:4.2.1-SNAPSHOT (compile), 
org.apache.lucene:lucene-misc:jar:4.2.1-SNAPSHOT (compile), 
org.apache.lucene:lucene-queryparser:jar:4.2.1-SNAPSHOT (compile), 
org.apache.lucene:lucene-spatial:jar:4.2.1-SNAPSHOT (compile), 
org.apache.lucene:lucene-suggest:jar:4.2.1-SNAPSHOT (compile), 
org.apache.lucene:lucene-grouping:jar:4.2.1-SNAPSHOT (compile), 
org.apache.lucene:lucene-queries:jar:4.2.1-SNAPSHOT (compile), 
commons-codec:commons-codec:jar:1.7 (compile), commons-cli:commons-cli:jar:1.2 
(compile), commons-fileupload:commons-fileupload:jar:1.2.1 (compile), 
org.restlet.jee:org.restlet:jar:2.1.1 (compile), 
org.restlet.jee:org.restlet.ext.servlet:jar:2.1.1 (compile), 
org.slf4j:jcl-over-slf4j:jar:1.6.4 (compile), org.slf4j:slf4j-jdk14:jar:1.6.4 
(compile), commons-io:commons-io:jar:2.1 (compile), 
commons-lang:commons-lang:jar:2.6 (compile), com.google.guava:guava:jar:13.0.1 
(compile), org.eclipse.jetty:jetty-server:jar:8.1.8.v20121106 (compile?), 
org.eclipse.jetty:jetty-util:jar:8.1.8.v20121106 (compile?), 
org.eclipse.jetty:jetty-webapp:jar:8.1.8.v20121106 (compile?), 
org.codehaus.woodstox:wstx-asl:jar:3.2.7 (runtime), 
javax.servlet:servlet-api:jar:2.4 (provided), 
org.apache.httpcomponents:httpclient:jar:4.2.3 (compile), 
org.apache.httpcomponents:httpmime:jar:4.2.3 (compile), 
org.slf4j:slf4j-api:jar:1.6.4 (compile), junit:junit:jar:4.10 (test)]: Failed 
to read artifact descriptor for org.restlet.jee:org.restlet:jar:2.1.1: Could 
not transfer artifact org.restlet.jee:org.restlet:pom:2.1.1 from/to 
maven-restlet (http://maven.restlet.org): Not authorized, 
ReasonPhrase:Unauthorized. -> [Help 1]


Has anyone encountered this issue?

Thanks,

Shahar.


Re: Exact matching in Solr 3.6.1

2013-04-25 Thread Majirus FANSI
Hi Pawel,
If you are searching on any field of type "text_general" as defined in your
schema, you are stuck with the Porter stemmer. In fact, with your setting Solr
is not aware of a term like "cats", only "cat". Thus there is no way to do an
exact match with "cats" in this case.
What you can do is create a new field type and, with the copyField facility,
save a verbatim version of your data in that field, while the field of type
"text_general" still performs stemming. Finally, do add the new field to the
list of searchable fields with a higher boost, so that an exact match receives
the highest score; a minimal sketch is shown below.
Hope this helps.
regards,

Maj
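
A minimal schema sketch of the copyField approach described above, for
illustration - the field and type names are made up, and a lowercased keyword
type is just one common choice for the verbatim copy:

<fieldType name="string_exact" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="content" type="text_general" indexed="true" stored="true"/>
<field name="content_exact" type="string_exact" indexed="true" stored="false"/>
<copyField source="content" dest="content_exact"/>

With edismax, the boost then goes on the qf parameter, e.g. qf=content content_exact^10.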


On 25 April 2013 14:43, vsl  wrote:

> Exact matching is just one of my cases.  Currently I perform search on
> field
> with given definition:
>
> <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
>             catenateWords="1" catenateNumbers="1" catenateAll="0"
>             splitOnCaseChange="1" preserveOriginal="1" types="characters.txt"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.StopFilterFactory" words="stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="English"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
>             catenateWords="1" catenateNumbers="1" catenateAll="0"
>             splitOnCaseChange="1" preserveOriginal="1" types="characters.txt"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.StopFilterFactory" words="stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="English"/>
>   </analyzer>
> </fieldType>
>
>
> This field definition fulfils all the other requirements.
> Examples:
> - special characters
> - passengers <-> passenger
>
> The case with exact matching is the last one I have to complete.
>
> The problem with cats <-> cat is caused by SnowballPorterFilterFactory.
> This
> is what I know.
>
> The question is whether it is possible to handle exact matching (edismax)
> with only one result, as described in the previous post, without influencing
> the existing functionality?
>
> BR
> Pawel
>
>
>
>


FieldCache insanity with field used as facet and group

2013-04-25 Thread Elodie Sannier

Hello,

I am using the Lucene FieldCache with SolrCloud and I have "insane" instances 
with messages like:

VALUEMISMATCH: Multiple distinct value objects for 
SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)+merchantid 
'SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)'=>'merchantid',class 
org.apache.lucene.index.SortedDocValues,0.5=>org.apache.lucene.search.FieldCacheImpl$SortedDocValuesImpl#557711353
'SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)'=>'merchantid',int,null=>org.apache.lucene.search.FieldCacheImpl$IntsFromArray#1105988713
'SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)'=>'merchantid',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=>org.apache.lucene.search.FieldCacheImpl$IntsFromArray#1105988713

All insane instances are for a field "merchantid" of type "int" used as a facet 
and group field.

I'm using a custom SearchHandler which makes two sub-queries, a first query 
with group.field=merchantid and a second query with facet.field=merchantid.

When I'm using the parameter facet.method=enum, I don't have the insane 
instance, but I'm not sure it is the right fix.

Can this insanity have a performance impact?
How can I fix it?

Elodie Sannier


Kelkoo SAS
Société par Actions Simplifiée (simplified joint-stock company)
Share capital: €4,168,964.30
Registered office: 8, rue du Sentier, 75002 Paris
425 093 069 RCS Paris

This message and its attachments are confidential and intended exclusively for 
their addressees. If you are not the intended recipient of this message, please 
destroy it and notify the sender.


Re: Solr maven install - authorization problem when downloading maven.restlet.org dependencies

2013-04-25 Thread Dmitry Kan
Building the solr 4.2.1 worked fine for me. Here is the relevant portion of
ivy-settings.xml that I had to change:


  
  
  

  
  


Dmitry


On Thu, Apr 25, 2013 at 3:53 PM, Shahar Davidson wrote:

> Hi,
>
> I'm trying to build Solr 4.2.x with Maven and I'm getting the following
> error in solr-core:
>
> [INFO]
> 
> [INFO] BUILD FAILURE
> [INFO]
> 
> [INFO] Total time: 1.341s
> [INFO] Finished at: Thu Apr 25 15:33:09 IDT 2013
> [INFO] Final Memory: 12M/174M
> [INFO]
> 
> [ERROR] Failed to execute goal on project solr-core: Could not resolve
> dependencies for project org.apache.solr:solr-core:jar:4.2.1-SNAPSHOT:
> Failed to collect dependencies for
> [org.apache.solr:solr-solrj:jar:4.2.1-SNAPSHOT (compile),
> org.apache.lucene:lucene-core:jar:4.2.1-SNAPSHOT (compile),
> org.apache.lucene:lucene-codecs:jar:4.2.1-SNAPSHOT (compile),
> org.apache.lucene:lucene-analyzers-common:jar:4.2.1-SNAPSHOT (compile),
> org.apache.lucene:lucene-analyzers-kuromoji:jar:4.2.1-SNAPSHOT (compile),
> org.apache.lucene:lucene-analyzers-morfologik:jar:4.2.1-SNAPSHOT (compile),
> org.apache.lucene:lucene-analyzers-phonetic:jar:4.2.1-SNAPSHOT (compile),
> org.apache.lucene:lucene-highlighter:jar:4.2.1-SNAPSHOT (compile),
> org.apache.lucene:lucene-memory:jar:4.2.1-SNAPSHOT (compile),
> org.apache.lucene:lucene-misc:jar:4.2.1-SNAPSHOT (compile),
> org.apache.lucene:lucene-queryparser:jar:4.2.1-SNAPSHOT (compile),
> org.apache.lucene:lucene-spatial:jar:4.2.1-SNAPSHOT (compile),
> org.apache.lucene:lucene-suggest:jar:4.2.1-SNAPSHOT (compile),
> org.apache.lucene:lucene-grouping:jar:4.2.1-SNAPSHOT (compile),
> org.apache.lucene:lucene-queries:jar:4.2.1-SNAPSHOT (compile),
> commons-codec:commons-codec:jar:1.7 (compile),
> commons-cli:commons-cli:jar:1.2 (compile),
> commons-fileupload:commons-fileupload:jar:1.2.1 (compile),
> org.restlet.jee:org.restlet:jar:2.1.1 (compile),
> org.restlet.jee:org.restlet.ext.servlet:jar:2.1.1 (compile),
> org.slf4j:jcl-over-slf4j:jar:1.6.4 (compile),
> org.slf4j:slf4j-jdk14:jar:1.6.4 (compile), commons-io:commons-io:jar:2.1
> (compile), commons-lang:commons-lang:jar:2.6 (compile),
> com.google.guava:guava:jar:13.0.1 (compile),
> org.eclipse.jetty:jetty-server:jar:8.1.8.v20121106 (compile?),
> org.eclipse.jetty:jetty-util:jar:8.1.8.v20121106 (compile?),
> org.eclipse.jetty:jetty-webapp:jar:8.1.8.v20121106 (compile?),
> org.codehaus.woodstox:wstx-asl:jar:3.2.7 (runtime),
> javax.servlet:servlet-api:jar:2.4 (provided),
> org.apache.httpcomponents:httpclient:jar:4.2.3 (compile),
> org.apache.httpcomponents:httpmime:jar:4.2.3 (compile),
> org.slf4j:slf4j-api:jar:1.6.4 (compile), junit:junit:jar:4.10 (test)]:
> Failed to read artifact descriptor for
> org.restlet.jee:org.restlet:jar:2.1.1: Could not transfer artifact
> org.restlet.jee:org.restlet:pom:2.1.1 from/to maven-restlet (
> http://maven.restlet.org): Not authorized, ReasonPhrase:Unauthorized. ->
> [Help 1]
>
>
> Has anyone encountered this issue?
>
> Thanks,
>
> Shahar.
>


Re: Preparing Solr 4.2.1 for IntelliJ fails - invalid sha1

2013-04-25 Thread Steve Rowe
Hi Shahar,

I suspect you may have an older version of Ivy installed - the errors you're 
seeing look like IVY-1194, 
which was fixed in Ivy 2.2.0.  Lucene/Solr uses Ivy 2.3.0.  Take a look in 
C:\Users\account\.ant\lib\ and remove older versions of ivy-*.jar, then run 
'ant ivy-bootstrap' from the Solr source code to download ivy-2.3.0.jar to 
C:\Users\account\.ant\lib\.

Just now on a Windows 7 box, I downloaded solr-4.2.1-src.tgz from one of the 
Apache mirrors, unpacked it, deleted my C:\Users\account\.ivy2\ directory (so 
that ivy would re-download everything), and ran 'ant idea' from a cmd window.  
BUILD SUCCESSFUL.

Steve

On Apr 25, 2013, at 6:07 AM, Shahar Davidson  wrote:

> Hi all,
> 
> I'm trying to run 'ant idea' on 4.2.* and I'm getting "invalid sha1" error 
> messages. (see below)
> 
> I'll appreciate any help,
> 
> Shahar
> ===
> .
> .
> .
> resolve
> ivy:retrieve
> 
> :: problems summary ::
>  WARNINGS
>   problem while downloading module descriptor: 
> http://repo1.maven.org/maven2/org/apache/ant/ant/1.8.2/ant-1.8.2.pom: invalid 
> sha1: expected= computed=3e839ffb83951c79858075ddd4587bf67612b3c4 (53ms)
>   problem while downloading module descriptor: 
> http://maven.restlet.org/org/apache/ant/ant/1.8.2/ant-1.8.2.pom: invalid 
> sha1: expected= computed=3e839ffb83951c79858075ddd4587bf67612b3c4 (60ms)
>   problem while downloading module descriptor: 
> http://maven.restlet.org/junit/junit/4.10/junit-4.10.pom: invalid sha1: 
> expected=

Re: Did something change with Payloads?

2013-04-25 Thread hariistou
Hi Jim,

I faced almost the same issue with payloads recently, and thought I would
write about it.
Please see the link below (my blog). I hope it helps.

http://hnagtech.wordpress.com/2013/04/19/using-payloads-with-solr-4-x/
  

Additionally, as Mark Miller has said, with Solr 4.x you have to
add documents one by one during indexing to reflect payload scores
correctly. For example:

solr.addBean(doc);
solr.commit();

When you try to add documents as a collection through addBeans() there is
only one .PAY file created and all documents are scored as per the payload
score of the first document to be indexed.

There is surely some problem with Lucene 4.1 codec APIs. So for now the
above solution would work.
Probably, I need to write a sequel to my first article regarding the above
point on indexing. :)
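
A minimal self-contained SolrJ sketch of that one-by-one workaround (the bean
class, field names, and URL are illustrative assumptions, not Hari's actual
code; note that committing per document is expensive, so this is only the
workaround described above, not general indexing advice):

import java.io.IOException;
import java.util.List;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.beans.Field;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class PayloadIndexer {

    // hypothetical bean; the payload-bearing field name is made up
    public static class MyDoc {
        @Field("id") public String id;
        @Field("payloads") public String payloads;
    }

    public static void index(List<MyDoc> docs)
            throws SolrServerException, IOException {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
        for (MyDoc doc : docs) {
            // add and commit one document at a time, instead of
            // solr.addBeans(docs), so each document's payloads get written
            solr.addBean(doc);
            solr.commit();
        }
    }
}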

Thanks,
Hari.









Re: Solr maven install - authorization problem when downloading maven.restlet.org dependencies

2013-04-25 Thread Steve Rowe
Hi Shahar,

On a Windows 7 box, after downloading solr-4.2.1-src.tgz from one of the Apache 
mirrors and unpacking it, I did the following from a cmd window:

PROMPT> cd solr-4.2.1
PROMPT> ant get-maven-poms
PROMPT> cd maven-build
PROMPT> mvn install

Is the above what you did?

After a while, I see: 

-
[INFO] 
[INFO] Building Apache Solr Core
[INFO]task-segment: [install]
[INFO] 
Downloading: 
http://maven.restlet.org/org/restlet/jee/org.restlet/2.1.1/org.restlet-2.1.1.pom
614b downloaded  (org.restlet-2.1.1.pom)
Downloading: 
http://maven.restlet.org/org/restlet/jee/org.restlet.parent/2.1.1/org.restlet.parent-2.1.1.pom
7K downloaded  (org.restlet.parent-2.1.1.pom)
Downloading: 
http://maven.restlet.org/org/restlet/jee/org.restlet.ext.servlet/2.1.1/org.restlet.ext.servlet-2.1.1.pom
907b downloaded  (org.restlet.ext.servlet-2.1.1.pom)
Downloading: 
http://maven.restlet.org/org/restlet/jee/org.restlet/2.1.1/org.restlet-2.1.1.jar
[…]
709K downloaded  (org.restlet-2.1.1.jar)
Downloading: 
http://maven.restlet.org/org/restlet/jee/org.restlet.ext.servlet/2.1.1/org.restlet.ext.servlet-2.1.1.jar
19K downloaded  (org.restlet.ext.servlet-2.1.1.jar)
-

It's possible that the Restlet maven repository was temporarily malfunctioning. 
 Have you tried again?

Steve

On Apr 25, 2013, at 8:53 AM, Shahar Davidson  wrote:

> Hi,
> 
> I'm trying to build Solr 4.2.x with Maven and I'm getting the following error 
> in solr-core:
> 
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 1.341s
> [INFO] Finished at: Thu Apr 25 15:33:09 IDT 2013
> [INFO] Final Memory: 12M/174M
> [INFO] 
> 
> [ERROR] Failed to execute goal on project solr-core: Could not resolve 
> dependencies for project org.apache.solr:solr-core:jar:4.2.1-SNAPSHOT: Failed 
> to collect dependencies for [org.apache.solr:solr-solrj:jar:4.2.1-SNAPSHOT 
> (compile), org.apache.lucene:lucene-core:jar:4.2.1-SNAPSHOT (compile), 
> org.apache.lucene:lucene-codecs:jar:4.2.1-SNAPSHOT (compile), 
> org.apache.lucene:lucene-analyzers-common:jar:4.2.1-SNAPSHOT (compile), 
> org.apache.lucene:lucene-analyzers-kuromoji:jar:4.2.1-SNAPSHOT (compile), 
> org.apache.lucene:lucene-analyzers-morfologik:jar:4.2.1-SNAPSHOT (compile), 
> org.apache.lucene:lucene-analyzers-phonetic:jar:4.2.1-SNAPSHOT (compile), 
> org.apache.lucene:lucene-highlighter:jar:4.2.1-SNAPSHOT (compile), 
> org.apache.lucene:lucene-memory:jar:4.2.1-SNAPSHOT (compile), 
> org.apache.lucene:lucene-misc:jar:4.2.1-SNAPSHOT (compile), 
> org.apache.lucene:lucene-queryparser:jar:4.2.1-SNAPSHOT (compile), 
> org.apache.lucene:lucene-spatial:jar:4.2.1-SNAPSHOT (compile), 
> org.apache.lucene:lucene-suggest:jar:4.2.1-SNAPSHOT (compile), 
> org.apache.lucene:lucene-grouping:jar:4.2.1-SNAPSHOT (compile), 
> org.apache.lucene:lucene-queries:jar:4.2.1-SNAPSHOT (compile), 
> commons-codec:commons-codec:jar:1.7 (compile), 
> commons-cli:commons-cli:jar:1.2 (compile), 
> commons-fileupload:commons-fileupload:jar:1.2.1 (compile), 
> org.restlet.jee:org.restlet:jar:2.1.1 (compile), 
> org.restlet.jee:org.restlet.ext.servlet:jar:2.1.1 (compile), 
> org.slf4j:jcl-over-slf4j:jar:1.6.4 (compile), org.slf4j:slf4j-jdk14:jar:1.6.4 
> (compile), commons-io:commons-io:jar:2.1 (compile), 
> commons-lang:commons-lang:jar:2.6 (compile), 
> com.google.guava:guava:jar:13.0.1 (compile), 
> org.eclipse.jetty:jetty-server:jar:8.1.8.v20121106 (compile?), 
> org.eclipse.jetty:jetty-util:jar:8.1.8.v20121106 (compile?), 
> org.eclipse.jetty:jetty-webapp:jar:8.1.8.v20121106 (compile?), 
> org.codehaus.woodstox:wstx-asl:jar:3.2.7 (runtime), 
> javax.servlet:servlet-api:jar:2.4 (provided), 
> org.apache.httpcomponents:httpclient:jar:4.2.3 (compile), 
> org.apache.httpcomponents:httpmime:jar:4.2.3 (compile), 
> org.slf4j:slf4j-api:jar:1.6.4 (compile), junit:junit:jar:4.10 (test)]: Failed 
> to read artifact descriptor for org.restlet.jee:org.restlet:jar:2.1.1: Could 
> not transfer artifact org.restlet.jee:org.restlet:pom:2.1.1 from/to 
> maven-restlet (http://maven.restlet.org): Not authorized, 
> ReasonPhrase:Unauthorized. -> [Help 1]
> 
> 
> Has anyone encountered this issue?
> 
> Thanks,
> 
> Shahar.



Re: [solr 3.4] anomaly during distributed facet query with 102 shards

2013-04-25 Thread Yonik Seeley
On Thu, Apr 25, 2013 at 8:32 AM, Dmitry Kan  wrote:
> Are there any distrib facet gurus on the list? I would be ready to try
> sensible ideas, including on the source code level, if someone of you could
> give me a hand.

The Lucene/Solr Revolution conference is coming up next week, so I
think many are busy creating their presentations.
What version of Solr are you using?  Have you tried using a newer
version?  Is it reproducible with a smaller cluster?  If so, you could
try using the included Jetty server instead of Tomcat to rule out that
factor.

-Yonik
http://lucidworks.com


Re: [solr 3.4] anomaly during distributed facet query with 102 shards

2013-04-25 Thread Dmitry Kan
Thanks, Yonik. Yes, I suspected that. We are in the pre-release phase, so we
are under pressure.

Solr 3.4.

Would setting up a 4.2.1 router work with 3.4 shards?
On 25 Apr 2013 17:11, "Yonik Seeley"  wrote:

> On Thu, Apr 25, 2013 at 8:32 AM, Dmitry Kan  wrote:
> > Are there any distrib facet gurus on the list? I would be ready to try
> > sensible ideas, including on the source code level, if someone of you
> could
> > give me a hand.
>
> The Lucene/Solr Revolution conference is coming up next week, so I
> think many are busy creating their presentations.
> What version of Solr are you using?  Have you tried using a newer
> version?  Is it reproducable with a smaller cluster?  If so, you could
> try using the included Jetty server instead of Tomcat to rule out that
> factor.
>
> -Yonik
> http://lucidworks.com
>


What is the difference between a Join Query and Embedded Entities in Solr DIH?

2013-04-25 Thread Gustav
Hello guys, I saw this thread on Stack Overflow, but I'm still not satisfied
with the answers.

I am trying to index data across multiple tables using Solr's Data Import
Handler. The official wiki on the DIH suggests using embedded entities to
link multiple tables like so:







Are these two methods functionally different? Is there a performance
difference?

Another thought would be that, if using join tables in MySQL, using the SQL
query method with multiple joins could cause multiple documents to be
indexed instead of one.






Question on storage and index/data management in solr

2013-04-25 Thread Vinay Rai
Hi,
I am relatively new to solr and evaluating it for my project.

I would have lots of data coming in at a fast rate (say 10 MB per sec) and I 
would need the recent data (last 24 hours, or last 100GB) to be searchable 
faster than the old data. I did a bit of reading on the controls provided by 
Solr and came across the concept of mergeFactor (defaults to 10) - this means 
Solr merges every 10 segments into one.

However, I need something like this -

1. Keep each of last 24 hours segments separate.
2. Segments generated between last 48 to 24 hours to be merged into one. 
Similarly, for segments created between 72 to 48 hours and so on for last 1 
week.
3. Similarly, merge previous 4 week's data into one segment each week.
4. Merge all previous months data into one segment each month.

I am not sure if such a configuration is possible in Solr. If not, 
are there APIs which will allow me to do this?

Also, I want to understand how Solr stores data, and whether it has a dependency 
on the way the data is stored. Since the volumes are high, it would be great if 
the data were compressed when stored (while still searchable). If that is 
possible, I would like to know what kind of compression Solr does.

Thank you for the responses.
 
Regards,
Vinay

Re: Exact matching in Solr 3.6.1

2013-04-25 Thread vsl
Thanks for your reply, but this solution does not fulfil my requirement
because other documents (not exactly matched) will be returned as well.





Re: What is the difference between a Join Query and Embedded Entities in Solr DIH?

2013-04-25 Thread Alexandre Rafalovitch
I think JOIN is more performant because - by default - DIH will run an
inner query for each outer one. You can use a cached source, but JOIN
will still be more efficient.

The nested entities are more useful when the sources are heterogeneous
(e.g. DB and XML) or when you need to do custom transformers in
between.

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Thu, Apr 25, 2013 at 10:17 AM, Gustav  wrote:
> Hello guys, i saw this thread on stackoverflow, but still not satisfied with
> the answers.
>
> I am trying to index data across multiple tables using Solr's Data Import
> Handler. The official wiki on the DIH suggests using embedded entities to
> link multiple tables like so:
>
> 
> 
> 
> 
> 
>
> Are these two methods functionally different? Is there a performance
> difference?
>
> Another though would be that, if using join tables in MySQL, using the SQL
> query method with multiple joins could cause multiple documents to be
> indexed instead of one.
>
>
>
>


RE: What is the difference between a Join Query and Embedded Entities in Solr DIH?

2013-04-25 Thread Dyer, James
Gustav,

DIH should give you the same results in both scenarios.  The performance 
trade-offs depend on your data.  In your case, it looks like there is a 1-to-1 
or many-to-1 relationship between "item" and "member", so use the SQL Join.  
You'll get all of your data in one query and you'll be using your rdbms for 
what it does best.

But in the case there was a 1-to-many relationship between "item" and "member", 
and especially if each "item" has several "member" rows, you might get better 
performance using the child entity setup.  Although by default DIH is going to 
do an "n+1" select on member.  For every row in item, it will issue a separate 
query to the db.  Also, DIH does not use prepared statements, so this might be 
a bad choice.  

To work around this, specify "cacheImpl='SortedMapBackedCache'" on the child 
entity (this is the same as using CachedSqlEntityProcessor instead of 
SqlEntityProcessor).  Do not include a "where" clause in this child entity.  
Instead, specify "cacheKey='memberId'" and "cacheLookup='item.memberId'".  DIH 
will now pull down your entire "member" table in 1 query and cache it in 
memory, then it can do fast hash joins against "item".
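
A sketch of that cached child entity (the table and column names follow the 
item/member example; everything else is illustrative):

<entity name="item" query="SELECT id, memberId FROM item">
  <entity name="member"
          query="SELECT memberId, memberName FROM member"
          cacheImpl="SortedMapBackedCache"
          cacheKey="memberId"
          cacheLookup="item.memberId"/>
</entity>

Note the child query has no "where" clause: DIH pulls the whole member table 
once, caches it, and joins each item row against the cache in memory.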

But if your "member" table is too big to fit into memory, then you need to use 
a disk-backed cache instead of SortedMapBackedCache.  For that, see 
https://issues.apache.org/jira/browse/SOLR-2948 and 
https://issues.apache.org/jira/browse/SOLR-2613 .

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Gustav [mailto:xbihy...@sharklasers.com] 
Sent: Thursday, April 25, 2013 9:17 AM
To: solr-user@lucene.apache.org
Subject: What is the difference between a Join Query and Embedded Entities in 
Solr DIH?

Hello guys, i saw this thread on stackoverflow, but still not satisfied with
the answers. 

I am trying to index data across multiple tables using Solr's Data Import
Handler. The official wiki on the DIH suggests using embedded entities to
link multiple tables like so:







Are these two methods functionally different? Is there a performance
difference?

Another though would be that, if using join tables in MySQL, using the SQL
query method with multiple joins could cause multiple documents to be
indexed instead of one.








SolrJ Custom RowMapper

2013-04-25 Thread Luis Lebolo
Hi All,

Does SolrJ have an option for a custom RowMapper or BeanPropertyRowMapper
(I'm using Spring/JDBC terms)?

I know the QueryResponse has a getBeans method, but I would like to create
my own mapping and plug it in.

Any pointers?

Thanks,
Luis
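
For what it's worth, getBeans() is backed by SolrJ's built-in
DocumentObjectBinder, and as far as I know there is no plug-in point for a
custom binder; the usual alternative is to iterate the documents and map them
yourself. A sketch (the bean class and field names are hypothetical):

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class ManualBeanMapper {

    // hypothetical bean being mapped to
    public static class MyBean {
        private String id;
        private String title;
        public void setId(String id) { this.id = id; }
        public void setTitle(String title) { this.title = title; }
    }

    // maps each SolrDocument to a bean by hand, RowMapper-style
    public static List<MyBean> query(SolrServer solr, String q)
            throws SolrServerException {
        QueryResponse rsp = solr.query(new SolrQuery(q));
        List<MyBean> beans = new ArrayList<MyBean>();
        for (SolrDocument doc : rsp.getResults()) {
            MyBean bean = new MyBean();
            bean.setId((String) doc.getFieldValue("id"));       // hypothetical field
            bean.setTitle((String) doc.getFieldValue("title")); // hypothetical field
            beans.add(bean);
        }
        return beans;
    }
}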


Using another way instead of DIH

2013-04-25 Thread xiaoqi
hi all,

Using DIH to build the index is slow: when it fetches 2 million rows, it
takes 20 minutes.

I am not very familiar with Solr, so I am trying to use Lucene to build the
index files directly from the DB and then move them into the Solr folder.

I am not sure that is the right way. Is there any other good way?

thanks a lot.


 







Re: Using Solr For a Real Search Engine

2013-04-25 Thread Otis Gospodnetic
Hi,

No, start.jar is not deployed.  That *is* Jetty.
This is what the real Embedded Jetty is about:
http://wiki.eclipse.org/Jetty/Tutorial/Embedding_Jetty

What we have here with Solr is just an *included* Jetty, so it's easier
to get started.  That's all. :)

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Thu, Apr 25, 2013 at 3:30 AM, Furkan KAMACI  wrote:
> Hi Otis;
>
> You are right. start.jar starts up an Jetty and there is a war file under
> example directory and deploys start.jar to itself, is that true?
>
> 2013/4/25 Otis Gospodnetic 
>
>> Suggestion :
>> Don't call this embedded Jetty to avoid confusion with the actual embedded
>> jetty.
>>
>> Otis
>> Solr & ElasticSearch Support
>> http://sematext.com/
>> On Apr 23, 2013 4:56 PM, "Furkan KAMACI"  wrote:
>>
>> > Thanks for the answers. I will go with embedded Jetty for my SolrCloud.
>> If
>> > I face with something important I would want to share my experiences with
>> > you.
>> >
>> > 2013/4/23 Shawn Heisey 
>> >
>> > > On 4/23/2013 2:25 PM, Furkan KAMACI wrote:
>> > >
>> > >> Is there any documentation that explains using Jetty as embedded or
>> > not? I
>> > >> use Solr deployed at Tomcat but after you message I will consider
>> about
>> > >> Jetty. If we think about other issues i.e. when I want to update my
>> Solr
>> > >> jars/wars etc.(this is just an foo example) does any pros and cons
>> > Tomcat
>> > >> or Jetty has?
>> > >>
>> > >
>> > > The Jetty in the example is only 'embedded' in the sense that you don't
>> > > have to install it separately.  It is not special -- the Jetty
>> components
>> > > are not changed at all, a subset of them is just included in the Solr
>> > > download with a tuned configuration file.
>> > >
>> > > If you go to www.eclipse.org/jetty and download the latest stable-8
>> > > version, you'll see some familiar things - start.jar, an etc
>> directory, a
>> > > lib directory, and a contexts directory.  They have more in them than
>> the
>> > > example does -- extra functionality Solr doesn't need.  If you want to
>> > > start the downloaded version, you can use 'java -jar start.jar' just
>> like
>> > > you do with Solr.
>> > >
>> > > Thanks,
>> > > Shawn
>> > >
>> > >
>> >
>>


Re: what is the maximum XML file size to import?

2013-04-25 Thread Otis Gospodnetic
Hi,

Even if you could import giant files, I'd avoid it because it feels
like just asking for trouble.  Chunk the file into smaller pieces.
You can index such smaller pieces in parallel, too, and end up with
faster indexing as the result.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Thu, Apr 25, 2013 at 12:10 AM, Sharmila Thapa  wrote:
> Yes,
> I have again tried to post the XML of size 2.02GB; now it throws a different
> error message:
> 
>
> While searching for the cause of this error message, I found that
> Java's setFixedLengthStreamingMode method throws this error.
> Reference: http://docs.oracle.com/javase/7/docs/api/java/net/HttpURLConnection.html.
> So we have to limit the size of the file to what fits in a signed 32-bit
> length, i.e. 2GB.
> I have also tried setting an unlimited Java heap size, but that does not work.
>
> So is there anything that can be done to support up to a 6GB XML file size? If
> possible I would like to try to use java -Durl to import the XML data. If
> this does not work, then I will try other alternatives, such as the DIH you
> suggested.
>
>
>


Re: Using Solr For a Real Search Engine

2013-04-25 Thread Furkan KAMACI
Thanks, Otis I got it.

2013/4/25 Otis Gospodnetic 

> Hi,
>
> No, start.jar is not deployed.  That *is* Jetty.
> This is what the real Embedded Jetty is about:
> http://wiki.eclipse.org/Jetty/Tutorial/Embedding_Jetty
>
> What we have here is Solr is just an *included* Jetty, so it's easier
> to get started.  That's all. :)
>
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
>
>
>
>
>
> On Thu, Apr 25, 2013 at 3:30 AM, Furkan KAMACI 
> wrote:
> > Hi Otis;
> >
> > You are right. start.jar starts up an Jetty and there is a war file under
> > example directory and deploys start.jar to itself, is that true?
> >
> > 2013/4/25 Otis Gospodnetic 
> >
> >> Suggestion :
> >> Don't call this embedded Jetty to avoid confusion with the actual
> embedded
> >> jetty.
> >>
> >> Otis
> >> Solr & ElasticSearch Support
> >> http://sematext.com/
> >> On Apr 23, 2013 4:56 PM, "Furkan KAMACI" 
> wrote:
> >>
> >> > Thanks for the answers. I will go with embedded Jetty for my
> SolrCloud.
> >> If
> >> > I face with something important I would want to share my experiences
> with
> >> > you.
> >> >
> >> > 2013/4/23 Shawn Heisey 
> >> >
> >> > > On 4/23/2013 2:25 PM, Furkan KAMACI wrote:
> >> > >
> >> > >> Is there any documentation that explains using Jetty as embedded or
> >> > not? I
> >> > >> use Solr deployed at Tomcat but after you message I will consider
> >> about
> >> > >> Jetty. If we think about other issues i.e. when I want to update my
> >> Solr
> >> > >> jars/wars etc.(this is just an foo example) does any pros and cons
> >> > Tomcat
> >> > >> or Jetty has?
> >> > >>
> >> > >
> >> > > The Jetty in the example is only 'embedded' in the sense that you
> don't
> >> > > have to install it separately.  It is not special -- the Jetty
> >> components
> >> > > are not changed at all, a subset of them is just included in the
> Solr
> >> > > download with a tuned configuration file.
> >> > >
> >> > > If you go to www.eclipse.org/jetty and download the latest stable-8
> >> > > version, you'll see some familiar things - start.jar, an etc
> >> directory, a
> >> > > lib directory, and a contexts directory.  They have more in them
> than
> >> the
> >> > > example does -- extra functionality Solr doesn't need.  If you want
> to
> >> > > start the downloaded version, you can use 'java -jar start.jar' just
> >> like
> >> > > you do with Solr.
> >> > >
> >> > > Thanks,
> >> > > Shawn
> >> > >
> >> > >
> >> >
> >>
>


Re: Exact matching in Solr 3.6.1

2013-04-25 Thread Jack Krupansky

Well then just do an exact match ONLY!

It sounds like you haven't worked out the inconsistencies in your 
requirements.


To be clear: We're not offering you "solutions" - that's your job. We're 
only pointing out tools that you can use. It is up to you to utilize the 
tools wisely to implement your solution.


I suspect that you simply haven't experimented enough with various boosts to 
assure that the unstemmed result is consistently higher.


Maybe you need a custom stemmer or stemmer override so that "passengers" does 
get stemmed to "passenger", but "cats" does not (but "dogs" does.) That can 
be a choice that you can make, but I would urge caution. Still, it is a 
decision that you can make - it's not a matter of Solr forcing or preventing 
you. I still think boosting of an unstemmed field should be sufficient.


But until you clarify the inconsistencies in your requirements, we won't be 
able to make much progress.


-- Jack Krupansky

-Original Message- 
From: vsl

Sent: Thursday, April 25, 2013 10:45 AM
To: solr-user@lucene.apache.org
Subject: Re: Exact matching in Solr 3.6.1

Thanks for your reply but this solution does not fullfil my requirment
because other documents (not exact matched) will be returned as well.






How to Clean Zookeeper Data for Solr

2013-04-25 Thread Furkan KAMACI
I have a Zookeeper ensemble with three machines. I have started a cluster
with one shard. However I decided to change my shard number. I want to
clean Zookeeper data but whatever I do I always get one shard and rest of
added Solr nodes are as replica.

What should I do?


Re: Exact matching in Solr 3.6.1

2013-04-25 Thread Sandeep Mestry
Agree with Jack.

The current field type text_general is designed to match the query tokens
instead of exact matches - so it's not able to fulfill your requirements.

Can you use a flat file as the spell check dictionary instead? That way you
can search on the exact-matched field while generating spell check
suggestions from the file instead of from the index.
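
A sketch of the file-based dictionary setup in solrconfig.xml (the file name
and index directory are illustrative):

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">file</str>
    <str name="classname">solr.FileBasedSpellChecker</str>
    <str name="sourceLocation">spellings.txt</str>
    <str name="characterEncoding">UTF-8</str>
    <str name="spellcheckIndexDir">./spellcheckerFile</str>
  </lst>
</searchComponent>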

-S


On 25 April 2013 16:25, Jack Krupansky  wrote:

> Well then just do an exact match ONLY!
>
> It sounds like you haven't worked out the inconsistencies in your
> requirements.
>
> To be clear: We're not offering you "solutions" - that's your job. We're
> only pointing out tools that you can use. It is up to you to utilize the
> tools wisely to implement your solution.
>
> I suspect that you simply haven't experimented enough with various boosts
> to assure that the unstemmed result is consistently higher.
>
> Maybe you need a custom stemmer or stemmer overide so that "passengers"
> does get stemmed to "passenger", but "cats" does not (but "dogs" does.)
> That can be a choice that you can make, but I would urge caution. Still, it
> is a decision that you can make - it's not a matter of Solr forcing or
> preventing you. I still think boosting of an unstemmed field should be
> sufficient.
>
> But until you clarify the inconsistencies in your requirements, we won't
> be able to make much progress.
>
>
> -- Jack Krupansky
>
> -Original Message- From: vsl
> Sent: Thursday, April 25, 2013 10:45 AM
>
> To: solr-user@lucene.apache.org
> Subject: Re: Exact matching in Solr 3.6.1
>
> Thanks for your reply but this solution does not fullfil my requirment
> because other documents (not exact matched) will be returned as well.
>
>
>
>


Re: How to Clean Zookeeper Data for Solr

2013-04-25 Thread Michael Della Bitta
This is what I have done.

1. Turn off all your Solr nodes.

2. Ssh to one of your zookeeper machines and run Zookeeper's CLI. On
my machine, it's in /usr/lib/zookeeper/bin.

3. If you've chrooted Solr, just rmr /solr_chroot_dir.  Otherwise, use
rmr to delete these files and folders:

clusterstate.json
aliases.json
live_nodes
overseer
overseer_elect
collections

If you use a chroot jail, make it again with "create /solr_chroot_dir []"

4. Use Solr's zkCli to upload your configs again.

5. Start all your Solr nodes.

6. Create your collections again.
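
A condensed sketch of steps 2 and 3 as a zkCli session (the prompt, paths, and
address are simplified; rmr needs ZooKeeper 3.4+, and this assumes no chroot):

PROMPT> /usr/lib/zookeeper/bin/zkCli.sh -server localhost:2181
[zk] rmr /clusterstate.json
[zk] rmr /aliases.json
[zk] rmr /live_nodes
[zk] rmr /overseer
[zk] rmr /overseer_elect
[zk] rmr /collections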

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Thu, Apr 25, 2013 at 11:27 AM, Furkan KAMACI  wrote:
> I have a Zookeepeer ensemble with three machines. I have started a cluster
> with one shard. However I decided to change my shard number. I want to
> clean Zookeeper data but whatever I do I always get one shard and rest of
> added Solr nodes are as replica.
>
> What should I do?


RE: Using another way instead of DIH

2013-04-25 Thread Dyer, James
If you post your data-config.xml here, someone might be able to find something 
you could change to speed things up.  If the issue is parallelization, then you 
could possibly partition your data somehow and then run multiple DIH request 
handlers at the same time.  This might be easier than writing your own update 
program.

If you still think you need to write something custom, see this:  
http://wiki.apache.org/solr/Solrj#Adding_Data_to_Solr

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: xiaoqi [mailto:belivexia...@gmail.com] 
Sent: Thursday, April 25, 2013 10:01 AM
To: solr-user@lucene.apache.org
Subject: Using another way instead of DIH

hi,all

i using DIH to build index is slow , when it fetch 2 million rows , it will
spend 20 minutes , very slow. 

i am not very familar with  solr , try to   using lucene direct building
index file from db then move to solr folder.

i am not sure ,that is right way. or any other good way? 

thanks a lot .


 









Re: Deletes and inserts

2013-04-25 Thread Jon Strayer
Thanks Michael,
  How do you handle configurations in zookeeper?  I tried reusing the same
configuration but I'm getting an error message that may mean that doesn't
work.  Or maybe I'm doing something wrong.


On Wed, Apr 24, 2013 at 12:50 PM, Michael Della Bitta <
michael.della.bi...@appinions.com> wrote:

> We're using aliases to control visibility of collections we rebuild
> from scratch nightly. It works pretty well. If you run CREATEALIAS
> again, it'll switch to a new one, not augment the old one.
>
> If for some reason, you want to bridge more than one collection, you
> can add more than one collection to the alias at creation time, but
> then it becomes read-only.
>
> Michael Della Bitta
>
> 
> Appinions
> 18 East 41st Street, 2nd Floor
> New York, NY 10017-6271
>
> www.appinions.com
>
> Where Influence Isn’t a Game
>
>
> On Wed, Apr 24, 2013 at 12:26 PM, Jon Strayer  wrote:
> > We are using a Solr collection to serve auto complete suggestions.  We'd
> > like for the update to be without any noticeable delay for the users.
> >
> > I've been looking at adding new cores, loading them with the new data and
> > then swapping them with the current ones, but but I don't see how that
> > would work in a cloud installation.  It seems that when I create a new
> core
> > it is part of the collection and the old data will start replicating to
> it.
> >  Is that correct?
> >
> > I've also looked at standing up a new collection and then adding an alias
> > for it, but that's not well documented.  If the alias already exists and
> I
> > add to to another collection is it removed from the first collection?
> >
> > I'm open to any suggestions.
> >
> > --
> > To *know* is one thing, and to know for certain *that* we know is
> another.
> > --William James
>



-- 
To *know* is one thing, and to know for certain *that* we know is another.
--William James


Re: How to Clean Zookeeper Data for Solr

2013-04-25 Thread Otis Gospodnetic
Nice.  Sounds like FAQ/Wiki material, Mike! :)

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Thu, Apr 25, 2013 at 11:33 AM, Michael Della Bitta
 wrote:
> This is what I have done.
>
> 1. Turn off all your Solr nodes.
>
> 2. Ssh to one of your zookeeper machines and run Zookeeper's CLI. On
> my machine, it's in /usr/lib/zookeeper/bin.
>
> 3. If you've chrooted Solr, just rmr /solr_chroot_dir.  Otherwise, use
> rmr to delete these files and folders:
>
> clusterstate.json
> aliases.json
> live_nodes
> overseer
> overseer_elect
> collections
>
> If you use a chroot jail, make it again with "create /solr_chroot_dir []"
>
> 4. Use Solr's zkCli to upload your configs again.
>
> 5. Start all your Solr nodes.
>
> 6. Create your collections again.
>
> Michael Della Bitta
>
> 
> Appinions
> 18 East 41st Street, 2nd Floor
> New York, NY 10017-6271
>
> www.appinions.com
>
> Where Influence Isn’t a Game
>
>
> On Thu, Apr 25, 2013 at 11:27 AM, Furkan KAMACI  
> wrote:
>> I have a Zookeepeer ensemble with three machines. I have started a cluster
>> with one shard. However I decided to change my shard number. I want to
>> clean Zookeeper data but whatever I do I always get one shard and rest of
>> added Solr nodes are as replica.
>>
>> What should I do?


Re: How to Clean Zookeeper Data for Solr

2013-04-25 Thread Furkan KAMACI
You said: Otherwise, use rmr to delete these files and folders.

Can you give an example?


2013/4/25 Otis Gospodnetic 

> Nice.  Sounds like FAQ/Wiki material, Mike! :)
>
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
>
>
>
>
>
> On Thu, Apr 25, 2013 at 11:33 AM, Michael Della Bitta
>  wrote:
> > This is what I have done.
> >
> > 1. Turn off all your Solr nodes.
> >
> > 2. Ssh to one of your zookeeper machines and run Zookeeper's CLI. On
> > my machine, it's in /usr/lib/zookeeper/bin.
> >
> > 3. If you've chrooted Solr, just rmr /solr_chroot_dir.  Otherwise, use
> > rmr to delete these files and folders:
> >
> > clusterstate.json
> > aliases.json
> > live_nodes
> > overseer
> > overseer_elect
> > collections
> >
> > If you use a chroot jail, make it again with "create /solr_chroot_dir []"
> >
> > 4. Use Solr's zkCli to upload your configs again.
> >
> > 5. Start all your Solr nodes.
> >
> > 6. Create your collections again.
> >
> > Michael Della Bitta
> >
> > 
> > Appinions
> > 18 East 41st Street, 2nd Floor
> > New York, NY 10017-6271
> >
> > www.appinions.com
> >
> > Where Influence Isn’t a Game
> >
> >
> > On Thu, Apr 25, 2013 at 11:27 AM, Furkan KAMACI 
> wrote:
> >> I have a Zookeepeer ensemble with three machines. I have started a
> cluster
> >> with one shard. However I decided to change my shard number. I want to
> >> clean Zookeeper data but whatever I do I always get one shard and rest
> of
> >> added Solr nodes are as replica.
> >>
> >> What should I do?
>


Re: How to Clean Zookeeper Data for Solr

2013-04-25 Thread Mark Miller
What are you doing to clean zk?

You should be able to simply use the ZkCli clear cmd:

http://wiki.apache.org/solr/SolrCloud#Command_Line_Util

Just make sure you stop your Solr instances before clearing it. Clearing out zk 
from under a running Solr instance is not a good thing to do.

This should be as simple as, stop your Solr instances, use the clean command on 
/ or /solr (whatever the root is in zk for your Solr stuff), start your Solr 
instances, create the collection again.
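
For example, with the stock 4.x example layout (the zkhost address and the
/solr chroot here are illustrative):

PROMPT> cd example/cloud-scripts
PROMPT> ./zkcli.sh -zkhost localhost:2181 -cmd clear /solr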

- Mark

On Apr 25, 2013, at 11:27 AM, Furkan KAMACI  wrote:

> I have a Zookeepeer ensemble with three machines. I have started a cluster
> with one shard. However I decided to change my shard number. I want to
> clean Zookeeper data but whatever I do I always get one shard and rest of
> added Solr nodes are as replica.
> 
> What should I do?



Re: How to Clean Zookeeper Data for Solr

2013-04-25 Thread Mark Miller
Of course deleting the collection and then recreating it should also work - if 
it doesn't, there is a bug to address.

- Mark

On Apr 25, 2013, at 12:00 PM, Mark Miller  wrote:

> What are you doing to clean zk?
> 
> You should be able to simply use the ZkCli clear cmd:
> 
> http://wiki.apache.org/solr/SolrCloud#Command_Line_Util
> 
> Just make sure you stop your Solr instances before clearing it. Clearing out 
> zk from under a running Solr instance is not a good thing to do.
> 
> This should be as simple as, stop your Solr instances, use the clean command 
> on / or /solr (whatever the root is in zk for you Solr stuff), start your 
> Solr instances, create the collection again.
> 
> - Mark
> 
> On Apr 25, 2013, at 11:27 AM, Furkan KAMACI  wrote:
> 
>> I have a Zookeepeer ensemble with three machines. I have started a cluster
>> with one shard. However I decided to change my shard number. I want to
>> clean Zookeeper data but whatever I do I always get one shard and rest of
>> added Solr nodes are as replica.
>> 
>> What should I do?
> 



Re: How to Clean Zookeeper Data for Solr

2013-04-25 Thread Furkan KAMACI
Hi;
If you can help it would be nice:

I have erased the data. I use that commands:

Firstly I do that:

java -Xms512M -Xmx5120M -Dsolr.solr.home=/home/solr-4.2.1/solr
-Dsolr.data.dir=/home/solr-4.2.1/solr/data -Dnumshards=2
-Dbootstrap_confdir=/home/solr-4.2.1/solr/collection1/conf
-Dcollection.configName=myconf -jar start.jar

and do that:

java -Xms512M -Xmx5120M -Dsolr.solr.home=/home/solr-4.2.1/solr
-Dsolr.data.dir=/home/solr-4.2.1/solr/data -jar start.jar

However, when I look at the graph in the Admin GUI there is only one shard and
two replicas. What is the problem - why is it not two shards?


2013/4/25 Mark Miller 

> Of course deleting the collection and then recreating it should also work
> - if it doesn't, there is a bug to address.
>
> - Mark
>
> On Apr 25, 2013, at 12:00 PM, Mark Miller  wrote:
>
> > What are you doing to clean zk?
> >
> > You should be able to simply use the ZkCli clear cmd:
> >
> > http://wiki.apache.org/solr/SolrCloud#Command_Line_Util
> >
> > Just make sure you stop your Solr instances before clearing it. Clearing
> out zk from under a running Solr instance is not a good thing to do.
> >
> > This should be as simple as, stop your Solr instances, use the clean
> command on / or /solr (whatever the root is in zk for you Solr stuff),
> start your Solr instances, create the collection again.
> >
> > - Mark
> >
> > On Apr 25, 2013, at 11:27 AM, Furkan KAMACI 
> wrote:
> >
> >> I have a Zookeepeer ensemble with three machines. I have started a
> cluster
> >> with one shard. However I decided to change my shard number. I want to
> >> clean Zookeeper data but whatever I do I always get one shard and rest
> of
> >> added Solr nodes are as replica.
> >>
> >> What should I do?
> >
>
>


Re: How to Clean Zookeeper Data for Solr

2013-04-25 Thread Michael Della Bitta
Today I learned there's a clear command in the command line util. :)

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Thu, Apr 25, 2013 at 12:00 PM, Mark Miller  wrote:
> What are you doing to clean zk?
>
> You should be able to simply use the ZkCli clear cmd:
>
> http://wiki.apache.org/solr/SolrCloud#Command_Line_Util
>
> Just make sure you stop your Solr instances before clearing it. Clearing out 
> zk from under a running Solr instance is not a good thing to do.
>
> This should be as simple as, stop your Solr instances, use the clean command 
> on / or /solr (whatever the root is in zk for you Solr stuff), start your 
> Solr instances, create the collection again.
>
> - Mark
>
> On Apr 25, 2013, at 11:27 AM, Furkan KAMACI  wrote:
>
>> I have a Zookeepeer ensemble with three machines. I have started a cluster
>> with one shard. However I decided to change my shard number. I want to
>> clean Zookeeper data but whatever I do I always get one shard and rest of
>> added Solr nodes are as replica.
>>
>> What should I do?
>


Re: How to Clean Zookeeper Data for Solr

2013-04-25 Thread Furkan KAMACI
Ooppss, I wrote numshards, I think it should be numShards

2013/4/25 Michael Della Bitta 

> Today I learned there's a clear command in the command line util. :)
>
> Michael Della Bitta
>
> 
> Appinions
> 18 East 41st Street, 2nd Floor
> New York, NY 10017-6271
>
> www.appinions.com
>
> Where Influence Isn’t a Game
>
>
> On Thu, Apr 25, 2013 at 12:00 PM, Mark Miller 
> wrote:
> > What are you doing to clean zk?
> >
> > You should be able to simply use the ZkCli clear cmd:
> >
> > http://wiki.apache.org/solr/SolrCloud#Command_Line_Util
> >
> > Just make sure you stop your Solr instances before clearing it. Clearing
> out zk from under a running Solr instance is not a good thing to do.
> >
> > This should be as simple as, stop your Solr instances, use the clean
> command on / or /solr (whatever the root is in zk for you Solr stuff),
> start your Solr instances, create the collection again.
> >
> > - Mark
> >
> > On Apr 25, 2013, at 11:27 AM, Furkan KAMACI 
> wrote:
> >
> >> I have a Zookeepeer ensemble with three machines. I have started a
> cluster
> >> with one shard. However I decided to change my shard number. I want to
> >> clean Zookeeper data but whatever I do I always get one shard and rest
> of
> >> added Solr nodes are as replica.
> >>
> >> What should I do?
> >
>


Re: How to Clean Zookeeper Data for Solr

2013-04-25 Thread Mark Miller
I think it's numShards, not numshards.

- Mark

On Apr 25, 2013, at 12:07 PM, Furkan KAMACI  wrote:

> Hi;
> If you can help it would be nice:
> 
> I have erased the data. I use that commands:
> 
> Firstly I do that:
> 
> java -Xms512M -Xmx5120M -Dsolr.solr.home=/home/solr-4.2.1/solr
> -Dsolr.data.dir=/home/solr-4.2.1/solr/data -Dnumshards=2
> -Dbootstrap_confdir=/home/solr-4.2.1/solr/collection1/conf
> -Dcollection.configName=myconf -jar start.jar
> 
> and do that:
> 
> java -Xms512M -Xmx5120M -Dsolr.solr.home=/home/solr-4.2.1/solr
> -Dsolr.data.dir=/home/solr-4.2.1/solr/data -jar start.jar
> 
> However when I look at the graph at Admin GUI there is only one shard but
> two replicas? What is the problem why it is not two shards?
> 
> 
> 2013/4/25 Mark Miller 
> 
>> Of course deleting the collection and then recreating it should also work
>> - if it doesn't, there is a bug to address.
>> 
>> - Mark
>> 
>> On Apr 25, 2013, at 12:00 PM, Mark Miller  wrote:
>> 
>>> What are you doing to clean zk?
>>> 
>>> You should be able to simply use the ZkCli clear cmd:
>>> 
>>> http://wiki.apache.org/solr/SolrCloud#Command_Line_Util
>>> 
>>> Just make sure you stop your Solr instances before clearing it. Clearing
>> out zk from under a running Solr instance is not a good thing to do.
>>> 
>>> This should be as simple as, stop your Solr instances, use the clean
>> command on / or /solr (whatever the root is in zk for you Solr stuff),
>> start your Solr instances, create the collection again.
>>> 
>>> - Mark
>>> 
>>> On Apr 25, 2013, at 11:27 AM, Furkan KAMACI 
>> wrote:
>>> 
 I have a Zookeepeer ensemble with three machines. I have started a
>> cluster
 with one shard. However I decided to change my shard number. I want to
 clean Zookeeper data but whatever I do I always get one shard and rest
>> of
 added Solr nodes are as replica.
 
 What should I do?
>>> 
>> 
>> 



Re: How do set compression for compression on stored fields in SOLR 4.2.1

2013-04-25 Thread Otis Gospodnetic
Hi,

Is the question how/where to set that?
This is what I found in my repo checkout:

$ ffxg COMPRE
./core/src/test-files/solr/collection1/conf/solrconfig-slave.xml:
  COMPRESSION

Hm, but that's about replication compression.  Maybe we don't have any
examples of this in configs?

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Wed, Apr 24, 2013 at 3:06 PM, William Bell  wrote:
> https://issues.apache.org/jira/browse/LUCENE-4226
> It mentions that we can set compression mode:
> FAST, HIGH_COMPRESSION, FAST_UNCOMPRESSION.
>
>
> --
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076


Re: How to Clean Zookeeper Data for Solr

2013-04-25 Thread Furkan KAMACI
Ok, it works

2013/4/25 Mark Miller 

> I think it's numShards, not numshards.
>
> - Mark
>
> On Apr 25, 2013, at 12:07 PM, Furkan KAMACI 
> wrote:
>
> > Hi;
> > If you can help it would be nice:
> >
> > I have erased the data. I use that commands:
> >
> > Firstly I do that:
> >
> > java -Xms512M -Xmx5120M -Dsolr.solr.home=/home/solr-4.2.1/solr
> > -Dsolr.data.dir=/home/solr-4.2.1/solr/data -Dnumshards=2
> > -Dbootstrap_confdir=/home/solr-4.2.1/solr/collection1/conf
> > -Dcollection.configName=myconf -jar start.jar
> >
> > and do that:
> >
> > java -Xms512M -Xmx5120M -Dsolr.solr.home=/home/solr-4.2.1/solr
> > -Dsolr.data.dir=/home/solr-4.2.1/solr/data -jar start.jar
> >
> > However when I look at the graph at Admin GUI there is only one shard but
> > two replicas? What is the problem why it is not two shards?
> >
> >
> > 2013/4/25 Mark Miller 
> >
> >> Of course deleting the collection and then recreating it should also
> work
> >> - if it doesn't, there is a bug to address.
> >>
> >> - Mark
> >>
> >> On Apr 25, 2013, at 12:00 PM, Mark Miller 
> wrote:
> >>
> >>> What are you doing to clean zk?
> >>>
> >>> You should be able to simply use the ZkCli clear cmd:
> >>>
> >>> http://wiki.apache.org/solr/SolrCloud#Command_Line_Util
> >>>
> >>> Just make sure you stop your Solr instances before clearing it.
> Clearing
> >> out zk from under a running Solr instance is not a good thing to do.
> >>>
> >>> This should be as simple as, stop your Solr instances, use the clean
> >> command on / or /solr (whatever the root is in zk for you Solr stuff),
> >> start your Solr instances, create the collection again.
> >>>
> >>> - Mark
> >>>
> >>> On Apr 25, 2013, at 11:27 AM, Furkan KAMACI 
> >> wrote:
> >>>
>  I have a Zookeepeer ensemble with three machines. I have started a
> >> cluster
>  with one shard. However I decided to change my shard number. I want to
>  clean Zookeeper data but whatever I do I always get one shard and rest
> >> of
>  added Solr nodes are as replica.
> 
>  What should I do?
> >>>
> >>
> >>
>
>


Re: Deletes and inserts

2013-04-25 Thread Michael Della Bitta
We've successfully reused the same config in Zookeeper across multiple
collections and using aliases.

Could you describe your problem? What does the error say?

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Thu, Apr 25, 2013 at 11:44 AM, Jon Strayer  wrote:
> Thanks Michael,
>   How do you handle configurations in zookeeper?  I tried reusing the same
> configuration but I'm getting an error message that may mean that doesn't
> work.  Or maybe I'm doing something wrong.
>
>
> On Wed, Apr 24, 2013 at 12:50 PM, Michael Della Bitta <
> michael.della.bi...@appinions.com> wrote:
>
>> We're using aliases to control visibility of collections we rebuild
>> from scratch nightly. It works pretty well. If you run CREATEALIAS
>> again, it'll switch to a new one, not augment the old one.
>>
>> If for some reason, you want to bridge more than one collection, you
>> can add more than one collection to the alias at creation time, but
>> then it becomes read-only.
>>
>> Michael Della Bitta
>>
>> 
>> Appinions
>> 18 East 41st Street, 2nd Floor
>> New York, NY 10017-6271
>>
>> www.appinions.com
>>
>> Where Influence Isn’t a Game
>>
>>
>> On Wed, Apr 24, 2013 at 12:26 PM, Jon Strayer  wrote:
>> > We are using a Solr collection to serve auto complete suggestions.  We'd
>> > like for the update to be without any noticeable delay for the users.
>> >
>> > I've been looking at adding new cores, loading them with the new data and
>> > then swapping them with the current ones, but but I don't see how that
>> > would work in a cloud installation.  It seems that when I create a new
>> core
>> > it is part of the collection and the old data will start replicating to
>> it.
>> >  Is that correct?
>> >
>> > I've also looked at standing up a new collection and then adding an alias
>> > for it, but that's not well documented.  If the alias already exists and
>> I
>> > add to to another collection is it removed from the first collection?
>> >
>> > I'm open to any suggestions.
>> >
>> > --
>> > To *know* is one thing, and to know for certain *that* we know is
>> another.
>> > --William James
>>
>
>
>
> --
> To *know* is one thing, and to know for certain *that* we know is another.
> --William James


Re: how to get & display Jessionid with solr results

2013-04-25 Thread Michael Della Bitta
You should look into the documentation of your load balancer to see
how you can enable sticky sessions. If you've already done that and
the load balancer requires jsessionid rather than using it's own
sticky session method, it looks like documentation for using
jsessionid with Jetty is here:
http://wiki.eclipse.org/Jetty/Howto/SessionIds

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Wed, Apr 24, 2013 at 6:36 PM, gpssolr2020  wrote:
> Hi,
>
> We are using jetty as a container for solr 3.6. We have two slave servers to
> serve queries for the user request and queries distributed to any one slave
> through load balancer.
>
> When one user send a first search request say its going to slave1 and when
> that user queries again we want to send the query to the same server with
> the help of Jsessionid.
>
> how to achieve this? How to get that Jsessionid with solr search results?
> Please provide your suggestions.
>
> Thanks.
>
>
>
>


Need to log query request before it is processed

2013-04-25 Thread Timothy Potter
I would like to log query requests before they are processed.
Currently, it seems they are only logged after being processed. I've
tried enabling a finer logging level but that didn't seem to help.
I've enabled request logging in Jetty, but most queries come in as
POSTs from SolrJ.

I was thinking of adding a query request logger as a custom search component,
but wanted to see what others have done for this?
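
One sketch of that idea: a component whose prepare() method logs the request,
which runs before the query is executed when the component is listed in the
handler's first-components (the class name and log format are made up):

import java.io.IOException;

import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class QueryLogComponent extends SearchComponent {
    private static final Logger log =
            LoggerFactory.getLogger(QueryLogComponent.class);

    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
        // prepare() runs before process(), i.e. before the search executes,
        // so the request gets logged even if processing later fails
        log.info("incoming query: {}", rb.req.getParamString());
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
        // nothing to do at process time; logging already happened in prepare()
    }

    @Override
    public String getDescription() {
        return "Logs query requests before they are processed";
    }

    @Override
    public String getSource() {
        return "";
    }
}

It would be registered with a searchComponent element in solrconfig.xml and
added to the first-components list of the SearchHandler.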

Thanks.
Tim


Re: Problem with solr deployment on weblogic 10.3

2013-04-25 Thread Shawn Heisey
On 4/25/2013 12:04 AM, Shawn Heisey wrote:
> It looks like the solution is adding some config to the weblogic.xml
> file in the solr.war so that weblogic prefers application classes.  I
> filed SOLR-4762.  I do not know if this change might have unintended
> consequences.
> 
> http://ananthkannan.blogspot.com/2009/08/beware-of-stringutilscontainsignorecase.html
> 
> https://issues.apache.org/jira/browse/SOLR-4762

Radhakrishna: Do you know how to extract solr.war, change the
WEB-INF/weblogic.xml file, and repack it?   I have created a patch for
the Solr source code, but I don't have weblogic, so I can't test it to
make sure it works.  I am running tests to make sure that the change
doesn't break anything else.

Alternatively, you could download the source code, apply the patch I
uploaded to SOLR-4762, build Solr, and try the changed version.  You
never said what version of Solr you are using.  The important part of
the patch should apply correctly to the source of most versions.  The
CHANGES.txt part of the patch will fail on anything older than the 4.x
dev branch (4.4), but that's not an important part of the patch.

Thanks,
Shawn



Re: What is the difference between a Join Query and Embedded Entities in Solr DIH?

2013-04-25 Thread Shawn Heisey
On 4/25/2013 8:17 AM, Gustav wrote:
> Are these two methods functionally different? Is there a performance
> difference?
> 
> Another though would be that, if using join tables in MySQL, using the SQL
> query method with multiple joins could cause multiple documents to be
> indexed instead of one.

They may be equivalent in terms of results, but they work differently
and probably will NOT have the same performance.

When using nested entities in DIH, the main entity results in one SQL
query, but the inner entities will result in a separate SQL query for
every single item returned by the main query.  If you have exactly 1
million rows in your main table and you're using a nested config with
two entities, you will be executing 1,000,001 queries.  DIH will be
spending a fair amount of time doing nothing but waiting for the latency
on a million individual queries via JDBC.  It probably also results in
extra work for the database server.

With a server-side join, you're down to one query via JDBC, and the
database server is doing the work of combining your tables, normally
something it can do very efficiently.

Thanks,
Shawn



Re: Question on storage and index/data management in solr

2013-04-25 Thread Shawn Heisey
On 4/25/2013 8:39 AM, Vinay Rai wrote:
> 1. Keep each of last 24 hours segments separate.
> 2. Segments generated between last 48 to 24 hours to be merged into one. 
> Similarly, for segments created between 72 to 48 hours and so on for last 1 
> week.
> 3. Similarly, merge previous 4 week's data into one segment each week.
> 4. Merge all previous months data into one segment each month.
> 
> I am not sure if there is a configuration possible in solr application. If 
> not, are there APIs which will allow me to do this?

To accomplish this exact scenario, you would probably have to write a
custom merge policy class for Lucene.  If you do so, I hope you'll
strongly consider donating it to the Lucene/Solr project.

Another approach: Use distributed search and put the divisions you are
looking at into separate indexes (shards) in their own cores.  You can
then manually do whatever index merging your situation requires.
Constructing the shards parameter for your queries will take some work.
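
For example, a query that spans a day core, the previous day's core, and a
week core might end up looking something like this (host and core names are
invented):

http://idxhost:8983/solr/day_20130425/select?q=your+query
  &shards=idxhost:8983/solr/day_20130425,idxhost:8983/solr/day_20130424,idxhost:8983/solr/week_2013_16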

Here's a blog post about this method and a video of the Lucene
Revolution talk mentioned in the blog post:

http://www.loggly.com/blog/2010/08/our-solr-system/
http://loggly.com/videos/lucene-revolution-2010/

I had the honor of being there for that talk in Boston.  They've done
some amazing things with Solr.

> Also, I want to understand how solr stores data or does it have a dependency 
> on the way data is stored. Since the volumes are high, it would be great if 
> the data is compressed and stored (while still searchable). If it is 
> possible, I would like to know what kind of compression does solr do?

Solr 4.1 uses compression for stored fields.  Solr 4.2 also uses
compression for term vectors.  From a performance perspective,
compression is probably not viable at this time for the indexed data,
but if that changes in the future, I'm sure that it will be added.

Here is documentation on the file format used by Solr 4.2:

http://lucene.apache.org/core/4_2_0/core/org/apache/lucene/codecs/lucene42/package-summary.html#package_description

Thanks,
Shawn



Re: Solr metrics in Codahale metrics and Graphite?

2013-04-25 Thread Shawn Heisey
On 4/25/2013 6:30 AM, Dmitry Kan wrote:
> We are very much interested in 3.4.
> 
> On Thu, Apr 25, 2013 at 12:55 PM, Alan Woodward  wrote:
>> This is on top of trunk at the moment, but would be back ported to 4.4 if
>> there was interest.

This will be bad news, I'm sorry:

All remaining work on 3.x versions happens in the 3.6 branch. This
branch is in maintenance mode.  It will only get fixes for serious bugs
with no workaround.  Improvements and new features won't be considered
at all.

You're welcome to try backporting patches from newer issues.  Due to the
major differences in the 3x and 4x codebases, the best case scenario is
that you'll be facing a very manual task.  Some changes can't be
backported because they rely on other features only found in 4.x code.

Thanks,
Shawn



Atomic update issue with 4.0 and 4.2.1

2013-04-25 Thread David Fennessey
Hi everyone,

We have hit this strange bug using the atomic update functionality of both SOLR 
4.0 and SOLR 4.2.1.

We're currently posting a JSON-formatted file to the core's update handler using 
a simple curl command; however, we've run into a very bizarre error where 
periodically it will fail and return a 400 error message. If we send the exact 
same request and file 5 minutes later, sometimes it will be accepted and return 
a 200, and other times it will continue to throw 400s.  This tends to happen 
when Solr is receiving a lot of updates, and restarting tomcat seems to clear 
up the issue; however, I feel that there is probably something important that I 
am missing.

The error message that it throws is quite strange and I don't really feel that 
it means very much because we can fire the exact same message 5 minutes later 
and it will happily fill that field. I am positive that I am only sending the 
value 965.00 in this case.
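
For reference, each update is essentially an atomic "set" of that one field,
along these lines (a reconstruction of the general shape of the JSON rather
than the actual file; the id is taken from the error below):

[ { "id": "1764656", "maxPrice": { "set": 965.00 } } ]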

2013-04-25 00:20:39,373 [ERROR] org.apache.solr.core.SolrCore 
org.apache.solr.common.SolrException: ERROR: [doc=1764656] Error adding field 
'maxPrice'='java.math.BigDecimal:965.' msg=For input string: 
"java.math.BigDecimal:965."
    at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:300)
    at org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:73)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:199)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:451)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:587)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:346)
    at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
    at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.handleAdds(JsonLoader.java:387)
    at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:112)
    at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:96)
    at org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:60)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:931)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1004)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.NumberFormatException: For input string: 
"java.math.BigDecimal:965."
    at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1241)
    at java.lang.Float.parseFloat(Float.java:452)
    at org.apache.solr.schema.TrieField.createField(TrieField.java:598)
    at org.apache.solr.schema.TrieField.createFields(TrieField.java:655)
    at org.apache.solr.update.DocumentBuilder.addField(Docu

Cloudspace and Solr Support Page

2013-04-25 Thread Nina Talley
Hi there,

We offer Solr support and were wondering how we would go about being added
to the Solr Support page? Thanks so
much for your time!

-- 
Cloudspace.com
Nina Talley, Account Manager
Office: 877.823.8808
11551 University Blvd, Suite 2
Orlando, FL 32817


Massive Positions Files

2013-04-25 Thread Mike
Hi All,

I'm indexing a pretty large collection of documents (about 500K relatively
long documents taking up >1TB space, mostly in MS Office formats), and am
confused about the file sizes in the index.  I've gotten through about 180K
documents, and the *.pos files add up to 325GB, while all of the rest
combined are using less than 5GB--including some large stored fields and
term vectors.  It makes sense to me that the compression on stored fields
helps to keep that part down on large text fields, and that term vectors
wouldn't be too big since they don't need position information, but the
magnitude of the difference is alarming.  Is that to be expected?  Is there
any way to reduce the size of the positions index if phrase searching is a
requirement?

I am using Solr 4.2.1.  These documents have a number of small
metadata elements, along with the big content field.  Like the default
schema, I'm storing but not indexing the content field, and a lot of the
fields get put into a catchall that is indexed and uses term vectors, but
is not stored.

Thanks,
Mike


Re: Cloudspace and Solr Support Page

2013-04-25 Thread Jan Høydahl
Hi,

Just give your WIKI user name and we'll give you access to edit that page to 
add yourself.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

25. apr. 2013 kl. 21:39 skrev Nina Talley :

> Hi there,
> 
> We offer Solr support and were wondering how we would go about being added
> to the Solr Support page? Thanks so
> much for your time!
> 
> -- 
> Cloudspace.com
> Nina Talley, Account Manager
> Office: 877.823.8808
> 
> 11551 University Blvd, Suite 2
> 
> Orlando, FL 32817



Re: Reordered DBQ.

2013-04-25 Thread Marcin Rzewucki
OK. Thanks for explanation.


On 23 April 2013 23:16, Yonik Seeley  wrote:

> On Tue, Apr 23, 2013 at 3:51 PM, Marcin Rzewucki 
> wrote:
> > Recently I noticed a lot of "Reordered DBQs detected" messages in logs.
> As
> > far as I checked in logs it could be related with deleting documents, but
> > not sure. Do you know what is the reason of those messages ?
>
> For high throughput indexing, we version updates on the leader and
> forward onto other replicas w/o strict serialization.
> If on a leader, an add happened before a DBQ, then on a replica the
> DBQ is serviced before the add, Solr detects this reordering and fixes
> it.
> It's not an error or an indication that anything is wrong (hence the
> INFO level log message).
>
> -Yonik
> http://lucidworks.com
>


How To Make Index Backup at SolrCloud?

2013-04-25 Thread Furkan KAMACI
I use SolrCloud. Let's assume that I want to move all indexes from one
place to another. There may be two reasons for that:

First one is that: I will shut down my whole system and use new machines
with the previous indexes (if it is a must, they may have the same network
topology) somewhere else after some time.
Second one is that: I know that SolrCloud handles failures but I will back
up my indexes for a disaster event.

How can I back up my indexes? I know that I can start up new nodes and I
can close the old ones so I can move my indexes to other machines. However
how can I do such kind of backup (should I just copy data folder of Solr
nodes and put them to new Solr nodes after I change Zookeeper
configuration)?

What do folks do?


Re: filter before facet

2013-04-25 Thread Daniel Tyreus
On Thu, Apr 25, 2013 at 12:35 AM, Toke Eskildsen 
wrote:

>
>
> > This leads me to believe that the FQ is being applied AFTER the facets are
> > calculated on the whole data set. For my use case it would make a ton of
> > sense to apply the FQ first and then facet. Is it possible to specify this
> > behavior or do I need to get into the code and get my hands dirty?
>
>
> As for creating a new faceting implementation that avoids the startup
> penalty by using only the found documents, then it is technically quite
> simple: Use stored fields, iterate the hits and request the values.
> Unfortunately this scales poorly with the number of hits, so unless you
> can guarantee that you will always have small result sets, this is
> probably not a viable option.
>
>
Thank you Toke for your detailed reply. I have perhaps an unusual use case
where we may have hundreds of thousands of users each with a few thousand
documents. On some queries I can guarantee the result size will be small
compared to the entire corpus since I'm filtering on one user's documents.
I may give this alternative faceting implementation a try.
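
In case it is useful to anyone else, a bare-bones version of that idea in
SolrJ might look like this (a sketch that assumes small per-user result sets;
the URL, field names, and filter value are all made up):

SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
SolrQuery q = new SolrQuery("*:*");
q.addFilterQuery("user_id:12345");   // restrict to one user's documents first
q.setFields("category");             // a stored field to "facet" on
q.setRows(5000);
QueryResponse rsp = solr.query(q);

// count the stored values by hand instead of using Solr's facet engine
Map<String, Integer> counts = new HashMap<String, Integer>();
for (SolrDocument doc : rsp.getResults()) {
    Object v = doc.getFieldValue("category");
    if (v == null) continue;
    Integer c = counts.get(v.toString());
    counts.put(v.toString(), c == null ? 1 : c + 1);
}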

Best regards,
Daniel


Re: Massive Positions Files

2013-04-25 Thread Jack Krupansky
These are the "postings" for all terms - the lists of positions for every 
occurrence of every term for all documents. Sounds to me like it could be 
huge.


Did you try a back of the envelope calculation?

325 GB divided by 180K docs comes to roughly 1.8 MB per doc (call it 2 MB).

How many "words" are in a document? You say they are "long".

Even if there were only 500,000 to 1,000,000 postings per "long" document, that 
would work out to 2 to 4 bytes or so per posting. I have no idea how big an 
"average" term posting might be, but these numbers do not seem at all 
unreasonable.


Now, let's see what kind of precise answer the Lucene guys give you!

-- Jack Krupansky

-Original Message- 
From: Mike

Sent: Thursday, April 25, 2013 4:00 PM
To: solr-user@lucene.apache.org
Subject: Massive Positions Files

Hi All,

I'm indexing a pretty large collection of documents (about 500K relatively
long documents taking up >1TB space, mostly in MS Office formats), and am
confused about the file sizes in the index.  I've gotten through about 180K
documents, and the *.pos files add up to 325GB, while all of the rest
combined are using less than 5GB--including some large stored fields and
term vectors.  It makes sense to me that the compression on stored fields
helps to keep that part down on large text fields, and that term vectors
wouldn't be too big since they don't need position information, but the
magnitude of the difference is alarming.  Is that to be expected?  Is there
any way to reduce the size of the positions index if phrase searching is a
requirement?

I am using Solr 4.2.1.  These documents have a number of small
metadata elements, along with the big content field.  Like the default
schema, I'm storing but not indexing the content field, and a lot of the
fields get put into a catchall that is indexed and uses term vectors, but
is not stored.

Thanks,
Mike 



Re: How To Make Index Backup at SolrCloud?

2013-04-25 Thread Timothy Potter
Hi Furkan,

So here's what I do (not saying this is the best method, but it
definitely works great, albeit with a little work on my part):

The replication handler (which must be enabled for Solr cloud)
supports a backup command, e.g.
.../replication?command=backup&location=/mnt/backups

From what I've heard, this is the "safe" way to back up an index in Solr.

Every X minutes, I send a hard commit and then send this "backup"
command to only 1 replica per shard as you don't need multiple
replicas of the same shard backing up the same thing. I chose to send
this to the leaders but that is definitely not required.

The actual backup runs in the background asynchronously, so the
command returns to your client immediately and there's not a good way
to know when the backup is done ... so I "poll" the replication
handler's details action, e.g. .../replication?command=details, which
will report a completion date once your backup is done.  My
backup tool simply waits until all shards report as being completed.
Then I move the backup offsite, which for us is S3.

Incidentally, I wrote my backup driver in Java b/c the SolrJ library
gives you access to all the cluster state information you need to pick
one replica per shard to start the backup on, e.g. something like
this:

private static final Map<String, String> getShardLeaders(CloudSolrServer solr,
        String collection) throws Exception {
    Map<String, String> leaders = new TreeMap<String, String>();
    ZkStateReader zkStateReader = solr.getZkStateReader();
    for (Slice slice : zkStateReader.getClusterState().getSlices(collection)) {
        leaders.put(slice.getName(),
                zkStateReader.getLeaderUrl(collection, slice.getName(), ZK_TIMEOUT));
    }
    return leaders;
}
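
The driver then just walks that map and fires the backup command at each
leader, roughly like this (a sketch rather than my exact code; the collection
name and backup location are made up):

for (Map.Entry<String, String> e : getShardLeaders(solr, "collection1").entrySet()) {
    HttpSolrServer leader = new HttpSolrServer(e.getValue());
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("command", "backup");
    params.set("location", "/mnt/backups");
    QueryRequest req = new QueryRequest(params);
    req.setPath("/replication");  // target the replication handler, not /select
    req.process(leader);
}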

On Thu, Apr 25, 2013 at 3:22 PM, Furkan KAMACI  wrote:
> I use SolrCloud. Let's assume that I want to move all indexes from one
> place to another. There may be two reasons for that:
>
> First one is that: I will shut down my whole system and use new machines
> with the previous indexes (if it is a must, they may have the same network
> topology) somewhere else after some time.
> Second one is that: I know that SolrCloud handles failures but I will back
> up my indexes for a disaster event.
>
> How can I back up my indexes? I know that I can start up new nodes and I
> can close the old ones so I can move my indexes to other machines. However
> how can I do such kind of backup (should I just copy data folder of Solr
> nodes and put them to new Solr nodes after I change Zookeeper
> configuration)?
>
> What do folks do?


Re: Need to log query request before it is processed

2013-04-25 Thread Sudhakar Maddineni
Hi Tim,
  Have you tried enabling the logging levels on httpclient, which is
used by the solrj classes internally?

Thx, Sudhakar.


On Thu, Apr 25, 2013 at 10:12 AM, Timothy Potter wrote:

> I would like to log query requests before they are processed.
> Currently, it seems they are only logged after being processed. I've
> tried enabling a finer logging level but that didn't seem to help.
> I've enabled request logging in Jetty but most queries come in as
> POSTs from SolrJ.
>
> I was thinking of adding a query request logger as a 
> but wanted to see what others have done for this?
>
> Thanks.
> Tim
>
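
A different route than httpclient logging, closer to what Tim describes, is a
tiny servlet filter registered ahead of SolrDispatchFilter in web.xml. A rough,
untested sketch (the filter and logger names are invented, and POST bodies
would need extra work to capture):

import java.io.IOException;
import java.util.logging.Logger;
import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;

public class QueryRequestLogFilter implements Filter {
    private static final Logger log = Logger.getLogger("QueryRequestLog");

    public void init(FilterConfig config) {}

    public void doFilter(ServletRequest req, ServletResponse rsp, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest http = (HttpServletRequest) req;
        // logged before the request reaches Solr's dispatch filter
        log.info(http.getMethod() + " " + http.getRequestURI()
                + (http.getQueryString() == null ? "" : "?" + http.getQueryString()));
        chain.doFilter(req, rsp);
    }

    public void destroy() {}
}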


Re: SolrJ Custom RowMapper

2013-04-25 Thread Sudhakar Maddineni
Hey Luis,
Check this example in the source: TestDocumentObjectBinder
https://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_2_1/solr/solrj/src/test/org/apache/solr/client/solrj/beans/TestDocumentObjectBinder.java
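
For contrast, the stock binding that test exercises looks roughly like this
(a sketch; the Item class and field names are invented):

public class Item {
    @Field                  // org.apache.solr.client.solrj.beans.Field
    String id;

    @Field("category_s")    // maps a differently-named Solr field
    String category;
}

// given an existing QueryResponse from a query:
List<Item> items = queryResponse.getBeans(Item.class);

DocumentObjectBinder is the class behind getBeans(), so that is probably the
place to start if you want to plug in your own mapping logic.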

Thx, Sudhakar.



On Thu, Apr 25, 2013 at 7:56 AM, Luis Lebolo  wrote:

> Hi All,
>
> Does SolrJ have an option for a custom RowMapper or BeanPropertyRowMapper
> (I'm using Spring/JDBC terms).
>
> I know the QueryResponse has a getBeans method, but I would like to create
> my own mapping and plug it in.
>
> Any pointers?
>
> Thanks,
> Luis
>


Re: How To Make Index Backup at SolrCloud?

2013-04-25 Thread Otis Gospodnetic
You can use the index backup command that's part of index replication,
check the Wiki.

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Apr 25, 2013 5:23 PM, "Furkan KAMACI"  wrote:

> I use SolrCloud. Let's assume that I want to move all indexes from one
> place to another. There may be two reasons for that:
>
> First one is that: I will shut down my whole system and use new machines
> with the previous indexes (if it is a must, they may have the same network
> topology) somewhere else after some time.
> Second one is that: I know that SolrCloud handles failures but I will back
> up my indexes for a disaster event.
>
> How can I back up my indexes? I know that I can start up new nodes and I
> can close the old ones so I can move my indexes to other machines. However
> how can I do such kind of backup (should I just copy data folder of Solr
> nodes and put them to new Solr nodes after I change Zookeeper
> configuration)?
>
> What do folks do?
>


Re: Too many close, count -1

2013-04-25 Thread Erick Erickson
One outside possibility (and 4.3 should refuse to start if this is the
case): is it possible that more than one of your cores has the same
name?

FWIW,
Erick

On Tue, Apr 23, 2013 at 5:30 PM, Chris Hostetter
 wrote:
>
> : Subject: Re: Too many close, count -1
>
> Thanks for the details, nothing jumps out at me, but we're now tracking
> this in SOLR-4753...
>
> https://issues.apache.org/jira/browse/SOLR-4753
>
> -Hoss


Re: Query specific replica

2013-04-25 Thread Erick Erickson
bq: I was wondering whether it is possible to query the same core every request,

Not that I know of. You can ping a single node by appending
&distrib=false, but that
won't then look at multiple shards. If you don't have any shards, this
would work I think...
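
For example (host and core names are invented):

http://solrhost:8983/solr/collection1/select?q=*:*&distrib=false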

Best
Erick

On Tue, Apr 23, 2013 at 6:31 PM, Manuel Le Normand
 wrote:
> Hello,
> Since I replicated my shards (I have 2 cores per shard now), I see a
> remarkable decrease in qTime performance. I assume it happens since my
> memory has to be split between twice as many cores as it used to.
>
> In my low qps rate use-case, I use replications as shard backup only (in
> case one of my servers goes down) and not for the ability of serving
> parallel requests. In this case performance decreases because the two
> cores of the shard are active.
>
> I was wondering whether it is possible to query the same core every
> request, instead of "load balancing" between the different replicas? And
> only if the "leader" replica goes down would the second replica start
> serving requests.
>
> Cheers,
> Manu


Re: Luke misreporting index-time boosts?

2013-04-25 Thread Erick Erickson
I think you're kinda missing the idea of index time boosting. The
semantic of this (as I remember Chris Hostetter explaining) is
"this document's content is more important than other document's
content".

By doing an index-time boost that's the same for all your documents,
you're effectively doing nothing to the relative ranks of the results.
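
In other words, an index-time boost only changes anything when it differs
between documents, e.g. (a sketch of update XML; the ids are invented):

<add>
  <doc boost="2.0">
    <field name="id">more-important-doc</field>
  </doc>
  <doc>
    <field name="id">ordinary-doc</field>
  </doc>
</add>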

Not quite sure what Luke is doing here, but using &debugQuery=on
will give you the actual scores of the actual documents. And if you're
doing anything like wildcards or *:* queries, shortcuts are taken
that set the scores to 1.0.

If none of that helps, I'm out of my depth...

Best
Erick

On Wed, Apr 24, 2013 at 6:01 AM, Timothy Hill  wrote:
> Hello, all
>
> I have recently been attempting to apply index-time boosts to fields using
> the following syntax:
>
> [the XML example was stripped by the mail archive: two <doc> blocks of
> three <field> elements each ("bleah bleah bleah" / "content here"), with
> index-time boost attributes on important_field and trivial_field]
>
> The intention is that matches on important_field should be more important
> to score than matches on trivial_field (so that a search across all fields
> for the term 'content' would return the second document above the first),
> while still being able to use the standard query parser.
>
> Looking at output from Luke, however, all fields are reported as having a
> boost of 1.0.
>
> The following possibilities occur to me.
>
> (1) The entire index-time-boosting approach is misconceived
> (2) Luke is misreporting, because index-time boosting alters more
> fundamental aspects of scoring (tf-idf calculations, I suppose), and the
> index-time boost is thus invisible to it
> (3) Some combination of (1) and (2)
>
> Can anyone help illuminate the situation for me? Documentation for these
> questions seems patchy.
>
> Thanks,
>
> Tim


Re: Facets with OR clause

2013-04-25 Thread Erick Erickson
If you're talking about _filter queries_, Kai's answer is good.

But your question is confusing. You
talk about facet queries, but then use fq, which is "filter
query" and has nothing to do with facets at all unless
you're talking about turning facet information into filter
queries...

FWIW,
Erick

On Wed, Apr 24, 2013 at 6:43 AM, Kai Becker  wrote:
> Try fq=(groups:group1 OR locations:location1)
>
> Am 24.04.2013 um 12:39 schrieb vsl:
>
>> Hi,
>>
>> my request contains following term:
>>
>> The are 3 facets:
>> groups, locations, categories.
>>
>>
>>
>> When I select some items then I see such syntax in my request.
>> fq=groups:group1&fq=locations:location1
>>
>> Is it possible to add OR clause between facets items in query?
>>
>>
>>
>> --
>> View this message in context: 
>> http://lucene.472066.n3.nabble.com/Facets-with-OR-clause-tp4058553.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>


RE: Using another way instead of DIH

2013-04-25 Thread xiaoqi
Thanks for the help.

"data-config.xml"? I cannot find this file. Do you mean data-import.xml or
solrconfig.xml?





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Using-another-way-instead-of-DIH-tp4058937p4059067.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Luke misreporting index-time boosts?

2013-04-25 Thread Chris Hostetter

: Looking at output from Luke, however, all fields are reported as having a
: boost of 1.0.
: 
: The following possibilities occur to me.
: 
: (1) The entire index-time-boosting approach is misconceived

Yes. see Erick's comments about index boosts vs query boosts and why what 
you are trying to do won't work.

: (2) Luke is misreporting, because index-time boosting alters more
: fundamental aspects of scoring (tf-idf calculations, I suppose), and the
: index-time boost is thus invisible to it

I'm not exactly sure how Luke is labeling things, but one aspect to 
remember is that index time boosts are used to generate the field norms -- 
if you have omitNorms="true" on your fields, the field norms are always 
going to be reported as "1", which may be what you are seeing.

If you are not using omitNorms="true" then please provide more details as 
to what exactly you are seeing in Luke.


-Hoss


Re: Solr Cloud 4.2 - Distributed Requests failing with NPE

2013-04-25 Thread Chris Hostetter

: "trace":"java.lang.NullPointerException\r\n\tat
: 
org.apache.solr.handler.component.HttpShardHandler.checkDistributed(HttpShardHandler.java:340)\r\n\tat
: 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:182)\r\n\tat

yea, definitely a bug.  

Raintung reported this recently, and made a patch available...

https://issues.apache.org/jira/browse/SOLR-4705


-Hoss


Re: How do set compression for compression on stored fields in SOLR 4.2.1

2013-04-25 Thread Chris Hostetter
: Subject: How do set compression for compression on stored fields in SOLR 4.2.1
: 
: https://issues.apache.org/jira/browse/LUCENE-4226
: It mentions that we can set compression mode:
: FAST, HIGH_COMPRESSION, FAST_DECOMPRESSION.

The compression details are hardcoded into the various codecs.  If you 
wanted to customize this, you'd need to write your own codec subclass...

https://lucene.apache.org/core/4_2_0/core/org/apache/lucene/codecs/compressing/class-use/CompressionMode.html

See, for example, the implementations of Lucene41StoredFieldsFormat and 
Lucene42TermVectorsFormat...


public final class Lucene41StoredFieldsFormat extends CompressingStoredFieldsFormat {
  /** Sole constructor. */
  public Lucene41StoredFieldsFormat() {
    super("Lucene41StoredFields", CompressionMode.FAST, 1 << 14);
  }
}

public final class Lucene42TermVectorsFormat extends CompressingTermVectorsFormat {
  /** Sole constructor. */
  public Lucene42TermVectorsFormat() {
    super("Lucene41StoredFields", "", CompressionMode.FAST, 1 << 12);
  }
}
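
So a HIGH_COMPRESSION variant is mostly a matter of subclassing with a
different mode, roughly like this (a sketch; the class and format names are
invented, and you would still need to hook it into a custom Codec and
register that codec via SPI):

public final class MyHighCompressionStoredFieldsFormat extends CompressingStoredFieldsFormat {
  /** Sole constructor. */
  public MyHighCompressionStoredFieldsFormat() {
    super("MyHighCompressionStoredFields", CompressionMode.HIGH_COMPRESSION, 1 << 14);
  }
}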




-Hoss


Re: Question on storage and index/data management in solr

2013-04-25 Thread Vinay Rai
Thank you very much Shawn for a detailed response. Let me read all the 
documentation you pointed to and digest it.

Sure, if I do end up using solr and need to make this change, I would love to also 
submit it to the Lucene/Solr project.

Regards,
Vinay



 From: Shawn Heisey 
To: solr-user@lucene.apache.org 
Sent: Thursday, April 25, 2013 11:32 PM
Subject: Re: Question on storage and index/data management in solr
 

On 4/25/2013 8:39 AM, Vinay Rai wrote:
> 1. Keep each of last 24 hours segments separate.
> 2. Segments generated between last 48 to 24 hours to be merged into one. 
> Similarly, for segments created between 72 to 48 hours and so on for last 1 
> week.
> 3. Similarly, merge the previous 4 weeks' data into one segment each week.
> 4. Merge all previous months' data into one segment each month.
> 
> I am not sure if there is a configuration possible in solr application. If 
> not, are there APIs which will allow me to do this?

To accomplish this exact scenario, you would probably have to write a
custom merge policy class for Lucene.  If you do so, I hope you'll
strongly consider donating it to the Lucene/Solr project.

Another approach: Use distributed search and put the divisions you are
looking at into separate indexes (shards) in their own cores.  You can
then manually do whatever index merging your situation requires.
Constructing the shards parameter for your queries will take some work.

Here's a blog post about this method and a video of the Lucene
Revolution talk mentioned in the blog post:

http://www.loggly.com/blog/2010/08/our-solr-system/
http://loggly.com/videos/lucene-revolution-2010/

I had the honor of being there for that talk in Boston.  They've done
some amazing things with Solr.

> Also, I want to understand how solr stores data or does it have a dependency 
> on the way data is stored. Since the volumes are high, it would be great if 
> the data is compressed and stored (while still searchable). If it is 
> possible, I would like to know what kind of compression does solr do?

Solr 4.1 uses compression for stored fields.  Solr 4.2 also uses
compression for term vectors.  From a performance perspective,
compression is probably not viable at this time for the indexed data,
but if that changes in the future, I'm sure that it will be added.

Here is documentation on the file format used by Solr 4.2:

http://lucene.apache.org/core/4_2_0/core/org/apache/lucene/codecs/lucene42/package-summary.html#package_description

Thanks,
Shawn

Re: Facets with 5000 facet fields - Out of memory error during the query time

2013-04-25 Thread sivaprasad
I got more information from the responses. Now it's time to take another look
at the number of facets to be configured.

Thanks,
Siva
http://smarttechies.wordpress.com/



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Facets-with-5000-facet-fields-Out-of-memory-error-during-the-query-time-tp4048450p4059079.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Cloud 4.2 - Distributed Requests failing with NPE

2013-04-25 Thread Sudhakar Maddineni
Thank you Hoss for looking into it.

-Sudhakar.


On Thu, Apr 25, 2013 at 6:50 PM, Chris Hostetter
wrote:

>
> : "trace":"java.lang.NullPointerException\r\n\tat
> :
> org.apache.solr.handler.component.HttpShardHandler.checkDistributed(HttpShardHandler.java:340)\r\n\tat
> :
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:182)\r\n\tat
>
> yea, definitely a bug.
>
> Raintung reported this recently, and made a patch available...
>
> https://issues.apache.org/jira/browse/SOLR-4705
>
>
> -Hoss
>