Re: Solr 3.5.0 can't find Carrot classes

2012-01-27 Thread Vadim Kisselmann
Hi Christopher,
if all the needed jars are included, the only remaining possibility is wrong paths in
your solrconfig.xml.
Regards
Vadim



2012/1/26 Stanislaw Osinski :
> Hi,
>
> Can you paste the logs from the second run?
>
> Thanks,
>
> Staszek
>
> On Wed, Jan 25, 2012 at 00:12, Christopher J. Bottaro wrote:
>
>> On Tuesday, January 24, 2012 at 3:07 PM, Christopher J. Bottaro wrote:
>> > SEVERE: java.lang.NoClassDefFoundError:
>> org/carrot2/core/ControllerFactory
>> >         at
>> org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine.(CarrotClusteringEngine.java:102)
>> >         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
>> Method)
>> >         at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown
>> Source)
>> >         at
>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
>> >         at java.lang.reflect.Constructor.newInstance(Unknown Source)
>> >         at java.lang.Class.newInstance0(Unknown Source)
>> >         at java.lang.Class.newInstance(Unknown Source)
>> >
>> > …
>> >
>> > I'm starting Solr with -Dsolr.clustering.enabled=true and I can see that
>> the Carrot jars in contrib are getting loaded.
>> >
>> > Full log file is here:
>> http://onespot-development.s3.amazonaws.com/solr.log
>> >
>> > Any ideas?  Thanks for the help.
>> >
>> Ok, got a little further.  Seems that Solr doesn't like it if you include
>> jars more than once (I had a lib dir and also <lib> directives in the
>> solrconfig which ended up loading the same jars twice).
>>
>> But now I'm getting these errors:  java.lang.NoClassDefFoundError:
>> org/apache/solr/handler/clustering/SearchClusteringEngine
>>
>> Any help?  Thanks.


Re: SolrCell maximum file size

2012-01-27 Thread Augusto Camarotti
I'm talking about 2 GB files. Does that mean I'll have to allocate something 
bigger than that for the JVM? Something like 2.5 GB?
 
Thanks,
 
Augusto Camarotti

>>> Erick Erickson  1/25/2012 1:48 pm >>>
Mostly it depends on your container settings, quite often that's
where the limits are. I don't think Solr imposes any restrictions.

What size are we talking about anyway? There are implicit
issues with how much memory parsing the file requires, but you
can allocate lots of memory to the JVM to handle that.

Best
Erick

On Tue, Jan 24, 2012 at 10:24 AM, Augusto Camarotti
 wrote:
> Hi everybody
>
> Does anyone know if there is a maximum file size that can be uploaded to the 
> ExtractingRequestHandler via HTTP request?
>
> Thanks in advance,
>
> Augusto Camarotti


Commit and sessions

2012-01-27 Thread Per Steffensen

Hi

If I have added some documents to Solr, but not yet done an explicit commit, 
and I get a power outage, will I then lose data? Or, asked another way, does 
data go into the persistent store before a commit? How can I avoid the 
possibility of losing data?


Does Solr have some kind of session concept, so that different threads 
can add documents to the same Solr instance, and when one of them says "commit" 
only the documents added by that thread get committed? Or is it 
always "all documents added by any thread since the last commit" that 
gets committed?


Regards, Per Steffensen


Re: Commit and sessions

2012-01-27 Thread Jan Høydahl
Hi,

Yep, anything added between two commits must be regarded as lost in case of a 
crash.
You can of course minimize this interval by using a low "commitWithin". But 
after a crash you should always investigate whether the last minutes of adds 
made it.

A transaction log feature is being developed, but not there yet: 
https://issues.apache.org/jira/browse/SOLR-2700

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com
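Jan's "low commitWithin" suggestion can be expressed in Solr's XML update format, where `commitWithin` is an attribute on the `<add>` element giving a deadline in milliseconds. A minimal sketch using only the Python standard library; the field names and the update URL in the comment are illustrative, not taken from the thread:

```python
from xml.sax.saxutils import escape

def add_with_commit_within(docs, commit_within_ms=5000):
    """Build a Solr XML <add> command whose commitWithin attribute asks
    Solr to commit the added documents within the given number of ms."""
    def fields(doc):
        return "".join(
            '<field name="%s">%s</field>' % (escape(k), escape(str(v)))
            for k, v in doc.items())
    body = "".join("<doc>%s</doc>" % fields(d) for d in docs)
    return '<add commitWithin="%d">%s</add>' % (commit_within_ms, body)

# The resulting payload would be POSTed to e.g. http://host:8983/solr/update
payload = add_with_commit_within([{"id": "1", "title": "hello"}], 1000)
print(payload)
```

A smaller `commitWithin` shrinks the window of un-persisted adds after a crash, at the cost of more frequent commits.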

On 27. jan. 2012, at 13:05, Per Steffensen wrote:

> Hi
> 
> If I have added some document to solr, but not done explicit commit yet, and 
> I get a power outage, will I then lose data? Or asked in another way, does 
> data go into persistent store before commit? How to avoid possibility of 
> losing data?
> 
> Does solr have some kind of session concept, so that different threads can 
> add documents to the same solr, and when one of them says "commit" it is only 
> the documents added by this thread that gets committed? Or is it always "all 
> documents added by any thread since last commit" that gets committed?
> 
> Regards, Per Steffensen



Re: Commit and sessions

2012-01-27 Thread Sami Siren
On Fri, Jan 27, 2012 at 3:25 PM, Jan Høydahl  wrote:
> Hi,
>
> Yep, anything added between two commits must be regarded as lost in case of 
> crash.
> You can of course minimize this interval by using a low "commitWithin". But 
> after a crash you should always investigate whether the last minutes of adds 
> made it.

In addition to what Jan said, I think you also need to watch out for
out-of-memory exceptions and a full disk, because I think you
lose your docs (since the last commit) in those cases too.

--
 Sami Siren


Why are copyFields necessary here?

2012-01-27 Thread Tim Hibbs
Hi, all,

I could use a little education here, if you'd be so kind. My queries
without a field-name qualifier (such as "schedule a pickup", no quotes)
don't return any values UNLESS I've defined copyFields as illustrated
below. The queries work sufficiently well when those fields are defined,
so I sort of have a reasonable fallback position... but I'd like to
understand what's really happening. Any insights are much appreciated.

I'm using SolrJ from solr-3.5.0, and have the following field
definitions in schema.xml:

   
   
   
   
   
   
   
   
   
   
   

NOTES:
- text_en_splitting hasn't been changed from the value defined in the
example schema.xml with the 3.5.0 distribution...
- text exists as such in the
file...
- field "text" is defined as:

- copyFields are:




- If it's relevant, I'm boosting Keywords^2.5, Title^2.0, TOC^2.0, and
Overview^1.5 when the index is built.

Thanks,
Tim Hibbs


Re: Why are copyFields necessary here?

2012-01-27 Thread Rafał Kuć
Hello!

If you don't specify a field, the query will be made against the
default search field defined in the schema.xml file. So, when the
default search field is empty (because no copyFields populate it), there
are no search results.

-- 
Regards,
 Rafał Kuć

> Hi, all,

> I could use a little education here, if you'd be so kind. My queries
> without a field-name qualifier (such as "schedule a pickup", no quotes)
> don't return any values UNLESS I've defined copyFields as illustrated
> below. The queries work sufficiently well when those fields are defined,
> so I sort of have a reasonable fallback position... but I'd like to
> understand what's really happening. Any insights are much appreciated.

> I'm using SolrJ from solr-3.5.0, and have the following field
> definitions in schema.xml:

>
> stored="true" omitNorms="false"/>
> stored="false" required="false"/>
>>
> stored="true" required="true" multiValued="true"/>
> stored="false" required="false" multiValued="true" omitNorms="false"/>
> stored="true" required="false" omitNorms="false"/>
> required="true" multiValued="true"/>
> stored="false" required="false"/>
> stored="true" required="false" multiValued="true" omitNorms="false"/>
> required="false" multiValued="true"/>

> NOTES:
> - text_en_splitting hasn't been changed from the value defined in the
> example schema.xml with the 3.5.0 distribution...
> - text exists as such in the
> file...
> - field "text" is defined as:
>  stored="false" multiValued="true"/>
> - copyFields are:
> 
> 
> 
> 
> - If it's relevant, I'm boosting Keywords^2.5, Title^2.0, TOC^2.0, and
> Overview^1.5 when the index is built.

> Thanks,
> Tim Hibbs




Solr Warm-up performance issues

2012-01-27 Thread dan sutton
Hi List,

We use Solr 4.0.2011.12.01.09.59.41 and have a dataset of roughly 40 GB.
Every day we produce a new dataset of 40 GB and have to switch one for
the other.

Once the index switch over has taken place, it takes roughly 30 min for Solr
to reach maximum performance. Are there any hardware or software solutions
to reduce the warm-up time? We tried warm-up queries but it didn't change
much.

Our hardware specs are:
   * Dell Poweredge 1950
   * 2 x Quad-Core Xeon E5405 (2.00GHz)
   * 48 GB RAM
   * 2 x 146 GB SAS 3 Gb/s 15K RPM disk configured in RAID mirror

One thing that does seem to take a long time is un-inverting a set of
multivalued fields, are there any optimizations we might be able to
use here?

Thanks for your help.
Dan


RE: Why are copyFields necessary here?

2012-01-27 Thread Tim Hibbs
Rafał,

Thanks for your response.

I defined what I think you're referring to as "the default search field" as

<defaultSearchField>text</defaultSearchField>

I'm confused about how this works. I defined that field "text" to be of
fieldType "text_en_splitting". I don't understand how associating "text"
with anything can work unless I ALSO associate *my* things (Title, TOC,
Keywords, Overview, and the contents of the URLs pointed to in the
urlpath) with "text" as well.

Does that mean that copyFields will always be necessary if query words
are sent to solr with no field qualifier?

Tim


-Original Message-
From: Rafał Kuć [mailto:r@solr.pl]
Sent: Friday, January 27, 2012 9:43 AM
To: solr-user@lucene.apache.org
Subject: Re: Why are copyFields necessary here?

Hello!

If you don't specify the field, the query will be made against the
default search field defined in the schema.xml file. So, when the
default search field is empty (no copy fields) then there are no search
results.

--
Regards,
 Rafał Kuć

> Hi, all,

> I could use a little education here, if you'd be so kind. My queries
> without a field-name qualifier (such as "schedule a pickup", no
> quotes) don't return any values UNLESS I've defined copyFields as
> illustrated below. The queries work sufficiently well when those
> fields are defined, so I sort of have a reasonable fallback
> position... but I'd like to understand what's really happening. Any
insights are much appreciated.

> I'm using SolrJ from solr-3.5.0, and have the following field
> definitions in schema.xml:

>
> stored="true" omitNorms="false"/>
> stored="false" required="false"/>
>>
> stored="true" required="true" multiValued="true"/>
> stored="false" required="false" multiValued="true" omitNorms="false"/>
> stored="true" required="false" omitNorms="false"/>
> required="true" multiValued="true"/>
> stored="false" required="false"/>
> stored="true" required="false" multiValued="true" omitNorms="false"/>
> required="false" multiValued="true"/>

> NOTES:
> - text_en_splitting hasn't been changed from the value defined in the
> example schema.xml with the 3.5.0 distribution...
> - text exists as such in the
> file...
> - field "text" is defined as:
>  stored="false" multiValued="true"/>
> - copyFields are:
> 
> 
> 
> 
> - If it's relevant, I'm boosting Keywords^2.5, Title^2.0, TOC^2.0, and
> Overview^1.5 when the index is built.

> Thanks,
> Tim Hibbs




Re: Why are copyFields necessary here?

2012-01-27 Thread Rafał Kuć
Hello!

When you don't specify a field or fields you want to search against,
Solr will use the one set as the default in the schema.xml file (the one
defined with <defaultSearchField>).

So, you have the following field:


When you don't specify copyField's this field won't have any values.
So when searching without specifying the field name, Solr will send
the query to the field named 'text' and because it is empty no results
will be found.

If you always want to search against the same set of fields, please
take a look at the dismax or edismax query parsers, so you can make a
query like:
q=schedule a pickup&qf=Keywords Title TOC Overview

where the qf parameter specifies the fields to be searched.

-- 
Regards,
 Rafał Kuć
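Rafał's dismax/edismax suggestion amounts to setting a few request parameters. A sketch of building such a request with the Python standard library; the host, core path, and boost values mirror the ones mentioned in the thread but are otherwise illustrative:

```python
from urllib.parse import urlencode

# defType selects the query parser per request; qf lists the fields to
# search with their boosts, and mm controls how many terms must match.
params = {
    "q": "schedule a pickup",
    "defType": "edismax",      # "dismax" on older Solr versions
    "qf": "Keywords^2.5 Title^2.0 TOC^2.0 Overview^1.5",
    "mm": "100%",              # require all query terms to match
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```

With this approach the default search field (and hence the copyFields) is bypassed entirely, since the parser searches the listed fields directly.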

> Rafał,

> Thanks for your response.

> I defined what I think you're referring to as "the default search field" as

> <defaultSearchField>text</defaultSearchField>

> I'm confused about how this works. I defined that field "text" to be of
> fieldType "text_en_splitting". I don't understand how associating "text"
> with anything can work unless I ALSO associate *my* things (Title, TOC,
> Keywords, Overview, and the contents of the URLs pointed to in the
> urlpath) with "text" as well.

> Does that mean that copyFields will always be necessary if query words
> are sent to solr with no field qualifier?

> Tim


> -Original Message-
> From: Rafał Kuć [mailto:r@solr.pl]
> Sent: Friday, January 27, 2012 9:43 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Why are copyFields necessary here?

> Hello!

> If you don't specify the field, the query will be made against the
> default search field defined in the schema.xml file. So, when the
> default search field is empty (no copy fields) then there are no search
> results.

> --
> Regards,
>  Rafał Kuć

>> Hi, all,

>> I could use a little education here, if you'd be so kind. My queries
>> without a field-name qualifier (such as "schedule a pickup", no
>> quotes) don't return any values UNLESS I've defined copyFields as
>> illustrated below. The queries work sufficiently well when those
>> fields are defined, so I sort of have a reasonable fallback
>> position... but I'd like to understand what's really happening. Any
> insights are much appreciated.

>> I'm using SolrJ from solr-3.5.0, and have the following field
>> definitions in schema.xml:

>>
>>> stored="true" omitNorms="false"/>
>>> stored="false" required="false"/>
>>>>
>>> stored="true" required="true" multiValued="true"/>
>>> stored="false" required="false" multiValued="true" omitNorms="false"/>
>>> stored="true" required="false" omitNorms="false"/>
>>> required="true" multiValued="true"/>
>>> stored="false" required="false"/>
>>> stored="true" required="false" multiValued="true" omitNorms="false"/>
>>> required="false" multiValued="true"/>

>> NOTES:
>> - text_en_splitting hasn't been changed from the value defined in the
>> example schema.xml with the 3.5.0 distribution...
>> - text exists as such in the
>> file...
>> - field "text" is defined as:
>> > stored="false" multiValued="true"/>
>> - copyFields are:
>> 
>> 
>> 
>> 
>> - If it's relevant, I'm boosting Keywords^2.5, Title^2.0, TOC^2.0, and
>> Overview^1.5 when the index is built.

>> Thanks,
>> Tim Hibbs







Fwd: RE: Why are copyFields necessary here?

2012-01-27 Thread Tim Hibbs
Rafal,

Thanks for your response.

I defined what I think you're referring to as "the default search field" as

<defaultSearchField>text</defaultSearchField>

I'm confused about how this works. I defined that field "text" to be of
fieldType "text_en_splitting". I don't understand how associating "text"
with anything can work unless I ALSO associate *my* things (Title, TOC,
Keywords, Overview, and the contents of the URLs pointed to in the
urlpath) with "text" as well.

Does that mean that copyFields will always be necessary if query words
are sent to solr with no field qualifier?

Tim


-Original Message-
From: Rafał Kuć [mailto:r@solr.pl]
Sent: Friday, January 27, 2012 9:43 AM
To: solr-user@lucene.apache.org
Subject: Re: Why are copyFields necessary here?

Hello!

If you don't specify the field, the query will be made against the
default search field defined in the schema.xml file. So, when the
default search field is empty (no copy fields) then there are no search
results.

--
Regards,
 Rafał Kuć

> Hi, all,

> I could use a little education here, if you'd be so kind. My queries
> without a field-name qualifier (such as "schedule a pickup", no
> quotes) don't return any values UNLESS I've defined copyFields as
> illustrated below. The queries work sufficiently well when those
> fields are defined, so I sort of have a reasonable fallback
> position... but I'd like to understand what's really happening. Any
insights are much appreciated.

> I'm using SolrJ from solr-3.5.0, and have the following field
> definitions in schema.xml:

>
> stored="true" omitNorms="false"/>
> stored="false" required="false"/>
>>
> stored="true" required="true" multiValued="true"/>
> stored="false" required="false" multiValued="true" omitNorms="false"/>
> stored="true" required="false" omitNorms="false"/>
> required="true" multiValued="true"/>
> stored="false" required="false"/>
> stored="true" required="false" multiValued="true" omitNorms="false"/>
> required="false" multiValued="true"/>

> NOTES:
> - text_en_splitting hasn't been changed from the value defined in the
> example schema.xml with the 3.5.0 distribution...
> - text exists as such in the
> file...
> - field "text" is defined as:
>  stored="false" multiValued="true"/>
> - copyFields are:
> 
> 
> 
> 
> - If it's relevant, I'm boosting Keywords^2.5, Title^2.0, TOC^2.0, and
> Overview^1.5 when the index is built.

> Thanks,
> Tim Hibbs





Re: ord/rord with a function

2012-01-27 Thread Erick Erickson
Would sorting by distance work, or are you just looking to say something
like "only give me all the places in New York"? Might frange work as
a filter query in that case, where the distance you provide is XXX
kilometers, so you're effectively excluding everything over, say, 160
kilometers from your city of choice?

But from the documentation here:
http://wiki.apache.org/solr/FunctionQuery#ord
I really don't think ord or rord are going to do
what you want anyway.

Best
Erick
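The frange idea Erick mentions can be written as a function-range filter over geodist(). A sketch of the request parameters, assuming a hypothetical spatial field named "store" and an arbitrary user location; the 160 km cutoff is the example figure from Erick's message:

```python
from urllib.parse import urlencode

# {!frange l=0 u=160}geodist() keeps only documents whose distance from
# the point "pt" (computed over the field "sfield") is at most 160 km,
# without letting the raw distance influence the relevance score.
params = {
    "q": "pizza",
    "sfield": "store",
    "pt": "40.7,-74.0",
    "fq": "{!frange l=0 u=160}geodist()",
}
query_string = urlencode(params)
print(query_string)
```

This filters rather than ranks, so it matches the "closest location wins" goal better than trying to feed geodist() through ord/rord.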

On Thu, Jan 26, 2012 at 9:51 AM, entdeveloper
 wrote:
> Is it possible for ord/rord to work with a function? I'm attempting to use
> rord with a spatial function like the following as a bf:
>
> bf=rord(geodist())
>
> If there's no way for this to work, is there a way to simulate the same
> behavior?
>
> For some background, I have two sets of documents: one set applies to a
> location in NY and another in LA. I want to boost documents that are closer
> to where the user is searching from. But I only need these sets to be ranked
> 1 & 2. In other words, the actual distance should not be used to boost the
> documents, just if you are closer or farther. We may add more locations in
> the future, so I'd like to be able to rank the locations from closest to
> furthest.
>
> I need some way to rank the distances, and rord is the right idea, but
> doesn't seem to work with functions.
>
> I'm running Solr 3.4, btw.
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/ord-rord-with-a-function-tp3691138p3691138.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr custom component

2012-01-27 Thread Erick Erickson
Why not just sort on date and take the first doc returned in the list?

Best
Erick
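Erick's suggestion needs no custom component at all: sort descending on the date field and ask for a single row. A sketch of the request, where "last_modified" is a hypothetical date field name standing in for the dateField in Peter's code:

```python
from urllib.parse import urlencode

# Ask Solr for the single newest document instead of scanning a range
# query result for the doc closest to NOW.
params = {
    "q": "*:*",
    "sort": "last_modified desc",
    "rows": 1,
    "fl": "id,last_modified",
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```

Any filters from the original `base` DocSet could be carried over as fq parameters on the same request.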

On Thu, Jan 26, 2012 at 10:33 AM, Peter Markey  wrote:
> Hello,
>
> I am building a custom component in Solr and I am trying to construct a
> query to get the latest (based on a date field) DocID using SolrIndexSearcher.
> Below is a short snippet of my code:
>
> SolrIndexSearcher searcher =
> final SchemaField sf = searcher.getSchema().getField(dateField);
> //dateField is one of the fields that contains timestamp of the record
>
> final IndexSchema schema = searcher.getSchema();
>
> Query rangeQ = ((DateField)(sf.getType())).getRangeQuery(null, sf,null,NOW,
> false,true); //NOW is current Date
>
> DocList dateDocs = searcher.getDocList(rangeQ, base, null, 0, 1); //base is
> a set of doc filters to limit search
>
>
>
> Though I get some docs that satisfy the query, my goal is to get the doc
> whose dateField is closest to the current time. Are there any other
> queries I can employ for this?
>
>
> Thanks a lot for any suggestions.


Re: solr shards

2012-01-27 Thread Erick Erickson
You need to provide the relevant bits of your configuration
file for anyone to help, I think. In particular, the
sharding-relevant configurations.
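For reference, distributed search in this Solr version is driven by the "shards" request parameter on the dispatching server, so that is a natural place to look. A sketch with hypothetical hostnames:

```python
from urllib.parse import urlencode

# The dispatcher fans the query out to every entry in "shards" and
# merges the results; a wrong or incomplete shards list is a common
# cause of odd numFound values or missing doc lists.
shards = ",".join([
    "shard1.example.com:8983/solr",
    "shard2.example.com:8983/solr",
    "shard3.example.com:8983/solr",
])
params = {"q": "test", "shards": shards}
url = "http://dispatcher.example.com:8983/solr/select?" + urlencode(params)
print(url)
```

Note that a document with the same unique key on two shards makes the summed per-shard numFound exceed the merged numFound, which is one possible explanation for the mismatch Ramin sees.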

Best
Erick

On Thu, Jan 26, 2012 at 11:29 AM, ramin  wrote:
> Hello,
>
> I've gone through the list and have not found the answer but if it is a
> repetitive question, my apologies.
>
> I have a 3-shard Solr cluster. If I send a query to each of the shards
> individually, I get the result with a list of relevant docs. However, if I
> send the query to the main Solr server (the dispatcher), it only returns the
> value for numFound but there is no list of docs. Since I seem to be the only
> one having this issue, it is probably a misconfiguration for which I
> couldn't find an answer in the documentation. Can someone please help?
>
> Also, the sum of all the individual numFound's seems to not match the
> numFound I get from the main solr server, given that i do not have any
> duplicate on the unique key.
>
> Thanks in advance,
> Ramin
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/solr-shards-tp3691370p3691370.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCell maximum file size

2012-01-27 Thread Erick Erickson
Hmmm, I'd go considerably higher than 2.5G. The problem is that the Tika
processing will need memory, I have no idea how much. Then you'll
have a bunch of stuff for Solr to index, etc.

But I also suspect that this will be about useless to index (assuming
you're talking lots of data, not say just the meta-data associated
with a video or something). How do you provide a meaningful snippet
of such a huge amount of data?

If it *is* say a video or whatever where almost all of the data won't
make it into the index anyway, you're probably better off using
tika directly on the client and only sending the bits to Solr that
you need in the form of a SolrInputDocument (I'm thinking that
you'll be doing this in SolrJ) rather than transmit 2.5G over the
network and throwing almost all of it away

If the entire 2.5G is data to be indexed, you'll probably want to
consider breaking it up into smaller chunks in order to make it
useful.

Best
Erick
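The shape of Erick's client-side-extraction suggestion is simple: run extraction on the client, keep only the index-worthy fields, and send a small document to Solr. A sketch of the filtering step; the metadata dict and field names are hypothetical stand-ins for what a real Tika run would return:

```python
def build_small_doc(extracted, wanted=("title", "author", "content_type")):
    """Keep only the fields worth indexing from client-side extraction,
    instead of streaming the whole multi-GB file to Solr."""
    return {k: v for k, v in extracted.items() if k in wanted}

# Hypothetical metadata a client-side Tika run might yield for a video.
meta = {
    "title": "demo",
    "author": "acb",
    "content_type": "video/mp4",
    "raw_bytes": "<2GB of data we do not want to send over the wire>",
}
doc = build_small_doc(meta)
print(doc)
```

In SolrJ the equivalent would be populating a SolrInputDocument with just these fields, as Erick describes, rather than POSTing the file to the ExtractingRequestHandler.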

On Fri, Jan 27, 2012 at 3:43 AM, Augusto Camarotti
 wrote:
> I'm talking about 2 GB files. Does that mean I'll have to allocate something 
> bigger than that for the JVM? Something like 2.5 GB?
>
> Thanks,
>
> Augusto Camarotti
>
 Erick Erickson  1/25/2012 1:48 pm >>>
> Mostly it depends on your container settings, quite often that's
> where the limits are. I don't think Solr imposes any restrictions.
>
> What size are we talking about anyway? There are implicit
> issues with how much memory parsing the file requires, but you
> can allocate lots of memory to the JVM to handle that.
>
> Best
> Erick
>
> On Tue, Jan 24, 2012 at 10:24 AM, Augusto Camarotti
>  wrote:
>> Hi everybody
>>
>> Does anyone know if there is a maximum file size that can be uploaded to 
>> the ExtractingRequestHandler via HTTP request?
>>
>> Thanks in advance,
>>
>> Augusto Camarotti


RE: Solr Warm-up performance issues

2012-01-27 Thread Peter Velikin
Dan,

I can suggest a solution that should help. VeloBit enables you to add SSDs
to your servers as a cache (SSD will cost you $200, per server should be
enough). Then, assuming a 100MB/s read speed from your SAS disks, you can
read 50GB data into the VeloBit HyperCache cache in about 9 mins (this
happens automatically, all you need to do is add the SSD to your server and
install Velobit one time, which takes 2 minutes). Solr should run much
faster after that. The added benefit of the solution is that you would have
also boosted the steady state performance by 4x.

Let me know if you are interested in trying it out and I'll set you up to
talk with my engineers.


Best regards,

Peter Velikin
VP Online Marketing, VeloBit, Inc.
pe...@velobit.com
tel. 978-263-4800
mob. 617-306-7165

VeloBit provides plug & play SSD caching software that dramatically
accelerates applications at a remarkably low cost. The software installs
seamlessly in less than 10 minutes and automatically tunes for fastest
application speed. Visit www.velobit.com for details.



-Original Message-
From: dan sutton [mailto:danbsut...@gmail.com] 
Sent: Friday, January 27, 2012 9:44 AM
To: solr-user
Subject: Solr Warm-up performance issues

Hi List,

We use Solr 4.0.2011.12.01.09.59.41 and have a dataset of roughly 40 GB.
Every day we produce a new dataset of 40 GB and have to switch one for the
other.

Once the index switch over has taken place, it takes roughly 30 min for Solr
to reach maximum performance. Are there any hardware or software solutions
to reduce the warm-up time ? We tried warm-up queries but it didn't change
much.

Our hardware specs are:
   * Dell Poweredge 1950
   * 2 x Quad-Core Xeon E5405 (2.00GHz)
   * 48 GB RAM
   * 2 x 146 GB SAS 3 Gb/s 15K RPM disk configured in RAID mirror

One thing that does seem to take a long time is un-inverting a set of
multivalued fields, are there any optimizations we might be able to use
here?

Thanks for your help.
Dan




Re: Solr Warm-up performance issues

2012-01-27 Thread Tomás Fernández Löbbe
You say warming queries didn't help? What do those look like? Make sure you
facet and sort on all of the fields that your application allows
faceting/sorting on. The same with the filters. Un-inversion of fields is done
only when you commit, but warming queries should help you here.
Tomás
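Warming queries of the kind Tomás describes are registered in solrconfig.xml via a QuerySenderListener. A sketch with hypothetical field names; the idea is to exercise every sort and facet field the application uses, so the un-inverted field structures are built before user queries arrive:

```xml
<!-- Fired against the new searcher after each commit / index switch-over.
     Facet and sort on the fields the application actually uses so the
     expensive un-inversion happens during warming, not on a user query. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="sort">price desc</str>
      <str name="facet">true</str>
      <str name="facet.field">category</str>
    </lst>
  </arr>
</listener>
```

A matching "firstSearcher" listener covers the cold-start case after the daily dataset swap.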

On Fri, Jan 27, 2012 at 11:44 AM, dan sutton  wrote:

> Hi List,
>
> We use Solr 4.0.2011.12.01.09.59.41 and have a dataset of roughly 40 GB.
> Every day we produce a new dataset of 40 GB and have to switch one for
> the other.
>
> Once the index switch over has taken place, it takes roughly 30 min for
> Solr
> to reach maximum performance. Are there any hardware or software solutions
> to reduce the warm-up time? We tried warm-up queries but it didn't change
> much.
>
> Our hardware specs are:
>   * Dell Poweredge 1950
>   * 2 x Quad-Core Xeon E5405 (2.00GHz)
>   * 48 GB RAM
>   * 2 x 146 GB SAS 3 Gb/s 15K RPM disk configured in RAID mirror
>
> One thing that does seem to take a long time is un-inverting a set of
> multivalued fields, are there any optimizations we might be able to
> use here?
>
> Thanks for your help.
> Dan
>


SolrCloud - issues running with embedded zookeeper ensemble

2012-01-27 Thread Dipti Srivastava
Hi Mark,
Did you get a chance to look into the issues with running the embedded 
Zookeeper ensemble, as per Example C, from the 
http://wiki.apache.org/solr/SolrCloud2

Hi All,
Did anyone else run multiple shards with an embedded zk ensemble successfully? If 
so, I would like some tips on any issues that you came across.

Regards,
Dipti

From: diptis 
mailto:dipti.srivast...@apollogrp.edu>>
Date: Fri, 23 Dec 2011 10:32:52 -0700
To: "markrmil...@gmail.com" 
mailto:markrmil...@gmail.com>>
Subject: Re: Release build or code for SolrCloud

Hi Mark,
There is some issue with specifying localhost vs actual host names for zk. When 
I changed my script to specify the actual hostname (which should be local by 
default) the first, 2nd and 3rd instances came up, that have the embedded zk 
running. Now, I am getting the same exception for the 4th AMI, which is NOT part 
of the zookeeper ensemble. I want to run zk on only 3 of the 4 instances.

java -Dbootstrap_confdir=./solr/conf –DzkRun="9983>"
-DzkHost=:9983,:9983,:9983 -DnumShards=2 -jar
start.jar

Dipti

From: Mark Miller mailto:markrmil...@gmail.com>>
Reply-To: "markrmil...@gmail.com" 
mailto:markrmil...@gmail.com>>
Date: Fri, 23 Dec 2011 09:34:52 -0700
To: diptis 
mailto:dipti.srivast...@apollogrp.edu>>
Subject: Re: Release build or code for SolrCloud

I'm having trouble getting a quorum up using the built-in SolrZkServer as well, 
so I have not been able to replicate this - I'll have to keep digging. Not 
sure if it's due to a ZooKeeper update or what yet.

2011/12/21 Dipti Srivastava 
mailto:dipti.srivast...@apollogrp.edu>>
Hi Mark,
Thanks! So now I am deploying a 4 node cluster on AMI's and the main
instance that bootstraps the config to the zookeeper does not come up I
get an exception as follows. My solrcloud.sh looks like

#!/usr/bin/env bash

cd ..

rm -r -f example/solr/zoo_data
rm -f example/example.log

cd example
#java -DzkRun -DnumShards=2 -DSTOP.PORT=7983 -DSTOP.KEY=key -jar start.jar
1>example.log 2>&1 &
java -Dbootstrap_confdir=./solr/conf -DzkRun
-DzkHost=:9983,:9983,:9983 -DnumShards=2 -jar
start.jar




And when I RUN it

--CLOUD--[ec2-user@ cloud-dev]$ ./solrcloud.sh
2011-12-22 02:18:23.352:INFO::Logging to STDERR via
org.mortbay.log.StdErrLog
2011-12-22 02:18:23.510:INFO::jetty-6.1-SNAPSHOT
Dec 22, 2011 2:18:23 AM org.apache.solr.core.SolrResourceLoader
locateSolrHome
INFO: JNDI not configured for solr (NoInitialContextEx)
Dec 22, 2011 2:18:23 AM org.apache.solr.core.SolrResourceLoader
locateSolrHome
INFO: solr home defaulted to 'solr/' (could not find system property or
JNDI)
Dec 22, 2011 2:18:23 AM org.apache.solr.core.SolrResourceLoader 
INFO: Solr home set to 'solr/'
Dec 22, 2011 2:18:23 AM org.apache.solr.servlet.SolrDispatchFilter init
INFO: SolrDispatchFilter.init()
Dec 22, 2011 2:18:23 AM org.apache.solr.core.SolrResourceLoader
locateSolrHome
INFO: JNDI not configured for solr (NoInitialContextEx)
Dec 22, 2011 2:18:23 AM org.apache.solr.core.SolrResourceLoader
locateSolrHome
INFO: solr home defaulted to 'solr/' (could not find system property or
JNDI)
Dec 22, 2011 2:18:23 AM org.apache.solr.core.CoreContainer$Initializer
initialize
INFO: looking for solr.xml: /home/ec2-user/solrcloud/example/solr/solr.xml
Dec 22, 2011 2:18:23 AM org.apache.solr.core.CoreContainer 
INFO: New CoreContainer 1406140084
Dec 22, 2011 2:18:23 AM org.apache.solr.core.SolrResourceLoader
locateSolrHome
INFO: JNDI not configured for solr (NoInitialContextEx)
Dec 22, 2011 2:18:23 AM org.apache.solr.core.SolrResourceLoader
locateSolrHome
INFO: solr home defaulted to 'solr/' (could not find system property or
JNDI)
Dec 22, 2011 2:18:23 AM org.apache.solr.core.SolrResourceLoader 
INFO: Solr home set to 'solr/'
Dec 22, 2011 2:18:24 AM org.apache.solr.cloud.SolrZkServerProps
getProperties
INFO: Reading configuration from: solr/zoo.cfg
Dec 22, 2011 2:18:24 AM org.apache.solr.cloud.SolrZkServerProps
parseProperties
INFO: Defaulting to majority quorums
Dec 22, 2011 2:18:24 AM org.apache.solr.servlet.SolrDispatchFilter init
SEVERE: Could not start Solr. Check solr/home property and the logs
java.lang.IllegalArgumentException: port out of range:-1
   at java.net.InetSocketAddress.(InetSocketAddress.java:83)
   at java.net.InetSocketAddress.(InetSocketAddress.java:63)
   at
org.apache.solr.cloud.SolrZkServerProps.setClientPort(SolrZkServer.java:310
)
   at
org.apache.solr.cloud.SolrZkServerProps.getMySeverId(SolrZkServer.java:273)
   at
org.apache.solr.cloud.SolrZkServerProps.parseProperties(SolrZkServer.java:4
50)
   at org.apache.solr.cloud.SolrZkServer.parseConfig(SolrZkServer.java:85)
   at
org.apache.solr.core.CoreContainer.initZooKeeper(CoreContainer.java:147)
   at org.apache.solr.core.CoreContainer.load(CoreContainer.java:329)
   at org.apache.solr.core.CoreContainer.load(CoreContainer.java:282)
   at
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.ja

DataImportHandler fails silently

2012-01-27 Thread mathieu lacage
hi,

I have set up my Solr installation to run with Jetty and I am trying to
import an SQLite database into the Solr index. I have set up a JDBC SQLite
driver:


  
  

  
  

  


The schema:
 
   
   

 id
 thread_title


I kickstart the import process with
"wget http://localhost:8080/solr/dataimport?command=full-import";

It seems to work but the following command reports that only 499 documents
were indexed (yes, there are many more documents in my database):

"wget http://localhost:8080/solr/dataimport?command=status";

and the logs seem to imply that the import is finished:

INFO: Read dataimport.properties
27-Jan-2012 19:37:17 org.apache.solr.handler.dataimport.SolrWriter persist
INFO: Wrote last indexed time to dataimport.properties
27-Jan-2012 19:37:17 org.apache.solr.handler.dataimport.DocBuilder execute
INFO: Time taken = 0:0:1.52

I am at a loss. What can I do to debug this further? Help of any kind
would be most welcome.
Mathieu
-- 
Mathieu Lacage 


Re: DataImportHandler fails silently

2012-01-27 Thread mathieu lacage
On Fri, Jan 27, 2012 at 7:39 PM, mathieu lacage
wrote:

>
> It seems to work but the following command reports that only 499 documents
> were indexed (yes, there are many more documents in my database):
>

And before anyone asks:

1
499
0
2012-01-27 19:37:16
Indexing completed. Added/Updated: 499 documents. Deleted 0
documents.
2012-01-27 19:37:17
2012-01-27 19:37:17
499
0:0:1.52



-- 
Mathieu Lacage 


Re: Multiple Data Directories and 1 SOLR instance

2012-01-27 Thread Nitin Arora
Thanks for the reply guys (Cameron, David and Anderson).

I will go through the details of using multiple cores. 

Thanks
Nitin

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-Data-Directories-and-1-SOLR-instance-tp3691644p3694412.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Validating solr user query

2012-01-27 Thread Dipti Srivastava
Hi Chantal,
Thanks for your response. Isn't the DisMaxQParserPlugin the default
parser, when none is specified? I am using Solr version 3.4.
Thanks,
Dipti

On 1/23/12 3:33 AM, "Chantal Ackermann" 
wrote:

>Hi Dipti,
>
>just to make sure: are you aware of
>
>http://wiki.apache.org/solr/DisMaxQParserPlugin
>
>This will handle the user input in a very conventional and user friendly
>way. You just have to specify on which fields you want it to search.
>With the 'mm' parameter you have a powerful option to specify how much
>of a search query has to match (more flexible than defining a default
>operator).
>
>Cheers,
>Chantal
>
>On Fri, 2012-01-20 at 23:52 +0100, Dipti Srivastava wrote:
>> Hi All,
>> I am using HTTP/JSON to search my documents in Solr. Now the client
>>provides the query on which the search is based.
>> What is a good way to validate the query string provided by the user?
>>
>> On the other hand, if I want the user to build this query using some
>>Solr api instead of preparing a lucene query string which API can I use
>>for this?
>> I looked into
>> SolrQuery in SolrJ but it does not appear to have a way to specify the
>>more complex queries with the boolean operators and operators such as
>>~,+,- etc.
>>
>> Basically, I am trying to avoid running into bad query strings built by
>>the caller.
>>
>> Thanks!
>> Dipti
>>
>> 
>> This message is private and confidential. If you have received it in
>>error, please notify the sender and remove it from your system.
>>
>
>


This message is private and confidential. If you have received it in error, 
please notify the sender and remove it from your system.




How to promote or configure search for a specific keyword?

2012-01-27 Thread slapierre
Hello,

this is probably a very basic question, but I haven't found an answer in my
searches.

My search engine runs fine, but I want it to return only one hit if a user
searches for a specific search string. I.e. user searches for "xyz" and,
instead of being presented hundreds of hits, is only shown one
pre-configured result.

Is there a config file somewhere to set such associations or specific
keyword-based rules?

I'm running a solr engine on a Drupal site.

thx

Sebastien

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-promote-or-configure-search-for-a-specific-keyword-tp3694522p3694522.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to promote or configure search for a specific keyword?

2012-01-27 Thread Ahmet Arslan
> My search engine runs fine, but I want it to return only one
> hit if a user
> searches for a specific search string. I.e. user searches
> for "xyz" and,
> instead of being presented hundreds of hits, is only shown
> one
> pre-configured result.
> 
> Is there a config file somewhere to set such associations or
> specific
> keyword-based rules?

http://wiki.apache.org/solr/QueryElevationComponent is the closest thing came 
to my mind.
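If QueryElevationComponent fits, the mapping lives in an elevate.xml file
alongside the other config. A hedged sketch (the document id below is a
placeholder, not a real value from this site):

```xml
<!-- elevate.xml sketch: pin one hand-picked document for the query "xyz".
     "landing-page-42" is a placeholder uniqueKey value. -->
<elevate>
  <query text="xyz">
    <doc id="landing-page-42" />
    <!-- specific documents can also be hidden for this query: -->
    <!-- <doc id="noisy-doc" exclude="true" /> -->
  </query>
</elevate>
```

Note that elevation pins the chosen document to the top rather than
suppressing all other hits; to show only the one result, the Drupal layer
would still need to request a single row (or exclude known documents
explicitly as shown in the comment).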


Complex query, need filtering after query not before

2012-01-27 Thread Jay Hill
I have a project where we need to search 1B docs and still have results <
700ms. The problem is, we are using geofiltering and that is happening *
before* the queries, so we have to geofilter on the 1B docs to restrict our
set of docs first, and then do the query on a name field. But it seems that
it would be better and faster to run the main query first, and only then
filter out that subset of docs by geo. Here is what a typical query looks
like:

?shards=
&q={!boost
b=sum(recip(geodist(geo_lat_long,38.2493581,-122.0399663),1,1,1))}(given_name:Barack
OR given_name_exact:Barack^4.0) AND family_name:Obama
&fq={!geofilt pt=38.2493581,-122.0399663 sfield=geo_lat_long d=120}
&fq=(-source:somedatasource)
&rows=4
QTime=1040

I've looked at the "cache=false" param, and the "cost=" param, but that's
not going to help much because we still have to do the filtering. (We
*will* use
"cache=false" to avoid the overhead of caching queries that will very
rarely be the same.)

Is there any way to indicate a filter query should happen *after* the other
results? The other fq on source restricts the docset somewhat, but
different variations don't eliminate a high number of docs, so we could use
the "cost" param to run the fq on source before the fq on geo, but it would
only help very minimally in some cases.


Thanks,
-Jay
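One option on trunk is post-filtering: marking an fq with cache=false and a
cost of 100 or more is intended to run it as a PostFilter, i.e. only against
documents that already matched the main query and the cheaper filters —
assuming the spatial filter implements the PostFilter interface in the build
in use, e.g. fq={!geofilt pt=38.2493581,-122.0399663 sfield=geo_lat_long
d=120 cache=false cost=200}. The ordering semantics can be sketched as:

```python
def apply_filters(docs, filters):
    # Sketch of Solr's cost-based filter ordering: filters execute from
    # cheapest to most expensive, so a high-cost (post) filter only ever
    # sees documents that survived the query and every cheaper filter.
    for f in sorted(filters, key=lambda f: f["cost"]):
        docs = [d for d in docs if f["match"](d)]
    return docs
```

The expensive geo predicate then evaluates a small candidate set instead of
the full 1B-doc collection.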


Re: solr shards

2012-01-27 Thread ramin
Sure. So it is really simple. Following the Solr example for setting up two
shards and pushing some xml docs to each one and then doing a distributed
query (http://wiki.apache.org/solr/DistributedSearch), it works perfectly.
Now in my case the indices are being built outside of Solr. So basically I
create three sets of indices through Lucene API's. And at this point, I
change the schema.xml and define the fields I have in these new indices. I
launch three Solr apps (say on ports 7573, 7574, 7575) and host these
indices under each of the instances. Now if I do a search on any of the Solr
apps separately:

curl
'http://localhost:757[345]/solr/select/?distrib=true&indent=on&q=content:solar'

I get results:


  0
  59
  
on
true
content:solar
  


  
   ...
  
  
   ...
  
  ...



But when I issue the following GET:

curl
'http://localhost:7575/solr/select/?shards=localhost:7573/solr,localhost:7574/solr,localhost:7575/solr&distrib=true&indent=on&q=content:solar'

This is what I get:




  0
  235
  
content:solar
on
localhost:7573/solr,localhost:7574/solr,localhost:7575/solr
true
  




As you can see the numFound says that there are documents but the documents
are not part of the response.

Now if I add "group.main=true&group=true&group.field=id" to the query
string, then I get an NPE:


HTTP ERROR 500

Problem accessing /solr/select/. Reason:
null

java.lang.NullPointerException
at java.io.StringReader.<init>(StringReader.java:44)
at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:203)
at 
org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:80)
at org.apache.solr.search.QParser.getQuery(QParser.java:142)
at
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:101)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:173)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Powered by Jetty://

So, the only thing that is different between the Solr sample and mine is
that the indices have not been built through Solr itself but I believe that
is a moot point anyway (I might be wrong here). But the fact that the
individual queries to each instance do return the answer while the query
with shards does not is a mystery to me.

Thanks for the help.
Ramin


--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-shards-tp3691370p3694787.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Validating solr user query

2012-01-27 Thread Erick Erickson
No. Look in solrconfig.xml at the default
requestHandler definition and you'll see
that it's not. Appending &debugQuery=on will also
show this.

Best
Erick

On Fri, Jan 27, 2012 at 12:18 PM, Dipti Srivastava
 wrote:
> Hi Chantal,
> Thanks for your response. Isn't the DisMaxQParserPlugin the default
> parser, when none is specified? I am using Solr version 3.4.
> Thanks,
> Dipti
>
> On 1/23/12 3:33 AM, "Chantal Ackermann" 
> wrote:
>
>>Hi Dipti,
>>
>>just to make sure: are you aware of
>>
>>http://wiki.apache.org/solr/DisMaxQParserPlugin
>>
>>This will handle the user input in a very conventional and user friendly
>>way. You just have to specify on which fields you want it to search.
>>With the 'mm' parameter you have a powerful option to specify how much
>>of a search query has to match (more flexible than defining a default
>>operator).
>>
>>Cheers,
>>Chantal
>>
>>On Fri, 2012-01-20 at 23:52 +0100, Dipti Srivastava wrote:
>>> Hi All,
>>> I am using HTTP/JSON to search my documents in Solr. Now the client
>>>provides the query on which the search is based.
>>> What is a good way to validate the query string provided by the user?
>>>
>>> On the other hand, if I want the user to build this query using some
>>>Solr api instead of preparing a lucene query string which API can I use
>>>for this?
>>> I looked into
>>> SolrQuery in SolrJ but it does not appear to have a way to specify the
>>>more complex queries with the boolean operators and operators such as
>>>~,+,- etc.
>>>
>>> Basically, I am trying to avoid running into bad query strings built by
>>>the caller.
>>>
>>> Thanks!
>>> Dipti
>>>
>>> 
>>> This message is private and confidential. If you have received it in
>>>error, please notify the sender and remove it from your system.
>>>
>>
>>
>
>
> This message is private and confidential. If you have received it in error, 
> please notify the sender and remove it from your system.
>
>
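For reference, making dismax the default would be a solrconfig.xml change
along these lines — a hedged sketch only, with placeholder field names
rather than anything from this schema:

```xml
<!-- Sketch: the qf fields below are placeholders for your own schema. -->
<requestHandler name="search" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">title^2.0 body</str>
    <str name="mm">2&lt;-1</str>
  </lst>
</requestHandler>
```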


Re: DataImportHandler fails silently

2012-01-27 Thread Lance Norskog
Do all of the documents have unique id fields?
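One quick way to check is to look for repeated uniqueKey values directly in
the source database, since rows sharing an id overwrite each other in the
index. A minimal sketch — the table name `thread` and column `id` are
guesses from the schema snippet, not confirmed:

```python
import sqlite3

def duplicate_ids(conn, table, id_col):
    # Rows sharing a uniqueKey value collapse into a single Solr document,
    # which silently shrinks the "Added/Updated" count reported by DIH.
    cur = conn.execute(
        "SELECT %s, COUNT(*) FROM %s GROUP BY %s HAVING COUNT(*) > 1"
        % (id_col, table, id_col)
    )
    return cur.fetchall()
```

If this returns rows, the 499 figure would be the number of distinct ids,
not the number of rows fetched.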

On Fri, Jan 27, 2012 at 10:44 AM, mathieu lacage
 wrote:
> On Fri, Jan 27, 2012 at 7:39 PM, mathieu lacage
> wrote:
>
>>
>> It seems to work but the following command reports that only 499 documents
>> were indexed (yes, there are many more documents in my database):
>>
>
> And before anyone asks:
> 
> 1
> 499
> 0
> 2012-01-27 19:37:16
> Indexing completed. Added/Updated: 499 documents. Deleted 0
> documents.
> 2012-01-27 19:37:17
> 2012-01-27 19:37:17
> 499
> 0:0:1.52
> 
>
>
> --
> Mathieu Lacage 



-- 
Lance Norskog
goks...@gmail.com


Re: JSON response truncated

2012-01-27 Thread Lance Norskog
Are there any exceptions in the Solr log? Is it possible the JSON
exporter is choking when it wants to escape gunky characters in the
final text?
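Whatever the cause turns out to be, a cheap guard on the client side is to
parse the raw body before handing it downstream, so truncation fails loudly
instead of corrupting later processing; a minimal sketch:

```python
import json

def is_complete_json(payload):
    # A truncated response body fails to parse; checking it up front turns
    # silent truncation into an immediate, obvious error.
    try:
        json.loads(payload)
        return True
    except ValueError:
        return False
```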

On Wed, Jan 25, 2012 at 1:40 PM, Erick Erickson  wrote:
> Two things:
> 1> I suspect it's your servlet container rather than Solr since your JSON 
> isn't
>     well formatted. I have no clue where to set that up, but that's
> where I'd look.
> 2> A side note. You may run into the default of 10,000 tokens that are 
> indexed,
>     see the maxFieldLength setting in solrconfig.xml. This is NOT what your 
> current
>     problem is, since if you exceed this limit you should still get
> well-formatted XML.
>     But if you're sending large documents back and forth you might
> see truncated
>     *fields*.
>
> Best
> Erick
>
> On Wed, Jan 25, 2012 at 1:18 PM, Sean Adams-Hiett
>  wrote:
>> Summary of Issue:
>> When specifying output as JSON, I get a truncated response.
>>
>>
>> Details:
>> The JSON output I get is truncated, causing errors for any parser that
>> requires well-formed JSON. I have tried spot checking at a dozen different
>> records by adjusting the "start=" attribute. I am using Solr 3.5 running as
>> a Tomcat webapp on a portable hard drive. When getting the response as XML,
>> it appears to work fine. I have provided some examples of the query I am
>> using, as well as JSON and XML responses below.
>>
>> I am definitely new to working directly with Solr, although I have used it
>> via Drupal for years and I have a pretty solid understanding of how it
>> works at a high level. My best guess is that there is some setting that I
>> am not aware of in schema.xml or solrconfig.xml that is causing this
>> outcome. Any help in figuring this out would be greatly appreciated.
>>
>>
>> Example query:
>> http://localhost:8080/solr/rolfe/select?indent=on&version=2.2&q=*%3A*&fq=&start=0&rows=10&fl=*%2Cscore&qt=&wt=json&explainOther=&hl.fl=
>>
>>
>> Example JSON response:
>>
>> {
>>  "responseHeader":{
>>    "status":0,
>>    "QTime":1},
>>  "response":{"numFound":43678,"start":0,"maxScore":1.0,"docs":[
>>      {
>>        "contents":"idlotoer patriota department of the rolfe
>> arrowlocal   itemsrobi anderson of denver, colo., ur-rived in plover
>> the latter part of tho week for a short visit with relatives and
>> iri'-.'idst <; owen returned monday from hampton, iowa. mr owen
>> reports that .mrs owen w.is successfully operated upon in the hospital
>> there, ;md is now w.'ll on the road to recovery.real estate loans-wo
>> are quoting low rotes on real estate ioudb. if you are expecting to
>> make u loan on your farm this year, it will pay you to see us.wo
>> solicit your banking uusl--ness on the basis of prompt, efficient
>> service to you.peoples saving banki5st. 1883 the community
>> dank:~:~:~>m~:~:~x\"xkk-:->m~:~>®x\">¯ 11the variety of our win-ned
>> goods should appeal lo œ i }ou. especially at this sea- ||[ son
>> ('aimed   vegetables, canned fruits, meals, soups œ and  so  on.
>> make  your housework  lighter during œ this season by being a con- œ
>> slant \\isilor lo our canned goods department| saturday specials19c
>> 19c 19c 19c 19c 19ci t11 i5:i!vlib. of pollock dakingpowder for ... 1
>> cnn white seal lllnc-kitaspbenlea for \\i i.h oysters forsaturday
>> special þ' cans hominy forsaturday special i pkgs. com starch.i willi
>> spoon) i large can plm applesaturday spi-riultwo kxtk\\ specials white
>> (jrtipcse\\tra special pem'lii'k-kxtra .special15c 15cfred ehler's|
>> the right place to tradem. i helvlg spent last sunday in hampton with
>> his daughter, mikk ivis, who is in a hospital fn that city, recovering
>> from the effects of ® recent operation. martin reports that she is
>> getting rlong nicely and will probably be home some, time the latter
>> part of the weekmr and mrs. f j sarhv were holfi visitors last
>> saturday and sundaythe m y ¯. club met with mrs l. n. moody last
>> thursday afternoonmr and mrs. chas. england, mid mrs. england's
>> father. mr brle.kson. of albert city, mr. and mrs. enoch erlckson and
>> children of marathon, and mr. and mrs. a b. cobbs of rolfe spent
>> sunday at the a. w. hess home in tjii8 city.jack (j ton on was
>> recently thrown from a horse and suffered a broken arm.if you want to
>> buy a ®ood corn planter, buy a. \"cbbc\" and pet the rest. see j. w.
>> mangun, plover, iowa.p. j. nacke has purchased a buick touring car,
>> and is now busy learning to operate the same.miss freda gcmbler,
>> daughter of pred oembler, was taken suddenly ill while in school one
>> day last week. a physician was called and after examination pronounced
>> it a case of scarlet fevermr sherlock of bmmotsburg was a business
>> visitor here tuesday,1 h pollock has been making an improvement on bis
>> farm residence by tho addition of a largo porch.ii .1. watts of des
>> molncs spent sunday at the home of bis brother, chas. 15. watts.p 11.
>> henderson has sold the building now occupied by the harness shop to
>> geo. jcffriub.a. j eggspuehler has rctir

SolrCloud on Trunk

2012-01-27 Thread Jamie Johnson
I just want to verify some of the features in regards to SolrCloud
that are now on Trunk

documents added to the cluster are automatically distributed amongst
the available shards (I had seen that Yonik had ported the Murmur
hash, but I didn't see that on trunk, what is being used and where can
I look at it?)
documents deletes/updates are automatically forwarded to the correct
shard no matter which shard/replica they are originally sent to
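For illustration only — trunk's actual routing uses the ported MurmurHash
over the uniqueKey, not the stand-in hash below — the routing amounts to a
stable hash of the id modulo the shard count, which is also why deletes and
updates land on the same shard as the original add:

```python
import zlib

def shard_for(doc_id, num_shards):
    # Stand-in for the real hash (crc32 keeps the sketch dependency-free):
    # any stable hash of the uniqueKey maps a document to a fixed shard, so
    # adds, updates and deletes for the same id always route the same way.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards
```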

Also on the latest trunk when I run the embedded zk as is describe
here (http://wiki.apache.org/solr/SolrCloud2) I keep getting the
following information

12/01/27 23:44:38 INFO server.NIOServerCnxn: Accepted socket
connection from /fe80:0:0:0:0:0:0:1%1:57549
12/01/27 23:44:38 INFO server.NIOServerCnxn: Refusing session request
for client /fe80:0:0:0:0:0:0:1%1:57549 as it has seen zxid 0x179 our
last zxid is 0x10f client must try another server
12/01/27 23:44:38 INFO server.NIOServerCnxn: Closed socket connection
for client /fe80:0:0:0:0:0:0:1%1:57549 (no session established for
client)
12/01/27 23:44:38 INFO server.NIOServerCnxn: Accepted socket
connection from /127.0.0.1:57550
12/01/27 23:44:38 INFO server.NIOServerCnxn: Closed socket connection
for client /127.0.0.1:57550 (no session established for client)
12/01/27 23:44:39 INFO server.NIOServerCnxn: Accepted socket
connection from /0:0:0:0:0:0:0:1%0:57551

I don't actually run with the embedded ZK in production so I am not
all that worried about this, but figured it was worth figuring out
what was happening.

As always awesome work.


Re: Solr Warm-up performance issues

2012-01-27 Thread Otis Gospodnetic
Hi Dan,

I think this may be your problem:

> Every day we produce a new dataset of 40 GB and have to switch one for the 
> other.

If you really replace an index with a new index once a day, you throw away all 
the hard work the OS has been doing to cache hot parts of your index in 
memory.  It apparently takes 30 minutes in your case to re-cache things.  
Check the link in my signature.  If you use that, and if I'm right about this, 
you will see a big spike in Disk Reads after you switch to the new index.  You 
want to minimize that spike.


So see if you can avoid replacing the whole index. If that is really not 
doable, you can try warmup queries, but of course if they are expensive, 
running them will hurt system performance somewhat.

Otis 


Performance Monitoring SaaS for Solr - 
http://sematext.com/spm/solr-performance-monitoring/index.html 




>
> From: dan sutton 
>To: solr-user  
>Sent: Friday, January 27, 2012 9:44 AM
>Subject: Solr Warm-up performance issues
> 
>Hi List,
>
>We use Solr 4.0.2011.12.01.09.59.41 and have a dataset of roughly 40 GB.
>Every day we produce a new dataset of 40 GB and have to switch one for
>the other.
>
>Once the index switch over has taken place, it takes roughly 30 min for Solr
>to reach maximum performance. Are there any hardware or software solutions
>to reduce the warm-up time ? We tried warm-up queries but it didn't change
>much.
>
>Our hardware specs is:
>   * Dell Poweredge 1950
>   * 2 x Quad-Core Xeon E5405 (2.00GHz)
>   * 48 GB RAM
>   * 2 x 146 GB SAS 3 Gb/s 15K RPM disk configured in RAID mirror
>
>One thing that does seem to take a long time is un-inverting a set of
>multivalued fields, are there any optimizations we might be able to
>use here?
>
>Thanks for your help.
>Dan
>
>
>
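If the index swap itself cannot be avoided, one crude way to shorten the cold
period is to read the new index files once before switching the searcher
over, so the OS page cache is already hot. A sketch — pointing it at the new
index directory before the switch is an assumption about the deployment
layout, not something from the thread:

```python
import os

def prewarm(index_dir, bufsize=1 << 20):
    # Sequentially read every file under the index directory so the OS page
    # cache is populated before the searcher switches over; returns the
    # total number of bytes read.
    total = 0
    for root, _, files in os.walk(index_dir):
        for name in files:
            with open(os.path.join(root, name), "rb") as fh:
                while True:
                    chunk = fh.read(bufsize)
                    if not chunk:
                        break
                    total += len(chunk)
    return total
```

With 48 GB of RAM and a 40 GB index, a full sequential pass can leave most
of the index resident before the first real query arrives.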

querying multivalue fields

2012-01-27 Thread Travis Low
If a query matches one or more values of a multivalued field, is it
possible to get the indexes back for WHICH values?  For example, for a
document with a multivalue field having ["red", "redder", "reddest",
"yellow", "blue"] as its value, if "red" is the query, could we know that
values 0, 1, and 2 matched?

Against all hope, if that's "yes", then the next question is, would the
values be listed in the order they were specified when adding the document?

The idea here is that each document may have a variable number of multiple
external (e.g. Word) documents associated with it, and for any match, we
not only want to provide a link to the Solr document, but also, be able to
tell the user which external documents matched.  The contents of these
documents would populate the multivalued field (a very big field).

If that can't be done, I think what we'll do is do some kind of prefixed
hash of the document name and embed that in each mutlivalued field value
(each document content).  The prefix would contain (or be another hash of)
the document id.  Then we could find which documents matched, could we
not?
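The prefix-hash idea in the last paragraph can be sketched as follows; the
"srcdoc" prefix and the use of md5 are illustrative choices, not a
recommendation from the list:

```python
import hashlib

def tag_value(doc_name, content):
    # Prefix each multivalued entry with a short searchable token derived
    # from the external document's name, so a highlighted hit can later be
    # traced back to its source file.
    tag = "srcdoc" + hashlib.md5(doc_name.encode("utf-8")).hexdigest()[:8]
    return "%s %s" % (tag, content)

def source_of(tagged_value, doc_names):
    # Reverse lookup: recompute the tag for each known document name and
    # match it against the tag embedded in the field value.
    tag = tagged_value.split(" ", 1)[0]
    for name in doc_names:
        if tag_value(name, "").split(" ", 1)[0] == tag:
            return name
    return None
```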

Sorry if this is a dumb question.  I've asked about this before, and
received some *very* useful input (thanks!) but nothing that has yet lead
me to a robust solution for indexing a set of records along with their
associated documents and being able to identify the matching record AND the
matching document(s).

Thanks for your help!

cheers,
Travis

-- 

**

*Travis Low, Director of Development*


** * *

*Centurion Research Solutions, LLC*

*14048 ParkEast Circle *•* Suite 100 *•* Chantilly, VA 20151*

*703-956-6276 *•* 703-378-4474 (fax)*

*http://www.centurionresearch.com* 

**The information contained in this email message is confidential and
protected from disclosure.  If you are not the intended recipient, any use
or dissemination of this communication, including attachments, is strictly
prohibited.  If you received this email message in error, please delete it
and immediately notify the sender.

This email message and any attachments have been scanned and are believed
to be free of malicious software and defects that might affect any computer
system in which they are received and opened. No responsibility is accepted
by Centurion Research Solutions, LLC for any loss or damage arising from
the content of this email.


Re: DataImportHandler fails silently

2012-01-27 Thread mathieu lacage

On 28 Jan 2012, at 05:17, Lance Norskog  wrote:

> Do all of the documents have unique id fields?

yes.


> 
> On Fri, Jan 27, 2012 at 10:44 AM, mathieu lacage
>  wrote:
>> On Fri, Jan 27, 2012 at 7:39 PM, mathieu lacage
>> wrote:
>> 
>>> 
>>> It seems to work but the following command reports that only 499 documents
>>> were indexed (yes, there are many more documents in my database):
>>> 
>> 
>> And before anyone asks:
>> 
>> 1
>> 499
>> 0
>> 2012-01-27 19:37:16
>> Indexing completed. Added/Updated: 499 documents. Deleted 0
>> documents.
>> 2012-01-27 19:37:17
>> 2012-01-27 19:37:17
>> 499
>> 0:0:1.52
>> 
>> 
>> 
>> --
>> Mathieu Lacage 
> 
> 
> 
> -- 
> Lance Norskog
> goks...@gmail.com