Re: Solr & JVM performance issue after 2 days

2010-12-12 Thread Hamid Vahedi
Hi,

Thanks for the suggestion.
I made the following changes in solrconfig.xml:

<ramBufferSizeMB>256</ramBufferSizeMB>

<useCompoundFile>false</useCompoundFile>

<maxWarmingSearchers>1</maxWarmingSearchers>

<autoCommit>
  <maxDocs>2000</maxDocs>
  <maxTime>30</maxTime>
</autoCommit>

<lockType>simple</lockType>

<filterCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

<queryResultCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>

<documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
After that, one server (with 3 cores for 3 languages) works fine,
but the other server (3 cores for 3 other languages) runs into the problem again after 52 hours.


I plan to follow your suggestion and hope it helps.

Any better ideas would be appreciated.

Kind Regards
Hamid




From: Peter Karich 
To: solr-user@lucene.apache.org
Sent: Tue, December 7, 2010 8:26:01 PM
Subject: Re: Solr & JVM performance issue after 2 days

  On 07.12.2010 13:01, Hamid Vahedi wrote:
> Hi Peter
>
> Thanks a lot for the reply. Actually I need real-time indexing and querying at the
> same time.
>
> Here it says:
> "You can run multiple Solr instances in separate JVMs, with both having their
> solr.xml configured to use the same index folder."
>
> Now,
> Q1: I'm using Tomcat now. Could you please tell me how to have separate JVMs
> with Tomcat?

Are you sure you don't want two servers, and do you really need real time?
Slowing down indexing plus using less cache should do the trick, I think.

I wouldn't recommend indexing AND querying on the same machine unless
you have a lot of RAM and CPU.

You could even deploy two indices into one Tomcat. The read-only index
refers to the data dir via:
<dataDir>/path/to/index/data</dataDir>
Then issue an empty (!!) commit to the read-only index every minute, so
that the read-only index sees the changes from the feeding index.
(Again: see the wiki page!)
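For the periodic empty commit described above, a cron entry along these lines could work (the core name, port, and path are illustrative assumptions, not from the thread):

```shell
# Sketch: POST an empty commit to the read-only core once a minute
# so its searcher reopens and sees the feeding core's changes.
* * * * * curl -s "http://localhost:8080/solr/readonly/update" -H "Content-Type: text/xml" --data-binary "<commit/>"
```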

I also wouldn't recommend setting up two Tomcats on one server, but it's
possible: copy the Tomcat directory to, say, tomcat2,
and change the shutdown port and the 8080 connector port in tomcat2/conf/server.xml.
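To give the copied instance its own ports, the relevant attributes in tomcat2/conf/server.xml would look roughly like this (8006 and 8081 are arbitrary example values):

```xml
<!-- tomcat2/conf/server.xml: sketch showing only the changed attributes -->
<Server port="8006" shutdown="SHUTDOWN">
  <Service name="Catalina">
    <Connector port="8081" protocol="HTTP/1.1" connectionTimeout="20000" redirectPort="8443" />
  </Service>
</Server>
```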

> Q2: What should I set for LockType?

I'm using simple, but native should also be ok.

> Thanks in advance
>
>
>
>
> 
> From: Peter Karich
> To: solr-user@lucene.apache.org
> Sent: Tue, December 7, 2010 2:06:49 PM
> Subject: Re: Solr&  JVM performance issue after 2 days
>
>Hi Hamid,
>
> try to avoid autowarming when indexing (see solrconfig.xml:
> caches->autowarm + newSearcher + maxSearcher).
> If you need to query and indexing at the same time,
> then probably you'll need one read-only core and one for writing with no
> autowarming configured.
> See: http://wiki.apache.org/solr/NearRealtimeSearchTuning
>
> Or replicate from the indexing-core to a different core with different
> settings.
>
> Regards,
> Peter.
>
>
>> Hi,
>>
>> I am using multi-core Tomcat on 2 servers, 3 languages per server.
>>
>> I am adding documents to Solr at up to 200 docs/sec. When the update process
>> starts, everything is fine (update time is at most 200 ms/doc, with about
>> 800 MB of memory used and minimal CPU usage).
>>
>> After 15-17 hours it becomes very slow (more than 900 sec per update), used
>> heap memory is about 15 GB, and GC time grows to more than one hour.
>>
>>
>> I don't know what's wrong. Can anyone explain what the problem is?
>> Does it come from Solr or the JVM?
>>
>> Note: when I stop updating, the CPU stays busy for 15-20 minutes, and when I
>> start updating again I have the same issue. But when I stop the Tomcat
>> service and start it again, everything is OK.
>>
>> I am using Tomcat 6 with 18 GB memory on Windows 2008 Server x64, Solr 1.4.1.
>>
>> Thanks in advance,
>> Hamid
>


-- 
http://jetwick.com twitter search prototype


  

Re: Solr & JVM performance issue after 2 days

2010-12-12 Thread Erick Erickson
Several things:
1> Your ramBufferSizeMB is probably too large; 128M is often the
   point of diminishing returns. Your situation may be different...
2> Your logs will show you what is happening with your autocommit
   properties. If you're really sending 200 docs/second to your index,
   your commits are happening every 10 seconds. That is still too fast.
3> I'd really, really, really recommend that you use a master/slave
   configuration where the slaves are your searchers and your
   master is the indexer. Really. You're hammering your machine.
   If you separate the machines, you can turn off all of the autowarming
   etc. on the indexer and control the frequency of slave updates. Really
   consider this.
4> You haven't given us any idea of the total index size.
5> I doubt separate JVMs are useful here. You're still operating on the
   same underlying hardware. Multiple cores are almost always preferable
   to multiple JVMs.
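The commit-rate arithmetic behind point 2> above can be checked directly:

```python
# At the stated feed rate, an autoCommit maxDocs of 2000 triggers a
# hard commit roughly every (maxDocs / rate) seconds.
docs_per_sec = 200          # indexing rate reported in the thread
autocommit_max_docs = 2000  # maxDocs value from Hamid's solrconfig.xml
interval = autocommit_max_docs / docs_per_sec
print(interval)  # 10.0 -> a commit every 10 seconds
```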

Best
Erick
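For the master/slave setup recommended in point 3>, Solr 1.4 configures replication via the ReplicationHandler in solrconfig.xml. A minimal sketch (the host name, core name, and poll interval are illustrative, not from the thread):

```xml
<!-- On the master (indexer): publish the index after each commit -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
  </lst>
</requestHandler>

<!-- On a slave (searcher): poll the master for new index versions -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8080/solr/core0/replication</str>
    <str name="pollInterval">00:01:00</str>
  </lst>
</requestHandler>
```

With this split, autowarming can stay disabled on the master while the slaves keep their caches warm for queries.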

On Sun, Dec 12, 2010 at 8:26 AM, Hamid Vahedi  wrote:



Re: Solr & JVM performance issue after 2 days

2010-12-12 Thread Hamid Vahedi
Dear Erick,

Thanks for the advice.

The index size across all cores is 35 GB for 35 million docs (about 3 weeks of indexed data).

Kind Regards,
Hamid



From: Erick Erickson 
To: solr-user@lucene.apache.org
Sent: Sun, December 12, 2010 5:24:18 PM
Subject: Re: Solr & JVM performance issue after 2 days


Re: SOLR geospatial

2010-12-12 Thread Adam Estrada
I am particularly interested in storing and querying polygons. That sort of
thing looks like it's on their roadmap, so does anyone know what the status is
on that? Also, integration with JTS would make this a core component of any
GIS. Again, does anyone know what the status is on that?

*What’s on the roadmap of future features?*

Here are some of the features and enhancements we're planning for SSP:

   - Performance improvements for larger data sets
   - Fixing of known bugs
   - Distance facets: allowing Solr users to filter their results
     based on the calculated distances
   - Search with regular polygons, and groups of shapes
   - Integration with JTS
   - Highly optimized distance calculation algorithms
   - Ranking results by distance
   - 3D dimension search

Adam

On Sun, Dec 12, 2010 at 12:01 AM, Markus Jelsma
wrote:

> That smells like: http://www.jteam.nl/news/spatialsolr.html
>
> > My partner is using a publicly available plugin for GeoSpatial. It is
> used
> > both during indexing and during search. It forms some kind of gridding
> > system and puts 10 fields per row related to that. Doing a Radius search
> > (vs a bounding box search which is faster in almost all cases in all
> > GeoSpatial query systems) seems pretty fast. GeoSpatial was our project's
> > constraint. We've moved past that now.
> >
> > Did I mention that it returns distance from the center of the radius
> based
> > on units supplied in the query?
> >
> > I would tell you what the plugin is, but in our division of labor, I have
> > kept that out of my short term memory. You can contact him at:
> > Danilo Unite ;
> >
> > Dennis Gearon
> >
> >
> > Signature Warning
> > 
> > It is always a good idea to learn from your own mistakes. It is usually a
> > better idea to learn from others’ mistakes, so you do not have to make
> > them yourself. from
> > 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
> >
> >
> > EARTH has a Right To Life,
> > otherwise we all die.
> >
> >
> >
> > - Original Message 
> > From: George Anthony 
> > To: solr-user@lucene.apache.org
> > Sent: Fri, December 10, 2010 9:23:18 AM
> > Subject: SOLR geospatial
> >
> > In looking at some of the docs, I see support for geospatial search.
> >
> > I see this functionality is mostly scheduled for upcoming release 4.0 (with
> > some playing around with backported code).
> >
> >
> > I note the support for the bounding box filter, but will "bounding box" be
> > one of the supported *data* types for use with this filter? For example,
> > if my lat/long data describes the "footprint" of a map, I'm curious whether
> > that type of coordinate data can be used by the bounding box filter (or in
> > any other way for similar limiting/filtering capability). I see it can
> > work with point-type data but am curious about functionality with bounding
> > box type data (in contrast to simple point lat/long data).
> >
> > Thanks,
> > George
>


Re: SOLR geospatial

2010-12-12 Thread Erick Erickson
By and large, spatial solr is being replaced by geospatial, see:
http://wiki.apache.org/solr/SpatialSearch. I don't think the old
spatial contrib is still included in the trunk or 3.x code bases, but
I could be wrong

That said, I don't know whether what you want is on the roadmap
there either. Here's a place to start if you want to see the JIRA
discussions: https://issues.apache.org/jira/browse/SOLR-1568

Best
Erick


On Sun, Dec 12, 2010 at 11:23 AM, Adam Estrada wrote:



Re: SOLR geospatial

2010-12-12 Thread Dennis Gearon
We're in Alpha, heading to Alpha 2. Our requirements are simple: radius
searching, and distance from center. Solr Spatial works and is current.
GeoSpatial is almost there, but we're going to wait until it's released to
spend time with it. We have other tasks to work on and don't want to be part
of the debugging process of any project right now.

 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



- Original Message 
From: Erick Erickson 
To: solr-user@lucene.apache.org
Sent: Sun, December 12, 2010 11:18:03 AM
Subject: Re: SOLR geospatial




boosting, both query time and other

2010-12-12 Thread Dennis Gearon
So, our main search results have some very common fields:

'title'
'tags'
'description'

What kind of boosting has everybody been using that makes them and their
customers happy with these kinds of fields?

What are the pros and cons of query time boosting versus configured boosting?

 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



Very high load after replicating

2010-12-12 Thread Mark
After replicating an index of around 20g, my slaves experience very high
load (50+!!).

Is there anything I can do to alleviate this problem? Would Solr Cloud
be of any help?


thanks


Re: Very high load after replicating

2010-12-12 Thread Markus Jelsma
There can be numerous explanations, such as your configuration (cache-warming
queries, merge factor, replication events, etc.), but also I/O having trouble
flushing everything to disk. It could also be a memory problem: the OS might
start swapping if you allocate too much RAM to the JVM, leaving too little for
the OS to work with.
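As a concrete illustration of the swapping point, the JVM heap is usually capped well below physical RAM so the OS page cache can hold the index files. The values here are purely illustrative, not a recommendation:

```shell
# Sketch: on an 18 GB machine serving a ~20 GB index, a fixed heap
# such as this leaves roughly 12 GB for the OS and its page cache.
JAVA_OPTS="-Xms6g -Xmx6g"
```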

You need to provide more details.

> After replicating an index of around 20g my slaves experience very high
> load (50+!!)
> 
> Is there anything I can do to alleviate this problem?  Would solr cloud
> be of any help?
> 
> thanks


Re: Using synonyms in combination with facets

2010-12-12 Thread kirchheimer

Thanks,

this is exactly the type of solution I need.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Using-synonyms-in-combination-with-facets-tp1968584p2074692.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: full text search in multiple fields

2010-12-12 Thread PeterKerk

I went for the * operator, and it works now! Thanks!
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/full-text-search-in-multiple-fields-tp1888328p2075140.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: boosting, both query time and other

2010-12-12 Thread Erick Erickson
Basically that's unanswerable; you have to look at trying
various choices with your corpus. Take a look at the defaults
in the dismax request handler in the example solrconfig for a
place to start... And do be aware that the "correct" values may
change as your corpus acquires more data.
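For fields like title/tags/description, a dismax starting point might look like this in solrconfig.xml (the boost values are illustrative and must be tuned against your own corpus):

```xml
<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- query-time field boosts: title matches count most -->
    <str name="qf">title^2.0 tags^1.5 description^1.0</str>
  </lst>
</requestHandler>
```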

I'm not sure what you're really asking when you say "query time
boosting versus configured boosting". Could you give an example?

Best
Erick

On Sun, Dec 12, 2010 at 3:51 PM, Dennis Gearon wrote:



Re: SOLR geospatial

2010-12-12 Thread Adam Estrada
I would be more than happy to help with any of the spatial testing you are
working on.

adam

On Sun, Dec 12, 2010 at 3:08 PM, Dennis Gearon wrote:



Re: [Multiple] RSS Feeds at a time...

2010-12-12 Thread Adam Estrada
Hi Ahmet,

This is a great idea but still does not appear to be working correctly. The
idea is that I want to be able to add an RSS feed and then index that feed
on a schedule. My C# method looks something like this.

public ActionResult Index()
{
    try
    {
        HTTPGet req = new HTTPGet();
        string solrStr =
            System.Configuration.ConfigurationManager.AppSettings["solrUrl"].ToString();
        req.Request(solrStr +
            "/select?clean=true&commit=true&qt=/dataimport&command=reload-config");
        req.Request(solrStr +
            "/select?clean=false&commit=true&qt=/dataimport&command=full-import");
        Response.Write(req.StatusLine);
        Response.Write(req.ResponseTime);
        Response.Write(req.StatusCode);
        return RedirectToAction("../Import/Feeds");
        //return View();
    }
    catch (SolrConnectionException)
    {
        throw new Exception("Couldn't Import RSS Feeds");
    }
}

My XML configuration file looks something like this...



  


  http://rss.cnn.com/rss/cnn_topstories.rss";
  processor="XPathEntityProcessor"
  forEach="/rss/channel | /rss/channel/item"
  transformer="DateFormatTransformer,HTMLStripTransformer">












  

  http://feeds.newsweek.com/newsweek/nation";
processor="XPathEntityProcessor"
forEach="/rss/channel | /rss/channel/item"
transformer="DateFormatTransformer,HTMLStripTransformer">












  
   
  


As you can see, I can apparently add as many sub-entities as I want. The
idea was to reload the XML file after each entity is added. What else am I
missing here, because the reload-config command does not seem to be
working? Any ideas would be great!

Thanks,
Adam Estrada

On Sat, Dec 11, 2010 at 4:48 PM, Ahmet Arslan  wrote:

> > I found that you can have a single config file that can
> > have several
> > entities in it. My question now is how can I add entities
> > without restarting
> > the Solr service?
>
> You mean changing and re-loading xml config file?
>
> dataimport?command=reload-config
> http://wiki.apache.org/solr/DataImportHandler#Commands
>
>
>
>


[pubDate] is not converting correctly

2010-12-12 Thread Adam Estrada
All,

I am having some difficulties parsing the pubDate field that is part of the
RSS spec (I believe). I get the warning that states, "Dec 12, 2010 6:45:26
PM org.apache.solr.handler.dataimport.DateFormatTransformer
 transformRow
WARNING: Could not parse a Date field
java.text.ParseException: Unparseable date: "Thu, 30 Jul 2009 14:41:43
+"
at java.text.DateFormat.parse(Unknown Source)"

Does anyone know how to fix this? I would eventually like to do a date query
but without the ability to properly parse them I don't know if it's going to
work.

Thanks,
Adam


Re: [pubDate] is not converting correctly

2010-12-12 Thread Koji Sekiguchi

(10/12/13 8:49), Adam Estrada wrote:

All,

I am having some difficulties parsing the pubDate field that is part of the
RSS spec (I believe). I get the warning that states, "Dec 12, 2010 6:45:26
PM org.apache.solr.handler.dataimport.DateFormatTransformer
  transformRow
WARNING: Could not parse a Date field
java.text.ParseException: Unparseable date: "Thu, 30 Jul 2009 14:41:43
+"
 at java.text.DateFormat.parse(Unknown Source)"

Does anyone know how to fix this? I would eventually like to do a date query
but without the ability to properly parse them I don't know if it's going to
work.

Thanks,
Adam


Adam,

How does your data-config.xml look like for that field?
Have you looked at rss-data-config.xml file
under example/example-DIH/solr/rss/conf directory?

Koji
--
http://www.rondhuit.com/en/


Which query parser and how to do full text on mulitple fields

2010-12-12 Thread Dennis Gearon
Which query parser did my partner set up below, and how do I parse three fields 
in the index for scoring and returning results?



/solr/select?wt=json&indent=true&start=0&rows=20&q={!spatial%20lat=37.326375%20long=-121.892639%20radius=3%20unit=km%20threadCount=3}title:Art%20Loft


 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



Re: full text search in multiple fields

2010-12-12 Thread Dennis Gearon
For those of us who come late to a thread, having at least the last post that 
you're replying to would help. Me at least ;-)

 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



- Original Message 
From: PeterKerk 
To: solr-user@lucene.apache.org
Sent: Sun, December 12, 2010 1:47:35 PM
Subject: Re: full text search in multiple fields


I went for the * operator, and it works now! Thanks!
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/full-text-search-in-multiple-fields-tp1888328p2075140.html

Sent from the Solr - User mailing list archive at Nabble.com.



Re: Which query parser and how to do full text on mulitple fields

2010-12-12 Thread Pradeep Singh
You said you were using a third-party plugin. What do you expect people
here to know? Solr plugins don't have parameters lat, long, radius and
threadCount (they have pt and dist).

On Sun, Dec 12, 2010 at 4:47 PM, Dennis Gearon wrote:

> Which query parser did my partner set up below, and how to I parse three
> fields
> in the index for scoring and returning results?
>
>
>
>
> /solr/select?wt=json&indent=true&start=0&rows=20&q={!spatial%20lat=37.326375%20long=-121.892639%20radius=3%20unit=km%20threadCount=3}title:Art%20Loft
>
>
>  Dennis Gearon
>
>
> Signature Warning
> 
> It is always a good idea to learn from your own mistakes. It is usually a
> better
> idea to learn from others’ mistakes, so you do not have to make them
> yourself.
> from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
>
>
> EARTH has a Right To Life,
> otherwise we all die.
>
>


Rebuild Spellchecker based on cron expression

2010-12-12 Thread Martin Grotzke
Hi,

the spellchecker component already provides a buildOnCommit and
buildOnOptimize option.

Since we have several spellchecker indices building on each commit is
not really what we want to do.
Building on optimize is not possible as index optimization is done on
the master and the slaves don't even run an optimize but only fetch
the optimized index.

Therefore I'm thinking about an extension of the spellchecker that
allows you to rebuild the spellchecker based on a cron-expression
(e.g. rebuild each night at 1 am).

What do you think about this, is there anybody else interested in this?

Regarding the lifecycle, is there already some executor "framework" or
any regularly running process in place, or would I have to pull up my
own thread? If so, how can I stop my thread when solr/tomcat is
shutdown (I couldn't see any shutdown or destroy method in
SearchComponent)?

Thanx for your feedback,
cheers,
Martin
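[Editorial sketch] There is no scheduling framework built into SearchComponent for this; one integrated approach is a plain ScheduledExecutorService with daemon threads (so a Tomcat shutdown is not blocked even if stop() is never called). All class and method names below are illustrative, not Solr API; a true cron expression would need a library such as Quartz — this sketch only supports a fixed period:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

// Hypothetical helper a spellchecker extension could own; the rebuild
// Runnable passed in would call the spellchecker's build logic.
public class SpellcheckRebuildScheduler {
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(new ThreadFactory() {
            public Thread newThread(Runnable r) {
                Thread t = new Thread(r, "spellcheck-rebuild");
                t.setDaemon(true); // don't keep the JVM alive on shutdown
                return t;
            }
        });

    public void start(Runnable rebuildTask, long initialDelayMs, long periodMs) {
        // Fixed-rate rebuilds; exceptions thrown by rebuildTask would
        // cancel the schedule, so the task should catch and log them.
        scheduler.scheduleAtFixedRate(rebuildTask, initialDelayMs, periodMs,
            TimeUnit.MILLISECONDS);
    }

    public void stop() {
        // Invoke from whatever shutdown path the container exposes.
        scheduler.shutdownNow();
    }
}
```

stop() would be the place to hook container shutdown; since SearchComponent exposes no destroy callback, the daemon flag is the safety net.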


Re: Which query parser and how to do full text on mulitple fields

2010-12-12 Thread Markus Jelsma
Pradeep is right, but, check the solrconfig, the query parser is defined there. 
Look for the basedOn attribute in the queryParser element.



> You said you were using a third party plugin. What do you expect people
> herre to know? Solr plugins don't have parameters lat, long, radius and
> threadCount (they have pt and dist).
> 
> On Sun, Dec 12, 2010 at 4:47 PM, Dennis Gearon wrote:
> > Which query parser did my partner set up below, and how to I parse three
> > fields
> > in the index for scoring and returning results?
> > 
> > 
> > 
> > 
> > /solr/select?wt=json&indent=true&start=0&rows=20&q={!spatial%20lat=37.326
> > 375%20long=-121.892639%20radius=3%20unit=km%20threadCount=3}title:Art%20L
> > oft
> > 
> >  Dennis Gearon
> > 
> > Signature Warning
> > 
> > It is always a good idea to learn from your own mistakes. It is usually a
> > better
> > idea to learn from others’ mistakes, so you do not have to make them
> > yourself.
> > from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
> > 
> > 
> > EARTH has a Right To Life,
> > otherwise we all die.


Re: Which query parser and how to do full text on mulitple fields

2010-12-12 Thread Dennis Gearon
Well, I didn't think the plugin would be an issue. I thought the rest of the 
query was handled by the main query parser, and the plugin processes after that. 
So I thought the rest of the query, after the plugin/filter part, was like a 
normal query, without the filter/plugin. Is that so?

Does using the plugin make me do everything according to its requirements, or 
just what's in the braces {}?

I believe the plugin is Spatial Solr, anyway.

I'm really new to using this, guys.

 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



- Original Message 
From: Pradeep Singh 
To: solr-user@lucene.apache.org
Sent: Sun, December 12, 2010 5:02:54 PM
Subject: Re: Which query parser and how to do full text on mulitple fields

You said you were using a third party plugin. What do you expect people
herre to know? Solr plugins don't have parameters lat, long, radius and
threadCount (they have pt and dist).

On Sun, Dec 12, 2010 at 4:47 PM, Dennis Gearon wrote:

> Which query parser did my partner set up below, and how to I parse three
> fields
> in the index for scoring and returning results?
>
>
>
>
>/solr/select?wt=json&indent=true&start=0&rows=20&q={!spatial%20lat=37.326375%20long=-121.892639%20radius=3%20unit=km%20threadCount=3}title:Art%20Loft
>
>
>  Dennis Gearon
>
>
> Signature Warning
> 
> It is always a good idea to learn from your own mistakes. It is usually a
> better
> idea to learn from others’ mistakes, so you do not have to make them
> yourself.
> from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
>
>
> EARTH has a Right To Life,
> otherwise we all die.
>
>



Re: Which query parser and how to do full text on mulitple fields

2010-12-12 Thread Dennis Gearon
And to be more specific, the fields I want to combine for *full text* are just 
three text fields, they're not geospatial.

 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



- Original Message 
From: Pradeep Singh 
To: solr-user@lucene.apache.org
Sent: Sun, December 12, 2010 5:02:54 PM
Subject: Re: Which query parser and how to do full text on mulitple fields

You said you were using a third party plugin. What do you expect people
herre to know? Solr plugins don't have parameters lat, long, radius and
threadCount (they have pt and dist).

On Sun, Dec 12, 2010 at 4:47 PM, Dennis Gearon wrote:

> Which query parser did my partner set up below, and how to I parse three
> fields
> in the index for scoring and returning results?
>
>
>
>
>/solr/select?wt=json&indent=true&start=0&rows=20&q={!spatial%20lat=37.326375%20long=-121.892639%20radius=3%20unit=km%20threadCount=3}title:Art%20Loft
>
>
>  Dennis Gearon
>
>
> Signature Warning
> 
> It is always a good idea to learn from your own mistakes. It is usually a
> better
> idea to learn from others’ mistakes, so you do not have to make them
> yourself.
> from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
>
>
> EARTH has a Right To Life,
> otherwise we all die.
>
>



Re: Rebuild Spellchecker based on cron expression

2010-12-12 Thread Markus Jelsma
Maybe you've overlooked the build parameter?
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.build

> Hi,
> 
> the spellchecker component already provides a buildOnCommit and
> buildOnOptimize option.
> 
> Since we have several spellchecker indices building on each commit is
> not really what we want to do.
> Building on optimize is not possible as index optimization is done on
> the master and the slaves don't even run an optimize but only fetch
> the optimized index.
> 
> Therefore I'm thinking about an extension of the spellchecker that
> allows you to rebuild the spellchecker based on a cron-expression
> (e.g. rebuild each night at 1 am).
> 
> What do you think about this, is there anybody else interested in this?
> 
> Regarding the lifecycle, is there already some executor "framework" or
> any regularly running process in place, or would I have to pull up my
> own thread? If so, how can I stop my thread when solr/tomcat is
> shutdown (I couldn't see any shutdown or destroy method in
> SearchComponent)?
> 
> Thanx for your feedback,
> cheers,
> Martin


Re: Which query parser and how to do full text on mulitple fields

2010-12-12 Thread Dennis Gearon
Oh, I didn't know that the syntax didn't show the parser used, that it was set 
in the config file.

I'll talk to my partner, thanks.

 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



- Original Message 
From: Markus Jelsma 
To: solr-user@lucene.apache.org
Cc: Pradeep Singh 
Sent: Sun, December 12, 2010 5:08:11 PM
Subject: Re: Which query parser and how to do full text on mulitple fields

Pradeep is right, but, check the solrconfig, the query parser is defined there. 
Look for the basedOn attribute in the queryParser element.



> You said you were using a third party plugin. What do you expect people
> herre to know? Solr plugins don't have parameters lat, long, radius and
> threadCount (they have pt and dist).
> 
> On Sun, Dec 12, 2010 at 4:47 PM, Dennis Gearon wrote:
> > Which query parser did my partner set up below, and how to I parse three
> > fields
> > in the index for scoring and returning results?
> > 
> > 
> > 
> > 
> > /solr/select?wt=json&indent=true&start=0&rows=20&q={!spatial%20lat=37.326
> > 375%20long=-121.892639%20radius=3%20unit=km%20threadCount=3}title:Art%20L
> > oft
> > 
> >  Dennis Gearon
> > 
> > Signature Warning
> > 
> > It is always a good idea to learn from your own mistakes. It is usually a
> > better
> > idea to learn from others’ mistakes, so you do not have to make them
> > yourself.
> > from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
> > 
> > 
> > EARTH has a Right To Life,
> > otherwise we all die.



Re: Which query parser and how to do full text on mulitple fields

2010-12-12 Thread Markus Jelsma
The manual answers most questions.

> Oh, I didn't know that the syntax didn't show the parser used, that it was
> set in the config file.
> 
> I'll talk to my partner, thanks.
> 
>  Dennis Gearon
> 
> 
> Signature Warning
> 
> It is always a good idea to learn from your own mistakes. It is usually a
> better idea to learn from others’ mistakes, so you do not have to make
> them yourself. from
> 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
> 
> 
> EARTH has a Right To Life,
> otherwise we all die.
> 
> 
> 
> - Original Message 
> From: Markus Jelsma 
> To: solr-user@lucene.apache.org
> Cc: Pradeep Singh 
> Sent: Sun, December 12, 2010 5:08:11 PM
> Subject: Re: Which query parser and how to do full text on mulitple fields
> 
> Pradeep is right, but, check the solrconfig, the query parser is defined
> there. Look for the basedOn attribute in the queryParser element.
> 
> > You said you were using a third party plugin. What do you expect people
> > herre to know? Solr plugins don't have parameters lat, long, radius and
> > threadCount (they have pt and dist).
> > 
> > On Sun, Dec 12, 2010 at 4:47 PM, Dennis Gearon 
wrote:
> > > Which query parser did my partner set up below, and how to I parse
> > > three fields
> > > in the index for scoring and returning results?
> > > 
> > > 
> > > 
> > > 
> > > /solr/select?wt=json&indent=true&start=0&rows=20&q={!spatial%20lat=37.3
> > > 26
> > > 375%20long=-121.892639%20radius=3%20unit=km%20threadCount=3}title:Art%
> > > 20L oft
> > > 
> > >  Dennis Gearon
> > > 
> > > Signature Warning
> > > 
> > > It is always a good idea to learn from your own mistakes. It is usually
> > > a better
> > > idea to learn from others’ mistakes, so you do not have to make them
> > > yourself.
> > > from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
> > > 
> > > 
> > > EARTH has a Right To Life,
> > > otherwise we all die.


Re: Rebuild Spellchecker based on cron expression

2010-12-12 Thread Martin Grotzke
On Mon, Dec 13, 2010 at 2:12 AM, Markus Jelsma
 wrote:
> Maybe you've overlooked the build parameter?
> http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.build
I'm aware of this, but we don't want to maintain cron-jobs on all
slaves for all spellcheckers for all cores.
That's why I'm thinking about a more integrated solution. Or did I
really overlook something?

Cheers,
Martin


>
>> Hi,
>>
>> the spellchecker component already provides a buildOnCommit and
>> buildOnOptimize option.
>>
>> Since we have several spellchecker indices building on each commit is
>> not really what we want to do.
>> Building on optimize is not possible as index optimization is done on
>> the master and the slaves don't even run an optimize but only fetch
>> the optimized index.
>>
>> Therefore I'm thinking about an extension of the spellchecker that
>> allows you to rebuild the spellchecker based on a cron-expression
>> (e.g. rebuild each night at 1 am).
>>
>> What do you think about this, is there anybody else interested in this?
>>
>> Regarding the lifecycle, is there already some executor "framework" or
>> any regularly running process in place, or would I have to pull up my
>> own thread? If so, how can I stop my thread when solr/tomcat is
>> shutdown (I couldn't see any shutdown or destroy method in
>> SearchComponent)?
>>
>> Thanx for your feedback,
>> cheers,
>> Martin
>



-- 
Martin Grotzke
http://twitter.com/martin_grotzke


Re: [pubDate] is not converting correctly

2010-12-12 Thread Adam Estrada
Thanks for the feedback! There are quite a few formats that can be used. I
am experiencing at least 5 of them. Would something like this work? Note
that there are 2 different formats separated by a comma.



I don't suppose it will, because there is already a comma in the first
parser. I guess I am really looking for an all-purpose date-time parser, but
even if I have that, would I still be able to query *all* fields in the
index?

Good article:
http://www.java2s.com/Open-Source/Java-Document/RSS-RDF/Rome/com/sun/syndication/io/impl/DateParser.java.htm
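[Editorial sketch] A multi-pattern fallback in the style of ROME's DateParser could look like this. Note this is not what DateFormatTransformer does (it takes a single dateTimeFormat), and the pattern list is an illustrative assumption, not exhaustive; a custom DIH Transformer could wrap it to normalize pubDate before indexing:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

public class PubDateParser {
    // Candidate patterns commonly seen in RSS/Atom feeds
    // (an illustrative subset).
    private static final String[] PATTERNS = {
        "EEE, dd MMM yyyy HH:mm:ss Z",   // RFC 822 with numeric zone
        "EEE, dd MMM yyyy HH:mm:ss zzz", // RFC 822 with named zone
        "yyyy-MM-dd'T'HH:mm:ss'Z'"       // ISO 8601 (Atom-style)
    };

    public static Date parse(String value) {
        for (String p : PATTERNS) {
            try {
                // SimpleDateFormat is not thread-safe, so create per call
                return new SimpleDateFormat(p, Locale.ENGLISH).parse(value.trim());
            } catch (ParseException ignored) {
                // try the next pattern
            }
        }
        return null; // caller decides how to handle unparseable dates
    }
}
```

The English locale matters: "Thu"/"Jul" fail to parse under many default locales.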

Adam

On Sun, Dec 12, 2010 at 7:31 PM, Koji Sekiguchi  wrote:

> (10/12/13 8:49), Adam Estrada wrote:
>
>> All,
>>
>> I am having some difficulties parsing the pubDate field that is part of
>> the
>> RSS spec (I believe). I get the warning that states, "Dec 12, 2010
>> 6:45:26
>> PM org.apache.solr.handler.dataimport.DateFormatTransformer
>>  transformRow
>> WARNING: Could not parse a Date field
>> java.text.ParseException: Unparseable date: "Thu, 30 Jul 2009 14:41:43
>> +"
>> at java.text.DateFormat.parse(Unknown Source)"
>>
>> Does anyone know how to fix this? I would eventually like to do a date
>> query
>> but without the ability to properly parse them I don't know if it's going
>> to
>> work.
>>
>> Thanks,
>> Adam
>>
>
> Adam,
>
> How does your data-config.xml look like for that field?
> Have you looked at rss-data-config.xml file
> under example/example-DIH/solr/rss/conf directory?
>
> Koji
> --
> http://www.rondhuit.com/en/
>


Re: [Multiple] RSS Feeds at a time...

2010-12-12 Thread Ahmet Arslan
> What else am I missing here because the reload-config
> command does not seem
> to be working. Any ideas would be great!

solr/dataimport?command=reload-config should return the message 
Configuration Re-loaded sucessfully
if everything went well. May be you can check that after each reload. May be it 
is not a valid xml?

By the way, can't you use variable resolver in your case?

http://wiki.apache.org/solr/DataImportHandler#A_VariableResolver

Passing different rss URLs using a custom parameter from request
like, ${dataimporter.request.myrssurl}. 

/dataimport?command=full-import&clean=false&myrssurl=http://rss.cnn.com/rss/cnn_topstories.rss

Similar discussion http://search-lucene.com/m/xILqvbY6h91/
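[Editorial sketch] A variable-resolved data-config could look roughly like this; the parameter name myrssurl and the field list are illustrative assumptions, not taken from Adam's actual config:

```xml
<dataConfig>
  <dataSource type="URLDataSource" />
  <document>
    <!-- The feed URL arrives per request via &myrssurl=... -->
    <entity name="feed"
            pk="link"
            url="${dataimporter.request.myrssurl}"
            processor="XPathEntityProcessor"
            forEach="/rss/channel/item"
            transformer="DateFormatTransformer,HTMLStripTransformer">
      <field column="title" xpath="/rss/channel/item/title" />
      <field column="link"  xpath="/rss/channel/item/link" />
    </entity>
  </document>
</dataConfig>
```

With this, one config serves every feed, and no reload-config call is needed when a feed is added.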


  


Re: Rebuild Spellchecker based on cron expression

2010-12-12 Thread Erick Erickson
I'm shooting in the dark here, but according to this:
http://wiki.apache.org/solr/SolrReplication
after the slave pulls the index
down, it issues a commit. So if your
slave is configured to generate the dictionary on commit, will it
"just happen"?

But according to this: https://issues.apache.org/jira/browse/SOLR-866
this is an open issue

Best
Erick

On Sun, Dec 12, 2010 at 8:30 PM, Martin Grotzke <
martin.grot...@googlemail.com> wrote:

> On Mon, Dec 13, 2010 at 2:12 AM, Markus Jelsma
>  wrote:
> > Maybe you've overlooked the build parameter?
> > http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.build
> I'm aware of this, but we don't want to maintain cron-jobs on all
> slaves for all spellcheckers for all cores.
> That's why I'm thinking about a more integrated solution. Or did I
> really overlook s.th.?
>
> Cheers,
> Martin
>
>
> >
> >> Hi,
> >>
> >> the spellchecker component already provides a buildOnCommit and
> >> buildOnOptimize option.
> >>
> >> Since we have several spellchecker indices building on each commit is
> >> not really what we want to do.
> >> Building on optimize is not possible as index optimization is done on
> >> the master and the slaves don't even run an optimize but only fetch
> >> the optimized index.
> >>
> >> Therefore I'm thinking about an extension of the spellchecker that
> >> allows you to rebuild the spellchecker based on a cron-expression
> >> (e.g. rebuild each night at 1 am).
> >>
> >> What do you think about this, is there anybody else interested in this?
> >>
> >> Regarding the lifecycle, is there already some executor "framework" or
> >> any regularly running process in place, or would I have to pull up my
> >> own thread? If so, how can I stop my thread when solr/tomcat is
> >> shutdown (I couldn't see any shutdown or destroy method in
> >> SearchComponent)?
> >>
> >> Thanx for your feedback,
> >> cheers,
> >> Martin
> >
>
>
>
> --
> Martin Grotzke
> http://twitter.com/martin_grotzke
>


Search with facet.pivot

2010-12-12 Thread Anders Dam
Hi,

I have a minor problem in getting the pivoting working correctly. The thing
is that two otherwise equal search queries behave differently, namely one is
returning the search result with the facet.pivot fields below and another is
returning the search result with an empty facet.pivot. This is a problem,
since I am particularly interested in displaying the pivots.

Perhaps someone has an idea about what is going wrong in this case. For
clarity I paste the parameters used for searching:



0
41
-


2<-1 5<-2 6<90%

on
1
0.01

category_search
 
0


*:*
 
category
true
dismax
all

*,score
 
true
1

true

shop_name:colorbob.dk
 
-

root_category_name,parent_category_name,category
root_category_id,parent_category_id,category_id

100
-

root_category_name,parent_category_name,category
root_category_id,parent_category_id,category_id

OKI
100



I see no pattern in which queries return the pivot fields and which
ones do not.


The field searched in is defined as:



And the edgytext type is defined as

 
   


   
 
 
   


 


I am using apache-solr-4.0-2010-11-26_08-36-06 release

Thanks in advance,

Anders Dam


Re: [pubDate] is not converting correctly

2010-12-12 Thread Lance Norskog
Nice find!  This is Apache 2.0, copyright SUN.

O Great Apache Elders: Is it kosher to add this to the Solr
distribution? It's not in the JDK and is also com.sun.*

On Sun, Dec 12, 2010 at 5:33 PM, Adam Estrada
 wrote:
> Thanks for the feedback! There are quite a few formats that can be used. I
> am experiencing at least 5 of them. Would something like this work? Note
> that there are 2 different formats separated by a comma.
>
>  dateTimeFormat="EEE, dd MMM  HH:mm:ss zzz, -MM-dd'T'HH:mm:ss'Z'" />
>
> I don't suppose it will because there is already a comma in the first
> parser. I guess I am reallly looking for an all purpose data time parser but
> even if I have that, would I still be able to query *all* fields in the
> index?
>
> Good article:
> http://www.java2s.com/Open-Source/Java-Document/RSS-RDF/Rome/com/sun/syndication/io/impl/DateParser.java.htm
>
> Adam
>
> On Sun, Dec 12, 2010 at 7:31 PM, Koji Sekiguchi  wrote:
>
>> (10/12/13 8:49), Adam Estrada wrote:
>>
>>> All,
>>>
>>> I am having some difficulties parsing the pubDate field that is part of
>>> the
>>> RSS spec (I believe). I get the warning that states, "Dec 12, 2010
>>> 6:45:26
>>> PM org.apache.solr.handler.dataimport.DateFormatTransformer
>>>  transformRow
>>> WARNING: Could not parse a Date field
>>> java.text.ParseException: Unparseable date: "Thu, 30 Jul 2009 14:41:43
>>> +"
>>>         at java.text.DateFormat.parse(Unknown Source)"
>>>
>>> Does anyone know how to fix this? I would eventually like to do a date
>>> query
>>> but without the ability to properly parse them I don't know if it's going
>>> to
>>> work.
>>>
>>> Thanks,
>>> Adam
>>>
>>
>> Adam,
>>
>> How does your data-config.xml look like for that field?
>> Have you looked at rss-data-config.xml file
>> under example/example-DIH/solr/rss/conf directory?
>>
>> Koji
>> --
>> http://www.rondhuit.com/en/
>>
>



-- 
Lance Norskog
goks...@gmail.com


PDFBOX 1.3.1 Parsing Error

2010-12-12 Thread pankaj bhatt
Hi All,
While using PDFBOX 1.3.1 in Apache Tika 1.7 I am getting the
following error when parsing a PDF document:

Error: Expected an integer type, actual='' at
org.apache.pdfbox.pdfparser.BaseParser.readInt

This error occurs because of the SHA-256 encryption used by Adobe Acrobat 9.
Is there any solution to this problem? I am stuck because of this.

In Jira, issue PDFBOX-697 has been created against this:
https://issues.apache.org/jira/browse/PDFBOX-697

Please help!!

/ Pankaj Bhatt.


Re: PDFBOX 1.3.1 Parsing Error

2010-12-12 Thread Pradeep Singh
If the document is encrypted maybe it isn't meant to be indexed and publicly
visible after all?

On Sun, Dec 12, 2010 at 10:22 PM, pankaj bhatt  wrote:

> hi All,
>While using PDFBOX 1.3.1 in APACHE TIKA 1.7 i am getting the
> following error to parse an PDF Document.
> *Error: Expected an integer type, actual='' " at
> org.apache.pdfbox.pdfparser.BaseParser.readInt*
> *
> *
> This error occurs, because of SHA-256 Encryption used by Adobe Acrobat 9.
> is there is any solution to this problem??? I get stuck because of this
> approoach.
>
> In Jira Issue-697 has been created against this.
> https://issues.apache.org/jira/browse/PDFBOX-697
>
> Please help!!
>
> / Pankaj Bhatt.
>


Re: Rebuild Spellchecker based on cron expression

2010-12-12 Thread Martin Grotzke
Hi,

when thinking further about it it's clear that
  https://issues.apache.org/jira/browse/SOLR-433
would be even better - we could generate the spellechecker indices on
commit/optimize on the master and replicate them to all slaves.

Just wondering what's the reason that this patch receives that little
interest. Anything wrong with it?

Cheers,
Martin


On Mon, Dec 13, 2010 at 2:04 AM, Martin Grotzke
 wrote:
> Hi,
>
> the spellchecker component already provides a buildOnCommit and
> buildOnOptimize option.
>
> Since we have several spellchecker indices building on each commit is
> not really what we want to do.
> Building on optimize is not possible as index optimization is done on
> the master and the slaves don't even run an optimize but only fetch
> the optimized index.
>
> Therefore I'm thinking about an extension of the spellchecker that
> allows you to rebuild the spellchecker based on a cron-expression
> (e.g. rebuild each night at 1 am).
>
> What do you think about this, is there anybody else interested in this?
>
> Regarding the lifecycle, is there already some executor "framework" or
> any regularly running process in place, or would I have to pull up my
> own thread? If so, how can I stop my thread when solr/tomcat is
> shutdown (I couldn't see any shutdown or destroy method in
> SearchComponent)?
>
> Thanx for your feedback,
> cheers,
> Martin
>



-- 
Martin Grotzke
http://www.javakaffee.de/blog/