Outstanding Jira issue

2013-05-08 Thread Shane Perry
I opened a Jira issue in Oct of 2011 which is still outstanding. I've
boosted the priority to Critical as each time I've upgraded Solr, I've had
to manually patch and build the jars.   There is a patch (for 3.6) attached
to the ticket. Is there someone with commit access who can take a look and
poke the fix through (preferably on 4.2 as well as 4.3)?  The ticket is
https://issues.apache.org/jira/browse/SOLR-2834.

Thanks in advance.

Shane


Re: Outstanding Jira issue

2013-05-08 Thread Shane Perry
Yeah, I realize my "fix" is more a bandage.  While it wouldn't be a good
long-term solution, how about going the path of ignoring unrecognized types
and logging a warning message so the handler doesn't crash?  The Jira ticket
could then be left open (and hopefully assigned) to fix the actual problem.
 This would save consumers from having to avoid the scenario or manually
patch the file to work around the problem.
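A minimal sketch of the "log and skip" behavior proposed above, using plain-Java stand-ins (the real AnalysisResponseBase works over a SolrJ NamedList, not a Map, so the types and names here are illustrative only):

```java
import java.util.*;
import java.util.logging.Logger;

public class LenientPhaseParser {
    private static final Logger log = Logger.getLogger("analysis");

    // Hypothetical stand-in for AnalysisResponseBase.buildPhases(): accept an
    // entry when it is a List of tokens, otherwise log a warning and skip it
    // instead of letting a ClassCastException abort the whole request.
    static List<List<?>> buildPhases(Map<String, Object> phaseEntries) {
        List<List<?>> phases = new ArrayList<>();
        for (Map.Entry<String, Object> e : phaseEntries.entrySet()) {
            Object value = e.getValue();
            if (value instanceof List) {
                phases.add((List<?>) value);
            } else {
                log.warning("Skipping phase '" + e.getKey()
                        + "': expected a List but got "
                        + value.getClass().getSimpleName());
            }
        }
        return phases;
    }

    public static void main(String[] args) {
        Map<String, Object> entries = new LinkedHashMap<>();
        entries.put("WhitespaceTokenizer", Arrays.asList("foo", "bar"));
        entries.put("HTMLStripCharFilter", "foo bar"); // String output, the SOLR-2834 trigger
        System.out.println(buildPhases(entries).size()); // 1: the String entry is skipped
    }
}
```

As the message says, this only papers over the mismatch; the ticket would stay open for the real fix.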

On Wed, May 8, 2013 at 11:49 AM, Shawn Heisey  wrote:

> On 5/8/2013 9:20 AM, Shane Perry wrote:
>
>> I opened a Jira issue in Oct of 2011 which is still outstanding. I've
>> boosted the priority to Critical as each time I've upgraded Solr, I've had
>> to manually patch and build the jars.   There is a patch (for 3.6)
>> attached
>> to the ticket. Is there someone with commit access who can take a look and
>> poke the fix through (preferably on 4.2 as well as 4.3)?  The ticket is
>> https://issues.apache.org/jira/browse/SOLR-2834.
>>
>
> Your patch just ignores the problem so the request doesn't crash, it
> doesn't fix it.  We need to fix whatever the problem is in
> HTMLStripCharFilter.
>
> I had hoped I could come up with a quick fix, but it's proving too
> difficult for me to unravel.  I can't even figure out how it works on "good"
> analysis components like WhitespaceTokenizer, so I definitely can't see
> what the problem is for HTMLStripCharFilter.
>
> Thanks,
> Shawn
>
>


Inaccurate wiki documentation?

2013-05-20 Thread Shane Perry
I am in the process of setting up a core using Solr 4.3.  On the Core
Discovery wiki page it states:

As of SOLR-4196, there's a new way of defining cores. Essentially, it is no
longer necessary to define cores in solr.xml. In fact, solr.xml is no
longer necessary at all and will be obsoleted in Solr 5.x. As of Solr 4.3
the process is as follows:


   - If a solr.xml file is found in , then it is expected to be
  the old-style solr.xml that defines cores etc.
  - If there is no solr.xml but there is a solr.properties file, then
  exploration-based core enumeration is assumed.
   - If neither a solr.xml nor a solr.properties file is found, a
  default solr.xml file is assumed. NOTE: as of 5.0, this will not be true
  and an error will be thrown if no solr.properties file is found.

Using the 4.3 war available for download, I attempted to set up my core
using the solr.properties file (in anticipation of moving to 5.0).  When I
start the context, logging shows that the process is falling back to the
default solr.xml file (essentially the second bullet does not occur).
 After digging through the 4_3 branch it looks like solr.properties is not
yet part of the library.  Am I missing something (I'm able to get the
context started using a solr.xml file with "" as the contents)?
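For what it's worth, core discovery as it eventually shipped in later 4.x releases is driven by a per-core core.properties file rather than a top-level solr.properties.  A minimal sketch (values illustrative):

```properties
# <core dir>/core.properties -- the presence of this file marks the directory as a core
name=mycore
config=solrconfig.xml
schema=schema.xml
loadOnStartup=true
```

Under that later scheme even an empty core.properties is enough to declare a core, with the name defaulting to the directory name.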

I'm going with a basic solr.xml for now, but any insight would be
appreciated.

Thanks in advance.


Sorting by field is slow

2013-06-12 Thread Shane Perry
In upgrading from Solr 3.6.1 to 4.3.0, our query response time has
increased dramatically.  After testing in 4.3.0 it appears the same query
(with 1 matching document) returns after 100 ms without sorting but takes 1
minute when sorting by a text field.  I've looked around but haven't yet
found a reason for the degradation.  Can someone give me some insight or
point me in the right direction for resolving this?  In most cases, I can
change my code to do client-side sorting but I do have a couple of
situations where pagination prevents client-side sorting.  Any help would
be greatly appreciated.
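A toy illustration of why pagination blocks client-side sorting: sorting one fetched page only reorders that page, while the true page boundaries depend on a global sort the client never sees.  Names and data here are invented:

```java
import java.util.*;
import java.util.stream.*;

public class PaginationDemo {
    // Simulate a server that returns results in index (unsorted) order.
    static List<String> fetchPage(List<String> index, int start, int rows) {
        return index.subList(start, Math.min(start + rows, index.size()));
    }

    public static void main(String[] args) {
        List<String> index = Arrays.asList("delta", "alpha", "charlie", "bravo");

        // Client-side sorting of a single page gives the wrong page 1:
        List<String> page1 = new ArrayList<>(fetchPage(index, 0, 2));
        Collections.sort(page1);                                  // [alpha, delta]

        // The true first page of the globally sorted results:
        List<String> sortedAll = index.stream().sorted().collect(Collectors.toList());
        List<String> truePage1 = sortedAll.subList(0, 2);         // [alpha, bravo]

        System.out.println(page1.equals(truePage1)); // false: "bravo" lives on a later page
    }
}
```

To sort client-side correctly the client would have to fetch every page first, which is exactly what pagination is meant to avoid.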

Thanks,

Shane


Re: Sorting by field is slow

2013-06-12 Thread Shane Perry
Thanks for the responses.

Setting first/newSearcher had no noticeable effect.  I'm sorting on a
stored/indexed field named 'text' whose fieldType is solr.TextField.
Overall, the values of the field are unique.  The JVM is only using about
2G of the available 12G, so no OOM/GC issue (at least on the surface).  The
server in question is a slave with approximately 56 million documents.
Additionally, sorting on a field of the same type but with significantly
less uniqueness results in quick response times.

The following is a sample of *debugQuery=true* for a query which returns 1
document:


  61458.0   (total; the XML element names were stripped by the mail archive)
  61452.0
  0.0
  0.0
  0.0
  0.0
  6.0


-- Update --

Out of desperation, I turned off replication by commenting out the <lst
name="slave"> element in the replication requestHandler block.  After
restarting tomcat I was surprised to find that the replication admin UI
still reported the core as replicating.  Search queries were still slow.  I
then disabled replication via the UI and the display updated to report the
core was no longer replicating.  Queries are now fast, so it appears that
the sorting may be a red herring.

It may be of note to also mention that the slow queries don't appear to
be getting cached.

Thanks again for the feedback.

On Wed, Jun 12, 2013 at 2:33 PM, Jack Krupansky wrote:

> Rerun the sorted query with &debugQuery=true and look at the module
> timings. See what stands out
>
> Are you actually sorting on a "text" field, as opposed to a "string" field?
>
> Of course, it's always possible that maybe you're hitting some odd OOM/GC
> condition as a result of Solr growing  between releases.
>
> -- Jack Krupansky
>
> -Original Message- From: Shane Perry
> Sent: Wednesday, June 12, 2013 3:00 PM
> To: solr-user@lucene.apache.org
> Subject: Sorting by field is slow
>
>
> In upgrading from Solr 3.6.1 to 4.3.0, our query response time has
> increased exponentially.  After testing in 4.3.0 it appears the same query
> (with 1 matching document) returns after 100 ms without sorting but takes 1
> minute when sorting by a text field.  I've looked around but haven't yet
> found a reason for the degradation.  Can someone give me some insight or
> point me in the right direction for resolving this?  In most cases, I can
> change my code to do client-side sorting but I do have a couple of
> situations where pagination prevents client-side sorting.  Any help would
> be greatly appreciated.
>
> Thanks,
>
> Shane
>


Re: Sorting by field is slow

2013-06-12 Thread Shane Perry
Erick,

I agree, it doesn't make sense.  I manually merged the solrconfig.xml from
the distribution example with my 3.6 solrconfig.xml, pulling out what I
didn't need.  There is the possibility I removed something I shouldn't have
though I don't know what it would be.  Aside from removing the dynamic fields
and a custom tokenizer class, and changing all my fields to be stored, the
schema.xml file should be the same as well.  I'm not currently in the
position to do so, but I'll double check those two files.  Finally, the
data was re-indexed when I moved to 4.3.

My statement about field values wasn't stated very well.  What I meant is
that the 'text' field has more unique terms than some of my other fields.

As for this being an edge case, I'm not sure why it would manifest itself
in 4.3 but not in 3.6 (short of me having a screwy configuration setting).
 If I get a chance, I'll see if I can duplicate the behavior with a small
document count in a sandboxed environment.

Shane

On Wed, Jun 12, 2013 at 5:14 PM, Erick Erickson wrote:

> This doesn't make much sense, particularly the fact
> that you added first/new searchers. I'm assuming that
> these are sorting on the same field as your slow query.
>
> But sorting on a text field for which
> "Overall, the values of the field are unique"
> is a red-flag. Solr doesn't sort on fields that have
> more than one term, so you might as well use a
> string field and be done with it, it's possible you're
> hitting some edge case.
>
> Did you just copy your 3.6 schema and configs to
> 4.3? Did you re-index?
>
> Best
> Erick
>
> On Wed, Jun 12, 2013 at 5:11 PM, Shane Perry  wrote:
> > Thanks for the responses.
> >
> > Setting first/newSearcher had no noticeable effect.  I'm sorting on a
> > stored/indexed field named 'text' who's fieldType is solr.TextField.
> >  Overall, the values of the field are unique. The JVM is only using about
> > 2G of the available 12G, so no OOM/GC issue (at least on the surface).
>  The
> > server is question is a slave with approximately 56 million documents.
> >  Additionally, sorting on a field of the same type but with significantly
> > less uniqueness results quick response times.
> >
> > The following is a sample of *debugQuery=true* for a query which returns
> 1
> > document:
> >
> > 
> >   61458.0
> >   
> > 61452.0
> >   
> >   
> > 0.0
> >   
> >   
> > 0.0
> >   
> >   
> > 0.0
> >   
> >   
> > 0.0
> >   
> >   
> > 6.0
> >   
> > 
> >
> >
> > -- Update --
> >
> > Out of desperation, I turned off replication by commenting out the <lst
> > name="slave"> element in the replication requestHandler block.  After
> > restarting tomcat I was surprised to find that the replication admin UI
> > still reported the core as replicating.  Search queries were still slow.
>  I
> > then disabled replication via the UI and the display updated to report
> the
> > core was no longer replicating.  Queries are now fast so it appears that
> > the sorting may be a red-herring.
> >
> > It may be of note to also mention that the slow queries don't appear to
> > be getting cached.
> >
> > Thanks again for the feed back.
> >
> > On Wed, Jun 12, 2013 at 2:33 PM, Jack Krupansky  >wrote:
> >
> >> Rerun the sorted query with &debugQuery=true and look at the module
> >> timings. See what stands out
> >>
> >> Are you actually sorting on a "text" field, as opposed to a "string"
> field?
> >>
> >> Of course, it's always possible that maybe you're hitting some odd
> OOM/GC
> >> condition as a result of Solr growing  between releases.
> >>
> >> -- Jack Krupansky
> >>
> >> -Original Message- From: Shane Perry
> >> Sent: Wednesday, June 12, 2013 3:00 PM
> >> To: solr-user@lucene.apache.org
> >> Subject: Sorting by field is slow
> >>
> >>
> >> In upgrading from Solr 3.6.1 to 4.3.0, our query response time has
> >> increased exponentially.  After testing in 4.3.0 it appears the same
> query
> >> (with 1 matching document) returns after 100 ms without sorting but
> takes 1
> >> minute when sorting by a text field.  I've looked around but haven't yet
> >> found a reason for the degradation.  Can someone give me some insight or
> >> point me in the right direction for resolving this?  In most cases, I
> can
> >> change my code to do client-side sorting but I do have a couple of
> >> situations where pagination prevents client-side sorting.  Any help
> would
> >> be greatly appreciated.
> >>
> >> Thanks,
> >>
> >> Shane
> >>
>
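Erick's suggestion of sorting on a string field rather than a tokenized text field is commonly set up as a dedicated sort field populated by copyField; a schema.xml sketch with illustrative field names:

```xml
<!-- sketch: field names are illustrative -->
<field name="text"      type="text_general" indexed="true" stored="true"/>
<field name="text_sort" type="string"       indexed="true" stored="false"/>
<copyField source="text" dest="text_sort"/>
```

For long free-text values a verbatim string copy can be impractically large; a lowercased or truncated variant (e.g. via a KeywordTokenizer-based type) is a common compromise.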


Re: Sorting by field is slow

2013-06-13 Thread Shane Perry
Erick,

We do have soft commits turned on.  Initially, autoCommit was set at 15000 and
autoSoftCommit at 1000.  We did up those to 120 and 60
respectively.  However, since the core in question is a slave, we don't
actually do writes to the core but rely on replication only to populate the
index.  In this case wouldn't autoCommit and autoSoftCommit essentially be
no-ops?  I thought I had pulled out all hard commits but a double check
shows one instance where it still occurs.
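For reference, the two commit settings mentioned above live in the updateHandler block of solrconfig.xml; a sketch with illustrative values (on a replication-only slave these timers have nothing local to commit, which matches the "no-op" reasoning above):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>                        <!-- hard commit: durability -->
    <maxTime>1200000</maxTime>        <!-- ms; value illustrative -->
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>                    <!-- soft commit: visibility -->
    <maxTime>600000</maxTime>
  </autoSoftCommit>
</updateHandler>
```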

Thanks for your time.

Shane

On Thu, Jun 13, 2013 at 5:19 AM, Erick Erickson wrote:

> Shane:
>
> You've covered all the config stuff that I can think of. There's one
> other possibility. Do you have the soft commits turned on and are
> they very short? Although soft commits shouldn't invalidate any
> segment-level caches (but I'm not sure whether the sorting buffers
> are low-level or not).
>
> About the only other thing I can think of is that you're somehow
> doing hard commits from, say, the client but that's really
> stretching.
>
> All I can really say at this point is that this isn't a problem I've seen
> before, so it's _likely_ some innocent-seeming config has changed.
> I'm sure it'll be obvious once you find it ...
>
> Erick
>
> On Wed, Jun 12, 2013 at 11:51 PM, Shane Perry  wrote:
> > Erick,
> >
> > I agree, it doesn't make sense.  I manually merged the solrconfig.xml
> from
> > the distribution example with my 3.6 solrconfig.xml, pulling out what I
> > didn't need.  There is the possibility I removed something I shouldn't
> have
> > though I don't know what it would be.  Minus removing the dynamic
> fields, a
> > custom tokenizer class, and changing all my fields to be stored, the
> > schema.xml file should be the same as well.  I'm not currently in the
> > position to do so, but I'll double check those two files.  Finally, the
> > data was re-indexed when I moved to 4.3.
> >
> > My statement about field values wasn't stated very well.  What I meant is
> > that the 'text' field has more unique terms than some of my other fields.
> >
> > As for this being an edge case, I'm not sure why it would manifest itself
> > in 4.3 but not in 3.6 (short of me having a screwy configuration
> setting).
> >  If I get a chance, I'll see if I can duplicate the behavior with a small
> > document count in a sandboxed environment.
> >
> > Shane
> >
> > On Wed, Jun 12, 2013 at 5:14 PM, Erick Erickson  >wrote:
> >
> >> This doesn't make much sense, particularly the fact
> >> that you added first/new searchers. I'm assuming that
> >> these are sorting on the same field as your slow query.
> >>
> >> But sorting on a text field for which
> >> "Overall, the values of the field are unique"
> >> is a red-flag. Solr doesn't sort on fields that have
> >> more than one term, so you might as well use a
> >> string field and be done with it, it's possible you're
> >> hitting some edge case.
> >>
> >> Did you just copy your 3.6 schema and configs to
> >> 4.3? Did you re-index?
> >>
> >> Best
> >> Erick
> >>
> >> On Wed, Jun 12, 2013 at 5:11 PM, Shane Perry  wrote:
> >> > Thanks for the responses.
> >> >
> >> > Setting first/newSearcher had no noticeable effect.  I'm sorting on a
> >> > stored/indexed field named 'text' who's fieldType is solr.TextField.
> >> >  Overall, the values of the field are unique. The JVM is only using
> about
> >> > 2G of the available 12G, so no OOM/GC issue (at least on the surface).
> >>  The
> >> > server is question is a slave with approximately 56 million documents.
> >> >  Additionally, sorting on a field of the same type but with
> significantly
> >> > less uniqueness results quick response times.
> >> >
> >> > The following is a sample of *debugQuery=true* for a query which
> returns
> >> 1
> >> > document:
> >> >
> >> > 
> >> >   61458.0
> >> >   
> >> > 61452.0
> >> >   
> >> >   
> >> > 0.0
> >> >   
> >> >   
> >> > 0.0
> >> >   
> >> >   
> >> > 0.0
> >> >   
> >> >   
> >> > 0.0
> >> >   
> >> >   
> >> > 6.0
> >> >  

Re: Sorting by field is slow

2013-06-13 Thread Shane Perry
I've dug through the code and have narrowed the delay down
to TopFieldCollector$OneComparatorNonScoringCollector.setNextReader() at
the point where the comparator's setNextReader() method is called (line 98
in the lucene_solr_4_3 branch).  That line is actually two method calls so
I'm not yet certain which path is the cause.  I'll continue to dig through
the code but am on thin ice so input would be great.

Shane

On Thu, Jun 13, 2013 at 7:56 AM, Shane Perry  wrote:

> Erick,
>
> We do have soft commits turned.  Initially, autoCommit was set at 15000
> and autoSoftCommit at 1000.  We did up those to 120 and 60
> respectively.  However, since the core in question is a slave, we don't
> actually do writes to the core but rely on replication only to populate the
> index.  In this case wouldn't autoCommit and autoSoftCommit essentially be
> no-ops?  I thought I had pulled out all hard commits but a double check
> shows one instance where it still occurs.
>
> Thanks for your time.
>
> Shane
>
> On Thu, Jun 13, 2013 at 5:19 AM, Erick Erickson 
> wrote:
>
>> Shane:
>>
>> You've covered all the config stuff that I can think of. There's one
>> other possibility. Do you have the soft commits turned on and are
>> they very short? Although soft commits shouldn't invalidate any
>> segment-level caches (but I'm not sure whether the sorting buffers
>> are low-level or not).
>>
>> About the only other thing I can think of is that you're somehow
>> doing hard commits from, say, the client but that's really
>> stretching.
>>
>> All I can really say at this point is that this isn't a problem I've seen
>> before, so it's _likely_ some innocent-seeming config has changed.
>> I'm sure it'll be obvious once you find it ...
>>
>> Erick
>>
>> On Wed, Jun 12, 2013 at 11:51 PM, Shane Perry  wrote:
>> > Erick,
>> >
>> > I agree, it doesn't make sense.  I manually merged the solrconfig.xml
>> from
>> > the distribution example with my 3.6 solrconfig.xml, pulling out what I
>> > didn't need.  There is the possibility I removed something I shouldn't
>> have
>> > though I don't know what it would be.  Minus removing the dynamic
>> fields, a
>> > custom tokenizer class, and changing all my fields to be stored, the
>> > schema.xml file should be the same as well.  I'm not currently in the
>> > position to do so, but I'll double check those two files.  Finally, the
>> > data was re-indexed when I moved to 4.3.
>> >
>> > My statement about field values wasn't stated very well.  What I meant
>> is
>> > that the 'text' field has more unique terms than some of my other
>> fields.
>> >
>> > As for this being an edge case, I'm not sure why it would manifest
>> itself
>> > in 4.3 but not in 3.6 (short of me having a screwy configuration
>> setting).
>> >  If I get a chance, I'll see if I can duplicate the behavior with a
>> small
>> > document count in a sandboxed environment.
>> >
>> > Shane
>> >
>> > On Wed, Jun 12, 2013 at 5:14 PM, Erick Erickson <
>> erickerick...@gmail.com>wrote:
>> >
>> >> This doesn't make much sense, particularly the fact
>> >> that you added first/new searchers. I'm assuming that
>> >> these are sorting on the same field as your slow query.
>> >>
>> >> But sorting on a text field for which
>> >> "Overall, the values of the field are unique"
>> >> is a red-flag. Solr doesn't sort on fields that have
>> >> more than one term, so you might as well use a
>> >> string field and be done with it, it's possible you're
>> >> hitting some edge case.
>> >>
>> >> Did you just copy your 3.6 schema and configs to
>> >> 4.3? Did you re-index?
>> >>
>> >> Best
>> >> Erick
>> >>
>> >> On Wed, Jun 12, 2013 at 5:11 PM, Shane Perry 
>> wrote:
>> >> > Thanks for the responses.
>> >> >
>> >> > Setting first/newSearcher had no noticeable effect.  I'm sorting on a
>> >> > stored/indexed field named 'text' who's fieldType is solr.TextField.
>> >> >  Overall, the values of the field are unique. The JVM is only using
>> about
>> >> > 2G of the available 12G, so no OOM/GC issue (at least on the
>> surface

Re: Sorting by field is slow

2013-06-17 Thread Shane Perry
Using 4.3.1-SNAPSHOT I have identified where the issue is occurring.  For a
query in the format (it returns one document, sorted by field4)

+(field0:UUID0) -field1:string0 +field2:string1 +field3:text0
+field4:"text1"


with the field types

[the fieldType XML definitions were stripped from the archived message]

In the method FieldCacheImpl$SortedDocValuesCache#createValue, the reader
reports 2640449 terms.  As a result, the loop on line 1198 is executed
2640449 times and the inner loop is executed a total of 658310778 times.  My
index contains 56180128 documents.
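A back-of-envelope reading of those numbers (a sketch with hypothetical variable names): building the field cache entry walks every posting of every term once, so the cost tracks the total posting count, not the single matching document.

```java
public class FieldCacheCost {
    public static void main(String[] args) {
        // Figures quoted verbatim from the message above.
        long terms    = 2_640_449L;    // unique terms: one outer-loop pass each
        long postings = 658_310_778L;  // total inner-loop iterations (term-doc pairs)
        long docs     = 56_180_128L;   // documents in the index

        System.out.println(postings / terms); // 249: avg postings walked per term
        System.out.println(postings / docs);  // 11 (integer division; ~11.7 pairs per doc)
    }
}
```

That is why warming matters here: the ~660M-step un-inversion is paid once per new searcher, and a warming query with the sort pays it before user queries arrive.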

My configuration file sets the warming queries for the newSearcher and
firstSearcher listeners to the value


   static firstSearcher warming in solrconfig.xml
   field4



which does not appear to affect the speed.  I'm not sure how replication
plays into the equation outside the fact that we are relatively aggressive
on the replication (every 60 seconds).  I fear I may be at the end of my
knowledge without really getting into the code so any help at this point
would be greatly appreciated.

Shane

On Thu, Jun 13, 2013 at 4:11 PM, Shane Perry  wrote:

> I've dug through the code and have narrowed the delay down
> to TopFieldCollector$OneComparatorNonScoringCollector.setNextReader() at
> the point where the comparator's setNextReader() method is called (line 98
> in the lucene_solr_4_3 branch).  That line is actually two method calls so
> I'm not yet certain which path is the cause.  I'll continue to dig through
> the code but am on thin ice so input would be great.
>
> Shane
>
>
> On Thu, Jun 13, 2013 at 7:56 AM, Shane Perry  wrote:
>
>> Erick,
>>
>> We do have soft commits turned.  Initially, autoCommit was set at 15000
>> and autoSoftCommit at 1000.  We did up those to 120 and 60
>> respectively.  However, since the core in question is a slave, we don't
>> actually do writes to the core but rely on replication only to populate the
>> index.  In this case wouldn't autoCommit and autoSoftCommit essentially be
>> no-ops?  I thought I had pulled out all hard commits but a double check
>> shows one instance where it still occurs.
>>
>> Thanks for your time.
>>
>> Shane
>>
>> On Thu, Jun 13, 2013 at 5:19 AM, Erick Erickson 
>> wrote:
>>
>>> Shane:
>>>
>>> You've covered all the config stuff that I can think of. There's one
>>> other possibility. Do you have the soft commits turned on and are
>>> they very short? Although soft commits shouldn't invalidate any
>>> segment-level caches (but I'm not sure whether the sorting buffers
>>> are low-level or not).
>>>
>>> About the only other thing I can think of is that you're somehow
>>> doing hard commits from, say, the client but that's really
>>> stretching.
>>>
>>> All I can really say at this point is that this isn't a problem I've seen
>>> before, so it's _likely_ some innocent-seeming config has changed.
>>> I'm sure it'll be obvious once you find it ...
>>>
>>> Erick
>>>
>>> On Wed, Jun 12, 2013 at 11:51 PM, Shane Perry  wrote:
>>> > Erick,
>>> >
>>> > I agree, it doesn't make sense.  I manually merged the solrconfig.xml
>>> from
>>> > the distribution example with my 3.6 solrconfig.xml, pulling out what I
>>> > didn't need.  There is the possibility I removed something I shouldn't
>>> have
>>> > though I don't know what it would be.  Minus removing the dynamic
>>> fields, a
>>> > custom tokenizer class, and changing all my fields to be stored, the
>>> > schema.xml file should be the same as well.  I'm not currently in the
>>> > position to do so, but I'll double check those two files.  Finally, the
>>> > data was re-indexed when I moved to 4.3.
>>> >
>>> > My statement about field values wasn't stated very well.  What I meant
>>> is
>>> > that the 'text' field has more unique terms than some of my other
>>> fields.
>>> >
>>> > As for this being an edge case, I'm not sure why it would manifest
>>> itself
>>> > in 4.3 but not in 3.6 (short of me having a screwy configuration
>>> setting).
>>> >  If I get a chance, I'll see if I can duplicate the behavior with a
>>> small
>>> > document count in a sandboxed environment.
>>> >
>>> > Shane
>>> >
>>> > On Wed, Jun 12, 2013 at 5:14 PM, Erick Erickson <
>>> erickerick...@gmail.com>wrote:
>>

Re: Sorting by field is slow

2013-06-17 Thread Shane Perry
Turns out it was a case of an oversight.  My warming queries weren't setting
the sort order and as a result didn't successfully complete.  After setting
the sort order, things appear to be responding quickly.
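In solrconfig.xml terms, the fix amounts to adding the sort to each warming entry; a sketch reusing the query text and field name from the earlier message (the newSearcher listener takes the same shape):

```xml
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">static firstSearcher warming in solrconfig.xml</str>
      <str name="sort">field4 asc</str> <!-- the missing piece: warms the sort cache -->
    </lst>
  </arr>
</listener>
```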

Thanks for the help.

On Mon, Jun 17, 2013 at 9:45 AM, Shane Perry  wrote:

> Using 4.3.1-SNAPSHOT I have identified where the issue is occurring.  For
> a query in the format (it returns one document, sorted by field4)
>
> +(field0:UUID0) -field1:string0 +field2:string1 +field3:text0
> +field4:"text1"
>
>
> with the field types
>
> [the fieldType XML definitions were stripped from the archived message]
>
>
>
> In the method FieldCacheImpl$SortedDocValuesCache#createValue, the reader
> reports 2640449 terms.  As a result, the loop on line 1198 is executed
> 2640449 times and the inner loop is executed a total of 658310778 times.  My
> index contains 56180128 documents.
>
> My configuration file sets the warming queries for the newSearcher and
> firstSearcher listeners to the value
>
> 
>static firstSearcher warming in solrconfig.xml
>field4
> 
>
>
> which does not appear to affect the speed.  I'm not sure how replication
> plays into the equation outside the fact that we are relatively aggressive
> on the replication (every 60 seconds).  I fear I may be at the end of my
> knowledge without really getting into the code so any help at this point
> would be greatly appreciated.
>
> Shane
>
> On Thu, Jun 13, 2013 at 4:11 PM, Shane Perry  wrote:
>
>> I've dug through the code and have narrowed the delay down
>> to TopFieldCollector$OneComparatorNonScoringCollector.setNextReader() at
>> the point where the comparator's setNextReader() method is called (line 98
>> in the lucene_solr_4_3 branch).  That line is actually two method calls so
>> I'm not yet certain which path is the cause.  I'll continue to dig through
>> the code but am on thin ice so input would be great.
>>
>> Shane
>>
>>
>> On Thu, Jun 13, 2013 at 7:56 AM, Shane Perry  wrote:
>>
>>> Erick,
>>>
>>> We do have soft commits turned.  Initially, autoCommit was set at 15000
>>> and autoSoftCommit at 1000.  We did up those to 120 and 60
>>> respectively.  However, since the core in question is a slave, we don't
>>> actually do writes to the core but rely on replication only to populate the
>>> index.  In this case wouldn't autoCommit and autoSoftCommit essentially be
>>> no-ops?  I thought I had pulled out all hard commits but a double check
>>> shows one instance where it still occurs.
>>>
>>> Thanks for your time.
>>>
>>> Shane
>>>
>>> On Thu, Jun 13, 2013 at 5:19 AM, Erick Erickson >> > wrote:
>>>
>>>> Shane:
>>>>
>>>> You've covered all the config stuff that I can think of. There's one
>>>> other possibility. Do you have the soft commits turned on and are
>>>> they very short? Although soft commits shouldn't invalidate any
>>>> segment-level caches (but I'm not sure whether the sorting buffers
>>>> are low-level or not).
>>>>
>>>> About the only other thing I can think of is that you're somehow
>>>> doing hard commits from, say, the client but that's really
>>>> stretching.
>>>>
>>>> All I can really say at this point is that this isn't a problem I've
>>>> seen
>>>> before, so it's _likely_ some innocent-seeming config has changed.
>>>> I'm sure it'll be obvious once you find it ...
>>>>
>>>> Erick
>>>>
>>>> On Wed, Jun 12, 2013 at 11:51 PM, Shane Perry 
>>>> wrote:
>>>> > Erick,
>>>> >
>>>> > I agree, it doesn't make sense.  I manually merged the solrconfig.xml
>>>> from
>>>> > the distribution example with my 3.6 solrconfig.xml, pulling out what
>>>> I
>>>> > didn't need.  There is the possibility I removed something I
>>>> shouldn't have
>>>> > though I don't know what it would be.  Minus removing the dynamic
>>>> fields, a
>>>> > custom tokenizer class, and changing all my fields to be stored, the
>>>> > schema.xml file should be the same as well.  I'm not currently in the
>>>> > position to do so, but I'll double check those two files.  Finally,
>>>> the
>>>> > data was re-indexed whe

Re: solr performance problem from 4.3.0 with sorting

2013-06-20 Thread Shane Perry
Ariel,

I just went up against a similar issue when upgrading from 3.6.1 to 4.3.0.
In my case, my solrconfig.xml for 4.3.0 (which was based on my 3.6.1 file)
did not provide a newSearcher or firstSearcher warming query.  After adding
a query to each listener, my query speeds drastically increased.  Check
your config file, and if you aren't defining a query (make sure to sort it
on the field in question), do so.

Shane

On Thu, Jun 20, 2013 at 3:45 AM, Ariel Zerbib wrote:

> Hi,
>
> We updated to version 4.3.0 from 4.2.1 and we have some performance
> problem with the sorting.
>
>
> A query that returns 1 hit has a query time of more than 100 ms (sometimes
> more than 1 s), against less than 10 ms for the same query without the
> sort parameter:
>
> query with sorting option:
> q=level_4_id:531044&sort=level_4_id+asc
> response:
> - numFound: 1
> - QTime: 106
>
>
> query without sorting option: q=level_4_id:531024
> - numFound: 1
> - QTime: 
>
> the field level_4_id is unique and defined as a long.
>
> In version 4.2.1, performance was identical.  The 4.3.1 version
> has the same behavior as version 4.3.0.
>
> Thanks,
> Ariel
>


ClassCastException when using FieldAnalysisRequest

2011-10-14 Thread Shane Perry
Hi,

Using Solr 3.4.0, I am trying to do a field analysis via the
FieldAnalysisRequest feature in solrj.  During the process() call, the
following ClassCastException is thrown:

java.lang.ClassCastException: java.lang.String cannot be cast to java.util.List
       at 
org.apache.solr.client.solrj.response.AnalysisResponseBase.buildPhases(AnalysisResponseBase.java:69)
       at 
org.apache.solr.client.solrj.response.FieldAnalysisResponse.setResponse(FieldAnalysisResponse.java:66)
       at 
org.apache.solr.client.solrj.request.FieldAnalysisRequest.process(FieldAnalysisRequest.java:107)

My code is as follows:

FieldAnalysisRequest request = new FieldAnalysisRequest(myUri).
  addFieldName(field).
  setFieldValue(text).
  setQuery(text);

request.process(myServer);

Is there something I am doing wrong?  Any help would be appreciated.

Sincerely,

Shane


Re: ClassCastException when using FieldAnalysisRequest

2011-10-14 Thread Shane Perry
After looking at this more, it appears that
solr.HTMLStripCharFilterFactory does not return a list, which
AnalysisResponseBase is expecting.  I have created a bug ticket
(https://issues.apache.org/jira/browse/SOLR-2834).

On Fri, Oct 14, 2011 at 8:28 AM, Shane Perry  wrote:
> Hi,
>
> Using Solr 3.4.0, I am trying to do a field analysis via the
> FieldAnalysisRequest feature in solrj.  During the process() call, the
> following ClassCastException is thrown:
>
> java.lang.ClassCastException: java.lang.String cannot be cast to 
> java.util.List
>        at 
> org.apache.solr.client.solrj.response.AnalysisResponseBase.buildPhases(AnalysisResponseBase.java:69)
>        at 
> org.apache.solr.client.solrj.response.FieldAnalysisResponse.setResponse(FieldAnalysisResponse.java:66)
>        at 
> org.apache.solr.client.solrj.request.FieldAnalysisRequest.process(FieldAnalysisRequest.java:107)
>
> My code is as follows:
>
> FieldAnalysisRequest request = new FieldAnalysisRequest(myUri).
>  addFieldName(field).
>  setFieldValue(text).
>  setQuery(text);
>
> request.process(myServer);
>
> Is there something I am doing wrong?  Any help would be appreciated.
>
> Sincerely,
>
> Shane
>


DIH - Closing ResultSet in JdbcDataSource

2011-01-07 Thread Shane Perry
Hi,

I am in the process of migrating our system from Postgres 8.4 to Solr
1.4.1.  Our system is fairly complex and as a result, I have had to define
19 base entities in the data-config.xml definition file.  Each of these
entities executes 5 queries.  When doing a full-import, as each entity
completes, the server hosting Postgres shows 5 "idle in transaction"
connections for the entity.

In digging through the code, I found that the JdbcDataSource wraps the
ResultSet object in a custom ResultSetIterator object, leaving the ResultSet
open.  Walking through the code I can't find a close() call anywhere on the
ResultSet.  I believe this results in the "idle in transaction" processes.

Am I off base here?  I'm not sure what the overall implications are of the
"idle in transaction" processes, but is there a way I can get around the
issue without importing each entity manually?  Any feedback would be greatly
appreciated.
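As background for the follow-up discussion, the pattern DIH's ResultSetIterator turns out to use (closing the resource from inside hasNext() once the data runs out) can be sketched in plain Java; the types here are illustrative stand-ins, not Solr classes:

```java
import java.util.*;

public class ClosingIterator<T> implements Iterator<T> {
    // Loosely mirrors DIH's ResultSetIterator: the wrapped resource is
    // closed inside hasNext() when the data is exhausted, not by the caller.
    private final Iterator<T> delegate;
    private final AutoCloseable resource;
    private boolean closed = false;

    ClosingIterator(Iterator<T> delegate, AutoCloseable resource) {
        this.delegate = delegate;
        this.resource = resource;
    }

    public boolean hasNext() {
        boolean more = delegate.hasNext();
        if (!more && !closed) {
            try { resource.close(); } catch (Exception ignored) {}
            closed = true;
        }
        return more;
    }

    public T next() { return delegate.next(); }

    public static void main(String[] args) {
        boolean[] open = { true };
        AutoCloseable fakeResultSet = () -> open[0] = false; // stand-in for rs.close()
        ClosingIterator<Integer> it =
                new ClosingIterator<>(Arrays.asList(1, 2).iterator(), fakeResultSet);
        while (it.hasNext()) it.next();      // drain the iterator fully
        System.out.println(open[0]);         // false: closed once iteration finished
    }
}
```

The catch with this pattern is that the resource is only released if the caller drains the iterator, so anything holding results open mid-iteration would leave the connection pinned.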

Thanks in advance,

Shane


Re: DIH - Closing ResultSet in JdbcDataSource

2011-01-10 Thread Shane Perry
Gora,

Thanks for the response.  After taking another look, you are correct about
the hasnext() closing the ResultSet object (1.4.1 as well as 1.4.0).  I
didn't recognize the case difference in the two function calls, so missed
it.  I'll keep looking into the original issue and reply if I find a
cause/solution.

Shane

On Sat, Jan 8, 2011 at 4:04 AM, Gora Mohanty  wrote:

> On Sat, Jan 8, 2011 at 1:10 AM, Shane Perry  wrote:
> > Hi,
> >
> > I am in the process of migrating our system from Postgres 8.4 to Solr
> > 1.4.1.  Our system is fairly complex and as a result, I have had to
> define
> > 19 base entities in the data-config.xml definition file.  Each of these
> > entities executes 5 queries.  When doing a full-import, as each entity
> > completes, the server hosting Postgres shows 5 "idle in transaction" for
> the
> > entity.
> >
> > In digging through the code, I found that the JdbcDataSource wraps the
> > ResultSet object in a custom ResultSetIterator object, leaving the
> ResultSet
> > open.  Walking through the code I can't find a close() call anywhere on
> the
> > ResultSet.  I believe this results in the "idle in transaction"
> processes.
> [...]
>
> Have not examined the "idle in transaction" issue that you
> mention, but the ResultSet object in a ResultSetIterator is
> closed in the private hasnext() method, when there are no
> more results, or if there is an exception. hasnext() is called
> by the public hasNext() method that should be used in
> iterating over the results, so I see no issue there.
>
> Regards,
> Gora
>
> P.S. This is from Solr 1.4.0 code, but I would not think that
>this part of the code would have changed.
>
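Gora's explanation can be mimicked with a small stub in place of a real JDBC ResultSet (all names here are illustrative, not DIH's actual classes): the wrapped result set is only closed when iteration runs off the end, so a result set that is never fully consumed stays open.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Stub standing in for a JDBC ResultSet (simplified model, not java.sql).
class FakeResultSet {
    private final int rows;
    private int cursor = 0;
    boolean closed = false;
    FakeResultSet(int rows) { this.rows = rows; }
    boolean next() { return ++cursor <= rows; }
    int getRow() { return cursor; }
    void close() { closed = true; }
}

// Mirrors the shape of DIH's ResultSetIterator: the wrapped result set is
// closed only when advancing past the last row.
class ResultSetIterator implements Iterator<Integer> {
    private final FakeResultSet rs;
    private Integer pending;
    ResultSetIterator(FakeResultSet rs) { this.rs = rs; advance(); }
    private void advance() {
        if (rs.next()) { pending = rs.getRow(); }
        else { pending = null; rs.close(); }  // close only at exhaustion
    }
    public boolean hasNext() { return pending != null; }
    public Integer next() {
        if (pending == null) throw new NoSuchElementException();
        Integer out = pending;
        advance();
        return out;
    }
}

public class Demo {
    public static void main(String[] args) {
        FakeResultSet rs = new FakeResultSet(3);
        ResultSetIterator it = new ResultSetIterator(rs);
        while (it.hasNext()) it.next();
        System.out.println("closed=" + rs.closed);  // true once fully consumed
    }
}
```

This is consistent with both observations on the thread: fully-consumed result sets do get closed, while the connections themselves can still sit open, which is what the follow-up message below investigates.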


Re: DIH - Closing ResultSet in JdbcDataSource

2011-01-11 Thread Shane Perry
By placing some strategic debug messages, I have found that the JDBC
connections are not being closed until all <entity> elements have been
processed (in the entire config file).  A simplified example would be:


  
  

  

  ... field list ...
  
... field list ...
  
   

  ... field list ...
  
... field list ...
  
   
  


The behavior is:

1. JDBC connection opened for entity1 and entity1a - applicable queries run
   and ResultSet objects processed.
2. All open ResultSet and Statement objects closed for entity1 and entity1a.
3. JDBC connection opened for entity2 and entity2a - applicable queries run
   and ResultSet objects processed.
4. All open ResultSet and Statement objects closed for entity2 and entity2a.
5. All JDBC connections (none were closed before this point) are closed.

In my instance, I have some 95 unique <entity> elements (19 parents with 5
children each), resulting in 95 open JDBC connections.  If I understand the
process correctly, it should be safe to close the JDBC connection for a
"root" <entity> (immediate children of <document>) and all descendant
<entity> elements once the parent has been successfully completed.  I have
been digging around the code, but due to my unfamiliarity with the code, I'm
not sure where this would occur.

Is this a valid solution?  It's looking like I should probably open a defect
and I'm willing to do so along with submitting a patch, but need a little
more direction on where the fix would best reside.
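In sketch form, the idea is to scope connections to a root entity and close them as soon as that entity and its children complete, rather than holding every connection until the whole import finishes (illustrative classes only, not DIH's real ones):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative stand-in for a closable JDBC data source.
class TrackedDataSource implements AutoCloseable {
    final String name;
    boolean closed = false;
    TrackedDataSource(String name) { this.name = name; }
    @Override public void close() { closed = true; }
}

public class RootEntityScope {
    public static void main(String[] args) {
        String[] roots = { "entity1", "entity2" };
        for (String root : roots) {
            // Sources opened while processing this root entity and its children.
            Deque<TrackedDataSource> opened = new ArrayDeque<>();
            opened.push(new TrackedDataSource(root));
            opened.push(new TrackedDataSource(root + "a"));
            // ... run queries, iterate result sets ...
            // Close as soon as the root entity completes, instead of
            // waiting for the end of the entire import.
            while (!opened.isEmpty()) {
                opened.pop().close();
            }
            System.out.println(root + ": sources closed");
        }
    }
}
```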

Thanks,

Shane


On Mon, Jan 10, 2011 at 7:14 AM, Shane Perry  wrote:

> Gora,
>
> Thanks for the response.  After taking another look, you are correct about
> the hasnext() closing the ResultSet object (1.4.1 as well as 1.4.0).  I
> didn't recognize the case difference in the two function calls, so missed
> it.  I'll keep looking into the original issue and reply if I find a
> cause/solution.
>
> Shane
>
>
> On Sat, Jan 8, 2011 at 4:04 AM, Gora Mohanty  wrote:
>
>> On Sat, Jan 8, 2011 at 1:10 AM, Shane Perry  wrote:
>> > Hi,
>> >
>> > I am in the process of migrating our system from Postgres 8.4 to Solr
>> > 1.4.1.  Our system is fairly complex and as a result, I have had to
>> define
>> > 19 base entities in the data-config.xml definition file.  Each of these
>> > entities executes 5 queries.  When doing a full-import, as each entity
>> > completes, the server hosting Postgres shows 5 "idle in transaction" for
>> the
>> > entity.
>> >
>> > In digging through the code, I found that the JdbcDataSource wraps the
>> > ResultSet object in a custom ResultSetIterator object, leaving the
>> ResultSet
>> > open.  Walking through the code I can't find a close() call anywhere on
>> the
>> > ResultSet.  I believe this results in the "idle in transaction"
>> processes.
>> [...]
>>
>> Have not examined the "idle in transaction" issue that you
>> mention, but the ResultSet object in a ResultSetIterator is
>> closed in the private hasnext() method, when there are no
>> more results, or if there is an exception. hasnext() is called
>> by the public hasNext() method that should be used in
>> iterating over the results, so I see no issue there.
>>
>> Regards,
>> Gora
>>
>> P.S. This is from Solr 1.4.0 code, but I would not think that
>>this part of the code would have changed.
>>
>
>


Re: DIH - Closing ResultSet in JdbcDataSource

2011-01-12 Thread Shane Perry
I have found where a root entity has completed processing and added the
logic to clear the entity's cache at that point (didn't change any of the
logic for clearing all entity caches once the import has completed).  I have
also created an enhancement request found at
https://issues.apache.org/jira/browse/SOLR-2313.

On Tue, Jan 11, 2011 at 2:54 PM, Shane Perry  wrote:

> By placing some strategic debug messages, I have found that the JDBC
> connections are not being closed until all <entity> elements have been
> processed (in the entire config file).  A simplified example would be:
>
> <dataConfig>
>   <dataSource name="ds1" driver="org.postgresql.Driver"
>       url="jdbc:postgresql://localhost:5432/db1" user="..." password="..." />
>   <dataSource name="ds2" driver="org.postgresql.Driver"
>       url="jdbc:postgresql://localhost:5432/db2" user="..." password="..." />
>   <document>
>     <entity name="entity1" dataSource="ds1" query="...">
>       ... field list ...
>       <entity name="entity1a" dataSource="ds1" query="...">
>         ... field list ...
>       </entity>
>     </entity>
>     <entity name="entity2" dataSource="ds2" query="...">
>       ... field list ...
>       <entity name="entity2a" dataSource="ds2" query="...">
>         ... field list ...
>       </entity>
>     </entity>
>   </document>
> </dataConfig>
>
> The behavior is:
>
> 1. JDBC connection opened for entity1 and entity1a - applicable queries run
>    and ResultSet objects processed.
> 2. All open ResultSet and Statement objects closed for entity1 and entity1a.
> 3. JDBC connection opened for entity2 and entity2a - applicable queries run
>    and ResultSet objects processed.
> 4. All open ResultSet and Statement objects closed for entity2 and entity2a.
> 5. All JDBC connections (none were closed before this point) are closed.
>
> In my instance, I have some 95 unique <entity> elements (19 parents with 5
> children each), resulting in 95 open JDBC connections.  If I understand the
> process correctly, it should be safe to close the JDBC connection for a
> "root" <entity> (immediate children of <document>) and all descendant
> <entity> elements once the parent has been successfully completed.  I have
> been digging around the code, but due to my unfamiliarity with the code, I'm
> not sure where this would occur.
>
> Is this a valid solution?  It's looking like I should probably open a
> defect and I'm willing to do so along with submitting a patch, but need a
> little more direction on where the fix would best reside.
>
> Thanks,
>
> Shane
>
>
>
> On Mon, Jan 10, 2011 at 7:14 AM, Shane Perry  wrote:
>
>> Gora,
>>
>> Thanks for the response.  After taking another look, you are correct about
>> the hasnext() closing the ResultSet object (1.4.1 as well as 1.4.0).  I
>> didn't recognize the case difference in the two function calls, so missed
>> it.  I'll keep looking into the original issue and reply if I find a
>> cause/solution.
>>
>> Shane
>>
>>
>> On Sat, Jan 8, 2011 at 4:04 AM, Gora Mohanty  wrote:
>>
>>> On Sat, Jan 8, 2011 at 1:10 AM, Shane Perry  wrote:
>>> > Hi,
>>> >
>>> > I am in the process of migrating our system from Postgres 8.4 to Solr
>>> > 1.4.1.  Our system is fairly complex and as a result, I have had to
>>> define
>>> > 19 base entities in the data-config.xml definition file.  Each of these
>>> > entities executes 5 queries.  When doing a full-import, as each entity
>>> > completes, the server hosting Postgres shows 5 "idle in transaction"
>>> for the
>>> > entity.
>>> >
>>> > In digging through the code, I found that the JdbcDataSource wraps the
>>> > ResultSet object in a custom ResultSetIterator object, leaving the
>>> ResultSet
>>> > open.  Walking through the code I can't find a close() call anywhere on
>>> the
>>> > ResultSet.  I believe this results in the "idle in transaction"
>>> processes.
>>> [...]
>>>
>>> Have not examined the "idle in transaction" issue that you
>>> mention, but the ResultSet object in a ResultSetIterator is
>>> closed in the private hasnext() method, when there are no
>>> more results, or if there is an exception. hasnext() is called
>>> by the public hasNext() method that should be used in
>>> iterating over the results, so I see no issue there.
>>>
>>> Regards,
>>> Gora
>>>
>>> P.S. This is from Solr 1.4.0 code, but I would not think that
>>>this part of the code would have changed.
>>>
>>
>>
>


Writing on master while replicating to slave

2011-02-10 Thread Shane Perry
Hi,

When a slave is replicating from the master instance, it appears a
write lock is created.  Will this lock cause issues with writing to the
master while the replication is occurring, or does Solr have some
queuing that defers the actual write until the replication
is complete?  I've been looking around but can't seem to find anything
definitive.

My application's data is user centric and as a result the application
does a lot of updates and commits.  Additionally, we want to provide
near real-time searching and so replication would have to occur
aggressively.  Does anybody have any strategies for handling such an
application which they would be willing to share?

Thanks,

Shane


Re: rejected email

2011-02-10 Thread Shane Perry
I tried posting from gmail this morning and had it rejected.  When I
resent as plaintext, it went through.

On Thu, Feb 10, 2011 at 11:51 AM, Erick Erickson wrote:
> Anyone else having problems with the Solr users list suddenly deciding
> everything you send is spam? For the last couple of days I've had this
> happening from gmail, and as far as I know I haven't changed anything that
> would give my mails a different "spam score" which is being exceeded
> according to the bounced message...
>
> Thanks,
> Erick
>


Re: Omit hour-min-sec in search?

2011-03-03 Thread Shane Perry
Not sure if there is a means of doing explicitly what you ask, but you
could do a date range:

+mydate:[YYYY-MM-DDT00:00:00Z TO YYYY-MM-DDT23:59:59Z]
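Since Solr's range syntax needs full ISO-8601 timestamps, a tiny helper (hypothetical, not part of solrj; the field name "mydate" is just an example) can expand a bare date into a whole-day range clause:

```java
public class DateRangeQuery {
    // Expands a bare YYYY-MM-DD date into a Solr range clause that
    // covers the entire day for the given field.
    public static String dayRange(String field, String day) {
        return "+" + field + ":[" + day + "T00:00:00Z TO "
                + day + "T23:59:59.999Z]";
    }

    public static void main(String[] args) {
        System.out.println(dayRange("mydate", "2008-02-26"));
        // +mydate:[2008-02-26T00:00:00Z TO 2008-02-26T23:59:59.999Z]
    }
}
```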

On Thu, Mar 3, 2011 at 9:14 AM, bbarani  wrote:
> Hi,
>
> Is there a way to omit hour-min-sec in SOLR date field during search?
>
> I have indexed a field using TrieDateField and seems like it uses UTC
> format. The dates get stored as below,
>
> <date name="lastupdateddate">2008-02-26T20:40:30.94Z</date>
>
> I want to do a search based on just YYYY-MM-DD and omit T20:40:30.94Z.  Not
> sure if it's feasible, just want to check if it's possible.
>
> Also most of the data in our source doesn't have time information, hence we
> are very much interested in just storing the date without time, or even if
> it's stored with some default timestamp we want to search just using the date
> without using the timestamp.
>
> Thanks,
> Barani
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Omit-hour-min-sec-in-search-tp2625840p2625840.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


ICUTokenizer ArrayIndexOutOfBounds

2012-10-17 Thread Shane Perry
Hi,

I've been playing around with using the ICUTokenizer from 4.0.0.
Using the code below, I was receiving an ArrayIndexOutOfBounds
exception on the call to tokenizer.incrementToken().  Looking at the
ICUTokenizer source, I can see why this is occurring (usableLength
defaults to -1).

ICUTokenizer tokenizer = new ICUTokenizer(myReader);
CharTermAttribute termAtt = tokenizer.getAttribute(CharTermAttribute.class);

while (tokenizer.incrementToken())
{
    System.out.println(termAtt.toString());
}

After poking around a little more, I found that I can just call
tokenizer.reset() (initializes usableLength to 0) right after
constructing the object
(org.apache.lucene.analysis.icu.segmentation.TestICUTokenizer does a
similar step in its superclass).  I was wondering if someone could
explain why I need to call tokenizer.reset() prior to using the
tokenizer for the first time.
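The failure mode can be reproduced with a plain stub (not Lucene code): a source whose internal cursor starts at -1 and only becomes valid after reset(), mirroring the usableLength behavior described above.

```java
// Stand-in for a tokenizer whose internal state is invalid until reset().
class CursorSource {
    private final int[] data = { 10, 20, 30 };
    int usableLength = -1;  // invalid until reset(), as described in the report

    void reset() { usableLength = 0; }

    // Throws ArrayIndexOutOfBoundsException when reset() was never called,
    // because the negative length gets used as an array index.
    boolean increment() {
        if (usableLength < 0) {
            return data[usableLength] > 0;  // AIOOBE here
        }
        return usableLength++ < data.length;
    }
}

public class ResetDemo {
    public static void main(String[] args) {
        CursorSource s = new CursorSource();
        try {
            s.increment();
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("AIOOBE without reset()");
        }
        s.reset();
        System.out.println("works after reset(): " + s.increment());
    }
}
```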

Thanks in advance,

Shane