Faceting and Grouping Performance Degradation in Solr 5
I was recently attempting to upgrade from Solr 4.8.1 to Solr 5.4.1 but had to abort because average response times degraded relative to a baseline volume performance test. The affected queries involved faceting (both the enum method and the default) and grouping. There is a critical bug, https://issues.apache.org/jira/browse/SOLR-8096, currently open, which I gather is the cause of the slower response times. One concern I have is that discussions around the issue suggest indexing with docValues, which alleviated the problem in at least one reported case. However, indexing with docValues did not improve performance in my case. Can someone please confirm or correct my understanding that this issue has no path forward at this time, and specifically that it is already known that docValues does not necessarily solve it? Thanks in advance!
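For reference, the docValues suggestion from the JIRA discussion amounts to a schema change along these lines (field and type names here are placeholders, and a full re-index is required afterward):

    <field name="category" type="string" indexed="true" stored="true"
           multiValued="true" docValues="true"/>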
Indexing a (File attached to a document)
Hi, If I index a document with a file attachment in Solr, can I also view the data of that attachment when querying that document? Please help me with this. Thanks & Regards, Vidya Nadella
Re: Faceting and Grouping Performance Degradation in Solr 5
Does anyone know the answer to this?

On Wed, May 4, 2016 at 2:19 PM, Solr User wrote:
> [original question snipped; see above]
Re: Faceting and Grouping Performance Degradation in Solr 5
Joel,

Thank you for taking the time to respond to my question. I tried the JSON Facet API for one query that uses facet.method=enum (since this one has a ton of unique values and performed better with enum), but it was way slower than even the slower Solr 5 times. I did not try the new API with the non-enum queries, though, so I will give that a go. It looks like Solr 5.5.1 also has facet.method=uif, which will be interesting to try. If these do not prove helpful, it looks like I will need to wait for SOLR-8096 to be resolved before upgrading.

Thanks also for your comment on top_fc for the CollapsingQParser. I use collapse/expand for some queries but traditional grouping for others due to performance. It will be interesting to see if those grouping queries perform better now using CollapsingQParser with top_fc.

On Wed, May 18, 2016 at 11:39 AM, Joel Bernstein wrote:
> Yes, SOLR-8096 is the issue here.
>
> I don't believe indexing with docValues is going to help too much with
> this. The enum slowness may not be related, but I'm not positive about
> that.
>
> The major slowdowns are likely due to the removal of the top-level
> FieldCache from general use and the removal of the FieldValuesCache, which
> was used for multi-value field faceting.
>
> The JSON facet API covers all the functionality in the traditional
> faceting, and it has been developed to be very performant.
>
> You may also want to see if Collapse/Expand can meet your application's
> needs rather than Grouping. It allows you to specify using a top-level
> FieldCache if performance is a blocker without it.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> [earlier messages snipped]
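For anyone following this thread, the two alternatives discussed above look roughly like this; collection and field names are placeholders. Traditional faceting with the uif method:

    curl 'http://localhost:8983/solr/mycollection/select?q=*:*&rows=0&facet=true&facet.field=cat&facet.method=uif'

The equivalent request through the JSON Facet API:

    curl 'http://localhost:8983/solr/mycollection/select' -d 'q=*:*&rows=0&json.facet={cats:{type:terms,field:cat,limit:10}}'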
Re: Indexing a (File attached to a document)
Hi, I am using the MapReduceIndexerTool to index data from HDFS, using Morphlines as the ETL tool, and specifying the data paths as XPaths in the morphline file. Sorry for the delay.
Re: Faceting and Grouping Performance Degradation in Solr 5
Thanks again for your work on honoring the facet.method. I have an observation that I would like to share and get your feedback on if possible.

I performance tested Solr 5.5.2 with various facet queries, and the only way I get results comparable to Solr 4.8.1 is when I expungeDeletes. Is it possible that Solr 5 is not ignoring deletes as efficiently as Solr 4? Here are the details.

Scenario #1: Using facet.method=uif with faceting on several multi-valued fields.
4.8.1 (with deletes): 115 ms
5.5.2 (with deletes): 155 ms
5.5.2 (without deletes): 125 ms
5.5.2 (1 segment without deletes): 44 ms

Scenario #2: Using facet.method=enum with faceting on several multi-valued fields. These fields are different from Scenario #1 and perform much better with enum, hence that method is used instead.
4.8.1 (with deletes): 38 ms
5.5.2 (with deletes): 49 ms
5.5.2 (without deletes): 42 ms
5.5.2 (1 segment without deletes): 34 ms

On Tue, May 31, 2016 at 11:57 AM, Alessandro Benedetti <abenede...@apache.org> wrote:
> Interesting developments:
>
> https://issues.apache.org/jira/browse/SOLR-9176
>
> I think we found why term enum seems slower in recent Solr!
> In our case it is likely to be related to the commit I mention in the Jira.
> Have a check, Joel!
>
> On Wed, May 25, 2016 at 12:30 PM, Alessandro Benedetti <abenede...@apache.org> wrote:
>
> > I am investigating this scenario right now.
> > I can confirm that the enum slowness is in Solr 6.0 as well.
> > And I agree with Joel, it seems to be unrelated to the famous faceting
> > regression :(
> >
> > Furthermore, with the legacy facet approach, if you set docValues for the
> > field you are not going to be able to try the enum approach anymore.
> >
> > org/apache/solr/request/SimpleFacets.java:448
> >
> > if (method == FacetMethod.ENUM && sf.hasDocValues()) {
> >   // only fc can handle docvalues types
> >   method = FacetMethod.FC;
> > }
> >
> > I got really horrible regressions simply using term enum in both Solr 4
> > and Solr 6.
> >
> > And even the most optimized fcs approach with docValues and
> > facet.threads=nCore does not perform as well as the simple enum in Solr 4;
> > e.g., for some sample queries I have 40 ms vs 160 ms and similar.
> > I think we should open an issue if we can confirm it is not related to
> > the other one.
> > A lot of people will continue using the legacy approach for a while...
> >
> > On Wed, May 18, 2016 at 10:42 PM, Joel Bernstein wrote:
> >
> >> The enum slowness is interesting. It would appear on the surface to not be
> >> related to the FieldCache issue. I don't think the main emphasis of the
> >> JSON facet API has been the enum approach. You may find using the JSON
> >> facet API and eliminating the use of enum meets your performance needs.
> >>
> >> With the CollapsingQParserPlugin, top_fc is definitely faster during
> >> queries. The tradeoff is slower warming times and increased memory usage if
> >> the collapse fields are used in faceting, as faceting will load the field
> >> into a different cache.
> >>
> >> Joel Bernstein
> >> http://joelsolr.blogspot.com/
> >>
> >> [earlier messages snipped]
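For anyone repeating this comparison, an expungeDeletes pass can be issued as an option on a commit, e.g. (URL is a placeholder):

    curl 'http://localhost:8983/solr/mycollection/update?commit=true&expungeDeletes=true'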
Re: Faceting and Grouping Performance Degradation in Solr 5
Further testing indicates that any performance difference is not due to deletes. Both Solr 4.8.1 and Solr 5.5.2 benefited from removing deletes, and the times appear to converge on an optimized index. Below are the details. I am not sure what else to make of this at this point, other than moving forward with an upgrade, with an optimized index wherever possible.

Scenario #1: Using facet.method=uif with faceting on several multi-valued fields.
4.8.1 (with deletes): 115 ms
5.5.2 (with deletes): 155 ms
4.8.1 (without deletes): 104 ms
5.5.2 (without deletes): 125 ms
4.8.1 (1 segment without deletes): 55 ms
5.5.2 (1 segment without deletes): 44 ms

Scenario #2: Using facet.method=enum with faceting on several multi-valued fields. These fields are different from Scenario #1 and perform much better with enum, hence that method is used instead.
4.8.1 (with deletes): 38 ms
5.5.2 (with deletes): 49 ms
4.8.1 (without deletes): 35 ms
5.5.2 (without deletes): 42 ms
4.8.1 (1 segment without deletes): 28 ms
5.5.2 (1 segment without deletes): 34 ms

On Tue, Sep 27, 2016 at 3:45 AM, Alessandro Benedetti wrote:
> Hi!
> At the time we didn't investigate the deletion implication at all.
> This can be interesting.
> If you proceed with your investigations and discover what changed in the
> deletion approach, I would be more than happy to help!
>
> Cheers
>
> [earlier messages snipped]
Re: Faceting and Grouping Performance Degradation in Solr 5
Certainly. And I would of course welcome anyone else to test this for themselves, especially with facet.method=uif, to see if that has indeed bridged the gap between Solr 4 and Solr 5. I would be very happy if my testing turned out to be invalid due to variance, a problem in process, etc. One thing I was pondering is whether I should force merge the index to a certain number of segments, because indexing yields a random number of segments and deletions. The only thing stopping me short of doing that were observations of longer Solr 4 times even with more deletions and a similar number of segments.

We use Soasta as our testing tool. Before testing, load is sent for 10-15 minutes to make sure any Solr caches have stabilized. Then the test is run for 30 minutes of steady volume, with Scenario #1 tested at 15 req/sec and Scenario #2 tested at 100 req/sec. Each request is different, with input being pulled from data files. The requests are repeatable test to test.

The numbers posted above are average response times as reported by Soasta. However, the respective time differences are supported by Splunk, which indexes the Solr logs, and Dynatrace, which is instrumented on one of the JVMs.

The versions are deployed to the same machines, thereby overlaying the previous installation. Going from Solr 4 to Solr 5, full indexing is run with the same input data. Being in SolrCloud mode, the full indexing consists of indexing all documents and then deleting any that were not touched. Going from Solr 5 back to Solr 4, the snapshot is restored, since Solr 4 will not load a Solr 5 index. Testing Solr 4 after reverting yields the same results as the previous Solr 4 test.

On Wed, Sep 28, 2016 at 4:02 AM, Toke Eskildsen wrote:
> On Tue, 2016-09-27 at 15:08 -0500, Solr User wrote:
> > Further testing indicates that any performance difference is not due
> > to deletes. Both Solr 4.8.1 and Solr 5.5.2 benefited from removing
> > deletes.
>
> Sanity check: Could you describe how you test?
>
> * How many queries do you issue for each test?
> * Is each query a new one, or do you re-use the same query?
> * Do you discard the first X calls?
> * Are the numbers averages, medians, or something third?
> * What do you do about disk cache?
> * Are both Solrs on the same machine?
> * Do they use the same index?
> * Do you alternate between testing 4.8.1 and 5.5.2 first?
>
> - Toke Eskildsen, State and University Library, Denmark
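The force merge pondered above would be a standard optimize call with a segment cap, along these lines (URL and segment count are placeholders):

    curl 'http://localhost:8983/solr/mycollection/update?optimize=true&maxSegments=8'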
Re: Faceting and Grouping Performance Degradation in Solr 5
I plan to re-test this in a separate environment that I have more control over and will share the results when I can.

On Wed, Sep 28, 2016 at 3:37 PM, Solr User wrote:
> [previous messages snipped; see above]
Re: Faceting and Grouping Performance Degradation in Solr 5
Below is some further testing. This was done in an environment that had no other queries or updates during testing. We ran through several scenarios; the results are in the table below. The times are average times in milliseconds. Same test methodology as above, except there was a 5-minute warmup and a 15-minute test.

Note that both the segment and deletion counts were recorded from only 1 out of 2 of the shards, so we cannot try to extrapolate a function between them and the outcome. In other words, just view them as "non-optimized" versus "optimized" and "has deletions" versus "no deletions". The only exceptions are that the 0-delete cases were true for both shards, and the 1-segment and 8-segment cases were true for both shards. A few of the tests were repeated as well.

The only conclusion that I could draw is that the number of segments and the number of deletes appear to greatly influence the response times, at least more than any difference in Solr version. There also appears to be some external contributor to variance, maybe network, etc.

Thoughts?

Date        Solr    Deleted Docs  Segments  facet.method=uif  Scenario #1  Scenario #2
9/29/2016   5.5.2   57873         34        YES               198          92
9/29/2016   5.5.2   57873         34        YES               210          88
9/29/2016   4.8.1   17695         18        N/A               145          59
9/30/2016   4.8.1   8593          27        N/A               186          62
9/30/2016   4.8.1   694           27        N/A               190          58
9/30/2016   5.5.2   593           34        YES               208          72
9/30/2016   5.5.2   694           34        YES               209          70
9/30/2016   5.5.2   57873         34        NO                210          77
9/30/2016   5.5.2   57873         34        NO                206          74
9/30/2016   5.5.2   57873         8         NO                109          68
9/30/2016   5.5.2   57873         8         YES               142          73
9/30/2016   5.5.2   0             1         YES               73           63
9/30/2016   5.5.2   0             1         NO                70           61
10/3/2016   4.8.1   0             8         N/A               160          66
10/3/2016   4.8.1   0             8         N/A               109          54
10/3/2016   4.8.1   0             1         N/A               83           52
10/3/2016   4.8.1   0             1         N/A               85           51

On Wed, Sep 28, 2016 at 4:44 PM, Solr User wrote:
> [previous messages snipped; see above]
ClassNotFoundException with Custom ZkACLProvider
This is mostly just an FYI regarding future work on issues like SOLR-8792.

I wanted admin update but world read on ZK, since I do not have anything sensitive from a read perspective in the Solr data and did not want to force all SolrCloud clients to implement authentication just for reads. So, I extended DefaultZkACLProvider and implemented a replacement for VMParamsAllAndReadonlyDigestZkACLProvider.

My custom code is loaded from the sharedLib in solr.xml. However, there is a temporary ZK lookup to read solr.xml (and the chroot), which is obviously done before loading sharedLib. Therefore, I am faced with a ClassNotFoundException. This has no negative effect on the ACL functionality, just the annoying stack trace in the logs. I do not want to package this custom code with the Solr code, and I do not want to package it along with Solr dependencies in the Jetty lib/ext.

So, I am planning to live with the stack trace and just wanted to share this for any future work on the dynamic solr.xml and chroot lookups, or in case I am missing some work-around.

Thanks!
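For anyone attempting the same customization, the override described above would look roughly like the following sketch. This is not the exact code from this setup; the system property names are assumptions borrowed from the VM-params provider, and error handling is minimal:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.common.cloud.DefaultZkACLProvider;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.data.ACL;
    import org.apache.zookeeper.data.Id;
    import org.apache.zookeeper.server.auth.DigestAuthenticationProvider;

    public class AllAndWorldReadableZkACLProvider extends DefaultZkACLProvider {
      @Override
      protected List<ACL> createGlobalACLsToAdd() {
        try {
          List<ACL> acls = new ArrayList<>();
          // full rights for the digest-authenticated admin user
          String digest = DigestAuthenticationProvider.generateDigest(
              System.getProperty("zkDigestUsername") + ":"
                  + System.getProperty("zkDigestPassword"));
          acls.add(new ACL(ZooDefs.Perms.ALL, new Id("digest", digest)));
          // world gets read-only access, so clients need no credentials
          acls.add(new ACL(ZooDefs.Perms.READ, ZooDefs.Ids.ANYONE_ID_UNSAFE));
          return acls;
        } catch (java.security.NoSuchAlgorithmException e) {
          throw new RuntimeException(e);
        }
      }
    }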
Re: ClassNotFoundException with Custom ZkACLProvider
For those interested, I ended up bundling the customized ACL provider with the solr.war. I could not stomach looking at the stack trace in the logs.

On Mon, Nov 7, 2016 at 4:47 PM, Solr User wrote:
> [original message snipped; see above]
Re: Work-around for "indexed without position data"
Sorry for the delay. I was able to reproduce this easily with my setup, but reproducing it on a Solr example proved challenging. Hopefully the work that I did to find the situation in which this is produced will help in resolving the problem. The driving factor appears to be how updates are sent to Solr. When sending batches of updates with commits, the problem is reproduced. If the commit is held until after all updates are sent, then no problem is produced. This leads me to believe that this issue has something to do with overlapping commits or index merges. This was reproducible regardless of running classic or managed schema, and regardless of running a Solr core or SolrCloud.

There are not many steps to reproduce this, but you will need a way to send these updates. I have included inline create.sh and create.pl scripts to generate the data and send the updates. You can index a lastModified field or something to convince yourself that everything has been re-indexed; I left that out to keep the steps lean. Also, this test uses commit statements from the client sending the updates for simplicity, even though that is not a good practice. My normal setup uses SolrJ with commitWithin to let Solr manage when the commits take place, but the same error is produced either way.

*STEPS TO REPRODUCE*

1. Install Solr 5.5.3 and change to that working directory
2. bin/solr -e techproducts
3. bin/solr stop [Why these next 3 steps? They start the index completely fresh, without the 32 example documents, as opposed to using a delete query. The documents are not posted after the core is detected the second time.]
4. rm -rf ./example/techproducts/solr/techproducts/data/
5. bin/solr -e techproducts
6. ./create.sh
7. curl -X POST -H 'Content-type:application/json' --data-binary '{ "replace-field":{ "name":"cat", "type":"text_en_splitting", "indexed":true, "multiValued":true, "stored":true } }' http://localhost:8983/solr/techproducts/schema
8. http://localhost:8983/solr/techproducts/select?q=cat:%22hard%20drive%22 [error]
9. ./create.sh
10. http://localhost:8983/solr/techproducts/select?q=cat:%22hard%20drive%22 [error even though all documents have been re-indexed]

*create.sh*

#!/bin/bash
for i in {1..100}; do
  echo "$i"
  ./create.pl $i > ./create.xml$i
  curl http://localhost:8983/solr/techproducts/update?commit=true -H "Content-Type: text/xml" --data-binary @./create.xml$i
done

*create.pl*

#!/usr/bin/perl
my $S = $ARGV[0];
my $I = 100;
my $N = $S*$I + $I;
my $i;
print "<add>\n";
for($i=$S*$I; $i<$N; $i++) {
  # one document per line: id SP<i>, cat field containing "hard drive <i>"
  print "<doc><field name=\"id\">SP${i}</field><field name=\"cat\">hard drive ${i}</field></doc>\n";
}
print "</add>\n";

On Fri, May 26, 2017 at 2:14 AM, Rick Leir wrote:
> Can you reproduce this error? What are the steps you take to reproduce it?
> (Simple is better.)
>
> cheers -- Rick
>
> [original message snipped; see above]
Anonymous Read?
Is it possible to set up Solr security to allow anonymous queries (/select etc.) but restrict access to other permissions, as described in https://lucidworks.com/2015/08/17/securing-solr-basic-auth-permission-rules/ ?
Re: Anonymous Read?
Thanks! The null role value did the trick. I tried this with the predefined permissions and it worked as well. Thanks again!

On Tue, Jun 6, 2017 at 2:08 PM, Oakley, Craig (NIH/NLM/NCBI) [C] <craig.oak...@nih.gov> wrote:
> We usually end security.json with the permissions
>
>    { "name":"open_select",
>      "path":"/select/*",
>      "role":null},
>    { "name":"all-admin",
>      "collection":null,
>      "path":"/*",
>      "role":"allgen"},
>    { "name":"all-core-handlers",
>      "path":"/*",
>      "role":"allgen"}]
> } }
>
> ...and then assign the "allgen" role to all users.
>
> This allows a select without a login & password, but requires a login &
> password for anything else (including the front page of the GUI).
>
> [original question snipped; see above]
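For reference, a complete security.json combining that approach with the stock example credentials would look something like the sketch below. The hash shown is the well-known "solr"/"SolrRocks" example, a placeholder to replace; blockUnknown must stay false so anonymous requests reach the authorization rules:

    {
      "authentication": {
        "blockUnknown": false,
        "class": "solr.BasicAuthPlugin",
        "credentials": {
          "solr": "IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="
        }
      },
      "authorization": {
        "class": "solr.RuleBasedAuthorizationPlugin",
        "permissions": [
          { "name": "open_select", "path": "/select/*", "role": null },
          { "name": "all-admin", "collection": null, "path": "/*", "role": "allgen" },
          { "name": "all-core-handlers", "path": "/*", "role": "allgen" }
        ],
        "user-role": { "solr": "allgen" }
      }
    }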
Re: Work-around for "indexed without position data"
Not sure if it helps beyond the steps to reproduce that I supplied above, but I also see that "Omit Term Frequencies & Positions" is still set on the field according to the LukeRequestHandler: ITS--OF--

On Mon, Jun 5, 2017 at 1:18 PM, Solr User wrote:
> [steps to reproduce snipped; see above]
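The flag string above comes from the Luke handler, which can be queried per field to check what the index actually recorded for a field, e.g. (core name is a placeholder):

    curl 'http://localhost:8983/solr/techproducts/admin/luke?fl=cat&wt=json'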
Re: Faceting and Grouping Performance Degradation in Solr 5
I am pleased to report that we are in production on Solr 5.5.3, with performance comparable to Solr 4.8.1, through leveraging facet.method=uif as well as https://issues.apache.org/jira/browse/SOLR-9176. Thanks to everyone who worked on these!

On Mon, Oct 3, 2016 at 3:55 PM, Solr User wrote:
> [test results snipped; see the table above]
Work-around for "indexed without position data"
This is in regard to changing a field type from string to text_en_splitting, re-indexing all documents, even optimizing to give the index a chance to merge segments and rewrite itself entirely, and then getting this error when running a phrase query: java.lang.IllegalStateException: field "blah" was indexed without position data; cannot run PhraseQuery. I have encountered this issue before and have always done one of the following as a work-around: 1. Instead of changing the field type on an existing field, just create a new field and retire the old one. 2. Delete the index directory and start from scratch. These work-arounds are not always ideal. Does anyone know what is holding onto that old field type definition? What thinks it is still a string? Every document has been re-indexed, and I am sure of this because I have a time stamp indexed. Is there any other way to get this to work? For what it is worth, I am running this in SolrCloud mode, but I remember seeing this issue before SolrCloud was released as well.
does shards.tolerant deal with this scenario?
Hi all, I have some questions re shards.tolerant=true and timeAllowed=xxx.

I have seen situations where shards.tolerant=true works: if one of the shards specified in a query is dead, shards.tolerant seems to work and I get results from the non-dead shards. However, if one of the shards goes down during the execution of a query, I have to wait for the primary searcher (the Solr sending the request to the shards) to time out, which can take minutes; i.e., shards.tolerant doesn't seem to work.

Question 1: Is timeAllowed "shard-aware"? I.e., in a sharded query, does this param get used by all the shards specified, or does it only get used by the primary searcher?

Question 2: Since shards.tolerant=true is not helping when a shard goes down during query execution, is there any other way to deal with this? If timeAllowed is shard-aware, I would think that I could use it and the primary searcher would then wait xxx milliseconds and return whatever the other shards had sent back. Is that correct?

Thanks in advance.
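For concreteness, the combination being asked about would be passed like this (host names and shard list are placeholders):

    http://primary:8983/solr/mycore/select?q=*:*&shards=host1:8983/solr/mycore,host2:8983/solr/mycore&shards.tolerant=true&timeAllowed=1000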
how do I get search for "fort st john" to match "ft saint john"
I have been using Solr for a while but started running across situations where synonyms are required. The example I have is a group of city names that look like "Fort Saint John" (a city), in a text field. Users may want to search for "Ft St John" or "Fort St John" or "Ft Saint John", however.

My attempted solution was to create a type that uses SynonymFilterFactory and a text file of city-based synonyms like this:

saint,st,ste
fort,ft

This doesn't work, however, and I am not sure I understand why. Any help appreciated. Thx.

P.S. I am using Solr 4.6.1; the field type definition from my schema is sketched below.
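(The field type definition originally attached here was stripped by the archive. Based on the analysis steps described later in the thread — lowercasing, non-alpha stripping, whitespace collapsing, trimming, then synonyms over an untokenized phrase — it was presumably along these lines; the tokenizer pattern in particular is a guess:)

    <fieldType name="city_phrase" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.PatternTokenizerFactory" pattern=";\s*"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PatternReplaceFilterFactory" pattern="[^a-z ]" replacement=" " replace="all"/>
        <filter class="solr.TrimFilterFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="city_index_synonyms.txt"
                ignoreCase="true" expand="true"/>
      </analyzer>
    </fieldType>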
Re: how do I get search for "fort st john" to match "ft saint john"
Yes, and I can see that (as expected) per the field type:

1. the indexed value is lowercased
2. stripped of non-alpha characters
3. multiple consecutive whitespace is removed
4. trimmed
5. run through the SynonymFilterFactory, where:
   a. the indexed value of "Marina/Former Fort Ord" is "marina former fort ord"
   b. the search value of "Marina/Former Ft Ord" is "marina former ft ord"

This I already knew. My question wasn't "why" they don't match; it is: how do I get a search for "fort st john" to match "ft saint john"? I.e., is there a way to index/search that would allow the search to match? The SynonymFilterFactory during indexing does not create a matching term for "marina former ft ord", which I think it would do if the indexed value was a word instead of a phrase (i.e., "fort" vs "Marina/Former Fort Ord"). (Note that my terms/understanding of how this works may be incorrect, hence my request for assistance/understanding.)
Re: how do I get search for "fort st john" to match "ft saint john"
Hi Eric. Sorry, been away.

The city_index_synonyms.txt file is pretty small, as it contains just these two lines:

saint,st,ste
fort,ft

There is nothing at all in the city_query_synonyms.txt file, and it isn't used either. My understanding is that Solr would create the appropriate synonym entries in the index and so treat "fort" and "ft" as equal.

If you take a simple one-line schema (that uses the type definition from my original email) and index "fort saint john", does it work for you? I.e., does it return results if you search for "ft st john" and "ft saint john" and "fort st john"? My Solr 4.6.1 instance doesn't. I am wondering if synonyms just don't work for all/some words in a phrase.
Re: how do I get search for "fort st john" to match "ft saint john"
Hi Eric. No, that doesn't fix the problem either (I have tested this previously and did so again just now).

Since the PatternTokenizerFactory is not tokenizing on whitespace (by design, since I want the user to search by phrase), the phrase "marina former fort ord" (for example) does not get turned into four tokens ("marina", "former", "fort" and "ord"), and so the SynonymFilterFactory does not create synonyms for them (by design).

The original question remains: is there a tokenizer/plugin that will allow me to apply synonyms to words inside an unbroken phrase?

Note: the reason I don't want to tokenize the data by whitespace is that it would cause way too many results to be returned if I, for example, search on "new" or "st". However, I still want to be able to include "fort saint john" in the results if the user searches for "ft st john" or "fort st john" or ...
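One standard way around this, sketched below, is to keep the phrase field as-is and add a whitespace-tokenized copy of it purely for word-level synonym matching. Field and type names here are made up, and as noted later in the thread, adding fields was not an option in this particular legacy system:

    <fieldType name="city_words" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="city_index_synonyms.txt"
                ignoreCase="true" expand="true"/>
      </analyzer>
    </fieldType>

    <field name="city" type="city_phrase" indexed="true" stored="true"/>
    <field name="city_text" type="city_words" indexed="true" stored="false"/>
    <copyField source="city" dest="city_text"/>

A query could then search both fields, e.g. q=city:"ft st john" OR city_text:(ft st john), keeping phrase precision while still matching synonyms.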
Re: how do I get search for "fort st john" to match "ft saint john"
Thanks guys. Unfortunately the Solr that contains this schema/data is in a legacy system that requires the fields to not be changed. We will, hopefully in the near future, be able to look at redesigning the schema. Alternatively, I could look at boning up on Java (which I haven't used in a long time) and see if I can write a subword synonym plugin of some sort to perform this type of synonym matching. Thanks anyhow.
Re: Solr3.4 on tomcat 7.0.23 - hung with error "threw exception java.lang.IllegalStateException: Cannot call sendError() after the response has been committed"
Were you able to resolve this issue, and if so, how? I am encountering the same issue in a couple of Solr versions (including 4.0 and 4.5).
is it possible to consolidate filterquery cache strings
Let's say I have a largish set of data (120M docs) and that I am partitioning my data by groups of states (using the state codes).

Someone suggested that I could define the warming queries in my solrconfig.xml using the following format (one q with many fq values):

*:* State:AL State:AK ... State:WY

Would that work, and if so, how would I know that the cache is being hit? Or do I need to use the following traditional syntax instead (one q/fq pair per state)?

*:* State:AL *:* State:AK ... *:* State:WY

Any help appreciated.
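(The XML wrapping these warming queries was stripped by the archive; inside a newSearcher/firstSearcher listener, the two formats being compared would presumably look like this:)

A single warming query carrying every filter:

    <lst><str name="q">*:*</str>
         <str name="fq">State:AL</str>
         <str name="fq">State:AK</str>
         ...
         <str name="fq">State:WY</str></lst>

One warming query per state:

    <lst><str name="q">*:*</str><str name="fq">State:AL</str></lst>
    <lst><str name="q">*:*</str><str name="fq">State:AK</str></lst>
    ...
    <lst><str name="q">*:*</str><str name="fq">State:WY</str></lst>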
Re: is it possible to consolidate filterquery cache strings
note: by partitioning I mean that I have sharded the 120M docs into 9 Solr partitions (each on a separate server)
Re: is it possible to consolidate filterquery cache strings
would not breaking the FQs out by state be faster for warming up the fq caches?
Are there any Java versions we should avoid with Solr
We are currently using the Oracle Java 1.7.0_11 (23.6-b04) JDK with our Solr 4.6.1 setup. I was looking at upgrading to a more recent version but am wondering: are there any versions to avoid? The reason I ask is that I see some versions that have GC issues, but am not sure how/if Solr is affected by them.

7u40 has bugs with "New minimum young generation size is not properly checked by the JVM" and with "Irregular crash or corrupt term vectors in the Lucene libraries".
7u51 has a bug with "Memory leak when GCNotifier uses create_from_platform_dependent_str()".
how do I stop queries from being logged in two different log files in Tomcat
Hi all. We have a number of Solr 1.4x and Solr 4.x installations running on Tomcat. We are trying to standardize the content of our log files so that we can automate log analysis; we don't want to use log4j at this time.

In our Solr 1.4x installations, the following conf\logging.properties file correctly logs queries only to our localhost_access_log.xxx.txt files, and Tomcat-type messages to our catalina.xxx.log files. However, in our Solr 4.x installations, we are seeing Solr queries being logged in both our localhost_access_log.xxx.txt files and our catalina.xxx.log files. We don't want the Solr queries logged in the catalina.xxx.log files, since that more than doubles the amount of logging being done and doubles the disk space requirement (which can be huge).

Is there a way to configure logging, without using log4j (for now), to only log Solr queries to the localhost_access_log.xxx.txt files? I have looked at various Tomcat logging info and don't see how to do it. Any help appreciated.

handlers = 1catalina.org.apache.juli.FileHandler, 2localhost.org.apache.juli.FileHandler, 3manager.org.apache.juli.FileHandler, java.util.logging.ConsoleHandler

.handlers = 1catalina.org.apache.juli.FileHandler, java.util.logging.ConsoleHandler

# Handler specific properties.
# Describes specific configuration info for Handlers.
1catalina.org.apache.juli.FileHandler.level = FINE
1catalina.org.apache.juli.FileHandler.directory = ${catalina.base}/logs
1catalina.org.apache.juli.FileHandler.prefix = catalina.
2localhost.org.apache.juli.FileHandler.level = FINE
2localhost.org.apache.juli.FileHandler.directory = ${catalina.base}/logs
2localhost.org.apache.juli.FileHandler.prefix = localhost.
3manager.org.apache.juli.FileHandler.level = FINE
3manager.org.apache.juli.FileHandler.directory = ${catalina.base}/logs
3manager.org.apache.juli.FileHandler.prefix = manager.
java.util.logging.ConsoleHandler.level = WARNING
java.util.logging.ConsoleHandler.formatter = java.util.logging.SimpleFormatter

# Facility specific properties.
# Provides extra control for each logger.
org.apache.catalina.core.ContainerBase.[Catalina].[localhost].level = INFO
org.apache.catalina.core.ContainerBase.[Catalina].[localhost].handlers = 2localhost.org.apache.juli.FileHandler
org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/manager].level = INFO
org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/manager].handlers = 3manager.org.apache.juli.FileHandler

# For example, set the org.apache.catalina.util.LifecycleBase logger to log
# each component that extends LifecycleBase changing state:
#org.apache.catalina.util.LifecycleBase.level = FINE
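One JULI-level approach, assuming the query lines are emitted by org.apache.solr.core.SolrCore at INFO (a sketch, not a tested config for this exact setup), is to raise that logger's level in logging.properties so the queries stop reaching catalina.xxx.log while the AccessLogValve keeps writing localhost_access_log.xxx.txt:

    # suppress Solr's per-request INFO lines in catalina.*.log
    org.apache.solr.core.SolrCore.level = WARNING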
Re: how do I stop queries from being logged in two different log files in Tomcat
Awesome, Mike. That does exactly what I want. Many thanks.
confused about how to set a solr query timeout when using tomcat
I inherited a set of some old 1.4x Solrs running under Tomcat 6 / Java 6. While I will eventually upgrade them to a more recent Solr/Tomcat/Java, I am unable to do so in the near term.

One of my priority fixes, though, is to implement some sort of timeout for Solr queries that exceed 1000 ms (or so); i.e., if a query takes longer than that, I want to abort it (returning nothing or an error or whatever) so that Solr can process other queries. While we have optimized our queries for an average 50 ms response time, we do occasionally see some that can run between 10 and 100 seconds.

I know that this version of Solr itself doesn't have a built-in timeout mechanism, which leaves me with figuring out what to do (it seems to me that I have to figure out how to get Tomcat to time out the queries somehow). Note that I DID google until my fingers hurt and have not been able to find clear (at least not clear to me) instructions on how to do so.

Details:
1. The setup uses the DataImportHandler to update Solr, and updates occur often and can be quite large; we use batchSize="1" and autoCommit="true", with doc size being around 1400 to 1600 bytes. I don't want the timeout to kill the imports, of course.
2. I tried adding a timeout param to the Tomcat configuration, but it doesn't work.

Any thoughts? Can anyone point me in the right direction on how to implement this? Any help appreciated. Thx in advance.
Re: confused about how to set a solr query timeout when using tomcat
Millions of documents per shard, with a number of shards; ~40 GB index folder size; 12 GB of heap on a 16 GB machine (this old Solr doesn't use O/S memory space like 4.x does). The servers are hosted internally, and are powerful.

Understood. As mentioned, we tuned the bulk of our queries to run very quickly (50 ms or less), but we do occasionally see queries (i.e., internal ones for statistics/tests) that can be excessively long running. Basically, we want to be able to enforce how long those long-running queries are allowed to run.
RE: confused about how to set a solr query timeout when using tomcat
yes, my understanding and concern as well was that Solr queries continue to run on the Solr server even after the connection is broken. I was hoping I had overlooked or missed something in the Solr or Tomcat documentation that might do the job; it is unfortunate. if anyone else can think of something, let me know. -- View this message in context: http://lucene.472066.n3.nabble.com/confused-about-how-to-set-a-solr-query-timeout-when-using-tomcat-tp4171363p4171379.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr 4.10.2 "Found core" but I get "No cores available" in dashboard page
.schema.IndexSchema - [coreA] Schema name=Helios
1863 [main] INFO org.apache.solr.servlet.SolrDispatchFilter - user.dir=C:\SOLR\helios-4.10.2\Instance\Master
1864 [main] INFO org.apache.solr.servlet.SolrDispatchFilter - SolrDispatchFilter.init() done
1885 [main] INFO org.eclipse.jetty.server.AbstractConnector - Started SocketConnector@0.0.0.0:8086
9895 [qtp618640318-19] INFO org.apache.solr.servlet.SolrDispatchFilter - [admin] webapp=null path=/admin/cores params={indexInfo=false&_=1418236560709&wt=json} status=0 QTime=17
9931 [qtp618640318-19] INFO org.apache.solr.servlet.SolrDispatchFilter - [admin] webapp=null path=/admin/info/system params={_=1418236560885&wt=json} status=0 QTime=2

-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-10-2-Found-core-but-I-get-No-cores-available-in-dashboard-page-tp4173602.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 4.10.2 "Found core" but I get "No cores available" in dashboard page
definitely puzzling. I am running this on my local box (i.e. using http://localhost:8086/solr) and it is the only running instance of any Solr. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-10-2-Found-core-but-I-get-No-cores-available-in-dashboard-page-tp4173602p4173618.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 4.10.2 "Found core" but I get "No cores available" in dashboard page
log tab shows "No Events available" no errors at all in the CMD console my test version hasnt got any logging changes that are already in the default solr 4.10.2 package some kind of warning or error message would have been helpful... -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-10-2-Found-core-but-I-get-No-cores-available-in-dashboard-page-tp4173602p4173627.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 4.10.2 "Found core" but I get "No cores available" in dashboard page
my apologies for the lack of clarity. our internal name for the project to upgrade Solr from 4.0 to 4.10.2 is "helios", and so we named our test folder "heliosearch". I was not even aware of the github project Heliosearch, and nothing we are doing is related to it.

to simplify things for this post, we pared things down so that we have one Solr instance but two cores; coreX contains the collection1 files/folders as per the downloaded Solr 4.10.2 package, while coreA uses the same collection1 files/folders but with schema.xml and solrconfig.xml changes to meet our needs.

file- and foldername-wise, here is what we did:
1. C:\SOLR\solr-4.10.2.zip\solr-4.10.2\example renamed to C:\SOLR\helios-4.10.2\Master
2. renamed example\solr\collection1 to example\solr\coreX; no files modified here
3. copied example\solr\coreX to example\solr\coreA
4. modified the coreA schema to match our current production schema; i.e. our field names, etc.
5. modified the coreA solrconfig.xml to meet our needs (see below)

here are the solrconfig.xml changes we made to coreA (the element snippets themselves were stripped by the archive):
1.
2. 4
3. false
4. false
5. commented out the autoCommit section
6. commented out the autoSoftCommit section
7. commented out the section
8. 4
9.
10. contains geocluster
11. commented out these sections:

here are the schema.xml changes we made to our copy of the downloaded Solr 4.10.2 package (aside from replacing the example fields provided in the downloaded Solr 4.10.2):
1.
2. removed the example fields provided in the downloaded Solr 4.10.2
3. deleted various types we don't use in our current schemas
4. added fieldtypes that are in our current Solr 4.0 instances
5. added various fieldtypes that are in our current Solr 4.0 instances
6. re-added the "text" field as apparently required:

also note that we are using Java "1.7.0_67" and jetty-8.1.10.v20130312. all in all, I don't see anything that we have done that would keep the cores from being discovered. hope that helps.

-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-10-2-Found-core-but-I-get-No-cores-available-in-dashboard-page-tp4173602p4173831.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 4.10.2 "Found core" but I get "No cores available" in dashboard page
small correction; coreX (the one with the unmodified schema.xml and solrconfig.xml) IS seen by Solr and appears on the Solr admin page, but coreA (which has our modified schema and solrconfig) is found by Solr but is not shown in the Solr admin page:

1494 [main] INFO org.apache.solr.core.CoresLocator - Looking for core definitions underneath C:\SOLR\helios-4.10.2\Master\solr
1502 [main] INFO org.apache.solr.core.CoresLocator - Found core coreA in C:\SOLR\helios-4.10.2\Master\solr\coreA\
1502 [main] INFO org.apache.solr.core.CoresLocator - Found core coreX in C:\SOLR\helios-4.10.2\Master\solr\coreX\
1503 [main] INFO org.apache.solr.core.CoresLocator - Found 2 core definitions

-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-10-2-Found-core-but-I-get-No-cores-available-in-dashboard-page-tp4173602p4173832.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 4.10.2 "Found core" but I get "No cores available" in dashboard page
yes, I have triple-checked the schema and solrconfig XML; various tools have indicated the XML is valid. no missing types or dupes, and I have not disabled the admin handler. as mentioned in my most recent response, I can see the coreX core (the renamed and unmodified collection1 core from the downloaded package) and query it with no issues, but coreA (which has our specific schema and solrconfig changes) is not showing in the admin interface and cannot be queried (I get a 404). both cores are located in the same solr folder. appreciate the suggestions; looks like I will need to gradually move my schema and core changes towards the collection1 content and see where things start working; will take a while...sigh. will let you know what I find out. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-10-2-Found-core-but-I-get-No-cores-available-in-dashboard-page-tp4173602p4173839.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 4.10.2 "Found core" but I get "No cores available" in dashboard page
Chris, will get the schema and solrconfig ready for uploading. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-10-2-Found-core-but-I-get-No-cores-available-in-dashboard-page-tp4173602p4173840.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 4.10.2 "Found core" but I get "No cores available" in dashboard page
I did find out the cause of my problems. Turns out the problem wasn't due to the solrconfig.xml file; it was in the schema.xml file. I spent a fair bit of time making my solrconfig closer to the default solrconfig.xml in the Solr download; when that didn't get rid of the error I went back to the only other file we had that was different. Turns out the line that was causing the problem was the middle line in this location_rpt fieldtype definition (the definition itself was stripped by the archive): the spatialContextFactory line caused the core to not load even though no error/warning messages were shown. I missed that extra line somehow; mea culpa. Anyhow, I really appreciate the responses/help I got on this issue. many thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-10-2-Found-core-but-I-get-No-cores-available-in-dashboard-page-tp4173602p4174118.html Sent from the Solr - User mailing list archive at Nabble.com.
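For later readers, the stripped definition was most likely of the general shape below (a sketch; the attribute values are illustrative, not the poster's actual ones). The spatialContextFactory attribute pulls in the JTS-backed spatial context, and if the JTS jar is missing from the classpath the core can fail to load:

<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
           spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
           geo="true" distErrPct="0.025" maxDistErr="0.000009" units="degrees"/>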
what does this "write.lock does not exist" mean??
I looked for messages on the following error but don't see anything in Nabble. Does anyone know what this error means and how to correct it??

SEVERE: java.lang.IllegalArgumentException: /var/apache/my-solr-slave/solr/coreA/data/index/write.lock does not exist

I also occasionally see error messages about specific index files, such as this:

SEVERE: null:java.lang.IllegalArgumentException: /var/apache/my_solr-slave/solr/coreA/data/index/_md39_1.del does not exist

I am using Solr 4.0.0, with Java 1.7.0_11-b21 and Tomcat 7.0.34, running on a 12 GB CentOS box; we have a master/slave setup with multiple slave searchers per indexer. any thoughts on this would be appreciated. -- View this message in context: http://lucene.472066.n3.nabble.com/what-does-this-write-lock-does-not-exist-mean-tp4175291.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 4.10.2 "Found core" but I get "No cores available" in dashboard page
interesting. unfortunately, it's time to take a break, so I will have to deal with this in the new year. Merry Christmas, and thanks for all the time and effort you guys put in answering all of our questions. It is much appreciated. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-10-2-Found-core-but-I-get-No-cores-available-in-dashboard-page-tp4173602p4175423.html Sent from the Solr - User mailing list archive at Nabble.com.
Getting a word count frequency out of a page field
SOLR reports the term occurrence for terms over all the documents. I am having trouble making a query that returns the term occurrence in a specific page field called documentPageId. I don't know how to issue a proper SOLR query that returns a word count for a paragraph of text such as the term "amplifier" for a field. For some reason it only returns. The things I've tried only return a count for 1 occurrence of the term even though I see the term in the paragraph more than just once.

I've tried faceting on the field, "contents":

http://localhost:8983/solr/select?indent=on&q=*:*&wt=standard&facet=on&facet.field=documentPageId&facet.query=amplifier&facet.sort=lex&facet.missing=on&facet.method=count

[response markup stripped by the archive: a top-level count of 21, a count of 1 for each documentPageId facet value, and 0 for the missing bucket]

In schema.xml and solrconfig.xml (element markup likewise stripped), the relevant fields are: filewrapper, caseNumber, pageNumber, documentId, documentPageId, contents.

Thanks in advance,
Re: Getting a word count frequency out of a page field
See comments inline below.

On Sun, Jan 22, 2012 at 8:27 PM, Erick Erickson wrote:
> Faceting won't work at all. Its function is to return the count
> of the *documents* that a value occurs in, so that's no good
> for your use case.
>
> "I don't know how to issue a proper SOLR query that returns a word count for
> a paragraph of text such as the term "amplifier" for a field. For some
> reason it only returns."
>
> This is really unclear. Are you asking for the word counts of a paragraph
> that contains "amplifier"? The number of times "amplifier" appears in
> a paragraph? In a document?

I'm looking for the number of times the word or term appears in a paragraph that I'm indexing as the field name "contents". I'm storing and indexing the field name "contents" that contains multiple occurrences of the term/word. However, when I query for that term it only reports that the word/term appeared only once in the field name "contents".

> And why do you want this information anyway? It might be an XY problem.

I want to be able to search for word frequency for a page in a document that has many pages, so I can report to the user that the term/word occurred on page 1 "10" times. The user can click on the result and go right to the page where the word/term appeared most frequently.

What do you mean an XY problem?

> Best
> Erick
>
> On Fri, Jan 20, 2012 at 1:06 PM, solr user wrote:
> > [original question quoted in full; the facet response and config markup were stripped by the archive]
Re: Getting a word count frequency out of a page field
Thanks for the article. I am indexing each page of a document as if it were a document. I think the answer is to configure SOLR to use the TermVectorComponent: http://wiki.apache.org/solr/TermVectorComponent I have not tried it yet, but someone on the StackExchange forum told me to try this one.

-Melanie

On Sun, Jan 22, 2012 at 8:56 PM, Erick Erickson wrote:
> Here's Hoss' XY problem writeup:
> http://people.apache.org/~hossman/#xyproblem
> but this doesn't appear to be that.
>
> There's no way out of the box that I know of to do what you want. It starts
> with the fact that Solr has no clue what a page is in the first place. Or
> a paragraph. Or a sentence. So you're really on your own here
> Solr only knows about *documents*. If each document is a page,
> you can do some stuff with term frequencies etc. But for a larger
> document you'll be getting into some pretty low-level analysis
> of the data to accomplish this.
>
> Sorry I can't be more help.
> Erick
>
> [earlier messages quoted in full; trimmed]
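For later readers, wiring up the TermVectorComponent the poster mentions looks roughly like the sketch below (the standard setup from the wiki, not the poster's verified config; field and handler names are illustrative):

<!-- schema.xml: term vectors must be stored on the field -->
<field name="contents" type="text" indexed="true" stored="true" termVectors="true"/>

<!-- solrconfig.xml: register the component and a handler that uses it -->
<searchComponent name="tvComponent" class="solr.TermVectorComponent"/>
<requestHandler name="/tvrh" class="solr.SearchHandler">
  <lst name="defaults">
    <bool name="tv">true</bool>
  </lst>
  <arr name="last-components">
    <str>tvComponent</str>
  </arr>
</requestHandler>

A query like /tvrh?q=documentPageId:49667.3&tv.tf=true&tv.fl=contents then returns per-term frequencies for the matched page-documents, after reindexing with term vectors enabled.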
Limiting term frequency in a document to a specific term
What is the proper query URL to limit the term frequency to just one term in a document? Below is an example query to search for the term frequency in a document, but it is returning the frequency for all the terms:

http://localhost:8983/solr/select/?fl=documentPageId&q=documentPageId:49667.3&qt=tvrh&tv.tf=true&tv.fl=contents

I would like to be able to limit the query to just one term that I know occurs in the document. The documentation for Term Frequency says to specify the following: f.fieldName.tv.tf - Turns on Term Frequency for the fieldName specified. This is in the wiki documentation: http://wiki.apache.org/solr/TermVectorComponent

I tried various combinations of the above for the term amplifier in the URL but I could not get it to work. I would appreciate the appropriate syntax for a specific term such as amplifier.
Re: Limiting term frequency in a document to a specific term
With the Solr search relevancy functions, I get a ParseException: unknown function ttf in FunctionQuery.

http://localhost:8983/solr/select/?fl=score,documentPageId&defType=func&q=ttf(contents,amplifiers)

where contents is a field name, and amplifiers is text in the field. Just curious why I get a parse exception for the above syntax.

On Monday, January 23, 2012, Ahmet Arslan wrote:
>> Below is an example query to search for the term frequency in a document,
>> but it is returning the frequency for all the terms.
>>
>> http://localhost:8983/solr/select/?fl=documentPageId&q=documentPageId:49667.3&qt=tvrh&tv.tf=true&tv.fl=contents
>>
>> I would like to be able to limit the query to just one term that I know
>> occurs in the document.
>
> I don't fully follow but http://wiki.apache.org/solr/FunctionQuery#tf may be what you want?
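A likely explanation, assuming the poster was on a pre-4.0 Solr: if memory serves, the relevance functions tf, ttf, and termfreq were only added in Solr 4.0 (trunk at the time of this thread), so earlier versions reject them with exactly this unknown-function ParseException. On 4.0+, a sketch of the per-document frequency query would be:

http://localhost:8983/solr/select?q=documentPageId:49667.3&fl=documentPageId,freq:termfreq(contents,'amplifier')

where termfreq(field,term) returns the raw count of the term in that field for each matched document.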
can't use strdist as functionquery?
I want to sort my results by how closely a given resultset field matches a given string. For example, say I am searching for a given product, and the product can be found in many cities including "seattle". I want to sort the results so that results from the city of "seattle" are at the top, and all other results below that.

I thought that I could do so by using strdist as a functionquery (I am using Solr 1.4 so I can't directly sort on strdist) but am having problems with the syntax of the query, because functionqueries require double quotes and so does strdist. My current query, which fails with an NPE, looks something like this:

http://localhost:8080/solr/select?q=(product:"foo") _val_:"strdist("seattle",city,edit)"&sort=score%20asc&fl=product, city, score

I have tried various types of URL encoding (i.e. using %22 instead of double quotes in the strdist function), but no success. Any ideas?? Is there a better way to accomplish this sorting?? -- View this message in context: http://lucene.472066.n3.nabble.com/can-t-use-strdist-as-functionquery-tp1023390p1023390.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: can't use strdist in sorting either?
I tried [snippet stripped by the archive]. I also noticed that I am unable to sort by the strdist function:

http://localhost:8080/solr/select?q=*:*&sort=strdist("seattle",city,edit)%20desc

Am I using strdist incorrectly? The version of Solr I am using is $Id: CHANGES.txt 903398 2010-01-26 20:21:09Z hossman $. I know it isn't the latest version, but I am constrained by needing to keep to a minimum the number of changes between our current version and the version that accomplishes the task mentioned previously (essentially a binary sort that separates results where the city matches a given criterion from those that don't). Appreciate any help or advice someone can offer on this. -- View this message in context: http://lucene.472066.n3.nabble.com/can-t-use-strdist-as-functionquery-tp1023390p1032200.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: can't use strdist in sorting either?
forgot to mention:

1. yes, I upgraded to a version that allows sorting by functions (thx Grant for the work done on this feature, very cool)
2. when I try to sort by strdist, it doesn't seem to do any sorting; I get the same results if I sort asc or desc, if I change the static string value, if I change the third argument, etc.

-- View this message in context: http://lucene.472066.n3.nabble.com/can-t-use-strdist-as-functionquery-tp1023390p1032231.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: can't use strdist in sorting either?
finally figured out that I can simply escape the quotation marks in the query URL using backslashes to use strdist as a functionquery (sorry all, that should have been a no-brainer):

http://10.0.11.54:8994/solr/select?q=(*:*)^0%20_val_:"strdist(\"phoenix\",city,edit)"&fl=score,*&sort=score%20desc

however, sorting by the score in this query doesn't work (i.e. the same problem as when sorting by the strdist function - results don't change when I go from asc to desc or vice-versa). -- View this message in context: http://lucene.472066.n3.nabble.com/can-t-use-strdist-as-functionquery-tp1023390p1057056.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: can't use strdist in sorting either?
issue resolved. I should have read the documentation with more care: "Calculate the distance between two strings". my city field was a tokenized text field, so changing it to the string type got things working. sorry all -- View this message in context: http://lucene.472066.n3.nabble.com/can-t-use-strdist-as-functionquery-tp1023390p1058059.html Sent from the Solr - User mailing list archive at Nabble.com.
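For later readers, the fix amounts to a one-line schema change (field attributes here are illustrative), since strdist compares the whole indexed value rather than individual tokens:

<field name="city" type="string" indexed="true" stored="true"/>

after which the field must be reindexed for the new type to take effect.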
how to update solr to older 1.5 builds instead of to trunk
please excuse this newbie question, but: I want to upgrade Solr to a newer version, but not to the latest version in the trunk (because there are so many changes that I would have to test against, and modify my custom classes for, and behavior changes, and deal with the Lucene index change, etc.).

My thought was to look at versions that are post 903398 2010-01-26 20:21:09Z but pre the change in the Lucene index, eventually picking the version that has the features I want but with as few other changes as feasible. I know I could probably apply a bunch of patches, but some of the patches seem to rely on other patches which rely on other patches which rely on ... It just seems easier to pick the version that has just the features/patches I want.

I have no trouble seeing/using the trunk at http://svn.apache.org/repos/asf/lucene/dev/trunk/ but it only seems to have builds 984777 thru 984832. So where would I find significantly older builds (i.e. like the one I am currently using - 903398)? I tried using svn on repository http://svn.apache.org/viewvc/lucene/solr/branches/branch-1.5-dev/ but get a "Repository moved permanently to '/viewc/lucene/solr/branches/branch-1.5-dev/'" message.

Any help would be great -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-update-solr-to-older-1-5-builds-instead-of-to-trunk-tp1113863p1113863.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: how to update solr to older 1.5 builds instead of to trunk
Thanks Yonik, but http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/solr/CHANGES.txt says that the Lucene index has changed:

"Upgrading from Solr 1.4
* The Lucene index format has changed and as a result, once you upgrade, previous versions of Solr will no longer be able to read your indices. In a master/slave configuration, all searchers/slaves should be upgraded before the master. If the master were to be updated first, the older searchers would not be able to read the new index format."

not to mention that regression testing is a pain. Is there any way to get a set of builds with versions prior to 3.x?? -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-update-solr-to-older-1-5-builds-instead-of-to-trunk-tp1113863p1114353.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: how to update solr to older 1.5 builds instead of to trunk
no, once upgraded I wouldn't need to have an older Solr read the indexes. misunderstood the note. thx -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-update-solr-to-older-1-5-builds-instead-of-to-trunk-tp1113863p1115694.html Sent from the Solr - User mailing list archive at Nabble.com.
possible bug in sorting by Function?
I was looking at the ability to sort by function that was added to Solr. For the most part it seems to work. However, Solr doesn't seem to like to sort by certain functions. For example, this sum works:

http://10.0.11.54:8994/solr/select?q=*:*&sort=sum(1,Latitude,Longitude,sum(Latitude,Longitude)) asc

but this hsin doesn't work:

http://10.0.11.54:8994/solr/select?q=*:*&sort=sum(3959,rad(47.544594),rad(-122.38723),rad(Latitude),rad(Longitude))

and gives me a "Must declare sort field or function" error, pointing to a line in QueryParsing.java. Note that I did apply the SOLR-1297-2.patch supplied by Koji Sekiguchi but it didn't seem to help. I am using Solr 903398 2010-01-26 20:21:09Z. Any suggestions appreciated. -- View this message in context: http://lucene.472066.n3.nabble.com/possible-bug-in-sorting-by-Function-tp1118235p1118235.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: possible bug in sorting by Function?
small typo in last email: the second sum should have been hsin, but I notice that the problem also occurs when I leave it as sum. -- View this message in context: http://lucene.472066.n3.nabble.com/possible-bug-in-sorting-by-Function-tp1118235p1118260.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: possible bug in sorting by Function?
problem could be related to some oddity in sum()?? some more examples (note: Latitude and Longitude are fields of type=double):

works:
http://10.0.11.54:8994/solr/select?q=*:*&sort=sum(sum(1,1.0))%20asc
http://10.0.11.54:8994/solr/select?q=*:*&sort=sum(Latitude,Latitude)%20asc
http://10.0.11.54:8994/solr/select?q=*:*&sort=sum(rad(Latitude))%20asc
http://10.0.11.54:8994/solr/select?q=*:*&sort=sum(sum(Latitude,1))%20asc
http://10.0.11.54:8994/solr/select?q=*:*&sort=sum(sum(Latitude,1.0))%20asc

fails:
http://10.0.11.54:8994/solr/select?q=*:*&sort=sum(sum(Latitude,1),sum(Latitude,1))%20asc
http://10.0.11.54:8994/solr/select?q=*:*&sort=sum(sum(Latitude,1.0),sum(Latitude,1.0))%20asc
http://10.0.11.54:8994/solr/select?q=*:*&sort=sum(rad(Latitude),rad(Latitude))%20asc

-- View this message in context: http://lucene.472066.n3.nabble.com/possible-bug-in-sorting-by-Function-tp1118235p1120017.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: possible bug in sorting by Function?
issue resolved. the problem was that solr.war was silently not being overwritten by the new version. will try to spend more time debugging before posting. -- View this message in context: http://lucene.472066.n3.nabble.com/possible-bug-in-sorting-by-Function-tp1118235p1121349.html Sent from the Solr - User mailing list archive at Nabble.com.
what would cause large numbers of executeWithRetry INFO messages?
I see a large number (~1000) of the following executeWithRetry messages in my Apache Catalina log files every day (see the snippet below). They seem to appear at random intervals. Since they are not flagged as errors or warnings, I have been ignoring them for now. However, I started wondering if the "INFO" level is a red herring and there might be an actual problem somewhere. Does anyone know what would cause this type of message? Are they normal? I have not seen anything in my google searches for Solr that contains this message.

Details:
1. my CPU usage seems fine, as does my heap; we have lots of CPU capacity and heap space
2. the log is from a searcher, but I know that the intervals do not correspond to replication (every 15 min on the hour)
3. the INFO lines appear in all searcher logs (we have a number of searchers)
4. the data is around 10m records per searcher and occupies around 14 GB
5. I am not noticing any problems performing queries on the Solr (so no trace info to give you); performance and queries seem fine

Log snippet:
Sep 10, 2010 2:17:59 AM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.
Sep 10, 2010 2:18:20 AM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
INFO: I/O exception (org.apache.commons.httpclient.NoHttpResponseException) caught when processing request: The server xxx.admin.inf failed to respond
Sep 10, 2010 2:18:20 AM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
INFO: Retrying request
Sep 10, 2010 2:18:20 AM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.

any info appreciated. thx -- View this message in context: http://lucene.472066.n3.nabble.com/what-would-cause-large-numbers-of-executeWithRetry-INFO-messages-tp1453417p1453417.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: WELCOME to solr-user@lucene.apache.org
Hi,

I have a question about boosting. I have the following fields in my schema.xml:
1. title
2. description
3. ISBN
etc.

I want to boost the field title. I tried index-time boosting but it did not work. I also tried query-time boosting but with no luck. Can someone help me on how to implement boosting on a specific field like title?

Thanks,
Solr User

On Thu, Nov 11, 2010 at 10:26 AM, wrote:
> [ezmlm subscription-confirmation auto-reply quoted in full; trimmed]
Boosting
Hi,

I have a question about boosting. I have the following fields in my schema.xml:
1. title
2. description
3. ISBN
etc.

I want to boost the field title. I tried index-time boosting but it did not work. I also tried query-time boosting but with no luck. Can someone help me on how to implement boosting on a specific field like title?

Thanks,
Solr User
Re: WELCOME to solr-user@lucene.apache.org
Eric,

Thank you so much for the reply, and apologies for not providing all the details.

The field definitions and copy fields in my schema.xml copy all searchable fields into a catch-all field named searchFields [the field markup itself was stripped by the archive].

Before creating the indexes I feed an XML file to the Solr job to create the index files. I added a boost attribute to the title field before creating indexes, e.g. boost="10.0" on the title element [the two sample documents, "Each Little Bird That Sings" and "Baby Bear's Chairs", were likewise reduced to bare text values by the archive].

I am trying to boost the title field so that the search results bring the actual title match as the first item in the results. Adding the boost attribute to the title field and index-time boosting did not change the search results. I tried query-time boosting also, as mentioned below, but no luck:

/select?q=Each+Little+Bird+That+Sings&title^9&fl=score

Any help to fix this issue would be really helpful.

Thanks,
Solr User

On Thu, Nov 11, 2010 at 10:32 AM, Solr User wrote:
> Hi,
>
> I have a question about boosting. I have the following fields in my schema.xml:
> 1. title
> 2. description
> 3. ISBN
> etc.
>
> I want to boost the field title. I tried index time boosting but it did not work. I also tried Query time boosting but with no luck.
>
> Can someone help me on how to implement boosting on a specific field like title?
>
> Thanks,
> Solr User
Re: WELCOME to solr-user@lucene.apache.org
Ahmet,

Thanks for the reply.

select/?q=built+to+last&defType=dismax&qf=searchFields^0.2+title^20&debugQuery=on

For some reason, if I use the title field in my query I don't get any results. I am copying all searchable fields into the searchFields field, so I am able to search only in the searchFields field and not in any other fields.

I request you all to clarify if anything is wrong with my schema.xml; the schema.xml is at the bottom of this email. I am not able to get the boosting working on the title field. Please help me here too.

Thanks,
Solr User

On Thu, Nov 11, 2010 at 5:11 PM, Ahmet Arslan wrote:
> There are several mistakes in your approach:
>
> copyField just copies data. Index time boost is not copied.
>
> There is no such boosting syntax. /select?q=Each&title^9&fl=score
>
> You are searching on your default field.
>
> This is not your cause of your problem but omitNorms="true" disables index
> time boosts.
>
> http://wiki.apache.org/solr/DisMaxQParserPlugin can satisfy your need.
>
> [earlier message quoted in full; schema field definitions and sample documents stripped by the archive]
Re: WELCOME to solr-user@lucene.apache.org
Ahmet,

Thanks for the reply.

In the production system we are using /spell/?q=built+to+last so that we can check the spelling. We are not using /select?q=built+to+last. Can I use dismax with /spell?

I understood from your reply that I need to change my schema.xml and modify the field types. Do I still need to use the searchFields field, and what do I need to specify in the defaultSearchField tag? searchFields is one of the field names that we provided.

Thanks,
Solr User

On Fri, Nov 12, 2010 at 10:26 AM, Ahmet Arslan wrote:
> > select/?q=built+to+last&defType=dismax&qf=searchFields^0.2+title^20&debugQuery=on
> >
> > For some reason if I use title field in my query I don't get any results.
> >
> > I am copying all searchable fields into searchFields field. So I am able to
> > search only in the searchFields field not in any other fields.
> >
> > I request you all to clarify if anything wrong with my schema.xml. The
> > schema.xml is at the bottom of this email.
> >
> > I am not able to get the boosting working on the title field. Please help me here too.
>
> Change type of your title field. It is string now. Make it solr.TextField.
> Actually you dont need cath-all copy field with dismax.
> Just change their types string to text and append them qf= parameter.
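For later readers: yes, any solr.SearchHandler instance can take dismax defaults, so a /spell handler can be switched to dismax. A minimal sketch in solrconfig.xml, assuming the spellcheck component is registered under the name spellcheck (the qf boosts are the poster's; the rest is illustrative):

<requestHandler name="/spell" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">searchFields^0.2 title^20</str>
    <str name="spellcheck">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>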
is there a way to prevent abusing rows parameter
silly question: is there any configuration value I can set to prevent someone from entering a bad value for the rows parameter? i.e. to prevent something like "&rows=1" from crashing my servers? the server I am looking at is a Solr v3.6 -- View this message in context: http://lucene.472066.n3.nabble.com/is-there-a-way-to-prevent-abusing-rows-parameter-tp4021467.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: is there a way to prevent abusing rows parameter
Thanks guys. This is a problem with the front end not validating requests. I was hoping there might be a simple config value I could enter/change, rather than going through the long process of migrating a proper fix all the way up to our production servers. Looks like not, but thx. -- View this message in context: http://lucene.472066.n3.nabble.com/is-there-a-way-to-prevent-abusing-rows-parameter-tp4021467p4021892.html Sent from the Solr - User mailing list archive at Nabble.com.
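For later readers, there is in fact a server-side knob worth knowing, assuming requests go through a named handler: parameters placed in a handler's invariants list override whatever the client sends, so rows can be pinned in solrconfig.xml. A sketch (handler name and cap are illustrative):

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="invariants">
    <!-- client-supplied rows values are ignored in favor of this cap -->
    <int name="rows">100</int>
  </lst>
</requestHandler>

The trade-off is that legitimate requests for more rows are also clamped, since invariants cannot be overridden per request.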
upgrading from 4.0 to 4.1 causes "CorruptIndexException: checksum mismatch in segments file"
hi all. I have been working on moving us from 4.0 to a newer build of 4.1. I am seeing a "CorruptIndexException: checksum mismatch in segments file" error when I try to use the existing index files. I did see something in the build log for #119 re "LUCENE-4446" that mentions "flip file formats to point to 4.1 format".

Do I just need to reindex, or is this some other issue (i.e. do I need to configure something differently)? Or should I move back a few builds?

note, we are currently using:
solr-spec 4.0.0.2012.04.05.15.05.52
solr-impl 4.0-SNAPSHOT 1310094M - - 2012-04-05 15:05:52
lucene-spec 4.0-SNAPSHOT
lucene-impl 4.0-SNAPSHOT 1309921 - - 2012-04-05 10:25:27

and are considering moving to:
solr-spec 4.1.0.2012.11.03.18.08.42
solr-impl 4.1-2012-11-03_18-05-49 1405392 - hudson - 2012-11-03 18:08:42
lucene-spec 4.1-2012-11-03_18-05-49
lucene-impl 4.1-2012-11-03_18-05-49 1405392 - hudson - 2012-11-03 18:06:50
(aka apache-solr-4.1-2012-11-03_18-05-49)

-- View this message in context: http://lucene.472066.n3.nabble.com/upgrading-from-4-0-to-4-1-causes-CorruptIndexException-checksum-mismatch-in-segments-file-tp4021913.html Sent from the Solr - User mailing list archive at Nabble.com.
spatial searches and geo-json data
hi all. I have a large amount of spatial data in GeoJSON format that I get from MSSQL Server. I want to be able to index that data and am trying to figure out how to convert it into WKT format, since Solr only accepts WKT. is anyone aware of any Solr module, T-SQL code, or C# code that would help me with the conversion? -- View this message in context: http://lucene.472066.n3.nabble.com/spatial-searches-and-geo-json-data-tp4026140.html Sent from the Solr - User mailing list archive at Nabble.com.
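No converter surfaced in this thread; for later readers, the mapping itself is mechanical for simple geometries. A minimal sketch in Java (an illustration, not a vetted library: it handles only Point and Polygon GeoJSON geometries and assumes the org.json parser; a JTS-based reader would be more robust):

import org.json.JSONArray;
import org.json.JSONObject;

public class GeoJsonToWkt {
    // convert a single GeoJSON geometry object to a WKT string
    public static String toWkt(String geoJson) {
        JSONObject g = new JSONObject(geoJson);
        String type = g.getString("type");
        JSONArray c = g.getJSONArray("coordinates");
        if (type.equals("Point")) {
            // GeoJSON stores [lon, lat]; WKT wants "x y" which is also lon lat
            return "POINT (" + c.getDouble(0) + " " + c.getDouble(1) + ")";
        }
        if (type.equals("Polygon")) {
            StringBuilder sb = new StringBuilder("POLYGON (");
            for (int r = 0; r < c.length(); r++) { // outer ring, then holes
                JSONArray ring = c.getJSONArray(r);
                sb.append(r > 0 ? ", (" : "(");
                for (int i = 0; i < ring.length(); i++) {
                    JSONArray pt = ring.getJSONArray(i);
                    if (i > 0) sb.append(", ");
                    sb.append(pt.getDouble(0)).append(' ').append(pt.getDouble(1));
                }
                sb.append(')');
            }
            return sb.append(')').toString();
        }
        throw new IllegalArgumentException("Unhandled GeoJSON type: " + type);
    }
}

For example, toWkt("{\"type\":\"Point\",\"coordinates\":[-122.38,47.54]}") yields POINT (-122.38 47.54), which can then be indexed into the spatial field.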
what is difference between 4.1 and 5.x
just curious as to what the difference is between 4.1 and 5.0, i.e. is 4.1 a maintenance branch for what is currently 4.0, or are they very different designs/architectures? -- View this message in context: http://lucene.472066.n3.nabble.com/what-is-difference-between-4-1-and-5-x-tp4032064.html Sent from the Solr - User mailing list archive at Nabble.com.
SolrCloud on multiple appservers
Does anyone have a blog or wiki with detailed step-by-step instructions on setting up SolrCloud on multiple JBoss instances? Thanks in advance,
Using Customized sorting in Solr
Hi,

We are planning to move the search of one of our listing-based portals to the Solr/Lucene search server from the Sphinx search server, but we are facing a challenge in porting the customized sorting used in our portal. We only have the last 60 days of data live. The algorithm is as follows:

1. Put all listings into 54 buckets (date buckets for 60 days), i.e. buckets of 7 days, 1 day, 1 day, ...
2. For each date bucket we make 2 buckets (paid / free bucket)
3. For each paid / free bucket, cycle the advertisers on a uniqueness basis, i.e. inside a bucket the ordering should be the 1st listing of each advertiser, the 2nd listing of each advertiser, and so on; in other words, within a *sub-bucket* the second listing of an advertiser will be displayed only after the first listing of all advertisers has been displayed.

For taking care of points 1 and 2 we have created a field named bucket_index at indexing time and get the results sorted by this index, but we are not able to find a way to create a sort field at index time, or to think of a sort function, for point 3. Please suggest if there is a way to do so in Solr.

Tia,

BC Rathore
Re: Using Customized sorting in Solr
Jan,

Thanks for the response.

I thought of using it, but it would be suboptimal in the scenario I have. I guess I have to explain the scenario better; let me try again:

1. I have importance-based buckets in the system, implemented using a variable named bucket_count with integer values 0,1,2,3, and I have to show results in order of bucket_count, i.e. results from the 0th bucket at the top, then results from the 1st bucket, and so on. That is done by doing an asc sort on this variable.
2. Now *within these buckets* I need to ensure that the 1st listing of every advertiser comes at the top, then the 2nd listing from every advertiser, and so on.

Now if I go with grouping on advertiserId and use group.offset, then I probably also need to do additive filtering on bucket_count. To explain it better, the pseudo-algorithm would be:

1. query Solr with group.offset 0 and bucket count 0
2. if results are more than zero in step 1, then increase the group offset and follow step 1 again
3. else increase the bucket count with group offset zero and start from step 1.

With this logic, in the worst case I need to query Solr (number of importance buckets) * (max number of listings by an advertiser) times, which could be a very high number of Solr queries for a single user query. Please suggest a more optimal way if there is one. I am also open to making modifications in the Solr/Lucene code if needed.

Regards,
BC Rathore

On Fri, Apr 27, 2012 at 4:09 AM, Jan Høydahl wrote:
> Hi,
>
> How about trying grouping with paging?
> First you do
> group=true&group.field=advertiserId&group.limit=1&group.offset=0&group.main=true&sort=something&group.sort=how-much-paid desc
>
> That gives you one listing per advertiser, sorted the way you like.
> Then to grab the next batch of ads, you go group.offset=1 etc etc.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
> [original message quoted in full; trimmed]
Re: Using Customized sorting in Solr
Hi,

Any suggestions? Am I trying to do too much with Solr? Is there any other search engine which should be used here? I am looking into the Solr codebase and planning to modify QueryComponent; will this be the right approach?

Regards,
Shivam

On Fri, Apr 27, 2012 at 10:48 AM, solr user wrote:
> [previous message and Jan's reply quoted in full; trimmed]
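No solution was posted in this thread; one index-time approach worth sketching (purely an illustration, not from the thread): precompute a per-advertiser ordinal while building the feed, then let Solr do a plain two-key sort. The Listing type and field names below are hypothetical:

import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AdvertiserRank {
    // hypothetical listing record: advertiserId plus the rank we will index
    static class Listing {
        String advertiserId;
        int advertiserRank; // indexed as an int field, e.g. advertiser_rank
    }

    // listings must already be ordered the way they should cycle inside a bucket
    static void assignRanks(List<Listing> listingsInBucket) {
        Map<String, Integer> seen = new HashMap<>();
        for (Listing l : listingsInBucket) {
            // 1 for an advertiser's first listing in this bucket, 2 for the second, ...
            l.advertiserRank = seen.merge(l.advertiserId, 1, Integer::sum);
        }
    }
}

Querying then reduces to a single request, e.g. sort=bucket_index asc, advertiser_rank asc, at the cost of recomputing ranks whenever listings enter or leave a bucket.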
Dismax - Boosting
Hi,

Currently we are using the StandardRequestHandler, and the configuration in SolrConfig.xml is as below [markup stripped by the archive; it set echoParams to explicit].

We would like to switch to the DisMax request handler; the configuration in SolrConfig.xml is the stock example (reconstructed here from the surviving values):

<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4</str>
    <str name="pf">text^0.2 features^1.1 name^1.5 manu^1.4 manu_exact^1.9</str>
    <str name="bf">popularity^0.5 recip(price,1,1000,1000)^0.3</str>
    <str name="fl">id,name,price,score</str>
    <str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>
    <int name="ps">100</int>
    <str name="q.alt">*:*</str>
    <str name="hl.fl">text features name</str>
    <str name="f.name.hl.fragsize">0</str>
    <str name="f.name.hl.alternateField">name</str>
    <str name="f.text.hl.fragmenter">regex</str>
  </lst>
</requestHandler>

Questions:
1. Do we need to change the above DisMax handler configuration as per our requirements, or leave it as it is? What changes?
2. Do we need to make DisMax the default request handler? Do I need to add the attribute default="true" to the tag?
3. I read in the documentation that the default search handler and DisMax are the same except that to use the DisMaxQueryParser you add defType=dismax to the query string. Is there anything else we need to do?

We are basically moving to the dismax handler and trying to understand what changes we need to make to SolrConfig.xml. I understood what changes need to be made to schema.xml in a different thread on this forum.

Thanks,
Solr User
Re: Dismax - Boosting
Ahmet,

Thanks for the reply; it was very helpful.

The query that I used before changing to dismax was:

/solr/tradecore/spell/?q=curious&wt=json&rows=9&facet=true&facet.limit=-1&facet.mincount=1&facet.field=author&facet.field=pubyear&facet.field=format&facet.field=series&facet.field=season&facet.field=imprint&facet.field=category&facet.field=award&facet.field=age&facet.field=reading&facet.field=grade&facet.field=price&spellcheck=true

The above query used to properly return all the data related to facets, the data itself, and also any suggestions related to spelling mistakes.

The configuration after modifying for dismax is as below.

Schema.xml: [field definitions stripped by the archive]

SolrConfig.xml (handler defaults; markup stripped, surviving values shown): defType=dismax, echoParams=explicit, qf=title^9.0 subtitle^3.0 author^1.0 desc shortdesc imprint category isbn13 isbn10 format series season bisacsub award, fl=*

The query that I used after changing to dismax is:

solr/tradecore/select/?q=curious&wt=json&rows=9&facet=true&facet.limit=-1&facet.mincount=1&facet.field=author&facet.field=pubyear&facet.field=format&facet.field=series&facet.field=season&facet.field=imprint&facet.field=category&facet.field=award&facet.field=age&facet.field=reading&facet.field=grade&facet.field=price&spellcheck=true

The following are the issues that I am having after modifying to dismax:
1. Facet data is not coming back correctly; a lot of extra data is coming. Why, and how do I fix it?
2. How do I use the spell checker request handler along with dismax?

Thanks,
Murali

On Mon, Nov 15, 2010 at 5:38 PM, Ahmet Arslan wrote:
> > 1. Do we need to change the above DisMax handler configuration as per our
> > requirements? Or Leave it as it is? What changes?
>
> Yes, you need to edit it. At least field names. Does your schema has a
> field named sku?
>
> > 2. Do we need make DisMax as a default request handler? Do I need to add
> > attribute default="true" to the tag?
>
> If you are going to always use it, why not, change it by adding
> default="true". By doing so you need to add qt parameter in every request.
> But don't forget to delete other default="true". There can be only one
> default="true" :)
>
> > 3. I read in the documentation that Default Search Handler
> > and DisMax are the same except that to use DisMaxQueryParser add
> > defType=dismax in the query string. Is there anything else do we need to do?
>
> Above dismax config contains default parameter list. So you don't need to
> add &defType=dismax&qf=title^1.0 text^1.5 ... etc. to the query string.
>
> > We are basically moving on to dismax handler and trying to understand what
> > changes we need to make to SolrConfig.xml.
>
> As you can see in default solrconfig.xml, you can register multiple
> instances of solr.SearchHandler with different default parameter list and
> name. default="true" one is executed by default.
>
> And this can be helpful deciding about dismax params: qf,pf,ps,ps,mm etc
> http://www.lucidimagination.com/blog/2010/05/23/whats-a-dismax/
Re: Dismax - Boosting
Ahmet, I modified the schema as follows (added more fields for faceting): [field definitions not preserved in the archive]

I also added copy fields as below: [copyField definitions not preserved in the archive]

With the above changes I am not getting any facet data back. Why is the facet data not returning, and what mistake did I make in the schema?

Thanks,
Solr User

On Wed, Nov 17, 2010 at 6:42 PM, Ahmet Arslan wrote:
> Wow, you facet on many fields:
> author,pubyear,format,series,season,imprint,category,award,age,reading,grade,price
>
> The fields you facet on should be of an untokenized type: string, int, tint, date,
> etc.
>
> The fields you want full-text search on, e.g. the ones you specify in the qf and pf
> parameters, should be of a text type
> (title subtitle author desc shortdesc imprint category isbn13 isbn10 format
> series season bisacsub award).
>
> If you have common fields, for example category, you need two copies of that field:
> one string and one text, so that you can both full-text search and facet on it.
> Use copyField for this.
>
> Example document:
> category: electronic devices
>
> The query electronic will return it, and the facet on category_string will be
> displayed as:
>
> electronic devices (1)
>
> not:
>
> electronic (1)
> devices (1)
>
> --- On Wed, 11/17/10, Solr User wrote:
>
> > From: Solr User
> > Subject: Re: Dismax - Boosting
> > To: solr-user@lucene.apache.org
> > Date: Wednesday, November 17, 2010, 11:31 PM
> >
> > [quoted message snipped; the schema field definitions in it were not preserved in the archive]
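A minimal sketch of the string/text pairing Ahmet describes, using category as the example (the _facet suffix is illustrative; any distinct name works):

<field name="category" type="text" indexed="true" stored="true"/>
<field name="category_facet" type="string" indexed="true" stored="false"/>
<copyField source="category" dest="category_facet"/>

Full-text search (qf) then targets category, while facet.field=category_facet yields whole values such as "electronic devices (1)" instead of per-token counts.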
Re: Dismax - Boosting
Hi Ahmet, Below is my previous configuration, which used to work correctly:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">searchFields</str>
    <str name="spellcheckIndexDir">/solr/qa/tradedata/spellchecker</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

We used to search in only one field, "searchFields", but with dismax we are searching in several fields:

title^9.0 subtitle^3.0 author^2.0 desc shortdesc imprint category isbn13 isbn10 format series season bisacsub award

Do we need to modify the above configuration to include all of the above fields? Please give me an example.

In the past we used to query twice: first to get the suggestions, and then again using the first suggestion to show the data. Is there a way to do it in one step?

Thanks,
Murali

On Wed, Nov 17, 2010 at 7:00 PM, Ahmet Arslan wrote:
> > 2. How do I use the spell checker request handler along with
> > dismax?
>
> Just append this at the end of the dismax request handler definition:
>
> <arr name="last-components">
>   <str>spellcheck</str>
> </arr>
Re: Dismax - Boosting
Hi Ahmet, In the past we used /spell, and if there was no match we used to get a list of suggestions and then make another call with the first suggestion to get search results. After that we show the user both the suggestions for the spelling mistake and the results of the first suggestion. I think the plug-in at the URL you provided will help with doing that.

Is there a way in Solr to directly get the spelling suggestions as well as the data for the first suggestion at the same time?

For example: if the search keyword is mooon (typed by mistake instead of moon) then we need all the suggestions, like: Did you mean: moon, mo, mooing, moonen, soon, mood, moose, moore, spoon, moons? and also the search results for the first suggestion, moon.

Thanks,
Solr User

On Fri, Nov 19, 2010 at 6:41 PM, Ahmet Arslan wrote:
> > Below is my previous configuration, which used to work
> > correctly:
> >
> > [spellcheck component configuration quoted from the previous message]
> >
> > We used to search in only one field, "searchFields", but with
> > dismax we are searching in several fields:
> >
> > title^9.0 subtitle^3.0 author^2.0 desc shortdesc imprint category isbn13
> > isbn10 format series season bisacsub award
> >
> > Do we need to modify the above configuration to include all
> > of the above fields? Please give me an example.
>
> Searching and spell checking are independent. For example, you can search on
> 10 fields and create suggestions from 2 fields. The spell checker accepts one
> field in its configuration, so you need to populate this field with
> copyField, using the fields on which you want spell checking. The type of
> this field should be textSpell in your case. You can use the above config.
>
> > In the past we used to query twice: first to get the
> > suggestions, and then again using the first suggestion to show the data.
> >
> > Is there a way to do it in one step?
>
> Are you talking about queries that return 0 numFound? Re-executing the
> search as described here:
> http://sematext.com/products/dym-researcher/index.html
>
> Not out-of-the-box.
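For what it is worth, the stock SpellCheckComponent gets close: with spellcheck.collate enabled it returns, in the same response, the original query rewritten with the top suggestion for each misspelled term. Fetching results for that rewritten query still takes a second request, which matches Ahmet's "not out-of-the-box". A sketch of the request, using the handler path and core from earlier in this thread (spellcheck.count and spellcheck.collate are standard component parameters):

/solr/tradecore/select/?q=mooon&spellcheck=true&spellcheck.count=10&spellcheck.collate=true

The response then carries both the suggestion list (moon, mood, moose, ...) and a collation string that the application can submit as the follow-up query.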
Special Characters
Hi, I am searching for j.r.r. tolkien and getting results back, but if I search for jrr I am not getting any results. I am also not getting any results when searching for jrr tolkien. I am using AND as the default operator.

The search should work for both j.r.r. tolkien and jrr tolkien.

What configuration changes do I need to make so that special characters like hyphen (-) and period (.) are ignored while indexing? Or any other suggestions?

Thanks,
Solr User
Re: Special Characters
Hi Erick, I use Solr version 1.4.0, and below is my schema.xml: [field type definition not preserved in the archive]

It creates three tokens, so j r r tolkien works fine but jrr tolkien does not. I will read about PatternReplaceCharFilterFactory and try it. Please let me know if I need to do anything differently.

Thanks,
Solr User

On Mon, Nov 22, 2010 at 8:19 AM, Erick Erickson wrote:
> What version of Solr are you using? You can think about
> PatternReplaceCharFilterFactory if you're using the right
> version of Solr.
>
> But you have other problems than that. Let's claim you
> get the periods removed. Do you tokenize three tokens or
> one? I.e. jrr or j r r? In the latter case your search still won't
> match.
>
> Best
> Erick
>
> On Mon, Nov 22, 2010 at 7:45 AM, Solr User wrote:
>
> > Hi,
> >
> > I am searching for j.r.r. tolkien and getting results back, but if I search
> > for jrr I am not getting any results. I am also not getting any results when
> > searching for jrr tolkien. I am using AND as the default operator.
> >
> > The search should work for both j.r.r. tolkien and jrr tolkien.
> >
> > What configuration changes do I need to make so that special characters like
> > hyphen (-) and period (.) are ignored while indexing? Or any other suggestions?
> >
> > Thanks,
> > Solr User
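For reference, a hedged sketch of the kind of fieldType Erick is hinting at. The type name text_author is illustrative, not from the thread; the charFilter runs before tokenization, so both j.r.r. tolkien and jrr tolkien reduce to the tokens jrr and tolkien at index and query time:

<fieldType name="text_author" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- strip periods and hyphens before tokenizing, so "j.r.r." becomes "jrr" -->
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[.-]" replacement=""/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Note that stripping hyphens also fuses hyphenated words into a single token, which may or may not be desirable, and a full reindex is needed before existing documents pick up the new analysis.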
Facet - Range Query issue
Hi, I am having an issue with querying and using facets. This was working fine earlier:

/spell/?q=(sun) AND (pubyear:[1991 TO 2011])&rows=9&facet=true&facet.limit=-1&facet.mincount=1&facet.field=author&facet.field=pubyear&facet.field=format&facet.field=series&facet.field=season&facet.field=imprint&facet.field=category&facet.field=award&facet.field=age&facet.field=reading&facet.field=grade&facet.field=price&spellcheck=true&debugQuery=on

After modifying to use the dismax handler with the new schema, the query below does not work:

/select/?q=(sun) AND (pubyear:[1991 TO 2011])&rows=9&facet=true&facet.limit=-1&facet.mincount=1&facet.field=author&facet.field=pubyear_facet&facet.field=format_facet&facet.field=series_facet&facet.field=season_facet&facet.field=imprint_facet&facet.field=category_facet&facet.field=award_facet&facet.field=age_facet&facet.field=reading_facet&facet.field=grade_facet&facet.field=price_facet&spellcheck=true&debugQuery=on

The debug output shows:

rawquerystring: (sun) AND (pubyear:[1991 TO 2011])
querystring: (sun) AND (pubyear:[1991 TO 2011])
parsedquery: +((+DisjunctionMaxQuery((series:sun | desc:sun | bisacsub:sun | award:sun | format:sun | shortdesc:sun | pubyear:sun | author:sun^2.0 | category:sun | title:sun^9.0 | isbn10:sun | season:sun | imprint:sun | subtitle:sun^3.0 | isbn13:sun)) +DisjunctionMaxQuery((series:"pubyear 1991" | desc:"pubyear 1991" | bisacsub:"pubyear 1991" | award:"pubyear 1991" | format:"pubyear 1991" | shortdesc:"pubyear 1991" | pubyear:"pubyear 1991" | author:"pubyear 1991"^2.0 | category:"pubyear 1991" | title:"pubyear 1991"^9.0 | isbn10:"pubyear 1991" | season:"pubyear 1991" | imprint:"pubyear 1991" | subtitle:"pubyear 1991"^3.0 | isbn13:"pubyear 1991")) DisjunctionMaxQuery((series:2011 | desc:2011 | bisacsub:2011 | award:2011 | format:2011 | shortdesc:2011 | pubyear:2011 | author:2011^2.0 | category:2011 | title:2011^9.0 | isbn10:2011 | season:2011 | imprint:2011 | subtitle:2011^3.0 | isbn13:2011)))~1) ()
parsedquery_toString: +((+(series:sun | desc:sun | bisacsub:sun | award:sun | format:sun | shortdesc:sun | pubyear:sun | author:sun^2.0 | category:sun | title:sun^9.0 | isbn10:sun | season:sun | imprint:sun | subtitle:sun^3.0 | isbn13:sun) +(series:"pubyear 1991" | desc:"pubyear 1991" | bisacsub:"pubyear 1991" | award:"pubyear 1991" | format:"pubyear 1991" | shortdesc:"pubyear 1991" | pubyear:"pubyear 1991" | author:"pubyear 1991"^2.0 | category:"pubyear 1991" | title:"pubyear 1991"^9.0 | isbn10:"pubyear 1991" | season:"pubyear 1991" | imprint:"pubyear 1991" | subtitle:"pubyear 1991"^3.0 | isbn13:"pubyear 1991") (series:2011 | desc:2011 | bisacsub:2011 | award:2011 | format:2011 | shortdesc:2011 | pubyear:2011 | author:2011^2.0 | category:2011 | title:2011^9.0 | isbn10:2011 | season:2011 | imprint:2011 | subtitle:2011^3.0 | isbn13:2011))~1) ()
QParser: DisMaxQParser

Basically we are trying to pass the query string along with a facet field and a range. Is there a syntax issue? Please help; this is urgent as I am stuck.

Thanks,
Solr user
Re: Facet - Range Query issue
Erick, I solved the issue by adding the fq parameter to the query. Thank you so much for your reply.

Thanks,
Murali

On Mon, Nov 22, 2010 at 1:51 PM, Erick Erickson wrote:
> Well, without seeing the changes you made to the schema, it's hard to tell
> much.
> Also, could you define "not work"? What, exactly, fails to do what you
> expect?
>
> But the first question I have is "did you reindex after changing your
> schema?".
>
> And have you checked your index to verify that there are values in the fields
> you changed?
>
> Best
> Erick
>
> On Mon, Nov 22, 2010 at 1:42 PM, Solr User wrote:
>
> > Hi,
> >
> > I am having an issue with querying and using facets.
> >
> > This was working fine earlier:
> >
> > /spell/?q=(sun) AND (pubyear:[1991 TO 2011])&rows=9&facet=true&facet.limit=-1&facet.mincount=1&facet.field=author&facet.field=pubyear&facet.field=format&facet.field=series&facet.field=season&facet.field=imprint&facet.field=category&facet.field=award&facet.field=age&facet.field=reading&facet.field=grade&facet.field=price&spellcheck=true&debugQuery=on
> >
> > After modifying to use the dismax handler with the new schema, the query below
> > does not work:
> >
> > /select/?q=(sun) AND (pubyear:[1991 TO 2011])&rows=9&facet=true&facet.limit=-1&facet.mincount=1&facet.field=author&facet.field=pubyear_facet&facet.field=format_facet&facet.field=series_facet&facet.field=season_facet&facet.field=imprint_facet&facet.field=category_facet&facet.field=award_facet&facet.field=age_facet&facet.field=reading_facet&facet.field=grade_facet&facet.field=price_facet&spellcheck=true&debugQuery=on
> >
> > [parsed-query debug output snipped; see the previous message in this thread]
> >
> > Basically we are trying to pass the query string along with a facet field
> > and a range. Is there a syntax issue? Please help; this is urgent as I am stuck.
> >
> > Thanks,
> > Solr user
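For readers who hit the same problem: dismax does not understand Lucene query syntax, so the range clause in q was analyzed as the plain terms "pubyear 1991" and 2011 (visible in the parsed query above). A sketch of the corrected request with the range moved into an fq filter (field names as used in this thread; the spaces in the fq value would be URL-encoded in practice):

/select/?q=sun&fq=pubyear:[1991 TO 2011]&rows=9&facet=true&facet.limit=-1&facet.mincount=1&facet.field=author&facet.field=pubyear_facet&facet.field=format_facet&facet.field=series_facet&facet.field=season_facet&facet.field=imprint_facet&facet.field=category_facet&facet.field=award_facet&facet.field=age_facet&facet.field=reading_facet&facet.field=grade_facet&facet.field=price_facet&spellcheck=true

The fq clause is parsed by the standard Lucene query parser regardless of defType, which is why moving the range there resolves the issue.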
How to get all the search results?
Hi, First off, thanks to the group for guiding me in moving from the default search handler to dismax.

I have a question about getting all the search results. In the past, with the default search handler, I got all the search results (8000) if I passed q=* as the search string, but with dismax I get only 16 results instead of 8000.

How do I get all the search results using dismax? Do I need to configure anything to make * (asterisk) work?

Thanks,
Solr User
Re: How to get all the search results?
Hi, I tried *:* using dismax and I get no results. Is there a way that I can get all the search results using dismax? Thanks, Murali On Mon, Dec 6, 2010 at 11:17 AM, Savvas-Andreas Moysidis < savvas.andreas.moysi...@googlemail.com> wrote: > Hello, > > shouldn't that query syntax be *:* ? > > Regards, > -- Savvas. > > On 6 December 2010 16:10, Solr User wrote: > > > Hi, > > > > First off thanks to the group for guiding me to move from default search > > handler to dismax. > > > > I have a question related to getting all the search results. In the past > > with the default search handler I was getting all the search results > (8000) > > if I pass q=* as search string but with dismax I was getting only 16 > > results > > instead of 8000 results. > > > > How to get all the search results using dismax? Do I need to configure > > anything to make * (asterisk) work? > > > > Thanks, > > Solr User > > >
Re: How to get all the search results?
Hi Shawn, Yes, you did. I tried it and it did not work, so I asked the same question again. Now I understand; I tried it directly on the Solr admin and I got all the search results. I will implement the same on the website. Thank you so much, Shawn.

On Mon, Dec 13, 2010 at 5:16 PM, Shawn Heisey wrote:
> On 12/13/2010 9:59 AM, Solr User wrote:
>
>> Hi,
>>
>> I tried *:* using dismax and I get no results.
>>
>> Is there a way that I can get all the search results using dismax?
>
> For dismax, use q= or simply leave the q parameter off the URL entirely.
> It appears that you need to have q.alt set to *:* for this to work. It
> would be a good idea to include this in your handler definition:
>
> <str name="q.alt">*:*</str>
>
> Two people (myself and Peter Karich) gave this answer on this thread last
> week, within 15 minutes of the time your original question was posted.
> Here's the entire thread on nabble:
>
> http://lucene.472066.n3.nabble.com/How-to-get-all-the-search-results-td2028233.html
>
> Shawn
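A minimal sketch of a handler definition with q.alt in place (the handler name and the default="true" placement are illustrative; q.alt only takes effect when the q parameter is absent or empty):

<requestHandler name="dismax" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="q.alt">*:*</str>
  </lst>
</requestHandler>

With this in place, a request such as /select/?rows=10 with no q parameter at all matches every document in the index.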
Re: what would cause large numbers of executeWithRetry INFO messages?
Sorry, never did find a solution to that. If you do happen to figure it out, please post a reply to this thread. Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/what-would-cause-large-numbers-of-executeWithRetry-INFO-messages-tp1453417p2281087.html Sent from the Solr - User mailing list archive at Nabble.com.
Out of memory while creating indexes
Hi All, I am trying to create indexes from a 400MB XML file using the following command, and I am running into an out-of-memory exception:

$JAVA_HOME/bin/java -Xms768m -Xmx1024m -Durl=http://$SOLR_HOST:$SOLR_PORT/solr/customercarecore/update -jar $SOLRBASEDIR/dataconvertor/common/lib/post.jar $SOLRBASEDIR/dataconvertor/customercare/xml/CustomerData.xml

I am planning to bump up the memory and try again. Did anyone run into a similar issue? Any inputs would be very helpful for resolving the out-of-memory exception. I was able to create indexes with a small file but not with the large file. I am not using SolrJ.

Thanks,
Solr User
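A sketch of the same invocation with a larger heap, reusing the environment variables from the post above (the 2048m figure is just an illustration; the OutOfMemoryError could equally be thrown by the Solr server's JVM, so its heap is worth checking too):

$JAVA_HOME/bin/java -Xms1024m -Xmx2048m \
  -Durl=http://$SOLR_HOST:$SOLR_PORT/solr/customercarecore/update \
  -jar $SOLRBASEDIR/dataconvertor/common/lib/post.jar \
  $SOLRBASEDIR/dataconvertor/customercare/xml/CustomerData.xml

Splitting the 400MB file into several smaller <add> files and posting them one at a time is another common workaround.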
Terms Component - solr-1.4.0
Hi All, I am using Solr 1.4.0 with dismax as the request handler. I have the following in my solrconfig.xml inside the dismax request handler tag:

<arr name="last-components">
  <str>spellcheck</str>
</arr>

The above configuration helps find terms when there are spelling issues. I tried configuring the terms component with no luck.

May I know how to configure the terms component with dismax? Or do I need to call the terms component directly to get auto-suggestions?

Thank you so much in advance.

Regards,
Solr User
Re: Terms Component - solr-1.4.0
Hi All, Please help me with implementing TermsComponent in my current Solr solution.

Regards,
Solr User

On Tue, May 17, 2011 at 4:12 PM, Solr User wrote:
> Hi All,
>
> I am using Solr 1.4.0 with dismax as the request handler. I have the following in
> my solrconfig.xml inside the dismax request handler tag:
>
> <arr name="last-components">
>   <str>spellcheck</str>
> </arr>
>
> The above configuration helps find terms when there are spelling issues. I tried
> configuring the terms component with no luck.
>
> May I know how to configure the terms component with dismax? Or do I need to
> call the terms component directly to get auto-suggestions?
>
> Thank you so much in advance.
>
> Regards,
> Solr User
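TermsComponent is normally wired into its own request handler rather than appended to the dismax handler's last-components. A sketch along the lines of the Solr 1.4 example solrconfig.xml (the handler name /terms is conventional; the field name title below is illustrative):

<searchComponent name="termsComponent" class="org.apache.solr.handler.component.TermsComponent"/>

<requestHandler name="/terms" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <bool name="terms">true</bool>
  </lst>
  <arr name="components">
    <str>termsComponent</str>
  </arr>
</requestHandler>

Auto-suggestions are then fetched with a direct call, separate from the dismax search request, e.g.:

/terms?terms.fl=title&terms.prefix=cu&terms.limit=10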
question(s) re lucene spatial toolkit aka LSP aka spatial4j
hopefully someone is using the lucene spatial toolkit aka LSP aka spatial4j, and can answer this question. we are using this spatial tool for doing searches. overall, it seems to work very well; however, finding documentation is difficult. I have a couple of questions:

1. I have a geohash field in my solr schema that contains indexed geographic polygon data. I want to find all docs where that polygon intersects a given lat/long. I was experimenting with returning distance in the result set and with sorting by distance, and found that the following query works. However, I don't know what distance means in the query, i.e. is it the distance from the point to the polygon centroid, to the closest outer edge of the polygon, or is it a useless random value? Does anyone know?

http://solrserver:solrport/solr/core0/select?q=*:*&fq={!v=$geoq%20cache=false}&geoq=wkt_search:%22Intersects(Circle(-97.057%2047.924%20d=0.01))%22&sort=query($geoq)+asc&fl=catchment_wkt1_trimmed,school_name,latitude,longitude,dist:query($geoq,-1),loc_city,loc_state

2. some of the polygons, being geographic representations, are very big (i.e. state/province polygons). when solr starts processing a spatial query (like the one above), I can see ("INFO: Building Cache [xx]") that it fills some sort of in-memory cache (org.apache.lucene.spatial.strategy.util.ShapeFieldCache) with the indexed polygon data. We are encountering Java OOM issues when this occurs (even when we boosted the memory to 7GB). I know that some of the polygons can have more than 2300 points, but heavy trimming isn't really an option due to level-of-detail issues. Can we control this caching, or the indexing of the polygons, in any way to reduce the memory requirements? -- View this message in context: http://lucene.472066.n3.nabble.com/question-s-re-lucene-spatial-toolkit-aka-LSP-aka-spatial4j-tp3997757.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: question(s) re lucene spatial toolkit aka LSP aka spatial4j
Thanks David. No worries about the delay; I am always happy and appreciative when someone responds. I don't understand what you mean by "All center points get cached into memory upon first use in a score" in question 2 about the Java OOM errors I am seeing. The Solr instance I have set up for testing has around 200K docs, with one WKT field per doc (indexed, stored, and multiValued). I did a count of the number of points that get indexed in Solr (computed in MS SQL by counting the number of points (using STNumPoints) for each geometry (using STNumGeometries) in the WKT data I am indexing), and I have around 35M points total. If only the center points for 190K docs get cached, wouldn't that easily fit in 7GB of heap? Even if Solr were caching all 35M points, that still doesn't sound like 7GB worth of data. -- View this message in context: http://lucene.472066.n3.nabble.com/question-s-re-lucene-spatial-toolkit-aka-LSP-aka-spatial4j-tp3997757p4000268.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: question(s) re lucene spatial toolkit aka LSP aka spatial4j
Thanks David. You are a life saver. I didn't know how the cache got triggered, and "needScore=false" now allows some of my problem queries to finally work, well within 2GB of memory. Will look at your other suggestion when I can. MANY thanks again. -- View this message in context: http://lucene.472066.n3.nabble.com/question-s-re-lucene-spatial-toolkit-aka-LSP-aka-spatial4j-tp3997757p4000286.html Sent from the Solr - User mailing list archive at Nabble.com.
"Intersects" spatial query returns polygons it shouldn't
[The prose of this message was not preserved in the archive; what remains is the tail of a large WKT polygon (hundreds of coordinate pairs near 45.2 N, -93.26 W) that was used in the Intersects query.] -- View this message in context: http://lucene.472066.n3.nabble.com/Intersects-spatial-query-returns-polygons-it-shouldn-t-tp4008646.html Sent from the Solr - User mailing list archive at Nabble.com.
question about schemas
I just started using Solr, and I am trying to figure out how to set up my schema. I know that Solr doesn't have JOINs, so I am having some difficulty figuring out how I would set up a schema for the following fictional situation. For example, let us say that:

- I have 1+ customers, each having some specific info (StoreId, Name, Phone, Address, City, State, Zip, etc.)
- Each customer has a subset of the 100+ products I am looking to track, each product having some specific info (ProductId, Name, Width, Height, Depth, Weight, Density, etc.)
- I want to be able to search by the product info but have facets return the number of customers, rather than the number of products, that meet my criteria
- I want to display (and sort) customers based on my product search

In relational databases, I would simply create two tables (customer and product) and JOIN them. I could then craft a SQL query to count the number of distinct StoreId values in the result (something like facets). In Solr, however, there are no joins. As far as I can tell, my options are to:

- create two Solr instances, one with customer info and one with product info; I would search the product Solr instance to identify the StoreId values returned, and then use that info to search the customer Solr instance to get the customer info. The problem with this is that the second query could have ten thousand ANDs (one for each StoreId returned by the first query)
- create a single Solr instance that contains a denormalized version of the data where each doc contains both the customer info and the product info for a given product. The problem with this is that my facets would return the number of products, not the number of customers
- create a single Solr instance that contains a denormalized version of the data where each doc contains the customer info and info for ALL products that the customer might have (likely done via dynamic fields). The problem with this is that my schema would be a bit messy and my queries could have hundreds of ANDs and ORs (one AND for each product field, and one OR for each product); for example, q=((Width1:50 AND Density1:7) OR (Width2:50 AND Density2:7) OR …)

Does anyone have any advice on this? Are there other schemas that might work? Hopefully the example makes sense. -- View this message in context: http://old.nabble.com/question-about-schemas-tp26600956p26600956.html Sent from the Solr - User mailing list archive at Nabble.com.
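One way to sketch the second (fully denormalized, one doc per customer-product pair) option so that facets still reveal customers: facet on StoreId and count the buckets rather than the documents. Field names here are the fictional ones from the post above, and the spaces would be URL-encoded:

/select?q=Width:50 AND Density:7&rows=0&facet=true&facet.field=StoreId&facet.limit=-1&facet.mincount=1

Each facet bucket is one matching customer, so the number of buckets, rather than the per-bucket counts, gives the customer total; the counts themselves remain per-product.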
RE: question about schemas
cbennett wrote:
>
> Solr supports multi value fields so you could store one document per
> customer and have multi value fields for the product information.
>
> Colin.

Quoted from: http://old.nabble.com/question-about-schemas-tp26600956p26608618.html

Thanks Colin. From the online docs, there doesn't seem to be a way to directly map a multivalued field value in one field to the multivalued field value in another field (i.e. the first value in myMultiValueProductId wouldn't necessarily match the first value in myMultiValueDensity or in myMultiValueWeight). Is there a technique to do this? -- View this message in context: http://old.nabble.com/question-about-schemas-tp26600956p26611715.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: question about schemas
Lance Norskog-2 wrote: > > But, in general, this is a "shopping cart" database and Solr/Lucene may > not be the best fit for this problem. > True, every tool has strengths and weaknesses. Given how powerful Solr appears to be, I would be surprised if I was not able to handle this use case. Lance Norskog-2 wrote: > > You can make a separate facet field which contains a range of "buckets": > 10, 20, 50, or 100 means that the field has a value 0-10, 11-20, 21-50, or > 51-100. You could use a separate filter query with values for these > buckets. Filter queries are very fast in Solr 1.4 and this would limit > your range query execution to documents which match the buckets. > Thank you for this suggestion. I will look into this. -- View this message in context: http://old.nabble.com/question-about-schemas-tp26600956p26636155.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: question about schemas
Lance Norskog-2 wrote:
>
> You can make a separate facet field which contains a range of "buckets":
> 10, 20, 50, or 100 means that the field has a value 0-10, 11-20, 21-50, or
> 51-100. You could use a separate filter query with values for these
> buckets. Filter queries are very fast in Solr 1.4 and this would limit
> your range query execution to documents which match the buckets.
>

Lance, I am afraid that I do not see how to use this suggestion. Which of the three (four?) suggested schemas would I be using? How would these range facets prevent the potential issues I found, such as getting product facets instead of customer facets, or having very large numbers of ANDs and ORs, and so forth? -- View this message in context: http://old.nabble.com/question-about-schemas-tp26600956p26679922.html Sent from the Solr - User mailing list archive at Nabble.com.