Reverse-engineering existing installation

2019-05-02 Thread Doug Reeder
The documentation for SOLR is good.  However it is oriented toward setting
up a new installation, with the data model known.

I have inherited an existing installation.  Aspects of the data model I
know, but there's a lot of ways things could have been configured in SOLR,
and for some cases, I don't know what SOLR was supposed to do.

Can you recommend any documentation on working out the configuration of an
existing installation?


Re: Reverse-engineering existing installation

2019-05-03 Thread Doug Reeder
Thanks! Alexandre's presentation is helpful in understanding what's not
essential.  David's suggestion of comparing config files is good - I'll
have to see if I can dig up the config files for version 4.2, which we're
currently running.

I'll also look into updating to a supported version. I guess I'll be
reading https://lucene.apache.org/solr/guide/6_6/upgrading-solr.html and
the similar ones for later versions.  Is an upgrade guide for version 4 to
5 still around somewhere?

On Fri, May 3, 2019 at 12:21 AM David Smiley wrote:

> Consider trying to diff configs from a default at the version it was copied
> from, if possible. Even better, the configs should be in source control and
> then you can browse history with commentary and sometimes links to issue
> trackers and code reviews.
>
> Also a big part that you can’t see by staring at configs is what the
> queries look like. You should examine the system interacting with Solr to
> observe embedded comments/docs for insights.


Re: Reverse-engineering existing installation

2019-05-03 Thread Doug Reeder
Thanks! Diffs for solr.xml and zoo.cfg were easy, but it looks like we'll
need to strip the comments before we can get a useful diff of
solrconfig.xml or schema.xml.  Can you recommend tools to normalize XML
files?  XMLStarlet is hosted on SourceForge, which I no longer trust, and
hasn't been updated in years.
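
One way to get comment-free, normalized XML without an external tool is
Python's standard library.  The sketch below is an illustration, not something
proposed in the thread: it assumes Python 3.8 or later (for
xml.etree.ElementTree.canonicalize), and the function names normalize_xml and
diff_configs are made up for this example.

```python
import difflib
from xml.etree import ElementTree as ET


def normalize_xml(xml_text):
    """Return canonical XML (C14N 2.0) with comments dropped, as a list of lines."""
    canonical = ET.canonicalize(xml_text, with_comments=False, strip_text=True)
    # Canonical output has no line breaks between tags once whitespace is
    # stripped, so split between adjacent tags to get a line-oriented diff.
    return canonical.replace("><", ">\n<").splitlines()


def diff_configs(ours, stock):
    """Unified diff of two XML documents after normalization."""
    return "\n".join(
        difflib.unified_diff(
            normalize_xml(stock),
            normalize_xml(ours),
            fromfile="stock",
            tofile="ours",
            lineterm="",
        )
    )
```

Given the text of the installed solrconfig.xml and of the stock 4.2 file,
diff_configs should return a unified diff in which comment-only and
whitespace-only differences no longer appear; canonicalization also sorts
attributes, so attribute order does not show up as a change.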


On Fri, May 3, 2019 at 4:24 PM Shawn Heisey wrote:

> On 5/3/2019 1:44 PM, Erick Erickson wrote:
> > Then git will let you check out any previous branch. 4.2 is from before
> > we switched to Git, so I’m not sure you can go that far back, but 4.x is
> > probably close enough for comparing configs.
>
> Git has all of Lucene's history, and most of Solr's history, back to
> when Lucene and Solr were merged before the 3.1.0 release.  So the 4.x
> releases are there:
>
> 
> elyograg@smeagol:~/asf/lucene-solr$ git checkout releases/lucene-solr/4.2.1
> Checking out files: 100% (13209/13209), done.
> Note: checking out 'releases/lucene-solr/4.2.1'.
>
> You are in 'detached HEAD' state. You can look around, make experimental
> changes and commit them, and you can discard any commits you make in
> this state without impacting any branches by performing another checkout.
>
> If you want to create a new branch to retain commits you create, you may
> do so (now or later) by using -b with the checkout command again. Example:
>
>   git checkout -b <new-branch-name>
>
> HEAD is now at 50c41a3e5c Lucene Java 4.2.1 release.
> 
>
> Thanks,
> Shawn
>


Re: Reverse-engineering existing installation

2019-05-06 Thread Doug Reeder
Thanks, xmlstarlet makes it straightforward to get the canonical XML.

It looks like our schema.xml files are rather different from files
like solr/example/solr/collection1/conf/schema.xml

Any suggestions of sections I should focus on?
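
One way to narrow down where two schema.xml files diverge is to compare the
named definitions directly.  The following is only a rough sketch under stated
assumptions: the helper names named_definitions and compare_schemas are
hypothetical, and it compares only the attributes on elements such as field,
dynamicField, or fieldType, not nested analyzer chains or copyField rules.

```python
from xml.etree import ElementTree as ET


def named_definitions(schema_xml, tag):
    """Map each <tag name="..."> element in a schema to its attribute dict."""
    root = ET.fromstring(schema_xml)
    return {
        el.get("name"): dict(el.attrib)
        for el in root.iter(tag)
        if el.get("name")
    }


def compare_schemas(ours, stock, tag="field"):
    """Return (only in ours, only in stock, in both with different attributes)."""
    a = named_definitions(ours, tag)
    b = named_definitions(stock, tag)
    added = sorted(a.keys() - b.keys())
    removed = sorted(b.keys() - a.keys())
    changed = sorted(name for name in a.keys() & b.keys() if a[name] != b[name])
    return added, removed, changed
```

Running compare_schemas once per tag ("field", "dynamicField", "fieldType")
gives a short list of names to read in detail, which may be easier to start
from than a line diff of two very different files.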

On Sat, May 4, 2019 at 8:11 AM Alexandre Rafalovitch wrote:

> XMLStarlet still works just fine. So if you want the fast way, that is the
> one.
>
> Otherwise, some xml editors can do it (not sure which ones) or you can look
> for XSLT or XQuery examples on the web.
>
> XMLStarlet actually just spits out XSLT internally, or even externally if
> you ask.
>
> Regards,
>  Alex


Softer version of grouping and/or filter query

2019-05-08 Thread Doug Reeder
We have a query to return products related to a given product. To give some
variety to the results, we group by vendor:
group=true&group.main=true&group.field=merchantId

We need at least four results to display. Unfortunately, some categories
don't have a lot of products, and grouping takes us (say) from five results
to three.

Can I "soften" the grouping, so other products by the same vendor will
appear in the results, but with much lower score?


Similarly, we have a filter query that only returns products over $150:
fq=price:[150+TO+*]

Can this be changed to a q or qf parameter where products priced under $150
have a lower score than any product priced $150 or more? (A price higher than
$150 should not increase the score.)


Re: Softer version of grouping and/or filter query

2019-05-10 Thread Doug Reeder
Thanks much!  I dropped price from the fq term, changed to an edismax
parser, and boosted with
bq=price:[150+TO+*]^100
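
For illustration, the change described above could be assembled on the client
side roughly as follows.  This is a sketch, not the actual application code:
the function name related_products_params and the example q value are
hypothetical, and only defType and bq come from the message above.

```python
from urllib.parse import urlencode


def related_products_params(user_query, min_price=150, boost=100):
    """Build request parameters that boost, rather than filter, by price."""
    return {
        "q": user_query,
        "defType": "edismax",
        # Previously fq=price:[150 TO *]; as a boost query it no longer
        # excludes cheaper products, it only raises the score of matches.
        "bq": f"price:[{min_price} TO *]^{boost}",
    }


# The q value here is a made-up placeholder for the real related-products query.
query_string = urlencode(related_products_params("category:lamps"))
```

Note that bq adds to the score of matching documents rather than imposing a
strict ordering, so a cheaper product with a very strong text match could
still outrank a product priced at $150 or more unless the boost is large
relative to the other score components.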



On Thu, May 9, 2019 at 7:21 AM Edward Ribeiro wrote:

> On Wed, May 8, 2019 at 18:56, Doug Reeder wrote:
>
> >
> > Similarly, we have a filter query that only returns products over $150:
> > fq=price:[150+TO+*]
> >
> > Can this be changed to a q or qf parameter where products less than $150
> > have score less than any product priced $150 or more? (A price higher
> > than $150 should not increase the score.)
> >
>
> If you are using edismax then you could use a boost function. Maybe something
> along those lines: bf=if(lt(price, 150), 0.5, 100)
>
> Your fq already filters out documents with prices less than 150. Using a
> boost (function/query) will retrieve back docs with prices less than 150,
> but probably with smaller scores.
>
> Edward