Looking to merge deprecation of snapshotscli.sh in favour of bin/solr sub commands

2024-08-16 Thread David Eric Pugh
I'd like to merge SOLR-17180: Deprecate snapshotscli.sh in favour of bin/solr 
sub commands next week.   

https://github.com/apache/solr/pull/2381
I've emailed the user list for additional eyes, but with no luck.  
https://lists.apache.org/thread/l6wyx5yvw8nqxlm1wyw45hld34blvyov
I'm thinking this will land in main, but not get backported to 9x since the 
HDFS stuff hasn't received much testing/review.


Solr 9.7 Release update

2024-08-16 Thread Anshum Gupta
Hi everyone,

I started building the RC last night (Pacific Time) so please don't merge
anything into 9.7 at this point without checking with me.

I've had a few issues with GPG but they are mostly fixed now.


Guidance on new functionality for query string generation

2024-08-16 Thread Geoffrey Slinker
I have been using Apache Solr for many years in a live environment that 
services queries at 3K rpm (unless there is a campaign in progress) and updates 
from 3K to 10K rpm. The schema is quite robust with each record potentially 
having 90 fields populated. The system stores 900 million records and is hosted 
on several powerful server instances.

Back in 2015 I attended a session at Lucidworks called “Solr Unleashed”. When 
I described the system that I was building I recall the presenter saying, “Good 
luck with that.” We have had very good luck.

When I first started generating Solr query strings I did it with StringBuilder. 
That became problematic when I wanted to change a boost or a constant score for 
a query term group that had been generated previously. So, I eventually wrote 
some Java classes to provide an object structure that I could manipulate and 
navigate. It has been very helpful, and recently I revamped my query generation 
and was glad I had objects to work with instead of strings.
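
To illustrate the problem, here is a minimal sketch of the string-based approach. It is not taken from the production system: the class name StringBuilderQueryDemo, the helper buildGroup, the field names, and the boost values are all assumptions made up for illustration.

```java
// Hypothetical illustration only; names, fields, and boosts are invented.
final class StringBuilderQueryDemo {

  /** Builds one boosted group directly as text: ( field:"v1" field:"v2" )^boost */
  static String buildGroup(String field, float boost, String... values) {
    StringBuilder sb = new StringBuilder("( ");
    for (String value : values) {
      // Every value is quoted as a phrase; no escaping is attempted here.
      sb.append(field).append(":\"").append(value).append("\" ");
    }
    return sb.append(")^").append(boost).toString();
  }

  public static void main(String[] args) {
    String cd = buildGroup("cd", 0.3f, "back in black", "point of no return");
    String record = buildGroup("record", 0.3f, "destroyer");
    String query = "( " + cd + " " + record + " )";
    System.out.println(query);

    // Once the text exists, changing the boost of only the first group means
    // editing the string. A naive replace rewrites BOTH groups here, because
    // nothing in the text says which "^0.3" belongs to which group.
    System.out.println(query.replace("^0.3", "^0.8"));
  }
}
```

Once the query is flat text, the structure that produced it is gone, so any later adjustment has to rediscover that structure by searching the string.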

My employer has often encouraged the development staff to participate in the 
Open Source community and they are supportive of sharing this query generation 
functionality.

I will attach a link to the fork of Apache Solr that I am using below.

I have some questions.

1) Do these Java classes provide functionality that the community would like to 
have? Maybe there is functionality already available or similar.
2) I just made a guess in the project structure on where to put the 
functionality. Maybe it should be in SolrJ, or maybe in Lucene, or somewhere 
else.

The main working Java class is called QueryTermGrouper.

QueryTermGrouper aggregates QueryTerms and other QueryTermGroupers to form 
complex queries that can be used in a standard Solr query.
Example:
  QueryTermGrouper grouper = new QueryTermGrouper().with(BooleanClause.Occur.MUST).withBoost(1.4f);
  grouper.addTerm(new QueryTerm("foo", "bar").withProximity(1));

  String query = grouper.toString();

  Output: +( foo:bar~1 )^1.4
  
Example:
  QueryTermGrouper grouper = new QueryTermGrouper().withConstantScore(5.0f);
  grouper.addTerm(new QueryTerm("foo", "bar").withProximity(1));

  String query = grouper.toString();

  Output: ( foo:bar~1 )^=5
  
Instead of using string manipulation to create complex query strings the 
QueryTermGrouper allows complex queries to be built inside an object model that 
can be more easily changed.
If you need to generate a query like this:
  (
    (
      cd:"back in black"
      cd:"point of no return"
      cd:"night at the opera"
    )^0.3

    (
      record:destroyer
      record:"the grand illusion"
    )^0.5
  )

The code to do so is as simple as this:
  QueryTermGrouper grouper = new QueryTermGrouper();
  QueryTermGrouper cdGrouper = grouper.addGroup();
  QueryTermGrouper recordsGroup = grouper.addGroup();

  cdGrouper.addTerm(new QueryTerm("cd", "back in black"));
  cdGrouper.addTerm(new QueryTerm("cd", "point of no return"));
  cdGrouper.addTerm(new QueryTerm("cd", "night at the opera"));
  cdGrouper.setBoost(0.3f);

  recordsGroup.addTerm(new QueryTerm("record", "destroyer"));
  recordsGroup.addTerm(new QueryTerm("record", "the grand illusion"));
  recordsGroup.setBoost(0.5f);
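
For readers who want a feel for how an object model like this can work, the following is a minimal, self-contained sketch. It is not the code from the linked fork: the names QueryNode, FieldClause, ClauseGroup, and ClauseGroupDemo are hypothetical, and the sketch assumes only that groups hold child nodes, carry an optional boost, and render themselves recursively. It does no escaping of special characters.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical node type: anything that can render itself as query text. */
interface QueryNode {
  String render();
}

/** Hypothetical field:value clause; multi-word values are quoted as a phrase. */
final class FieldClause implements QueryNode {
  private final String field;
  private final String value;

  FieldClause(String field, String value) {
    this.field = field;
    this.value = value;
  }

  @Override
  public String render() {
    return value.contains(" ") ? field + ":\"" + value + "\"" : field + ":" + value;
  }
}

/** Hypothetical group of clauses and sub-groups with an optional boost. */
final class ClauseGroup implements QueryNode {
  private final List<QueryNode> children = new ArrayList<>();
  private Float boost; // null means no boost suffix is rendered

  ClauseGroup add(QueryNode node) {
    children.add(node);
    return this;
  }

  /** Creates a nested group, attaches it, and returns it for further building. */
  ClauseGroup addGroup() {
    ClauseGroup group = new ClauseGroup();
    children.add(group);
    return group;
  }

  void setBoost(float boost) {
    this.boost = boost;
  }

  @Override
  public String render() {
    StringBuilder sb = new StringBuilder("( ");
    for (QueryNode child : children) {
      sb.append(child.render()).append(' ');
    }
    sb.append(')');
    if (boost != null) {
      sb.append('^').append(boost);
    }
    return sb.toString();
  }
}

final class ClauseGroupDemo {
  static String example() {
    ClauseGroup root = new ClauseGroup();
    ClauseGroup cds = root.addGroup();
    ClauseGroup records = root.addGroup();
    cds.add(new FieldClause("cd", "back in black"));
    cds.setBoost(0.3f);
    records.add(new FieldClause("record", "destroyer"));
    records.setBoost(0.5f);
    cds.setBoost(0.8f); // boost changed after the group was built; no string editing
    return root.render();
  }

  public static void main(String[] args) {
    System.out.println(example());
  }
}
```

Because the text is produced only when render() is called, a boost can be changed at any point before then, which is the property the string-based approach lacks.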


The code can be found here:

https://github.com/gslinker/solr/tree/QUERY_TERM_GROUPER

Unit tests provide 100% coverage on all lines of code and on all branches in 
the code.

Please share your thoughts.

Sincerely
Geoffrey Slinker


-
To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
For additional commands, e-mail: dev-h...@solr.apache.org



Re: Solr 9.7 Release update

2024-08-16 Thread David Smiley
The key stuff is the biggest pain.  I'm glad you're through it.

On Fri, Aug 16, 2024 at 12:30 PM Anshum Gupta wrote:
>
> Hi everyone,
>
> I started building the RC last night (Pacific Time) so please don't merge
> anything into 9.7 at this point without checking with me.
>
> I've had a few issues with GPG but they are mostly fixed now.




Re: Guidance on new functionality for query string generation

2024-08-16 Thread David Smiley
Hello Geoffrey,

Thanks for your message and offer.

I think the overall idea is nice, but if we got more serious, we'd want
to bikeshed on a number of details, like naming ("grouper" is
dubious to me), and explore further simplifications.  For example, why
addTerm(new QueryTerm("cd", "back in black")) when you could do
addTerm("cd", "back in black")?  And I suspect you are confusing a
"query term" (ultimately a TermQuery in Lucene) with a query/clause
generally.  There is so much bikeshedding here that we'd probably start
from scratch, to be honest.
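
As a rough sketch of the simplification being suggested, a convenience overload can sit next to the object form and delegate to it. The class names FieldValue and OverloadSketch below are hypothetical stand-ins, not classes from the fork, and the rendering is deliberately naive (every value is quoted, nothing is escaped).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a field/value pair; not the QueryTerm class from the fork. */
final class FieldValue {
  final String field;
  final String value;

  FieldValue(String field, String value) {
    this.field = field;
    this.value = value;
  }
}

/** Hypothetical builder that offers both call styles side by side. */
final class OverloadSketch {
  private final List<FieldValue> clauses = new ArrayList<>();

  /** Object form, comparable to addTerm(new QueryTerm(...)). */
  OverloadSketch addTerm(FieldValue clause) {
    clauses.add(clause);
    return this;
  }

  /** Convenience form: builds the object internally and delegates. */
  OverloadSketch addTerm(String field, String value) {
    return addTerm(new FieldValue(field, value));
  }

  /** Renders every clause as a quoted phrase; no escaping is attempted. */
  String render() {
    StringBuilder sb = new StringBuilder("(");
    for (FieldValue c : clauses) {
      sb.append(' ').append(c.field).append(":\"").append(c.value).append('"');
    }
    return sb.append(" )").toString();
  }

  public static void main(String[] args) {
    String viaObject = new OverloadSketch().addTerm(new FieldValue("cd", "back in black")).render();
    String viaOverload = new OverloadSketch().addTerm("cd", "back in black").render();
    System.out.println(viaObject);
    System.out.println(viaObject.equals(viaOverload));
  }
}
```

Keeping the object form alongside the overload still allows per-clause options to be set on the object before it is added, while the common case stays short.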

If we hypothetically incorporated this immediately, where in the
codebase would it be used (give a specific example)?  If it's nowhere
at all, it might be an awkward thing to include.  Maybe there's 100%
test coverage, but I suspect there are no tests for whether the generated
string is actually parseable and parsed as intended (i.e. has the
expected Query structure).  If it's for users of Solr (which I believe is
your intention), it should live in SolrJ, but you placed it in solr-core.

~ David


Re: Guidance on new functionality for query string generation

2024-08-16 Thread Geoffrey Slinker
David,

Thank you for your insights and thank you for your work on Apache Solr.

You are correct, this is for end users. Many typically build up their Solr 
query using StringBuilder or String concatenation, then instantiate a SolrJ 
SolrQuery object and pass in the query string.

I would like to get these classes named correctly for the Solr domain. 
QueryTerm is actually a legacy name from my work, where we use an in-house 
indexing and query engine and have over the years moved some things to Solr 
and some to Elasticsearch.

I base the naming of classes somewhat on:
https://solr.apache.org/guide/solr/latest/query-guide/standard-query-parser.html

Term - single word
Phrase - group of words surrounded by quotes.
Field - item defined by the schema
Grouping is used to form "sub-queries"

What is the proper domain name for:
title:"pink panther"

title -> field
"pink panther" -> clause? Terms? 

I have been making SolrJ queries for many years now and I am comfortable doing 
so. Other developers look at the interface of Solr, compare it to the 
query builder of Elasticsearch, and decide it is easier to work in 
Elasticsearch. I am building a new system that has a small amount of data and 
decided to look at Elasticsearch as a possible solution. The first review I 
read said that building queries is easier in Elasticsearch. Since I have been 
using my own query building tools for years, I hadn't considered that it was a 
"selling point".

- Geoffrey


