Re: Re: Re: Re: Low untunable default FastWriter output buffer - possible reason for slow single threaded data receiving from Solr on 1Gigabit+ networks while scroll, search etc

2023-03-06 Thread Gus Heck
Keep in mind Yonik's law of half-baked patches:

"A half-baked patch with no documentation, no tests and no backwards
compatibility is better than no patch at all."

On Sun, Mar 5, 2023 at 5:43 PM Fikavec F  wrote:

> Thanks. In the coming days I will conduct testing and measurements on real
> hardware.
> Unfortunately, my code is not ready to become part of the project
> directly: this is a very sensitive place for changes, I am not a Java
> developer, I am not deeply familiar with Solr's internal mechanisms, the
> code has no tests, it does not support the modes and parameters of the
> original wt=json, and I have a number of questions about the code myself.
> It would be great if a knowledgeable professional would check my code and
> prepare a high-quality patch, as Mikhail Khludnev previously helped me here
> with a patch for the modified buffer. As before, I am happy to take part in
> testing such a patch if it appears. All I did was replace
> SmileResponseWriter with JsonFactory in the source code, as I wrote
> earlier. I'm not sure that viewing my low-quality code will help
> professionals more than knowing which part of the code causes a 4x+
> slowdown from the possible speed, so that it can be revised and improved.
>
> I've prepared a repository and shared the code with the changes made -
> https://github.com/Fikavec/NewAndModifiedSolrResponseWriters
> The first commit contains the code of the original SmileResponseWriter, so
> that it is easy to see what small changes I made. I placed all the jars
> from the bin folders in ...
> /solr-8.11.2/server/solr-webapp/webapp/WEB-INF/lib/* and connected them via
> the collection's solrconfig.xml:
>
>  <queryResponseWriter name="myfastjson" class="my.MyJacksonJsonResponseWriter"/>
>  <queryResponseWriter name="myfastcbor" class="my.MyJacksonCBORResponseWriter"/>
>
> Then I created a collection and used them as wt=myfastjson and
> wt=myfastcbor query parameters.
>
> Please let me know if there are problems in my code. The UTF-8 handling in
> particular raises a question, since I do not know in which encoding Solr
> passes data to writers (Michael Gibney mentioned utf-16 -> utf-8 -->
> writer). In addition, Jackson has both writeString and
> writeRawUTF8String methods (
> https://fasterxml.github.io/jackson-core/javadoc/2.13/com/fasterxml/jackson/core/JsonGenerator.html).
> Which one is needed once Solr passes the data to the writer?
>
> Method similar to writeString(String) but that takes as its input a UTF-8
> encoded String that is to be output as-is, without additional escaping
> (type of which depends on data format; backslashes for JSON). However,
> quoting that data format requires (like double-quotes for JSON) will be
> added around the value if and as necessary.
>
> Note that some backends may choose not to support this method: for
> example, if underlying destination is a Writer using this method would
> require UTF-8 decoding. If so, implementation may instead choose to throw a
> UnsupportedOperationException due to ineffectiveness of having to decode
> input.
>
> I checked my code on various UTF-8 data and didn't find any problems, but
> perhaps I used the wrong function (writeString) and there are cases where
> the data will be corrupted...
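
The difference between the two Jackson methods quoted above can be illustrated without Jackson itself. The following is a minimal Python sketch, not Jackson code: write_escaped and write_raw are hypothetical helpers that only mimic the documented behaviour (escape and quote, versus quote only), built on the standard json module.

```python
import json


def write_escaped(value: str) -> bytes:
    """Mimics writeString: escape the text, add quotes, encode as UTF-8."""
    return json.dumps(value, ensure_ascii=False).encode("utf-8")


def write_raw(value_utf8: bytes) -> bytes:
    """Mimics writeRawUTF8String: add quotes only, bytes are copied as-is."""
    return b'"' + value_utf8 + b'"'


def is_valid_json(payload: bytes) -> bool:
    """Return True if the payload parses as a JSON document."""
    try:
        json.loads(payload)
        return True
    except ValueError:
        return False


# Sample text with quotes, a backslash and a non-ASCII character.
text = 'say "hi" \\ \u00fc'

escaped = write_escaped(text)
raw_unescaped = write_raw(text.encode("utf-8"))

# The raw path is only safe when the caller has already escaped the content.
pre_escaped = json.dumps(text, ensure_ascii=False)[1:-1].encode("utf-8")
raw_pre_escaped = write_raw(pre_escaped)

print("escaped path valid:", is_valid_json(escaped))
print("raw path, unescaped input valid:", is_valid_json(raw_unescaped))
print("raw path, pre-escaped input valid:", is_valid_json(raw_pre_escaped))
```

Under these assumptions, text that has not been escaped beforehand needs the escaping variant, which suggests writeString is the safer choice for plain field values.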
>
> Speeding up the JSON output would be useful to many people, but I'm not
> sure about CBOR. It turned out that CBOR is easy to add as a ResponseWriter
> (like other data formats from the FasterXML Jackson library,
> https://github.com/FasterXML/jackson#data-format-modules; it is possible
> that csv, xml... would also be faster with this library than with the
> current implementation). CBOR is well supported in Python (cbor2 is fast),
> and full data fetching with cursors works 10%-20% faster than fetching data
> from Solr to Python via JSON (that is, compared with the modified
> Jackson-based JSON serializer; on the Python side I use the orjson library,
> which is faster than the regular json library). I didn't find any very fast
> Smile deserializer for Python, but this does not mean that many people
> need CBOR.
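
For readers unfamiliar with cursor-based fetching, the loop can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the uniqueKey field is assumed to be named id, and fetch_page is a hypothetical callable that sends one request and returns the parsed response as a dict. Only the cursorMark and nextCursorMark handling follows Solr's documented cursor contract.

```python
from urllib.parse import urlencode


def build_cursor_query(cursor_mark, rows=1000, wt="json"):
    """Build the query string for one page of a cursor walk.

    The sort clause must include the uniqueKey field (assumed to be "id").
    """
    return urlencode({
        "q": "*:*",
        "sort": "id asc",
        "rows": rows,
        "wt": wt,
        "cursorMark": cursor_mark,
    })


def iter_cursor_docs(fetch_page):
    """Yield every document by following nextCursorMark until it stops changing.

    fetch_page(cursor_mark) is a hypothetical callable returning the parsed
    response for that cursor position.
    """
    cursor = "*"  # "*" starts a new cursor walk
    while True:
        page = fetch_page(cursor)
        yield from page["response"]["docs"]
        next_cursor = page["nextCursorMark"]
        if next_cursor == cursor:
            break  # same mark returned: the result set is exhausted
        cursor = next_cursor
```

Swapping the wt value (for example to the custom writers described in this thread) only changes how fetch_page decodes the body; the loop itself stays the same.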
>
> At the moment, everything works for me on my collections and their data
> structures, and it works very fast. I was surprised that the speed of a
> regular JSON select with gzip has almost doubled. This could potentially
> lead to higher rps, since at full load individual server responses will
> complete faster. I will try to check this too on real hardware using the
> wrk benchmarking tool.
>
> Best Regards,
>


-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Welcome Andy Webb as Solr committer

2023-03-06 Thread Jan Høydahl
Hi all,

I'm pleased to announce that Andy Webb has accepted the PMC's
invitation to become a committer.

Andy, the tradition is that new committers introduce themselves with a
brief bio.

Congratulations and welcome!

-
To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
For additional commands, e-mail: dev-h...@solr.apache.org



Re: Welcome Andy Webb as Solr committer

2023-03-06 Thread Jason Gerlowski
Welcome Andy, and congratulations!

On Mon, Mar 6, 2023 at 6:16 AM Jan Høydahl  wrote:
>
> Hi all,
>
> I'm pleased to announce that Andy Webb has accepted the PMC's
> invitation to become a committer.
>
> Andy, the tradition is that new committers introduce themselves with a
> brief bio.
>
> Congratulations and welcome!
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
> For additional commands, e-mail: dev-h...@solr.apache.org
>




Re: preferredLeader is useless during election or node restart

2023-03-06 Thread Pierre Salagnac
I discussed this issue offline with David, and I'm now working on a code
change to make the preferredLeader become the leader when we register a
replica.

The idea is that when we register a replica from ZooKeeper, we check whether
it has the preferredLeader flag. If it does, we tell the current leader to
stop being the leader right after we have joined the election queue. This can
be done by sending a REJOINLEADERELECTION command to the current leader.

My issue now is that this works great with 2 replicas, but with 3 or more
the REJOINLEADERELECTION does not have the intended effect.
Looking deeper into the LeaderElector class, I found that the preferred
leader replica does not join the election queue right after the current
leader. It usually joins as second in the queue (one more candidate sits
between the current leader and where we join the queue).
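
The 2-replica versus 3-replica behaviour can be pictured with a toy model. This is a simplified Python sketch, not Solr code: the election queue is assumed to be an ordered list whose first entry is the leader, and leader_after_step_down is a hypothetical helper that moves the leader to the end of that list.

```python
def leader_after_step_down(queue):
    """Return (new_leader, new_queue) after the current leader rejoins at the end.

    queue[0] is the current leader; the next candidate in line takes over.
    """
    rotated = queue[1:] + queue[:1]
    return rotated[0], rotated


# Two replicas: the preferred replica is next in line, so it takes over.
two_replicas = ["leader", "preferred"]
print(leader_after_step_down(two_replicas)[0])

# Three replicas: another candidate sits between the leader and the
# preferred replica, so that candidate takes over instead.
three_replicas = ["leader", "other", "preferred"]
print(leader_after_step_down(three_replicas)[0])
```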

Then, the RebalanceLeader command moves all the candidates with the same
sequence number as the preferred leader to the end of the queue.


=> So my question is: for the preferred leader, why don't we join the
election right after the current leader?


The current implementation of the RebalanceLeaders command is:
- if not already the case, ask the preferred leader to rejoin at head
- ask all nodes with same sequence number as the preferred leader to rejoin
at end of the queue
- ask current leader to rejoin at end of the queue

By using the same sequence number as the current leader, we would not have
to ask several nodes to rejoin at the end of the queue in most cases.
In most cases, the RebalanceLeaders command would then just be:
- if not already the case, ask preferred leader to rejoin at head
- ask current leader to rejoin at end of the queue

We should keep the logic of checking for other nodes with the same sequence
number, but in most cases no such nodes will exist.
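
The two command sequences above can be compared with a small model. This is a rough Python sketch under loud assumptions: election entries are modelled as (sequence number, replica name) pairs, the leader is the entry with the lowest sequence number, and rebalance_commands with its head_seq parameter is a hypothetical helper. The real LeaderElector logic may differ in detail.

```python
def rebalance_commands(entries, preferred, head_seq):
    """List the rejoin commands needed to make `preferred` the leader.

    entries:  list of (sequence_number, replica_name); lowest sequence leads.
    head_seq: sequence number the preferred replica takes when it rejoins
              at head (an assumption of this model).
    """
    leader = min(entries)[1]
    if preferred == leader:
        return []  # nothing to do, the preferred replica already leads

    seq_of = {name: seq for seq, name in entries}
    commands = []
    if seq_of[preferred] != head_seq:
        commands.append(("rejoin_at_head", preferred))
    # Any other candidate sharing the head sequence number must move away.
    for seq, name in sorted(entries):
        if seq == head_seq and name not in (preferred, leader):
            commands.append(("rejoin_at_end", name))
    commands.append(("rejoin_at_end", leader))
    return commands


entries = [(0, "A"), (1, "B"), (2, "P")]  # A leads, P is the preferred leader

# Current behaviour in this model: P rejoins with the sequence number of B.
print(rebalance_commands(entries, "P", head_seq=1))

# Proposed behaviour: P rejoins with the sequence number of the leader A.
print(rebalance_commands(entries, "P", head_seq=0))
```

In this model the proposal saves one rejoin command per extra candidate that shares the head sequence number.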

Le lun. 27 févr. 2023 à 18:42, David Smiley  a écrit :

> I found this existing issue:
> https://issues.apache.org/jira/browse/SOLR-8238
> I commented on it just now.  Erick isn't around anymore but I'd appreciate
> input from anyone using "preferredLeader".
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Mon, Feb 20, 2023 at 12:49 PM David Smiley  wrote:
>
> > Seems like a bug to me!
> > Recommended reading: https://issues.apache.org/jira/browse/SOLR-6491
> > There's a treasure trove of information in JIRA to learn about how code
> > comes to be; what were the intentions behind features; what alternatives
> > were explored; pros & cons.
> >
> > ~ David Smiley
> > Apache Lucene/Solr Search Developer
> > http://www.linkedin.com/in/davidwsmiley
> >
> >
> > On Mon, Feb 20, 2023 at 9:04 AM Bruno Roustant wrote:
> >
> >> After many tests and deployments, it appears the preferredLeader flag
> >> described in the RebalanceLeader command doc [1] is not useful.
> >> It is taken into account only during the rebalance command. Afterwards, if
> >> there is a leader election or some node restart, it is ignored.
> >>
> >> Is this preferredLeader useless?
> >> I thought to use it to make leadership kind of sticky, but in practice the
> >> leadership assignment quickly returns to randomness. So, what was the
> >> purpose of this flag for the rebalance command, really a one-shot leader
> >> assignment, ignored after? Or is it a bug?
> >>
> >> Indeed only the rebalance leader command doc talks about this replica
> >> property. It is not mentioned elsewhere. But if it is ignored elsewhere,
> >> it's not of a great help.
> >>
> >> Should I enter a bug on preferredLeader property not respected during
> >> leader elections?
> >>
> >> [1] https://solr.apache.org/guide/solr/latest/deployment-guide/collection-management.html#rebalanceleaders
> >>
> >
>


Re: [DISCUSS] Community Virtual Meetup, March 2023

2023-03-06 Thread Jason Gerlowski
Ishan, Dave - I copied the potential discussion topics you mentioned
to the wiki page.  Of course, that doesn't imply any commitment or
pressure - even to attend.  Just wanted to bootstrap the list of
topics with something.

All - reminder to add your own potential discussion topics to the wiki
page so others can get a sense for the agenda and decide whether to
attend!

Best,

Jason

On Thu, Mar 2, 2023 at 10:37 AM Jason Gerlowski  wrote:
>
> Alright, thanks everyone for the input!
>
> I've scheduled a Google Meet (meet.google.com/fso-aqtw-fdk) for March
> 8th at 2pm ET, and there's a Confluence page for the meeting here
> (https://cwiki.apache.org/confluence/display/SOLR/2023-03-08+Meeting+notes)
> where you can add any topics for potential discussion.
>
> See you all online next week!
>
> Jason
>
>
> On Tue, Feb 28, 2023 at 6:48 PM Noble Paul  wrote:
> >
> > Thanks Jason
> > I have no objections to March 8
> >
> >
> > On Wed, Mar 1, 2023 at 6:48 AM Jason Gerlowski 
> > wrote:
> >
> > > > but what does it mean to host a Virtual meetup?
> > >
> > > Mostly it's just handling the administrative tasks that help the
> > > meetup happen and go smoothly: scheduling the video call, spreading
> > > the word in various places, creating a Confluence page for folks to
> > > propose their discussion topics, etc.  Maybe I'll "host" this one, and
> > > then I can put together a checklist or something so it's easier for
> > > other volunteers going forward.
> > >
> > > > To be clear "2 hrs later"
> > >
> > > Sure, works for me.  I chose noon ET last time because that was 5pm
> > > in the UK which still seemed somewhat do-able.  But I don't think
> > > anyone out that way logged on last time.  So let's do the shift you
> > > suggested this time and see how that goes.
> > >
> > > In terms of the date, let's aim for March 8th?  I'll put together the
> > > Confluence page shortly for folks to add their potential topics to.
> > >
> > > Jason
> > >
> > > -
> > > To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
> > > For additional commands, e-mail: dev-h...@solr.apache.org
> > >
> > >
> >
> > --
> > -
> > Noble Paul




Re: Welcome Andy Webb as Solr committer

2023-03-06 Thread Houston Putman
Welcome Andy!

- Houston

On Mon, Mar 6, 2023 at 7:31 AM Jason Gerlowski 
wrote:

> Welcome Andy, and congratulations!
>
> On Mon, Mar 6, 2023 at 6:16 AM Jan Høydahl  wrote:
> >
> > Hi all,
> >
> > I'm pleased to announce that Andy Webb has accepted the PMC's
> > invitation to become a committer.
> >
> > Andy, the tradition is that new committers introduce themselves with a
> > brief bio.
> >
> > Congratulations and welcome!
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
> > For additional commands, e-mail: dev-h...@solr.apache.org
> >
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
> For additional commands, e-mail: dev-h...@solr.apache.org
>
>


Re: Welcome Andy Webb as Solr committer

2023-03-06 Thread Justin Sweeney
Welcome Andy!

On Mon, Mar 6, 2023 at 11:05 AM Houston Putman  wrote:

> Welcome Andy!
>
> - Houston
>
> On Mon, Mar 6, 2023 at 7:31 AM Jason Gerlowski 
> wrote:
>
> > Welcome Andy, and congratulations!
> >
> > On Mon, Mar 6, 2023 at 6:16 AM Jan Høydahl 
> wrote:
> > >
> > > Hi all,
> > >
> > > I'm pleased to announce that Andy Webb has accepted the PMC's
> > > invitation to become a committer.
> > >
> > > Andy, the tradition is that new committers introduce themselves with a
> > > brief bio.
> > >
> > > Congratulations and welcome!
> > >
> > > -
> > > To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
> > > For additional commands, e-mail: dev-h...@solr.apache.org
> > >
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
> > For additional commands, e-mail: dev-h...@solr.apache.org
> >
> >
>


Re: Welcome Andy Webb as Solr committer

2023-03-06 Thread Kevin Risden
Welcome Andy!

Kevin Risden


On Mon, Mar 6, 2023 at 11:07 AM Justin Sweeney 
wrote:

> Welcome Andy!
>
> On Mon, Mar 6, 2023 at 11:05 AM Houston Putman  wrote:
>
> > Welcome Andy!
> >
> > - Houston
> >
> > On Mon, Mar 6, 2023 at 7:31 AM Jason Gerlowski 
> > wrote:
> >
> > > Welcome Andy, and congratulations!
> > >
> > > On Mon, Mar 6, 2023 at 6:16 AM Jan Høydahl 
> > wrote:
> > > >
> > > > Hi all,
> > > >
> > > > I'm pleased to announce that Andy Webb has accepted the PMC's
> > > > invitation to become a committer.
> > > >
> > > > Andy, the tradition is that new committers introduce themselves with
> a
> > > > brief bio.
> > > >
> > > > Congratulations and welcome!
> > > >
> > > > -
> > > > To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
> > > > For additional commands, e-mail: dev-h...@solr.apache.org
> > > >
> > >
> > > -
> > > To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
> > > For additional commands, e-mail: dev-h...@solr.apache.org
> > >
> > >
> >
>


Re: [DISCUSS] Community Virtual Meetup, March 2023

2023-03-06 Thread Ishan Chattopadhyaya
Thanks Jason

On Mon, 6 Mar, 2023, 9:23 pm Jason Gerlowski,  wrote:

> Ishan, Dave - I copied the potential discussion topics you mentioned
> to the wiki page.  Of course, that doesn't imply any commitment or
> pressure - even to attend.  Just wanted to bootstrap the list of
> topics with something.
>
> All - reminder to add your own potential discussion topics to the wiki
> page so others can get a sense for the agenda and decide whether to
> attend!
>
> Best,
>
> Jason
>
> On Thu, Mar 2, 2023 at 10:37 AM Jason Gerlowski 
> wrote:
> >
> > Alright, thanks everyone for the input!
> >
> > I've scheduled a Google Meet (meet.google.com/fso-aqtw-fdk) for March
> > 8th at 2pm ET, and there's a Confluence page for the meeting here
> > (
> https://cwiki.apache.org/confluence/display/SOLR/2023-03-08+Meeting+notes)
> > where you can add any topics for potential discussion.
> >
> > See you all online next week!
> >
> > Jason
> >
> >
> > On Tue, Feb 28, 2023 at 6:48 PM Noble Paul  wrote:
> > >
> > > Thanks Jason
> > > I have no objections to March 8
> > >
> > >
> > > On Wed, Mar 1, 2023 at 6:48 AM Jason Gerlowski 
> > > wrote:
> > >
> > > > > but what does it mean to host a Virtual meetup?
> > > >
> > > > Mostly it's just handling the administrative tasks that help the
> > > > meetup happen and go smoothly: scheduling the video call, spreading
> > > > the word in various places, creating a Confluence page for folks to
> > > > propose their discussion topics, etc.  Maybe I'll "host" this one,
> and
> > > > then I can put together a checklist or something so it's easier for
> > > > other volunteers going forward.
> > > >
> > > > > To be clear "2 hrs later"
> > > >
> > > > Sure, works for me.  I chose noon ET last time because that was 5pm
> > > > in the UK which still seemed somewhat do-able.  But I don't think
> > > > anyone out that way logged on last time.  So let's do the shift you
> > > > suggested this time and see how that goes.
> > > >
> > > > In terms of the date, let's aim for March 8th?  I'll put together the
> > > > Confluence page shortly for folks to add their potential topics to.
> > > >
> > > > Jason
> > > >
> > > > -
> > > > To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
> > > > For additional commands, e-mail: dev-h...@solr.apache.org
> > > >
> > > >
> > >
> > > --
> > > -
> > > Noble Paul
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
> For additional commands, e-mail: dev-h...@solr.apache.org
>
>


Re: [DISCUSS] Community Virtual Meetup, March 2023

2023-03-06 Thread Noble Paul
Can you please attach a calendar invite to that?

On Tue, Mar 7, 2023 at 4:05 AM Ishan Chattopadhyaya <
ichattopadhy...@gmail.com> wrote:

> Thanks Jason
>
> On Mon, 6 Mar, 2023, 9:23 pm Jason Gerlowski, 
> wrote:
>
> > Ishan, Dave - I copied the potential discussion topics you mentioned
> > to the wiki page.  Of course, that doesn't imply any commitment or
> > pressure - even to attend.  Just wanted to bootstrap the list of
> > topics with something.
> >
> > All - reminder to add your own potential discussion topics to the wiki
> > page so others can get a sense for the agenda and decide whether to
> > attend!
> >
> > Best,
> >
> > Jason
> >
> > On Thu, Mar 2, 2023 at 10:37 AM Jason Gerlowski 
> > wrote:
> > >
> > > Alright, thanks everyone for the input!
> > >
> > > I've scheduled a Google Meet (meet.google.com/fso-aqtw-fdk) for March
> > > 8th at 2pm ET, and there's a Confluence page for the meeting here
> > > (
> >
> https://cwiki.apache.org/confluence/display/SOLR/2023-03-08+Meeting+notes)
> > > where you can add any topics for potential discussion.
> > >
> > > See you all online next week!
> > >
> > > Jason
> > >
> > >
> > > On Tue, Feb 28, 2023 at 6:48 PM Noble Paul 
> wrote:
> > > >
> > > > Thanks Jason
> > > > I have no objections to March 8
> > > >
> > > >
> > > > On Wed, Mar 1, 2023 at 6:48 AM Jason Gerlowski <
> gerlowsk...@gmail.com>
> > > > wrote:
> > > >
> > > > > > but what does it mean to host a Virtual meetup?
> > > > >
> > > > > Mostly it's just handling the administrative tasks that help the
> > > > > meetup happen and go smoothly: scheduling the video call, spreading
> > > > > the word in various places, creating a Confluence page for folks to
> > > > > propose their discussion topics, etc.  Maybe I'll "host" this one,
> > and
> > > > > then I can put together a checklist or something so it's easier for
> > > > > other volunteers going forward.
> > > > >
> > > > > > To be clear "2 hrs later"
> > > > >
> > > > > Sure, works for me.  I chose noon ET last time because that was
> 5pm
> > > > > in the UK which still seemed somewhat do-able.  But I don't think
> > > > > anyone out that way logged on last time.  So let's do the shift you
> > > > > suggested this time and see how that goes.
> > > > >
> > > > > In terms of the date, let's aim for March 8th?  I'll put together
> the
> > > > > Confluence page shortly for folks to add their potential topics to.
> > > > >
> > > > > Jason
> > > > >
> > > > >
> -
> > > > > To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
> > > > > For additional commands, e-mail: dev-h...@solr.apache.org
> > > > >
> > > > >
> > > >
> > > > --
> > > > -
> > > > Noble Paul
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
> > For additional commands, e-mail: dev-h...@solr.apache.org
> >
> >
>


-- 
-
Noble Paul


Re: [DISCUSS] Community Virtual Meetup, March 2023

2023-03-06 Thread Ishan Chattopadhyaya
BEGIN:VCALENDAR
VERSION:1.0
TZ:+05:30
BEGIN:VEVENT
COMPLETED:20230308T21Z
DTSTART:20230308T19Z
CLASS:PUBLIC
DTEND:20230308T21Z
LOCATION;CHARSET=GBK;ENCODING=QUOTED-PRINTABLE:http://meet.google.com/fso-aqtw-fdk
SUMMARY;CHARSET=GBK;ENCODING=QUOTED-PRINTABLE:2nd Virtual Solr Community meetup
STATUS:CONFIRMED
AALARM:20230308T185000;;;
EVENTTYPEEXT:NORMALEVENTTYPE
EVENTCALENDARTYPE:SOLAR
END:VEVENT
END:VCALENDAR


Re: Welcome Andy Webb as Solr committer

2023-03-06 Thread Andy Webb
hi all, thank you for the invitation and welcome messages - this has been
an unexpected honour!

I'm currently Technical Architect for Search at the BBC, based in
Manchester, UK. I've been using Solr since about 2012 and made a few small
contributions over the last few years - thanks again to all the committers
who've helped me with those.

I also love mountain biking and help to run a local club, often getting out
for rides in beautiful locations like the Peak District, Lake District,
Wales and Scotland.

Andy

On Mon, 6 Mar 2023, 11:16 Jan Høydahl,  wrote:

> Hi all,
>
> I'm pleased to announce that Andy Webb has accepted the PMC's
> invitation to become a committer.
>
> Andy, the tradition is that new committers introduce themselves with a
> brief bio.
>
> Congratulations and welcome!
>


Search timeouts; recent Lucene changes

2023-03-06 Thread David Smiley
While reviewing Lucene 9.5 changes coming into Solr, I noticed some changes
relating to the ability to specify timeAllowed in Solr on a search query.
Solr uses both ExitableDirectoryReader and TimeLimitingCollector from
Lucene for this (complementary to each other).  Unfortunately, changes in
Lucene will make the cost of ExitableDirectoryReader wrapping happen for
all queries into Solr, even those not using timeAllowed.  Options to keep
EDR aren't good -- fork it basically.  Anecdotally, I think I've heard the
overhead is not trivial and my intuition thinks likewise.  Meanwhile,
Lucene 9.3 added a new TimeLimitingBulkScorer which even gets first class
integration into IndexSearcher which has a timeout.  It's been
incrementally improved, and I really like its approach, probable
performance, and simplicity.  It should be straightforward to integrate
this into SolrIndexSearcher and also only do so for queries specifying
timeAllowed.  I'm not sure TimeLimitingCollector offers much value to using
TLBS other than additional precision on timeAllowed at some cost to
unselective queries.
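
The general idea of checking a deadline between batches of documents, rather than wrapping every low-level read, can be sketched as follows. This is a conceptual Python sketch only, not Lucene's TimeLimitingBulkScorer: collect_with_timeout, the starting batch size of 100 and the 50% growth per batch are illustrative assumptions.

```python
import time


def collect_with_timeout(doc_ids, collect, deadline, clock, first_interval=100):
    """Collect documents in growing batches, checking the deadline between batches.

    Returns True if every document was collected, False if the deadline was
    reached first (the caller would then report partial results).
    """
    position = 0
    interval = first_interval
    while position < len(doc_ids):
        if clock() >= deadline:
            return False  # timed out: stop before starting the next batch
        for doc in doc_ids[position:position + interval]:
            collect(doc)
        position += interval
        interval += interval >> 1  # grow the batch by 50% each round
    return True


if __name__ == "__main__":
    seen = []
    finished = collect_with_timeout(
        list(range(1000)), seen.append, time.monotonic() + 0.5, time.monotonic
    )
    print(finished, len(seen))
```

The trade-off in such a design is precision: the deadline is only noticed at batch boundaries, so a timeout can overshoot by up to one batch, in exchange for not paying a per-document check.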

I think doing this should block Solr 9.2 using Lucene 9.5.  Alternatively,
someone might benchmark the state of things and see that things aren't so
bad as they may seem.  But that takes work too.

[1] QueryTimeout.isTimeoutEnabled is gone:
https://github.com/apache/lucene/pull/11954
[2] TimeLimitingBulkScorer in LUCENE-10151
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/TimeLimitingBulkScorer.java

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


Re: Welcome Andy Webb as Solr committer

2023-03-06 Thread David Smiley
Thanks for your contributions Andy; keep'em coming!

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Mon, Mar 6, 2023 at 2:54 PM Andy Webb  wrote:

> hi all, thank you for the invitation and welcome messages - this has been
> an unexpected honour!
>
> I'm currently Technical Architect for Search at the BBC, based in
> Manchester, UK. I've been using Solr since about 2012 and made a few small
> contributions over the last few years - thanks again to all the committers
> who've helped me with those.
>
> I also love mountain biking and help to run a local club, often getting out
> for rides in beautiful locations like the Peak District, Lake District,
> Wales and Scotland.
>
> Andy
>
> On Mon, 6 Mar 2023, 11:16 Jan Høydahl,  wrote:
>
> > Hi all,
> >
> > I'm pleased to announce that Andy Webb has accepted the PMC's
> > invitation to become a committer.
> >
> > Andy, the tradition is that new committers introduce themselves with a
> > brief bio.
> >
> > Congratulations and welcome!
> >
>


Re: Welcome Andy Webb as Solr committer

2023-03-06 Thread Alessandro Benedetti
Welcome on board Andy! Well deserved :)
--
*Alessandro Benedetti*
Director @ Sease Ltd.
*Apache Lucene/Solr Committer*
*Apache Solr PMC Member*

e-mail: a.benede...@sease.io


*Sease* - Information Retrieval Applied
Consulting | Training | Open Source

Website: Sease.io
LinkedIn | Twitter | Youtube | Github



On Mon, 6 Mar 2023 at 21:02, David Smiley  wrote:

> Thanks for your contributions Andy; keep'em coming!
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Mon, Mar 6, 2023 at 2:54 PM Andy Webb  wrote:
>
> > hi all, thank you for the invitation and welcome messages - this has been
> > an unexpected honour!
> >
> > I'm currently Technical Architect for Search at the BBC, based in
> > Manchester, UK. I've been using Solr since about 2012 and made a few
> small
> > contributions over the last few years - thanks again to all the
> committers
> > who've helped me with those.
> >
> > I also love mountain biking and help to run a local club, often getting
> out
> > for rides in beautiful locations like the Peak District, Lake District,
> > Wales and Scotland.
> >
> > Andy
> >
> > On Mon, 6 Mar 2023, 11:16 Jan Høydahl,  wrote:
> >
> > > Hi all,
> > >
> > > I'm pleased to announce that Andy Webb has accepted the PMC's
> > > invitation to become a committer.
> > >
> > > Andy, the tradition is that new committers introduce themselves with a
> > > brief bio.
> > >
> > > Congratulations and welcome!
> > >
> >
>


Re: [DISCUSS] Language detection in Solr

2023-03-06 Thread Alessandro Benedetti
+1 for delegating to Tika, which is a much better place for that (and which
they are actively evolving).

+1 for deprecating the old, no longer updated plugins as well (langdetect)

Cheers
--
*Alessandro Benedetti*
Director @ Sease Ltd.
*Apache Lucene/Solr Committer*
*Apache Solr PMC Member*

e-mail: a.benede...@sease.io


*Sease* - Information Retrieval Applied
Consulting | Training | Open Source

Website: Sease.io
LinkedIn | Twitter | Youtube | Github



On Thu, 2 Mar 2023 at 20:22, Jan Høydahl  wrote:

> Hi,
>
> Solr supports pluggable language detectors
> <https://solr.apache.org/guide/solr/latest/indexing-guide/language-detection.html>:
>
> > Solr supports three implementations of this feature:
> >
> > Tika’s language detection feature: https://tika.apache.org/1.28.4/detection.html
> > LangDetect language detection: https://github.com/shuyo/language-detection
> > OpenNLP language detection: http://opennlp.apache.org/docs/1.9.4/manual/opennlp.html#tools.langdetect
>
> Since our first implementation, the Tika project
> <https://tika.apache.org/2.7.0/detection.html#Language_Detection> has
> evolved its language detection capabilities and added a pluggable
> architecture as well:
> https://github.com/apache/tika/tree/main/tika-langdetect
>
> One of Solr's langid plugins is "langdetect" which has not been updated in
> 10 years. I'd like to deprecate it and remove it in main for that reason.
>
> Longer term question: Does it make sense for us to keep maintaining our
> own set of language detectors in this landscape?
> We could re-purpose the langid module so that it uses Tika's pluggable
> detectors in some way, perhaps with thin wrapper classes in Solr?
>
> Wdyt?
>
> Jan


RE: Re: Re: Re: Re: Low untunable default FastWriter output buffer - possible reason for slow single threaded data receiving from Solr on 1Gigabit+ networks while scroll, search etc

2023-03-06 Thread Fikavec F
Thank you, you are very kind. I took measurements on two physical servers
with a 10 Gigabit link, the speed and time of full fetching 10 Gb
collection (one shard; empty "Accept-Encoding: " header; collection with
only id and string stored fields) are as follows:

original wt=json    -  419 Mb/s  fetching time: 3m 26s
original wt=csv     -  463 Mb/s  fetching time: 3m 6s
my wt=myfastjson    - 2.33 Gb/s  fetching time: 37s
original wt=smile   - 2.38 Gb/s  fetching time: 36s
my wt=myfastcbor    - 2.55 Gb/s  fetching time: 34s
original wt=javabin - 2.76 Gb/s  fetching time: 31s

With "Accept-Encoding: gzip" header:

original wt=json    -   81 Mb/s  fetching time: 12m 3s
my wt=myfastjson    -  114 Mb/s  fetching time: 8m 32s

When I placed the test collection on 4 shards, wt=json worked at speeds
of about 430 Mb/s, and javabin, myfastcbor, myfastjson worked at speeds of
5.2-6.7 Gb/s, however, before the start of data transfer, there was a
significant delay (possibly for data collection from individual shards and
sorting) and the final full fetching time did not exceed the above.
Downloading a 10 GB file from Jetty took place at a speed of 9.87 Gb/s.

The time to move the cursor through the collection documents and get
the field values in my python application has been reduced from 4m 24s to
1m 4s, moreover, half of this minute was spent to deserialization in python
from json and cbor. No failures were noticed, everything seemed to work as
before, only very quickly (4x+ faster).

I dreamed of seeing Solr work at 5Gigabit+ speeds and with the help of
your support, everything worked out, thank you all.

Best Regards,
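
The speedups implied by the fetching times above can be checked with a little arithmetic. This is a small Python sketch; parse_duration and speedup are hypothetical helper names, and the only inputs are the durations reported in this message.

```python
import re


def parse_duration(text):
    """Convert a duration such as '3m 26s' or '37s' into seconds."""
    match = re.fullmatch(r"\s*(?:(\d+)m)?\s*(?:(\d+)s)?\s*", text)
    if not match or (match.group(1) is None and match.group(2) is None):
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes = int(match.group(1) or 0)
    seconds = int(match.group(2) or 0)
    return minutes * 60 + seconds


def speedup(before, after):
    """Ratio of the old duration to the new duration."""
    return parse_duration(before) / parse_duration(after)


# Reported fetching times: (label, original, modified).
comparisons = [
    ("wt=json vs wt=myfastjson, no compression", "3m 26s", "37s"),
    ("wt=json vs wt=myfastjson, gzip", "12m 3s", "8m 32s"),
    ("python cursor walk, before vs after", "4m 24s", "1m 4s"),
]

for label, before, after in comparisons:
    print(f"{label}: {speedup(before, after):.2f}x faster")
```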

Re: Welcome Andy Webb as Solr committer

2023-03-06 Thread Anshum Gupta
Welcome and congratulations, Andy!

On Mon, Mar 6, 2023 at 3:16 AM Jan Høydahl  wrote:

> Hi all,
>
> I'm pleased to announce that Andy Webb has accepted the PMC's
> invitation to become a committer.
>
> Andy, the tradition is that new committers introduce themselves with a
> brief bio.
>
> Congratulations and welcome!
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
> For additional commands, e-mail: dev-h...@solr.apache.org
>
>

-- 
Anshum Gupta


Re: Re: Re: Re: Re: Low untunable default FastWriter output buffer - possible reason for slow single threaded data receiving from Solr on 1Gigabit+ networks while scroll, search etc

2023-03-06 Thread David Smiley
Fantastic!
I really appreciate you working with the community on this one.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Mon, Mar 6, 2023 at 6:32 PM Fikavec F  wrote:

> Thank you, you are very kind. I took measurements on two physical servers
> with a 10 Gigabit link, the speed and time of full fetching 10 Gb
> collection (one shard; empty "Accept-Encoding: " header; collection with
> only id and string stored fields) are as follows:
> original wt=json    -  419 Mb/s  fetching time: 3m 26s
> original wt=csv     -  463 Mb/s  fetching time: 3m 6s
> my wt=myfastjson    - 2.33 Gb/s  fetching time: 37s
> original wt=smile   - 2.38 Gb/s  fetching time: 36s
> my wt=myfastcbor    - 2.55 Gb/s  fetching time: 34s
> original wt=javabin - 2.76 Gb/s  fetching time: 31s
>
> With "Accept-Encoding: gzip" header:
> original wt=json    -   81 Mb/s  fetching time: 12m 3s
> my wt=myfastjson    -  114 Mb/s  fetching time: 8m 32s
>
>When I placed the test collection on 4 shards, wt=json worked at speeds
> of about 430 Mb/s, and javabin, myfastcbor, myfastjson worked at speeds of
> 5.2-6.7 Gb/s, however, before the start of data transfer, there was a
> significant delay (possibly for data collection from individual shards and
> sorting) and the final full fetching time did not exceed the above.
> Downloading a 10 GB file from Jetty took place at a speed of 9.87 Gb/s.
>The time to move the cursor through the collection documents and get
> the field values in my python application has been reduced from 4m 24s to
> 1m 4s, moreover, half of this minute was spent to deserialization in python
> from json and cbor. No failures were noticed, everything seemed to work as
> before, only very quickly (4х+ faster).
>I dreamed of seeing Solr work at 5Gigabit+ speeds and with the help of
> your support, everything worked out, thank you all.
>
> Best Regards,
>
>
>


Re: Welcome Andy Webb as Solr committer

2023-03-06 Thread Mikhail Khludnev
Welcome, Andy!

On Mon, Mar 6, 2023 at 10:54 PM Andy Webb  wrote:

> hi all, thank you for the invitation and welcome messages - this has been
> an unexpected honour!
>
> I'm currently Technical Architect for Search at the BBC, based in
> Manchester, UK. I've been using Solr since about 2012 and made a few small
> contributions over the last few years - thanks again to all the committers
> who've helped me with those.
>
> I also love mountain biking and help to run a local club, often getting out
> for rides in beautiful locations like the Peak District, Lake District,
> Wales and Scotland.
>
> Andy
>
> On Mon, 6 Mar 2023, 11:16 Jan Høydahl,  wrote:
>
> > Hi all,
> >
> > I'm pleased to announce that Andy Webb has accepted the PMC's
> > invitation to become a committer.
> >
> > Andy, the tradition is that new committers introduce themselves with a
> > brief bio.
> >
> > Congratulations and welcome!
> >
>


-- 
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!