Re: [I] remove refs to people.apache.org/home.apache.org in build [lucene]

2025-03-05 Thread via GitHub
dweiss commented on issue #13647: URL: https://github.com/apache/lucene/issues/13647#issuecomment-2702007572 I've downloaded and moved all those datasets that were present in Gradle build files (specifically, in `external-datasets.gradle`). If there is anything else I should place there, let

2025-03-05 Thread via GitHub
mikemccand commented on issue #13647: URL: https://github.com/apache/lucene/issues/13647#issuecomment-2701189251 @dweiss I can't tell from above -- are there other corpora that still need a home?

2025-03-05 Thread via GitHub
mikemccand commented on issue #13647: URL: https://github.com/apache/lucene/issues/13647#issuecomment-2701096339

> I guess this 94GB comes from `33M x 768 x 4` bytes? Frankly I never test with indexes > ~2M docs, but maybe there is a call for the 33M-doc index in nightlies?

Yeah ...

2025-03-04 Thread via GitHub
msokolov commented on issue #13647: URL: https://github.com/apache/lucene/issues/13647#issuecomment-2698449439 I guess this 94GB comes from `33M x 768 x 4` bytes? Frankly, I never test with indexes > ~2M docs, but maybe there is a call for the 33M-doc index in nightlies?
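
Here is a minimal Python sketch to check that arithmetic. It assumes the vectors are stored as a flat file of 4-byte float32 values with no header. That is an assumption, because the thread only gives the `33M x 768 x 4` product. The helper names are hypothetical. The product comes to about 101 GB in decimal units, or about 94 GiB in binary units. So the "94GB" figure reads as GiB, and dweiss's "100 gb" is the same size in decimal GB.

```python
# Hypothetical helpers. ASSUMPTION: raw float32 vectors stored back to back, no header.
FLOAT32_BYTES = 4


def vector_file_bytes(num_docs: int, dims: int, bytes_per_value: int = FLOAT32_BYTES) -> int:
    """Bytes needed to store num_docs vectors of `dims` values each."""
    return num_docs * dims * bytes_per_value


def num_vectors(file_bytes: int, dims: int, bytes_per_value: int = FLOAT32_BYTES) -> int:
    """Inverse: how many whole vectors a file of `file_bytes` bytes holds."""
    record = dims * bytes_per_value
    if file_bytes % record != 0:
        raise ValueError("file size is not a multiple of the vector record size")
    return file_bytes // record


size = vector_file_bytes(33_000_000, 768)
print(size)                       # 101376000000
print(f"{size / 1e9:.1f} GB")     # 101.4 GB (decimal units)
print(f"{size / 2**30:.1f} GiB")  # 94.4 GiB (binary units)
```

For a 2M-doc index, roughly the size msokolov says he usually tests with, the same formula gives about 6.1 GB of 768d vectors.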

2025-03-04 Thread via GitHub
dweiss commented on issue #13647: URL: https://github.com/apache/lucene/issues/13647#issuecomment-2698418094 Hmm... 100 GB may be stretching Apache Infra's patience... I don't even know if this bucket has a limit of some sort.

2025-03-04 Thread via GitHub
mikemccand commented on issue #13647: URL: https://github.com/apache/lucene/issues/13647#issuecomment-2697666568 Oooh we have an official S3 bucket to use now? I had already uploaded the benchy corpus files to my own S3 bucket ... I think the URLs are in the setup.py (just renamed to `init

2025-03-04 Thread via GitHub
mikemccand commented on issue #13647: URL: https://github.com/apache/lucene/issues/13647#issuecomment-2697675763

> [@mikemccand](https://github.com/mikemccand) would you be able to expose the files [@dsmiley](https://github.com/dsmiley) rescued on your server?

Oh, hmm, no, I haven't y

2025-03-04 Thread via GitHub
rmuir commented on issue #13647: URL: https://github.com/apache/lucene/issues/13647#issuecomment-2697599407 @dweiss we could fetch https://whimsy.apache.org/public/public_ldap_people.json and retrieve each committer's GPG fingerprint that way?
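
Here is a rough Python sketch of rmuir's idea. The URL comes from the comment. Everything about the file's layout is an assumption that should be checked against the real file: a top-level `people` map keyed by Apache ID, with fingerprints listed under `key_fingerprints`. That is why the field name is a parameter. `fingerprints_for` and `fetch_people` are hypothetical helper names.

```python
import json
import urllib.request

# URL from the comment above. The JSON layout is an ASSUMPTION, not verified:
#   {"people": {"<apache id>": {"key_fingerprints": ["<hex>", ...], ...}, ...}}
PEOPLE_URL = "https://whimsy.apache.org/public/public_ldap_people.json"


def fingerprints_for(people_json: dict, apache_id: str,
                     field: str = "key_fingerprints") -> list[str]:
    """Return the GPG fingerprints listed for one committer, or [] if none.

    `field` is an assumed key name; adjust it to whatever the real file uses.
    """
    person = people_json.get("people", {}).get(apache_id.strip(), {})
    # Normalize: drop spaces and upper-case so fingerprints compare reliably.
    return [fp.replace(" ", "").upper() for fp in person.get(field, [])]


def fetch_people(url: str = PEOPLE_URL) -> dict:
    """Download and parse the public LDAP people file (needs network access)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)
```

Something like `fingerprints_for(fetch_people(), committer_id)` could replace a lookup by per-committer key URL under `home.apache.org`. Here `committer_id` is whatever ID the caller already has.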

2025-03-04 Thread via GitHub
dweiss commented on issue #13647: URL: https://github.com/apache/lucene/issues/13647#issuecomment-2696786476 There are two or three references in test files. There is one reference remaining in releaseWizard.py:

```
key_url = "https://home.apache.org/keys/committer/%s.asc" % id.strip(
```
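
To confirm that nothing else still points at the old hosts, a short Python sketch could walk a checkout and report every matching line, much like `grep -rn`. The extension list and the skipped directory names are assumptions chosen for illustration, not something the Lucene build defines. `find_legacy_refs` is a hypothetical name.

```python
import os
import re

# Hostnames this issue aims to remove from the build.
LEGACY_HOSTS = re.compile(r"(?:people|home)\.apache\.org")

# ASSUMED file types worth scanning; widen as needed.
EXTENSIONS = (".gradle", ".py", ".java", ".txt", ".md", ".properties")

# ASSUMED directories to skip (VCS metadata, build output, Gradle caches).
SKIP_DIRS = {".git", "build", ".gradle"}


def find_legacy_refs(root, extensions=EXTENSIONS):
    """Yield (path, line_number, line) for every line mentioning a legacy host."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into skipped directories.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                for lineno, line in enumerate(fh, start=1):
                    if LEGACY_HOSTS.search(line):
                        yield path, lineno, line.rstrip("\n")
```

For example, looping over `find_legacy_refs(".")` from the repository root and printing each tuple lists the remaining hits as path, line number, and text.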

2025-03-03 Thread via GitHub
dweiss commented on issue #13647: URL: https://github.com/apache/lucene/issues/13647#issuecomment-2696384821 I can generate this file and make it available as a benchmark dataset. Or would you rather give me one of your own, for consistency with your previous results?

2025-03-03 Thread via GitHub
msokolov commented on issue #13647: URL: https://github.com/apache/lucene/issues/13647#issuecomment-2695713269 Yes, I was referring to files that can be generated with `infer_token_vectors_cohere.py`. Maybe we take the position that users should regenerate, but it is kind of slow and demand

2025-03-03 Thread via GitHub
dweiss commented on issue #13647: URL: https://github.com/apache/lucene/issues/13647#issuecomment-2695553260

> [...] but can we attach 3G files here?

I think we can, if it makes sense to do so. We're not supposed to abuse this service - for example by downloading 3 GB data file

2025-03-03 Thread via GitHub
msokolov commented on issue #13647: URL: https://github.com/apache/lucene/issues/13647#issuecomment-2695517144 There are other vector data files - I think the key one that has become a reference point is Cohere 768d trained on Wikipedia-derived docs, but I'm not sure where nightly benchmark

2025-03-03 Thread via GitHub
benwtrent commented on issue #13647: URL: https://github.com/apache/lucene/issues/13647#issuecomment-2695529583 @msokolov the Python script in luceneutil downloads from Hugging Face, if that is the data you are talking about? `infer_token_vectors_cohere.py`

2025-03-03 Thread via GitHub
dweiss commented on issue #13647: URL: https://github.com/apache/lucene/issues/13647#issuecomment-2695433674 We now have an S3 bucket to place those benchmark/reference files on. If you have any of these files, please let me know and perhaps make them available to me, somehow - ```

2025-03-03 Thread via GitHub
rmuir commented on issue #13647: URL: https://github.com/apache/lucene/issues/13647#issuecomment-2695438823 @dweiss https://issues.apache.org/jira/secure/attachment/12429835/top.100k.words.de.en.fr.uk.wikipedia.2009-11.tar.bz2

2025-02-26 Thread via GitHub
dweiss commented on issue #13647: URL: https://github.com/apache/lucene/issues/13647#issuecomment-2684281563 Thanks to INFRA-26434, Lucene now has an S3 bucket we can publish those data/test resources on. I'll try to collect these resources, upload them and make the necessary build changes s

2025-01-15 Thread via GitHub
dweiss commented on issue #13647: URL: https://github.com/apache/lucene/issues/13647#issuecomment-2594748074 Fetched, thanks, David. I'm talking to infra about the possibilities of storing those benchmark files somewhere on Apache services. I don't feel comfortable uploading it to github/gi

2025-01-15 Thread via GitHub
dsmiley commented on issue #13647: URL: https://github.com/apache/lucene/issues/13647#issuecomment-2593762704 [geonames_20130921_randomOrder_allCountries.txt.bz2](http://gofile.me/5MFBZ/edVjck97c) (297.2 MB). If that works for you, I'll share the other. If it doesn't, I'll share in another

2025-01-15 Thread via GitHub
dweiss commented on issue #13647: URL: https://github.com/apache/lucene/issues/13647#issuecomment-2593500832 I filed https://issues.apache.org/jira/browse/INFRA-26434 and asked if apache.org can be of any help here. Some of those files are too large to host on GitHub (even in a separate rep

2025-01-15 Thread via GitHub
dweiss commented on issue #13647: URL: https://github.com/apache/lucene/issues/13647#issuecomment-2593450811 @mikemccand would you be able to expose the files @dsmiley rescued on your server?

2024-11-07 Thread via GitHub
iamsanjay commented on issue #13647: URL: https://github.com/apache/lucene/issues/13647#issuecomment-2463893766 I was trying to set up [luceneutil](https://github.com/mikemccand/luceneutil) and ran the script:

```
python3 src/python/setup.py -download
```

It failed on one URL where

2024-08-27 Thread via GitHub
mikemccand commented on issue #13647: URL: https://github.com/apache/lucene/issues/13647#issuecomment-2312879477 I also aliased (CNAMEd) [benchmarks.mikemccandless.com](https://benchmarks.mikemccandless.com/) -- GitHub Pages makes this simple-ish, yay.

2024-08-27 Thread via GitHub
mikemccand commented on issue #13647: URL: https://github.com/apache/lucene/issues/13647#issuecomment-2312800530 > > FYI: I clicked on a few random links and found a 404 https://mikemccand.github.io/luceneutil/analyzers.html although this page does seem to exist on the current site >

2024-08-27 Thread via GitHub
mikemccand commented on issue #13647: URL: https://github.com/apache/lucene/issues/13647#issuecomment-2312740634 Phew, OK, I think nightly benchy is now successfully publishing automatically to https://mikemccand.github.io/lucenenightly (using GitHub Pages). Last night's run "just worked".

2024-08-22 Thread via GitHub
msokolov commented on issue #13647: URL: https://github.com/apache/lucene/issues/13647#issuecomment-2305436221 Nice! Glad it worked. FYI: I clicked on a few random links and found a 404 at https://mikemccand.github.io/luceneutil/analyzers.html, although this page does seem to exist on t

2024-08-22 Thread via GitHub
mikemccand commented on issue #13647: URL: https://github.com/apache/lucene/issues/13647#issuecomment-2305157923 A nice side effect of this is that the long-running (13+ years now!) nightly reports will be backed up via git/GitHub and no longer single-sourced on my home box, yay. And if ev

2024-08-22 Thread via GitHub
mikemccand commented on issue #13647: URL: https://github.com/apache/lucene/issues/13647#issuecomment-2305153561

> I'm leaning towards a [simple GitHub pages site](https://docs.github.com/en/pages) (thank you @msokolov for the idea)

I enabled Pages for the `luceneutil` repo and pushe

2024-08-22 Thread via GitHub
mikemccand commented on issue #13647: URL: https://github.com/apache/lucene/issues/13647#issuecomment-2304800950 Thanks @rmuir and @ChrisHegarty. I've downloaded all my content from `home.apache.org` (Lucene benchmark source corpora, line file docs, large vector file, etc.), so we won