mikemccand commented on issue #61:
URL:
https://github.com/apache/lucene-jira-archive/issues/61#issuecomment-1193100921
OK, I wrote a simple tool to aggregate all labels from my (nearly complete)
Jira dump:
```
import glob
import json

# Number of issues that carry at least one label
with_label_count = 0
# Maps each label to the number of issues that use it
label_count = {}

for file_name in glob.glob('jira-dump/*.json'):
    # Each file is expected to hold one Jira issue as JSON
    with open(file_name) as f:
        d = json.load(f)
    labels = d["fields"]["labels"]
    if len(labels) > 0:
        with_label_count += 1
        #print(f'{file_name}: labels {labels}')
        for label in labels:
            label_count[label] = 1+label_count.get(label, 0)

# Print labels from most to least frequent
for label, count in sorted(label_count.items(), key=lambda a: -a[1]):
    print(f'{label} {count}')
```
Results:
```
patch 66
newdev 44
performance 39
newbie 30
vector-based-search 26
easyfix 25
gsoc2014 25
Java9 22
features 21
dead 19
build 17
gsoc2011 16
Java7 14
mentor 13
pull-request-available 13
documentation 13
Java8 12
maybe32blocker 11
random-chains 11
lucene-gsoc-11 11
lucene 11
github-pullrequest 11
analysis 10
IBM-J9 9
gsoc 8
search 8
facet 8
gsoc2012 7
lucene-gsoc-12 7
FastVectorHighlighter 7
patch-available 6
highlighter 6
query 6
suggester 6
docValues 6
test 6
Java11 6
similarity 5
stemmer 4
beginner 4
IndexWriter 4
incomplete_fix 4
missing_fixes 4
classification 4
index 4
sort 4
language 3
snowball 3
chinese 3
tokenization 3
compression 3
diffblue 3
queryparser 3
optimization 3
maven 3
solr 3
highlighting 3
Highlighter 3
stemming 3
fastvectorhighlighter 3
memory 3
perfomance 3
api-change 3
codec 3
bug 2
java8 2
pagination 2
sorting 2
parallelmultisearcher 2
jvm 2
rank 2
contrib 2
Documentation 2
Turkish 2
download 2
javadoc 2
hadoop 2
feature 2
blocker 2
locking 2
faceting 2
parser 2
Java10 2
booleanquery 2
regression 2
improvement 2
ICUFoldingFilterFactory 2
ready-to-commit 2
multi-word 2
synonyms 2
lock 2
release 2
filter 2
Arabic 2
highlight 2
faceted-search 2
EdgeNGramTokenFilter 2
analyzers 2
Java15 2
gsoc2013 2
searcher 2
tokenizer 2
morelikethis 2
jenkins 1
HTMLStripCharFilter 1
index, 1
iterators 1
Encoding 1
Front 1
normalize 1
null 1
codestyle 1
crush 1
multisearcher 1
span 1
synonym 1
score 1
Document 1
geo 1
join 1
DIH 1
Clarification 1
New_Users 1
Sort 1
docs 1
collator 1
ant 1
ivy 1
jar 1
javax 1
Analyzer 1
Ansj 1
plugin 1
Windows 1
antlr 1
hdfs 1
elasticsearch 1
refresh 1
static-analysis 1
scorer 1
clover 1
cache 1
explain 1
IndexReader 1
Highlighting 1
NPE 1
optimize 1
CountFacetRequest 1
LuceneFaq 1
Website 1
invalid 1
links 1
arguments/parameters 1
javadocs 1
indexing 1
soft-delete 1
ClassLoader 1
Thread 1
french 1
german 1
concurrency 1
starter 1
QueryParser 1
deprecated 1
missing 1
LZ4 1
BOM 1
Dependencies 1
IOE 1
update 1
policy 1
split 1
github-import 1
usability 1
EarlyTerminatingSortingCollector 1
paging 1
searchafter 1
sortingmergepolicy 1
spatial 1
spatialsearch 1
distance 1
geometric 1
length 1
short 1
suggest 1
lucene, 1
prefix 1
gradle-master 1
complexPhrase 1
cleanup 1
Impact 1
MultiLevelSkipList 1
SimpleTextCodec 1
discussion 1
gsoc2017 1
exception 1
interrupt 1
nio 1
classifier 1
batch 1
refactoring 1
time 1
error 1
checksum 1
double 1
float 1
int 1
long 1
numeric 1
Stemmer 1
SpanNearQuery 1
setMinimumNumberShouldMatch 1
CoreContainer 1
CoreReload 1
JMX 1
complexqueryparser 1
hang 1
NativeFSLockFactory 1
Java17 1
IDE 1
netbeans 1
applet 1
unsigned 1
grouping 1
neardup 1
CloseableThreadLocal 1
knn 1
android8.0 1
Suggestion 1
flex 1
merge 1
spatialrecursiveprefixtreefieldtype 1
fedora_12 1
tomcat 1
zstandard 1
Java13 1
Java14 1
java11 1
jdk11 1
jdk13 1
jdk14 1
jdk15 1
RegEx 1
bucket 1
security 1
sha1sum 1
curiosity 1
jdk16 1
opennlp 1
parallel 1
ShingleFilter 1
StopFilter 1
StopWords 1
writer 1
fieldcache 1
range 1
attribute 1
whitespace 1
Java16 1
SnapPull 1
failed 1
masterSlave 1
sorl 1
f5 1
test-failure 1
lookup 1
archive 1
dist 1
tests 1
query-parser 1
forbiddenapis 1
BTree 1
flamewar 1
logging 1
group 1
totalGroupCount 1
noob 1
patch-with-test 1
NPE, 1
Null-Safety 1
Scorer 1
```
I think some of these are helpful, e.g. `vector-based-search`,
`performance`, `newdev`, `newbie`. But the labels are highly unstructured,
which makes them a bit ... open-ended.
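
One way to tame some of that is to fold obvious variants together before counting. Below is a minimal sketch, not part of the tool above: `normalize_label` and `merge_label_counts` are hypothetical helpers, and the rule (trim whitespace and trailing commas, then lowercase) is only an assumption about what counts as "the same" label. The sample entries are copied from the results above.

```python
from collections import Counter


def normalize_label(label):
    """Hypothetical rule: trim whitespace and trailing commas, then lowercase."""
    return label.strip().rstrip(",").lower()


def merge_label_counts(label_count):
    """Sum the counts of all labels that normalize to the same string."""
    merged = Counter()
    for label, count in label_count.items():
        merged[normalize_label(label)] += count
    return merged


# A few entries copied from the results above
sample = {
    "highlighter": 6,
    "Highlighter": 3,
    "index": 4,
    "index,": 1,
    "NPE": 1,
    "NPE,": 1,
    "Java8": 12,
    "java8": 2,
}

merged = merge_label_counts(sample)
# Print merged labels from most to least frequent
for label, count in merged.most_common():
    print(f"{label} {count}")
```

This only catches case and punctuation differences. Misspellings such as `perfomance` and synonyms such as `jdk11` vs `Java11` would still need an explicit mapping.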