mikemccand commented on issue #61:
URL: 
https://github.com/apache/lucene-jira-archive/issues/61#issuecomment-1193100921

   OK, I wrote a simple tool to aggregate all labels from my (nearly complete) 
Jira dump:
   ```
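   import glob
   import json
   
   with_label_count = 0  # number of issues that carry at least one label
   label_count = {}      # label -> number of times it appears
   for file_name in glob.glob('jira-dump/*.json'):
       # use a context manager so each file handle is closed promptly
       with open(file_name, encoding='utf-8') as f:
           d = json.load(f)
       labels = d["fields"]["labels"]
       if len(labels) > 0:
           with_label_count += 1
           #print(f'{file_name}: labels {labels}')
           for label in labels:
               label_count[label] = 1+label_count.get(label, 0)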
   
   for label, count in sorted(label_count.items(), key=lambda a: -a[1]):
       print(f'{label} {count}')
   ```
   
   Results:
   
   ```
   patch 66
   newdev 44
   performance 39
   newbie 30
   vector-based-search 26
   easyfix 25
   gsoc2014 25
   Java9 22
   features 21
   dead 19
   build 17
   gsoc2011 16
   Java7 14
   mentor 13
   pull-request-available 13
   documentation 13
   Java8 12
   maybe32blocker 11
   random-chains 11
   lucene-gsoc-11 11
   lucene 11
   github-pullrequest 11
   analysis 10
   IBM-J9 9
   gsoc 8
   search 8
   facet 8
   gsoc2012 7
   lucene-gsoc-12 7
   FastVectorHighlighter 7
   patch-available 6
   highlighter 6
   query 6
   suggester 6
   docValues 6
   test 6
   Java11 6
   similarity 5
   stemmer 4
   beginner 4
   IndexWriter 4
   incomplete_fix 4
   missing_fixes 4
   classification 4
   index 4
   sort 4
   language 3
   snowball 3
   chinese 3
   tokenization 3
   compression 3
   diffblue 3
   queryparser 3
   optimization 3
   maven 3
   solr 3
   highlighting 3
   Highlighter 3
   stemming 3
   fastvectorhighlighter 3
   memory 3
   perfomance 3
   api-change 3
   codec 3
   bug 2
   java8 2
   pagination 2
   sorting 2
   parallelmultisearcher 2
   jvm 2
   rank 2
   contrib 2
   Documentation 2
   Turkish 2
   download 2
   javadoc 2
   hadoop 2
   feature 2
   blocker 2
   locking 2
   faceting 2
   parser 2
   Java10 2
   booleanquery 2
   regression 2
   improvement 2
   ICUFoldingFilterFactory 2
   ready-to-commit 2
   multi-word 2
   synonyms 2
   lock 2
   release 2
   filter 2
   Arabic 2
   highlight 2
   faceted-search 2
   EdgeNGramTokenFilter 2
   analyzers 2
   Java15 2
   gsoc2013 2
   searcher 2
   tokenizer 2
   morelikethis 2
   jenkins 1
   HTMLStripCharFilter 1
   index, 1
   iterators 1
   Encoding 1
   Front 1
   normalize 1
   null 1
   codestyle 1
   crush 1
   multisearcher 1
   span 1
   synonym 1
   score 1
   Document 1
   geo 1
   join 1
   DIH 1
   Clarification 1
   New_Users 1
   Sort 1
   docs 1
   collator 1
   ant 1
   ivy 1
   jar 1
   javax 1
   Analyzer 1
   Ansj 1
   plugin 1
   Windows 1
   antlr 1
   hdfs 1
   elasticsearch 1
   refresh 1
   static-analysis 1
   scorer 1
   clover 1
   cache 1
   explain 1
   IndexReader 1
   Highlighting 1
   NPE 1
   optimize 1
   CountFacetRequest 1
   LuceneFaq 1
   Website 1
   invalid 1
   links 1
   arguments/parameters 1
   javadocs 1
   indexing 1
   soft-delete 1
   ClassLoader 1
   Thread 1
   french 1
   german 1
   concurrency 1
   starter 1
   QueryParser 1
   deprecated 1
   missing 1
   LZ4 1
   BOM 1
   Dependencies 1
   IOE 1
   update 1
   policy 1
   split 1
   github-import 1
   usability 1
   EarlyTerminatingSortingCollector 1
   paging 1
   searchafter 1
   sortingmergepolicy 1
   spatial 1
   spatialsearch 1
   distance 1
   geometric 1
   length 1
   short 1
   suggest 1
   lucene, 1
   prefix 1
   gradle-master 1
   complexPhrase 1
   cleanup 1
   Impact 1
   MultiLevelSkipList 1
   SimpleTextCodec 1
   discussion 1
   gsoc2017 1
   exception 1
   interrupt 1
   nio 1
   classifier 1
   batch 1
   refactoring 1
   time 1
   error 1
   checksum 1
   double 1
   float 1
   int 1
   long 1
   numeric 1
   Stemmer 1
   SpanNearQuery 1
   setMinimumNumberShouldMatch 1
   CoreContainer 1
   CoreReload 1
   JMX 1
   complexqueryparser 1
   hang 1
   NativeFSLockFactory 1
   Java17 1
   IDE 1
   netbeans 1
   applet 1
   unsigned 1
   grouping 1
   neardup 1
   CloseableThreadLocal 1
   knn 1
   android8.0 1
   Suggestion 1
   flex 1
   merge 1
   spatialrecursiveprefixtreefieldtype 1
   fedora_12 1
   tomcat 1
   zstandard 1
   Java13 1
   Java14 1
   java11 1
   jdk11 1
   jdk13 1
   jdk14 1
   jdk15 1
   RegEx 1
   bucket 1
   security 1
   sha1sum 1
   curiosity 1
   jdk16 1
   opennlp 1
   parallel 1
   ShingleFilter 1
   StopFilter 1
   StopWords 1
   writer 1
   fieldcache 1
   range 1
   attribute 1
   whitespace 1
   Java16 1
   SnapPull 1
   failed 1
   masterSlave 1
   sorl 1
   f5 1
   test-failure 1
   lookup 1
   archive 1
   dist 1
   tests 1
   query-parser 1
   forbiddenapis 1
   BTree 1
   flamewar 1
   logging 1
   group 1
   totalGroupCount 1
   noob 1
   patch-with-test 1
   NPE, 1
   Null-Safety 1
   Scorer 1
   ```
   
   I think some of these are helpful, e.g. `vector-based-search`, 
`performance`, `newdev`, `newbie`.  The highly unstructured nature of the 
labels is indeed a bit ... open-ended.
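
Some of the noise in the results above looks mechanical: case variants (`Java8` / `java8`, `highlighter` / `Highlighter`) and stray trailing commas (`index,`, `NPE,`). A minimal sketch of folding those together follows. The `normalize_label` and `merge_label_counts` helpers are hypothetical, not part of the tool above, and the normalization rule (lowercase, strip trailing commas) is only an assumption about what would be reasonable here.

```python
from collections import Counter


def normalize_label(label):
    """Hypothetical rule: trim whitespace, drop trailing commas, lowercase."""
    return label.strip().rstrip(',').lower()


def merge_label_counts(label_count):
    """Sum the counts of all labels that normalize to the same key."""
    merged = Counter()
    for label, count in label_count.items():
        merged[normalize_label(label)] += count
    return merged


# A few entries copied from the results above.
sample = {
    'highlighter': 6, 'Highlighter': 3,
    'Java8': 12, 'java8': 2,
    'index': 4, 'index,': 1,
    'NPE': 1, 'NPE,': 1,
}

merged = merge_label_counts(sample)
for label, count in merged.most_common():
    print(f'{label} {count}')
```

Because this sums per-label counts, an issue that carries both spellings of a label would be counted twice. Normalizing inside the per-issue loop, before counting, would avoid that. It also does nothing for semantic near-duplicates such as `newbie` / `newdev` / `beginner`, which would need a hand-written mapping.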

