Hi,

Does anyone on the list have experience with hierarchical facets, specifically for species data?
I have a variety of 'messy' species names that I'd like to tidy up at analysis time. I'd then like to use them as the basis for taxonomically guided hierarchical facets at query time. If people have done this before, is there an existing schema.xml with custom analyser pipelines and config files that I could work from?

Here are some example species names from my source data:

Solanum lycopersicum
Solanum tuberosum
Hordeum vulgare
Vitis vinifera
Arabidopsis thaliana
Arabidopsis lyrata
Brassica rapa
Musa acuminata
Oryza glaberrima
Oryza brachyantha
Physcomitrella patens
Arabis
Triticum sp
Hordeum sp
Zea mays L.
Zea mays
Hordeum vulgare L. convar. vulgare var. hybernum Viborg
Phaseolus vulgaris L. subsp. vulgaris var. nanus Asch
Phaseolus vulgaris L. subsp. vulgaris var. vulgaris
Triticum aestivum L. var. lutescens
Hordeum vulgare L. convar. distichon
Solanum tuberosum L. subsp. tuberosum L
Triticum aestivum L. var. aestivum
Pisum sp
Lupinus sp
Lycopersicon esculentum Mill
Dactylis glomerata L
Avena sp
Nicotiana tabacum

In case you're not familiar with species taxonomy, there are many hierarchical subgroups that I can define over the species in this list. There are also the hierarchies implicit in the names themselves. For example, Solanum lycopersicum and Solanum tuberosum are both species in the genus Solanum, and Hordeum vulgare L. convar. vulgare var. hybernum Viborg is a specific variety of Hordeum vulgare.

Surely I can't be the first person to look at this?

Thanks for any tips,
Dan.