Hi all,

I've started working on the generic dataset indexing project discussed in our last meeting on Thursday. The initial implementation targets ATLAS as the first dataset:

https://github.com/jayvenn21/gsoc-dataset-indexing

What it does:
- Reads the ATLAS TSV (1,938 proteins, 50 metadata fields)
- Generates a POSIX directory tree with structured JSON metadata per protein
- Includes a search CLI for filtering by organism, resolution, domain classifications, etc.
- Generic lib/ layer so adding new datasets (mdCATH, etc.) is just a new ingest adapter

The next steps would be adding a second dataset and wiring into CyberShuttle's VFS. Happy to adjust direction based on feedback.

Thanks,
Jayanth
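
For anyone who wants a concrete picture of the pipeline described above, here is a minimal sketch of the idea: read a TSV, write one directory with a JSON metadata file per entry, and filter entries by field. It is not the repository's actual code. The function names `ingest_tsv` and `search`, the `metadata.json` filename, the exact-match filtering, and the sample columns and rows are all assumptions made up for illustration.

```python
import csv
import io
import json
import tempfile
from pathlib import Path


def ingest_tsv(tsv_text: str, out_root: Path, id_column: str) -> list[Path]:
    """Write <out_root>/<id>/metadata.json for each TSV row; return the paths.

    Hypothetical adapter: a real one would map dataset-specific columns
    onto a shared schema before writing.
    """
    written = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        entry_dir = out_root / row[id_column]
        entry_dir.mkdir(parents=True, exist_ok=True)
        meta_path = entry_dir / "metadata.json"
        meta_path.write_text(json.dumps(row, indent=2, sort_keys=True))
        written.append(meta_path)
    return written


def search(out_root: Path, **filters: str) -> list[str]:
    """Return entry IDs whose metadata equals every given filter value."""
    hits = []
    for meta_path in sorted(out_root.glob("*/metadata.json")):
        meta = json.loads(meta_path.read_text())
        if all(meta.get(key) == value for key, value in filters.items()):
            hits.append(meta_path.parent.name)
    return hits


if __name__ == "__main__":
    # Made-up sample rows, not real ATLAS data.
    sample = (
        "id\torganism\tresolution\n"
        "entryA\tE. coli\t1.8\n"
        "entryB\tH. sapiens\t2.1\n"
    )
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        ingest_tsv(sample, root, id_column="id")
        print(search(root, organism="E. coli"))
```

Under this layout, supporting a second dataset would mean writing another function like `ingest_tsv` that emits the same per-entry JSON, while `search` stays unchanged.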
