rochdev opened a new issue, #52: URL: https://github.com/apache/arrow-js/issues/52
### Describe the enhancement requested

We're considering using Apache Arrow for a project, but it seems like the dependency is very large. For example, the `@apache-arrow/es5-cjs` package is 2.82MB _compressed_. It seems that the project was already split between target ES version and module loader for size efficiency, but without actually addressing the size problem.

From looking at the folder, I see the following components that could potentially be removed at a glance:

* All `.ts` files, as these are only needed in development when using TypeScript. (this includes definition files and the entire `src` folder)
* All `.dom` and `.map` files since these are only needed in the browser and not in Node.
* The `bin/arrow2csv.js` file as it seems to only be useful for testing.
* All `@swc` dependencies, as these are development dependencies.
* All `@types` dependencies, as those should be installed separately only when using TypeScript.
* The `command-line-args` and `command-line-usage` dependencies as they seem like development dependencies.
* The `tslib` dependency, which again is related to TypeScript and cannot be used by Node.
* The `json-bignum` dependency, as it doesn't seem to make sense if this project is actually an alternative to JSON to begin with.
* For the `flatbuffers` dependency, I'm not entirely sure, but it seems to support generating code from a schema, which may include all or some of the code needed to no longer need `flatbuffers`, although for this one, additional investigation would be needed.
* I'm sure a lot of other files are used to do things like managing schemas and could potentially be removed as well from the production package.

Deleting all the above (except `flatbuffers` as it's unclear if it's needed) just to see the difference results in the uncompressed package size going from 6.4MB to 1.3MB, and it's probably incomplete and could be reduced much further.
I don't know much about the project internals, so my above assumptions might be wrong, but I'd definitely like to open a discussion about the size of this package and how to make it smaller as it seems the current approach to split packages is not quite achieving its goal and it might make more sense to split the project to something closer to a monorepo. For example, something like this:

```
@apache-arrow/cli
@apache-arrow/types or @types/apache-arrow
@apache-arrow/node (doesn't need to be split by module types as only the entry point needs to change)
@apache-arrow/dom-es2015 (or es5, etc)
etc
```

Or even more granular (since for example we only need the ability to encode and not decode):

```
@apache-arrow/encoder
@apache-arrow/decoder
@apache-arrow/stream-writer
@apache-arrow/stream-reader
etc
```

Since the project already relies on Lerna, it shouldn't be too much work to change what gets released as what package I think, but I'd like to hear thoughts from maintainers before opening a PR with such a large restructuring of the project.

### Component(s)

JavaScript, Packaging