rochdev opened a new issue, #52:
URL: https://github.com/apache/arrow-js/issues/52

   ### Describe the enhancement requested
   
   We're considering using Apache Arrow for a project, but the dependency seems
very large. For example, the `@apache-arrow/es5-cjs` package is 2.82MB
_compressed_. The project is already split into separate packages by target ES
version and module format, presumably for size efficiency, but that split
doesn't actually address the size problem.
   
   Looking at the package folder, at a glance I see the following components
that could potentially be removed:
   
   * All `.ts` files, as these are only needed in development when using 
TypeScript. (this includes definition files and the entire `src` folder)
   * All `.dom` and `.map` files, since these are only needed in the browser and
not in Node.
   * The `bin/arrow2csv.js` file as it seems to only be useful for testing.
   * All `@swc` dependencies, as these are development dependencies.
   * All `@types` dependencies, as those should be installed separately only 
when using TypeScript.
   * The `command-line-args` and `command-line-usage` dependencies as they seem 
like development dependencies.
   * The `tslib` dependency, which again is related to TypeScript and cannot be 
used by Node.
   * The `json-bignum` dependency, as it doesn't seem to make sense if this 
project is actually an alternative to JSON to begin with.
   * The `flatbuffers` dependency, although I'm not entirely sure about this one.
FlatBuffers seems to support generating code from a schema, and that generated
code might make some or all of the `flatbuffers` runtime unnecessary, but this
would need additional investigation.
   * I'm sure many other files (for example, those used for managing schemas)
could also be removed from the production package.
   
   As a quick experiment, deleting all of the above (except `flatbuffers`, since
it's unclear whether it's needed) brings the uncompressed package size from
6.4MB down to 1.3MB. The list is probably incomplete, so the size could likely
be reduced much further.
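   For illustration, the file-level rules from the list above can be sketched as a filter over the package's file paths. This is a rough TypeScript sketch under stated assumptions: `isRuntimeFile`, `pruneFileList`, and the example paths are hypothetical and are not part of the actual apache-arrow build.

   ```typescript
   // Hypothetical sketch: decides whether a file in the published package would
   // be kept under the pruning rules proposed in this issue. The patterns mirror
   // the list above and are assumptions, not the real apache-arrow build config.

   // Individual files assumed to be dev/test-only.
   const EXCLUDED_FILES: readonly string[] = ["bin/arrow2csv.js"];

   function isRuntimeFile(path: string): boolean {
     // Normalize Windows separators so the same rules apply on every platform.
     const p = path.replace(/\\/g, "/");
     const base = p.slice(p.lastIndexOf("/") + 1);

     if (p.startsWith("src/")) return false;       // TypeScript sources
     if (p.endsWith(".ts")) return false;          // .ts and .d.ts files
     if (p.endsWith(".map")) return false;         // source maps
     if (base.includes(".dom.")) return false;     // browser builds, e.g. Arrow.dom.js
     if (EXCLUDED_FILES.includes(p)) return false; // CLI entry point
     return true;
   }

   // Keeps only the paths that pass every rule above.
   function pruneFileList(paths: readonly string[]): string[] {
     return paths.filter(isRuntimeFile);
   }
   ```

   Dependency removals (`@swc`, `@types`, `tslib`, etc.) aren't covered by this sketch, since those would be changes to `package.json` rather than to the shipped file list.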
   
   I don't know much about the project internals, so some of my assumptions above
might be wrong, but I'd like to open a discussion about the size of this package
and how to make it smaller. The current approach of splitting packages by target
doesn't seem to achieve that goal, and it might make more sense to restructure
the project as something closer to a monorepo.
   
   For example, something like this:
   
   ```
   @apache-arrow/cli
   @apache-arrow/types or @types/apache-arrow
   @apache-arrow/node (doesn't need to be split by module format, as only the entry point needs to change)
   @apache-arrow/dom-es2015 (or es5, etc)
   etc
   ```
   
   Or even more granular (for example, we only need to encode, not decode):
   
   ```
   @apache-arrow/encoder
   @apache-arrow/decoder
   @apache-arrow/stream-writer
   @apache-arrow/stream-reader
   etc
   ```
   
   Since the project already uses Lerna, I think it shouldn't be too much work to
change which packages get released and what goes into each one, but I'd like to
hear the maintainers' thoughts before opening a PR with such a large
restructuring of the project.
   
   ### Component(s)
   
   JavaScript, Packaging

