If you're going to do any work in this area, I would highly encourage you to do it as part of the core.matrix library. That is what Incanter is or will be using for its dataset implementation. It's also nice for those abstractions and implementations to be separate from Incanter itself, since Incanter is a rather large dependency.

Core.matrix is certainly (in my eyes) becoming the de facto matrix computation library in the Clojure ecosystem, and I think that in the level of interop between different implementations, and in the extent of its use by the Clojure community, we rival the Python offerings. However, while core.matrix has some dataset protocols, API functions and basic implementations, there's still some work needed to get the full expressiveness of the data.frame pattern as seen in R and Pandas. Specifically, there is no support for setting rownames (or arbitrary "name" assignments beyond that of a single dimension (columns...)).

This is something I started working on a while back, but wasn't able to finish. I could potentially push what I came up with to a fork, but unfortunately, I don't have any more time to work on the problem at the moment. Mike Anderson is a great project maintainer, and will probably be happy to help guide you in stitching together a solution.

Best,
Chris

On Wednesday, March 9, 2016 at 12:57:31 PM UTC-8, [email protected] wrote:
>
> Is there any desire or need for a Clojure DataFrame?
>
> By DataFrame, I mean a structure similar to R's data.frame, and Python's pandas.DataFrame.
>
> Incanter's DataSet may already be fulfilling this purpose, and if so, I'd like to know if and how people are using it.
>
> From quickly researching, I see that some prior work has been done in this space, such as:
>
> * https://github.com/cardillo/joinery
> * https://github.com/mattrepl/data-frame
> * http://spark.apache.org/docs/latest/sql-programming-guide.html#dataframes
>
> Rather than going off and creating a competing implementation (https://xkcd.com/927/), I'd like to know if anyone here is actively working on, or would like to work on a DataFrame and related utilities for Clojure (and by extension Java)? Is it something that's sorely needed, or is everybody happy with using Incanter or some other library that I'm not aware of?
> If there's already a defacto standard out there, would anyone care to please point it out?
>
> As background information:
>
> My specific use-case is in NLP and ML, where I often explore and prototype in Python, but I'm then left to deal with a smattering of libraries on the JVM (Mallet, Weka, Mahout, ND4J, DeepLearning4j, CoreNLP, etc.), each with their own ad-hoc implementations of algorithms, matrices, and utilities for reading data. It would be great to have a unified way to explore my data in the Clojure REPL, and then serve the same code and models in production.
>
> I would love for Clojure to have a broadly compatible ecosystem similar to Python's Numpy/Pandas/Scikit-*/Scipy/matplotlib/GenSim, etc. Core.Matrix and Incanter appear to fulfill a large chunk of those roles, but I am not aware if they've yet become the defacto standards in the community.
>
> Any feedback is greatly appreciated.
>

--
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to [email protected]
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/clojure?hl=en
