Dear ASF Community,

I’m writing to propose a new project at the Apache Software Foundation.

A little about myself: I’m currently a senior software engineer on the Affirm ML Platform team (possibly switching soon), and I have around 5 years of experience at a couple of startups, all in data and machine learning infrastructure.

Over the past few years I’ve seen a common pattern of companies building their own ML infrastructure, especially a feature store <https://www.featurestore.org/>. I have built similar products at all of my previous and current companies, using in-house solutions, a third-party vendor (Tecton, https://www.tecton.ai/), and the open source projects Feast (https://github.com/feast-dev/feast) and Feathr (https://github.com/feathr-ai/feathr). Still, these products do not adequately solve the most important part of a feature store: transformation, which we could also call featurization.

So I’m proposing a new podling project, Featurizer (the name can change), to build an open source feature platform. It aims to address the challenges of feature engineering in machine learning by developing a software framework that can automatically extract relevant features from raw data. It will provide a wide range of featurization algorithms that can be customized and combined to fit the specific needs of different applications. The framework will support three types of features (request-based real-time features, stream features, and batch features) and is intended to run on different processing engines such as Apache Spark, Flink, Beam, or microservices.

I’m still new to the podling process; I have made a few contributions before, but this is my first time proposing a project. I’m looking for suggestions, feedback, champions, and mentors to help start the project.

Thanks