adriangb commented on PR #22000: URL: https://github.com/apache/datafusion/pull/22000#issuecomment-4374194293
> > > I am very wary of complicating the built-in Parquet reader any more -- it is already very complicated with lots of behaviors (and new ones getting added all the time, for example the sortedness ones from @zhuqi-lucas and @xudong963)
> > >
> > > I agree it is a complex piece of software but I think we can continue to add the right abstractions and simplifications (like you recently did with the moralization work 😄). Ultimately the file reader is going to be a key piece of a data toolkit like DataFusion so it's unsurprising (to me) that it holds a lot of the complexity.
>
> yeah -- maybe I am oversensitive as I feel like as soon as we are able to refactor away some of the complexity then it gets all complicated again 😆

No, you are right: it is a big risk that this code turns into feature spaghetti. It's just not one I think we can necessarily avoid. We should be *cautious* about introducing complexity and push back (like you have here), but if this is the right place to put it and we can factor it into a shape that only adds complexity, not multiplies or exponentiates it, then maybe we just need to deal with it over time.
