Re: Intra-project dependencies
>> You would reference the snapshot dependency by the timestamped snapshot.
>> This makes it a reproducible build.
>
> How confident are we that the repository will not alter or delete them?

They cannot be altered.

I see artefacts there that are more than a decade old. But we cannot rely on their permanence.

Putting the SHA into the jar's manifest is easy. And this blog post shows how you can also expose this info on the command line:
https://medium.com/liveramp-engineering/identifying-maven-snapshot-artifacts-by-git-revision-15b860d6228b

Given there's no guaranteed permanence to the snapshots, we would need to have the git sha in the version, so if much older versions can't be downloaded it can still be rebuilt.

This is done like: 1.0.0_${sha1}-SNAPSHOT

>> linking in the source code into in-tree is a significant thing to do
>
> Could you explain why? I thought your preferred alternative was merging the source trees permanently

Linking or merging while it is still also being a separate library and repo. If we are really not that interested in it as a separate library, and dev change is high, or the code is somewhere less accessible, then in-tree makes sense IMHO.
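To make the `1.0.0_${sha1}-SNAPSHOT` scheme concrete, here is a minimal Java sketch. It assumes a full 40-character lowercase SHA-1 and a hypothetical manifest attribute named `Git-Commit-Id` (the thread does not name one, and this is not the mechanism from the linked blog post). It builds the version string, recovers the SHA from it, and reads the SHA back from manifest text using `java.util.jar.Manifest`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Manifest;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: SHA-stamped snapshot versions as proposed in the thread (class and attribute names are hypothetical). */
final class SnapshotVersions {
    // <base>_<40 hex chars>-SNAPSHOT, e.g. 1.0.0_<sha1>-SNAPSHOT
    private static final Pattern SHA_VERSION = Pattern.compile(".+_([0-9a-f]{40})-SNAPSHOT");

    // Assumed attribute name; nothing in the thread fixes what the manifest entry is called.
    static final String MANIFEST_ATTRIBUTE = "Git-Commit-Id";

    private SnapshotVersions() {}

    /** Builds e.g. 1.0.0_<sha1>-SNAPSHOT, rejecting abbreviated or non-hex SHAs. */
    static String snapshotVersion(String baseVersion, String sha1) {
        if (!sha1.matches("[0-9a-f]{40}")) {
            throw new IllegalArgumentException("expected a full 40-character git SHA-1: " + sha1);
        }
        return baseVersion + "_" + sha1 + "-SNAPSHOT";
    }

    /** Returns the SHA embedded in a version string, or null if the version carries none. */
    static String shaFromVersion(String version) {
        Matcher m = SHA_VERSION.matcher(version);
        return m.matches() ? m.group(1) : null;
    }

    /** Reads the SHA from manifest text; returns null if the attribute is absent. */
    static String shaFromManifest(String manifestText) {
        try {
            Manifest manifest = new Manifest(
                    new ByteArrayInputStream(manifestText.getBytes(StandardCharsets.UTF_8)));
            return manifest.getMainAttributes().getValue(MANIFEST_ATTRIBUTE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keeping the full SHA in the version, rather than relying only on the timestamped snapshot, means the artefact can still be rebuilt from source if the repository eventually purges it; rejecting abbreviated SHAs avoids ambiguity as the history grows.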
Re: Intra-project dependencies
> Linking or merging while it is still also being a separate library and repo.

I am still unclear why you think this is “a significant thing”?

On 18 Jan 2023, at 10:41, Mick Semb Wever wrote:
Re: Intra-project dependencies
Been out, sorry for just catching up now…

I feel this thread pigeonholed itself on the word Accord and ignored the fact that we are dealing with this pain today with python/jvm dtest, and that improving that would help the project… We also have other related projects that we are developing in parallel to Cassandra, such as Harry, and there is interest in exporting our utils + simulator for other projects to use… We also depend on related projects such as JAMM, which block us from bumping JDK versions…

Accord is just 1 example of a Cassandra dependency needed for a release… By only focusing on Accord and “should it be external”, this thread is ignoring the pain we face today and how we could improve.

We tried in-tree for in-jvm dtest and found that this broke every other commit… Maintaining the APIs across all our supported branches was too hard to do, and moving it outside of the tree helped make the upgrade tests more stable (there were breakages, but less frequent)… We currently have to release this for every patch, which has actually caused us to rely on class path ordering so that some branches can fork the classes to avoid this…

We tried to do snapshot builds where the version contained the SHA, but this has the issue that snapshot builds “may” go away over time, leaving older SHAs no longer building… Jvm-dtest is in bad shape and really could benefit from us looking to improve this process…

We break python-dtest when cross-cutting changes are added, as CI is hard to do correctly or not supported (testing downstream users, i.e. our 4 supported branches, is rarely done). We want to start using Harry as part of our test suite, so if a patch needs to change Harry, then what “should” we do? Do we block merging into Cassandra until we vote on a Harry release?

Maybe we should be asking what capabilities we need and how to address each?
I believe Mick has focused on this capabilities conversation, and I feel it's 100% the best route to take; we should be listing out what we need to do our work and if/how the different solutions address this.

For me, I need the following:

* be able to make cross-cutting changes in 1 ticket
** in my PR, override CI to use my PRs for sub-projects
* commits to Cassandra should be reproducible and buildable
* downstream testing support… if we make a change to python-dtest or Harry, we should know before merging whether this breaks Cassandra, and on which supported branches
* [nice to have] be able to work with all subprojects in one IDE and not have to switch between windows while making cross-cutting changes
* [nice to have] the commit process understands dependencies and commits things in the correct order

Now, for the “how”, I am open, but I see the two leading cases as: git submodules, and a script that mimics git submodules… I have used other tools that boil down to fetching a list of repo/sha into specific directories, and I find them more annoying than git submodules…

For me, both ways address my needs above; I can make cross-cutting changes with ease and could change CI to build my changes rather than the HEAD of a specific branch.
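One way to picture “a script that mimics git submodules” together with a per-PR CI override is a plain pin file listing directory, repository and SHA for each sub-project, with CI layering a second file of overrides on top. The Java sketch below assumes that hypothetical three-column format; the class name, file layout and paths are illustrative only, not an existing Cassandra tool.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch of a pin file for a script that mimics git submodules (format is hypothetical).
 * Each non-blank, non-comment line reads: <directory> <repository-url> <commit-sha1>
 */
final class SubprojectPins {
    /** One sub-project checkout: where it goes, where it comes from, and the exact commit. */
    static final class Pin {
        final String directory;
        final String repository;
        final String sha;

        Pin(String directory, String repository, String sha) {
            this.directory = directory;
            this.repository = repository;
            this.sha = sha;
        }
    }

    private SubprojectPins() {}

    /** Parses pin lines into a map keyed by directory, preserving file order. */
    static Map<String, Pin> parse(List<String> lines) {
        Map<String, Pin> pins = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            String[] parts = line.split("\\s+");
            if (parts.length != 3 || !parts[2].matches("[0-9a-f]{40}")) {
                throw new IllegalArgumentException("malformed pin line: " + raw);
            }
            if (pins.put(parts[0], new Pin(parts[0], parts[1], parts[2])) != null) {
                throw new IllegalArgumentException("duplicate directory: " + parts[0]);
            }
        }
        return pins;
    }

    /**
     * Layers CI overrides (e.g. a sub-project PR's fork and commit) over the committed pins.
     * Overrides may only replace existing entries, so a typo cannot add a stray checkout.
     */
    static Map<String, Pin> withOverrides(Map<String, Pin> pins, Map<String, Pin> overrides) {
        Map<String, Pin> result = new LinkedHashMap<>(pins);
        for (Pin override : overrides.values()) {
            if (!result.containsKey(override.directory)) {
                throw new IllegalArgumentException("override for unknown sub-project: " + override.directory);
            }
            result.put(override.directory, override);
        }
        return result;
    }
}
```

A PR touching a sub-project would then carry an override line such as `modules/accord https://example.org/my-fork.git <sha>` (hypothetical path and URL), while the committed pin file keeps pointing at the agreed SHA until the change is merged.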
To address Mick’s capabilities, I think I saw the following (correct me if I missed any):

> - you can no longer just `git clone …` (and we clone automatically in a
> number of places)

With submodules or a script, that no longer works, but we can make this less painful by enhancing build.xml to make sure it builds out of the gate; we can’t see all the code on a fresh commit, but it would still be buildable.

> - same with `git pull …` (easy to be left with out-of-sync submodules)

Correct: if you use submodules/script, you have a text file saying what we “should” use, but this does not enforce actually using them… Again, we could make sure build.xml does the right thing, but this can be confusing for people who mainly build in the IDE and don’t depend on build.xml until later in development… This is something we should think about… A project I am familiar with has its build auto-inject git hooks to make sure things “just work”; we may be able to solve this in a similar way?

> - permanence from a git SHA no longer exists

Why is this? The SHA points to other SHAs, so it is still immutable. If we claim that pointing to other SHAs doesn’t count, then why do library versions count? Both are immutable snapshots of code at a specific point in time.

> - our releases get more complicated (our source tarballs are the asf
> releases)

We don’t include our dependencies, do we? If so, then does it really? If Accord is a library we use, why would we include its source in the build? Isn’t it just another library from this point of view?

> - handling patches cover submodules

I don’t know what you mean by this; do you mean how do we submit cross-cutting patches? How I do this in the cep-15-accord branch is by updating the pointer to point to my dependency PR; that way the build “does the right thing”. I just have to fix this up before merging into Cassandra
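The argument that a SHA pointing at other SHAs is still immutable rests on git being content-addressed: an object's id is the SHA-1 of a short header plus the object's bytes, and a submodule is recorded in the superproject's tree as a gitlink entry holding the submodule's commit SHA. The Java sketch below computes a blob id the way git does in SHA-1 repositories, to show that any change to the content changes the id. It only demonstrates identity; it says nothing about whether the referenced commit stays downloadable, which is the separate permanence concern raised for snapshots.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Sketch: the object id git assigns to file content, i.e. SHA-1 over "blob <size>\0" + content. */
final class GitBlobId {
    private GitBlobId() {}

    static String blobId(byte[] content) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            // Header: the object type, a space, the content length in decimal, then a NUL byte.
            sha1.update(("blob " + content.length + "\0").getBytes(StandardCharsets.US_ASCII));
            sha1.update(content);
            StringBuilder hex = new StringBuilder();
            for (byte b : sha1.digest()) {
                hex.append(String.format("%02x", b)); // lowercase hex, two digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to provide SHA-1.
            throw new IllegalStateException(e);
        }
    }
}
```

For example, `GitBlobId.blobId("hello\n".getBytes(StandardCharsets.UTF_8))` returns `ce013625030ba8dba906f756967f9e9ca394464a`, the same id `git hash-object` reports for a file containing `hello` followed by a newline.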
Re: Intra-project dependencies
> If we make sure all branches are using the latest “stable” accord then this is 6 commits (4 for C*, 1 for accord the stable branch, then 1 to merge into trunk)

If we’re modifying stable, we only need one commit per C* branch per release. We don’t need to immediately point C* to it. So there could plausibly be far fewer total commits this way, though the reality is hard to predict and will vary.

On 18 Jan 2023, at 20:45, David Capwell wrote:
Re: GSoD 2023
Hi Deepak, I'll have more to post soon, but I have started working on a proposal for GSoD 2023 for Apache Cassandra. Keep an eye out for that post.

On 2023/01/11 09:56:36 Deepak Vohra via dev wrote:
> Lorina,
> Happy New Year.
> You mentioned this year you might apply. As I am interested in being the Tech
> Writer, a reminder to apply as applications are to open from January to March
> according to
> https://sites.google.com/view/gsoc-sod-fosdem-2023/google-season-of-docs.
> Detail yet to be posted for GSOD 2023.
> regards,
> Deepak