Braindump: 14:00 <+thiago> let's think this way: 14:00 <+thiago> I made 4 commits, pushed 14:00 <+thiago> then I had to make a fix to #2, which I have, and I squashed 14:00 <+thiago> so I still have 4 commits 14:00 <+thiago> now I want to re-push them 14:01 <+thiago> the ideal scenario would be that the tool discovered that only #2 changed and pushed only that 14:01 <+thiago> can it do that? I think so 14:01 <+ossi> the system will rebase onto the same base as before. so 1 will be a no-op, 2 will work, and the rest will be skipped due to an extended patch-id match 14:02 <+thiago> will that leave the approval marks? 14:02 <+ossi> so far so easy 14:02 <+ossi> well, skipped means not pushed, so obviously yes 14:02 <+thiago> ok, so the tool needs to do it 14:02 <+thiago> determining 1 isn't changed is easy: 14:03 <+thiago> since we fetched the entire series, we can try to re-create the patch with git apply and we end up with even the same SHA-1 14:03 <+ossi> we don't. we just cherry-pick with all parameters equal, so the sha1 will be the same 14:03 <+ossi> exactly 14:03 <+thiago> now we apply 2 and we realise it changed 14:03 <+thiago> so we need to push 2 14:03 <+thiago> the tool will then apply 3 14:03 <+thiago> can it determine that 3 does not need applying? 14:04 <+thiago> it can't do a commit SHA-1 comparison because #2 changed 14:05 <+thiago> it can do a patch-id-like comparison (it can't be patch-id because that ignores whitespace) 14:06 <+thiago> can you think of a way to determine that the commit didn't change? 14:07 <+ossi> yes, something that qualifes as "extended patch-id". whatever we come up with. unmodified diff plus commit message and author should be sufficient. 14:07 <+thiago> ok, so let's say the tool determines that neither 3 nor 4 need to be re-pushed 14:07 <+thiago> should it by default re-push only 2? 14:08 <+thiago> or do you think it should by default push 2~..4? 14:08 <+ossi> in the current git setup, it should stop at 2. forcable to 4 by using --full. 14:08 <+ossi> err, gerrit setup 14:08 <+thiago> ok 14:09 <+thiago> suppose now that it found that 4's own changes didn't change, but the patch itself did (due to 2 changing) 14:09 <+thiago> probably the extended patch-id will catch that, and even err in the side of caution 14:09 <+thiago> in that case, the tool needs to push the 2~..4 series again 14:09 <+thiago> agreed? 14:10 <+ossi> yes 14:10 <+thiago> but if it determined that 2 and 4 changed but they're independent, it doesn't need to re-push 3 14:10 <+thiago> so far, I agree completely and I'd use this in my main workflow 14:11 <+ossi> technically, it does ... because otherwise it would break 4's dependency chain. that's what i called branching within the topic. 14:11 <+thiago> let's say that I made modifications and I reordered 2 and 3 14:11 <+thiago> I don't understand that 14:11 <+thiago> if I push 2 and 4, what's the harm to 3? 14:12 <+ossi> 4 depends on 3 in the topic, by definition. if you push 4 on top of 3, you broke the dependency chain, and created a tree inside the branch. 14:12 <+ossi> err, on top of 2 14:13 <+thiago> you push a new 4, based on top of the old 3 14:13 <+thiago> it would look like this: 14:13 <+ossi> that's also a branch ... in depth, this time 14:13 <+thiago> 1-2-3-4 14:14 <+thiago> \ \4' 14:14 <+thiago> \2' 14:14 <+thiago> when you stage, you'll stage 1-2'-3-4' 14:16 <+ossi> you can't push 4' after 2', because that would reset it to 2, so it gets really tricky at this point 14:16 <+thiago> you can 14:16 <+thiago> you push 4' based on 3 14:16 <+ossi> you'd have to re-serialize the chain and would push the unmodified 3' 14:17 <+thiago> I'm telling you this works 14:17 <+thiago> since 3 didn't change, Gerrit doesn't care that it depends on an outdated 2 14:17 <+ossi> it doesn't, because 2 is a dependency of 3, so by pushing the unmodified 3 you reset 2. 14:17 <+thiago> nope 14:17 <+ossi> and i'm telling you that this exactly the mistake you made a few times already ;) 14:17 <+thiago> that was not the mistake 14:18 <+thiago> I'm telling you this works 14:18 <+thiago> once a commit is in Gerrit, it can never be added to a patch-set again 14:18 <+ossi> only if it was already merged, but that's an entirely different situation 14:18 <+thiago> the mistake of bringing back changes wasn't caused by this procedure. It was caused by manual re-pushing of complex branches 14:19 <+thiago> no, it works for un-merged changes too 14:19 <+ossi> i don't get how that is supposed to be possible 14:19 <+thiago> it just is 14:20 <+thiago> the fact is that gerrit walks back from the pushed tip and finds how many commits are new 14:20 <+thiago> that is, HEAD --not --all 14:20 <+thiago> so it concludes that only 4' is new. It doesn't see or care about 2 14:21 <+thiago> the trick is that Gerrit will look at *all* commits it has, not just those in refs/heads/* 14:22 <+ossi> hmm, you have a point. that would explain why it complains about identical sha1s 14:22 <+thiago> so the fact that 3 is known by a pending change means it doesn't see 3~ (2) 14:23 <+thiago> so do we agree that it should push 2' and 4' but not 3 if it can determine that 3 didn't change and that applying 4' on top of 3 will work? 14:24 <+ossi> yeah, why not. if we can figure out how to track that tree ... 14:24 <+thiago> this part is somewhat easy 14:24 <+thiago> let's say I swapped 2 and 3 around 14:25 <+thiago> so the tool will apply 1, determine that it's the same, then apply 3 and compare to what it had, concluding it's completely different, so it will add 3' to the push list 14:25 <+thiago> then it does the same for 2 and concludes it needs to push 3' 14:26 <+thiago> then it looks at 4 and concludes nothing needs pushing there 14:26 <+thiago> so the push set is 3'~..2' 14:26 <+thiago> fair enough? 14:27 <+ossi> so far so easy 14:27 <+thiago> what if I add 1½ ? 14:27 <+thiago> since we have change-ids, we can actually determine which one is the new commit 14:28 <+ossi> it would have to re-pust 2-4, to restore the dependency chain 14:28 <+thiago> it needs to re-push 2 to keep the dependency chain, indeed 14:28 <+thiago> when I do it, I often don't do that extra step, which makes it necessary I remember the dependency myself 14:29 <+thiago> but it doesn't need to re-push 3, since 3 didn't change 14:30 <+thiago> the problem here is just the complexity of the algorithm 14:30 <+ossi> so we add the parent change-id to the calculation of the ext-patch-id 14:30 <+thiago> right 14:30 <+ossi> that was easy ^^ 14:30 <+thiago> from one push to the next, I could have added new commits, reordered commits, removing commits 14:31 <+thiago> actually, this calculation of yours solves it 14:31 <+thiago> the case before of reordering 2 and 3 will require re-pushing 4, to keep the dependency 14:31 <+thiago> but if there had been a 5 depending on 4, 5's parent wouldn't have changed and thus it wouldn't be pushed 14:31 <+thiago> agreed? 14:31 <+ossi> yes 14:32 <+thiago> great 14:32 <+thiago> let me just repeat to see if this algorithm would work: 14:32 <+thiago> the original push was 1-2-3-4-5 14:32 <+thiago> now I have 1-1½-3-2-4-5 14:33 <+thiago> check 1 → same ext-patch-id, no need to push 14:33 <+thiago> check 1½ → new commit, need to push 14:33 <+thiago> check 3 → different ext-patch-id, need to push 14:33 <+thiago> check 2 → ditto 14:34 <+thiago> check 4 → ditto 14:34 <+thiago> check 5 → same ext-patch-id, no need to push 14:34 <+thiago> so the push set is 1½-3-2-4 (that is, push 1½~..4:refs/for/$branch) 14:35 <+ossi> yes 14:36 <+thiago> I modified 2 and 4, but the changes are independent 14:36 <+thiago> 2 fails ext-patch-id check, add to push set 14:36 <+thiago> 4 fails ext-patch-id check, add to push set 14:37 <+thiago> so two independent pushes 14:37 <+thiago> similar scenario: 2 and 4 were modified, but 4 depends on 2's modifications 14:37 <+thiago> both fail ext-patch-id check, but we need to determine that we need to push 3 too 14:38 <+thiago> we do that by applying new4 on old3 and determine that doesn't work. So we know we need to push everything from new2 14:39 <+thiago> it can get worse: let's say 2, 3, and 5 changed and that 5 depends on 2's changes 14:40 <+thiago> if we fail to apply new5 on old4, we rollback and retry on old2-new3-new4-new5 and that fails too 14:40 <+thiago> so we need to retry again on new2-new3-new4-new5 14:40 <+thiago> too complex? 14:41 <+ossi> well, the thing would just try the segments recursively, so it seems easy enough 14:41 <+thiago> or should it say "screw it, I'll repush from the earliest"? 14:42 <+ossi> i don't think there is a need to make special cases. the "try old first" is recursive by design, so it can be nested infinitely just as well. 14:43 <+thiago> right, there's no server contact here 14:43 <+thiago> and this is still O(n) for n = number of commits changed 14:43 <+thiago> ok, I'm happy so far 14:44 <+thiago> two more questions: how do I add a new commit on top of the youngest 14:44 <+thiago> so far, we've referred to the patch series by the youngest commit. Now I'm creating a new commit, which is the new ID 14:45 <+ossi> yeah. i think that would come with a forced number with the previously known topic: gpush -4:mytopic 14:45 <+thiago> I would go for requiring that you specify the full range 14:45 <+ossi> that is the fully range 14:46 <+ossi> four commits from the top 14:46 <+ossi> overwrite whatever was mytopic 14:46 <+thiago> I'm not sold on names yet 14:46 <+ossi> me neither 14:46 <+thiago> this will get a little more complex when I introduce two different patchsets in the same branch 14:48 <+ossi> the base just becomes a different patchset in your tree. that's where the recursive gpush strikes. 14:49 <+ossi> however, without names, that one actually becomes tricky, because you need to determine what is a topic by reverse-engineering the push history 14:49 <+thiago> I had 1-2-3-4-5 and I had done: git gp 5 14:49 <+thiago> after I had done my changes, without adding commits to the edges, all I need to do is git gp 5 again 14:49 <+ossi> you mean, you detached 5 from the topic? 14:49 <+thiago> no 14:50 <+thiago> git gp 5 pushes @{u}..5 14:50 <+thiago> uh, wait, no 14:50 <+thiago> for my *first* push, i need to specify which commits: 14:50 <+thiago> git gp 1~..5 14:50 <+thiago> for updates, I can simply do git gp 5 14:51 <+thiago> if I add 0 and forget I had done that, the next git gp 5 may succeed or fail 14:51 <+ossi> yes. with my syntax, that would be 5: -5 (commit 5 is top, 5 commits down, like format-patch) 14:51 <+thiago> if it succeeds, then 1 through 5 don't really need 0 14:51 <+thiago> both syntaxes work 14:52 <+thiago> except that the tool will need to parse the -n option 14:52 <+thiago> anyway, if the series is now 0 through 5 and I forgot to name 0, it may fail to apply again, in which case I will know 14:53 <+thiago> so I have to do git gp 0~..5 / git gp -n6 5 14:53 <+thiago> ok here? 14:53 <+ossi> yes 14:54 <+thiago> now, if I added a new commit on top (6), and I do git gp 6, it may work or it may not 14:54 <+thiago> if it fails, I'll know I need to specify a range 14:54 <+ossi> just gp 6 would create a new topic 14:54 <+thiago> if it works, it doesn't really depend on 1-5 (at least, conflict-wise; it may break the build) 14:54 <+thiago> right 14:54 <+thiago> so I have to specify git gp 1~..6 or git git -n6 6 14:54 <+ossi> yes 14:55 <+thiago> when we do the first query to the server, we'll find out that 6 has never been pushed before 14:57 <+thiago> the first query is join(' OR ', @cchangeIdList) 14:58 <+ossi> if it's on top, then it's easy even without a count: it's a new topic. if a change is buried below or within a topic, it's assumed that it is supopsed to join the topic. 14:58 <+thiago> right 14:58 <+thiago> we don't need to do anything 14:58 <+thiago> er... we still need to fetch 5 from the server to check whether 1-5 need re-pushing 14:59 <+thiago> so, given 1~..6, how do we find out which commit used to be the youngest? 14:59 <+thiago> even: do we need to know it? 14:59 <+ossi> the git fetching is uncritical, as that's a single git fetch with multiple refs. 14:59 <+thiago> right 14:59 <+thiago> fetch them all 15:00 <+ossi> how do you know where the list ends? yeah, that's a much better question 15:00 <+thiago> we know the SHA-1 of the commits from the gerrit query, so we start querying the commits 15:00 <+thiago> what do you mean? 15:00 <+thiago> which end? 15:01 <+ossi> the bottom of the series. assuming you don't cache any local info, you don't know where the bottom of the last push is. 15:01 <+ossi> i.e., you could work only incrementally, until you hit a mainlined change. 15:01 <+thiago> I listed the commits to be pushed, so I specified both ends 15:01 <+thiago> 1~..6 is a 6-commit closed set 15:02 <+ossi> ah, no, i'm now thinking of the case where you didn't specify it, because it's already a subsequent push. 15:02 <+ossi> but ok, let's go back to the easy case. 15:02 <+thiago> if I do 'git gp 6', it queries the server, discovers 6 is new, so it will try to push 6 alone 15:02 <+thiago> that may succeed or fail 15:02 <+thiago> the problem is if it succeeds but breaks the build if it's independent 15:02 <+ossi> yes, if it can't rebase on upstream it will fail 15:03 <+ossi> then you'll need to specify a base for re-pushing, and it should remember that base for subsequent pushes 15:03 <+thiago> in either case, if I want to include it in the set, I need to say 1~..6 or 6 -n6 15:04 <+ossi> ah, sitll the very easy case, i.e., just extending the range 15:04 <+thiago> yup 15:04 <+thiago> that works on either end 15:05 <+thiago> the next time I do "git gp 6", it will query 6, fetch 6 and then realise that 6 --not --all is more than one commit 15:05 <+ossi> yes. it's just injecting a new commit 0, with 1' needing a re-push, or in the case of adding 6, just pushing 6 on top of 5. trivial. 15:05 <+thiago> so it know what the full range is 15:05 <+ossi> yes 15:05 <+ossi> that is, as long as you didn't meddle with the series 15:05 <+thiago> inserting in the middle doesn't even require the full range again, since the tool will scan the commit list matching that Change-Id and find the new one 15:06 <+thiago> if you want to split the series, then you have to do git gp 6 -n3 15:06 <+thiago> we'll probably forget about this edge case... 15:06 <+ossi> yes 15:06 <+thiago> anyway, so far it looks good 15:06 <+thiago> we don't need gpull nor grefresh 15:07 <+ossi> now the tricky part: you cannot store what a topic is locally. you need to store that info on gerrit. 15:07 <+thiago> what do you need to store? 15:08 <+ossi> oh, wait, i see, you use the merge-base with upstream 15:09 <+thiago> once we fetch 6 from gerrit, we do git rev-list 6 --not --all 15:09 <+thiago> that gives us the list of commits to search again as well as the old base 15:09 <+ossi> yes, i got that now 15:09 <+ossi> ok, suppose you do gp 3, and 3~..6 is the range that was pushed last time. 15:09 <+ossi> bla 15:09 <+thiago> then it won't know about 4, 5 and 6 15:09 <+ossi> 1~..6 15:09 <+thiago> it will only do 3 15:10 <+thiago> you have to specify the youngest (if it's been pushed) or the full range 15:10 <+thiago> walking from 3 to 6 is not possible 15:10 <+ossi> exactly. is that a good thing? we wouldn't have any error checking 15:10 <+thiago> if it were easy, we should do it 15:11 <+thiago> so specifying any single commit in a series would cause the entire series to be analysed 15:11 <+thiago> but it just isn't possible to walk in the wrong direction in the DAG 15:13 <+thiago> it might be possible by naming the topic branch, but let's keep that for version 2.0 15:13 <+ossi> we could also use locally cached info, but that's an extra feature 15:15 <+thiago> again, 2.0 improvements 15:15 <+thiago> caching local info requires gc'ing later 15:15 <+ossi> yes 15:15 <+thiago> anyway, I think we are in agreement 15:15 <+thiago> we need: 15:16 <+thiago> 1) a git gerrit-patch-id thing that can calculate an ID that reflects the fact that Gerrit staging will succeed 15:16 <+thiago> probably needs to be given a Git commit ID and zero-or-one parent Change-Id 15:17 <+thiago> 2) a first phase tool that, given a commit ID or range, queries the gerrit server and fetches all the commits 15:17 <+thiago> optimisation: skip git fetch if the commits are still available locally 15:18 <+thiago> 3) a second phase that walks the given range and the range from Gerrit (previous push), if any, and determines which commits need re-applying 15:18 <+thiago> anything whose content has changed, is new, or whose parent ID has changed 15:19 <+thiago> 4) a third phase that takes the output of the second phase and tries to git apply --cached the commits, so we get push ranges 15:20 <+thiago> 5) current git gpush, so we'll get the nickname matching 15:21 <+thiago> I'll create some Jira tasks for this. Which component should I do it under? 15:21 <+ossi> ha, tricky. we have none. qtqainfra/gerrit, even though it is technically misapplied. 15:21 <+ossi> sec 15:22 <+thiago> create something for repotools 15:23 <+ossi> maybe https://bugreports.qt-project.org/browse/QTQAINFRA/component/19422 is good? 15:24 <+thiago> sounds good 15:26 <+thiago> are we doing this in Perl? 15:29 <+ossi> so whose tasks are these going to be? mine? 15:29 <+ossi> fwiw, i think it's fine to just make one task. just structure it well enough 15:29 <+ossi> i'm going for a shower+ now. back in >30 min. 15:29 <+thiago> one overarching 15:29 <+thiago> I think you can get started on the ext-patch-id 15:58 <+ossi> thiago: yes, perl. let's just build on top of what is already there. i guess that's also least likely to cause portability problems. 15:59 <+thiago> ossi: my script can already extract the commit diff, name, email, date and commit subject 16:00 <+thiago> if you add that to the parent Change-Id, you should be able to create the ext-patch-id 16:01 <+ossi> thiago: yeah, i've already looked at it quite closely