Towards a Shelving MVP

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

Towards a Shelving MVP

Julian Foad-5
It seems Shelving in the current form is working out quite nicely. I'm
thinking about what more we need so that any of us or our friends would
be comfortable testing it on non-trivial data and calling the feature
set a "minimum viable product". I thought of:


Hardening before being safe to use:

  * Help text should contain a short introduction to how to use; set the
user's expectations; state the limitations (see "Extensions / Not
Supported Initially" section [1] in design doc.).

  * Output should be verbose and clear about what is happening.

  * The prototype should not delete patch files when unshelving, in case
it goes wrong; instead rename/move them.

  * Must handle unsupported scenarios gracefully (e.g. kinds of changes
that we can't yet save and restore in a patch file).

  * If applying a patch fails in any way, or produces conflicts, it
should notify the user and keep the patch (moved somewhere is OK).

  * When 'unshelve' would touch an already modified file, consider
aborting instead of patching. Maybe only patch it if '--force' given.


Some easy UI improvements for a better first impression:

  * Let 'unshelve' choose the most recent shelved patch by default (like
'git stash pop' does).

  * Let 'shelve' default to 'this working directory' and an automatic
name, so that command-line arguments are not required.

  * Tidy up the '--list' output. e.g. show age as minutes/hours/days;
remove file size in bytes.


There remain many limitations listed in [1], mainly of svn diff/patch,
which we will certainly need to address, but I don't think any of them
are essential for the MVP stage. And we have today been talking about
extending the shelving UI to facilitate simple checkpointing, but that
isn't within MVP scope either.

Any thoughts please?

- Julian


[1]
https://docs.google.com/document/d/1PVgw0BdPF7v67oxIK7B_Yjmr3p28ojabP5N1PfZTsHk/edit#heading=h.ysnjkpcjx9ee
Reply | Threaded
Open this post in threaded view
|

Re: Towards a Shelving MVP

Branko Čibej
On 25.08.2017 18:23, Julian Foad wrote:

> It seems Shelving in the current form is working out quite nicely. I'm
> thinking about what more we need so that any of us or our friends would
> be comfortable testing it on non-trivial data and calling the feature
> set a "minimum viable product". I thought of:
>
>
> Hardening before being safe to use:
>
>   * Help text should contain a short introduction to how to use; set the
> user's expectations; state the limitations (see "Extensions / Not
> Supported Initially" section [1] in design doc.).
>
>   * Output should be verbose and clear about what is happening.
>
>   * The prototype should not delete patch files when unshelving, in case
> it goes wrong; instead rename/move them.

On the topic of storing patches, I'd like to propose an alternative
implementation.

Instead of saving a set of patches, save two sets of (untranslated
pristine) files:

  * All currently modified files
  * Their pristine, unmodified versions (these are of course already in
    the pristine store and would only need an extra reference in the WC
    DB so that 'svn cleanup' doesn't delete them).

The idea is that 'unshelve' could, instead of applying patches to the
current working copy, invoke our built-in diff3 algorithm with whole
(untranslated!) files. This is likely to produce much better results and
saner conflict info than applying patches, which have limited context
information.

Additionally, the existing pristine store can be used to track the files.

-- Brane
Reply | Threaded
Open this post in threaded view
|

Re: Towards a Shelving MVP

Julian Foad-5
Branko Čibej wrote:

> On the topic of storing patches, I'd like to propose an alternative
> implementation.
>
> Instead of saving a set of patches, save two sets of (untranslated
> pristine) files:
>
>   * All currently modified files
>   * Their pristine, unmodified versions (these are of course already in
>     the pristine store and would only need an extra reference in the WC
>     DB so that 'svn cleanup' doesn't delete them).

> The idea is that 'unshelve' could, instead of applying patches to the
> current working copy, invoke our built-in diff3 algorithm with whole
> (untranslated!) files. This is likely to produce much better results and
> saner conflict info than applying patches, which have limited context
> information.

Thanks for the thoughts.

Three-way merging should indeed produce better results than patching
when applying the results to a base other than the original base. We
know that patching is sufficient for many use cases, but we also know
there are cases that where a proper merge would be superior, and they
might turn out to be common in certain usage patterns. I would certainly
be happier if we could do that.

I can also imagine that other benefits might arise from integrating this
storage more tightly into the WC like this, although I am not sure
exactly what.

== What would we need to implement this suggestion? ==

For each shelved change we would need

  * these two sets of files;
  * the (~path-to-hash) index to those files;
  * some (similar or other) storage for property changes;
  * per-node metadata (revnums, copy/move info, ...?)
  * patch metadata (at least a log message);

and to keep track of multiple shelved changes:

  * an index from shelf-name to the above group of data.

Then we would need APIs to:

  * store the result of a WC diff in that format;
  * perform a merge, taking input from that storage, into the WC.

And it would be desirable to have a way to:

  * generate a diff from that storage, to output for review or as a patch.

I presume a WC format bump would be necessary, making it incompatible
with the current WC format. (Can you see any way around that?)

== Advantages of using patch files ==

I see some advantages of using patch files for storage.

  * WC format bump not needed -- compatible with current versions;

  * user can more easily import and export the shelved patches, e.g.
transfer one to another user;

  * forces us to make 'svn diff' and 'svn patch' feature-complete (and
this I think is highly significant even though it's not within the
stated purpose of shelving).

== Infinite Context ==

It comes to my mind that an alternative way to achieve storage of the
complete pristine versions, and so enable 3-way merge instead of patch,
is to write a patch file with 'infinite' context.

The algorithm commonly used by 'patch' tools including our built-in
version do not process context in that way and I expect they would
perform badly given such input, although I haven't tried. That would
negate one of the side benefits of using standard patch format for
storage, but perhaps that's OK. We would need to teach 'svn patch',
given an 'infinite context' patch, to use the standard 3-way merge
algorithm.

(That brings to my mind the question of intermediate-sized context: is
there an algorithmic continuum between the basic 'patch' algorithm which
is designed for small context and the basic '3-way merge' algorithm
which is designed for complete context? It would be interesting to see
whether anyone else has thought of this in the outside world.)

Anyway, that might be another option to try without needing so much
WC-layer implementation.

- Julian
Reply | Threaded
Open this post in threaded view
|

Re: Towards a Shelving MVP

Johan Corveleyn-3
On Mon, Aug 28, 2017 at 9:05 PM, Julian Foad <[hidden email]> wrote:

> Branko Čibej wrote:
>> On the topic of storing patches, I'd like to propose an alternative
>> implementation.
>>
>> Instead of saving a set of patches, save two sets of (untranslated
>> pristine) files:
>>
>>   * All currently modified files
>>   * Their pristine, unmodified versions (these are of course already in
>>     the pristine store and would only need an extra reference in the WC
>>     DB so that 'svn cleanup' doesn't delete them).
>
>> The idea is that 'unshelve' could, instead of applying patches to the
>> current working copy, invoke our built-in diff3 algorithm with whole
>> (untranslated!) files. This is likely to produce much better results and
>> saner conflict info than applying patches, which have limited context
>> information.
>
> Thanks for the thoughts.
>
> Three-way merging should indeed produce better results than patching
> when applying the results to a base other than the original base. We
> know that patching is sufficient for many use cases, but we also know
> there are cases that where a proper merge would be superior, and they
> might turn out to be common in certain usage patterns. I would certainly
> be happier if we could do that.

There is one big disadvantage of storing the complete modified files,
and that's storage. If I'm making a small edit to a 100 MB file,
instead of storing a patch of 500 bytes, I have to store 100 MB per
shelved change.

I'm not an expert, but do you really need the modified file itself, if
you have the patch and a reference to the base file (pristine)? Why
store both F (pristine) and F' (modified file), if I can reconstruct
F' out of F + P (patch). So I suggest:

* Store the patch
* Keep the pristine on which the patch was based (keep a reference to
it in the pristine store, like in Brane's suggestion)

We can still perform the 3-way merge by first reconstructing F and F'
out of F and P.

And even if the pristine gets lost (either because the patch was
transferred to another user; or because the user executed the
not-yet-existing command 'svn cleanup --vacuum-shelve-pristines' to
reclaim diskspace) the patch will still be usable, although without
the 3-way merging.

--
Johan
Reply | Threaded
Open this post in threaded view
|

Re: Towards a Shelving MVP

Daniel Shahaf-2
Could we divorce the questions of storage and algorithms, please?

We all seem to be in agreement that we need to store the file contents
on the shelf's base and the file contents as modified in the shelved
patch.  There are many possible ways to do so: as two full files, or as
unidiffs, or even as deltas.  All these representations are
interchangeable: any of them can be derived from any other.  Which one
to use is one question.

Then, there is the problem of how to rebase a shelf, i.e., how to apply
a shelved patch to a different tree than it was composed against.  This
is another question; it is independent of the first one.  (This is the
question to which "apply a unidiff" and "use diff3" were suggested as
solutions.)

More below.

Johan Corveleyn wrote on Mon, Aug 28, 2017 at 22:41:15 +0200:

> There is one big disadvantage of storing the complete modified files,
> and that's storage. If I'm making a small edit to a 100 MB file,
> instead of storing a patch of 500 bytes, I have to store 100 MB per
> shelved change.
>
> I'm not an expert, but do you really need the modified file itself, if
> you have the patch and a reference to the base file (pristine)? Why
> store both F (pristine) and F' (modified file), if I can reconstruct
> F' out of F + P (patch). So I suggest:
>
> * Store the patch
> * Keep the pristine on which the patch was based (keep a reference to
> it in the pristine store, like in Brane's suggestion)
>

Rather than store a patch which is meant to apply to a particular base,
how about cutting the middleman and storing a delta (and the path@revision
of the delta base, if there is one)?  As I say above, even if the storage
takes the form of a delta, we can implement 'svn shelf --export-as-unidiff'
and 'svn shelf --rebase-using-3-way-merge' in terms of it.

However, I think the right place to implement this is not as special
codepaths for shelves, but inside the pristine store.  Therefore, I
envision the MVP simply adding two fulltext files to the pristine store,
as brane originally suggested.  A future enhancement will be to teach
the pristine store not to store two .svn/pristine/*.svn-base files, but
to store one .svn-base file and one delta.  That future enhancement
_will_ require a format bump, but it is entirely orthogonal to the
shelves work: neither of them is a prerequisite to the other.

I think we can borrow more ideas from problems we have already solved;
for example: a patch that makes tree changes could be stored as a
serialized editor drive, or as NODES table rows describing tree
resulting from that drive.  In either representation, rebasing such a
patch the same problem as 'svn update' of a wc that has local mods, with
the shelf's base and modified trees substituted for BASE and ACTUAL
respectively.

Really, the shelves work is simply about having more than one ACTUAL
tree, isn't it?  And it's not stored in the host OS but serialised under
.svn/.

> We can still perform the 3-way merge by first reconstructing F and F'
> out of F and P.
>

Agreed.  In terms of the divorce I tried to chart above, "reconstructing
F and F'" is the first question, and "performing diff3" is (one proposed
answer to) the second question.

Cheers,

Daniel

P.S. Once the pristine store knows how to store text-base foo as a delta
against text-delta bar, it could just as easily store text-base foo as a
self-delta, and presto, we have compressed pristines.

> And even if the pristine gets lost (either because the patch was
> transferred to another user; or because the user executed the
> not-yet-existing command 'svn cleanup --vacuum-shelve-pristines' to
> reclaim diskspace) the patch will still be usable, although without
> the 3-way merging.
>
> --
> Johan
Reply | Threaded
Open this post in threaded view
|

Re: Towards a Shelving MVP

Daniel Shahaf-2
Julian, regarding your latest mail in this thread:

Daniel Shahaf wrote on Mon, Aug 28, 2017 at 23:27:00 +0000:
> We all seem to be in agreement that we need to store the file contents
> on the shelf's base and the file contents as modified in the shelved
> patch.  There are many possible ways to do so: as two full files, or as
> unidiffs, or even as deltas.  All these representations are
> interchangeable: any of them can be derived from any other.  Which one
> to use is one question.

The fact that all these representations are interchangeable suggests to
me that they all have the same compatibility properties.  It shouldn't
matter whether the shelves are stored in a new DB table or in a new
.svn/shelves/ directory; an end user wouldn't be able to tell which of
the two representations we chose.

For example:

> > > Then we would need APIs to:
> > >
> > >   * store the result of a WC diff in that format;
> > >   * perform a merge, taking input from that storage, into the WC.

The need for new APIs is, pretty much by definition, independent of the
backing storage.

> > > And it would be desirable to have a way to:
> > >
> > >   * generate a diff from that storage, to output for review or as a patch.

You mean, if we store shelves as unidiffs in .svn/shelves/, user can
cp(1) them from there?  I think it would be good to require users to run
'svn shelf --export-as-unidiff' to get a unidiff: it affords separation
of concerns (between the user-facing functionality and the backing
storage).

> > > I presume a WC format bump would be necessary, making it incompatible
> > > with the current WC format. (Can you see any way around that?)

We need a format bump iff a db query in 1.10 wouldn't DTRT on a wc that
1.11 has touched.  I don't see why shelves would introduce any instance
of this — even if they add new DB tables.

(I suppose 1.10 might purge pristines that are only used by shelves, but
then, the worst-case is that somebody who uses 1.10 and 1.11 on the same
working copy will need to re-fetch the pristine from the server.)

> > >   * forces us to make 'svn diff' and 'svn patch' feature-complete (and
> > > this I think is highly significant even though it's not within the
> > > stated purpose of shelving).

Storing the patches as deltas and applying them with diff3 will "force
us to make tree-diff3 feature complete", which could also be considered
"highly significant despite not being within the stated scope of
shelving".

Besides, as I said in my previous email, whether we store the shelves as
unidiffs is orthogonal to whether we apply them with 'svn patch'.

>

My overall impression is that diff3 is the so called king's road here:
it's designed for this and it's not all that more expensive than the
alternative.

Cheers,

Daniel

> Then, there is the problem of how to rebase a shelf, i.e., how to apply
> a shelved patch to a different tree than it was composed against.  This
> is another question; it is independent of the first one.  (This is the
> question to which "apply a unidiff" and "use diff3" were suggested as
> solutions.)
Reply | Threaded
Open this post in threaded view
|

Re: Towards a Shelving MVP

Julian Foad-5
Daniel Shahaf wrote:

> The fact that all these representations are interchangeable suggests to
> me that they all have the same compatibility properties.  [...]
>
> We need a format bump iff a db query in 1.10 wouldn't DTRT on a wc that
> 1.11 has touched.  I don't see why shelves would introduce any instance
> of this — even if they add new DB tables.
>
> (I suppose 1.10 might purge pristines that are only used by shelves, but
> then, the worst-case is that somebody who uses 1.10 and 1.11 on the same
> working copy will need to re-fetch the pristine from the server.)

OK, if all the info goes into new db tables.

>>>> And it would be desirable to have a way to:
>>>>
>>>>   * generate a diff from that storage, to output for review or as a patch.
>
> You mean, if we store shelves as unidiffs in .svn/shelves/, user can
> cp(1) them from there?  I think it would be good to require users to run
> 'svn shelf --export-as-unidiff' to get a unidiff: it affords separation
> of concerns (between the user-facing functionality and the backing
> storage).

Fair comment.

> [...]
> My overall impression is that diff3 is the so called king's road here:
> it's designed for this and it's not all that more expensive than the
> alternative.

I agree in terms of the outcome. We'll need to see how expensive the
implementation task to get there would be.

Thanks,
- Julian
Reply | Threaded
Open this post in threaded view
|

Re: Towards a Shelving MVP

Branko Čibej
In reply to this post by Johan Corveleyn-3
On 28.08.2017 22:41, Johan Corveleyn wrote:
> I'm not an expert, but do you really need the modified file itself, if
> you have the patch and a reference to the base file (pristine)? Why
> store both F (pristine) and F' (modified file), if I can reconstruct
> F' out of F + P (patch).

You need source plus delta. We know how to create deltas, and they're
more compact than patches. Only be aware that by then you're well down
the path of re-creating repository functionality in the working copy. :)

(Though, to be fair, that's exactly what's needed for shelving and
checkpointing ... plus a few "tiny" extra features that the WC already has.)

-- Brane

Reply | Threaded
Open this post in threaded view
|

Re: Towards a Shelving MVP

Julian Foad-5
In reply to this post by Julian Foad-5
Julian Foad wrote:
> Hardening before being safe to use:
>
>   * Help text should contain a short introduction to how to use; set the
> user's expectations; state the limitations (see "Extensions / Not
> Supported Initially" section [1] in design doc.).

Done.
http://svn.apache.org/r1806552

>   * Output should be verbose and clear about what is happening.

http://svn.apache.org/r1806587
...

>   * The prototype should not delete patch files when unshelving, in case
> it goes wrong; instead rename/move them.

Done.
http://svn.apache.org/r1806583

Now instead of deleting it renames to NAME.patch.bak (overwriting any
previous .bak).

>   * Must handle unsupported scenarios gracefully (e.g. kinds of changes
> that we can't yet save and restore in a patch file).
>
>   * If applying a patch fails in any way, or produces conflicts, it
> should notify the user and keep the patch (moved somewhere is OK).
>
>   * When 'unshelve' would touch an already modified file, consider
> aborting instead of patching. Maybe only patch it if '--force' given.
>
>
> Some easy UI improvements for a better first impression:
>
>   * Let 'unshelve' choose the most recent shelved patch by default (like
> 'git stash pop' does).

Done.
http://svn.apache.org/r1806619

>   * Let 'shelve' default to 'this working directory' and an automatic
> name, so that command-line arguments are not required.

http://svn.apache.org/r1806614 (default to this working directory)
...

>   * Tidy up the '--list' output. e.g. show age as minutes/hours/days;
> remove file size in bytes.

http://svn.apache.org/r1806618 (sort by date)
...


- Julian
Reply | Threaded
Open this post in threaded view
|

Re: Towards a Shelving MVP

Julian Foad-5
Julian Foad wrote:
> Julian Foad wrote:
>> Hardening before being safe to use:

After making several such changes (below), I think this is good enough
for a few more people to be trying it out and see how useful it is. I'll
look for some feedback within Assembla. If anyone else here would be
willing to give it a work-out, that would be lovely.

At the same time, I haven't forgotten that I have at least a couple of
substantially different designs to consider, and some smaller
suggestions, thanks to all your previous comments.

- Julian


>>   * Help text should contain a short introduction to how to use; set the
>> user's expectations; state the limitations (see "Extensions / Not
>> Supported Initially" section [1] in design doc.).
>
> Done.
> http://svn.apache.org/r1806552
>
>>   * Output should be verbose and clear about what is happening.
>
> http://svn.apache.org/r1806587
> ...
>
>>   * The prototype should not delete patch files when unshelving, in case
>> it goes wrong; instead rename/move them.
>
> Done.
> http://svn.apache.org/r1806583
>
> Now instead of deleting it renames to NAME.patch.bak (overwriting any
> previous .bak).
>
>>   * Must handle unsupported scenarios gracefully (e.g. kinds of changes
>> that we can't yet save and restore in a patch file).
>>
>>   * If applying a patch fails in any way, or produces conflicts, it
>> should notify the user and keep the patch (moved somewhere is OK).
>>
>>   * When 'unshelve' would touch an already modified file, consider
>> aborting instead of patching. Maybe only patch it if '--force' given.
>>
>>
>> Some easy UI improvements for a better first impression:
>>
>>   * Let 'unshelve' choose the most recent shelved patch by default (like
>> 'git stash pop' does).
>
> Done.
> http://svn.apache.org/r1806619
>
>>   * Let 'shelve' default to 'this working directory' and an automatic
>> name, so that command-line arguments are not required.
>
> http://svn.apache.org/r1806614 (default to this working directory)
> ...
>
>>   * Tidy up the '--list' output. e.g. show age as minutes/hours/days;
>> remove file size in bytes.
>
> http://svn.apache.org/r1806618 (sort by date)
> ...
>
>
> - Julian
>