Sparse checkouts suggestion

Paul Hammant-3
Compared to Perforce's client-spec, Subversion's sparse checkouts are quite cumbersome:

svn checkout http://svn.apache.org/repos/asf/subversion --depth=immediates
cd subversion/trunk
svn update --set-depth infinity
cd ../tags
svn update --set-depth immediates
cd 1.7.7
svn update --set-depth infinity

.. and similar.
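
The manual sequence above could be generated from a small table of paths and depths. This is only a sketch: `sparse_commands` and the `layout` list are hypothetical names, and the code merely builds argument lists for the `svn checkout --depth` and `svn update --set-depth` commands already shown; nothing is executed.

```python
def sparse_commands(url, wc_dir, layout):
    """Build the svn invocations for a sparse working copy.

    layout is an ordered list of (relative_path, depth) pairs; parents must
    come before their children so each path exists when it is updated.
    """
    commands = [["svn", "checkout", url, wc_dir, "--depth=immediates"]]
    for rel_path, depth in layout:
        commands.append(
            ["svn", "update", "--set-depth", depth, wc_dir + "/" + rel_path]
        )
    return commands


# Hypothetical layout mirroring the commands typed by hand above.
layout = [
    ("trunk", "infinity"),
    ("tags", "immediates"),
    ("tags/1.7.7", "infinity"),
]
cmds = sparse_commands(
    "http://svn.apache.org/repos/asf/subversion", "subversion", layout
)
for argv in cmds:
    print(" ".join(argv))
```

Passing the target path to `svn update` replaces the `cd` steps, so the whole sequence can be run from one directory.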

Could Subversion follow Perforce and allow an alternate mechanism that leveraged include and exclude globbing paths?

Maybe in the root of working copy, the contents of a special file could be honored:

.sparse_mappings.txt

Sample contents:

   exclude **/*
   include trunk/**/*
   include tags/*
   include tags/1.7.7/**/*
   
Were the end user to do svn-up from root (after that file changed), and assuming 'svn st' were 'clean', the working copy would reshape itself. Specifically, directories would appear and disappear, and those would NOT necessarily be Subversion adds or deletes - it'd feel the same as permission changes within the directory tree.  Of course, different teammates with the same checkout may see entirely different things (depending on the lines within their .sparse_mappings.txt).

When I say special file, I mean Subversion isn't going to mark it as candidate for svn-add when it sees it.  Meaning, it is set on the client side by tooling and left there.  Tooling may include expand/contract scripts similar to what Google have for their monorepo.
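
One way the include/exclude lines might be evaluated is sketched below. Everything here is an assumption made for illustration: the function names are hypothetical, `*` is taken to stay within one path segment while `**/` spans any number of directories, the last matching rule wins, and a path that matches no rule is included. Subversion has no such feature today.

```python
import re


def glob_to_regex(pattern):
    """Translate a glob where '*' stays in one segment and '**/' spans directories."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append(r"(?:[^/]+/)*")  # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(r".*")  # anything, including '/'
            i += 2
        elif pattern[i] == "*":
            out.append(r"[^/]*")  # anything within a single segment
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def parse_mappings(text):
    """Parse 'include PATTERN' / 'exclude PATTERN' lines into ordered rules."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        action, pattern = line.split(None, 1)
        if action not in ("include", "exclude"):
            raise ValueError("unknown action: %r" % action)
        rules.append((action == "include", glob_to_regex(pattern)))
    return rules


def is_included(path, rules):
    """Assumed semantics: the last rule that matches the path decides."""
    included = True  # assumed default when nothing matches
    for include, regex in rules:
        if regex.match(path):
            included = include
    return included


SAMPLE = """
exclude **/*
include trunk/**/*
include tags/*
include tags/1.7.7/**/*
"""
rules = parse_mappings(SAMPLE)
for path in ("trunk/README", "tags/1.7.6", "tags/1.7.6/README",
             "tags/1.7.7/subversion/svn/main.c"):
    print(path, is_included(path, rules))
```

With these assumed semantics, `tags/1.7.6` is kept as an entry under `tags` but nothing beneath it is, which matches the `--set-depth immediates` step in the manual sequence.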

Elsewhere I have documented Google's fu, and the need for it in scaled Trunk-Based Development, in a few places:

1. https://paulhammant.com/2014/01/06/googlers-subset-their-trunk/ (based on 2009 memories). Yes I know they are in Piper now and out of Perforce.

2. Previously I forked a medium-sized Monorepo on Github, and did the complete expand/contract work for it - https://github.com/paul-hammant-fork/jooby-monorepo-experiment - in Python.

3. I have generally written about monorepos here: https://trunkbaseddevelopment.com/monorepos/

4. And the expanding contracting kind here: https://trunkbaseddevelopment.com/expanding-contracting-monorepos/

So, in the tradition of the Subversion project, I'd like to discuss this here, then go on to raise a JIRA ticket.

- Paul

Re: Sparse checkouts suggestion

Branko Čibej
On 13.09.2017 04:22, Paul Hammant wrote:

> Compared to Perforce's client-spec, Subversion's sparse checkouts are
> quite cumbersome:
>
>     svn checkout http://svn.apache.org/repos/asf/subversion --depth=immediates
>     cd subversion/trunk
>     svn update --set-depth infinity
>     cd ../tags
>     svn update --set-depth immediates
>     cd 1.7.7
>     svn update --set-depth infinity
>
> .. and similar.
>
> Could Subversion follow Perforce and allow an _alternate_ mechanism
> that leveraged include and exclude globbing paths?
>
> Maybe in the root of working copy, the contents of a /special/ file
> could be honored:
>
> .sparse_mappings.txt
>
> Sample contents:
>
>    exclude **/*
>    include trunk/**/*
>    include tags/*
>    include tags/1.7.7/**/*
>    
> Were the end user to do svn-up from root (after that file changed),
> and assuming 'svn st' were 'clean', the working copy would reshape
> itself. Specifically, directories would appear and disappear, and those
> would NOT necessarily be Subversion adds or deletes - it'd feel the same
> as permission changes within the directory tree.  Of course, different
> teammates with the same checkout may see entirely different things
> (depending on the lines within their .sparse_mappings.txt).


This is an old idea. If you want to implement it, a file is not the
right place for the view definition; a property on a directory might be.

-- Brane

Re: Sparse checkouts suggestion

Paul Hammant-3
> This is an old idea. If you want to implement it, a file is not the
> right place for the view definition; a property on a directory might be.

A Svn property that is set for the user and workspace only, and retains no history, right?  Meaning if the user had two checkouts of the same Svn URL, then they could maintain two different sparse mappings, right?

That would make it like Perforce's implementation. The Perforce commands to export a client spec and import it again:

p4 client -o > .p4_clientspec_mappings.txt
# modify something
p4 client -i < .p4_clientspec_mappings.txt

Judicious use of this in Google is part of their economic miracle and one of the dynamics of their scaling Trunk-Based Development to 25,000 developers in one trunk in one repo, with 9 million source files at HEAD revision :)

- Paul

Re: Sparse checkouts suggestion

Branko Čibej
On 13.09.2017 09:25, Paul Hammant wrote:
>
>     This is an old idea. If you want to implement it, a file is not the
>     right place for the view definition; a property on a directory
>     might be.
>
>
> A Svn property that sets for the user and workspace only, and retains
> no history, right?  Meaning if the user had two checkouts of the same
> Svn URL, then they could maintain two different sparse mappings, right?

That would be sort of hard to do. :)


> That would make it like Perforce's implementation. The Perforce
> commands to export a client spec and import it again:
>
>     p4 client -o > .p4_clientspec_mappings.txt
>
>     # modify something
>
>     p4 client -i < .p4_clientspec_mappings.txt
>
> Judicious use of this in Google is part of their economic miracle and
> one of the dynamics of their scaling Trunk-Based Development to 25,000
> developers in one trunk in one repo, with 9 million source files at
> HEAD revision :)

Hmm. What exactly are you proposing anyway? I'd like to see this
described in terms of Subversion workflows, not Perforce workflows. Are
you suggesting that every user first checks out a working copy, selects
the view spec, and updates to make it sparse? Because that, whilst of
course possible, is hardly optimal.

I find it hard to believe that every user would want a different view of
the tree. Certainly that's not the way we used the equivalent ClearCase
feature in a previous $dayjob.

-- Brane

Re: Sparse checkouts suggestion

Mark Phippard-3
In reply to this post by Paul Hammant-3
Have you seen:





Re: Sparse checkouts suggestion

Paul Hammant-3
In reply to this post by Branko Čibej

> >     This is an old idea. If you want to implement it, a file is not the
> >     right place for the view definition; a property on a directory
> >     might be.
> >
> > A Svn property that sets for the user and workspace only, and retains
> > no history, right?  Meaning if the user had two checkouts of the same
> > Svn URL, then they could maintain two different sparse mappings, right?
>
> That would be sort of hard to do. :)

So you're in favor of svn properties for this or against :-P
 
> Are you suggesting that every user first checks out a working copy, selects
> the view spec, and updates to make it sparse? Because that, whilst of
> course possible, is hardly optimal.

No.

I would expect, like Googlers do/did, 'checkout --depth empty'
then run a script to populate that hidden file
then do 'svn up'
 
> I find it hard to believe that every user would want a different view of
> the tree. Certainly that's not the way we used the equivalent ClearCase
> feature in a previous $dayjob.

Workflow for noob to the Adsense team (Perforce command changed to Subversion as you requested)

1. svn co svn://vcs/trunk --depth empty
2. ./gcheckout.sh adsense
3. svn up

Workflow for noob to the Adwords team:

1. svn co svn://vcs/trunk --depth empty
2. ./gcheckout.sh adwords
3. svn up

An eyeball comparison of the two workstations' working copy directories would show that there were many directories/files in common, and many different (as you would expect for two different applications with different teams, release cadences etc).  Google shares code internally at source level.

Workflow for a senior engineer wanting to make a change to the common code between adsense and adwords:

1. cd to existing working copy in 'no pending changes' status
2. ./gcheckout.sh adsense+adwords
3. svn up 
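
What a gcheckout-style script does between the checkout and the 'svn up' might look roughly like the sketch below. The team-to-directory table, the directory names and `build_mappings` are all hypothetical; the output follows the include/exclude file format proposed at the start of this thread, and a combined selection such as `adsense+adwords` is simply the union of the teams' paths.

```python
# Hypothetical team-to-directory table; real tooling would maintain this.
TEAM_PATHS = {
    "adsense": ["common/**/*", "adsense/**/*"],
    "adwords": ["common/**/*", "adwords/**/*"],
}


def build_mappings(selection):
    """Return mapping-file content for a selection like 'adsense+adwords'."""
    lines = ["exclude **/*"]
    seen = set()
    for team in selection.split("+"):
        for pattern in TEAM_PATHS[team]:
            if pattern not in seen:  # shared directories are listed once
                seen.add(pattern)
                lines.append("include " + pattern)
    return "\n".join(lines) + "\n"


print(build_mappings("adsense+adwords"))
```

The script would write this text to the mapping file in the working copy root, and the following 'svn up' would do the reshaping.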

- Paul

Re: Sparse checkouts suggestion

Paul Hammant-3
In reply to this post by Mark Phippard-3

Excellent!

I see you posted a StackOverflow answer on that too, Mark - https://stackoverflow.com/questions/7481860/create-folder-structure-based-on-file

I'm one of those people that's blind to a lib or framework until I see examples of how to use (can't read tutorials, can't read ref docs). I found https://prabhugs.wordpress.com/2010/10/11/effective-and-easy-sparse-checkouts-svn-viewspec-py-script/ too which gets me halfway there.

The svn-viewspec script is implicitly include-centric, right?  Perforce's choice back in the 90's was to allow excludes and includes which is more powerful even if it is dangerous.

Also: no top posting on the subversion lists, Mark. At least not if you intend to stick around :) 

- Paul

Re: Sparse checkouts suggestion

Paul Hammant-3

Oh tools/client-side/svn-viewspec.py doesn't have any tests :-(

The good news is that it can be refactored to have tests quite easily. I think I'll be able to donate some code.


Re: Sparse checkouts suggestion

Stefan Sperling-9
On Wed, Sep 13, 2017 at 06:53:14AM -0400, Paul Hammant wrote:
> Oh tools/client-side/svn-viewspec.py doesn't have any tests :-(
>
> The good news is that it can be refactored to have tests quite easily. I
> think I'll be able to donate some code.

I would also be happy to see further improvements to this script.

A script is always a useful prototype for built-in features.
See svnmerge.py (quirks of the built-in implementation notwithstanding :)

Re: Sparse checkouts suggestion

Paul Hammant-3
I have a test for the inlined example. Well it would pass, but PyTest's assert function is about 10x inferior to JUnit's or Hamcrest's assertEquals() and isn't telling me where I've futzed up a CR somewhere. This even *with* optional args (that should be mandatory -s and -vv). Gotta go to work now - more later.

Re: Sparse checkouts suggestion

Paul Hammant-3
'Twas line endings and redundant spaces, fixed with a strip(), but as I say, Java's asserts are better for noobs.
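
For anyone hitting the same thing, the strip() approach could look like the sketch below. It is not the code from the gist: `normalized_lines` and the sample strings are made up for illustration. Comparing lists of lines rather than one long string also lets PyTest (run with -vv) point at the first differing line.

```python
def normalized_lines(text):
    """Strip surrounding whitespace from each line and drop blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def test_output_matches_ignoring_whitespace():
    # Hypothetical expected and actual output; only the whitespace differs.
    expected = "svn checkout URL --depth=empty\nsvn update trunk\n"
    actual = "svn checkout URL --depth=empty \r\n  svn update trunk\r\n\r\n"
    assert normalized_lines(actual) == normalized_lines(expected)
```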

Here's the test: https://gist.github.com/paul-hammant/058161485a227299e5d7c34cc6a33264

I had to refactor the actual script too. A tiny backwards-compatible bit.

-ph

Re: Sparse checkouts suggestion

Paul Hammant-3
OK, so I am up at 93% coverage with a PyTest script. As an Apache Member I'm already CLA'd. I could start to write tests that break the script in ways that it should handle more gracefully, but I need assurance that I'm not wasting my time.

I'm not expecting committership (especially since my last published C was Xmas of '91), but am expecting to be told there is interest in merging my pull-request. *Cough* applying my patches, I mean :-P

- Paul

Re: Sparse checkouts suggestion

Branko Čibej
In reply to this post by Paul Hammant-3
On 13.09.2017 12:40, Paul Hammant wrote:

>
>     >     This is an old idea. If you want to implement it, a file is not the
>     >     right place for the view definition; a property on a directory
>     >     might be.
>     >
>     >
>     > A Svn property that sets for the user and workspace only, and retains
>     > no history, right?  Meaning if the user had two checkouts of the same
>     > Svn URL, then they could maintain two different sparse mappings, right?
>
>     That would be sort of hard to do. :)
>
>
> So you're in favor of svn properties for this or against :-P
>  
>
>     Are you suggesting that every user first checks out a working copy,
>     selects the view spec, and updates to make it sparse? Because that,
>     whilst of course possible, is hardly optimal.
>
>
> No.
>
> I would expect, like Googlers do/did, 'checkout --depth empty'
> then run a script to populate that hidden file
> then do 'svn up'
>  
>
>     I find it hard to believe that every user would want a different view
>     of the tree. Certainly that's not the way we used the equivalent
>     ClearCase feature in a previous $dayjob.
>
>
> Workflow for noob to the Adsense team (Perforce command changed to
> Subversion as you requested)
>
>     1. svn co svn://vcs/trunk --depth empty
>     2. ./gcheckout.sh adsense
>     3. svn up
>
>
> Workflow for noob to the Adwords team:
>
>     1. svn co svn://vcs/trunk --depth empty
>     2. ./gcheckout.sh adwords
>     3. svn up
>
>
> An eyeball comparison of the two workstations' working copy
> directories would show that there were many directories/files in
> common, and many different (as you would expect for two different
> applications with different teams, release cadences etc).  Google
> shares code internally at source level.
>
> Workflow for a senior engineer wanting to make a change to the common
> code between adsense and adwords:
>
> 1. cd to existing working copy in 'no pending changes' status
> 2. ./gcheckout.sh adsense+adwords
> 3. svn up

How about something along the lines of this:

$ svn co svn://vcs/trunk --view foo

with:

$ svn propget svn:views svn://vcs/trunk
[foo]
bar/**
baz/qux

[bar]
baz/**


In other words: available views are defined in a svn:views property on
the directory, and the user selects which (if any) view to use during
checkout, update or switch. Obviously once the working copy structure is
set up it'd remain sticky unless explicitly changed.

For user-specific layouts, I can imagine having a --view-file or
--view-prop option to tell the client where to read the view definitions
from; but the svn:views property would be the default.
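
A client reading such a property would first have to split it into named views. The sketch below is only an illustration of the layout shown above: `svn:views` and `--view` are proposals in this thread rather than existing Subversion features, and `parse_views` is a hypothetical helper.

```python
def parse_views(prop_value):
    """Split a proposed svn:views value into {view_name: [patterns]}."""
    views = {}
    current = None
    for raw in prop_value.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]  # start of a new named view
            views[current] = []
        elif current is None:
            raise ValueError("pattern outside of a [view] section: %r" % line)
        else:
            views[current].append(line)
    return views


PROP = """\
[foo]
bar/**
baz/qux

[bar]
baz/**
"""
views = parse_views(PROP)
print(views["foo"])  # ['bar/**', 'baz/qux']
```

The selected view's pattern list is what a checkout, update or switch would then turn into a sparse layout.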

-- Brane

Re: Sparse checkouts suggestion

Branko Čibej
In reply to this post by Paul Hammant-3
On 13.09.2017 19:47, Paul Hammant wrote:
> OK, so I am up at 93% coverage with a PyTest script. As an Apache
> Member I'm already CLA'd. I could start to write tests that break the
> script in ways that it should handle more gracefully, but I need
> assurance that I'm not wasting my time.
>
> I'm not expecting committership (especially since my last published
> C was Xmas of '91), but am expecting to be told there is interest
> in merging my pull-request. *Cough* applying my patches, I mean :-P

The Subversion project has the universal commit bit. If you're already a
committer anywhere, you can commit here. Of course it's nice to ask
first. :)

-- Brane


Re: Sparse checkouts suggestion

Paul Hammant-3
Thanks. 

Here's what I expect to land - https://github.com/apache/subversion/pull/5
Ten tests - ten passing in less than 0.1s

I don't know if there is a clean workflow from Git commits back into Subversion so, I'll make the same commit through the Svn HTTPS interface, if code review comments are approving.

- Paul


Re: Sparse checkouts suggestion

Branko Čibej
On 14.09.2017 12:26, Paul Hammant wrote:
> Thanks. 
>
> Here's what I expect to land - https://github.com/apache/subversion/pull/5

Why o why did you go to the trouble of adding the global variables in
svn-viewspec.py? They add exactly zero value and cause nothing but code
churn. -1 to that change ... the rest of the changes look cosmetic and OK.

BTW, you left trailing spaces on some of the lines (github diff shows
them); please fix that.

The tests should be in tools/client-side. subversion/tests/cmdline are
for testing the main command-line tools, and have their own test
infrastructure (which predates PyTest, FWIW). We do not install, nor
officially support, the bits in the tools/ directory.

Also please read the community guide, especially the part about log
messages, before committing.
http://subversion.apache.org/docs/community-guide/

> Ten tests - ten passing in less than 0.1s
>
> I don't know if there is a clean workflow from Git commits back into
> Subversion so, I'll make the same commit through the Svn HTTPS
> interface, if code review comments are approving.

There isn't currently, as far as I know.

-- Brane

Re: Sparse checkouts suggestion

Paul Hammant-3
I ran pep8 over the sources and thought I'd remediated all the trailing spaces already. I'll finish the job.

My alternative to the four globals is to pass the function pointers as parameters through one or two intermediates to the functions that would use them.  Would that be acceptable?  I can't just delete the globals (as is) because the places that need them would break on invocation.  You saw why I was doing that, right?

- Paul

Re: Sparse checkouts suggestion

Branko Čibej
On 14.09.2017 14:50, Paul Hammant wrote:
> I ran pep8 over the sources and thought I'd remediated all the
> trailing spaces already. I'll finish the job.
>
> My alternative to the four globals is to pass the function pointers as
> parameters through one or two intermediates to the functions that
> would use them.  Would that be acceptable?  I can't just delete the
> globals (as is) because the places that need them would break on
> invocation.  You saw why I was doing that, right?

Function parameters are better. Why not create a context object and pass
just one extra arg around?

Not sure though why you need os.system on that list ... these days it'd
be a lot better to use the subprocess module anyway.
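
A rough sketch of the context-object idea, assuming Python 3: `Context` and `set_depth` are hypothetical names, not code from svn-viewspec.py. The real default runner is `subprocess.call` with an argument list, and a test swaps in a fake that only records what would have been run.

```python
import io
import subprocess
import sys


class Context:
    """Bundles the side-effecting helpers so tests can swap them out."""

    def __init__(self, run=None, out=None):
        self.run = run or subprocess.call  # takes an argv list, returns exit code
        self.out = out or sys.stdout


def set_depth(ctx, path, depth):
    """Echo and run 'svn update --set-depth DEPTH PATH' through the context."""
    argv = ["svn", "update", "--set-depth", depth, path]
    ctx.out.write(" ".join(argv) + "\n")
    return ctx.run(argv)


# In a test, a fake runner records calls instead of invoking svn.
calls = []


def fake_run(argv):
    calls.append(argv)
    return 0


fake = Context(run=fake_run, out=io.StringIO())
status = set_depth(fake, "trunk", "infinity")
```

Only the one extra `ctx` argument travels through the intermediate functions, and the production code path stays the same apart from reading its helpers from the context.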

-- Brane

Re: Sparse checkouts suggestion

Paul Hammant-3
My aim was to minimally impact the prod source. The context object is a good idea.

I've been using https://amoffat.github.io/sh/ and quite enjoying it for my other python/svn project.


Re: Sparse checkouts suggestion

Branko Čibej
On 14.09.2017 15:04, Paul Hammant wrote:
> My aim was to minimally impact the prod source. The context object is
> a good idea.
>
> I've been using https://amoffat.github.io/sh/ and quite enjoying it
> for my other python/svn project.

Yes, that one's pretty nice, but as a rule we try not to burden our
users with dependencies outside the standard library. The script uses
os.system, what, twice? Hardly worth replacing.

On the other hand, we have wrappers around subprocess.Popen in
subversion/tests/cmdline/svntest/main.py; you should be able to recycle
those.

-- Brane