Quick question about the sha1-checksum for directories in svn.

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
16 messages Options
Reply | Threaded
Open this post in threaded view
|

Quick question about the sha1-checksum for directories in svn.

Paul Hammant-3
While a _directory_ has a revision number, it does not have a SHA1 does it?

If so there's no point asking how it's calculated, is there ?

- Paul


Reply | Threaded
Open this post in threaded view
|

Re: Quick question about the sha1-checksum for directories in svn.

Julian Foad-5
Paul Hammant wrote:
> While a _directory_ has a revision number, it does not have a SHA1 does it?
>
> If so there's no point asking how it's calculated, is there ?

Correct.

- Julian
Reply | Threaded
Open this post in threaded view
|

Re: Quick question about the sha1-checksum for directories in svn.

Paul Hammant-3
> Correct.

Thanks Julian.

Observation: If it did then Subversion would qualify as a full Merkle tree implementation. Albeit through APIs designed for other purposes :-p ;) 
Reply | Threaded
Open this post in threaded view
|

Merkle trees in svn [was: Quick question about the sha1-checksum for directories in svn.]

Julian Foad-5
Paul Hammant wrote:
> Observation: If [a dir had a SHA1] then Subversion would qualify as a *full* Merkle
> tree implementation. Albeit through APIs designed for other purposes :-p ;)

There's a bit more to it than just adding a SHA1 for a directory:

   * The currently implemented SHA1 of a file doesn't include its
properties, for a start. That's easy to address: add another that does.

   * We might want one Merkle tree that includes svn:mergeinfo values
and another that doesn't, so that branches can compare equal even though
the physical storage representation of their mergeinfo differs. That
could be said to be a deficiency of the way mergeinfo is currently
stored, and an alternative solution would be to change that.

   * We might also want a Merkle tree excluding all properties, so that
the tree can be compared with a plain unversioned tree.

   * How to deal with server authz restrictions. Should a Merkle tree
represent only that part of the tree that the current user is allowed to
see, and/or the "whole" tree regardless of authz restrictions? Perhaps
both kinds are useful.

- Julian
Reply | Threaded
Open this post in threaded view
|

Re: Merkle trees in svn [was: Quick question about the sha1-checksum for directories in svn.]

Paul Hammant-3
Not that my vote counts for much, but I'd prefer w/o props, obeying read permissions.
Reply | Threaded
Open this post in threaded view
|

Re: Merkle trees in svn [was: Quick question about the sha1-checksum for directories in svn.]

Branko Čibej
On 05.10.2017 16:19, Paul Hammant wrote:
> Not that my vote counts for much, but I'd prefer w/o props, obeying
> read permissions.

"Obeying read permissions" means that the directory hashes would have to
be computed dynamically for each user.

-- Brane
Reply | Threaded
Open this post in threaded view
|

Re: Merkle trees in svn [was: Quick question about the sha1-checksum for directories in svn.]

Paul Hammant-3
"Obeying read permissions" means that the directory hashes would have to
> be computed dynamically for each user.

Yes I know. 

I've a frankenstein project over here - https://github.com/paul-hammant/merkle-rust - There's a solid Rust merkle tree that watches for changes at nodes then re-calcs the SHA1 of the parent dir (and the parent of that, and all the way back to root).

Because I can't actually write Rust yet, I've made a Python webserver, and Python test that shows what nodes should report on the wire (plain text in this demo). Here's is the GET result for node A/AK/A/Alaska/ ( a list of resources and their SHA1s) ->

96b55f350ccb1911d7f9d7e7645d6d11178ad753
2013.json 1674790a70b984c9041ab86c370f942861ead004
2016.json 194f6519cd60b773a82857cf1aeba8dad4a223ed
2020.json 20e3ff1ade2385c593f73fd44fd157391d2424e7
2050.json 19b2da433a273840deddb7a46b16891acab16e3f
2060.json 45418423999c155abc434e175d42ccf6534bee6d
2068.json 69576d3632c7ce8b0b2a42d87e9e75049bdaff9d
2070.json 2d50d3100744fb7c4c5ead72a2896909fbe2ba6a
2090.json 434e72590bdaa03176ebabc18e52dc0a24918da9
2100.json a884cf405af5a20276b3b1cc72885833905ffa97
2105.json f7bfd6be756c919e4ea69cb466f31ae0b2fd213d
2110.json c88ed14784907db94b11941b760892742bd043f3
2122.json e5247216a7a1f32851807d4b4b9cd1cf152cd57d
2130.json 375fc259d26710afe31e1367a59de3d5dea729b1
2150.json b5c17ce941a9647fbc179cb1b769348cfdaebb98
2164.json ab2f71a47910463bdde92a77313f7e5edba00063
2170.json c1d0842e2c77d53bee5c684e0e8ad580a2fff05f
2180.json 9759646b5a2811efc6bfaee84a2b813f85cb1e1a
2185.json e2beddc3f245ca79908d9a1590aa177151e66e4d
2188.json 5881cdc3a15eac0cba15a3e07ad54176cabeeedf
2195.json b7085173d8685deecf70c6efb63d92ec6db2cf21
2198.json 78a399df3eeb4f696812d6d46882e4029190cf67
2220.json 594a20ec3d550eae2ad848fa8c0e08d50bd4e7fa
2230.json ec4c2c8dfc3cf4c1e6719de0544aaeba7731fc9c
2240.json 798b13c1c4aa7ee2e5744f8b8a2b553c1447229b
2261.json 0cd460d99cc7a871230cbe0f6f2ab703339e1630
2270.json 4e77074d4adc2ac7bb486408b00bda26f98820ea
2275.json 91ce27fc8b643540dda88afdc5b5b62d97a026fa
2282.json 3bec0f1f9e3ae818dd096303f796ce28e2fe6d08
2290.json fd4014808dd77351b939060f1283d395deb0cea5

Here's the GET result for the SHA1 of that A/AK/A/Alaska/.sha1 ->

96b55f350ccb1911d7f9d7e7645d6d11178ad753

You know already, Brane, but others might not: I take the first text (the list) run it through 'sha1sum' to make the second text (the single SHA1).  That the relatively dynamic SHA1s that would obey read permissions for the user in question, only need to SHA1 a small amount of data, not whole files again. Of course that is recursive so it would need to be maintained once per user group, or user where there's different permissions.

- Paul

Reply | Threaded
Open this post in threaded view
|

Re: Merkle trees in svn [was: Quick question about the sha1-checksum for directories in svn.]

Julian Foad-5
In reply to this post by Branko Čibej
Branko Čibej wrote:
> On 05.10.2017 16:19, Paul Hammant wrote:
>> Not that my vote counts for much, but I'd prefer w/o props, obeying
>> read permissions.
>
> "Obeying read permissions" means that the directory hashes would have to
> be computed dynamically for each user.

Correct, but let's not imply that's a showstopper.

Calculation of a directory's hash would have to happen for each
directory where the user has mixed access to the immediate children, and
for all parents of such a directory up to the root. For any subtree
where the user has full access, we can use a stored value.

That means in typical authz patterns (a few subtrees excluded) there is
very little calculation required, as long as the authz subsystem can
efficiently tell us whether the user has full or mixed or no access to a
given subtree.

- Julian
Reply | Threaded
Open this post in threaded view
|

Re: Merkle trees in svn [was: Quick question about the sha1-checksum for directories in svn.]

Paul Hammant-3
Agree. 

For those that are super interested in this, I did a series of four blog entries on Merkle trees and Blockchains.

Reply | Threaded
Open this post in threaded view
|

Re: Merkle trees in svn [was: Quick question about the sha1-checksum for directories in svn.]

Branko Čibej
In reply to this post by Julian Foad-5
On 05.10.2017 16:46, Julian Foad wrote:
> Branko Čibej wrote:
>> On 05.10.2017 16:19, Paul Hammant wrote:
>>> Not that my vote counts for much, but I'd prefer w/o props, obeying
>>> read permissions.
>>
>> "Obeying read permissions" means that the directory hashes would have to
>> be computed dynamically for each user.
>
> Correct, but let's not imply that's a showstopper.

I never said it was.

> Calculation of a directory's hash would have to happen for each
> directory where the user has mixed access to the immediate children,
> and for all parents of such a directory up to the root.

And /that/ is the painful part: the fact that you need a depth-first
traversal of the tree in order to calculate the hash for the root
directory. And the reason why we're not exposing the directory hash,
even if the FS stores it.

> For any subtree where the user has full access, we can use a stored
> value.

Assuming we know that said user has full access ... which we might,
depending on where the hash calculation would take place, i.e., whether
it would have useful access to the authz info.


-- Brane
Reply | Threaded
Open this post in threaded view
|

Re: Merkle trees in svn [was: Quick question about the sha1-checksum for directories in svn.]

Daniel Shahaf-2
Branko Čibej wrote on Thu, 05 Oct 2017 18:44 +0200:
> On 05.10.2017 16:46, Julian Foad wrote:
> > Calculation of a directory's hash would have to happen for each
> > directory where the user has mixed access to the immediate children,
> > and for all parents of such a directory up to the root.
>
> And /that/ is the painful part: the fact that you need a depth-first
> traversal of the tree in order to calculate the hash for the root
> directory. And the reason why we're not exposing the directory hash,
> even if the FS stores it.

What if we only returned a checksum for nodes to which the user had full
recursive access?  E.g., with "[/A/B] *=", the caller would be able to retrieve
checksums for /A/C, /A/D, /A/mu, and /A's property hash, and for descendants
of the first two, but that's it.
Reply | Threaded
Open this post in threaded view
|

Re: Merkle trees in svn [was: Quick question about the sha1-checksum for directories in svn.]

Branko Čibej
On 05.10.2017 19:12, Daniel Shahaf wrote:

> Branko Čibej wrote on Thu, 05 Oct 2017 18:44 +0200:
>> On 05.10.2017 16:46, Julian Foad wrote:
>>> Calculation of a directory's hash would have to happen for each
>>> directory where the user has mixed access to the immediate children,
>>> and for all parents of such a directory up to the root.
>> And /that/ is the painful part: the fact that you need a depth-first
>> traversal of the tree in order to calculate the hash for the root
>> directory. And the reason why we're not exposing the directory hash,
>> even if the FS stores it.
> What if we only returned a checksum for nodes to which the user had full
> recursive access?  E.g., with "[/A/B] *=", the caller would be able to retrieve
> checksums for /A/C, /A/D, /A/mu, and /A's property hash, and for descendants
> of the first two, but that's it.

That would leak permission settings. A user would know that she only
sees a partial directory merely by checking for the presence of the
directory checksum.

-- Brane
Reply | Threaded
Open this post in threaded view
|

RE: Merkle trees in svn [was: Quick question about the sha1-checksum for directories in svn.]

Bert Huijben-5


> -----Original Message-----
> From: Branko Čibej [mailto:[hidden email]]
> Sent: donderdag 5 oktober 2017 19:29
> To: [hidden email]
> Subject: Re: Merkle trees in svn [was: Quick question about the sha1-
> checksum for directories in svn.]
>
> On 05.10.2017 19:12, Daniel Shahaf wrote:
> > Branko Čibej wrote on Thu, 05 Oct 2017 18:44 +0200:
> >> On 05.10.2017 16:46, Julian Foad wrote:
> >>> Calculation of a directory's hash would have to happen for each
> >>> directory where the user has mixed access to the immediate children,
> >>> and for all parents of such a directory up to the root.
> >> And /that/ is the painful part: the fact that you need a depth-first
> >> traversal of the tree in order to calculate the hash for the root
> >> directory. And the reason why we're not exposing the directory hash,
> >> even if the FS stores it.
> > What if we only returned a checksum for nodes to which the user had
> > full recursive access?  E.g., with "[/A/B] *=", the caller would be
> > able to retrieve checksums for /A/C, /A/D, /A/mu, and /A's property
> > hash, and for descendants of the first two, but that's it.
>
> That would leak permission settings. A user would know that she only sees a
> partial directory merely by checking for the presence of the directory
> checksum.

We already explicitly leak that there are server excluded subtrees in/for our delta editor / reporter design, so this would not be a security regression.

        Bert
>
> -- Brane

Reply | Threaded
Open this post in threaded view
|

Re: Merkle trees in svn [was: Quick question about the sha1-checksum for directories in svn.]

Paul Hammant-3
As my forthcoming multi-user app that uses Subversion as a backing store is going to kill Svn it with Depth- PROPFINDs from root, I really want to see this implemented.  

Because I can't wait I will implement something that calculates SHA1s for the directory in question, and drops it into a '.sha1' file in the folder in question. It'll be a job that runs perpetually, and keeps committing as things change.

To cut through the permissions based quandary that y'all are having, how about ignore the subsetting that's implicit from permissions, and just calc the SHA1 for the 'all' situation, requiring the client-side end-user of the this setup to correlate a localally calculated SHA1 for the permitted set and the remote canonical SHA1 for the potential superset. If the same algorithm is used on the client and the server, then all the end user can determine is that they are not (for some directories) permitted to read all files.

Crude example, using command line unix tools:

$ cat foo
2013.json 1674790a70b984c9041ab86c370f942861ead004
2016.json 194f6519cd60b773a82857cf1aeba8dad4a223ed
2020.json 20e3ff1ade2385c593f73fd44fd157391d2424e7
2050.json 19b2da433a273840deddb7a46b16891acab16e3f
2060.json 45418423999c155abc434e175d42ccf6534bee6d
2068.json 69576d3632c7ce8b0b2a42d87e9e75049bdaff9d
$ cat foo | sha1sum 
c1874a0ba80cb51245cb78f567dbbd46271d7ba1  -
$ head -3 foo | sponge foo
$ cat foo 
2013.json 1674790a70b984c9041ab86c370f942861ead004
2016.json 194f6519cd60b773a82857cf1aeba8dad4a223ed
2020.json 20e3ff1ade2385c593f73fd44fd157391d2424e7
$ cat foo | sha1sum 
1cbbec9747ab817dac26892cfd437f72d7366858  -

^ the server side says c1874a... for the 'directory' that foo represents and passes that to all clients that ask (provided they are permitted to the directory).  The client side holds that ref, and 1cbbec... maybe in a database of some sort. This way, it can track whether it has kept abreast of the Svn server or not. It may determine that in needs to do a depth-1 PROPFIND and because the server side SHA1 for the directory changed, and in the not-permitted-to-see-all situation may determine that that was ultimately unnecessary.  

Of course I can alternatively correlate subversion revision numbers right now with locally calc'd SHA1, but it doesn't feel very Merkle-like.
Reply | Threaded
Open this post in threaded view
|

Re: Merkle trees in svn [was: Quick question about the sha1-checksum for directories in svn.]

Branko Čibej
On 10.10.2017 16:02, Paul Hammant wrote:
> As my forthcoming multi-user app that uses Subversion as a backing
> store is going to kill Svn it with Depth-∞ PROPFINDs from root, I
> really want to see this implemented.  
>
> Because I can't wait

... I will implement a patch that exposes directory checksums in
Subversion. How 'bout that instead? :)

-- Brane

Reply | Threaded
Open this post in threaded view
|

Re: Merkle trees in svn [was: Quick question about the sha1-checksum for directories in svn.]

Paul Hammant-3
Yikes! I'll have to run through the build setup for Subversion - 52 pages of requirements before launching ./config, right? - including wind-of-newt, left handed screwdriver, two four-leaf clovers ;) :-P

Seriously: sounds good - and I appreciate it :)