Questions about a script for regular backups


Questions about a script for regular backups

Anton Shepelev
[Having failed to post this message via Gmane, I am sending it by e-mail]

Hello, all

In order to write a backup script in the Windows batch
language, I was reading the section "Migrating Repository
Data Elsewhere" from "Repository Maintenance":

   http://svnbook.red-bean.com/en/1.7/svn.reposadmin.maint.html

where I found the following interesting paragraph:

   Another neat trick you can perform with this
   --incremental option involves appending to an existing
   dump file a new range of dumped revisions. For example,
   you might have a post-commit hook that simply appends the
   repository dump of the single revision that triggered the
   hook. Or you might have a script that runs nightly to
   append dump file data for all the revisions that were
   added to the repository since the last time the script
   ran. Used like this, svnadmin dump can be one way to back
   up changes to your repository over time in case of a
   system crash or some other catastrophic event.

The book unfortunately does not seem to give any examples of
this usage, leaving the following questions:

  1.  Is "appending" to be understood literally, that is
      using the >> operator on a previously existing dump
      file, or is it a figure of speech describing a
      supplementary dump file that shall be applied "on top"
      of a previous one?

  2.  How does one determine the revision range for a
      routine incremental dump -- by calling
      `svnlook youngest' before dumping?

  3.  Must the backup script somehow store the last revision
      in the dump between calls?  If so, I shall have to
      keep it in a file and not let anybody touch it.
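A minimal POSIX-shell sketch of the nightly append the book describes (all paths are hypothetical, and a Windows batch file would follow the same logic):

```shell
#!/bin/sh
# Nightly incremental dump, as in the svnbook passage quoted above.
# All paths are illustrative.  The state file answers question 3:
# it records the last revision already written to the dump.
REPO=/var/svn/repo
DUMP=/backup/repo.dump
STATE=/backup/repo.lastrev

# Question 2: the range to dump runs from the revision after the
# last one dumped up to `svnlook youngest`.
next_range() { echo "$(($1 + 1)):$2"; }

if command -v svnadmin >/dev/null 2>&1; then  # no-op where SVN is absent
    last=$(cat "$STATE" 2>/dev/null || echo 0)
    young=$(svnlook youngest "$REPO")
    if [ "$young" -gt "$last" ]; then
        # Question 1: the appending is literal.  With --incremental each
        # dumped revision is self-contained, so >> extends the same file.
        svnadmin dump "$REPO" -r "$(next_range "$last" "$young")" \
            --incremental --quiet >> "$DUMP"
        echo "$young" > "$STATE"
    fi
fi
```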

--
Please, do not forward replies to the list to my e-mail.

Re: Questions about a script for regular backups

Andreas Stieger
Hello,

> In order to write a backup script in the Windows batch
> language, I was reading the section "Migrating Repository
> Data Elsewhere" from "Repository Maintenance":
>
>    http://svnbook.red-bean.com/en/1.7/svn.reposadmin.maint.html

The dump format is not the best option for backups. The restore time is much too slow, as you need to recover from a serialized format. Many hand-baked scripts based on the dump method miss point-in-time recovery capabilities, and few people implement backup usability checks by loading the dump. If you have content-aware, file-based backup software available, use it on the on-disk repository format. Just make sure you take a consistent snapshot, which can be achieved by briefly locking the repository (svnadmin freeze) or operating on a consistent copy (svnadmin hotcopy).
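A sketch of the snapshot-then-backup approach (hypothetical paths): `svnadmin hotcopy` produces a consistent copy of a live repository, and the file-based backup tool then reads the copy instead of the live directory.

```shell
# Take a consistent snapshot first, then back up the snapshot,
# never the live repository.  Paths are illustrative.
svnadmin hotcopy /var/svn/repo /backup/repo-snapshot
# ...point the file-based backup software at /backup/repo-snapshot...
```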

Andreas

Re: Questions about a script for regular backups

Mark Phippard-3
In reply to this post by Anton Shepelev
On Thu, Aug 22, 2019 at 9:16 AM Anton Shepelev <[hidden email]> wrote:
My first choice would be to set up a repository on a second server and use svnsync from a post-commit hook script to sync each change.  After that, I would use svnadmin hotcopy with the new --incremental option (as of 1.8?).  Dump is not a great choice for backups.

The main advantage of svnsync is that you can push the change via HTTP or SVN to a different system, whereas hotcopy needs filesystem access, so the only way to get the repositories onto a second server is if you can mount the FS via NFS or something.
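A sketch of that setup (URLs and paths are hypothetical; the pre-revprop-change hook must exist because svnsync keeps its bookkeeping in revision properties on the mirror):

```shell
# One-time initialisation of the mirror repository.
svnadmin create /var/svn/mirror
# svnsync needs permission to change revision properties on the mirror,
# so install a pre-revprop-change hook that always succeeds.
printf '#!/bin/sh\nexit 0\n' > /var/svn/mirror/hooks/pre-revprop-change
chmod +x /var/svn/mirror/hooks/pre-revprop-change
svnsync initialize file:///var/svn/mirror https://svn.example.com/repo

# post-commit hook on the master: push the new revision immediately.
svnsync synchronize file:///var/svn/mirror
```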


Re: Questions about a script for regular backups

Anton Shepelev
In reply to this post by Andreas Stieger
[replying via Gmane]

Andreas Stieger:

>The dump format is not the best option for backups. The
>restore time is much too slow as you need to recover from a
>serialized format. In many hand-baked scripts the dump
>method misses point-in-time recovery capabilities, ->

Why should I need those if SVN repositories store the
complete history?

>-> and few people implement backup usability checks by
>loading the dump.

Is not a dump guaranteed to be usable if `svnadmin dump'
succeeded?  If not, how do I load that dump without
interfering with the current work?

>If you have content-aware file based backup software
>available use that in the on-disk repository format.

The Unison file synchroniser, to work efficiently on
Windows, has an option to use file size and modification
date to detect changes.  Would that work with SVN?

Do you suggest that I back up the contents of

  csvn\data\repositories

>Just make sure you take a consistent snapshot, which can be
>achieved by briefly locking it (svnadmin lock) or operating
>on a consistent copy (svnadmin hotcopy).

Is a hot-copy portable between SVN versions?  How safe is it
to rely on a hot copy instead of a dump?

--
Please, do not forward replies to the list to my e-mail.


RE: Questions about a script for regular backups

Bo Berglund
In reply to this post by Mark Phippard-3
On Thu, 22 Aug 2019 09:38:02 -0400, Mark Phippard <[hidden email]> wrote:

>My first choice option would be to setup a repository on a second server
>and use svnsync from a post-commit hook script to sync the change.  After
>that, I would use svnadmin hotcopy with the new --incremental option (as of
>1.8?).  Dump is not a great choice for backups.
>
>The main advantage of svnsync is you can push the change via HTTP or SVN to
>a different system where as hotcopy needs FS access so the only way to get
>the repos on to a second server is if you can mount the FS via NFS or
>something.

That is also what I did!
Our main server runs on a Windows Server on the corporate LAN.
The backup server is a Linux box in a different location altogether.
Both locations have fiber access to the Internet.

The backup server is set up with https access (thanks to LetsEncrypt and Certbot)
through the router.

I have synced the servers after first loading the backup server from dump files so
as not to have to use Internet bandwidth for the original data transfer.

On the Windows main server I have set up a nightly task that uses svnsync to
synchronize the two servers. It has been running just fine for 18 months without fail.
Recommended solution.
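On Windows, such a nightly task can be registered along these lines (task name, start time, and mirror URL are illustrative):

```shell
schtasks /Create /SC DAILY /ST 02:00 /TN "SvnSyncBackup" ^
    /TR "svnsync synchronize https://backup.example.com/repo"
```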

Bo Berglund


Re: Questions about a script for regular backups

Pierre Fourès
In reply to this post by Anton Shepelev
Hello,

Le jeu. 22 août 2019 à 15:52, Anton Shepelev <[hidden email]> a écrit :

>
> Andreas Stieger:
>
> >The dump format is not the best option for backups. The
> >restore time is much too slow as you need to recover from a
> >serialized format. In many hand-baked scripts the dump
> >method misses point-in-time recovery capabilities, ->
>
> >Just make sure you take a consistent snapshot, which can be
> >achieved by briefly locking it (svnadmin lock) or operating
> >on a consistent copy (svnadmin hotcopy).
>
> Is a hot-copy portable between SVN versions?  How safe is it
> to rely on a hot copy instead of a dump?
>

Indeed, I encountered the problem that restoring dumps was far
too slow, and I ended up with a "belt and suspenders" solution:
hot-copies to guarantee timely restoration (on systems with the
same software configuration), plus dumps to guarantee restoration
on systems where the software configuration differs. If, for one
reason or another (mainly an upgrade of Subversion with a
breaking change), the hot-copy restoration did not work, I would
admittedly fail to restore the repository quickly, but I would be
able to restore it eventually. While I take hot-copies every
night, I originally intended to take dumps only on week-ends; so
far the systems (the svn master and the storage solution) handle
the extra load of doing both every night, so I have left it that
way (but might reconsider in the future).

Admittedly, this situation should be very unlikely, but I feel
more at ease having taken it into account. Moreover, I set it up
to handle two use cases: the first is timely restoration in
emergencies, which is handled with hot-copies; the second is
server upgrades (with software upgrades), which is handled with
dumps. The second case should clearly never be performed in an
emergency, so the dump solution fits it well, while also ensuring
(and being designed for) smooth upgrades between distinct
software revisions. Of course, had I not built dumps into the
backup solution, I would never have one ready when required for a
server upgrade. Having integrated them into the nightly (or
weekly) backups, I know I always have a fresh dump ready for when
I intend to upgrade my server. By the way, I mean here the
logical server (the https://svn.company.com/), not physical (or
virtual) instances. I never upgrade a running production server;
I prefer to install the upgraded one from a fresh install and
take the opportunity to do a full restoration, double-checking
that everything is fine and recoverable. For that particular
pursuit, I find the dumps very valuable.

Best Regards,
Pierre

Re: Questions about a script for regular backups

Anton Shepelev
In reply to this post by Mark Phippard-3
Mark Phippard:

>My first choice option would be to setup a repository on a
>second server and use svnsync from a post-commit hook
>script to sync the change.  After that, I would use
>svnadmin hotcopy with the new --incremental option (as of
>1.8?).  Dump is not a great choice for backups.

Thank you, but I should prefer a traditional backup
approach.  You and other posters say that dumps are a poor
choice, so I shall back up incremental hot copies.  But the
question remains that I have already asked in another reply:
are hot-copies a reliable means of long-term storage?
Can they not become obsolete when a new version of SVN comes
out?  Are they portable across operating systems and
filesystems?  (I fear not.)

--
()  ascii ribbon campaign - against html e-mail
/\  http://preview.tinyurl.com/qcy6mjc [archived]

Re: Questions about a script for regular backups

Mark Phippard-3
On Thu, Aug 22, 2019 at 10:38 AM Anton Shepelev <[hidden email]> wrote:
> Mark Phippard:
>
> > My first choice option would be to setup a repository on a
> > second server and use svnsync from a post-commit hook
> > script to sync the change.  After that, I would use
> > svnadmin hotcopy with the new --incremental option (as of
> > 1.8?).  Dump is not a great choice for backups.
>
> Thank you, but I should prefer a traditional backup
> approach.  You and other posters say that dumps are a poor
> choice, so I shall back up incremental hot copies.  But the
> question remains that I have already asked in another reply:
> are hot-copies a reliable means of long-term storage?

Yes.  A hotcopy is basically just an intelligent backup/copy of the repository. It is similar to what a backup/file-copy tool might do, except that it is aware of in-progress transactions and makes sure you have a consistent repository copy.


> Cannot they become obsolete when a new version of SVN comes
> out?

No.  It is a valid copy of the repository.

> Are they portable across operating systems and
> filesystems? (I fear not)

Yes, they are absolutely portable across OS and FS, as is the repository itself.  The only issue when moving across these is managing the OS-level permissions of the copy.  IOW, if you run something as root, the copy will tend to be owned by root, which might make it not ready for consumption without a chown/chmod.

I used to regularly move fsfs repositories between an AS/400 EBCDIC server and Windows without issue.

The problem with dumps is that they have to be loaded to become usable, and a dump only copies the repository content, not other things like locks and hook scripts.  A hotcopy copies the repository files directly, so you have everything, and you could even serve the hotcopy from a hot-swappable server.
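The --incremental hotcopy mentioned earlier makes this cheap enough to run nightly; a sketch (hypothetical paths, SVN 1.8 or later):

```shell
# Re-copies only what changed since the previous hotcopy, so the
# target stays a complete, consistent, directly servable repository.
svnadmin hotcopy --incremental /var/svn/repo /backup/repo-snapshot
```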


Re: Questions about a script for regular backups

Anton Shepelev
Mark Phippard to Anton Shepelev about hot copies:

>>Are they portable across operating systems and
>>filesystems? (I fear not)
>
>Yes, they are absolutely portable across OS and FS. As is
>the repos itself.  The only issue when going across these
>is managing the OS level permissions of the copy.  IOW, if
>you run something as root the copy will tend to be owned by
>root which might make it not ready for consumption without
>a chown/chmod.
>
>I used to regular move fsfs repositories between an AS/400
>EBCDIC server and Windows without issue.

But SVN book has this:

   As described in the section called "Berkeley DB", hot-
   copied Berkeley DB repositories are not portable across
   operating systems, nor will they work on machines with a
   different "endianness" than the machine where they were
   created.

--
()  ascii ribbon campaign - against html e-mail
/\  http://preview.tinyurl.com/qcy6mjc [archived]

Re: Questions about a script for regular backups

Mark Phippard-3
On Thu, Aug 22, 2019 at 10:55 AM Anton Shepelev <[hidden email]> wrote:

Almost no one uses the BDB repository format.  The fsfs format became the default in SVN 1.1 or 1.2, and it is the only format used anymore.


Re: Questions about a script for regular backups

Anton Shepelev
Mark Phippard:

>Almost no one uses the BDB repository format.  The fsfs
>format became the default in SVN 1.1 or 1.2 and it is the
>only format used anymore.

Phew.  We do have FSFS.  Thank you.

--
()  ascii ribbon campaign - against html e-mail
/\  http://preview.tinyurl.com/qcy6mjc [archived]

Re: Questions about a script for regular backups

Nico Kadel-Garcia-2
In reply to this post by Anton Shepelev
On Thu, Aug 22, 2019 at 9:52 AM Anton Shepelev <[hidden email]> wrote:

>
> [replying via Gmane]
>
> Andreas Stieger:
>
> >The dump format is not the best option for backups. The
> >restore time is much too slow as you need to recover from a
> >serialized format. In many hand-baked scripts the dump
> >method misses point-in-time recovery capabilities, ->
>
> Why should I need those if SVN repositories store the
> complete history?

Because, on a bulky repository with bulky binaries, it is *butt slow*,
you can't easily prune the bulky binaries, and you will inevitably
have split-brain during the time between the dump and the next
dump/load.  Split-brain Is Not Your Friend(tm).

Re: Questions about a script for regular backups

Pierre Fourès
In reply to this post by Mark Phippard-3
Hello,

Le jeu. 22 août 2019 à 16:47, Mark Phippard <[hidden email]> a écrit :

>
>
>> Cannot they become obsolete when a new version of SVN comes
>> out?
>
>
> No.  It is a valid copy of the repository.
>
>>   Are they portable across operating systems and
>> filesystems? (I fear not)
>
>
> Yes, they are absolutely portable across OS and FS. As is the repos itself.

This proves to work in practice, but is it guaranteed that the
fsfs repository format remains compatible across subsequent 1.X
Subversion releases?

It appears the fsfs repos format sometimes changes between 1.X
Subversion releases. For example, Subversion 1.9 introduced fsfs
format version 7. The release notes [1] recommend a full
dump/load cycle to take advantage of the new format's
improvements. Nonetheless, the notes also say that "older formats
remain supported", but this seems to be a beneficial side effect,
not a guarantee: it does not appear enforced that backward
compatibility will be maintained across all subsequent 1.X
releases. To my understanding, what is guaranteed to remain
stable and compatible between 1.X releases is the protocol
between client and server, not the underlying storage system.
This is the reason I use hot-copies for backups *and* dumps for
migrations and reinstalls. First of all, it ensures that I use
the latest repository format available for the particular
Subversion instance I run, and do not forget to upgrade it in
order to get all the benefits introduced by that release. Second,
it ensures that in the unexpected situation where I had to
downgrade the Subversion server version, I would not face an
upgraded fsfs format that the older instance could not read or
handle.

To my understanding, dumps, albeit very slow to load, are
absolutely portable, meaning backward and forward compatible
between Subversion server versions. You mention that the repos
are absolutely portable across OS and FS. Do you also mean
between different Subversion server versions? For instance,
suppose that by the time Debian Jessie was the stable Debian,
shipping Subversion 1.8, I had run Subversion 1.9 on Ubuntu
Xenial (using repository format version 7), and then for some
external reason had to move to Debian Jessie. I doubt Subversion
1.8 would be able to read the hot-copies made on the Ubuntu
server. Or would it? If not, repos would not be portable across
OSes (at least in their most current versions at a given date,
say early 2017 for the sake of this example). However, to my
understanding, had I used dumps to back up my Ubuntu server, I
would have been able to restore the repos. Admittedly, I would
have lost the new functionality introduced in Subversion 1.9, but
I would still have been able to run Subversion and access my
repos, which seems not to be the case had I relied only on
hot-copies. Or would it?

I would be really interested in your view on all this, to see
whether I misunderstand what to expect from hot-copies and dumps,
and whether my setup is overkill or fails to meet the
requirements I thought it would.

[1] https://subversion.apache.org/docs/release-notes/1.9

Best Regards,
Pierre.

Re: Questions about a script for regular backups

Mark Phippard-3
On Fri, Aug 23, 2019 at 4:16 AM Pierre Fourès <[hidden email]> wrote:
> Hello,
>
> Le jeu. 22 août 2019 à 16:47, Mark Phippard <[hidden email]> a écrit :
> >
> >> Cannot they become obsolete when a new version of SVN comes
> >> out?
> >
> > No.  It is a valid copy of the repository.
> >
> >>   Are they portable across operating systems and
> >> filesystems? (I fear not)
> >
> > Yes, they are absolutely portable across OS and FS. As is the repos itself.
>
> This proves to work in practice, but is it guaranteed that the
> fsfs repository format remains compatible across subsequent 1.X
> Subversion releases?

Yes it is.

When you upgrade your server to a new version you do not have to touch existing repositories. Think what a nightmare that would be for hosting services or anyone with a lot of repositories.  It is not uncommon for a new release to introduce a new repository format with some new features, though usually it is just some new efficiency in how the data is stored.  You need to dump/load if you are interested in getting these changes, but the server is capable of reading and writing every repository format.


> It appears the fsfs repos format sometimes changes between 1.X
> Subversion releases. For example, Subversion 1.9 introduced fsfs
> format version 7. The release notes [1] recommend a full
> dump/load cycle to take advantage of the new format's
> improvements.

Correct, you need to dump/load if you want to use the new format.

There is nothing wrong with having full dumps of your repository, and you need them to upgrade the format, but hot-copies are totally viable as a backup and have a lot of advantages when it comes to the recovery process in the event you need the backup.  I would not rush to new formats just because they are available. I have avoided the new format in 1.9, as its benefits seemed tuned to scenarios that do not match my needs at all, and it has slower performance for what I think is the most common use case, which is using the Apache server to host lots of repositories.

Anyway ... the only danger of a repository format is if you upgrade to the latest and then for some reason need to downgrade your server binaries to an older version.  You can always use an older format with a newer version.
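The dump/load cycle referred to here is, in outline (hypothetical paths; the new repository is created by whichever server binaries, newer or older, whose format you want):

```shell
svnadmin dump /var/svn/repo > /backup/repo.dump   # portable stream
svnadmin create /var/svn/repo-new                 # target format
svnadmin load /var/svn/repo-new < /backup/repo.dump
```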


Re: Questions about a script for regular backups

Nathan Hartman
On Fri, Aug 23, 2019 at 9:53 AM Mark Phippard <[hidden email]> wrote:
> Anyway ... the only danger of a repository format is if you upgrade to latest and then for some reason need to downgrade your server binaries to an older version.  You can always use an older format with a newer version.

If you did wish to downgrade to an older version, wouldn't a dump and load make that possible?


Re: Questions about a script for regular backups

Mark Phippard-3
On Fri, Aug 23, 2019 at 11:06 AM Nathan Hartman <[hidden email]> wrote:
> On Fri, Aug 23, 2019 at 9:53 AM Mark Phippard <[hidden email]> wrote:
> > Anyway ... the only danger of a repository format is if you upgrade to latest and then for some reason need to downgrade your server binaries to an older version.  You can always use an older format with a newer version.
>
> If you did wish to downgrade to an older version, wouldn't a dump and load make that possible?


Absolutely.  Just pointing out that this is the only time you would run into something that would not just work and would require you to do something.


Re: Questions about a script for regular backups

Pierre Fourès
Le ven. 23 août 2019 à 17:10, Mark Phippard <[hidden email]> a écrit :

>
> On Fri, Aug 23, 2019 at 11:06 AM Nathan Hartman <[hidden email]> wrote:
>>
>> On Fri, Aug 23, 2019 at 9:53 AM Mark Phippard <[hidden email]> wrote:
>>>
>>> Anyway ... the only danger of a repository format is if you upgrade to latest and then for some reason need to downgrade your server binaries to an older version.  You can always use an older format with a newer version.
>>
>>
>> If you did wish to downgrade to an older version, wouldn't a dump and load make that possible?
>>
>
> Absolutely.  Just pointing you that is the only time you would run into something that would not just work and would require you to do something.
>

Thanks a lot Mark for your clarifications.

Pierre.

Re: Questions about a script for regular backups

Anton Shepelev
In reply to this post by Anton Shepelev
Thanks to everybody for their replies.  I now understand
that --incremental hot-copies are sufficient for regular
backups, which can then be mirrored by content-aware file-
synchronisation tools, but the problem remains of preventing
an accidental propagation of corrupt data into the backup.
How do you solve it?

--
Please, do not forward replies to my e-mail.


Re: Questions about a script for regular backups

Andreas Stieger
Hello,

> that --incremental hot-copies are sufficient for regular
> backups, which can then be mirrored by content-aware file-
> synchronisation tools, but the problem remains of preventing
> an accidental propagation of corrupt data into the backup.
> How do you solve it?

What the fruit do you mean? The whole purpose of a backup is that you can restore previous points in time. That means multiple points in time, whenever the backup happened to run. Don't just make a copy and overwrite it every time; that is just a copy, not a backup. Select backup software that can do that.

Andreas

Re: Questions about a script for regular backups

Anton Shepelev
Andreas Stieger to Anton Shepelev:

> > Thanks to everybody for their replies.  I now understand
> > that --incremental hot-copies are sufficient for regular
> > backups, which can then be mirrored by content-aware
> > file-synchronisation tools, but the problem remains of
> > preventing an accidental propagation of corrupt data into
> > the backup.  How do you solve it?
>
> What the fruit do you mean?  The whole purpose of a backup
> is that you can restore previous points in time.  That
> means multiple points in time, whenever the backup
> happened to run.  Don't just make a copy and overwrite
> it every time; that is just a copy, not a backup.  Select
> backup software that can do that.

No, it depends on one's purpose.  If it is to keep the data
in case of HDD crashes, a single mirror is sufficient.  Then
again, since an SVN repository maintains its whole history,
a point-in-time recovery is easily effected by
`svn up -r N'.

The only potential problem is some quiet data corruption,
which is why I ask: will `hotcopy' propagate data corruption
or will it detect it via internal integrity checks and fail?
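One common safeguard, not settled in this thread, is to verify the repository before refreshing the backup, so that corruption fails the job instead of propagating; a sketch with hypothetical paths:

```shell
# svnadmin verify walks every revision and fails on checksum errors;
# the hotcopy only runs if verification succeeded.
svnadmin verify --quiet /var/svn/repo \
    && svnadmin hotcopy --incremental /var/svn/repo /backup/repo-snapshot
```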

--
Please, do not forward replies to my e-mail.
