Re: svn commit: r1848393 - in /subversion/site: publish/.message-ids.tsv tools/haxx-url-to-message-id.sh

Daniel Shahaf-2
[hidden email] wrote on Fri, Dec 07, 2018 at 12:29:56 -0000:
> Add a tool for generating a svn.haxx.se archive URL to message-id mapping.

Yay!

> +++ subversion/site/publish/.message-ids.tsv Fri Dec  7 12:29:56 2018
> @@ -1,3 +1,5 @@
> +# Message-ids of archived emails that are referenced by a svn.haxx.se URL.
> +# Generated by tools/haxx-url-to-message-id.sh on 2018-12-07

Could we run this periodically unattended?  We could teach the svn-role
bot to check out the site source, run this script and commit the results.

The cron job would be —

fn=publish/.message-ids.tsv
cd ~/src/svn/site
svn up -q
tools/haxx-url-to-message-id.sh > $fn
svn ci -m "* $fn: Automatically regenerated" $fn

That's it, I think.  (There's no 'svn st' call because, when there are
no changes to commit, 'svn ci' does nothing and exits 0.)

Cheers,

Daniel

Julian Foad-5
Daniel Shahaf wrote:

> > +++ subversion/site/publish/.message-ids.tsv Fri Dec  7 12:29:56 2018
> > +# Message-ids of archived emails that are referenced by a svn.haxx.se URL.
> > +# Generated by tools/haxx-url-to-message-id.sh on 2018-12-07
>
> Could we run this periodically unattended?  We could teach the svn-role
> bot to check out the site source, run this script and commit the results.
>
> The cron job would be —
>
> fn=publish/.message-ids.tsv
> cd ~/src/svn/site
> svn up -q
> tools/haxx-url-to-message-id.sh > $fn
> svn ci -m "* $fn: Automatically regenerated" $fn

I logged in to svn-qavm3 and added it to the crontab of user 'svnsvn':

# Update our Haxx-URL-to-Message-Id map (a manual cron entry, for now)
0 4 * * * fn=publish/.message-ids.tsv; cd ~/src/svn/site; svn up -q; tools/haxx-url-to-message-id.sh > $fn; svn ci -m "* $fn: Automatically regenerated" $fn

More could be done, of course. Robustness: teach it to only accumulate new entries, and not to wipe the file when there's a network glitch. The other cron entries are managed by Puppet, so presumably this should be too.
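
For the "don't wipe the file on a network glitch" part, one common approach is to write to a temporary file and replace the real one only after generation has finished without error. A minimal Python sketch of that idea follows; `regenerate` and the `generate` callable are hypothetical names used for illustration, not part of the existing tools.

```python
import os
import tempfile

def regenerate(path, generate):
    """Write the lines from generate() to path, leaving path untouched on failure.

    `generate` is a hypothetical callable returning an iterable of lines;
    it stands in for whatever produces the map.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as out:
            for line in generate():
                out.write(line + "\n")
        os.replace(tmp, path)  # only reached if every line was produced
    except BaseException:
        os.unlink(tmp)  # discard the partial output, keep the old file
        raise
```

The temporary file is created in the same directory as the target so that the final rename stays on one filesystem.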

--
- Julian

Daniel Shahaf-2
Julian Foad wrote on Mon, Dec 10, 2018 at 18:23:00 +0000:

> Daniel Shahaf wrote:
> > > +++ subversion/site/publish/.message-ids.tsv Fri Dec  7 12:29:56 2018
> > > +# Message-ids of archived emails that are referenced by a svn.haxx.se URL.
> > > +# Generated by tools/haxx-url-to-message-id.sh on 2018-12-07
> >
> > Could we run this periodically unattended?  We could teach the svn-role
> > bot to check out the site source, run this script and commit the results.
> >
> > The cron job would be —
> >
> > fn=publish/.message-ids.tsv
> > cd ~/src/svn/site
> > svn up -q
> > tools/haxx-url-to-message-id.sh > $fn
> > svn ci -m "* $fn: Automatically regenerated" $fn
>
> I logged in to svn-qavm3 and added it to the crontab of user 'svnsvn':
>

Thanks for setting this up.

> # Update our Haxx-URL-to-Message-Id map (a manual cron entry, for now)
> 0 4 * * * fn=publish/.message-ids.tsv; cd ~/src/svn/site; svn up -q; tools/haxx-url-to-message-id.sh > $fn; svn ci -m "* $fn: Automatically regenerated" $fn
>
> More could be done, of course.
> Robustness: teach it to only accumulate new entries, and not to wipe the file when there's a network glitch.

That's easy enough: just write 'set -e' at the start of the line, so any error
aborts the script.  (Or change all semicolons to double ampersands.)
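
To spell out the "stop at the first error" behaviour that both 'set -e' and '&&' chaining give, here is a rough Python equivalent. It is only an illustration: `run_until_failure` is a made-up name, and the commands are harmless stand-ins for the real svn and tools/ steps.

```python
import subprocess
import sys

def run_until_failure(commands):
    """Run each command in order; stop at the first non-zero exit status.

    Returns the number of commands that completed successfully.
    """
    completed = 0
    for argv in commands:
        if subprocess.run(argv).returncode != 0:
            break  # like 'a && b && c': later steps are skipped
        completed += 1
    return completed

# Stand-in commands: one that succeeds and one that fails.
ok = [sys.executable, "-c", "pass"]
fail = [sys.executable, "-c", "raise SystemExit(1)"]
completed = run_until_failure([ok, fail, ok])  # the third command never runs
```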

> The other cron entries are managed by Puppet, so presumably this should be too.

Yes.

Another nice-to-have improvement would be making the 'svn ci' invocation
silent — by adding --quiet, or by running it under chronic(1) (from package
'moreutils') — so the people on the crontab's MAILTO list don't get emails for
successful commits.
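
For readers who don't know chronic(1): it runs a command and discards the output unless the command fails. A small Python sketch of the same behaviour, assuming a hypothetical helper called `run_quietly`:

```python
import subprocess
import sys

def run_quietly(argv):
    """Run argv; stay silent on success, replay captured output on failure."""
    proc = subprocess.run(argv, stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE, text=True)
    if proc.returncode != 0:
        # Only a failing run produces output, so cron only mails on failure.
        sys.stdout.write(proc.stdout)
        sys.stderr.write(proc.stderr)
    return proc.returncode
```

Because cron sends mail only when a job writes something, a wrapper like this keeps successful runs out of the MAILTO inboxes while still reporting failures in full.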

Cheers,

Daniel

Julian Foad-5
Daniel Shahaf wrote:
> Julian Foad wrote on Mon, Dec 10, 2018 at 18:23:00 +0000:
> > Robustness: teach it to only accumulate new entries, and not to wipe the file when there's a network glitch.
>
> That's easy enough: just write 'set -e' [or use] double ampersands

I have done that, but I think the script will still generate bad output if the 'curl' commands fail; see TODO below.

> Another nice-to-have improvement would be making the 'svn ci' invocation
> silent — by adding --quiet, or by running it under chronic(1) (from package
> 'moreutils') — so the people on the crontab's MAILTO list don't get emails for
> successful commits.

Added '-q' etc.

I also made it:
  * not commit if the only change is in the "# Generated ... <date>" line.
  * search only in the "publish" subtree, so it doesn't pick up the example in its own script

New version:

# Update our Haxx-URL-to-Message-Id map (a manual cron entry, for now)
0 4 * * * fn=publish/.message-ids.tsv; cd ~/src/svn/site && svn up -q && tools/haxx-url-to-message-id.sh publish > $fn.tmp && if diff -q -I'^# Generated' $fn $fn.tmp > /dev/null; then rm $fn.tmp; else mv $fn.tmp $fn && svn ci -q -m "* $fn: Automatically regenerated" $fn; fi
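
The "ignore the '# Generated' line" comparison can be approximated in Python as below. This is a simplified sketch, not a reimplementation of 'diff -I' (which works on whole hunks); `differs_meaningfully` is a hypothetical name.

```python
import re

GENERATED = re.compile(r"^# Generated")

def differs_meaningfully(old_text, new_text):
    """Return True if the texts differ once '# Generated' lines are dropped."""
    def significant(text):
        return [ln for ln in text.splitlines() if not GENERATED.match(ln)]
    return significant(old_text) != significant(new_text)
```

A commit would then be made only when this returns True.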

TODO:
  * only add new entries (actually it happens to work this way already, because it finds its own previous output file as part of the scan);
  * only query 'haxx.se' for new entries, not all the existing entries again each time;
  * the 'curl | perl' part of the script wants a way to fail if the 'curl' fails (Bash has 'set -o pipefail' for this).
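
The first two TODO items amount to a set difference plus a merge. A sketch in Python, assuming the map is held as a dict from URL to message-id; the function names and the example.invalid URLs are placeholders, not what the script uses:

```python
def urls_to_look_up(found_urls, known_map):
    """Return, sorted, the URLs that have no message-id recorded yet."""
    return sorted(set(found_urls) - set(known_map))

def merge_entries(known_map, new_entries):
    """Return a new map with new_entries added; existing entries are kept."""
    merged = dict(known_map)
    merged.update(new_entries)
    return merged

known = {"https://example.invalid/a": "<id-a@example.invalid>"}
found = ["https://example.invalid/b", "https://example.invalid/a"]
pending = urls_to_look_up(found, known)  # only these would need a network query
```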

--
- Julian

Daniel Shahaf-2
Julian Foad wrote on Tue, Dec 11, 2018 at 10:08:29 +0000:
> Added '-q' etc.
>

Thanks.

> I also made it:
>   * not commit if the only change is in the "# Generated ... <date>" line.
>   * search only in the "publish" subtree, so it doesn't pick up the example in its own script

I don't think it will pick up the example in its own source, because the
regexp therein doesn't match itself.  (That is, «lambda s: re.search(s, s)»
would return None for the 'https?://…' regex)

>   * the 'curl | perl' part of the script wants a way to fail if the 'curl' fails (Bash has 'set -o pipefail' for this).

We could use a tmpfile, or target bash explicitly (in the #! line too),
or simplify the Perl so it dies if it gets no input (as implemented it's
a "do a transformation for every line" loop, and zero lines aren't an
error)…
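
To illustrate the last option (treating "no input" as an error), here is the idea in Python rather than Perl. It is only a sketch: `transform_lines` is a hypothetical name, and the real per-line transformation is not shown.

```python
def transform_lines(lines, transform):
    """Apply transform to every line; raise if there was no input at all."""
    results = [transform(line) for line in lines]
    if not results:
        # An empty stream most likely means the upstream command failed.
        raise ValueError("no input received")
    return results
```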

Julian Foad-5
Daniel Shahaf wrote:
> >   * search only in the "publish" subtree, so it doesn't pick up the example in its own script
>
> I don't think it will pick up the example in its own source, because the
> regexp therein doesn't match itself.

It found the "2010-01/0001" example in the script's comments, not the regex itself, in r1848647.
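
The distinction can be shown with a small Python check. The pattern below is a hypothetical stand-in (the script's actual regex is not reproduced here): a regex containing metacharacters need not match its own source text, but a literal example URL in a comment is ordinary text and does match.

```python
import re

# Hypothetical pattern for illustration only; not the one from the script.
PATTERN = r"https?://example\.invalid/\d{4}-\d{2}/\d{4}"

def matches_itself(pattern):
    """Return True if the pattern, used as a regex, matches its own text."""
    return re.search(pattern, pattern) is not None

self_match = matches_itself(PATTERN)
comment_match = re.search(PATTERN, "# e.g. http://example.invalid/2010-01/0001")
```

Here `self_match` is False while `comment_match` is a match object.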

> >   * the 'curl | perl' part of the script wants a way to fail if the 'curl' fails (Bash has 'set -o pipefail' for this).
>
> We could…

Thank you for fixing that in r1848665.

--
- Julian

Daniel Shahaf-2
Julian Foad wrote on Tue, 11 Dec 2018 11:26 +0000:
> It found the "2010-01/0001" example in the script's comments, not the
> regex itself, in r1848647.

Speaking of which, when I tested the script locally it added some more
URLs that it found in svn-base files in my working copy (due to [1]); that is:
the mapping is complete only for HEAD of the site, not for its history.
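
If scanning a whole working copy, one way to avoid the pristine copies is to prune the '.svn' administrative directory while walking the tree. A minimal Python sketch; `site_files` is a hypothetical helper, and whether the script should do this is a separate question:

```python
import os

def site_files(root):
    """Yield file paths under root, skipping Subversion's '.svn' admin area."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into .svn (and its pristines).
        dirnames[:] = [d for d in dirnames if d != ".svn"]
        for name in filenames:
            yield os.path.join(dirpath, name)
```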

Cheers,

Daniel

[1] https://subversion.apache.org/docs/release-notes/1.7#wc-pristines (SVN-4071)

Julian Foad-5
In reply to this post by Julian Foad-5
Julian Foad wrote on 2018-12-11:
> # Update our Haxx-URL-to-Message-Id map (a manual cron entry, for now)
> 0 4 * * * fn=publish/.message-ids.tsv; cd ~/src/svn/site && svn up -q && tools/haxx-url-to-message-id.sh publish > $fn.tmp && if diff -q -I'^# Generated' $fn $fn.tmp > /dev/null; then rm $fn.tmp; else mv $fn.tmp $fn && svn ci -q -m "* $fn: Automatically regenerated" $fn; fi

In r1848836 I moved that logic into 'tools/generate-message-id-map.py'. The crontab entry is now:

# Update our Haxx-URL-to-Message-Id map (a manual cron entry, for now)
0 4 * * * cd ~/src/svn/site && svn up -q && tools/generate-message-id-map.py

--
- Julian