Potential regression: high server-side memory consumption during import (was: Subversion 1.10 RC1?)


Potential regression: high server-side memory consumption during import (was: Subversion 1.10 RC1?)

Evgeny Kotkov
Julian Foad <[hidden email]> writes:

> After any issues raised in this discussion are resolved, we feel we should
> go ahead and produce RC1 as soon as possible.

I think that I am seeing a 1.10 regression in terms of httpd's memory
usage during large imports.

In my environment, when I `svn import` an unpacked version of Boost
(https://www.boost.org/) that consists of around 60,000 files, I see that
the memory consumption by the server process rapidly grows up to 1.5 GB.
The environment is a Windows 8.1 x64 machine with httpd 2.4.29 configured
to use short-circuit authz and HTTPS.  Apparently, this behavior is specific
to 1.10, as I cannot reproduce it with 1.9 binaries.

 (I haven't investigated the issue any further at this time, and it may
  well be reproducible in other configurations.)


Thanks,
Evgeny Kotkov

Re: Potential regression: high server-side memory consumption during import (was: Subversion 1.10 RC1?)

Stefan Fuhrmann
On 05.12.2017 22:05, Evgeny Kotkov wrote:

> Julian Foad <[hidden email]> writes:
>
>> After any issues raised in this discussion are resolved, we feel we should
>> go ahead and produce RC1 as soon as possible.
>
> I think that I am seeing a 1.10 regression in terms of httpd's memory
> usage during large imports.
>
> In my environment, when I `svn import` an unpacked version of Boost
> (https://www.boost.org/) that consists of around 60,000 files, I see that
> the memory consumption by the server process rapidly grows up to 1.5 GB.
> The environment is a Windows 8.1 x64 machine with httpd 2.4.29 configured
> to use short-circuit authz and HTTPS.  Apparently, this behavior is specific
> to 1.10, as I cannot reproduce it with 1.9 binaries.
>
>   (I haven't investigated the issue any further at this time, and it may
>    well be reproducible in other configurations.)

Would it be possible for you to bisect this to
find the offending revision?  My random guess
would be that in the context of mod_dav_svn, we
might use an unsuitable pool for authz caching.

-- Stefan^2.

Re: Potential regression: high server-side memory consumption during import (was: Subversion 1.10 RC1?)

Evgeny Kotkov
Stefan Fuhrmann <[hidden email]> writes:

> Would it be possible for you to bisect this to find the offending revision?
> My random guess would be that in the context of mod_dav_svn, we might use
> an unsuitable pool for authz caching.

While looking through the various 1.10-related topics, I remembered
this issue.  I have been able to narrow it down in my environment to
https://svn.apache.org/r1778923 (Ensure that even long-lived DAV
connections use up-to-date authz.)

Perhaps a simpler reproduction would be to issue an 'svn log' for
a medium-sized repository.  In my environment, doing so for the trunk of
TortoiseSVN's repository with 25,000 revisions causes the httpd process
to consume up to 1 GB of RAM while processing the request.  Overall,
the log takes around 11 seconds, compared to 2 seconds with 1.9.7.

Unless I am missing something, this might be worth considering before the
1.10 GA release.


Thanks,
Evgeny Kotkov

Re: Potential regression: high server-side memory consumption during import (was: Subversion 1.10 RC1?)

Stefan Sperling
On Fri, Mar 02, 2018 at 05:15:46PM +0300, Evgeny Kotkov wrote:
> Unless I am missing something, this might be worth considering before the
> 1.10 GA release.

Not about the actual bug, just a meta comment on the process:

I'd rather ship 1.10.0 at the projected release date, followed closely
by 1.10.1 to fix bugs such as these, than delay general access to 1.10.0
even further.

You may not have realized this, but I have been waiting for 1.10.0 to
happen for *over a year* https://svn.haxx.se/dev/archive-2017-01/0043.shtml
For all this time, I have wanted the conflict resolver to get real world
exposure because I need feedback from users out there to improve it.
I kept mostly quiet because I didn't want to push too hard for this
release all by myself because of the relatively high share of burden
this would imply. So I waited for activity from the community to make
it happen as a true collective effort.
I was glad when Julian volunteered to drive the process.

If this one bug really bothers you enough to hold the planned release back,
it makes me wonder why you didn't push for a fix much earlier. We have had
plenty of time. I don't mean to disrespect the fact that you may not have
had time recently -- we are all constrained for time these days.
But I also believe we shouldn't panic over every bug that slips into this
.0 release. It is OK to ship 1.10.0 with known bugs that aren't very
serious security / data corruption issues. There's a section in the
release notes which lists known issues. I would prefer to document this
memory consumption problem there and patch it like any other regular bug.

Re: Potential regression: high server-side memory consumption during import

Philip Martin
In reply to this post by Evgeny Kotkov
Evgeny Kotkov <[hidden email]> writes:

> Perhaps a simpler reproduction would be to issue an 'svn log' for
> a medium-sized repository.  In my environment, doing so for the trunk of
> TortoiseSVN's repository with 25,000 revisions causes the httpd process
> to consume up to 1 GB of RAM while processing the request.  Overall,
> the log takes around 11 seconds, compared to 2 seconds with 1.9.7.

I can reproduce the problem in my environment using my mirror of
Subversion.  Running log with authz configured causes apache to use
several GB of memory, that's for about 81,500 revisions.  Reverting
r1779188 and r1778923 solves the memory growth problem.

--
Philip

Re: Potential regression: high server-side memory consumption during import

Stefan Sperling
On Fri, Mar 02, 2018 at 03:56:40PM +0000, Philip Martin wrote:

> Evgeny Kotkov <[hidden email]> writes:
>
> > Perhaps a simpler reproduction would be to issue an 'svn log' for
> > a medium-sized repository.  In my environment, doing so for the trunk of
> > TortoiseSVN's repository with 25,000 revisions causes the httpd process
> > to consume up to 1 GB of RAM while processing the request.  Overall,
> > the log takes around 11 seconds, compared to 2 seconds with 1.9.7.
>
> I can reproduce the problem in my environment using my mirror of
> Subversion.  Running log with authz configured causes apache to use
> several GB of memory, that's for about 81,500 revisions.  Reverting
> r1779188 and r1778923 solves the memory growth problem.

Hmmm. Does this help? The authz_tests pass with it.

Index: subversion/mod_authz_svn/mod_authz_svn.c
===================================================================
--- subversion/mod_authz_svn/mod_authz_svn.c (revision 1825730)
+++ subversion/mod_authz_svn/mod_authz_svn.c (working copy)
@@ -395,10 +395,11 @@ resolve_repos_relative_url(const char **path, cons
 }
 
 /*
- * Get the, possibly cached, svn_authz_t for this request.
+ * Get the svn_authz_t for this request.
  */
 static svn_authz_t *
 get_access_conf(request_rec *r, authz_svn_config_rec *conf,
+                apr_pool_t *result_pool,
                 apr_pool_t *scratch_pool)
 {
   const char *access_file;
@@ -467,7 +468,7 @@ get_access_conf(request_rec *r, authz_svn_config_r
   svn_err = svn_repos_authz_read3(&access_conf,
                                   access_file, groups_file,
                                   TRUE, NULL,
-                                  r->connection->pool, scratch_pool);
+                                  result_pool, scratch_pool);
 
   if (svn_err)
     {
@@ -687,7 +688,7 @@ req_check_access(request_rec *r,
     }
 
   /* Retrieve/cache authorization file */
-  access_conf = get_access_conf(r,conf, r->pool);
+  access_conf = get_access_conf(r,conf, r->pool, r->pool);
   if (access_conf == NULL)
     return DECLINED;
 
@@ -804,7 +805,7 @@ subreq_bypass2(request_rec *r,
     }
 
   /* Retrieve authorization file */
-  access_conf = get_access_conf(r, conf, scratch_pool);
+  access_conf = get_access_conf(r, conf, scratch_pool, scratch_pool);
   if (access_conf == NULL)
     return HTTP_FORBIDDEN;
 


Re: Potential regression: high server-side memory consumption during import (was: Subversion 1.10 RC1?)

Evgeny Kotkov
In reply to this post by Stefan Sperling
Stefan Sperling <[hidden email]> writes:

> I'd rather ship 1.10.0 at the projected release date, followed closely
> by 1.10.1 to fix bugs such as these, than delay general access to 1.10.0
> even further.

While I do not have significant objections against such a plan, I find the
idea of shipping a performance feature that causes a massive slowdown
instead of an improvement somewhat controversial.  (In other words,
I am -0 for that.)

> You may not have realized this, but I have been waiting for 1.10.0 to
> happen for *over a year* https://svn.haxx.se/dev/archive-2017-01/0043.shtml
> For all this time, I have wanted the conflict resolver to get real world
> exposure because I need feedback from users out there to improve it.
> I kept mostly quiet because I didn't want to push too hard for this
> release all by myself because of the relatively high share of burden
> this would imply. So I waited for activity from the community to make
> it happen as a true collective effort.

Not too sure about how this is connected to the soak period and to the
release process — speaking of which, I would say that your e-mail may
discourage people from reporting issues during the soak period.

> If this one bug really bothers you enough to hold the planned release back
> it makes me wonder why you didn't push for a fix much earlier. We have had
> plenty of time.

I haven't been and am not pushing for a fix.  Rather, I have just
included additional information about the problem, with a comment that
it might be worth looking into before the GA release.

Moreover, I reported the issue the moment I found it, with an edge-case
reproduction.  Once I was asked to bisect for a specific revision, I should
probably have said that I wouldn't have time to do that, but I thought I
would be able to find some.  When I stumbled across the issue again, I found
the revision and the simpler reproduction, but it seems this was not the
most appropriate time to bring up these new details.

Putting all that aside, I wouldn't say that it is productive to discuss issues
in such a way.  In my opinion, we should be doing the opposite and actively
encouraging reports of problems during the soak period.


Regards,
Evgeny Kotkov

Re: Potential regression: high server-side memory consumption during import

Philip Martin
In reply to this post by Stefan Sperling
Stefan Sperling <[hidden email]> writes:

> Hmmm. Does this help? The authz_tests pass with it.
>
> Index: subversion/mod_authz_svn/mod_authz_svn.c
> ===================================================================
> --- subversion/mod_authz_svn/mod_authz_svn.c (revision 1825730)
> +++ subversion/mod_authz_svn/mod_authz_svn.c (working copy)
> @@ -395,10 +395,11 @@ resolve_repos_relative_url(const char **path, cons
>  }
>  
>  /*
> - * Get the, possibly cached, svn_authz_t for this request.
> + * Get the svn_authz_t for this request.
>   */
>  static svn_authz_t *
>  get_access_conf(request_rec *r, authz_svn_config_rec *conf,
> +                apr_pool_t *result_pool,
>                  apr_pool_t *scratch_pool)
>  {
>    const char *access_file;
> @@ -467,7 +468,7 @@ get_access_conf(request_rec *r, authz_svn_config_r
>    svn_err = svn_repos_authz_read3(&access_conf,
>                                    access_file, groups_file,
>                                    TRUE, NULL,
> -                                  r->connection->pool, scratch_pool);
> +                                  result_pool, scratch_pool);
>  
>    if (svn_err)
>      {
> @@ -687,7 +688,7 @@ req_check_access(request_rec *r,
>      }
>  
>    /* Retrieve/cache authorization file */
> -  access_conf = get_access_conf(r,conf, r->pool);
> +  access_conf = get_access_conf(r,conf, r->pool, r->pool);
>    if (access_conf == NULL)
>      return DECLINED;
>  
> @@ -804,7 +805,7 @@ subreq_bypass2(request_rec *r,
>      }
>  
>    /* Retrieve authorization file */
> -  access_conf = get_access_conf(r, conf, scratch_pool);
> +  access_conf = get_access_conf(r, conf, scratch_pool, scratch_pool);
>    if (access_conf == NULL)
>      return HTTP_FORBIDDEN;

Yes, that solves the memory use problem.  There is still a time penalty:

Your patch:

 $ time svn log -q http://localhost:8888/repo/asf/subversion > /dev/null

 real 0m7.695s
 user 0m0.429s
 sys 0m0.283s

 apache httpd cpu: 7.30

Revert r1779188,1778923

 $ time svn log -q http://localhost:8888/repo/asf/subversion > /dev/null

 real 0m5.102s
 user 0m0.460s
 sys 0m0.250s

 apache httpd cpu: 5.10

--
Philip

Re: Potential regression: high server-side memory consumption during import (was: Subversion 1.10 RC1?)

Stefan Sperling
In reply to this post by Evgeny Kotkov
On Fri, Mar 02, 2018 at 09:02:02PM +0300, Evgeny Kotkov wrote:

> Stefan Sperling <[hidden email]> writes:
>
> > I'd rather ship 1.10.0 at the projected release date, followed closely
> > by 1.10.1 to fix bugs such as these, than delay general access to 1.10.0
> > even further.
>
> While I do not have significant objections against such a plan, I find the
> idea of shipping a performance feature that causes a massive slowdown
> instead of an improvement somewhat controversial.  (In other words,
> I am -0 for that.)
>
> > You may not have realized this, but I have been waiting for 1.10.0 to
> > happen for *over a year* https://svn.haxx.se/dev/archive-2017-01/0043.shtml
> > For all this time, I have wanted the conflict resolver to get real world
> > exposure because I need feedback from users out there to improve it.
> > I kept mostly quiet because I didn't want to push too hard for this
> > release all by myself because of the relatively high share of burden
> > this would imply. So I waited for activity from the community to make
> > it happen as a true collective effort.
>
> Not too sure about how this is connected to the soak period and to the
> release process — speaking of which, I would say that your e-mail may
> discourage people from reporting issues during the soak period.

I am not trying to discourage people from reporting and fixing problems.
I am sorry if what I wrote could be interpreted in this way.

I am just questioning the usefulness of halting the presses and restarting
the soak for another month for something that isn't a security / data
corruption issue. I anticipate that problems of similar severity to this
one will still be discovered after we release 1.10.0, regardless of
whether we release 1.10.0 at end of March or later.

Though maybe my idea of the impact of this bug is wrong?
If this really makes some repositories entirely unusable with authz enabled
then of course it should be considered a blocker. Is it this severe?

Re: Potential regression: high server-side memory consumption during import

Stefan Sperling
In reply to this post by Philip Martin
On Fri, Mar 02, 2018 at 06:20:48PM +0000, Philip Martin wrote:

> Stefan Sperling <[hidden email]> writes:
>
> > Hmmm. Does this help? The authz_tests pass with it.
> >
> > Index: subversion/mod_authz_svn/mod_authz_svn.c
> > ===================================================================
> > --- subversion/mod_authz_svn/mod_authz_svn.c (revision 1825730)
> > +++ subversion/mod_authz_svn/mod_authz_svn.c (working copy)
> > @@ -395,10 +395,11 @@ resolve_repos_relative_url(const char **path, cons
> >  }
> >  
> >  /*
> > - * Get the, possibly cached, svn_authz_t for this request.
> > + * Get the svn_authz_t for this request.
> >   */
> >  static svn_authz_t *
> >  get_access_conf(request_rec *r, authz_svn_config_rec *conf,
> > +                apr_pool_t *result_pool,
> >                  apr_pool_t *scratch_pool)
> >  {
> >    const char *access_file;
> > @@ -467,7 +468,7 @@ get_access_conf(request_rec *r, authz_svn_config_r
> >    svn_err = svn_repos_authz_read3(&access_conf,
> >                                    access_file, groups_file,
> >                                    TRUE, NULL,
> > -                                  r->connection->pool, scratch_pool);
> > +                                  result_pool, scratch_pool);
> >  
> >    if (svn_err)
> >      {
> > @@ -687,7 +688,7 @@ req_check_access(request_rec *r,
> >      }
> >  
> >    /* Retrieve/cache authorization file */
> > -  access_conf = get_access_conf(r,conf, r->pool);
> > +  access_conf = get_access_conf(r,conf, r->pool, r->pool);
> >    if (access_conf == NULL)
> >      return DECLINED;
> >  
> > @@ -804,7 +805,7 @@ subreq_bypass2(request_rec *r,
> >      }
> >  
> >    /* Retrieve authorization file */
> > -  access_conf = get_access_conf(r, conf, scratch_pool);
> > +  access_conf = get_access_conf(r, conf, scratch_pool, scratch_pool);
> >    if (access_conf == NULL)
> >      return HTTP_FORBIDDEN;
>
> Yes, that solves the memory use problem.

Nice, I'll commit it then. This might not be a final fix but at least
it's a step forward.

> There is still a time penalty:
>
> Your patch:
>
>  $ time svn log -q http://localhost:8888/repo/asf/subversion > /dev/null
>
>  real 0m7.695s
>  user 0m0.429s
>  sys 0m0.283s
>
>  apache httpd cpu: 7.30
>
> Revert r1779188,1778923
>
>  $ time svn log -q http://localhost:8888/repo/asf/subversion > /dev/null
>
>  real 0m5.102s
>  user 0m0.460s
>  sys 0m0.250s
>
>  apache httpd cpu: 5.10

If I understand it correctly, that's because r1778923 disabled caching
on a per-connection basis? I don't really understand the log message.

Re: Potential regression: high server-side memory consumption during import (was: Subversion 1.10 RC1?)

Stefan Sperling
In reply to this post by Stefan Sperling
On Fri, Mar 02, 2018 at 07:21:15PM +0100, Stefan Sperling wrote:
> I am just questioning the usefulness of halting the presses and restarting
> the soak for another month for something that isn't a security / data
> corruption issue. I anticipate that problems of similar severity to this
> one will still be discovered after we release 1.10.0, regardless of
> whether we release 1.10.0 at end of March or later.
>
> Though maybe my idea of the impact of this bug is wrong?
> If this really makes some repositories entirely unusable with authz enabled
> then of course it should be considered a blocker. Is it this severe?

I have misread our flowchart at:
http://subversion.apache.org/docs/community-guide/svn-soak-management.png

It seems for this kind of issue we'd extend soak by just one week instead
of four? I wouldn't mind a one-week extension for this kind of bug fix.
One week seems reasonable.

Re: Potential regression: high server-side memory consumption during import

Philip Martin
In reply to this post by Stefan Sperling
Stefan Sperling <[hidden email]> writes:

> On Fri, Mar 02, 2018 at 06:20:48PM +0000, Philip Martin wrote:
>>
>> Yes, that solves the memory use problem.
>
> Nice, I'll commit it then. This might not be a final fix but at least
> it's a step forward.
>
>> There is still a time penalty:
>>
>> Your patch:
>>
>>  $ time svn log -q http://localhost:8888/repo/asf/subversion > /dev/null
>>
>>  real 0m7.695s
>>  user 0m0.429s
>>  sys 0m0.283s
>>
>>  apache httpd cpu: 7.30
>>
>> Revert r1779188,1778923
>>
>>  $ time svn log -q http://localhost:8888/repo/asf/subversion > /dev/null
>>
>>  real 0m5.102s
>>  user 0m0.460s
>>  sys 0m0.250s
>>
>>  apache httpd cpu: 5.10
>
> If I understand it correctly, that's because r1778923 disabled caching
> on a per-connection basis? I don't really understand the log message.

For comparison, 1.9 gives

 $ time svn log -q http://localhost:8888/repo/asf/subversion > /dev/null

 real 0m10.044s
 user 0m0.404s
 sys 0m0.064s

 apache httpd cpu: 9.98

1.10 would be nearly twice as fast as 1.9 but rereading the authz
removes over half of the gain.

All of those figures are the first run after starting Apache, i.e. when
the OS cache and the Subversion cache is cold.  With a hot Subversion
cache the values are:

            1.9: 6.0s
   patched 1.10: 5.5s
  reverted 1.10: 3.3s

Again, 1.10 would be nearly twice as fast, but now rereading the authz
removes most of that gain.

--
Philip

Re: Potential regression: high server-side memory consumption during import

Philip Martin
Philip Martin <[hidden email]> writes:

> All of those figures are the first run after starting Apache, i.e. when
> the OS cache and the Subversion cache is cold.  With a hot Subversion

I meant to write

 "the OS cache is hot and the Subversion cache is cold"

> cache the values are:
>
>             1.9: 6.0s
>    patched 1.10: 5.5s
>   reverted 1.10: 3.3s
>
> Again, 1.10 would be nearly twice as fast, but now rereading the authz
> removes most of that gain.
>
> --
> Philip

--
Philip

Re: Potential regression: high server-side memory consumption during import

Philip Martin
In reply to this post by Philip Martin
Philip Martin <[hidden email]> writes:

> Again, 1.10 would be nearly twice as fast, but now rereading the authz
> removes most of that gain.

I think I see the underlying problem: the authz code now incorporates a
cache based on the md5 checksum of the rules, so when the rules are
unchanged the cached value can be reused.  This cache relies on the
caller being able to pass an svn_repos_t to svn_repos_authz_read3() and,
while svnserve does indeed pass such a pointer, mod_authz_svn is passing
NULL.  That means mod_authz_svn does not take advantage of the new authz
cache.

Stefan's pool patch helps, but I believe the authz rereading in
mod_authz_svn should be reverted from 1.10 unless we can make it take
advantage of the new authz cache.

--
Philip
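
As an illustration of the call-shape difference Philip describes, here is a
minimal C sketch of reading an authz file while passing the svn_repos_t hint
that svn_repos_authz_read3() needs for its checksum-keyed cache.  The helper
name and its signature are hypothetical rather than existing mod_authz_svn
code, and a patch later in this thread tries a similar approach; treat this
as a sketch of the idea, not a tested fix.

  /* Hypothetical sketch: pass the open svn_repos_t so that
   * svn_repos_authz_read3() can consult its checksum-keyed cache instead
   * of reparsing the rules.  mod_authz_svn currently passes NULL for the
   * repos argument, which bypasses that cache.  In practice the repos
   * handle would be opened once and reused (e.g. per connection), not
   * reopened on every call. */
  #include "svn_error.h"
  #include "svn_repos.h"

  static svn_error_t *
  read_authz_with_repos_hint(svn_authz_t **access_conf,
                             const char *access_file,
                             const char *groups_file,
                             const char *repos_path,
                             apr_pool_t *result_pool,
                             apr_pool_t *scratch_pool)
  {
    svn_repos_t *repos;

    /* Open the repository; the handle carries the in-repository caches. */
    SVN_ERR(svn_repos_open3(&repos, repos_path, NULL, result_pool,
                            scratch_pool));

    /* Passing `repos` instead of NULL is what enables the cached lookup. */
    SVN_ERR(svn_repos_authz_read3(access_conf, access_file, groups_file,
                                  TRUE /* must_exist */, repos,
                                  result_pool, scratch_pool));
    return SVN_NO_ERROR;
  }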

Re: Potential regression: high server-side memory consumption during import

Stefan Sperling
On Fri, Mar 02, 2018 at 09:40:28PM +0000, Philip Martin wrote:

> Philip Martin <[hidden email]> writes:
>
> > Again, 1.10 would be nearly twice as fast, but now rereading the authz
> > removes most of that gain.
>
> I think I see the underlying problem: the authz code now incorporates a
> cache based on the md5 checksum of the rules, so when the rules are
> unchanged the cached value can be reused.  This cache relies on the
> caller being able to pass an svn_repos_t to svn_repos_authz_read3() and,
> while svnserve does indeed pass such a pointer, mod_authz_svn is passing
> NULL.  That means mod_authz_svn does not take advantage of the new authz
> cache.
>
> Stefan's pool patch helps, but I believe the authz rereading in
> mod_authz_svn should be reverted from 1.10 unless we can make it take
> advantage of the new authz cache.

Yes, the problem seems to be that mod_authz_svn has no way of storing
per-connection state at present. The memory usage problem happened
because it kept adding new access_conf objects to the per-connection
pool since it has no way of knowing whether a previous request had
already created the same object.

We could store per-connection data by using members of r->conn, such
as r->conn->id or r->notes to index into a cache of svn_repos_t stored
in r->conn's pool. But where could we store a pointer to this cache
so that it could be read back from the request structure r?

Maybe we could use a global cache of svn_repos_t objects which live
throughout the lifetime of mod_authz_svn and are shared across connections?
It probably won't grow out of bounds as the number of repositories is
relatively constant. But it's unclear how to ensure it stays in sync
with the on-disk state. Hmm...

Re: Potential regression: high server-side memory consumption during import

Stefan Sperling
On Sat, Mar 03, 2018 at 11:43:31AM +0100, Stefan Sperling wrote:

> On Fri, Mar 02, 2018 at 09:40:28PM +0000, Philip Martin wrote:
> > Philip Martin <[hidden email]> writes:
> >
> > > Again, 1.10 would be nearly twice as fast, but now rereading the authz
> > > removes most of that gain.
> >
> > I think I see the underlying problem: the authz code now incorporates a
> > cache based on the md5 checksum of the rules, so when the rules are
> > unchanged the cached value can be reused.  This cache relies on the
> > caller being able to pass an svn_repos_t to svn_repos_authz_read3() and,
> > while svnserve does indeed pass such a pointer, mod_authz_svn is passing
> > NULL.  That means mod_authz_svn does not take advantage of the new authz
> > cache.
> >
> > Stefan's pool patch helps, but I believe the authz rereading in
> > mod_authz_svn should be reverted from 1.10 unless we can make it take
> > advantage of the new authz cache.
>
> Yes, the problem seems to be that mod_authz_svn has no way of storing
> per-connection state at present. The memory usage problem happened
> because it kept adding new access_conf objects to the per-connection
> pool since it has no way of knowing whether a previous request had
> already created the same object.
>
> We could store per-connection data by using members of r->conn, such
> as r->conn->id or r->notes to index into a cache of svn_repos_t stored
> in r->conn's pool. But where could we store a pointer to this cache
> so that it could be read back from the request structure r?
>
> Maybe we could use a global cache of svn_repos_t objects which live
> throughout the lifetime of mod_authz_svn and are shared across connections?
> It probably won't grow out of bounds as the number of repositories is
> relatively constant. But it's unclear how to ensure it stays in sync
> with the on-disk state. Hmm...

I may have found a way to store an svn_repos_t object on the connection.
Does it help? (authz_tests pass)

Index: subversion/mod_authz_svn/mod_authz_svn.c
===================================================================
--- subversion/mod_authz_svn/mod_authz_svn.c (revision 1825762)
+++ subversion/mod_authz_svn/mod_authz_svn.c (working copy)
@@ -409,6 +409,7 @@ get_access_conf(request_rec *r, authz_svn_config_r
   svn_authz_t *access_conf = NULL;
   svn_error_t *svn_err = SVN_NO_ERROR;
   dav_error *dav_err;
+  svn_repos_t *repos;
 
   dav_err = dav_svn_get_repos_path2(r, conf->base_path, &repos_path, scratch_pool);
   if (dav_err)
@@ -465,9 +466,24 @@ get_access_conf(request_rec *r, authz_svn_config_r
                     "Path to groups file is %s", groups_file);
     }
 
+  /* Store an svn_repos_t on this connection to allow authz config caching. */
+  repos = ap_get_module_config(r->connection->conn_config, &authz_svn_module);
+  if (repos == NULL)
+    {
+      svn_err = svn_repos_open3(&repos, repos_path, NULL, r->connection->pool,
+                                scratch_pool);
+      if (svn_err)
+        {
+          log_svn_error(APLOG_MARK, r, "Failed to open repository:",
+                        svn_err, scratch_pool);
+          return NULL;
+        }
+      ap_set_module_config(r->connection->conn_config, &authz_svn_module, repos);
+    }
+
   svn_err = svn_repos_authz_read3(&access_conf,
                                   access_file, groups_file,
-                                  TRUE, NULL,
+                                  TRUE, repos,
                                   result_pool, scratch_pool);
 
   if (svn_err)

Re: Potential regression: high server-side memory consumption during import

Philip Martin
Stefan Sperling <[hidden email]> writes:

> I may have found a way to store an svn_repos_t object on the connection.
> Does it help? (authz_tests pass)

No significant improvement.

I am timing

  time svn log http://localhost:8888/repos/asf/subversion \
     --username pm --password mp > /dev/null

and I realised my earlier timings had the apache LogLevel set to Debug
which means apache spends a significant time logging.  Lowering the
LogLevel to Crit improves the speed.  Timings for 7 runs after starting
apache:

         trunk      patched     reverted
          1.11         1.11         1.11        1.9

          5.2s         5.1s         3.1s        7.9s
          3.4s         3.4s         1.3s        4.5s
          3.7s         3.8s         1.7s        4.5s
          3.9s         3.9s         1.8s        4.5s
          4.0s         4.0s         1.9s        4.5s
          3.9s         3.9s         1.8s        4.5s
          3.8s         3.8s         1.8s        4.5s

Now that apache is logging less the performance hit is even clearer.
1.11 could be 2 to 3 times faster than 1.9 but the authz effect means it
is only about 10% faster.

Note an odd effect in the above numbers.  The second run for 1.11 is
always the fastest. The first run in each case is the slowest, the
Subversion cache is cold.  The second run is faster, the Subversion
cache is hot.  In 1.9 subsequent runs are comparable to the second, but
in 1.11 the subsequent runs are a bit slower than the second.  I don't
know what causes this effect.

--
Philip

Re: Potential regression: high server-side memory consumption during import

Paul Hammant-3
In reply to this post by Philip Martin
> Yes, the problem seems to be that mod_authz_svn has no way of storing
> per-connection state at present.

Please excuse the interjection here: my eyes spied "per connection" and I wondered if something I'm doing elsewhere that is closer to (but not exactly) "per user" might help in terms of thinking.

I've a server-side 'wrapper' around Svn that's calculating authz permutations for each user. That transcends any sessions and connections for the user.  Two users in the same group(s) with no other variance in authz config will use the same cached Svn entries everywhere.  I've speedy unit tests for it too, but I've a ways to go because of the exclude grammar of authz's little language.

If the thing works well enough but eats too much of the disk-backed RAM cache I'm using, I could extend it to do the same per directory, because right now it is effectively a 'from root of repo' determination for a user.

Of course, you don't want the code I've done because it is not compatible with the current Svn toolchain. I'm thinking about a Rust rewrite, but that's a year away, even if it were compatible with the way Subversion is today.

Regards,

- Paul

Re: Potential regression: high server-side memory consumption during import

Stefan Sperling
In reply to this post by Philip Martin
On Sat, Mar 03, 2018 at 01:06:49PM +0000, Philip Martin wrote:

> Stefan Sperling <[hidden email]> writes:
>
> > I may have found a way to store an svn_repos_t object on the connection.
> > Does it help? (authz_tests pass)
>
> No significant improvement.
>
> I am timing
>
>   time svn log http://localhost:8888/repos/asf/subversion \
>      --username pm --password mp > /dev/null
>
> and I realised my earlier timings had the apache LogLevel set to Debug
> which means apache spends a significant time logging.  Lowering the
> LogLevel to Crit improves the speed.  Timings for 7 runs after starting
> apache:
>
>          trunk      patched     reverted
>           1.11         1.11         1.11        1.9
>
>           5.2s         5.1s         3.1s        7.9s
>           3.4s         3.4s         1.3s        4.5s
>           3.7s         3.8s         1.7s        4.5s
>           3.9s         3.9s         1.8s        4.5s
>           4.0s         4.0s         1.9s        4.5s
>           3.9s         3.9s         1.8s        4.5s
>           3.8s         3.8s         1.8s        4.5s
>
> Now that apache is logging less the performance hit is even clearer.
> 1.11 could be 2 to 3 times faster than 1.9 but the authz effect means it
> is only about 10% faster.
>
> Note an odd effect in the above numbers.  The second run for 1.11 is
> always the fastest. The first run in each case is the slowest, the
> Subversion cache is cold.  The second run is faster, the Subversion
> cache is hot.  In 1.9 subsequent runs are comparable to the second, but
> in 1.11 the subsequent runs are a bit slower than the second.  I don't
> know what causes this effect.

I think I see why my previous patch didn't affect performance.

The cache lookup happens in authz_read() with svn_object_pool__lookup().
The authz config cache is allocated in an object pool which uses the
result_pool passed to svn_repos_authz_read3(). An item is evicted
from an object pool when the pool the item was allocated in gets
cleaned up (see add_object_ref() in libsvn_subr/object_pool.c).
My commit r1825736 reduced the lifetime of the result pool to the
lifetime of the request, which apparently means the cached authz
config is removed from the object pool during request pool cleanup,
and doesn't survive throughout the connection.

So it looks like we'll have to pass the connection pool as result
pool instead. And this should now be safe since only one allocation
should occur if we pass an svn_repos_t as well?
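
(For illustration: the pool-lifetime behaviour described above can be shown
with a small, standalone APR program.  This is not Subversion code, just a
sketch: anything allocated in a per-request subpool disappears when that pool
is destroyed, so a cache keyed off the request pool cannot outlive a single
request, whereas a connection-pool allocation survives across requests on the
same connection.)

  /* Standalone APR illustration, not Subversion code. */
  #include <stdio.h>
  #include <apr_general.h>
  #include <apr_pools.h>
  #include <apr_strings.h>

  int main(void)
  {
    apr_pool_t *connection_pool, *request_pool;
    const char *per_connection;

    apr_initialize();
    apr_pool_create(&connection_pool, NULL);

    /* First "request" on the connection. */
    apr_pool_create(&request_pool, connection_pool);
    (void)apr_pstrdup(request_pool, "request-lifetime data");
    per_connection = apr_pstrdup(connection_pool, "connection-lifetime data");
    apr_pool_destroy(request_pool);  /* request ends; its allocations vanish */

    /* A later "request" on the same connection still sees the
     * connection-pool allocation. */
    printf("%s\n", per_connection);

    apr_pool_destroy(connection_pool);
    apr_terminate();
    return 0;
  }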

We will still allocate one svn_authz_t object per request on the
connection. But that struct is small; it contains only a few pointers.
This problem could be addressed with an API change to svn_repos_authz_read3()
that adds a third pool argument for use by the object pool cache.

It seems the intention of r1778923 was to use this object pool cache,
but the config cache lookup didn't work because a NULL pointer was passed
for the svn_repos_t argument. I don't fully understand why. Did we get a
new 'authz_id' in authz_read() on every request?

There might be another bug to fix here: It looks like get_repos_config()
in libsvn_repos/config_file.c is missing a cache lookup:

  /* Fetch checksum and see whether we already have a matching config */
  SVN_ERR(svn_fs_file_checksum(checksum, svn_checksum_md5, root, fs_path,
                               TRUE, access->pool));

Given the comment, I would expect a cache lookup based on the checksum
here, but the code below just does this:

  *stream = representation_stream(root, fs_path, access->pool);

  return SVN_NO_ERROR;
}


Regardless, could you try one more test run with this diff against
trunk to see if it has any impact already?

Index: subversion/mod_authz_svn/mod_authz_svn.c
===================================================================
--- subversion/mod_authz_svn/mod_authz_svn.c (revision 1825762)
+++ subversion/mod_authz_svn/mod_authz_svn.c (working copy)
@@ -399,7 +399,6 @@ resolve_repos_relative_url(const char **path, cons
  */
 static svn_authz_t *
 get_access_conf(request_rec *r, authz_svn_config_rec *conf,
-                apr_pool_t *result_pool,
                 apr_pool_t *scratch_pool)
 {
   const char *access_file;
@@ -409,6 +408,7 @@ get_access_conf(request_rec *r, authz_svn_config_r
   svn_authz_t *access_conf = NULL;
   svn_error_t *svn_err = SVN_NO_ERROR;
   dav_error *dav_err;
+  svn_repos_t *repos;
 
   dav_err = dav_svn_get_repos_path2(r, conf->base_path, &repos_path, scratch_pool);
   if (dav_err)
@@ -465,10 +465,25 @@ get_access_conf(request_rec *r, authz_svn_config_r
                     "Path to groups file is %s", groups_file);
     }
 
+  /* Store an svn_repos_t on this connection to allow authz config caching. */
+  repos = ap_get_module_config(r->connection->conn_config, &authz_svn_module);
+  if (repos == NULL)
+    {
+      svn_err = svn_repos_open3(&repos, repos_path, NULL, r->connection->pool,
+                                scratch_pool);
+      if (svn_err)
+        {
+          log_svn_error(APLOG_MARK, r, "Failed to open repository:",
+                        svn_err, scratch_pool);
+          return NULL;
+        }
+      ap_set_module_config(r->connection->conn_config, &authz_svn_module, repos);
+    }
+
   svn_err = svn_repos_authz_read3(&access_conf,
                                   access_file, groups_file,
-                                  TRUE, NULL,
-                                  result_pool, scratch_pool);
+                                  TRUE, repos,
+                                  r->connection->pool, scratch_pool);
 
   if (svn_err)
     {
@@ -688,7 +703,7 @@ req_check_access(request_rec *r,
     }
 
   /* Retrieve/cache authorization file */
-  access_conf = get_access_conf(r,conf, r->pool, r->pool);
+  access_conf = get_access_conf(r,conf, r->pool);
   if (access_conf == NULL)
     return DECLINED;
 
@@ -805,7 +820,7 @@ subreq_bypass2(request_rec *r,
     }
 
   /* Retrieve authorization file */
-  access_conf = get_access_conf(r, conf, scratch_pool, scratch_pool);
+  access_conf = get_access_conf(r, conf, scratch_pool);
   if (access_conf == NULL)
     return HTTP_FORBIDDEN;
 

Re: Potential regression: high server-side memory consumption during import

Philip Martin
In reply to this post by Philip Martin
Philip Martin <[hidden email]> writes:

> Note an odd effect in the above numbers.  The second run for 1.11 is
> always the fastest. The first run in each case is the slowest, the
> Subversion cache is cold.  The second run is faster, the Subversion
> cache is hot.  In 1.9 subsequent runs are comparable to the second, but
> in 1.11 the subsequent runs are a bit slower than the second.  I don't
> know what causes this effect.

This is a cache-size effect.  The early runs can write to the cache
without having to evict; later runs can only write if they first evict.
I happened to choose a cache size that allowed the first two runs to do
very little eviction, while the third and subsequent runs need to evict more.
If I make the cache bigger I can postpone the eviction, and the third run
becomes as fast as the second.

Here are the cache stats before/after a run once the timing has
stabilised:

gets : 11770766, 4388248 hits (37.28%)
sets : 7309448 (99.01% of misses)
failures: 0
used : 777 MB (94.56%) of 822 MB data cache / 958 MB total cache memory
2225228 entries (99.45%) of 2237592 total
58607 buckets with 7 entries
64670 buckets with 6 entries
58592 buckets with 5 entries
48780 buckets with 4 entries
38212 buckets with 3 entries
27996 buckets with 2 entries
22422 buckets with 1 entries
377 buckets with 0 entries

gets : 12914318, 4814238 hits (37.28%)
sets : 8019703 (99.01% of misses)
failures: 0
used : 778 MB (94.69%) of 822 MB data cache / 958 MB total cache memory
2229936 entries (99.66%) of 2237592 total
59173 buckets with 7 entries
65009 buckets with 6 entries
58773 buckets with 5 entries
48692 buckets with 4 entries
37862 buckets with 3 entries
27838 buckets with 2 entries
21926 buckets with 1 entries
383 buckets with 0 entries

--
Philip