Better choice for Linux semaphore than spinlock?

Doug Robinson
Folks:

From a Subversion user:

“... we have very high numbers of concurrent connections to Subversion that seem to crater Subversion. The svnserve process, which we use to access the Subversion repository, is used over the “svn” protocol by our “system user”, mostly read-only.  Then, on behalf of the user, we make requests to Subversion using the “http” protocol to fetch their data. So we have lots of connections to Subversion. But the volume of concurrent requests over the “svn” protocol causes the “svnserve” process to consume CPU cycles in a kernel “mutex-lock”, which is implemented using “spin locks”. The “svnserve” process makes the mutex calls using the “apache” (APR) semaphore wait calls, but on Linux this is a “mutex-lock” request.”

So is there a better, more scalable semaphore that can be used?
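To make the mechanism in the question concrete, here is a minimal C sketch, assuming (as on a typical Linux build) that the contended lock is an APR thread mutex, which on Unix wraps a POSIX pthread mutex. Many threads serialize on one lock: with glibc, an uncontended lock/unlock stays in userspace, while contended waiters sleep and are woken through futex(2) system calls, which is one way heavy contention turns into "system time". The names `run_contended` and `worker` are hypothetical, not Subversion or APR code.

```c
#include <pthread.h>
#include <stdlib.h>

/* Shared state guarded by a single lock: every worker serializes here. */
static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;
static long g_counter = 0;

struct worker_arg { long iterations; };

static void *worker(void *p)
{
    const struct worker_arg *arg = p;
    for (long i = 0; i < arg->iterations; i++) {
        /* Uncontended: a userspace atomic operation.
           Contended: glibc parks and wakes threads via futex(2) syscalls. */
        pthread_mutex_lock(&g_lock);
        g_counter++;
        pthread_mutex_unlock(&g_lock);
    }
    return NULL;
}

/* Runs nthreads workers against the one lock and returns the final count,
   or -1 if the thread array could not be allocated or a thread failed to start. */
long run_contended(int nthreads, long iterations)
{
    pthread_t *tids = malloc(sizeof *tids * (size_t)nthreads);
    struct worker_arg arg = { iterations };
    int started = 0;

    if (tids == NULL)
        return -1;
    g_counter = 0;  /* safe: no workers are running yet */
    for (; started < nthreads; started++)
        if (pthread_create(&tids[started], NULL, worker, &arg) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    free(tids);
    return started == nthreads ? g_counter : -1;
}
```

For example, `run_contended(8, 10000)` should return 80000, since every increment happens under the lock. Timing it with `time` at much higher thread counts tends to show the `sys` share grow; whether that is what the user is actually seeing can only be confirmed from their own profiles.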

Cheers.

Doug
--
DOUGLAS B ROBINSON SENIOR PRODUCT MANAGER

The LiveData Company
Find out more wandisco.com


THIS MESSAGE AND ANY ATTACHMENTS ARE CONFIDENTIAL, PROPRIETARY AND MAY BE PRIVILEGED

If this message was misdirected, WANdisco, Inc. and its subsidiaries, ("WANdisco") does not waive any confidentiality or privilege. If you are not the intended recipient, please notify us immediately and destroy the message without disclosing its contents to anyone. Any distribution, use or copying of this email or the information it contains by other than an intended recipient is unauthorized. The views and opinions expressed in this email message are the author's own and may not reflect the views and opinions of WANdisco, unless the author is authorized by WANdisco to express such views or opinions on its behalf. All email sent to or from this address is subject to electronic storage and review by WANdisco. Although WANdisco operates anti-virus programs, it does not accept responsibility for any damage whatsoever caused by viruses being passed.

Re: Better choice for Linux semaphore than spinlock?

Doug Robinson
Folks:

I spoke with this user late last week.  They stated that they can only get approximately 400 parallel SVN operations before the "system time" consumes all available CPU for an 8-core machine.  Adding more cores won't help because of the nature of spin locks (it makes things worse).  Turns out that even with ~100 parallel SVN operations the "system time" starts becoming significant/measurable (~10%).  Both HTTP (mod_dav_svn) and "svnserve" protocols participate in the lock contention.
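To make the "nature of spin locks" point concrete: a pure userspace spinlock keeps every waiter busy on a CPU until the lock frees up, so with far more runnable threads than cores, a waiter can spin while the holder is not even scheduled. Below is a minimal test-and-test-and-set sketch using C11 atomics; `ttas_lock` and its functions are hypothetical names for illustration. General-purpose mutexes such as glibc's default pthread mutex do not behave like this; they sleep in the kernel when contended.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal test-and-test-and-set spinlock (illustrative, not production code). */
typedef struct { atomic_bool held; } ttas_lock;

void ttas_init(ttas_lock *l)
{
    atomic_init(&l->held, false);
}

/* Returns true if the lock was free and is now held by the caller. */
bool ttas_try_acquire(ttas_lock *l)
{
    return !atomic_exchange_explicit(&l->held, true, memory_order_acquire);
}

void ttas_acquire(ttas_lock *l)
{
    while (!ttas_try_acquire(l)) {
        /* Busy-wait on cheap loads until the lock looks free. The waiter never
           yields its CPU, so if the holder has been descheduled, this core keeps
           burning cycles until the holder runs again and releases the lock. */
        while (atomic_load_explicit(&l->held, memory_order_relaxed))
            ;
    }
}

void ttas_release(ttas_lock *l)
{
    atomic_store_explicit(&l->held, false, memory_order_release);
}
```

Each extra core can add another waiter spinning on the same cache line, which is the intuition behind "adding more cores makes it worse".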

Your help would be greatly appreciated.

Cheers.

Doug

On Fri, Oct 4, 2019 at 9:20 AM Doug Robinson <[hidden email]> wrote:
Folks:

From a Subversion user:

“... we have very high numbers of concurrent connections to Subversion that seem to crater Subversion. The svnserve process, which we use to access the Subversion repository, is used over the “svn” protocol by our “system user”, mostly read-only.  Then, on behalf of the user, we make requests to Subversion using the “http” protocol to fetch their data. So we have lots of connections to Subversion. But the volume of concurrent requests over the “svn” protocol causes the “svnserve” process to consume CPU cycles in a kernel “mutex-lock”, which is implemented using “spin locks”. The “svnserve” process makes the mutex calls using the “apache” (APR) semaphore wait calls, but on Linux this is a “mutex-lock” request.”

So is there a better, more scalable semaphore that can be used?

Cheers.

Doug

Re: Better choice for Linux semaphore than spinlock?

Branko Čibej
On Mon, 7 Oct 2019, 19:47 Doug Robinson, <[hidden email]> wrote:
> Folks:
>
> I spoke with this user late last week.  They stated that they can only get approximately 400 parallel SVN operations before the "system time" consumes all available CPU for an 8-core machine.  Adding more cores won't help because of the nature of spin locks (it makes things worse).  Turns out that even with ~100 parallel SVN operations the "system time" starts becoming significant/measurable (~10%).  Both HTTP (mod_dav_svn) and "svnserve" protocols participate in the lock contention.
>
> Your help would be greatly appreciated.


Whew. So. Reducing this issue to "use a more efficient lock" is not going to work, and you provided far too little information to even attempt a diagnosis. For starters, I recommend gathering as much info as possible (anonymised of course) about the server configuration, everything from httpd and svnserve to the repository config and underlying filesystem, if possible. Getting stack traces of the "stuck" threads would be necessary, too. Without knowing exactly what is happening, these kinds of problems are extremely hard to understand, let alone fix.

I'd be surprised if the spinlock is the actual culprit. AFAIK, kernel-level locks hand off to the scheduler if they spin too long; on multiprocessor machines, this is usually more efficient than immediately yielding and causing an expensive context switch. It's possible that you're seeing an unfortunate timing "resonance" that might go away with either more *or* fewer cores being available. The behaviour is really hard to predict.
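The spin-then-sleep strategy described here can be sketched in userspace with plain POSIX calls: retry `pthread_mutex_trylock` a bounded number of times, then fall back to a blocking `pthread_mutex_lock`. glibc also offers a non-portable `PTHREAD_MUTEX_ADAPTIVE_NP` mutex type that does a bounded spin internally. `SPIN_LIMIT`, `spin_then_block_lock` and `init_mutex_prefer_adaptive` are hypothetical names, and the spin count is an arbitrary assumption, not a tuned value.

```c
#define _GNU_SOURCE            /* glibc extensions such as PTHREAD_MUTEX_ADAPTIVE_NP */
#include <pthread.h>

/* Hypothetical tuning knob: trylock attempts before giving up and sleeping. */
#define SPIN_LIMIT 100

/* Spin briefly, betting that a holder running on another CPU releases the
   lock soon; that avoids a context switch for short critical sections.
   If the bet fails, block and let the scheduler park this thread. */
void spin_then_block_lock(pthread_mutex_t *m)
{
    for (int i = 0; i < SPIN_LIMIT; i++) {
        if (pthread_mutex_trylock(m) == 0)
            return;                 /* acquired while spinning */
    }
    pthread_mutex_lock(m);          /* slow path: sleep until woken */
}

/* Initialise m as glibc's adaptive (bounded-spin) mutex where available,
   falling back to the default type elsewhere. Returns 0 or an errno value. */
int init_mutex_prefer_adaptive(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
#ifdef __GLIBC__
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ADAPTIVE_NP);
#endif
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

A production version would typically also pause between attempts (for example with a CPU relax instruction) instead of hammering the lock's cache line with back-to-back trylock calls.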

-- Brane




Re: Better choice for Linux semaphore than spinlock?

Ruediger Pluem


On 10/07/2019 08:40 PM, Branko Čibej wrote:

> On Mon, 7 Oct 2019, 19:47 Doug Robinson, <[hidden email] <mailto:[hidden email]>> wrote:
>
>     Folks:
>
>     I spoke with this user late last week.  They stated that they can only get approximately 400 parallel SVN operations
>     before the "system time" consumes all available CPU for an 8-core machine.  Adding more cores won't help because of
>     the nature of spin locks (it makes things worse).  Turns out that even with ~100 parallel SVN operations the "system
>     time" starts becoming significant/measurable (~10%).  Both HTTP (mod_dav_svn) and "svnserve" protocols participate
>     in the lock contention.
>
>     Your help would be greatly appreciated.
>
>
>
> Whew. So. Reducing this issue to "use a more efficient lock" is not going to work, and you provided far too little
> information to even attempt a diagnosis. For starters, I recommend gathering as much info as possible (anonymised of
> course) about the server configuration, everything from httpd and svnserve to the repository config and underlying
> filesystem, if possible. Getting stack traces of the "stuck" threads would be necessary, too. Without knowing exactly
> what is happening, these kinds of problems are extremely hard to understand, let alone fix.

Plus, depending on which part of the code requires this lock, a different locking mechanism that suits this use case
better can possibly be chosen via configuration changes (e.g. httpd allows this for most of its locking).
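As a rough illustration of choosing the locking mechanism through configuration (httpd 2.4 exposes this through its `Mutex` directive), here is a minimal C sketch of a lock whose implementation is picked from a configuration string at startup. The `plock` type, its functions and the `"mutex"`/`"spin"` names are hypothetical; they are not httpd, APR, or Subversion APIs.

```c
#define _POSIX_C_SOURCE 200809L   /* exposes pthread_spinlock_t */
#include <pthread.h>
#include <string.h>

/* Hypothetical lock mechanisms selectable by name. */
enum plock_mech { PLOCK_MUTEX, PLOCK_SPIN };

struct plock {
    enum plock_mech mech;
    union {
        pthread_mutex_t mutex;      /* sleeps in the kernel when contended */
        pthread_spinlock_t spin;    /* busy-waits in userspace */
    } u;
};

/* Returns 0 on success, -1 for an unknown mechanism name, or a pthread
   error code if initialisation fails. */
int plock_init(struct plock *l, const char *mech_name)
{
    if (strcmp(mech_name, "mutex") == 0) {
        l->mech = PLOCK_MUTEX;
        return pthread_mutex_init(&l->u.mutex, NULL);
    }
    if (strcmp(mech_name, "spin") == 0) {
        l->mech = PLOCK_SPIN;
        return pthread_spin_init(&l->u.spin, PTHREAD_PROCESS_PRIVATE);
    }
    return -1;
}

void plock_lock(struct plock *l)
{
    if (l->mech == PLOCK_MUTEX)
        pthread_mutex_lock(&l->u.mutex);
    else
        pthread_spin_lock(&l->u.spin);
}

void plock_unlock(struct plock *l)
{
    if (l->mech == PLOCK_MUTEX)
        pthread_mutex_unlock(&l->u.mutex);
    else
        pthread_spin_unlock(&l->u.spin);
}

int plock_destroy(struct plock *l)
{
    return l->mech == PLOCK_MUTEX ? pthread_mutex_destroy(&l->u.mutex)
                                  : pthread_spin_destroy(&l->u.spin);
}
```

Spinning only pays off when critical sections are very short and threads rarely outnumber cores; with hundreds of concurrent operations on 8 cores, a sleeping mutex is usually the safer default, which is why exposing the choice can be useful.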

Regards

Rüdiger

Re: Better choice for Linux semaphore than spinlock?

Doug Robinson
In reply to this post by Branko Čibej
Brane:

On Mon, Oct 7, 2019 at 2:40 PM Branko Čibej <[hidden email]> wrote:
> On Mon, 7 Oct 2019, 19:47 Doug Robinson, <[hidden email]> wrote:
>> I spoke with this user late last week.  They stated that they can only get approximately 400 parallel SVN operations before the "system time" consumes all available CPU for an 8-core machine.  Adding more cores won't help because of the nature of spin locks (it makes things worse).  Turns out that even with ~100 parallel SVN operations the "system time" starts becoming significant/measurable (~10%).  Both HTTP (mod_dav_svn) and "svnserve" protocols participate in the lock contention.
>>
>> Your help would be greatly appreciated.
>
> Whew. So. Reducing this issue to "use a more efficient lock" is not going to work, and you provided far too little information to even attempt a diagnosis. For starters, I recommend gathering as much info as possible (anonymised of course) about the server configuration, everything from httpd and svnserve to the repository config and underlying filesystem, if possible. Getting stack traces of the "stuck" threads would be necessary, too. Without knowing exactly what is happening, these kinds of problems are extremely hard to understand, let alone fix.

I'll try to get this information and report back.  Or perhaps they can join this conversation (I gave them a pointer).

> I'd be surprised if the spinlock is the actual culprit. AFAIK, kernel-level locks hand off to the scheduler if they spin too long; on multiprocessor machines, this is usually more efficient than immediately yielding and causing an expensive context switch. It's possible that you're seeing an unfortunate timing "resonance" that might go away with either more *or* fewer cores being available. The behaviour is really hard to predict.

Note: they told me that RHEL support was engaged and identified the culprit as SVN mutex locks being translated into spin locks at the OS level.
They also pointed out that Apache itself has already worked around this in better ways, but because this lock is buried deep in mod_dav_svn/svnserve, the Apache workarounds won't help.

Again, I'll see what I can obtain in terms of stack tracebacks, etc.

Cheers.

Doug
 


Re: Better choice for Linux semaphore than spinlock?

Doug Robinson
In reply to this post by Ruediger Pluem
Rüdiger:

On Mon, Oct 7, 2019 at 3:51 PM Ruediger Pluem <[hidden email]> wrote:
> On 10/07/2019 08:40 PM, Branko Čibej wrote:
>> On Mon, 7 Oct 2019, 19:47 Doug Robinson, <[hidden email] <mailto:[hidden email]>> wrote:
>>
>>     Folks:
>>
>>     I spoke with this user late last week.  They stated that they can only get approximately 400 parallel SVN operations
>>     before the "system time" consumes all available CPU for an 8-core machine.  Adding more cores won't help because of
>>     the nature of spin locks (it makes things worse).  Turns out that even with ~100 parallel SVN operations the "system
>>     time" starts becoming significant/measurable (~10%).  Both HTTP (mod_dav_svn) and "svnserve" protocols participate
>>     in the lock contention.
>>
>>     Your help would be greatly appreciated.
>>
>> Whew. So. Reducing this issue to "use a more efficient lock" is not going to work, and you provided far too little
>> information to even attempt a diagnosis. For starters, I recommend gathering as much info as possible (anonymised of
>> course) about the server configuration, everything from httpd and svnserve to the repository config and underlying
>> filesystem, if possible. Getting stack traces of the "stuck" threads would be necessary, too. Without knowing exactly
>> what is happening, these kinds of problems are extremely hard to understand, let alone fix.
>
> Plus, depending on which part of the code requires this lock, a different locking mechanism that suits this use case
> better can possibly be chosen via configuration changes (e.g. httpd allows this for most of its locking).

That would be awesome!  I'll definitely try to get those stack tracebacks.

Cheers.

Doug

Re: Better choice for Linux semaphore than spinlock?

eponymous alias
Perhaps these links might be of help in some way:

https://webkit.org/blog/6161/locking-in-webkit/
https://blog.mozilla.org/nfroyd/2017/03/29/on-mutex-performance-part-1/
https://preshing.com/20111118/locks-arent-slow-lock-contention-is/
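The third link's point (locks aren't slow; lock contention is) suggests one common general remedy: split one hot lock into several so that unrelated operations stop queuing behind each other. Below is a minimal lock-striping sketch in C. Whether any Subversion lock could be partitioned this way is not established in this thread, and `striped_counter`, `NSTRIPES` and the helper functions are hypothetical names.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stripe count; a power of two lets us select with a bit mask. */
#define NSTRIPES 16

/* One counter split across NSTRIPES independently locked slots, so threads
   updating different keys usually take different locks. (Padding slots onto
   separate cache lines would further cut false sharing; omitted here.) */
struct striped_counter {
    pthread_mutex_t locks[NSTRIPES];
    long counts[NSTRIPES];
};

/* Returns 0 on success or a pthread error code. */
int striped_init(struct striped_counter *c)
{
    for (int i = 0; i < NSTRIPES; i++) {
        int rc = pthread_mutex_init(&c->locks[i], NULL);
        if (rc != 0) {
            while (--i >= 0)        /* undo the locks already created */
                pthread_mutex_destroy(&c->locks[i]);
            return rc;
        }
        c->counts[i] = 0;
    }
    return 0;
}

/* Only the stripe selected by the key is locked. */
void striped_add(struct striped_counter *c, uint32_t key, long delta)
{
    unsigned s = key & (NSTRIPES - 1);
    pthread_mutex_lock(&c->locks[s]);
    c->counts[s] += delta;
    pthread_mutex_unlock(&c->locks[s]);
}

/* Sums all stripes one lock at a time; not an atomic snapshot if writers
   are running concurrently. */
long striped_total(struct striped_counter *c)
{
    long total = 0;
    for (int i = 0; i < NSTRIPES; i++) {
        pthread_mutex_lock(&c->locks[i]);
        total += c->counts[i];
        pthread_mutex_unlock(&c->locks[i]);
    }
    return total;
}
```

Because `striped_add` locks only one of `NSTRIPES` mutexes, two threads collide only when their keys map to the same stripe; the trade-off is that reading the total costs one lock acquisition per stripe.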

On Monday, October 7, 2019, 1:56:14 PM PDT, Doug Robinson <[hidden email]> wrote:

> That would be awesome! I'll definitely try to get those stack tracebacks.