[PATCH] Reduce the amount of WriteFile() syscalls when writing to buffered files

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

[PATCH] Reduce the amount of WriteFile() syscalls when writing to buffered files

Evgeny Kotkov
Hi everyone,

Currently, apr_file_write() can cause an excessive amount of syscalls
for buffered files on Windows in some situations.  This happens because
for buffered files, writing is implemented with a loop that keeps copying
the data to the internal 4KB buffer and writing the contents of this
buffer to disk with WriteFile().  In other words, performing a 100 KB
write currently results in 25 consecutive WriteFile(4096) syscalls.

This patch series reduces the amount of syscalls in such situations by
performing a single WriteFile() call without any buffering, if possible.
If some data is already buffered, then the buffer is first filled, flushed
and the remaining part of the data is written with a single WriteFile().

My measurements indicate that writing in larger chunks and avoiding
such syscalls can save a certain part of the CPU time.  For example,
"svn import" does both small and large buffered writes with the following
pattern:

    write: nbytes = 9
    write: nbytes = 5
    write: nbytes = 86267
    write: nbytes = 9
    write: nbytes = 5
    write: nbytes = 86413
    write: nbytes = 9
    write: nbytes = 5
    write: nbytes = 86415
    write: nbytes = 9
    write: nbytes = 5
    write: nbytes = 67680
    ...

When I test "svn import" for a large file before and after this patch, I
see that

  - the amount of syscalls decreases from 199,403 to 17,061 and

  - the overall CPU time decreases from 7.28 s to 6.68 s.  (I think that's
    a lot, considering that "svn import" does a many other CPU-intensive
    operations.)

The implementation is split into three dependent patches.  The first two
patches lay the necessary groundwork by factoring out a couple of helper
functions.  The third patch is the core change that implements the
described optimization.

The log messages are included in the beginning of each patch file.


Thanks,
Evgeny Kotkov

1-factor-out-write-helper-v1.patch.txt (3K) Download Attachment
2-factor-out-write-buffered-v1.patch.txt (4K) Download Attachment
3-reduce-syscalls-for-buffered-writes-v1.patch.txt (12K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

RE: [PATCH] Reduce the amount of WriteFile() syscalls when writing to buffered files

Markus Schaber
Hi, Evgeny,

Great work, IMHO.

From: Evgeny Kotkov [mailto:[hidden email]]
 
> This patch series reduces the amount of syscalls in such situations by
> performing a single WriteFile() call without any buffering, if possible.
> If some data is already buffered, then the buffer is first filled, flushed
> and the remaining part of the data is written with a single WriteFile().

Why first refilling the buffer, instead of flushing the partial buffer content, and then the original data to write?

From my first glance, it looks like the number of syscalls should be the same, and we could skip the mem copy?


Mit freundlichen Grüßen

Markus Schaber

CODESYS® Eine Marke der 3S-Smart Software Solutions GmbH

Inspiring Automation Solutions

3S-Smart Software Solutions GmbH
Dipl.-Inf. Markus Schaber | Product Development Core Technology
Memminger Str. 151 | 87439 Kempten
Tel. +49-831-54031-979 | Fax +49-831-54031-50

E-Mail: [hidden email] | Web: http://www.codesys.com | CODESYS store: http://store.codesys.com
CODESYS forum: http://forum.codesys.com

Geschäftsführer: Dipl.Inf. Dieter Hess, Dipl.Inf. Manfred Werner | Handelsregister: Kempten HRB 6186 | USt-IDNr.: DE 167014915

Diese E-Mail enthält vertrauliche und/oder rechtlich geschützte Informationen. Wenn Sie nicht der richtige Adressat sind
oder diese E-Mail irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Mail.
Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Mail ist nicht gestattet.
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Reduce the amount of WriteFile() syscalls when writing to buffered files

Evgeny Kotkov
Markus Schaber <[hidden email]> writes:

> Hi, Evgeny,
>
> Great work, IMHO.

Thank you :-)

>> This patch series reduces the amount of syscalls in such situations by
>> performing a single WriteFile() call without any buffering, if possible.
>> If some data is already buffered, then the buffer is first filled, flushed
>> and the remaining part of the data is written with a single WriteFile().
>
> Why first refilling the buffer, instead of flushing the partial buffer
> content, and then the original data to write?
>
> From my first glance, it looks like the number of syscalls should be
> the same, and we could skip the mem copy?

My original intention here was to avoid issuing small and non-4K-sized
writes, which could happen if we don't refill the buffer.

For instance, in the example above doing so would result in

    WriteFile(13)
    WriteFile(86267)
    WriteFile(13)
    WriteFile(86400)
    ...

instead of

    WriteFile(4096)
    WriteFile(82185)
    WriteFile(4096)
    WriteFile(82317)
    ...

But I should also say that I ran a couple of tests and I can't see any
measurable performance difference between these two approaches, so it
might indeed make sense to switch to the first of them and avoid a memcpy().

In the meanwhile, apparently, there is an oversight in the core V1 patch
(3-reduce-syscalls-for-buffered-writes-v1.patch.txt):

If the buffer is not empty, and the caller issues a write with a chunk
that slightly exceeds the buffer size, for example, 4100 bytes, it would
result in flushing the buffer _and_ writing the remaining couple of bytes
with WriteFile().  An appropriate thing to do here would be to flush the
buffer, but put the few remaining bytes into the buffer, instead of writing
them to disk and thus making an unnecessary syscall.

With all this in mind, I will prepare the V2 version of the core patch.


Thanks,
Evgeny Kotkov
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Reduce the amount of WriteFile() syscalls when writing to buffered files

Evgeny Kotkov
Evgeny Kotkov <[hidden email]> writes:

> In the meanwhile, apparently, there is an oversight in the core V1 patch
> (3-reduce-syscalls-for-buffered-writes-v1.patch.txt):
>
> If the buffer is not empty, and the caller issues a write with a chunk
> that slightly exceeds the buffer size, for example, 4100 bytes, it would
> result in flushing the buffer _and_ writing the remaining couple of bytes
> with WriteFile().  An appropriate thing to do here would be to flush the
> buffer, but put the few remaining bytes into the buffer, instead of writing
> them to disk and thus making an unnecessary syscall.
>
> With all this in mind, I will prepare the V2 version of the core patch.
I have attached the V2 version of the core patch.

To solve the aforementioned oversight in the V1 patch, I implemented the
optimization in a slightly different manner, by keeping the existing loop
but also handling a condition where we can write the remaining data with
a single WriteFile() call.  Apart from this, the V2 patch also includes an
additional test, test_small_and_large_writes_buffered().

Speaking of the discussed idea of adjusting the strategy to avoid a memcpy()
with the write pattern like

    WriteFile(13)
    WriteFile(86267)
    ...

instead of

    WriteFile(4096)
    WriteFile(82185)
    ...

I preferred to keep the latter approach which keeps the minimum size of
the WriteFile() chunk equal to 4096, for the following reasons:

  - My benchmarks do not show that the alternative pattern that also avoids
    a memcpy() results in a quantifiable performance improvement.

  - The existing code had a property that all WriteFile() calls, except
    for maybe the last one, were performed in chunks with sizes that are
    never less than 4096.  Although I can't reproduce this in my environment,
    it could be that introducing a possibility of writing in smaller chunks
    would result in the decreased performance with specific hardware, non-
    file handles or in situations when the OS write caching is disabled.
    Therefore, currently, I think that it would be better to keep this
    existing property.

  - Probably, implementing the first approach would result in a bit more
    complex code, as I think that it would require introducing an additional
    if-else code path.

The log message is included in the beginning of the patch file.


Thanks,
Evgeny Kotkov

3-reduce-syscalls-for-buffered-writes-v2.patch.txt (14K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Reduce the amount of WriteFile() syscalls when writing to buffered files

Ivan Zhakov-2
In reply to this post by Evgeny Kotkov
On 21 August 2017 at 18:45, Evgeny Kotkov <[hidden email]> wrote:
> Hi everyone,
>
[..]

> The implementation is split into three dependent patches.  The first two
> patches lay the necessary groundwork by factoring out a couple of helper
> functions.
Refactoring patches committed in r1806299 and r1806301.

> The third patch is the core change that implements the
> described optimization.
>
Review of core changes
(3-reduce-syscalls-for-buffered-writes-v2.patch.txt ) is in progress.

--
Ivan Zhakov
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Reduce the amount of WriteFile() syscalls when writing to buffered files

Ivan Zhakov-2
In reply to this post by Evgeny Kotkov
On 23 August 2017 at 19:47, Evgeny Kotkov <[hidden email]> wrote:

> Evgeny Kotkov <[hidden email]> writes:
>
>> In the meanwhile, apparently, there is an oversight in the core V1 patch
>> (3-reduce-syscalls-for-buffered-writes-v1.patch.txt):
>>
>> If the buffer is not empty, and the caller issues a write with a chunk
>> that slightly exceeds the buffer size, for example, 4100 bytes, it would
>> result in flushing the buffer _and_ writing the remaining couple of bytes
>> with WriteFile().  An appropriate thing to do here would be to flush the
>> buffer, but put the few remaining bytes into the buffer, instead of writing
>> them to disk and thus making an unnecessary syscall.
>>
>> With all this in mind, I will prepare the V2 version of the core patch.
>
> I have attached the V2 version of the core patch.
>
> To solve the aforementioned oversight in the V1 patch, I implemented the
> optimization in a slightly different manner, by keeping the existing loop
> but also handling a condition where we can write the remaining data with
> a single WriteFile() call.  Apart from this, the V2 patch also includes an
> additional test, test_small_and_large_writes_buffered().
>
> Speaking of the discussed idea of adjusting the strategy to avoid a memcpy()
> with the write pattern like
>
>     WriteFile(13)
>     WriteFile(86267)
>     ...
>
> instead of
>
>     WriteFile(4096)
>     WriteFile(82185)
>     ...
>
> I preferred to keep the latter approach which keeps the minimum size of
> the WriteFile() chunk equal to 4096, for the following reasons:
>
>   - My benchmarks do not show that the alternative pattern that also avoids
>     a memcpy() results in a quantifiable performance improvement.
>
>   - The existing code had a property that all WriteFile() calls, except
>     for maybe the last one, were performed in chunks with sizes that are
>     never less than 4096.  Although I can't reproduce this in my environment,
>     it could be that introducing a possibility of writing in smaller chunks
>     would result in the decreased performance with specific hardware, non-
>     file handles or in situations when the OS write caching is disabled.
>     Therefore, currently, I think that it would be better to keep this
>     existing property.
>
>   - Probably, implementing the first approach would result in a bit more
>     complex code, as I think that it would require introducing an additional
>     if-else code path.
>
> The log message is included in the beginning of the patch file.
>
Committed 3-reduce-syscalls-for-buffered-writes-v2.patch.txt patch in
r1806308. Thanks!

--
Ivan Zhakov