Subject: fsync performance hit on 1.6.1
To: None <tech-kern@netbsd.org>
From: Daniel Brewer <danielb@cat.co.za>
List: tech-kern
Date: 07/02/2003 14:18:15
This is a multi-part message in MIME format.
------=_NextPart_000_010C_01C340A4.C72041A0
Content-Type: text/plain;
charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Hi
When comparing our software's CPU utilization running on 151 & on 161, I =
noticed an inexplicably high usage on 161. After digging deeper with =
gprof, I found that an fsync on 161 takes significantly longer than on =
151. Our software writes captured images into a ring buffer in a memory =
file-system, so other servers can retrieve them. Could someone explain =
the why fsync on 1.6.1 is so significantly slower than on 1.5.1? Is =
there a work-around? Or perhaps a completely different way of doing =
this?
The process involved after capture, is as follows:
flock(fdPostBuf, LOCK_EX)
lseek(fdPostBuf, img_offset, SEEK_SET)
write(fdPostBuf, img_buf, img_size)
fsync(fdPostBuf)
flock(fdPostBuf, LOCK_UN)
Both the 151 and 161 system are running the same processor (1.2GHz =
Celeron) both have the same motherboard, have 128MB ram and are both =
running Western Digital 20Gig drives. Running the 161 box on a 2GHz =
celeron with 256MB ram gives comparable results to the 1.2Ghz Celeron =
with 128MB ram.
Here's an extract from the output of gprof on 1.5.1 :
% cumulative self self total =20
time seconds seconds calls ms/call ms/call name =20
1.95 43.18 0.93 16364 0.06 0.06 write
0.36 45.99 0.17 8025 0.02 0.02 fsync
0.29 46.56 0.14 16052 0.01 0.01 flock
0.04 47.49 0.02 8025 0.00 0.27 =
WritePostEventDisk(int, int, int, int)
Here's the same extract from the output of gprof on 1.6.1 :
% cumulative self self total =20
time seconds seconds calls ms/call ms/call name =20
60.10 23.15 23.15 8067 2.87 2.87 fsync
5.87 34.52 2.26 16337 0.14 0.14 write
0.42 36.76 0.16 16136 0.01 0.01 flock
0.16 37.76 0.06 8067 0.01 3.30 =
WritePostEventDisk(int, int, int, int)
Thanks
-Daniel
------=_NextPart_000_010C_01C340A4.C72041A0
Content-Type: text/html;
charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META content=3D"text/html; charset=3Diso-8859-1" =
http-equiv=3DContent-Type>
<META content=3D"MSHTML 5.00.3315.2870" name=3DGENERATOR>
<STYLE></STYLE>
</HEAD>
<BODY background=3D"" bgColor=3D#ffffff>
<DIV><FONT face=3DArial size=3D2>Hi<BR><BR>When comparing our software's =
CPU=20
utilization running on 151 & on 161, I noticed an inexplicably high =
usage on=20
161. After digging deeper with gprof, I found that an fsync on 161 takes =
significantly longer than on 151. Our software writes captured images =
into a=20
ring buffer in a memory file-system, so other servers can retrieve them. =
Could=20
someone explain the why fsync on 1.6.1 is so significantly slower =
than on=20
1.5.1? Is there a work-around? Or perhaps a completely different way of =
doing=20
this?</FONT></DIV>
<DIV><FONT face=3DArial size=3D2></FONT> </DIV>
<DIV><FONT face=3DArial size=3D2>The process involved after capture, is =
as=20
follows:</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=3DArial size=3D2>flock(fdPostBuf, LOCK_EX)</FONT></DIV>
<DIV><FONT face=3DArial size=3D2>lseek(fdPostBuf, img_offset, =
SEEK_SET)</FONT></DIV>
<DIV><FONT face=3DArial size=3D2>write(fdPostBuf, img_buf, =
img_size)</FONT></DIV>
<DIV><FONT face=3DArial size=3D2>fsync(fdPostBuf)</FONT></DIV>
<DIV><FONT face=3DArial size=3D2>flock(fdPostBuf, LOCK_UN)</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=3DArial size=3D2>Both the 151 and 161 system are running =
the same=20
processor (1.2GHz Celeron) both have the same motherboard, have 128MB =
ram and=20
are both running Western Digital 20Gig drives. Running the 161 box on a =
2GHz=20
celeron with 256MB ram gives comparable results to the 1.2Ghz Celeron =
with 128MB=20
ram.</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=3DArial size=3D2>Here's an extract from the output of =
gprof on 1.5.1=20
:</FONT></DIV>
<DIV><FONT face=3DArial size=3D2> % =
cumulative =20
self &nb=
sp; =20
self =20
total =20
<BR> time seconds seconds =
calls ms/call ms/call name </FONT></DIV>
<DIV><FONT face=3DArial size=3D2> 1.95 =20
43.18 0.93 =20
16364 0.06 0.06 =20
write</FONT></DIV>
<DIV><FONT face=3DArial size=3D2> 0.36 =20
45.99 0.17 =20
8025 0.02 0.02 =20
fsync</FONT></DIV>
<DIV><FONT face=3DArial size=3D2> 0.29 =20
46.56 0.14 =20
16052 0.01 0.01 =20
flock</FONT></DIV>
<DIV><FONT face=3DArial size=3D2> 0.04 =20
47.49 0.02 =20
8025 0.00 0.27 =20
WritePostEventDisk(int, int, int, int)</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=3DArial size=3D2>Here's the same extract from the output =
of gprof on=20
1.6.1 :</FONT></DIV>
<DIV><FONT face=3DArial size=3D2> % =
cumulative =20
self &nb=
sp; =20
self =20
total =20
<BR> time seconds seconds =
calls ms/call ms/call name </FONT></DIV>
<DIV><FONT face=3DArial size=3D2> 60.10 =20
23.15 23.15 =20
8067 2.87 2.87 =20
fsync</FONT></DIV>
<DIV><FONT face=3DArial size=3D2> 5.87 =20
34.52 2.26 =20
16337 0.14 0.14 =20
write</FONT></DIV>
<DIV><FONT face=3DArial size=3D2> 0.42 =20
36.76 0.16 =20
16136 0.01 0.01 =20
flock</FONT></DIV>
<DIV><FONT face=3DArial size=3D2> 0.16 =20
37.76 0.06 =20
8067 0.01 3.30 =20
WritePostEventDisk(int, int, int, int)</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=3DArial size=3D2>Thanks</FONT></DIV>
<DIV><FONT face=3DArial size=3D2>-Daniel</FONT></DIV></BODY></HTML>
------=_NextPart_000_010C_01C340A4.C72041A0--