tech-userlevel archive


Re: perl 5.22, spamassassin and .db



[added tech-userlevel as it may be a libc/db issue]

On Thu, Oct 29, 2015 at 10:19:52AM +0100, Manuel Bouyer wrote:
> On Wed, Oct 28, 2015 at 10:11:25PM +0100, Manuel Bouyer wrote:
> > Hello,
> > since the upgrade to perl 5.22, I can't run spamassassin with the
> > bayes filter enabled any more (on NetBSD 7.0/amd64, in case it
> > matters). The problem is that the bayes_toks file
> > becomes ridiculously large (several 100s of GB), eventually filling up the
> > disk after a few days (it used to be a few MBs).
> > the file is reported to be
> > bayes_toks: Berkeley DB 1.85 (Hash, version 2, native byte-order)
> > 
> > has anyone else seen this ?
> > I'm running spamassassin from procmail (no spamd).
> 
> After running with bayes enabled for half an hour (28 mails processed)
> I have:
> >ls -lh bayes_*
> -rw-------  1 bouyer  wheel  4.7M Oct 29 10:11 bayes_seen
> -rw-------  1 bouyer  wheel  4.7G Oct 29 10:11 bayes_toks
> >du -h bayes*
> 4.5M    bayes_seen
> 1.4G    bayes_toks
> 
> from a previous attempt, the file bayes_toks did grow to 64TB (!) and
> then spamassassin stopped working (this may be the size limit of the
> filesystem).
> 
> I did have problems with perl 5.20 too, so it may not be an issue with perl,
> but with NetBSD 7.0.
> 
> bayes_seen looks good:
> 
> db hash bayes_seen |wc
>       26      26    1404
> 
> but bayes_toks looks corrupted:
> >db hash bayes_toks|wc
> db: Error dumping database: Invalid argument
>      228     453    2181
> 
> any idea ?

I investigated this, and it seems very subtle.
I tested on 2 amd64 hosts, one on bare metal and the second one being a
Xen domU. Both hosts have the exact same userland version (I even checked the
libc md5 to make sure, and reinstalled perl and spamassassin from the
same 2015Q3 repo). Feeding a mailbox with 22 messages to spamassassin
reproduces the problem on the bare-metal host but not on the Xen host.

On the Xen host, I consistently get a bayes_toks file with size 65536.
On the bare-metal host, each run gives a different size:
322174976, 321978368, 322109440.

Any idea on how to debug this further is welcome.
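
One check that might help when comparing the two hosts: in the quoted output
above, ls reports 4.7G for bayes_toks while du reports 1.4G, which suggests
the file is sparse (large holes rather than real data). The following is
only a minimal sketch, assuming Python is available on both hosts; the
function names size_report and looks_sparse are made up for illustration.
It prints the apparent size and the allocated size of each file so the
results from the bare-metal and Xen hosts can be compared after each run.

```python
import os
import sys


def size_report(path):
    """Return (apparent_bytes, allocated_bytes) for path."""
    st = os.stat(path)
    # st_size is what ls -l shows; st_blocks counts 512-byte units
    # actually allocated on disk, which is what du measures.
    return st.st_size, st.st_blocks * 512


def looks_sparse(path):
    """True if fewer bytes are allocated than the file's apparent size."""
    apparent, allocated = size_report(path)
    return allocated < apparent


if __name__ == "__main__":
    # Hypothetical usage: python sizecheck.py bayes_toks bayes_seen
    for path in sys.argv[1:]:
        apparent, allocated = size_report(path)
        print(f"{path}: apparent={apparent} allocated={allocated} "
              f"sparse={allocated < apparent}")
```

If bayes_toks shows up as sparse only on the bare-metal host, that would
point at writes landing at very large offsets rather than at a large amount
of real data being written, though this is a guess and not a diagnosis.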


-- 
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
     NetBSD: 26 years of experience will always make the difference