pkgsrc-Bugs archive
PR/43388 CVS commit: pkgsrc/biology/phylip
The following reply was made to PR pkg/43388; it has been noted by GNATS.
From: OBATA Akio <obache%netbsd.org@localhost>
To: gnats-bugs%gnats.NetBSD.org@localhost
Cc:
Subject: PR/43388 CVS commit: pkgsrc/biology/phylip
Date: Sat, 10 Jul 2010 11:26:33 +0000
Module Name: pkgsrc
Committed By: obache
Date: Sat Jul 10 11:26:32 UTC 2010
Modified Files:
pkgsrc/biology/phylip: Makefile PLIST distinfo
pkgsrc/biology/phylip/patches: patch-aa
Log Message:
Update phylip to 3.69.
Based on PR#43388 by Wen Heping.
version 3.69 (September, 2009)
* If there are more than about 50 species in the tree, Treedist can
fail to compute distances among the trees. This is due to an overflow
problem inadvertently introduced in version 3.68. There is no
workaround with the 3.68 executable, but if you can recompile you can
fix it by replacing line 1179 of treedist.c, which is currently
maxgrp = pow(2,tip_count);
by
maxgrp = 100000;
This is fixed in version 3.69. Versions prior to 3.68 will not have
this problem.
* In Dnacomp, Pars, and Dollop, if the Shimodaira-Hasegawa test is
performed and there are trees perfectly tied with the best tree, the
P values were incorrect (being 0 instead of 1).
* A team from Iowa State University noticed that Dnapenny was wasting
time in its bound calculations. This has now been remedied, and the
program should be noticeably faster.
* In the molecular likelihood programs, ancestral state probabilities
were being incorrectly calculated for user trees that had internal
multifurcations. This has been corrected.
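The Treedist overflow in the first 3.69 item above can be illustrated with a minimal sketch. It assumes that the group count is held in an integer type, so that 2 raised to tip_count stops fitting once the number of species gets large. The helper name capped_group_count and its cap parameter are hypothetical and do not appear in treedist.c; the 3.69 sources may well handle this differently.

```c
/* Hypothetical helper, not taken from treedist.c: returns 2^tip_count
   clamped to cap, using only integer arithmetic so that the
   intermediate value can never overflow. Assumes cap >= 1. */
long capped_group_count(int tip_count, long cap)
{
    long groups = 1;
    int i;

    for (i = 0; i < tip_count; i++) {
        if (groups > cap / 2)   /* doubling again would exceed cap */
            return cap;
        groups *= 2;
    }
    return groups < cap ? groups : cap;
}
```

With a cap of 100000, as in the workaround quoted above, small trees still get an exact power of two and large trees fall back to the cap.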
version 3.68 (August, 2008)
* We received some reports that Dnaml was freezing on some data sets in
the Windows executables. This seems to have been because of incorrect
handling of small increases in the log-likelihood, causing the
algorithm to fall into loops. It was temporarily cured in version 3.67
by changing the compiler optimization level, downwards from -O3 to
-O1. Now the underlying problem of small differences of log-likelihood
has been addressed too, so you should use the new Windows executables
(3.68) to avoid having these problems on Windows systems.
* We found that the .DMG (disk image) archive for Mac OS X contained
executables for the Intel Mac but not universal binaries that would
work on both Intel Mac and PowerPC systems. Oops. We recompiled and
reposted the archives (on 23 August 2007). They should work on both
kinds of systems now.
* We were told that on a Linux computer with a 64-bit Intel Itanium chip
the bootstrapping program Seqboot creates blatantly wrong bootstrap
samples with characters sampled too many times (or none). On a 64-bit
AMD processor the program works fine. The problem is in the random
number function "randum" in phylip.c. It seems to be a problem with
optimization on the GCC compiler. It is cured by dropping the compiler
optimization level from -O3 to -O2.
* In Protdist the program would blow up if it computes a distance
greater than 100.0. This is owing to a subscript error in the code
that writes out the distances, in line 1874 where
else if (d[j][k] < 1000.0)
should have been
else if (d[i][j-1] < 1000.0)
If you have this problem and cannot upgrade to version 3.68 or
recompile the program with this change, and your data comes from
bootstrapping, try omitting just that replicate, or else rerunning
the bootstrapping with a different random number seed (which might not
happen to drop as many of the sites that caused these two sequences to
be so distant).
* When Dnadist is used and the lower-triangular output format is chosen,
the resulting file has headers at the top of columns and is
human-readable but not machine-readable. The (temporary) solution is not
to use this option for the time being.
* In Mac OS X, Drawgram produces some alarming lines of text at the top
of its terminal window when it first runs. These are just scripting
commands that were not erased because we do not clear the screen at
the right moment. The workaround is simply to ignore these commands.
version 3.67 (July, 2007)
* We had our first reports on the behavior of PHYLIP Windows executables
on Windows Vista. The programs work fine. The only thing that did not
work is the self-extraction program that unpacks the archives. For
some reason it did not work on Vista. The work-around was that, after
you got an archive file like phylipwx.exe onto your system, you had to
change the file extension from "exe" to "zip". Then you had to click
on the file. You were presented with options including "Extract all
files". If you chose that, the archive was unpacked. The programs would
then work. Although we provided "zip" archive versions of the package,
we have now got a new version of WinZip which is supposed to have a
self-extractor that works on Windows Vista, and it was used to produce
the self-extracting archive since 27 August 2007.
* On Mac OS X systems, if our distributed executables are placed in a
folder whose path contains a name with an internal blank, such as
/Users/ianr/the files/ then the script that causes each of our
programs to run when you click on the corresponding icon does not
work, and there is an error message. This is a scripting error in our
Mac OS X setup, and it was corrected in version 3.67. In the meantime,
if you have this problem, the solution is to put PHYLIP in a folder
whose path does not have any folder that has a blank in its name. In
the above example, all that would be necessary is to rename the folder
"the files" to "the_files".
* We are still getting reports of stickiness of the tree, and
occasionally of negative branch lengths, in Dnamlk and Promlk which
do not do as good a job of searching for best trees as they should.
This has turned out to be an issue of nodes getting stuck when they
collide in moving them on the "time" scale. Some major changes were made
to the code in the 3.67 release to eliminate this stickiness and give a
good search.
* An error was made in putting together the matrices for the PAM
mutation model in Protdist, Proml, and Promlk. These programs will
give PAM calculations inconsistent with earlier (v3.65 and before)
versions, and with other programs. The matrices were corrected in
version 3.67. This does not affect JTT or PMB models.
* The W (within-species variation) option of CONTRAST uses somewhat
incorrect equations to infer within-species covariances and
phylogenetic covariances. These were corrected in version 3.67.
Anyone severely impacted by the problem in the meantime should contact
me.
* Protdist sometimes results in distances greater than or equal to
100.000. When this happens, the distance can run together with the
previous number in the output file. For example, a distance of 0.31766
followed by one which is 127.43986 might look like this:
"0.31766127.43986". This causes trouble in any program that tries to
use this distance matrix. One symptom of this may be the program
reporting that two distances which are expected to be equal are
unequal -- but then printing them both out, and they appear to be
equal! In this case it would print out a message warning you that
0.31766 was not equal to 0.31766. It is doing so because one of them
is actually seen by it as 0.31766127 and the other 0.31766. In all
future versions, there will be a blank printed between the two
numbers. For the present, use an editor to find them and insert the
blank by hand. If this is difficult, a Sed script (which can be used
on Linux or Unix machines) has been written by Doug Scofield, and is
available from him at: this link. Many thanks to him for this. As you
can see, this problem is the result of us not thinking of what happens
when the distances are big, and the fix in the code is trivial -- just
ensuring that there is at least one blank between successive
distances.
* Contml, with gene frequencies, has a bug in the transformation to
variables that have approximate Brownian motion as their evolutionary
process. This can lead to weird trees. It might be preferable to go
back to the 3.5c version if you need to use Contml for this. We
believe that this will be correctly fixed in the 3.67 version. If
people can recompile the source code, they can replace the function
transformgfs with this one and recompile (you should be able to save
it from your browser using the Save As choice in its File menu).
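The Protdist item above, where 0.31766 followed by 127.43986 comes out as "0.31766127.43986", describes its fix as ensuring at least one blank between successive distances. A minimal sketch of that idea follows. The function name format_distance_row and the " %8.5f" format are assumptions chosen for illustration, not the code actually used in protdist.c.

```c
#include <stdio.h>

/* Hypothetical helper: write n distances into buf, each preceded by an
   explicit blank, so that a value too wide for its field (100.0 or
   more here) still cannot run into its neighbour.
   Returns the number of characters written, or -1 if buf is too small. */
int format_distance_row(char *buf, size_t size, const double *dist, int n)
{
    size_t used = 0;
    int i;

    if (size == 0)
        return -1;
    buf[0] = '\0';
    for (i = 0; i < n; i++) {
        /* The leading blank is part of the format, not of the field width. */
        int len = snprintf(buf + used, size - used, " %8.5f", dist[i]);
        if (len < 0 || (size_t)len >= size - used)
            return -1;
        used += (size_t)len;
    }
    return (int)used;
}
```

A field width on its own is only a minimum, so a nine-character value such as 127.43986 fills an eight-character field completely; the separate blank is what keeps the two numbers apart.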
version 3.66 (August, 2006)
* Program Treedist was found to compute the Branch Score Distance
incorrectly. It will, in most cases, get the branch lengths in
terminal branches incorrect and then be likely to find a nonzero
distance between trees when they are really identical, and incorrect
distances when they are not identical. Alas, there is no workaround to
avoid this. All distances done with this option before version 3.66
should be regarded as incorrect unless all terminal branches have the
same length, or unless the order of species in the tree is the same as
in the first tree in the file. The Symmetric Difference option, which
does not use branch lengths, works properly.
* Program Dnamlk, when run on Linux or Windows systems, sometimes gave
negative branch lengths for some branches on the tree. This is bad.
Although we at first thought that this was a compiler bug, it seems to
be a lack of initialization of some pointers. Program Promlk may have
the same problem, as they share code. If you have this problem you can
work around it by not using the Global menu option when running Dnamlk
(or Promlk). If you need more extensive tree search the J (Jumble)
option may be your best bet.
* On Windows (at least, on Windows XP), our executables for version 3.65
produce output files (outfile) and output tree files (outtree) that
have end-of-line characters that result in their being hard to read on
the Notepad editor. They appear as one big line. If you use the
Wordpad editor, or Microsoft Word itself, the files will be readable.
This is an end-of-line compiler setting we got wrong when compiling
the programs.
* Programs Dnaml and Proml sometimes failed to iterate branch lengths in
trees enough -- this can result in them failing to find as good a tree
as the molecular clock versions Dnamlk and Promlk, a phenomenon that
is not supposed to occur. The problem results from the iteration code
in function makenewv giving up too easily when branch lengths are very
short. The resulting branches get "stuck" at length 0 when they should
not. If you can recompile the programs, the problem can be solved by
the following changes:
o In file phylip.h change the value of the constant iterations to
8 instead of 4.
o In files dnaml.c and proml.c, change function makenewv to replace
done = fabs(y-yold) < epsilon;
by
done = fabs(y-yold) < 0.1*epsilon;
o In dnaml.c, in function makenewv, also replace
if (yold < epsilon)
yold = epsilon;
by
if (y < epsilon)
y = epsilon;
We think these fix the problem. Some more thorough fixes are
implemented in the 3.66 code.
* The Mac OS X archives (in .dmg form) appeared at first sight not to
have any executables directory in the package. This is owing to
strange placement of icons once we package the files. The OS X
executables are there -- their folder is just way down the window. Use
the scroll bar to look for them. You should be able to use the
View/Rearrange menus to make the folder icons appear in a more
reasonable place. (Or this can be done once all of the contents of the
.dmg archive are copied out to another folder).
* Programs Dnaml and Proml (but not Dnamlk or Promlk), from version 3.64
on, crashed if the Categories (C) option is used, even if all
categories are given the same rate of change. This unpleasant behavior
does not occur if the menu option for "Speedier but rougher analysis"
is changed to "No, not rough". That slows down the run but allows it
to succeed.
The fix turns out to be that all instances in dnaml.c of calls to
function copynode (or all instances in proml.c of calls to
prot_copynode) that involve an argument lrsaves should have the third
argument be rcategs instead of categs.
* In Seqboot, when menu item J is set to Permute species within
characters it is impossible to change menu item W (character weights).
This is a glitch in the menuing code. If you can change the source
code and recompile, change at line 215 of seqboot.c:
((permute || ild || lockhart)
&& (strchr("ACDEFSJPRXNI%1.20",ch) != NULL)) ||
to be:
(permute && (strchr("ACDEFSJPRWXNI%1.20",ch) != NULL)) ||
((ild || lockhart) && (strchr("ACDEFSJPRXNI%1.20",ch) != NULL)) ||
If you are stuck with our executables and need this feature, you can
also work around it in the following devious way:
1. Set menu item J to some other setting where menu item W appears
in the menu, such as Bootstrap,
2. Change menu item W
3. Then change item J to Permute species within characters
* Our Makefile for Unix had some problem finding some of the
X-windows libraries on Mac OS X systems on Intel Macs. This
prevented the compilation of Drawtree and Drawgram. You might
have had to use those two programs by using their PowerMac Mac
OS X executables. All the other programs did compile and run
correctly on Intel Macs.
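The Seqboot menu fix quoted above can be read as a small predicate: W is accepted only when permuting, while the ild and lockhart modes keep the shorter key list. The sketch below isolates that logic for illustration. The function name option_key_accepted is hypothetical, and the quoted lines are only a fragment of a larger condition in seqboot.c (they end in ||), so this is not the complete menu test.

```c
#include <string.h>

/* Hypothetical wrapper around the corrected fragment: returns nonzero
   if menu key ch is accepted in the given mode. */
int option_key_accepted(int permute, int ild, int lockhart, char ch)
{
    if (ch == '\0')   /* strchr would otherwise match the terminator */
        return 0;
    return (permute && (strchr("ACDEFSJPRWXNI%1.20", ch) != NULL)) ||
           ((ild || lockhart) && (strchr("ACDEFSJPRXNI%1.20", ch) != NULL));
}
```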
version 3.65 (August, 2005)
* Protpars sometimes gave the result "0 trees found" or else simply
hung and did not complete its run. This was a bug. The program should
always get at least one tree -- if it does not, that is a bug and not
a judgement on your data, provided the data file is in our format!
* Proml and Restml, and maybe some others, seg-faulted when run on
enough multiple data sets, as in bootstrapping. If you have a version
that has this problem and can recompile the programs, here is a fix
for Proml and Restml. In function "inputdata", replace the lines
makeweights();
if ( firstset ) alloclrsaves();
else resetlrsaves();
by
if ( !firstset ) freelrsaves();
makeweights();
alloclrsaves();
and you can also eliminate the now-unnecessary function "resetlrsaves".
(Thanks to Jacques Rougemont for this).
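The replacement lines above free the saved workspace before every data set after the first and then allocate it afresh. The entry does not say why this cures the crash; one plausible reading, assumed here, is that fresh allocation keeps the workspace matched to the current data set. The sketch below shows only that call order. The buffer, its length variable and the _sketch function names are hypothetical stand-ins, not the real PHYLIP routines.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the workspace behind the lrsaves routines. */
double *lrsaves_buf = NULL;
size_t lrsaves_len = 0;

/* Release the workspace left over from the previous data set. */
void freelrsaves_sketch(void)
{
    free(lrsaves_buf);
    lrsaves_buf = NULL;
    lrsaves_len = 0;
}

/* Allocate zeroed workspace for the current data set (sites > 0);
   returns 1 on success and 0 on failure. */
int alloclrsaves_sketch(size_t sites)
{
    lrsaves_buf = calloc(sites, sizeof *lrsaves_buf);
    if (lrsaves_buf == NULL)
        return 0;
    lrsaves_len = sites;
    return 1;
}

/* Mirrors the order of calls in the replacement lines quoted above. */
int inputdata_sketch(int firstset, size_t sites)
{
    if (!firstset)
        freelrsaves_sketch();
    /* makeweights() would be called here in the real program */
    return alloclrsaves_sketch(sites);
}
```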
version 3.64 (July, 2005)
* Treedist had trouble on Windows systems reading trees. This was due to
problems with the ftell command on CygWin. It has been fixed by having
the files read as binary files.
* Trees with branch lengths compared using Treedist may have incorrect
distances when evaluated as unrooted trees, owing to miscalculation of
branch lengths for the bottommost branches.
* Runs of Seqboot on Mac OS X systems with gene frequencies data have
shown incorrect results -- wrong numbers of loci sampled, for
example. This is due to bad code generated by the Metrowerks
Codewarrior compiler when set to higher levels of optimization (our
source code is OK). We will recompile the program at a lower level of
optimization in the next bug-fixing release. If you can follow our
compiling instructions and have this compiler, you can produce a
correctly working executable. Alternatively you can use the gcc
compiler and use our Unix Makefile to recompile this program (by
typing "make seqboot"). This is quite easy to do and all Mac OS X
releases have the gcc compiler in them -- it only needs to be
installed.
* In runs of Proml, Dnaml or Restml with user trees, if one puts in a
user tree with an internal multifurcation and asks the program to
re-estimate the branch lengths for that tree, the branch lengths in only
two of the furcs will be re-estimated if they already have branch
lengths. This is due to a bug in the function "initrav" causing it to
fail to enter one or more of the subtrees. A workaround until the next
release is as follows: Use Retree to remove all branch lengths on the
tree. The tree's branch lengths will then all be re-estimated when it
is used as a user tree.
* The example output in the Treedist documentation gives distances
computed by version 3.62 or earlier, in which the tree distance is not
square-rooted.
version 3.63 (December, 2004)
* The DNA and protein likelihood programs could have problems with
underflow if very large numbers of sequences were analyzed. Underflow
protection code was needed to make this much less likely to happen.
* A number of programs had the problem that when M (multiple data set)
runs are done, if the data sets differ in the number of characters
from data set to data set, they only allocate enough memory for the
first data set, and then can crash on subsequent, larger, data sets.
For bootstrap and permutation runs this should not be a problem, but
for jackknife runs it might be. One work-around until we fixed this
was to move the data set with the most characters to the front, so
that enough space is allocated. The programs we think had this problem
are: Clique, Dnacomp, Proml, Promlk, Protdist, Dollop, Gendist, Pars,
Restml, and Restdist.
* When the Branch Score distances are computed in program Treedist, the
sum of squares of differences between branches was not square-rooted,
as the documentation web page says it is.
* Fitch and Contml may die when asked to do Jumbling, in some cases.
* Dnaml had inconsistencies in results when branch lengths of a user
tree were estimated, and when the same numbers were provided in the
user tree.
* Trees fed into Contrast could cause trouble if they contained
unifurcations (forks with only one descendant). The program did not
complain about this, as it should have.
* End-of-line characters in input files in certain cases caused trouble
in Mac OS X (for example when the files came over from Windows).
* When printing a rooted tree out in Kitsch, the root was not placed
intermediate between its two descendants.
* The variable numtrees was sometimes used when still uninitialized in
Pars.
* Restdist had a site-aliasing bookkeeping bug that could lead to
incorrect results.
* Restml would not allow site lengths greater than 8, because an array
was of fixed size when it should have been dynamically allocated.
* The variable name howmany conflicts with predefined names in some
older Sun compilers. It will henceforth be deliberately misspelled to
avoid this.
* With larger data sets being analyzed, Proml, Promlk, Dnaml, and
Dnamlk have had to have underflow protection code installed, as
likelihoods were getting too small.
* Treedist was giving wrong answers when asked to compute all distances
between trees in two files that had unequal numbers of trees. This
was a bookkeeping error.
* The variable scanned was uninitialized in the Drawtree and Drawgram
programs, which could sometimes cause problems.
* The lack of initialization of a variable, delta, in Dnadist meant that
different results could be obtained from interactive runs than were
obtained in runs under the control of a command file.
* Dnadist was sometimes stopping when encountering sequences that had
an infinite or indeterminate distance (i.e. when the sequences were
too different or when they had no sites in common), when it should
have printed out "-1" and continued. When it was supposed to print
"-1" in some recent versions of PHYLIP it printed "1.0000" instead.
version 3.62 (September, 2004)
* The ftp link used by our "Get Me PHYLIP" page to fetch the version
3.62 Linux gzip'ed sources and documentation archive was incorrect
until recently (I hadn't updated it to fetch version 3.62). If you had
trouble fetching this archive in version 3.62, please try one more
time. It will work now.
* A number of people have found, with Fitch and with Contml, that
version 3.61 crashes on multiple Jumbling (option J) or on bootstrap
runs. This is fairly serious. It does not happen with versions of
these programs earlier than 3.6 (such as 3.6a3 or 3.573c). This
release fixes these problems.
To generate a diff of this commit:
cvs rdiff -u -r1.22 -r1.23 pkgsrc/biology/phylip/Makefile
cvs rdiff -u -r1.6 -r1.7 pkgsrc/biology/phylip/PLIST
cvs rdiff -u -r1.5 -r1.6 pkgsrc/biology/phylip/distinfo
cvs rdiff -u -r1.2 -r1.3 pkgsrc/biology/phylip/patches/patch-aa
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.