pkgsrc-Changes archive


CVS commit: pkgsrc/textproc/p5-Text-Unidecode



Module Name:    pkgsrc
Committed By:   wen
Date:           Thu Feb 18 03:38:36 UTC 2016

Modified Files:
        pkgsrc/textproc/p5-Text-Unidecode: Makefile distinfo

Log Message:
Update to 1.27

Upstream changes:
2015-10-21   Sean M. Burke  sburke%cpan.org@localhost
        * RELEASE 1.27.  (Stable.)
        The release, 1.25_01, didn't blow up, so this is just
        a re-release of it as a normal ("stable") version.
        * Minor changes to the documentation.  Nothing substantial.
        * Release 1.26 had a confusing mistake in the ChangeLog.
        Ignore v1.26.

2015-10-21   Sean M. Burke  sburke%cpan.org@localhost
        * RELEASE 1.26.  Mistake.  See above for change notes
        between v1.25_01 and v1.27.

2015-10-16   Sean M. Burke  sburke%cpan.org@localhost
        * RELEASE 1.25_01.
        * !DEVELOPER RELEASE!, OH GOD HELP US ALL!

        * Here's a new thing that makes me nervous and hesitant, and that I've
        been talking myself into for weeks:

          **************************************************************
          *  I've switched to accepting values in the range 0x80-0x9F  *
          *  as if they are the Windows-1252 ("ANSI") characters.      *
          **************************************************************

        Previously they had all mapped to emptystring.

        Technically, Unicode specifies those codepoints as control characters
        that I've never heard of, "C1 Controls"...
          ...
          U+0087 ESA - End of Selected Area
          U+0088 HTS - Character (Horizontal) Tabulation Set
          U+0089 HTJ - Character (Horizontal) Tabulation with Justification
          ...
        ( See "C1" in https://en.wikipedia.org/wiki/C0_and_C1_control_codes )

        And Unidecode mapped all of those to emptystring.  Now they are treated
        as if you fed the Windows-1252 characters, as that is an extremely
        common thing to have happen.

        So if you feed character value 0x80 to it, it is taken to mean "€"
        (which Unidecode then decodes as "EUR", at the moment at least).
        (This doesn't interfere with the fact that U+20AC is the proper
        Unicode place for the "€" to be found.)

        And the smartquotes at 0x91 to 0x94, ‘ ’ “ ” turn into ' ' " " so yaaaay!

        Note that in theory, according to C1 Controls, 0x85 is "NEL: Next
        Line", "Equivalent to CR+LF. Used to mark end-of-line on some IBM
        mainframes."
        I could map this to \n or \r\n or whatever, but I've never seen 0x85 in
        use in the wild, and I never heard anyone complain about my not having
        mapped it to "\n" in all the Unidecode versions since the first, in 2001.
        So instead, Unidecode takes 0x85 as its Windows-1252 value, the
        ellipsis "…" which of course it Unidecodes as "..."

        I'm not thrilled with the idea of going off spec but I think this
        should be okay, and it has massive DWIM value.
        Let's hope I'm not dividing Unicode times infinity by zero and then the
        whole universe will disa

        That's why I'm making this a developer release.  Unless anything
        besplodes by November 1st, I'll re-issue this as a stable release.
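
The 0x80-0x9F remapping described in the 1.25_01 notes above can be
illustrated with a short sketch. This is not Text::Unidecode's code (the
module itself is Perl); it is a Python approximation under stated
assumptions. The function names c1_as_cp1252 and to_ascii and the small
ASCII_FALLBACK table are hypothetical, the table covers only the characters
the changelog mentions, and dropping byte values that Windows-1252 leaves
undefined (such as 0x81) is an assumption, not documented upstream behaviour.

```python
# Hypothetical sketch of the behaviour described in the changelog;
# NOT the actual Text::Unidecode implementation.

def c1_as_cp1252(text: str) -> str:
    """Reinterpret code points U+0080..U+009F as Windows-1252 characters."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x80 <= cp <= 0x9F:
            try:
                # e.g. 0x80 -> U+20AC (euro sign), 0x85 -> U+2026 (ellipsis)
                out.append(bytes([cp]).decode("cp1252"))
            except UnicodeDecodeError:
                # Assumption: values Windows-1252 leaves undefined are dropped.
                pass
        else:
            out.append(ch)
    return "".join(out)


# Illustrative table: only the replacements named in the changelog.
ASCII_FALLBACK = {
    "\u20ac": "EUR",   # euro sign
    "\u2018": "'",     # left single quote  (0x91)
    "\u2019": "'",     # right single quote (0x92)
    "\u201c": '"',     # left double quote  (0x93)
    "\u201d": '"',     # right double quote (0x94)
    "\u2026": "...",   # horizontal ellipsis (0x85)
}


def to_ascii(text: str) -> str:
    """Apply the C1 remapping, then the illustrative ASCII replacements.

    Characters outside the table are passed through unchanged.
    """
    return "".join(ASCII_FALLBACK.get(ch, ch) for ch in c1_as_cp1252(text))
```

With this sketch, to_ascii("\x80") and to_ascii("\u20ac") both yield "EUR",
which mirrors the changelog's point that the remapping does not interfere
with U+20AC being the proper Unicode code point for the euro sign.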


To generate a diff of this commit:
cvs rdiff -u -r1.14 -r1.15 pkgsrc/textproc/p5-Text-Unidecode/Makefile
cvs rdiff -u -r1.6 -r1.7 pkgsrc/textproc/p5-Text-Unidecode/distinfo

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.



