tech-userlevel archive
Re: libcodecs(3), take 4
Usually libraries neither print error messages to stderr nor exit.
I don't like the regex usage; it adds failure points by doing memory
allocation. Normalization should be done in programs, not in libraries.
You keep confusing us by adding "charset" functionality (leet this time)
to libcodecs(3), ignoring iconv(3).
I hope this library becomes more modular.
OTOH I'm not convinced of the add operation's "usefulness".
(It'd be nice if you would consider spending some more time researching
existing libraries (especially Citrus iconv and libarchive), and listening
to the opinions of "library gurus"...)
On Sat, Oct 02, 2010 at 05:11:59AM +0200, Alistair Crooks wrote:
> http://www.netbsd.org/~agc/codecs-20101001.tar.gz
>
> I think I've addressed most of the issues that were brought up, and
> soda-san said he was happy now the contentious charset transformations
> have been removed.
>
> In particular, the following changes have been made:
>
> + fixed memory management problems
> + then reactivated the "maybe constant" multiplier regexp
> + and activate the "free transformations on exit" functionality
> + fixup bin2hex transformation
> + add test for resolve and reverseresolve
> + manual page improvements
> + make codecs_valid_op() return a bool
> + document codecs_valid_op()
> + redid the way installed codecs are listed
> + add an unhexdump() transformation
> + add an example leet-speak conversion transformation
> + add a HOWTO
>
> I've done some more development on libcodecs(3), and have attached
> leet.c, a use of libcodecs(3) which shows how to add a transformation
> to the "leet" character set. Based on the wikipedia charset, probably
> not what you'd expect. There's also a HOWTO attached to this mail
> which should explain things a bit more. If it doesn't, please don't
> hesitate to complain.
>
> I'm aiming to add this to the repo at the start of next week.
>
> Regards,
> Alistair
>
> PS. To answer Yamamoto-san's point that he couldn't see any use
> cases, there are lots in the standard transformations in libcodecs(3).
> At the same time, there is a case for getting rid of the following
> programs from base:
>
> asa
> uuencode
> uudecode
> perhaps vis (with some more work)
> perhaps the digest programs
> all the od functionality that I use (with hexdump) and more (via unhexdump)
>
> as well as adding base64 and base85 encoding/decoding for free, and
> having command line access to decent randomisation and zero'd areas
> without resorting to using dd.
> 1. libcodecs(3)
>
> libcodecs(3) is a library which provides a single framework and
> interface for functions which carry out a transformation on input data
> to produce data as output.
>
> Standard transformations are provided, ranging from binary to hex and
> hexdump and undump functions, to gzip, bzip2 compression, network
> address resolution and reverse resolution, hash functions, message
> digests, and many more.
>
> This document shows how to add a new transformation, and how to use
> the transformation in code.
>
>
> 2. Writing a Transformation
>
> The signature for a transformation is as follows:
>
> int transform(const char *in, const size_t insize, const char *op, void *vp,
> size_t outsize);
>
> Bounded input data is provided as (in, insize), the operation to
> perform is given in "op", and the transformation is made on the
> input data to give output in "vp", which has room for "outsize" bytes.
> The number of characters in the output is returned from the function.
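
As a minimal sketch of a function with this signature, the following
upper-cases its input. The name upcase() is hypothetical and not part of
libcodecs(3), and returning -1 when the output area is too small is an
assumed convention; the HOWTO does not say how errors should be reported.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical example transformation (not part of libcodecs): copy the
 * input to the output, upper-casing letters.  "op" is unused here.
 * Returns the number of bytes written, excluding the terminating NUL,
 * or -1 if the output area is too small (an assumed error convention).
 */
int
upcase(const char *in, const size_t insize, const char *op, void *vp,
    size_t outsize)
{
    char *out = (char *)vp;
    size_t i;

    (void)op;
    /* need room for every input byte plus the terminating NUL */
    if (outsize == 0 || insize > outsize - 1) {
        return -1;
    }
    for (i = 0; i < insize; i++) {
        out[i] = (char)toupper((unsigned char)in[i]);
    }
    out[i] = '\0';
    return (int)i;
}
```

Checking the available space before writing anything keeps the function
from producing a truncated result that looks like a success.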
>
> For an example of such a function, please see the toleet() function
> in leet.c:
>
> /* convert alphabetic chars to the leet char set -- see above */
> int
> toleet(const char *in, const size_t insize, const char *op, void *vp,
>     size_t outsize)
> {
>     const char *cp;
>     size_t len;
>     size_t i;
>     size_t o;
>     char *out = (char *)vp;
>
>     /* no room even for the terminating NUL */
>     if (outsize == 0) {
>         return 0;
>     }
>     for (i = 0, o = 0 ; i < insize && o < outsize - 1 ; i++) {
>         if (isalpha((uint8_t)in[i])) {
>             cp = leet[tolower((uint8_t)in[i]) - 'a'];
>             len = strlen(cp);
>             /* stop rather than write past the end of the output */
>             if (len > outsize - 1 - o) {
>                 break;
>             }
>             (void) memcpy(&out[o], cp, len);
>             o += len;
>         } else {
>             out[o++] = in[i];
>         }
>     }
>     out[o] = 0x0;
>     return (int)o;
> }
>
>
> 3. Adding the Transformation
>
> The libcodecs(3) library can be instantiated many times, by using
> separate codecs_t tables to hold the transformations which can be
> made.
>
> So adding a transformation to a table is as simple as initialising the
> storage for the table, and adding the desired transformation
> function(s).
>
> codecs_t codecs;
>
> (void) memset(&codecs, 0x0, sizeof(codecs));
> codecs_add(&codecs, "leet", toleet, "500%", 1);
>
> The codecs_add() function is used to make the transformation available.
>
> The first argument is the table of transformations. Multiple tables
> can be used.
>
> The second argument is a regular expression which is used to match the
> transformation (this is useful for cases where more than one
> transformation function is available).
>
> The third argument is the transformation function itself. This function
> will get called when the transformation framework matches the regular
> expression.
>
> The fourth argument is used to size the dynamically allocated storage
> in the codecs_alloc_transform() function. It is in the format
> "percentage + constant", where the percentage is the worst-case size
> of the output as a multiple of the amount of input data, and the
> constant is an additional number of bytes.
>
> The fifth and final argument indicates whether the transformation
> function needs input. Some transformations just fill in output
> without needing any input to transform, such as randomize and zero,
> which produce random data and zeroed out data, respectively.
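
To make the "percentage + constant" sizing of the fourth argument concrete,
here is a sketch of how a worst-case output size could be derived from such
a string. The helper name worst_case_size() is hypothetical, and the exact
syntax ("500%", or "200%+16" with a constant) is an assumption; the library
matches the multiplier with a regular expression whose details are not
shown in this mail.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: given a multiplier string such as "500%" or
 * "200%+16" (assumed syntax), return the worst-case number of output
 * bytes for insize bytes of input, or 0 if the string cannot be parsed.
 */
size_t
worst_case_size(const char *multiplier, size_t insize)
{
    unsigned long percent;
    unsigned long constant = 0;
    const char *cp;
    char *end;

    percent = strtoul(multiplier, &end, 10);
    if (end == multiplier || *end != '%') {
        return 0;
    }
    end++;
    if (*end == '+') {
        cp = end + 1;
        constant = strtoul(cp, &end, 10);
        if (end == cp) {
            return 0;
        }
    }
    if (*end != '\0') {
        return 0;
    }
    /* round the percentage up so the estimate never undershoots */
    return (insize * percent + 99) / 100 + constant;
}
```

With the "500%" multiplier used for the leet transformation, 10 bytes of
input would be given 50 bytes of output under this reading.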
>
> 4. Making the Transformation
>
> To make the transformation, we need to use the codecs table to match
> up the correct transformation, and give it the data. The simplest way
> to do this is to transform the data in-place.
>
> cc = codecs_inplace_transform(&codecs, buf, strlen(buf), "leet");
>
> This is not always possible, since sometimes the input needs to be
> preserved. If this is the case, then the storage for the output can
> be allocated dynamically.
>
> cc = codecs_alloc_transform(&codecs, buf, strlen(buf),
> "leet", (void **)(void *)&out, &outsize);
>
> (sorry about the ugly casts, please blame^U)
>
> Alistair Crooks
> Fri Oct 1 07:01:26 PDT 2010
> /*-
> * Copyright (c) 2010 Alistair Crooks <agc%NetBSD.org@localhost>
> * All rights reserved.
> *
> * Redistribution and use in source and binary forms, with or without
> * modification, are permitted provided that the following conditions
> * are met:
> * 1. Redistributions of source code must retain the above copyright
> * notice, this list of conditions and the following disclaimer.
> * 2. Redistributions in binary form must reproduce the above copyright
> * notice, this list of conditions and the following disclaimer in the
> * documentation and/or other materials provided with the distribution.
> *
> * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
> * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
> * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
> * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
> * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
> * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
> * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> */
> #include <codecs.h>
> #include <ctype.h>
> #include <stdint.h>
> #include <stdio.h>
> #include <string.h>
> #include <stdlib.h>
> #include <unistd.h>
>
> static const char *leet[] = {
> "4",
> "8",
> "(",
> "|)",
> "3",
> "|=",
> "6",
> "|-|",
> "!",
> "_|",
> "X",
> "1",
> "/\\/\\",
> "|\\|",
> "0",
> "|*",
> "0_",
> "|2",
> "5",
> "7",
> "|_|",
> "\\/",
> "\\/\\/",
> "%",
> "j",
> "2"
> };
>
> /* convert alphabetic chars to the leet char set -- see above */
> int
> toleet(const char *in, const size_t insize, const char *op, void *vp,
>     size_t outsize)
> {
>     const char *cp;
>     size_t len;
>     size_t i;
>     size_t o;
>     char *out = (char *)vp;
>
>     /* no room even for the terminating NUL */
>     if (outsize == 0) {
>         return 0;
>     }
>     for (i = 0, o = 0 ; i < insize && o < outsize - 1 ; i++) {
>         if (isalpha((uint8_t)in[i])) {
>             cp = leet[tolower((uint8_t)in[i]) - 'a'];
>             len = strlen(cp);
>             /* stop rather than write past the end of the output */
>             if (len > outsize - 1 - o) {
>                 break;
>             }
>             (void) memcpy(&out[o], cp, len);
>             o += len;
>         } else {
>             out[o++] = in[i];
>         }
>     }
>     out[o] = 0x0;
>     return (int)o;
> }
>
> int
> main(int argc, char **argv)
> {
>     codecs_t codecs;
>     char buf[BUFSIZ];
>     int cc;
>
>     (void) memset(&codecs, 0x0, sizeof(codecs));
>     codecs_add(&codecs, "leet", toleet, "500%", 1);
>     for (;;) {
>         (void) fprintf(stderr, "Leet> ");
>         /* thanks, yes, i know, this is superfluous */
>         (void) fflush(stderr);
>         if (fgets(buf, sizeof(buf), stdin) == NULL) {
>             break;
>         }
>         cc = codecs_inplace_transform(&codecs, buf, strlen(buf),
>             "leet");
>         if (cc <= 0) {
>             break;
>         }
>         printf("%s", buf);
>     }
>     exit(EXIT_SUCCESS);
> }
> LIBCODECS(3) NetBSD Library Functions Manual LIBCODECS(3)
>
> NAME
> libcodecs -- string coding and decoding functions for transforming data
>
> LIBRARY
> library ``libcodecs''
>
> SYNOPSIS
> #include <codecs.h>
>
> int
> codecs_transform(codecs_t *codecs, const char *in, const size_t insize,
>     const char *operation, void *out, size_t outsize);
>
> int
> codecs_alloc_transform(codecs_t *codecs, const char *in,
>     const size_t insize, const char *operation, void **outp,
>     size_t *outsize);
>
> int
> codecs_inplace_transform(codecs_t *codecs, void *input, int size,
>     const char *operation);
>
> int
> codecs_size(codecs_t *codecs, const char *operation, const size_t insize);
>
> int
> codecs_valid_op(codecs_t *codecs, const char *op);
>
> bool
> codecs_input_needed(codecs_t *codecs, const char *operation);
>
> int
> codecs_begin(codecs_t *codecs, const char *subset, ...);
>
> int
> codecs_lockdown(codecs_t *codecs);
>
> int
> codecs_add(codecs_t *codecs, const char *operation,
>     int (*)(const char *, const size_t, const char *, void *, size_t),
>     const char *multiplier, const bool input_needed);
>
> DESCRIPTION
> libcodecs is a library interface which implements various transformations
> from input data to output data. Text is transformed by the libcodecs
> library, converting the input to the output format. New transformations
> can be added to the table. The table can also be locked to prevent
> further transformations being added. A lot of these transformations are
> available at the system level already. However, libcodecs provides a
> single, consistent interface to the transformations, in a way that is
> easy to provide as an interface for scripting languages and from the
> shell.
>
> The basic way of using the libcodecs library is to call the
> codecs_transform() function to transform the text. Two alternate
> functions are provided. codecs_alloc_transform() will dynamically
> allocate the space for the output array using calloc(3). In-place
> transformations can be made using the codecs_inplace_transform()
> function. An ``in-place'' transformation means that the transformation
> will be done using temporary storage which is allocated, and then the
> transformed text will be copied over the original input, thereby making
> the operation appear to have transformed the text in situ.
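
The ``in-place'' behaviour described above (transform into temporary
storage, then copy back over the input) can be sketched as follows. This
is not the library's implementation: inplace_sketch(), xform_t and the
sample twice() transformation are hypothetical names, and the explicit
bufsize argument is an assumption added so that the sketch can refuse to
copy back a result that no longer fits in the caller's buffer.

```c
#include <stdlib.h>
#include <string.h>

/* the calling signature used for transformation functions */
typedef int (*xform_t)(const char *, const size_t, const char *, void *,
    size_t);

/* sample transformation for demonstration: write each input byte twice */
int
twice(const char *in, const size_t insize, const char *op, void *vp,
    size_t outsize)
{
    char *out = (char *)vp;
    size_t i;
    size_t o;

    (void)op;
    if (outsize < 2 * insize + 1) {
        return -1;
    }
    for (i = 0, o = 0; i < insize; i++) {
        out[o++] = in[i];
        out[o++] = in[i];
    }
    out[o] = '\0';
    return (int)o;
}

/*
 * Run fn into temporary storage of tmpsize bytes, then copy the result
 * over buf, which holds len bytes of input and has bufsize bytes of
 * capacity.  Returns the output length, or -1 on failure, in which case
 * buf is left untouched.
 */
int
inplace_sketch(xform_t fn, const char *op, char *buf, size_t len,
    size_t bufsize, size_t tmpsize)
{
    char *tmp;
    int cc;

    if ((tmp = calloc(1, tmpsize)) == NULL) {
        return -1;
    }
    cc = (*fn)(buf, len, op, tmp, tmpsize);
    if (cc < 0 || (size_t)cc >= bufsize) {
        free(tmp);
        return -1;
    }
    (void) memcpy(buf, tmp, (size_t)cc);
    buf[cc] = '\0';
    free(tmp);
    return cc;
}
```

The capacity check matters for any transformation whose output can be
larger than its input, since the copy back would otherwise overrun the
original buffer.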
>
> The transformation table holding information on all the possible
> transformations can be initialised using the codecs_begin() function. The
> function can be used to limit the transformations which get loaded into
> the transformation table. At the present time, the following subsets of
> transformations are defined:
>
> all will load all the following subsets of transformations
>
> charset will load all the transformations relating to character sets,
> including base64 and base85, EBCDIC, RAD50, etc.
>
> digest will load all the transformations relating to message digests,
> including md5, sha1, etc
>
> fill will load all the transformations relating to region fill,
> including zero and randomise
>
> format will load all the transformations relating to formatting of out-
> put, such as hexadecimal dumping, rotation, etc
>
> edit will load all the transformations relating to editing of output,
> such as sed and edit functionality
>
> hash will load all the transformations relating to 32bit hashing.
>
> network will load all the transformations relating to network name reso-
> lution
>
> It is not necessary to call this function prior to using any of the
> functionality in the libcodecs library -- if the table has not been
> initialised by the time of the first call, then codecs_begin() will be
> called automatically.
>
> The internal transformation information carries information on the
> worst-case size of the output array. This size can be calculated using the
> codecs_size() function, passing into the function the size of the input
> buffer. The codecs_input_needed() function will return an indication
> whether an input buffer is needed. Please note that an input buffer is
> needed for the codecs_inplace_transform() transformation call. The
> codecs_valid_op() function is used to verify that the current operation
> is a known transformation.
>
> The idea behind the libcodecs library is that individual transformations
> are defined by a C function with a pre-set calling signature. This can
> be a wrapper around existing functionality, like the digest or strvis(3)
> transformations, or user provided. This transformation is added to the
> table of transformations using the codecs_add() function. Some
> pre-defined transformations are provided, as explained below. The caller
> can then invoke the transformation in one of three ways:
>
> codecs_transform
> by providing input data, and an area for the output of the
> transformation to be placed.
>
> codecs_alloc_transform
> by providing input data, the area containing the output will be
> dynamically allocated using calloc(3)
>
> codecs_inplace_transform
> in which the transformation will be made, and the output data
> will be copied in place over the input data.
>
> There are a number of pre-defined transformations provided:
>
> asa [format] perform Fortran control character transformations
> in the form of the POSIX asa(1) command.
>
> base64decode
> [charset] perform atob, or base64, decoding. Each sequence
> of 4 bytes is transformed back into a 3 byte sequence.
>
> base64encode
> [charset] perform atob, or base64, encoding. Each sequence
> of 3 bytes is transformed into a 4 byte sequence from the
> pre-defined 64-byte set.
>
> base85decode
> [charset] perform base85 decoding. Each sequence of 5 bytes
> is transformed back into a 4 byte sequence.
>
> base85encode
> [charset] perform base85 encoding. Each sequence of 4 bytes
> is transformed into a 5 byte sequence from the pre-defined
> 85-byte set.
>
> bin2hex [charset] encodes the input string as 4-character C-string
> style hexadecimal constants.
>
> bswap16 [format] perform a bytewise swap of the 16-bit entity
>
> bswap32 [format] perform a bytewise swap of the 32-bit entity
>
> bswap64 [format] perform a bytewise swap of the 64-bit entity
>
> dos2unix [format] DOS style line-endings are transformed into Unix
> style line-endings.
>
> edit [edit] edit the input text with the ``EDITOR'' or ``VISUAL''
> editor, as defined in the environment.
>
> from-uri [charset] convert from a percent-encoded URI to ASCII text.
>
> full-uuencode
> [charset] convert the given text into uuencoded text (see
> also the uuencode and uudecode transforms), adding a file
> header and trailer.
>
> gethostinfo [network] attempt to reverse resolve the hostname, given the
> IP address (either IPv4 or IPv6) as input.
>
> getipaddress
> [network] attempt to resolve the IP address (both IPv4 and
> IPv6) given the hostname as input.
>
> gunzip [compress] decompress the input buffer using zlib(3)
>
> gzip [compress] compress the input buffer using zlib(3)
>
> hex2bin [charset] decodes the input string from 4-character C-string
> style hexadecimal constants to binary output.
>
> hexdump [format] converts the input text to an ASCII-clean hexadeci-
> mal dump format, including a printable representation of the
> input text.
>
> list [fill] lists the available codecs in the current instance.
>
> md5 [digest] calculate the MD5 digest using MD5_Data(3)
>
> metaphone [charset] calculate the metaphone phonetic value for the
> input.
>
> rad50decode [charset] converts the input text from DEC RADIX-50 format
> to the original text. Due to the limited range of the
> RADIX-50 character set, some of the original text may have
> been lost.
>
> rad50encode [charset] converts the input text to DEC RADIX-50 format
> from the original text. Due to the limited range of the
> RADIX-50 character set, some of the original text may have
> been lost.
>
> randomise [fill] fill the output with random values.
>
> rmd160 [digest] calculate the RMD160 digest using RMD160_Data(3)
>
> rot [format] transform the input text with a circular rotation.
> The most famous of these is the Caesar rot13(6) transforma-
> tion, but this transformation allows any length of rotation
> to be used.
>
> secs2str [format] transforms the input value (as the ASCII-encoded
> decimal value of seconds since the start of the epoch) to a
> colon-separated value representing the date.
>
> sed [edit] performs a sed(1) transformation on a regular expres-
> sion. Please note that full, extended regular expressions,
> as defined in re_format(7) are used to match.
>
> size [digest] returns the size of the input as a decimal string
>
> sha1 [digest] calculate the SHA1 digest using SHA1Data(3)
>
> sha256 [digest] calculate the SHA256 digest using SHA256_Data(3)
>
> sha512 [digest] calculate the SHA512 digest using SHA512_Data(3)
>
> soundex [charset] calculate the soundex phonetic value for the
> input.
>
> str2secs [format] transforms the input value (as the colon-separated
> value representing the date) to an ASCII-encoded decimal
> value representing seconds since the start of the epoch.
>
> strunvis [charset] uses the unstrvis(3) transformation on the input
> data.
>
> strvis [charset] uses the strvis(3) transformation on the input
> data.
>
> strvisc [charset] uses the strvisc(3) transformation on the input
> data.
>
> substring [edit] extract a substring of the input string, and place it
> in the output string.
>
> to-uri [charset] convert from ASCII text to a percent-encoded URI.
>
> to-lower [charset] change any uppercase letters in the input string
> to lowercase.
>
> to-upper [charset] change any lowercase letters in the input string
> to uppercase.
>
> unhexdump [format] converts the input text from the ASCII-clean hexa-
> decimal dump format, created by the hexdump transformation,
> back to its original binary form.
>
> unix2dos [charset] the Unix-style line-endings are converted to DOS
> style line-endings.
>
> uudecode [charset] transform the input text from uuencode(1) text to
> the original text.
>
> uuencode [charset] encode the input text as uuencode(1) text.
>
> zero [fill] produce an area containing NUL bytes in the output.
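
As an illustration of the rot entry in the list above, a circular rotation
of any length over ASCII letters can be sketched like this. The function
rot_n() is hypothetical and takes the rotation as an int; how the real
transformation receives its rotation length is not described here.

```c
#include <stddef.h>

/*
 * Hypothetical sketch: rotate ASCII letters by n positions, leaving all
 * other bytes alone.  n may be negative or larger than 26.  Returns the
 * number of bytes written, excluding the terminating NUL, or 0 if the
 * output area is too small.
 */
size_t
rot_n(const char *in, size_t insize, int n, char *out, size_t outsize)
{
    size_t i;
    int shift = ((n % 26) + 26) % 26;    /* normalise to 0..25 */

    if (outsize == 0 || insize > outsize - 1) {
        return 0;
    }
    for (i = 0; i < insize; i++) {
        char c = in[i];

        if (c >= 'a' && c <= 'z') {
            out[i] = (char)('a' + (c - 'a' + shift) % 26);
        } else if (c >= 'A' && c <= 'Z') {
            out[i] = (char)('A' + (c - 'A' + shift) % 26);
        } else {
            out[i] = c;
        }
    }
    out[i] = '\0';
    return i;
}
```

A rotation of 13 applied twice gives back the original text, which is the
rot13(6) case mentioned in the list.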
>
> A number of hash functions have also been implemented, namely:
>
> dumbhash [hash] implements a simple hashing scheme based on the
> addition of the value of each character in the string.
>
> dumbmulhash [hash] implements a simple hashing scheme based on the
> addition of the value of each character in the string mul-
> tiplied by its position in the string.
>
> lennart [hash] implements a simple and fast generic string hasher
> based on Peter K. Pearson's article in CACM 33-6, pp. 677.
>
> crchash [hash] implements a hash used in CRC calculations
>
> perlhash [hash] implements the addition-based hash algorithm used
> internally in the perl interpreter.
>
> perlxorhash [hash] implements the XOR-based hash algorithm used inter-
> nally in the perl interpreter.
>
> pythonhash [hash] implements the hash algorithm used internally in
> the python interpreter.
>
> mousehash [hash] implements an XOR-based hash algorithm from der
> Mouse.
>
> bernstein [hash] implements a multiplicative-based hash algorithm
> from Daniel Bernstein.
>
> honeyman [hash] implements an XOR-based hash algorithm from Peter
> Honeyman.
>
> pjwhash [hash] implements the so called `hashpjw' function by P.J.
> Weinberger from Aho/Sethi/Ullman, COMPILERS: Principles,
> Techniques and Tools, 1986, 1987 Bell Telephone Laborato-
> ries, Inc.
>
> bobhash [hash] implements another, more complex hash algorithm.
>
> torekhash [hash] implements a hash algorithm due to Chris Torek, and
> using Duff's device.
>
> byacchash [hash] implements the hash function found in Berkeley
> byacc(1) program
>
> tclhash [hash] implements the hash algorithm used internally in
> the tcl interpreter.
>
> gawkhash [hash] implements the hash algorithm used internally in
> the gawk interpreter, also using Duff's device.
>
> gcc3_hash [hash] implements one of the hash algorithms found in gcc3
>
> gcc3_hash2 [hash] implements another of the hash algorithms found in
> gcc3
>
> nemhash [hash] implements another hash function
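
The two simplest schemes in the list above, dumbhash and dumbmulhash, are
described precisely enough to sketch. The function names below are
hypothetical, and counting positions from 1 in the second function is an
assumption; the library's own code may differ in both respects.

```c
#include <stddef.h>
#include <stdint.h>

/* sum of the value of each character in the string */
uint32_t
dumbhash_sketch(const char *s, size_t len)
{
    uint32_t h = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        h += (uint8_t)s[i];
    }
    return h;
}

/*
 * sum of the value of each character multiplied by its position in the
 * string (positions counted from 1, an assumption)
 */
uint32_t
dumbmulhash_sketch(const char *s, size_t len)
{
    uint32_t h = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        h += (uint32_t)(uint8_t)s[i] * (uint32_t)(i + 1);
    }
    return h;
}
```

The plain sum gives the same value for any reordering of the same
characters; weighting by position is what lets the second scheme tell
"abc" from "cba".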
>
> RETURN VALUES
> On a successful transformation, the codecs_transform(),
> codecs_alloc_transform() and codecs_inplace_transform() functions return
> the actual number of bytes in the output transformation. On a successful
> initialisation, codecs_begin() will return a value of 1. The
> codecs_size() function returns the number of bytes which will be needed
> to contain the given transformation with the given size of input bytes.
>
> SEE ALSO
> asa(1), sed(1), uudecode(1), uuencode(1), calloc(3), MD5Data(3),
> RMD160Data(3), SHA1Data(3), SHA256_Data(3), SHA512_Data(3), strvis(3),
> strvisc(3), unstrvis(3), zlib(3), rot13(6), re_format(7)
>
> HISTORY
> The libcodecs library first appeared in NetBSD 6.0.
>
> AUTHORS
> Alistair Crooks <agc%NetBSD.org@localhost>
>
> NetBSD 5.0 September 30, 2010 NetBSD 5.0
--
Masao Uebayashi / Tombi Inc. / Tel: +81-90-9141-4635