tech-userlevel archive
libcodecs(3), take 2
Many thanks to everyone for the feedback, both on and off list.
I've taken everything on board, and made the following changes:
peer review changes
+ name change to libcodecs(3) and codecs(1)
+ name changes for some external functions
+ type of output array in allocated space codec changed
to (arguably more correct) void **
other changes
+ autoconf glue
+ C++ guard in codecs.h
+ new bz2 compression added
+ 64-bit original size in gzip compression now stored in network byte order
+ added endianness runtime indication -- I'm still in 2 minds
about this, and may rip it out again
+ various bugs fixed
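The gzip change above stores the 64-bit original size in network byte order, and the endianness item adds a runtime check. A minimal C sketch of both ideas follows. put_be64(), get_be64() and host_is_little_endian() are hypothetical helpers for illustration, not functions from the codecs archive, and the 8-byte field layout is an assumption.

```c
#include <stdint.h>

/* Store a 64-bit value in network (big-endian) byte order,
 * independent of the host's own byte order. */
static void
put_be64(unsigned char *p, uint64_t v)
{
	for (int i = 7; i >= 0; i--) {
		p[i] = (unsigned char)(v & 0xff);	/* lowest byte goes last */
		v >>= 8;
	}
}

/* Read back a 64-bit value stored in network byte order. */
static uint64_t
get_be64(const unsigned char *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Runtime endianness indication: returns 1 on little-endian hosts. */
static int
host_is_little_endian(void)
{
	const uint16_t one = 1;

	return *(const unsigned char *)&one == 1;
}
```

Writing the size byte by byte, rather than copying a uint64_t directly, produces the same bytes on big- and little-endian hosts, which is the point of using network order.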
The new archive is in
http://www.netbsd.org/~agc/codecs-20100920.tar.gz
and I've attached the libcodecs(3) library man page.
Once again, all feedback gratefully received.
Many thanks,
Alistair
LIBCODECS(3) NetBSD Library Functions Manual LIBCODECS(3)
NAME
libcodecs -- string coding and decoding functions for transforming data
LIBRARY
library ``libcodecs''
SYNOPSIS
#include <codecs.h>
int
codecs_transform(codecs_t *codecs, const char *in, const size_t insize,
    const char *operation, void *out, size_t outsize);
int
codecs_alloc_transform(codecs_t *codecs, const char *in,
    const size_t insize, const char *operation, void **outp,
    size_t *outsize);
int
codecs_inplace_transform(codecs_t *codecs, void *input, int size,
    const char *operation);
int
codecs_size(codecs_t *codecs, const char *operation,
    const unsigned insize);
int
codecs_input_needed(codecs_t *codecs, const char *operation);
int
codecs_begin(codecs_t *codecs, const char *subset, ...);
int
codecs_lockdown(codecs_t *codecs);
int
codecs_add(codecs_t *codecs, const char *operation,
    int (*)(const char *, const size_t, const char *, void *, size_t),
    const char *multiplier, const int input_needed);
DESCRIPTION
libcodecs is a library interface which implements various transformations
from input data to output data. New transformations can be added to the
transformation table with codecs_add(), and the table can be locked with
codecs_lockdown() to prevent further transformations being added. Many of
these transformations are already available at the system level. However,
libcodecs provides a single, consistent interface to the transformations,
one which is easy to offer to scripting languages and to use from the
shell.
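The add-and-lock behaviour of the transformation table can be illustrated with a small name-keyed table. This is a sketch under stated assumptions: table_t, table_add(), table_lockdown() and table_transform() are hypothetical names, the transform signature is simplified (the real codecs_add() callback also takes a third const char * argument whose meaning is not documented here), and a fixed-size array stands in for whatever storage libcodecs actually uses.

```c
#include <stddef.h>
#include <string.h>

/* Simplified transform signature: returns bytes written, or -1. */
typedef int (*xform_fn)(const char *in, size_t insize,
    char *out, size_t outsize);

#define TABLE_MAX 64

typedef struct {
	const char *name;	/* operation name, e.g. "to-upper" */
	xform_fn fn;
} entry_t;

typedef struct {
	entry_t ent[TABLE_MAX];
	size_t n;
	int locked;		/* once set, no more additions */
} table_t;

/* Register a transformation; fails if locked, full, or a duplicate. */
static int
table_add(table_t *t, const char *name, xform_fn fn)
{
	if (t->locked || t->n == TABLE_MAX)
		return 0;
	for (size_t i = 0; i < t->n; i++)
		if (strcmp(t->ent[i].name, name) == 0)
			return 0;
	t->ent[t->n].name = name;
	t->ent[t->n].fn = fn;
	t->n++;
	return 1;
}

static void
table_lockdown(table_t *t)
{
	t->locked = 1;
}

/* Look up an operation by name and apply it; -1 if unknown. */
static int
table_transform(const table_t *t, const char *name, const char *in,
    size_t insize, char *out, size_t outsize)
{
	for (size_t i = 0; i < t->n; i++)
		if (strcmp(t->ent[i].name, name) == 0)
			return t->ent[i].fn(in, insize, out, outsize);
	return -1;
}

/* Example transformation: ASCII upper-casing. */
static int
to_upper(const char *in, size_t insize, char *out, size_t outsize)
{
	if (outsize < insize)
		return -1;
	for (size_t i = 0; i < insize; i++) {
		char c = in[i];
		out[i] = (c >= 'a' && c <= 'z') ? (char)(c - 'a' + 'A') : c;
	}
	return (int)insize;
}
```

Once table_lockdown() has been called, table_add() refuses new entries while lookups keep working, which mirrors the locking described above.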
The basic way of using the libcodecs library is to call the
codecs_transform() function to transform the text into a caller-supplied
output buffer. Two alternative functions are provided:
codecs_alloc_transform(), which dynamically allocates the space for the
output array using calloc(3), and codecs_inplace_transform(), which makes
the transformation in place. An ``in-place'' transformation is done using
temporary storage which is allocated internally; the transformed text is
then copied over the original input, making the operation appear to have
transformed the text in situ.
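The allocating and in-place calling styles can be sketched as below, with ROT13 standing in for an arbitrary transformation. alloc_transform(), inplace_transform() and rot13() are hypothetical helpers; the real functions also take a codecs_t pointer and an operation name, and the worst-case buffer size used here is specific to ROT13.

```c
#include <stdlib.h>
#include <string.h>

/* A sample same-length transformation (ROT13) used for illustration. */
static int
rot13(const char *in, size_t insize, char *out, size_t outsize)
{
	if (outsize < insize)
		return -1;
	for (size_t i = 0; i < insize; i++) {
		char c = in[i];
		if (c >= 'a' && c <= 'z')
			c = (char)('a' + (c - 'a' + 13) % 26);
		else if (c >= 'A' && c <= 'Z')
			c = (char)('A' + (c - 'A' + 13) % 26);
		out[i] = c;
	}
	return (int)insize;
}

/* Allocating variant: calloc(3) a worst-case buffer and hand it back
 * through a void ** out-parameter; the caller must free(3) it. */
static int
alloc_transform(const char *in, size_t insize, void **outp, size_t *outsize)
{
	size_t cap = insize + 1;	/* worst case for ROT13, plus a NUL */
	char *buf = calloc(1, cap);
	int n;

	if (buf == NULL)
		return -1;
	if ((n = rot13(in, insize, buf, cap)) < 0) {
		free(buf);
		return -1;
	}
	*outp = buf;
	*outsize = (size_t)n;
	return n;
}

/* In-place variant: transform into temporary storage, then copy the
 * result back over the input so it appears to change in situ. */
static int
inplace_transform(void *input, size_t size)
{
	char *tmp = calloc(1, size + 1);
	int n;

	if (tmp == NULL)
		return -1;
	n = rot13(input, size, tmp, size + 1);
	if (n >= 0 && (size_t)n <= size)
		memcpy(input, tmp, (size_t)n);	/* only if it fits */
	else
		n = -1;
	free(tmp);
	return n;
}
```

The void ** out-parameter lets the callee return a pointer it allocated itself; the caller owns that memory afterwards.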
The transformation table, holding information on all the possible
transformations, can be initialised using the codecs_begin() function.
Its subset argument can be used to limit the transformations which are
loaded into the table. At present, the following subsets of
transformations are defined:
all will load all the following subsets of transformations.
charset will load all the transformations relating to character sets,
including base64 and base85, EBCDIC, RAD50, etc.
digest will load all the transformations relating to message digests,
including md5, sha1, etc.
fill will load all the transformations relating to region fill,
including zero and randomise.
format will load all the transformations relating to formatting of
output, such as hexadecimal dumping, rotation, etc.
edit will load all the transformations relating to editing of output,
such as sed and edit functionality.
hash will load all the transformations relating to 32-bit hashing.
network will load all the transformations relating to network name
resolution.
It is not necessary to call this function prior to using any of the
functionality in the libcodecs library -- if the table has not been
initialised by the time of the first call, then codecs_begin() will be
called automatically.
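Subset selection and on-demand initialisation can be modelled as follows. The catalogue contents, the choice of "all" as the default subset for automatic initialisation, and the names table_begin() and table_has() are assumptions for illustration; codecs_begin() itself is variadic and may accept more than one subset, which this sketch does not attempt.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical catalogue: each transformation is tagged with its subset. */
typedef struct {
	const char *name;
	const char *subset;
} catalogue_t;

static const catalogue_t catalogue[] = {
	{ "base64encode", "charset" },
	{ "md5",          "digest"  },
	{ "zero",         "fill"    },
	{ "hexdump",      "format"  },
	{ "sed",          "edit"    },
	{ "dumbhash",     "hash"    },
	{ "gethostinfo",  "network" },
};
#define NCAT (sizeof(catalogue) / sizeof(catalogue[0]))

static const char *loaded[NCAT];	/* names currently in the table */
static size_t nloaded;
static int initialised;

/* Load every transformation in the requested subset ("all" loads all). */
static void
table_begin(const char *subset)
{
	nloaded = 0;
	for (size_t i = 0; i < NCAT; i++)
		if (strcmp(subset, "all") == 0 ||
		    strcmp(subset, catalogue[i].subset) == 0)
			loaded[nloaded++] = catalogue[i].name;
	initialised = 1;
}

/* An entry point that initialises the table on demand before use. */
static int
table_has(const char *op)
{
	if (!initialised)
		table_begin("all");
	for (size_t i = 0; i < nloaded; i++)
		if (strcmp(loaded[i], op) == 0)
			return 1;
	return 0;
}
```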
The internal transformation table records the worst-case size of the
output array for each transformation. This size can be calculated using
the codecs_size() function, passing into the function the size of the
input buffer. The codecs_input_needed() function indicates whether an
input buffer is needed by a transformation. Please note that an input
buffer is needed for the codecs_inplace_transform() ``in-place''
transformation call.
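Because the worst-case output size depends only on the operation and the input size, it can be computed before any buffer is allocated. The formulas below follow from the byte ratios given in the transformation list (3 bytes to 4 for base64, 4 bytes to 5 for base85), and assume each bin2hex byte becomes a 4-character \xNN constant. worst_case_size() is a hypothetical helper; it hardcodes formulas instead of parsing the multiplier string passed to codecs_add(), whose syntax is not described here.

```c
#include <stddef.h>
#include <string.h>

/* Worst-case output size for a few transformations, computed from
 * the input size alone; the +1 leaves room for a terminating NUL. */
static size_t
worst_case_size(const char *op, size_t insize)
{
	if (strcmp(op, "base64encode") == 0)
		return 4 * ((insize + 2) / 3) + 1;	/* 3 bytes -> 4 chars */
	if (strcmp(op, "base85encode") == 0)
		return 5 * ((insize + 3) / 4) + 1;	/* 4 bytes -> 5 chars */
	if (strcmp(op, "bin2hex") == 0)
		return 4 * insize + 1;			/* byte -> "\xNN" */
	if (strcmp(op, "to-upper") == 0)
		return insize + 1;			/* same length */
	return 0;					/* unknown operation */
}
```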
The codecs_valid_op() function is used to verify that the current
operation is a known transformation.
A number of pre-defined transformations are provided:
asa [format] perform Fortran control character transformations
in the form of the POSIX asa(1) command.
ascii2ebcdic
[charset] convert the input from ASCII character encodings
to EBCDIC character encodings.
base64decode
[charset] perform atob, or base64, decoding. Each sequence
of 4 bytes is transformed back into a 3 byte sequence.
base64encode
[charset] perform btoa, or base64, encoding. Each sequence
of 3 bytes is transformed into a 4 byte sequence from the
pre-defined 64-byte set.
base85decode
[charset] perform base85 decoding. Each sequence of 5 bytes
is transformed back into a 4 byte sequence.
base85encode
[charset] perform base85 encoding. Each sequence of 4 bytes
is transformed into a 5 byte sequence from the pre-defined
85-byte set.
bin2hex [charset] encodes the input string as 4-character C-string
style hexadecimal constants.
bswap16 [format] perform a bytewise swap of the 16-bit entity
bswap32 [format] perform a bytewise swap of the 32-bit entity
bswap64 [format] perform a bytewise swap of the 64-bit entity
dos2unix [format] DOS style line-endings are transformed into Unix
style line-endings.
ebcdic2ascii
[charset] convert the input from EBCDIC character encodings
to ASCII character encodings.
edit [edit] edit the input text with the ``EDITOR'' or ``VISUAL''
editor, as defined in the environment.
from-uri [charset] convert from a percent-encoded URI to ASCII text.
full-uuencode
[charset] convert the given text into uuencoded text (see
also the uuencode and uudecode transforms), adding a file
header and trailer.
gethostinfo [network] attempt to reverse-resolve the IP address (either
IPv4 or IPv6) given as input, returning the hostname.
getipaddress
[network] attempt to resolve the hostname given as input,
returning its IP address (both IPv4 and IPv6).
gunzip [compress] decompress the input buffer using zlib(3)
gzip [compress] compress the input buffer using zlib(3)
hex2bin [charset] decodes the input string from 4-character C-string
style hexadecimal constants to binary output.
hexdump [format] converts the input text to an ASCII-clean hexadecimal
dump format, including a printable representation of the
input text.
md5 [digest] calculate the MD5 digest using MD5Data(3).
metaphone [charset] calculate the metaphone phonetic value for the
input.
rad50decode [charset] converts the input text from DEC RADIX-50 format
to the original text. Due to the limited range of the
RADIX-50 character set, some of the original text may have
been lost.
rad50encode [charset] converts the original input text to DEC RADIX-50
format. Due to the limited range of the RADIX-50 character
set, some of the original text may be lost.
randomise [fill] fill the output with random values.
rmd160 [digest] calculate the RMD160 digest using RMD160Data(3).
rot [format] transform the input text with a circular rotation.
The most famous of these is the Caesar rot13(6) transformation,
but this transformation allows any length of rotation
to be used.
secs2str [format] transforms the input value (as the ASCII-encoded
decimal value of seconds since the start of the epoch) to a
colon-separated value representing the date.
sed [edit] performs a sed(1)-style substitution on the input, using
a regular expression. Please note that full, extended regular
expressions, as defined in re_format(7), are used for matching.
size [digest] returns the size of the input as a decimal string
sha1 [digest] calculate the SHA1 digest using SHA1Data(3)
sha256 [digest] calculate the SHA256 digest using SHA256_Data(3)
sha512 [digest] calculate the SHA512 digest using SHA512_Data(3)
soundex [charset] calculate the soundex phonetic value for the
input.
str2secs [format] transforms the input value (as the colon-separated
value representing the date) to an ASCII-encoded decimal
value representing seconds since the start of the epoch.
strunvis [charset] uses the strunvis(3) transformation on the input
data.
strvis [charset] uses the strvis(3) transformation on the input
data.
strvisc [charset] uses the strvisc(3) transformation on the input
data.
substring [edit] extract a substring of the input string, and place it
in the output string.
to-uri [charset] convert from ASCII text to a percent-encoded URI.
to-lower [charset] change any uppercase letters in the input string
to lowercase.
to-unicode [charset] translate to unicode-16 from UTF-8
to-upper [charset] change any lowercase letters in the input string
to uppercase.
to-utf8 [charset] translate from unicode-16 to UTF-8
unix2dos [charset] the Unix-style line-endings are converted to DOS
style line-endings.
uudecode [charset] transform the input text from uuencode(1) text back to
the original text.
uuencode [charset] encode the input text as uuencode(1) text.
zero [fill] produce an area containing NUL bytes in the output.
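As an illustration of the 3-to-4 byte mapping described for base64encode, here is a minimal encoder using the standard RFC 4648 alphabet with '=' padding. That the library uses this exact alphabet and padding is an assumption, and base64_encode() is a hypothetical name.

```c
#include <stddef.h>

static const char b64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encode insize bytes from in; out must hold 4 * ((insize + 2) / 3) + 1
 * bytes. Returns the number of characters written (excluding the NUL). */
static size_t
base64_encode(const unsigned char *in, size_t insize, char *out)
{
	size_t i, o = 0;

	for (i = 0; i + 2 < insize; i += 3) {
		/* pack 3 bytes into 24 bits, emit four 6-bit symbols */
		unsigned long v = ((unsigned long)in[i] << 16) |
		    ((unsigned long)in[i + 1] << 8) | in[i + 2];
		out[o++] = b64[(v >> 18) & 0x3f];
		out[o++] = b64[(v >> 12) & 0x3f];
		out[o++] = b64[(v >> 6) & 0x3f];
		out[o++] = b64[v & 0x3f];
	}
	if (i < insize) {
		/* 1 or 2 trailing bytes: pad the final group with '=' */
		unsigned long v = (unsigned long)in[i] << 16;
		if (i + 1 < insize)
			v |= (unsigned long)in[i + 1] << 8;
		out[o++] = b64[(v >> 18) & 0x3f];
		out[o++] = b64[(v >> 12) & 0x3f];
		out[o++] = (i + 1 < insize) ? b64[(v >> 6) & 0x3f] : '=';
		out[o++] = '=';
	}
	out[o] = '\0';
	return o;
}
```

For example, the 3-byte input "Man" encodes to the 4 characters "TWFu", while "M" alone encodes to "TQ==".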
A number of hash functions have also been implemented, namely:
dumbhash [hash] implements a simple hashing scheme based on the
addition of the value of each character in the string.
dumbmulhash [hash] implements a simple hashing scheme based on the
addition of the value of each character in the string multiplied
by its position in the string.
lennart [hash] implements a simple and fast generic string hasher
based on Peter K. Pearson's article in CACM 33-6, pp. 677-680.
crchash [hash] implements a hash used in CRC calculations
perlhash [hash] implements the addition-based hash algorithm used
internally in the perl interpreter.
perlxorhash [hash] implements the XOR-based hash algorithm used internally
in the perl interpreter.
pythonhash [hash] implements the hash algorithm used internally in
the python interpreter.
mousehash [hash] implements an XOR-based hash algorithm from der
Mouse.
bernstein [hash] implements a multiplicative-based hash algorithm
from Daniel Bernstein.
honeyman [hash] implements an XOR-based hash algorithm from Peter
Honeyman.
pjwhash [hash] implements the so-called `hashpjw' function by P.J.
Weinberger from Aho/Sethi/Ullman, COMPILERS: Principles,
Techniques and Tools, 1986, 1987 Bell Telephone Laboratories,
Inc.
bobhash [hash] implements another, more complex hash algorithm.
torekhash [hash] implements a hash algorithm due to Chris Torek, using
Duff's device.
byacchash [hash] implements the hash function found in the Berkeley
byacc(1) program.
tclhash [hash] implements the hash algorithm used internally in
the tcl interpreter.
gawkhash [hash] implements the hash algorithm used internally in
the gawk interpreter, also using Duff's device.
gcc3_hash [hash] implements one of the hash algorithms found in gcc3
gcc3_hash2 [hash] implements another of the hash algorithms found in
gcc3
nemhash [hash] implements another hash function
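Several of the simpler hashes can be written in a few lines each. These are sketches of the commonly published forms, not the library's code: whether dumbmulhash counts positions from 0 or 1 is not stated (1 is assumed here), bernstein() uses the widely known djb2 form (h * 33 + c, seeded with 5381), and pjwhash() is the hashpjw loop from Aho, Sethi and Ullman without the final reduction modulo the table size.

```c
#include <stdint.h>

/* dumbhash: sum of the character values. */
static uint32_t
dumbhash(const char *s)
{
	uint32_t h = 0;

	while (*s)
		h += (unsigned char)*s++;
	return h;
}

/* dumbmulhash: each character weighted by its 1-based position. */
static uint32_t
dumbmulhash(const char *s)
{
	uint32_t h = 0;

	for (uint32_t pos = 1; *s; pos++)
		h += (unsigned char)*s++ * pos;
	return h;
}

/* Bernstein's multiplicative hash in its common "djb2" form. */
static uint32_t
bernstein(const char *s)
{
	uint32_t h = 5381;

	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h;
}

/* hashpjw: shift in 4 bits per character, folding the top nibble back. */
static uint32_t
pjwhash(const char *s)
{
	uint32_t h = 0, g;

	while (*s) {
		h = (h << 4) + (unsigned char)*s++;
		if ((g = h & 0xf0000000u) != 0) {
			h ^= g >> 24;
			h ^= g;
		}
	}
	return h;
}
```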
SEE ALSO
asa(1), sed(1), uudecode(1), uuencode(1), calloc(3), MD5Data(3),
RMD160Data(3), SHA1Data(3), SHA256_Data(3), SHA512_Data(3), strunvis(3),
strvis(3), strvisc(3), zlib(3), rot13(6), re_format(7)
HISTORY
The libcodecs library first appeared in NetBSD 6.0.
AUTHORS
Alistair Crooks <agc%NetBSD.org@localhost>
NetBSD 5.0 September 18, 2010 NetBSD 5.0