On Jun 20, 2008, at 5:11 PM, Aleksey Cheusov wrote:
Since you want to quote the standards, look at the table for characters not specifically listed and you find for \c:

A backslash character followed by any character not described in this table or in the table in the Base Definitions volume of IEEE Std 1003.1-2001, Chapter 5, File Format Notation ('\\', '\a', '\b', '\f', '\n', '\r', '\t', '\v'). Meaning: Undefined

This remark is for regular expressions, not for constant strings. Regexps are a different case. And my patch concerns lexical analysis only.
Yes, but the rationale section specifically mentions that the set of characters accepted via \ escapes has varied over time, so they simply wrote down the set which has to be supported. They then go on to talk about its overall relation to C, specifically excluding the \x style of escapes for hex sequences.
Based on that (plus the regex section), I still think anything beyond the base escapes is undefined territory, and even your examples show that different awks do different things.
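To make the distinction concrete, here is a minimal sketch in Python (not taken from any awk source) that marks each backslash sequence in a string body as "defined" or "undefined". It assumes only the eight File Format Notation escapes quoted above count as defined; any further entries in the awk table itself are not modelled. The names BASE_ESCAPES and classify_escapes are made up for illustration, and the sketch deliberately says nothing about what an awk should do with an undefined escape.

```python
# Escapes from the File Format Notation list quoted above.
# Assumption: nothing beyond these eight is treated as defined here.
BASE_ESCAPES = {
    "\\": "\\",
    "a": "\a",
    "b": "\b",
    "f": "\f",
    "n": "\n",
    "r": "\r",
    "t": "\t",
    "v": "\v",
}


def classify_escapes(body):
    """Return (escape, status) pairs for every backslash sequence in body.

    body is the raw text between the quotes of a string constant.
    A lone trailing backslash is ignored.
    """
    result = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            ch = body[i + 1]
            status = "defined" if ch in BASE_ESCAPES else "undefined"
            result.append(("\\" + ch, status))
            i += 2  # skip the backslash and the character it escapes
        else:
            i += 1
    return result
```

Calling classify_escapes on the raw text a\tb\qc would report \t as defined and \q as undefined, and \q is exactly the kind of case where I would expect implementations to differ.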
James
P.S. I understand that the suggested change is rather radical. And to tell the truth, I don't believe you will apply this patch :-) Though I think this would be the right decision.

--
Best regards, Aleksey Cheusov.