Putting it that way reveals, perhaps, part of the problem. There is no particular reason why the code that reads data from the wire data stream (usually but not necessarily TCP) has to know where packet boundaries are. As long as every packet boundary falls at a place such that the data->encryption->decryption->data pipeline can deliver the whole packet without needing more input data (which is the point of padding to a multiple of the blocksize), the protocol can work. This does mean that implementations which expect to issue wire data stream reads for exactly the right amount of data early in packet processing will be constrained in what encryption algorithms they implement, though as far as I know no algorithms which would make that a practical constraint are implemented for ssh (and my suggestion above is not one, since the encryption algorithm layer can provide the wire length early).
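To make the padding point concrete, here is a minimal sketch of how the padding size falls out of the blocksize under the base protocol rule (length field included in the padded region, at least 4 bytes of padding, alignment to the larger of the cipher blocksize and 8). The function name `padding_length` is mine, purely for illustration; it is not taken from any implementation.

```python
def padding_length(payload_len: int, block_size: int = 8) -> int:
    """Return the number of padding bytes for a payload of payload_len bytes.

    Assumes the base-protocol layout: 4-byte packet length, 1-byte padding
    length, payload, padding. The total must be a multiple of
    max(block_size, 8), and the padding must be at least 4 bytes.
    """
    align = max(block_size, 8)
    unpadded = 4 + 1 + payload_len       # length field + padding-length byte + payload
    pad = align - (unpadded % align)     # bytes needed to reach the next boundary
    if pad < 4:                          # too short: go to the boundary after that
        pad += align
    return pad
```

With this rule every packet ends on a block boundary, so the decryption side never needs input belonging to the next packet in order to deliver the current one.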
Hm. Yes, that's true. And it's even a good idea; the recent problems with encrypted packet lengths would be much less serious if reading and decrypting the TCP stream were decoupled from processing the SSH packet stream. Of course, that introduces some interesting buffering requirements to allow a NEWKEYS message to result in a pipeline flush, but it shouldn't be unreasonable.
Hrm, except the MAC is unencrypted, so they can't be completely decoupled. That doesn't make it unworkable, though, provided one defines an appropriate interface to the encryption code.
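As a rough sketch of what such an interface might look like: the reader below accepts wire data in arbitrary chunks, asks the cipher to decrypt only the first block to learn the packet length, then decrypts the rest of the packet and lifts the MAC off the wire untouched. Everything here (`IdentityCipher`, `PacketPipeline`, `frame`, the `decrypt`/`block_size` interface) is hypothetical and invented for illustration; the "cipher" just copies bytes so the example runs, and the MAC is collected but not verified.

```python
import struct


class IdentityCipher:
    """Stand-in cipher for illustration only: 'decrypts' by copying."""
    block_size = 8

    def decrypt(self, data: bytes) -> bytes:
        return data


def frame(payload: bytes, block_size: int = 8) -> bytes:
    """Build one cleartext packet (zero padding; real padding should be random)."""
    pad = block_size - ((5 + len(payload)) % block_size)
    if pad < 4:
        pad += block_size
    return struct.pack(">IB", 1 + len(payload) + pad, pad) + payload + bytes(pad)


class PacketPipeline:
    """Turns arbitrary chunks of wire data into (payload, mac) pairs."""

    def __init__(self, cipher, mac_len: int = 0):
        self.cipher = cipher
        self.mac_len = mac_len
        self.wire = bytearray()    # bytes from the transport, not yet decrypted
        self.clear = bytearray()   # decrypted bytes of the packet in progress

    def feed(self, chunk: bytes):
        """Accept wire data with no regard to packet boundaries."""
        self.wire += chunk
        packets = []
        while (pkt := self._try_packet()) is not None:
            packets.append(pkt)
        return packets

    def _try_packet(self):
        bs = self.cipher.block_size
        if not self.clear:
            # Decrypt exactly one block, once, to learn the packet length.
            if len(self.wire) < bs:
                return None
            self.clear += self.cipher.decrypt(bytes(self.wire[:bs]))
            del self.wire[:bs]
        (packet_length,) = struct.unpack(">I", self.clear[:4])
        total = 4 + packet_length
        if total < bs or total % bs:
            raise ValueError("implausible packet length")
        remaining = total - len(self.clear)
        if len(self.wire) < remaining + self.mac_len:
            return None                      # wait for more wire data
        self.clear += self.cipher.decrypt(bytes(self.wire[:remaining]))
        del self.wire[:remaining]
        mac = bytes(self.wire[:self.mac_len])  # the MAC is never decrypted
        del self.wire[:self.mac_len]
        pad_len = self.clear[4]
        payload = bytes(self.clear[5:total - pad_len])
        self.clear.clear()
        return payload, mac
```

Note that this sketch never decrypts past the end of the current packet, which is one way to sidestep the NEWKEYS flush problem: nothing has been decrypted under the old key that belongs to a later packet. A design that decrypts ahead for throughput would need the extra buffering mentioned above.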
Yes, there are doubtless implementations which assume that encrypted and cleartext are the same size. They will need internal rework if they are to implement any of this stuff, but I don't see that as a reason to twist the design uncomfortably; it just means implementing it for them, if done at all, will be more work.
Yes, that's probably true. Interestingly, if we can agree that using plaintext lengths does _not_ require changing the way the size of the padding field is determined, then it becomes possible for an encryption algorithm to use plaintext lengths without changing the base protocol, the modularity argument goes away, and the need to enable negotiation of cleartext lengths independently of the encryption algorithm becomes less pressing (but possibly still desirable). It does mean that if we do both, we need to point out that implementations which support both such an algorithm and algorithm-independent cleartext lengths must not inadvertently send both the length and the _next_ 4 bytes in plaintext. :-)