Source-Changes-HG archive
[src/netbsd-9]: src/sys/arch/mips/cavium Pull up following revision(s) (reque...
details: https://anonhg.NetBSD.org/src/rev/9bb850d6ebd5
branches: netbsd-9
changeset: 1001770:9bb850d6ebd5
user: martin <martin%NetBSD.org@localhost>
date: Tue May 19 17:35:50 2020 +0000
description:
Pull up following revision(s) (requested by simonb in ticket #918):
sys/arch/mips/cavium/dev/octeon_rnm.c: revision 1.3
sys/arch/mips/cavium/dev/octeon_rnm.c: revision 1.4
sys/arch/mips/cavium/dev/octeon_rnm.c: revision 1.5
sys/arch/mips/cavium/dev/octeon_rnm.c: revision 1.6 (+ patch)
sys/arch/mips/cavium/dev/octeon_rnmreg.h: revision 1.2
sys/arch/mips/cavium/dev/octeon_rnmreg.h: revision 1.3
sys/arch/mips/cavium/octeonvar.h: revision 1.7
Add a few more bits.
XXX convert to __BITS.
--
If bus_space_map fails, just don't attach the driver instead of panicking.
Check the RNG built-in self-test, and don't attach if that fails either.
--
Octeon RNG/RNM driver modernisation to fit new entropy world order by
riastradh@, with some tweaks to get it working in RNG mode.
XXX TODO: work out how to get raw entropy mode working.
--
Rework octeon_rnm(4) random number generator driver.
- Do a little on-line self-test for fun.
- Draw raw samples from the ring oscillators.
- Draw substantially more samples:
=> early RO samples seem to have considerably lower entropy
=> consecutive RO samples are not independent
- Make sure to use rnd_add_data_sync in the callback.
=> not technically needed in HEAD, but would be needed for pullup
--
Adjust entropy estimate for the Octeon.
We are hedging in serial and in parallel, and more conservative than
the Linux driver from Cavium seems to be, so although I don't know
exactly what the thermal jitter of the device is, this seems like a
reasonable compromise.
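
For reference, the arithmetic behind the adjusted estimate can be sketched
in plain C. The constants mirror BPB, NBBY, and the 256-byte sample buffer
from the diff; the helper names are hypothetical and exist only for this
illustration, not in the driver.

```c
#include <stddef.h>

/* Constants mirroring the diff; the enum names themselves are made up. */
enum {
	BITS_PER_BYTE = 8,	/* NBBY in the kernel */
	BPB = 256,		/* bits of raw data per credited bit of entropy */
	SAMPLE_BYTES = 256	/* sizeof(uint64_t[32]), half of the 512-byte FIFO */
};

/* Hypothetical helper: entropy bits credited for one half-FIFO read. */
static unsigned
credited_bits_per_read(void)
{
	return BITS_PER_BYTE * SAMPLE_BYTES / BPB;
}

/* Hypothetical helper: half-FIFO reads needed to be credited `bits' bits. */
static unsigned
reads_needed(unsigned bits)
{
	unsigned per = credited_bits_per_read();

	return (bits + per - 1) / per;	/* round up */
}
```

Under these constants each 256-byte read is credited 8 bits, so a 256-bit
request works out to 32 half-FIFO reads, i.e. 16 full FIFOs, which appears
to correspond to one pass over all 16 RO groups.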
diffstat:
sys/arch/mips/cavium/dev/octeon_rnm.c | 373 +++++++++++++++++++++++-------
sys/arch/mips/cavium/dev/octeon_rnmreg.h | 22 +-
sys/arch/mips/cavium/octeonvar.h | 7 +-
3 files changed, 310 insertions(+), 92 deletions(-)
diffs (truncated from 530 to 300 lines):
diff -r 85959e95a901 -r 9bb850d6ebd5 sys/arch/mips/cavium/dev/octeon_rnm.c
--- a/sys/arch/mips/cavium/dev/octeon_rnm.c Tue May 19 16:24:38 2020 +0000
+++ b/sys/arch/mips/cavium/dev/octeon_rnm.c Tue May 19 17:35:50 2020 +0000
@@ -1,4 +1,4 @@
-/* $NetBSD: octeon_rnm.c,v 1.2 2019/01/08 19:41:09 jdolecek Exp $ */
+/* $NetBSD: octeon_rnm.c,v 1.2.4.1 2020/05/19 17:35:50 martin Exp $ */
/*
* Copyright (c) 2007 Internet Initiative Japan, Inc.
@@ -26,15 +26,86 @@
* SUCH DAMAGE.
*/
+/*
+ * Cavium Octeon Random Number Generator / Random Number Memory `RNM'
+ *
+ * The RNM unit consists of:
+ *
+ * 1. 128 ring oscillators
+ * 2. an LFSR/SHA-1 conditioner
+ * 3. a 512-byte FIFO
+ *
+ * When the unit is enabled, there are three modes of operation:
+ *
+ * (a) deterministic: the ring oscillators are disabled and the
+ * LFSR/SHA-1 conditioner operates on fixed inputs to give
+ * reproducible results for testing,
+ *
+ * (b) conditioned entropy: the ring oscillators are enabled and
+ * samples from them are fed through the LFSR/SHA-1
+ * conditioner before being put into the FIFO, and
+ *
+ * (c) raw entropy: the ring oscillators are enabled, and a group
+ * of eight of them selected at any one time is sampled and
+ * fed into the FIFO.
+ *
+ * Details:
+ *
+ * - The FIFO is refilled whenever we read out of it, either with
+ * a load address or an IOBDMA operation.
+ *
+ * - The conditioner takes 81 cycles to produce a 64-bit block of
+ * output in the FIFO whether in deterministic or conditioned
+ * entropy mode, each block consisting of the first 64 bits of a
+ * SHA-1 hash.
+ *
+ * - A group of eight ring oscillators take 8 cycles to produce a
+ * 64-bit block of output in the FIFO in raw entropy mode, each
+ * block consisting of eight consecutive samples from each RO in
+ * parallel.
+ *
+ * The first sample of each RO always seems to be zero. Further,
+ * consecutive samples from a single ring oscillator are not
+ * independent, so naive debiasing like a von Neumann extractor
+ * falls flat on its face. And parallel ring oscillators powered
+ * by the same source may not be independent either, if they end
+ * up locked.
+ *
+ * We read out one FIFO's worth of raw samples from groups of 8
+ * ring oscillators at a time, of 128 total, by going through them
+ * round robin. We take 32 consecutive samples from each ring
+ * oscillator in a group of 8 in parallel before we count one bit
+ * of entropy. To get 256 bits of entropy, we read 4Kbit of data
+ * from each of two 8-RO groups.
+ *
+ * We could use the on-board LFSR/SHA-1 conditioner like the Linux
+ * driver written by Cavium does, but it's not clear how many RO
+ * samples go into the conditioner, and our entropy pool is a
+ * perfectly good conditioner itself, so it seems there is little
+ * advantage -- other than expedience -- to using the LFSR/SHA-1
+ * conditioner. All the manual says is that it samples 125 of the
+ * 128 ROs. But the Cavium SHA-1 CPU instruction is advertised to
+ * have a latency of 100 cycles, so it seems implausible that much
+ * more than one sample from each RO could be squeezed in there.
+ *
+ * The hardware exposes only 64 bits of each SHA-1 hash, and the
+ * Linux driver uses 32 bits of that -- which, if treated as full
+ * entropy, would mean an assessment of 3.9 bits of RO samples to
+ * get 1 bit of entropy, whereas we take 256 bits of RO samples to
+ * get one bit of entropy, so this seems reasonably conservative.
+ *
+ * Reference: Cavium Networks OCTEON Plus CN50XX Hardware Reference
+ * Manual, CN50XX-HM-0.99E PRELIMINARY, July 2008.
+ */
+
#include <sys/cdefs.h>
-__KERNEL_RCSID(0, "$NetBSD: octeon_rnm.c,v 1.2 2019/01/08 19:41:09 jdolecek Exp $");
+__KERNEL_RCSID(0, "$NetBSD: octeon_rnm.c,v 1.2.4.1 2020/05/19 17:35:50 martin Exp $");
#include <sys/param.h>
#include <sys/device.h>
-#include <sys/systm.h>
-#include <sys/sysctl.h>
#include <sys/kernel.h>
#include <sys/rndsource.h>
+#include <sys/systm.h>
#include <mips/locore.h>
#include <mips/cavium/include/iobusvar.h>
@@ -44,27 +115,31 @@
#include <sys/bus.h>
-#define RNG_DELAY_CLOCK 91
-#define RNG_DEF_BURST_COUNT 10
+//#define OCTEON_RNM_DEBUG
-int octeon_rnm_burst_count = RNG_DEF_BURST_COUNT;
+#define ENT_DELAY_CLOCK 8 /* cycles for each 64-bit RO sample batch */
+#define RNG_DELAY_CLOCK 81 /* cycles for each SHA-1 output */
+#define NROGROUPS 16
+#define RNG_FIFO_WORDS (512/sizeof(uint64_t))
struct octeon_rnm_softc {
- device_t sc_dev;
-
bus_space_tag_t sc_bust;
bus_space_handle_t sc_regh;
-
+ kmutex_t sc_lock;
krndsource_t sc_rndsrc; /* /dev/random source */
- struct callout sc_rngto; /* rng timeout */
- int sc_rnghz; /* rng poll time */
+ unsigned sc_rogroup;
};
static int octeon_rnm_match(device_t, struct cfdata *, void *);
static void octeon_rnm_attach(device_t, device_t, void *);
-static void octeon_rnm_rng(void *);
-static inline uint64_t octeon_rnm_load(struct octeon_rnm_softc *);
-static inline int octeon_rnm_iobdma(struct octeon_rnm_softc *);
+static void octeon_rnm_rng(size_t, void *);
+static void octeon_rnm_reset(struct octeon_rnm_softc *);
+static void octeon_rnm_conditioned_deterministic(struct octeon_rnm_softc *);
+static void octeon_rnm_conditioned_entropy(struct octeon_rnm_softc *);
+static void octeon_rnm_raw_entropy(struct octeon_rnm_softc *, unsigned);
+static uint64_t octeon_rnm_load(struct octeon_rnm_softc *);
+static void octeon_rnm_iobdma(struct octeon_rnm_softc *, uint64_t *, unsigned);
+static void octeon_rnm_delay(uint32_t);
CFATTACH_DECL_NEW(octeon_rnm, sizeof(struct octeon_rnm_softc),
octeon_rnm_match, octeon_rnm_attach, NULL, NULL);
@@ -73,16 +148,12 @@
octeon_rnm_match(device_t parent, struct cfdata *cf, void *aux)
{
struct iobus_attach_args *aa = aux;
- int result = 0;
if (strcmp(cf->cf_name, aa->aa_name) != 0)
- goto out;
+ return 0;
if (cf->cf_unit != aa->aa_unitno)
- goto out;
- result = 1;
-
-out:
- return result;
+ return 0;
+ return 1;
}
static void
@@ -90,59 +161,186 @@
{
struct octeon_rnm_softc *sc = device_private(self);
struct iobus_attach_args *aa = aux;
- int status;
+ uint64_t bist_status, sample, expected = UINT64_C(0xd654ff35fadf866b);
aprint_normal("\n");
- sc->sc_dev = self;
+ /* Map the device registers, all two of them. */
sc->sc_bust = aa->aa_bust;
- status = bus_space_map(aa->aa_bust, aa->aa_unit->addr, RNM_SIZE,
- 0, &sc->sc_regh);
- if (status != 0)
- panic(": can't map i/o space");
+ if (bus_space_map(aa->aa_bust, aa->aa_unit->addr, RNM_SIZE,
+ 0, &sc->sc_regh) != 0) {
+ aprint_error_dev(self, "unable to map device\n");
+ return;
+ }
- bus_space_write_8(sc->sc_bust, sc->sc_regh, RNM_CTL_STATUS_OFFSET,
- RNM_CTL_STATUS_RNG_EN | RNM_CTL_STATUS_ENT_EN);
+ /* Verify that the built-in self-test succeeded. */
+ bist_status = bus_space_read_8(sc->sc_bust, sc->sc_regh,
+ RNM_BIST_STATUS_OFFSET);
+ if (bist_status) {
+ aprint_error_dev(self, "RNG built in self test failed: %#lx\n",
+ bist_status);
+ return;
+ }
+
+ /* Create a mutex to serialize access to the FIFO. */
+ mutex_init(&sc->sc_lock, MUTEX_DEFAULT, IPL_VM);
- if (hz >= 100)
- sc->sc_rnghz = hz / 100;
- else
- sc->sc_rnghz = 1;
+ /*
+ * Reset the core, enable the RNG engine without entropy, wait
+ * 81 cycles for it to produce a single sample, and draw the
+ * deterministic sample to test.
+ *
+ * XXX Verify that the output matches the SHA-1 computation
+ * described by the data sheet, not just a known answer.
+ */
+ octeon_rnm_reset(sc);
+ octeon_rnm_conditioned_deterministic(sc);
+ octeon_rnm_delay(RNG_DELAY_CLOCK*1);
+ sample = octeon_rnm_load(sc);
+ if (sample != expected)
+ aprint_error_dev(self, "self-test: read %016"PRIx64","
+ " expected %016"PRIx64, sample, expected);
- rnd_attach_source(&sc->sc_rndsrc, device_xname(sc->sc_dev),
- RND_TYPE_RNG, RND_FLAG_NO_ESTIMATE);
-
- callout_init(&sc->sc_rngto, 0);
+ /*
+ * Reset the core again to clear the FIFO, and enable the RNG
+ * engine with entropy exposed directly. Start from the first
+ * group of ring oscillators; as we gather samples we will
+ * rotate through the rest of them.
+ */
+ octeon_rnm_reset(sc);
+ sc->sc_rogroup = 0;
+ octeon_rnm_raw_entropy(sc, sc->sc_rogroup);
+ octeon_rnm_delay(ENT_DELAY_CLOCK*RNG_FIFO_WORDS);
- octeon_rnm_rng(sc);
-
- aprint_normal("%s: random number generator enabled: %dhz\n",
- device_xname(sc->sc_dev), sc->sc_rnghz);
+ /* Attach the rndsource. */
+ rndsource_setcb(&sc->sc_rndsrc, octeon_rnm_rng, sc);
+ rnd_attach_source(&sc->sc_rndsrc, device_xname(self), RND_TYPE_RNG,
+ RND_FLAG_DEFAULT | RND_FLAG_HASCB);
}
static void
-octeon_rnm_rng(void *vsc)
+octeon_rnm_rng(size_t nbytes, void *vsc)
{
+ const unsigned BPB = 256; /* bits of data per bit of entropy */
+ uint64_t sample[32];
struct octeon_rnm_softc *sc = vsc;
- uint64_t rn;
- int i;
+ size_t needed = NBBY*nbytes;
+ unsigned i;
+
+ /* Sample the ring oscillators round-robin. */
+ mutex_enter(&sc->sc_lock);
+ while (needed) {
+ /*
+ * Switch to the next RO group once we drain the FIFO.
+ * By the time rnd_add_data is done, we will have
+ * processed all 512 bytes of the FIFO. We assume it
+ * takes at least one cycle per byte (realistically,
+ * more like ~80cpb to draw from the FIFO and then
+ * process it with rnd_add_data), so there is no need
+ * for any other delays.
+ */
+ sc->sc_rogroup++;
+ sc->sc_rogroup %= NROGROUPS;
+ octeon_rnm_raw_entropy(sc, sc->sc_rogroup);
- for (i = 0; i < octeon_rnm_burst_count; i++) {
- rn = octeon_rnm_load(sc);
- rnd_add_data(&sc->sc_rndsrc,
- &rn, sizeof(rn), sizeof(rn) * NBBY);
/*
- * XXX
- * delay should be over RNG_DELAY_CLOCK cycles at least,
- * we need nanodelay() or clkdelay().
+ * Gather half the FIFO at a time -- we are limited to
+ * 256 bytes because of limits on the CVMSEG buffer.
*/
- delay(1);
+ CTASSERT(sizeof sample == 256);
+ CTASSERT(2*__arraycount(sample) == RNG_FIFO_WORDS);
+ for (i = 0; i < 2; i++) {
+ octeon_rnm_iobdma(sc, sample, __arraycount(sample));
+#ifdef OCTEON_RNM_DEBUG
+ hexdump(printf, "rnm", sample, sizeof sample);
+#endif
+ rnd_add_data_sync(&sc->sc_rndsrc, sample,
+ sizeof sample, NBBY*sizeof(sample)/BPB);
+ needed -= MIN(needed, MAX(1, NBBY*sizeof(sample)/BPB));
+ }
+
+ /* Yield if requested. */
+ if (__predict_false(curcpu()->ci_schedstate.spc_flags &
+ SPCF_SHOULDYIELD)) {
+ mutex_exit(&sc->sc_lock);
+ preempt();
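
The diff is truncated at this point. As a rough aid to reading the callback
loop above, the following userland sketch models only its bookkeeping: how
`needed' shrinks and how the RO group index rotates. It touches no hardware,
and rnm_simulate and struct rnm_sim are hypothetical names introduced for
this illustration.

```c
#include <stddef.h>

#define NROGROUPS	16	/* as in the diff */
#define HALVES_PER_FIFO	2	/* two 256-byte reads drain the 512-byte FIFO */
#define CREDIT_BITS	8	/* NBBY*sizeof(sample)/BPB = 8*256/256 */

/* Hypothetical result type, for illustration only. */
struct rnm_sim {
	unsigned iterations;	/* passes through the outer while loop */
	unsigned rogroup;	/* RO group selected on the last pass */
};

static struct rnm_sim
rnm_simulate(size_t nbytes, unsigned rogroup)
{
	struct rnm_sim r = { 0, rogroup % NROGROUPS };
	size_t needed = 8 * nbytes;		/* NBBY*nbytes */
	const size_t step = CREDIT_BITS;	/* MAX(1, credit) is 8 here */
	unsigned i;

	while (needed) {
		/* The driver advances to the next group before sampling. */
		r.rogroup = (r.rogroup + 1) % NROGROUPS;
		/* Both halves are always read, even once needed hits zero. */
		for (i = 0; i < HALVES_PER_FIFO; i++)
			needed -= needed < step ? needed : step;
		r.iterations++;
	}
	return r;
}
```

In this model a 32-byte request starting at group 0 takes 16 passes and ends
back at group 0, while a 1-byte request still consumes a full FIFO from
group 1.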