Source-Changes-HG archive


[src/netbsd-9]: src/sys/arch/mips/cavium Pull up following revision(s) (reque...



details:   https://anonhg.NetBSD.org/src/rev/9bb850d6ebd5
branches:  netbsd-9
changeset: 1001770:9bb850d6ebd5
user:      martin <martin%NetBSD.org@localhost>
date:      Tue May 19 17:35:50 2020 +0000

description:
Pull up following revision(s) (requested by simonb in ticket #918):

        sys/arch/mips/cavium/dev/octeon_rnm.c: revision 1.3
        sys/arch/mips/cavium/dev/octeon_rnm.c: revision 1.4
        sys/arch/mips/cavium/dev/octeon_rnm.c: revision 1.5
        sys/arch/mips/cavium/dev/octeon_rnm.c: revision 1.6 (+ patch)
        sys/arch/mips/cavium/dev/octeon_rnmreg.h: revision 1.2
        sys/arch/mips/cavium/dev/octeon_rnmreg.h: revision 1.3
        sys/arch/mips/cavium/octeonvar.h: revision 1.7

Add a few more bits.
XXX convert to __BITS.
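The "XXX convert to __BITS." note refers to NetBSD's bit-field macros (__BIT, __BITS, __SHIFTIN and __SHIFTOUT from <sys/cdefs.h>). Below is a minimal portable sketch of the same idiom, assuming 64-bit registers. The BIT/BITS/SHIFTIN/SHIFTOUT names are local stand-ins, not the NetBSD definitions. The HYP_CTL_* field positions and hyp_set_group() are hypothetical and do not come from octeon_rnmreg.h or the CN50XX manual. The sketch only shows that a multi-bit field, such as a 4-bit selector for 16 ring-oscillator groups, can be declared once as a mask and then written or read without hand-coded shifts.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with only bit n set (0 <= n <= 63). */
#define BIT(n)         ((uint64_t)1 << (n))
/* Mask covering bits hi..lo inclusive; assumes 63 >= hi >= lo >= 0. */
#define BITS(hi, lo)   ((~(uint64_t)0 >> (63 - (hi))) & ~(BIT(lo) - 1))
/* Lowest set bit of a nonzero unsigned mask. */
#define LOWBIT(m)      ((m) & -(m))
/* Move a field value into position, or extract it again. */
#define SHIFTIN(v, m)  ((uint64_t)(v) * LOWBIT(m))
#define SHIFTOUT(x, m) (((x) & (m)) / LOWBIT(m))

/* HYPOTHETICAL layout for illustration only, not the real RNM register. */
#define HYP_CTL_ENT_EN  BIT(0)      /* hypothetical enable bit */
#define HYP_CTL_ENT_SEL BITS(9, 6)  /* hypothetical 4-bit RO group select */

/* Replace the RO-group field of a control word, leaving other bits alone. */
static uint64_t
hyp_set_group(uint64_t ctl, unsigned group)
{
	assert(group < 16);	/* must fit in the 4-bit field */
	return (ctl & ~HYP_CTL_ENT_SEL) | SHIFTIN(group, HYP_CTL_ENT_SEL);
}
```

In the kernel, the NetBSD macros themselves would be used with the real masks. The stand-ins here only keep the sketch compilable outside the tree.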
--
If bus_space_map fails, just don't attach the driver instead of panicking.
Check the RNG built-in self-test, and don't attach if that fails too.
--
Octeon RNG/RNM driver modernisation to fit the new entropy world order by
riastradh@, with some tweaks to get it working in RNG mode.
XXX TODO: work out how to get raw entropy mode working.
--
Rework octeon_rnm(4) random number generator driver.
- Do a little on-line self-test for fun.
- Draw raw samples from the ring oscillators.
- Draw substantially more samples:
  => early RO samples seem to have considerably lower entropy
  => consecutive RO samples are not independent
- Make sure to use rnd_add_data_sync in the callback.
  => not technically needed in HEAD, but would be needed for pullup
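The point that consecutive RO samples are not independent matters because the diff's header comment dismisses a von Neumann extractor as naive debiasing. Below is a minimal sketch of the textbook von Neumann extractor, not driver code. The input words are synthetic bit patterns, not real RO samples, and von_neumann() and sum_bits() are illustrative names. The extractor removes bias only from independent, identically distributed bits. A fully dependent, periodic input with zero entropy passes through at the maximum rate of one output bit per pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Textbook von Neumann extractor over the 64 bits of `in`, LSB first:
 * each non-overlapping pair (b0, b1) emits b0 when b0 != b1 and nothing
 * when b0 == b1.  Output bits are stored one per byte in out[], and the
 * number of bits emitted is returned.
 */
static size_t
von_neumann(uint64_t in, uint8_t out[32])
{
	size_t n = 0;

	for (unsigned i = 0; i < 64; i += 2) {
		unsigned b0 = (unsigned)(in >> i) & 1;
		unsigned b1 = (unsigned)(in >> (i + 1)) & 1;
		if (b0 != b1)
			out[n++] = (uint8_t)b0;
	}
	assert(n <= 32);	/* at most one output bit per pair */
	return n;
}

/* Count the 1 bits in an array of 0/1 bytes. */
static unsigned
sum_bits(const uint8_t *bits, size_t n)
{
	unsigned s = 0;

	for (size_t i = 0; i < n; i++)
		s += bits[i];
	return s;
}
```

For example, the strictly alternating word 0x5555555555555555 yields 32 output bits that are all 1. That is a full-rate, perfectly predictable output from an input with no entropy.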
--
Adjust entropy estimate for the Octeon.
We are hedging in serial (32 consecutive samples per RO) and in
parallel (8 ROs per group), and are more conservative than the Linux
driver from Cavium seems to be, so although I don't know exactly what
the thermal jitter of the device is, this seems like a reasonable
compromise.
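The arithmetic behind this estimate can be written out from the constants in the diff. These are BPB = 256 in octeon_rnm_rng(), the 256-byte half-FIFO sample buffer, and the 512-byte FIFO. The Linux comparison follows the diff's own reasoning: roughly one sample from each of 125 ROs per SHA-1 block, of which 32 output bits are treated as full entropy. That reasoning is itself an estimate. The helper names below are illustrative, not part of the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBBY_       8u    /* bits per byte */
#define BPB         256u  /* raw RO bits per credited entropy bit (from the diff) */
#define FIFO_BYTES  512u  /* RNM FIFO size */
#define BATCH_BYTES 256u  /* one IOBDMA batch: half the FIFO */

/* Entropy bits credited per batch, i.e. NBBY*sizeof(sample)/BPB. */
static unsigned
batch_credit(void)
{
	return NBBY_ * BATCH_BYTES / BPB;
}

/* Raw bytes that must be read to credit `bits` bits of entropy. */
static size_t
raw_bytes_for(size_t bits)
{
	assert(bits <= SIZE_MAX / BPB);	/* avoid overflow */
	return bits * BPB / NBBY_;
}

/* The diff's estimate for the Linux driver: ~125 RO bits per 32 output bits. */
static double
linux_ro_bits_per_entropy_bit(void)
{
	return 125.0 / 32.0;
}
```

Under these figures, each half-FIFO batch is credited with 8 bits. A 256-bit request costs 8192 raw bytes, which is sixteen 512-byte FIFO drains. The driver also consumes about 65 times more raw RO data per credited bit than the roughly 3.9 bits implied for the Linux driver.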

diffstat:

 sys/arch/mips/cavium/dev/octeon_rnm.c    |  373 +++++++++++++++++++++++-------
 sys/arch/mips/cavium/dev/octeon_rnmreg.h |   22 +-
 sys/arch/mips/cavium/octeonvar.h         |    7 +-
 3 files changed, 310 insertions(+), 92 deletions(-)

diffs (truncated from 530 to 300 lines):

diff -r 85959e95a901 -r 9bb850d6ebd5 sys/arch/mips/cavium/dev/octeon_rnm.c
--- a/sys/arch/mips/cavium/dev/octeon_rnm.c     Tue May 19 16:24:38 2020 +0000
+++ b/sys/arch/mips/cavium/dev/octeon_rnm.c     Tue May 19 17:35:50 2020 +0000
@@ -1,4 +1,4 @@
-/*     $NetBSD: octeon_rnm.c,v 1.2 2019/01/08 19:41:09 jdolecek Exp $  */
+/*     $NetBSD: octeon_rnm.c,v 1.2.4.1 2020/05/19 17:35:50 martin Exp $        */
 
 /*
  * Copyright (c) 2007 Internet Initiative Japan, Inc.
@@ -26,15 +26,86 @@
  * SUCH DAMAGE.
  */
 
+/*
+ * Cavium Octeon Random Number Generator / Random Number Memory `RNM'
+ *
+ *     The RNM unit consists of:
+ *
+ *     1. 128 ring oscillators
+ *     2. an LFSR/SHA-1 conditioner
+ *     3. a 512-byte FIFO
+ *
+ *     When the unit is enabled, there are three modes of operation:
+ *
+ *     (a) deterministic: the ring oscillators are disabled and the
+ *         LFSR/SHA-1 conditioner operates on fixed inputs to give
+ *         reproducible results for testing,
+ *
+ *     (b) conditioned entropy: the ring oscillators are enabled and
+ *         samples from them are fed through the LFSR/SHA-1
+ *         conditioner before being put into the FIFO, and
+ *
+ *     (c) raw entropy: the ring oscillators are enabled, and a group
+ *         of eight of them selected at any one time is sampled and
+ *         fed into the FIFO.
+ *
+ *     Details:
+ *
+ *     - The FIFO is refilled whenever we read out of it, either with
+ *       a load address or an IOBDMA operation.
+ *
+ *     - The conditioner takes 81 cycles to produce a 64-bit block of
+ *       output in the FIFO whether in deterministic or conditioned
+ *       entropy mode, each block consisting of the first 64 bits of a
+ *       SHA-1 hash.
+ *
+ *     - A group of eight ring oscillators take 8 cycles to produce a
+ *       64-bit block of output in the FIFO in raw entropy mode, each
+ *       block consisting of eight consecutive samples from each RO in
+ *       parallel.
+ *
+ *     The first sample of each RO always seems to be zero.  Further,
+ *     consecutive samples from a single ring oscillator are not
+ *     independent, so naive debiasing like a von Neumann extractor
+ *     falls flat on its face.  And parallel ring oscillators powered
+ *     by the same source may not be independent either, if they end
+ *     up locked.
+ *
+ *     We read out one FIFO's worth of raw samples from groups of 8
+ *     ring oscillators at a time, of 128 total, by going through them
+ *     round robin.  We take 32 consecutive samples from each ring
+ *     oscillator in a group of 8 in parallel before we count one bit
+ *     of entropy.  To get 256 bits of entropy, we read 4Kbit of data
+ *     from each of two 8-RO groups.
+ *
+ *     We could use the on-board LFSR/SHA-1 conditioner like the Linux
+ *     driver written by Cavium does, but it's not clear how many RO
+ *     samples go into the conditioner, and our entropy pool is a
+ *     perfectly good conditioner itself, so it seems there is little
+ *     advantage -- other than expedience -- to using the LFSR/SHA-1
+ *     conditioner.  All the manual says is that it samples 125 of the
+ *     128 ROs.  But the Cavium SHA-1 CPU instruction is advertised to
+ *     have a latency of 100 cycles, so it seems implausible that much
+ *     more than one sample from each RO could be squeezed in there.
+ *
+ *     The hardware exposes only 64 bits of each SHA-1 hash, and the
+ *     Linux driver uses 32 bits of that -- which, if treated as full
+ *     entropy, would mean an assessment of 3.9 bits of RO samples to
+ *     get 1 bit of entropy, whereas we take 256 bits of RO samples to
+ *     get one bit of entropy, so this seems reasonably conservative.
+ *
+ * Reference: Cavium Networks OCTEON Plus CN50XX Hardware Reference
+ * Manual, CN50XX-HM-0.99E PRELIMINARY, July 2008.
+ */
+
 #include <sys/cdefs.h>
-__KERNEL_RCSID(0, "$NetBSD: octeon_rnm.c,v 1.2 2019/01/08 19:41:09 jdolecek Exp $");
+__KERNEL_RCSID(0, "$NetBSD: octeon_rnm.c,v 1.2.4.1 2020/05/19 17:35:50 martin Exp $");
 
 #include <sys/param.h>
 #include <sys/device.h>
-#include <sys/systm.h>
-#include <sys/sysctl.h>
 #include <sys/kernel.h>
 #include <sys/rndsource.h>
+#include <sys/systm.h>
 
 #include <mips/locore.h>
 #include <mips/cavium/include/iobusvar.h>
@@ -44,27 +115,31 @@
 
 #include <sys/bus.h>
 
-#define RNG_DELAY_CLOCK 91
-#define RNG_DEF_BURST_COUNT 10
+//#define      OCTEON_RNM_DEBUG
 
-int octeon_rnm_burst_count = RNG_DEF_BURST_COUNT;
+#define        ENT_DELAY_CLOCK 8       /* cycles for each 64-bit RO sample batch */
+#define        RNG_DELAY_CLOCK 81      /* cycles for each SHA-1 output */
+#define        NROGROUPS       16
+#define        RNG_FIFO_WORDS  (512/sizeof(uint64_t))
 
 struct octeon_rnm_softc {
-       device_t sc_dev;
-
        bus_space_tag_t         sc_bust;
        bus_space_handle_t      sc_regh;
-
+       kmutex_t                sc_lock;
        krndsource_t            sc_rndsrc;      /* /dev/random source */
-       struct callout          sc_rngto;       /* rng timeout */
-       int                     sc_rnghz;       /* rng poll time */
+       unsigned                sc_rogroup;
 };
 
 static int octeon_rnm_match(device_t, struct cfdata *, void *);
 static void octeon_rnm_attach(device_t, device_t, void *);
-static void octeon_rnm_rng(void *);
-static inline uint64_t octeon_rnm_load(struct octeon_rnm_softc *);
-static inline int octeon_rnm_iobdma(struct octeon_rnm_softc *);
+static void octeon_rnm_rng(size_t, void *);
+static void octeon_rnm_reset(struct octeon_rnm_softc *);
+static void octeon_rnm_conditioned_deterministic(struct octeon_rnm_softc *);
+static void octeon_rnm_conditioned_entropy(struct octeon_rnm_softc *);
+static void octeon_rnm_raw_entropy(struct octeon_rnm_softc *, unsigned);
+static uint64_t octeon_rnm_load(struct octeon_rnm_softc *);
+static void octeon_rnm_iobdma(struct octeon_rnm_softc *, uint64_t *, unsigned);
+static void octeon_rnm_delay(uint32_t);
 
 CFATTACH_DECL_NEW(octeon_rnm, sizeof(struct octeon_rnm_softc),
     octeon_rnm_match, octeon_rnm_attach, NULL, NULL);
@@ -73,16 +148,12 @@
 octeon_rnm_match(device_t parent, struct cfdata *cf, void *aux)
 {
        struct iobus_attach_args *aa = aux;
-       int result = 0;
 
        if (strcmp(cf->cf_name, aa->aa_name) != 0)
-               goto out;
+               return 0;
        if (cf->cf_unit != aa->aa_unitno)
-               goto out;
-       result = 1;
-
-out:
-       return result;
+               return 0;
+       return 1;
 }
 
 static void
@@ -90,59 +161,186 @@
 {
        struct octeon_rnm_softc *sc = device_private(self);
        struct iobus_attach_args *aa = aux;
-       int status;
+       uint64_t bist_status, sample, expected = UINT64_C(0xd654ff35fadf866b);
 
        aprint_normal("\n");
 
-       sc->sc_dev = self;
+       /* Map the device registers, all two of them.  */
        sc->sc_bust = aa->aa_bust;
-       status = bus_space_map(aa->aa_bust, aa->aa_unit->addr, RNM_SIZE,
-           0, &sc->sc_regh);
-       if (status != 0)
-               panic(": can't map i/o space");
+       if (bus_space_map(aa->aa_bust, aa->aa_unit->addr, RNM_SIZE,
+           0, &sc->sc_regh) != 0) {
+               aprint_error_dev(self, "unable to map device\n");
+               return;
+       }
 
-       bus_space_write_8(sc->sc_bust, sc->sc_regh, RNM_CTL_STATUS_OFFSET,
-           RNM_CTL_STATUS_RNG_EN | RNM_CTL_STATUS_ENT_EN);
+       /* Verify that the built-in self-test succeeded.  */
+       bist_status = bus_space_read_8(sc->sc_bust, sc->sc_regh,
+           RNM_BIST_STATUS_OFFSET);
+       if (bist_status) {
+               aprint_error_dev(self, "RNG built in self test failed: %#lx\n",
+                   bist_status);
+               return;
+       }
+
+       /* Create a mutex to serialize access to the FIFO.  */
+       mutex_init(&sc->sc_lock, MUTEX_DEFAULT, IPL_VM);
 
-       if (hz >= 100)
-               sc->sc_rnghz = hz / 100;
-       else 
-               sc->sc_rnghz = 1;
+       /*
+        * Reset the core, enable the RNG engine without entropy, wait
+        * 81 cycles for it to produce a single sample, and draw the
+        * deterministic sample to test.
+        *
+        * XXX Verify that the output matches the SHA-1 computation
+        * described by the data sheet, not just a known answer.
+        */
+       octeon_rnm_reset(sc);
+       octeon_rnm_conditioned_deterministic(sc);
+       octeon_rnm_delay(RNG_DELAY_CLOCK*1);
+       sample = octeon_rnm_load(sc);
+       if (sample != expected)
+               aprint_error_dev(self, "self-test: read %016"PRIx64","
+                   " expected %016"PRIx64, sample, expected);
 
-       rnd_attach_source(&sc->sc_rndsrc, device_xname(sc->sc_dev),
-           RND_TYPE_RNG, RND_FLAG_NO_ESTIMATE);
-
-       callout_init(&sc->sc_rngto, 0);
+       /*
+        * Reset the core again to clear the FIFO, and enable the RNG
+        * engine with entropy exposed directly.  Start from the first
+        * group of ring oscillators; as we gather samples we will
+        * rotate through the rest of them.
+        */
+       octeon_rnm_reset(sc);
+       sc->sc_rogroup = 0;
+       octeon_rnm_raw_entropy(sc, sc->sc_rogroup);
+       octeon_rnm_delay(ENT_DELAY_CLOCK*RNG_FIFO_WORDS);
 
-       octeon_rnm_rng(sc);
-
-       aprint_normal("%s: random number generator enabled: %dhz\n",
-           device_xname(sc->sc_dev), sc->sc_rnghz);
+       /* Attach the rndsource.  */
+       rndsource_setcb(&sc->sc_rndsrc, octeon_rnm_rng, sc);
+       rnd_attach_source(&sc->sc_rndsrc, device_xname(self), RND_TYPE_RNG,
+           RND_FLAG_DEFAULT | RND_FLAG_HASCB);
 }
 
 static void
-octeon_rnm_rng(void *vsc)
+octeon_rnm_rng(size_t nbytes, void *vsc)
 {
+       const unsigned BPB = 256; /* bits of data per bit of entropy */
+       uint64_t sample[32];
        struct octeon_rnm_softc *sc = vsc;
-       uint64_t rn;
-       int i;
+       size_t needed = NBBY*nbytes;
+       unsigned i;
+
+       /* Sample the ring oscillators round-robin.  */
+       mutex_enter(&sc->sc_lock);
+       while (needed) {
+               /*
+                * Switch to the next RO group once we drain the FIFO.
+                * By the time rnd_add_data is done, we will have
+                * processed all 512 bytes of the FIFO.  We assume it
+                * takes at least one cycle per byte (realistically,
+                * more like ~80cpb to draw from the FIFO and then
+                * process it with rnd_add_data), so there is no need
+                * for any other delays.
+                */
+               sc->sc_rogroup++;
+               sc->sc_rogroup %= NROGROUPS;
+               octeon_rnm_raw_entropy(sc, sc->sc_rogroup);
 
-       for (i = 0; i < octeon_rnm_burst_count; i++) {
-               rn = octeon_rnm_load(sc);
-               rnd_add_data(&sc->sc_rndsrc,
-                               &rn, sizeof(rn), sizeof(rn) * NBBY);
                /*
-                * XXX
-                * delay should be over RNG_DELAY_CLOCK cycles at least,
-                * we need nanodelay() or clkdelay().
+                * Gather half the FIFO at a time -- we are limited to
+                * 256 bytes because of limits on the CVMSEG buffer.
                 */
-               delay(1);
+               CTASSERT(sizeof sample == 256);
+               CTASSERT(2*__arraycount(sample) == RNG_FIFO_WORDS);
+               for (i = 0; i < 2; i++) {
+                       octeon_rnm_iobdma(sc, sample, __arraycount(sample));
+#ifdef OCTEON_RNM_DEBUG
+                       hexdump(printf, "rnm", sample, sizeof sample);
+#endif
+                       rnd_add_data_sync(&sc->sc_rndsrc, sample,
+                           sizeof sample, NBBY*sizeof(sample)/BPB);
+                       needed -= MIN(needed, MAX(1, NBBY*sizeof(sample)/BPB));
+               }
+
+               /* Yield if requested.  */
+               if (__predict_false(curcpu()->ci_schedstate.spc_flags &
+                       SPCF_SHOULDYIELD)) {
+                       mutex_exit(&sc->sc_lock);
+                       preempt();



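The diff is truncated inside octeon_rnm_rng(), but the accounting visible up to that point can be modelled on its own. What follows is a hypothetical userland model, not driver code. The IOBDMA read, rnd_add_data_sync(), the mutex and the SPCF_SHOULDYIELD check are omitted, and struct sim, sim_request(), MIN_ and MAX_ are illustrative names. The model assumes, as in the diff, that attach leaves sc_rogroup at 0, that the group is advanced before each FIFO drain, and that each of the two 256-byte batches per drain credits NBBY*sizeof(sample)/BPB = 8 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBBY_        8u
#define NROGROUPS    16u
#define BPB          256u
#define SAMPLE_BYTES 256u  /* sizeof(uint64_t[32]) */

#define MIN_(a, b) ((a) < (b) ? (a) : (b))
#define MAX_(a, b) ((a) > (b) ? (a) : (b))

struct sim {
	unsigned rogroup;            /* current RO group, like sc_rogroup */
	unsigned drains;             /* full-FIFO drains performed */
	unsigned visits[NROGROUPS];  /* drains taken from each group */
};

/* Mirror the needed-bits bookkeeping of the callback for an nbytes request. */
static void
sim_request(struct sim *s, size_t nbytes)
{
	size_t needed = NBBY_ * nbytes;

	while (needed) {
		/* Advance to the next RO group before draining the FIFO. */
		s->rogroup = (s->rogroup + 1) % NROGROUPS;
		assert(s->rogroup < NROGROUPS);
		s->drains++;
		s->visits[s->rogroup]++;

		/* Both halves are always read, each credited with 8 bits. */
		for (unsigned i = 0; i < 2; i++) {
			size_t credit = MAX_(1u, NBBY_ * SAMPLE_BYTES / BPB);
			needed -= MIN_(needed, credit);
		}
	}
}
```

Under these assumptions, a 32-byte request drains the FIFO 16 times. It visits every RO group exactly once and ends back at group 0. A 1-byte request still drains a whole FIFO, because both halves are read before `needed` is checked again.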