UCI Datasets → TinyMind Examples

Scope: the TinyMind example programs that are built around a UCI Machine Learning Repository dataset — what each one does, which dataset it uses, and how the data is sourced. This is the as-built record (one entry per runnable example under examples/), not a survey of what could be built.

Source: UCI Machine Learning Repository.


Data-source convention

Each example states one of three data modes:

  • Ships real data — the UCI CSV is bundled in the example dir and copied into ./output/ by the Makefile; nothing to download.
  • Optional real data — runs offline on a deterministic synthetic series by default; drop the named UCI file into ./output/ and the loader picks it up.
  • Synthetic, UCI-inspired — reproduces the documented generative/failure rules of the UCI dataset; no real-data loader (the dataset’s phenomenon is the subject, not its exact rows).
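
The "optional real data" mode amounts to a file check with a synthetic fallback. The following is a minimal sketch of that pattern, not the loaders the examples actually ship: `loadSeries` and `syntheticSeries` are hypothetical names, the one-value-per-line file format is an assumption, and the 24-sample sine cycle is only a placeholder for a deterministic series.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: a deterministic synthetic series (fixed formula, no RNG),
// so repeated offline runs see identical data.
std::vector<double> syntheticSeries(std::size_t n)
{
    const double pi = 3.14159265358979323846;
    std::vector<double> series;
    series.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
    {
        // 24-sample cycle as a stand-in for a daily pattern (assumption).
        series.push_back(2.0 + std::sin(2.0 * pi * static_cast<double>(i % 24) / 24.0));
    }
    return series;
}

// Hypothetical loader: use the real file when it is present and yields values,
// otherwise fall back to the synthetic series.
// Assumes one numeric value per line; real UCI files need format-specific parsing.
std::vector<double> loadSeries(const std::string& path, std::size_t fallbackLength, bool& usedRealData)
{
    assert(fallbackLength > 0);
    std::vector<double> series;
    std::ifstream in(path);          // a missing file simply yields no values
    double value = 0.0;
    while (in >> value)
    {
        series.push_back(value);
    }
    usedRealData = !series.empty();
    return usedRealData ? series : syntheticSeries(fallbackLength);
}
```

Calling `loadSeries("./output/some_file.csv", 48, usedReal)` with no such file present returns the 48-sample synthetic series and sets `usedReal` to false.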

Most examples follow the NeuralNet<> / LstmNeuralNetwork<> train-and-deploy path and run entirely in QValue Q16.16 fixed-point, the same form that would run on an MCU. The Japanese Vowels example is the exception: it trains offline in floating point (double) and then sweeps fixed-point formats to find the smallest one (Q8.8) that holds the floating-point accuracy.
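
For readers new to Q notation: Q16.16 keeps 16 integer and 16 fractional bits in a 32-bit signed word, and Q8.8 keeps 8 and 8 in a 16-bit word. The sketch below shows the arithmetic with plain integers. It does not use TinyMind's QValue API, and the helper names (`toFixed`, `fromFixed`, `mulQ16_16`) are hypothetical. It also assumes an arithmetic right shift for negative values, which C++20 guarantees and earlier compilers provide in practice.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Quantize a real value to a signed fixed-point grid with `fracBits` fractional
// bits inside a `totalBits`-wide word, saturating at the representable range.
// Returns the raw integer representation.
std::int64_t toFixed(double x, int totalBits, int fracBits)
{
    assert(fracBits >= 0 && totalBits > fracBits && totalBits <= 32);
    const double scale = std::ldexp(1.0, fracBits);                      // 2^fracBits
    const std::int64_t maxRaw = (std::int64_t{1} << (totalBits - 1)) - 1;
    const std::int64_t minRaw = -(std::int64_t{1} << (totalBits - 1));
    std::int64_t raw = std::llround(x * scale);                          // round to nearest
    if (raw > maxRaw) raw = maxRaw;
    if (raw < minRaw) raw = minRaw;
    return raw;
}

// Convert a raw fixed-point integer back to a real value.
double fromFixed(std::int64_t raw, int fracBits)
{
    return std::ldexp(static_cast<double>(raw), -fracBits);
}

// Q16.16 multiply: form the 64-bit product (which carries 32 fractional bits),
// then shift 16 of them out to return to Q16.16.
std::int32_t mulQ16_16(std::int32_t a, std::int32_t b)
{
    const std::int64_t product = static_cast<std::int64_t>(a) * static_cast<std::int64_t>(b);
    return static_cast<std::int32_t>(product >> 16);
}
```

With these helpers, 0.3 lands on 0.30078125 in Q8.8 (step 1/256) and on 0.3125 in Q4.4 (step 1/16), and Q4.4 saturates at 7.9375. The coarser grid and narrower range are the kind of loss a format sweep is probing for.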


Examples

Iris — species classifier

  • Dataset: UCI Iris — 150 inst · 4 features · 3-class.
  • Model: MLP 4→8→3, ReLU hidden, 3 sigmoid (argmax). z-score/3 input scaling, 30k iters.
  • Data: ships real data (iris.data, ~4 KB).
  • Result: 100% test accuracy (30/30). The smallest end-to-end fixed-point classifier.
  • examples/iris · page
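
A minimal sketch of the two data-side steps named above, under stated assumptions: "z-score/3" is read here as (x - mean) / (3 * stddev), which keeps most inputs within roughly [-1, 1], and the population standard deviation is used. The helper names are hypothetical and the example's own code may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct FeatureStats
{
    double mean;
    double stddev;
};

// Mean and population standard deviation of one feature column.
FeatureStats computeStats(const std::vector<double>& column)
{
    assert(!column.empty());
    const double n = static_cast<double>(column.size());
    double sum = 0.0;
    for (double v : column) sum += v;
    const double mean = sum / n;
    double squares = 0.0;
    for (double v : column) squares += (v - mean) * (v - mean);
    return { mean, std::sqrt(squares / n) };
}

// z-score divided by 3 (assumed reading of "z-score/3").
double scaleZ3(double x, const FeatureStats& s)
{
    assert(s.stddev > 0.0);
    return (x - s.mean) / (3.0 * s.stddev);
}

// Decode a class from the sigmoid outputs: index of the largest value.
std::size_t argmax(const std::vector<double>& outputs)
{
    assert(!outputs.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < outputs.size(); ++i)
    {
        if (outputs[i] > outputs[best]) best = i;
    }
    return best;
}
```

The statistics would be computed on the training split only and reused unchanged for the test split.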

Energy Efficiency — building-load regression

  • Dataset: UCI Energy Efficiency — 768 inst · 8 features · 2-target regression.
  • Model: MLP 8→16→2, ReLU hidden, 2 linear outputs (heating + cooling load). LinearActivationPolicy + GradientClipByValue.
  • Data: ships real data (ENB2012_data.csv, ~35 KB).
  • Result: heating R² ≈ 0.90, cooling R² ≈ 0.88. TinyMind’s smallest regression example.
  • examples/energy_efficiency · page
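
The R² figures are the usual coefficient of determination, R² = 1 - SS_res / SS_tot, computed per target. A small sketch of that formula follows; `rSquared` is a hypothetical helper, not a function taken from the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Coefficient of determination for one regression target.
// SS_res: squared error of the predictions.
// SS_tot: squared deviation of the actual values from their mean.
double rSquared(const std::vector<double>& actual, const std::vector<double>& predicted)
{
    assert(!actual.empty() && actual.size() == predicted.size());
    double mean = 0.0;
    for (double y : actual) mean += y;
    mean /= static_cast<double>(actual.size());

    double ssRes = 0.0;
    double ssTot = 0.0;
    for (std::size_t i = 0; i < actual.size(); ++i)
    {
        ssRes += (actual[i] - predicted[i]) * (actual[i] - predicted[i]);
        ssTot += (actual[i] - mean) * (actual[i] - mean);
    }
    assert(ssTot > 0.0);   // a constant target has no variance to explain
    return 1.0 - ssRes / ssTot;
}
```

For the two-output model this would be called once with the heating-load column and once with the cooling-load column.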

Optical Handwritten Digits — 8×8 image classifier

  • Dataset: UCI Optical Recognition of Handwritten Digits — 8×8 bitmaps (64 px, 0..16) · 10-class.
  • Model: MLP 64→32→10, ReLU hidden, 10 sigmoid (argmax). Per-pixel z-score with constant-zero guard, 60k iters.
  • Data: ships real data (optdigits.tra 3823 rows + optdigits.tes 1797 rows).
  • Result: ~96% test accuracy (1729/1797). Real 64-feature image task in fixed-point.
  • examples/optical_digits · page
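
The constant-zero guard matters because a pixel that never varies in the training set (for example one that is always 0) has zero standard deviation, and a plain z-score would divide by zero. A sketch of one way to handle it, with hypothetical names and an assumed epsilon threshold:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct PixelStats
{
    std::vector<double> mean;
    std::vector<double> stddev;
};

// Per-pixel mean and population standard deviation over the training rows
// (rows are samples, columns are pixels).
PixelStats fitPixelStats(const std::vector<std::vector<double>>& trainRows)
{
    assert(!trainRows.empty());
    const std::size_t pixels = trainRows.front().size();
    const double n = static_cast<double>(trainRows.size());
    PixelStats stats{ std::vector<double>(pixels, 0.0), std::vector<double>(pixels, 0.0) };
    for (const auto& row : trainRows)
    {
        assert(row.size() == pixels);
        for (std::size_t p = 0; p < pixels; ++p) stats.mean[p] += row[p];
    }
    for (double& m : stats.mean) m /= n;
    for (const auto& row : trainRows)
    {
        for (std::size_t p = 0; p < pixels; ++p)
        {
            stats.stddev[p] += (row[p] - stats.mean[p]) * (row[p] - stats.mean[p]);
        }
    }
    for (double& s : stats.stddev) s = std::sqrt(s / n);
    return stats;
}

// z-score one pixel; constant pixels map to 0 instead of dividing by zero.
double normalizePixel(double value, std::size_t pixel, const PixelStats& stats, double epsilon = 1e-9)
{
    assert(pixel < stats.mean.size());
    if (stats.stddev[pixel] < epsilon) return 0.0;   // the guard
    return (value - stats.mean[pixel]) / stats.stddev[pixel];
}
```

Feeding 0 for a constant pixel means it carries no information either way, which is the safe choice for an input the network never saw vary.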

Predictive Maintenance — binary failure classifier

  • Dataset: AI4I 2020 Predictive Maintenance — milling-machine readings · binary (failure / no-failure).
  • Model: MLP 10→24→1, ReLU hidden, single sigmoid. 5 process features + 3 physics product features (power, overstrain, temp gap) + 2-dim variant one-hot. 50/50 balanced sampling for the ~3.4% failure rate.
  • Data: optional real data (ai4i2020.csv); else synthesizes 10k rows from the documented HDF/PWF/OSF/TWF/RNF rules.
  • Result: recall ~0.89, precision ~0.80, F1 ~0.84.
  • examples/predictive_maintenance · page
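
The three physics product features can be sketched from the quantities that the dataset's documented failure rules are written in: mechanical power for PWF, tool wear times torque for OSF, and the process/air temperature difference for HDF. The struct and function names below are hypothetical, and the example's actual feature scaling is not reproduced here.

```cpp
#include <cassert>

// One raw process reading (the 5 process features).
struct ProcessReading
{
    double airTempK;       // air temperature [K]
    double processTempK;   // process temperature [K]
    double rotationalRpm;  // rotational speed [rpm]
    double torqueNm;       // torque [Nm]
    double toolWearMin;    // tool wear [min]
};

// Derived features, unscaled.
struct PhysicsFeatures
{
    double powerW;      // torque * angular speed [W]
    double overstrain;  // tool wear * torque [min * Nm]
    double tempGapK;    // process temperature - air temperature [K]
};

PhysicsFeatures physicsFeatures(const ProcessReading& r)
{
    assert(r.rotationalRpm >= 0.0);
    const double pi = 3.14159265358979323846;
    const double radPerSec = r.rotationalRpm * 2.0 * pi / 60.0;   // rpm -> rad/s
    return { r.torqueNm * radPerSec,
             r.toolWearMin * r.torqueNm,
             r.processTempK - r.airTempK };
}
```

Handing the network these products directly spares a small MLP from having to learn the multiplications itself.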

Human Activity Recognition — recurrent (LSTM)

  • Dataset: UCI HAR Using Smartphones — tri-axial accelerometer · activity classes.
  • Model: Q16.16 LSTM 3→16 (tanh)→4 sigmoid, stateful per 32-step window, argmax at final step. Classes: WALKING / WALKING_UPSTAIRS / SITTING / STANDING.
  • Data: optional real data (long-format har.csv); else synthesizes physically-motivated accelerometer windows.
  • Result: ~97.5% test accuracy (195/200). LstmNeuralNetwork<> for sequence classification.
  • examples/har_activity · page
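
The windowing and "argmax at final step" logic can be sketched independently of the network. The code below assumes non-overlapping windows and a hypothetical model interface with `reset()` and `step()`; it is not the LstmNeuralNetwork<> API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Sample = std::array<double, 3>;   // ax, ay, az
constexpr std::size_t kWindow = 32;     // steps per window
constexpr std::size_t kClasses = 4;

// Split a stream into non-overlapping full windows (assumption: no overlap);
// a trailing partial window is dropped.
std::vector<std::vector<Sample>> makeWindows(const std::vector<Sample>& stream)
{
    std::vector<std::vector<Sample>> windows;
    for (std::size_t start = 0; start + kWindow <= stream.size(); start += kWindow)
    {
        const auto first = stream.begin() + static_cast<std::ptrdiff_t>(start);
        windows.emplace_back(first, first + static_cast<std::ptrdiff_t>(kWindow));
    }
    return windows;
}

// Model is any type with reset() and step(const Sample&) returning
// std::array<double, kClasses> (hypothetical interface).
template <typename Model>
std::size_t classifyWindow(Model& model, const std::vector<Sample>& window)
{
    assert(window.size() == kWindow);
    model.reset();                                   // fresh recurrent state per window
    std::array<double, kClasses> scores{};
    for (const Sample& s : window)
    {
        scores = model.step(s);                      // only the final step's scores survive
    }
    std::size_t best = 0;
    for (std::size_t c = 1; c < kClasses; ++c)
    {
        if (scores[c] > scores[best]) best = c;
    }
    return best;
}
```

Resetting the state at each window boundary keeps windows independent, so one window's activity cannot leak into the next prediction.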

Japanese Vowels — recurrent (Elman), offline-float → fixed-point sweep

  • Dataset: UCI Japanese Vowels — 9 speakers · 12 LPC cepstrum coeffs · 7–29 frame sequences · 270 train / 370 test.
  • Model: ElmanNeuralNetwork<> 12→16 (tanh, recurrent)→9 sigmoid, per-frame scores summed over the utterance, argmax speaker. z-score/3 input scaling.
  • Data: ships real data (ae.train / ae.test).
  • Result: the deployment-flow demo — trained offline in double, then swept across fixed-point formats. Q8.8 matches double-precision accuracy exactly (94.05%) in 1.2 KB of weights (4× smaller); Q4.4 collapses. Lands on the Kudo et al. 1999 paper baseline (94.1%).
  • examples/elman_vowels · page
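
Two pieces of this entry are easy to sketch: the utterance-level decision (sum the per-frame scores, then argmax) and the weight-memory arithmetic. The parameter count assumes a full 16×16 recurrent matrix plus one bias per hidden and output neuron. That layout is an assumption, although it is consistent with the stated 1.2 KB. All names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kSpeakers = 9;
using FrameScores = std::array<double, kSpeakers>;

// Sum each speaker's score over all frames of one utterance and pick the largest total.
std::size_t classifyUtterance(const std::vector<FrameScores>& perFrame)
{
    assert(!perFrame.empty());
    FrameScores total{};
    for (const FrameScores& frame : perFrame)
    {
        for (std::size_t s = 0; s < kSpeakers; ++s) total[s] += frame[s];
    }
    std::size_t best = 0;
    for (std::size_t s = 1; s < kSpeakers; ++s)
    {
        if (total[s] > total[best]) best = s;
    }
    return best;
}

// Parameter count of an Elman network under the assumed layout:
// input->hidden, hidden->hidden, hidden biases, hidden->output, output biases.
constexpr std::size_t elmanParameterCount(std::size_t in, std::size_t hidden, std::size_t out)
{
    return in * hidden + hidden * hidden + hidden + hidden * out + out;
}

// 12 -> 16 -> 9 gives 617 parameters: 1234 bytes at 2 bytes each (Q8.8),
// versus 4936 bytes at 8 bytes each (double), a factor of 4.
static_assert(elmanParameterCount(12, 16, 9) == 617, "assumed parameter layout");
```

Summing over frames lets every frame vote, so utterances of different lengths (7 to 29 frames) are handled without padding.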

Air Quality Forecasting — recurrent (LSTM)

  • Dataset: UCI Air Quality — hourly pollutant series.
  • Model: Q16.16 LSTM 1→16 (tanh)→1 sigmoid, next-hour forecaster. 24-hour state reset per BPTT segment, series normalized to [0.1, 0.9].
  • Data: optional real data (AirQualityUCI.csv, CO(GT) column, ;-separated, decimal comma, -200 = missing); else a synthetic daily-cycle CO series.
  • Result: one-step-ahead MAE ≈ 0.13 over a 1.3–3.0 range.
  • examples/air_quality · page
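
The file quirks listed above (decimal comma, -200 as the missing-value sentinel) and the [0.1, 0.9] normalization can be sketched as follows. This handles a single already-split field rather than a whole CSV row, and the function names are hypothetical. How the example treats missing hours (skip, hold, or interpolate) is not shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Parse one CO(GT) field such as "2,6". Returns false for empty or malformed
// fields and for the -200 missing-value sentinel; `value` is set only on success.
bool parseCoField(std::string field, double& value)
{
    std::replace(field.begin(), field.end(), ',', '.');   // decimal comma -> decimal point
    try
    {
        std::size_t used = 0;
        const double parsed = std::stod(field, &used);
        if (used != field.size()) return false;           // trailing junk
        if (parsed == -200.0) return false;               // missing-value sentinel
        value = parsed;
        return true;
    }
    catch (const std::exception&)
    {
        return false;                                      // empty or non-numeric field
    }
}

// Min-max map a series onto [0.1, 0.9]; lo and hi are returned so that
// predictions can be mapped back to the original units.
std::vector<double> normalizeSeries(const std::vector<double>& series, double& lo, double& hi)
{
    assert(!series.empty());
    const auto mm = std::minmax_element(series.begin(), series.end());
    lo = *mm.first;
    hi = *mm.second;
    assert(hi > lo);
    std::vector<double> out;
    out.reserve(series.size());
    for (double v : series) out.push_back(0.1 + 0.8 * (v - lo) / (hi - lo));
    return out;
}
```

Mapping to [0.1, 0.9] rather than [0, 1] keeps targets away from the flat ends of the sigmoid output, where gradients are smallest.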

Gas Sensor Array Drift — drift demonstration

  • Dataset: UCI Gas Sensor Array Drift — 16 chemo-resistive sensors × 8 features, 36 months / 10 batches · 6-gas classification.
  • Model: MLP 128→32→6, ReLU hidden, 6 sigmoid (argmax). Train on batch 1, evaluate every later batch; normalization fixed to batch-1 stats (deliberately not drift-corrected).
  • Data: synthetic, UCI-inspired (per-batch multiplicative gain + additive offset stand in for sensor aging).
  • Result: accuracy decays from ~1.0 (batch 1) to ~0.73 (batch 10) — the drift curve is the point.
  • examples/gas_sensor_drift · page
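
The mechanism behind the decay can be sketched in a few lines: a per-batch gain and offset move the readings, while the normalization still uses batch-1 statistics, so the network's inputs slide away from the distribution it was trained on. The linear growth of gain and offset with batch index is an assumption made for illustration; the example's actual drift schedule is not reproduced here, and the names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical drift parameters: how much gain and offset grow per batch.
struct DriftModel
{
    double gainPerBatch;
    double offsetPerBatch;
};

// Batch 1 is undrifted; batch b applies (b - 1) steps of gain and offset growth.
double applyDrift(double cleanReading, std::size_t batch, const DriftModel& d)
{
    assert(batch >= 1);
    const double steps = static_cast<double>(batch - 1);
    const double gain = 1.0 + d.gainPerBatch * steps;
    return gain * cleanReading + d.offsetPerBatch * steps;
}

// Normalization frozen at batch-1 statistics (deliberately not drift-corrected).
double normalizeWithBatch1(double reading, double batch1Mean, double batch1Std)
{
    assert(batch1Std > 0.0);
    return (reading - batch1Mean) / batch1Std;
}
```

With a gain step of 0.02 and an offset step of 0.1, a reading of 10.0 that normalizes to 0 in batch 1 (mean 10, std 2) becomes 12.7 in batch 10 and normalizes to 1.35, well away from where the classifier saw it during training.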

Summary

| Example                | UCI dataset            | Task                | Capability                             | Data mode               |
|------------------------|------------------------|---------------------|----------------------------------------|-------------------------|
| Iris                   | Iris                   | 3-class             | NeuralNet<> MLP                        | ships real              |
| Energy Efficiency      | Energy Efficiency      | 2-target regression | MLP, linear readout                    | ships real              |
| Optical Digits         | Optical Digits (8×8)   | 10-class image      | MLP                                    | ships real              |
| Predictive Maintenance | AI4I 2020              | binary              | MLP + physics features                 | optional real           |
| HAR Activity           | HAR Smartphones        | seq classification  | LstmNeuralNetwork<>                    | optional real           |
| Japanese Vowels        | Japanese Vowels        | seq classification  | ElmanNeuralNetwork<>, float→Q8.8 sweep | ships real              |
| Air Quality            | Air Quality            | seq forecasting     | LstmNeuralNetwork<>                    | optional real           |
| Gas Sensor Drift       | Gas Sensor Array Drift | 6-class + drift     | MLP, normalization fixed to batch 1    | synthetic, UCI-inspired |

See the Example Gallery for the behavior plots, or each example’s own page for the full write-up.


Dan McLeran — danmcleran@gmail.com — MIT License