UCI Datasets → TinyMind Examples

Scope: the TinyMind example programs that are built around a UCI Machine Learning Repository dataset — what each one does, which dataset it uses, and how the data is sourced. This is the as-built record (one entry per runnable example under examples/), not a survey of what could be built.

Source: UCI Machine Learning Repository.

Data-source convention

Each example states one of three data modes:

Ships real data — the UCI CSV is bundled in the example dir and copied into ./output/ by the Makefile; nothing to download.
Optional real data — runs offline on a deterministic synthetic series by default; drop the named UCI file into ./output/ and the loader picks it up.
Synthetic, UCI-inspired — reproduces the documented generative/failure rules of the UCI dataset; no real-data loader (the dataset’s phenomenon is the subject, not its exact rows).
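
The "optional real data" mode amounts to a file-presence check with a synthetic fallback. A minimal sketch of that decision, assuming a plain "can the file be opened" test; the names DataSource and chooseDataSource are hypothetical and are not TinyMind's loader API:

```cpp
#include <fstream>
#include <string>

// Hypothetical names for illustration; not part of TinyMind.
enum class DataSource { RealFile, Synthetic };

// Returns RealFile when the named file can be opened for reading,
// otherwise falls back to the synthetic generator.
inline DataSource chooseDataSource(const std::string& path) {
    std::ifstream in(path.c_str());
    return in.good() ? DataSource::RealFile : DataSource::Synthetic;
}
```

With this shape, calling chooseDataSource("./output/ai4i2020.csv") selects the real file only when it has been dropped into ./output/, so the default run stays offline.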

Most examples follow the NeuralNet<> / LstmNeuralNetwork<> train-and-deploy path and run entirely in QValue Q16.16 fixed-point, the form the code takes on an MCU. The Japanese Vowels example is the exception: it trains offline in floating point (double) and then sweeps fixed-point formats to find the smallest one (Q8.8) that holds the floating-point accuracy.
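
For readers new to the notation, Q16.16 stores a value in a 32-bit signed integer with 16 integer bits and 16 fractional bits. The sketch below shows the arithmetic only; the helper names (toQ, fromQ, mulQ) are made up for illustration, and TinyMind's QValue type has its own interface.

```cpp
#include <cstdint>

// Q16.16: 32-bit signed value, 16 integer bits and 16 fractional bits.
// Illustrative helpers only; not TinyMind's QValue interface.
typedef std::int32_t q16_16;

const int kFracBits = 16;

// Convert a real number to Q16.16, rounding to the nearest step (1/65536).
inline q16_16 toQ(double x) {
    const double scaled = x * static_cast<double>(1 << kFracBits);
    return static_cast<q16_16>(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

// Convert a Q16.16 value back to a real number.
inline double fromQ(q16_16 q) {
    return static_cast<double>(q) / static_cast<double>(1 << kFracBits);
}

// Multiply two Q16.16 values: widen to 64 bits so the product cannot
// overflow, then shift out the extra 16 fractional bits.
// (Right-shifting a negative value is arithmetic on mainstream compilers.)
inline q16_16 mulQ(q16_16 a, q16_16 b) {
    const std::int64_t wide =
        static_cast<std::int64_t>(a) * static_cast<std::int64_t>(b);
    return static_cast<q16_16>(wide >> kFracBits);
}
```

For example, toQ(1.5) is the integer 98304, and mulQ(toQ(1.5), toQ(2.0)) converts back to 3.0.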

Examples

Iris — species classifier

Dataset: UCI Iris — 150 inst · 4 features · 3-class.
Model: MLP 4→8→3, ReLU hidden, 3 sigmoid (argmax). z-score/3 input scaling, 30k iters.
Data: ships real data (iris.data, ~4 KB).
Result: 100% test accuracy (30/30). The smallest end-to-end fixed-point classifier.
examples/iris · page
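
A sketch of the two pre- and post-processing steps named above, reading "z-score/3" as "z-score divided by 3" (an assumption), which places values within three standard deviations of the mean in roughly [-1, 1]. The function names are illustrative and not taken from the example's source.

```cpp
#include <cstddef>
#include <vector>

// Scale one feature as (x - mean) / (3 * stddev).
// Interpreting "z-score/3" this way is an assumption. Requires stddev > 0.
inline double zScoreOver3(double x, double mean, double stddev) {
    return (x - mean) / (3.0 * stddev);
}

// Index of the largest output; ties go to the lowest index.
// Requires a non-empty vector.
inline std::size_t argmax(const std::vector<double>& outputs) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < outputs.size(); ++i) {
        if (outputs[i] > outputs[best]) best = i;
    }
    return best;
}
```

The predicted species is then the argmax over the three sigmoid outputs.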

Energy Efficiency — building-load regression

Dataset: UCI Energy Efficiency — 768 inst · 8 features · 2-target regression.
Model: MLP 8→16→2, ReLU hidden, 2 linear outputs (heating + cooling load). LinearActivationPolicy + GradientClipByValue.
Data: ships real data (ENB2012_data.csv, ~35 KB).
Result: heating R² ≈ 0.90, cooling R² ≈ 0.88. TinyMind’s smallest regression example.
examples/energy_efficiency · page
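
The R² figures can be read against the standard coefficient of determination, computed once per target. The sketch below assumes that definition; how the example itself computes the metric is not shown here.

```cpp
#include <cstddef>
#include <vector>

// R^2 = 1 - SS_res / SS_tot for a single regression target.
// Assumes equal-length, non-empty inputs and a non-constant target.
inline double rSquared(const std::vector<double>& actual,
                       const std::vector<double>& predicted) {
    double mean = 0.0;
    for (std::size_t i = 0; i < actual.size(); ++i) mean += actual[i];
    mean /= static_cast<double>(actual.size());

    double ssRes = 0.0;  // squared error of the model
    double ssTot = 0.0;  // squared error of always predicting the mean
    for (std::size_t i = 0; i < actual.size(); ++i) {
        const double r = actual[i] - predicted[i];
        const double d = actual[i] - mean;
        ssRes += r * r;
        ssTot += d * d;
    }
    return 1.0 - ssRes / ssTot;
}
```

A value of 1.0 means perfect prediction and 0.0 means no better than predicting the mean, so heating and cooling loads are each scored separately on that scale.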

Optical Handwritten Digits — 8×8 image classifier

Dataset: UCI Optical Recognition of Handwritten Digits — 8×8 bitmaps (64 px, 0..16) · 10-class.
Model: MLP 64→32→10, ReLU hidden, 10 sigmoid (argmax). Per-pixel z-score with constant-zero guard, 60k iters.
Data: ships real data (optdigits.tra 3823 rows + optdigits.tes 1797 rows).
Result: ~96% test accuracy (1729/1797). Real 64-feature image task in fixed-point.
examples/optical_digits · page
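
Some pixels in an 8×8 bitmap never change across the training set, so their standard deviation is zero and a plain z-score would divide by zero. A minimal sketch of per-pixel scaling with such a guard, assuming the guard maps constant pixels to 0 and that population statistics are used; both choices and all names are assumptions.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Per-column statistics fitted on the training rows (hypothetical type).
struct PixelStats {
    std::vector<double> mean;
    std::vector<double> stddev;
};

// Fit mean and population standard deviation for every pixel column.
// Requires at least one row; all rows must have the same length.
inline PixelStats fitPixelStats(const std::vector<std::vector<double> >& rows) {
    const std::size_t n = rows.size();
    const std::size_t d = rows[0].size();
    PixelStats s;
    s.mean.assign(d, 0.0);
    s.stddev.assign(d, 0.0);
    for (std::size_t r = 0; r < n; ++r)
        for (std::size_t j = 0; j < d; ++j) s.mean[j] += rows[r][j];
    for (std::size_t j = 0; j < d; ++j) s.mean[j] /= static_cast<double>(n);
    for (std::size_t r = 0; r < n; ++r)
        for (std::size_t j = 0; j < d; ++j) {
            const double diff = rows[r][j] - s.mean[j];
            s.stddev[j] += diff * diff;
        }
    for (std::size_t j = 0; j < d; ++j)
        s.stddev[j] = std::sqrt(s.stddev[j] / static_cast<double>(n));
    return s;
}

// Z-score one pixel; a constant pixel (stddev == 0) is mapped to 0.
inline double scalePixel(double x, double mean, double stddev) {
    if (stddev == 0.0) return 0.0;  // constant-zero guard
    return (x - mean) / stddev;
}
```

The statistics are fitted on the training rows only and reused unchanged on the test rows.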

Predictive Maintenance — binary failure classifier

Dataset: AI4I 2020 Predictive Maintenance — milling-machine readings · binary (failure / no-failure).
Model: MLP 10→24→1, ReLU hidden, single sigmoid. 5 process features + 3 physics product features (power, overstrain, temp gap) + 2-dim variant one-hot. 50/50 balanced sampling for the ~3.4% failure rate.
Data: optional real data (ai4i2020.csv); else synthesizes 10k rows from the documented HDF/PWF/OSF/TWF/RNF rules.
Result: recall ~0.89, precision ~0.80, F1 ~0.84.
examples/predictive_maintenance · page

Human Activity Recognition — recurrent (LSTM)

Dataset: UCI HAR Using Smartphones — tri-axial accelerometer · activity classes.
Model: Q16.16 LSTM 3→16 (tanh)→4 sigmoid, stateful per 32-step window, argmax at final step. Classes: WALKING / WALKING_UPSTAIRS / SITTING / STANDING.
Data: optional real data (long-format har.csv); else synthesizes physically-motivated accelerometer windows.
Result: ~97.5% test accuracy (195/200). LstmNeuralNetwork<> for sequence classification.
examples/har_activity · page

Japanese Vowels — recurrent (Elman), offline-float → fixed-point sweep

Dataset: UCI Japanese Vowels — 9 speakers · 12 LPC cepstrum coeffs · 7–29 frame sequences · 270 train / 370 test.
Model: ElmanNeuralNetwork 12→16 (tanh, recurrent)→9 sigmoid, per-frame scores summed over the utterance, argmax speaker. z-score/3 input scaling.
Data: ships real data (ae.train / ae.test).
Result: the deployment-flow demo. The network is trained offline in double, then swept across fixed-point formats. Q8.8 matches double-precision accuracy exactly (94.05%) in 1.2 KB of weights (4× smaller than the double weights); Q4.4 collapses. The result lands on the Kudo et al. 1999 paper baseline (94.1%).
examples/elman_vowels · page
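
One step of such a sweep is rounding each trained weight to a candidate format and seeing what survives. The sketch below quantizes a single value, assuming Qm.n means a signed word of m+n bits with n fractional bits (so Q8.8 is 16 bits and Q4.4 is 8 bits) and saturating rounding. It is not the sweep code from the example.

```cpp
#include <cmath>

// Round x to a signed fixed-point format with `fracBits` fractional bits in
// `totalBits` total bits, saturating at the format's limits, and return the
// value the format would actually hold.
inline double quantize(double x, int fracBits, int totalBits) {
    const double scale = std::ldexp(1.0, fracBits);               // 2^fracBits
    const double maxRaw = std::ldexp(1.0, totalBits - 1) - 1.0;   // largest code
    const double minRaw = -std::ldexp(1.0, totalBits - 1);        // smallest code
    double raw = std::floor(x * scale + 0.5);                     // round to nearest
    if (raw > maxRaw) raw = maxRaw;
    if (raw < minRaw) raw = minRaw;
    return raw / scale;
}
```

Under these assumptions a weight of 0.02 becomes 0.01953125 in Q8.8 but rounds to 0 in Q4.4, and Q4.4 cannot represent anything above 7.9375, which gives a feel for how much coarser the smaller format is.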

Air Quality Forecasting — recurrent (LSTM)

Dataset: UCI Air Quality — hourly pollutant series.
Model: Q16.16 LSTM 1→16 (tanh)→1 sigmoid, next-hour forecaster. 24-hour state reset per BPTT segment, series normalized to [0.1, 0.9].
Data: optional real data (AirQualityUCI.csv, CO(GT) column, ;-separated, decimal comma, -200 = missing); else a synthetic daily-cycle CO series.
Result: one-step-ahead MAE ≈ 0.13 on a CO series spanning 1.3–3.0.
examples/air_quality · page

Gas Sensor Array Drift — drift demonstration

Dataset: UCI Gas Sensor Array Drift — 16 chemo-resistive sensors × 8 features, 36 months / 10 batches · 6-gas classification.
Model: MLP 128→32→6, ReLU hidden, 6 sigmoid (argmax). Train on batch 1, evaluate every later batch; normalization fixed to batch-1 stats (deliberately not drift-corrected).
Data: synthetic, UCI-inspired (per-batch multiplicative gain + additive offset stand in for sensor aging).
Result: accuracy decays from ~1.0 (batch 1) to ~0.73 (batch 10) — the drift curve is the point.
examples/gas_sensor_drift · page

Summary

Example	UCI dataset	Task	Capability	Data mode
Iris	Iris	3-class	`NeuralNet<>` MLP	ships real
Energy Efficiency	Energy Efficiency	2-target regression	MLP, linear readout	ships real
Optical Digits	Optical Digits (8×8)	10-class image	MLP	ships real
Predictive Maintenance	AI4I 2020	binary	MLP + physics features	optional real
HAR Activity	HAR Smartphones	seq classification	`LstmNeuralNetwork<>`	optional real
Japanese Vowels	Japanese Vowels	seq classification	`ElmanNeuralNetwork<>`, float→Q8.8 sweep	ships real
Air Quality	Air Quality	seq forecasting	`LstmNeuralNetwork<>`	optional real
Gas Sensor Drift	Gas Sensor Array Drift	6-class + drift	MLP, normalization fixed to batch-1 stats	synthetic, UCI-inspired

See the Example Gallery for the behavior plots, or each example’s own page for the full write-up.