PyTorch → int8 XOR
An end-to-end post-training int8 quantization demo: a PyTorch XOR MLP is trained in float32, calibrated per-tensor, emitted as weights.hpp, and then run through a pure-integer TinyMind forward pass.
How it works
- Pipeline: nn.Linear(2,4) → nn.ReLU → nn.Linear(4,1) → nn.Sigmoid in PyTorch maps to QDense<int8,int8,int32,int8> → qreluBuffer → QDense → int8 sigmoid LUT in TinyMind. Calibration follows the TFLite / CMSIS-NN convention: symmetric per-tensor weights (zero_point=0), asymmetric activations, and int32 biases quantized at scale input_scale * weight_scale.
- Demonstrates the int8 affine quantization path and its host-side calibration tooling: the (scale, zero_point) metadata emitted by the Python script is turned back into a Requantizer(multiplier, shift) pair via tinymind::buildRequantizer, and the sigmoid output grid is built with buildQSigmoidLUT. For an MCU deployment the integer (multiplier, shift, zero_point) triples are baked in once on the host, so the inference binary needs neither <cmath> nor float math.
- Pure-integer inference classifies all four XOR corners correctly (int8 XOR accuracy: 4/4); the committed weights.hpp ships an exact textbook 2-4-1 ReLU+Sigmoid solver so the demo runs without PyTorch.
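The calibration convention in the list above can be sketched in a few lines of plain Python. This is an illustration of the arithmetic only, not the contents of xor_quant.py: the function names are hypothetical, and the int8 ranges (weights in [-127, 127], activations in [-128, 127]) are the usual TFLite choices, assumed here rather than taken from the demo.

```python
def symmetric_weight_params(weights, qmax=127):
    """Symmetric per-tensor weights: zero_point is fixed at 0."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return scale, 0


def asymmetric_activation_params(lo, hi, qmin=-128, qmax=127):
    """Asymmetric per-tensor activations: map [lo, hi] onto [qmin, qmax]."""
    # Widen the range to include 0.0 so that zero is exactly representable.
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    zero_point = int(round(qmin - lo / scale))
    return scale, max(qmin, min(qmax, zero_point))


def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Affine quantization: q = round(x / scale) + zero_point, saturated."""
    q = int(round(x / scale)) + zero_point
    return max(qmin, min(qmax, q))


def quantize_bias(b, input_scale, weight_scale):
    """int32 bias lives at the accumulator scale input_scale * weight_scale."""
    return int(round(b / (input_scale * weight_scale)))


if __name__ == "__main__":
    # Hypothetical values, chosen only to exercise the functions.
    w_scale, w_zp = symmetric_weight_params([1.0, -1.0, 0.5, -0.25])
    a_scale, a_zp = asymmetric_activation_params(0.0, 1.0)
    print("weight scale/zp:", w_scale, w_zp)
    print("input scale/zp:", a_scale, a_zp)
    print("bias code for b=1.0:", quantize_bias(1.0, a_scale, w_scale))
```

Placing the bias at input_scale * weight_scale means it can be added directly to the int32 accumulator of the int8 products without any rescaling.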
Build and run
cd examples/pytorch_quant/xor
make release
make run
make plot # needs matplotlib; a venv/pyenv works if it is not already in your Python
The demo is built with TINYMIND_ENABLE_QUANTIZATION=1, plus FLOAT=1 STD=1 so that it can rebuild the Requantizer and sigmoid LUT from the calibration scales. To regenerate weights.hpp from a fresh PyTorch training and 9×9-grid calibration run, use make regenerate-weights (i.e. python3 xor_quant.py; requires torch). The deployable MCU configuration is FLOAT=0 STD=0 with the integer (multiplier, shift, zero_point) triples baked in.
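To make the "baked in on the host" step concrete, here is a rough Python sketch of how a float rescale factor can be turned into an integer (multiplier, shift) pair, and how a sigmoid table can be precomputed. It does not reproduce tinymind::buildRequantizer or buildQSigmoidLUT; the function names, the Q31 mantissa format, the round-half-up behaviour, and the sigmoid output grid (scale 1/256, zero_point -128, as in TFLite's int8 logistic op) are all assumptions made for illustration.

```python
import math


def build_requantizer(real_multiplier):
    """Split a positive float into (Q31 multiplier, right shift)."""
    if real_multiplier <= 0.0:
        raise ValueError("multiplier must be positive")
    # real_multiplier == mantissa * 2**exponent, with mantissa in [0.5, 1).
    mantissa, exponent = math.frexp(real_multiplier)
    q31 = int(round(mantissa * (1 << 31)))
    if q31 == (1 << 31):  # rounding pushed the mantissa up to 1.0
        q31 //= 2
        exponent += 1
    return q31, -exponent


def requantize(acc, multiplier, shift, zero_point, qmin=-128, qmax=127):
    """Integer-only rescale of an int32 accumulator to an int8 code."""
    total_shift = 31 + shift
    if total_shift < 1:
        raise ValueError("multiplier too large for this sketch")
    rounding = 1 << (total_shift - 1)  # round half up
    scaled = (acc * multiplier + rounding) >> total_shift
    return max(qmin, min(qmax, scaled + zero_point))


def build_sigmoid_lut(in_scale, in_zero_point,
                      out_scale=1.0 / 256.0, out_zero_point=-128):
    """256-entry table indexed by (int8 code + 128)."""
    lut = []
    for q in range(-128, 128):
        x = (q - in_zero_point) * in_scale      # dequantized pre-activation
        y = 1.0 / (1.0 + math.exp(-x))          # float sigmoid, host side only
        code = int(round(y / out_scale)) + out_zero_point
        lut.append(max(-128, min(127, code)))
    return lut


if __name__ == "__main__":
    # Hypothetical scales; a real run would take them from calibration.
    m, s = build_requantizer(0.0037)
    print("multiplier, shift:", m, s)
    print("requantized:", requantize(12345, m, s, zero_point=0))
    print("sigmoid code at x=0:", build_sigmoid_lut(0.1, 0)[0 + 128])
```

Only build_requantizer and build_sigmoid_lut touch floating point; requantize and the table lookup use integer multiplies, adds, and shifts, which is why the on-device build can drop float math once the constants are fixed.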
Output

The heatmap sweeps the pure-integer network over the [0,1]² input grid. The two off-diagonal corners (0,1) and (1,0) read deep red (P(XOR=1) → 1) while the matching corners (0,0) and (1,1) read deep blue (→ 0), with the white 0.5 contour tracing the learned XOR boundary — the int8 network reproduces the classic non-linearly-separable decision surface.
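For readers without the plot at hand, the shape of that surface can be approximated with a small float stand-in. The sketch below is not the network in weights.hpp and not the int8 path: it uses hand-picked, hypothetical weights for a 2-4-1 ReLU+Sigmoid layout, an arbitrary gain of 8, and an arbitrary 9×9 grid, purely to show what sweeping [0,1]² looks like.

```python
import math


def xor_reference(x1, x2, gain=8.0):
    """Hand-built float 2-4-1 ReLU+Sigmoid XOR net (hypothetical weights)."""
    # Two hidden units fire on x1 > x2 and x2 > x1; the other two are unused.
    hidden = [max(0.0, x1 - x2), max(0.0, x2 - x1), 0.0, 0.0]
    logit = gain * (hidden[0] + hidden[1]) - gain / 2.0
    return 1.0 / (1.0 + math.exp(-logit))


def sweep(n=9):
    """Evaluate the net on an n x n grid; rows index x2, columns index x1."""
    axis = [i / (n - 1) for i in range(n)]
    return [[xor_reference(x1, x2) for x1 in axis] for x2 in axis]


if __name__ == "__main__":
    grid = sweep()
    # Print with x2 increasing upward, as in a plot; '#' marks P(XOR=1) > 0.5.
    for row in reversed(grid):
        print("".join("#" if p > 0.5 else "." for p in row))
```

In this stand-in the 0.5 contour sits where |x1 - x2| = 0.5, giving two bands that cut off the (0,1) and (1,0) corners; a trained network will place its boundary differently, but the two-region structure is the same.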