Weight Import/Export and PyTorch Interoperability
Tinymind provides weight serialization for all network types (MLP, LSTM, GRU, KAN), enabling a simple workflow: train in PyTorch on a powerful workstation, export the weights to a text file, and deploy them in tinymind C++ on an embedded device with no training overhead.
Why This Matters for Embedded
Training a neural network requires 2-3x the memory of inference alone (gradients, delta weights, momentum terms). On a microcontroller with 4-8 KB of RAM, this overhead can be the difference between feasible and impossible. The train-offline-deploy-lean workflow eliminates this entirely:
| Mode | MLP (2->3->1) Q8.8 | LSTM (2->3->1) Q8.8 | GRU (2->3->1) Q8.8 |
|---|---|---|---|
| Trainable | 328 bytes | 952 bytes | 808 bytes |
| Non-trainable (IsTrainable=false) | 144 bytes | 384 bytes | 336 bytes |
| Savings | 56% | 60% | 58% |
With `IsTrainable=false`, the compiler strips all training infrastructure (backpropagation code, gradient storage, optimizer state) and produces a minimal inference-only binary.
This approach also provides on-device privacy (sensor data never leaves the device), zero-latency inference (no network round-trip), and battery efficiency (no radio transmission).
Weight File Managers
Tinymind provides three file manager templates, one for each network family:
| File Manager | Network Types | Header |
|---|---|---|
| `NetworkPropertiesFileManager<NNType>` | MLP, NeuralNetwork | `nnproperties.hpp` |
| `RecurrentNetworkPropertiesFileManager<NNType>` | LSTM, GRU, Elman | `nnproperties.hpp` |
| `KanNetworkPropertiesFileManager<KanType>` | KAN | `nnproperties.hpp` |
Common API
// Save weights to text file (one value per line)
std::ofstream outFile("weights.txt");
FileManager::storeNetworkWeights(network, outFile);
// Save weights to binary file
FileManager::storeNetworkWeights(network, "weights.bin");
// Load weights from text file
std::ifstream inFile("weights.txt");
FileManager::template loadNetworkWeights<SourceType, DestType>(network, inFile);
Weight File Formats
Feed-Forward Networks (MLP)
Values are stored one per line in this order:
- Input-to-first-hidden weights: `NumberOfInputs * NumberOfHiddenNeurons`
- Input layer bias weights: `NumberOfHiddenNeurons`
- Hidden-to-hidden weights (when `NumberOfHiddenLayers > 1`): `NumberOfHiddenNeurons^2 + NumberOfHiddenNeurons` per layer
- Last-hidden-to-output weights: `NumberOfHiddenNeurons * NumberOfOutputs`
- Output layer bias weights: `NumberOfOutputs`
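Putting these counts together gives the expected number of values (one per line) in a weight file. The helper below is only a sketch of that arithmetic; the function name and the assumption that a hidden-to-hidden block appears once per additional hidden layer are ours, not part of tinymind:

```python
def mlp_weight_count(num_inputs: int, num_hidden: int, num_outputs: int,
                     num_hidden_layers: int = 1) -> int:
    """Expected number of lines in an MLP weight file (sketch)."""
    count = num_inputs * num_hidden                # input -> first hidden
    count += num_hidden                            # input layer bias
    # one hidden-to-hidden block (weights + bias) per additional hidden layer (assumption)
    count += (num_hidden_layers - 1) * (num_hidden ** 2 + num_hidden)
    count += num_hidden * num_outputs              # last hidden -> output
    count += num_outputs                           # output bias
    return count

print(mlp_weight_count(2, 3, 1))  # 13 lines for the 2->3->1 example
```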
Recurrent Networks (LSTM, GRU)
- Input-to-hidden weights (gated): `NumberOfInputs * NumberOfHiddenNeurons * GateMultiplier`
- Input layer bias weights (gated): `NumberOfHiddenNeurons * GateMultiplier`
- Recurrent-to-hidden weights (gated): `NumberOfHiddenNeurons^2 * GateMultiplier`
- Last-hidden-to-output weights (not gated): `NumberOfHiddenNeurons * NumberOfOutputs`
- Output layer bias weights: `NumberOfOutputs`
GateMultiplier is 4 for LSTM and 3 for GRU.
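The same arithmetic applies to the gated recurrent format. The sketch below (function name ours) reproduces the counts implied by the list above for the 2->3->1 LSTM and GRU networks from the memory table earlier:

```python
def recurrent_weight_count(num_inputs: int, num_hidden: int, num_outputs: int,
                           gate_multiplier: int) -> int:
    """Expected number of lines in an LSTM/GRU weight file (sketch)."""
    count = num_inputs * num_hidden * gate_multiplier   # input -> hidden (gated)
    count += num_hidden * gate_multiplier                # input bias (gated)
    count += (num_hidden ** 2) * gate_multiplier         # recurrent -> hidden (gated)
    count += num_hidden * num_outputs                    # hidden -> output (not gated)
    count += num_outputs                                 # output bias
    return count

print(recurrent_weight_count(2, 3, 1, 4))  # 76 values for the 2->3->1 LSTM
print(recurrent_weight_count(2, 3, 1, 3))  # 58 values for the 2->3->1 GRU
```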
Value Encoding
- Fixed-point: Raw integer representation scaled by `2^FractionalBits`. For example, 1.5 in Q16.16 is stored as `98304` (= 1.5 * 65536).
- Floating-point: Standard decimal string representation.
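The fixed-point rule can be checked directly. The helper names below are illustrative only and omit the saturation that the full export function in the next section applies:

```python
FRACTIONAL_BITS = 16  # Q16.16

def to_fixed(x: float) -> int:
    # Scale by 2^FractionalBits and round to the nearest integer
    return int(round(x * (1 << FRACTIONAL_BITS)))

def to_float(q: int) -> float:
    # Inverse: divide the raw integer by 2^FractionalBits
    return q / (1 << FRACTIONAL_BITS)

print(to_fixed(1.5))    # 98304, the value written to the weight file
print(to_float(98304))  # 1.5
```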
PyTorch Export: MLP
Source code: examples/pytorch/xor/xor.py
Q16.16 Conversion Function
def float_to_q16_16(x: float) -> int:
    """Convert a Python float to signed Q16.16 integer representation."""
    val = int(round(x * (1 << 16)))
    if val < -2**31:
        val = -2**31
    elif val > 2**31 - 1:
        val = 2**31 - 1
    return val
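A few example values follow directly from this function, including the saturation behaviour at the signed 32-bit limits:

```python
print(float_to_q16_16(-0.25))     # -16384
print(float_to_q16_16(40000.0))   # 2147483647  (saturated at the int32 maximum)
print(float_to_q16_16(-40000.0))  # -2147483648 (saturated at the int32 minimum)
```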
Export Function
The export function writes weights in the exact order that NetworkPropertiesFileManager::loadNetworkWeights expects:
def save_to_tinymind_format(self, path: str) -> None:
    from collections import OrderedDict
    data = OrderedDict()
    # 1. Input -> hidden weights (transposed: PyTorch stores [out, in])
    rows, cols = self.fc1.weight.T.shape
    for i in range(rows):
        for j in range(cols):
            data[f'Input{i}{j}Weight'] = float_to_q16_16(self.fc1.weight.T[i, j].item())
    # 2. Input bias -> hidden
    for j in range(len(self.fc1.bias)):
        data[f'InputBias0{j}Weight'] = float_to_q16_16(self.fc1.bias[j].item())
    # 3. Hidden -> output weights
    rows, cols = self.fc2.weight.T.shape
    for i in range(rows):
        for j in range(cols):
            data[f'Hidden0{i}{j}Weight'] = float_to_q16_16(self.fc2.weight.T[i, j].item())
    # 4. Hidden bias -> output
    for j in range(len(self.fc2.bias)):
        data[f'Hidden0Bias{j}Weight'] = float_to_q16_16(self.fc2.bias[j].item())
    with open(path, 'w') as f:
        f.write('\n'.join(str(v) for v in data.values()) + '\n')
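For context, the method above can be attached to a minimal 2->3->1 model like the one sketched here. The class name, activation choice, and training step are illustrative assumptions rather than the exact contents of examples/pytorch/xor/xor.py; only `float_to_q16_16` and `save_to_tinymind_format` from above are assumed to be in scope:

```python
import torch
import torch.nn as nn

class XorNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(2, 3)   # input -> hidden (matches the 2->3->1 C++ network)
        self.fc2 = nn.Linear(3, 1)   # hidden -> output

    def forward(self, x):
        # ReLU hidden / sigmoid output, mirroring the C++ transfer-function policies below
        return torch.sigmoid(self.fc2(torch.relu(self.fc1(x))))

# Attach the export function defined above (assumed to live at module scope here).
XorNet.save_to_tinymind_format = save_to_tinymind_format

model = XorNet()
# ... training on the XOR truth table would go here ...
model.save_to_tinymind_format("xor_weights_q16_16.txt")
```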
C++ Import: MLP Inference
Source code: examples/pytorch/xor/xor.cpp
typedef tinymind::QValue<16, 16, true> ValueType;
// Non-trainable network (inference only) -- saves memory
typedef tinymind::FixedPointTransferFunctions<ValueType,
tinymind::NullRandomNumberPolicy<ValueType>,
tinymind::ReluActivationPolicy<ValueType>,
tinymind::SigmoidActivationPolicy<ValueType>> TransferFunctionsType;
typedef tinymind::MultilayerPerceptron<ValueType, 2, 1, 3, 1,
TransferFunctionsType, false> NeuralNetworkType; // false = non-trainable
typedef tinymind::NetworkPropertiesFileManager<NeuralNetworkType> FileManager;
NeuralNetworkType testNeuralNet;
// Load PyTorch-exported weights
std::ifstream weightsFile("../input/xor_weights_q16_16.txt");
FileManager::template loadNetworkWeights<ValueType, ValueType>(testNeuralNet, weightsFile);
// Run inference
ValueType values[2], learnedValues[1];
values[0] = ValueType(1, 0); // 1.0
values[1] = ValueType(0, 0); // 0.0
testNeuralNet.feedForward(&values[0]);
testNeuralNet.getLearnedValues(&learnedValues[0]);
// learnedValues[0] should be close to 1.0 (XOR of 1,0)
PyTorch Export: GRU
Source code: examples/pytorch/gru/gru_export.py
GRU export is more complex because PyTorch and tinymind use different gate orderings:
- PyTorch gate order: r (reset), z (update), n (new/candidate)
- Tinymind gate order: z (update), r (reset), n (candidate)
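Concretely, each tinymind gate index maps onto a block of rows in PyTorch's stacked gate matrices. The short sketch below spells out that mapping for a hidden size of 3, chosen only for illustration:

```python
H = 3                      # hidden size, illustration only
gate_reorder = [1, 0, 2]   # tinymind [z, r, n] -> index of the PyTorch [r, z, n] block
for g, name in enumerate(["z (update)", "r (reset)", "n (candidate)"]):
    pg = gate_reorder[g]
    print(f"tinymind gate {name} -> PyTorch rows {pg * H}..{pg * H + H - 1}")
# tinymind gate z (update)    -> PyTorch rows 3..5
# tinymind gate r (reset)     -> PyTorch rows 0..2
# tinymind gate n (candidate) -> PyTorch rows 6..8
```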
The export script handles this reordering:
def export_gru_weights(model, path, use_q16_16=True):
    H = model.hidden_size
    convert = float_to_q16_16 if use_q16_16 else lambda x: x
    # Gate reorder: PyTorch [r, z, n] -> TinyMind [z, r, n]
    gate_reorder = [1, 0, 2]
    w_ih = model.gru.weight_ih_l0.detach().numpy()
    w_hh = model.gru.weight_hh_l0.detach().numpy()
    b_ih = model.gru.bias_ih_l0.detach().numpy()
    b_hh = model.gru.bias_hh_l0.detach().numpy()
    w_out = model.fc.weight.detach().numpy()
    b_out = model.fc.bias.detach().numpy()
    I = w_ih.shape[1]
    values = []
    # 1. Input-to-hidden weights (gated, reordered)
    for i in range(I):
        for h in range(H):
            for g in range(3):
                pg = gate_reorder[g]
                values.append(convert(float(w_ih[pg * H + h, i])))
    # 2. Input bias (combine PyTorch's two bias vectors)
    for h in range(H):
        for g in range(3):
            pg = gate_reorder[g]
            values.append(convert(float(b_ih[pg * H + h] + b_hh[pg * H + h])))
    # 3. Recurrent-to-hidden weights (gated, reordered)
    for r in range(H):
        for h in range(H):
            for g in range(3):
                pg = gate_reorder[g]
                values.append(convert(float(w_hh[pg * H + h, r])))
    # 4. Hidden-to-output weights (not gated)
    for h in range(H):
        for o in range(w_out.shape[0]):
            values.append(convert(float(w_out[o, h])))
    # 5. Output bias
    for o in range(len(b_out)):
        values.append(convert(float(b_out[o])))
    with open(path, 'w') as f:
        for v in values:
            f.write(f"{v}\n")
Key details:
- PyTorch stores two separate bias vectors (`bias_ih` and `bias_hh`); tinymind expects one combined bias, so the export script adds them together.
- Weight matrices must be transposed (PyTorch uses `[out_features, in_features]` layout).
- Gate indices must be reordered from PyTorch's `[r, z, n]` to tinymind's `[z, r, n]`.
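For completeness, a model exposing the `gru`, `fc`, and `hidden_size` attributes that `export_gru_weights` reads could be exported as sketched below. The class definition, layer sizes, and output filename are illustrative assumptions; only the attribute names are dictated by the export function above:

```python
import torch.nn as nn

class GruModel(nn.Module):
    """Minimal single-layer GRU regressor with the attributes the exporter expects."""
    def __init__(self, input_size=2, hidden_size=3, output_size=1):
        super().__init__()
        self.hidden_size = hidden_size
        self.gru = nn.GRU(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, _ = self.gru(x)            # out: [batch, seq, hidden]
        return self.fc(out[:, -1, :])   # predict from the last time step

model = GruModel()
# ... training would go here ...
export_gru_weights(model, "gru_weights_q16_16.txt", use_q16_16=True)
```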
Workflow Summary
- Train your network in PyTorch with full floating-point precision, GPU acceleration, and the full PyTorch ecosystem.
- Export weights using the provided Python scripts, converting to Q-format integer representation if deploying with fixed-point.
- Load weights in C++ using the appropriate `FileManager::loadNetworkWeights()`.
- Deploy as a non-trainable (`IsTrainable=false`) network for minimum memory footprint.
This workflow gives you the best of both worlds: state-of-the-art training infrastructure from PyTorch and ultra-efficient inference from tinymind C++.