Kolmogorov-Arnold Networks (KAN)
Tinymind implements Kolmogorov-Arnold Networks (KAN), a neural network architecture where learnable activation functions are placed on the edges (connections) rather than at the nodes. In a KAN, each edge has its own B-spline activation function, and nodes simply sum their inputs. This is in contrast to standard MLPs where edges carry scalar weights and nodes apply a shared activation function.
KAN can be more parameter-efficient than MLP for certain smooth, low-dimensional functions, learning the activation shape rather than relying on fixed activations with learned weights.
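As a conceptual sketch (plain float C++ with illustrative names, not the tinymind implementation), the structural difference looks like this:
#include <cmath>
#include <cstddef>
#include <functional>

// MLP node: edges carry scalar weights, the node applies a fixed activation.
float mlpNode(const float* inputs, const float* weights, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i)
    {
        sum += weights[i] * inputs[i];
    }
    return std::tanh(sum); // shared, fixed activation at the node
}

// KAN node: each edge applies its own learnable 1-D function,
// and the node simply sums the results (no activation of its own).
float kanNode(const float* inputs, const std::function<float(float)>* edgeFunctions, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i)
    {
        sum += edgeFunctions[i](inputs[i]);
    }
    return sum;
}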
KAN on Embedded: Trading Memory for Accuracy
KAN uses more memory per connection than MLP (8 parameters per edge vs 1 weight), so it is 3-4x larger for the same topology. However, KAN can sometimes approximate smooth functions with fewer neurons than an equivalent MLP, potentially offsetting the per-edge cost.
Even at its largest, a trainable KAN (2->5->1) in Q8.8 fixed-point is 1,192 bytes – still well within reach for any ARM Cortex-M class device. For inference-only deployment (IsTrainable=false), this drops to 416 bytes.
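A rough parameter count for the 2->5->1 topology (a sketch; the byte figures above also include per-network bookkeeping beyond the raw learnable parameters) shows where the size difference comes from:
#include <cstddef>

// Learnable-parameter count for a 2->5->1 KAN with GridSize=5, SplineDegree=1,
// versus an MLP of the same topology (weights only, biases ignored here).
constexpr size_t numInputs    = 2;
constexpr size_t numHidden    = 5;
constexpr size_t numOutputs   = 1;
constexpr size_t gridSize     = 5;
constexpr size_t splineDegree = 1;

constexpr size_t numEdges         = numInputs * numHidden + numHidden * numOutputs; // 15
constexpr size_t paramsPerKanEdge = 2 + gridSize + splineDegree;                     // 8
constexpr size_t kanParams        = numEdges * paramsPerKanEdge;                     // 120
constexpr size_t mlpParams        = numEdges;                                        // 15

static_assert(numEdges == 15, "2->5->1 has 15 edges");
static_assert(kanParams == 120, "8 learnable parameters per KAN edge");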
For fixed-point targets, always use SplineDegree=1 (piecewise linear). Higher-degree polynomials involve multi-step intermediate computations that risk overflow in Q-format arithmetic.
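The hazard is easy to demonstrate with a bare Q8.8 multiply (a minimal sketch using int16_t, not tinymind's QValue type): the representable range is roughly [-128, 128), and chained multiplies can wrap even when every operand fits.
#include <cstdint>
#include <cstdio>

// Minimal Q8.8 multiply: 16-bit storage, 32-bit intermediate, shift back by 8.
int16_t q88_mul(int16_t a, int16_t b)
{
    return static_cast<int16_t>((static_cast<int32_t>(a) * b) >> 8);
}

int main()
{
    const int16_t x  = 6 << 8;          // 6.0 in Q8.8
    const int16_t x2 = q88_mul(x, x);   // 36.0 -- still representable
    const int16_t x3 = q88_mul(x2, x);  // 216.0 exceeds ~127.996 and wraps to a negative value
    std::printf("x^2 = %f, x^3 = %f\n", x2 / 256.0, x3 / 256.0);
    return 0;
}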
Template Declaration
template<
typename ValueType,
size_t NumberOfInputs,
size_t NumberOfHiddenLayers,
size_t NumberOfNeuronsInHiddenLayers,
size_t NumberOfOutputs,
typename TransferFunctionsPolicy,
bool IsTrainable = true,
size_t BatchSize = 1,
size_t GridSize = 5,
size_t SplineDegree = 3
>
class KolmogorovArnoldNetwork
Template Parameters:
- ValueType - Numeric type (QValue, float, or double)
- NumberOfInputs - Number of input neurons
- NumberOfHiddenLayers - Number of hidden layers (>= 1)
- NumberOfNeuronsInHiddenLayers - Neurons per hidden layer
- NumberOfOutputs - Number of output neurons
- TransferFunctionsPolicy - KanTransferFunctions<...> policy class
- IsTrainable - Enable/disable training (non-trainable mode saves memory)
- BatchSize - Gradient accumulation batch size
- GridSize - Number of B-spline grid intervals (default 5)
- SplineDegree - B-spline polynomial degree (default 3)
Edge Function
Each KAN edge computes:
phi(x) = w_b * SiLU(x) + w_s * spline(x)
Where:
- w_b is the base weight (scalar)
- SiLU(x) = x * sigmoid(x) is the residual activation (reuses existing sigmoid lookup tables)
- w_s is the spline weight (scalar)
- spline(x) is a B-spline evaluated using GridSize + SplineDegree learnable coefficients
Each edge stores 2 + GridSize + SplineDegree learnable parameters. For GridSize=5, SplineDegree=1 (piecewise linear), that’s 8 parameters per edge vs 1 weight per edge in an MLP.
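A float-only sketch of the edge computation (a simplified piecewise-linear spline over an assumed [-1, 1] input range; the helper names are illustrative, not tinymind internals):
#include <cmath>
#include <cstddef>

// Residual activation: SiLU(x) = x * sigmoid(x).
float silu(float x)
{
    return x / (1.0f + std::exp(-x));
}

// Degree-1 spline approximated as linear interpolation between numCoeffs
// learnable coefficients spaced evenly over [xMin, xMax] (clamped outside).
float splineLinear(float x, const float* coeffs, size_t numCoeffs, float xMin, float xMax)
{
    float t = (x - xMin) / (xMax - xMin) * static_cast<float>(numCoeffs - 1);
    if (t < 0.0f) t = 0.0f;
    if (t > static_cast<float>(numCoeffs - 1)) t = static_cast<float>(numCoeffs - 1);
    const size_t i = static_cast<size_t>(t);
    const size_t j = (i + 1 < numCoeffs) ? (i + 1) : i;
    const float frac = t - static_cast<float>(i);
    return coeffs[i] + frac * (coeffs[j] - coeffs[i]);
}

// phi(x) = w_b * SiLU(x) + w_s * spline(x)
float edgeFunction(float x, float wBase, float wSpline, const float* coeffs, size_t numCoeffs)
{
    return wBase * silu(x) + wSpline * splineLinear(x, coeffs, numCoeffs, -1.0f, 1.0f);
}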
B-Spline Evaluation
Tinymind uses the De Boor algorithm for efficient B-spline evaluation. The UniformKnotVector template generates evenly spaced knot vectors at initialization.
template<typename ValueType, size_t GridSize, size_t SplineDegree>
struct UniformKnotVector
{
static const size_t NumberOfKnots = GridSize + 2 * SplineDegree + 1;
static const size_t NumberOfBasisFunctions = GridSize + SplineDegree;
// ...
};
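Plugging in the configuration used by the XOR example below (GridSize=5, SplineDegree=1), these formulas give 8 knots and 6 basis functions (and therefore 6 spline coefficients per edge):
#include <cstddef>

// Knot and basis-function counts for GridSize=5, SplineDegree=1.
constexpr size_t gridSize     = 5;
constexpr size_t splineDegree = 1;

constexpr size_t numberOfKnots          = gridSize + 2 * splineDegree + 1; // 8
constexpr size_t numberOfBasisFunctions = gridSize + splineDegree;         // 6

static_assert(numberOfKnots == 8, "5 + 2*1 + 1 knots");
static_assert(numberOfBasisFunctions == 6, "6 learnable spline coefficients per edge");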
KAN Transfer Functions
KAN uses its own transfer functions policy class:
template<
typename ValueType,
class KanRandomNumberGeneratorPolicy,
unsigned NumberOfOutputNeurons = 1,
class KanNetworkInitializationPolicy = tinymind::DefaultNetworkInitializer<ValueType>,
class KanErrorCalculatorPolicy = tinymind::MeanSquaredErrorCalculator<ValueType, NumberOfOutputNeurons>,
class KanZeroTolerancePolicy = tinymind::ZeroToleranceCalculator<ValueType>
>
struct KanTransferFunctions
Example: KAN XOR
Source code: examples/kan_xor/
Network Definition
typedef tinymind::QValue<8, 8, true> ValueType;
static const size_t GRID_SIZE = 5;
static const size_t SPLINE_DEGREE = 1; // piecewise linear -- best for fixed-point
typedef tinymind::KanTransferFunctions<ValueType,
RandomNumberGenerator,
1> TransferFunctionsType;
typedef tinymind::KolmogorovArnoldNetwork<ValueType,
2, // inputs
1, // hidden layers
5, // neurons per hidden layer
1, // outputs
TransferFunctionsType,
true, // trainable
1, // batch size
GRID_SIZE,
SPLINE_DEGREE> KanNetworkType;
Training Loop
The KAN API is identical to that of MultilayerPerceptron:
KanNetworkType testKanNet;
ValueType values[2], output[1], learnedValues[1], error;
for (unsigned i = 0; i < 20000; ++i)
{
generateXorTrainingValue(values[0], values[1], output[0]);
testKanNet.feedForward(&values[0]);
error = testKanNet.calculateError(&output[0]);
if (!KanNetworkType::KanTransferFunctionsPolicy::isWithinZeroTolerance(error))
{
testKanNet.trainNetwork(&output[0]);
}
testKanNet.getLearnedValues(&learnedValues[0]);
}
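Once training converges, the same feedForward/getLearnedValues calls can be used to spot-check the network (a sketch reusing only the calls shown above):
// Spot-check: run one more random XOR pattern through the trained network
// and compare the learned value to the expected output.
ValueType checkInputs[2], expected[1], learned[1];
generateXorTrainingValue(checkInputs[0], checkInputs[1], expected[0]);

testKanNet.feedForward(&checkInputs[0]);
testKanNet.getLearnedValues(&learned[0]);
// learned[0] should be close to expected[0] once the network has converged.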
Building The Example
cd examples/kan_xor
make # debug build
make release # optimized build
cd output
./kan_xor
Size Comparison: KAN vs MLP
| | MLP [2]->[3]->[1] | KAN [2]->[5]->[1] G=5 k=1 |
|---|---|---|
| Trainable (Q8.8) | 328 bytes | 1,192 bytes |
| Non-trainable (Q8.8) | 144 bytes | 416 bytes |
| Trainable (double) | 1,008 bytes | 4,208 bytes |
| Parameters per edge | 1 scalar weight | 8 (6 coefficients + w_b + w_s) |
Weight Import/Export
KAN weights can be saved and loaded using KanNetworkPropertiesFileManager:
typedef tinymind::KanNetworkPropertiesFileManager<KanNetworkType> FileManager;
// Save
std::ofstream outFile("kan_weights.txt");
FileManager::storeNetworkWeights(testKanNet, outFile);
// Load
std::ifstream inFile("kan_weights.txt");
FileManager::template loadNetworkWeights<ValueType, ValueType>(testKanNet, inFile);
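A typical round trip (a sketch reusing the calls above; the second network instance is illustrative) saves the trained weights and then restores them into a fresh instance of the same network type:
// Save the trained weights, then load them into a fresh network
// of the same type (e.g. at the next startup).
{
    std::ofstream weightsOut("kan_weights.txt");
    FileManager::storeNetworkWeights(testKanNet, weightsOut);
}

KanNetworkType restoredKanNet;
{
    std::ifstream weightsIn("kan_weights.txt");
    FileManager::template loadNetworkWeights<ValueType, ValueType>(restoredKanNet, weightsIn);
}
// restoredKanNet now reproduces the learned behavior of testKanNet.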
When To Use KAN vs MLP
- KAN excels at learning smooth, low-dimensional functions with fewer neurons than an equivalent MLP. The learnable activation shape on each edge gives KAN more expressiveness per connection.
- MLP is more memory-efficient per connection (1 weight vs 8+ parameters per edge) and benefits from decades of optimization. For problems where fixed activations like tanh or ReLU are sufficient, MLP is the better choice on embedded systems.
- For fixed-point targets, always use SplineDegree=1 (piecewise linear) to avoid overflow from higher-order polynomial terms.