# Elara libraries implementation notes

## Elara math/Elara array

In Python it is common to write code like this:

```python
import numpy as np

x = np.linspace(0, 50)
x2 = x**2
x3 = np.sin(x2)
```

In Rust, this would be poor practice. If the underlying power and sine operations tried to replicate the "Python feel" and keep all three variables (`x`, `x2`, `x3`) accessible in memory, each operation would require a clone, which wastes memory (you end up with three full-size arrays alive at once). If instead those operations followed standard Rust convention, the memory would be moved twice, first `x -> x2` and then `x2 -> x3`, so `x` and `x2` would both be inaccessible afterwards. Neither is ideal.

So ultimately there are only three real options:

1. Every time you perform one of those operations (power and sine, specifically), it mutates `x` and returns a reference to `x`. This means that `x2` and `x3` will both be mutable references to `x`. An explicit clone is required if you want to keep the original values.
2. Implement all operations via copy, and state explicitly in the user guide that it is highly recommended to use the `.mapv()` (vector map) operation as much as possible so that all operations happen in one pass (for instance, `y = x.mapv(|el| el.powi(2).sin())`). That way you don't need intermediate variables that waste memory through unnecessary copies.
3. Lazy operations: instead of returning an actual new array, return a type like `MathOp<Sin<Pow<2>>, NdArray>` that evaluates the actual values only when they are read.

The [conventions of ndarray](https://docs.rs/ndarray/latest/ndarray/struct.ArrayBase.html#arithmetic-operations) follow a hybrid approach that allows both strategies (1) and (2). However, Elara Math is designed for speed, so the plan is to maximize performance, which requires writing code carefully:

| Operation | Implementation | Dependent on order of operands? | Overwrites/moves (if any) |
| --- | --- | --- | --- |
| `c = &a + &b` where `&a` and `&b` are mutable references | Consumes `a` and updates it, then returns it. `a` is moved to `c`. | Yes, the first operand is always the array that is moved and updated. | `a` is moved to `c`, `b` remains accessible |
| `c = a + b` | Consumes `a` and updates it, then returns it. `a` is moved to `c`. | Yes, the first operand is always the array that is moved and updated. | `a` is moved to `c`; `b` is passed by value, so it is consumed as well (pass `&b` to keep it) |
| `c = a + &b` | Consumes `a` and updates it, then returns it. `a` is moved to `c`. | Yes, the operand passed by value (as opposed to by reference) is always the array that is moved and updated. | `a` is moved to `c`, `b` remains accessible |
| `c = b + &a` | Consumes `b` and updates it, then returns it. `b` is moved to `c`. | Yes, the operand passed by value (as opposed to by reference) is always the array that is moved and updated. | `b` is moved to `c`, `a` remains accessible |
| `c = a.clone() + b` | Clones `a`, adds `b` into the clone, then returns the new array `c`. | No | `a` remains accessible; `b` is consumed unless it is passed as `&b` |
| `c = a.sin()` | Consumes `a` and updates it. `a` is moved to `c`. | N/A | `a` is moved to `c` |
| `c = a.sin_copy()` | Equivalent to `c = a.clone().sin()`. Copies data from `a` and returns a new array. | N/A | None, `a` remains accessible |
| `c = a.mapv(...)` | Consumes `a` and updates it. `a` is moved to `c`. | N/A | `a` is moved to `c` |
| `c = a.mapv_copy(...)` | Equivalent to `c = a.clone().mapv(...)`. Copies data from `a` and returns a new array. | N/A | None, `a` and `c` both remain accessible |
| `c = a.dot(&b)` or `c = a.dot(b)` | Consumes `a` and updates it. `a` is moved to `c`. | N/A | `a` is moved to `c` |
| `c = a.dot_copy(&b)` or `c = a.dot_copy(b)` | Equivalent to `c = a.clone().dot(&b)`. Copies data from `a` and returns a new array. | N/A | None, `a` and `c` both remain accessible |

The general idea is this: **Elara array and Elara math will aggressively avoid clones by consuming and updating your arrays**. If you want the original array to stay available, you **must** pass a clone, e.g. `c = a.clone() + b` rather than `c = a + b`. As a result, most operations should be fast and memory-efficient by default (although we'll need to benchmark to be sure).

Finally, another important performance optimization is to switch the backing datatype that stores the data in the `NdArray` from `Vec<T>` to `Cow` (copy-on-write smart pointer). After all, NdArrays are by definition supposed to be fixed-length arrays: you can reshape them and slice them, but you are not supposed to insert or remove elements. Thus, there is no need for the backing datatype to be a dynamically-sized array.

Additionally, the `shape` should also be a `Cow`. This has less to do with performance and more to do with user-friendliness. Right now, it is necessary to write `NdArray<T, N>`, specifying the shape in the type signature. That is unintuitive and makes the library difficult to write and use, since you have to specify both the element type and the dimensionality of the array (which you might not even remember!). It is better to omit the dimensionality entirely and keep a single generic parameter (the dtype).
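The consume-and-update convention from the operation table can be sketched as follows. This is only an illustration under assumptions: `Arr` is a hypothetical stand-in for the real array type (flat `f64` data, no shape), and the method names `sin` and `sin_copy` simply mirror the table rather than any finished Elara API.

```rust
use std::ops::Add;

// Hypothetical stand-in for an Elara array: flat data only, no shape.
#[derive(Clone, Debug, PartialEq)]
struct Arr {
    data: Vec<f64>,
}

impl Arr {
    fn new(data: Vec<f64>) -> Self {
        Arr { data }
    }

    // Consuming unary operation: reuses the buffer of `self` and returns it,
    // so `c = a.sin()` moves `a` into `c` without allocating.
    fn sin(mut self) -> Self {
        for x in self.data.iter_mut() {
            *x = x.sin();
        }
        self
    }

    // Copying variant: clones first, so the original stays accessible.
    fn sin_copy(&self) -> Self {
        self.clone().sin()
    }
}

// `c = a + &b`: the operand passed by value is updated in place and returned;
// the operand passed by reference is only read.
impl Add<&Arr> for Arr {
    type Output = Arr;

    fn add(mut self, rhs: &Arr) -> Arr {
        assert_eq!(self.data.len(), rhs.data.len(), "length mismatch");
        for (l, r) in self.data.iter_mut().zip(&rhs.data) {
            *l += *r;
        }
        self
    }
}

fn main() {
    let a = Arr::new(vec![1.0, 2.0, 3.0]);
    let b = Arr::new(vec![10.0, 20.0, 30.0]);

    // Explicit clone: `a` is still usable afterwards.
    let kept = a.clone() + &b;
    // No clone: `a` is moved into `c`, `b` remains accessible.
    let c = a + &b;
    // println!("{:?}", a); // would not compile: `a` was moved

    println!("c = {:?}, kept = {:?}", c, kept);

    // `sin_copy` leaves `b` untouched, `sin` consumes its receiver.
    let d = b.sin_copy();
    let e = b.sin();
    println!("d = {:?}, e = {:?}", d, e);
}
```

The point of the design is that the cheap path is the default one: an allocation only happens where `.clone()` or a `_copy` method appears in the source, which makes the cost visible at the call site.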
A minimal sketch of the `Cow`-backed `NdArray` described above would be this:

```rust
use std::borrow::Cow;

#[derive(Clone, Debug)]
struct NdArray<'a, T: Clone> {
    shape: Cow<'a, [usize]>,
    data: Cow<'a, [T]>,
}

// Convenience types for specific
// types of arrays
type NdArrayi32<'a> = NdArray<'a, i32>;
type NdArrayu32<'a> = NdArray<'a, u32>;
type NdArrayf32<'a> = NdArray<'a, f32>;
type NdArrayf64<'a> = NdArray<'a, f64>;

// Subtypes/classes for 1D, 2D and 3D
// arrays (since some operations
// e.g. cross/dot product are only
// well-defined for arrays of
// certain dimensionality)
// this strong typing helps avoid
// runtime errors like ".cross() operation
// not supported for non-2D/3D arrays"
#[derive(Clone, Debug)]
struct NdArray1D<'a, T: Clone> {
    shape: [usize; 1],
    data: Cow<'a, [T]>,
}

// dot product exclusively supported
// you can run .flatten() on other
// ndarrays to convert them to NdArray1D
#[derive(Clone, Debug)]
struct NdArray2D<'a, T: Clone> {
    shape: [usize; 2],
    data: Cow<'a, [T]>,
}

#[derive(Clone, Debug)]
struct NdArray3D<'a, T: Clone> {
    shape: [usize; 3],
    data: Cow<'a, [T]>,
}

impl<'a, T: Clone> NdArray<'a, T> {
    // We default to not owning unless
    // we have to, this also makes it
    // (hopefully) easier to avoid
    // the constant borrow checker
    // problems
    fn new(data: &'a [T], shape: &'a [usize]) -> Self {
        NdArray {
            shape: Cow::Borrowed(shape),
            data: Cow::Borrowed(data),
        }
    }

    // This DOES own the data but it comes from passing
    // already owned-data (so this will cause a move)
    // (in this case, either an array or Vec, though
    // array is preferred)
    fn new_owned<A, S>(data: A, shape: S) -> Self
    where
        A: Into<Vec<T>>,
        S: Into<Vec<usize>>,
    {
        NdArray {
            // .into() never borrows: a Vec<T> is passed
            // through as-is, an array [T; N] has its
            // elements moved into a new heap allocation,
            // and only a slice &[T] clones its elements
            data: Cow::Owned(data.into()),
            shape: Cow::Owned(shape.into()),
        }
    }

    fn reshape<S>(&mut self, shape: S)
    where
        S: Into<Vec<usize>>,
    {
        // Owning the new shape is cheap since
        // a shape is never going to be a
        // big array and it avoids
        // possible lifetime problems later on
        let shape: Vec<usize> = shape.into();
        // A reshape must keep the number of elements
        assert_eq!(
            shape.iter().product::<usize>(),
            self.data.len(),
            "new shape must have the same number of elements"
        );
        self.shape = Cow::Owned(shape);
    }
}

fn main() {
    // example of passing data by reference
    // let my_array = NdArray::new(
    //     &[1, 2, 3,
    //       4, 5, 6], &[2, 3]);
    // println!("{:?}", my_array);

    // example of passing data and
    // transferring ownership
    let mut my_array = NdArray::new_owned(
        [1, 2, 3,
         4, 5, 6], [2, 3]);
    println!("Before reshape: {:?}", my_array);
    my_array.reshape([3, 2]);
    // To avoid a move, you must either
    // pass by reference or clone (which is expensive)
    let another_array = &my_array; // 1st option
    // let another_array = my_array.clone(); // 2nd option
    println!("Linked array {:?}", another_array);
    println!("After reshape: {:?}", my_array);
}
```

The user guide (on cargo doc) should also mention that, once `ndarray` integration is implemented, users need to write `use elara_array::NdArray as ElaraArray` or similar to avoid name clashes.

Planned data saving API:

```rust
// use a specialized nanoserde-based
// binary file format for serializing arrays
arr.save("array.elr");
NdArray::from_file("array.elr");
```

Planned indexing/slicing API:

```rust
// Indexing (for both getting and setting data)
// these are implemented by two index implementations,
// one for views and one for direct indices
impl Index<[usize; N]> for NdArray<T, N>;
impl Index<NdArrayView<f64, N>> for NdArray<T, N>;

// Construct a view
// These replace numpy-style slices
let yourview: NdArrayView<f64, N> = NdArrayView::row_view(start, end, step);

arr[[1, 3, 5]]; // direct indexing
arr[yourview]; // view indexing

// Slices
pub fn slice(&self, slice: &[Range<usize>]) -> ArrayView<T, N>;

pub struct ArraySlice;

// returns view of entire array
a[&[.., ..]]
// returns view of 1st inner element
a[&[s!(0), ..]] // this is equal to a[&[..1, ..]]
// returns view of range
a[&[1..2, 1..2]]

// for more exotic slices use the dedicated slicer
let s = ArraySlice::new_columns([1, 5, 8]);
a[s] // returns another view
```

> **Acknowledgement:** a lot of these ideas came from the `nd_array` crate and
`NdArray` and they deserve the credit for that.

## Elara ML

For Elara ML there are three APIs planned. The first one is PyTorch-style:

```rust
pub struct MyModel {
    input_layer: Input,
    hidden_1: Dense,
    hidden_2: Dense,
    output_layer: Output,
}

impl MyModel {
    // x and y are here only for shape determination
    fn new(x: &Tensor, y: &Tensor) -> MyModel {
        // automatic shape determination by passing
        // a reference to the previous layer as first
        // argument (a reference, so that the layer can
        // still be stored in the struct below)
        let input_layer = Input::new(x);
        let hidden_1 = Dense::new(&input_layer, 16);
        let hidden_2 = Dense::new(&hidden_1, 16);
        let output_layer = Output::new(&hidden_2, y);
        MyModel {
            input_layer,
            hidden_1,
            hidden_2,
            output_layer,
        }
    }
}

impl Model for MyModel {
    // Models can only have one output, for
    // multi-input-output neural networks you
    // need to chain together multiple Models
    fn forward(&self, x: &Tensor) -> Tensor {
        // These absolutely don't need to
        // be in the same order as you declared
        // in new() (but probably should be so
        // that the auto shape determination works)
        let a = self.input_layer.forward(x);
        let b = self.hidden_1.forward(&a);
        let c = self.hidden_2.forward(&b);
        self.output_layer.forward(&c)
    }
}

fn main() {
    // load x and y...
    let mut model = MyModel::new(&x, &y);
    model.compile(Optimizers::SGD);
    model.fit(&x, &y, 500, 0.00001, true);
}
```

This API makes it easiest to use pre-made models, because you can simply import the model and compile it. However, it might be too much abstraction: it can be hard to see what the model is actually doing, especially with methods like `compile()` and `fit()` that no longer have a 1-1 correspondence with operations on tensors.

The second uses a macro `Sequential!` to imitate Keras's sequential API. This makes it the easiest to learn but, again, it abstracts away too much, which is not ideal given how much debugging is involved in building NNs.

The third is the most barebones: the JAX-inspired API.
It looks like this:

```rust
// This is just a convenient way of
// holding layers, there is nothing
// special about this struct
struct Layers {
    pub input_layer: Input,
    pub hidden_1: Dense,
    pub hidden_2: Dense,
    pub output_layer: Output,
}

impl Layers {
    // x and y are here only for shape determination
    fn new(x: &Tensor, y: &Tensor) -> Layers {
        // automatic shape determination by passing
        // a reference to the previous layer as first argument
        let input_layer = Input::new(x);
        let hidden_1 = Dense::new(&input_layer, 16);
        let hidden_2 = Dense::new(&hidden_1, 16);
        let output_layer = Output::new(&hidden_2, y);
        Layers {
            input_layer,
            hidden_1,
            hidden_2,
            output_layer,
        }
    }

    // Note: for zero_grad()
    // and update(), these can be
    // made less verbose by creating an
    // iter() method - see
    // https://stackoverflow.com/questions/30218886/how-to-implement-iterator-and-intoiterator-for-a-simple-struct
    fn zero_grad(&mut self) {
        self.input_layer.zero_grad();
        self.hidden_1.zero_grad();
        self.hidden_2.zero_grad();
        self.output_layer.zero_grad();
    }

    fn update(&mut self, lr: f64) {
        self.input_layer.update(lr);
        self.hidden_1.update(lr);
        self.hidden_2.update(lr);
        self.output_layer.update(lr);
    }

    fn save(&self) {
        let mut weights = NNSerializer::new("weights.bin");
        // Add labels to weights; they will be referred
        // to by their labels when the weights are loaded
        weights.add(&self.input_layer, "input_layer");
        weights.add(&self.hidden_1, "hidden_1");
        weights.add(&self.hidden_2, "hidden_2");
        weights.add(&self.output_layer, "output_layer");
        weights.write();
    }
}

// Takes the layers by reference so that they
// can be reused on every training step
fn forward(layers: &Layers, x: &Tensor) -> Tensor {
    let a = layers.input_layer.forward(x);
    let b = layers.hidden_1.forward(&a);
    let c = layers.hidden_2.forward(&b);
    layers.output_layer.forward(&c)
}

fn mean_squared_error(y: &Tensor, y_pred: &Tensor) -> Tensor {
    // squared error, averaged to a scalar loss
    (y_pred - y).pow(2).mean()
}

fn main() {
    // load x and y...
    let mut layers = Layers::new(&x, &y);
    let mut pbar = TrainingProgress::new(); // used to display progress bars
    // here we write our custom optimizer
    for i in 0..1000 {
        let preds = forward(&layers, &x);
        // preds and loss are both tensors, so they
        // can work with all the standard tensor methods,
        // including output to graphviz files!
        let loss = mean_squared_error(&y, &preds);
        pbar.update(i, &loss); // shows latest progress
        // linear decay from 1.0 down to about 0.1
        // over the 1000 training steps
        let lr = 1.0 - 0.9 * (i as f64) / 1000.0;
        loss.backward();
        layers.update(lr);
        layers.zero_grad();
    }
}
```

This approach has just the right amount of abstraction and is very flexible, because it allows defining custom forward passes (including multiple inputs or multiple outputs), custom loss functions, and custom optimizers. Furthermore, this API can easily interoperate with the PyTorch-style API. So this is the API that development will primarily focus on.

## Elara UI

Elara UI/UX guidelines: high accessibility, visual comfort, and clarity are the main priorities.

Elara UI should support responsive layouts through breakpoint functions in the `Component` trait.

Elara UI should also provide a pure CPU-based backend built on a Rust port of `fenster`. Users can choose which backend they want:

- The GPU backend is faster and leads to smoother UI rendering, but it uses a lot of battery, can be glitchy if graphics drivers aren't working correctly, and may not be compatible with very old devices
- For low-powered devices CPU rendering is the better fit, but it is much slower
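
As a rough illustration of the two UI ideas above (breakpoint functions on the `Component` trait and a user-selectable backend), here is a minimal sketch. Everything in it is an assumption made for the example: the `Backend` and `Layout` enums, the `breakpoint` method signature, the pixel thresholds, and the `choose_backend` helper are hypothetical and not an existing Elara UI API.

```rust
// Hypothetical rendering backends (illustrative names only).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Backend {
    Gpu,
    Cpu,
}

// Hypothetical layout classes a breakpoint function can select.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Layout {
    Compact,
    Regular,
    Wide,
}

trait Component {
    // Breakpoint function: maps the available width to a layout.
    // The default thresholds are arbitrary placeholders.
    fn breakpoint(&self, width_px: u32) -> Layout {
        match width_px {
            0..=599 => Layout::Compact,
            600..=1199 => Layout::Regular,
            _ => Layout::Wide,
        }
    }
}

// Uses the default breakpoints.
struct Label;
impl Component for Label {}

// Overrides the breakpoints: stays compact for longer.
struct Sidebar;
impl Component for Sidebar {
    fn breakpoint(&self, width_px: u32) -> Layout {
        if width_px < 900 {
            Layout::Compact
        } else {
            Layout::Regular
        }
    }
}

// An explicit CPU request always wins; otherwise use the GPU
// when it is usable and fall back to the CPU when it is not.
fn choose_backend(requested: Option<Backend>, gpu_usable: bool) -> Backend {
    match (requested, gpu_usable) {
        (Some(Backend::Cpu), _) => Backend::Cpu,
        (_, true) => Backend::Gpu,
        (_, false) => Backend::Cpu,
    }
}

fn main() {
    println!("label at 800px: {:?}", Label.breakpoint(800));
    println!("sidebar at 800px: {:?}", Sidebar.breakpoint(800));
    println!("backend: {:?}", choose_backend(Some(Backend::Gpu), false));
}
```

Putting the breakpoint function on the trait with a default implementation means simple components get responsive behaviour for free, while components with unusual layouts can override it individually.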