wyrm

A low-overhead, define-by-run autodifferentiation library

14 releases (8 breaking)

0.9.1 Jun 2, 2018
0.8.1 May 27, 2018
0.7.2 Jan 21, 2018
0.2.0 Dec 26, 2017

#6 in Machine learning


187 downloads per month
Used in 1 crate

MIT license

180KB
4.5K SLoC

Wyrm


A reverse-mode, define-by-run, low-overhead autodifferentiation library.

Features

Performs backpropagation through arbitrary, define-by-run computation graphs, with an emphasis on low-overhead estimation of small, sparse models on the CPU.

Highlights:

  1. Low overhead.
  2. Built-in support for sparse gradients.
  3. Define-by-run.
  4. Trivial Hogwild-style parallelisation, scaling linearly with the number of CPU cores available.
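
To make "reverse mode" and "define-by-run" concrete, here is a minimal scalar sketch that uses only the standard library. It is not Wyrm's implementation or API: the `Tape` type, its methods, and `squared_error_grads` are hypothetical names introduced for illustration. Each operation is recorded on a tape as it executes, and the backward pass walks that tape in reverse, accumulating gradients.

```rust
// A hypothetical, scalar-only illustration of define-by-run reverse-mode
// autodifferentiation. None of these names come from Wyrm.

#[derive(Clone, Copy)]
enum Op {
    Leaf,
    Add(usize, usize),
    Sub(usize, usize),
    Mul(usize, usize),
}

/// Records every operation as it is executed (define-by-run).
pub struct Tape {
    ops: Vec<Op>,
    values: Vec<f64>,
}

impl Tape {
    pub fn new() -> Self {
        Tape { ops: Vec::new(), values: Vec::new() }
    }

    fn push(&mut self, op: Op, value: f64) -> usize {
        self.ops.push(op);
        self.values.push(value);
        self.ops.len() - 1
    }

    pub fn leaf(&mut self, value: f64) -> usize {
        self.push(Op::Leaf, value)
    }

    pub fn add(&mut self, a: usize, b: usize) -> usize {
        let v = self.values[a] + self.values[b];
        self.push(Op::Add(a, b), v)
    }

    pub fn sub(&mut self, a: usize, b: usize) -> usize {
        let v = self.values[a] - self.values[b];
        self.push(Op::Sub(a, b), v)
    }

    pub fn mul(&mut self, a: usize, b: usize) -> usize {
        let v = self.values[a] * self.values[b];
        self.push(Op::Mul(a, b), v)
    }

    pub fn value(&self, node: usize) -> f64 {
        self.values[node]
    }

    /// Walks the tape backwards from `output` and returns
    /// d(output)/d(node) for every recorded node.
    pub fn backward(&self, output: usize) -> Vec<f64> {
        let mut grads = vec![0.0; self.ops.len()];
        grads[output] = 1.0;
        for i in (0..=output).rev() {
            let g = grads[i];
            match self.ops[i] {
                Op::Leaf => {}
                Op::Add(a, b) => {
                    grads[a] += g;
                    grads[b] += g;
                }
                Op::Sub(a, b) => {
                    grads[a] += g;
                    grads[b] -= g;
                }
                Op::Mul(a, b) => {
                    grads[a] += g * self.values[b];
                    grads[b] += g * self.values[a];
                }
            }
        }
        grads
    }
}

/// Builds (y - (slope * x + intercept))^2 on a fresh tape and returns
/// (loss, dloss/dslope, dloss/dintercept).
pub fn squared_error_grads(slope: f64, intercept: f64, x: f64, y: f64) -> (f64, f64, f64) {
    let mut tape = Tape::new();
    let s = tape.leaf(slope);
    let b = tape.leaf(intercept);
    let x_node = tape.leaf(x);
    let y_node = tape.leaf(y);
    let sx = tape.mul(s, x_node);
    let y_hat = tape.add(sx, b);
    let diff = tape.sub(y_node, y_hat);
    let loss = tape.mul(diff, diff);
    let grads = tape.backward(loss);
    (tape.value(loss), grads[s], grads[b])
}
```

Because the tape is built by ordinary Rust control flow, the graph can differ from one call to the next, which is the property "define-by-run" refers to.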

Quickstart

The following defines a univariate linear regression model, then backpropagates through it.

let slope = ParameterNode::new(random_matrix(1, 1));
let intercept = ParameterNode::new(random_matrix(1, 1));

let x = InputNode::new(random_matrix(1, 1));
let y = InputNode::new(random_matrix(1, 1));

let y_hat = slope.clone() * x.clone() + intercept.clone();
let mut loss = (y.clone() - y_hat).square();
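
For reference, the gradients that backpropagation should produce for this model can be derived by hand. The derivation below assumes that `square()` squares its argument and that the 1x1 matrices behave as scalars; the symbols w (slope) and b (intercept) are introduced here for brevity and do not appear in the library.

```latex
\begin{aligned}
\hat{y} &= w x + b \\
L &= (y - \hat{y})^2 \\
\frac{\partial L}{\partial w} &= -2x\,(y - \hat{y}) \\
\frac{\partial L}{\partial b} &= -2\,(y - \hat{y})
\end{aligned}
```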

To optimize the parameters, create an optimizer object and go through several epochs of learning:

let num_epochs = 10;
let mut optimizer = SGD::new(0.1, vec![slope.clone(), intercept.clone()]);

for _ in 0..num_epochs {
    let x_value: f32 = rand::random();
    let y_value = 3.0 * x_value + 5.0;

    // You can re-use the computation graph
    // by giving the input nodes new values.
    x.set_value(x_value);
    y.set_value(y_value);

    loss.forward();
    loss.backward(1.0);

    optimizer.step(loss.parameters());
}
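
As a library-free point of comparison, the same fit can be written out with the forward pass, backward pass, and SGD step spelled out by hand. This is only a sketch of what such a loop computes, not of how Wyrm computes it. The function names `fit_line` and `make_samples` are hypothetical, and evenly spaced inputs stand in for `rand::random()` so that the result is reproducible.

```rust
/// Fits y = slope * x + intercept by plain SGD on the squared error,
/// taking one sample per step. Parameters start at zero.
pub fn fit_line(samples: &[(f32, f32)], learning_rate: f32, num_epochs: usize) -> (f32, f32) {
    let (mut slope, mut intercept) = (0.0_f32, 0.0_f32);
    for _ in 0..num_epochs {
        for &(x, y) in samples {
            // Forward pass: prediction and residual.
            let y_hat = slope * x + intercept;
            let diff = y - y_hat;

            // Backward pass: gradients of (y - y_hat)^2.
            let grad_slope = -2.0 * diff * x;
            let grad_intercept = -2.0 * diff;

            // SGD step: move against the gradient.
            slope -= learning_rate * grad_slope;
            intercept -= learning_rate * grad_intercept;
        }
    }
    (slope, intercept)
}

/// Evenly spaced samples of y = 3x + 5 on [0, 1).
/// A deterministic stand-in for random inputs.
pub fn make_samples(n: usize) -> Vec<(f32, f32)> {
    (0..n)
        .map(|i| {
            let x = i as f32 / n as f32;
            (x, 3.0 * x + 5.0)
        })
        .collect()
}
```

Calling `fit_line(&make_samples(16), 0.1, 2000)` should return a slope close to 3 and an intercept close to 5, the coefficients used to generate the data.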

You can use rayon to fit your model in parallel by first creating a set of shared parameters, then building a per-thread copy of the model:

let slope_param = Arc::new(HogwildParameter::new(random_matrix(1, 1)));
let intercept_param = Arc::new(HogwildParameter::new(random_matrix(1, 1)));
let num_epochs = 10;

(0..rayon::current_num_threads())
    .into_par_iter()
    .for_each(|_| {
        let slope = ParameterNode::shared(slope_param.clone());
        let intercept = ParameterNode::shared(intercept_param.clone());
        let x = InputNode::new(random_matrix(1, 1));
        let y = InputNode::new(random_matrix(1, 1));
        let y_hat = slope.clone() * x.clone() + intercept.clone();
        let mut loss = (y.clone() - y_hat).square();

        let mut optimizer = SGD::new(0.1, vec![slope.clone(), intercept.clone()]);

        for _ in 0..num_epochs {
            let x_value: f32 = rand::random();
            let y_value = 3.0 * x_value + 5.0;

            x.set_value(x_value);
            y.set_value(y_value);

            loss.forward();
            loss.backward(1.0);

            optimizer.step(loss.parameters());
        }
    });
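
The idea behind Hogwild-style training is that several threads update the same parameters without locking and tolerate the occasional lost update. The sketch below illustrates that idea with standard-library threads and atomics only. It makes no claim about how `HogwildParameter` is implemented: `SharedF32` and `hogwild_fit` are hypothetical names, and deterministic inputs replace `rand::random()`.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

/// An f32 shared between threads without locks, stored as raw bits.
pub struct SharedF32(AtomicU32);

impl SharedF32 {
    pub fn new(value: f32) -> Self {
        SharedF32(AtomicU32::new(value.to_bits()))
    }

    pub fn get(&self) -> f32 {
        f32::from_bits(self.0.load(Ordering::Relaxed))
    }

    /// Read-modify-write that is deliberately NOT atomic as a whole:
    /// concurrent updates may overwrite each other, which Hogwild accepts.
    pub fn add(&self, delta: f32) {
        let updated = self.get() + delta;
        self.0.store(updated.to_bits(), Ordering::Relaxed);
    }
}

/// Fits y = 3x + 5 with `num_threads` workers that all update the same
/// two parameters without any locking. Returns (slope, intercept).
pub fn hogwild_fit(num_threads: usize, steps_per_thread: usize, learning_rate: f32) -> (f32, f32) {
    let slope = Arc::new(SharedF32::new(0.0));
    let intercept = Arc::new(SharedF32::new(0.0));

    let handles: Vec<_> = (0..num_threads)
        .map(|t| {
            let slope = Arc::clone(&slope);
            let intercept = Arc::clone(&intercept);
            thread::spawn(move || {
                for step in 0..steps_per_thread {
                    // Deterministic inputs in [0, 1), offset per thread.
                    let x = ((step * 7 + t * 3) % 16) as f32 / 16.0;
                    let y = 3.0 * x + 5.0;

                    // Residual computed from possibly stale parameters.
                    let diff = y - (slope.get() * x + intercept.get());

                    // Gradient of diff^2 is -2 * diff * x and -2 * diff,
                    // so stepping against it adds these deltas.
                    slope.add(learning_rate * 2.0 * diff * x);
                    intercept.add(learning_rate * 2.0 * diff);
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    (slope.get(), intercept.get())
}
```

With a modest learning rate, for example `hogwild_fit(4, 20_000, 0.05)`, the parameters should still settle near a slope of 3 and an intercept of 5 despite the unsynchronized writes, because every update computed from this noise-free data pulls towards the same solution.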

BLAS support

You should enable BLAS support to get (much) better performance out of matrix-multiplication-heavy workloads. To do so, add the following to your Cargo.toml:

ndarray = { version = "0.11.0", features = ["blas", "serde-1"] }
blas-src = { version = "0.1.2", default-features = false, features = ["openblas"] }
openblas-src = { version = "0.5.6", default-features = false, features = ["cblas"] }

Dependencies

~4MB
~77K SLoC