no-std build bytecount

count occurrences of a given byte, or the number of UTF-8 code points, in a byte slice, fast

10 releases

0.3.2 Aug 3, 2018
0.3.1 Jan 30, 2018
0.2.0 Aug 25, 2017
0.1.7 Jul 19, 2017
0.1.2 Sep 27, 2016

#6 in Algorithms

19,283 downloads per month
Used in 162 crates (11 directly)

Apache-2.0/MIT

16KB
212 lines

bytecount

Counting bytes really fast

This uses the "hyperscreamingcount" algorithm by Joshua Landau to count bytes faster than anything else. The newlinebench repository has further benchmarks.

To use bytecount in your crate, if you have cargo-edit, run cargo add bytecount in a terminal from the crate root. Otherwise, edit your Cargo.toml manually and add bytecount = "0.3.2" to the [dependencies] section.

In your crate root (lib.rs or main.rs, depending on whether you are writing a library or an application), add extern crate bytecount;. Now you can use bytecount::count as follows:

extern crate bytecount;

fn main() {
    let mytext = "some potentially large text, perhaps read from disk?";
    let spaces = bytecount::count(mytext.as_bytes(), b' ');
    println!("found {} spaces", spaces);
}
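The crate description also mentions counting UTF-8 code points. As a rough illustration of what that means (a std-only sketch, not the crate's implementation; the function name `count_code_points` is made up for this example), every code point starts with exactly one byte that is not a continuation byte of the form `10xxxxxx`, so counting the non-continuation bytes gives the number of code points in valid UTF-8 input:

```rust
/// Counts UTF-8 code points by counting every byte that is NOT a
/// continuation byte (continuation bytes have the bit pattern 10xxxxxx).
/// Hypothetical helper for illustration; assumes the input is valid UTF-8.
fn count_code_points(bytes: &[u8]) -> usize {
    bytes.iter().filter(|&&b| (b & 0xC0) != 0x80).count()
}
```

For valid UTF-8 this agrees with `str::chars().count()`, while working directly on the byte slice.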

bytecount supports two features that use modern CPU capabilities to speed up counting considerably. To let your users enable them, add the following to your Cargo.toml:

[features]
avx-accel = ["bytecount/avx-accel"]
simd-accel = ["bytecount/simd-accel"]

Now your users can compile with SSE support (available on most modern x86_64 processors) using:

cargo build --release --features simd-accel

Or even with AVX support (which likely requires compiling for the native target CPU):

RUSTFLAGS="-C target-cpu=native" cargo build --release --features "simd-accel avx-accel"

The algorithm is explained in depth here.
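To give a feel for the data-parallel idea without SIMD intrinsics, here is a minimal sketch that compares eight bytes at a time inside a single `u64` (often called SWAR). This is not the hyperscreamingcount algorithm and not the crate's code; the name `swar_count` and the constants are assumptions made for this illustration only.

```rust
/// Mask with the low seven bits of every byte set.
const LO7: u64 = 0x7F7F_7F7F_7F7F_7F7F;

/// Counts occurrences of `needle`, processing 8 bytes per step.
/// Illustrative sketch only; not how bytecount is implemented.
fn swar_count(haystack: &[u8], needle: u8) -> usize {
    // Broadcast the needle into every byte of a 64-bit word.
    let pattern = u64::from(needle) * 0x0101_0101_0101_0101;
    let mut total = 0usize;
    let mut chunks = haystack.chunks_exact(8);
    for chunk in &mut chunks {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(chunk);
        // After the XOR, a byte is zero exactly where the haystack matched.
        let x = u64::from_ne_bytes(buf) ^ pattern;
        // Exact zero-byte test: the per-byte sum never carries into the
        // neighbouring byte, so each byte is judged independently.
        let y = (x & LO7) + LO7;
        // The high bit of a byte is set in `matches` only if that byte of x was zero.
        let matches = !(y | x | LO7);
        total += matches.count_ones() as usize;
    }
    // Handle the trailing 0..=7 bytes one at a time.
    total + chunks.remainder().iter().filter(|&&b| b == needle).count()
}
```

Real SIMD implementations apply the same compare-then-accumulate pattern to wider registers (16 bytes with SSE, 32 with AVX).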

Note that for very short slices, data parallelism is unlikely to yield much of a performance gain. In those cases, a naive count with a 32-bit counter may be the better choice, as long as the slice is not so large that the counter could overflow.
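A naive count with a 32-bit counter can be as simple as the following std-only sketch. The name `naive_count_u32` is hypothetical and chosen for this example; it is not taken from the crate.

```rust
/// Counts occurrences of `needle` one byte at a time using a u32 counter.
/// Hypothetical helper for illustration. The counter can only overflow if
/// the slice contains more than u32::MAX matching bytes.
fn naive_count_u32(haystack: &[u8], needle: u8) -> u32 {
    haystack
        .iter()
        .fold(0u32, |count, &b| count + u32::from(b == needle))
}
```

On short inputs a loop like this has no setup cost and no remainder handling, which is why it can compete with a vectorized version there.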

License

Licensed under either of the Apache License, Version 2.0, or the MIT license, at your discretion.

Dependencies

~38KB

  • simd-accel? simd 0.2