#string #search #text #aho #corasick

bin aho-corasick

Fast multiple substring searching with finite state machines

26 releases

0.6.6 Jul 12, 2018
0.6.4 Nov 30, 2017
0.6.3 Mar 17, 2017
0.5.3 Sep 11, 2016
0.3.0 Jul 14, 2015

#2 in Text processing

Download history 37507/week @ 2018-05-06 47526/week @ 2018-05-13 50017/week @ 2018-05-20 46636/week @ 2018-05-27 48945/week @ 2018-06-03 49278/week @ 2018-06-10 45835/week @ 2018-06-17 48953/week @ 2018-06-24 57438/week @ 2018-07-01 55563/week @ 2018-07-08 60059/week @ 2018-07-15 62188/week @ 2018-07-22 51464/week @ 2018-07-29

217,233 downloads per month

This crate provides an implementation of the Aho-Corasick algorithm. Its intended use case is for fast substring matching, particularly when matching multiple substrings in a search text. This is achieved by compiling the substrings into a finite state machine.

This implementation provides optimal algorithmic time complexity. Construction of the finite state machine is O(p) where p is the length of the substrings concatenated. Matching against search text is O(n + p + m), where n is the length of the search text and m is the number of matches.

Build status

Dual-licensed under MIT or the UNLICENSE.

Documentation

https://docs.rs/aho-corasick/.

Example

The documentation contains several examples, and there is a more complete example as a full program in examples/dict-search.rs.

Here is a quick example showing simple substring matching:

use aho_corasick::{Automaton, AcAutomaton, Match};

let aut = AcAutomaton::new(vec!["apple", "maple"]);
let mut it = aut.find("I like maple apples.");
assert_eq!(it.next(), Some(Match {
    pati: 1,
    start: 7,
    end: 12,
}));
assert_eq!(it.next(), Some(Match {
    pati: 0,
    start: 13,
    end: 18,
}));
assert_eq!(it.next(), None);

Alternatives

Aho-Corasick is useful for matching multiple substrings against many long strings. If your long string is fixed, then you might consider building a suffix array of the search text (which takes O(n) time). Matches can then be found in O(plogn) time.

Unlicense/MIT license

Dependencies

Reverse deps