#regex #onig #oniguruma

onig

Rust-Onig is a set of Rust bindings for the Oniguruma regular expression library. Oniguruma is a modern regex library with support for multiple character encodings and regex syntaxes.

42 releases (stable)

4.3.2 Feb 9, 2019
4.2.1 Nov 10, 2018
4.1.0 Jul 29, 2018
3.2.2 Apr 25, 2018
0.4.0 Jan 31, 2016

#36 in Text processing

Download history 1791/week @ 2018-10-27 1476/week @ 2018-11-03 1808/week @ 2018-11-10 3329/week @ 2018-11-17 2118/week @ 2018-11-24 1373/week @ 2018-12-01 1131/week @ 2018-12-08 1237/week @ 2018-12-15 967/week @ 2018-12-22 915/week @ 2018-12-29 1063/week @ 2019-01-05 1479/week @ 2019-01-12 1488/week @ 2019-01-19 1363/week @ 2019-01-26 2488/week @ 2019-02-02

6,896 downloads per month
Used in 12 crates (4 directly)

MIT license

2.5MB
75K SLoC

C 72K SLoC // 0.0% comments
Rust 3K SLoC // 0.4% comments
Python 1K SLoC // 0.0% comments
Shell 57 SLoC // 0.1% comments
C++ 18 SLoC // 0.4% comments
Batch 14 SLoC

lib.rs:

This crate provides a safe wrapper around the Oniguruma regular expression library.

Examples

use onig::Regex;

let regex = Regex::new("e(l+)").unwrap();
for (i, pos) in regex.captures("hello").unwrap().iter_pos().enumerate() {
    match pos {
        Some((beg, end)) =>
            println!("Group {} captured in position {}:{}", i, beg, end),
        None =>
            println!("Group {} is not captured", i),
    }
}

Match vs Search

There are two basic things you can do with a Regex pattern: test whether the pattern matches the whole of a given string, and search for occurrences of the pattern within a string. Oniguruma exposes these two concepts with the match and search APIs.

In addition to these two base Oniguruma APIs, this crate exposes a third find API, built on top of the search API.

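use onig::Regex;

let pattern = Regex::new("hello").unwrap();
// find searches anywhere in the string, so it succeeds here.
assert!(pattern.find("hello world").is_some());
// is_match requires the pattern to match the whole string, so it fails here.
assert!(!pattern.is_match("hello world"));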

The Match API

Functions in the match API check whether a pattern matches the entire string. The simplest of these is Regex::is_match, which returns true if the pattern matches the string. For more complex usage, Regex::match_with_options and Regex::match_with_encoding can be used. These allow capture groups to be inspected, matching with different options, and matching subsections of a given text.

The Search API

Functions in the search API search for a pattern anywhere within a string. The simplest of these is Regex::find, which returns the start and end offsets of the first occurrence of the pattern within the string. For more complex usage, Regex::search_with_options and Regex::search_with_encoding can be used. These allow capture groups to be inspected, searching with different options, and searching within subsections of a given text.

The Find API

The find API is built on top of the search API. Functions in this API allow iteration across all matches of the pattern within a string, not just the first one. The functions deal with some of the complexities of this, such as zero-length matches.

The simplest step up from the basic search API Regex::find is getting the captures relating to a match with the Regex::captures method. To find capture information for all matches within a string, Regex::find_iter and Regex::captures_iter can be used. The former exposes the start and end of each match as Regex::find does; the latter exposes the whole capture group information as Regex::captures does.

The std::pattern API

In addition to the main Oniguruma API it is possible to use the Regex object with the std::pattern API. To enable support compile with the std-pattern feature. If you're using Cargo you can do this by adding the following to your Cargo.toml:

[dependencies.onig]
version = "1.2"
features = ["std-pattern"]

Dependencies