#time #nlp #parse

two_timer

parser for English time expressions

9 stable releases

✓ Uses Rust 2018 edition

new 1.0.8 Feb 16, 2019
1.0.7 Feb 10, 2019
1.0.3 Jan 20, 2019
0.1.0 Dec 29, 2018

#47 in Date and time

41 downloads per month
Used in 1 crate

GPL-2.0 license

65KB
1.5K SLoC

two-timer

Rust library for parsing English time expressions into start and end timestamps

This takes English expressions and returns a time range which ideally matches the expression. You might use this for registering the temporal extent of an event, say, or finding lines in a log file.

Some expressions it can handle:

  • from now to eternity
  • today
  • tomorrow
  • last month
  • this year
  • 5/6/69
  • June 6, 2010
  • forever
  • 3:00 AM
  • 3AM
  • June '05
  • Monday through next Thursday
  • from mon at 15:00:05 to now
  • 1960-05-06
  • 5000BCE
  • next weekend
  • 2000
  • the nineteenth of March 1810
  • the 5th of November
  • the ides of March
  • the first
  • two seconds before 12:00 PM
  • 1 week after May first
  • 15 minutes around 12:13:43 PM
  • noon on May 6, 1969
  • midnight on May 6, 1969
  • Friday the 13th
  • 2 weeks ago
  • ten seconds from now
  • 5 minutes before and after midnight
  • 1969-05-06 12:03:05

The complete API is available at https://docs.rs/two_timer/0.1.0/two_timer/.


lib.rs:

This crate provides a parse function to convert English time expressions into a pair of timestamps representing a time range. It converts "today" into the first and last moments of today, "May 6, 1968" into the first and last moments of that day, "last year" into the first and last moments of that year, and so on. It does this even for expressions generally interpreted as referring to a point in time, such as "3 PM". In these cases the width of the time span varies according to the specificity of the expression: "3 PM" has a granularity of an hour, "3:00 PM" of a minute, and "3:00:00 PM" of a second. For pointwise expressions the first moment is the point explicitly named. The parse function actually returns a 3-tuple consisting of the two timestamps and a flag indicating whether the expression is literally a range -- two time expressions separated by a preposition such as "to", "through", "up to", or "until".
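The granularity rule for clock expressions can be sketched with the standard library alone. This is only an illustration of the rule as described above, not two_timer's implementation; `span_width_seconds` is a hypothetical helper, and it assumes the expression is a bare clock time with an optional trailing AM/PM.

```rust
/// Width, in seconds, of the span implied by a clock expression.
/// `span_width_seconds` is a hypothetical helper, not part of two_timer.
fn span_width_seconds(expr: &str) -> Option<u32> {
    // Lower-case the input and strip a trailing AM/PM marker, if present.
    let lower = expr.trim().to_lowercase();
    let clock = lower
        .trim_end_matches("am")
        .trim_end_matches("pm")
        .trim();
    // Every colon-separated field must be a plain number.
    let all_numeric = clock
        .split(':')
        .all(|f| !f.is_empty() && f.chars().all(|c| c.is_ascii_digit()));
    if !all_numeric {
        return None;
    }
    // The number of fields given determines the granularity.
    match clock.split(':').count() {
        1 => Some(3600), // "3 PM": an hour
        2 => Some(60),   // "3:00 PM": a minute
        3 => Some(1),    // "3:00:00 PM": a second
        _ => None,
    }
}

fn main() {
    for expr in ["3 PM", "3:00 PM", "3:00:00 PM"].iter() {
        println!("{:>10} => {:?}", expr, span_width_seconds(expr));
    }
}
```

The second timestamp of such a span would then be the first timestamp plus this width.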

Example

extern crate two_timer;
use two_timer::{parse, Config};
extern crate chrono;
use chrono::naive::NaiveDate;

pub fn main() {
    let phrases = [
        "now",
        "this year",
        "last Friday",
        "from now to the end of time",
        "Ragnarok",
        "at 3:00 pm today",
        "5/6/69",
        "Tuesday, May 6, 1969 at 3:52 AM",
        "March 15, 44 BC",
        "Friday the 13th",
        "five minutes before and after midnight",
    ];
    // find the maximum phrase length for pretty formatting
    let max = phrases
        .iter()
        .max_by(|a, b| a.len().cmp(&b.len()))
        .unwrap()
        .len();
    for phrase in phrases.iter() {
        match parse(phrase, None) {
            Ok((d1, d2, _)) => println!("{:width$} => {} --- {}", phrase, d1, d2, width = max),
            Err(e) => println!("{:?}", e),
        }
    }
    let now = NaiveDate::from_ymd_opt(1066, 10, 14).unwrap().and_hms(12, 30, 15);
    println!("\nlet \"now\" be some moment during the Battle of Hastings, specifically {}\n", now);
    let conf = Config::new().now(now);
    for phrase in phrases.iter() {
        match parse(phrase, Some(conf.clone())) {
            Ok((d1, d2, _)) => println!("{:width$} => {} --- {}", phrase, d1, d2, width = max),
            Err(e) => println!("{:?}", e),
        }
    }
}

produces

now                                    => 2019-02-03 14:40:00 --- 2019-02-03 14:41:00
this year                              => 2019-01-01 00:00:00 --- 2020-01-01 00:00:00
last Friday                            => 2019-01-25 00:00:00 --- 2019-01-26 00:00:00
from now to the end of time            => 2019-02-03 14:40:00 --- +262143-12-31 23:59:59.999
Ragnarok                               => +262143-12-31 23:59:59.999 --- +262143-12-31 23:59:59.999
at 3:00 pm today                       => 2019-02-03 15:00:00 --- 2019-02-03 15:01:00
5/6/69                                 => 1969-05-06 00:00:00 --- 1969-05-07 00:00:00
Tuesday, May 6, 1969 at 3:52 AM        => 1969-05-06 03:52:00 --- 1969-05-06 03:53:00
March 15, 44 BC                        => -0043-03-15 00:00:00 --- -0043-03-16 00:00:00
Friday the 13th                        => 2018-07-13 00:00:00 --- 2018-07-14 00:00:00
five minutes before and after midnight => 2019-02-02 23:55:00 --- 2019-02-03 00:05:00

let "now" be some moment during the Battle of Hastings, specifically 1066-10-14 12:30:15

now                                    => 1066-10-14 12:30:00 --- 1066-10-14 12:31:00
this year                              => 1066-01-01 00:00:00 --- 1067-01-01 00:00:00
last Friday                            => 1066-10-05 00:00:00 --- 1066-10-06 00:00:00
from now to the end of time            => 1066-10-14 12:30:00 --- +262143-12-31 23:59:59.999
Ragnarok                               => +262143-12-31 23:59:59.999 --- +262143-12-31 23:59:59.999
at 3:00 pm today                       => 1066-10-14 15:00:00 --- 1066-10-14 15:01:00
5/6/69                                 => 0969-05-06 00:00:00 --- 0969-05-07 00:00:00
Tuesday, May 6, 1969 at 3:52 AM        => 1969-05-06 03:52:00 --- 1969-05-06 03:53:00
March 15, 44 BC                        => -0043-03-15 00:00:00 --- -0043-03-16 00:00:00
Friday the 13th                        => 1066-07-13 00:00:00 --- 1066-07-14 00:00:00
five minutes before and after midnight => 1066-10-13 23:55:00 --- 1066-10-14 00:05:00

For the full grammar of time expressions, view the source of the parse function and scroll up. The grammar is provided at the top of the file.

Relative Times

It is common in English to use time expressions which must be interpreted relative to some context. The context may be verb tense, other events in the discourse, or other semantic or pragmatic clues. The two_timer parse function doesn't attempt to infer context perfectly, but it does make some attempt to get the context right. So, for instance, "last Monday through Friday", said on Saturday, will end on a different day from "next Monday through Friday". The general rules are:

  1. a fully-specified expression in a pair will provide the context for the other expression
  2. a relative expression will be interpreted as appropriate given its order -- the second expression describes a time after the first
  3. if neither expression is fully-specified, the first will be interpreted relative to "now" and the second relative to the first

The rules of interpretation for relative time expressions in ranges will likely be refined further in the future.
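The rules above amount to a small decision table, which might be sketched as follows. The `Context` enum and `contexts` function are hypothetical names introduced for illustration; they are not part of two_timer, and the real parser may well decide these cases differently in detail.

```rust
/// What an expression in a range is interpreted relative to.
/// Hypothetical type for illustration; not part of two_timer.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Context {
    /// The expression is fully specified and needs no context.
    Absolute,
    /// The expression is interpreted relative to "now".
    Now,
    /// The expression is interpreted relative to the other expression in the pair.
    OtherExpression,
}

/// Returns the context for the (first, second) expressions of a range,
/// given whether each one is fully specified.
fn contexts(first_full: bool, second_full: bool) -> (Context, Context) {
    match (first_full, second_full) {
        // Nothing to resolve when both are fully specified.
        (true, true) => (Context::Absolute, Context::Absolute),
        // Rule 1: the fully-specified expression anchors the other one.
        (true, false) => (Context::Absolute, Context::OtherExpression),
        (false, true) => (Context::OtherExpression, Context::Absolute),
        // Rule 3: the first is relative to "now", the second to the first.
        (false, false) => (Context::Now, Context::OtherExpression),
    }
}

fn main() {
    // "Monday through Friday": neither side is fully specified.
    println!("{:?}", contexts(false, false));
}
```

Rule 2, that the second expression describes a time after the first, is a constraint on how the relative expression is resolved rather than on what it is resolved against, so it does not appear in the table.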

Clock Time

The parse function interprets expressions such as "3:00" as referring to time on a 24-hour clock, so "3:00" will be interpreted as "3:00 AM". This is true even in ranges such as "3:00 PM to 4", where the more natural interpretation might be "3:00 PM to 4:00 PM".
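A minimal sketch of this rule, assuming hours without a suffix run from 0 to 23 and hours with a suffix run from 1 to 12. `Meridiem` and `hour_on_24h_clock` are hypothetical names, not two_timer's API.

```rust
/// AM/PM marker attached to a clock time. Hypothetical type for illustration.
#[derive(Debug, Clone, Copy)]
enum Meridiem {
    Am,
    Pm,
}

/// Converts an hour as written into an hour on the 24-hour clock.
/// Without a marker the hour is taken literally, so "3:00" is 3 AM.
fn hour_on_24h_clock(hour: u32, meridiem: Option<Meridiem>) -> Option<u32> {
    match meridiem {
        // No AM/PM: read the hour directly off a 24-hour clock.
        None if hour < 24 => Some(hour),
        // 12 AM is midnight, 1 AM through 11 AM are unchanged.
        Some(Meridiem::Am) if (1..=12).contains(&hour) => Some(hour % 12),
        // 12 PM is noon, 1 PM through 11 PM shift by twelve hours.
        Some(Meridiem::Pm) if (1..=12).contains(&hour) => Some(hour % 12 + 12),
        _ => None,
    }
}

fn main() {
    // "3:00 PM to 4": the bare "4" is not promoted to 4 PM.
    println!("{:?}", hour_on_24h_clock(3, Some(Meridiem::Pm)));
    println!("{:?}", hour_on_24h_clock(4, None));
}
```

To get the afternoon reading in a range, write the suffix on both sides, as in "3:00 PM to 4:00 PM".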

Years Near 0

Since it is common to abbreviate years to their last two digits, two-digit years will be interpreted as abbreviated unless followed by a suffix such as "B.C.E." or "AD". They will be interpreted as the nearest appropriate previous year to the current moment, so in 2010 "'11" will be interpreted as 1911, not 2011.
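The arithmetic behind this can be sketched as below. `expand_two_digit_year` is a hypothetical helper, not two_timer's code, and it makes one assumption the text does not settle: a two-digit year equal to the current year's last two digits is taken to mean the current year.

```rust
/// Expands a two-digit year to the nearest year that is not after `current_year`.
/// Hypothetical helper for illustration; not part of two_timer.
fn expand_two_digit_year(yy: i32, current_year: i32) -> i32 {
    assert!((0..100).contains(&yy), "expected a two-digit year");
    // Start from the current century, e.g. 2000 for 2010.
    let century = current_year - current_year.rem_euclid(100);
    let candidate = century + yy;
    // A candidate in the future is pushed back one century.
    if candidate > current_year {
        candidate - 100
    } else {
        candidate
    }
}

fn main() {
    println!("'11 in 2010 => {}", expand_two_digit_year(11, 2010));
    println!("'69 in 1066 => {}", expand_two_digit_year(69, 1066));
}
```

This matches the example output earlier, where "5/6/69" lands in 1969 when "now" is in 2019 and in 969 when "now" is in 1066.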

The Second Time in Ranges

For single expressions, like "this year", "today", "3:00", or "next month", the second of the two timestamps is straightforward -- it is the end of the relevant temporal unit. "1971" will be interpreted as the first moment of the first day of 1971 through, but excluding, the first moment of the first day of 1972, so the second timestamp will be this first excluded moment.
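Because the second timestamp is the first excluded moment, a membership test against a returned range should be half-open. A minimal sketch follows, using Unix timestamps in seconds in place of chrono's date-times so that it has no dependencies. The `Span` type is hypothetical and not part of two_timer.

```rust
/// A half-open time range [start, end), in seconds since the Unix epoch.
/// Hypothetical type for illustration; two_timer itself returns chrono values.
struct Span {
    start: i64,
    end: i64,
}

impl Span {
    /// True if `t` falls inside the span. The end is excluded.
    fn contains(&self, t: i64) -> bool {
        self.start <= t && t < self.end
    }
}

fn main() {
    // "1971": from 1971-01-01 00:00:00 up to, but excluding, 1972-01-01 00:00:00.
    let year_1971 = Span { start: 31_536_000, end: 63_072_000 };
    println!("{}", year_1971.contains(31_536_000)); // first moment of 1971
    println!("{}", year_1971.contains(63_072_000)); // first moment of 1972
}
```

With this convention adjacent spans such as "1971" and "1972" tile the timeline without sharing a moment, which is convenient when filtering log lines by timestamp.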

When the parsed expression describes a range, we're really dealing with two potentially overlapping pairs of timestamps, and the choice of the terminal timestamp gets trickier. The general rule is that if the second interval is shorter than a day, its first timestamp is the first excluded moment, so "today to 3:00 PM" means the first moment of the day up to, but excluding, 3:00 PM. If the second unit is a day or longer, which of its timestamps is used depends on the preposition. "This week up to Friday" excludes all of Friday. "This week through Friday" includes all of Friday. Prepositions are assumed to fall into either the "to" class or the "through" class. You may also use a series of dashes as a synonym for "through", so "this week - fri" is equivalent to "this week through Friday". For the most current list of prepositions in each class, consult the grammar used for parsing, but as of the moment, these are the rules:

up_to => [["to", "until", "up to", "till"]]
through => [["up through", "through", "thru"]] | r("-+")
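The choice of terminal timestamp described above can be sketched as follows. This is a dependency-free illustration, not the library's code: `PrepositionClass`, `classify`, and `range_end` are hypothetical names, timestamps are plain seconds, and the preposition lists are copied from the grammar rules as quoted here.

```rust
/// The two preposition classes from the grammar. Hypothetical type for illustration.
#[derive(Debug, PartialEq, Clone, Copy)]
enum PrepositionClass {
    UpTo,
    Through,
}

/// Classifies a range preposition, treating a run of dashes as "through".
fn classify(prep: &str) -> Option<PrepositionClass> {
    let p = prep.trim().to_lowercase();
    match p.as_str() {
        "to" | "until" | "up to" | "till" => Some(PrepositionClass::UpTo),
        "up through" | "through" | "thru" => Some(PrepositionClass::Through),
        _ if !p.is_empty() && p.chars().all(|c| c == '-') => Some(PrepositionClass::Through),
        _ => None,
    }
}

const DAY: i64 = 86_400; // seconds in a day

/// Picks the end of a range from the second expression's own (start, end) span.
fn range_end(second: (i64, i64), class: PrepositionClass) -> i64 {
    let (start, end) = second;
    if end - start < DAY {
        // Shorter than a day: the range stops where the second span begins.
        start
    } else {
        match class {
            // "up to Friday" excludes all of Friday.
            PrepositionClass::UpTo => start,
            // "through Friday" includes all of Friday.
            PrepositionClass::Through => end,
        }
    }
}

fn main() {
    let friday = (4 * DAY, 5 * DAY); // a day-long span, in arbitrary units
    for prep in ["up to", "through", "-"].iter() {
        let class = classify(prep).unwrap();
        println!("{:>8} => {}", prep, range_end(friday, class));
    }
}
```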

Pay Periods

I'm writing this library in anticipation of rewriting JobLog in Rust for the sake of amusement. This means I need the time expressions parsed to include pay periods. Pay periods, though, are defined relative to some reference date -- a particular Sunday, say -- and their length varies from case to case. two_timer, and JobLog, assume pay periods are of a fixed length and tile the timeline without overlap, so a pay period of a calendrical month is problematic.

If you need to interpret "last pay period", say, you will need to specify when this pay period began, or when some pay period began or will begin, and a pay period length in days. The parse function has a second optional argument, a Config object, whose chief function outside of testing is to provide this information. So, for example, you could do this:

extern crate two_timer;
use two_timer::{parse, Config};
let (reference_time, _, _) = parse("5/6/69", None).unwrap();
let config = Config::new().pay_period_start(Some(reference_time.date()));
let (t1, t2, _) = parse("next pay period", Some(config)).unwrap();
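The tiling assumption can be made concrete with a little modular arithmetic. The sketch below is not two_timer's implementation; the function names are hypothetical, and days are represented as whole-day numbers counted from an arbitrary epoch rather than as chrono dates.

```rust
/// Returns the half-open (start, end) day range of the pay period containing `day`.
/// `reference_start` is the first day of any one pay period; periods of
/// `length_days` tile the timeline in both directions from it.
/// Hypothetical helper for illustration; not part of two_timer.
fn pay_period_containing(day: i64, reference_start: i64, length_days: i64) -> (i64, i64) {
    assert!(length_days > 0, "pay period length must be positive");
    // rem_euclid keeps the offset non-negative for days before the reference.
    let offset = (day - reference_start).rem_euclid(length_days);
    let start = day - offset;
    (start, start + length_days)
}

/// "next pay period": the period right after the one containing `day`.
fn next_pay_period(day: i64, reference_start: i64, length_days: i64) -> (i64, i64) {
    let (_, end) = pay_period_containing(day, reference_start, length_days);
    (end, end + length_days)
}

/// "last pay period": the period right before the one containing `day`.
fn last_pay_period(day: i64, reference_start: i64, length_days: i64) -> (i64, i64) {
    let (start, _) = pay_period_containing(day, reference_start, length_days);
    (start - length_days, start)
}

fn main() {
    // Two-week periods, with one period known to start on day 0.
    println!("{:?}", pay_period_containing(20, 0, 14));
    println!("{:?}", next_pay_period(20, 0, 14));
    println!("{:?}", last_pay_period(20, 0, 14));
}
```

This also shows why calendar months are awkward as pay periods: their lengths differ, so no single `length_days` tiles the timeline.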

Ambiguous Year Formats

two_timer will try various year-month-day permutations until one of them parses given that days are in the range 1-31 and months, 1-12. This is the order in which it tries these permutations:

  1. year/month/day
  2. year/day/month
  3. month/day/year
  4. day/month/year

The potential unit separators are /, ., and -. Whitespace is optional.
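The search order above might be sketched like this. `interpret_date` is a hypothetical function, not two_timer's parser. It only checks the ranges given above (months 1-12, days 1-31), so it neither validates days against the month nor expands two-digit years.

```rust
use std::convert::TryFrom;

/// Tries the documented year-month-day permutations in order and returns
/// the first (year, month, day) whose month and day are in range.
/// Hypothetical helper for illustration; not part of two_timer.
fn interpret_date(text: &str) -> Option<(i32, u32, u32)> {
    // Split on any of the separators, tolerating surrounding whitespace.
    let fields: Vec<u32> = text
        .split(&['/', '.', '-'][..])
        .map(|f| f.trim().parse::<u32>())
        .collect::<Result<_, _>>()
        .ok()?;
    if fields.len() != 3 {
        return None;
    }
    let (a, b, c) = (fields[0], fields[1], fields[2]);
    // Each candidate is rearranged into (year, month, day).
    let candidates = [
        (a, b, c), // year/month/day
        (a, c, b), // year/day/month
        (c, a, b), // month/day/year
        (c, b, a), // day/month/year
    ];
    for &(year, month, day) in candidates.iter() {
        if (1..=12).contains(&month) && (1..=31).contains(&day) {
            return Some((i32::try_from(year).ok()?, month, day));
        }
    }
    None
}

fn main() {
    for text in ["1960-05-06", "5/6/69", "13.5.2000"].iter() {
        println!("{:>10} => {:?}", text, interpret_date(text));
    }
}
```

Note that the order matters for inputs that fit more than one permutation: "5/6/7" satisfies year/month/day first, so under this sketch it reads as year 5, June 7.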

Timezones

At the moment two_timer only produces "naive" times. Sorry about that.

Dependencies

~2.5MB
~44K SLoC