#distributed #query #data #processing #sql

bin+lib datafusion

DataFusion is a SQL parser, planner, and execution framework for Rust with support for CSV and Apache Parquet file formats

32 releases

0.3.3 Sep 3, 2018
0.3.1 Jul 3, 2018
0.2.2 Mar 26, 2018

#39 in Database interfaces

537 downloads per month

Apache-2.0

313KB
6K SLoC

DataFusion: SQL Query Execution in Rust

DataFusion is a SQL parser, planner, and query execution library for Rust. A DataFrame API is also provided.

The following features are currently supported:

  • SQL Parser, Planner and Optimizer
  • DataFrame API
  • Columnar processing using Apache Arrow
  • Support for local CSV and Apache Parquet files
  • Single-threaded execution of SQL queries, supporting:
    • Projection
    • Selection
    • Scalar Functions
    • Aggregates (Min, Max, Count)
    • Grouping
  • User-defined Scalar Functions (UDFs)
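To make the query operations in the list above concrete, here is a minimal sketch in plain Rust using only the standard library. It does not use DataFusion's API or Apache Arrow; the `Batch` and `Stats` types and the `select`, `c_to_f`, and `aggregate` functions are hypothetical names invented for this illustration. It shows selection, a scalar function applied per value, and grouping with Min, Max, and Count over column-oriented data.

```rust
use std::collections::BTreeMap;

/// A tiny column-oriented batch (hypothetical type, not an Arrow structure):
/// each field is one column, and row `i` is the `i`-th entry of every column.
struct Batch {
    city: Vec<String>,
    temp_c: Vec<f64>,
}

/// Min, max and count for one group.
#[derive(Debug, PartialEq)]
struct Stats {
    min: f64,
    max: f64,
    count: usize,
}

/// Selection: return the indices of the rows whose value passes the predicate.
fn select(column: &[f64], predicate: impl Fn(f64) -> bool) -> Vec<usize> {
    column
        .iter()
        .enumerate()
        .filter(|(_, v)| predicate(**v))
        .map(|(i, _)| i)
        .collect()
}

/// Scalar function: maps one input value to one output value, in the spirit of a UDF.
fn c_to_f(c: f64) -> f64 {
    c * 9.0 / 5.0 + 32.0
}

/// Grouping plus aggregates: MIN, MAX and COUNT of `f(temp_c)` per city,
/// restricted to the selected rows.
fn aggregate(batch: &Batch, rows: &[usize], f: fn(f64) -> f64) -> BTreeMap<String, Stats> {
    let mut groups: BTreeMap<String, Stats> = BTreeMap::new();
    for &i in rows {
        let v = f(batch.temp_c[i]);
        groups
            .entry(batch.city[i].clone())
            .and_modify(|s| {
                s.min = s.min.min(v);
                s.max = s.max.max(v);
                s.count += 1;
            })
            .or_insert(Stats { min: v, max: v, count: 1 });
    }
    groups
}

fn main() {
    let batch = Batch {
        city: vec!["Oslo".into(), "Rome".into(), "Oslo".into(), "Rome".into()],
        temp_c: vec![-5.0, 20.0, 10.0, 30.0],
    };
    // Roughly equivalent to:
    //   SELECT city, MIN(c_to_f(temp_c)), MAX(c_to_f(temp_c)), COUNT(1)
    //   FROM batch WHERE temp_c > 0 GROUP BY city
    let rows = select(&batch.temp_c, |t| t > 0.0);
    for (city, stats) in aggregate(&batch, &rows, c_to_f) {
        println!("{}: {:?}", city, stats);
    }
}
```

A real engine would plan and optimize the query before executing it and would operate on Arrow arrays rather than `Vec`s; the sketch only illustrates what each operation computes.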

DataFusion can be used as a crate dependency in your project to add SQL support for custom data sources.

A Docker image is also available if you just want to run SQL queries against your CSV and Parquet files.

I have plans to make DataFusion a fully distributed compute platform with features similar to Apache Spark, but I need help from contributors to get there.

Project Home Page

The project home page is now at https://datafusion.rs and contains the roadmap as well as documentation for using this crate. I am using GitHub issues to track development tasks and feedback.

Prerequisites

  • Rust nightly (required by the parquet-rs crate)

Building DataFusion

See BUILDING.md.

Gitter

There is a Gitter channel where you can ask questions about the project or suggest features.

Contributing

Contributors are welcome! Please see CONTRIBUTING.md for details.

Dependencies

~32MB
~483K SLoC