AGT is a high-performance ETL (Extract, Transform, Load) engine built on ClickHouse, designed to streamline data engineering pipelines through SQL-native transformations. By leveraging ClickHouse’s powerful columnar database, AGT enables data engineers to create scalable and declarative ETL workflows entirely in SQL, eliminating the need for custom DSLs or external orchestration tools. With support for over 50 input/output formats and seamless integration with external databases, AGT is a versatile solution for modern data pipelines.

AGT empowers teams to build fast, maintainable, and context-aware pipelines for both batch and micro-batch processing. Deployable as a standalone binary, job, or service (e.g., via Kubernetes or cron), it delivers exceptional speed and flexibility while maintaining deterministic and reproducible results. AGT simplifies complex data engineering tasks, making it an ideal choice for organizations seeking efficient, SQL-driven data transformation workflows.


AGT pipelines are defined as a sequence of templatized SQL queries, configured in a pipeline.yaml file, that handle all data processing logic: extracting data from a source, transforming it, and loading it into a destination. These pipelines are executed by a ClickHouse instance through one of two engine options:

  • Remote: Queries run on an existing ClickHouse cluster accessible by the AGT process.
  • Local: Queries are executed by a ClickHouse server spawned and managed by AGT.
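A pipeline along these lines might look as follows. This is an illustrative sketch only: the key names, step layout, and {var} templating syntax shown here are assumptions for the example, not AGT's documented schema.

```yaml
# pipeline.yaml — hypothetical layout
engine: local            # or: remote
steps:
  - name: extract
    queries:
      # First query stages raw data into a temporary table;
      # run_date is a var passed via the CLI at startup.
      - |
        CREATE TEMPORARY TABLE tmp_events AS
        SELECT * FROM s3('https://example-bucket/events/{run_date}/*.parquet')
      # The step's final query: its result set becomes vars for the next step.
      - SELECT max(event_id) AS max_id FROM tmp_events
  - name: load
    queries:
      # max_id was produced by the previous step's final query.
      - |
        INSERT INTO analytics.events
        SELECT * FROM tmp_events WHERE event_id <= {max_id}
```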

Each pipeline step consists of templatized SQL queries that reference vars: dynamic key-value pairs passed via the CLI at startup or generated during execution. The result set of a step's final query automatically becomes the vars for the next step, enabling dynamic data flow. Steps typically read from and write to temporary tables, emitting metadata that drives subsequent transformations and keeps processing flexible and context-aware.
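The var-propagation mechanism described above can be sketched in a few lines of Python. Everything here is a simplified model under stated assumptions: the {name} substitution syntax, the "first row of the final result set becomes the next step's vars" convention, and the `execute` callback standing in for a ClickHouse client are all illustrative, not AGT's actual implementation.

```python
def render(query: str, vars: dict) -> str:
    """Substitute {name} placeholders with var values (assumed syntax)."""
    return query.format(**vars)

def run_pipeline(steps, initial_vars, execute):
    """Run each step's queries in order.

    `steps` is a list of steps, each a list of SQL templates.
    `execute` stands in for the ClickHouse client and must return the
    query result as a list of row dicts. Per this sketch's assumption,
    the first row of a step's final query is merged into the vars
    passed to the next step.
    """
    vars = dict(initial_vars)
    for queries in steps:
        result = None
        for q in queries:
            result = execute(render(q, vars))
        if result:  # final query's rows drive the next step
            vars.update(result[0])
    return vars
```

For example, a step whose final query returns a row {"max_id": 42} makes max_id available as a var in every query of the following step.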