AGP (Asynchronous Query Processor) manages the asynchronous execution of SQL queries against a ClickHouse cluster. It handles query orchestration, execution, and result management, which suits long-running analytical workloads: users submit a query and retrieve the results later instead of blocking on an open connection.
The public repository is available at https://github.com/agnosticeng/agp.
Purpose
AGP enables scalable, non-blocking query processing by queuing SQL queries, executing them via distributed workers, and providing status tracking and result retrieval. Key capabilities include query deduplication (via query_id or SQL hash to avoid redundant runs), result storage in object stores with expiration, and tier-based resource allocation for prioritizing executions based on user roles (e.g., higher tiers get more CPU or longer timeouts).
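As a rough illustration of the deduplication and tier ideas above, the sketch below derives a deduplication key from either a caller-supplied query_id or a hash of the SQL text, and maps a tier name to ClickHouse settings. It is a sketch under stated assumptions, not AGP's implementation: the whitespace normalization, the choice of SHA-256, the tier names and their values, and the helper names `dedup_key` and `settings_for` are all hypothetical. Only `max_threads` and `max_execution_time` are real ClickHouse settings.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


def dedup_key(sql: str, query_id: Optional[str] = None) -> str:
    """Return a key that identifies 'the same query' for deduplication."""
    if query_id:
        # A caller-supplied query_id takes precedence over the SQL text.
        return f"id:{query_id}"
    # Assumption: collapse whitespace so reformatted SQL hashes identically.
    # Caveat: this also collapses whitespace inside string literals.
    normalized = " ".join(sql.split())
    return "sql:" + hashlib.sha256(normalized.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Tier:
    name: str
    max_threads: int           # CPU parallelism allowed per query
    max_execution_time_s: int  # query timeout in seconds


# Hypothetical tiers; real tier names and limits come from AGP's configuration.
TIERS: Dict[str, Tier] = {
    "free": Tier("free", max_threads=2, max_execution_time_s=60),
    "pro": Tier("pro", max_threads=8, max_execution_time_s=600),
}


def settings_for(tier_name: str) -> Dict[str, int]:
    """Map a tier to per-query ClickHouse settings, defaulting to the lowest tier."""
    tier = TIERS.get(tier_name, TIERS["free"])
    return {
        "max_threads": tier.max_threads,
        "max_execution_time": tier.max_execution_time_s,
    }
```

With a key like this, a submission whose key matches an existing unexpired execution can return that execution instead of queuing a new one.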
Components
- API Server: Exposes endpoints for creating executions, listing queries, polling statuses (PENDING, RUNNING, CANCELED, FAILED, SUCCEEDED), and retrieving results. The API is described by an OpenAPI spec, which simplifies client integration.
- Workers: Poll PostgreSQL for queued executions, run queries on ClickHouse, update statuses, and store results. Configurable for tiers with distinct ClickHouse settings or clusters.
- Bookkeeper: Keeps the system reliable by detecting dead workers, recovering stuck executions, and expiring old results.
- Storage: Uses PostgreSQL for queuing and metadata, with results persisted to object stores (e.g., S3) or local filesystems for flexible access.
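The interplay between workers and the bookkeeper can be sketched as a small state machine plus a recovery pass. This is an in-memory illustration only: the allowed transitions, the heartbeat mechanism, the choice to requeue a stuck execution as PENDING, and the names `Execution`, `transition`, and `recover_stuck` are assumptions, not details taken from the AGP codebase. Only the five status names come from the description above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Dict, List, Optional


class Status(Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    CANCELED = "CANCELED"
    FAILED = "FAILED"
    SUCCEEDED = "SUCCEEDED"


# Assumed lifecycle: terminal statuses have no outgoing transitions.
# RUNNING -> PENDING models the bookkeeper requeuing a stuck execution.
ALLOWED = {
    Status.PENDING: {Status.RUNNING, Status.CANCELED},
    Status.RUNNING: {Status.SUCCEEDED, Status.FAILED, Status.CANCELED, Status.PENDING},
}


@dataclass
class Execution:
    id: int
    status: Status = Status.PENDING
    worker_id: Optional[str] = None


def transition(execution: Execution, new_status: Status) -> None:
    """Apply a status change, rejecting moves the lifecycle does not allow."""
    if new_status not in ALLOWED.get(execution.status, set()):
        raise ValueError(f"{execution.status.value} -> {new_status.value} not allowed")
    execution.status = new_status


def recover_stuck(
    executions: List[Execution],
    heartbeats: Dict[str, datetime],
    now: datetime,
    timeout: timedelta,
) -> List[int]:
    """Requeue RUNNING executions whose worker has not reported within `timeout`."""
    recovered = []
    for ex in executions:
        if ex.status is not Status.RUNNING:
            continue
        last_seen = heartbeats.get(ex.worker_id)
        if last_seen is None or now - last_seen > timeout:
            transition(ex, Status.PENDING)
            ex.worker_id = None  # release the claim so another worker can pick it up
            recovered.append(ex.id)
    return recovered
```

In a PostgreSQL-backed queue, the same claim and recovery steps would be expressed as transactional updates on the executions table rather than in-memory mutations.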
Because identical queries are deduplicated and results are kept until they expire, AGP avoids re-running work that ClickHouse has already done, which suits analytics that can tolerate slightly stale results. For API details, refer to the embedded Swagger UI or the OpenAPI spec in the repository.
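On the client side, the submit-then-poll pattern might look like the sketch below. It deliberately avoids guessing AGP's endpoint paths or response shapes: `fetch_status` is a hypothetical callable that you would implement against the routes in the OpenAPI spec, and the backoff parameters are illustrative defaults. Only the status names come from the description above.

```python
import time
from typing import Callable

# Statuses after which an execution will not change again.
TERMINAL = {"CANCELED", "FAILED", "SUCCEEDED"}


def wait_for_execution(
    fetch_status: Callable[[str], str],
    execution_id: str,
    interval_s: float = 1.0,
    max_interval_s: float = 30.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the execution reaches a terminal status, with exponential backoff."""
    deadline = clock() + timeout_s
    delay = interval_s
    while True:
        # fetch_status is hypothetical: it should call the API and return a status string.
        status = fetch_status(execution_id)
        if status in TERMINAL:
            return status
        if clock() + delay > deadline:
            raise TimeoutError(f"execution {execution_id} still {status} after {timeout_s}s")
        sleep(delay)
        # Back off so long-running queries do not generate constant API traffic.
        delay = min(delay * 2, max_interval_s)
```

Once the function returns SUCCEEDED, the client can request the stored result from the API. Injecting `sleep` and `clock` keeps the helper easy to test without real waiting.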