Your first pipeline

This tutorial guides you through setting up a data pipeline on the Agnostic platform, a managed service that deploys data pipelines on ClickHouse and Apache Iceberg. The platform scales resources and sizes compute instances automatically, eliminating DevOps overhead. The focus here is the platform setup process; pipeline configuration details are covered in the AGT section of the documentation.

Log in to the Agnostic platform at https://app.agnostic.tech with your credentials. Before you start, confirm that an admin has configured your team’s managed infrastructure and project settings, including access to ClickHouse and Iceberg resources.

From the platform’s dashboard:

  1. Navigate to the Pipelines section.
  2. Click New to begin configuring a new pipeline.

Link your GitHub repository to the platform:

  1. In the pipeline creation interface, select Connect GitHub Repository.
  2. Authenticate with GitHub and choose the repository containing your pipeline configuration files (e.g., pipeline.yaml and associated queries).
  3. Select the relevant files or directory (e.g., examples/hackernews_posts from the AGT repository).
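
Before connecting the repository, it can help to check a local checkout for the expected files. The following is a minimal sketch, not part of the platform: the function name check_pipeline_dir is hypothetical, and it assumes the directory holds a pipeline.yaml file with queries stored as .sql files, which may not match your layout.

```python
from pathlib import Path


def check_pipeline_dir(directory: str) -> list[str]:
    """Return a list of problems found in a local pipeline directory."""
    root = Path(directory)
    if not root.is_dir():
        return [f"{directory} is not a directory"]

    problems = []
    # pipeline.yaml is the example configuration file name used in this tutorial.
    if not (root / "pipeline.yaml").is_file():
        problems.append("missing pipeline.yaml")
    # Assumption: queries are stored as .sql files somewhere under the directory.
    if not any(root.rglob("*.sql")):
        problems.append("no .sql query files found")
    return problems
```

Calling check_pipeline_dir("examples/hackernews_posts") from a local clone returns an empty list when both checks pass, and a list of human-readable problems otherwise.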

Set up the pipeline:

  1. Name the Pipeline: Provide a clear name (e.g., “HackerNews ETL”).
  2. Set Variables: Specify the variables your pipeline requires, such as:
    • ICEBERG_DESTINATION_TABLE_LOCATION: The S3 location of the destination Iceberg table (e.g., s3://your-bucket/hn_posts).
    • ORDER_BY: The sorting column (e.g., id).
    • Other variables as needed by your pipeline’s queries.
  3. Verify Files: Confirm that the selected configuration files match your pipeline’s requirements, as detailed in the AGT documentation.
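
The variables above can be sanity-checked before you paste them into the form. This is a rough sketch under stated assumptions: validate_variables is a hypothetical helper, not a platform API, and it treats only the two variables named in this tutorial as required and expects the table location to be an s3:// URI like the example.

```python
from urllib.parse import urlparse

# Assumption: only the two variables named in this tutorial are required.
REQUIRED_VARIABLES = ("ICEBERG_DESTINATION_TABLE_LOCATION", "ORDER_BY")


def validate_variables(variables: dict[str, str]) -> list[str]:
    """Return a list of validation errors; an empty list means all checks passed."""
    errors = []
    for name in REQUIRED_VARIABLES:
        if not variables.get(name, "").strip():
            errors.append(f"{name} is missing or empty")

    location = variables.get("ICEBERG_DESTINATION_TABLE_LOCATION", "").strip()
    if location:
        parsed = urlparse(location)
        # Expect the form s3://bucket/path, as in the tutorial's example value.
        if parsed.scheme != "s3" or not parsed.netloc:
            errors.append(
                "ICEBERG_DESTINATION_TABLE_LOCATION should look like s3://bucket/path"
            )
    return errors
```

For example, validate_variables({"ICEBERG_DESTINATION_TABLE_LOCATION": "s3://your-bucket/hn_posts", "ORDER_BY": "id"}) returns an empty list, while a missing ORDER_BY or a non-S3 location produces a descriptive error.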

Deploy and monitor the pipeline:

  1. Click Create Pipeline to deploy it on Agnostic’s managed infrastructure, which automatically provisions and scales resources.
  2. Monitor the pipeline’s status in the dashboard, which shows logs and progress updates.
  3. If issues arise, review the logs and check the configuration files and variables.

To query the newly created data, use AGP (Agnostic Query Platform), as described in the AGP section of the documentation. AGP provides tools to connect to your Iceberg table and analyze the output efficiently.

This tutorial covers the essentials of deploying a pipeline on the Agnostic platform. For pipeline configuration details, refer to the AGT section, and for querying data, see the AGP documentation.