Athena

Protocol: JSON 1.1
Endpoint: http://localhost:4566/

Floci emulates Amazon Athena with real SQL execution powered by a floci-duck sidecar container running DuckDB. When a query is submitted, Floci spins up the sidecar on first use, injects CREATE OR REPLACE VIEW statements for each Glue-registered table pointing to S3 data, then executes the SQL and stores results as CSV in S3.

Supported Actions

Action	Description
`StartQueryExecution`	Submits a SQL query; executed asynchronously via DuckDB
`GetQueryExecution`	Returns query status (`QUEUED`, `RUNNING`, `SUCCEEDED`, `FAILED`)
`GetQueryResults`	Returns the result set for a completed query
`ListQueryExecutions`	Returns a list of past query executions
`StopQueryExecution`	Cancels a running query
`CreateWorkGroup`	Creates a new workgroup
`GetWorkGroup`	Returns information about a workgroup
`DeleteWorkGroup`	Deletes a workgroup
`ListWorkGroups`	Lists all workgroups

How it works

1. Lazy sidecar start: On the first StartQueryExecution call, Floci checks for a local floci/floci-duck:latest image and starts the container. Subsequent queries reuse the running container.
2. Glue DDL injection: Floci reads all Glue tables for the target database and generates CREATE OR REPLACE VIEW statements mapping each table name to its S3 location via DuckDB's read_parquet, read_json_auto, or read_csv_auto functions, chosen based on the table's InputFormat or SerDe serialization library.
3. Query execution: The user's SQL is wrapped in COPY (...) TO 's3://...' (FORMAT CSV, HEADER) and executed. Results are written directly to the output S3 path.
4. Results retrieval: GetQueryResults reads the CSV back from S3 and returns it in the standard Athena ResultSet shape.
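
The DDL injection and query wrapping steps above can be sketched as two string-building helpers. This is a minimal illustration only: the function names `build_view_ddl` and `wrap_query`, the `*` glob appended to the table location, and the output path used in the usage note are assumptions, and the SQL Floci actually generates may differ in detail.

```shell
# Sketch only: approximates the SQL described in the steps above.
# Function names and exact SQL text are illustrative, not Floci's real output.

# Build a view definition mapping a Glue table name to its S3 location.
# $1 = table name, $2 = DuckDB read function, $3 = S3 location (with trailing slash)
build_view_ddl() {
  # The '*' glob is an assumption about how files under the location are matched.
  printf "CREATE OR REPLACE VIEW %s AS SELECT * FROM %s('%s*');\n" "$1" "$2" "$3"
}

# Wrap the user's SQL so the result set is written to S3 as CSV with a header row.
# $1 = user SQL, $2 = output S3 path
wrap_query() {
  printf "COPY (%s) TO '%s' (FORMAT CSV, HEADER);\n" "$1" "$2"
}
```

For example, `build_view_ddl orders read_json_auto s3://my-data-lake/orders/` prints a view over `read_json_auto('s3://my-data-lake/orders/*')`, and `wrap_query "SELECT 42 AS answer" s3://results/q1.csv` prints the corresponding `COPY` statement (the `s3://results/q1.csv` path is a placeholder).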

Format inference

The DuckDB read function is chosen from the Glue table's StorageDescriptor:

Condition	Read function
`InputFormat` or `SerializationLibrary` contains `parquet`	`read_parquet`
`InputFormat` or `SerializationLibrary` contains `json`	`read_json_auto`
`InputFormat` contains `hive`	`read_json_auto`
Anything else	`read_csv_auto`
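
The table above can be read as a small decision function. The sketch below is not Floci's implementation; it assumes that matching is case-insensitive and that the rules are evaluated top to bottom, and the name `pick_read_function` is hypothetical.

```shell
# Sketch of the selection rules in the table above.
# Assumptions: case-insensitive matching, rules checked in table order.
# $1 = InputFormat, $2 = SerializationLibrary
pick_read_function() {
  local input_format serde
  input_format=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  serde=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  case "$input_format $serde" in
    *parquet*) echo read_parquet ;;      # either field mentions parquet
    *json*)    echo read_json_auto ;;    # either field mentions json
    *)
      case "$input_format" in
        *hive*) echo read_json_auto ;;   # only InputFormat is checked for hive
        *)      echo read_csv_auto ;;    # fallback
      esac
      ;;
  esac
}
```

Under these assumptions the order of the checks matters: the common Parquet input format `org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat` contains both `hive` and `parquet`, and resolves to `read_parquet` only because the Parquet rule is tested first.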

Configuration

Property	Default	Description
`FLOCI_SERVICES_ATHENA_MOCK`	`false`	Set to `true` to disable DuckDB execution — queries immediately succeed with empty results
`FLOCI_SERVICES_DUCK_DEFAULT_IMAGE`	`floci/floci-duck:latest`	DuckDB sidecar image pulled on first use
`FLOCI_SERVICES_DUCK_URL`	(unset)	Point to an existing floci-duck instance and skip container management

Example — simple query

export AWS_ENDPOINT_URL=http://localhost:4566

# Start a query
QUERY_ID=$(aws athena start-query-execution \
  --query-string "SELECT 42 AS answer" \
  --query 'QueryExecutionId' \
  --output text)

# Check the status (repeat until the state is SUCCEEDED or FAILED)
aws athena get-query-execution --query-execution-id "$QUERY_ID"

# Get results
aws athena get-query-results --query-execution-id "$QUERY_ID"
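
Because queries run asynchronously, a single get-query-execution call only returns a snapshot of the state. A small polling helper along the following lines may be convenient in scripts. `wait_for_query` is a hypothetical helper, not part of Floci or the AWS CLI. The `CANCELLED` state is what Amazon Athena reports for stopped queries; whether Floci emits it is an assumption.

```shell
# Poll a query until it reaches a terminal state.
# Hypothetical helper; not part of Floci or the AWS CLI.
# $1 = query execution ID, $2 = max attempts (default 30), one second apart
# Returns 0 on SUCCEEDED, 1 on FAILED/CANCELLED, 2 on CLI error, 3 on timeout.
wait_for_query() {
  local query_id="$1" max_attempts="${2:-30}" attempt=0 state
  while [ "$attempt" -lt "$max_attempts" ]; do
    state=$(aws athena get-query-execution \
      --query-execution-id "$query_id" \
      --query 'QueryExecution.Status.State' \
      --output text) || return 2
    case "$state" in
      SUCCEEDED) return 0 ;;
      FAILED|CANCELLED)
        echo "Query $query_id ended in state $state" >&2
        return 1 ;;
    esac
    attempt=$((attempt + 1))
    sleep 1
  done
  echo "Timed out waiting for query $query_id" >&2
  return 3
}
```

A typical call would be `wait_for_query "$QUERY_ID" && aws athena get-query-results --query-execution-id "$QUERY_ID"`. The attempt limit keeps a stuck query from blocking a test run indefinitely.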

Example — data lake query (S3 + Glue + Athena)

export AWS_ENDPOINT_URL=http://localhost:4566

# 1. Create S3 bucket and upload data
aws s3 mb s3://my-data-lake
echo '{"id":1,"amount":10.0}
{"id":2,"amount":20.0}
{"id":3,"amount":30.0}' | aws s3 cp - s3://my-data-lake/orders/data.json

# 2. Register table in Glue
aws glue create-database --database-input '{"Name":"analytics"}'

aws glue create-table \
  --database-name analytics \
  --table-input '{
    "Name": "orders",
    "StorageDescriptor": {
      "Location": "s3://my-data-lake/orders/",
      "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
      "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
      "SerdeInfo": {
        "SerializationLibrary": "org.openx.data.jsonserde.JsonSerDe"
      },
      "Columns": [
        {"Name": "id",     "Type": "int"},
        {"Name": "amount", "Type": "double"}
      ]
    }
  }'

# 3. Run Athena query
QUERY_ID=$(aws athena start-query-execution \
  --query-string "SELECT sum(amount) AS total FROM orders" \
  --query-execution-context Database=analytics \
  --query 'QueryExecutionId' \
  --output text)

# 4. Poll until done
while true; do
  STATE=$(aws athena get-query-execution \
    --query-execution-id "$QUERY_ID" \
    --query 'QueryExecution.Status.State' \
    --output text)
  [ "$STATE" = "SUCCEEDED" ] && break
  # Stop on any terminal failure state so the loop cannot spin forever
  case "$STATE" in
    FAILED|CANCELLED) echo "Query ended in state $STATE" >&2; exit 1 ;;
  esac
  sleep 1
done

# 5. Fetch results
aws athena get-query-results --query-execution-id "$QUERY_ID"

Shared sidecar with S3 Select

The floci-duck sidecar is shared between Athena and S3 Select. Once started by the first Athena query, it is also used by SelectObjectContent for CSV (with FileHeaderInfo=USE), JSON, and Parquet inputs. Until an Athena query has started the sidecar, S3 Select falls back to the built-in Java evaluator for CSV and JSON. Parquet always requires the sidecar.

See S3 Select for details on execution modes and supported SQL operators.

Mock mode

Set FLOCI_SERVICES_ATHENA_MOCK=true to skip DuckDB entirely for Athena. In this mode queries transition to SUCCEEDED immediately with an empty result set — useful for unit tests that only exercise the Athena state machine, not the query results.

When mock mode is enabled, the sidecar does not start. S3 Select then uses the Java evaluator for CSV and JSON, and Parquet queries fail unless FLOCI_SERVICES_DUCK_URL points to an already-running floci-duck instance.