Athena
Protocol: JSON 1.1
Endpoint: http://localhost:4566/
Floci emulates Amazon Athena with real SQL execution powered by a floci-duck sidecar container running DuckDB. When a query is submitted, Floci spins up the sidecar on first use, injects CREATE OR REPLACE VIEW statements for each Glue-registered table pointing to S3 data, then executes the SQL and stores results as CSV in S3.
Supported Actions
| Action | Description |
|---|---|
| `StartQueryExecution` | Submits a SQL query; executed asynchronously via DuckDB |
| `GetQueryExecution` | Returns query status (QUEUED, RUNNING, SUCCEEDED, FAILED) |
| `GetQueryResults` | Returns the result set for a completed query |
| `ListQueryExecutions` | Returns a list of past query executions |
| `StopQueryExecution` | Cancels a running query |
| `CreateWorkGroup` | Creates a new workgroup |
| `GetWorkGroup` | Returns information about a workgroup |
| `ListWorkGroups` | Lists all workgroups |
How it works
- Lazy sidecar start: On the first `StartQueryExecution` call, Floci checks for a local `floci/floci-duck:latest` image and starts the container. Subsequent queries reuse the running container.
- Glue DDL injection: Floci reads all Glue tables for the target database and generates `CREATE OR REPLACE VIEW` statements mapping each table name to its S3 location via DuckDB's `read_parquet`, `read_json_auto`, or `read_csv_auto` functions, chosen based on the table's `InputFormat` or SerDe serialization library.
- Query execution: The user's SQL is wrapped in `COPY (...) TO 's3://...' (FORMAT CSV, HEADER)` and executed. Results are written directly to the output S3 path.
- Results retrieval: `GetQueryResults` reads the CSV back from S3 and returns it in the standard Athena `ResultSet` shape.
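
The injection and wrapping steps above amount to string construction around the user's SQL. The following is a minimal sketch of that idea, not Floci's actual code: the function names `build_view_sql` and `wrap_query` are hypothetical, the results path is a made-up placeholder, and the exact SQL Floci emits (quoting, the trailing `*` glob, COPY options) is an assumption.

```shell
# Hypothetical helpers for illustration only; the generated SQL text is an assumption.

# build_view_sql TABLE READ_FUNCTION S3_LOCATION
# Prints a view definition that maps a Glue table name to the files under its S3 location.
build_view_sql() {
  printf "CREATE OR REPLACE VIEW %s AS SELECT * FROM %s('%s*');\n" "$1" "$2" "$3"
}

# wrap_query USER_SQL OUTPUT_S3_PATH
# Prints the user's SQL wrapped in a COPY statement that writes CSV with a header row.
wrap_query() {
  printf "COPY (%s) TO '%s' (FORMAT CSV, HEADER);\n" "$1" "$2"
}

build_view_sql orders read_json_auto s3://my-data-lake/orders/
wrap_query "SELECT sum(amount) AS total FROM orders" s3://athena-results/example.csv
```

Because the view carries the Glue table's name, the user's SQL can reference `orders` directly without knowing where the files live.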
Format inference
The DuckDB read function is chosen from the Glue table's StorageDescriptor:
| Condition | Read function |
|---|---|
| `InputFormat` or `SerializationLibrary` contains `parquet` | `read_parquet` |
| `InputFormat` or `SerializationLibrary` contains `json` | `read_json_auto` |
| `InputFormat` contains `hive` | `read_json_auto` |
| Anything else | `read_csv_auto` |
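
The table above can be read as a small decision function. The sketch below is one possible reading of it, not Floci's implementation: the function name `infer_read_function` is hypothetical, and it assumes that matching is case-insensitive and that rules are checked top to bottom with the first match winning.

```shell
# Hypothetical helper mirroring the table above; not Floci's actual code.
# Usage: infer_read_function INPUT_FORMAT SERIALIZATION_LIBRARY
infer_read_function() {
  # Assumption: matching is case-insensitive, so lowercase both values first.
  irf_input=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  irf_serde=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')

  # Assumption: rules are evaluated in table order and the first match wins.
  case "$irf_input $irf_serde" in
    *parquet*) echo read_parquet ;;
    *json*)    echo read_json_auto ;;
    *)
      # The hive rule looks at InputFormat only, not the serialization library.
      case "$irf_input" in
        *hive*) echo read_json_auto ;;
        *)      echo read_csv_auto ;;
      esac ;;
  esac
}

# The orders table from the data lake example: TextInputFormat with a JSON SerDe.
infer_read_function \
  org.apache.hadoop.mapred.TextInputFormat \
  org.openx.data.jsonserde.JsonSerDe
```

Rule order matters under this reading: a class name such as `org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat` contains both `parquet` and `hive`, and only resolves to `read_parquet` if the parquet rule is checked first.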
Configuration
| Property | Default | Description |
|---|---|---|
| `FLOCI_SERVICES_ATHENA_MOCK` | `false` | Set to `true` to disable DuckDB execution; queries immediately succeed with empty results |
| `FLOCI_SERVICES_ATHENA_DEFAULT_IMAGE` | `floci/floci-duck:latest` | DuckDB sidecar image |
| `FLOCI_SERVICES_ATHENA_DUCK_URL` | (unset) | Point to an existing floci-duck instance and skip container management |
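
As an illustration, the properties above could be exported before Floci starts. This assumes Floci reads them from its process environment, which the property names suggest but this section does not state. The URL is a hypothetical placeholder, not a documented address.

```shell
# Reuse an already running floci-duck instance instead of letting Floci manage the container.
# The host and port are hypothetical placeholders; substitute the address of your own instance.
export FLOCI_SERVICES_ATHENA_DUCK_URL=http://duck.example.internal:8080

# Keep real SQL execution enabled (false is the documented default).
export FLOCI_SERVICES_ATHENA_MOCK=false
```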
Example — simple query
export AWS_ENDPOINT_URL=http://localhost:4566
# Start a query
QUERY_ID=$(aws athena start-query-execution \
  --query-string "SELECT 42 AS answer" \
  --query 'QueryExecutionId' \
  --output text)
# Check the status (repeat until the state is SUCCEEDED or FAILED)
aws athena get-query-execution --query-execution-id "$QUERY_ID"
# Get results
aws athena get-query-results --query-execution-id $QUERY_ID
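
Because queries run asynchronously, a single status check may return before the query finishes. The helper below is a sketch of a bounded polling loop. The function name `wait_for_query`, the default 30-second timeout, and the return codes are illustrative choices rather than part of Floci. `CANCELLED` is a state in the AWS Athena API; whether Floci reports it is not stated in this section, so it is handled defensively.

```shell
# wait_for_query QUERY_ID [TIMEOUT_SECONDS]
# Hypothetical helper: polls get-query-execution once per second.
# Returns 0 on SUCCEEDED, 1 on FAILED or CANCELLED, 2 if the timeout expires.
wait_for_query() {
  wq_id=$1
  wq_remaining=${2:-30}   # default timeout is an arbitrary choice
  while [ "$wq_remaining" -gt 0 ]; do
    wq_state=$(aws athena get-query-execution \
      --query-execution-id "$wq_id" \
      --query 'QueryExecution.Status.State' \
      --output text)
    case "$wq_state" in
      SUCCEEDED) return 0 ;;
      FAILED|CANCELLED)
        echo "Query $wq_id ended in state $wq_state" >&2
        return 1 ;;
    esac
    sleep 1
    wq_remaining=$((wq_remaining - 1))
  done
  echo "Timed out waiting for query $wq_id" >&2
  return 2
}
```

With the helper defined, fetching results only after success becomes a one-liner: `wait_for_query "$QUERY_ID" && aws athena get-query-results --query-execution-id "$QUERY_ID"`.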
Example — data lake query (S3 + Glue + Athena)
export AWS_ENDPOINT_URL=http://localhost:4566
# 1. Create S3 bucket and upload data
aws s3 mb s3://my-data-lake
echo '{"id":1,"amount":10.0}
{"id":2,"amount":20.0}
{"id":3,"amount":30.0}' | aws s3 cp - s3://my-data-lake/orders/data.json
# 2. Register table in Glue
aws glue create-database --database-input '{"Name":"analytics"}'
aws glue create-table \
  --database-name analytics \
  --table-input '{
    "Name": "orders",
    "StorageDescriptor": {
      "Location": "s3://my-data-lake/orders/",
      "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
      "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
      "SerdeInfo": {
        "SerializationLibrary": "org.openx.data.jsonserde.JsonSerDe"
      },
      "Columns": [
        {"Name": "id", "Type": "int"},
        {"Name": "amount", "Type": "double"}
      ]
    }
  }'
# 3. Run Athena query
QUERY_ID=$(aws athena start-query-execution \
  --query-string "SELECT sum(amount) AS total FROM orders" \
  --query-execution-context Database=analytics \
  --query 'QueryExecutionId' \
  --output text)
# 4. Poll until done
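while true; do
  STATE=$(aws athena get-query-execution \
    --query-execution-id "$QUERY_ID" \
    --query 'QueryExecution.Status.State' \
    --output text)
  [ "$STATE" = "SUCCEEDED" ] && break
  [ "$STATE" = "FAILED" ] && echo "Query failed" && exit 1
  # Also stop if the query was cancelled, so the loop cannot spin forever
  [ "$STATE" = "CANCELLED" ] && echo "Query cancelled" && exit 1
  sleep 1
done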
# 5. Fetch results
aws athena get-query-results --query-execution-id $QUERY_ID
Mock mode
Set FLOCI_SERVICES_ATHENA_MOCK=true to skip DuckDB entirely. In this mode queries transition to SUCCEEDED immediately with an empty result set — useful for unit tests that only exercise the Athena state machine, not the query results.