Skip to content

Glue

Protocol: JSON 1.1 Endpoint: http://localhost:4566/

Floci emulates the AWS Glue Data Catalog and Glue Schema Registry, allowing you to manage local data lake metadata and schema-version workflows.

Supported Actions

Data Catalog

Area Actions
Databases CreateDatabase · GetDatabase · GetDatabases
Tables CreateTable · GetTable · GetTables · DeleteTable
Partitions CreatePartition · GetPartitions

Schema Registry

Area Actions
Registries CreateRegistry · GetRegistry · ListRegistries · UpdateRegistry · DeleteRegistry
Schemas CreateSchema · GetSchema · ListSchemas · UpdateSchema · DeleteSchema
Versions RegisterSchemaVersion · GetSchemaByDefinition · GetSchemaVersion · ListSchemaVersions · DeleteSchemaVersions · GetSchemaVersionsDiff · CheckSchemaVersionValidity
Metadata and tags PutSchemaVersionMetadata · RemoveSchemaVersionMetadata · QuerySchemaVersionMetadata · TagResource · UntagResource · GetTags

Supported schema formats are AVRO, JSON, and PROTOBUF. Compatibility modes are NONE, DISABLED, BACKWARD, BACKWARD_ALL, FORWARD, FORWARD_ALL, FULL, and FULL_ALL.

Integration with Athena

The Glue Data Catalog is automatically used by Athena to resolve table names to S3 locations and formats. When you submit an Athena query, Floci reads all Glue tables for the target database and generates DuckDB views on top of the underlying S3 objects before executing the SQL.

Tables can reference a Schema Registry schema version through StorageDescriptor.SchemaReference. On GetTable and GetTables, Floci resolves the schema definition into Glue columns when possible.

The DuckDB read function is selected based on the table's StorageDescriptor.InputFormat and StorageDescriptor.SerdeInfo.SerializationLibrary:

Condition DuckDB function
InputFormat or SerializationLibrary contains parquet read_parquet
InputFormat or SerializationLibrary contains json read_json_auto
InputFormat contains hive read_json_auto
Anything else read_csv_auto

Data Catalog Example

export AWS_ENDPOINT_URL=http://localhost:4566

# Create a database
aws glue create-database \
  --database-input '{"Name": "analytics"}' \
  --endpoint-url $AWS_ENDPOINT_URL

# Create a JSON table (standard AWS format for NDJSON data)
aws glue create-table \
  --database-name analytics \
  --table-input '{
    "Name": "orders",
    "StorageDescriptor": {
      "Location": "s3://my-bucket/orders/",
      "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
      "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
      "SerdeInfo": {
        "SerializationLibrary": "org.openx.data.jsonserde.JsonSerDe"
      },
      "Columns": [
        {"Name": "id",     "Type": "int"},
        {"Name": "amount", "Type": "double"}
      ]
    }
  }' \
  --endpoint-url $AWS_ENDPOINT_URL

# Create a Parquet table
aws glue create-table \
  --database-name analytics \
  --table-input '{
    "Name": "events",
    "StorageDescriptor": {
      "Location": "s3://my-bucket/events/",
      "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
      "SerdeInfo": {
        "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
      },
      "Columns": [
        {"Name": "event_id", "Type": "string"},
        {"Name": "ts",       "Type": "bigint"}
      ]
    }
  }' \
  --endpoint-url $AWS_ENDPOINT_URL

Schema Registry Example

export AWS_ENDPOINT_URL=http://localhost:4566

cat > /tmp/order.avsc <<'JSON'
{"type":"record","name":"Order","namespace":"example","fields":[{"name":"id","type":"long"}]}
JSON

cat > /tmp/order-v2.avsc <<'JSON'
{"type":"record","name":"Order","namespace":"example","fields":[{"name":"id","type":"long"},{"name":"amount","type":["null","double"],"default":null}]}
JSON

aws glue create-registry \
  --registry-name local-registry \
  --endpoint-url $AWS_ENDPOINT_URL

aws glue create-schema \
  --registry-id RegistryName=local-registry \
  --schema-name orders \
  --data-format AVRO \
  --compatibility BACKWARD \
  --schema-definition file:///tmp/order.avsc \
  --endpoint-url $AWS_ENDPOINT_URL

aws glue register-schema-version \
  --schema-id RegistryName=local-registry,SchemaName=orders \
  --schema-definition file:///tmp/order-v2.avsc \
  --endpoint-url $AWS_ENDPOINT_URL

aws glue list-schema-versions \
  --schema-id RegistryName=local-registry,SchemaName=orders \
  --endpoint-url $AWS_ENDPOINT_URL