# Usage Guardrails
*Amberflo usage query guide*

## The core idea

When you run a usage query, you're asking: 👉 "Show me usage **over time**, broken down by **categories**."

Usage is a time series, and categories are groups. Everything comes down to:

**usage rows = groups × time buckets**

## Cardinality

Cardinality means 👉 how many rows your query can produce.

Example:

- 5 regions
- 10 services
- 3 clouds
- 100 customers

Maximum groups: 5 × 10 × 3 × 100 = 15,000 groups. If you query 30 days: 15,000 × 30 = 450,000 rows.

👉 More combinations = more rows.

This is `groupBy`. Examples:

- by `customerId` → small
- by `customerId` + `region` → bigger
- by `customerId` + `region` + `app` + `cloud` → can get large

👉 Each extra dimension multiplies the result size.

How detailed is the time view?

- Hourly → very detailed (lots of data)
- Daily → balanced
- Weekly / monthly → high-level summary

## System limitations

### Response size limit

Maximum response size: **6 MB (compressed)**.

👉 This is the actual system limit enforced by the API. Usage API responses are minified (all formatting removed) and gzip-compressed.

### What counts toward this limit?

Each row in the response looks like:

```json
[
  {
    "group": {
      "groupInfo": {                          <== groupBy
        "region": "us-west-2",
        "customerId": "stark",
        "cloud": "aws",
        "app": "order management"
      }
    },
    "groupValue": 197973.694,                 <== total usage
    "values": [                               <== all rows for this groupBy entry
      {                                       <== usage row 1 for time bucket 1
        "percentageFromPrevious": 0.0,
        "value": 24747.59,                    <== usage value in that time bucket
        "secondsSinceEpochUtc": 1769904000    <== time bucket 1
      },
      {                                       <== usage row 2 for time bucket 2
        "percentageFromPrevious": 0.76,
        "value": 24936.0,                     <== usage value in that time bucket
        "secondsSinceEpochUtc": 1769990400    <== time bucket 2
      }
    ]
  }
]
```

👉 Each of these = 1 usage row (data point).

### How to think about limits

Instead of MB, think in rows:

**rows ≈ groups × time buckets**

Practical capacity:

- Each row ≈ 25–50 bytes (minified and compressed)
- 6 MB ≈ 120K–240K rows

👉 Safe mental model: **~100K rows per query**.

### ⚠️ What increases rows?
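The rows ≈ groups × time buckets rule can be sanity-checked before sending a query. A minimal sketch (the helper name and the row budget constant are our own, not part of the Amberflo API):

```python
# Hypothetical helper illustrating the row-count math above:
# rows ≈ (product of per-dimension cardinalities) × (number of time buckets)
from math import prod

SAFE_ROW_BUDGET = 100_000  # the ~100K-rows-per-query mental model

def estimate_rows(dimension_cardinalities, time_buckets):
    """Estimate response rows for a usage query: groups × time buckets."""
    groups = prod(dimension_cardinalities)
    return groups * time_buckets

# 5 regions × 10 services × 3 clouds × 100 customers, daily for 30 days
rows = estimate_rows([5, 10, 3, 100], time_buckets=30)
print(rows)                      # 450000
print(rows <= SAFE_ROW_BUDGET)   # False -- filter or coarsen before querying
```

Running this estimate first tells you whether to filter a dimension or coarsen the time interval before the API has to tell you.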
- More `groupBy` fields → more combinations
- Longer time range → more time buckets
- Smaller intervals (hourly vs. daily)

👉 These multiply together.

## Practical guide: how to query usage

Start from the earlier example: 5 regions, 10 services, 3 clouds, 100 customers.

**Baseline query: total rows (daily for 30 days)**

5 × 10 × 3 × 100 × 30 = 450,000 rows

👉 This far exceeds the safe guideline (~100K rows) and even the ~240K practical ceiling.

### Step 1: Filter by region (daily for 30 days)

If you query one region at a time, you remove the region multiplier.

- Groups: 10 × 3 × 100 = 3,000 groups
- Rows: 3,000 × 30 = 90,000 rows

👉 Safe.

### Step 2: Increase detail (hourly for 1 day)

Now increase time granularity for a shorter range.

- Time buckets: 24
- Rows: 3,000 × 24 = 72,000 rows

👉 Within range.

### Key takeaway

- Daily for 30 days (all regions) → 450,000 rows
- Daily for 30 days (1 region) → 90,000 rows
- Hourly for 1 day (1 region) → 72,000 rows

👉 Filter first, then increase detail.

### ✅ Simple rules

- Keep `groupBy` fields small (≤ 3–5)
- Use daily/monthly intervals for long time ranges
- Filter when possible
- Use sparse mode: it removes empty time periods → much smaller responses

## Usage visualization

Think of your query like a table:

- Rows → time
- Columns → groups

👉 Total rows = size of your response = number of cells = time × groups

| Time  | us-east | us-west | eu  | apac |
|-------|---------|---------|-----|------|
| Day 1 | 120     | 95      | 300 | 210  |
| Day 2 | 130     | 100     | 310 | 220  |
| Day 3 | 125     | 98      | 305 | 215  |
| Day 4 | 140     | 110     | 320 | 230  |

- Each table column = one group (e.g., region)
- Each table row = one time bucket (e.g., a day)
- Each cell = one usage value → 1 row in the API response

In the example above, we have 16 cells, or usage values.

## Tuning usage scale and performance

Contact us if you have a specific dimension-grouping pattern or cardinality; we can tune the query for your use case.

## Appendix

### Example 1: Understanding cardinality

A more realistic example of dimensions:

- meterApiName: api calls
- Dimensions:
  - region: ~10s (us-east-1, us-west-2, eu-west-1, ap-south-1, ap-northeast-1, …)
  - service: ~10s (auth, billing, search, …)
  - cloud: <6 (aws, azure, gcp)
  - customers: ~100s
  - instanceId: ~100s

**Step 1: Estimate the number of values in each dimension**

- region → 10 values
- service → 10 values
- cloud → 3 values
- customers → 100 values
- instanceId → 100 values

`instanceId` and `customers` are the higher-cardinality dimensions here and will create more usage rows.

**Step 2: Multiply to estimate total combinations**

10 × 10 × 3 × 100 × 100 = 3,000,000

👉 Up to 3,000,000 unique groups (records per time bucket).

**Step 3: Add time buckets**

If you query 30 days (daily):

👉 Total records: 3,000,000 × 30 = 90,000,000 records. This is far above the practical response limit.

**👉 Better approach: query one region at a time**

If you filter to one region: 10 × 3 × 100 × 100 = 300,000 groups. This is still large, but much smaller than querying all regions at once.

**👉 Better approach: filter further by region and service**

If you filter to one region and one service: 3 × 100 × 100 = 30,000 groups.

The next step is to determine the time range and bucket size. For 30 days and daily buckets: 30,000 × 30 = 900,000 records.

👉 This is above the safe limit; you would need to filter further (for example, by customer) or shorten the time range:

- Fetch 10 customers at a time: 3 × 100 × 10 = 3,000 groups; 3,000 × 30 = 90,000 records.
- Or break the time range into 3 days at a time: 30,000 × 3 = 90,000 records.
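The batching mitigations above (10 customers at a time, or 3 days at a time) can be sketched as a loop of safe sub-queries. The helper names, customer IDs, and the commented-out `run_usage_query` call are all hypothetical, not part of the Amberflo client:

```python
# Illustrative sketch (not the Amberflo client API): splitting one oversized
# query into batches that each stay under the ~100K-row budget.

def batch(values, size):
    """Yield successive chunks of a dimension's values."""
    for i in range(0, len(values), size):
        yield values[i:i + size]

customers = [f"customer-{i}" for i in range(100)]  # hypothetical IDs
# One region and one service fixed -> remaining dims: 3 clouds × 100 instance IDs
fixed_groups = 3 * 100
days = 30

for chunk in batch(customers, 10):
    rows = fixed_groups * len(chunk) * days  # 3 × 100 × 10 × 30 = 90,000
    assert rows <= 100_000  # each batch stays under the safe budget
    # run_usage_query(customer_ids=chunk, ...)  # hypothetical call per batch
```

The same pattern works along the time axis: iterate over 3-day windows instead of customer chunks and keep the full customer list in each sub-query.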
equally.

**Naive (incorrect) assumption**

If you assume all dimensions are independent, you might estimate:

platform × provider × model × endpoint × region × workload

Example: 10 × 50 × 1,000 × 10 × 10 × 100 = 5,000,000,000 groups

👉 This overestimates badly, because not every platform has every provider, and not every provider has every model.

**✅ Realistic (hierarchical) way to think about it**

Filtering a higher-level dimension automatically reduces lower-level dimensions. For example, if you filter to:

- platform = bedrock
- provider = anthropic

then the possible model set becomes much smaller. A more realistic estimate might be:

- platform = 1
- provider = 1
- model = 20
- endpoint = 3
- region = 5
- workload = 20

1 × 1 × 20 × 3 × 5 × 20 = 6,000 groups

👉 That is a huge reduction from the naive estimate.

**Key idea**

👉 Filtering a higher-level dimension reduces lower-level cardinality.
👉 Think top-down: filtering shrinks the search space.

**Query patterns (how to use this)**

*Step 1: Start high level.* Query usage by provider and workload at daily granularity for the month.

- groupBy: provider, workload
- timeGroupingInterval: day
- Assume 10 providers, 100 workloads, and 30 daily buckets

Rows: (10 × 100) × 30 = 30,000 rows

👉 Still safe, while giving you visibility by provider and workload.

*Step 2: Drill into one provider.* Pick one provider (for example, provider = openai) and query usage by model at hourly granularity for the month.

- Filter: provider = openai
- groupBy: model
- timeGroupingInterval: hour
- Assume 50 models for that provider and 30 × 24 = 720 hourly buckets

Rows: 50 × 720 = 36,000 rows

👉 Still well within the practical limit.

*Step 3: Drill further if needed.* Now narrow to one model and break usage down by endpoint or workload.

- Filter: provider = openai, model = gpt-4o
- groupBy: endpoint, workload
- timeGroupingInterval: hour
- Assume 3 endpoints, 20 workloads, and 720 hourly buckets

Rows: (3 × 20) × 720 = 43,200 rows

👉 Detailed, but still manageable because you filtered first.

**⚠️ What to avoid**

Avoid querying everything at once, such as:

- groupBy: provider, model, endpoint, region, workload
- timeGroupingInterval: hour, over a month

Even a modest estimate can explode: (10 × 50 × 3 × 5 × 20) × 720 = 108,000,000 rows.

👉 This will exceed practical limits immediately.

**Final takeaway**

👉 Start broad → filter → then increase detail.
👉 Earn extra dimensions and finer time granularity by filtering and reducing the time range.

## Event cancellation rules

When submitting multiple rules, include them in a single batch submission.

Maximum limit: 1 batch per 12 hours, with up to 250 rules per batch.

## Backfills

Backfills are supported for up to 60 days. For backfills beyond 60 days, please contact us.
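Returning to the query patterns in Example 2: the three drill-down steps and the anti-pattern can all be checked with the same quick row estimate before running anything. The helper and the cardinality assumptions below mirror the worked numbers above; none of this is Amberflo client code:

```python
# Sketch of the start-broad -> filter -> drill-down pattern from the
# query-patterns section; numbers match the worked LLM-meter example.
from math import prod

def rows(group_cardinalities, time_buckets):
    """Estimated response rows: product of groupBy cardinalities × buckets."""
    return prod(group_cardinalities) * time_buckets

# Step 1: group by provider, workload; daily for a month
step1 = rows([10, 100], 30)
# Step 2: filter to one provider; group by model; hourly for a month
step2 = rows([50], 30 * 24)
# Step 3: filter to one model; group by endpoint, workload; hourly
step3 = rows([3, 20], 30 * 24)
# Anti-pattern: every dimension at once, hourly for a month
naive = rows([10, 50, 3, 5, 20], 30 * 24)

print(step1, step2, step3)  # 30000 36000 43200 -- each within budget
print(naive)                # 108000000 -- far beyond any practical limit
```

Each drill-down step stays within the safe budget precisely because the previous step's filter removed a multiplier before a new dimension or finer interval was added.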