Paginated vs. Bulk API Calls: Performance, Cost, and Reliability Trade‑offs
Should you fetch many results at once or smaller batches at a time? A practical guide to the CPU, memory, network, and cloud-cost trade‑offs of paginated vs. one-shot API calls—with actionable rules of thumb.
Question: Should your client get many results in one call, or smaller batches at a time?
Answer: It depends on payload size, per-request overhead, latency budget, rate limits, and cost model. This post breaks it down with numbers, pros/cons, and clear guidance.
TL;DR Decision Table
| Scenario | Recommended Strategy | Why |
| --- | --- | --- |
| Small payloads (< ~500 KB total) and low per-request overhead | Single bulk call | Fewer round trips; simpler logic |
| Medium payloads (~0.5–5 MB) or moderate server work per item | Pagination (20–50 items/page) | Better memory profile and backpressure; keeps TTFB fast |
| Large payloads (> ~5–10 MB) or expensive joins/aggregations | Pagination + server-side filtering | Avoids timeouts; reduces memory pressure and retries |
| Strict rate limits (e.g., 10 req/min) | Fewer, larger pages (100+ items) | Minimizes request count under quotas |
| Mobile networks / high latency | Larger pages (50–100+ items) | Amortizes latency and TLS/header cost |
| Realtime UX (progressive rendering) | Smaller pages (20–50 items) | Faster first paint; stream results |
| Unpredictable item size | Cursor pagination with a max-byte cap | Bounds payload size instead of fixed item count |
Rule of thumb: if the serialized payload of a single bulk request is ≤ 1 MB gzipped and your p95 latency budget can absorb one call, a single bulk call is simpler and often cheaper. Otherwise, paginate with cursor tokens and modest page sizes (e.g., 20–50 items).
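That rule compresses into a few lines of code. A minimal sketch: the thresholds mirror the rule above, and the function name `choosePageStrategy` is illustrative, not a library API.

```ts
type Strategy = 'bulk' | 'paginate';

function choosePageStrategy(opts: {
  gzippedPayloadBytes: number; // estimated total payload, compressed
  p95LatencyBudgetMs: number;  // end-to-end budget for the whole fetch
  bulkCallP95Ms: number;       // measured p95 latency of one bulk call
}): Strategy {
  const underSizeCap = opts.gzippedPayloadBytes <= 1_000_000; // ~1 MB
  const underLatencyCap = opts.bulkCallP95Ms <= opts.p95LatencyBudgetMs;
  return underSizeCap && underLatencyCap ? 'bulk' : 'paginate';
}

// 150 KB payload, 800 ms budget, ~300 ms observed p95 → 'bulk'
choosePageStrategy({ gzippedPayloadBytes: 150_000, p95LatencyBudgetMs: 800, bulkCallP95Ms: 300 });
```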
Performance Dimensions That Matter
- Network: round trips, TCP/TLS, HTTP headers, compression efficiency.
- CPU: serialization (server), deserialization (client), query planning.
- Memory: holding full result sets vs chunked pages; GC pressure.
- Reliability: timeouts, retries, partial failures, idempotency.
- Cloud Costs: request pricing, egress bandwidth, compute time, load balancer LCUs.
Quick Math: One Bulk Call vs. Several Paginated Calls (Concrete Example)
Assumptions (reasonable defaults):
- Per-item JSON size ~ 1.5 KB (after gzip).
- Headers + framing ~ 1–2 KB per request (HTTP/2 keep-alive).
- Network latency ~ 100 ms RTT (mobile can be 150–250 ms).
- Server compute ~ 0.3 ms per item to render.
One bulk call: 100 items
- Payload ≈ 150 KB + ~2 KB headers ≈ 152 KB
- Server CPU ≈ 100 × 0.3 ms = 30 ms (+ query time)
- Latency: 1 round trip + processing
- Failure surface: 1 request (retry once if needed)
Paginated calls: 4 pages × 25 items
- Payload per page ≈ 37.5 KB + 2 KB headers ≈ 39.5 KB
- Total payload ≈ 158 KB (slightly more due to headers)
- Latency: 4 round trips (mitigated if pipelined/parallelized)
- First content paint: first page arrives earlier → better perceived speed
- Failure surface: 4 requests (retries can be partial, not all-or-nothing)
Takeaway: Bulk can win on total time and egress when payloads are small and networks are stable. Pagination wins for progressive UX, memory control, and resilience—even if total bytes are a bit higher.
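To sanity-check the arithmetic, here is a minimal sketch that reproduces the numbers above. The constants are the example defaults, not measurements, and it assumes pages are fetched sequentially:

```ts
const ITEM_KB = 1.5;          // per-item JSON size after gzip
const HEADER_KB = 2;          // headers + framing per request
const RTT_MS = 100;           // network round trip
const CPU_MS_PER_ITEM = 0.3;  // server render time per item

function estimate(items: number, pageSize: number) {
  const pages = Math.ceil(items / pageSize);
  const payloadKb = items * ITEM_KB + pages * HEADER_KB;
  const serialTimeMs = pages * RTT_MS + items * CPU_MS_PER_ITEM; // sequential pages
  return { pages, payloadKb, serialTimeMs };
}

console.log(estimate(100, 100)); // bulk:      { pages: 1, payloadKb: 152, serialTimeMs: 130 }
console.log(estimate(100, 25));  // paginated: { pages: 4, payloadKb: 158, serialTimeMs: 430 }
```

Parallelizing the four page fetches would close most of that latency gap, at the cost of burstier server load.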
CPU & Memory Considerations
Server
- Bulk (large request): one heavy query; large serialization buffer; higher peak memory; longer critical section increases tail latency.
- Paginated (smaller pages): more cursor scans but shorter CPU bursts and lower peak memory; easier to rate-limit and protect DB.
Client
- Bulk: faster total completion but higher peak RAM; potential jank on low-end devices during JSON parse.
- Paginated: smoother memory profile; render incrementally; better for React/SPA lists with virtualization.
Cloud Cost Model (AWS-style)
- API Gateway / Load Balancer: charged per request & LCU (new connections, active connections, processed bytes). Many small pages → more requests, but lower concurrent duration each.
- Compute (Lambda/ECS/EC2): billed by duration × memory/CPU. Bulk may hold CPU longer; pagination spreads compute across requests (can scale horizontally).
- Data Transfer (Egress): charged per GB. Bulk vs pagination mostly equal on bytes; pagination adds a small header overhead.
- Databases: each page adds a query round trip; prefer keyset/cursor scans over costly `OFFSET` pagination (see the sketch below).
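To make the database point concrete, here are the two query shapes as SQL strings in a TypeScript sketch. The table and column names are illustrative, and both assume a composite index on `(created_at, id)`:

```ts
// OFFSET scans and discards every skipped row, so cost grows with page depth:
const offsetPage = `
  SELECT * FROM orders
  ORDER BY created_at, id
  LIMIT 50 OFFSET 5000`; // page 101 still walks 5,050 rows

// Keyset seeks directly to the cursor position, so cost stays flat per page:
const keysetPage = `
  SELECT * FROM orders
  WHERE (created_at, id) > ($1, $2)  -- last row of the previous page (row-value comparison, PostgreSQL-style)
  ORDER BY created_at, id
  LIMIT 50`;
```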
Cost heuristic: If you pay per request (API Gateway, Function URLs) and are hard rate-limited, favor fewer, larger pages. If you pay mostly for compute duration/DB, favor pagination to smooth load and reduce timeouts.
Rate Limits, Reliability & Timeouts
- Bulk risks single-point failure: a timeout loses all work.
- Pagination provides natural checkpoints; failed page can be retried from cursor.
- Exponential backoff + jitter is simpler with pagination (a sketch follows this list).
- Idempotency: use opaque cursor tokens and stable sort keys.
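A minimal retry helper with exponential backoff and "full jitter": sleep a random duration up to a cap that doubles per attempt. The name `retryWithBackoff` matches the client loop later in this post; the constants are illustrative:

```ts
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function retryWithBackoff<T>(fn: () => Promise<T>, maxAttempts = 5): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn(); // only safe to retry if the request is idempotent
    } catch (err) {
      if (attempt + 1 >= maxAttempts) throw err;
      const capMs = Math.min(100 * 2 ** attempt, 10_000); // 100 ms, 200 ms, ... capped at 10 s
      await sleep(Math.random() * capMs); // full jitter: uniform in [0, capMs)
    }
  }
}
```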
Choosing Bulk vs Paginated: Practical Guidance
- Measure item size (avg & p95). If the full result set is ≤ 1 MB gzipped, bulk is viable (a measurement sketch follows this list).
- Check p95 latency budget. If one call threatens it, paginate.
- Client device class. Low-end mobile? Prefer smaller pages (e.g., 20–50 items).
- Server hotspots. If joins/aggregations spike CPU, paginate and pre-aggregate.
- Rate limits. Under tight quotas, larger pages (e.g., 50–100+ items).
- UX. Need quick first paint? Smaller pages with progressive rendering.
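For the first point, measuring average and p95 item size takes only a few lines. A minimal sketch; note it measures uncompressed JSON, so apply your observed gzip ratio on top:

```ts
function sizeStats(items: unknown[]): { avgBytes: number; p95Bytes: number } {
  const sizes = items
    .map((item) => new TextEncoder().encode(JSON.stringify(item)).length)
    .sort((a, b) => a - b);
  const avgBytes = sizes.reduce((sum, n) => sum + n, 0) / sizes.length;
  const p95Bytes = sizes[Math.min(sizes.length - 1, Math.floor(sizes.length * 0.95))];
  return { avgBytes, p95Bytes };
}
```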
Pagination Design Patterns (Do This)
- Cursor/keyset > offset: use `?cursor=opaque_token` or `?after=last_id`, with stable ordering by an indexed, monotonic column pair (e.g., `created_at, id`). A cursor sketch follows this list.
- Bound by bytes, not just count: e.g., `?limit=50&max_bytes=500000`.
- Return a `next` link: `{ "items": [ ... ], "next": "/v1/orders?cursor=eyJjIjoiMTIzIn0" }`
- Include total/has_more when cheap; avoid expensive `COUNT(*)` on hot paths.
- Consistent filters: echo back applied filters; the cursor should encapsulate them.
- Compression: enable gzip/br on both ends; JSON → consider JSON Lines for streams.
- HTTP/2/3: keep connections warm; multiplex to reduce RTT per page.
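Here is a minimal sketch of an opaque cursor and a `next`-link response, assuming Node (for `Buffer`); the field names and the `/v1/orders` route are illustrative:

```ts
// The cursor is base64url-encoded JSON that encapsulates the sort position
// and the applied filters, so a resumed page can't drift from the original query.
interface Cursor {
  createdAt: string;
  id: string;
  filters: Record<string, string>;
}

const encodeCursor = (c: Cursor): string =>
  Buffer.from(JSON.stringify(c)).toString('base64url');

const decodeCursor = (token: string): Cursor =>
  JSON.parse(Buffer.from(token, 'base64url').toString('utf8'));

function pageResponse(
  items: Array<{ createdAt: string; id: string }>,
  filters: Record<string, string>,
  limit: number,
) {
  const last = items[items.length - 1];
  const next = items.length === limit && last
    ? `/v1/orders?cursor=${encodeCursor({ createdAt: last.createdAt, id: last.id, filters })}`
    : null; // no next link on the final page
  return { items, next };
}
```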
Anti‑Patterns (Avoid This)
- Offset-based pagination on large tables → slow `OFFSET N` scans and inconsistent pages under concurrent writes.
- Unbounded responses → memory blowups, timeouts, app crashes.
- Server-side sorting without index → CPU spikes and long tail latencies.
- Changing sort/filter mid-pagination → duplicates/missing items.
- Opaque cursor that expires too fast → breaks resume/retry flows.
- Huge pages on flaky networks → retries waste bandwidth; prefer smaller chunks.
Example: Client Loop with Backoff

```ts
// Fetch pages until the server stops returning a `next` link, retrying each
// page with the retryWithBackoff helper from earlier. `render` is your app code.
let cursor: string | null = null;
do {
  const url = cursor ? `/v1/items?cursor=${cursor}&limit=50` : '/v1/items?limit=50';
  const resp = await retryWithBackoff(async () => {
    const r = await fetch(url, { headers: { 'Accept-Encoding': 'gzip, br' } });
    if (!r.ok) throw new Error(`HTTP ${r.status}`); // non-2xx triggers a retry
    return r;
  });
  const { items, next } = await resp.json();
  render(items); // progressively render each page as it arrives
  // The next link carries the opaque cursor; pull it back out of the URL.
  cursor = next ? new URL(next, location.origin).searchParams.get('cursor') : null;
} while (cursor);
```
What We Recommend (Rules of Thumb)
- Dashboard/UX lists: 20–50 items per page; cursor-based; progressive render.
- Exports/ETL: bulk download, but async: produce a file and notify when ready (a polling sketch follows this list).
- Mobile APIs: smaller pages (e.g., 20–50) with byte caps and gzip; aim TTFB < 200 ms.
- Admin/internal tools: favor bigger pages (50–100+ items) to minimize clicks.
- Under tight request quotas: larger pages (50–100+ items), ensure indexes and compression.
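A minimal sketch of the async export pattern: kick off the job, then poll until the file is ready. The endpoints and response fields are illustrative, not a specific API:

```ts
async function runExport(): Promise<string> {
  // Start the export; the server replies immediately with a job id.
  const start = await fetch('/v1/exports', { method: 'POST' });
  const { jobId } = await start.json();

  // Poll until the server reports the file is ready (a webhook also works).
  for (;;) {
    const resp = await fetch(`/v1/exports/${jobId}`);
    const { status, downloadUrl } = await resp.json();
    if (status === 'ready') return downloadUrl;
    if (status === 'failed') throw new Error('export failed');
    await new Promise((r) => setTimeout(r, 2_000)); // poll every 2 s
  }
}
```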
Checklist
- Cursor/keyset pagination with stable index.
- Gzip/br compression enabled.
- Byte caps for oversized records.
- `next` link + idempotent retries.
- Progressive rendering on client.
- Backpressure/rate limiting on server.
- Avoid expensive `COUNT(*)` on hot paths.
- Measure p95 size/latency; tune page size periodically.
Bottom Line
If your bulk payload is small (<1 MB gzipped) and networks are stable, one call is fast and cheap.
If payloads vary, clients are mobile, or server work is heavy, smaller pages (e.g., 20–50 items) with cursor tokens strike the best balance of performance, cost, and reliability.
Confidence: High — based on common production patterns, cost models, and measured trade‑offs in typical web/mobile backends.