Code Room
Coding · Hard · cod-g1057
Subject: Storage dedup chunking
Level: Senior–Staff · ~35 min
Common in: Storage & CDN · Algorithms & data structures interviews
Industries: Software development, Technology

Question

A backup system uses content-defined chunking to deduplicate a byte stream. Given a string `data`, split it into chunks with this deterministic boundary rule: scan left to right, and end the current chunk right AFTER index i (0-based, counted from the start of `data`) whenever ((ord(data[i]) * 31 + i) mod `divisor`) == 0, OR whenever the current chunk reaches `max_len` bytes, whichever comes first. Any final partial chunk ends at end-of-string. Each chunk is the substring between consecutive boundaries. Identical chunk strings deduplicate to one: return the number of DISTINCT chunk strings. Constraints: `divisor` >= 1, `max_len` >= 1, len(data) <= 10^5.

Implement
cdc_distinct_chunks(data: str, divisor: int, max_len: int) → int
Examples
in: ["aaaa", 1, 10]  out: 1
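
The boundary rule can be sketched in a single pass. This is a minimal illustration, not an official reference solution: it assumes each character of `data` counts as one byte toward `max_len`, and the local names `seen`, `start`, `hit`, and `full` are chosen here for readability.

```python
def cdc_distinct_chunks(data: str, divisor: int, max_len: int) -> int:
    """Count distinct chunks produced by the boundary rule in the question."""
    seen = set()   # distinct chunk strings found so far
    start = 0      # index where the current chunk begins
    for i, ch in enumerate(data):
        # Content-defined boundary: depends on the character and its global index.
        hit = (ord(ch) * 31 + i) % divisor == 0
        # Size cap: the current chunk data[start..i] has reached max_len.
        full = (i - start + 1) >= max_len
        if hit or full:
            seen.add(data[start:i + 1])
            start = i + 1
    # Final partial chunk, if the last boundary did not land on the last index.
    if start < len(data):
        seen.add(data[start:])
    return len(seen)


print(cdc_distinct_chunks("aaaa", 1, 10))  # 1
```

In the example, `divisor` is 1, so every index satisfies the modulo test and the chunks are "a" four times, which deduplicates to one distinct chunk. Each character is sliced and hashed once, so the sketch runs in O(n) time and O(n) space in the worst case.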
What a strong answer looks like

State your approach and its time/space complexity out loud before you optimize. Handle the edge cases (empty input, duplicates, overflow), and say why you chose this over the brute force. Green tests are the floor, not the grade.
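
One way to make the edge cases concrete is to expose the chunks themselves and check them directly. The generator below is a hypothetical helper (`iter_chunks` is not part of the required interface), written under the same assumption that one character counts as one byte. On overflow: Python integers are unbounded, and in a fixed-width language the largest possible value of ord(data[i]) * 31 + i is roughly 3.5 * 10^7 under the stated constraints, which fits in a signed 32-bit integer.

```python
from typing import Iterator


def iter_chunks(data: str, divisor: int, max_len: int) -> Iterator[str]:
    """Yield chunks in order (hypothetical helper for inspecting edge cases)."""
    start = 0
    for i, ch in enumerate(data):
        if (ord(ch) * 31 + i) % divisor == 0 or i - start + 1 == max_len:
            yield data[start:i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]  # trailing partial chunk


# Empty input produces no chunks, so the distinct count is 0.
assert list(iter_chunks("", 3, 4)) == []

# max_len = 1 forces every character into its own chunk.
assert list(iter_chunks("abc", 10**9, 1)) == ["a", "b", "c"]

# Duplicates: only the size cap fires here, and "abc" repeats.
chunks = list(iter_chunks("abcabcab", 10**9, 3))
assert chunks == ["abc", "abc", "ab"]
assert len(set(chunks)) == 2

# A boundary on the last index must not emit an empty trailing chunk.
assert list(iter_chunks("aa", 2, 10)) == ["aa"]
```

The large `divisor` value 10**9 is used only to guarantee that the modulo test never fires for these short inputs, which isolates the `max_len` behavior.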

