Coding · Medium · cod-g967

Subject: Storage systems
Level: Mid–Senior
Time: ~25 min
Common in: Storage & CDN · Algorithms & data structures interviews
Industries: Software development, Technology

Question

A content-addressable store deduplicates blobs by content. You are given a list of blobs (strings). Each blob's content hash is

    h(s) = ((sum((idx + 1) * ord(c) for idx, c in enumerate(s)) * 1000003) ^ len(s)) % 1000000007

Two blobs are considered identical only if their content hashes match AND their actual string contents match (i.e. resolve hash collisions by comparing content).

Store each distinct blob once and assign it a chunk id: the first distinct blob gets id 0, the second distinct blob gets id 1, and so on.

Return a pair [unique_count, mapping], where unique_count is the number of distinct blobs and mapping is a list giving, for each input blob in order, the chunk id it resolves to.
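As a minimal sketch of the hash alone, the formula can be transcribed directly. This assumes `^` means bitwise XOR, as it does in Python; the names `content_hash`, `MULTIPLIER`, and `MODULUS` are illustrative and not part of the problem statement.

```python
MULTIPLIER = 1000003      # constant taken from the formula
MODULUS = 1000000007      # constant taken from the formula


def content_hash(s: str) -> int:
    """Hash a blob as described in the question (hypothetical helper name)."""
    # Position-weighted sum of code points; positions are 1-based.
    weighted = sum((idx + 1) * ord(c) for idx, c in enumerate(s))
    # Assumption: "^" is bitwise XOR, matching Python's operator.
    return ((weighted * MULTIPLIER) ^ len(s)) % MODULUS


print(content_hash("aa"))                        # 291000875
print(content_hash("ac") == content_hash("cb"))  # True: a genuine collision
```

The strings "ac" and "cb" have the same length and the same weighted sum (97 + 2·99 = 99 + 2·98 = 295), so their hashes are equal even though their contents differ. This is the case the content comparison is there to catch.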

Implement

content_dedup(blobs: list[str]) → list

Examples

in: [["aa","bb","aa","cc"]]
out: [3,[0,1,0,2]]
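One possible approach, sketched under assumptions rather than offered as a reference solution: bucket blobs by hash, and within a bucket compare contents to resolve collisions. The helper `_content_hash` and the bucket layout are illustrative choices, and `^` is assumed to be bitwise XOR as in Python.

```python
def _content_hash(s: str) -> int:
    # Hypothetical helper transcribing the hash from the question.
    weighted = sum((idx + 1) * ord(c) for idx, c in enumerate(s))
    return ((weighted * 1000003) ^ len(s)) % 1000000007


def content_dedup(blobs: list[str]) -> list:
    # hash -> list of (stored content, chunk id) pairs sharing that hash
    buckets: dict[int, list[tuple[str, int]]] = {}
    mapping: list[int] = []
    next_id = 0

    for blob in blobs:
        bucket = buckets.setdefault(_content_hash(blob), [])
        for stored, chunk_id in bucket:
            # A hash match is not enough: confirm by comparing content.
            if stored == blob:
                mapping.append(chunk_id)
                break
        else:
            # No stored blob with this content: assign the next chunk id.
            bucket.append((blob, next_id))
            mapping.append(next_id)
            next_id += 1

    return [next_id, mapping]


print(content_dedup(["aa", "bb", "aa", "cc"]))  # [3, [0, 1, 0, 2]]
print(content_dedup(["ac", "cb", "ac"]))        # [2, [0, 1, 0]]
```

Hashing costs time proportional to the total number of characters, and each lookup compares content only against blobs in the same bucket, so the expected cost stays linear in the input size when collisions are rare. Space is proportional to the distinct blobs stored. Python integers are arbitrary precision, so the multiplication cannot overflow here; a port to a fixed-width language would need a wide enough integer type.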

What a strong answer looks like

State your approach and its time/space complexity out loud before you optimize. Handle the edge cases (empty input, duplicates, overflow), and say why you chose this over the brute force. Green tests are the floor, not the grade.
