DeDuplication-ReDuplication

Posted on Sept. 17, 2018
python
javascript
java
string

This article shows how to deduplicate a string, given the string and a chunk size, and how to rebuild (reduplicate) the original string from the result.

Case : Let's say you have a large text file. Each row contains an email ID and some other information (say, a product ID). Assume there are millions of rows in the file. How would you efficiently deduplicate the data?

DeDuplication : The process that returns an intermediate string, which is later used to reduplicate (reconstruct) the original string.

Steps to follow

 1. Break the string into chunks of the given size (the chunk size can be 1 KB, 10 KB, and so on).
 2. Find the unique chunks and note the positions at which each one occurs in the string.
 3. The intermediate string should contain the unique chunks and their positions.
 4. The intermediate string alone should be enough to perform reduplication, which reconstructs the original string.
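
The steps above can be sketched in Python as follows. This is a minimal illustration, not the code from the repo: the function names deduplicate and reduplicate are made up for this example, positions are chunk indexes (0, 1, 2, ...), and the sketch writes a "-" before every position and a "," between entries. It also assumes that the chunks themselves never contain "-" or ","; real data such as email IDs may well contain them, in which case a different encoding would be needed.

```python
def deduplicate(text: str, chunk_size: int) -> str:
    """Return an intermediate string of unique chunks and their positions."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    positions = {}  # chunk -> list of chunk indexes, in first-seen order
    for index, start in enumerate(range(0, len(text), chunk_size)):
        chunk = text[start:start + chunk_size]
        positions.setdefault(chunk, []).append(index)
    # Assumed format: chunk-pos-pos,chunk-pos (chunks must not contain - or ,)
    return ",".join(
        "-".join([chunk, *map(str, indexes)])
        for chunk, indexes in positions.items()
    )


def reduplicate(intermediate: str) -> str:
    """Rebuild the original string from the intermediate string alone."""
    if not intermediate:
        return ""
    placed = {}  # chunk index -> chunk
    for entry in intermediate.split(","):
        chunk, *indexes = entry.split("-")
        for index in indexes:
            placed[int(index)] = chunk
    # Put the chunks back in position order
    return "".join(placed[i] for i in sorted(placed))


packed = deduplicate("abcdexyzvwabcde", 5)
print(packed)               # abcde-0-2,xyzvw-1
print(reduplicate(packed))  # abcdexyzvwabcde
```

A dict keeps the first-seen order of the unique chunks, so the intermediate string lists them in the order they appear in the input. A final chunk shorter than the chunk size is stored like any other chunk.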

Example

Input:
      abcdexyzvwabcde
      chunk size: 5 bytes

 Output after deduplication:
      abcde-0-2,xyzvw-1

 Output after reduplication:
      abcdexyzvwabcde
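
For the large-file case described at the start, the chunking and position-tracking steps could be done without loading the whole file at once. The sketch below is one possible approach under stated assumptions: read_chunks and index_chunks are hypothetical helper names, the stream is a buffered binary stream (for example a file opened with open(path, "rb")), and positions are chunk indexes.

```python
import io
from typing import BinaryIO, Dict, Iterator, List


def read_chunks(stream: BinaryIO, chunk_size: int) -> Iterator[bytes]:
    """Yield successive chunks of at most chunk_size bytes until EOF."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            return
        yield chunk


def index_chunks(stream: BinaryIO, chunk_size: int) -> Dict[bytes, List[int]]:
    """Map each unique chunk to the chunk indexes where it occurs."""
    positions: Dict[bytes, List[int]] = {}
    for index, chunk in enumerate(read_chunks(stream, chunk_size)):
        positions.setdefault(chunk, []).append(index)
    return positions


# In-memory stand-in for a real file, using the example input
sample = io.BytesIO(b"abcdexyzvwabcde")
print(index_chunks(sample, 5))  # {b'abcde': [0, 2], b'xyzvw': [1]}
```

Only one chunk is read at a time, but every unique chunk is still kept in memory, so the saving depends on how much of the file is actually duplicated.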

Repo : Check out the working example here.



