String Blog Posts

DeDuplication-ReDuplication

This article explains how to deduplicate a string, given the string and a chunk size.
Case: Let's say you have a large text file. Each row contains an email ID and some other information (say, a product ID). Assume there are millions of rows in the file. How would you efficiently deduplicate the data?
DeDuplication: The process that produces an intermediate string from which the original string can later be rebuilt (reduplication).
Steps to follow:
1. Break the string into chunks of the given size (the chunk size can be 1 KB, 10 KB, and so on).
2. Find the unique chunks and make a note of where each of these chunks occurs in the string.
3. The intermediate string should contain the unique chunks and their positions.
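
The three steps above can be sketched in Python roughly as follows. This is only an illustrative sketch, not a definitive implementation: the function names deduplicate and reduplicate, the JSON layout of the intermediate string, and the "chunks" and "positions" keys are all assumptions made for this example. The chunk size is also assumed to be counted in characters rather than bytes.

```python
import json


def deduplicate(text: str, chunk_size: int) -> str:
    """Return an intermediate string holding the unique chunks and their positions."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    unique_chunks = []   # unique chunks, in order of first appearance
    index_of = {}        # chunk -> its index in unique_chunks
    positions = []       # for each chunk slot in the text, the index of its chunk

    # Step 1: break the string into fixed-size chunks (the last one may be shorter).
    for start in range(0, len(text), chunk_size):
        chunk = text[start:start + chunk_size]
        # Step 2: record each unique chunk once and note where it occurs.
        if chunk not in index_of:
            index_of[chunk] = len(unique_chunks)
            unique_chunks.append(chunk)
        positions.append(index_of[chunk])

    # Step 3: serialize the unique chunks and positions (JSON is an assumed format).
    return json.dumps({"chunks": unique_chunks, "positions": positions})


def reduplicate(intermediate: str) -> str:
    """Rebuild the original string from the intermediate string."""
    data = json.loads(intermediate)
    return "".join(data["chunks"][i] for i in data["positions"])


if __name__ == "__main__":
    original = "abcabcabcxy"
    intermediate = deduplicate(original, 3)
    print(intermediate)
    print(reduplicate(intermediate) == original)
```

With a chunk size of 3, the string "abcabcabcxy" is stored as two unique chunks ("abc" and "xy") plus the position list [0, 0, 0, 1]. The space saving depends on how often chunks repeat, so a string with few repeated chunks may produce an intermediate string larger than the original.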
