DeDuplication-ReDuplication

Posted on Sept. 17, 2018

python

javascript

java

string


This article shows how to deduplicate a string, given the string and a chunk size, and how to reduplicate the result back into the original string.

Case : Let's say you have a large text file. Each row contains an email id and some other information (say, a product id). Assume there are millions of rows in the file. How would you efficiently deduplicate the data?

DeDuplication : The process that returns an intermediate string, which is later used to reduplicate (reconstruct) the original string.

Steps to follow :

1. Break the string into chunks of the given size (the chunk size can be 1 KB, 10 KB and so on).
2. Find the unique chunks and make a note of where each chunk occurs in the string.
3. The intermediate string should contain the unique chunks and their positions.
4. This intermediate string alone should be used to perform reduplication, which reconstructs the original string.
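
The deduplication steps can be sketched in Python roughly as follows. This is a minimal illustration, not the code from the repo: the function name `deduplicate` is hypothetical, the chunk size is counted in characters rather than bytes, positions are chunk indices, and the intermediate format is assumed to be `chunk-pos-pos,chunk-pos` with a dash before every position. It also assumes the chunks themselves contain no `-` or `,`.

```python
def deduplicate(text, chunk_size):
    """Return an intermediate string of unique chunks and their chunk indices.

    Assumed format (hypothetical): "chunk-pos-pos,chunk-pos".
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    # Map each unique chunk to the list of chunk indices where it occurs.
    # Dicts keep insertion order (Python 3.7+), so chunks stay in first-seen order.
    positions = {}
    for index, start in enumerate(range(0, len(text), chunk_size)):
        chunk = text[start:start + chunk_size]
        positions.setdefault(chunk, []).append(index)

    # Join every chunk with its indices using "-", and the entries using ",".
    return ",".join(
        "-".join([chunk] + [str(i) for i in indices])
        for chunk, indices in positions.items()
    )


print(deduplicate("abcdexyzvwabcde", 5))
```

A dictionary keyed by chunk keeps the lookup for "have I seen this chunk before?" at roughly constant time per chunk, which matters when the input is large.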

Example :

Input:
abcdexyzvwabcde
chunk size: 5 bytes

Output after deduplication:
abcde-0-2,xyzvw-1

Output after reduplication:
abcdexyzvwabcde
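
Reduplication can be sketched in the same spirit. The function name `reduplicate` is hypothetical, and the sketch assumes an intermediate string of the form `chunk-pos-pos,chunk-pos`, where every position is a chunk index preceded by a dash and the chunks contain no `-` or `,`. Only the intermediate string is needed; the chunk size is not.

```python
def reduplicate(intermediate):
    """Rebuild the original string from the intermediate string.

    Assumed format (hypothetical): "chunk-pos-pos,chunk-pos".
    """
    if not intermediate:
        return ""

    # Map each chunk index back to the chunk that belongs there.
    by_index = {}
    for entry in intermediate.split(","):
        chunk, *indices = entry.split("-")
        for index in indices:
            by_index[int(index)] = chunk

    # Concatenate the chunks in index order to restore the original string.
    return "".join(by_index[i] for i in sorted(by_index))


print(reduplicate("abcde-0-2,xyzvw-1"))
```

Because the chunks are placed by index rather than by offset, a shorter final chunk is handled without any special case.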

Repo : Check out the working example here.
