Regular Expression Program to Remove Duplicate Words from a String in Java: HackerRank Solution

Problem

In this challenge, we use regular expressions (RegEx) to remove repeated instances of a word while retaining its first occurrence; repeats are detected case-insensitively. For example, the words "love" and "to" are repeated in the sentence "I love Love to To tO code". Can you complete the code in the editor so that it turns "I love Love to To tO code" into "I love to code"?
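To see what a repeated-word pattern picks up in the example sentence, the following minimal sketch lists each run of repeated words together with its first word. The class name `RepeatedRuns`, the method `findRuns`, and the pattern itself are illustrative assumptions; none of them come from the HackerRank stub.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class RepeatedRuns {

    // One possible pattern: a word (group 1), then one or more repetitions of
    // that same word, each preceded by non-word characters. Case is ignored.
    private static final Pattern REPEATED =
            Pattern.compile("\\b(\\w+)(?:\\W+\\1\\b)+", Pattern.CASE_INSENSITIVE);

    // Returns entries of the form "<whole run> -> <first word>".
    static List<String> findRuns(String sentence) {
        List<String> runs = new ArrayList<>();
        Matcher m = REPEATED.matcher(sentence);
        while (m.find()) {
            runs.add(m.group() + " -> " + m.group(1));
        }
        return runs;
    }

    public static void main(String[] args) {
        // Prints [love Love -> love, to To tO -> to]
        System.out.println(findRuns("I love Love to To tO code"));
    }
}
```

Each entry pairs the text that would be replaced with the text that would replace it, which is the relationship the challenge asks for.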

To solve this challenge, complete the following three lines:

  1. Write a RegEx that will match any repeated word.
  2. Complete the second compile argument so that the compiled RegEx is case-insensitive.
  3. Write the two necessary arguments for replaceAll such that each repeated word is replaced with the very first instance of the word found in the sentence. It must be the exact first occurrence of the word, as the expected output is case-sensitive.
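The third step relies on String.replaceAll, whose first argument is interpreted as a regular expression and whose second is a replacement template in which `$` and `\` are special. The sentences in this challenge contain only letters and whitespace, so this rarely matters here, but a small sketch shows the difference. The class `ReplaceAllArguments` and the method `replaceLiteral` are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class ReplaceAllArguments {

    // Replaces every literal occurrence of `literal` with `replacement`,
    // so that neither argument is interpreted as regex syntax.
    static String replaceLiteral(String text, String literal, String replacement) {
        return text.replaceAll(Pattern.quote(literal),
                               Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // Unquoted: '.' matches any character, so "axb" is replaced as well.
        System.out.println("a.b axb".replaceAll("a.b", "Z"));      // Z Z
        // Quoted: only the literal text "a.b" is replaced.
        System.out.println(replaceLiteral("a.b axb", "a.b", "Z")); // Z axb
    }
}
```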

Note: This challenge uses a custom checker; you will fail the challenge if you modify anything other than the three locations that the comments direct you to complete. To restore the editor's original stub code, create a new buffer by clicking on the branch icon in the top left of the editor.

Input Format

The following input is handled for you by the given stub code:

The first line contains an integer, n, denoting the number of sentences.
Each of the n subsequent lines contains a single sentence consisting of English alphabetic letters and whitespace characters.

Output Format

Stub code in the editor prints the sentence modified by the replaceAll line to stdout. The output must be the initial sentence with all repeated occurrences of each word removed.

Sample Input

5
Goodbye bye bye world world world
Sam went went to to to his business
Reya is is the the best player in eye eye game
in inthe
Hello hello Ab aB

Sample Output

Goodbye bye world
Sam went to his business
Reya is the best player in eye game
in inthe
Hello Ab

Solution:

import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DuplicateWords {

    public static void main(String[] args) {

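        // A word (group 1) followed by one or more repeats of that same word.
        String regex = "\\b(\\w+)(\\W+\\1\\b)+";
        // CASE_INSENSITIVE lets the backreference \1 match "Love" after "love".
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);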

        Scanner in = new Scanner(System.in);
        int numSentences = Integer.parseInt(in.nextLine());

        while (numSentences-- > 0) {
            String input = in.nextLine();

            Matcher m = p.matcher(input);

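            // Replace each matched text with the word captured in group 1,
            // which keeps the spelling of the first occurrence.
            while (m.find()) {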
                input = input.replaceAll("\\b" + Pattern.quote(m.group()) + "\\b", m.group(1));
            }

            System.out.println(input);
        }

        in.close();
    }
}
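As an alternative sketch that falls outside the constraints of the HackerRank stub (where only the three marked locations may change), the same result can be obtained in a single pass with Matcher.replaceAll and the `$1` group reference. The class name `SinglePassDedup` and the method `dedupe` are hypothetical, and the pattern is one possible choice rather than the only one. Like the solution above, it only collapses consecutive repeats, which is what the samples exercise.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class SinglePassDedup {

    // A word (group 1) followed by one or more repeats of it, ignoring case.
    private static final Pattern REPEATED =
            Pattern.compile("\\b(\\w+)(?:\\W+\\1\\b)+", Pattern.CASE_INSENSITIVE);

    static String dedupe(String sentence) {
        // "$1" stands for group 1, i.e. the first occurrence as written.
        return REPEATED.matcher(sentence).replaceAll("$1");
    }

    public static void main(String[] args) {
        String[] samples = {
            "Goodbye bye bye world world world",
            "Sam went went to to to his business",
            "Reya is is the the best player in eye eye game",
            "in inthe",
            "Hello hello Ab aB"
        };
        // Prints the five lines of the sample output.
        for (String s : samples) {
            System.out.println(dedupe(s));
        }
    }
}
```

Because the pattern is compiled once and applied with a single replaceAll, no per-match string rebuilding is needed.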

Steps involved in the above solution:

1. Import required packages and declare the class.

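2. Define the regex and compile it with Pattern.compile, passing Pattern.CASE_INSENSITIVE as the second argument. In the regex, (\w+) captures a word as group 1, and the group that follows uses the backreference \1 to match further occurrences of that same word, separated by non-word characters. The \b anchors restrict matches to whole words.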

3. Initialize a Scanner to read input from the standard input.

4. Read the number of sentences to process.

5. Iterate through each sentence using a while loop.

6. Read the current sentence from the input.

7. Initialize a Matcher object (m) using the regex pattern and the current sentence.

8. Use a while (m.find()) loop to iterate over the matches in the sentence, replacing each matched text (m.group()) with the first occurrence of the word (m.group(1)).

9. Print the modified sentence to the console.

10. Close the Scanner to prevent resource leaks.
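Two details of the pattern in step 2 are easy to overlook: the trailing \b and the case-insensitive flag. The sketch below compares pattern variants on two of the sample sentences. The class `PatternVariants`, the method `apply`, and both pattern strings are assumptions made for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class PatternVariants {

    // Compiles `regex` with `flags` and collapses every match to group 1.
    static String apply(String regex, int flags, String sentence) {
        return Pattern.compile(regex, flags).matcher(sentence).replaceAll("$1");
    }

    public static void main(String[] args) {
        String withBoundary = "\\b(\\w+)(?:\\W+\\1\\b)+";
        String noTrailingBoundary = "\\b(\\w+)(?:\\W+\\1)+";
        int ci = Pattern.CASE_INSENSITIVE;

        // The trailing \b stops "in" from matching the start of "inthe".
        System.out.println(apply(withBoundary, ci, "in inthe"));       // in inthe
        System.out.println(apply(noTrailingBoundary, ci, "in inthe")); // inthe

        // Without the flag, "Hello" and "hello" count as different words.
        System.out.println(apply(withBoundary, 0, "Hello hello Ab aB"));  // Hello hello Ab aB
        System.out.println(apply(withBoundary, ci, "Hello hello Ab aB")); // Hello Ab
    }
}
```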