1 answer

Overview: Pattern-Matching (aka String Search) is the process of algorithmically finding copies of a pattern P...

Question:

Overview:

Pattern-Matching (aka String Search) is the process of algorithmically finding copies of a pattern P inside a (generally much larger) text T. The goal is to implement and compare four classical string-matching algorithms.

Input:

Your code should work for any text either inputted directly or read in from a file.

However, for testing - input file has been provided:

  • The Gettysburg Address (by President Abraham Lincoln, 1863)

You should minimally search for these three patterns in each text: FREE, BRAVE, NATION.

You should convert the entire string to upper case and conduct your searches accordingly.

Methodology and Output:

  1. Implement the Brute Force Algorithm

  2. Implement the Knuth-Morris-Pratt Algorithm

  3. For each algorithm and text/pattern combination

    a)Track and tabulate the number of character-comparison operations   b)Track and tabulate the clock time for each algorithm

Input file for test:

The Gettysburg Address.txt

Fourscore and seven years ago our fathers brought forth on this continent a new nation, conceived in liberty and dedicated to the proposition that all men are created equal. Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived and so dedicated, can long endure. We are met on a great battlefield of that war. We have come to dedicate a portion of that field as a final resting-place for those who here gave their lives that that nation might live. It is altogether fitting and proper that we should do this. But, in a larger sense, we cannot dedicate — we cannot consecrate — we cannot hallow — this ground. The brave men, living and dead, who struggled here have consecrated it, far above our poor power to add or detract. The world will little note, nor long remember what we say here, but it can never forget what they did here. It is for us the living, rather, to be dedicated here to the unfinished work which they who fought here have thus far so nobly advanced. It is rather for us to be here dedicated to the great task remaining before us — that from these honored dead we take increased devotion to that cause for which they gave the last full measure of devotion — that we here highly resolve that these dead shall not have died in vain — that this nation shall have a new birth of freedom and that government of the people, by the people, for the people, shall not perish from the earth.

JAVA/C++ thanks.


Answers

// C++ program for implementation of KMP pattern searching  // algorithm  #include <bits/stdc++.h>   void computeLPSArray(char* pat, int M, int* lps);   // Prints occurrences of txt[] in pat[]  void KMPSearch(char* pat, char* txt)  {          int M = strlen(pat);          int N = strlen(txt);           // create lps[] that will hold the longest prefix suffix          // values for pattern          int lps[M];           // Preprocess the pattern (calculate lps[] array)          computeLPSArray(pat, M, lps);           int i = 0; // index for txt[]          int j = 0; // index for pat[]          while (i < N) {                  if (pat[j] == txt[i]) {                          j++;                          i++;                  }                   if (j == M) {                          printf("Found pattern at index %d ", i - j);                          j = lps[j - 1];                  }                   // mismatch after j matches                  else if (i < N && pat[j] != txt[i]) {                          // Do not match lps[0..lps[j-1]] characters,                          // they will match anyway                          if (j != 0)                                  j = lps[j - 1];                          else                                 i = i + 1;                  }          }  }   // Fills lps[] for given patttern pat[0..M-1]  void computeLPSArray(char* pat, int M, int* lps)  {          // length of the previous longest prefix suffix          int len = 0;           lps[0] = 0; // lps[0] is always 0           // the loop calculates lps[i] for i = 1 to M-1          int i = 1;          while (i < M) {                  if (pat[i] == pat[len]) {                          len++;                          lps[i] = len;                          i++;                  }                  else // (pat[i] != pat[len])                  {                          // This is tricky. Consider the example.                          // AAACAAAA and i = 7. The idea is similar                          // to search step.                          if (len != 0) {                                  len = lps[len - 1];                                   // Also, note that we do not increment                                  // i here                          }                          else // if (len == 0)                          {                                  lps[i] = 0;                                  i++;                          }                  }          }  }   // Driver program to test above function  int main()  {          char txt[] = "ABABDABACDABABCABAB";          char pat[] = "ABABCABAB";          KMPSearch(pat, txt);          return 0;  }  
  • Output:
    Found pattern at index 10

Searching Algorithm: KMP (Knuth Morris Pratt)
Unlike Naive algorithm, where we slide the pattern by one and compare all characters at each shift, we use a value from lps[] to decide the next characters to be matched.

The idea is to not match a character that we know will anyway match.

How to use lps[] to decide next positions (or to know a number of characters to be skipped)?

  • We start comparison of pat[j] with j = 0 with characters of current window of text.
  • We keep matching characters txt[i] and pat[j] and keep incrementing i and j while pat[j] and txt[i] keep matching.
  • When we see a mismatch
    • We know that characters pat[0..j-1] match with txt[i-j…i-1] (Note that j starts with 0 and increment it only when there is a match).
    • We also know (from above definition) that lps[j-1] is count of characters of pat[0…j-1] that are both proper prefix and suffix.
    • From above two points, we can conclude that we do not need to match these lps[j-1] characters with txt[i-j…i-1] because we know that these characters will anyway match. Let us consider above example to understand this.
  txt[] = "AAAAABAAABA"  pat[] = "AAAA" lps[] = {0, 1, 2, 3}   i = 0, j = 0 txt[] = "AAAAABAAABA"  pat[] = "AAAA" txt[i] and pat[j] match, do i++, j++  i = 1, j = 1 txt[] = "AAAAABAAABA"  pat[] = "AAAA" txt[i] and pat[j] match, do i++, j++  i = 2, j = 2 txt[] = "AAAAABAAABA"  pat[] = "AAAA" pat[i] and pat[j] match, do i++, j++  i = 3, j = 3 txt[] = "AAAAABAAABA"  pat[] = "AAAA" txt[i] and pat[j] match, do i++, j++  i = 4, j = 4 Since j == M, print pattern found and reset j, j = lps[j-1] = lps[3] = 3  Here unlike Naive algorithm, we do not match first three  characters of this window. Value of lps[j-1] (in above  step) gave us index of next character to match. i = 4, j = 3 txt[] = "AAAAABAAABA"  pat[] =  "AAAA" txt[i] and pat[j] match, do i++, j++  i = 5, j = 4 Since j == M, print pattern found and reset j, j = lps[j-1] = lps[3] = 3  Again unlike Naive algorithm, we do not match first three  characters of this window. Value of lps[j-1] (in above  step) gave us index of next character to match.

I = 5, j = 3 txt[] = "AAAAABAAABA" pat[] = "AAAA" txt[i] and pat[j] do NOT match and j > 0, change only j j = lps[j-1] = lps[2] = 2 i = 5, j = 2 txt[] = "AAAAABAAABA" pat[] = "AAAA" txt[i] and pat[j] do NOT match and j > 0, change only j j = lps[j-1] = lps[1] = 1 i = 5, j = 1 txt[] = "AAAAABAAABA" pat[] = "AAAA" txt[i] and pat[j] do NOT match and j > 0, change only j j = lps[j-1] = lps[0] = 0 i = 5, j = 0 txt[] = "AAAAABAAABA" pat[] = "AAAA" txt[i] and pat[j] do NOT match and j is 0, we do i++. i = 6, j = 0 txt[] = "AAAAABAAABA" pat[] = "AAAA" txt[i] and pat[j] match, do i++ and j++ i = 7, j = 1 txt[] = "AAAAABAAABA" pat[] = "AAAA" txt[i] and pat[j] match, do i++ and j++ We continue this way...

.

Similar Solved Questions

1 answer
Foto Company makes 6,000 units per year of a part it uses in the products it...
Foto Company makes 6,000 units per year of a part it uses in the products it manufactures. The unit product cost of this part is computed as follows: $12.10 19.70 Direct materials Direct labor Variable manufacturing overhead Fixed manufacturing overhead Unit product cost $43.50 An outside supplier h...
1 answer
Assignment is due: Monday November 12, 2019, 11:30 PM Submit your email assignment to the course...
Assignment is due: Monday November 12, 2019, 11:30 PM Submit your email assignment to the course DROPBOX. To simulate your email response, include and complete (as should be seen in an email response), the following fields: Subject: From: Date: To: Scenario: You are the owner of an industrial printi...
1 answer
The following income statements are provided for Li Company's last two years of operation: Number of...
The following income statements are provided for Li Company's last two years of operation: Number of units produced and sold Sales revenue Cost of goods sold Gross margin General, selling, and administrative expenses Net income Year 1 3,500 $ 101,500 68,000 33,500 13,000 $ 20,500 Year 2 3,000 $8...
1 answer
Wompeuve equilibrium. 3. How the intuition of general equilibrium in a timeless economy should be adapted...
wompeuve equilibrium. 3. How the intuition of general equilibrium in a timeless economy should be adapted to consider time and uncertainty? What is a contingent commodity? 4. Present an example of a contingent claim....
1 answer
QUESTION 16 Dane Company's predetermined overhead rate is based on direct labor hours (DLHs), At year....
QUESTION 16 Dane Company's predetermined overhead rate is based on direct labor hours (DLHs), At year. During the year, the company incurred $200,000 in actual manufacturing overhesd predetermined overhead rate was $40.00 per DLH, how many DLHs were the costs. The Mnng r the curren yer,e O 5,500...
1 answer
Find the surface area and volume of a right cylinder, closed at both end 8. Find...
find the surface area and volume of a right cylinder, closed at both end 8. Find the surface area and volume of a right cylinder, closed at both ends, with a base circumference of 8 in. and a height of 12 in. Find the surface area and the volume of the cylinder, Round your answer to the nearest ...
1 answer
I posted it earlier and they gave me the wrong answer please make sure it's correct...
I posted it earlier and they gave me the wrong answer please make sure it's correct and refer to the letter like a, b, c. that's showing in the box. Thanks!! 2 Exercise 3-2 (Algo) Prepare T-Accounts [LO3-2, LO3-4) Jurvin Enterprises is a manufacturing company that had no beginning inventorie...
1 answer
Are there nuclear reactions going on in our bodies?
Are there nuclear reactions going on in our bodies?...
1 answer
Your client has $102,000 invested in stock A. She would like to build a two-stock portfolio...
Your client has $102,000 invested in stock A. She would like to build a two-stock portfolio by investing another $102,000 in either stock B or C. She wants a portfolio with an expected return of at least 15.5% and as low a risk as possible, but the standard deviation must be no more than 40%. What d...
1 answer
Suppose a plot of inverse wavelength vs frequency has slope equal to 0.388, what is the...
Suppose a plot of inverse wavelength vs frequency has slope equal to 0.388, what is the speed of sound traveling in the tube to 2 decimal places?...
1 answer
The market value of debt is $425 million and the total market value of the firm...
The market value of debt is $425 million and the total market value of the firm is 5925 million. The cost of cquity is 17%, the cost of debt is 10% and the corporate tax rate is 35%. What is the WACC? O A 11.01% OB 12.189 C. 13.78% OD 14.17% E none...
1 answer
Why is a cardiac-oriented definition of death not enough when considering organ transplantation?
Why is a cardiac-oriented definition of death not enough when considering organ transplantation?...
1 answer
The following data refer to Twisto Pretzel Company for the year 20x1. Work-in-process inventory, 12/31/x0 $...
The following data refer to Twisto Pretzel Company for the year 20x1. Work-in-process inventory, 12/31/x0 $ 8,100 Selling and administrative salaries 13,600 Insurance on factory and equipment 3,600 Work-in-process inventory, 12/31/x1 8,200 Finished-goods inventory, 12/31/x0 14,000 ...
1 answer
Recommendation for IKEA inventory supply chain management.
Recommendation for IKEA inventory supply chain management....
1 answer
Suppose that a researcher is interested in estimating the population mean (Miu_X) of a population with...
Suppose that a researcher is interested in estimating the population mean (Miu_X) of a population with the sample average estimator, X_bar. The population has a population standard deviation of Sigma_X. The question that the research has is: “How large should my sample size be?” You help...
1 answer
With significant oversupply of hospital beds in the U.S. what is the rationale for taxpayer support...
With significant oversupply of hospital beds in the U.S. what is the rationale for taxpayer support of the separate and costly hospital system of the Department of Veterans Affairs? I need more expline about this point...
1 answer
This Uuestion: 1 l 4 of 15 to com are given to Find the inde pro...
This Uuestion: 1 l 4 of 15 to com are given to Find the inde pro d w racam man in the given angeb would be considered on e echology The p one and down to find the probably Fora g e of 37 find the probably a sample meaningless than 12.752 or greater than 12.755 when 12.752 and 22 For the given sample...
1 answer
How do you solve the right triangle if a=3.0cm, b=2.6cm, C=90°?
How do you solve the right triangle if a=3.0cm, b=2.6cm, C=90°?...
1 answer
Can someone solve this differential equation. I'll definitely leave good remarks My courses / MAT2022_S /...
can someone solve this differential equation. I'll definitely leave good remarks My courses / MAT2022_S / July 16 2020 / TEST 2 JULY 31, 2020 The particular integral of the differential equation 4 +4 dy + 3y = 3e2x is dx2 dx Select one: et a. 9 e 2x b. 3 3e2x c. e 2x d. 9 Moyt nade...