Matching Strings by Similarity using Levenshtein distance

Create the `df` dataframe containing the product of the two CSVs

We have already read the two CSVs into the `df1` and `df2` variables. Now, use the `itertools.product` function to create a dataframe `df` that contains the Cartesian product of the two CSVs, that is, every company in the first paired with every company in the second. The columns should be named `CSV 1` and `CSV 2`.

As `df1` has 266 rows and `df2` has 368, the resulting `df` will have 97,888 rows (266 * 368).
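
As a minimal sketch of this step, the example below builds the product from two tiny stand-in dataframes. The company names and the `Company` column name are hypothetical, since the real CSV layout is not shown here; adjust the column selection to match your data.

```python
from itertools import product

import pandas as pd

# Hypothetical stand-ins for the two CSVs. The "Company" column name
# and the company names are assumptions made for illustration.
df1 = pd.DataFrame({"Company": ["Acme", "Globex"]})
df2 = pd.DataFrame(
    {"Company": ["Acme Corporation", "Globex Inc.", "Initech LLC"]}
)

# Pair every company in df1 with every company in df2, one row per pair.
pairs = list(product(df1["Company"], df2["Company"]))
df = pd.DataFrame(pairs, columns=["CSV 1", "CSV 2"])

# The row count is the product of the two input lengths.
print(len(df) == len(df1) * len(df2))
```

The same row-count check applies to the real data: 266 rows times 368 rows gives 97,888 pairs.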

Create a new column `Ratio Score` that contains the distance for all the rows in `df`

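Now apply the `fuzz.partial_ratio` function to every pair of companies in `df` to score how similar the two names are, and store the result in a new column named `Ratio Score`. Note that `partial_ratio` returns a similarity score from 0 to 100, where a higher value means a closer match, rather than a raw edit distance.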

How many rows have a `Ratio Score` of `90` or more?
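
A question like this can be answered with a boolean filter. The sketch below uses made-up names and scores purely to show the pattern, so its count is not the answer for the real data.

```python
import pandas as pd

# Made-up rows for illustration only; real scores come from the previous step.
df = pd.DataFrame({
    "CSV 1": ["Acme", "Acme", "Globex", "Globex"],
    "CSV 2": ["Acme Corporation", "Initech LLC", "Globex Inc.", "Globex Group"],
    "Ratio Score": [100, 35, 92, 89],
})

# Keep only the rows at or above the threshold, then count them.
high = df[df["Ratio Score"] >= 90]
print(len(high))
```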

What's the corresponding company in CSV2 to `AECOM` in CSV1?

We saw that CSV1 contains a company named AECOM. What is the corresponding value in CSV2?
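
To look up the candidates for a single company, one option is to filter on `CSV 1` and sort by score. The data below is hypothetical and deliberately uses invented company names, so it does not reveal the answer for AECOM.

```python
import pandas as pd

# Hypothetical scored pairs; names and scores are invented.
df = pd.DataFrame({
    "CSV 1": ["Acme", "Acme", "Acme", "Globex"],
    "CSV 2": ["Initech LLC", "Acme Corporation", "Globex Inc.", "Globex Inc."],
    "Ratio Score": [20, 100, 25, 100],
})

company = "Acme"  # replace with the CSV 1 name you are investigating

# All CSV 2 candidates for this company, best score first.
candidates = (
    df[df["CSV 1"] == company]
    .sort_values("Ratio Score", ascending=False)
)

best = candidates.iloc[0]
print(best["CSV 2"], best["Ratio Score"])
```

It is worth reading the top few rows rather than only the first one, since several CSV 2 names can share the same high score.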

What's the corresponding CSV2 company of Starbucks?

CSV1 contains the company Starbucks. What is the corresponding name in CSV2?

Is there a matching company for `Pinnacle West Capital Corporation`?

CSV1 contains Pinnacle West Capital Corporation. Is there a matching company in CSV2?

How many matching companies are there for `County of Los Angeles Deferred Compensation Program`?

CSV1 contains County of Los Angeles Deferred Compensation Program. How many companies in CSV2 appear to match it?
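
Counting candidate matches per company, including companies with none, can be sketched with a grouped sum over a boolean mask. The cutoff of 90 is an assumption borrowed from the earlier question, and the rows are invented for illustration.

```python
import pandas as pd

# Invented scored pairs; "Acme" has two strong candidates, "Globex" has none.
df = pd.DataFrame({
    "CSV 1": ["Acme", "Acme", "Acme", "Globex", "Globex", "Globex"],
    "CSV 2": ["Acme Corporation", "Acme Corp. Retirement Plan", "Initech LLC"] * 2,
    "Ratio Score": [100, 100, 25, 30, 28, 33],
})

THRESHOLD = 90  # assumed cutoff for "looks like a match"

# True/False per row, summed within each CSV 1 company.
match_counts = (df["Ratio Score"] >= THRESHOLD).groupby(df["CSV 1"]).sum()
print(match_counts)
```

A count of zero suggests there is no match at that threshold, while a count above one means the candidates need a manual look, because a high partial score alone does not prove that two names refer to the same company.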

Is there a matching company for `The Queens Health Systems`?

CSV1 contains The Queens Health Systems. Is there a matching company in CSV2?

Santiago Basulto
