We have already read the two CSVs into the df1 and df2 variables. Now, use the itertools.product function to create a resulting dataframe df that will contain the Cartesian product of the two CSVs. The columns should be named CSV 1 and CSV 2.
As we have 368 rows in df1 and 266 rows in df2, the resulting df will have 97,888 rows (266 * 368), and it'll look something like:
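A minimal sketch of this step, assuming each dataframe keeps its company names in a column called Company (that column name and the tiny stand-in data are assumptions for illustration, not the activity's actual CSVs):

```python
from itertools import product

import pandas as pd

# Stand-ins for the dataframes loaded from the two CSVs.
# The "Company" column name and the rows are hypothetical.
df1 = pd.DataFrame({"Company": ["Acme Corp", "Globex", "Initech"]})
df2 = pd.DataFrame({"Company": ["Acme Corp Holdings", "Globex Inc"]})

# product() yields every (df1 company, df2 company) pair as a tuple,
# with the first argument varying slowest.
pairs = list(product(df1["Company"], df2["Company"]))

# One row per pair, with the column names the activity asks for.
df = pd.DataFrame(pairs, columns=["CSV 1", "CSV 2"])

# The row count is the product of the two input sizes (3 * 2 here).
print(len(df), len(df1) * len(df2))
```

With the real data, the same construction would give len(df1) * len(df2) rows.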
Now apply the function fuzz.partial_ratio to every pair of companies in df to calculate how similar the two names are (the score ranges from 0 to 100, and higher means more similar). Store the score in a new column named Ratio Score. It'll look similar to:
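One possible way to fill the Ratio Score column is sketched below. It assumes fuzz comes from the thefuzz package (or its predecessor fuzzywuzzy); the company names are invented, and partial_ratio_stand_in is a hypothetical difflib-based helper that only exists so the sketch still runs when neither package is installed. Its scores are a rough approximation and may differ from fuzz.partial_ratio.

```python
from difflib import SequenceMatcher
from itertools import product

import pandas as pd

try:
    from thefuzz import fuzz  # maintained successor of fuzzywuzzy
except ImportError:
    try:
        from fuzzywuzzy import fuzz
    except ImportError:
        fuzz = None  # neither package installed; use the stand-in below


def partial_ratio_stand_in(a, b):
    """Rough approximation only: best difflib similarity between the shorter
    string and each same-length window of the longer one, scaled to 0-100."""
    short, long_ = sorted((a, b), key=len)
    if not short:
        return 0
    last_start = len(long_) - len(short)
    windows = (long_[i:i + len(short)] for i in range(last_start + 1))
    return round(100 * max(SequenceMatcher(None, short, w).ratio() for w in windows))


score = fuzz.partial_ratio if fuzz is not None else partial_ratio_stand_in

# Toy pairs dataframe with invented company names (not the activity's data).
csv1 = ["Acme Corp", "Initech"]
csv2 = ["Acme Corp Holdings", "Globex Inc", "Initech LLC"]
df = pd.DataFrame(list(product(csv1, csv2)), columns=["CSV 1", "CSV 2"])

# Score every row; iterating over the two columns avoids a row-wise apply.
df["Ratio Score"] = [score(a, b) for a, b in zip(df["CSV 1"], df["CSV 2"])]
```

Because partial_ratio compares the shorter string against the best-matching part of the longer one, a name that appears whole inside a longer name scores 100.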
We saw that in CSV1 there's a company AECOM. What's the corresponding value in CSV2?
The CSV1 company is Starbucks. What's the corresponding name in CSV2?
The CSV1 company is Pinnacle West Capital Corporation. Is there a match in CSV2?
The CSV1 company is County of Los Angeles Deferred Compensation Program. How many matching companies seem to be in CSV2?
The CSV1 company is The Queens Health Systems. Is there a match in CSV2?
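One way to explore questions like these is to filter df down to a single CSV 1 company and sort its rows by Ratio Score. The sketch below uses a hypothetical helper, candidate_matches, and an arbitrary cutoff of 70; the names and scores in the toy dataframe are invented placeholders, not results from the activity's data.

```python
import pandas as pd

# Toy scored dataframe; names and scores are invented placeholders.
df = pd.DataFrame({
    "CSV 1": ["Acme Corp"] * 3 + ["Globex"] * 3,
    "CSV 2": ["ACME Corporation", "Globex Inc", "Initech"] * 2,
    "Ratio Score": [75, 20, 15, 18, 100, 25],
})


def candidate_matches(scored, company, min_score=70):
    """Rows for `company` in CSV 1 scoring at least `min_score`, best first.

    Hypothetical helper; the default cutoff is an assumption to tune by eye.
    """
    mask = (scored["CSV 1"] == company) & (scored["Ratio Score"] >= min_score)
    rows = scored[mask].sort_values("Ratio Score", ascending=False)
    return rows.reset_index(drop=True)


acme = candidate_matches(df, "Acme Corp")
globex = candidate_matches(df, "Globex")
missing = candidate_matches(df, "Initech")  # not present in CSV 1

print(acme)
print(globex)
```

An empty result suggests there is no match above the cutoff, and several rows suggest more than one candidate. It is still worth reading the candidates by eye, since a high partial ratio can come from a shared substring rather than a true match.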