We have already read the two CSVs into the df1 and df2 variables. Now, use the itertools.product function to create a resulting dataframe df that will contain the Cartesian product of the two CSVs. The columns should be named CSV 1 and CSV 2.
As we have 266 rows in df1 and 368 in df2, the resulting df will have 97,888 rows (266 * 368), one row for every pairing of a df1 company with a df2 company.
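A minimal sketch of this step, using tiny made-up dataframes in place of the real ones. The company names and the Company column name are assumptions for illustration only; the real df1 and df2 may be laid out differently.

```python
from itertools import product

import pandas as pd

# Hypothetical stand-ins for the real data: the actual df1/df2 hold 266 and
# 368 rows, and their column name ("Company" here) is an assumption.
df1 = pd.DataFrame({"Company": ["Acme Corp", "Globex", "Initech"]})
df2 = pd.DataFrame({"Company": ["Acme Corporation Inc", "Globex LLC"]})

# product() yields every (left, right) pair, i.e. the Cartesian product.
pairs = list(product(df1["Company"], df2["Company"]))
df = pd.DataFrame(pairs, columns=["CSV 1", "CSV 2"])

# The row count is the product of the two input lengths (3 * 2 here).
assert len(df) == len(df1) * len(df2)
print(df.head())
```

The same length check applied to the real data should give 266 * 368 = 97,888 rows.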
Now apply the function fuzz.partial_ratio to every pair of companies in df to calculate how similar the two names are. The function returns a score from 0 to 100, where a higher score means a closer match. Store the score in a new column named Ratio Score.
We saw that CSV1 contains a company named AECOM. What is the corresponding value in CSV2?
CSV1 contains Starbucks. What is the corresponding name in CSV2?
CSV1 contains Pinnacle West Capital Corporation. Is there a match in CSV2?
CSV1 contains County of Los Angeles Deferred Compensation Program. How many matching companies seem to be in CSV2?
CSV1 contains The Queens Health Systems. Is there a match in CSV2?
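Questions like these can be explored by filtering the scored dataframe for one CSV 1 company and sorting its candidates by score. The sketch below is only an illustration: the helper best_matches, the threshold of 90, and all names and scores in the sample data are hypothetical.

```python
import pandas as pd

# Hypothetical scored pairs; names and scores are invented for illustration.
df = pd.DataFrame(
    {
        "CSV 1": ["Acme Corp", "Acme Corp", "Globex", "Globex"],
        "CSV 2": [
            "Acme Corporation Inc",
            "Globex LLC",
            "Acme Corporation Inc",
            "Globex LLC",
        ],
        "Ratio Score": [100, 22, 17, 100],
    }
)


def best_matches(scored, company, min_score=90):
    """Return the CSV 2 candidates for one CSV 1 company, best first.

    min_score is an arbitrary cut-off chosen for this sketch.
    """
    mask = (scored["CSV 1"] == company) & (scored["Ratio Score"] >= min_score)
    return scored[mask].sort_values("Ratio Score", ascending=False)


matches = best_matches(df, "Acme Corp")
print(matches)
```

An empty result suggests there is no match in CSV2, while several rows above the threshold suggest more than one candidate that should be checked by eye.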