Practice Data Cleaning and String Handling with City Bike data

Capitalize the column `first name`

If you explore the DataFrame, you'll see that the column first name is capitalized inconsistently. Some names are capitalized (Alexis, Jodi), while others are not (misty, matrick).

Create a new Series, capital_first_name, that contains the values of the column first name correctly capitalized.
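
One possible approach is sketched below. The rows are made-up stand-ins for the real dataset, and `df` is an assumed name for the DataFrame.

```python
import pandas as pd

# Hypothetical sample rows; the real dataset is not reproduced here.
df = pd.DataFrame({"first name": ["Alexis", "misty", "matrick", "Jodi"]})

# str.capitalize() upper-cases the first character and lower-cases the rest.
capital_first_name = df["first name"].str.capitalize()
print(capital_first_name.tolist())
```

Note that `str.capitalize()` also lower-cases every character after the first, which is usually what you want for single-word names.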

Convert the Column `last name` to lower case

Now we can see that the Column last name is messy too. Some of the middle letters are capitalized, as in HarRISs and DaniEEIs, and some names are not capitalized. So, convert all of them to lower case and store the result in the variable lower_last_name.
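
A minimal sketch on made-up values (the third name is invented for illustration, and `df` is an assumed name):

```python
import pandas as pd

# Hypothetical sample rows mixing upper- and lower-case letters.
df = pd.DataFrame({"last name": ["HarRISs", "DaniEEIs", "smith"]})

# str.lower() converts every character of every value to lower case.
lower_last_name = df["last name"].str.lower()
print(lower_last_name.tolist())
```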

Convert `last name` to upper case. Store your answer in the variable `last_name_upper`
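
The upper-case conversion mirrors the lower-case one. The sketch below assumes the same kind of made-up sample data.

```python
import pandas as pd

# Hypothetical sample rows; `df` is an assumed name for the DataFrame.
df = pd.DataFrame({"last name": ["HarRISs", "DaniEEIs", "smith"]})

# str.upper() converts every character of every value to upper case.
last_name_upper = df["last name"].str.upper()
print(last_name_upper.tolist())
```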

How many users in the Column `usertype` are `Customer`?

Let's count all the Customers in the column usertype and sum them up. Store your sum in the customer_counts variable.
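
Counting can be done by summing a boolean mask, as in this sketch. The five rows are invented, so the resulting number says nothing about the real data.

```python
import pandas as pd

# Hypothetical sample rows; the real column is much longer.
df = pd.DataFrame(
    {"usertype": ["Customer", "Subscriber", "Subscriber", "Customer", "Subscriber"]}
)

# The comparison yields True/False per row; summing counts the True values.
customer_counts = (df["usertype"] == "Customer").sum()
print(customer_counts)
```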

How many users in the Column `usertype` are `Subscribers`?

Now that you have the total number of Customers in usertype from the previous question, find how many Subscribers there are in total.

You can subtract the number of Customers from the total length of the DataFrame to find the remainder, which are the Subscribers.
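
The subtraction can be sketched as follows. It assumes, as the text does, that every row is either a Customer or a Subscriber; `subscriber_counts` is a hypothetical variable name, and the rows are invented.

```python
import pandas as pd

# Hypothetical sample rows with only the two user types.
df = pd.DataFrame(
    {"usertype": ["Customer", "Subscriber", "Subscriber", "Customer", "Subscriber"]}
)

customer_counts = (df["usertype"] == "Customer").sum()

# Everything that is not a Customer is assumed to be a Subscriber.
subscriber_counts = len(df) - customer_counts
print(subscriber_counts)
```

If the column could hold missing values or a third category, counting `df["usertype"] == "Subscriber"` directly would be the safer check.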

Find the words in Column `pin` which contain the substring `lol` and store your selection in the variable `word_having_lols`
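
A substring filter along these lines might look like the sketch below. The pins are invented, and `na=False` is an added precaution that treats missing values as non-matches.

```python
import pandas as pd

# Hypothetical sample pins; `df` is an assumed name for the DataFrame.
df = pd.DataFrame({"pin": ["lol123", "abc", "xlolx", "9876"]})

# regex=False makes "lol" a literal substring rather than a pattern.
mask = df["pin"].str.contains("lol", regex=False, na=False)
word_having_lols = df.loc[mask, "pin"]
print(word_having_lols.tolist())
```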

Find the names in the Column `first name` which start with the letter `Z`

Find all the names in the first name column that start with the letter "Z". Store the result in the variable starts_with_z.

Be careful! It's capital Z, not lowercase z.
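
The case sensitivity mentioned above is visible in this sketch, where the invented name "zack" is deliberately left out of the result. `z_count` is a hypothetical name for the count.

```python
import pandas as pd

# Hypothetical sample names, including one lower-case "z".
df = pd.DataFrame({"first name": ["Zoe", "zack", "Alexis", "Zane"]})

# str.startswith() is case-sensitive, so "zack" does not match "Z".
starts_with_z = df.loc[df["first name"].str.startswith("Z"), "first name"]
z_count = len(starts_with_z)
print(starts_with_z.tolist(), z_count)
```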

How many first names start with the letter 'Z'?

Find the names in the Column `last name` which end with 't' and store your result in the variable `ends_with_t`
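
Suffix matching works the same way as prefix matching. The sketch below uses invented last names, and `t_count` is a hypothetical name for the number of matches.

```python
import pandas as pd

# Hypothetical sample last names.
df = pd.DataFrame({"last name": ["Scott", "Harris", "Bennett", "Daniels"]})

# str.endswith() is case-sensitive: only a lower-case trailing "t" matches.
ends_with_t = df.loc[df["last name"].str.endswith("t"), "last name"]
t_count = len(ends_with_t)
print(ends_with_t.tolist(), t_count)
```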

How many values in the Column `last name` end with the letter 't'?

Join the `bikeid` in the Column `bikeid` with a `<space>`

Use the str.join() method to join the bikeid values within the Column bikeid and store the output in the variable spaced_bikeids.
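
In the sketch below, `str.join()` places the separator between the characters of each value. It assumes `bikeid` is stored as integers, which is why the column is cast to strings first; the ids themselves are invented.

```python
import pandas as pd

# Hypothetical bike ids stored as integers.
df = pd.DataFrame({"bikeid": [16013, 2245]})

# Cast to str so the .str accessor is available, then put a space
# between the characters of each id.
spaced_bikeids = df["bikeid"].astype(str).str.join(" ")
print(spaced_bikeids.tolist())
```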

Create a new Column named `name length` containing the length of each name in the Column `first name`
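
A short sketch with invented names, assuming the DataFrame is called `df`:

```python
import pandas as pd

# Hypothetical sample names.
df = pd.DataFrame({"first name": ["Alexis", "misty", "Jodi"]})

# str.len() returns the number of characters in each value.
df["name length"] = df["first name"].str.len()
print(df)
```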

Find whether the Column `pin` is alphanumeric or contains digits only
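
The two checks can be compared side by side, as sketched here on invented pins. `only_digits` and `all_alnum` are hypothetical names, and the outcome on the real column may differ.

```python
import pandas as pd

# Hypothetical sample pins: one of them contains letters.
df = pd.DataFrame({"pin": ["1234", "ab12", "9876"]})

# True only if every pin consists solely of digits.
only_digits = bool(df["pin"].str.isdigit().all())

# True if every pin consists solely of letters and/or digits.
all_alnum = bool(df["pin"].str.isalnum().all())
print(only_digits, all_alnum)
```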

Verify whether the Column `tripduration` has any non-numeric values or contains digits only
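
One way to surface non-numeric entries is `pd.to_numeric` with `errors="coerce"`, sketched below. The sample stores durations as strings and plants a bad value on purpose; whether the real column contains any is for you to verify. `has_non_numeric` and `bad_values` are hypothetical names.

```python
import pandas as pd

# Hypothetical durations as strings, with one deliberately invalid entry.
df = pd.DataFrame({"tripduration": ["695", "693", "12a"]})

# Values that cannot be parsed as numbers become NaN.
as_numbers = pd.to_numeric(df["tripduration"], errors="coerce")

has_non_numeric = bool(as_numbers.isna().any())
bad_values = df.loc[as_numbers.isna(), "tripduration"]
print(has_non_numeric, bad_values.tolist())
```

This approach also flags values that were already missing, so check for NaN separately if that distinction matters.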

Check if any name in the Column `first name` has digit(s) or number(s) in it
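
A regular expression handles this check compactly. In the sketch, `\d` matches any single digit; the sample names (including the invalid "J0di") are invented, and `has_digit` and `names_with_digits` are hypothetical names.

```python
import pandas as pd

# Hypothetical sample names; "J0di" contains the digit zero.
df = pd.DataFrame({"first name": ["Alexis", "J0di", "misty"]})

# r"\d" matches any digit anywhere in the string.
mask = df["first name"].str.contains(r"\d", regex=True, na=False)

has_digit = bool(mask.any())
names_with_digits = df.loc[mask, "first name"]
print(has_digit, names_with_digits.tolist())
```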

Split the emails in the `emails` column at `@` to find the Domain names and store them in the variable `email_domains`

Once you split all the emails in the Column emails on @, the second element of each result (index 1) will be the domain of the email.
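
The split-then-index step can be sketched as follows, using invented addresses and assuming each one contains exactly one `@`.

```python
import pandas as pd

# Hypothetical sample addresses.
df = pd.DataFrame({"emails": ["alexis@example.com", "misty@example.edu"]})

# str.split("@") yields a list per row; .str[1] picks the part after the "@".
email_domains = df["emails"].str.split("@").str[1]
print(email_domains.tolist())
```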

Replace `.edu` with `.org` in the emails that contain it and store the output in the variable `edu_to_org`
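
A literal replacement is probably enough here. The sketch assumes the column is named `emails`, as in the previous activity, and uses invented addresses.

```python
import pandas as pd

# Hypothetical sample addresses; only the first one uses ".edu".
df = pd.DataFrame({"emails": ["misty@example.edu", "jodi@example.com"]})

# regex=False treats ".edu" literally, so the dot is not a wildcard.
edu_to_org = df["emails"].str.replace(".edu", ".org", regex=False)
print(edu_to_org.tolist())
```

This replaces `.edu` wherever it occurs in the address. To restrict the change to the end of the string, a pattern such as `r"\.edu$"` with `regex=True` is an option.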

Replace the numeric and the St values in the `end station name` Column with `<space>` so that we can filter the address without street numbers. Store your result in the variable `clean_address`
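
One regular expression can cover both cases, as in the sketch below. The station names are invented examples, and the `\b` word boundaries are an added assumption so that "St" inside a longer word such as "Station" is left alone.

```python
import pandas as pd

# Hypothetical station names in a "street & street" style.
df = pd.DataFrame(
    {"end station name": ["W 52 St & 11 Ave", "Broadway & E 14 St"]}
)

# \d+ matches runs of digits; \bSt\b matches "St" only as a whole word.
clean_address = df["end station name"].str.replace(r"\d+|\bSt\b", " ", regex=True)
print(clean_address.tolist())
```

The result keeps the extra spaces left by each replacement. Collapsing them is a separate step if a tidier string is needed.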

Jawad haider
